Slashdot Mirror


Ask Slashdot: How Reproducible Is Arithmetic In the Cloud?

goodminton writes "I'm researching the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virtual instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this translates to the world of cloud and virtualization across multiple hardware types."

226 comments

  1. Fixed-point arithmetic by mkremer · · Score: 5, Informative

    Use Fixed-point arithmetic.
    In Mathematica make sure to specify your precision.
    Look at 'Arbitrary-Precision Numbers' and 'Machine-Precision Numbers' for more information on how Mathematica does this.
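
    A minimal sketch of the fixed-point idea in Python rather than Mathematica. The scale factor and the helper names (to_fixed, to_text) are made up for illustration; nothing here comes from Mathematica's own API. Values are stored as integer counts of 1/SCALE, so addition is exact and does not depend on the FPU at all.

```python
from decimal import Decimal

SCALE = 10 ** 6  # assumption: six digits after the decimal point is enough


def to_fixed(text):
    """Parse a decimal string such as '0.1' into an integer count of 1/SCALE units."""
    return int(Decimal(text) * SCALE)


def to_text(fixed):
    """Render a scaled integer back as a decimal string."""
    return str(Decimal(fixed) / SCALE)


a, b, c = to_fixed("0.1"), to_fixed("0.2"), to_fixed("0.3")

# Integer addition is exact, so the grouping cannot change the answer.
print((a + b) + c == a + (b + c))
print(to_text(a + b + c))
```

    The price is that you pick the range and resolution up front, and anything like a trig function still has to be built on top of the integers.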

    1. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 5, Insightful

      Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"

    2. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 1, Interesting

      Or simply don't use the broken "cloud computing" model. If you have some calculations to do, and care the least about the results, how about buying a computer that does those calculations for you?

    3. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 1

      I'm not so sure of that. Different apps address hardware and FPUs differently. If the app doesn't have a way to test its platform, then there are ways to check an instance of an OS to see what it does, in terms of mantissa, and basic vector math, record that as metadata, and then a future host could be compensated for. When the world went from 32bit to 64bit CPUs, lots changed. Intel has an ugly history with FPUs. Where precision is important, it's always nice to have done your own quick check.
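
      One way to do that kind of quick check, sketched in Python. The function name and the choice of probes are hypothetical; the idea is just to store a few facts about the float format plus a couple of bit-exact probe results next to the archived image, so a future host can be compared against them.

```python
import json
import platform
import sys


def float_fingerprint():
    """Collect a few facts about the host's floating-point environment."""
    info = sys.float_info
    return {
        "machine": platform.machine(),
        "python": platform.python_version(),
        "mant_dig": info.mant_dig,        # significand bits of a Python float
        "epsilon": info.epsilon.hex(),    # hex form keeps the record bit-exact
        "max": info.max.hex(),
        "probe_sum": (0.1 + 0.2).hex(),   # one basic addition, stored exactly
        "probe_div": (1.0 / 3.0).hex(),   # one basic division, stored exactly
    }


if __name__ == "__main__":
    print(json.dumps(float_fingerprint(), indent=2))
```

      A real check would want many more probes (library functions such as sin and exp in particular), since those are where platforms are more likely to disagree.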

    4. Re:Fixed-point arithmetic by Giant+Electronic+Bra · · Score: 4, Informative

      Yes, you can do this, but it's not feasible for all calculations. Things like trig functions are implemented on FP numbers, and once you start using FP it's better to just keep using it; converting back and forth is just bad and defeats the whole purpose anyway. So in reality you end up with applications that DO use FP (believe me, as an old FORTH programmer I can attest to the benefits of scaled integer arithmetic!). It's one of those things: we're stuck with FP, and once we assume that, there is the whole question of small differences in the results of machine-level instructions, or of minor differences in libraries on different platforms, etc. You will probably find that arbitrary VMs won't produce exactly identical results when you run on different platforms (AWS, KVM, VMWare, some new thing).

      Is it a huge problem though? The results produced should be similar; the parameters being varied were never controlled for anyway. It's how often the rounding errors between two FPUs are identical. Neither the new nor the old results should be considered 'better' and they should generally be about the same if the result is robust. A climate sim, for example, run on two different systems for an ensemble of runs with similar inputs should produce statistically indistinguishable results. If they don't, then you should know what the differences are by comparison. In reality I doubt very many experiments will be in doubt based on this.

      --
      "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
    5. Re:Fixed-point arithmetic by Jane+Q.+Public · · Score: 4, Insightful

      "Is it ia huge problem though?"

      If tools like Mathematica are dependent on the floating-point precision of a given processor, They're Doing It Wrong.

    6. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 2, Insightful

      protip: When discussing the difference between Fixed Point and Floating Point, the abbreviation "FP" is useless.

    7. Re:Fixed-point arithmetic by EdIII · · Score: 0

      Wow. You Smart.

    8. Re:Fixed-point arithmetic by raddan · · Score: 4, Interesting

      Experiments can vary wildly with even small differences in floating-point precision. I recently had a bug in a machine learning algorithm that produced completely different results because I was off by one trillionth! I was being foolish, of course, because I hadn't used an epsilon for doing FP, but you get the idea.

      But it turns out that even if you're a good engineer and you are careful with your floating point numbers, the fact is: floating point is approximate computation. And for many kinds of mathematical problems, like dynamical systems, this approximation changes the result. One of the founders of chaos theory, Edward Lorenz, of Lorenz attractor fame, discovered the problem by truncating the precision of FP numbers from a printout when he was re-entering them into a simulation. The simulation behaved completely differently despite the difference in precision being in the thousandths. That was a weather simulation. See where I'm going with this?
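
      A toy illustration of that sensitivity, in Python. This is NOT Lorenz's weather model; it swaps in the logistic map x -> 4x(1-x), a standard chaotic system, and the two starting values are only illustrative of "the same number with the trailing digits dropped".

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


full = logistic_orbit(0.506127, 60)    # "full precision" start (illustrative)
rounded = logistic_orbit(0.506, 60)    # same start with digits dropped

# Absolute difference between the two runs at each step.
gaps = [abs(p - q) for p, q in zip(full, rounded)]

print("initial gap:", gaps[0])
print("largest gap over 60 steps:", max(gaps))
```

      The starting gap is about one part in ten thousand, yet within a few dozen steps the two runs have nothing to do with each other.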

    9. Re:Fixed-point arithmetic by Joce640k · · Score: 4, Informative

      Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"

      Wrong.

      In IEEE floating point math, "(a+b)+c" might not be the same as "a+(b+c)".
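
      A quick way to see this for yourself, sketched in Python (whose float is an IEEE double on common platforms); the three constants are just a convenient example:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # rounds after a + b, then again after adding c
right = a + (b + c)   # rounds after b + c, then again after adding a

print(left == right)
print(repr(left), repr(right))
```

      Each addition is individually well defined; it is the grouping that changes which intermediate value gets rounded.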

      The exact results of a calculation can depend on how a compiler optimized the code. Change the compiler and all bets are off. Different versions of the same software can produce different results.

      If you want the exact same results across all compilers you need to write your own math routines which guarantee the order of evaluation of expressions.

      OTOH, operating system, hardware, firmware and hypervisors shouldn't make any difference if they're running the same code. IEEE math *is* deterministic.

      --
      No sig today...
    10. Re:Fixed-point arithmetic by NEDHead · · Score: 4, Funny

      I have a mechanical calculator that is extremely reliable, so long as you oil it.

    11. Re:Fixed-point arithmetic by Giant+Electronic+Bra · · Score: 5, Insightful

      I think the problem is that people PERCEIVE it to be a problem. Nothing is any more problematic than it was before; good numerical simulations will be stable over some range of inputs. It shouldn't MATTER if you get slightly different results for one given input. If that's all you tested, well, you did it wrong indeed. Mathematica is fine; people need to A) understand scientific computing and B) understand how to run and interpret models. I think most scientists that are doing a lot of modelling these days DO know these things. It's the occasional users that get it wrong, I suspect.

      --
      "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
    12. Re:Fixed-point arithmetic by Giant+Electronic+Bra · · Score: 5, Informative

      Trust me, it's a subject I've studied. The problem here is that your system is unstable: tiny differences in inputs generate huge differences in output. You cannot simply take one set of inputs that produces what you think is the 'right answer' from that system and ignore all the rest! You have to explore the ensemble behavior of many different sets of inputs, and the overall set of responses of the system is your output, not any one specific run with specific inputs that would produce a totally different result if one was off by a tiny bit.

      Of course Lorenz realized this. Simple experiments with an LDE will show you this kind of result. You simply cannot treat these systems the way you would ones which exhibit function-like behavior (at least within some bounds). Lorenz of course also realized THAT, but sadly not everyone has got the memo yet! lol.

      --
      "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
    13. Re:Fixed-point arithmetic by immaterial · · Score: 5, Insightful

      For a guy who started off a reply with an emphatic "Wrong" you sure do seem to agree with the guy you quoted.

    14. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 2, Insightful

      Urgh. Getting math to be deterministic is a major pain in the neck. Most folks are completely ignorant of the fact that sum/avg/stddev in just about all distributed databases will return a *slightly* different result every time you run the code (and are almost always different between vendors/platforms, etc.).
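
      A toy model of why that happens, in Python. The distributed_sum function and its round-robin partitioning are made up for illustration and do not describe any particular database; the point is only that the same values summed under a different split can give a different total. The loop is written out by hand because the built-in sum() may use compensated summation on newer Python versions.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def distributed_sum(values, n_nodes):
    """Sum round-robin partitions separately, then combine the partial sums."""
    partials = [naive_sum(values[i::n_nodes]) for i in range(n_nodes)]
    return naive_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]

print(distributed_sum(values, 1))   # everything on one node
print(distributed_sum(values, 2))   # same data split across two nodes
print(math.fsum(values))            # correctly rounded sum, independent of order
```

      math.fsum tracks the exact sum internally, which is one way to get an answer that does not depend on how the rows happened to be partitioned, at some cost in speed.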

    15. Re:Fixed-point arithmetic by gl4ss · · Score: 4, Insightful

      the question was not about compilers or indeed about software, but about fpu's, about firing up the same instance, with the same compilers and indeed with the same original binary.

      it sounds like just fishing for reasons to have a budget to keep old power hw around.

      I would think that if the results change so much to matter depending on fpu, that the whole calculation method is suspect to begin with and exploits some feature/bug to get a tuned result (but assuming that the cpu/vm adheres to the standard, they would be the same; if the old one doesn't and the new one does, then I think that an honest scientist would want to know that too).

      --
      world was created 5 seconds before this post as it is.
    16. Re:Fixed-point arithmetic by RightwingNutjob · · Score: 3, Informative

      If you want exact results from a fixed number of significant bits, you want magic.

      Whatever calculation you're making, be aware of the dynamic range of the intermediate results. Structure your calculations so that all intermediate results stay well within the dynamic range of the datatype. If you want to compute the standard deviation of 2048x2048 32-bit integers, use a 64 bit or 128 bit integer to compute the intermediate sum(x^2). If you try to use an IEEE double, you'll end up overflowing the 53 bits they give you because 2^11 * 2^11 * 2^32=2^54.
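
      A small Python sketch of the 53-bit limit. The sample values are arbitrary; Python's int is unbounded, which stands in here for a sufficiently wide integer accumulator (squares of 32-bit values are themselves up to 64 bits wide, so their sum needs more than 64).

```python
# Above 2**53 a double can no longer represent every integer.
limit = 2 ** 53
print(float(limit) + 1.0 == float(limit))

samples = [2 ** 32 - 1] * 4            # four copies of the largest 32-bit value

# Exact accumulation in integers: Python ints grow as needed.
exact = sum(x * x for x in samples)

# The same accumulation in doubles: each square is rounded to 53 bits.
approx = 0.0
for x in samples:
    approx += float(x) * float(x)

print(exact - int(approx))             # nonzero: the double sum lost low bits
```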

      If you can, reformulate your calculation steps so as to minimize the sensitivity to random errors on the order of a machine epsilon.

      An electronic computer manual from UNIVAC/Burroughs/IBM written for pure mathematicians in ~1953 will tell you the same thing.

    17. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      Does any part of the 'problem' that exists here, go back to clock/timing, synchronization? I ask because if we're talking about bit flip mis-triggers at the silicon level, it seems to me that a software workaround would be the only norm.

      Not sure why I'm bringing this up, but I recall a software-hardware discussion/diatribe about it being physically impossible for software to do damage to hardware. Software can only screw up software.

    18. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      You've missed the point. The hardware is almost guaranteed to be x86 based and, if not, IEEE compliant just like x86, and there is literally no conceivable chance of ANY FP discrepancies between different Virtualisation or Cloud systems.

      The poster needs to get a clue about this before asking [question from COMP SCI 101] "on the cloud".

      Does Euler's algorithm work "on the cloud"?
      What is O(n^2) "on the cloud"?
      How do I test for Turing completeness "on the cloud"?

      ENOUGH!

    19. Re:Fixed-point arithmetic by fuzzyfuzzyfungus · · Score: 2

      Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"

      Ignorant, perhaps; but likely correct that 'subtle differences in floating point handling exist between ostensibly binary-compatible platforms' + ' "the cloud" reduces your control, and sometimes even your information, about what platform you are running on at any given time' = 'Floating Point Fun Time'.

      To actually solve such a problem, sophisticated understanding is certainly required (especially since any practical user will probably want the solution to be fast as well as correct); but the act of combining a modicum of understanding of floating point issues with consideration of how 'the cloud' operates is not without value in letting you know just how screwed you may well be.

    20. Re:Fixed-point arithmetic by fuzzyfuzzyfungus · · Score: 3, Funny

      Or simply don't use the broken "cloud computing" model. If you have some calculations to do, and care the least about the results, how about buying a computer that does those calculations for you?

      In other news, many problems become much easier when you assume a suitably large pile of money.

      Incidentally, the same is true of explosives, amphetamines, and hookers.

    21. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      This is not about cloud computing. If you publish some algorithmic results obtained via FP hardware, as well as the algorithm, then another researcher will try to replicate that result on another vendor's hardware, even if both adhere to IEEE standards. The results had better be damn similar. If not, then the entire algorithm is so sensitive to small perturbations among inputs that its outputs are of questionable value anyway. A better approach by the original author is to repeat the experiment on as many available platforms as possible to find out which of the two possible outcomes above is the case. If the latter, it's time to re-think the algorithm, or its expression in a programming language.

    22. Re:Fixed-point arithmetic by fuzzyfuzzyfungus · · Score: 3, Funny

      "How do I test for Turing completeness "on the cloud"?"

      This one is actually a nontrivial challenge. Once the tape starts to get damp, you need to keep track of the probability that executing a given head-moving operation will cause the tape to snap and abruptly leave you with a confused finite state machine...

    23. Re:Fixed-point arithmetic by fuzzyfuzzyfungus · · Score: 0

      protip: When discussing the difference between Fixed Point and Floating Point, the abbreviation "FP" is useless.

      Even if you manage to contribute the first worthless comment on the subject?

    24. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 1

      The simulation behaved completely differently despite the difference in precision being in the thousandths. That was a weather simulation. See where I'm going with this?

      I think so, but it doesn't look like you are going in the right direction. If your model or code can produce wildly different results when you enter data with tiny errors in it, in most cases that is a problem with how you are treating the problem: your code is not flawed, but it is fundamentally answering the wrong questions. Is the real world data being fed into such models accurate to anywhere near the precision you complain about from floating point truncation or data entry? In most fields, and assuming you put some thought into your algorithm so as to not accumulate rounding errors obscenely fast, your answer should not be affected by changes in numbers smaller than measurement errors.

      If you are dealing with an unstable model or system, your answer should account for this and not just be a single number answer. Or it is possible that is a sign there is no usable answer from your model at that point. The divergence or sensitivity to floating point errors is not ultimately a problem in such cases, it is simply illuminating a more fundamental problem.

      It reminds me of an argument that broke out between two hardware engineers over neighbouring analogue circuits. Bob tells Abel, "Your circuit puts out too much noise, and it is messing with my circuit." Abel responds, "Your circuit is too sensitive to noise, as mine doesn't put out much and you have a large loop here picking it up." Bob continues to complain, "But if you reduce the noise from your circuit, that won't matter, mine will work as it should." The final line was from Abel, "No, changing my circuit would have only eliminated the sources of noise you know about, Your circuit is still sensitive to noise, and now you will be getting problems from much harder to trace sources."

    25. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      Exact and reproducible are very different things though, even if the former implies the latter. Also, when do you need 53 bits of precision for a standard deviation? At worst, simple scaling can keep things within the precision of a double precision floating point number.

    26. Re:Fixed-point arithmetic by __aaltlg1547 · · Score: 1

      When the computation involves a subtraction of numbers that are about the same value.
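
      The standard deviation case is a good example of exactly that. A sketch in Python, with made-up data and hypothetical function names: the textbook shortcut E[x^2] - E[x]^2 subtracts two huge, nearly equal numbers, while the two-pass form subtracts the mean first and only then squares.

```python
def variance_naive(xs):
    """Population variance via E[x^2] - E[x]^2 (subtracts two large numbers)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean


def variance_two_pass(xs):
    """Population variance via the mean of squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# Small spread around a large offset; the true population variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(variance_naive(data))
print(variance_two_pass(data))
```

      Both formulas are algebraically identical; only the second keeps the intermediate values small enough for a double to hold them exactly on this data.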

    27. Re:Fixed-point arithmetic by philip.paradis · · Score: 4, Funny

      Incidentally, the same is true of explosives, amphetamines, and hookers.

      I don't have to be a mathematician to say that sounds like one hell of a party.

      --
      Write failed: Broken pipe
    28. Re:Fixed-point arithmetic by philip.paradis · · Score: 1

      Let's look at some documentation instead of speculating.

      --
      Write failed: Broken pipe
    29. Re:Fixed-point arithmetic by goodminton · · Score: 1

      Thanks for your comments, I really appreciate them. Your mention of experiments was spot on with the use cases I'm trying to learn about. I've worked with many scientists who use commercial software packages for biomedical research where their experimental results may be archived for 10+ years before being reanalyzed. I recently helped a colleague pull a Windows 2000 server out of storage to rerun an experiment. We got it going after some difficulty and that got me thinking about virtualizing the hard drive, which then led me to wonder about the portability of virtualized machines between hardware hosts (including cloud providers) and the resulting reproducibility issues that could occur. I then read through several interesting papers showing variability of floating point math in commercial hypervisors, which led to my posting on Slashdot. Thanks again. Some interesting links: http://faculty.cs.gwu.edu/~timwood/papers/im2013_tech.pdf http://www.vmware.com/pdf/hypervisor_performance.pdf http://www.cc.iitd.ernet.in/misc/cloud/XenExpress.pdf

    30. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"

      The first step in solving a problem is realizing it exists.

    31. Re:Fixed-point arithmetic by goodminton · · Score: 1

      Nice, I'm the OP and I appreciate your comment and experimental design. I agree that in situations where you are coding the algorithm, it's easier to control/adjust for the variability with each hardware platform. The use cases I'm really wanting to learn more about are with commercial software packages that have traditionally been run on dedicated hardware and are now being virtualized and moved across multiple hardware types. I like your approach though and may do some testing along those lines.

    32. Re:Fixed-point arithmetic by tlhIngan · · Score: 5, Informative

      Don't use floating point if you can avoid it.

      If you can't, and the results are EXTREMELY important (remember, floating point is an APPROXIMATION of numbers), then you have to read What Every Computer Scientist Should Know About Floating-Point Arithmetic. (Yes, it's an Oracle link, but if you google it, most of the links are PDFs while the Oracle one is HTML).

      If you're worried about your cloud provider screwing with your results, then you're definitely doing it wrong (read that article).

      And yes, lots of people, even scientists, do it wrong because the idealized notion of what a floating point type is and how it actually works in hardware are completely different. Floating point numbers are tricky: they're VERY easy to use, but they're also VERY easy to use wrongly, and only if you know how the actual hardware is doing the calculations can you structure your programs and algorithms to do it right.

      And no actual hardware FPU or VPU (vector unit - some do floating point) implements the full IEEE spec. Many come close, but none implement it exactly - there's always an omission or two. Especially since a lot of FPUs provide extended precision that goes beyond IEEE spec.

    33. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      If, whatever you're doing, a tad off fifteen digits downstream past the decimal makes the outcome completely different, you aren't calculating what you think you are, and are lost in one of those butterfly effect problems.

    34. Re:Fixed-point arithmetic by Chalnoth · · Score: 3, Informative

      Yup. And if you want to use any kind of parallelism to compute the final result, you're going to have quite a hard time ensuring that the order of operations is always the same.

      That said, there are libraries around that make use of IEEE's reproducibility guarantees to ensure reproducible results. That will likely correct any reproducibility issues that would otherwise be introduced by the compiler, but you still have the order of operations issue (which is a fundamental problem).

      Personally, I think a better solution is to simply assume that you're never going to get reproducible floating-point results, and design the system to handle small, inconsistent rounding errors. I think that's a much easier problem to deal with than making floating-point reproducible in any modestly-complex system.
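
      In practice that usually means comparing against archived results with a tolerance instead of bit for bit. A minimal Python sketch; the function name, the tolerances and the sample numbers are all placeholders, and the right tolerance depends entirely on the problem.

```python
import math


def same_within_tolerance(old, new, rel_tol=1e-9, abs_tol=1e-12):
    """True if two result vectors agree element by element within tolerance."""
    return len(old) == len(new) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(old, new)
    )


archived = [0.6, 1.0 / 3.0]                    # results stored years ago
rerun = [0.1 + 0.2 + 0.3, 0.3333333333333334]  # same maths, different rounding

print(archived == rerun)                       # bitwise comparison
print(same_within_tolerance(archived, rerun))  # tolerance-based comparison
```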

    35. Re:Fixed-point arithmetic by goodminton · · Score: 1

      Thank you, I appreciate the link. I'm the OP and I didn't mean to speculate about or disparage Mathematica, I meant to use it as an example of a commercial software package where the person running the calculation doesn't control the code or compiling process.

    36. Re:Fixed-point arithmetic by noh8rz10 · · Score: 3, Funny

      that link has a lot of words.

    37. Re:Fixed-point arithmetic by lorinc · · Score: 1

      If you are really having a precision problem, even in double precision, then it means you are facing an ill-conditioned problem. And if you are facing an ill-conditioned problem, then there is nothing a technological tool can do for you. Try to reformulate the problem to avoid bad conditioning, and FP will be fine.

    38. Re:Fixed-point arithmetic by goodminton · · Score: 5, Informative

      Awesome link! I'm the OP and I really appreciate your response. The reason I'm looking into this is that I work with many scientists who use commercial software packages where they don't control the code or compiler and their results are archived and can be reanalyzed years later. I was recently helping someone revive an old server to perform just such a reanalysis and we had so much trouble getting the machine going again I started planning to clone/virtualize it. That got me thinking about where to put the virtual machine (dedicated hardware, cloud, etc) and it also got me curious about hypervisors. I found some papers indicating that commercial hypervisors can have variability in their floating point math performance and all of that culminated in my post. Thanks again.

    39. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      And those words have a lot of letters.

    40. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      > OTOH, operating system, hardware, firmware and hypervisors shouldn't make any difference if they're running the same code. IEEE math *is* deterministic.

      That is not true. SSE has 64 bits of precision, x86 FPU 80. So just changing the return value of CPUID (which can easily happen if changing hypervisor or hardware) can cause any code to switch from SSE to FPU or vice versa and thus give different results.
      This also applies to other extensions like AVX etc., but only if the actual software implementation was written to compute things differently, so it's more of a software than hardware issue.

    41. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      Which means you must state your results and their margin of error. Then your results are still reproducible if you get a different result, as long as it's still within the margin.

      What you won't get is the Nth digit to be the same. But since it's unlikely your own hardware is correct you shouldn't be relying on that anyway if it's outside of the tolerance.

    42. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      People also need to understand that it's correct to a margin of error and not expect the results to be the same every time beyond that.

    43. Re:Fixed-point arithmetic by apol · · Score: 1

      If small differences in the floating-point precision make your results vary a lot it is a sign that your computation is useless. For in this case your model is producing more random noise than information. Concerns about reproducibility are obviously frivolous in this case.

    44. Re:Fixed-point arithmetic by philip.paradis · · Score: 2

      No worries at all; the intent of my post was to encourage the GP to consult documentation specific to the implied case that the Mathematica developers hadn't considered the problem. I believe your submission was a good one, as it isn't always a guarantee that developers will have considered the implications of floating point calculations in any given codebase. Getting people to think about things is never a bad thing.

      --
      Write failed: Broken pipe
    45. Re:Fixed-point arithmetic by rmstar · · Score: 1

      I didn't mean to speculate about or disparage Mathematica, I meant to use it as an example of a commercial software package where the person running the calculation doesn't control the code or compiling process.

      The fact is that people rarely control that to the level that is necessary to definitively ensure the kind of full reproducibility you are asking for. You still use compilers and libraries, and their behavior may vary.

      That said, processors that do not follow the IEEE standard are very rare nowadays, and I don't see why they would be more frequent in the future. Perhaps GPUs, but there you already trade some degree of reliability for speed and low cost.

      Given that, your most likely priority is making sure the same software environment can be reproduced.

    46. Re:Fixed-point arithmetic by Jane+Q.+Public · · Score: 1

      "No worries at all; the intent of my post was to encourage the GP to consult documentation specific to the implied case that the Mathematica developers hadn't considered the problem."

      I am GP, I "implied" no such thing, and I am really getting pretty damned weary of people on Slashdot ASSUMING I meant something I didn't even write.

      My sentence begins with *IF* which is not an implication. The word "if" by itself implies nothing. And the sentence contains the word dependent.

      Mathematica is not dependent on machine-specific floating-point. But using it is an option if you want the speed.

      Nice that you cleared up whether that IF was true or false, but kindly keep your assumptions to yourself.

    47. Re:Fixed-point arithmetic by Jane+Q.+Public · · Score: 1

      I meant to add: my own guess -- but that's all it was -- was that Mathematica was almost certainly not dependent on machine-specific floating point. Because designing it that way would not have been too bright. And "not too bright" is not a phrase often used to describe Stephen Wolfram.

    48. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      That is not the problem Giant Electronic Bra is talking about. What you say about margins of error is true for well-designed algorithms that are stable over the range of expected inputs. The problem is that if you implement an (entirely correct) symbolic algorithm naively you can easily arrive at a computer algorithm that is not stable, where the wrong inputs can lead to completely arbitrary results (and where talking of a "margin of error" doesn't make sense for that very reason).

      At least in my maths program this was taught in a compulsory undergraduate Numerical Analysis class. My impression was that the vast majority of students had no idea (despite the professor's explanations) why they were asked to rework algorithms that were logically perfectly fine into more cumbersome versions, e.g. to avoid subtractions. They did what was asked of them and passed the class, but the entire topic was so foreign to the way they were used to thinking about mathematics that many didn't even see the issue.

      (Same with the optimization issues we covered in that class - that it can make a real difference in runtime whether you iterate first over the rows and then over the columns of a 2-dimensional array or vice versa, depending on how your software stores arrays in memory, was a huge puzzle for minds far brighter than mine.)

    49. Re:Fixed-point arithmetic by AmiMoJo · · Score: 1

      A better option, assuming you want to trade performance for accuracy and reproducibility, would be an arbitrary precision maths library. When performing a calculation, if the result does not fit in the currently available number of bits, or if precision would be lost, more bits are added.

      As an added bonus all the results are no longer compiler or architecture dependent.
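
      For instance, a sketch using Python's standard fractions module as one stand-in for such a library (exact rationals, not any specific package the parent may have had in mind):

```python
from fractions import Fraction

# Exact rational values: numerator and denominator grow as needed.
a, b, c = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)

print((a + b) + c == a + (b + c))              # exact arithmetic: grouping is irrelevant
print((a + b) + c)

# The same expression in hardware doubles, for comparison.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

      This only covers rational arithmetic; square roots, trig and the like need a library that carries an explicit working precision instead.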

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    50. Re:Fixed-point arithmetic by amck · · Score: 4, Insightful

      Getting the result to be deterministic is only the start of the problem. How do you know it is _correct_, or more properly, know the error bounds involved? How much does it matter to your problem?

      e.g. If I am doing a 48-hour weather forecast, I can compare my results with observations next week; I can treat numerical error as a part of "model" error along with input observational uncertainty, etc.

      I might validate part of my solutions by checking that, for example, the total water content of my planet doesn't change. For a 48-hour forecast, I might tolerate methods that slightly lose water over 48 hours in return for a fast solution. For a climate forecast/projection, this would be unacceptable.

      Getting the same answer every time is no comfort if I have no way of knowing if it's the right answer.

      --
      Anyone who believes exponential growth can go on forever in a finite world is either a madman or an economist
    51. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      We're talking about Mathematica for Christ's sake, not Joe Bloggs's My First Compiler (for values of Joe Bloggs not in the set "knows that (a+b)+c != a+(b+c)").

      Every non-broken compiler in the world allows you to prevent it from making unsafe floating-point optimizations. And this feature is usually turned on by default. No bets are off if you change the compiler unless you used the --optimize-assuming-I-dont-care-if-my-code-breaks-when-I-change-compiler flag.

      Now move from 80-bit x86 arithmetic to IEEE standard and that's a different story (which no doubt you'll blame on the compiler when it happens to you).

      If you want the exact same results across all compilers you need to write your own math routines which guarantee the order of evaluation of expressions.

      In assembler, I presume? *sigh* WRONG.

    52. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      Ha, that's a very misleading rule of thumb. In those cases, FPUs give the exact answer. The problem is the errors in the original data stay the same size as they were, which makes them very large relative errors. If you then use that result in an algorithm that's sensitive to relative errors, all bets are off. But if you're studying the bitwise properties of an algorithm and this occurs, you know you are injecting zero extra noise, so in some ways it's a wonderful thing!

      Numerical analysis is funny like that.
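
      A small Python sketch of that point, assuming ordinary IEEE doubles: the subtraction itself adds no error, but the rounding already baked into the input becomes a large relative error in the result. The values are arbitrary.

```python
from fractions import Fraction

x = 1.0000001   # rounded to the nearest double when parsed
y = 1.0
diff = x - y

# Fraction(float) converts exactly, so this compares the computed difference
# with the true difference of the two stored values.
subtraction_is_exact = Fraction(diff) == Fraction(x) - Fraction(y)

# Relative to the intended 1e-7, the error inherited from rounding x is
# millions of times larger than the ~1e-16 relative error x started with.
relative_error = abs(diff - 1e-7) / 1e-7

print(subtraction_is_exact, relative_error)
```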

    53. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      If guaranteed precision is required use binary-coded decimal (BCD) and carry a fixed number of digits after the decimal point. Better yet learn to code your own arithmetic routines in assembly language.

    54. Re: Fixed-point arithmetic by Anonymous Coward · · Score: 0

      This is the answer that deserves a beer in this thread.

    55. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      Many come close, but none implement it exactly - there's always an omission or two.

      You're being slightly disingenuous here. The relevance of this to the matter at hand is zero. Not implementing some corner of the spec is very different to not interoperating with other devices on IEEE-defined operations at IEEE-defined precisions that are supported. Interoperability on standard operations on standard types is basically perfect these days, even on GPUs, barring the odd hardware bug.

      Especially since a lot of FPUs provide extended precision that goes beyond IEEE spec.

      Put your x87 into 64-bit mode then. That's the only take-home from this. Most compilers won't even emit x87 code, unless you force them to, so the problem is becoming moot.

    56. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      If you were, you'd have said "explosives + amphetamines + hookers = one hell of a party" ;-)

    57. Re:Fixed-point arithmetic by Giant+Electronic+Bra · · Score: 1

      Hope it was at least slightly useful ;) Fixed scaled arithmetic is AWESOME if you can get away with it. Back in the 80's when I was doing real time stuff that went on things flying through the air (and space) we loved it. Of course back then FPUs were a LOT slower (and usually a separate chip!). There are times when you can get away with infinite precision too, it's slow but things like Java BigDecimal DO work, and the advantage is you get fine control over most sources of error (or exceptions and explicit changes in precision that you asked for). Nowadays though avoiding FP is like fighting City Hall, you might win now and then, but you'll probably regret it, lol.
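
      For anyone who hasn't seen fixed scaled arithmetic, here is a minimal Python sketch using a 16.16 format. The format and the helper names are assumptions for illustration, not anything from the systems described above.

```python
FRACTION_BITS = 16
ONE = 1 << FRACTION_BITS   # 65536 represents 1.0

def to_fixed(value: float) -> int:
    """Scale a real value to the nearest 16.16 fixed-point integer."""
    return round(value * ONE)

def fixed_mul(a: int, b: int) -> int:
    """Multiply two 16.16 values; the raw product has 32 fraction bits."""
    return (a * b) >> FRACTION_BITS   # shift back to 16 (rounds toward -inf)

def to_float(a: int) -> float:
    return a / ONE

price = to_fixed(2.5)
quantity = to_fixed(1.25)
total = fixed_mul(price, quantity)
print(total, to_float(total))

# Addition is plain integer addition, so it is associative by construction.
a, b, c = to_fixed(0.1), to_fixed(0.2), to_fixed(0.3)
print((a + b) + c == a + (b + c))
```

      The cost is that range and resolution are fixed up front, which is the "if you can get away with it" part.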

      --
      "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
    58. Re: Fixed-point arithmetic by tolkienfan · · Score: 1

      IEEE math is deterministic, but implementations may not be. There have been many hardware bugs. In a simulation reproducibility may be critical.

    59. Re:Fixed-point arithmetic by Gibgezr · · Score: 2

      >(Same with the optimization issues we covered in that class - that it can make a real difference in runtime whether you iterate first over the rows and then over the columns of a 2-dimensional array or vice versa, depending on how your software stores arrays in memory, was a huge puzzle for minds far brighter than mine.)

      If you are still curious, read the short article at http://en.wikipedia.org/wiki/Instruction_prefetch, and when you come to the bit about prefetching texels, think of those texels as data coming from certain rows/columns of your array. Then think about the way a 2 dimensional array is laid out in linear memory, and whether the next few texels (array cells) you are about to process are closer to the current one if they are from the same row or, instead, the same column. In one case, they are going to be packed tightly together, and so will be more likely to be all prefetched into the cache; in the other case, they will be spread out over the memory addresses, and be less likely to all wind up in the cache.

      As a game programmer, I attended a conference where one extremely knowledgeable fellow demonstrated a crazy thing: he could insert reads into array processing loops where the read DID NOTHING with the single data element it had just read; the whole loop would run faster, though, because that 'useless' read caused a prefetch of data that would be used. It was nuts, it made no sense if you just looked at the code, but it was a significant measurable speedup.
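
      A sketch of the layout argument in Python, assuming a row-major array (the C convention). It only computes which memory offsets each loop order touches; it does not measure cache behaviour, and the tiny 3x4 shape is arbitrary.

```python
def row_major_offset(row: int, col: int, n_cols: int) -> int:
    """Position of element (row, col) in a flat, row-major array."""
    return row * n_cols + col

N_ROWS, N_COLS = 3, 4

# Outer loop over rows, inner loop over columns.
rows_then_cols = [row_major_offset(r, c, N_COLS)
                  for r in range(N_ROWS) for c in range(N_COLS)]

# Outer loop over columns, inner loop over rows.
cols_then_rows = [row_major_offset(r, c, N_COLS)
                  for c in range(N_COLS) for r in range(N_ROWS)]

def strides(offsets):
    """Distance between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

print(strides(rows_then_cols))   # neighbours in memory on every step
print(strides(cols_then_rows))   # jumps of N_COLS, then a jump back
```

      With a real array the jump is the full row length, so the second order keeps landing on different cache lines.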

    60. Re: Fixed-point arithmetic by tolkienfan · · Score: 1

      It could be a huge problem. See butterfly effect. Even 1ulp difference can compound in large classes of simulation.
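
      A quick way to see this, sketched in Python with the logistic map as a stand-in for a chaotic simulation. The map, the parameter r=3.9, and the starting value are all assumptions for illustration; math.nextafter needs Python 3.9 or later.

```python
import math

def logistic_orbit(x0: float, steps: int, r: float = 3.9):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

x0 = 0.3
x0_plus_ulp = math.nextafter(x0, 1.0)   # the very next representable double

orbit_a = logistic_orbit(x0, 300)
orbit_b = logistic_orbit(x0_plus_ulp, 300)
gaps = [abs(p - q) for p, q in zip(orbit_a, orbit_b)]

print(gaps[0])     # one ulp apart at the start
print(max(gaps))   # no longer small after a few hundred steps
```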

    61. Re: Fixed-point arithmetic by tolkienfan · · Score: 1

      Whole classes of simulation demand exact reproducibility, and at the same time "local" results can vary wildly with the smallest difference in input. And they need the speed that modern floating point hw provides. Global results shouldn't be so dependent, but that's not the point. And yes, scientists want to understand such details. But that's also beside the point.

    62. Re:Fixed-point arithmetic by Talderas · · Score: 0

      Here's a sentence from the link.

      "The term floating-point number will be used to mean a real number that can be exactly represented in the format under discussion."

      Many of those words do not have a lot of letters.

      --
      "Lack of speed can be overcome. In the worst case by patience." --Znork
    63. Re: Fixed-point arithmetic by Giant+Electronic+Bra · · Score: 1

      Right, but what conclusions can you draw from a simulation/model where changing one bit changes the entire result? Models are inexact by definition, so if your model can produce basically arbitrary answers out of the solution space depending on the LSB of some input value then it's inherently useless. You need either A) a stable model that produces similar results for similar inputs (IE not a 'chaotic' system) OR B) you need to examine an ENSEMBLE of outputs derived from many possible sets of inputs so that you can understand the possible behaviors as a whole. ANY case where you can't do one of these two things means "scrap your model, it is worthless".

      --
      "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
    64. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      Unless the error bars on those original two numbers were a factor of ~2^50 smaller than the original numbers, you don't need 53 bits of precision to subtract two similarly sized numbers from real world data. Errors in the measurement will cause it to deviate from the actual difference by far larger amounts. And if your error bars were that small, you should either choose a better representation for the original numbers, or rescale the numbers being looked at so it doesn't matter.

    65. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      For a guy who started off a reply with an emphatic "Wrong" you sure do seem to agree with the guy you quoted.

      Wrong!

    66. Re:Fixed-point arithmetic by Gorobei · · Score: 1

      Exact and reproducible are very different things though, even if the former implies the latter. Also, when do you need 53 bits of precision for a standard deviation? At worst, simple scaling can keep things within the precision of a double precision floating point number.

      "Exact and reproducible" are somewhat sad proxies for "accurate and precise." I once had a mathematician working for me who produced very precise standard deviations, the only problem was that the numbers were sometimes negative.

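      One way a variance (and so a standard deviation) can go negative is the one-pass "mean of squares minus square of the mean" formula. A Python sketch, assuming IEEE doubles; the data set is a textbook-style example, not the mathematician's actual numbers, and the loops are written out so the result does not depend on how a library sums.

```python
def naive_variance(data):
    """One-pass formula: E[x^2] - (E[x])^2. Cancels catastrophically."""
    n = len(data)
    total = 0.0
    total_sq = 0.0
    for x in data:
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq / n - mean * mean

def two_pass_variance(data):
    """Compute the mean first, then average the squared deviations."""
    n = len(data)
    total = 0.0
    for x in data:
        total += x
    mean = total / n
    acc = 0.0
    for x in data:
        acc += (x - mean) * (x - mean)
    return acc / n

# Large offset, small spread; the true population variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_variance(data))      # should come out negative with doubles
print(two_pass_variance(data))
```

      Taking the square root of the first result is where the trouble would surface.
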
    67. Re:Fixed-point arithmetic by I_Wrote_This · · Score: 1

      In IEEE floating point math, "(a+b)+c" might not be the same as "a+(b+c)".

      And if your code is sensitive to that then you are using the wrong algorithm. So it's the algorithm which needs to be fixed, not the FPU environment.

      All floating-point work is approximate. It's up to you to ensure that the significances are greater than the approximations.

    68. Re:Fixed-point arithmetic by Richy_T · · Score: 1

      Why buy one computer and wait a thousand hours for the result when you can rent a thousand computers and have the result in an hour?

      (assuming your calculation can be scaled that way, of course).

    69. Re:Fixed-point arithmetic by cthulhu11 · · Score: 1

      Agreed. The numerical methods class I took in college was an eye-opener wrt the gaps in representable numbers and how offsets can quickly compound. Things like re-ordering operations to avoid very large or very small divisors or results.

    70. Re:Fixed-point arithmetic by gizmo71 · · Score: 1

      Another expression where the order of evaluation is critical...

    71. Re:Fixed-point arithmetic by david_thornley · · Score: 1

      Floating point is extremely useful, and the usual double type works just fine in a very large variety of applications. The most common way to get in trouble with it is accounting, denoting money by dollar amounts and cents as fractions. Even then, the answers will be approximately right. Any application where numbers aren't too horribly far apart in magnitude, don't go through really long and involved calculations, and where you don't need the exact right result will give you no trouble with normal floating-point, and that's most applications.

      When you're doing fancy numeric or scientific calculations, you should at a minimum read the excellent paper you provided a link for, and probably should also find somebody who knows more about numeric analysis than that. At that point, everything you said becomes very, very important.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    72. Re:Fixed-point arithmetic by johnkzin · · Score: 1

      "If you want exact results from a fixed number of significant bits, you want magic. "

      I want magic!!!

      I mean, I don't care about the computer math part, I just want magic. If we're defining that as a bullet item on the path to some other solution, I'm in full support of that goal!

    73. Re:Fixed-point arithmetic by david_thornley · · Score: 1

      No. If your code is sensitive to associativity and doesn't take care in what order it performs operations you're using the wrong algorithm. If, say, a is a very large positive number, and b is a comparatively large negative number, and c is small in absolute value, you will get a good answer with (a+b) + c and a bad one with a + (b + c).

      The whole idea behind pivoting in solving simultaneous linear equations, for example, is to order the computations to get the best result.
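
      The a, b, c case above, sketched in Python with IEEE doubles. The specific values are chosen for illustration.

```python
a = 1e16    # very large and positive
b = -1e16   # comparably large and negative
c = 1.0     # small in absolute value

good = (a + b) + c   # a and b cancel exactly, then c is added
bad = a + (b + c)    # c is absorbed into b, where doubles are spaced 2.0 apart

print(good)   # 1.0
print(bad)    # 0.0
```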

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    74. Re:Fixed-point arithmetic by david_thornley · · Score: 1

      Are you talking about integers where you have a virtual decimal point? Because if you're talking about integer arithmetic with scale multipliers you're doing floating point.

      For many purposes, you're going to want to use a function that returns irrationals (trig functions, square roots, etc.) and unless you're incredibly lucky with your input data you don't get infinite precision. Floating point is also more robust at keeping significant digits if the intermediate results are larger than you've allowed for (and it's hard to prove reasonable bounds for intermediate results).

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    75. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      For a guy who started off a reply with an emphatic "Wrong" you sure do seem to agree with the guy you quoted.

      The guy he quoted was saying math is math is math. He rebutted that (correctly) and said you need to write your own math routines to suit your problem, not rely on Mathematica.

      Understand?

    76. Re:Fixed-point arithmetic by RightwingNutjob · · Score: 1

      He must have been using imaginary data.

    77. Re:Fixed-point arithmetic by sjames · · Score: 1

      None of that SHOULD make a difference. IEEE is deterministic but implementations are not perfect. Correct code should be fine on any compliant implementation, but subtly incorrect code may 'just happen' to work on a particular implementation. The problem is compounded by compilers that themselves take a few liberties with correctness in the generated code to make it run faster (sometimes much faster).

    78. Re:Fixed-point arithmetic by Anonymous Coward · · Score: 0

      > it sounds like just fishing for reasons to have a budget to keep old power hw around.

      Or, you know, investigating the accuracy of new systems _before_ using them for production?

  2. bend reality by goombah99 · · Score: 5, Funny

    The result is always the same, but the definition of reality is changing. The result of every single calculation is in fact 42 in some units. The hard part is figuring out the units.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:bend reality by sgbett · · Score: 0

      froty-second post!!!!42

      --
      Invaders must die
    2. Re:bend reality by weilawei · · Score: 1

      This should be +5 Insightful, as it is, in fact, true.

    3. Re:bend reality by cheater512 · · Score: 2

      Once you define the unit of truth that is. :P

    4. Re:bend reality by Anonymous Coward · · Score: 1

      Otherwise it's between 41.999 and either 42.999, 42.499, 42.599, or 42.999 depending on how you round your fp number.

    5. Re:bend reality by yoink! · · Score: 2

      We most certainly need Slashdot VirtualCrypto to gild comments like these. Karma alone is not enough and this comment is too damned funny.

  3. WTF? by Anonymous Coward · · Score: 0

    Floating point and integer operations are well defined.
    Unless someone fucks up with implementing the floating point unit the result should be exactly the same.

    1. Re:WTF? by wiredlogic · · Score: 2

      They may be well defined but nobody implements fully standards compliant FP units and they have subtle differences in output. Even with identical hardware, configurable settings like rounding modes may also differ between instances.

      --
      I am becoming gerund, destroyer of verbs.
    2. Re:WTF? by larry+bagina · · Score: 3, Informative

      Let's say you're using C on an x86. float (32-bit) and double (64-bit) are well defined. However, the x86 FPU internally uses long double (80-bit).

      So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    3. Re:WTF? by elwinc · · Score: 1

      Intel x87 scalar FP instructions use an 80 bit internal format for higher precision. Intel SSE2 vector FP instructions use 64 bits. You will see last bit variations depending on which instructions the compiler chooses. In fact, I've heard of cases where a JIT compiler vectorized a calculation sometimes (directing code to SSE2 hardware), and left it scalar other times (directing it to 80 bit x87 hardware). Might only make a difference in the last bit, but last bit variations can add up over a few billion calculations.

      I can recall a physics simulation I was involved in years ago that got differences of 10% depending on what hardware we ran it on. Turned out the Sun & SGI workstations used 64 bit FP, while the IBM box used some 128 bit or something like that. Took a while to track that one down...

      --
      --- Often in error; never in doubt!
    4. Re:WTF? by Anonymous Coward · · Score: 0

      No.

      The IEEE floating point standard specifies how to encode and calculate FPU operations, yes; but the problem is that FPU results change based on how many calculations are performed due to rounding errors. As a result, any compile-time optimization can add or remove roundings from your result depending on your optimizer settings. If that wasn't bad enough, consider also the fact that some processors provide fused multiply-add operations, which compute a multiply followed by an add in both less time and with fewer roundings than separate operations. On certain architectures FMA is mandatory for FPU operations and separate ops aren't provided.
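
      A Python sketch of why one rounding versus two matters. It does not call a hardware FMA; it emulates one by doing the arithmetic exactly with fractions.Fraction and rounding once at the end, which is an assumption made purely for illustration. The inputs are picked so the difference shows.

```python
import math
from fractions import Fraction

def separate_mul_add(a: float, b: float, c: float) -> float:
    """Product is rounded to a double, then the sum is rounded again."""
    return a * b + c

def emulated_fma(a: float, b: float, c: float) -> float:
    """Exact rational arithmetic followed by a single rounding to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + math.ldexp(1.0, -27)   # 1 + 2**-27
b = 1.0 - math.ldexp(1.0, -27)   # 1 - 2**-27
c = -1.0

print(separate_mul_add(a, b, c))   # the 2**-54 term is rounded away first
print(emulated_fma(a, b, c))       # the 2**-54 term survives
```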

    5. Re:WTF? by Guy+Harris · · Score: 1

      Intel x87 scalar FP instructions use an 80 bit internal format for higher precision. Intel SSE2 vector FP instructions use 64 bits. You will see last bit variations depending on which instructions the compiler chooses.

      And the compiler may choose differently depending on whether it's compiling for 32-bit or 64-bit x86.

    6. Re:WTF? by gnasher719 · · Score: 2

      So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.

      Google for FP_CONTRACT. Quote from the C Standard:

      A floating expression may be contracted, that is, evaluated as though it were a single operation, thereby omitting rounding errors implied by the source code and the expression evaluation method. The FP_CONTRACT pragma in <math.h> provides a way to disallow contracted expressions. Otherwise, whether and how expressions are contracted is implementation-defined.

    7. Re:WTF? by Anonymous Coward · · Score: 0

      Well, it doesn't really matter what bit precision was used for the individual calculations. What matters is (as you say) what you do with the intermediate results. In Java there is the strictfp keyword, which ensures that all intermediate results are truncated back to IEEE single/double precision values.

    8. Re:WTF? by gweihir · · Score: 3, Informative

      They do not. IEEE754 has no "grey area". The results must match bit-exact or you are not IEEE754.

      Of course, there can be implementation bugs. For example, Qemu does co-processor emulation only with 64 bit floats instead of the required 80 bit. Nobody seems to really care however. The other thing is of course that if reproducibility is more important than correctness, I suspect the math is done wrong.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    9. Re:WTF? by gweihir · · Score: 1

      No. The FPU does 80 bits to satisfy the precision requirements for 64 bit IEEE754.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    10. Re:WTF? by gweihir · · Score: 1

      The C standard is pretty useless here. Have a look at the really bad precision required. What you need to look at is IEEE754.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    11. Re:WTF? by philip.paradis · · Score: 1

      So fix the compiler, or stop compiling for 32 bit. RAM is cheap, especially when you're talking about the cost per GiB of hundreds of gibibytes of it.

      --
      Write failed: Broken pipe
    12. Re:WTF? by Guy+Harris · · Score: 1

      So fix the compiler

      "Fix the compiler" presumably meaning "change the compiler not to support non-SSE x86 processors" or, at least, "change the compiler not to *default* to supporting non-SSE processors". Sounds good to me, these days, but I'm not responsible for making those decisions about GCC, so there's not much I can do about it.

      or stop compiling for 32 bit. RAM is cheap, especially when you're talking about the cost per GiB of hundreds of gibibytes of it.

      At this point, I don't know how many *desktop/laptop* 32-bit x86 boxes there are out there, but, in any case, somebody got concerned that the tests didn't pass on a 32-bit machine, so.... Personally, I don't care, as 99 44/100% of the arithmetic done by packet sniffers such as tcpdump is integer arithmetic, where it doesn't matter, but....

    13. Re:WTF? by philip.paradis · · Score: 1

      "Fix the compiler" presumably meaning "change the compiler not to support non-SSE x86 processors" or, at least, "change the compiler not to *default* to supporting non-SSE processors".

      I think this really is the best option, all things considered.

      --
      Write failed: Broken pipe
    14. Re:WTF? by Guy+Harris · · Score: 1

      "Fix the compiler" presumably meaning "change the compiler not to support non-SSE x86 processors" or, at least, "change the compiler not to *default* to supporting non-SSE processors".

      I think this really is the best option, all things considered.

      Or, if the CPU on which you're running supports SSE (i.e., is a Pentium III or newer), default to SSE, so if you have an old machine it still defaults to something that'll run. If you're targeting some old no-SSE processor and building on some shiny "new" system, you have to use some -m option or whatever, but, well, get over it....

    15. Re:WTF? by Anonymous Coward · · Score: 0

      IEEE754 tells you how floating point operations are to be performed. However, only the C standard (or whatever standard or other specification document is relevant for the language of your choice) tells you which floating point operations may or may not be performed given a certain piece of code.

      For example, take the following code

      float a = 1.0, b = 3.0;
      float c = (a / b) * b;

      This is source code, on which the IEEE standard doesn't tell you anything (there are no floating point operations, just a sequence of characters).

      Now the C standard tells you that this describes the following sequence of operations (or any sequence equivalent to it):
      1. load register 1 with a
      2. load register 2 with b
      3. divide register 1 by register 2.
      4. multiply register 1 with register 2.
      5. store register 1 into c.

      Now comes the catch: The C standard does not require that the two registers used are of type float; they may as well be double, long double or even a larger-precision type not exposed by the language.

      Now, after this translation, IEEE754 kicks in and tells you how to do step 3 and step 4. Which of course depends on the precision of register 1 and register 2.
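
      Python has no 80-bit type, so the sketch below shows the same effect one size down: 32-bit values whose intermediates either get rounded back to 32 bits after every step or stay in a wider 64-bit "register" until the final store. The helper names and input values are assumptions for illustration.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_rounding_every_step(values):
    """Intermediate is spilled to 32 bits after each addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

def sum_with_wide_register(values):
    """Intermediate stays in a double; rounded once when stored."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return to_f32(acc)

values = [16777216.0, 1.0, 1.0]   # 2**24, where float32 spacing becomes 2

print(sum_rounding_every_step(values))   # each +1 is rounded away
print(sum_with_wide_register(values))    # the two +1s survive together
```

      Same source-level operations, same inputs, two different stored results, depending only on the width of the temporaries.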

    16. Re:WTF? by necro81 · · Score: 2

      But one problem that IEEE754 can't address is when and where rounding errors show up in calculations. If in my code I write A * B / C, one cannot guarantee whether that's executed as (A * B) / C or as A * (B / C). If the exponents of the different numbers are substantially different, then you can indeed end up with different results. Different platforms may compile and execute the problem differently, and that I think is the problem that submitter is getting at.
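
      An extreme illustration in Python of how much the grouping can matter, whichever grouping a given toolchain ends up executing. The values are chosen to force an overflow and are not from the original post.

```python
A = B = C = 1e200

left_to_right = (A * B) / C   # A * B overflows to infinity before the divide
regrouped = A * (B / C)       # B / C is exactly 1.0, so nothing overflows

print(left_to_right)   # inf
print(regrouped)       # 1e+200
```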

    17. Re:WTF? by gweihir · · Score: 1

      That would not be a cloud-problem at all. Unless Mathematica is unable to offer a consistent execution model. In that case, the issue here would be using an unsuitable tool (Mathematica), not the cloud.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  4. Forget it by Anonymous Coward · · Score: 0

    Really, forget it. In 5 or 10 years cloud providers will not offer any compatibility to your current form of backup.

    (And I really hope people miraculously realize that going to cloud does not solve any problems but creates more, and more complex, ones, and abandon the cloud in herds)

    1. Re:Forget it by Anonymous Coward · · Score: 0

      Not to mention, nuclear simulations should be staying on LANL's hardware, not being foisted into the cloud.

    2. Re:Forget it by fuzzyfuzzyfungus · · Score: 1

      Not to mention, nuclear simulations should be staying on LANL's hardware, not being foisted into the cloud.

      Unless somebody fucks up, LANL's nuclear simulations become the cloud, toward the end.

    3. Re:Forget it by WaffleMonster · · Score: 1

      Not to mention, nuclear simulations should be staying on LANL's hardware, not being foisted into the cloud.

      Real men use grids, pansy hipsters use clouds.

  5. Good luck by timeOday · · Score: 2, Insightful

    This problem is far broader than arithmetic. Any distributed system based on elements out of your control is bound to be somewhat unstable. For example, an app that uses google maps, or a utility to check your bank account. The tradeoff for having more capability than you could manage yourself, is that you don't get to manage it yourself.

    1. Re:Good luck by Anonymous Coward · · Score: 0, Offtopic

      Any distributed system based on elements out of your control is bound to be somewhat unstable.

      You've just explained many of the problems with government in one concise sentence.

    2. Re:Good luck by Anonymous Coward · · Score: 0

      This problem is far broader than arithmetic. Any distributed system based on elements out of your control is bound to be somewhat unstable.

      In this case, it's not out of your control. Floating point is by definition NOT precise. Multiple floating point operations can easily compound the error until the result falls below your needed level of precision, and when you don't control the hardware this can often happen without your realization. (The same is actually true when you DO control the hardware, just for the record).

      The fact of the matter is that if you're running a highly critical application where you absolutely MUST have precision, you need to use fixed point math, not floating point. Especially if you're going to be doing sequences of operations.

      Or put it another way, No True Physicist uses anything other than fractions.

    3. Re:Good luck by ghettoimp · · Score: 1

      And with anarchy, too.

    4. Re:Good luck by Anonymous Coward · · Score: 0

      Floating point is by definition NOT precise.

      Wrong. Floating point arithmetic, if done strictly by the IEEE rules, is exact. It just doesn't map exactly to real number arithmetic.

      Of course, in the real world, floating point arithmetic is not always done strictly following IEEE rules (for example, the C and C++ standards allow intermediate results to be stored in higher precision, which not only means that results are not exactly reproducible on different C or C++ implementations, it may even mean that algorithms relying on the exact precision may fail). But that's not inherent to the definition of floating point.
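
      A short Python sketch of "exact, but not real-number arithmetic", assuming IEEE doubles. It also shows one way to record a result bit for bit (the hexadecimal form), which is a suggestion for comparing archived runs rather than anything from the posts above.

```python
result = 0.1 + 0.2

print(result == 0.3)    # False: neither side is the real number 0.3
print(result.hex())     # 0x1.3333333333334p-2
print((0.3).hex())      # 0x1.3333333333333p-2, one unit in the last place lower

# The hex form round-trips exactly, so it can be stored and compared later.
restored = float.fromhex(result.hex())
print(restored == result)
```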

    5. Re:Good luck by Sique · · Score: 1

      And with anybody else than you. Every company is an entity you don't control. Every person is an entity you don't control. You don't even control yourself completely (and depending on which neuroscientist you ask, it's questionable if you control anything about you anyway).

      People always find smug sentences how not to trust the government, and then easily forget that the same sentence is valid for any organisation. The government at least is a little bit controllable by you. A company you don't own is not. A state you are not a citizen of is not. A private association of people you are not a member of is not.

      You are surrounded by entities you don't control, and all you find to whack on is the government? Ah! The freedom of expression that allows you to whack the government all you want without fear of retaliation! Try that with a company or a big private organisation, and you might find yourself on the wrong end of a lawsuit that will ruin you and anyone even loosely associated with you...

      --
      .sig: Sique *sigh*
  6. I'm research the long-term consistency and ... by oDDmON+oUT · · Score: 1

    First sentence seems stilted at best.

    --
    Some days it's just not worth
    chewing through my restraints.
    1. Re:I'm research the long-term consistency and ... by goodminton · · Score: 1

      I'm the OP and I agree that I should've proofreader my post.

    2. Re:I'm research the long-term consistency and ... by wonkey_monkey · · Score: 1

      Aleksandr Orlov. Computer-ma-bob.

      --
      systemd is Roko's Basilisk.
  7. Easiest solution by ShaunC · · Score: 3, Funny

    Just scroll down a couple of posts. "Quite soon the Wolfram Language is going to start showing up in lots of places, notably on the web and in the cloud."

    Problem solved!

    --
    Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
  8. Numerical instability by Anonymous Coward · · Score: 5, Insightful

    If the value you're computing is so dependent on the details of the floating point implementation that you're worried about it, you probably have an issue of numerical stability and the results you are computing are likely useless, so this is really a mute point.

    1. Re:Numerical instability by Anonymous Coward · · Score: 1

      this is really a mute point.

      Pitty your knot.

    2. Re:Numerical instability by Anonymous Coward · · Score: 0

      This.

      If the difference between results on different machines is big enough to matter, then fix the code.
      Else, you're fine (by definition the difference is small enough to not matter).

      Nowhere did we need to "adjust" for anything.

    3. Re:Numerical instability by brantondaveperson · · Score: 3, Funny

      This is the only answer so far that makes sense, which is a pity because

      A) It's an AC
      and
      B) The point is moot, not mute.

      But we all knew that, didn't we.

    4. Re:Numerical instability by qwak23 · · Score: 1

      The point was mute, not moot. Everybody was thinking it, but no one could say it.

      Oh Anonymous Coward, if only it were socially acceptable you wouldn't have to hide your shame.

    5. Re:Numerical instability by kevinatilusa · · Score: 1

      Reminds me of one of Lloyd Trefethen's maxims about numerical mathematics (http://people.maths.ox.ac.uk/trefethen/maxims.html ):

      "If the answer is highly sensitive to perturbations, you have probably asked the wrong question."

    6. Re:Numerical instability by Cro+Magnon · · Score: 1

      To be fair, maybe he really was trying to say the point couldn't talk.

      --
      Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.
    7. Re:Numerical instability by Anonymous Coward · · Score: 0

      I think that doing the grammar nazi thing over two hours after the post falls out of the statue of limitations.

    8. Re:Numerical instability by qwak23 · · Score: 1

      Foolish nonsense! Everyone knows points can talk, haven't you ever heard of a talking point?

  9. Use infinite precision software packages by shutdown+-p+now · · Score: 4, Informative

    What the title says - e.g. bignum for Python etc. It will be significantly slower, but the result is going to be stable at least for a given library version, and that is far easier to archive.

    1. Re:Use infinite precision software packages by david_thornley · · Score: 1

      If you're allowing division, you need rational arithmetic (Common Lisp has it, for example, and you can have rational complex numbers), not just bignums.

      If you're allowing irrational numbers, or using functions that may produce irrational numbers (like roots, trig functions, logarithms), bignum rationals can no longer be exact.
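
      A minimal Python sketch of that distinction, using the standard-library fractions module as a stand-in for Common Lisp rationals (the variable names are only illustrative):

```python
from fractions import Fraction
import math

# Exact rational arithmetic: division never rounds.
third = Fraction(1, 3)
exact_sum = third + third + third          # exactly 1

# Binary floating point: 0.1 is already rounded on input.
float_sum = 0.1 + 0.1 + 0.1                # not exactly 0.3

# An irrational result forces rounding even with rationals:
# math.sqrt returns a float, and converting it to a Fraction
# only captures that already-rounded float exactly.
root = Fraction(math.sqrt(2))
root_squared = root * root                 # close to 2, but not equal

print(exact_sum == 1)        # True
print(float_sum == 0.3)      # False
print(root_squared == 2)     # False
```

      The last line is the point made above: once a square root is involved, the rational type can only store an approximation.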

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    2. Re:Use infinite precision software packages by shutdown+-p+now · · Score: 1

      The question was about repeatability of results, not perfect precision - the latter is orthogonal (though, as it is usually done in software, it will also solve the repeatability problem). For many real-world applications, you don't need perfect precision so long as you can properly account for the margin of error.

  10. Your chances are pretty darned good by Red+Jesus · · Score: 5, Informative

    Mathematica in particular uses adaptive precision; if you ask it to compute some quantity to fifty decimal places, it will do so.

    In general, if you want bit-for-bit reproducible calculations to arbitrary precision, the MPFR library may be right for you. It computes correctly-rounded special functions to arbitrary accuracy. If you write a program that calls MPFR routines, then even if your own approximations are not correctly-rounded, they will at least be reproducible.

    If you want to do your calculations to machine precision, you can probably rely on C to behave reproducibly if you do two things: use a compiler flag like -mpc64 on GCC to force the elementary floating point operations (addition, subtraction, multiplication, division, and square root) to behave predictably, and use a correctly-rounded floating point library like crlibm (Sun also released a version of this at one point) to make the transcendental functions behave predictably.
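
    MPFR itself is a C library. As a rough stand-in, the sketch below uses Python's standard decimal module, which works in decimal rather than binary but follows the same idea of pinning the precision and rounding mode explicitly. The helper names are hypothetical.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# Pin every parameter that affects the result instead of relying on defaults.
CTX = Context(prec=50, rounding=ROUND_HALF_EVEN)

def sqrt_50_digits(n: int) -> Decimal:
    """Square root of n, correctly rounded to 50 significant digits."""
    return CTX.sqrt(Decimal(n))

def exp_50_digits(n: int) -> Decimal:
    """e**n rounded to 50 significant digits under ROUND_HALF_EVEN."""
    return CTX.exp(Decimal(n))

root2 = sqrt_50_digits(2)
print(root2)
print(exp_50_digits(1))
```

    Because the precision and rounding mode live in the context object rather than in the hardware, the printed digits should not depend on which CPU or hypervisor happens to run the script.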

    1. Re:Your chances are pretty darned good by Anonymous Coward · · Score: 0

      I'm not a Reddit user, but this is an example of the perfect use case for their interface: your answer should be on the top of the page. Thread closed. ;)

    2. Re:Your chances are pretty darned good by Anonymous Coward · · Score: 0

      Uh, the article submitter was asking about computing on the cloud. How do you ensure the use of -mpc64 (or GCC, for that matter) on the cloud?

  11. Re:Arbitray precision by norpy · · Score: 1

    Most of the time arbitrary precision is not necessary and it's easier (and faster) to just use a float. There are times when it matters, but for the most part people aren't doing things where it matters.

    The submitter should know better about using integer operations for things that require precision though.

  12. iEEE 754 by Jah-Wren+Ryel · · Score: 3, Insightful

    Different results on different hardware were a major problem up until CPU designers started to implement the IEEE754 standard for floating point arithmetic. IEEE754 conforming implementations should all return identical results for identical calculations.

    However, x86 systems have an 80-bit extended precision format and if the software uses 80-bit floats on x86 hardware and then you run the same code on an architecture that does not support the x86 80-bit format (say, ARM or Sparc or PowerPC) then you are likely to get different answers.

    I think newer revisions of IEEE754 have support for extended precision formats up to 16-bytes, but you need to know your hardware (and how your software uses it) to make sure that you are doing equal work on systems with equal capabilities. You may have to sacrifice precision for portability.

    --
    When information is power, privacy is freedom.
    1. Re:iEEE 754 by Anonymous Coward · · Score: 0

      IEEE754 doesn't define a type with 80-bits of precision. The IEEE754 type on x86 hardware is usually a double, 64-bits wide with 53-bits of precision (i.e. binary64 under IEEE754). Using Intel's 80-bit extended precision float is precisely what you don't want to do if you want to remain portable--across systems and across time.

  13. differnt isnt always wrong by Anonymous Coward · · Score: 0

    another important question is what makes you think the numbers you are getting now
    are "correct" or just what the computer is telling you they are...
    if they are "mostly" correct who cares if the definition of mostly changes in the future.

  14. You need to know some numerical analysis by daniel_mcl · · Score: 5, Insightful

    If your calculations are processor-dependent, that's a bad sign for your code. If your results really depend on things that can be altered by the specific floating-point implementation, you need to write code that's robust to changes in the way floating-point arithmetic is done, generally by tracking the uncertainty associated with each number in your calculation. (Obviously you don't need real-time performance since you're using cloud computing in the first place.) I'm not an expert on Mathematica, but it probably has such things built in if you go through the documentation, since Mathematica notebooks are supposed to exhibit reproducible behavior on different machines. (Which is not to say that no matter what you write it's automatically going to be reproducible.)

    Archiving hardware to get consistent results is mainly used when there are legal issues and some lawyer can jump in and say, "A-ha! This bit here is different, and therefore there's some kind of fraud going on!"

    --
    I used to read Caltizzle. I was a lot cooler than you.
    1. Re:You need to know some numerical analysis by Anonymous Coward · · Score: 0

      Maybe he's doing a chaotic simulation. For example, if you simulate galaxies colliding, then the uncertainty of individual star positions increases exponentially. You don't care about bounding it, you just want to simulate a possible timeline. Now if you notice something interesting in one particular simulation and you'd like to run it again to zoom on it, you really need reproducible arithmetics. Keeping the uncertainty in check by running with a billion digits of precision would take too long.

    2. Re:You need to know some numerical analysis by rockmuelle · · Score: 5, Insightful

      This.

      Reproducibility (what we strive for in science) is not the same as repeatability (what the poster is actually trying to achieve). Results that are not robust on different platforms aren't really scientific results.

      I wish more scientists understood this.

      -Chris

    3. Re:You need to know some numerical analysis by Red+Jesus · · Score: 4, Interesting

      While that's true in many cases, there are some situations in which we need consistent results. Read Shewchuk's excellent paper on the subject.

      When disaster strikes and a real RAM-correct algorithm implemented in floating-point arithmetic fails to produce a meaningful result, it is often because the algorithm has performed tests whose results are mutually contradictory.

      The easiest way to think about it is with a made-up problem about sorting. Let's say that you have a list of mathematical expressions like sin(pi*e^2), sqrt(14*pi*ln(8)), tan(10/13), etc and you want to sort them, but some numbers in the list are so close to each other that they might compare differently on different computers that round differently, (e.g. one computer says that sin(-10) is greater than ln(100)-ln(58) and the other says it's less).

      Imagine now that this list has billions of elements and you're trying to sort the items using some sort of distributed algorithm. For the sorting to work properly, you *need* to be sure that a < b implies that b > a. There are situations (often in computational geometry) where it's OK if you get the wrong answer for borderline cases (e.g. it doesn't matter whether you can tell whether sin(-10) is bigger than ln(100)-ln(58) because they're close enough for graphics purposes) as long as you get the wrong answer consistently, so the next algorithm out (sorting in my example, or triangulation in Shewchuk's) doesn't get stuck in infinite loops.
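
      A small Python sketch of the kind of geometric test this is about. It is not Shewchuk's adaptive algorithm; it simply compares a plain floating-point orientation test with the same determinant evaluated exactly via the standard fractions module. The grid of nearly collinear points is an assumption chosen for illustration.

```python
from fractions import Fraction

def sign(value) -> int:
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def orient_float(a, b, c) -> int:
    """Side of the line a->b on which c lies, evaluated in doubles."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return sign(det)

def orient_exact(a, b, c) -> int:
    """The same determinant, evaluated exactly on the given coordinates."""
    ax, ay, bx, by, cx, cy = (Fraction(v) for v in (*a, *b, *c))
    det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return sign(det)

# Scan a tiny grid of nearly collinear points and count disagreements.
q, r = (12.0, 12.0), (24.0, 24.0)
step = 2.0 ** -53
mismatches = sum(
    orient_float((0.5 + i * step, 0.5 + j * step), q, r)
    != orient_exact((0.5 + i * step, 0.5 + j * step), q, r)
    for i in range(32) for j in range(32)
)
print("cells where the float predicate disagrees with the exact one:", mismatches)
```

      The exact version is slower, but it always gives the same answer for the same inputs, which is the consistency property the sorting example needs.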

    4. Re:You need to know some numerical analysis by brantondaveperson · · Score: 2

      notice something interesting in one particular simulation and you'd like to run it again to zoom on it,

      If the thing you're zooming in on is dependent on the behaviour of floating point numbers, then it's not interesting from any point of view other than that. It certainly won't represent anything physically meaningful, which since we're talking about galaxy simulations I assume is the point.

    5. Re:You need to know some numerical analysis by goodminton · · Score: 1

      I'm the OP and I really appreciate this comment. I did give some thought as to whether it was reproducibility or repeatability and I decided on reproducibility because the experimental equipment (underlying hardware and firmware) would be different, different analysts would be involved, and the replication of analysis would be occurring after a long period of time. I agree though that it's not clear cut in my post.

    6. Re:You need to know some numerical analysis by Anonymous Coward · · Score: 0

      No, it does represent something physically meaningful. In chaotic systems, rounding errors behave the same as slightly perturbed inputs. No one knows the exact initial conditions anyway, so there's no reason to fight the rounding errors.

      For example, if in 1% of the runs one star is ejected at 100 times the escape velocity, then this is a real behavior that will occur in 1% of real galaxy collisions. You need to zoom in on a particular simulation that exhibits the phenomenon to understand the mechanism.

  15. Obligatory Comic by ttucker · · Score: 4, Funny
  16. Library by blueg3 · · Score: 2

    It depends on what you mean by "cloud", which is sort of a catchall term. As you've pointed out, on SaaS clouds you're going to have no guarantee of consistency, even if no time passes -- you don't know that the cloud environment is homogeneous. For (P/I)aaS clouds, you can hopefully hold constant what software is running. For example, if you have your Ubuntu 12.04 VM that runs your software, when you fire up that VM five years from now, its software hasn't changed one bit. You of course have to worry about whether or not the form you have the VM in is even usable in five years. You would hope that, even with inevitable hardware changes, if none of the software stack changes, then you'll get the same results. I'd guess that if they're running all on hardware that really correctly implements IEEE floating-point numbers, then you will in fact get consistent results. But I wouldn't bet on it.

    What you really need, unfortunately, is a library that abstracts away and quantifies the uncertainty induced by hardware limitations. There are a variety of options for these, since they're popular in scientific computing, but the overall point is that using such techniques, you can get consistent results within the stated accuracy of the library.

  17. Depends... by whiplashx · · Score: 1

    If you are depending on serious precision, floating point was not the way to go in the first place. Floating point implementations are not guaranteed to be exactly the same, nor exactly correct.

    1. Re:Depends... by cdrudge · · Score: 1

      Floating point implementations are not guaranteed to be exactly the same, nor exactly correct.

      If only there was some type of standard adopted that would make it so this wasn't the case...

    2. Re:Depends... by gnasher719 · · Score: 1

      If you are depending on serious precision, floating point was not the way to go in the first place. Floating point implementations are not guaranteed to be exactly the same, nor exactly correct.

      It's not just about the precision, it is about getting reasonable results.

      For example, if a*b >= c*d, is it guaranteed that sqrt (a*b - c*d) won't fail because of a negative argument? (It's not if the difference is calculated in higher than double precision). Is it guaranteed that -1
      A simple situation: If -pi/2 = x = y = pi / 2, is it guaranteed that sin (x) = sin (y)? The implementation has to be just slightly clever to guarantee this.

    3. Re:Depends... by gnasher719 · · Score: 1

      If you are depending on serious precision, floating point was not the way to go in the first place. Floating point implementations are not guaranteed to be exactly the same, nor exactly correct.

      Bloody html!!! I'll use FORTRAN (.le.) instead of less-equal :-(

      It's not just about the precision, it is about getting reasonable results.

      For example, if a*b >= c*d, is it guaranteed that sqrt (a*b - c*d) won't fail because of a negative argument? (It's not if the difference is calculated in higher than double precision). Is it guaranteed that -1 .le. sin (x), cos (x) .le. 1? This has nothing to do with the actual precision. If you use 500 bit precision, I still wouldn't want sin (x) = 1 + 2^(-500) for any input value.

      A simple situation: If -pi/2 .le. x .le. y .le. pi / 2, is it guaranteed that sin (x) .le. sin (y)? The implementation has to be just slightly clever to guarantee this.

  18. You may have bigger issues by guruevi · · Score: 1

    If you're worried about your program generating different results on different arch, you have some serious coding issues.

    The math should be the same on all systems. If you're worried, try 2 different systems against a known or manually calculated result, that's how the Pentium-type bugs were discovered (if you remember).

    Typically major issues in your processing units will be discovered quickly because of the ubiquity in the market. Unless you're using a custom built or compromised chip on eg primes, you shouldn't worry and even if it were compromised (the Chinese ARM chips or NSA-controlled crypto accelerators) you'll still get a valid result, just less secure.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  19. Solved problem by jrumney · · Score: 1

    The problem of inconsistent floating point calculations between machines has been solved since 1985. I'm sure moving your app into the cloud doesn't suddenly undo 28 years of computing history.

    1. Re:Solved problem by gnasher719 · · Score: 1

      The problem of inconsistent floating point calculations between machines has been solved since 1985. I'm sure moving your app into the cloud doesn't suddenly undo 28 years of computing history.

      Except it hasn't. On a PowerPC or Haswell processor, a simple calculation like a*b + c*d can legally give three different results because of the use of fused multiply-add. In the 90's to early 2000's, you would get different results because of inconsistent use of extended precision.
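
      To make the three results concrete, here is a Python sketch that simulates a fused multiply-add with exact rational arithmetic (compute x*y + z exactly, then round once). The fma helper is an illustration, not a hardware call, and the input values are chosen by hand so that all three variants differ.

```python
from fractions import Fraction

def fma(x: float, y: float, z: float) -> float:
    """Simulated fused multiply-add: x*y + z computed exactly, then rounded once."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))

a = b = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -27)
d = 1.0 + 2.0 ** -27

plain = a * b + c * d             # both products rounded, then added
fused_left = fma(a, b, c * d)     # only c*d is rounded before the add
fused_right = fma(c, d, a * b)    # only a*b is rounded before the add

print(plain, fused_left, fused_right)
```

      Mathematically a*b + c*d is exactly zero here, yet the three evaluation strategies give zero, a tiny positive number, and a tiny negative number, depending on which product keeps its low-order bits.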

    2. Re:Solved problem by Red+Jesus · · Score: 2

      Not quite. IEEE754 mandates correct rounding for addition, subtraction, multiplication, division, and square roots, but it only requires "faithful rounding" for the transcendental functions like sin, cos, and exp. That means, for example, that even if the floating point number nearest arcsin(1) is above it (i.e. correct rounding in this case requires that you round up), a math library that rounds arcsin(1) *down* is still compliant. The only requirement is that it round to one of the two nearest floating point numbers.
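
      One way to see how close a particular math library comes to correct rounding is to compare it against a higher-precision reference. The sketch below does that for exp only, using Python's decimal module as the reference; the sample points and the 60-digit working precision are arbitrary choices, and the result will depend on the platform's libm.

```python
import math
from decimal import Decimal, Context

# 60 digits is far more than a double holds, so rounding the reference
# to a double afterwards is, in practice, a single correct rounding.
REF = Context(prec=60)

def exp_reference(x: float) -> float:
    """exp(x) computed to 60 digits with decimal, then rounded to the nearest double."""
    return float(REF.exp(Decimal(x)))

def ulp_error(x: float) -> float:
    """Distance between math.exp(x) and the reference, in units in the last place."""
    ref = exp_reference(x)
    return abs(math.exp(x) - ref) / math.ulp(ref)

samples = [0.1 * k for k in range(1, 51)]
worst = max(ulp_error(x) for x in samples)
print("worst observed error of math.exp, in ulps:", worst)
```

      A result of 0.0 means every sampled value was correctly rounded; 1.0 means at least one value landed on the neighbouring double instead.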

    3. Re:Solved problem by Anonymous Coward · · Score: 0

      That doesn't actually solve the problem. Calculation order among other things still matters, and compilers are at liberty to reorder instructions.
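
      A short Python illustration of why order matters, plus one standard-library way around it for sums (math.fsum). The list of values is made up for the example.

```python
import math
import random

left = (0.1 + 0.2) + 0.3     # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)    # 0.6
print(left == right)         # False

# math.fsum returns the correctly rounded sum, so it does not depend on order.
values = [0.1, 0.2, 0.3, 1e16, -1e16]
shuffled = values[:]
random.Random(0).shuffle(shuffled)
print(math.fsum(values) == math.fsum(shuffled))   # True
```

      This only covers summation; a compiler that reorders other operations can still change results unless it is told not to.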

  20. Frist 3D pirnter prost by Hognoxious · · Score: 1, Offtopic

    The solution is to use a 3D printer to make your own cloud.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:Frist 3D pirnter prost by Anonymous Coward · · Score: 1

      The solution is to use a 3D printer to make your own cloud.

      But if I make my own cloud, how can I pay for it using bitcoins?

      And will it be able to run on a Raspberry Pi?

      Imagine a beowulf cluster of clouds...

      ... Wait, did Natalie Portman leave? The grit is just starting to get hot.

  21. Re:Arbitray precision by weilawei · · Score: 3, Funny

    So I'm supposed to do all my calculations without any Pi? How can you have any Pi if you don't eat your machine?

  22. Rounding error by msobkow · · Score: 1

    If you're not allowing for rounding errors, your result is invalid in the first place.

    If you don't want rounding errors, use a package based on variable precision mathematics, like a BCD package.

    --
    I do not fail; I succeed at finding out what does not work.
  23. Why should the results change? by Anonymous Coward · · Score: 0

    Assuming there are no bugs like the Pentium fdiv bug, then there is only one way to simulate a floating point instruction correctly.
    An x87 register has 80 bits and everyone simulating those with 64 bit doubles because he wants to use SSE instructions does it wrong.

    I hope Mathematica sets the rounding mode before performing calculations.

    1. Re:Why should the results change? by Anonymous Coward · · Score: 0

      An x87 register is not the only way of performing floating point calculations, nor is it the best way.

  24. one small problem by Anonymous Coward · · Score: 0

    $ make
    ERROR on line 7: Unable to reduce M_PI to a rational number; ran out of hard disk space trying to save the result.

    I'm sorta kidding, but I'm also pointing out a serious flaw in this proposal.

    If you really want it done right, use interval arithmetic and iterate each calculation until the error is within acceptable tolerance. This can also require insane amounts of storage space, but at least it will allow you to stop after a finite number of digits of e or pi based on what your computation requires in order to give an accurate answer.
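
    A bare-bones Python sketch of the enclosure half of that idea (it does not implement the "iterate until the tolerance is met" part). The Interval class is hypothetical and supports only addition; it widens each bound by one ulp with math.nextafter, which needs Python 3.9 or later.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Widen each bound by one ulp so the exact sum is always enclosed.
        return Interval(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def width(self) -> float:
        return self.hi - self.lo

tenth = Interval(0.1, 0.1)      # treats the double nearest 0.1 as an exact input
total = Interval(0.0, 0.0)
for _ in range(10):
    total = total + tenth

print(total, "width:", total.width())
```

    The width of the final interval is the error bar: if it is too wide for your purposes, that is the signal to redo the calculation at higher precision.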

    1. Re:one small problem by shutdown+-p+now · · Score: 1

      Well, the original question was about hardware floating point arithmetic, which has the same problem.

    2. Re:one small problem by Anonymous Coward · · Score: 0

      If you really want it done right, use interval arithmetic and iterate each calculation until the error is within acceptable tolerance. This can also require insane amounts of storage space, but at least it will allow you to stop after a finite number of digits of e or pi based on what your computation requires in order to give an accurate answer.

      PI is a ratio. Just like 1/3, neither can be precisely represented using decimals. You can prevent filling up your disk if you cut the umbilical cord and stop using imprecise numerical representations. Yes, it is completely possible, but only if you understand how math works.

    3. Re:one small problem by istartedi · · Score: 1

      PI is irrational, 1/3rd isn't. 1/3 could be represented perfectly if the implementation had a "repeating" bit. AFAIK, there isn't any commonly used FP hardware that has such a bit, so yeah; 1/3 is not perfectly represented.

      This reminds me of the arguments you get from people when you try to explain that 0.9 repeating is exactly equal to 1.0.

      Their minds really get blown when you explain that 0.9 repeating is just 0.3 repeating + 0.3 repeating + 0.3 repeating. All those 3s add up to 9, all the way out into infinity. It's the same as 3*(1/3), so plainly it equals 1.0; but their minds still have a hard time dealing with 0.9 repeating equaling 1.0.

      A more succinct way to get over it? Repeating decimals are just alternative representations of numbers. The symbol known as 0.9 repeating just happens to map to the same number as 1.

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    4. Re:one small problem by Anonymous Coward · · Score: 0

      and pi could be represented perfectly if it had a PI bit.

    5. Re:one small problem by petermgreen · · Score: 1

      PI is irrational, 1/3rd isn't. 1/3 could be represented perfectly if the implementation had a "repeating" bit. AFAIK,

      You'd need more than one extra bit to represent recurring binary fractions because you need to store the point at which the pattern repeats. And you would still only be able to store a subset of rational numbers exactly because you would still have a limited number of bits.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    6. Re:one small problem by istartedi · · Score: 1

      You'd need more than one extra bit to represent reccuring binary fractions because you need to store the point at which the pattern repeats.

      Grrr.. yep; one bit would only cover cases like 0.789789789... It would fail on 0.768989121212...

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    7. Re:one small problem by ChunderDownunder · · Score: 1

      PI is irrational

      22/7... It all goes back to the Babylonian representation of time when there were only 22 hours in a day and thus 154 hours a week. Then some bright spark asked 'Wouldn't it be nice if there were a couple of extra hours in the day', and so the 24/7 paradigm was born. Some thought this change was irrational (c.f. daylight saving), so a formal definition of circumference = pi x diameter was adopted.

    8. Re:one small problem by wonkey_monkey · · Score: 1

      PI is a ratio. Just like 1/3, neither can be precisely represented using decimals.

      pi can't be printed in full in any base, but 1/3 can. 0.1 in base 3.

      --
      systemd is Roko's Basilisk.
    9. Re:one small problem by wonkey_monkey · · Score: 1

      Grrr.. yep; one bit would only cover cases like 0.789789789...

      With only one bit, how would you know it's not 0.78907890... or 0.7890078900...?

      --
      systemd is Roko's Basilisk.
    10. Re:one small problem by Anonymous Coward · · Score: 0

      Precisely, you could store n/3, n/7, n/15, n/31, n/63, n/127 ...

      Then 1/5 = 3/15, 1/9 = 7/63, 1/11 = 93/1023, 1/13 = 315/4095, 1/17 = 15/255, 1/19 = 13797/262143. So you need at least 18 bits of mantissa just for 1/19. I haven't worked it out but I suspect you'll exhaust a float before 1/30 and a double well before 1/100. Sounds like a lot of hardware complexity just to support rationals up to a 7-bit denominator. File under "cool idea, but impractical".
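
      The fractions above can be checked with a few lines of Python. The binary_period helper is a hypothetical name; it computes how many bits the repeating block of 1/d needs, which is the smallest p with 2**p - 1 divisible by d.

```python
def binary_period(denominator: int) -> int:
    """Length of the repeating block of 1/denominator in binary (odd denominator > 1)."""
    if denominator < 3 or denominator % 2 == 0:
        raise ValueError("expects an odd denominator greater than 1")
    period, power = 1, 2 % denominator
    while power != 1:
        power = (power * 2) % denominator
        period += 1
    return period

for d in (3, 5, 9, 11, 13, 17, 19):
    p = binary_period(d)
    numerator = (2 ** p - 1) // d      # 1/d == numerator / (2**p - 1)
    print(f"1/{d} = {numerator}/{2 ** p - 1}  (period {p} bits)")
```

      For 1/19 this reports a period of 18 bits, matching the 13797/262143 figure.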

    11. Re:one small problem by Anonymous Coward · · Score: 0

      Now look what you've made me do. I've snorted my earl grey all over my keyboard!

    12. Re:one small problem by TangoMargarine · · Score: 1

      Well, 1/3rd is perfectly representable as long as you're storing it in base 3... ;)

      --
      Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
    13. Re:one small problem by p1p3 · · Score: 1

      PI is a ratio. Just like 1/3, neither can be precisely represented using decimals.

      pi can't be printed in full in any base, but 1/3 can. 0.1 in base 3.

      Guess what base I'm using? (Hint: pi = 1)

    14. Re:one small problem by Anonymous Coward · · Score: 0

      PI is irrational

      22/7...

      It all goes back to the Babylonian representation of time when there were only 22 hours in a day and thus 154 hours a week. Then some bright spark asked 'Wouldn't it be nice if there were a couple of extra hours in the day', and so the 24/7 paradigm was born. Some thought this change was irrational (c.f. daylight saving), so a formal definition of circumference = pi x diameter was adopted.

      Some Babylonian said, "Wouldn't it be easier to divide the day into 24 hours than 22 hours? Heck, why not divide one hour into 60 minutes? And one minute into 60 seconds! Also, have you seen my plan for dividing a square mile?"

  25. Google Docs by chrisgagne · · Score: 1

    Seeing as I get floating point math artifacts for simple arithmetic operations (e.g., balancing a household budget) in Google Doc spreadsheets...
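
    The usual cause of such artifacts is binary floating point being asked to hold decimal fractions. Whether the spreadsheet in question works exactly this way is an assumption, but the effect is easy to reproduce in Python, along with the decimal-arithmetic alternative (the amounts are made up):

```python
from decimal import Decimal

# Hypothetical budget entries, in dollars.
float_total = 0.10 + 0.10 + 0.10
decimal_total = Decimal("0.10") + Decimal("0.10") + Decimal("0.10")

print(float_total == 0.30)                 # False
print(decimal_total == Decimal("0.30"))    # True
```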

  26. Ye Old Text by Anonymous Coward · · Score: 3, Insightful

    This has pretty much been the bible for many, many, many years now: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

    If you haven't read it, you should - no matter if you're a scientific developer, a games developer, or someone who writes javascript for web apps - it's just something everyone should have read.

    Forgive the Oracle-ness (it was originally written by Sun). Covers what you need to know about IEEE754 from a mathematical and programming perspective.

    Long story short, determinism across multiple (strictly IEEE754 compliant) architectures while possible is hard - and likely not worth it. But if you're doing scientific computing, perhaps it may be worth it to you. (Just be prepared for a messy ride of maintaining LSB error accumulators, and giving up typically 1-3 more LSB of precision for the determinism - and not only having to worry about the math of your algorithms, but the math of tracking IEEE754 floating point error for every calculation you do).

    What you can do, easily, however is understand the amount of error in your calculations and state the calculation error with your findings.
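
    A rough Python sketch of what "state the calculation error" can look like for a small problem: run the computation once in doubles and once in exact rational arithmetic, then report the difference. The harmonic sum is only a placeholder workload, and the exact approach is practical only when the computation stays rational.

```python
from fractions import Fraction

def harmonic_float(n: int) -> float:
    """Sum of 1/k for k = 1..n in ordinary double precision."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_exact(n: int) -> Fraction:
    """The same sum with exact rational arithmetic."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

n = 1000
approx = harmonic_float(n)
exact = harmonic_exact(n)
relative_error = abs(Fraction(approx) - exact) / exact

print(f"H({n}) = {approx!r}, relative error about {float(relative_error):.2e}")
```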

    1. Re:Ye Old Text by Anonymous Coward · · Score: 0

      I should probably add to this, that deterministic IEEE754 calculations obviously have an inherent performance cost with them anyway (extra operations/registers/etc tracking/checking error as you go) - and as such if this 'is' an option for you, you're likely not in a performance critical domain - in which case, if it wasn't already obvious, any precision you trade off for determinism can of course be bought back at an even greater cost w/ 64/128/256/etc bit floating point emulation.

      double/quad math libraries are easy to come by (eg: http://gcc.gnu.org/onlinedocs/libquadmath/) - any higher and you typically need to roll your own (the math isn't that hard to follow, and implementing it yourself typically takes less than a day w/ unit tests).

    2. Re:Ye Old Text by goodminton · · Score: 1

      Awesome! Thank you for both the posts, I'm the OP and I really appreciate them. Your comment about scientific computing is spot on with the use cases I'm interested in.

  27. yeah, don't be lazy by rewindustry · · Score: 2

    floats are soft option, only gets us all in trouble.

    remember

    we are pentium of borg, division is futile

    1. Re:yeah, don't be lazy by flargleblarg · · Score: 1

      floats are soft option, ...

      Too many shadows, whispering voices
      faces on posters, too many choices
      If? When? Why? What?
      How much have you got?
      Have you got it? Do you get it?
      If so, how often?
      Which do you choose
      a hard or soft option?

  28. Re:Arbitray precision by fractoid · · Score: 1

    Responsible programmers store each value in the manner most suitable for that value. The reality is that very few applications actually care about the exact to-the-bit result of floating point ops, and floating point arithmetic should always be regarded as inexact.

    --
    Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
  29. Is it just a language barrier? by s.petry · · Score: 3, Informative

    My first thought on seeing "tranlate" and "I'm research" was that it's only language, but then I read invalid and incorrect statements about how precision is defined in Mathematica. So now I'm not quite sure it's just language.

    Archiving a whole virtual machine as opposed to the code being compiled and run is baffling to me.

    Now if you are trying to archive the machine to run your old version of Mathematica and see if you get the same result, you may want to check your license agreement with Wolfram first. Second, you should be able to export the code and run the same code on new versions.

    I'm really really confused on why you would want this to begin with though. Precision has increased quite a bit with the advent of 64bit hardware. I'd be more interested in taking some theoretical code and changing "double" to "uberlong" and see if I get the same results than what I solved today on today's hardware.

    Unless this is some type of Government work which requires you to maintain the whole system, I simply fail to see any benefit.

    Having "Cloud" does not change how precision works in Math languages.

    --

    -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    1. Re:Is it just a language barrier? by multimediavt · · Score: 2

      His whole question and narrative is telling. This is obviously someone that has no idea what he is doing nor why. He is also most likely in violation of Wolfram's license agreement on top of his lack of computational knowledge. He should have stopped at web statistics and stayed there.

    2. Re:Is it just a language barrier? by Anonymous Coward · · Score: 0

      Burn The Cloud-Head-Revolved Witch!

      (jeez, the man was just askin..)

  30. First, identify the problem by putaro · · Score: 1

    You ask:

    Say I archive the virutal instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same?

    If the calculation is 2 + 2 I'd say the odds are pretty good you're going to get 4. I assume you're actually doing some difficult calculations that may push some of the edge cases in the floating point system. What I would do is make some test routines that stress the areas that you're interested in and run and check the results of those before doing any serious calculations. For the most part, you're going to have to assume that the basic functions work and there aren't simply specific combos like 17454423.2 + 99921234.1 that always give the wrong answer since you can't check for those really but the usual concern is around the edge case handling and you should be able to define what you think is normal and make sure that your environment conforms to your definition of normal.

  31. Try it on a single PC first by TheloniousToady · · Score: 1

    I currently have a Matlab script that produces slightly different FIR filter design coefficients each time I run it - when run on the same version of Matlab on the same machine. And this is with Matlab, whose primary selling point is its industrial-strength mathematical "correctness".

    Also, I once used a C compiler that wouldn't produce consistent builds, and not just by a timestamp. The compiler vendor said that a random factor was used to decide between optimization choices that scored equally. We finally had to ask the vendor to remove that "feature" so we could reproduce a build, which was required as a condition for software release.

    So, good luck reproducing math results in the cloud, and over many years.

    1. Re:Try it on a single PC first by Megane · · Score: 1

      So "Monte-Carlo optimization"? Wow. You should submit that story to thedailywtf.com, there's probably a free coffee mug in it for you.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  32. Write a test suite by Anonymous Coward · · Score: 0

    Write a test suite that verifies all the behaviour you expect a system to provide to your code. Then, 10 years from now on whatever system you have to use, run those tests and make sure they pass.
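
    A minimal Python sketch of such a suite. It records expected results as exact bit patterns via float.hex() and reports any that differ. The three checks are placeholders; a real suite would cover whichever operations and library functions your own calculation depends on.

```python
import math
import sys

# Reference bit patterns recorded once and archived alongside the code.
# float.hex() is an exact, text-friendly encoding of a double.
EXPECTED = {
    "0.1 + 0.2": ((0.1 + 0.2), "0x1.3333333333334p-2"),
    "sqrt(2)": (math.sqrt(2.0), "0x1.6a09e667f3bcdp+0"),
    "1 / 3": ((1.0 / 3.0), "0x1.5555555555555p-2"),
}

def check_environment() -> list:
    """Return the names of checks whose result differs from the archived one."""
    failures = []
    if sys.float_info.mant_dig != 53:
        failures.append("double precision significand")
    for name, (value, expected_hex) in EXPECTED.items():
        if value.hex() != expected_hex:
            failures.append(name)
    return failures

print(check_environment() or "all floating-point checks passed")
```

    Comparing hex strings rather than printed decimals avoids a second source of disagreement, namely how different runtimes format floats.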

  33. Simulate IEEE754-compliant FPU? by Dputiger · · Score: 1

    Can't Mathematica be told to stick to an 80-bit precision output? If you can specify that in software, it shouldn't matter what code the underlying platform runs on.

  34. Fuzzy Logic by Tablizer · · Score: 1

    cloud + jello * cotton / fog = fuzz

    1. Re:Fuzzy Logic by Anonymous Coward · · Score: 0

      No, cloud + jello * cotton / fog = fluff because you used non-utf characters

  35. What if it's not reproducible? by gnasher719 · · Score: 1

    Floating-point arithmetic will produce rounded results. The rounded result of a single operation will depend on the exact hardware, compiler etc. that is used. x86 compilers many years ago sometimes used extended precision instead of double precision, giving slightly different results (usually more precise). PowerPC processors and nowadays Haswell processors have fused multiply-add, which can give slightly different results (usually more precise). So the same code with the same inputs could give slightly different results.
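
    A minimal Python sketch of the fused multiply-add effect, assuming ordinary IEEE double arithmetic. The fused result is emulated here with exact rationals and one final rounding; it is not produced by a hardware FMA instruction, and the input values are chosen purely for illustration.

```python
from fractions import Fraction

# Inputs chosen so that the exact product a*b = 1 - 2**-60 is not
# representable as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the product is rounded to 1.0 first,
# so the subsequent addition yields exactly 0.0.
two_roundings = a * b + c

# Emulated fused multiply-add: compute a*b + c exactly, then round once.
single_rounding = float(Fraction(a) * Fraction(b) + Fraction(c))

print(two_roundings)    # 0.0
print(single_rounding)  # -2**-60, about -8.67e-19
```

    Neither answer is a bug; they are two different, individually valid roundings of the same expression.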

    The IEEE floating-point standard requires double precision with a 53 bit mantissa. They might have required a 54 or 52 bit mantissa, which would give slightly different rounding errors.

    Now my point: If your code performing all these operations produces almost the same results on different implementations, then it is quite likely that your code is right. If you get vastly different results, then your code is likely wrong or the problem is very hard.

    Some developers think that getting identical answers means that the answers are good. That's not true at all. If you have small differences due to slightly different rounding then there is a good chance that your results are good. Identical results guarantee nothing.

  36. Associated concern by cold+fjord · · Score: 1

    If you haven't already you may want to have a look at Interval arithmetic since it addresses some associated issues. It is supported in various development environments and libraries.

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
  37. This is just toooo technical by NEDHead · · Score: 1

    I still have trouble with 1+1=10

    1. Re:This is just toooo technical by Megane · · Score: 1

      There are only 10 kinds of people in the world: those who understand binary, and those who don't.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  38. IEEE 754-2008 by TheSync · · Score: 1

    If the math has been calculated with IEEE 754-2008, it is IEEE 754-2008 (aka ISO/IEC/IEEE 60559:2011). Should not matter what you are running it on...

  39. Consistency of results by Anonymous Coward · · Score: 0

    To obtain consistency of results, you'll have to not rely on the hardware numerics at all. Use a software implementation of floating point (MPFR is a good one but there are others) that is included in your application. It is unlikely that the basic integer operations will change underneath you (but it is still possible). If you can't accept the hit for a full software implementation, rely on IEEE 754 arithmetic and code your algorithms very very carefully. Numerical analysis is difficult and full of traps for the naïve or unwary. Again, use a good library where possible.
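
    As a rough illustration of software arithmetic with explicit settings, here is a sketch using Python's standard decimal module. It stands in for a library like MPFR only by analogy (decimal, not binary, and a different API); the precision of 50 digits is an arbitrary assumption.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Precision and rounding mode are stated explicitly instead of being
# inherited from whatever the hardware or a global default provides.
ctx = Context(prec=50, rounding=ROUND_HALF_EVEN)

third = ctx.divide(Decimal(1), Decimal(3))
root_two = ctx.sqrt(Decimal(2))

# Squaring the rounded root does not give exactly 2, but the residual
# is bounded by the chosen precision.
residual = ctx.subtract(ctx.multiply(root_two, root_two), Decimal(2))

print(third)
print(root_two)
print(residual)
```

    Because every operation goes through the same software context, the result depends on the library version and the stated settings rather than on the FPU underneath.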

    Well written numeric code will be tolerant of slight deviations in results. For example, the x86 uses 80 bit intermediaries while most other platforms use 64, and this causes subtle differences. A well designed algorithm will tolerate either; a poorly designed one will fail on one or both.

  40. Pi is not a ratio. It is irrational. by Anonymous Coward · · Score: 1

    It is not like 1/3.

    You need to go back to math class.

    1. Re:Pi is not a ratio. It is irrational. by wonkey_monkey · · Score: 1

      Pi is not a ratio.

      Who needs to go back to math class now?

      The number [pi] is a mathematical constant that is the ratio of a circle's circumference to its diameter,

      That's a ratio, in case you missed it. A ratio.

      --
      systemd is Roko's Basilisk.
  41. Hardware Arb Precision Decimal Processors by the+eric+conspiracy · · Score: 1

    I could see one thing happening over time. Right now a lot of software does calculations involving decimal fractions in floating point. The problem with this is that in general you cannot precisely represent a decimal fraction using a binary floating point number. This is why you often see results like a-b = 0.19999999999999.
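
    The effect is easy to reproduce. A small Python sketch, using the standard decimal module as one example of a decimal representation (the specific values 0.3 and 0.1 are just an illustration):

```python
from decimal import Decimal

# Binary doubles: neither 0.3 nor 0.1 is stored exactly, and the
# difference lands one unit in the last place below the double for 0.2.
binary_difference = 0.3 - 0.1
print(binary_difference)         # 0.19999999999999998
print(binary_difference == 0.2)  # False

# Decimal arithmetic: the same inputs are represented exactly.
decimal_difference = Decimal("0.3") - Decimal("0.1")
print(decimal_difference)                    # 0.2
print(decimal_difference == Decimal("0.2"))  # True
```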

    Well I think it is possible that we could see development of hardware arithmetic units that would internally use arbitrary precision fixed point calculations to do these sorts of calculations to eliminate these sorts of errors. So when you run your current programs on these processors the improved representation of decimal fractions would lead to slightly different results.

  42. Unfortunately... by Anonymous Coward · · Score: 0

    Actual mileage varies greatly. On x86 hardware C long double is 12 bytes, but on other hardware it's 16. If you're running an iterative process you are almost certainly not going to converge to the same result. Arbitrary precision arithmetic will mitigate the problem, but there are lots of opportunities to get in trouble. This is why regression tests are so important for mathematical codes.

    You really need an analytic case you can check against. That can be really hard to come up with in many circumstances. It is hard to write code that will produce the same results with different revisions of a compiler on a single processor. It gets a lot harder when the processor changes. Numerical analysts worry a lot about such things. There have been papers which showed wildly divergent results for well written code on different processors. Compiler optimization can do things that have very unintuitive results.

    1. Re:Unfortunately... by gl4ss · · Score: 1

      if the results are different enough to lead to different logical conclusions about what was being calculated then the whole method of using it as basis for decisions/deductions is pretty suspect and one should ask if the scientist in question chose 12bytes vs 16bytes to get the result he wanted.

      otoh, having the flags on to behave per standard it should behave per standard.

      --
      world was created 5 seconds before this post as it is.
  43. Hamming's Motto by dido · · Score: 1

    You would do well to remember a quotation attributed to Richard W. Hamming: "The purpose of computing is insight, not numbers."

    --
    Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
  44. Perfect reproduction is difficult / undesirable by ljhiller · · Score: 1

    This came up before with Java, which, in its original incarnation, demanded exact reproduction of floating point results...with horrible horrible results. Generally, when people perform floating point calculations, they want AN answer, not THE answer, because they know there isn't a unique exact answer.

    This issue was described far better than I can in William Kahan's essay, "How Java's Floating-Point Hurts Everyone Everywhere".

  45. Integer, floating and interval arithmetic by Mr+Z · · Score: 1

    I remember a quote, attributed (likely incorrectly) to Seymour Cray: "Do you want it fast, or do you want it accurate?"

    If you want absolutely exact arithmetic, code it entirely with arbitrary precision exact integer arithmetic. All rational real numbers can be expressed in terms of integers, and you can directly control the precision of approximation for irrational real numbers. Indeed, if your rational numbers get unwieldy, you can even control how they are approximated. And complex numbers, of course, are just pairs of real numbers in practice. (Especially if you stick to rectangular representations.) If you stick to exact, arbitrary precision integer arithmetic and representations derived from that arithmetic that you control, then you can build a bit-exact, reproducible mathematics environment. This is because integer arithmetic is exact, and you have full control of the representation built on top of that. Such an environment is very expensive, and not necessarily helpful. You can even relax the order of operations, if you can defer losses of precision. (For example, you can add a series of values in any order in integer arithmetic as long as you defer any truncation of the representation until after the summation.)
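
    A short sketch of the exact-rational approach, assuming Python's standard fractions module (which is built on arbitrary-precision integers). The particular values are made up for illustration.

```python
from fractions import Fraction

# Rationals stored as pairs of arbitrary-precision integers.
values = [
    Fraction(1, 3),
    Fraction(1, 7),
    Fraction(-2, 21),
    Fraction(10**20),
    Fraction(-10**20),
    Fraction(1, 10),
]

# Exact arithmetic: the order of summation cannot change the result.
forward = sum(values, Fraction(0))
backward = sum(reversed(values), Fraction(0))

# Loss of precision is deferred to one explicit step at the very end.
approximation = float(forward)

print(forward, backward, approximation)
```

    The cost shows up in time and memory: numerators and denominators can grow without bound as a computation proceeds.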

    If you venture into floating point, IEEE-754 gives you a lot of guarantees. But, you need to specify the precision of each operation, the exact order of operations, and the rounding modes applied to each operation. And you need to check the compliance of the implementation, such as whether subnormals flush to zero (a subtle and easy to overlook non-conformance). Floating point arithmetic rounds at every step, due to its exponent + mantissa representation. So, order of operations matters. Vectorization and algebraic simplification both change the results of floating point computations. (Vectorization is less likely to if you can prove that all the computations are independent. Algebraic simplification, however, can really change the results of a series of adds and subtracts. It's less likely to largely affect a series of multiplies, although it can affect that too.)
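
    The order-of-operations point can be seen with three numbers. A minimal Python sketch, assuming IEEE doubles; math.fsum is included only to show one standard-library way of getting an order-independent, correctly rounded sum.

```python
import math

# Same three operands, different grouping, different rounded results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)   # 0.6000000000000001
print(right)  # 0.6

# math.fsum tracks the exact sum internally and rounds once at the end,
# so the order of the inputs does not matter.
exact_forward = math.fsum([0.1, 0.2, 0.3])
exact_backward = math.fsum([0.3, 0.2, 0.1])
print(exact_forward == exact_backward)  # True
```

    This is why a compiler that reorders or vectorizes a reduction can change the last bits of a result without any bug being involved.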

    And behind curtain number three is interval arithmetic. That one is especially interesting, because it keeps track at every step of what the range of outcomes might be, based on the intervals associated with the inputs. For most calculations, this will just result in relatively accurate error bars. For calculations with sensitive dependence on initial conditions (i.e. so-called "chaotic" computations), you stand a chance of discovering fairly early in the computation that the results are unstable.
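
    A toy sketch of interval arithmetic in Python, assuming Python 3.9+ for math.nextafter. The Interval class and its method names are hypothetical, and widening every result by one floating-point step outward is a crude substitute for the directed rounding a real interval library would use.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @classmethod
    def around(cls, text):
        """Enclose the decimal value written in `text` between two doubles."""
        x = float(text)
        return cls(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other):
        # Round the lower bound down and the upper bound up by one step,
        # so the true sum always stays inside the interval.
        return Interval(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def __mul__(self, other):
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(
            math.nextafter(min(products), -math.inf),
            math.nextafter(max(products), math.inf),
        )

    def contains(self, exact):
        """Check membership of an exact rational using exact comparisons."""
        return Fraction(self.lo) <= exact <= Fraction(self.hi)


total = Interval.around("0.1") + Interval.around("0.2")
product = Interval.around("0.1") * Interval.around("0.2")
print(total, total.contains(Fraction(3, 10)))
print(product, product.contains(Fraction(1, 50)))
```

    Two runs on different platforms may produce slightly different intervals, but both should still contain the true value, which is the property being bought.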

  46. False assumption by bertok · · Score: 4, Informative

    This assumption by the OP:

    Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time.

    ... is entirely wrong. One of the defining features of Mathematica is symbolic expression rewriting and arbitrary-precision computation to avoid all of those specific issues. For example, the expression:

    N[Sin[1], 50]

    Will always evaluate to exactly:

    0.84147098480789650665250232163029899962256306079837

    And, as expected, evaluating to 51 digits yields:

    0.841470984807896506652502321630298999622563060798371

    Notice how the last digit in the first case remains unchanged, as expected.

    This is explained at length in the documentation, and also in numerous Wolfram blog articles that go on about the details of the algorithms used to achieve this on a range of processors and operating systems. The (rare) exceptions are marked as such in the help and usually have (slower) arbitrary-precision or symbolic variants. For research purposes, Mathematica comes with an entire bag of tools that can be used to implement numerical algorithms to any precision reliably.

    Conclusion: The author of the post didn't even bother to flip through the manual, despite having strict requirements spanning decades. He does however have the spare time to post on Slashdot and waste everybody else's time.

    1. Re:False assumption by ByteSlicer · · Score: 2

      Notice how the last digit in the first case remains unchanged, as expected.

      It only remains unchanged because it rounds down.
      N[Sin[1], 48] will end with ...60798
      N[Sin[1], 47] will end with ...6080
      Calculated on Wolfram Alpha.
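
      The rounding behaviour in this exchange can be reproduced outside Mathematica. Below is a rough Python analogue, assuming the standard decimal module as a stand-in for arbitrary-precision evaluation; sin_decimal is a hypothetical helper built on a plain Taylor series, not how Mathematica computes N[Sin[1], 50].

```python
from decimal import Decimal, localcontext


def sin_decimal(x, digits):
    """Approximate sin(x) to `digits` significant digits with a Taylor series."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # guard digits for the intermediate sums
        x = Decimal(x)
        term = x
        total = x
        n = 1
        threshold = Decimal(10) ** -(digits + 8)
        while abs(term) > threshold:
            n += 2
            # Next term: multiply by -x^2 / ((n-1) * n).
            term = -term * x * x / ((n - 1) * n)
            total += term
        ctx.prec = digits
        return +total  # unary plus rounds to the current precision


for digits in (50, 48, 47):
    print(digits, sin_decimal(1, digits))
```

      At 47 digits the tail rounds up to ...6080, while at 48 and 50 digits it rounds down, matching the values quoted above.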

    2. Re:False assumption by twistedcubic · · Score: 1

      It's true the author of the post hasn't done any research, but the answers to the question (like use GNU MPFR) are valuable to others.

  47. The one time you actually use Java. by VortexCortex · · Score: 1

    In x86 based processors we've had BCD (binary coded decimal) instructions for ages. I use those in my assembly project, or emulate unlimited bit length floating points with integer math in my big-num libs. However, modern languages do not rely on the hardware features like BCD.

    In Matlab you should use fixed-point math. That's pretty dumb, but it garauntees the precision will be the same on whatever platform.

    Lacking a bignum lib with garaunteed behaviors, one could just use Java. Java emulates floating point values. That's why I don't use it: I NEED FPU speed. Java makes garauntees about its floating point operation behaviors -- which can vary by processor. The processor may have a 64 bit float type, but use 80 (or more) bits of internal representation, and only clip it to 64 bits on mem-write. You should treat hardware FPU calculations as imprecise -- This is why my physics engine has an epsilon (error bar) for equalities and such -- Without an error tolerance, desynch on multiple clients is prevalent and minor rounding errors can lead to physics explosions when small values are divided beyond the precision of the machine. However, with Java your floating point behaviors are garaunteed. If you can't use fixed point or your application doesn't have support for binary coded decimal or equivalent bigint facilities with garauntees about precision, then USE JAVA DAMNIT. It's (mostly) cross platform -- That's its selling point: Write once, debug everywhere, but at least your slow as death floats will produce the same values.

    1. Re:The one time you actually use Java. by VortexCortex · · Score: 1

      guarantee - Gua ran tee; Hooked on phonics didn't work for me!

  48. Not true in the real world. by knorthern+knight · · Score: 1

    > Floating point and integer operations are well defined. Unless someone fucks up
    > with implementing the floating point unit the result should be exactly the same.

    Not true in the real world. See http://slashdot.org/story/13/07/28/137209/same-programs--different-computers--different-weather-forecasts There was a scientific paper about the same weather model producing different forecast outputs on different machines.

    --

    I'm not repeating myself
    I'm an X window user; I'm an ex-Windows user
  49. Floating point is hard. by Animats · · Score: 1

    Kahan, of course, is the authority on this.

    Handling of floating point overflow is a big problem. Under Windows on x86, you can get exact (as in at the right instruction location) floating point exceptions, and I've used that to catch overflow in a physics engine. But on some CPUs, there's a speed penalty for enabling exact FPU exceptions. Java and Go don't support floating point exceptions; they return NaN or +INF or -INF or 0 (for underflow). One problem with IEEE floating point is that you don't have trichotomy. When you compare with a NaN, the result is always supposed to be false (except for !=, which is true). So a < b and !(a >= b) are not equivalent.

    Doing a numerical compare against a NaN should raise an exception. That way, you can crunch your matrices at full speed, any operation with a NaN as an input has a NaN as an output, and if there's a NaN in the final results, code that uses it without checking for it faults out. But when IEEE floating point was designed, FPUs were separate chips. (In some cases, separate boards.) So the floating point design group didn't have the mandate to affect what the branching part of the CPU did.

    As a result, you can generate a NaN, miscompare against it (every comparison except != returns false) and take the wrong branch in the code without recognizing the problem. Not many people care about this stuff, but where it matters, it's usually about something important.
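
    A minimal Python sketch of the comparison behaviour, assuming IEEE semantics (which Python floats follow). The checked_less helper is hypothetical; it shows the kind of explicit check that has to be written by hand because the comparison itself does not fault.

```python
import math

a = float("nan")
b = 1.0

# Ordered comparisons and equality with a NaN are all false...
less = a < b
greater_or_equal = a >= b
equal = a == b
# ...and inequality is the one comparison that is true.
not_equal = a != b

# So these two "equivalent" branch conditions disagree:
print(less)                  # False
print(not greater_or_equal)  # True


def checked_less(x, y):
    """Compare x < y, but refuse to answer if either operand is a NaN."""
    if math.isnan(x) or math.isnan(y):
        raise ValueError("comparison involving NaN")
    return x < y
```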

  50. No, for many reasons by khb · · Score: 1

    The short answer is no. The long answer is no ... and a very long list of reasons why.

    Start by reading Goldberg's classic paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic". Sun's floating point group made some improvements to the paper and paid for rights to redistribute. Oracle continues to do so. http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

    If that isn't depressing enough, and you use trig functions, read http://www.scribd.com/doc/64949170/Ng-Argument-Reduction-for-Huge-Arguments-Good-to-the-Last-Bit . You can get the source from netlib for "fdlibm", which is under a BSD flavor license.

    If the purely software issues haven't made you realize that you haven't got much of a prayer, please note that different revs of the same Intel chips sometimes provide slightly different results (sometimes intentionally, sometimes as a result of tweaking the order of execution in the out of order execution engine). Older x87 arithmetic was 80-bit, newer x64 arithmetic is pure 64-bit, providing no end of fun. Using the SSE instructions provides more variation.

    If the pretty much (in principle) "simple" and potentially deterministic software issues aren't enough consider the reality of hw. Chessin has a very good, yet amusing, explanation of the key problems http://queue.acm.org/detail.cfm?id=1839574

    Lest you think they only apply to a particular generation of boutique processor, most HPC ensembles are now built out of standard server motherboards and chips.

    http://www.csm.ornl.gov/srt/conferences/ResilienceSummit/2010/pdf/michalak.pdf The issue of undetected soft errors is big and growing, as can be seen from the activity in the literature. See the SC13 paper "ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection", which has lots of good citations of earlier work, including field data such as 27 soft errors per week leading to fatal node failures (that is, results wrong enough that, while the hw didn't detect any problem, the issue caused the node to crash) on just one ensemble (ASC Q). It's going mainstream, in that HPCwire caught wind and on 31 Oct 2013 had a nice tabloidesque writeup entitled "Addressing the Threat of Silent Data Corruption".

    Neutrons don't only disrupt memory elements, but can hit logic as well. See the upcoming issue (already available via IEEE Xplore for members/subscribers) of the Journal of Solid-State Circuits, Vol. 49, No. 1, January 2014, "The 10th Generation 16-Core SPARC64 Processor for Mission Critical UNIX Server", which details the lengths some (but not many) go to in order to ensure that there are no undetected errors (a wide range of techniques, ranging from where wires are placed on the chip to ECC, parity, residue arithmetic, automatic retry, etc.). No doubt there are some good (similar) papers in the IBM Technical Journal.

    No doubt a good literature search would turn up dozens of other papers, and circuit design textbooks cover some of the territory.

    In principle, interval arithmetic could provide a solution (you might not get the same interval, but if the intervals nest, you have consistent results and if they are disjoint you have a bug ... and if they nest, the narrower one is "sharper" which is better). In practice, most algorithms haven't been reworked for good interval implementation, languages don't provide very good support, nor does most hardware. All fixable in principle, but unlikely to be the solution you seek for today's off the shelf virtual systems available cheaply.

    1. Re:No, for many reasons by michael_cain · · Score: 1

      The LANL presentation (and related material) is important to anyone who conducts extremely long-running calculations and thinks that they can have repeatability. Nor are the results new. 20+ years ago, paired lock-stepped 68020 processors with external hardware continuously checking pin states on output found that single-bit differences in results occurred about once every 30 days (proprietary data that I was shown under NDA, don't know that it was ever published). Contemporary hardware has considerably smaller geometry and runs at much higher clocks and lower voltages.

  51. Mathematics is more than real numbers by jandersen · · Score: 1

    I have seen some of the answers given by other people, and many seem to miss the point of floating point calculations. Floating point is by its very nature imprecise, and when you choose to use it, you have to keep that in mind - the task you want to perform must be one where a certain degree of imprecision does not matter. What you are after is not exact reproducibility, but simply that your results stay within accepted error margins, and depending on the nature of your calculations, these may be very wide - I believe you can still find astronomical measurements where the error margin is something like +/- 200%.

    However, it is a misconception to equate "maths" with "doing numbers", as only a fairly minor part of mathematics has to do with numbers; and there are, in fact, computer tools out there for non-numerical calculations, like GAP (http://www.gap-system.org/). And although I haven't seen Mathematica for many years, I believe one of its main features is the ability to solve equations symbolically - i.e. without numerical calculations - the result of which is going to be either correct and therefore precise, or incorrect.

  52. Strictfp and equivalents? by Anonymous Coward · · Score: 0

    The strictfp keyword in Java guarantees identical floating point calculations. Conveniently, many or most cloud services support Java. There may be equivalent ways in other languages.

  53. Trouble with math libraries by Anonymous Coward · · Score: 0

    One, hopefully helpful comment to the original question:

    IEEE-754 is well-defined and reproducible, assuming its user understands what he's doing and knows to enforce IEEE-754 compliance at compilation.

    There's another problem, though: IEEE-754 defines essentially only addition, subtraction, multiplication, division and square root as operations with precisely defined rounding. If your routine uses any other routine (say, from math.h), all bets are off. Multiple implementations of math libraries exist, and the results may even change when system libraries are updated. This causes functions as simple as sin() to possibly produce considerably different results on the least significant bits of a result, and of course these differences can amplify over the course of computation.
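
    When comparing the output of two math libraries, it helps to measure the disagreement in units in the last place (ulps) rather than eyeballing decimals. A hedged Python sketch, assuming Python 3.9+ for math.nextafter; the helper names ordered_bits and ulps_apart are made up here, and NaNs are deliberately not handled.

```python
import math
import struct


def ordered_bits(x):
    """Map a double to an integer whose ordering matches the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats are sign-magnitude; flip them onto the negative integers.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulps_apart(a, b):
    """Number of representable doubles between a and b (NaN not supported)."""
    return abs(ordered_bits(a) - ordered_bits(b))


print(ulps_apart(1.0, math.nextafter(1.0, 2.0)))  # 1
print(ulps_apart(0.1 + 0.2, 0.3))                 # 1
print(ulps_apart(-0.0, 0.0))                      # 0
```

    Running ulps_apart on sin(x) values recorded on two systems would show whether they differ by a last-bit rounding choice or by something larger.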

    In short: know what you're doing, and don't make assumptions. Maybe the only reasonable assumption to make is that the same process produces the same result for the same computation with the same inputs... unless you're using specific versions of the Java runtime, where the JIT used different code for some transcendental functions inside JITted code and outside it. That was a real pain in the ass.

    But, in the end, "the cloud" is likely to be the smallest contributor to the problems you might encounter. Everything above virtualized hardware is really to blame, and should actually be responsible to know what's being done. You as user, are on top of that stack.

  54. Two choices by gigaherz · · Score: 1

    1. You can tweak your algorithms so that they minimize the error instead of accumulating it -- which you should be doing regardless of your need for reproducibility --, or

    2. You can use alternative methods like software implementations of floating point, "decimal" (look at the System.Decimal type in .NET for an example), or even arbitrary-precision libraries.
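
    As one illustration of the first choice, here is a sketch of compensated (Kahan) summation in Python. It is picked as a well-known example of an error-minimizing algorithm, not as anything specified in the list above; the input data is contrived so that naive summation loses every small term.

```python
def naive_sum(values):
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject what was lost last time
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large term followed by many terms too small to register individually.
values = [1.0] + [1e-16] * 100_000

naive = naive_sum(values)
compensated = kahan_sum(values)
print(naive)        # 1.0: every small term was rounded away
print(compensated)  # close to 1.00000000001
```

    Note that an optimizer allowed to reassociate floating point expressions can cancel the compensation term, which is one more reason to control compiler flags.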

  55. Well by luis_a_espinal · · Score: 1

    How Reproducible Is Arithmetic In the Cloud?

    As reproducible as you configure it to be. Fundamentally no different from running Mathematica (or a similar package) on a Beowulf cluster or in any assortment of machinery.

    "I'm research the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time

    And configuration, and choice of numeric data types, and choices of operators (i.e. division vs. multiplication).

    In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virutal instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this tranlates to the world of cloud and virtualization across multiple hardware types.

    I doubt anyone is doing that type of research. And the only way to ensure replicability of results is by strictly using fixed-precision numeric data types (instead of relying on floating point types).

  56. Re:many problems become much easier by careysb · · Score: 1

    "Large" is a relative term. Original estimate for healthcare.gov was 5 billion. They went with the cheapest bidder for 1 billion.

  57. Prime95 and 80387 Coprocessor Bug by Anonymous Coward · · Score: 0

    Some of the Mersenne Prime hunters (who, incidentally, abuse the FPU to get arbitrary precision fixed point calculations on huge numbers, like millions of digits) once posted a reward, which has gone unclaimed for many years, for anyone who could show a practical consequence of the 80387 math bug which involved approximately the fifth significant digit.

    The point is that nothing is built that precisely, and only a few standards measurements even care about that degree of precision....for example, the Ohm is only known to a few parts per million. This sensitive dependence on initial conditions is why weather forecasts, such as hurricane path forecasts, are based on ensemble averages.

    OP, go, RTFM...and if you require bit-for-bit reproduction, then you need a bit-accurate system to run on and a way to double check if more than about 10^14 operations are required. There will be a cost to this reproducibility, compared to running on the native floating-point units on the different machines used....whether you simulate the original machine instructions or use a language like Mathematica (or bignum, or, or, or...) that exactly specifies the arithmetic in arbitrary precision.

    Otherwise, some statement needs to be made about the sensitivity of the calculation to its inputs and numerical errors, and those consequences followed through on. If the claim is that results are close if inputs are close (that is, it converges in spite of small errors), then these small errors aren't important. Chaos (Lorenz style, and weather style), however, states that the small errors grow until eventually the bounds fill the entire space of possible outputs, and you will need to take the ensemble approach to get a distribution.
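
    A small Python sketch of that error growth, using the logistic map as a stand-in for a chaotic computation (an assumption for illustration; it is not a weather model). It needs Python 3.9+ for math.nextafter, and the starting value 0.3 and the 100 steps are arbitrary.

```python
import math


def logistic_orbit(x, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every intermediate value."""
    orbit = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


# Two starting points that differ by a single unit in the last place.
x0 = 0.3
x1 = math.nextafter(x0, 1.0)

gaps = [abs(p - q) for p, q in zip(logistic_orbit(x0, 100), logistic_orbit(x1, 100))]

print(gaps[0])    # still tiny after one step
print(max(gaps))  # no longer tiny: the one-ulp difference has been amplified
```

    In a computation like this, bit-for-bit agreement between platforms is the only alternative to treating the output as one sample from an ensemble.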

  58. Sampling by benob · · Score: 1

    If your results depend on hardware, software and so on, what you are doing is sampling from the solution space. You can then model that distribution and perform significance testing vs that distribution. What is the probability of your result being correct? Of your result belonging to the true distribution?

    Statistics over mathematical proofs. That's what you want to do.