Ask Slashdot: How Reproducible Is Arithmetic In the Cloud?
goodminton writes "I'm research the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virutal instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this tranlates to the world of cloud and virtualization across multiple hardware types."
Use Fixed-point arithmetic.
In Mathematica make sure to specify your precision.
Look at 'Arbitrary-Precision Numbers' and 'Machine-Precision Numbers' for more information on how Mathematica does this.
The result is always the same, but the definition of reality is changing. The result of every single calculation is in fact 42 in some units. The hard part is figuring out the units.
Some drink at the fountain of knowledge. Others just gargle.
This problem is far broader than arithmetic. Any distributed system based on elements out of your control is bound to be somewhat unstable. For example, an app that uses google maps, or a utility to check your bank account. The tradeoff for having more capability than you could manage yourself, is that you don't get to manage it yourself.
First sentence seems stilted at best.
Some days it's just not worth
chewing through my restraints.
Just scroll down a couple of posts. "Quite soon the Wolfram Language is going to start showing up in lots of places, notably on the web and in the cloud."
Problem solved!
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
If the value you're computing is so dependent on the details of the floating point implementation that you're worried about it, you probably have an issue of numerical stability and the results you are computing are likely useless, so this is really a moot point.
What the title says - e.g. bignum for Python etc. It will be significantly slower, but the result is going to be stable at least for a given library version, and that is far easier to archive.
Mathematica in particular uses adaptive precision; if you ask it to compute some quantity to fifty decimal places, it will do so.
In general, if you want bit-for-bit reproducible calculations to arbitrary precision, the MPFR library may be right for you. It computes correctly-rounded special functions to arbitrary accuracy. If you write a program that calls MPFR routines, then even if your own approximations are not correctly-rounded, they will at least be reproducible.
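To make the "correctly rounded to a requested precision" idea concrete, here is a minimal sketch using Python's standard decimal module as a stand-in (this is not MPFR, and the helper name sqrt_to_digits is made up for illustration). The decimal specification requires square root to be correctly rounded to the context precision, so the same digits should come out on any conforming platform.

```python
from decimal import Decimal, localcontext


def sqrt_to_digits(value: str, digits: int) -> Decimal:
    """Return sqrt(value) rounded to `digits` significant digits.

    Hypothetical helper: it only sets the context precision and calls
    Decimal.sqrt(), which the decimal spec defines as correctly rounded.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(value).sqrt()


r50 = sqrt_to_digits("2", 50)
r51 = sqrt_to_digits("2", 51)
print(r50)
print(r51)
```

Because each result is the correctly rounded value at its own precision, the two printed numbers can differ only in the final digits, and rerunning the script should reproduce them exactly.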
If you want to do your calculations to machine precision, you can probably rely on C to behave reproducibly if you do two things: use a compiler flag like -mpc64 on GCC to force the elementary floating point operations (addition, subtraction, multiplication, division, and square root) to behave predictably, and use a correctly-rounded floating point library like crlibm (Sun also released a version of this at one point) to make the transcendental functions behave predictably.
Most of the time arbitrary precision is not necessary and it's easier (and faster) to just use a float. There are times when it matters, but for the most part people aren't doing things where it matters.
The submitter should know better about using integer operations for things that require precision though.
Different results on different hardware was a major problem up until CPU designers started to implement the IEEE754 standard for floating point arithmetic. IEEE754 conforming implementations should all return identical results for identical calculations.
However, x86 systems have an 80-bit extended precision format and if the software uses 80-bit floats on x86 hardware and then you run the same code on an architecture that does not support the x86 80-bit format (say, ARM or Sparc or PowerPC) then you are likely to get different answers.
I think newer revisions of IEEE754 have support for extended precision formats up to 16-bytes, but you need to know your hardware (and how your software uses it) to make sure that you are doing equal work on systems with equal capabilities. You may have to sacrifice precision for portability.
When information is power, privacy is freedom.
They may be well defined but nobody implements fully standards compliant FP units and they have subtle differences in output. Even with identical hardware, configurable settings like rounding modes may also differ between instances.
I am becoming gerund, destroyer of verbs.
If your calculations are processor-dependent, that's a bad sign for your code. If your results really depend on things that can be altered by the specific floating-point implementation, you need to write code that's robust to changes in the way floating-point arithmetic is done, generally by tracking the uncertainty associated with each number in your calculation. (Obviously you don't need real-time performance since you're using cloud computing in the first place.) I'm not an expert on Mathematica, but it probably has such things built in if you go through the documentation, since Mathematica notebooks are supposed to exhibit reproducible behavior on different machines. (Which is not to say that no matter what you write it's automatically going to be reproducible.)
Archiving hardware to get consistent results is mainly used when there are legal issues and some lawyer can jump in and say, "A-ha! This bit here is different, and therefore there's some kind of fraud going on!"
I used to read Caltizzle. I was a lot cooler than you.
http://www.smbc-comics.com/?id=2999
It depends on what you mean by "cloud", which is sort of a catchall term. As you've pointed out, on SaaS clouds you're going to have no guarantee of consistency, even if no time passes -- you don't know that the cloud environment is homogeneous. For (P/I)aaS clouds, you can hopefully hold constant what software is running. For example, if you have your Ubuntu 12.04 VM that runs your software, when you fire up that VM five years from now, its software hasn't changed one bit. You of course have to worry about whether or not the form you have the VM in is even usable in five years. You would hope that, even with inevitable hardware changes, if none of the software stack changes, then you'll get the same results. I'd guess that if they're running all on hardware that really correctly implements IEEE floating-point numbers, then you will in fact get consistent results. But I wouldn't bet on it.
What you really need, unfortunately, is a library that abstracts away and quantifies the uncertainty induced by hardware limitations. There are a variety of options for these, since they're popular in scientific computing, but the overall point is that using such techniques, you can get consistent results within the stated accuracy of the library.
If you are depending on serious precision, floating point was not the way to go in the first place. Floating point implementations are not guaranteed to be exactly the same, nor exactly correct.
Let's say you're using C on an x86. float (32-bit) and double (64-bit) are well defined. However, the x86 FPU internally uses long double (80-bit).
So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.
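The effect is easy to imitate one level down, with 32-bit floats and 64-bit intermediates in place of 64-bit doubles and 80-bit intermediates. The sketch below is only an analogy for the x87 case, and to_f32 is a made-up helper that rounds a Python float to the nearest 32-bit value via the struct module.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a, b, c = 16777216.0, 1.0, 1.0  # a is 2**24, where float32 spacing becomes 2

# Every intermediate is rounded back to float32 (the "spilled" case).
narrow = to_f32(to_f32(a + b) + c)

# Intermediates stay in the wider format; one rounding at the very end.
wide = to_f32(a + b + c)

print(narrow)  # 16777216.0
print(wide)    # 16777218.0
```

Same inputs, same source expression, and the two evaluation strategies disagree in the last place, which is the kind of difference a register spill can introduce.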
Do you even lift?
These aren't the 'roids you're looking for.
If you're worried about your program generating different results on different arch, you have some serious coding issues.
The math should be the same on all systems. If you're worried, try 2 different systems against a known or manually calculated result, that's how the Pentium-type bugs were discovered (if you remember).
Typically major issues in your processing units will be discovered quickly because of the ubiquity in the market. Unless you're using a custom built or compromised chip on eg primes, you shouldn't worry and even if it were compromised (the Chinese ARM chips or NSA-controlled crypto accelerators) you'll still get a valid result, just less secure.
Custom electronics and digital signage for your business: www.evcircuits.com
The problem of inconsistent floating point calculations between machines has been solved since 1985. I'm sure moving your app into the cloud doesn't suddenly undo 28 years of computing history.
The solution is to use a 3D printer to make your own cloud.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
So I'm supposed to do all my calculations without any Pi? How can you have any Pi if you don't eat your machine?
If you're not allowing for rounding errors, your result is invalid in the first place.
If you don't want rounding errors, use a package based on variable precision mathematics, like a BCD package.
I do not fail; I succeed at finding out what does not work.
I can recall a physics simulation I was involved in years ago that got differences of 10% depending on what hardware we ran it on. Turned out the Sun & SGI workstations used 64 bit FP, while the IBM box used some 128 bit or something like that. Took a while to track that one down...
--- Often in error; never in doubt!
Seeing as I get floating point math artifacts for simple arithmetic operations (e.g., balancing a household budget) in Google Doc spreadsheets...
This has pretty much been the bible for many, many, many years now: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
If you haven't read it, you should - no matter if you're a scientific developer, a games developer, or someone who writes javascript for web apps - it's just something everyone should have read.
Forgive the Oracle-ness (it was originally written by Sun). Covers what you need to know about IEEE754 from a mathematical and programming perspective.
Long story short, determinism across multiple (strictly IEEE754 compliant) architectures while possible is hard - and likely not worth it. But if you're doing scientific computing, perhaps it may be worth it to you. (Just be prepared for a messy ride of maintaining LSB error accumulators, and giving up typically 1-3 more LSB of precision for the determinism - and not only having to worry about the math of your algorithms, but the math of tracking IEEE754 floating point error for every calculation you do).
What you can do, easily, however is understand the amount of error in your calculations and state the calculation error with your findings.
Well, the original question was about hardware floating point arithmetic, which has the same problem.
floats are soft option, only gets us all in trouble.
remember
we are pentium of borg, division is futile
Responsible programmers store each value in the manner most suitable for that value. The reality is that very few applications actually care about the exact to-the-bit result of floating point ops, and floating point arithmetic should always be regarded as inexact.
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
My first thought on seeing "tranlate" and "I'm research" was that it's only language, but then I read invalid and incorrect statements about how precision is defined in Mathematica. So now I'm not quite sure it's just language.
Archiving a whole virtual machine as opposed to the code being compiled and run is baffling to me.
Now if you are trying to archive the machine to run your old version of Mathematica and see if you get the same result, you may want to check your license agreement with Wolfram first. Second, you should be able to export the code and run the same code on new versions.
I'm really really confused on why you would want this to begin with though. Precision has increased quite a bit with the advent of 64bit hardware. I'd be more interested in taking some theoretical code and changing "double" to "uberlong" and see if I get the same results than what I solved today on today's hardware.
Unless this is some type of Government work which requires you to maintain the whole system, I simply fail to see any benefit.
Having "Cloud" does not change how precision works in Math languages.
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
You ask:
Say I archive the virutal instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same?
If the calculation is 2 + 2 I'd say the odds are pretty good you're going to get 4. I assume you're actually doing some difficult calculations that may push some of the edge cases in the floating point system. What I would do is make some test routines that stress the areas that you're interested in, then run and check the results of those before doing any serious calculations. For the most part, you're going to have to assume that the basic functions work and that there aren't specific combos like 17454423.2 + 99921234.1 that always give the wrong answer, since you can't really check for those. The usual concern is around the edge case handling, though, and you should be able to define what you think is normal and make sure that your environment conforms to your definition of normal.
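A rough sketch of what such a pre-flight test routine could look like in Python. The function name self_check and the choice of probes are illustrative assumptions. The expected values are what an IEEE 754 binary64 platform with round-to-nearest-even and working subnormals produces, and a real suite would probe whatever edge cases your own calculation leans on.

```python
import sys


def self_check() -> list:
    """Return descriptions of failed probes (an empty list means all passed)."""
    probes = [
        # (description, observed value, value expected on IEEE 754 binary64)
        ("53-bit significand", sys.float_info.mant_dig, 53),
        ("0.1 + 0.2 bit pattern", (0.1 + 0.2).hex(), "0x1.3333333333334p-2"),
        ("1.0 / 3.0 bit pattern", (1.0 / 3.0).hex(), "0x1.5555555555555p-2"),
        ("subnormals not flushed to zero", 2.0 ** -1074 > 0.0, True),
        ("ties round to even", 2.0 ** 53 + 1.0, 2.0 ** 53),
    ]
    return [
        f"{name}: got {observed!r}, expected {expected!r}"
        for name, observed, expected in probes
        if observed != expected
    ]


failures = self_check()
print("environment looks normal" if not failures else failures)
```

Archiving the expected values next to the VM image gives you something to diff against when the image is revived on different hardware.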
I currently have a Matlab script that produces slightly different FIR filter design coefficients each time I run it - when run on the same version of Matlab on the same machine. And this is with Matlab, whose primary selling point is its industrial-strength mathematical "correctness".
Also, I once used a C compiler that wouldn't produce consistent builds, and not just by a timestamp. The compiler vendor said that a random factor was used to decide between optimization choices that scored equally. We finally had to ask the vendor to remove that "feature" so we could reproduce a build, which was required as a condition for software release.
So, good luck reproducing math results in the cloud, and over many years.
Can't Mathematica be told to stick to an 80-bit precision output? If you can specify that in software, it shouldn't matter what code the underlying platform runs on.
Table-ized A.I.
Intel x87 scalar FP instructions use an 80 bit internal format for higher precision. Intel SSE2 vector FP instructions use 64 bits. You will see last bit variations depending on which instructions the compiler chooses.
And the compiler may choose differently depending on whether it's compiling for 32-bit or 64-bit x86.
Floating-point arithmetic will produce rounded results. The rounded result of a single operation will depend on the exact hardware, compiler etc. that is used. x86 compilers many years ago sometimes used extended precision instead of double precision, giving slightly different results (usually more precise). PowerPC processors and nowadays Haswell processors have fused multiply-add, which can give slightly different results (usually more precise). So the same code with the same inputs could give slightly different results.
The IEEE floating-point standard requires double precision with a 53 bit mantissa. They might have required a 54 or 52 bit mantissa, which would give slightly different rounding errors.
Now my point: If your code performing all these operations produces almost the same results on different implementations, then it is quite likely that your code is right. If you get vastly different results, then your code is likely wrong or the problem is very hard.
Some developers think that getting identical answers means that the answers are good. That's not true at all. If you have small differences due to slightly different rounding then there is a good chance that your results are good. Identical results guarantee nothing.
If you haven't already you may want to have a look at Interval arithmetic since it addresses some associated issues. It is supported in various development environments and libraries.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
I still have trouble with 1+1=10
So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.
Google for FP_CONTRACT. Quote from the C Standard:
A floating expression may be contracted, that is, evaluated as though it were a single operation, thereby omitting rounding errors implied by the source code and the expression evaluation method. The FP_CONTRACT pragma in <math.h> provides a way to disallow contracted expressions. Otherwise, whether and how expressions are contracted is implementation-defined.
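To see what contraction can do to a result, the sketch below compares a*b + c evaluated with two roundings against an emulated fused multiply-add that rounds once. The helper fused_multiply_add is hypothetical: it computes the exact value with fractions.Fraction and converts back to float, relying on CPython's integer true division being correctly rounded. It illustrates the arithmetic only and says nothing about how any particular compiler contracts expressions.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with a single rounding at the end (illustration only)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = a * b + c                  # product rounded to 1.0, then added
fused = fused_multiply_add(a, b, c)   # exact product 1 - 2**-60 is kept

print(separate)  # 0.0
print(fused)     # -8.673617379884035e-19
```

Neither answer is a bug. The fused one is closer to the exact value, but the two are not bit-identical, which is why contraction matters for reproducibility.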
If the math has been calculated with IEEE 754-2008, it is IEEE 754-2008 (aka ISO/IEC/IEEE 60559:2011). Should not matter what you are running it on...
It is not like 1/3.
You need to go back to math class.
I could see one thing happening over time. Right now a lot of software does calculations involving decimal fractions in floating point. The problem with this is that in general you cannot precisely represent a decimal fraction using a binary floating point number. This is why you often see results like a-b = 0.19999999999999.
Well I think it is possible that we could see development of hardware arithmetic units that would internally use arbitrary precision fixed point calculations to do these sorts of calculations to eliminate these sorts of errors. So when you run your current programs on these processors the improved representation of decimal fractions would lead to slightly different results.
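For what it's worth, the decimal-fraction effect and a software version of the decimal fix are both easy to reproduce today. A small sketch in Python, using the standard decimal module as one possible decimal arithmetic (not a claim about what future hardware would do):

```python
from decimal import Decimal

binary_result = 0.3 - 0.1                         # binary floating point
decimal_result = Decimal("0.3") - Decimal("0.1")  # decimal arithmetic

print(binary_result)   # 0.19999999999999998
print(decimal_result)  # 0.2

# The root cause: the double nearest to 0.1 is not exactly one tenth.
print(Decimal(0.1))
```

The last line prints the exact value stored for the literal 0.1, which makes the representation error visible.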
PI is irrational, 1/3rd isn't. 1/3 could be represented perfectly if the implementation had a "repeating" bit. AFAIK, there isn't any commonly used FP hardware that has such a bit, so yeah; 1/3 is not perfectly represented.
This reminds me of the arguments you get from people when you try to explain that 0.9 repeating is exactly equal to 1.0.
Their minds really get blown when you explain that 0.9 repeating is just 0.3 repeating + 0.3 repeating + 0.3 repeating. All those 3s add up to 9, all the way out into infinity. It's the same as 3*(1/3), so plainly it equals 1.0; but their minds still have a hard time dealing with 0.9 repeating equaling 1.0.
A more succinct way to get over it? Repeating decimals are just alternative representations of numbers. The symbol known as 0.9 repeating just happens to map to the same number as 1.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
You would do well to remember a quotation attributed to Richard W. Hamming: "The purpose of computing is insight, not numbers."
Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
This issue was described far better than I can in William Kahan's essay, How Java's floating point hurts everyone everywhere
PI is irrational, 1/3rd isn't. 1/3 could be represented perfectly if the implementation had a "repeating" bit. AFAIK,
You'd need more than one extra bit to represent recurring binary fractions because you need to store the point at which the pattern repeats. And you would still only be able to store a subset of rational numbers exactly because you would still have a limited number of bits.
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
I remember a quote, attributed (likely incorrectly) to Seymour Cray: "Do you want it fast, or do you want it accurate?"
If you want absolutely exact arithmetic, code it entirely with arbitrary precision exact integer arithmetic. All rational real numbers can be expressed in terms of integers, and you can directly control the precision of approximation for irrational real numbers. Indeed, if your rational numbers get unwieldy, you can even control how they are approximated. And complex numbers, of course, are just pairs of real numbers in practice. (Especially if you stick to rectangular representations.) If you stick to exact, arbitrary precision integer arithmetic and representations derived from that arithmetic that you control, then you can build a bit-exact, reproducible mathematics environment. This is because integer arithmetic is exact, and you have full control of the representation built on top of that. Such an environment is very expensive, and not necessarily helpful. You can even relax the order of operations, if you can defer losses of precision. (For example, you can add a series of values in any order in integer arithmetic as long as you defer any truncation of the representation until after the summation.)
If you venture into floating point, IEEE-754 gives you a lot of guarantees. But, you need to specify the precision of each operation, the exact order of operations, and the rounding modes applied to each operation. And you need to check the compliance of the implementation, such as whether subnormals flush to zero (a subtle and easy to overlook non-conformance). Floating point arithmetic rounds at every step, due to its exponent + mantissa representation. So, order of operations matters. Vectorization and algebraic simplification both change the results of floating point computations. (Vectorization is less likely to if you can prove that all the computations are independent. Algebraic simplification, however, can really change the results of a series of adds and subtracts. It's less likely to largely affect a series of multiplies, although it can affect that too.)
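A quick illustration of both points, sketched in Python with made-up helper names: summing the same three doubles in different orders gives more than one floating point answer, while summing their exact rational values with fractions.Fraction gives the same answer in every order.

```python
import itertools
from fractions import Fraction

values = [0.1, 0.2, 0.3]


def float_sum(seq):
    """Left-to-right sum, rounding after every addition."""
    total = 0.0
    for v in seq:
        total += v
    return total


def exact_sum(seq):
    """Left-to-right sum of the exact rational values; no rounding at all."""
    total = Fraction(0)
    for v in seq:
        total += Fraction(v)
    return total


float_results = {float_sum(p) for p in itertools.permutations(values)}
exact_results = {exact_sum(p) for p in itertools.permutations(values)}

print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6
print(len(float_results), len(exact_results))
```

This is why a compiler or library that reorders a reduction (vectorization, parallel sums) can change the last bits even when every individual operation is IEEE-754 conformant.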
And behind curtain number three is interval arithmetic. That one is especially interesting, because it keeps track at every step what the range of outcomes might be, based on the intervals associated with the inputs. For most calculations, this will just result in relatively accurate error bars. For calculations with sensitive dependence on initial conditions (ie. so-called "chaotic" computations), you stand a chance of discovering fairly early in the computation that the results are unstable.
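A toy version of the idea, assuming Python 3.9 or later for math.nextafter. The Interval class here is a hypothetical sketch, not a real library: it widens each result by one unit in the last place in both directions, which is cruder than proper directed rounding but still guarantees that the exact answer stays inside the interval.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi], widened outward after every operation."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def __mul__(self, other):
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(
            math.nextafter(min(products), -math.inf),
            math.nextafter(max(products), math.inf),
        )

    def contains(self, x):
        # Comparing a float with a Fraction is exact in Python.
        return self.lo <= x <= self.hi


tenth = Interval(0.1, 0.1)   # the double nearest 0.1, as a point interval
total = Interval(0.0, 0.0)
for _ in range(10):
    total = total + tenth

exact = 10 * Fraction(0.1)   # exact sum of the ten stored doubles
print(total)
print(total.contains(exact))  # True
```

Two machines may produce slightly different intervals, but both must contain the exact result, so disjoint intervals would point to a real bug rather than a rounding difference.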
Program Intellivision!
This assumption by the OP:
Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time.
... is entirely wrong. One of the defining features of Mathematica is symbolic expression rewriting and arbitrary-precision computation to avoid all of those specific issues. For example, the expression:
N[Sin[1], 50]
Will always evaluate to exactly:
0.84147098480789650665250232163029899962256306079837
And, as expected, evaluating to 51 digits yields:
0.841470984807896506652502321630298999622563060798371
Notice how the last digit in the first case remains unchanged, as expected.
This is explained at length in the documentation, and also in numerous Wolfram blog articles that go on about the details of the algorithms used to achieve this on a range of processors and operating systems. The (rare) exceptions are marked as such in the help and usually have (slower) arbitrary-precision or symbolic variants. For research purposes, Mathematica comes with an entire bag of tools that can be used to implement numerical algorithms to any precision reliably.
Conclusion: The author of the post didn't even bother to flip through the manual, despite having strict requirements spanning decades. He does however have the spare time to post on Slashdot and waste everybody else's time.
In x86 based processors we've had BCD (binary coded decimal) instructions for ages. I use those in my assembly project, or emulate unlimited bit length floating points with integer math in my big-num libs. However, modern languages do not rely on the hardware features like BCD.
In Matlab you should use fixed-point math. That's pretty dumb, but it guarantees the precision will be the same on whatever platform.
Lacking a bignum lib with guaranteed behaviors, one could just use Java. Java emulates floating point values. That's why I don't use it: I NEED FPU speed. Java makes guarantees about its floating point operation behaviors -- behaviors which can otherwise vary by processor. The processor may have a 64 bit float type, but use 80 (or more) bits of internal representation, and only clip it to 64 bits on mem-write. You should treat hardware FPU calculations as imprecise -- This is why my physics engine has an epsilon (error bar) for equalities and such -- Without an error tolerance desynch on multiple clients is prevalent and minor rounding errors can lead to physics explosions when small values are divided beyond the precision of the machine. However, with Java your floating point behaviors are guaranteed. If you can't use fixed point or your application doesn't have support for binary coded decimal or equivalent bigint facilities with guarantees about precision, then USE JAVA DAMNIT. It's (mostly) cross platform -- That's its selling point: Write once, debug everywhere, but at least your slow as death floats will produce the same values.
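The epsilon comparison mentioned above might look like the following in Python. This is only a sketch: nearly_equal and its tolerances are illustrative assumptions, and a real engine would choose tolerances to match the scale of its own quantities.

```python
import math


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9,
                 abs_tol: float = 1e-12) -> bool:
    """Equality with an error bar; the tolerances are illustrative defaults."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


root = math.sqrt(2.0)
squared = root * root

print(squared)                     # 2.0000000000000004
print(squared == 2.0)              # False
print(nearly_equal(squared, 2.0))  # True
```

The absolute tolerance matters for values near zero, where a purely relative test would reject almost everything.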
You'd need more than one extra bit to represent recurring binary fractions because you need to store the point at which the pattern repeats.
Grrr.. yep; one bit would only cover cases like 0.789789789... It would fail on 0.768989121212...
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
if the results are different enough to lead to different logical conclusions about what was being calculated then the whole method of using it as basis for decisions/deductions is pretty suspect and one should ask if the scientist in question chose 12bytes vs 16bytes to get the result he wanted.
otoh, having the flags on to behave per standard it should behave per standard.
world was created 5 seconds before this post as it is.
22/7... It all goes back to the Babylonian representation of time when there were only 22 hours in a day and thus 154 hours a week. Then some bright spark asked 'Wouldn't it be nice if there were a couple of extra hours in the day', and so the 24/7 paradigm was born. Some thought this change was irrational (c.f. daylight saving), so a formal definition of circumference = pi x diameter was adopted.
Not to mention, nuclear simulations should be staying on LANL's hardware, not being foisted into the cloud.
Unless somebody fucks up, LANL's nuclear simulations become the cloud, toward the end.
> Floating point and integer operations are well defined. Unless someone fucks up
> with implementing the floating point unit the result should be exactly the same.
Not true in the real world. See http://slashdot.org/story/13/07/28/137209/same-programs--different-computers--different-weather-forecasts There was a scientific paper about the same weather model producing different forecast outputs on different machines.
I'm not repeating myself
I'm an X window user; I'm an ex-Windows user
They do not. IEEE754 has no "grey area". The results must match bit-exact or you are not IEEE754.
Of course, there can be implementation bugs. For example, Qemu does co-processor emulation only with 64 bit floats instead of the required 80 bit. Nobody seems to really care however. The other thing is of course that if reproducibility is more important than correctness, I suspect the math is done wrong.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
No. The FPU does 80 bits to satisfy the precision requirements for 64 bit IEEE754.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
The C standard is pretty useless here. Have a look at the really bad precision required. What you need to look at is IEEE754.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Kahan, of course, is the authority on this.
Handling of floating point overflow is a big problem. Under Windows on x86, you can get exact (as in at the right instruction location) floating point exceptions, and I've used that to catch overflow in a physics engine. But on some CPUs, there's a speed penalty for enabling exact FPU exceptions. Java and Go don't support floating point exceptions; they return NaN or +INF or -INF or 0 (for underflow). One problem with IEEE floating point is that you don't have trichotomy. When you compare with a NaN, every ordered comparison and == are supposed to return false, while != returns true. So !(a < b) and a >= b are not equivalent.
Doing a numerical compare against a NaN should raise an exception. That way, you can crunch your matrices at full speed, any operation with a NaN as an input has a NaN as an output, and if there's a NaN in the final results, code that uses it without checking for it faults out. But when IEEE floating point was designed, FPUs were separate chips. (In some cases, separate boards.) So the floating point design group didn't have the mandate to affect what the branching part of the CPU did.
As a result, you can generate a NaN, miscompare against it (every ordered comparison returns false) and take the wrong branch in the code without recognizing the problem. Not many people care about this stuff, but where it matters, it's usually about something important.
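A short check of how NaN comparisons behave, sketched in Python, which follows IEEE 754 for float comparisons. The guard function require_finite is a hypothetical example of the "fault out instead of silently branching" behavior described above, done in software rather than by the hardware.

```python
import math

nan = float("nan")
a, b = nan, 1.0

print(a == b)      # False
print(a != b)      # True
print(a < b)       # False
print(a >= b)      # False, so not (a < b) and (a >= b) disagree
print(nan == nan)  # False


def require_finite(x: float) -> float:
    """Raise instead of letting a NaN or infinity steer a later branch."""
    if not math.isfinite(x):
        raise ValueError(f"non-finite value in result: {x!r}")
    return x


try:
    require_finite(a)
except ValueError as err:
    print(err)
```

Checking once at the end keeps the inner loops at full speed while still refusing to act on a poisoned result.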
So fix the compiler, or stop compiling for 32 bit. RAM is cheap, especially when you're talking about the cost per GiB of hundreds of gibibytes of it.
Write failed: Broken pipe
The short answer is no. The long answer is no ... and a very long list of reasons why.
Start with reading Goldberg's classic paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic". Sun's floating point group made some improvements to the paper and paid for rights to redistribute. Oracle continues to do so. http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
If that isn't depressing enough, and you use trig functions, read http://www.scribd.com/doc/64949170/Ng-Argument-Reduction-for-Huge-Arguments-Good-to-the-Last-Bit you can get the source from netlib for "fdlibm" which is under a BSD flavor license.
If the purely software issues haven't made you realize that you haven't got much of a prayer, please note that different revs of the same Intel chips sometimes provide slightly different results (sometimes intentionally, sometimes as a result of tweaking the order of execution in the out of order execution engine). Older x87 arithmetic was 80-bit, newer x64 arithmetic is pure 64-bit, providing no end of fun. Using the SSE instructions provides more variation.
If the pretty much (in principle) "simple" and potentially deterministic software issues aren't enough consider the reality of hw. Chessin has a very good, yet amusing, explanation of the key problems http://queue.acm.org/detail.cfm?id=1839574
Lest you think they only apply to a particular generation of boutique processor, most HPC ensembles are now built out of standard server motherboards and chips.
http://www.csm.ornl.gov/srt/conferences/ResilienceSummit/2010/pdf/michalak.pdf The issue of undetected soft errors is big and growing, as can be seen from the activity in the literature. See the SC13 paper "ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection" (which has lots of good citations of earlier work, including field data such as 27 soft errors per week leading to fatal node failures on just one ensemble, ASC Q; that is, results wrong enough that, while the hw didn't detect any problem, the issue caused the node to crash). It's going mainstream, in that HPCwire caught wind and on 31 Oct 2013 had a nice tabloidesque writeup entitled "Addressing the Threat of Silent Data Corruption".
Neutrons don't only disrupt memory elements, but can hit logic as well. See the upcoming issue (already available via IEEE Xplore for members/subscribers) of the JOURNAL OF SOLID-STATE CIRCUITS, VOL. 49, NO. 1, JANUARY 2014, "The 10th Generation 16-Core SPARC64 Processor for Mission Critical UNIX Server", which details the lengths some (but not many) go to ensure that there are no undetected errors (wide range of techniques, ranging from where wires are placed on the chip, ECC, parity, residue arithmetic, automatic retry, etc.). No doubt there are some good (similar) papers in the IBM Technical Journal.
No doubt a good literature search would turn up dozens of other papers, and circuit design textbooks cover some of the territory.
In principle, interval arithmetic could provide a solution: you might not get the same interval, but if the intervals nest, you have consistent results (and the narrower one is "sharper", which is better), and if they are disjoint, you have a bug. In practice, most algorithms haven't been reworked for good interval implementation, languages don't provide very good support, nor does most hardware. All fixable in principle, but unlikely to be the solution you seek for today's off-the-shelf virtual systems available cheaply.
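To make the nest/disjoint test concrete, here is a minimal sketch in Python. The `Interval` class and its method names are made up for illustration, not taken from any real interval library, and it widens each bound by one ulp with `math.nextafter` (Python 3.9+) instead of switching hardware rounding modes, so the intervals are slightly wider than a proper implementation would give.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical closed interval [lo, hi] with outward-rounded addition."""
    lo: float
    hi: float

    def __add__(self, other):
        # Widen each bound by one ulp so the true sum is always enclosed.
        return Interval(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def nests_in(self, other):
        # True if self lies entirely inside other (self is the "sharper" one).
        return other.lo <= self.lo and self.hi <= other.hi

    def disjoint(self, other):
        # True if the two intervals share no point: the "you have a bug" case.
        return self.hi < other.lo or other.hi < self.lo


# Two runs on different platforms might report these enclosures:
run_a = Interval(0.29, 0.31)
run_b = Interval(0.2, 0.4)
consistent = run_a.nests_in(run_b)                # the results agree
suspicious = Interval(0.5, 0.6).disjoint(run_b)   # the results contradict
```

If two archived runs give nested intervals, you can call them consistent even though the bits differ; disjoint intervals mean one of them is wrong.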
I have seen some of the answers given by other people, and many seem to miss the point of floating point calculations. Floating point is by its very nature imprecise, and when you choose to use it, you have to keep that in mind - the task you want to perform must be one where a certain degree of imprecision does not matter. What you are after is not exact reproducibility, but simply that your results stay within accepted error margins, and depending on the nature of your calculations, these may be very wide - I believe you can still find astronomical measurements where the error margin is something like +/- 200%.
However, it is a misconception to equate "maths" with "doing numbers", as only a fairly minor part of mathematics has to do with numbers; and there are, in fact, computer tools out there for non-numerical calculations, like GAP (http://www.gap-system.org/). And although I haven't seen Mathematica for many years, I believe one of its main features is the ability to solve equations symbolically - i.e. without numerical calculations - the result of which is going to be either correct and therefore precise, or incorrect.
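A small Python illustration of "within accepted error margins" as opposed to bit-for-bit equality. The tolerance of 1e-9 is an arbitrary value picked for the example; the right margin depends on your calculation.

```python
import math

computed = 0.1 + 0.2   # carries a tiny rounding error in binary floating point
expected = 0.3

# Exact comparison fails even though the result is "right" for most purposes.
exact_match = (computed == expected)

# Comparison within a relative tolerance (1e-9 is an assumed margin).
within_tolerance = math.isclose(computed, expected, rel_tol=1e-9)

print(exact_match, within_tolerance)
```

A reproducibility check written like the second comparison is what survives a change of hardware or library version.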
So fix the compiler
"Fix the compiler" presumably meaning "change the compiler not to support non-SSE x86 processors" or, at least, "change the compiler not to *default* to supporting non-SSE processors". Sounds good to me, these days, but I'm not responsible for making those decisions about GCC, so there's not much I can do about it.
or stop compiling for 32 bit. RAM is cheap, especially when you're talking about the cost per GiB of hundreds of gibibytes of it.
At this point, I don't know how many *desktop/laptop* 32-bit x86 boxes there are out there, but, in any case, somebody got concerned that the tests didn't pass on a 32-bit machine, so.... Personally, I don't care, as 99 44/100% of the arithmetic done by packet sniffers such as tcpdump is integer arithmetic, where it doesn't matter, but....
"Fix the compiler" presumably meaning "change the compiler not to support non-SSE x86 processors" or, at least, "change the compiler not to *default* to supporting non-SSE processors".
I think this really is the best option, all things considered.
Write failed: Broken pipe
"Fix the compiler" presumably meaning "change the compiler not to support non-SSE x86 processors" or, at least, "change the compiler not to *default* to supporting non-SSE processors".
I think this really is the best option, all things considered.
Or, if the CPU on which you're running supports SSE (i.e., is a Pentium III or newer), default to SSE, so if you have an old machine it still defaults to something that'll run. If you're targeting some old no-SSE processor and building on some shiny "new" system, you have to use some -m option or whatever, but, well, get over it....
PI is a ratio. Just like 1/3, neither can be precisely represented using decimals.
pi can't be printed in full in any base, but 1/3 can. 0.1 in base 3.
systemd is Roko's Basilisk.
Grrr.. yep; one bit would only cover cases like 0.789789789...
With only one bit, how would you know it's not 0.78907890... or 0.7890078900...?
systemd is Roko's Basilisk.
1. You can tweak your algorithms so that they minimize the error instead of accumulating it -- which you should be doing regardless of your need for reproducibility --, or
2. You can use alternative methods like software implementations of floating point, "decimal" (look at the System.Decimal type in .NET for an example), or even arbitrary-precision libraries.
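Both options can be sketched with nothing but the Python standard library. Here `math.fsum` stands in for "an algorithm that minimizes accumulated error", and `decimal.Decimal` and `fractions.Fraction` stand in for the alternative number types; that mapping is my choice for the example, not something the list above prescribes.

```python
import math
from decimal import Decimal
from fractions import Fraction

values = [0.1] * 10

# Plain left-to-right accumulation: the rounding error builds up.
naive_total = 0.0
for v in values:
    naive_total += v

# Option 1: an error-minimizing algorithm (exactly rounded summation).
careful_total = math.fsum(values)

# Option 2: number types that represent 0.1 exactly.
decimal_total = sum([Decimal("0.1")] * 10)
fraction_total = sum([Fraction(1, 10)] * 10)

print(naive_total, careful_total, decimal_total, fraction_total)
```

The explicit loop is deliberate: newer Python versions make the built-in `sum()` more accurate for floats, which would hide the effect being shown.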
Not to mention, nuclear simulations should be staying on LANL's hardware, not being foisted into the cloud.
Real men use grids, pansy hipsters use clouds.
But one problem that IEEE754 can't address is when and where rounding errors show up in calculations. If in my code I write A * B / C, one cannot always guarantee whether that's executed as (A * B) / C or as A * (B / C). If the exponents of the different numbers are substantially different, then you can indeed end up with different results. Different platforms may compile and execute the problem differently, and that I think is the problem that the submitter is getting at.
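To see how much the grouping can matter, here is a short Python sketch with values chosen to exaggerate the effect. Python itself always evaluates A * B / C left to right; the two groupings are written out by hand to mimic what a reordering compiler (for example, one run with relaxed floating point options) might do.

```python
import math

a, b, c = 1e308, 10.0, 10.0

# (A * B) / C: the intermediate product overflows to infinity.
left_first = (a * b) / c

# A * (B / C): the quotient is 1.0, so nothing overflows.
right_first = a * (b / c)

# Addition is not associative either.
sum_left = (0.1 + 0.2) + 0.3
sum_right = 0.1 + (0.2 + 0.3)

print(left_first, right_first, sum_left == sum_right)
```

Same inputs, same IEEE 754 operations, different answers; only the order changed.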
How Reproducible Is Arithmetic In the Cloud?
As reproducible as you configure it to be. Fundamentally no different from running Mathematica (or a similar package) on a Beowulf cluster or in any assortment of machinery.
"I'm research the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time
And configuration, and choice of numeric data types, and choice of operators (i.e., division vs. multiplication).
In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virtual instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this translates to the world of cloud and virtualization across multiple hardware types.
I doubt anyone is doing that kind of research. And the only way to ensure replicability of results is by strictly using fixed-precision numeric data types (instead of relying on floating point types).
That would not be a cloud-problem at all. Unless Mathematica is unable to offer a consistent execution model. In that case, the issue here would be using an unsuitable tool (Mathematica), not the cloud.
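For what "fixed-precision numeric data types" might look like in practice, here is a minimal fixed-point sketch in Python using scaled integers. The scale of 10**6 and the helper names are assumptions made up for the example; integer arithmetic gives the same bits on any platform, at the cost of a fixed range and resolution.

```python
from decimal import Decimal

SCALE = 10**6  # assumed resolution: six decimal places


def to_fixed(text):
    """Parse a decimal string into a scaled integer (hypothetical helper)."""
    return int(Decimal(text) * SCALE)


def fixed_add(x, y):
    """Addition of scaled integers is exact."""
    return x + y


def fixed_mul(x, y):
    """Multiply, then rescale; digits beyond the resolution are floored."""
    return (x * y) // SCALE


total = fixed_add(to_fixed("0.1"), to_fixed("0.2"))
product = fixed_mul(to_fixed("0.1"), to_fixed("0.2"))
print(total, product)
```

Note that the rounding in `fixed_mul` is now an explicit decision in your own code instead of something the hardware does for you.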
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Well, 1/3rd is perfectly representable as long as you're storing it in base 3... ;)
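The base-3 point is easy to check with exact rational arithmetic. This Python sketch (the `expand` helper is made up for the example) generates the fractional digits of a value in a given base and reports whether the expansion terminates.

```python
from fractions import Fraction


def expand(value, base, max_digits=20):
    """Return (digits, terminated) for the fractional part of a
    non-negative Fraction written in the given base."""
    digits = []
    remainder = value - int(value)
    while remainder != 0 and len(digits) < max_digits:
        remainder *= base
        digit = int(remainder)
        digits.append(digit)
        remainder -= digit
    return digits, remainder == 0


third_base3 = expand(Fraction(1, 3), 3)                    # terminates: 0.1
third_base10 = expand(Fraction(1, 3), 10, max_digits=5)    # 0.33333... repeats
tenth_base2 = expand(Fraction(1, 10), 2, max_digits=8)     # 0.1 repeats in binary
print(third_base3, third_base10, tenth_base2)
```

The last case is why decimal 0.1 already carries a rounding error in binary floating point.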
Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
PI is a ratio. Just like 1/3, neither can be precisely represented using decimals.
pi can't be printed in full in any base, but 1/3 can. 0.1 in base 3.
Guess what base I'm using? (Hint: pi = 1)
"Large" is a relative term. Original estimate for healthcare.gov was 5 billion. They went with the cheapest bidder for 1 billion.
If your results depend on hardware, software and so on, what you are doing is sampling from the solution space. You can then model that distribution and perform significance testing against it. What is the probability of your result being correct, that is, of your result belonging to the true distribution?
Statistics over mathematical proofs. That's what you want to do.
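A toy version of that idea in Python: treat each evaluation order of the same sum as one "platform", collect the results, and look at the distribution. The four input values are contrived to make the spread visible, and permuting the summation order is only my stand-in for real hardware and software variation.

```python
import itertools
import math
import statistics
from collections import Counter

values = [1e16, 1.0, -1e16, 1.0]


def left_to_right_sum(seq):
    """Accumulate in the given order, one rounding per addition."""
    total = 0.0
    for x in seq:
        total += x
    return total


# One sample per evaluation order (24 permutations of 4 values).
samples = [left_to_right_sum(p) for p in itertools.permutations(values)]

distribution = Counter(samples)          # how often each answer occurs
center = statistics.mean(samples)
spread = statistics.pstdev(samples)
reference = math.fsum(values)            # exactly rounded sum, for comparison

print(distribution, center, spread, reference)
```

With real data you would compare a new run against this kind of distribution instead of demanding the identical number.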