Slashdot Mirror


CPUs/Compilers for Numerical Simulations?

X43B asks: "I'm building a 'luggable' computer for numerical simulation work (very niche, I know). My goal is to have the best single precision floating point performance for under $1000. I have decided on a Shuttle XPC layout. I can build an AMD 3500+ for ~$80 less than a Prescott 3.4GHz. I know the AMD is supposed to be a better 'general purpose' CPU; however, I found this comparison which says the Intels are better for floating point. Additionally, even though the AMD is somewhat cheaper, I have found the free Intel Linux FORTRAN compiler quicker than gfortran. So even if the AMD had similar performance for cross compiling, the Intel would be ~10% faster with the free compiler. Does anyone have any recommendations on AMD vs Intel for single precision floating point operations? If you recommend the AMD, what (cheap or free) compiler can be used that is comparable to the Intel?"

56 comments

  1. My Beliefs by MBCook · · Score: 4, Informative
    OK, here is my impression from years of reading hardware sites.

    The P4 has amazing floating point performance, but you have to use packed SSE2/3 to get it. For general (non-packed SSE or x86) floating point performance, the Athlon lines are strong.

    If you can get a low end Athlon 64 (like one of the single channel versions) that might be great for you. They are the "budget" versions but have great FPUs, more registers if your software can use it, and are true 64-bit.

    As for the Athlon (non-64), I wouldn't personally. I would think you could get a low end Athlon 64 (like I said above) for a reasonable price that would smoke it.

    Last of all, the Intel compiler is designed for Intel chips (duh), but the code can be run by Athlons and Opterons, and even on the AMD chips its code often performs better than GCC code. That said, if you get a P4, using that compiler is probably a must because it is sooooo good at setting up floating point stuff and gets much better performance (but then again, what do you expect?). So give it a try no matter what you buy; it will probably help your performance.

    So those are my theories/impressions. You can get SFF PC that will hold just about any processor. Too bad money is an object because that dual-cpu Iwill Opteron SFF that will come out later this year would kill anything else in a SFF (assuming you can take advantage of the 2nd CPU with whatever you're doing, which I assume you can).

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    1. Re:My Beliefs by sash · · Score: 1

      My experience is that, on real-world FPU-intensive code, an AthlonXP or a PowerPC G3 are about twice as fast as a Pentium 4, per MHz.
      Things are different if you can vectorize your code and use Single Instruction Multiple Data - there the Pentium4 is supposed to be faster. But this was not the case for my code.

      The other point is that single-precision FP is not faster than double-precision FP, because the ix86 and PowerPC FPUs always use full precision internally - so it only makes a difference if you do few operations on a lot of data.
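
      To make the "lot of data" point concrete, here is a minimal Python sketch (standard library only) of what single precision buys and costs: half the storage per value, at the price of a rounding step. The helper names `to_single` and `storage_bytes` are made up for this illustration, and the sketch says nothing about how fast any particular FPU is.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def storage_bytes(n_values: int, precision: str) -> int:
    """Bytes needed to store n_values in 'single' or 'double' precision."""
    fmt = "<f" if precision == "single" else "<d"
    return n_values * struct.calcsize(fmt)


third_double = 1.0 / 3.0
third_single = to_single(third_double)

# The single-precision copy differs from the double by a small rounding error.
rounding_error = abs(third_double - third_single)

# One million values: singles need half the memory (and memory traffic) of doubles.
single_footprint = storage_bytes(1_000_000, "single")
double_footprint = storage_bytes(1_000_000, "double")
```

      If the arithmetic itself costs the same in both precisions, the halved footprint is where any gain would come from, which fits the "few operations on a lot of data" case.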

  2. $80? by jeffy124 · · Score: 4, Insightful

    Try thinking this through in terms of return-on-investment. If $80 is all you're going to save on a product that's around $1000, it may not be worth it, especially given what else you know is going into each product. First, it's an 8% savings, hardly a significant bargain. Second, from what it sounds like you'll be doing with it, $80 extra for an Intel chip is a very small sum for the increased performance (10% from your description above) you'll be getting in return. Third, assume you do find similar performance for an AMD, but it might require payment for a compiler, negating the $80 savings. Finally, ask yourself if searching for a free (as in beer) AMD compiler is worth $80 of your time if you already have everything in place for an Intel.

    I think your best bet is the little extra you spend for the Prescott model.
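
    The arithmetic behind that comparison, as a tiny Python sketch. The dollar figures and the 10% speedup are the ones quoted in the question; the variable and function names are just for illustration.

```python
def percent_of(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


PRICE_GAP_DOLLARS = 80.0         # extra cost of the Prescott build, per the question
BUDGET_DOLLARS = 1000.0          # approximate total budget, per the question
CLAIMED_SPEEDUP_PERCENT = 10.0   # compiler advantage claimed in the question

# How big is the price gap relative to the whole build?
extra_cost_percent = percent_of(PRICE_GAP_DOLLARS, BUDGET_DOLLARS)

# Dollars paid per percentage point of claimed speedup.
dollars_per_percent_faster = PRICE_GAP_DOLLARS / CLAIMED_SPEEDUP_PERCENT
```

    Under those numbers, about 8% more money buys roughly 10% more speed, which only holds if the claimed speedup carries over to your own code.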

    --
    The One Rule Of Chess You'll Ever Need: Don't play someone who carries a kit in their bookbag.
    1. Re:$80? by HoneyBunchesOfGoats · · Score: 2, Interesting

      ...[your] best bet is the little extra you spend for the Prescott model.

      And the little extra you spend on the massive heatsink you buy to keep it cool in an enclosed space.

      A few SFF systems come with their own heatsink, most notably Shuttle XPCs with their proprietary ICE heatsink. I'd go for one of those if using a Prescott.

  3. save the money for tools by ghostlibrary · · Score: 5, Insightful

    Save your CPU money for the compiler (e.g. PGF90, or what have you), or for development tools, and for more RAM. A good compiler will give you a better handle on numerical accuracy than a 'better' CPU stuck with an off compiler. More RAM will keep you from being stingy in single/double precision allocations.

    Also, factor in your work time. A few percentage difference between CPUs won't count nearly as much as the 2 hours you saved because you had a good debugger, or the 4 hours saved because your editor makes it easier to write and jump around your code.

    Your accuracy will be pretty much the same, you just have to understand how computers represent floats and plan accordingly. Use accurate representations even if they're slower to get the numerical accuracy you need, then optimize the slow parts.
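
    A small Python sketch of why the representation matters: it adds 0.1 ten thousand times, once rounding every intermediate result to single precision and once in double. The `to_single` and `accumulate` helpers are hypothetical names for this illustration; rounding through `struct` only emulates single-precision storage and is not a statement about any specific CPU.

```python
import struct


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add step to a running total count times, optionally in single precision."""
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            # Emulate storing the running total in a 32-bit float.
            total = to_single(total)
    return total


EXPECTED = 1000.0  # 0.1 * 10_000 in exact arithmetic

error_single = abs(accumulate(0.1, 10_000, single=True) - EXPECTED)
error_double = abs(accumulate(0.1, 10_000, single=False) - EXPECTED)
```

    Neither total is exact, because 0.1 has no finite binary representation, but the single-precision error is many orders of magnitude larger. That is the kind of effect to plan for before worrying about which chip is faster.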

    Only optimize the stuff that runs slow. That means profile, don't just guess. You'll often be surprised by where the bottlenecks are.
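
    As a rough illustration of "profile, don't guess", here is a Python sketch using the standard `cProfile` and `pstats` modules. The three functions are stand-ins invented for this example, not anyone's real simulation code.

```python
import cProfile
import io
import pstats


def cheap_setup(n: int) -> list:
    """Build the input data; expected to be a small share of the run time."""
    return [float(i) for i in range(n)]


def expensive_kernel(values: list) -> float:
    """Deliberately heavy inner loop standing in for the numerical core."""
    total = 0.0
    for x in values:
        for _ in range(50):
            total += x * x
    return total


def simulation() -> float:
    return expensive_kernel(cheap_setup(2000))


profiler = cProfile.Profile()
profiler.enable()
result = simulation()
profiler.disable()

# Collect the five most expensive entries (by cumulative time) into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
```

    Printing `report` shows where the time actually went; in a real code base the top entries are often not the ones you would have guessed.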

    Higher accuracy usually means more memory (going with doubles rather than floats) or work (converting to integers within your desired floating range to control floating point accuracy). CPU won't be a biggie, but having lots of RAM will help.

    If you have a choice between spending 3 months writing and optimizing code, or spending 1 month writing code that isn't optimized, think of what 2 months of runtime will do. If it takes you a while to write the code, just buy your super-accurate machine _after_ coding, when it's time to do your real runs (since chip speeds already increase).

    In fact, if it takes you 3 months to optimize, you'd be better off keeping the slow code, doing another project, and 3 months later just buying a faster PC to run the slow old code :)

    All this off the top of my head, hope it helps.

    --
    A.
    1. Re:save the money for tools by innosent · · Score: 1

      In fact, if it takes you 3 months to optimize, you'd be better off keeping the slow code, doing another project, and 3 months later just buying a faster PC to run the slow old code :)

      Wow, you must have been on the Windows development team, working on KDE and Gnome on the side. What ever happened to writing efficient code? I miss the days of Linux 1.2.13, when a kernel could still fit on a floppy, and when window managers didn't have 500 features constantly running that you never use.

      --
      --That's the point of being root, you can do anything you want, even if it's stupid.
    2. Re:save the money for tools by samael · · Score: 3, Insightful

      Writing efficient code is great - if what you want to end up with at the end is some really efficient code.

      If what you want is....results, then efficient code is one of many ways of getting it, and not necessarily the most efficient one.

    3. Re:save the money for tools by ghostlibrary · · Score: 1

      I wrote:
      >"if it takes you 3 months to optimize, you'd be better off keeping the slow code, doing another project, and 3 months later just buying a faster PC to run the slow old code :)"

      innosent asked:
      >What ever happened to writing efficient code?

      There's slow, and there's efficient. Sometimes brute force and ignorance _is_ better. Here's an example.

      We had a routine using an ephemeris. It loaded in the data at 1 minute intervals, since we needed 1 minute resolution accuracy. 3 coordinates for each minute for 6 months = 3/4 million data points. That's a lot of memory overhead.
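
      A quick check of that figure in Python. The 182-day length for "6 months" and the 8 bytes per value are assumptions for illustration; the post does not say how the data were stored.

```python
MINUTES_PER_DAY = 24 * 60


def ephemeris_value_count(days: int, coordinates: int = 3) -> int:
    """Number of stored values at one sample per minute."""
    return days * MINUTES_PER_DAY * coordinates


ASSUMED_DAYS = 182     # assumption: roughly six months
BYTES_PER_VALUE = 8    # assumption: one double per coordinate

value_count = ephemeris_value_count(ASSUMED_DAYS)
size_bytes = value_count * BYTES_PER_VALUE
```

      That comes to a bit under 800,000 values, consistent with the "3/4 million" quoted.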

      One programmer wanted to rewrite it so we only loaded some fraction of the points, and interpolated. Problem is, a linear interpolation wasn't fast enough. Also, because of how the other data structures generated from the ephemeris were set up, they expected a continuous set (basically we were taking advantage of vector calculations, so even if we interpolated, we'd have to make the full vector ephemeris before doing other calculations, and change basically the entire underlying computation code).
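
      For readers unfamiliar with the rejected approach, this is roughly what interpolating between sparse samples looks like, as a minimal Python sketch. The function name and the sample data are hypothetical; the project's real code is not shown in the thread.

```python
from bisect import bisect_right


def interpolate(times: list, values: list, t: float) -> float:
    """Linearly interpolate values (sampled at sorted times) at time t."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the sampled range")
    i = bisect_right(times, t) - 1      # index of the sample at or before t
    if i == len(times) - 1:             # t is exactly the last sample
        return values[-1]
    t0, t1 = times[i], times[i + 1]
    weight = (t - t0) / (t1 - t0)
    return values[i] + weight * (values[i + 1] - values[i])


# Samples every 10 minutes instead of every minute (made-up numbers).
sample_times = [0.0, 10.0, 20.0]
sample_values = [0.0, 100.0, 50.0]

midpoint = interpolate(sample_times, sample_values, 5.0)
later = interpolate(sample_times, sample_values, 15.0)
```

      Storing one sample in ten cuts memory by about a factor of ten, but every lookup now costs a search plus arithmetic, and anything downstream that expects the full per-minute vector still has to be rebuilt.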

      When written, the ephemeris was a memory hog and slowed things down. When it came time to deploy it, though, even though it was dog slow on our original test machines, it was more cost effective to just buy a new $2000 laptop for the project that ran it really speedy. Faster PCs and cheap RAM had won the day.

      If we'd gone for the interpolation scheme, it would have cost about 3 months for 1.5 programmers, at a cost to the taxpayer of around $48,000.

      So, should we have gone for the more 'efficient' method, which was more complex and thus incurred more development time, or gone with the brute-force simple approach that saves the taxpayer $48k (and gives the customer a new laptop computer!)?

      It was an easy call. In real world HPC, you need to always balance resources.

      --
      A.
    4. Re:save the money for tools by Anonymous Coward · · Score: 0
      3/4 million data points. That's a lot of memory overhead

      Please.

      Call us back when you've written a real computer program. 750,000 points is NOTHING. That's small enough you can keep the entire array of points in RAM and hardly notice it.

    5. Re:save the money for tools by TheLink · · Score: 1

      Doh, I think that's what they did ($2000 laptop remember?), but that's what some people call inefficient.

    6. Re:save the money for tools by ghostlibrary · · Score: 1

      > 750,000 points is NOTHING.

      Bwah ha ha. Tossing 12MB as a basic data structure is huge-- in a real program, that uses that basic data structure to create others. Plus that's not including visualizing it. Think 2000 targets, each with a dozen mathematical alterations to that structure. Then plot them. 12MB*2000*12*plotting overhead=a lot more than you want to store in ram.
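
      Spelling out that multiplication in Python, using only the figures given in the post. The plotting overhead is left out because no number is given for it, and the names are illustrative.

```python
BASE_STRUCTURE_MB = 12         # size of the basic data structure, per the post
TARGETS = 2000                 # number of targets, per the post
ALTERATIONS_PER_TARGET = 12    # "a dozen mathematical alterations", per the post


def derived_footprint_mb(base_mb: int, targets: int, alterations: int) -> int:
    """Memory needed if every target keeps every altered copy of the base data."""
    return base_mb * targets * alterations


total_mb = derived_footprint_mb(BASE_STRUCTURE_MB, TARGETS, ALTERATIONS_PER_TARGET)
total_gb = total_mb / 1000     # decimal gigabytes
```

      Even before plotting, that is a few hundred gigabytes if everything is kept at once, which is why the size of the base structure matters.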

      750,000 points when we coded took 12 seconds to gronk through subsequent calculations. On a realtime system, do you want a latency of 12 seconds every time you click on a new target? No. With a faster system and more ram, that latency dropped to unnoticeable, but even then that's because we did clever coding past that.

      Since you're all contempty, all I can say is, write a realtime computer program and you'll find that, the smaller you make your most basic data structures, the less ballooning you get. It's people like you who figure they can toss in the kitchen sink at the atomic level that run into trouble when they try to build realtime systems with it.

      --
      A.
    7. Re:save the money for tools by potat0man · · Score: 1

      Save the taxpayers money by using $2,000 laptops? What about desktops at less than half that price?

      ;-)

      I know I know... you don't own monitors or work underground or near a particle accelerator that destroys CRTs....

  4. Are you good with your hands? by Anna+Merikin · · Score: 1

    If so, build your own luggable case from a convenient toolbox, cabinet or whatever and use a multiprocessor outfit of either brand.

    This will be the fastest solution for x86 and should outperform a single 3500 Intel or Opteron by a considerable amount on 32-bit apps -- about a fifty to eighty percent increase can be expected from a 2-processor system. Tyan makes some interesting, relatively inexpensive SMP mobos; check them out: http://www.tyan.com
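
    One way to read that fifty-to-eighty percent figure is through Amdahl's law, which the post itself does not mention; treat this Python sketch as one possible interpretation. It works out what fraction of the run time would have to parallelize perfectly to get a given speedup on two processors.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def parallel_fraction_for(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: parallel fraction needed to reach a target speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


# A 50% to 80% increase on two CPUs corresponds to speedups of 1.5x and 1.8x.
fraction_for_50_percent = parallel_fraction_for(1.5, 2)
fraction_for_80_percent = parallel_fraction_for(1.8, 2)
```

    Under that model, roughly two thirds to nine tenths of the work has to run in parallel, so code that is mostly serial will see much less than the quoted gain.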

    If you must use standard off-the-shelf cases and such, then you must deduct the extra money the ITX case and mobo costs from a 2-CPU array outlined above.

    Understand that tomorrow, it will be obsolete....

  5. Re:there is a difference by gl4ss · · Score: 1

    you talking just out of your ass or was your friend high?

    because what you're saying "does not compute".
    especially the last bit.

    and come on, "I've heard of people comparing the results and saying the output produced by Intel appears more attractive, but I haven't seen it." yeah right!

    --
    world was created 5 seconds before this post as it is.
  6. g4 by gumbi+west · · Score: 1

    If you are looking for good cheap single precision performance, Velocity Engine is for you (IBM/Motorola G4s and G5s). The single precision performance is about 4 flops per clock cycle.

  7. Re:My Beliefs (Continued) by MBCook · · Score: 4, Informative
    I have something to add that another post reminded me of. If memory latency is important to you (I don't know much about numerical simulations so I don't know), then you want an x86-64 chip by AMD. Because of the on-die memory controller, the memory latency is substantially lower than on P4s (especially the high end P4s with the huge clock speeds).

    The last thing I have to say is that as another poster pointed out, are you stuck in the Wintel world? Because the G4 and G5s (the latter especially) are supposed to be VERY good at this kind of thing. So you could use an XServe G5 (pretty small) or just a normal G5 (not as small). They aren't that cheap (probably couldn't get one in your budget, but maybe used) but they should perform great. They are also true 64-bit. Also, IBM sells G5 computers, so you're not stuck buying an Apple (you might be able to get a cheaper one that way too). Not sure about the sizes of those though.

    Just more stuff to think about.

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  8. Re:there is a difference by green+pizza · · Score: 1

    Do you improve the audio quality of your CDs with a green marker as well?

  9. Intel has heard of /. by highwindarea · · Score: 1

    On the Intel site, if you try to download the free compiler it asks where you heard of the site. /. is one of the options.

    --
    I think this internet thing sounds like a good idea
    1. Re:Intel has heard of /. by Anonymous Coward · · Score: 0

      And in case you check it, you are denied the download. (Just kidding.)

  10. Re:there is a difference by TheSHAD0W · · Score: 0

    I haven't seen the output, I don't know what the difference looks like, so there's no way I can make a judgement like that, but I do know the output is different.

  11. IWILL Dual Opteron SFF by DeadBugs · · Score: 3, Informative

    You may want to look into the IWILL Dual Opteron SFF PC. It's in a small form factor design like a Shuttle XPC, but with support for Dual AMD Opterons.

    Even if you don't have the money for both CPUs right now...it's a good start and you could add the 2nd CPU later. This would be the most powerful small form factor number cruncher.

    --
    http://www.kubuntu.org/
  12. Re:there is a difference by Deliveranc3 · · Score: 1

    Shadow they/we think you are wrong because floating points are the same as decimals in a traditional sense.

    The floating point units in AMD and INTEL processors don't have to deal with infinite decimal points (like mathematicians often do) but 32 bits of depth or 64 (if you use double registers or an AMD system) or so on. It's entirely inside the programming.

    Since both computers use the x86 instruction set, the calculations they do on the data are carried out in identical ways; if there was a difference, one would be ruled wrong and a huge scandal and recall would immediately follow (it happened twice before, both times with Intel chips).

    If there is some other explanation which I didn't address please send it in.

  13. opteron by wikinerd · · Score: 1

    I would get an Opteron148/150 or 244/246 depending on whether the work is threaded or not.

    1. Re:opteron by wikinerd · · Score: 1

      But why Opteron, you might ask.
      I would prefer an Opteron over an Athlon64 because of the extra reliability that ECC offers. As far as I know, the A64 has the ECC circuit built in, but many motherboards do not use it. With an Opteron you know that you get true ECC and also registered memory (which is SLOWER but more reliable).
      IMHO reliability is more important than speed in numerical apps. You could lose months of work just because of a single bit error. Opteron is 100% reliable and has been used successfully even in supercomputers.
      For an overview of Opteron models and tech, check my article here

  14. Re:there is a difference by CaptainCheese · · Score: 4, Interesting

    AMD and Intel both subscribe to the IEEE 754 standard for FPU units, which defines the functions of single and double precision FPU operations and various other things, like how to handle the inevitable rounding errors.

    No FPU meeting this standard will produce different results than any other FPU. They're just faster or slower at doing it.

    You'll only start getting differences when you hack non-standard speed optimisations into your code. It's unfair to blame Intel and AMD for people writing incompetently coded software - they just provide the stick, it's the coder who's beating you with it.

    --
    -- .sigs are a waste of data...turn them off...
  15. Intel also has great libraries & VTune by Salis · · Score: 3, Interesting

    Intel has a set of optimized mathematical libraries for all sorts of applications (linear algebra, image processing, random number generation, FFT's, etc). Not only are they optimized for Intel systems, but they save you the time of coding it yourself.

    Intel also provides the VTune Performance Analyzer, which allows you to trace the path through your programs and determine where the bottlenecks are.

    I've used the Intel Linux Fortran compiler and I am very happy with it. Code that runs fine on my Sun workstation (950 Mhz, 6 gig RAM) at school works 4-5x faster on my home PC (2.8 Ghz, 1 gig RAM). It's got all the fancy optimization options, but a simple -O3 -ipo will get you 90% there.

    My two bits.

    --
    Favorite /. tagline: "On the eighth day, God created FORTRAN." And it was good.
  16. Re:there is a difference by cperciva · · Score: 2, Insightful

    No FPU meeting this standard [IEEE 754] will produce different results than any other FPU.

    Correct as far as arithmetic operations go, but not for other functions. Trigonometric functions are quite a different story, and the results will vary between processors -- older Intel (co-)processors were accurate to 4.5 ulp, whereas recent ones are accurate to 1.5 or 1.0 ulp, for example.

    For that matter, as far as I'm aware IEEE 754 doesn't make *any* requirements of the trigonometric functions; they might behave as random number generators for all the standard says.
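
    For anyone unsure what "accurate to N ulp" means: an ulp is the spacing between adjacent floating-point numbers near a value. Here is a minimal Python sketch (needs Python 3.9+ for `math.ulp`) that measures an error in ulps. The truncated Taylor series is only a stand-in for a sloppy sine implementation, not how any FPU computes it, and the helper names are made up.

```python
import math


def error_in_ulps(computed: float, reference: float) -> float:
    """Distance between computed and reference, in units of ulp(reference)."""
    return abs(computed - reference) / math.ulp(reference)


def sin_taylor(x: float, terms: int) -> float:
    """Sine approximated by the first `terms` terms of its Taylor series."""
    term = x
    total = x
    for k in range(1, terms):
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


x = 0.5
reference = math.sin(x)

coarse_error = error_in_ulps(sin_taylor(x, 3), reference)   # few terms
fine_error = error_in_ulps(sin_taylor(x, 10), reference)    # many terms
```

    Two implementations can both be "correct" in the everyday sense and still differ in the last few bits, which is the kind of variation described above.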

  17. Re:there is a difference by TheSHAD0W · · Score: 1

    The FPUs use different strategies in performing their calculations, and as a result they don't produce identical results. I'm not sure which uses what method. But the floating point results aren't part of the program flow, per se, so minor differences aren't considered flaws. The difference that resulted in the recall of the Pentium (division is futile! Prepare to be approximated) was a lot larger than was acceptable.

  18. ICPC and AMD by blackcoot · · Score: 2, Interesting

    i have not tried this with intel's fortran compiler, *but*, from what i've heard, the intel c/c++ compiler produces code that performs substantially better than gcc on an amd processor. does the code running on an amd perform comparably to code running on an "equivalent" (take that term to mean what you will) intel box? i have no idea, but *if* i remember correctly, i was seeing a good 15-20% increase over gcc on the stuff i was doing targeting a p4. since amd makes chips which are, in theory, binary compatible with p4s, it may be worth a shot. another thing to recommend the intel compilers: their native support of openmp. if you do go with a dual box, you can give the intel compiler hints about how to parallelize your code to take full advantage of all n processors (don't know if you'll have ht turned on or not). hope this helps at least a little bit.

  19. Re:My Beliefs (Continued) by Hes+Nikke · · Score: 1

    the iMac G5 is *almost* in this guy's budget. ;)

    --
    Don't call me back. Give me a call back. Bye. So yeah. But bye our, well, but alright we are on a shirt this chill.
  20. don't worry about it by jeif1k · · Score: 3, Insightful

    I wouldn't start relying on special compilers; once you go down that road, you start putting processor-dependent features into your code, you start battling with compatibility issues, your code becomes less usable by others, and you have less choice in software from others that you can use; it all becomes a huge waste of time.

    Instead, check for yourself which system (not processor, but system) gives you the most bang for the buck using the most standard compiler you can find. If you use gcc, I believe systems based on AMD's 64bit chips still win.

    And, realistically, 10-20% differences are not worth investing a lot of time or energy in anyway; that corresponds to a few months of progress in processor and systems development.

  21. CPU? Use the GPU! by Bazman · · Score: 4, Interesting

    Maybe you just want a better graphics card? Nowadays you can run numerical calculations on the graphics card's processor - and no, you don't get random noise all over your screen, it's not simple memory-mapped graphics! Plus it gives you the excuse to buy a machine that can play Doom 3.

    More info here: http://www.gpgpu.org/

    Whatever you do, make sure you have a properly tuned ATLAS library:

    http://math-atlas.sourceforge.net/

    I don't know if anyone has got ATLAS or BLAS to work on GPUs yet.

    Baz

  22. Re:there is a difference by Piquan · · Score: 3, Informative

    AMD and Intel both subscribe to the IEEE 754 standard for FPU units,

    This is true for "normal" floating-point operations and SSE, but 3DNow! is not IEEE-compliant. There are also some ways to introduce non-compliance in SSE, such as the LDMXCSR, RCP, and RSQRT instructions. (The former can change over/underflow behavior, and the other two are approximation functions.)

  23. ATLAS by Piquan · · Score: 1

    Just in case anybody interested hasn't heard of it, the ATLAS library is a C / Fortran 77 library for linear algebra (which is a significant part of scientific programming). It tunes itself at compile-time, to your particular processor and number of CPUs (and whatever else might be affecting your FP performance) by doing tests.

    The author also has some quick n' dirty notes for floating-point issues.

  24. Re:there is a difference by Anonymous Coward · · Score: 0

    They are both IEEE standards compliant. End of story. (Unless someone was dumb enough to use 3DNow, which is not compliant.)

  25. Re:there is a difference by CaptainCheese · · Score: 1

    fair enough.

    That may change though - 754 is under revision right now...

    --
    -- .sigs are a waste of data...turn them off...
  26. References by TheSHAD0W · · Score: 1
    1. Re:References by Gothmolly · · Score: 1

      You're quoting two blogs as PRIMARY sources to back up your claim, which itself is being made in another blog (Slashdot)? Son, you're not going to convince anyone that way. I could quote the GNAA and gain as much credibility as your two sources give you. Point people at real documentation (think: ACM, Intel or AMD whitepapers, IEEE publication, etc). Otherwise, crawl back under your bridge.

      --
      I want to delete my account but Slashdot doesn't allow it.
    2. Re:References by TheSHAD0W · · Score: 1

      Been lookin', ain't been findin'. My original post certainly wasn't meant as a troll, and if you feel that way... [shrug]

  27. Re:there is a difference by gl4ss · · Score: 1

    there is a standard for what floating points should 'look' like and behave.

    you could do the calculus by hand and get to the same endpoint.

    different strategies don't necessarily mean different results either.

    please, don't spread misinformation and cyber legends.

    --
    world was created 5 seconds before this post as it is.
  28. Re:My Beliefs (Continued) by MBCook · · Score: 1

    Ooh. Excellent point. They are G5s (fast, should be good for what he's doing), contain the monitor already (which a SFF PC wouldn't), and are relatively small (since they are only 2" thick). That would probably be about ideal for him. Great point!

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  29. Exactly what math? by TheLink · · Score: 1

    Exactly what math do you want to do?

    Go see:

    SPEC FPU results

    Look at the details, you'll see that different processors have different strengths depending on the tasks. To see what tasks they are you can look at:

    SPEC FPU

    Another thing: the Intel compiler does work well for AMD too. But you may have to force it to recognize it as AMD to turn on even more optimizations. Apparently you get a boost as a non-Intel CPU, but if you disable the "intel-only" detection you can get even more of a boost.

  30. Re:My Beliefs (Continued) by gumbi+west · · Score: 1
    I have a big numerical integrator that runs for minutes to weeks (depending on the problem). All calculations are done on doubles and the memory bandwidth didn't appear to matter much.

    I don't know what caused it, but our units were in P3 cycles. One P3 cycle is worth one AMD (x86-32) cycle, and a P4 cycle is worth 1/2 a P3 cycle. A G5 cycle is worth about one P3 cycle as well (they just ported the program).

  31. Re:My Beliefs (Continued) by jaoswald · · Score: 1

    While I think the iMac G5 is an extremely interesting computer, your use of the term "fast" is somewhat misleading, in that "G5" is an extremely vague term.

    The iMac G5s provide only a 600 MHz FSB, not to mention topping out at a single 1.8 GHz G5.

    The PowerMac G5, on the other hand, *starts* at DUAL 1.8 GHz G5s, has up to 1.25 GHz front side bus per processor.

    An utterly oversimplified model based on no real data would estimate that reducing the G5 CPU to flat-panel dimensions has cost a factor of 2--4 in *peak* performance in what you mean by "G5."

  32. Use the GPU and a suitable language by exp(pi*sqrt(163)) · · Score: 1

    In terms of sheer numerical processing ability modern GPUs leave CPUs standing in the dust. Get a top of the line nVidia graphics card. Preferably PCI Express because the big bottleneck is getting data back out of the card. The hardest part is that the code typically needs to be 'disguised' as a rendering problem but if you use a programming language like Brook you can write in a C-like way and get amazing performance without having to touch a graphics API. One catch is precision - but many iterative algorithms are fairly robust in the face of imprecision.

    --
    Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
    1. Re:Use the GPU and a suitable language by rgbe · · Score: 2, Informative

      Kayvon Fatahalian et al. have a good comparison for matrix-matrix multiplication between CPU's and GPU's.
      One major disadvantage of the GPU at the moment is that, as far as I know, no standard software (such as LINPACK, FFTW, etc) supports it.

    2. Re:Use the GPU and a suitable language by exp(pi*sqrt(163)) · · Score: 1

      That paper is probably out of date now. The new PCI Express boards are supposed to directly address the bandwidth issues.

      --
      Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
  33. Some interesting results by rgbe · · Score: 2, Insightful

    I spent a summer benchmarking a couple of new computers for the University of Otago Physics department. They were looking into buying a cluster for their Bose-Einstein condensate experiments; it was my job to see where things were going slow. I found the major bottleneck was in the network. But I also made comparisons between a P4 2.4GHz and AMD Athlon XP 2400+; the results are interesting.

  34. 25.6 GFLOPS with CS301 or up to 500 GFLOPS FPGA by Anonymous Coward · · Score: 0

    If you want some real floating point power check out the CS301 co-processor from ClearSpeed: http://www.clearspeed.com
    For really high float performance you can take a Virtex-4 FPGA from Xilinx: http://www.xilinx.com and configure it with multipliers. Depending on your desired precision you can pack several hundred multipliers on it and easily achieve up to 500 GFLOPS performance.

  35. No IEEE floats by pkhuong · · Score: 2, Insightful

    GPUs don't do IEEE floats. That might be bad for his purpose...

    --
    Try Corewar @ www.koth.org - rec.games.corewar
  36. Portland Compilers by XenonOfArcticus · · Score: 2, Informative

    Compiler support is critical. Forget GCC. It's not a high-performance compiler. Look into the CodePlay C++ compiler under Windows, or the Portland Group products under Windows and Linux.

    --
    -- There is no truth. There is only Perception. To Percieve is to Exist.
  37. nvidia is pretty close by Anonymous Coward · · Score: 0

    AFAIK Nvidia's GeforceFX5xxxx and Geforce6xxx GPUs have 32-bit IEEE-like floats (no denormals or signalling NaNs), but for many numerical simulations, they're close enough. Seeing as how if you are generating denormals, you pretty much have exhausted the dynamic range of the float representation you are working with anyhow (or scaled things drastically wrong), that's probably not a biggy. On the NaN side, well, we can all hope that a numerical simulation doesn't depend on that...

    Of course, most gpus have to handle all the infinity cases correctly anyhow (because of 3d projection singularities), so at least the other side of the number line is pretty well handled on a GPU.

    On the other hand, Ati can get fp, but only up to 24-bit. Unfortunately, that's probably not quite single precision. Was kinda hoping their next gen had 32-bits to bring GPUs into the mainstream, but I guess we'll have to wait a bit to rely on a 32-bit gpu (other than nvidia).
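
    To show what "no denormals" means in practice, here is a small Python sketch that emulates single precision with the standard `struct` module. The `flush_denormal` helper is a hypothetical stand-in for hardware that turns denormal values into zero; it is not a model of any particular GPU.

```python
import struct

SMALLEST_NORMAL_SINGLE = 2.0 ** -126   # smallest normalized IEEE 754 single


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_denormal(x: float) -> float:
    """Mimic hardware without denormal support: tiny nonzero values become zero."""
    single = to_single(x)
    if 0.0 < abs(single) < SMALLEST_NORMAL_SINGLE:
        return 0.0
    return single


tiny = 1e-40                              # below the smallest normal single
with_denormals = to_single(tiny)          # IEEE 754: survives as a denormal
without_denormals = flush_denormal(tiny)  # flush-to-zero: becomes 0.0
```

    A simulation only notices the difference if its values get that close to zero, which matches the point above that generating denormals usually means the scaling is already off.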

  38. just get an abacus by Anonymous Coward · · Score: 0

    hey, it worked for the chinese, and they built a 4000 mile long wall.

  39. Re:My Beliefs (Continued) by sash · · Score: 1

    There is a commercial reason: Intel now wants to sell the Itanium for "workstation" use, so it cut down the number of FPUs in Pentium 4, relying only on SSE for multimedia performance. From "Athlon XP Meets P4":
    The picture is similar in 3D rendering (OpenGL) - the AMD Athlon XP's three FPU units helped it to outstrip the Pentium 4, with 2 FPU units. Ideally, you can employ the following equation:
    Performance = Clock Speed x Operations/Cycle
    This equation helps explain the theory behind why the AMD Athlon XP, although clocked at a lower speed, is able to reach the same performance than a faster-clocked Intel Pentium 4...
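
    The quoted equation as a Python sketch. Treating operations per cycle as equal to the number of FPU units (3 versus 2, from the quote) is a crude assumption, and the 1600 MHz clock is a hypothetical figure picked for the example.

```python
def relative_performance(clock_mhz: float, ops_per_cycle: float) -> float:
    """Performance = Clock Speed x Operations/Cycle, in arbitrary units."""
    return clock_mhz * ops_per_cycle


def clock_to_match(target_performance: float, ops_per_cycle: float) -> float:
    """Clock speed needed to reach a target performance at a given ops/cycle."""
    return target_performance / ops_per_cycle


ATHLON_XP_OPS_PER_CYCLE = 3.0   # assumption: one operation per FPU unit per cycle
P4_OPS_PER_CYCLE = 2.0          # assumption: one operation per FPU unit per cycle

athlon_clock_mhz = 1600.0       # hypothetical clock speed
athlon_performance = relative_performance(athlon_clock_mhz, ATHLON_XP_OPS_PER_CYCLE)

# Clock a chip with 2 ops/cycle would need to match that performance.
p4_clock_needed_mhz = clock_to_match(athlon_performance, P4_OPS_PER_CYCLE)
```

    Under those assumptions the chip with fewer operations per cycle needs a clock 1.5 times higher just to break even, and real code rarely keeps every unit busy every cycle, so take the ratio as a rough guide only.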