The Potential of Science With the Cell Processor

Posted by ryuzaki0 on Saturday May 27, 2006 @11:35PM from the making-a-station-play dept.

prostoalex writes "High Performance Computing Newswire is running an article on a paper by computer scientists at the U.S. Department of Energy's Lawrence Berkeley National Laboratory. They have evaluated the Cell processor's performance in running several scientific application kernels, then compared this performance against other processor architectures. The full paper is available from the Computer Science department at Berkeley."

31 of 176 comments

Cell + Linux = success by Anonymous Coward · 2006-05-27 23:37 · Score: 3, Funny

OS X is closed source. This means that it is the work of the devil - its purpose is to make the end users eat babies.

Linux is the only free OS. Yes, the BSD licenses may appear more free, but as they have no restrictions, they are actually less free than the GPL. You see, restricting the end user more actually makes them more free than not putting restrictions on them. You must be a dumb luser for not understanding this.

And you obviously don't have a real job. A real job involves being a student or professional academic. You see, academics are the ones who know all about productivity - if you work for a commercial organisation you obviously do not know anything about computers. Usability is stupid. What's wrong with the command line? If you can't use the command line then you shouldn't be using a computer. vi should be the standard word processor - you are such a luser if you want to use Word. Installing software should have to involve recompiling the kernel of the OS. If you don't know how to do this, you are a stupid luser who should RTFM. Or go to a Linux IRC channel or newsgroup. After all, they are soooo friendly. If you don't know how the latest 2.6 kernel scheduling algorithm works then they will tell you to stop wasting their time, but they really are quite supportive.

Oh, and M$ is just as evil as Apple. Take LookOUT for instance. You could just as easily use Eudora. Who needs groupware anyway? A simple email client should be all we use (that's all we use as academics, so why can't businesses be any different?).

And trend setters - Linux is the trend setter. It may appear KDE is a ripoff from XP, but that's because M$ stole the KDE code. We all know they have GPL'ed code hidden in there somewhere (but not in the things that don't work; only the things that work could possibly have GPL'ed code in them).

And Apple is the suxor because they charge people for their product. We all know that it's a much better business model to give all your products away for free. If you charge for anything, then you are allied with M$ and will burn in hell.
What about the compiler? by Watson+Ladd · 2006-05-27 23:43 · Score: 2, Insightful

The paper did a lot of hand-optimization, which is irrelevant to most programmers. What gcc -O3 does is way more important than what an assembly wizard can do for most projects.

--
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
1. Re:What about the compiler? by Anonymous Coward · 2006-05-27 23:55 · Score: 5, Insightful
  
  Hand optimization _is_ relevant to scientific programmers
2. Re:What about the compiler? by TommyBear · 2006-05-28 00:07 · Score: 5, Insightful
  
  Hand optimizing code is what I do as a game developer and I can assure you that it is very relevant to my job.
3. Re:What about the compiler? by suv4x4 · 2006-05-28 00:19 · Score: 2, Interesting
  
  The paper did a lot of hand-optimization, which is irrelevant to most programmers. What gcc -O3 does is way more important than what an assembly wizard can do for most projects.
  
  Actually bullshit. We're talking scientific applications here, and it's not uncommon that programs written to run on supercomputers *are* optimized by an assembly wizard to squeeze every cycle out of it.
4. Re:What about the compiler? by Anonymous Coward · 2006-05-28 00:58 · Score: 2, Informative
  
  Insightful? Ah... no.
  
  Scientific users code to the bleeding edge. You give them hardware that blows their hair back and they will figure out how to use it. You give them crappy painful hardware (Maspar, CM*) that is hard to optimize for, then they probably won't use it.
  
  Assembly language optimization is not a big deal. Right now the biggest thing bugging me is that I have to rewrite a core portion of a code to use SSE, since SSE is so limited for integer support. As this is a small amount of work, and the potential gains are so large (about 4x), it doesn't make sense not to do this. Some of it will be hand coded and optimized assembler. This is how we have to program. Scientists need the fastest possible cycles, and as many of them as possible ... at least the ones I know need this. There are a few who do all their analysis on Excel spreadsheets. They don't need much in the way of speed. The rest of us do.
5. Re:What about the compiler? by samkass · 2006-05-28 02:08 · Score: 5, Insightful
  
  What seems to be more important than that is:
  
  "According to the authors, the current implementation of Cell is most often noted for its extremely high performance single-precision (32-bit) floating-point performance, but the majority of scientific applications require double precision (64-bit). Although Cell's peak double precision performance is still impressive relative to its commodity peers (eight SPEs at 3.2GHz = 14.6 Gflop/s), the group quantified how modest hardware changes, which they named Cell+, could improve double precision performance."
  
  So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.
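  The 14.6 Gflop/s figure follows from per-SPE issue rates. A minimal Python sketch of that arithmetic, assuming (these rates are assumptions here, not stated in the quoted text) that each SPE completes one 4-wide single-precision fused multiply-add per cycle and can issue a 2-wide double-precision fused multiply-add only once every 7 cycles:

```python
# Peak SPE floating-point throughput, derived from per-SPE issue rates.
# Assumed (not measured): one 4-wide single-precision FMA per cycle, and one
# 2-wide double-precision FMA issued only once every 7 cycles.
SPES = 8
CLOCK_GHZ = 3.2


def peak_gflops(lanes: int, issue_interval_cycles: int) -> float:
    """Peak Gflop/s across all SPEs; an FMA counts as 2 flops per SIMD lane."""
    flops_per_cycle = 2 * lanes / issue_interval_cycles
    return SPES * CLOCK_GHZ * flops_per_cycle


sp_peak = peak_gflops(lanes=4, issue_interval_cycles=1)
dp_peak = peak_gflops(lanes=2, issue_interval_cycles=7)
print(f"SP peak: {sp_peak:.1f} Gflop/s")         # 204.8
print(f"DP peak: {dp_peak:.1f} Gflop/s")         # 14.6
print(f"SP/DP ratio: {sp_peak / dp_peak:.0f}x")  # 14
```

  Under those assumptions the single-to-double ratio comes out at 14, the same "fourteen times" gap quoted later in the thread.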
  
  --
  E pluribus unum
6. Re:What about the compiler? by JanneM · 2006-05-28 02:09 · Score: 3, Informative
  
  Hand optimizing code is what I do as a game developer and I can assure you that it is very relevant to my job.
  
  It makes sense for a game developer - and even more an embedded developer. You spend the time to optimize once, and then the code is run on hundreds of thousands or millions of sites, over years. The time you spend can effectively be amortized over all those customers.
  
  For scientific software the calculation generally changes. You write code, and that code is typically used in one single place (the lab where the code was written), and only run a comparatively few times, indeed sometimes only once.
  
  For a game developer to spend three months extra to shave a few seconds off one run of a piece of code makes perfect sense. For an embedded developer, spending a couple of months' worth of development cost to be able to use a slower, cheaper chip, and so shaving a dollar off the production cost of perhaps tens of millions of gadgets, makes sense.
  
  For a graduate student (cheap as they are in the funny-mirror economics of science) to spend three months to make one single run of a piece of software run a few hours faster does not make sense at all.
  
  In fact, disregarding the inherent coolness factor of custom hardware, in most situations it just doesn't pay to make custom stuff for science when you can just run it for a little longer to get the same result. Not infrequently have I heard of labs spending the time and effort to make custom stuff, only to find that by the time they were done, off-the-shelf hardware had already caught up.
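  The trade-off above is a break-even calculation: hand optimization pays off only when the run time saved over every run of the code outweighs the extra development time. A sketch in Python; the function name and all the numbers are hypothetical, picked only to mirror the one-off lab run versus widely deployed code contrast:

```python
def optimization_pays_off(dev_hours: float, hours_saved_per_run: float,
                          runs: int, value_ratio: float = 1.0) -> bool:
    """True if run time saved over all runs outweighs the development time.

    value_ratio (hypothetical) converts one saved run-hour into developer-hour
    equivalents, e.g. > 1 if a whole team sits idle waiting for results.
    """
    return hours_saved_per_run * runs * value_ratio > dev_hours


# Hypothetical one-off lab simulation: ~500 working hours (about three months)
# of a graduate student's time to save 3 hours on a single run.
print(optimization_pays_off(dev_hours=500, hours_saved_per_run=3, runs=1))

# Hypothetical shipped product: the same effort saves only 2 seconds per run,
# but the code runs 100 times on each of a million machines.
print(optimization_pays_off(dev_hours=500, hours_saved_per_run=2 / 3600,
                            runs=100 * 1_000_000))
```

  The first case prints False and the second True: the per-run saving matters far less than how many times the optimized code is executed.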
  
  --
  Trust the Computer. The Computer is your friend.
7. Re:What about the compiler? by penguin-collective · 2006-05-28 02:27 · Score: 2, Insightful
  
  Except for a tiny minority of specialists, most scientific programmers, even those working on large-scale problems, have neither the time nor the expertise to hand-optimize. Many of them don't even know how to use optimized library routines properly.
8. Re:What about the compiler? by FromWithin · 2006-05-28 02:55 · Score: 2, Informative
  
  So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.
  
  Did you not read the last bit?
  
  On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors, despite the fact that Cell's peak double precision performance is fourteen times slower than its peak single precision performance. If Cell were to include at least one fully utilizable pipelined double precision floating point unit, as proposed in their Cell+ implementation, these speedups would easily double.
  
  So it's really great already. If it was tweaked a bit, it would be ludicrously great.
9. Re:What about the compiler? by cfan · 2006-05-28 03:23 · Score: 2, Interesting
  
  >So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.
  
  No, the Cell is great because, as the pdf shows, it has an incredible Gflops/Power ratio, even in its current configuration.
  
  For example, here are the Gflops (double precision) obtained in 2d FFT:
  
  Size    Cell+   Cell   X1E    AMD64   IA64
  1K^2    15.9    6.6    6.99   1.19    0.52
  2K^2    26.5    6.7    7.10   0.19    0.11
  
  So a single, normal, Cell can be compared with the processor of a Cray (that uses 3 times more power and costs a lot more).
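  The Cray comparison can be checked directly against the table. A small Python sketch that stores the quoted double-precision 2D FFT rates (Gflop/s, copied from the table above) and expresses every platform relative to the stock Cell; the helper name is illustrative:

```python
# Double-precision 2D FFT rates (Gflop/s) as quoted in the table above.
FFT_GFLOPS = {
    "1K^2": {"Cell+": 15.9, "Cell": 6.6, "X1E": 6.99, "AMD64": 1.19, "IA64": 0.52},
    "2K^2": {"Cell+": 26.5, "Cell": 6.7, "X1E": 7.10, "AMD64": 0.19, "IA64": 0.11},
}


def relative_to(baseline: str, size: str) -> dict:
    """Each platform's rate divided by the baseline platform's rate."""
    row = FFT_GFLOPS[size]
    return {name: rate / row[baseline] for name, rate in row.items()}


for size in FFT_GFLOPS:
    ratios = relative_to("Cell", size)
    summary = ", ".join(f"{name} {ratio:.2f}x" for name, ratio in ratios.items())
    print(f"{size}: {summary}")
```

  At both sizes the X1E comes out only about 6% ahead of the stock Cell. Power draw and cost are not in the table, so the rest of the comparison is not modelled here.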
10. Re:What about the compiler? by john.r.strohm · 2006-05-28 03:24 · Score: 4, Interesting
  
  Irrelevant to most C/C++ code wallahs doing yet another Web app, perhaps.
  
  Irrelevant to people doing serious high-performance computing, not hardly.
  
  I am currently doing embedded audio digital signal processing. On one of the algorithms I am doing, even with maximum optimization for speed, the C/C++ compiler generated about 12 instructions per data point, where I, an experienced assembly language programmer (although having no previous experience with this particular processor), did it in 4 instructions per point. That's a factor of 3 speedup for that algorithm. Considering that we are still running at high CPU utilization (pushing 90%), and taking into account the fact that we can't go to a faster processor because we can't handle the additional heat dissipation in this system, I'll take it.
  
  I have another algorithm in this system. Written in C, it is taking about 13% of my timeline. I am seriously considering an assembly language rewrite, to see if I can improve that. The C implementation as it stands is correct, straightforward, and clean, but the compiler can only do so much.
  
  In a previous incarnation, I was doing real-time video image processing on a TI 320C80. We were typically processing 256x256 frames at 60 Hz. That's a little under four million pixels per second. The C compiler for that beast was HOPELESS as far as generating optimal code for the image processing kernels. It was hand-tuned assembly language or nothing. (And yes, that experience was absolutely priceless when I landed on my current job.)
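  The real-time constraint in the last paragraph works out to a fixed per-pixel budget. A quick Python sketch of that arithmetic; the per-pixel instruction counts are hypothetical (the 12-versus-4 figures are borrowed from the audio example above purely to show how a 3x instruction saving carries through), since no counts are given for the video kernels:

```python
# Real-time throughput budget for 256x256 frames at 60 frames per second.
WIDTH, HEIGHT, FPS = 256, 256, 60

pixels_per_second = WIDTH * HEIGHT * FPS   # 3,932,160
ns_per_pixel = 1e9 / pixels_per_second     # about 254 ns per pixel


def required_mips(instructions_per_pixel: float) -> float:
    """Millions of instructions per second needed to keep up in real time."""
    return pixels_per_second * instructions_per_pixel / 1e6


print(f"{pixels_per_second:,} pixels/s, {ns_per_pixel:.0f} ns per pixel")
# Hypothetical per-pixel costs, used only to illustrate the 3x ratio.
for ipp in (12, 4):
    print(f"{ipp:>2} instructions/pixel -> {required_mips(ipp):.1f} MIPS")
```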
11. Re:What about the compiler? by adam31 · 2006-05-28 04:42 · Score: 3, Informative
  
  Actually bullshit.
  Actually, it's not bullshit. Simple C intrinsics code is the way to go to program the Cell... there's just no need for hand-optimized asm. Intrinsics have a poor rep on x86 because SSE sucks: 8 registers, every instruction overwrites one of its source operands, and there's no MADD, MSUB, etc.
  But Cell has 128 registers and a full set of vector instructions. There's no danger of stack spills. As long as the compiler doesn't freak out about aliasing (which is easy), and it can inline everything, and you present it enough independent execution streams at once... the SPE compiler writes really, really nice code.
  The thing that does need to be hand-optimized still is the memory transfer. DMA can be overlapped with execution, but it has to be done explicitly. In fact, algorithms typically need to be designed from the start so that accesses are predictable and coherent and fit within ~180 KB. (Generally, someone seeking performance would do this step long before writing asm code on any platform anyway...)
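  Overlapping DMA with execution is commonly done by double buffering: while the SPE computes on one local-store buffer, the DMA engine fills the other. The sketch below is a simple timing model of that schedule in Python, not SPE code; the chunk count and per-chunk times are hypothetical, and it ignores write-back traffic and DMA setup costs:

```python
def sequential_time(chunks: int, t_dma: float, t_compute: float) -> float:
    """Fetch a chunk, then compute on it, strictly one after the other."""
    return chunks * (t_dma + t_compute)


def double_buffered_time(chunks: int, t_dma: float, t_compute: float) -> float:
    """Fetch chunk i+1 into the spare buffer while computing on chunk i.

    The first fetch and the last compute cannot be hidden; every step in
    between costs whichever of the two overlapped operations is slower.
    """
    if chunks == 0:
        return 0.0
    return t_dma + (chunks - 1) * max(t_dma, t_compute) + t_compute


# Hypothetical workload: 1000 chunks, 2 us to fetch and 3 us to process each.
seq = sequential_time(1000, 2.0, 3.0)
dbl = double_buffered_time(1000, 2.0, 3.0)
print(f"sequential: {seq:.0f} us, double-buffered: {dbl:.0f} us")
```

  When computation per chunk exceeds transfer time, the DMA cost nearly disappears; when transfers are slower, the loop is DMA-bound no matter how good the compiled code is.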
12. Re:What about the compiler? by adam31 · 2006-05-28 08:10 · Score: 3, Informative
  
  I am also an experienced assembly programmer, and I too shared your mistrust of the compiler. However, I started SPE programming several months ago and I promise you that the compiler can work magic with intrinsics now. Knowledge of assembly is still helpful, because you need to have in mind what you want the compiler to generate... make sure it sees enough independent execution clumps that it can cover latencies and fill both the integer pipe and FP pipe, understand SoA vs AoS, etc. But you get to write with real variable names, not worry about scheduling/pairing of individual instructions or loop unrolling issues.
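  For readers unfamiliar with the AoS/SoA distinction mentioned above: an array of structures stores each point's x, y, z together, while a structure of arrays keeps one array per component, so a 4-wide SIMD register can hold the same component of four consecutive points. A small Python sketch of the two layouts; Python lists stand in for vector registers, so it shows the data arrangement, not the speed:

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
LANES = 4  # four 32-bit floats per 128-bit SIMD register


def to_soa(points: List[Point]) -> Tuple[List[float], List[float], List[float]]:
    """Convert an array of structures [(x, y, z), ...] to a structure of arrays."""
    return ([p[0] for p in points], [p[1] for p in points], [p[2] for p in points])


def dots_aos(points: List[Point], v: Point) -> List[float]:
    """One point at a time: a register would hold a mix of x, y and z."""
    return [p[0] * v[0] + p[1] * v[1] + p[2] * v[2] for p in points]


def dots_soa(xs, ys, zs, v: Point) -> List[float]:
    """LANES points at a time: every lane performs the same multiply-add on
    the same component, the shape SIMD multiply-add instructions expect."""
    out: List[float] = []
    for i in range(0, len(xs), LANES):
        x, y, z = xs[i:i + LANES], ys[i:i + LANES], zs[i:i + LANES]
        out.extend(xl * v[0] + yl * v[1] + zl * v[2] for xl, yl, zl in zip(x, y, z))
    return out


pts = [(float(i), 2.0 * i, -1.0) for i in range(8)]
v = (1.0, 0.5, 2.0)
print(dots_soa(*to_soa(pts), v))
```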
  Some of my best VU routines that I spent a couple weeks hand-optimizing, I re-wrote with SPE intrinsics in an afternoon. After some initial time figuring out exactly how the compiler likes to see things, it was a total breeze. My VU code ran in 700 usec while my SPE code ran in 30 usec (@ ~1.3 IPC! Good work, compiler).
  The real worry now is becoming DMA-bound. For example, assuming you're running all 8 SPEs full-bore, and you write as much data as you read. At 25.6 GB/s, you get 3.2 GB/s per SPE, so 1.6 GB/s in each direction (assuming perfect bus utilization), so @3.2 GHz, that's 0.5 Bytes/cycle. So, for a 16-byte register, you need to execute 32 instructions minimum or you're DMA-bound!
  Food for thought.
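  The DMA-bound estimate above can be written out step by step. A minimal Python sketch using only the figures quoted in the comment, with the same idealized assumptions (all 8 SPEs streaming, equal read and write traffic, perfect bus utilization):

```python
# Back-of-the-envelope check of the DMA-bound estimate.
BUS_GB_PER_S = 25.6   # total memory bandwidth quoted in the comment
SPES = 8
CLOCK_GHZ = 3.2
REGISTER_BYTES = 16   # one 128-bit SPE register

per_spe = BUS_GB_PER_S / SPES                  # 3.2 GB/s per SPE
per_direction = per_spe / 2                    # 1.6 GB/s in, 1.6 GB/s out
bytes_per_cycle = per_direction / CLOCK_GHZ    # (GB/s) / (Gcycles/s)
cycles_per_register = REGISTER_BYTES / bytes_per_cycle

print(f"{bytes_per_cycle:.2f} bytes/cycle each way")
print(f"{cycles_per_register:.0f} cycles of work needed per 16-byte register")
```

  At roughly one instruction per cycle, those 32 cycles are the "32 instructions minimum" per register's worth of data; kernels doing less work per byte will stall on memory.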
13. Re:What about the compiler? by jericho4.0 · 2006-05-28 16:46 · Score: 2, Informative
  
  Maybe true on our computers, but not on supercomputers.
  
  --
  "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
14. Re:What about the compiler? by Tough+Love · 2006-05-29 08:57 · Score: 2, Informative
  
  A programmer hour is much more valuable than a machine hour
  
  You forgot to take into account the team of scientists waiting for the machine to produce a result.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
What about the programmer? by Anonymous Coward · 2006-05-27 23:50 · Score: 5, Insightful

"The paper did a lot of hand-optimization, which is irrelevant to most programmers. "

But not to programmers who do science.

"What gcc -O3 does is way more important than what an assembly wizard can do for most projects."

Not an insurmountable problem.
1. Re:What about the programmer? by zCyl · 2006-05-28 07:59 · Score: 2, Insightful
  
  Hand optimization or writing portions of code in assembler is the last thing 85% of these people want to do. They don't want to be computing experts to do their science/research.
  
  When you're talking about reuseable modules like an FFT or matrix multiplication, then many scientists doing simulations would love to have a hand optimized FFT or matrix module to plug in as a simulation component. Even if they don't know a drop of assembly themselves, having the optimized module available can make a large difference in running time for big simulations.
Re:Xbox 2 is a "commodity" by MooUK · 2006-05-28 00:27 · Score: 2, Insightful

I think you misunderstand what HPC actually is.

High performance computing is that which you'd want to throw a huge Beowulf cluster at, or possibly a supercomputer or twenty. Not three small pathetic cores.
WTF? by SmallFurryCreature · 2006-05-28 00:57 · Score: 4, Insightful

First off, you are talking about consoles being sold at a loss, NOT their components.
If IBM was the maker of the chip they would most certainly not sell them at a loss. Why should they? Sony might sell the console at a loss and recoup it from game sales, but IBM has no way to recoup any losses.
Then again, IBM is in a partnership with Sony and Toshiba, so the chip is probably owned by this partnership and Sony will just be making the chips it needs itself.
So any idea that IBM is selling Cells at a loss is insane.
Then the cost of the PS3 is mostly claimed to be in the Blu-ray drive tech. Not going to be of much interest to a science setup, is it? Even if they want to use a Blu-ray drive, they need just 1 in a 1000-Cell rig. Not going to break the bank.
No, the Cell will be cheap because when you run an order of millions of identical CPUs, prices drop rapidly. There might even be a very real market for cheap Cells. Regular CPUs always have lesser-quality versions. Not a problem for an Intel or AMD, who just badge them Celeron or whatever, but you can't do that with a console processor. All Cell processors destined for the PS3 must be of similar spec.
So what to do with a Cell chip that has one of the cores defective? Throw it away, OR rebadge it and sell it for blade servers? That is where Celerons come from (defective cache).
We already know that the Cell processor is going to be sold for purposes other than the PS3. IBM has a line of blade servers coming up that will use the Cell.
No, I am afraid that it will be perfectly possible to buy Cells, and they will be sold at a profit just like any other CPU. Nothing special about it. They will, however, benefit greatly from the fact that they already have a large customer lined up. Regular CPUs need to recover their costs as quickly as possible because their success is uncertain. This is why regular top-end CPUs are so fucking expensive. But the Cell already has an order for millions, meaning the costs can be spread out in advance over all those units.

--

MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
1. Re:WTF? by Kjella · 2006-05-28 02:03 · Score: 3, Insightful
  
  So what to do with a cell chip that has one of the cores defective? Throw it away OR rebadge it and sell it for blade servers?
  
  Use it. Seriously, that's why the PS3 uses the central core plus 7 of them, not all 8. The eighth is actually a spare, so unless the flaw is in the central logic or hits two separate cores, the chip is still good. Good way to keep the yields up...
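  The yield argument can be made concrete with a simple binomial model. A Python sketch, assuming SPE defects are independent and equally likely (no real defect rates appear in the thread, so the 5% figure is purely hypothetical) and ignoring defects in the central core and interconnect, which would scrap the chip either way:

```python
from math import comb


def usable_fraction(cores: int, needed: int, p_defect: float) -> float:
    """Probability that at least `needed` of `cores` identical cores are
    defect-free, treating each core's defect as independent (binomial)."""
    q = 1.0 - p_defect
    return sum(comb(cores, k) * q**k * p_defect**(cores - k)
               for k in range(needed, cores + 1))


P_DEFECT = 0.05  # hypothetical chance that any one SPE is defective
all_eight = usable_fraction(8, 8, P_DEFECT)
seven_of_eight = usable_fraction(8, 7, P_DEFECT)
print(f"need all 8 SPEs: {all_eight:.1%} of chips usable")
print(f"need any 7 of 8: {seven_of_eight:.1%} of chips usable")
```

  Allowing one spare moves the usable fraction from about two thirds to over 94% under this toy model.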
  
  --
  Live today, because you never know what tomorrow brings
The ball is in the hands of developers. by stengah · 2006-05-28 01:46 · Score: 2, Insightful

The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations. Although these scientists may be interested in any potential speed-up to their workflow, they are not willing to invest any of their time in translating their whole codebase to asm-optimized C. Thus, the ball is in the hands of software developers, not scientists.

--
I'm jack's useless sig
1. Re:The ball is in the hands of developers. by infolib · 2006-05-28 03:09 · Score: 3, Informative
  
  The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations.
  
  Indeed, most scientists. They also know very little about profiling but since the simulation is used only maybe a hundred times that hardly matters.
  
  The cases we're talking about here are where thousands of processors grind the same program (or evolved versions of it) for years as the terabytes of data roll in. Such is the situation in weather modelling, high energy physics and several other disciplines. That's not a "program" in the usual sense, but rather a "research program" occupying a whole department including everyone from "domain-knowledge" scientists down to some very long haired programmers who will not shy away from a bit of ASM. If you're a developer good at optimization and parallelism there might just be a job for you.
  
  --
  Any sufficiently advanced libertarian utopia is indistinguishable from government.
Ease of Programming? by MOBE2001 · 2006-05-28 01:56 · Score: 2, Interesting

FTA: While their current analysis uses hand-optimized code on a set of small scientific kernels, the results are striking. On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors,

The Cell processor may be faster, but how easy is it to implement an optimizing development system that eliminates the need to hand-optimize the code? Is not programming productivity just as important as performance? I suspect that the Cell's design is not as elegant (from a programmer's POV) as it could have been, only because it was not designed with an elegant software model in mind. I don't think it is a good idea to design a software model around a CPU. It is much wiser to design the CPU around an established model. In this vein, I don't see the Cell as a truly revolutionary processor because, like every other processor in existence, it is optimized for the algorithmic software model. A truly innovative design would have embraced a non-algorithmic, reactive, synchronous model, thereby killing two birds with one stone: solving the current software reliability crisis while leaving other processors in the dust in terms of performance. One man's opinion.
bang, buck, effort by penguin-collective · 2006-05-28 02:35 · Score: 3, Informative

Over the last several decades, there have been lots of parallel architectures, many significantly more innovative and powerful than Cell. If Cell succeeds, it's not because of any innovation, but because it contains fairly little innovation and therefore doesn't require people to change their code too much.

One thing that Cell has that previous processors didn't is that the PS3 tie-in and IBM's backing may convince people that it's going to be around for a while; most previous efforts suffered from the problem that nobody wanted to invest time in adapting their code to an architecture that was not going to be around in a few years anyway.
Re:Xbox 2 is a "commodity" by Darkfred · 2006-05-28 03:13 · Score: 2, Informative

Did Sony pay you, or did Mr. Kutaragi come over to your house and type it for you?

Have you seriously never seen anything like this before? As a professional PS2/360/PS3 developer I have to say that I was seriously underwhelmed by this demo. Every one of the effects has been used before. The original Xbox has every effect he mentioned. And HL2 has a significantly more complex lighting system and postprocessing effects.
The demo appears to be a single high-poly character in a texture-mapped box. The demoer admits that this is a cut-scene quality model. I believe this scene could be rendered on an original Xbox with similar 'visual' quality. Why not use some of those polys to make a realistic background? Black on PS2 looked better. And they couldn't even show a solid second of actual gameplay.
I think it will be an amazing game, but the demo was no technical achievement. It was a hurried render test for an obviously incomplete engine. Bragging about poly count when your competition can push 1.5x-3x as many is not going to win them any points either.

Regards,

--
----- 70% of all statistics are completely made up.
No, this is why we have subroutine libraries by golodh · 2006-05-28 03:26 · Score: 5, Interesting

Although I agree with your point that crafting optimised assembly language routines is way beyond most users (and indeed a waste of time for all but an expert) there are certain "standard operations" that
(a) lend themselves extremely well to optimisation
(b) lend themselves extremely well to incorporation in subroutine libraries
(c) tend to isolate the most compute-intensive low-level operations used in scientific computation
SGEMM
If you read the article, you will find (among others) a reference to an operation called "SGEMM". This stands for Single precision General Matrix Multiplication. This is the sort of routine that makes up the BLAS library (Basic Linear Algebra Subprograms) (see e.g. http://www.netlib.org/blas/). High performance computation typically starts with creating optimised implementations of the BLAS routines (if necessary handcoded at assembler level), sparse-matrix equivalents of them, Fast Fourier routines, and the LAPACK library.
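For reference, the BLAS xGEMM routines compute C := alpha*op(A)*op(B) + beta*C, where op() optionally transposes its argument. A pure-Python sketch of the untransposed case with simple cache blocking, the kind of loop restructuring tuned implementations start from; it is not how any real BLAS is written, and the default block size is an arbitrary assumption:

```python
from typing import List

Matrix = List[List[float]]


def gemm_blocked(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix,
                 block: int = 32) -> Matrix:
    """Return alpha*A*B + beta*C (A is n x k, B is k x m, C is n x m).

    A*B is accumulated tile by tile; on real hardware keeping block x block
    tiles in cache is the point, while in Python only the loop structure shows.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    out = [[beta * c[i][j] for j in range(m)] for i in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, k_dim, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    row_out, row_a = out[i], a[i]
                    for k in range(k0, min(k0 + block, k_dim)):
                        aik, row_b = alpha * row_a[k], b[k]
                        for j in range(j0, min(j0 + block, m)):
                            row_out[j] += aik * row_b[j]
    return out


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
print(gemm_blocked(2.0, A, B, 0.5, C, block=1))  # [[38.5, 44.5], [86.5, 100.5]]
```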
ATLAS
There is a general movement away from optimised assembly language coding for the BLAS, as embodied in the ATLAS software package (Automatically Tuned Linear Algebra Software; see e.g. http://math-atlas.sourceforge.net/). The ATLAS package provides the BLAS routines but produces fairly optimal code on any machine using nothing but ordinary compilers. How? If you run a makefile for the ATLAS package, it may take about 12 hours or so to compile (depending on your computer of course; this is a typical number for a PC). In this time the makefile simply runs through multiple switches for the BLAS routines and runs test suites for all its routines over varying problem sizes. It then picks the best possible combination of switches for each routine and each problem size on the machine architecture where it's being run. In particular, it takes account of the size of caches. That's why it produces much faster subroutine libraries than those produced by simply compiling e.g. the BLAS routines with an -O3 optimisation switch thrown in.
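The empirical search described above can be illustrated in miniature. The sketch below is a toy version of the idea only (ATLAS itself generates and times many code variants and cache parameters, not a single block size): it times a blocked matrix multiply for a few candidate block sizes on the current machine and keeps the fastest. The candidate sizes and problem size are arbitrary assumptions:

```python
import time
from typing import List, Sequence

Matrix = List[List[float]]


def matmul_blocked(a: Matrix, b: Matrix, block: int) -> Matrix:
    """Plain A*B with the i/k/j loops split into block x block tiles."""
    n, k_dim, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, k_dim, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    row_c, row_a = c[i], a[i]
                    for k in range(k0, min(k0 + block, k_dim)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += aik * row_b[j]
    return c


def time_once(a: Matrix, b: Matrix, block: int) -> float:
    start = time.perf_counter()
    matmul_blocked(a, b, block)
    return time.perf_counter() - start


def tune_block_size(n: int, candidates: Sequence[int], repeats: int = 3) -> int:
    """Time each candidate block size on an n x n problem; return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best_block, best_time = candidates[0], float("inf")
    for block in candidates:
        # Best of several runs, to reduce noise from other processes.
        elapsed = min(time_once(a, b, block) for _ in range(repeats))
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block


print("fastest block size here:", tune_block_size(48, [8, 16, 24, 48]))
```

On real hardware the winner depends on cache sizes, which is why this kind of search is run on the target machine rather than hard-coded; in CPython interpreter overhead dominates, so the pick here mostly reflects loop overhead.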
Specially tuned versus automatic?: MATLAB
The question is of course: who wins? Specially tuned code or automatic optimisation? This can be illustrated with the example of the well-known MATLAB package. Perhaps you have used MATLAB on PCs, and wondered why its matrix and vector operations are so fast? That's because for Intel and AMD processors it uses a specially optimised vendor subroutine library (see http://www.mathworks.com/access/helpdesk/help/techdoc/rn/r14sp1_v7_0_1_math.html). For SUN machines, it uses SUN's optimised subroutine library. For other processors (for which there are no optimised libraries) MATLAB uses the ATLAS routines. Despite the great progress and portability that the ATLAS library provides, carefully optimised libraries can still beat it (see the Intel Math Kernel Library at http://www.intel.com/cd/software/products/asmo-na/eng/266858.htm).
Summary
In summary:
- large tracts of scientific computation depend on optimised subroutine libraries
- hand-crafted assembly-language optimisation can still outperform machine-optimised code.
Therefore the objections that the hand-crafted routines described in the article distort the comparison or are not representative of real-world performance are invalid.
However ... it's so expensive and difficult that you only ever want to do it if you absolutely must. For scientific computation this typically means that you only consider "inner loop primitives" such as the BLAS routines, FFTs, SPARSEPACK routines etc. for this treatment, and that you just don't attempt to do that yourself.
Ran simulations, not code by jmichaelg · 2006-05-28 03:41 · Score: 5, Insightful

Lest anyone think they actually ran "several scientific application kernels" on the Cell/AMD/Intel chips, what they actually did was run simulations of several different tasks such as FFT and matrix multiplication. Since they didn't actually run the code, they had to guess as to some parameters like DMA overhead. They also came up with a couple of hypothetical Cell processors that dispatched double precision instructions differently than how the Cell actually does it and present those results as well. They also said that IBM ran some prototype hardware that came within 2% of their simulation results, though they didn't say which hypothetical Cell the prototype hardware was implementing.
By the end of the article, I was looking for their idea of a hypothetical best-case pony.
1. Re:Ran simulations, not code by Sycraft-fu · 2006-05-28 08:40 · Score: 2, Insightful
  
  Hey, it makes a real difference. There's a great quote that shows up on /. from time to time that goes along the lines of "The difference between theory and reality is that in theory there's no difference, but in reality there is."
  
  Researchers are very good at simulating things that have little or nothing to do with reality. It all looks good in theory according to their formulas, but they fail to take something into account. As an example, take the defunct Elbrus E2K computer chip. It was supposed to be an awesome processor that would kick the crap out of anything Intel or AMD offered. It was being designed by people with real computer experience; Elbrus made several Soviet supercomputers. Basically, the chip was to be their Elbrus 3 supercomputer reimplemented on one chip.
  
  Everything looked good in simulations... But obviously nothing has ever come of it. The E2K never hit the market, and it and its followups have been nothing but vapourware. Why? Well again, because of the difference between theory and reality. The design was all well and good on a VHDL simulator, but the hard part of chip design is not developing some powerful stuff in VHDL, it's developing powerful stuff that can actually be fabbed to a real chip.
  
  So as with anything like this, I reserve judgement until I see real silicon. To me this looks like people getting overly excited about something that doesn't exist yet. Yes, the Cell is good in theory, we know that, that's not the issue. The issue is how it will really perform against other chips running real code. That we don't know, and won't know for some time. One simple issue that will have to be dealt with is compiler inefficiencies. Most scientific code isn't written in assembly; often it's Fortran. Well, if there's one thing Intel's got, it's a rockin' Fortran compiler. So even if the Cell's units are actually more powerful in theory, if the code it gets isn't optimized it may not matter.
  
  Either way, any time I hear things about what an amazing jump forward some new tech will be, I am skeptical. It just generally seems that doesn't happen. Improvements happen in small jumps, not leaps of nearly an order of magnitude (which is what they are claiming with the 8x faster stat).
Ignore everything important? by Duncan3 · 2006-05-28 05:26 · Score: 2, Interesting

I love how they manage to completely ignore all the other vector-type architectures already in the market, and just compare it to Intel/AMD which are not even designed for floating point performance.

Scream "my computer beats your abacus" all you want.

But then it is from Berkeley, so that's normal. ;)

--
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
not a fair comparison by MonaLisa · 2006-05-28 09:18 · Score: 2, Insightful

The authors discuss hand tuning and assembler coding for Cell, but not necessarily for the other processors. Their 2D FFT results, for example, are a factor of 10 slower than others I have seen. Also, for the IA64 and Opteron, the performance of many of these numerical kernels is highly dependent on the compiler used. The IA64 especially is very sensitive to compiler optimization to keep the 6 pipeline slots busy and also generate memory prefetch instructions at the right time to prevent stalling. As often seems to occur in these sorts of HPC comparisons, they spend a lot of time hand optimizing for a particular platform, and compare it to other platforms that have not necessarily received the equivalent effort. As has been noted above, how much time you have to spend developing, debugging, and tuning a code matters a lot. This is particularly true for research codes. Finally, who uses single precision for scientific computing anymore? Any field that I am aware of that would use large FFTs, large linear algebra solvers, etc. requires at least double precision to get anything meaningful.
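The precision point is easy to demonstrate with a toy case (naive summation of 0.1 a million times, not an FFT or solver). In this Python sketch the running total is rounded to single precision after every addition by storing it in a standard-library `array` with type code 'f' (a C float, IEEE 32-bit on typical platforms), and compared with the same loop kept in double precision:

```python
from array import array


def accumulate(step: float, n: int, typecode: str) -> float:
    """Add `step` n times; the running total lives in a one-element array, so
    'f' rounds it to single precision after every addition, 'd' keeps double."""
    acc = array(typecode, [0.0])
    for _ in range(n):
        acc[0] += step  # add in double, then store (rounded to 32 bits for 'f')
    return acc[0]


N = 1_000_000
exact = N / 10
single = accumulate(0.1, N, "f")
double = accumulate(0.1, N, "d")
rel_err_single = abs(single - exact) / exact
rel_err_double = abs(double - exact) / exact
print(f"float32 total: {single:.2f}  relative error {rel_err_single:.1e}")
print(f"float64 total: {double:.6f}  relative error {rel_err_double:.1e}")
```

Here the single-precision total drifts by almost 1%, while the double-precision one stays accurate to about ten significant digits. Long reductions inside FFTs and linear solvers accumulate rounding error in a similar way, though usually less dramatically.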