The Potential of Science With the Cell Processor
prostoalex writes "High Performance Computing Newswire is running an article on a paper by computer scientists at the U.S. Department of Energy's Lawrence Berkeley National Laboratory. They have evaluated the processor's performance in running several scientific application kernels, then compared this performance against other processor architectures. The full paper is available from the Computer Science department at Berkeley."
OS X is closed source. This means that it is the work of the devil - its purpose is to make the end users eat babies.
Linux is the only free OS. Yes, the BSD licenses may appear more free, but as they have no restrictions, they are actually less free than the GPL. You see, restricting the end user more actually makes them more free than not putting restrictions on them. You must be a dumb luser for not understanding this.
And you obviously don't have a real job. A real job involves being a student or professional academic. You see, academics are the ones who know all about productivity - if you work for a commercial organisation you obviously do not know anything about computers. Usability is stupid. What's wrong with the command line? If you can't use the command line then you shouldn't be using a computer. vi should be the standard word processor - you are such a luser if you want to use Word. Installing software should have to involve recompiling the kernel of the OS. If you don't know how to do this, you are a stupid luser who should RTFM. Or go to a Linux irc channel or newsgroup. After all, they are soooo friendly. If you don't know how the latest 2.6 kernel scheduling algorithm works then they will tell you to stop wasting their time, but they really are quite supportive.
Oh, and M$ is just as evil as Apple. Take LookOUT for instance. You could just as easily use Eudora. Who needs groupware anyway, a simple email client should be all we use (that's all we use as academics, why can't businesses be any different).
And trend setters - Linux is the trend setter. It may appear KDE is a ripoff from XP, but that's because M$ stole the KDE code. We all know they have GPL'ed code hidden in there somewhere (but not the things that don't work, only the things that work could possibly have GPL'ed code in it).
And Apple is the suxor because they charge people for their product. We all know that it's a much better business model to give all your products away for free. If you charge for anything, then you are allied with M$ and will burn in hell.
The Cell processor is interesting. Nintendo may dominate the 2006/7 market with cheap hardware, but the PS3 will gradually gain dominance as the price drops and people learn how to write for it.
The paper did a lot of hand-optimization, which is irrelevant to most programmers. What gcc -O3 does is way more important than what an assembly wizard can do for most projects.
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
From the HPCWire link:
"We also conclude that Cell's heterogeneous multi-core implementation is inherently better suited to the HPC environment than homogeneous commodity multi-core processors."
Guess that means Xbox 2's three identical general-purpose processors are inherently lacking in their high performance computing ability. And calling it "commodity" is just fighting words.
PS3 will have better AI, better physics, and better games.
"The paper did a lot of hand-optimization, which is irrelevent to most programmers. "
But not to programmers who do science.
"What gcc -O3 does is way more importent then what an assembly wizard can do for most projects."
Not an unsurmountable problem.
I can't wait to hook one of these babies up as the brain of my house and run concurrent multimedia streams everywhere. Already dreaming of little wireless touch screen terminals next to the toilet, and a waterproof one in the shower :-)
"The problem with the PS3 will be that it will take companies a lot of time and money to develop games for it. They won't do this until they know there will be enough consumers to buy their games. The consumers on the other hand, won't buy it unless there are good games for it. Kind of a catch 22 if you ask me."
The phrase you're looking for is "development tools".
"Nintendo's model on the other hand, is to make it really cheap and easy to design games for the Wii, so there isn't so much risk involved for the developers. There also isn't so much risk for the consumers, because the system itself is so much cheaper than the competition."
That depends on the cost of development systems. Also something you have forgotten. The platforms aren't just differentiated by hardware, but by genre.
Doesn't the Cell's design mean that it can very easily scale up, without requiring any changes in the software? Just add more computing CPUs (SPEs they are called, I think?) and the Cell runs faster without changing your software.
I'm not entirely sure of this, can someone corroborate/disprove?
Send email from the afterlife! Write your e-will at Dead Man's Switch.
Apple has said they considered and rejected the Cell because it's more a game-box processor, rather lacking on the multipurpose needs of a general purpose processor. So they would need to put 4 Cells to match the general needs of a Quad core.
Also they considered that one processor change was enough.
But then Apple caters to the scientific community and ignoring the Cell leaves a hole in the market with no Intel alternative in sight.
I hear the delay with the PS3 is because of problems fabricating such a complex chip en masse. It must be one hot sonfabeach too.
So is the "Mac Pro" really delayed because of the Cell?
An interesting point is that most consoles sell their hardware at a loss. At least the XBox does. This means that there is no guarantee that IBM is willing to sell their CPUs at the same price that one would believe they cost for the PS3.
Moreover, the scientific community is very likely to push their cell+ architecture and I'm sure IBM would be more than happy to help... For a massive price.
So, when building an HPC system, you're likely to work around the best architecture (the more expensive cell+), and purchasers of the HPC will then have a cray-like proprietary system at enormous cost.
Not that this is a bad thing, I just don't believe this "low cost" "high volume" statement.
-Michael
Isn't Cell similar to things like QCDOC (from what my LQCD colleagues tell me, it's based on PowerPC, but are there similarities in the wider architecture, interconnects, etc.)? Have any plans to use it here?
I'm going to disagree with you here. Here's my game system buying guide:
-I'll buy a DS Lite the day it comes out. There are several games I like and several more coming.
-I'll buy an XBox 360 when Splinter Cell comes out.
-I'll buy a PS2 as soon as I decide I can't go without playing Guitar Hero any longer.
-I'll pre-order a Wii (provided Twilight Princess remains a launch title)
-I do not see myself ever purchasing a PS3.
Each system offers me something. The DS gives me portability and a growing library of games, including some real gems. The XBox 360 gives me Live, wireless controllers, and games that I would like to play: Oblivion, Fight Night Round 3, etc. Wii is my only 'must-have' system. A low price, innovation, a lot of developer support, and established exclusive franchises. Not to mention the ability to play tons of SNES, NES, and Sega games. PS3 offers me... nothing. The only thing that would possibly entice me is the number of RPG's traditionally available on the PS systems, but I never bought a PS2 and the price was MUCH lower. Final Fantasy isn't worth $700 to me.
Every discussion I have had regarding consoles has ended the same way. People who don't have a 360 yet plan on getting one within the year, everyone wants the Wii, and the PS3 gets a big, "So what?"
The PS3 may have the hardware advantage, but that's all it offers. From a gaming perspective, Sony has yet to give me one good reason to spend my money on a console with technology that won't be fully utilized until about two years after release, a video disc format that won't be widespread until at least a year after release (if ever), and HD (which I don't have).
At least the scientists are getting something out of it. Now if Sony gave me a system that could do complex COMSOL models, I'd be interested.
IF IBM was the maker of the chip they would most certainly not sell them at a loss. Why should they? Sony might sell the console at a loss to recoup the loss from game sales but IBM has no way to recoup any losses.
Then again IBM is in a partnership with Sony and Toshiba so the chip is probably owned by this partnership and Sony will just be making the chips it needs itself.
So any idea that IBM is selling Cells at a loss is insane.
Then the cost of the PS3 is mostly claimed to be in the Blu-ray drive tech. Not going to be of much interest to a science setup, is it? Even if they want to use a Blu-ray drive they need just 1 in a 1000 cell rig. Not going to break the bank.
No, the cell will be cheap because when you run an order of millions of identical CPUs prices drop rapidly. There might even be a very real market for cheap cells. Regular CPUs always have lesser quality versions. Not a problem for an Intel or AMD who just badge them Celeron or whatever, but you can't do that with a console processor. All cell processors destined for the PS3 must be of similar spec.
So what to do with a cell chip that has one of the cores defective? Throw it away OR rebadge it and sell it for blade servers? That is where Celerons come from (defective cache).
We already know that the cell processor is going to be sold for other purposes than the PS3. IBM has a line of blade servers coming up that will use the cell.
No, I am afraid that it will be perfectly possible to buy Cells and they will be sold at a profit just like any other CPU. Nothing special about it. They will however benefit greatly from the fact that they already have a large customer lined up. Regular CPUs need to recover their costs as quickly as possible because their success will be uncertain. This is why regular top end CPUs are so fucking expensive. But the Cell already has an order for millions, meaning the costs can be spread out in advance over all those units.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
"evaluated the processor's performance in running several scientific application kernels"
Translation: We compiled 2.4 and 2.6 on it and ran convert on a bunch of TIFF images for a couple days.
"then compared this performance against other processor architectures."
Translation: XP wouldn't activate.
Join the Slashcott! Feb 10 thru Feb 17!
Hmmm Betty. The cat did a whoopsee on me cell processor!
The DP performance of the cell isn't that good. You can get that with FPGAs today, and beat that with other chips. When they can get that DP performance on par with the SP performance, even 1/2 of it would be fine, then it will be meaningful.
The real issue here has nothing to do with the performance and capabilities of the cell processor. The real issue is, can I make a copy, contract out my own fab, and make it without anyone else's permission? If I can, then it will be successful; if I can't, then it is just another proprietary technology that won't give the end user any real advantage over the long term - and thus no real reason to switch from more commoditized technologies.
Yes, but the Cell is designed to process data in independent packages which are scheduled and sent to processors by the central unit, it's not a traditional multiprocessor system. Hmm, I guess that from the specs the processors could be communicating via the network instead of just buses as well, which would make what you say correct. I guess we should wait and see.
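For what it's worth, here is a minimal sketch of the "independent packages" model described above, in plain Python rather than anything Cell-specific. The work is cut into self-contained chunks and handed to a pool of workers, so the worker count is just a parameter. The names `process_chunk`, `run` and `n_workers` are made up for illustration, and a thread pool is only a stand-in for SPEs; whether real Cell software scales like this is exactly what the posts above are unsure about.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical per-package kernel: it needs only its own data."""
    return sum(x * x for x in chunk)


def run(data, n_workers, chunk_size=1000):
    """Split data into independent packages and farm them out to n_workers."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


data = list(range(10_000))
# Same code, different worker counts, same answer.
result_1 = run(data, n_workers=1)
result_8 = run(data, n_workers=8)
```

In CPython the threads here will not actually make this faster; the point is only the structure. Code written this way does not care how many workers exist, while code with lots of communication between the packages would not split up so neatly.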
Some architectures are better for some things than other architectures. A prime example would be the DSP. It is optimized for a certain kind of calculation. For those it is better than general purpose architectures by orders of magnitude.
Remember the math coprocessor? Back when I was using a 286 cpu, I bought a math coprocessor for $800 so I could do CAD. Maybe someone could put a cell chip on a daughter board to improve the math ability of a regular desktop computer.
Sounds like this cpu would end up having great folding performance. I so hope the PS3 ends up being hackable and we get to throw Linux on it ;-)
The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations. Although these scientists may be interested in any potential speed-up to their workflow, they are not willing to invest any of their time in translating their whole codebase to asm-optimized C. Thus, the ball is in the hands of software developers, not scientists.
I'm jack's useless sig
FTA: While their current analysis uses hand-optimized code on a set of small scientific kernels, the results are striking. On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors,
The Cell processor may be faster but how easy is it to implement an optimizing development system that eliminates the need to hand-optimize the code? Is not programming productivity just as important as performance? I suspect that the Cell's design is not as elegant (from a programmer's POV) as it could have been, only because it was not designed with an elegant software model in mind. I don't think it is a good idea to design a software model around a CPU. It is much wiser to design the CPU around an established model. In this vein, I don't see the cell as a truly revolutionary processor because, like every other processor in existence, it is optimized for the algorithmic software model. A truly innovative design would have embraced a non-algorithmic, reactive, synchronous model, thereby killing two birds with one stone: solving the current software reliability crisis while leaving other processors in the dust in terms of performance. One man's opinion.
All MP machines have: communication channels, and processors. If the designers envisioned it being used a certain way and optimized it for that, well, what of it? Maybe that's how the standard game API does things but, it's still processors and communication channels. It's more than likely you can get better performance out of it by adapting your problem for it specifically, minimizing communication and keeping processors busy as much as the problem allows, same as for all other MP systems.
Someday we'll all be negroes
x86, the commodity, has registers from the days when RAM was faster than the CPU (ie 8-bit days)
The tacked on FPU, MMX, SSE SIMD stuff whilst welcome still leaves few registers for program use
The PowerPC on the other hand has a nice collection of regs, and as good if not better SIMD. The Cell goes a big step further
More regs = more variables in the CPU = higher bandwidth of calculation
be they regular regs or SIMD regs.
That plus the way it handles cache
Could be a pig to program without the right kind of compiler optimizing
Would that mean game developers using FORTRAN 95?
Over the last several decades, there have been lots of parallel architectures, many significantly more innovative and powerful than Cell. If Cell succeeds, it's not because of any innovation, but because it contains fairly little innovation and therefore doesn't require people to change their code too much.
One thing that Cell has that previous processors didn't is that the PS3 tie-in and IBM's backing may convince people that it's going to be around for a while; most previous efforts suffered from the problem that nobody wanted to invest time in adapting their code to an architecture that was not going to be around in a few years anyway.
I thought the Cell's performance was mediocre if you only had a single task going on at a time. Given that scientific simulations aren't real time, it doesn't need to be hugely multithreaded as it's better for each tick/frame/etc of the simulation to be done one after the other.
(a) lend themselves extremely well to optimisation
(b) lend themselves extremely well to incorporation in subroutine libraries
(c) tend to isolate the most compute-intensive low-level operations used in scientific computation
SGEMM
If you read the article, you will find (among others) a reference to an operation called "SGEMM". This stands for Single precision General Matrix Multiplication. This is the sort of routine that makes up the BLAS library (Basic Linear Algebra Subprograms) (see e.g. http://www.netlib.org/blas/). High performance computation typically starts with creating optimised implementations of the BLAS routines (if necessary handcoded at assembler level), sparse-matrix equivalents of them, Fast Fourier routines, and the LAPACK library.
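To make the SGEMM operation concrete, here is a naive reference sketch in Python of what it computes, C := alpha*A*B + beta*C, without the transpose options the real BLAS routine also takes. The names `sgemm_reference` and `to_f32` are invented for this example; this is not the BLAS interface and is nowhere near an optimised implementation. Single precision is only emulated by rounding through the `struct` module.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sgemm_reference(alpha, a, b, beta, c):
    """Naive C := alpha*A*B + beta*C on lists of lists (no transposes)."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Round after every multiply and add to mimic single precision.
                acc = to_f32(acc + to_f32(a[i][p] * b[p][j]))
            out[i][j] = to_f32(alpha * acc + beta * c[i][j])
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = sgemm_reference(2.0, a, b, 3.0, c)
```

The triple loop is the whole operation. Everything an optimised library adds (blocking for cache, SIMD, hand-scheduled assembler) is about doing these same multiply-adds faster.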
ATLAS
There is a general movement away from optimised assembly language coding for the BLAS, as embodied in the ATLAS software package (Automatically Tuned Linear Algebra Software; see e.g. http://math-atlas.sourceforge.net/). The ATLAS package provides the BLAS routines but produces fairly optimal code on any machine using nothing but ordinary compilers. How? If you run a makefile for the ATLAS package, it may take about 12 hours or so to compile (depending on your computer of course; this is a typical number for a PC). In this time the makefile will simply run through multiple compiler switches for the BLAS routines and run test suites for all its routines for varying problem sizes. And then it picks the best possible combination of switches for each routine and each problem size for the machine architecture on which it's being run. In particular it takes account of the size of caches. That's why it produces much faster subroutine libraries than those produced by simply compiling e.g. the BLAS routines with an -O3 optimisation switch thrown in.
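The "try the variants, time them, keep the winner" idea can be sketched in a few lines of Python. This is not how ATLAS is actually implemented; it only illustrates empirical tuning, assuming a single tunable (the block size of a tiled matrix multiply). The names `matmul_blocked` and `autotune` and the candidate block sizes are made up for the example.

```python
import time


def matmul_blocked(a, b, n, block):
    """n x n matrix multiply on flat row-major lists, tiled by `block`."""
    c = [0.0] * (n * n)
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i * n + k]
                        for j in range(jj, min(jj + block, n)):
                            c[i * n + j] += aik * b[k * n + j]
    return c


def autotune(n, candidates, repeats=3):
    """Time each candidate block size on this machine; return the fastest."""
    a = [float(i % 7) for i in range(n * n)]
    b = [float(i % 5) for i in range(n * n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_blocked(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings


best_block, timings = autotune(n=48, candidates=[4, 8, 16, 48])
```

Which block size wins depends on the machine it runs on, which is the whole point: the answer is measured, not guessed. In an interpreted language the differences are mostly noise; in compiled code the cache effects described above are what such a search picks up.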
Specially tuned versus automatic?: MATLAB
The question is of course: who wins? Specially tuned code or automatic optimisation? This can be illustrated with the example of the well-known MATLAB package. Perhaps you have used MATLAB on PC's, and wondered why its matrix and vector operations are so fast? That's because for Intel and AMD processors it uses a special (vendor-optimised) subroutine library (see http://www.mathworks.com/access/helpdesk/help/techdoc/rn/r14sp1_v7_0_1_math.html). For SUN machines, it uses SUN's optimised subroutine library. For other processors (for which there are no optimised libraries) MATLAB uses the ATLAS routines. Despite the great progress and portability that the ATLAS library provides, carefully optimised libraries can still beat it (see the Intel Math Kernel Library at http://www.intel.com/cd/software/products/asmo-na/eng/266858.htm).
Summary
In summary:
-large tracts of Scientific computation depend on optimised subroutine libraries
-hand-crafted assembly-language optimisation can still outperform machine-optimised code.
Therefore the objections that the hand-crafted routines described in the article distort the comparison or are not representative of real-world performance are invalid.
However ... it's so expensive and difficult that you only ever want to do it if you absolutely must. For scientific computation this typically means that you only consider handcrafting "inner loop primitives" such as the BLAS routines, FFT's, SPARSEPACK routines etc. for this treatment, and that you just don't attempt to do that yourself.
By the end of the article, I was looking for their idea of a hypothetical best-case pony.
So, that means that the Cell in its current design is 14/8 = 1.75 times slower for double precision than an Opteron/Itanium is for single precision. I searched around but couldn't find a good answer: what is the ratio between an Opteron/Itanium's single and double precision performance? If it's actually just 50% slower (as I think it is) then the Cell is still slower (currently 75%).
So, does anyone know for sure what the ratio is between an Opteron/Itanium's single and double precision performance?
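The arithmetic above can be written out as a small sketch, taking the 8x and 14x figures from the post at face value (neither is checked against the paper here). The function name `cell_dp_vs_other_dp` is made up, and the two values tried for the Opteron/Itanium single-to-double ratio are pure assumptions, since that ratio is exactly the open question.

```python
def cell_dp_vs_other_dp(cell_sp_speedup, cell_sp_over_dp, other_sp_over_dp):
    """Ratio of Cell DP throughput to the other chip's DP throughput.

    All inputs are ratios; the other chip's SP throughput is the unit.
    """
    cell_dp = cell_sp_speedup / cell_sp_over_dp   # e.g. 8 / 14
    other_dp = 1.0 / other_sp_over_dp
    return cell_dp / other_dp


# Figures from the post: Cell is 8x the other chip in SP,
# and Cell DP is 14x slower than Cell SP.
# Assumption 1: the other chip's DP runs at half its SP rate.
ratio_if_half = cell_dp_vs_other_dp(8, 14, 2.0)
# Assumption 2: the other chip's SP is only 1.5x its DP rate.
ratio_if_two_thirds = cell_dp_vs_other_dp(8, 14, 1.5)
```

Under the first assumption the Cell comes out slightly ahead in double precision (8/7 of the other chip); under the second it is slightly behind (6/7). So with these inputs the conclusion really does hinge on the ratio being asked about.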
"I don't mind God, it's his fan club I can't stand!" E8
The Cell has 8 vector processors and something like a PPC to "control" all of them; it's built specially for FP operations. It's like comparing a GPU with a CPU, which doesn't make much sense.
Check if this was sponsored by the same marketing team that was running ads that kept peddling the lackluster g4 as a supercomputer on the national watchlist.
Did you mean Fermilab, or am I not keeping up with scientific progress? :-)
I love how they manage to completely ignore all the other vector-type architectures already in the market, and just compare it to Intel/AMD which are not even designed for floating point performance.
;)
Scream "my computer beats your abacus" all you want.
But then it is from Berkeley, so that's normal.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Here you go. Enjoy.
http://www.mc.com/cell/
I paid $1800 for the 1080p monitor to use with the PS3. I am so future proofed it isn't funny. :D
The sad thing is that I don't game. I want to learn how to program the processors in this machine. I can picture a stack of 10 of these game consoles just pumping out rendered frames of animation.
The authors discuss hand tuning and assembler coding for Cell, but not necessarily for the other processors. Their 2D FFT results, for example, are a factor of 10 slower than others I have seen. Also, for the IA64 and Opteron, the performance of many of these numerical kernels is highly dependent on the compiler used. The IA64 especially is very sensitive to compiler optimization to keep the 6 pipeline slots busy and also generate memory prefetch instructions at the right time to prevent stalling. As often seems to occur in these sorts of HPC comparisons, they spend a lot of time hand optimizing for a particular platform, and compare it to other platforms that have not necessarily received the equivalent effort. As has been noted above, how much time you have to spend developing, debugging, and tuning a code matters a lot. This is particularly true for research codes. Finally, who uses single precision for scientific computing anymore? Any field that I am aware of that would use large FFTs, large linear algebra solvers, etc. requires at least double precision to get anything meaningful.
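A toy illustration of the single versus double precision point, as a sketch only: it repeatedly adds 0.1 and compares the drift in each precision. Single precision is emulated by rounding through the `struct` module after every addition; `to_f32` and `accumulate` are names invented for this example, and a real FFT or linear solver accumulates error in more complicated ways than a plain running sum.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, n, single):
    """Add `step` to a running total n times, optionally in single precision."""
    total = 0.0
    if single:
        step = to_f32(step)
    for _ in range(n):
        total += step
        if single:
            total = to_f32(total)  # emulate a float32 accumulator
    return total


n = 100_000
exact = n * 0.1
err_single = abs(accumulate(0.1, n, single=True) - exact)
err_double = abs(accumulate(0.1, n, single=False) - exact)
```

The single-precision total drifts by orders of magnitude more than the double-precision one over the same number of additions, which is the kind of behaviour that pushes long-running numerical codes towards double precision.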
What are you running your renderer on now? Or is this power lust? You'll pay a heavy price, especially in your time. I regret doing this myself.
that the Cell will be "cheap" compared to other supercomputers? (yeah, I know I can't really compare a chip with a supercomputer, but you get the idea). Not to mention more energy efficient (which in the case of HPC also saves money to a point where it's significant, right?).
And when I say cheap, I mean: "so much cheaper we can use the savings to hire a real programmer to do the optimizing for us, and then buy some extra processing power on top of that! And throw a party!"
slightly offtopic: Half a year ago I interviewed Peter Hofstee, chief architect of the Cell, for the student union for physics, mathematics and informatics of the University of Groningen, the Netherlands (he studied at our university and did quite a few things for our student union). The interview was actually going to be about him and his career after his study, but when I got to the point of "what are you doing now?", it automatically centered on the Cell chip as he had just finished designing it. Most of the jargon was lost on me (studying physics myself), though he sounded very enthusiastic and convincing. However, it appears someone who does understand what he's talking about had an almost identical interview recently. The advantage I mentioned above is something I quite clearly remember from that interview. DodgeK
If the chip runs fast with some hand-optimization, then it will get done. Just follow the money. Sheesh!
I don't trust atoms -- they make up stuff.
However, the stencil code and SpMV kernels were actually coded up and simulated for the paper. They were then run (exact same code) on real hardware (a 2.1GHz prototype machine) and those results were presented at the EDGE workshop last week. The hardware performance was pretty close to the simulator (the more computationally bound the kernel, the more accurate the simulator)
The X1E MSP is certainly a vector processor, and we ran the same kernels on it and presented them in the paper. It would certainly not be considered a commodity processor though. We wanted a nice sample set of architectures: superscalar, VLIW, and vector.
So if somebody writes a couple of dozen standard routines that crank the number-crunching part of the Cell processor well, and there's a halfway-adequate compiler for the conventional-processing side, you can still get a big win from a small budget.
I did a lot of scientific-style programming on VAXes in the early-mid 80s, and my iPod Shuffle has more CPU, more disk-equivalent, faster I/O bus, and probably more RAM (? not sure, but all the non-shuffle versions do.) Our applications sped up by 2 orders of magnitude once we could get enough RAM :-)
Bill Stewart