Slashdot Mirror


The Potential of Science With the Cell Processor

prostoalex writes "High Performance Computing Newswire is running an article on a paper by computer scientists at the U.S. Department of Energy's Lawrence Berkeley National Laboratory. They have evaluated the Cell processor's performance in running several scientific application kernels, then compared this performance against other processor architectures. The full paper is available from the Computer Science department at Berkeley."

176 comments

  1. Cell + Linux = success by Anonymous Coward · · Score: 3, Funny

    OS X is closed source. This means that it is the work of the devil - its purpose is to make the end users eat babies.

    Linux is the only free OS. Yes the BSD licenses may appear more free, but as they have no restrictions, they are actually less free than the GPL. You see, restricting the end user more actually makes them more free than not putting restrictions on them. You must be a dumb luser for not understanding this.

    And you obviously don't have a real job. A real job involves being a student or professional academic. You see, academics are the ones who know all about productivity - if you work for a commercial organisation you obviously do not know anything about computers. Usability is stupid. What's wrong with the command line? If you can't use the command line then you shouldn't be using a computer. vi should be the standard word processor - you are such a luser if you want to use Word. Installing software should have to involve recompiling the kernel of the OS. If you don't know how to do this, you are a stupid luser who should RTFM. Or go to a Linux irc channel or newsgroup. After all, they are soooo friendly. If you don't know how the latest 2.6 kernel scheduling algorithm works then they will tell you to stop wasting their time, but they really are quite supportive.

    Oh, and M$ is just as evil as Apple. Take LookOUT for instance. You could just as easily use Eudora. Who needs groupware anyway, a simple email client should be all we use (that's all we use as academics, why can't businesses be any different).

    And trend setters - Linux is the trend setter. It may appear KDE is a ripoff from XP, but that's because M$ stole the KDE code. We all know they have GPL'ed code hidden in there somewhere (but not the things that don't work, only the things that work could possibly have GPL'ed code in it).

    And Apple is the suxor because they charge people for their product. We all know that it's a much better business model to give all your products away for free. If you charge for anything, then you are allied with M$ and will burn in hell.

    1. Re:Cell + Linux = success by Anonymous Coward · · Score: 0

      You, sir, are a troll... If you were not AC, you would be a karma whore. Excellent FP tho :-p

  2. Re:PS3 will rule in 2008 by Watson+Ladd · · Score: 0, Offtopic

    No. I think the PS3 will not have a large price drop until after the market is largely saturated. Enough about consoles!

    --
    Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
  3. What about the compiler? by Watson+Ladd · · Score: 2, Insightful

    The paper did a lot of hand-optimization, which is irrelevent to most programmers. What gcc -O3 does is way more importent then what an assembly wizard can do for most projects.

    --
    Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
    1. Re:What about the compiler? by Anonymous Coward · · Score: 5, Insightful

      Hand optimization _is_ relevant to scientific programmers

    2. Re:What about the compiler? by TommyBear · · Score: 5, Insightful

      Hand optimizing code is what I do as a game developer and I can assure you that it is very relevant to my job.

    3. Re:What about the compiler? by suv4x4 · · Score: 2, Interesting

      The paper did a lot of hand-optimization, which is irrelevent to most programmers. What gcc -O3 does is way more importent then what an assembly wizard can do for most projects.

      Actually bullshit. We're talking scientific applications here, and it's not uncommon that programs written to run on supercomputers *are* optimized by an assembly wizard to squeeze every cycle out of it.

    4. Re:What about the compiler? by maximthemagnificent · · Score: 1

      Hard science is exactly the sort of application that would employ an assembly programmer to optimize code.

    5. Re:What about the compiler? by Anonymous Coward · · Score: 0

      Did you RTFA? "Most" programmers aren't doing High Performance Computing (HPC).

    6. Re:What about the compiler? by C.A.+Nony+Mouse · · Score: 1

      That games can be written to run well on Cell is not news. That the same might be true for scientific code is.

      --
      J
    7. Re:What about the compiler? by Anonymous Coward · · Score: 2, Informative

      Insightful? Ah... no.

      Scientific users code to the bleeding edge. You give them hardware that blows their hair back and they will figure out how to use it. You give them crappy painful hardware (MasPar, CM*) that is hard to optimize for, then they probably won't use it.

      Assembly language optimization is not a big deal. Right now the biggest thing bugging me is that I have to rewrite a core portion of a code to use SSE, since SSE is so limited for integer support. As this is a small amount of work, and the potential gains are so large (about 4x), it doesn't make sense not to do this. Some of it will be hand coded and optimized assembler. This is how we have to program. Scientists need the fastest possible cycles, and as many of them as possible ... at least the ones I know need this. There are a few who do all their analysis on Excel spreadsheets. They don't need much in the way of speed. The rest of us do.

    8. Re:What about the compiler? by Anonymous Coward · · Score: 0
      What gcc -O3 does is way more importent then what an assembly wizard can do for most projects.


      For a word processor? You are right.

      For a scientific library? No, you are dead wrong.

      This particular article was about scientific number crunching, no?
    9. Re:What about the compiler? by Anonymous Coward · · Score: 0

      However, virtually no scientific programmers do hand optimization of assembly, even for giant programs running on big clusters. So that's irrelevant to resolving the question of whether the Cell is suitable for scientific programming.

    10. Re:What about the compiler? by Watson+Ladd · · Score: 1

      Most projects, not some super-expensive code. Sure, fast APIs like BLAS will use hand-written assembler, but it takes a compiler to find those optimizations that are too complex to do by hand or hard to find while being easy to do. And the assembler advantage is negative on some RISC processors now, due to advances in compiler design. So gcc -O3 might outperform asm, in which case gcc -O3 is what matters, since nobody will want to use asm when gcc can beat it. But I haven't seen anything about how true this is for the Cell.

      --
      Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
    11. Re:What about the compiler? by samkass · · Score: 5, Insightful

      What seems to be more important than that is:

      "According to the authors, the current implementation of Cell is most often noted for its extremely high performance single-precision (32-bit) floating performance, but the majority of scientific applications require double precision (64-bit). Although Cell's peak double precision performance is still impressive relative to its commodity peers (eight SPEs at 3.2GHz = 14.6 Gflop/s), the group quantified how modest hardware changes, which they named Cell+, could improve double precision performance."

      So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.
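      As a rough check on the 14.6 Gflop/s figure (and on the "fourteen times slower" ratio quoted later in the thread), here is a peak-throughput sketch. The issue rates are assumptions about the SPE pipelines, not numbers taken from the article: single precision as a 4-wide fused multiply-add issued every cycle, double precision as a 2-wide fused multiply-add that can issue only once every 7 cycles.

```python
# Peak-throughput arithmetic for eight SPEs at 3.2 GHz.
# ASSUMED issue rates: SP = 4-wide FMA every cycle; DP = 2-wide FMA every 7 cycles.
CLOCK_HZ = 3.2e9
NUM_SPES = 8


def peak_gflops(lanes, flops_per_lane, issue_interval_cycles):
    """Peak Gflop/s summed over all SPEs for one SIMD FMA pipeline."""
    flops_per_cycle = lanes * flops_per_lane / issue_interval_cycles
    return NUM_SPES * flops_per_cycle * CLOCK_HZ / 1e9


sp_peak = peak_gflops(lanes=4, flops_per_lane=2, issue_interval_cycles=1)  # 204.8
dp_peak = peak_gflops(lanes=2, flops_per_lane=2, issue_interval_cycles=7)  # about 14.6

print(f"SP {sp_peak:.1f} Gflop/s, DP {dp_peak:.1f} Gflop/s, ratio {sp_peak / dp_peak:.1f}x")
```

      Under those assumptions the double-precision peak comes out at about 14.6 Gflop/s, with single precision 14 times higher, which lines up with the article's figures.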

      --
      E pluribus unum
    12. Re:What about the compiler? by Anonymous Coward · · Score: 1, Insightful

      Methinks that the point was that if a GAME development company is going to fork over the cash for ASM wizards, a company spending a few hundred mil. building a super-computer might just consider doing the same. Maybe.

      And I know from Uni that many profs WILL hand optimize code for complex, much used algorithms. Then again, some will just use matlab.

    13. Re:What about the compiler? by JanneM · · Score: 3, Informative

      Hand optimizing code is what I do as a game developer and I can assure you that it is very relevant to my job.

      It makes sense for a game developer - and even more an embedded developer. You spend the time to optimize once, and then the code is run on hundreds of thousands or millions of sites, over years. The time you spend can effectively be amortized over all those customers.

      For scientific software the calculation generally changes. You write code, and that code is typically used in one single place (the lab where the code was written), and only run a comparatively few times, indeed sometimes only once.

      For a game developer to spend three months extra to shave a few seconds off one run of a piece of code makes perfect sense. For an embedded developer, spending a couple of months' worth of development cost to be able to use a slower, cheaper chip, and so shaving a dollar off the production cost of perhaps tens of millions of gadgets, makes sense.

      For a graduate student (cheap as they are in the funny-mirror economics of science) to spend three months to make one single run of a piece of software run a few hours faster does not make sense at all.

      In fact, disregarding the inherent coolness factor of custom hardware, in most situations it just doesn't pay to make custom stuff for science when you can just run it for a little longer to get the same result. Indeed, not infrequently have I heard about labs spending the time and effort to make custom stuff, but by the time they're done, the off the shelf hardware had already caught up.

      --
      Trust the Computer. The Computer is your friend.
    14. Re:What about the compiler? by penguin-collective · · Score: 2, Insightful

      Except for a tiny minority of specialists, most scientific programmers, even those working on large-scale problems, have neither the time nor the expertise to hand-optimize. Many of them don't even know how to use optimized library routines properly.

    15. Re:What about the compiler? by FromWithin · · Score: 2, Informative

      So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.

      Did you not read the last bit?

      On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors, despite the fact that Cell's peak double precision performance is fourteen times slower than its peak single precision performance. If Cell were to include at least one fully utilizable pipelined double precision floating point unit, as proposed in their Cell+ implementation, these speedups would easily double.

      So it's really great already. If it was tweaked a bit, it would be ludicrously great.

    16. Re:What about the compiler? by cfan · · Score: 2, Interesting

      >So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.

      No, the Cell is great because, as the pdf shows, it has an incredible Gflops/Power ratio, even in its current configuration.

      For example, here are the double-precision Gflop/s obtained on a 2D FFT:

      Size    Cell+   Cell   X1E    AMD64   IA64
      1K^2    15.9    6.6    6.99   1.19    0.52
      2K^2    26.5    6.7    7.10   0.19    0.11

      So a single, normal Cell can be compared with the processor of a Cray X1E (which uses 3 times more power and costs a lot more).
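      To turn the table into the speedups people are arguing about, divide the Cell column by each competitor's column. A minimal sketch using only the numbers posted above:

```python
# Double-precision 2D FFT throughput in Gflop/s, as posted in the comment.
RESULTS = {
    "1K^2": {"Cell+": 15.9, "Cell": 6.6, "X1E": 6.99, "AMD64": 1.19, "IA64": 0.52},
    "2K^2": {"Cell+": 26.5, "Cell": 6.7, "X1E": 7.10, "AMD64": 0.19, "IA64": 0.11},
}


def speedup(size, other, base="Cell"):
    """How many times faster `base` is than `other` at the given FFT size."""
    row = RESULTS[size]
    return row[base] / row[other]


for size in RESULTS:
    for other in ("X1E", "AMD64", "IA64"):
        print(f"{size}: Cell is {speedup(size, other):.1f}x {other}")
```

      On these numbers the stock Cell lands at roughly 0.9x the X1E, while running about 5.5x (1K^2) to 35x (2K^2) faster than the AMD64 part.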

    17. Re:What about the compiler? by john.r.strohm · · Score: 4, Interesting

      Irrelevant to most C/C++ code wallahs doing yet another Web app, perhaps.

      Irrelevant to people doing serious high-performance computing, not hardly.

      I am currently doing embedded audio digital signal processing. On one of the algorithms I am doing, even with maximum optimization for speed, the C/C++ compiler generated about 12 instructions per data point, where I, an experienced assembly language programmer (although having no previous experience with this particular processor) did it in 4 instructions per point. That's a factor of 3 speedup for that algorithm. Considering that we are still running at high CPU utilization (pushing 90%), and taking into account the fact that we can't go to a faster processor because we can't handle the additional heat dissipation in this system, I'll take it.

      I have another algorithm in this system. Written in C, it is taking about 13% of my timeline. I am seriously considering an assembly language rewrite, to see if I can improve that. The C implementation as it stands is correct, straightforward, and clean, but the compiler can only do so much.

      In a previous incarnation, I was doing real-time video image processing on a TI 320C80. We were typically processing 256x256 frames at 60 Hz. That's a little under four million pixels per second. The C compiler for that beast was HOPELESS as far as generating optimal code for the image processing kernels. It was hand-tuned assembly language or nothing. (And yes, that experience was absolutely priceless when I landed on my current job.)
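      The pixel-rate arithmetic is worth spelling out, since it sets the per-pixel cycle budget that made hand-tuned assembly necessary. The 50 MHz clock used at the end is a hypothetical figure for illustration, not the C80's documented rate, and the budget ignores the chip's multiple processors and any overhead.

```python
FRAME_W, FRAME_H, FPS = 256, 256, 60  # figures from the comment

pixels_per_second = FRAME_W * FRAME_H * FPS  # 3,932,160: "a little under four million"
frame_budget_ms = 1000 / FPS                 # time available per frame


def cycles_per_pixel(clock_hz):
    """Cycles available per pixel at a given clock rate (one core, no overhead)."""
    return clock_hz / pixels_per_second


# 50e6 is a HYPOTHETICAL clock rate, used only to show the size of the budget.
print(pixels_per_second, round(frame_budget_ms, 2), round(cycles_per_pixel(50e6), 1))
```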

    18. Re:What about the compiler? by Anonymous Coward · · Score: 0

      In a previous incarnation, I was doing real-time video image processing on a TI 320C80. We were typically processing 256x256 frames at 60 Hz. That's a little under four million pixels per second. The C compiler for that beast was HOPELESS as far as generating optimal code for the image processing kernels. It was hand-tuned assembly language or nothing. (And yes, that experience was absolutely priceless when I landed on my current job.) Ahh, the wonders of Code Composer Studio. :)

    19. Re:What about the compiler? by Angstroman · · Score: 1
      So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.

      The HPC world is substantially different from either gaming or "normal" application programming. The strong draw of the cell is that it is a production core with characteristics that are important to High Performance Computing, particularly power dissipation per flop. While conventional applications target getting the most out of a processor, HPC applications center on scalability in number of processors. This means running the largest number of processors for a given power/cooling supply, and maintaining the lowest latency in interprocessor communication. The latter is closely related to the physical layout of the processor array, which is also dependent upon cooling strategy. Hand coding, or at least hand optimization of the code, is reasonable for these applications. The resulting improvement can make possible calculations that would otherwise not be accomplished. As the number of processors increases substantially, the leading issue shifts from local execution speed to load balancing. Load balancing requires at least an initial "hand code" for a given architecture in any event.
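      As an illustration of what an initial hand-written load balance might look like, here is one simple static scheme: greedy longest-processing-time-first assignment. This is a generic textbook heuristic, not something the poster described, and the task costs are made up.

```python
import heapq


def lpt_assign(task_costs, num_workers):
    """Longest-processing-time-first: each task goes to the least-loaded worker.

    Returns (assignment, makespan), where assignment maps worker -> task indices.
    """
    heap = [(0.0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    # Place the most expensive tasks first so small ones can fill the gaps.
    by_cost = sorted(range(len(task_costs)), key=lambda t: task_costs[t], reverse=True)
    for task in by_cost:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + task_costs[task], worker))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


assignment, makespan = lpt_assign([7, 5, 4, 3, 3, 2], num_workers=2)
print(assignment, makespan)
```

      Static schemes like this only work when task costs are known up front; as processor counts grow, dynamic schemes (work queues, work stealing) usually take over.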

      There are several application spaces for HPC. Some, like semantic network processing do not require double precision and can be mounted on cell processors as they stand. Those which are fundamentally based on massive differential equation solution would benefit from the double precision modification. The key point here is that the double precision pipeline unit is a modification, not a different core. It is likely that IBM can make such a change at a fraction of the cost of the original core development with benefits not only to the HPC community, but also to potential workstation use.

      The bottom line is that one can easily be misled trying to think of HPC architectures and programming from the familiar standpoint of game and web server development.

    20. Re:What about the compiler? by Gromius · · Score: 1

      I'm a particle physicist. Our computing needs are insane but massively parallel; basically the grid is being developed for us and us alone, although we figure that some other people might find a use for it. We spend the vast majority of our day-to-day job programming. And we're, with only a few exceptions, piss poor at it. Forget hand-optimized assembly, I'm currently fighting a losing battle to stop people using x = pow(y,2) (and I have found that in our base software package, one supposedly written by the experts). However, the solution usually is just to buy a faster machine to run it on.

    21. Re:What about the compiler? by Frumious+Wombat · · Score: 1

      Actually, for my field (Chemistry), what GCC -O3 does is irrelevant, except during the development phase of a program, or as a last resort for portability. We care about what the fastest native compiler we can find + optimized libraries does. The Cell will be no different; a few hand-optimized routines such as BLAS, FFTPack, etc, in libraries, then an auto-vectorizing Fortran-95 compiler on top. I will be interested in seeing how packages such as GAMESS or NWChem http://www.emsl.pnl.gov/docs/nwchem/nwchem.html/ behave once Fortran is available, and Cell shipped in something other than game consoles.

      On the other hand, the GROMACS guys http://www.gromacs.org/, who write hand-optimized code on a per-processor basis, ought to be stoked. It already runs well using single-precision, so it looks to be tailor-made to a Cell-based setup.

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
    22. Re:What about the compiler? by Anonymous Coward · · Score: 0

      I was once working on data analysis involving an entire year of data. The algorithm we were using was weighted least squares, which is O(n^2). The data analysis would have taken a few weeks if it weren't for some clever optimisations. So I don't think the time I spent on that is wasted time.
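      The poster doesn't say what the optimisations were. One common source of O(n^2) cost in weighted least squares is building a dense n-by-n weight matrix W; when the errors are independent, W is diagonal and the normal equations need only running sums. A minimal sketch for a straight-line fit, under that independence assumption:

```python
def weighted_line_fit(xs, ys, ws):
    """Fit y = a + b*x by weighted least squares in a single O(n) pass.

    Assumes independent errors, so the weights are per-point scalars rather
    than a dense n-by-n matrix.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, w in zip(xs, ys, ws):
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    # Solve the 2x2 normal equations [[sw, swx], [swx, swxx]] @ [a, b] = [swy, swxy].
    det = sw * swxx - swx * swx
    if det == 0:
        raise ValueError("degenerate data: need at least two distinct weighted x values")
    b = (sw * swxy - swx * swy) / det
    a = (swy - b * swx) / sw
    return a, b


a, b = weighted_line_fit([0, 1, 2, 3], [2, 5, 8, 100], [1, 1, 1, 0])
print(a, b)  # zero weight on the outlier: 2.0 3.0
```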

    23. Re:What about the compiler? by Anonymous Coward · · Score: 0

      For crying out loud....

      importAnt thAn

      Have you learnt the alphabet yet?

    24. Re:What about the compiler? by statusbar · · Score: 1

      Jeez, that reminds me of the "Database Specialists" doing "SELECT * from mytable;" and then doing a Java for() loop to find the rows they are interested in. Then they complain about the database machine being too slow, so they get it upgraded.

      How much do these new machines cost?

      How much does a competent programmer cost?

      Which one is the best option?
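      The anti-pattern is easy to reproduce with Python's built-in sqlite3 module. The table name comes from the comment; the `status` column and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO mytable (status) VALUES (?)",
    [("open" if i % 100 == 0 else "closed",) for i in range(10_000)],
)
conn.execute("CREATE INDEX idx_mytable_status ON mytable (status)")

# Anti-pattern: pull every row into the application, then filter in a loop.
slow = [row for row in conn.execute("SELECT * FROM mytable ORDER BY id") if row[1] == "open"]

# Better: let the database filter (and use the index) with a WHERE clause.
fast = conn.execute(
    "SELECT * FROM mytable WHERE status = ? ORDER BY id", ("open",)
).fetchall()

print(len(slow), len(fast), slow == fast)  # 100 100 True
```

      Both return the same 100 rows, but the first version drags all 10,000 rows out of the database to get them.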

      --jeffk++

      --
      ipv6 is my vpn
    25. Re:What about the compiler? by Anonymous Coward · · Score: 0

      It really seemed they were just talking about processor evolution. While they found the Cell to be good they saw room for improvement. They called their ideas Cell+. That's not really contradictory.

    26. Re:What about the compiler? by JanneM · · Score: 1

      The data analysis would have taken a few weeks if it weren't for some clever optimisations. So I don't think the time I spent on that is wasted time.

      It's not wasted time if the time spent optimizing is less than the time saved. So for your example, assuming it wasn't algorithmic optimizations (which are orthogonal to doing funky assembler stuff), you may save a few days on a few weeks running time. So if the optimization took a couple of days of coding it may have been worth it. Otherwise it was not.

      And for scientific apps especially, you really do have to factor in the added cost of tweaking the software - you _always_ need to tweak, often over many cycles - when part of it is as opaque and difficult to understand as assembly optimizations are (which often implies explicit use of the semi-parallel features of modern CPUs).
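      The break-even rule above can be written down directly. The figures are illustrative stand-ins for "a few weeks" of running time and "a couple of days" of coding. The `runs` parameter captures the amortization argument from earlier in the thread, and the optional cost-per-hour weights are an addition here, so you can express the view that a programmer hour is worth more than a machine hour.

```python
def optimization_pays_off(dev_hours, run_hours, speedup, runs=1,
                          dev_cost_per_hour=1.0, machine_cost_per_hour=1.0):
    """True if the (weighted) run time saved exceeds the (weighted) coding time."""
    hours_saved = runs * run_hours * (1 - 1 / speedup)
    return hours_saved * machine_cost_per_hour > dev_hours * dev_cost_per_hour


THREE_WEEKS = 3 * 7 * 24  # 504 hours of running time

# A 1.2x speedup saves 84 hours; two 8-hour days of coding cost 16.
print(optimization_pays_off(dev_hours=16, run_hours=THREE_WEEKS, speedup=1.2))  # True
# Value a programmer hour at 10 machine hours and it no longer pays.
print(optimization_pays_off(dev_hours=16, run_hours=THREE_WEEKS, speedup=1.2,
                            dev_cost_per_hour=10.0))  # False
```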

      --
      Trust the Computer. The Computer is your friend.
    27. Re:What about the compiler? by Shinobi · · Score: 1

      Well, that's where you're wrong. There are more people who hand-optimize than the academic world cares to admit, since admitting it would also mean admitting that the oh-so-sacred academic practices, as well as compiler technology and libraries, have some areas where they can't be applied efficiently.

    28. Re:What about the compiler? by adam31 · · Score: 3, Informative
      Actually bullshit.

      Actually, it's not bullshit. Simple C intrinsics code is the way to go to program the Cell... there's just no need for hand-optimized asm. Intrinsics has a poor rep on x86 because SSE sucks. 8 registers. A source operand must be modified on each instr, no MADD, MSUB, etc.

      But Cell has 128 registers and a full set of vector instructions. There's no danger of stack spills. As long as the compiler doesn't freak out about aliasing (which is easy), and it can inline everything, and you present it enough independent execution streams at once... the SPE compiler writes really, really nice code.

      The thing that does need to be hand-optimized still is the memory transfer. DMA can be overlapped with execution, but it has to be done explicitly. In fact, algorithms typically need to be designed from the start so that accesses are predictable and coherent and fit within ~180 KB. (Generally, someone seeking performance would do this step long before asm code on any platform anyway...)
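      The overlap described above is usually done with double buffering: while the SPU computes on one local-store buffer, the DMA engine fills the other. The sketch below only models the timing and issues no real DMA calls. It assumes one DMA engine that handles transfers in order, and two buffers, each of which can be refilled only after the SPU has finished with it.

```python
def serial_time(n_chunks, t_dma, t_compute):
    """Fetch a chunk, then compute on it, with no overlap."""
    return n_chunks * (t_dma + t_compute)


def double_buffered_time(n_chunks, t_dma, t_compute):
    """Finish time when chunk i+1 is fetched while chunk i is computed."""
    dma_free = 0.0            # when the DMA engine becomes idle
    spu_free = 0.0            # when the SPU finishes its current chunk
    buffer_free = [0.0, 0.0]  # when each buffer may be overwritten
    for i in range(n_chunks):
        b = i % 2
        dma_start = max(dma_free, buffer_free[b])  # need an idle engine and a free buffer
        arrived = dma_start + t_dma
        dma_free = arrived
        spu_start = max(spu_free, arrived)         # need the data and an idle SPU
        spu_free = spu_start + t_compute
        buffer_free[b] = spu_free
    return spu_free


print(serial_time(4, 2, 3), double_buffered_time(4, 2, 3))  # 20 14.0
```

      In steady state each chunk costs max(t_dma, t_compute) rather than their sum, which is why it matters whether a kernel is DMA-bound or compute-bound.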

    29. Re:What about the compiler? by SeeMyNuts! · · Score: 1

      "a company spending a few hundred mil. building a super-computer might just consider doing the same"

      Well, if they hire the typical contractor to do the work, $10 million goes towards the computer, $90 million goes towards a coffee service, and $300 million goes towards per diem.

    30. Re:What about the compiler? by fbg111 · · Score: 1

      It's a good start, a good platform upon which to expand. I would bet IBM would be willing to make Cell+, given their traditional involvement in scientific computing. But you left a key part out of your quote, that even in its current form, Cell appears to be 8x faster and more power efficient than current Opterons and Itaniums in double-precision calculations. Doubling that by making a few modifications to the silicon is probably not out of the question, though whether this would allow Cell+ the price reductions of Cell's economy of scale is another question.

      "Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency," the authors wrote. While their current analysis uses hand-optimized code on a set of small scientific kernels, the results are striking. On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors, despite the fact that Cell's peak double precision performance is fourteen times slower than its peak single precision performance. If Cell were to include at least one fully utilizable pipelined double precision floating point unit, as proposed in their Cell+ implementation, these speedups would easily double."

      --
      Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
    31. Re:What about the compiler? by netwiz · · Score: 1

      it takes a compiler to find those optimizations that are too complex to do by hand or hard to find while being easy to do.

      Say what? Um, that type of optimization doesn't exist, unless the programmer is really untalented. Most of the big opportunities should stand out like a sore thumb on a trace. Once you know what's taking all the time in the code, you can look at the way it's put together to catch the low-hanging fruit. Generally, the first 10% of the work gets you 90% of the way there. Then there's all the corner-case work, the 90% that gets you the other 10%. In both cases, so long as you've the appropriate tools, finding opportunity for speedup is relatively easy, excepting in cases where a routine needs a complete reevaluation from the ground up. This can happen if the data model's not quite right, or there's significant resource blocking. These are a real bitch because they can cause you to completely redesign large sections of code, and there's absolutely nothing an optimizing compiler will do to help in those cases.

      I should point out that a well-designed program should not encounter the last two issues, as it suggests the problem wasn't well-enough understood beforehand. You have to know exactly what you want to do with a computer before you figure out how you're going to do it.
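      The "find it on a trace first" step can be sketched with Python's built-in cProfile and pstats modules. The functions are hypothetical stand-ins: `slow_square_sum` deliberately recomputes a prefix on every iteration, which is exactly the kind of hot spot a profile makes obvious.

```python
import cProfile
import io
import pstats


def parse(records):
    return [int(r) for r in records]


def slow_square_sum(values):
    total = 0
    for i in range(len(values)):
        # Wasteful: recomputes the sum over the whole prefix each time (O(n^2)).
        total = sum(v * v for v in values[: i + 1])
    return total


def fast_square_sum(values):
    # The fix the profile points to: a single O(n) pass.
    return sum(v * v for v in values)


def pipeline(n):
    values = parse(str(i) for i in range(n))
    return slow_square_sum(values)


profiler = cProfile.Profile()
profiler.enable()
result = pipeline(500)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```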

    32. Re:What about the compiler? by Anonymous Coward · · Score: 0

      The compiler can use the hand-coded library specific to the Cell. If the Cell were a (supercharged) arithmetic logic unit with an embedded (ROM) set of calls that a compiler could access, you could still get phenomenal performance out of the unit alongside a conventional CPU. If in one machine instruction on a traditional processor you could get a 64-bit sine, cosine, tangent, or FFT result, the processor would be confiscated by the military. An array of Cell+ processors could be used to run extremely powerful simulations. I could see a multi-core processor having at least one or two Cell processors shared among 4 other cores (or even 4 traditional cores and 4 Cell processors all on one die). A machine with 8 such dies would effectively have 64 very powerful processors. That should be enough for anyone(tm)

    33. Re:What about the compiler? by Anonymous Coward · · Score: 0

      Agreed. One of the biggest uses I have for my computer today is rendering complex 3D graphics (scan-line rendering). It's frustrating to wait 4 days for what will eventually become 24 seconds of video (1.8GHz computer with 1GB of PC2700 RAM). Each frame (picture) takes about 10 minutes to render if I'm lucky. Catmull-Clark surfaces, motion blur and advanced radiosity settings make things slow down (a lot). If I could get each frame to render in 1 minute, then 24 minutes could create one second of video. One day could create a minute's worth of video. Two months' worth of rendering could create an hour's worth of video. WOOT!
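      The poster's numbers are self-consistent at 24 frames per second. That frame rate is inferred from the figures rather than stated, and "two months" is taken as 60 days. A quick check:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def video_seconds(render_days, seconds_per_frame, fps=24):
    """Seconds of finished video from render_days of rendering (fps is inferred)."""
    frames = render_days * SECONDS_PER_DAY / seconds_per_frame
    return frames / fps


print(video_seconds(4, 600))   # 24.0: 4 days at 10 min/frame
print(video_seconds(1, 60))    # 60.0: one day at 1 min/frame
print(video_seconds(60, 60))   # 3600.0: 60 days at 1 min/frame
```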

    34. Re:What about the compiler? by uarch · · Score: 1

      Be careful assuming a 3x cut in the number of instructions is a 3x speed increase.
      There's plenty of instances where that isn't true.

    35. Re:What about the compiler? by adam31 · · Score: 3, Informative
      I am also an experienced assembly programmer, and I too shared your mistrust of the compiler. However, I started SPE programming several months ago and I promise you that the compiler can work magic with intrinsics now. Knowledge of assembly is still helpful, because you need to have in mind what you want the compiler to generate... make sure it sees enough independent execution clumps that it can cover latencies and fill both the integer pipe and FP pipe, understand SoA vs AoS, etc. But you get to write with real variable names, not worry about scheduling/pairing of individual instructions or loop unrolling issues.
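      For readers who haven't met the SoA/AoS distinction: an array of structures stores x, y and z together for each element, while a structure of arrays keeps all the x values contiguous, then all the y values, and so on. The SoA layout lets a 16-byte register hold four consecutive single-precision x values, so one vector instruction processes four elements. Python has no SIMD, so this sketch only shows the layout change and a lane-independent loop; the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    """Array-of-structures element: one record per particle (hypothetical)."""
    x: float
    y: float
    z: float


def aos_to_soa(particles):
    """Regroup a list of Particle records into three contiguous coordinate lists."""
    xs = [p.x for p in particles]
    ys = [p.y for p in particles]
    zs = [p.z for p in particles]
    return xs, ys, zs


def lengths_squared(xs, ys, zs):
    # Every index is independent of the others, so in SoA form a vectorizing
    # compiler could process four float32 lanes per 16-byte register.
    return [x * x + y * y + z * z for x, y, z in zip(xs, ys, zs)]


aos = [Particle(1.0, 2.0, 2.0), Particle(0.0, 3.0, 4.0)]
soa = aos_to_soa(aos)
print(lengths_squared(*soa))  # [9.0, 25.0]
```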

      Some of my best VU routines that I spent a couple weeks hand-optimizing, I re-wrote with SPE intrinsics in an afternoon. After some initial time figuring out exactly how the compiler likes to see things, it was a total breeze. My VU code ran in 700 usec while my SPE code ran in 30 usec (@ ~1.3 IPC! Good work, compiler).

      The real worry now is becoming DMA-bound. For example, assuming you're running all 8 SPEs full-bore, and you write as much data as you read. At 25.6 GB/s, you get 3.2 GB/s per SPE, so 1.6 GB/s in each direction (assuming perfect bus utilization), so @3.2 GHz, that's 0.5 Bytes/cycle. So, for a 16-byte register, you need to execute 32 instructions minimum or you're DMA-bound!

      Food for thought.
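      The bandwidth arithmetic above, step by step. Like the comment, it assumes perfect bus utilization, equal read and write traffic, and one instruction per cycle:

```python
BUS_BYTES_PER_S = 25.6e9  # memory bandwidth used in the comment
NUM_SPES = 8
CLOCK_HZ = 3.2e9
REGISTER_BYTES = 16       # one 128-bit SPE register

per_spe = BUS_BYTES_PER_S / NUM_SPES                    # 3.2 GB/s per SPE
per_direction = per_spe / 2                             # half for reads, half for writes
bytes_per_cycle = per_direction / CLOCK_HZ              # 0.5 bytes per cycle
cycles_per_register = REGISTER_BYTES / bytes_per_cycle  # 32 cycles per 16 bytes read

print(bytes_per_cycle, cycles_per_register)  # 0.5 32.0
```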

    36. Re:What about the compiler? by TopSpin · · Score: 1

      But it's only really great if a new custom variant is built.

      Cell had a specific problem domain to address during the design of the initial product. If Cell really is all that, there will be future revisions. These researchers are pointing out what is necessary to make Cell more viable to a broader base of users. They are putting themselves at the head of the line.

      They have evaluated the existing Cell, added their guesswork as to what could be done with modest changes and quantified the result relative to competitors. The best case outcome includes another $200+ million contract to build another massive compute grid. If someone shows up at the door with this paper in one hand and the funding resources to accomplish it in the other... Cell2 (or Cell+ as they call it) gets put on the drawing board.

      --
      Lurking at the bottom of the gravity well, getting old
    37. Re:What about the compiler? by pjabardo · · Score: 1

      Even though the calculation changes, there are several core math kernels that almost every numerical application uses. One such kernel is BLAS (Basic Linear Algebra Subprograms). An optimized version can be much faster than the standard reference Fortran implementation. I have code that uses optimized BLAS, and it is 2 times faster than with the Fortran implementation. If the software is going to run for days, this makes a big difference.

    38. Re:What about the compiler? by Anonymous Coward · · Score: 1, Insightful

      I heard about labs spending the time and effort to make custom stuff, but by the time they're done, the off the shelf hardware had already caught up

      Haha, dude, have you ever run tests that take weeks to complete? The FLOPS improvement shown in that paper is around a factor 8 compared to AMD64 machines. You jump from weeks to days in simulation time. That is HUGE.

      As for the development time, doing a basic optimisation will already give you a great boost in performance. You do not have to hand-code each and every instruction/function. As a side note, we already spend weeks on optimising pieces of code for SSE/SSE2/SSE3. I would guess using another set of assembler would not delay us too much, especially if we can gain 8x performance.

      Our lab also does video coding, processing 8 times faster would mean that we can go from demonstrating our technology on 352x288 (CIF) sequences to demonstrating it on 720p (HD) sequences. That is if we keep it realtime, or we could process 8 CIF streams at once. Now that is WAY impressive.
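      A quick sanity check on the CIF-to-720p claim, assuming processing cost scales linearly with pixel count at a fixed frame rate:

```python
CIF = (352, 288)
HD_720P = (1280, 720)


def pixel_ratio(small, large):
    """How many times more pixels per frame `large` has than `small`."""
    return (large[0] * large[1]) / (small[0] * small[1])


ratio = pixel_ratio(CIF, HD_720P)  # 921600 / 101376, about 9.09
covered = 8 / ratio                # share of the extra work an 8x speedup covers

print(round(ratio, 2), round(covered, 2))  # 9.09 0.88
```

      Under that assumption, 720p carries about 9.1 times the pixels of CIF, so an 8x speedup covers roughly 88% of the gap for one real-time 720p stream, while 8 simultaneous CIF streams fit exactly.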

    39. Re:What about the compiler? by Wolfier · · Score: 1

      What the academics do with a new technology by hand is often what makes the things you use daily, like -O3, possible.
      Sometimes people DO use published research results to construct compilers.

      -O3 is more important when optimization is just a means to an end; however, when optimization is an end in itself, it's easy to see the value of disciplined hand tuning.

    40. Re:What about the compiler? by m874t232 · · Score: 1

      Having seen a lot of scientific codes, I can assure you: most people writing scientific software don't even know about BLAS, and even if they do, they don't bother.

    41. Re:What about the compiler? by m874t232 · · Score: 1

      It's not wasted time if the time spent optimizing is less than the time saved.

      Wrong. A programmer hour is much more valuable than a machine hour.

      And this hasn't been lost on scientists and engineers--hence the popularity of software like MATLAB.

    42. Re:What about the compiler? by salad_fingers · · Score: 1

      What needs to be remembered is that Cell is a strictly in-order CPU, meaning that it cannot execute instructions out of order. Therefore, hand-optimization can only get you so much. If you are waiting on an add, you are waiting on an add; not much can be done in that sense. It really comes down to compiler logistics and how the various tasks of branch prediction, locality and "cache" come into play/are optimized. Remember that Cell has no real cache but a virtual one. It ultimately comes down to the compiler, which I am sure is pretty beefy in this case.

    43. Re:What about the compiler? by Anonymous Coward · · Score: 1, Insightful

      If a simulation will run for several months, saving a week's worth of run time is advantageous. That could translate into more time to do analysis, publishing sooner than a competitor, reduced overhead, etc.

      But as with any of the examples you gave, a cost-benefit analysis is needed.
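
      A back-of-the-envelope version of that cost-benefit analysis, as a minimal sketch. The function name and every rate and hour count below are hypothetical placeholders, not figures from the paper or this thread; the optional waiting_rate term is one way to price in people waiting on results:

```python
def optimization_pays_off(programmer_hours, programmer_rate,
                          runtime_hours_before, speedup, runs,
                          machine_rate, waiting_rate=0.0):
    """Return (cost, saving) for a hypothetical optimization effort.

    programmer_rate: cost of one programmer hour.
    machine_rate: cost of one machine hour (power, allocation, etc.).
    waiting_rate: optional cost per hour of people waiting on results.
    """
    cost = programmer_hours * programmer_rate
    # A run that took T hours now takes T / speedup hours.
    hours_saved_per_run = runtime_hours_before * (1.0 - 1.0 / speedup)
    saving = runs * hours_saved_per_run * (machine_rate + waiting_rate)
    return cost, saving

# Hypothetical example: two weeks of tuning for a 3x gain on a
# 4-week simulation that is run 5 times.
cost, saving = optimization_pays_off(
    programmer_hours=80, programmer_rate=100.0,
    runtime_hours_before=4 * 7 * 24, speedup=3.0, runs=5,
    machine_rate=2.0, waiting_rate=50.0)
print(f"cost {cost:.0f}, saving {saving:.0f}, worth it: {saving > cost}")
```

      Set waiting_rate to zero and shrink the run count, and the same effort easily stops paying for itself, which is the whole disagreement in a nutshell.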

      And with any optimization strategy, it is often better to use better data structures than to tune serial instruction streams.

      For the Cell, this might translate into reshaping data chunks to better fit the local processor environment.
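
      One common form of this reshaping for SIMD units like the SPEs (128-bit registers, four single-precision floats each) is turning an array of structures into a structure of arrays, so each field is contiguous and can be loaded four at a time. A minimal Python sketch of the layout change only; the particle fields are hypothetical, and Python lists merely model what would be separate float arrays in C:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Particle:          # one element of an array-of-structures
    x: float
    y: float
    z: float
    mass: float

@dataclass
class Particles:         # structure-of-arrays layout
    x: List[float]
    y: List[float]
    z: List[float]
    mass: List[float]

def aos_to_soa(particles: List[Particle]) -> Particles:
    """Gather each field into its own list (a separate array in C)."""
    return Particles(
        x=[p.x for p in particles],
        y=[p.y for p in particles],
        z=[p.z for p in particles],
        mass=[p.mass for p in particles],
    )

def total_mass(soa: Particles) -> float:
    # A kernel that needs one field now walks one array instead of
    # striding over every field of every particle.
    return sum(soa.mass)

aos = [Particle(0.0, 1.0, 2.0, 1.5), Particle(3.0, 4.0, 5.0, 2.5)]
soa = aos_to_soa(aos)
print(soa.x, total_mass(soa))
```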

    44. Re:What about the compiler? by plalonde2 · · Score: 1

      Oddly, on the Cell, most of the optimization is low-level algorithmic stuff. Yes, assembly gets you that last little boost, but most of the Cell optimizations I've worked with (for the last 15 months or so) have been data movement and data decomposition exercises. Breaking your data into SPU-sized chunks, or into SPU-streamable chunks is the hard part. It's also the part compilers are *useless* for.
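
      A sketch of the decomposition described here: split a large array into chunks that fit in an SPE's 256 KB local store, with two buffers so the next chunk can be transferred while the current one is processed (double buffering). The DMA transfer is only simulated with a list slice, and the 64 KB reserved for code and stack is an assumed figure:

```python
LOCAL_STORE_BYTES = 256 * 1024   # SPE local store size
RESERVED_BYTES = 64 * 1024       # assumption: room for code, stack, etc.
ELEMENT_BYTES = 4                # single-precision float

# Two buffers share what is left, so each holds half of it.
CHUNK_ELEMS = (LOCAL_STORE_BYTES - RESERVED_BYTES) // (2 * ELEMENT_BYTES)

def fetch(data, start):
    """Stand-in for a DMA get from main memory into local store."""
    return data[start:start + CHUNK_ELEMS]

def process(chunk):
    """Stand-in compute kernel: sum of squares."""
    return sum(v * v for v in chunk)

def stream_sum_of_squares(data):
    total = 0.0
    buffers = [fetch(data, 0), None]
    current = 0
    for start in range(0, len(data), CHUNK_ELEMS):
        nxt = start + CHUNK_ELEMS
        # On real hardware this transfer would be issued asynchronously
        # into the other buffer and overlap with process() below.
        if nxt < len(data):
            buffers[1 - current] = fetch(data, nxt)
        total += process(buffers[current])
        current = 1 - current
    return total

data = [float(i % 7) for i in range(100_000)]
print(CHUNK_ELEMS, stream_sum_of_squares(data))
```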

    45. Re:What about the compiler? by jericho4.0 · · Score: 2, Informative

      Maybe true on our computers, but not on supercomputers.

      --
      "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    46. Re:What about the compiler? by try_anything · · Score: 1
      I think you're mixing up CS theorists with the scientists and engineers who just want to crunch a bunch of numbers and get the answers. You'd like the latter group; they write horrible code and aren't ashamed of it.

      You know the saying "You can write Fortran in any language"? Scientists judge a language by how easy it is to write Fortran in it. That's why C is their second-favorite language.

    47. Re:What about the compiler? by Anonymous Coward · · Score: 0

      Hand-optimization suddenly becomes quite relevant when your application is performance-critical. If it isn't, you probably shouldn't be bothering with fast and expensive processors either.

    48. Re:What about the compiler? by Anonymous Coward · · Score: 0

      "This means running the largest number of processors for a given power/cooling supply,"

      I totally agree. These days cooling is often the limiting factor in HPC solutions. Also, with energy costs rising, power consumption is becoming a concern even when sufficient cooling exists. Energy efficiency (Flops/Watt) is something we look at when considering a new system, in combination with how easy it is to extract the peak performance. The PA-Semi range looks like it could be very interesting for HPC in a few years' time.

      With regard to the programmer time versus machine time debate, if you are running HPC
      calculations over a large number of processors (for example weather modelling or CFD of an airframe)
      then the cost of an hour of programmer time can be trivial compared to the energy saving of a more
      efficient processor.
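
      The energy side of that argument is easy to put rough numbers on. A minimal sketch in which every figure (node count, watts per node, efficiency gain, electricity price, PUE) is a hypothetical placeholder, not a measurement of Cell or any other chip:

```python
HOURS_PER_YEAR = 24 * 365

def annual_energy_cost(nodes, watts_per_node, price_per_kwh, pue=1.0):
    """Yearly electricity cost.

    pue: power usage effectiveness (total facility power / IT power),
    so a value of 2.0 doubles the bill to account for cooling.
    """
    kwh = nodes * watts_per_node * pue * HOURS_PER_YEAR / 1000.0
    return kwh * price_per_kwh

def watts_for_same_flops(watts_per_node, flops_per_watt_gain):
    """Power per node needed for the same flops after a flops/W gain."""
    return watts_per_node / flops_per_watt_gain

# Hypothetical 1000-node machine at 250 W per node, 0.10 per kWh,
# with cooling doubling the power bill (pue=2.0), before and after
# a 3x improvement in flops per watt.
before = annual_energy_cost(1000, 250, 0.10, pue=2.0)
after = annual_energy_cost(1000, watts_for_same_flops(250, 3.0), 0.10, pue=2.0)
print(f"before {before:,.0f}, after {after:,.0f}, saved {before - after:,.0f} per year")
```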

    49. Re:What about the compiler? by m874t232 · · Score: 1

      Yes, which brings us back to only a "tiny fraction" of all scientific programmers dealing with optimizations. In fact, supercomputers are disappearing from all but a few niches.

    50. Re:What about the compiler? by Anonymous Coward · · Score: 0

      Your comment is a little short-sighted. I am currently waiting for a profiling session to finish running, and I do this often for one-off projects where the only person impacted by the execution time is me. Why? Because the time spent optimising is paid off many times over when running a seven-dimensional optimisation algorithm. More options for optimisation are always good. Many of the 'wonders of modern science' would have taken geological time to discover if it were not for optimisation. These guys are talking about as much as 16 times faster. That's the difference between possible and not possible.

    51. Re:What about the compiler? by Memnos · · Score: 1

      No Shit. For mathematically-intensive uses the Cell Processor blows everything else out of the water, by far. Given that faster might actually be better in such contexts, what would you buy? Granted, it's backed by a little-known company by the name of IBM, which may actually have its shit together on chip design.

      --
      I don't trust atoms -- they make up stuff.
    52. Re:What about the compiler? by pjabardo · · Score: 1

      I have to agree with you. Many (most?) people do not know about it. I think the problem is that many people in scientific computing have most of their background in science, not in computers, and they don't know about these libraries. Since it is usually very simple to implement a replacement, they don't bother to look for 'standard' solutions. But performance goes down. Even in this situation, it is likely that the programmer is using BLAS without being aware of it. R (www.r-project.org) uses BLAS and LAPACK and, if I'm not mistaken, Matlab does too. Several matrix libraries provide interfaces to BLAS (see uBLAS from Boost as one example). But when the problems start to grow, things like BLAS begin to gain importance. I do 3D fluid dynamics simulations using clusters (up to 64 processors) and many simulations take days, so a 50% gain matters. Most manufacturers provide these optimized kernels: Intel has MKL, AMD has ACML, DEC (there are still many Alphas around) has DXML. There are several standalone implementations: ATLAS, GotoBLAS, etc. Even the GNU Scientific Library uses BLAS (it has a C implementation that can be replaced by faster routines). I do recommend these libraries. You gain performance, fewer bugs and even portability.
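
      Much of what tuned BLAS implementations such as ATLAS do for matrix multiplication is loop blocking: computing the product tile by tile so the operands stay in cache. A minimal pure-Python sketch of the idea next to the naive triple loop. In Python the interpreter overhead swamps any cache effect, so this shows the structure rather than the speedup, and the block size is an arbitrary choice:

```python
def matmul_naive(a, b):
    """Textbook i, j, k triple loop."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

def matmul_blocked(a, b, bs=32):
    """Same product, computed on bs x bs tiles (i, k, j order inside)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                for i in range(ii, min(ii + bs, n)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + bs, m)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + bs, p)):
                            ci[j] += aik * bk[j]
    return c

a = [[float(i + j) for j in range(50)] for i in range(40)]
b = [[float(i - j) for j in range(30)] for i in range(50)]
c = matmul_blocked(a, b, bs=16)
print(c[0][:3])
```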

    53. Re:What about the compiler? by Tough+Love · · Score: 2, Informative

      A programmer hour is much more valuable than a machine hour

      You forgot to take into account the team of scientists waiting for the machine to produce a result.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    54. Re:What about the compiler? by m874t232 · · Score: 1

      First of all, few scientists have the luxury of having dedicated programmers anymore. Second, the team of scientists can generally find other things to do with no problems (grant applications, etc.).

    55. Re:What about the compiler? by WindShadow · · Score: 1

      Actually, given the time it takes to run a big engineering calculation and the time it takes to hand-optimize, for long runs it makes lots of sense and is relevant to users. Also note that if this ever became more widely used, there's no reason gcc can't be taught to do a much better job for this hardware.

  4. What about the programmer? by Anonymous Coward · · Score: 5, Insightful

    "The paper did a lot of hand-optimization, which is irrelevant to most programmers."

    But not to programmers who do science.

    "What gcc -O3 does is way more important than what an assembly wizard can do for most projects."

    Not an insurmountable problem.

    1. Re:What about the programmer? by Anonymous Coward · · Score: 0

      Yes, because in the world of Science people are often looking
      to hand optimize code, buzzz..... incorrect thanks for
      playing.

      Hand optimization or writing portions of code in assembler is
      the last thing 85% of these people want to do. They don't want
      to be computing experts to do their science/research.

      Vast numbers of people are employed at various HPC centers
      as "user consultants" to help these people around those
      very problems. Most of the time, what these "consultants"
      will do is limited to better coding to "Standards", since
      the academic community often prefers "portable" code to
      routines written in assembler for their codes.

      Again thanks for playing the HPC question/answer game.

    2. Re:What about the programmer? by zCyl · · Score: 2, Insightful

      Hand optimization or writing portions of code in assembler is
      the last thing 85% of these people want to do. They don't want
      to be computing experts to do their science/research.


      When you're talking about reusable modules like an FFT or matrix multiplication, many scientists doing simulations would love to have a hand-optimized FFT or matrix module to plug in as a simulation component. Even if they don't know a drop of assembly themselves, having the optimized module available can make a large difference in running time for big simulations.
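
      A sketch of the "plug-in module" idea: simulation code depends only on a transform interface, so a plain reference FFT can later be swapped for a hand-tuned library without touching the simulation. The names Transform, fft_reference and spectrum_power are hypothetical, and the recursive radix-2 Cooley-Tukey version here is deliberately unoptimized:

```python
import cmath
from typing import Callable, List

Transform = Callable[[List[complex]], List[complex]]

def fft_reference(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_reference(x[0::2])
    odd = fft_reference(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrum_power(signal: List[float], fft: Transform = fft_reference) -> List[float]:
    """Simulation-side code: depends only on the Transform interface."""
    return [abs(v) ** 2 for v in fft([complex(s) for s in signal])]

print(spectrum_power([1.0, 0.0, -1.0, 0.0]))
```

      Passing a faster function with the same signature as the fft argument is all it takes to upgrade the simulation.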

    3. Re:What about the programmer? by OnceWasLurker · · Score: 0

      Hand-optimization is an extremely important step in a number of fields. Without the hand-optimizations of graphics engines in the "olden days", we probably wouldn't have taken quite the route we did: making VGA buffer registers push out a wider write than the processor was capable of, etc. An operation taking 1 ms to perform sounds like it isn't worth looking at, unless you're doing it 10,000,000 times... taking a few % (= a few dozen cycles) off its execution time can save time and $, or get the frame rate of a game back into the sweet spot, etc. The converse, "get a faster processor", often isn't feasible, and the salary of a decent programmer for a couple of weeks doesn't ring any bells on the balance sheet if you're cranking out a million processors.

      --
      Mmmmm... I'm sure you have an invalid iterator there somewhere.
    4. Re:What about the programmer? by fatphil · · Score: 1

      Some hardware/system companies have a small bunch of volunteers for this very task - firstly they select programmers which they believe have l33t programming skills, then they lend you a top-of-the-range model (and even are prepared to ship it all the way to Finland if the volunteer lives in Finland), and in return you promise to work on hand-optimising code for their platform, and publishing the results.

      FatPhil, in Finland. ;-)

      --
      Also FatPhil on SoylentNews, id 863
  5. Multimedia server by podz · · Score: 0, Redundant

    I can't wait to hook one of these babies up as the brain of my house and run concurrent multimedia streams everywhere. Already dreaming of little wireless touch screen terminals next to the toilet, and a waterproof one in the shower :-)

  6. Re:Xbox 2 is a "commodity" by Anonymous Coward · · Score: 0, Flamebait

    I know you console players just grew your pubes, and this might be hard for you to understand, but Cell will be available in workstations and clusters. XBox's CPU is a one-off for a game console.

  7. PS3 will RULZ! in 2008 by Anonymous Coward · · Score: 0

    "The problem with the PS3 will be that it will take companies a lot of time and money to develop games for it. They won't do this until they know there will be enough consumers to buy their games. The consumers on the other hand, won't buy it unless there are good games for it. Kind of a catch 22 if you ask me."

    The phrase you're looking for is "development tools".

    "Nintendo's model on the other hand, is to make it really cheap and easy to design games for the Wii, so there isn't so much risk involved for the developers. There also isn't so much risk for the consumers, because the system itself is so much cheaper than the competition."

    That depends on the cost of development systems. Also something you have forgotten. The platforms aren't just differentiated by hardware, but by genre.

  8. Re:PS3 will rule in 2008 by FatherOfONe · · Score: 0, Offtopic

    Do you honestly believe that Sony won't sell all 6 million consoles at launch time?

    Now the next question: do you believe that they won't sell 20 million by Christmas 2007?

    That is a huge install base. Remember that Sony can lower the price of the PS3 at any time. If price ever truly becomes an issue, then they can adjust it. They will keep it as high as possible for as long as possible.

    Now on the other hand, it will probably never reach the price of the Nintendo, but then the Nintendo doesn't have HD, Blu-Ray or a hard drive. Those three options add cost, but will probably add significant value to a lot of games.

    The way I see the console market going is that Sony will dominate the 12-and-up crowd. Microsoft will own a smaller percentage this time than with the XBOX, and Nintendo will own (as always) the under-12 crowd.

    Now what will the average slashdot/digg reader buy? Well those people generally hate Sony and Microsoft, but at the end of the day, they will buy the console with the best games they like. That will be Sony, and then I could see some of the Slashdot crowd actually tinkering with Linux on the PS3.

    I am also curious to see what happens when the development kits get better for both the 360 and PS3. Creating an "easier" way to use the multiple cores in those systems will show the differences between all the consoles even more, and also display better how well (or poorly) the PS3 runs normal Linux stuff.

    The last part of the puzzle is how cheap 1080p TVs will get in the next 5 years. It isn't out of the question to hook up a keyboard, mouse and "cheap" 1080p LCD or plasma TV to a PS3 and have a computer. This is a giant leap forward for consoles, and Sony's first attempt to bridge the gap between console, computer and DVR type of device.

    Time will tell if it will be a success or not, but one thing is certain: they will sell all the systems they can make this year and early next. People are asking every day now at EB if they can preorder the system. "If" Sony could make 20 million this year, they would sell every one. I kind of wonder why they honestly don't raise the price even more. It would suck for us gamers, but if I had a product that would max out my manufacturing for the next two years and I was sure I would sell every one I made, then I would probably rethink my asking price. The only logic I see is that they don't want to anger the initial buyers if they have to lower the price next year for the second-wave buyers. If I was Microsoft I would be very worried about the PS3. If I was Nintendo I would keep producing kids' games and doing pretty much what they are doing, much like they did with the GameCube; the only difference is that I would try not to over-"Mario" the system.

    --
    The more I learn about science, the more my faith in God increases.
  9. Re:Xbox 2 is a "commodity" by MooUK · · Score: 2, Insightful

    I think you misunderstand what HPC actually is.

    High performance computing is that which you'd want to throw a huge Beowulf cluster at, or possibly a supercomputer or twenty. Not three small pathetic cores.

  10. Doesn't it easily scale up? by Poromenos1 · · Score: 1, Interesting

    Doesn't the Cell's design mean that it can very easily scale up, without requiring any changes in the software? Just add more computing CPUs (SPEs they are called, I think?) and the Cell runs faster without changing your software.

    I'm not entirely sure of this, can someone corroborate/disprove?

    --
    Send email from the afterlife! Write your e-will at Dead Man's Switch.
    1. Re:Doesn't it easily scale up? by owlstead · · Score: 1

      Yes, if there isn't any communication overhead between the processors. If you have 100 separate threads or processes without (or almost without) any communication, then the application is perfect for multiple CPUs. If there is a lot of communication needed, then much less so. You cannot write an application for 8 cores with very fast communications and expect it to run on multiple processors without any modifications. That's why many parallel processor designs cost more for the networking part than for the processors themselves.
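
      A toy model of that point, as a sketch under two illustrative assumptions: the work divides evenly across processors, and each extra processor adds a fixed communication cost. Neither the linear overhead nor the numbers are Cell measurements:

```python
def run_time(work, procs, comm_per_proc):
    """Compute time shrinks with procs; communication grows with them."""
    return work / procs + comm_per_proc * (procs - 1)

def speedup(work, procs, comm_per_proc):
    return run_time(work, 1, comm_per_proc) / run_time(work, procs, comm_per_proc)

for comm in (0.0, 0.5, 5.0):
    row = [round(speedup(1000.0, p, comm), 1) for p in (1, 2, 4, 8, 16)]
    print(f"comm cost {comm}: speedups {row}")
```

      With zero communication cost the speedup equals the processor count; as the per-processor cost rises, adding processors buys less and less, and past some point it makes things slower.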

    2. Re:Doesn't it easily scale up? by jacksonj04 · · Score: 1

      It should be best suited to things needing concurrent, but not parallel processing. For example you could be running several simulations at once, none of which are interdependent. When one is done, the processor can be handed another instruction without needing to wait for the results from everything else.

      The code will be the tricky bit.

      --
      How many people can read hex if only you and dead people can read hex?
    3. Re:Doesn't it easily scale up? by Anonymous Coward · · Score: 0

      You can plug Cells together, basically, into dual or quad blade configurations and so forth, but greater communications overhead kicks in, as noted in the above posts, as is the case for all multi-node systems. However, if one Cell node today is a PPE and 8 SPEs, in the future you'll very probably see single chips with 16 SPEs and up.

  11. Future "Mac Pro"? by Anonymous Coward · · Score: 0

    Apple has said they considered and rejected the Cell because it's more a game-box processor, rather lacking on the multipurpose needs of a general purpose processor. So they would need to put 4 Cells to match the general needs of a Quad core.

    Also they considered that one processor change was enough.

    But then Apple caters to the scientific community and ignoring the Cell leaves a hole in the market with no Intel alternative in sight.

    I hear the delay with the PS3 is because of problems fabricating such a complex chip en masse. It must be one hot sonfabeach too.

    So is the "Mac Pro" really delayed because of the Cell?

  12. Not likely to be low cost CPUs by maraist · · Score: 1

    An interesting point is that most consoles sell their hardware at a loss. At least the XBox does. This means that there is no guarantee that IBM is willing to sell their CPUs at the same price that one would believe they cost for the PS3.

    Moreover, the scientific community is very likely to push their Cell+ architecture, and I'm sure IBM would be more than happy to help... for a massive price.

    So, when building an HPC system, you're likely to work around the best architecture (the more expensive Cell+), and purchasers of the HPC system will then have a Cray-like proprietary system at enormous cost.

    Not that this is a bad thing, I just don't believe this "low cost" "high volume" statement.

    --
    -Michael
    1. Re:Not likely to be low cost CPUs by Oswald · · Score: 1

      Doesn't sound right. IBM isn't taking a loss on PS3 hardware. If anybody is, it's Sony, and they would be subsidizing the volume that would allow IBM to sell the chip (relatively) cheaply.

    2. Re:Not likely to be low cost CPUs by WindBourne · · Score: 1

      I just don't believe this "low cost" "high volume" statement.

      If not, then you are about the only one. Simply look at top500.org to see what low-cost, high-volume produces. My bet is that IBM is using Sony to get to high volume quickly. After that point, they will start using this in a number of their own systems. And you can bet that this will form the foundation of a very, very fast parallel architecture for the Top500. I also expect to see it upgraded to Cell+ quickly.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    3. Re:Not likely to be low cost CPUs by maraist · · Score: 1

      I just don't believe this "low cost" "high volume" statement.

      If not, then you are about the only one.

      Well, I'm just saying, I wouldn't bet money on IBM producing the Cell in high enough volume to provide ultra-low pricing as in the PowerPC or, obviously, x86 markets. History has shown time and time again that innovation is not what is important; dominance is. Alpha had a superb chip but was in no way marketable. Apple has always had a better design in computer hardware, but will likely never achieve any respectable market presence.

      My post is one of pessimism: that the PS3 will not be a sufficient vehicle to drive the Cell processor to the volume needed for economies of scale (where you could afford to make only a few pennies of gross profit per CPU). Certainly anything could happen. But I have yet to see a video graphics processor (in the 10 years I've watched GPU progression) break out of its niche market (despite all these innovative ways of using their specialized mathematical processing power).

      This article was about the scientific community seeing a potential to use the Cell for what it was meant for, but in a different venue than still-frame graphics rendering. That's fine, but any architectural decision requires taking the whole project into account, and I simply don't see the cost-effectiveness of Cell processing until and unless it becomes ubiquitous. Otherwise what you have is an original Cray (prior to the Opteron or even Alpha chip): 100% custom. And you pay for it.

      Basically it's slightly cheaper than designing the chip yourself.. But my argument is... Not by much.

      --
      -Michael
  13. Re:PS3 will rule in 2008 by Lonewolf666 · · Score: 0, Offtopic

    The last part of the puzzle is how cheap 1080p TVs will get in the next 5 years. It isn't out of the question to hook up a keyboard, mouse and "cheap" 1080p LCD or plasma TV to a PS3 and have a computer. This is a giant leap forward for consoles, and Sony's first attempt to bridge the gap between console, computer and DVR type of device.
    Whether this is worthwhile for users will depend a lot on how open the console is to third-party software. Usually consoles are designed to run only software licensed by the console vendor, and in some cases those vendors will even sue companies that offer modifications. An example where Microsoft went after Xbox modders:
    http://www.geek.com/news/geeknews/2002Oct/gam20021004016641.htm

    If Sony pulls a similar thing with the PS3, it will remain rather uninteresting as a computer unless they provide all the software an average user might want. Which I don't believe ;-)

    --
    C - the footgun of programming languages
  14. Lattice QCD people: by ettlz · · Score: 1

    Isn't Cell similar to things like QCDOC (from what my LQCD colleagues tell me, it's based on PowerPC, but are there similarities in the wider architecture, interconnects, etc.)? Have any plans to use it here?

    1. Re:Lattice QCD people: by Watson+Ladd · · Score: 1

      A little bit. The big difference is that the Cell has SPEs, which are like on-chip DSPs controlled by a PPC processor. QCDOC is a lot of PPC processors connected similarly. Also, memory is symmetric on QCDOC, while it is asymmetric on the Cell. The similarity is mostly in the kind of bus used. Think about one QCDOC node connected to seven QCDSP nodes, with only the QCDOC node having a lot of memory, and you will have the right idea. Ars Technica had a good review of the Cell.

      --
      Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
    2. Re:Lattice QCD people: by Anonymous Coward · · Score: 0

      I haven't the first fucking idea what all that means... but it sounds good.

    3. Re:Lattice QCD people: by Quiberon · · Score: 1

      QCDOC people ended up making BlueGene

  15. No, it won't by PackerX · · Score: 0, Offtopic

    I'm going to disagree with you here. Here's my game system buying guide:

    -I'll buy a DS Lite the day it comes out. There are several games I like and several more coming.

    -I'll buy an XBox 360 when Splinter Cell comes out.

    -I'll buy a PS2 as soon as I decide I can't go without playing Guitar Hero any longer.

    -I'll pre-order a Wii (provided Twilight Princess remains a launch title)

    -I do not see myself ever purchasing a PS3.

    Each system offers me something. The DS gives me portability and a growing library of games, including some real gems. The XBox 360 gives me Live, wireless controllers, and games that I would like to play: Oblivion, Fight Night Round 3, etc. Wii is my only 'must-have' system: a low price, innovation, a lot of developer support, and established exclusive franchises. Not to mention the ability to play tons of SNES, NES, and Sega games. PS3 offers me... nothing. The only thing that would possibly entice me is the number of RPGs traditionally available on the PS systems, but I never bought a PS2 and the price was MUCH lower. Final Fantasy isn't worth $700 to me.

    Every discussion I have had regarding consoles has ended the same way. People who don't have a 360 yet plan on getting one within the year, everyone wants the Wii, and the PS3 gets a big, "So what?"

    The PS3 may have the hardware advantage, but that's all it offers. From a gaming perspective, Sony has yet to give me one good reason to spend my money on a console with technology that won't be fully utilized until about two years after release, a video disc format that won't be widespread until at least a year after release (if ever), and HD (which I don't have).

    At least the scientists are getting something out of it. Now if Sony gave me a system that could do complex COMSOL models, I'd be interested.

    1. Re:No, it won't by Anonymous Coward · · Score: 0

      So, you're going to wait for Splinter Cell before you buy Xbox 360? Well, the two million Japanese gamers alone who are Metal Gear fans are going to buy the PS3 when Metal Gear Solid 4 comes out. The three million American gamers who are Metal Gear fans are going to buy PS3 when MGS4 comes out. That adds up to five million easy sales for PS3 and you're not even taking Europe into account yet. Don't think Metal Gear is popular in Europe? If so, then how would you care to explain the "European Extreme" difficulty mode in MGS3: Subsistence? And that's just the Metal Gear franchise. You've still got Final Fantasy to chalk up the tally in Japan and North America. Want to bet that more people will buy PS3s when FFXIII comes out than people will buy Xbox 360 when Splinter Cell comes out? Last I checked, Splinter Cell was nothing more than a Metal Gear rip-off masquerading as a "highly intellectual thriller". I'd rather go with Kojima's genuine article.

  16. Re:Xbox 2 is a "commodity" by PhotoBoy · · Score: 1

    Except neither of those links point to anything that proves the Cell is good for High Performance Computing which is the point of the article. This isn't anything to do with 360 vs PS3. If MS wanted to design a CPU that could be scaled up for HPC they would have done, instead they just got IBM to customise a PPC chip for their games console because their goal is dominance in the living room, not to become the next Intel.

    To be honest I question the validity of this study anyway, I seem to recall lots of papers proclaiming the PS2's so-called "Emotion Engine" as the future of super computing and that never happened either. This is probably more hype paid for by Sony to make people believe the PS3 will be the second coming.

    Plus if you actually watch that whole interview with Carmack you linked to, he says the only advantage of the PS3 hardware is peak performance, which if it's anything like the PS2 will be limited by memory bandwidth. And everything I've seen of the PS3's RSX suggests it's just an nVidia 7800 GTX, which means the 360 should have the advantage graphically. With the PS3 having more CPU power but the 360 having more polygon power I suspect we'll end up with fairly similar looking games.

  17. WTF? by SmallFurryCreature · · Score: 4, Insightful
    First off you are talking about consoles being sold at a loss. NOT their components.

    IF IBM was the maker of the chip, they would most certainly not sell them at a loss. Why should they? Sony might sell the console at a loss and recoup it from game sales, but IBM has no way to recoup any losses.

    Then again, IBM is in a partnership with Sony and Toshiba, so the chip is probably owned by this partnership and Sony will just be making the chips it needs itself.

    So any idea that IBM is selling Cells at a loss is insane.

    Then, the cost of the PS3 is mostly claimed to be in the Blu-ray drive tech. That's not going to be of much interest to a science setup, is it? Even if they want a Blu-ray drive, they need just 1 in a 1000-Cell rig. Not going to break the bank.

    No, the Cell will be cheap because when you run an order of millions of identical CPUs, prices drop rapidly. There might even be a very real market for cheap Cells. Regular CPUs always have lesser-quality versions. Not a problem for Intel or AMD, who just badge them Celeron or whatever, but you can't do that with a console processor. All Cell processors destined for the PS3 must be of similar spec.

    So what to do with a Cell chip that has one of its cores defective? Throw it away, OR rebadge it and sell it for blade servers? That is where Celerons come from (defective cache).

    We already know that the Cell processor is going to be sold for purposes other than the PS3. IBM has a line of blade servers coming up that will use the Cell.

    No, I am afraid that it will be perfectly possible to buy Cells, and they will be sold at a profit just like any other CPU. Nothing special about it. They will, however, benefit greatly from the fact that they already have a large customer lined up. Regular CPUs need to recover their costs as quickly as possible because their success is uncertain. This is why regular top-end CPUs are so fucking expensive. But the Cell already has an order for millions, meaning the costs can be spread out in advance over all those units.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:WTF? by Kjella · · Score: 3, Insightful

      So what to do with a cell chip that has one of the cores defective? Throw it away OR rebadge it and sell it for blade servers?

      Use it. Seriously, that's why there's central + 7 of them, not 8. One is actually a spare so that unless it's either flawed in the central logic or two separate cores, the chip is still good. Good way to keep the yields up...

      --
      Live today, because you never know what tomorrow brings
    2. Re:WTF? by epiphani · · Score: 1

      So what to do with a Cell chip that has one of its cores defective? Throw it away OR rebadge it and sell it for blade servers? That is where Celerons come from (defective cache)

      Actually, the Cell has 8 SPUs on die. It only utilizes seven, specifically to handle the possibility of defective units. They put the extra SPU on there to increase yields.
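
      The yield argument can be made concrete with a simple binomial model. This sketch assumes each of the 8 SPEs is independently good with probability p; the value 0.9 is hypothetical, and real defects are neither independent nor confined to the SPEs:

```python
from math import comb

def prob_at_least(needed, total, p_good):
    """P(at least `needed` of `total` independent units are good)."""
    return sum(comb(total, k) * p_good**k * (1 - p_good)**(total - k)
               for k in range(needed, total + 1))

p = 0.9  # hypothetical per-SPE yield
print(f"all 8 good:      {prob_at_least(8, 8, p):.3f}")
print(f"at least 7 good: {prob_at_least(7, 8, p):.3f}")
```

      Under these made-up numbers, requiring only 7 working SPEs almost doubles the fraction of usable chips compared with requiring all 8.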

      --
      .
    3. Re:WTF? by jericho4.0 · · Score: 1

      The Cell in the PS3 has 7 SPEs. The Cell as used in other places will likely have the full 8 available.

      --
      "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
  18. Some mothers do have 'em by Anonymous Coward · · Score: 0

    Hmmm Betty. The cat did a whoopsee on me cell processor!

  19. The real problem: Double Precision by Anonymous Coward · · Score: 0

    The DP performance of the Cell isn't that good. You can get that with FPGAs today, and beat it with other chips. When they can get DP performance on par with SP performance (even 1/2 of it would be fine), then it will be meaningful.
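
    A small illustration of why single precision worries people doing long-running numerics: add 0.1 a million times, once rounding the running total to IEEE 754 single precision after every addition (emulated here with Python's standard struct module) and once in Python's native double precision. This is a generic floating-point demonstration, not a measurement of the Cell:

```python
import struct

def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(value: float, count: int, single: bool) -> float:
    """Sum `value` count times, optionally rounding to single each step."""
    total = 0.0
    v = to_single(value) if single else value
    for _ in range(count):
        total += v
        if single:
            total = to_single(total)
    return total

n = 1_000_000
exact = n * 0.1
sp = accumulate(0.1, n, single=True)
dp = accumulate(0.1, n, single=False)
print(f"single: {sp:.6f} (error {abs(sp - exact):.3g})")
print(f"double: {dp:.6f} (error {abs(dp - exact):.3g})")
```

    The single-precision total drifts far from 100000 while the double-precision one stays accurate to a tiny fraction, which is why slow DP throughput matters for codes that accumulate over many steps.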

  20. Re:Xbox 2 is a "commodity" by Anonymous Coward · · Score: 1

    "We also conclude that Cell's heterogeneous multi-core implementation is inherently better suited to the HPC environment than homogeneous commodity multi-core processors."

    Whether or not HPC is something you'd want to throw 20 or more supercomputers at in a Beowulf cluster, at least you know that the PS3 is really the only next-generation video game system, because nobody concerned with raw performance and power efficiency would want to use the Xbox 2 in an HPC environment.

  21. Not the real issue by argoff · · Score: 0, Offtopic

    The real issue here has nothing to do with the performance and capabilities of the Cell processor. The real issue is: can I make a copy, contract out my own fab, and make it without anyone else's permission? If I can, then it will be successful; if I can't, then it is just another proprietary technology that won't give the end user any real advantage over the long term, and thus no real reason to switch from more commoditized technologies.

  22. Hmm by Poromenos1 · · Score: 1

    Yes, but the Cell is designed to process data in independent packages which are scheduled and sent to processors by the central unit, it's not a traditional multiprocessor system. Hmm, I guess that from the specs the processors could be communicating via the network instead of just buses as well, which would make what you say correct. I guess we should wait and see.

    --
    Send email from the afterlife! Write your e-will at Dead Man's Switch.
    1. Re:Hmm by owlstead · · Score: 1

      The cell architecture makes it easy to distribute workloads, that's true. But that's just the beginning of solving the parallel puzzle. The trick is to spread the workload in such a way that the communication overhead is minimal. Otherwise, it may be wiser to use a different architecture. My guess is that the cell processor is interesting to grid computing, but needs a serious platform, both hardware and software-wise to be viable for the more serious work. On the other hand, IBM should be big enough to handle this.
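      A toy model of that trade-off: assume the work W divides perfectly across P processors but each extra processor adds a fixed communication cost c (a made-up linear overhead, not a measured law for the Cell). Total time is W/P + c(P - 1), so adding processors eventually hurts. The function names are hypothetical.

```python
def parallel_time(work: float, procs: int, comm_per_proc: float) -> float:
    """Toy model: work divides evenly, but each extra processor adds a fixed
    communication/synchronisation cost (an assumption, not a measured law)."""
    return work / procs + comm_per_proc * (procs - 1)

def best_processor_count(work: float, comm_per_proc: float, max_procs: int) -> int:
    """Return the processor count in 1..max_procs that minimises parallel_time."""
    return min(range(1, max_procs + 1),
               key=lambda p: parallel_time(work, p, comm_per_proc))

if __name__ == "__main__":
    work = 1000.0  # hypothetical units of serial compute time
    for comm in (0.1, 1.0, 10.0):  # hypothetical per-processor overhead
        p = best_processor_count(work, comm, max_procs=64)
        print(f"overhead {comm:5.1f}: best P = {p:2d}, "
              f"speedup = {work / parallel_time(work, p, comm):.1f}x")
```

      In this model the best P sits near sqrt(W/c), so cutting the overhead per processor is what lets you put more processors to use.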

  23. Re:Xbox 2 is a "commodity" by Anonymous Coward · · Score: 0
    John Carmack is probably the most emasculated, infantile, spineless game developer out there. This is what his whining sounds like to me, "It's too hard! I don't want to work for my money. I just want to program in Python and hopefully interpret something, boo hoo."

    It's game developers like Naughty Dog who show the skill and fortitude of innovators. Instead of complacently licensing some hand me down game engine from Epic Games, they opted to hand optimize, custom code, and basically create their own sophisticated and Unix philosophy-adhering game engine.

    For example, here's a quote from Naughty Dog game designer Evan Wells in a Q&A with IGN, "Like the PS2 the PS3 is a sophisticated and powerful piece of hardware. Our engineers are working very hard at making specific optimizations to take full advantage of the Cell and its SPU's. However, there is so much depth to this machine, that much like the PS2, you will continue to see developers squeeze more and more out of it over the course of what I am sure is going to be a lengthy life-cycle."

    He continues, "The engine we are using is completely proprietary and is being developed here at Naughty Dog. We have some of the industry's top engine programming talent dedicated to getting the most out of the PlayStation 3 possible. With the introduction of the SCE Worldwide Studio there has been an increased sharing effort between the internal teams. It extends far beyond Naughty Dog and Insomniac this time and I think you'll see a lot of first party titles that reap these benefits."

    When John Carmack can stop text messaging his Neopets buddies on cellphones while trying to develop a mobile MMOG (WTF?), I may actually think he matters anymore.
  24. No great surprise by Anonymous Coward · · Score: 0

    Some architectures are better for some things than other architectures. A prime example would be the DSP. It is optimized for a certain kind of calculation. For those it is better than general purpose architectures by orders of magnitude.

    Remember the math coprocessor? Back when I was using a 286 CPU, I bought a math coprocessor for $800 so I could do CAD. Maybe someone could put a Cell chip on a daughter board to improve the math ability of a regular desktop computer.

  25. When can we start Folding with it? by BartonOC · · Score: 1

    Sounds like this cpu would end up having great folding performance. I so hope the PS3 ends up being hackable and we get to throw Linux on it ;-)

    1. Re:When can we start Folding with it? by ahodes1 · · Score: 1

      Linux will be pre-installed on the PS3 HDD, no hacking needed: http://www.gamasutra.com/php-bin/news_index.php?story=9290

    2. Re:When can we start Folding with it? by Anonymous Coward · · Score: 0

      Well that seals my purchase as soon as possible, then. I just hope it's fairly unrestricted. Is there any more info out there on this?

    3. Re:When can we start Folding with it? by Xymor · · Score: 1

      E3: Kawanishi Talks Homebrew Linux PS3 Development. There are also some talks on indie game development; just google PS3 + Linux.

    4. Re:When can we start Folding with it? by newt0311 · · Score: 0

      What would be really good is if we could find a way to also upgrade the RAM. The console is unlikely to have much dedicated RAM, and that would kill performance. So if we could grab the power of the Cell and increase the RAM in the Xbox, that would be good. Personally, I think a better alternative would be to grab one of those blade servers and turn it into a slave computer.

  26. Re:Xbox 2 is a "commodity" by vertinox · · Score: 0, Offtopic

    Wow, if nothing else, the MGS4 demo left my jaw on the floor. That is some friggin' high poly count. I was kind of doubtful of the PS3, thinking it would be just an Xbox 360, but that video looked awesome.

    (I dunno if it's worth the price tag, though.)

    --
    "I am the king of the Romans, and am superior to rules of grammar!"
    -Sigismund, Holy Roman Emperor (1368-1437)
  27. Re:Xbox 2 is a "commodity" by KitesWorld · · Score: 1

    at least you know that the PS3 is really the only next-generation video game system because nobody concerned with raw performance and power efficiency would want to use the Xbox 2 in a HPC environment.

    Not quite. What they're saying is that the Cell is better suited to parallel applications, like physics simulations, and that it is more scalable, i.e., it is easier to build supercomputers or distributed computing nodes from.

    However, that has no bearing upon what 'generation' the host console is, largely because a console has a predetermined number of chips installed and cannot be scaled without breaking its own specification. Remember, the fact that there are exactly n cores in a console is what makes that console a stable development platform (as opposed to the PC, where performance differs from unit to unit).

    You *could* argue that one console is using more modular technology, but that on its own doesn't tell you anything about overall performance, ease of development, stability, robustness, or any of the other metrics you can really apply to a console. If 'older' technology can deliver those same metrics in a home console, then which is better simply becomes an issue of cost. If the older gear does the same job but is cheaper to produce, then it is the better alternative from everything but a marketing standpoint. Expandability of the hardware in other platforms does not affect the quality of the platform in question.

  28. Re:Xbox 2 is a "commodity" by Anonymous Coward · · Score: 0

    "With the introduction of the SCE Worldwide Studio there has been an increased sharing effort between the internal teams. It extends far beyond Naughty Dog and Insomniac this time and I think you'll see a lot of first party titles that reap these benefits."

    Wow. Is this the same SCE Worldwide Studios that brought me God of War? Dude, sweet. If Naughty Dog and all the other first-party developers, or even Insomniac, are going to be reaping benefits from this sharing of manpower, there's no telling how much better Resistance: Fall of Man is going to get and blow gamers away. It could quite possibly shut Gears of War down, because that chainsaw is going to get old, but 8 to 32 online players in Resistance on the Cell processor is going to exude replayability. And yeah, John Carmack is dead. Doom died when Insomniac entered the FPS market with Resistance. Can't wait.

  29. The ball is in the hands of developers. by stengah · · Score: 2, Insightful

    The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations. Although these scientists may be interested in any potential speed-up to their workflow, they are not willing to invest any of their time in translating their whole codebase into asm-optimized C. Thus, the ball is in the hands of software developers, not scientists.

    --
    I'm jack's useless sig
    1. Re:The ball is in the hands of developers. by infolib · · Score: 3, Informative

      The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations.

      Indeed, most scientists. They also know very little about profiling but since the simulation is used only maybe a hundred times that hardly matters.

      The cases we're talking about here are where thousands of processors grind the same program (or evolved versions of it) for years as the terabytes of data roll in. Such is the situation in weather modelling, high energy physics and several other disciplines. That's not a "program" in the usual sense, but rather a "research program" occupying a whole department, including everyone from "domain-knowledge" scientists down to some very long-haired programmers who will not shy away from a bit of ASM. If you're a developer good at optimization and parallelism, there might just be a job for you.

      --
      Any sufficiently advanced libertarian utopia is indistinguishable from government.
    2. Re:The ball is in the hands of developers. by Surt · · Score: 1

      In the article they mentioned that they had ported several scientific kernels to Cell, so presumably the porting work isn't going to be the core of the challenge. It sounds like the real work will be convincing Sony to make modifications to the next generation of Cell processors to improve double precision performance.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    3. Re:The ball is in the hands of developers. by fitten · · Score: 1

      The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations. Although these scientists may be interested in any potential speed-up to their workflow, they are not willing to invest any of their time in translating their whole codebase into asm-optimized C. Thus, the ball is in the hands of software developers, not scientists.

      Isn't this the same argument as the Itanium proponents used? ...It's up to the compiler writers to make good compilers so the code runs well...

    4. Re:The ball is in the hands of developers. by ceoyoyo · · Score: 1

      Those scientists are NOT high performance computing scientists.

      I do a bit of HPC. I wouldn't touch Matlab with a ten foot pole. Of course, I wouldn't touch Matlab with a ten foot pole for non-HPC stuff either.

    5. Re:The ball is in the hands of developers. by Anonymous Coward · · Score: 0

      That depends greatly on the scientist. The bulk of the work that I do is in Fortran + ASM. Please note that I do in fact hate my life... ;)

  30. Ease of Programming? by MOBE2001 · · Score: 2, Interesting

    FTA: While their current analysis uses hand-optimized code on a set of small scientific kernels, the results are striking. On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors,

    The Cell processor may be faster, but how easy is it to implement an optimizing development system that eliminates the need to hand-optimize the code? Is programming productivity not just as important as performance? I suspect that the Cell's design is not as elegant (from a programmer's POV) as it could have been, only because it was not designed with an elegant software model in mind. I don't think it is a good idea to design a software model around a CPU. It is much wiser to design the CPU around an established model. In this vein, I don't see the Cell as a truly revolutionary processor because, like every other processor in existence, it is optimized for the algorithmic software model. A truly innovative design would have embraced a non-algorithmic, reactive, synchronous model, thereby killing two birds with one stone: solving the current software reliability crisis while leaving other processors in the dust in terms of performance. One man's opinion.

    1. Re:Ease of Programming? by adam31 · · Score: 1
      I suspect that the Cell's design is not as elegant (from a programmer's POV) as it could have been, only because it was not designed with an elegant software model in mind.

      It's possible that this is the case; however, IBM is actively working on compiler technology to abstract away the complexity of an unshared memory architecture for developers whose goal isn't to squeeze the processor:

      When compiling SPE code, the compiler identifies data references in system memory that have not been optimized by using explicit DMA transfers and inserts code to invoke the software-cache mechanism before each such reference.

      So for developers who want performance, the architecture is ideal. 2 Megs of L1-speed memory, a 25 GB/s bus servicing 8 processors each with 128 128-bit registers. And for the rest, it's still a high-performance programmer-friendly development environment.

      Your point is not going unnoticed by IBM.
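      To make the quoted software-cache idea concrete, here is a minimal sketch of a direct-mapped, read-only cache in the spirit of what the compiler inserts before unoptimized references: check a tag, and on a miss fetch the whole line into local store. A Python list stands in for main memory and a slice stands in for a DMA get; the class and its parameters are hypothetical and are not IBM's implementation.

```python
class SoftwareCache:
    """Direct-mapped, read-only software cache over a 'main memory' list.

    Illustrative only: the list slice stands in for a DMA get into local store.
    """

    def __init__(self, main_memory, num_lines=4, line_size=8):
        self.mem = main_memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # which memory line each slot holds
        self.lines = [None] * num_lines  # the cached data ("local store")
        self.hits = self.misses = 0

    def load(self, addr):
        line_no = addr // self.line_size
        slot = line_no % self.num_lines
        if self.tags[slot] != line_no:   # miss: fetch the whole line
            start = line_no * self.line_size
            self.lines[slot] = self.mem[start:start + self.line_size]
            self.tags[slot] = line_no
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[slot][addr % self.line_size]

if __name__ == "__main__":
    memory = list(range(100))
    cache = SoftwareCache(memory)
    total = sum(cache.load(i) for i in range(64))  # sequential scan
    print(total, cache.hits, cache.misses)
```

      A sequential scan of 64 elements with 8-element lines misses once per line (8 misses, 56 hits), which is part of why streaming code does tolerably even without explicit DMA.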

    2. Re:Ease of Programming? by Anonymous Coward · · Score: 0

      The reason the Cell is able to have such high performance is that it makes life difficult for programmers. It's not like IBM suddenly got way smarter at designing processors than they have been for the decades they've been building them, but just felt lazy when it came to making it programmer-friendly.

      This of course means the Cell, alone, will never be mainstream. However for certain niches, such as video games and scientific computing, the Cell's tradeoffs are acceptable.

    3. Re:Ease of Programming? by Chris+Snook · · Score: 1

      I suspect that the Cell's design is not as elegant (from a programmer's POV) as it could have been, only because it was not designed with an elegant software model in mind. I don't think it is a good idea to design a software model around a CPU. It is much wiser to design the CPU around an established model. In this vein, I don't see the cell as a truly revolutionary processor because, like every other processor in existence, it is optimized for the algorithmic software model. A truly innovative design would have embraced a non-algorithmic, reactive, synchronous model, thereby killing two birds with one stone: solving the current software reliability crisis while leaving other processors in the dust in terms of performance.

      I've read this a dozen times, and can't figure out what the hell you're talking about.

      Anyway, as so many other people have pointed out, if 99% of your CPU cycles are spent doing matrix multiplication, and you can make matrix multiplication go 5 times faster with some assembly optimization, your application is now almost 5X faster, without touching 99% of your code. This really happens in scientific computation. It's the extremely friendly end of the spectrum of Amdahl's Law, and is why reusable libraries are very good.
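      The arithmetic behind that claim is Amdahl's Law: if a fraction f of the runtime is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A minimal sketch, with the function name chosen for the example:

```python
def amdahl_speedup(fraction_accelerated: float, factor: float) -> float:
    """Overall speedup when `fraction_accelerated` of the runtime gets `factor`x faster."""
    serial = 1.0 - fraction_accelerated
    return 1.0 / (serial + fraction_accelerated / factor)

print(f"{amdahl_speedup(0.99, 5):.2f}x")  # 99% of time in matmul, matmul 5x faster -> 4.81x
print(f"{amdahl_speedup(0.50, 5):.2f}x")  # only half the time in the tuned kernel -> 1.67x
```

      The second line shows the less friendly end of the spectrum: the same 5x kernel gain buys far less when the kernel is only half the runtime.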

      --
      There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
    4. Re:Ease of Programming? by zCyl · · Score: 1

      Is not programming productivity just as important as performance?

      When you're talking about scientific computations which can sometimes take a month or more to do one run, then suddenly it can become worth it to sacrifice a bit of programmer time if it can make a substantial increase in performance. If you can do a run in a week instead of a month, then that makes a huge difference in what you can investigate. Often it's not a question of just buying more machines because sometimes you need to know the answer to the last run before starting the next one.

    5. Re:Ease of Programming? by jthill · · Score: 1
      how easy is it to implement an optimizing development system that eliminates the need to hand-optimize the code?
      Not much payoff optimizing development systems for slow hardware. Cray tout the X1E as offering "Unrivalled Vector Processing and Scalability for Extreme Performance". These guys smoked one for dinner, woke up the next day, rebuilt their code from the ground up a completely different way and smoked it again for lunch.

      It took them a month to figure out how to do that, on maybe $3K worth of hardware. Think anybody wants to teach a compiler how to get close? TFP:

      Having become experienced Cell programmers, the single precision time skewed stencil -- although virtually a complete rewrite from the double precision single step version -- required only a single day to code, debug, benchmark, and attain spectacular results of over 65 Gflop/s. This implementation consists of about 450 lines, due once again to unrolling and the heavy use of intrinsics.
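      For readers who haven't met the term: a stencil updates each grid point from its neighbours, and time skewing blocks several time steps over a tile so the data is reused while it is still in local store. The sketch below is only the naive, single-step baseline (a 1-D, 3-point average with fixed boundaries, chosen for illustration), not the paper's code.

```python
def jacobi_step(u):
    """One naive time step of a 1-D, 3-point averaging stencil.

    Boundary values are held fixed. Every step streams the whole array
    through memory once, which is the traffic time skewing tries to avoid.
    """
    new = u[:]  # copy keeps the boundary values fixed
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new

def run(u, steps):
    """Apply `steps` single time steps, one full sweep each."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u

if __name__ == "__main__":
    grid = [0.0] * 9 + [9.0]  # cold interior, hot right boundary
    print(run(grid, 3))
```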

      I'm just a fanboi in this territory, but last I looked the guys who don't quite need to do that just use pre-tuned libraries to get a nice chunk of what's possible. Who really cares how hard it is to tune those, once?

      And when they were just doodling, not thinking hard?

      These results are conservative given the naive 1D FFT implementation we used on Cell whereas the other systems in the comparison used highly tuned FFTW or vendor-tuned FFT implementations [...] Cell performance is nearly at parity with the X1E in double precision.
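      For comparison, a "naive" 1-D FFT in the textbook sense is something like the recursive radix-2 Cooley-Tukey sketch below, checked against an O(n^2) DFT. This is an assumption about what "naive" means here; the paper's actual Cell FFT code is not shown in this thread.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle          # butterfly: top half
        out[k + n // 2] = even[k] - twiddle  # butterfly: bottom half
    return out

def dft(x):
    """O(n^2) reference DFT used to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

if __name__ == "__main__":
    signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]
    err = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
    print(f"max |fft - dft| = {err:.2e}")
```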

      They say DP arithmetic is apparently in there as an afterthought (it's not really necessary for game-quality 3D, after all), and they think they know how to tweak the pipeline for better than double the throughput.

      --
      IABCOT!

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
    6. Re:Ease of Programming? by Lars+T. · · Score: 1
      The Cell processor may be faster but how easy is it to implement an optimizing development system that eliminates the need to hand-optimize the code? [...] I suspect that the Cell's design is not as elegant (from a programmer's POV) as it could have been, only because it was not designed with an elegant software model in mind.

      Hunh? From an (assembler) programmer's POV we have something close to AltiVec/VMX vs. x86 and EPIC, and you ask which is easier?

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  31. Re:PS3 will rule in 2008 by nayten · · Score: 0, Offtopic

    Your rant is juvenile. Check the facts: not everyone who owns a Nintendo is under 12. I didn't realise that owning a game system with more first-party titles than bloody, violent, car-robbing games and shitty budget titles determined the age group for the console, but I digress. Your fuzzy statement about having BR, HD-DVD, and a HD doesn't offer much either. The 3DO launched as one of the first 32-bit CD-based systems, and it didn't help them. Also remember that the system launched at $799, only $100 more than the PS3 is scheduled to.

    And what's up with this thing about buying these massive HDTVs? Most of us still have our good old standard television sets, partially due to the price of upgrading, but also because it just doesn't do anything special. I highly doubt most people are looking forward to dumping $700 on a new game console, then $2000 months later on a new TV to play UT2007 with. I could be wrong, though. (sarcasm)

    You also forget that Nintendo is the only major console maker that doesn't LOSE money by loading up its consoles with bells and whistles, like hard drives and HD output. They also put out plenty of first-party titles. They're also the ones generating the most revenue from their products relative to what they put in. And so they should; they've been in the business for over twenty years now. What it comes down to is that gamers care about the games. Nintendo had plenty to deliver at E3, while Sony didn't. You really sound young, making all of these statements about console hardware. That's exactly the reason why rehashed shooter crap overflows my gaming magazines today.

  32. 'designed', nothing by Szplug · · Score: 1

    All MP machines have: communication channels, and processors. If the designers envisioned it being used a certain way and optimized it for that, well, what of it? Maybe that's how the standard game API does things but, it's still processors and communication channels. It's more than likely you can get better performance out of it by adapting your problem for it specifically, minimizing communication and keeping processors busy as much as the problem allows, same as for all other MP systems.
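    One way to quantify "minimizing communication": split an n x n grid into P square blocks and count how many halo cells each block must receive per step for a 5-point stencil, relative to the cells it computes. This sketch assumes P is a perfect square that divides n evenly and ignores blocks on the outer edge; the function name is hypothetical.

```python
import math

def halo_to_compute_ratio(n: int, procs: int) -> float:
    """Ratio of halo cells received to cells computed per processor.

    Assumes an n x n grid split into `procs` equal square blocks and a 5-point
    stencil needing one ghost cell from each of the 4 neighbouring blocks
    (interior blocks only; edge blocks are ignored).
    """
    side = math.isqrt(procs)
    if side * side != procs or n % side:
        raise ValueError("procs must be a perfect square that divides n")
    block = n // side
    return (4 * block) / (block * block)

if __name__ == "__main__":
    for p in (4, 16, 64, 256):
        print(p, halo_to_compute_ratio(1024, p))
```

    For n = 1024 the ratio doubles each time P quadruples (about 0.008 at P = 4, 0.0625 at P = 256), so giving each processor a bigger block is what keeps it busy computing rather than communicating.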

    --
    Someday we'll all be negroes
  33. And why Apple going Intel was so sad by Anonymous Coward · · Score: 1, Insightful

    x86, the commodity, has registers from the days when RAM was faster than the CPU (i.e., the 8-bit days).

    The tacked-on FPU, MMX, and SSE SIMD stuff, while welcome, still leaves few registers for program use.

    The PowerPC, on the other hand, has a nice collection of registers and as good if not better SIMD. The Cell goes a big step further.

    More registers = more variables in the CPU = higher calculation bandwidth,
    be they regular registers or SIMD registers.
    That, plus the way it handles cache.
    It could be a pig to program without the right kind of optimizing compiler.
    Would that mean game developers using Fortran 95?

    1. Re:And why Apple going Intel was so sad by uarch · · Score: 1

      x86-64 has bumped the number of general registers up to 16. Sure, it's still less than the 32 used by PowerPC, but the performance difference between the two will be negligible for most apps. In some cases more registers win. In other cases fewer registers win. (Think about saving registers on function calls, context switches, etc.)

      Besides, Cell in its current form wouldn't be a huge win in standard desktops. It's specialized for specific workloads, and you wouldn't see the same performance gains across the board. In several areas that might be more common on a desktop PC, you would probably see a drop in performance. This story doesn't really apply directly to Apple.

    2. Re:And why Apple going Intel was so sad by rrohbeck · · Score: 1

      Now when is AMD going to add a bunch of small slave CPUs? With limited 64-bit instruction set, few fast integer and FP execution units, with local SRAM, hooked up through DMA via HyperTransport?

      Ooh, the idea makes me drool.

    3. Re:And why Apple going Intel was so sad by Anonymous Coward · · Score: 0

      Cell has almost nothing to do with PowerPC. There is a PowerPC core in it, but the SPEs could equally well be fitted around a x86 core.

      x86-64 has 16 GPRs, PowerPC 32.

      Do you know anything about SSE registers? Does XMM ring a bell?

  34. bang, buck, effort by penguin-collective · · Score: 3, Informative

    Over the last several decades, there have been lots of parallel architectures, many significantly more innovative and powerful than Cell. If Cell succeeds, it's not because of any innovation, but because it contains fairly little innovation and therefore doesn't require people to change their code too much.

    One thing that Cell has that previous processors didn't is that the PS3 tie-in and IBM's backing may convince people that it's going to be around for a while; most previous efforts suffered from the problem that nobody wanted to invest time in adapting their code to an architecture that was not going to be around in a few years anyway.

    1. Re:bang, buck, effort by Anonymous Coward · · Score: 0

      Your point about other parallel architectures is only partially valid.

      The key to the PS3's possible wealth of utility in scientific computation won't be its architecture but its price, or more accurately, its price/performance. Just think how much cheaper this unit will be, thanks to however many million gamers buying it, compared to the cost of a high-end desktop.

    2. Re:bang, buck, effort by m874t232 · · Score: 1

      Well, that's why the Subject says "bang, buck, effort", so, yes, I agree that bang for the buck matters.

      However, there is a problem with the PS3: the only chip that will be made in volume is the chip that goes into the PS3, and that will likely remain at its current clock frequency for a while. And that means that it will be obsolete pretty soon. Faster versions will be much smaller runs and hence much more expensive.

      So, I hope the high volume of the PS3 will help, but I wouldn't bet on it.

  35. single threaded vs multithreaded by abigsmurf · · Score: 1

    I thought the Cell's performance was mediocre if you only had a single task going on at a time. Given that scientific simulations aren't real time, they don't need to be hugely multithreaded, as it's better for each tick/frame/etc. of the simulation to be done one after the other.

    1. Re:single threaded vs multithreaded by be-fan · · Score: 1

      1) Cell's performance is mediocre on typical single-threaded applications (eg: AI). Not because it has inherently bad single-threaded performance, but because most single-threaded code happens to be integer code, and the SPE's integer and branching performance sucks.

      2) Most simulations are highly parallel. There are lots of cases where you can simulate many parts of the system simultaneously, and only synchronize state at certain points.
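      Point 2 describes a bulk-synchronous pattern. A minimal sketch, assuming independent particles falling under constant gravity (a made-up workload): chunks are updated concurrently, and gathering every result before the next step is the synchronization point. Python threads won't actually speed up CPU-bound work like this because of the GIL; the sketch shows the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor

def advance_chunk(chunk, dt):
    """Advance (position, velocity) pairs one step under constant gravity.

    Chunks are independent within a step, so they can run concurrently.
    """
    g = -9.81
    return [(x + v * dt, v + g * dt) for x, v in chunk]

def simulate(particles, steps, dt, workers=4):
    """Bulk-synchronous loop: split the work, update chunks in parallel, then join.

    Collecting all chunk results before the next step acts as the barrier.
    """
    size = max(1, len(particles) // workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            chunks = [particles[i:i + size] for i in range(0, len(particles), size)]
            results = pool.map(advance_chunk, chunks, [dt] * len(chunks))
            particles = [p for chunk in results for p in chunk]  # barrier
    return particles

if __name__ == "__main__":
    start = [(float(i), 0.0) for i in range(8)]
    print(simulate(start, steps=2, dt=0.1)[:2])
```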

      --
      A deep unwavering belief is a sure sign you're missing something...
  36. Re:Xbox 2 is a "commodity" by Darkfred · · Score: 2, Informative

    Did Sony pay you, or did Mr. Kutaragi come over to your house and type it for you?

    Have you seriously never seen anything like this before? As a professional PS2/360/PS3 developer, I have to say that I was seriously underwhelmed by this demo. Every one of the effects has been used before. The original Xbox has every effect he mentioned. And HL2 has a significantly more complex lighting system and post-processing effects.
    The demo appears to be a single high-poly character in a texture mapped box. The demoer admits that this is a cut-scene quality model. I believe this scene could be rendered on an original xbox with similar 'visual' quality. Why not use some of those polys to make a realistic background? Black on PS2 looked better. And they couldn't even show a solid second of actual gameplay.
    I think it will be an amazing game, but the demo was no technical achievement. It was a hurried render test for an obviously incomplete engine. Bragging about poly count when your competition can push 1.5x-3x as many is not going to win them any points either.

    Regards,

    --
    ----- 70% of all statistics are completely made up.
  37. Re:PS3 will rule in 2008 by Xymor · · Score: 0, Offtopic
    Also remember that the system launched at $799-- only $100 more than the PS3 is scheduled to
    According to Google: $799 - $599 = $200
  38. No, this is why we have subroutine libraries by golodh · · Score: 5, Interesting
    Although I agree with your point that crafting optimised assembly language routines is way beyond most users (and indeed a waste of time for all but an expert) there are certain "standard operations" that

    (a) lend themselves extremely well to optimisation

    (b) lend themselves extremely well to incorporation in subroutine libraries

    (c) tend to isolate the most compute-intensive low-level operations used in scientific computation

    SGEMM

    If you read the article, you will find (among others) a reference to an operation called "SGEMM". This stands for Single precision General Matrix Multiplication. It is the sort of routine that makes up the BLAS library (Basic Linear Algebra Subprograms; see e.g. http://www.netlib.org/blas/). High performance computation typically starts with creating optimised implementations of the BLAS routines (handcoded at assembler level if necessary), sparse-matrix equivalents of them, Fast Fourier Transform routines, and the LAPACK library.
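    For reference, the operation itself is C = alpha*A*B + beta*C. Below is a minimal, untuned sketch of what GEMM computes (no transpose options, and Python doubles rather than 32-bit floats); the tuned libraries discussed here compute the same result, only much faster. The i-p-j loop order is a common choice because it walks rows of B contiguously.

```python
def sgemm(alpha, a, b, beta, c):
    """Return alpha*A*B + beta*C for row-major lists of lists (no transposes).

    Mirrors what the BLAS xGEMM routine computes, but in pure Python doubles;
    real SGEMM works in 32-bit floats and is heavily tuned.
    """
    m, k = len(a), len(a[0])
    if len(b) != k or len(c) != m or len(b[0]) != len(c[0]):
        raise ValueError("incompatible shapes")
    n = len(b[0])
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            aip = alpha * a[i][p]
            row_b = b[p]
            for j in range(n):
                out[i][j] += aip * row_b[j]
    return out

if __name__ == "__main__":
    print(sgemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]]))
```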

    ATLAS

    There is a general movement away from optimised assembly language coding for the BLAS, as embodied in the ATLAS software package (Automatically Tuned Linear Algebra Software; see e.g. http://math-atlas.sourceforge.net/). The ATLAS package provides the BLAS routines but produces fairly optimal code on any machine using nothing but ordinary compilers. How? If you run the makefile for the ATLAS package, it may take about 12 hours to compile (depending on your computer, of course; this is a typical number for a PC). In this time the makefile simply runs through multiple compiler switches for the BLAS routines and runs test suites for all of its routines at varying problem sizes. It then picks the best combination of switches for each routine and each problem size on the machine architecture where it's being run. In particular, it takes account of the size of the caches. That's why it produces much faster subroutine libraries than simply compiling, e.g., the BLAS routines with an -O3 optimisation switch thrown in.
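    The empirical-search idea can be sketched in a few lines: time the same kernel with several candidate parameters on the target machine and keep the fastest. Here the tunable is a hypothetical block size for a blocked matrix multiply; ATLAS itself searches a far larger space and generates C code, and in pure Python the interpreter overhead dominates, so treat the timings as illustrative only.

```python
import random
import time

def blocked_matmul(a, b, block):
    """C = A*B for square list-of-lists matrices, computed tile by tile."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for pp in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for p in range(pp, min(pp + block, n)):
                        aip = a[i][p]
                        row_b = b[p]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aip * row_b[j]
    return c

def autotune(n, candidates, repeats=3):
    """Empirically pick the fastest block size for an n x n multiply on this machine."""
    rng = random.Random(0)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # best-of-N reduces timing noise
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    choice, times = autotune(48, candidates=(8, 16, 24, 48))
    print("chosen block size:", choice)
```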

    Specially tuned versus automatic?: MATLAB

    The question is, of course: who wins, specially tuned code or automatic optimisation? This can be illustrated with the example of the well-known MATLAB package. Perhaps you have used MATLAB on PCs and wondered why its matrix and vector operations are so fast? That's because for Intel and AMD processors it uses a special, vendor-optimised subroutine library (see http://www.mathworks.com/access/helpdesk/help/techdoc/rn/r14sp1_v7_0_1_math.html). For SUN machines, it uses SUN's optimised subroutine library. For other processors (for which there are no optimised libraries), MATLAB uses the ATLAS routines. Despite the great progress and portability that the ATLAS library provides, carefully optimised libraries can still beat it (see the Intel Math Kernel Library at http://www.intel.com/cd/software/products/asmo-na/eng/266858.htm).

    Summary

    In summary:

    - large tracts of scientific computation depend on optimised subroutine libraries

    - hand-crafted assembly-language optimisation can still outperform machine-optimised code.

    Therefore the objections that the hand-crafted routines described in the article distort the comparison or are not representative of real-world performance are invalid.

    However ... it's so expensive and difficult that you only ever want to do it if you absolutely must. For scientific computation this typically means that you only consider handcrafting "inner loop primitives" such as the BLAS routines, FFT's, SPARSEPACK routines etc. for this treatment, and that you just don't attempt to do that yourself.

    1. Re:No, this is why we have subroutine libraries by definate · · Score: 1
      Specially tuned versus automatic?: MATLAB


      I believe you're mistaking MATLAB for Matlock.
      --
      This is my footer. There are many like it, but this one is mine.
  39. Ran simulations, not code by jmichaelg · · Score: 5, Insightful
    Lest anyone think they actually ran "several scientific application kernels" on the Cell/AMD/Intel chips, what they actually did was run simulations of several different tasks such as FFT and matrix multiplication. Since they didn't actually run the code, they had to guess as to some parameters like DMA overhead. They also came up with a couple of hypothetical Cell processors that dispatched double precision instructions differently than how the Cell actually does it and present those results as well. They also said that IBM ran some prototype hardware that came within 2% of their simulation results, though they didn't say which hypothetical Cell the prototype hardware was implementing.

    By the end of the article, I was looking for their idea of a hypothetical best-case pony.

    1. Re:Ran simulations, not code by Keeper · · Score: 1

      By the end of the article, I was looking for their idea of a hypothetical best-case pony.

      That would be a sphere, right? :)

    2. Re:Ran simulations, not code by the_ed_dawg · · Score: 1
      Lest anyone think they actually ran "several scientific application kernels" on the Cell/AMD/Intel chips, what they actually did was run simulations of several different tasks such as FFT and matrix multiplication.
      Simulation makes computer architecture research possible because researchers don't have access to prototype hardware. If we insisted that all experiments run on real hardware, the only people who could possibly do research are Intel, AMD, and IBM because they have access to the fab and masks to make modifications. Worse, it would take months and tremendous financial resources to test whether an idea even works.

      Any good architecture course goes over how to properly configure simulation parameters to make a practical comparison. The guys at Berkeley spent time trying to tune those DMA numbers because they went to the trouble to make a comparison. They have some truly talented architects at Berkeley, so I'm sure they have the experience to guide their numbers.

      Of course, this is all a moot point, since the numbers are so far in Cell's favor that I doubt the DMA transfer rate would make a damn bit of difference.

      --
      There are two types of people: those prepared for the zombie apocalypse and those who will be eaten.
    3. Re:Ran simulations, not code by Sycraft-fu · · Score: 2, Insightful

      Hey, it makes a real difference. There's a great quote that shows up on /. from time to time that goes along the lines of "The difference between theory and reality is that in theory there's no difference, but in reality there is."

      Researchers are very good at simulating things that have little or nothing to do with reality. It all looks good in theory according to their formulas, but they fail to take something into account. As an example, take the defunct Elbrus E2K computer chip. It was supposed to be an awesome processor that would kick the crap out of anything Intel or AMD offered. It was being designed by people with real computer experience; Elbrus made several Soviet supercomputers. Basically, the chip was to be their Elbrus 3 supercomputer reimplemented on one chip.

      Everything looked good in simulations... But obviously nothing has ever come of it. The E2K never hit the market, and it and its follow-ups have been nothing but vapourware. Why? Well, again, because of the difference between theory and reality. The design was all well and good on a VHDL simulator, but the hard part of chip design is not developing some powerful stuff in VHDL, it's developing powerful stuff that can actually be fabbed into a real chip.

      So as with anything like this, I reserve judgement until I see real silicon. To me this looks like people getting overly excited about something that doesn't exist yet. Yes, the Cell is good in theory, we know that; that's not the issue. The issue is how it will really perform against other chips running real code. That we don't know, and won't know for some time. One simple issue that will have to be dealt with is compiler inefficiency. Most scientific code isn't written in assembly; often it's Fortran. Well, if there's one thing Intel's got, it's a rockin' Fortran compiler. So even if the Cell's units are more powerful in theory, if the code it gets isn't optimized it may not matter.

      Either way, any time I hear things about what an amazing jump forward some new tech will be, I am skeptical. It just generally seems that doesn't happen. Improvements happen in small steps, not jumps of nearly an order of magnitude (which is what they are claiming with the 8x faster stat).

    4. Re:Ran simulations, not code by adam31 · · Score: 1
      Sycraft-fu, I understand your skepticism, and I think it's unfortunate that they didn't publish physical timings. Your post has 3 main points: 1) their simulations don't factor in something that will account for additional slow-down, 2) their compilers aren't adapted, and that will contribute to slow-down, and 3) realistic improvements are incremental.

      1) The Cell is actually a pretty simple architecture. Once data is transferred to SPE local store, performance is deterministic within a fraction of a percent. The big question mark is the performance of the DMAC and XDR, in both bandwidth and latency. Because the paper consistently assumes 25.6 GB/s (the theoretical maximum memory bandwidth), I suspect that is where any unexpected slow-down will come from. Achievable bandwidth should be somewhere in the 18-24 GB/s range, and that will only affect operations that are memory-bound. They assume 1000 cycles of latency, which should be sufficient in any case.
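The bandwidth point can be made concrete with the roofline model, plus Little's law for the latency question. A sketch under stated assumptions: the 14 GFLOP/s double-precision peak and the SpMV arithmetic intensity of 2 flops per 12 bytes are illustrative values, not figures from the paper, and the 3.2 GHz clock is assumed. For a memory-bound kernel the attainable rate scales linearly with bandwidth, so dropping from 25.6 to 18 GB/s costs about 30%; a kernel whose intensity puts it above the roof is unaffected.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: a kernel runs at the lower of the compute peak and
    what the memory system can feed it (bandwidth x arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def bytes_in_flight(bandwidth_gb_s, latency_cycles, clock_ghz):
    """Little's law: data that must be outstanding to sustain a bandwidth at a
    given latency. GB/s equals bytes per nanosecond, so the product is in bytes."""
    latency_ns = latency_cycles / clock_ghz
    return bandwidth_gb_s * latency_ns


if __name__ == "__main__":
    PEAK_DP_GFLOPS = 14.0         # hypothetical double-precision peak, illustration only
    SPMV_INTENSITY = 2.0 / 12.0   # assumed: 2 flops per (8-byte value + 4-byte index)
    for bw in (25.6, 24.0, 18.0):
        rate = attainable_gflops(PEAK_DP_GFLOPS, bw, SPMV_INTENSITY)
        print(f"{bw} GB/s -> {rate:.2f} GFLOP/s")
    # 1000 cycles at an assumed 3.2 GHz clock is 312.5 ns
    print(f"{bytes_in_flight(25.6, 1000, 3.2):.0f} bytes must be in flight")
```

Little's law gives roughly 8 KB outstanding to cover 1000 cycles at full bandwidth, which is one reason asynchronous, multi-buffered DMA matters on the SPEs.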

      2) The fact is that their simulations were run using machine code generated from a real compiler. The language of the source code is irrelevant. More logical is to argue that other compilers have exhausted their potential, while Cell compilers are still in their youth. More straightforwardly, you can argue that a typical compiler has three main deficiencies: it doesn't appreciate the cost of spilling to the stack, it is a slave to correctness in the face of any potential pointer aliasing, and a compiler's nature is scalar processing. The three answers as far as Cell is concerned are: 128 vector registers minimize spilling, all aliasing can be hidden by the restrict keyword + local intrinsic variables, and SPEs are vector-only, with integer and FP sharing a register file. Never has an architecture been more ripe for compiler optimization.

      3) I don't know that this is more than an incremental step, at least as far as high performance computing technology is concerned. It is fundamentally different from AMD64, for example, but different in a way that addresses major concerns: 128 registers per SPE, a 25.6 GB/s memory bus, 256 KB of L1-speed memory per processor, all at minimal power consumption... plus they can be linked together on a 35 GB/s bus. The key is: if I asked you to point to the major architectural bottleneck... could you?

      I remember, many years ago, listening to a talk given by a Pixar tech guy. He explained that one of the primary benchmarks they used in constructing their renderfarm was flops per cubic meter (performance versus the heat dissipation of rack space). The Cell isn't quite revolutionary, but it will make many companies re-evaluate their high-performance needs.

    5. Re:Ran simulations, not code by Sycraft-fu · · Score: 1

      No, I can't point out the major bottleneck for sure; I don't claim to be a chip engineer. However, I can point out one that might not have been considered by the simulation: the registers. While tons o' registers sounds like nothing but a boon, you have to remember that on any system you are likely to see today, you are going to be running a multi-tasking OS. That of course means every time the OS switches tasks, all the registers need to be saved so the task can resume properly when it switches back. Not a big deal if you are saving the 30 or so registers most processors have. It gets to be a little more problematic if you are talking a couple thousand, which it sounds like the Cell has. Even on a small scale it does matter, and for that reason Intel and AMD leave the vector registers used by SSE disabled unless the OS explicitly turns them on, so only tasks that need them save them.
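To put rough numbers on the register argument, here is a back-of-envelope sketch. It assumes a conventional core with 32 64-bit registers (standing in for "30 or so") and SPEs with 128 registers of 128 bits each, and it counts only architectural register state, not the SPE local store or DMA queue state.

```python
def register_state_bytes(num_registers, register_bits):
    """Bytes needed to save an architectural register file on a context switch."""
    return num_registers * register_bits // 8


if __name__ == "__main__":
    conventional = register_state_bytes(32, 64)   # hypothetical "30 or so" 64-bit registers
    spe = register_state_bytes(128, 128)          # one SPE: 128 registers x 128 bits
    print("conventional core:", conventional, "bytes")
    print("one SPE:", spe, "bytes")
    print("eight SPEs:", 8 * spe, "bytes in", 8 * 128, "registers")
```

Eight SPEs come to 1,024 registers and 16 KB of register state, against 256 bytes for the hypothetical conventional core, a factor of 64; whether that matters depends on whether the SPEs are context-switched at all.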

      Well, I can see this being a non-trivial source of slowdown. OSes task switch a lot. Even if you aren't running anything else, it still has lots of system processes and drivers running. Every time a driver pops an interrupt, you have to push everything your program is doing on the stack so it can run, then pop it all back off.

      Now I'm not saying this will be killer, or that there aren't ways of mitigating it (perhaps the OS can just save the state of 1 SPE to use for execution, the rest can just be suspended as they are) but it's one of those kinds of things that tends to fall by the wayside in simulations.

      I just find that when people talk theoretical numbers, the actual performance you see in the real world often falls woefully short of them. I see much in the way of hype and little in the way of actual hardware demos. Until I see the silicon running in a real environment, I remain ever the skeptic about new products, especially ones that claim to be a major leap over what's come before them, probably because so many times in the past I've seen it not pan out.

      Also, I think you are a little overly optimistic on memory speeds. For example, the fastest x86 desktop systems these days (which actually have faster RAM than servers due to the lack of error correction) get memory speeds in the 5-6GB/sec range on a real computer, running a peak-bandwidth benchmark. On my system I get 4.8GB/sec, using DDR2 RAM rated to 5.3GB/sec at the speed it's running. You can get a little faster with a faster CPU and bus, but not much. That's on a bus with a theoretical max bandwidth of 10GB/sec (according to the test software at least).

      Talking about quadrupling that, well, that's a hell of a feat. Where you'd even get RAM that can do that is a good question. Currently the fastest DDR2 DIMMs on the market are about 8.5GB/sec theoretical in dual-channel configs, and they aren't cheap. So to achieve memory numbers like you are talking about, you need much faster RAM than is on the market.
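The peak figures being argued over come from a simple product of transfer rate, bus width and channel count. A sketch, assuming DDR2-533 on 64-bit DIMMs for the 8.5 GB/s dual-channel figure and, for Cell, an XDR interface of two 32-bit channels at 3.2 GT/s (an assumed configuration, chosen because it reproduces the 25.6 GB/s used in the paper). Applying the desktop's measured-to-rated ratio to XDR is purely illustrative and says nothing about what XDR actually achieves.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels):
    """Theoretical peak = transfer rate x bytes per transfer x number of channels."""
    return mega_transfers_per_s * (bus_width_bits / 8) * channels / 1000.0


if __name__ == "__main__":
    ddr2 = peak_bandwidth_gb_s(533, 64, 2)   # DDR2-533, 64-bit DIMMs, dual channel
    xdr = peak_bandwidth_gb_s(3200, 32, 2)   # assumed Cell XDR: 2 x 32-bit channels at 3.2 GT/s
    measured_fraction = 4.8 / 5.3            # measured-vs-rated ratio quoted in the post
    print(f"DDR2-533 dual channel: {ddr2:.2f} GB/s peak")
    print(f"XDR (assumed config):  {xdr:.2f} GB/s peak")
    print(f"XDR at the same {measured_fraction:.0%} efficiency: {xdr * measured_fraction:.1f} GB/s")
```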

      Also please remember we are talking peak speeds here, 4.8GB/sec is what I measure running a simple RAM benchmark, not what the speed is running actual software.

      Now look, please don't think I'm trying to speak with authority here as to what the Cell's problems are. I'm not. All I'm trying to do is point out things underappreciated by theoretical tests. There are a LOT of things to consider on real hardware that can make the best theoretical plan not work out as well as you'd hoped.

    6. Re:Ran simulations, not code by egghat · · Score: 1

      Elbrus may have "failed" because market leader Intel chose to buy them.

      Bye egghat.

      --
      -- "As a human being I claim the right to be widely inconsistent", John Peel
    7. Re:Ran simulations, not code by adam31 · · Score: 1
      The point about context switching is a good one. Not only do all the registers need to be saved, but the entire 256 KB of local store! That's a hugely non-trivial feat, but I think performance applications will be written to avoid context switches entirely.

      The RAM is XDR. The IOIF connection (to talk to other Cells) is 2 FlexIO ports. The bus itself (called the EIB) is something like 300 GB/s. I agree that peak is never achievable, but it should be possible to get around 18 GB/s or so.
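How big is the local-store cost? A rough sketch, assuming the 256 KB local store plus a 128 x 16-byte register file per SPE has to be streamed out to main memory at the bandwidths discussed in this thread. It covers one direction only (restoring costs the same again) and ignores DMA setup and other SPE state.

```python
LOCAL_STORE_BYTES = 256 * 1024    # 256 KB local store per SPE
REGISTER_FILE_BYTES = 128 * 16    # 128 registers x 16 bytes


def save_time_us(num_bytes, bandwidth_gb_s):
    """Time to stream num_bytes out at bandwidth_gb_s (GB/s == bytes per ns)."""
    return num_bytes / bandwidth_gb_s / 1000.0


if __name__ == "__main__":
    per_spe = LOCAL_STORE_BYTES + REGISTER_FILE_BYTES
    for bw in (25.6, 18.0):
        one = save_time_us(per_spe, bw)
        print(f"{bw} GB/s: {one:.1f} us per SPE, {8 * one:.1f} us for all eight")
```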

    8. Re:Ran simulations, not code by jthill · · Score: 1
      Full-system emulators are just that. They model bus contention and DRAM refresh and everything else. If anything at all shows up in the actual hardware that those emulators didn't predict, the engineers figure it out and fix it; they don't like not understanding the hardware they're building, and IBM aren't the only ones who've been doing things like this for a while now.

      The LBNL guys started with a simple model. Their model generally predicted performance within 2% of what the full emulator said. It was off by ~13% once, and that bugged them; it turned out the emulator knew about a dispatch interlock that they didn't.

      I believe their predictions are going to be dead on the mark.

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
    9. Re:Ran simulations, not code by Sycraft-fu · · Score: 1

      The problem I see with the "let's just not context switch" idea is: how do you do that, short of using the chip as a dedicated DSP? If you want to use it as a CPU, it's going to context switch. A lot. That's just how it works on a modern OS. If nothing else, the kernel wants to check on things periodically. I don't know how often most OSes reenter their kernel, but I'd bet it's multiple times per second. Then of course there's the hardware. Every time the hardware needs attention, which is again multiple times per second I'm sure, it'll fire an interrupt and you have to switch to it and execute its code to deal with whatever it needs done.

      It just sounds like a major potential slowdown. Now maybe you don't use it as a CPU for this reason; you have it as an add-in card. OK, fair enough, but then comparing it to chips being used as CPUs is a little disingenuous.

      As for the RAM, we'll see. Forgive me if I'm skeptical of Rambus's offerings, having seen their underdelivering first go at desktops. Again, stuff that was good in theory but just failed to pan out.

    10. Re:Ran simulations, not code by ivan256 · · Score: 1

      However I can point out one that might not have been considered by the simulation: The registers. While tons o' registers sounds like nothing but a boon, you have to remember that on any system you are likely to see today, you are going to be running a multi-tasking OS. Well, that of course means every time the OS switches tasks, all the registers need to be saved, so the task can resume properly when it switches back. Not a big deal if you are saving the 30 or so registers most processors have. Gets to be a little more problematic if you are talking a couple thousand, which it sounds like the Cell is.

      When running HPTC tasks, processing units are reserved for exactly the reason you describe. Preemptive multi-tasking (if it's done at all, which isn't a given) is only done on a subset of the compute units (this may be a cluster node, a CPU, etc...) while the others are free to run the CPU bound task without worry of context switches. This is also the way the Cell architecture is intended to be used, which is why they can get away with having so many registers.

      Incidentally, even in general purpose computing on modern operating systems, it isn't uncommon to save only a subset of the registers, depending on what kind of context change is occurring, for performance reasons.

      Also, I think you are a little overly optimistic on memory speeds. For example the fastest desktop systems these days (which actually have faster RAM speeds than servers due to lack of error correction) on x86 get memory speeds in the 5-6GB/sec range on a real computer, running a peak theoretical benchmark. For example on my system I get 4.8GB/sec, using DDR2 RAM rated to 5.3GB/sec at the speed it's running. You can get a little faster with a faster CPU and bus, but not much. That's on a bus with the theoretical max bandwidth of 10GB/sec (according to the test software at least).

      Talking about quadrupling that, well that's a hell of a feat. Where you'd even get RAM that can do that is a good question. Currently the fastest DDR2 DIMMs on the market are about 8.5GB/sec theoretical in dual channel configs and they aren't cheap. So to achieve memory numbers like you are talking about you now need much faster RAM than is on the market.


      RAM performance is largely a function of how much money you're willing to spend. The memory in commodity servers and your desktop computer is slow because it is designed as much for low transistor count as for speed. There is already MUCH faster memory in your desktop computer, but only a very small amount because of how much die space it takes up. The DRAM you use as main system memory is very cost-efficient because bits are stored with a single transistor and a capacitor. This makes it slow, however, because the charge in the capacitor has to be refreshed, and the bits cannot be accessed while this is occurring. There are other alternatives, however. SRAM uses 6 transistors per bit, and thus is significantly more expensive, but can be had at speeds in the hundreds of gigabits per second in multi-channel configurations.
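The cost gap described above is largely a transistor-count gap per bit. A minimal sketch that counts only storage-cell transistors for 1 MiB of memory, using the 1-transistor (plus capacitor) DRAM cell and 6-transistor SRAM cell from the post; it ignores peripheral circuitry such as decoders and sense amplifiers, so real arrays need more.

```python
def storage_array_transistors(num_bytes, transistors_per_bit):
    """Transistors in the storage cells alone (no decoders, sense amps, etc.)."""
    return num_bytes * 8 * transistors_per_bit


if __name__ == "__main__":
    one_mib = 1024 * 1024
    dram = storage_array_transistors(one_mib, 1)   # 1T1C cell (plus one capacitor per bit)
    sram = storage_array_transistors(one_mib, 6)   # 6T cell
    print(f"1 MiB DRAM array: {dram:,} transistors")
    print(f"1 MiB SRAM array: {sram:,} transistors ({sram // dram}x)")
```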

    11. Re:Ran simulations, not code by Anonymous Coward · · Score: 0

      OK -- so this part of the thread seems to have become mired in uninformed babble. So... some facts:
      1) The SPEs are optimized for streaming algorithms and for "synergistic" processing. Context switching is therefore not something worth discussing for the most part. For streaming algorithms, the assumption is that the code is small and the data is large (and DMA'ed in/out). For synergistic processing, the OS runs on the PowerPC PPE, which is perfectly capable of running any OS with good performance (like in all of those great big IBM servers), while the SPEs use run-to-completion on processing blocks of interest.

      2) SPEs are not like SSE. SSE is a vector unit. The "equivalent" is the VMX/AltiVec unit on the PPE. (I use quotes because, as another poster said, we all know that AltiVec is superior.) SPEs run full threads while vector units are slaves to the main processing threads.

      3) Cell was designed for broadband applications. Games are one such. The concern in such applications is balancing the processing power with the memory bandwidth... keeping the processors fed. In commodity processors, the memory bandwidth is frequently the bottleneck, fixed with ever-larger caches, OOO execution, branch prediction, deeper pipelines, and more tricks. These tricks cost silicon area, which means both power **and** cost. Instead, Cell adopted a very high speed memory interface. Yes, theoretical peak is theoretical, but several demos on Cell have demonstrated memory I/O several times what you can get with a commodity processor.

      4) The simulator is available to all at http://www.ibm.com/developerworks/power/cell. Run it yourself. According to the discussion forum there, the sim is cycle accurate for the SPEs. It is correct that memory interactions **may** reduce the performance from the sim since the sim does not simulate that accurately.

      The more accurate way to think of Cell is to think of it as a PowerPC processor and 8 DSPs hooked together with very fast memory and using the same memory space. Now THAT is something.
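The streaming model in point 1 usually relies on double buffering: while an SPE computes on one block in local store, the DMA engine fetches the next. This is a timing sketch of that idea only, with hypothetical per-block times in arbitrary units; it ignores DMA issue overhead, memory contention and local-store capacity.

```python
def streaming_time(num_blocks, dma_time, compute_time, double_buffered):
    """Total time to stream num_blocks through one SPE-like core."""
    if num_blocks <= 0:
        return 0.0
    if not double_buffered:
        # Fetch a block, wait for it, compute on it, repeat.
        return num_blocks * (dma_time + compute_time)
    # The first fetch cannot be hidden. After that, the fetch of block i+1
    # overlaps the computation on block i, so each step costs the slower of
    # the two. The last block only needs computing.
    return dma_time + (num_blocks - 1) * max(dma_time, compute_time) + compute_time


if __name__ == "__main__":
    for dma, compute in ((2.0, 3.0), (3.0, 2.0)):
        serial = streaming_time(100, dma, compute, double_buffered=False)
        overlapped = streaming_time(100, dma, compute, double_buffered=True)
        print(f"dma={dma} compute={compute}: serial {serial}, double-buffered {overlapped}")
```

Once the pipeline is full, each block costs max(dma, compute) rather than their sum, so a kernel approaches whichever of memory traffic or arithmetic is its real bottleneck.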

    12. Re:Ran simulations, not code by Bert64 · · Score: 1

      Well, actually on highend servers the memory will still be faster overall due to a number of things:

      Interleaving
      NUMA (one memory controller per cpu)
      Wider memory bus width

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
  40. 14 times slower vs 8 times faster by Kell_pt · · Score: 1
    On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors, despite the fact that Cell's peak double precision performance is fourteen times slower than its peak single precision performance.

    So, that means that the Cell in its current design is 14/8 = 1.75x slower for double precision than an Opteron/Itanium is for single precision. I searched around but couldn't find a good answer on the ratio between an Opteron/Itanium's single and double precision performance. If it's actually just 50% slower (as I think it is), then the Cell is still slower (currently 75%).

    So, does anyone know for sure what the ratio between an Opteron/Itanium's single and double precision performance is?
    --
    "I don't mind God, it's his fan club I can't stand!" E8
    1. Re:14 times slower vs 8 times faster by be-fan · · Score: 1

      The Opteron/Itanium's SP/DP performance is about the same.

      And you misread the statement. It said that Cell was 8 times faster than Opteron in DP.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:14 times slower vs 8 times faster by Kell_pt · · Score: 1

      Aye, seems I misunderstood, thanks. That "despite" word in there makes a difference. :)
      Still, it would seem that Cell is 1.75x (14/8) slower for double precision (although on average it's 8x faster, which makes sense because its single precision speed is enough to raise the average).

      --
      "I don't mind God, it's his fan club I can't stand!" E8
  41. Benchmark by roadrouter · · Score: 1
    I don't understand how they can compare the new Cell with an AMD64 or an Itanium and be so happy.

    Cell has 8 vector processors and something like a PPC to "control" all of them; it's designed specifically for FP operations. It's like comparing a GPU with a CPU; it doesn't make much sense.

  42. marketing by prurientknave · · Score: 1

    Check whether this was sponsored by the same marketing team that ran ads peddling the lackluster G4 as a supercomputer on the national watchlist.

  43. Re:Xbox 2 is a "commodity" by Anonymous Coward · · Score: 0

    Did Bill Gates pay you to type that up? Or are you just a Microsoft fanboy? You assumed that the MGS4 trailer was a pre-rendered cutscene, which obviously shows that you have little knowledge of the PlayStation and MGS. MGS has NEVER used pre-rendered cutscenes. The original Xbox CANNOT produce quality similar to MGS4. Snake's hair alone would push the original Xbox to its limits.

    Finally, where did you hear that the Xbox360 can push 3x more polygons than the PS3? Your ass? You are NOT a developer, and it is obvious from your lack of knowledge in the subject.

  44. Femlab? by colinrichardday · · Score: 1

    Did you mean Fermilab, or am I not keeping up with scientific progress? :-)

  45. Ignore everything important? by Duncan3 · · Score: 2, Interesting

    I love how they manage to completely ignore all the other vector-type architectures already in the market, and just compare it to Intel/AMD which are not even designed for floating point performance.

    Scream "my computer beats your abacus" all you want.

    But then it is from Berkeley, so that's normal. ;)

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    1. Re:Ignore everything important? by jthill · · Score: 1

      I have to wonder whether the poster, the modder or both are actively committing slashdot self-parody, because this is just screamingly funny.

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
  46. Re:PS3 will rule in 2008 by FatherOfONe · · Score: 0, Offtopic

    You are correct that not every Nintendo owner is under 12. Most are. A very, very large percentage of their current market and future demographics are targeted at just that audience.

    HDTV, Blu-Ray, and a hard drive can and probably will add to the overall fun of a system: significantly faster load times, better textures, and downloadable content, just to name a few things.

    "Most of us still have..." Yep, and you are probably not in the target market for a PS3... this year. Now what percentage of new TVs being sold are not HD ready? What percentage have been ready for the last 3 years? My point is that when 1080p TVs drop in price, and it won't take long... using a PS3 as a computer isn't out of the question.

    You are correct about Nintendo not losing money. Great for them, but bad for Nintendo buyers. They have to pay "full" price for a system that PS3 owners get a discount on.

    Nintendo AND Sony are planning on 15 to 20 launch titles. Not bad for either console. You are also correct in saying Nintendo has been around for a while... So was Atari.... I wish them well.

    --
    The more I learn about science, the more my faith in God increases.
  47. You can buy CELL systems right now. by Anonymous Coward · · Score: 0

    Here you go. Enjoy.

    http://www.mc.com/cell/

  48. I am buying a psp3 as soon as they are available. by Anonymous Coward · · Score: 0

    I paid $1800 for the 1080p monitor to use with the PS3. I am so future proofed it isn't funny. :D

    The sad thing is that I don't game. I want to learn how to program the processors in this machine. I can picture a stack of 10 of these game consoles just pumping out rendered frames of animation.

  49. not a fair comparison by MonaLisa · · Score: 2, Insightful

    The authors discuss hand tuning and assembler coding for Cell, but not necessarily for the other processors. Their 2D FFT results, for example, are a factor of 10 slower than others I have seen. Also, for the IA64 and Opteron, the performance of many of these numerical kernels is highly dependent on the compiler used. The IA64 especially is very sensitive to compiler optimization, which is needed to keep the 6 pipeline slots busy and to generate memory prefetch instructions at the right time to prevent stalling. As often seems to occur in these sorts of HPC comparisons, they spend a lot of time hand optimizing for a particular platform and compare it to other platforms that have not necessarily received equivalent effort. As has been noted above, how much time you have to spend developing, debugging, and tuning a code matters a lot. This is particularly true for research codes. Finally, who uses single precision for scientific computing anymore? Any field that I am aware of that would use large FFTs, large linear algebra solvers, etc. requires at least double precision to get anything meaningful.
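The precision point is easy to demonstrate. A minimal sketch using only the standard library: `to_f32` rounds through `struct` to emulate IEEE single precision, and the sum starts at 2^24, beyond which single precision can no longer represent every integer, so each added 1.0 is rounded away. Real codes usually lose accuracy more gradually, but the mechanism is the same.

```python
import struct


def to_f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate in emulated single precision: add in double, then round.
    Double has enough extra bits that this matches native float32 addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


if __name__ == "__main__":
    values = [2.0 ** 24] + [1.0] * 10
    print("single precision:", sum_f32(values))   # 16777216.0
    print("double precision:", sum(values))       # 16777226.0
```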

  50. Re:Xbox 2 is a "commodity" by Anonymous Coward · · Score: 0

    wait...did someone mention a Beowulf cluster in a serious context? what is slashdot coming to?

  51. Re:I am buying a psp3 as soon as they are availabl by thejam · · Score: 1

    What are you running your renderer on now? Or is this power lust? You'll pay a heavy price, especially in your time. I regret doing this myself.

  52. Re:Xbox 2 is a "commodity" by Darkfred · · Score: 1

    I will try to clear up a little of your confusion.

    > You assumed that the MGS4 trailer was a pre-rendered cutscene,
    > that obviously shows that you have little knowledge of the PlayStation
    > and MGS. MGS has NEVER used pre-rendered cutscenes.. blah blah blah

    I never said it was prerendered. You simply misunderstood the way these things work. In-game cutscenes use different models than the regular game. That is because the artists need more detailed control of the animations. They can be much more complex because artists can focus on the elements used in that specific cutscene.
    Therefore, even when rendered in-game, cutscenes are a bad estimate of the actual gameplay experience. This is why you so often see Xbox cutscenes in commercials rather than actual gameplay. Sure, it is rendered in real time, but it will always be shown at the best possible quality.

    > The original Xbox CANNOT produce quality similar to MGS4. Snake's
    > hair alone would push the original Xbox to its limits.

    60,000 polys for hair alone! My GOD, call the Nobel Prize committee! Even if you wanted to waste this many polys on something that could be done at similar quality with 5k polys, what is so spectacular about 60k? The Xbox could do this at its native resolution without too much difficulty; it's not an impressive number. Even the PS2 could do it, although you would only be able to render hair and nothing else.

    > Finally, where did you hear that the Xbox360 can push 3x more polygons
    > than the PS3? Your ass? You are NOT a developer, and it is obvious from
    > your lack of knowledge in the subject.

    Well, I didn't say 3x. It depends on what you are rendering. But the simplest limitation is the clock speed and the number of pipelines. I am not saying the PS3 is worse, since it can do a lot more shader ops per second (3x as many). But it can only do them on half as many polys at a lower clock speed. This is all academic anyway, since total performance is a combination of many things. But I deal with 400k-poly models every day, and I just wasn't impressed by the demo.

    --
    ----- 70% of all statistics are completely made up.
  53. Isn't the biggest advantage... by Anonymous Coward · · Score: 0

    that the Cell will be "cheap" compared to other supercomputers? (yeah, I know I can't really compare a chip with a supercomputer, but you get the idea). Not to mention more energy efficient (which in the case of HPC also saves money to a point where it's significant, right?).

    And when I say cheap, I mean: "so much cheaper we can use the savings to hire a real programmer to do the optimizing for us, and then buy some extra processing power on top of that! And throw a party!"

    slightly offtopic: Half a year ago I interviewed Peter Hofstee, chief architect of the Cell, for the student union for physics, mathematics and informatics of the University of Groningen, the Netherlands (he studied at our university and did quite a few things for our student union). The interview was actually going to be about him and his career after his studies, but when I got to the point of "what are you doing now?", it automatically centered on the Cell chip, as he had just finished designing it. Most of the jargon was lost on me (studying physics myself), though he sounded very enthusiastic and convincing. However, it appears someone who does understand what he's talking about had an almost identical interview recently. The advantage I mentioned above is something I quite clearly remember from that interview. DodgeK

  54. Re:Xbox 2 is a "commodity" by CronoCloud · · Score: 1

    The Emotion Engine was the future of HPC, the Cell is simply an extension of ideas and concepts tested out with the EE.

  55. You are SO wrong by Memnos · · Score: 1

    If the chip runs fast with some hand-optimization, then it will get done. Just follow the money. Sheesh!

    --
    I don't trust atoms -- they make up stuff.
  56. We really did run code for Stencil and SpMV by SWWilliams · · Score: 1
    The work for the paper was actually started a year ago, and the paper was finalized 6 months ago. During that period IBM began to release their matrix multiplication and FFT results. It seemed wasteful for us to duplicate their work, so we stopped those at the performance model.

    However, the stencil code and SpMV kernels were actually coded up and simulated for the paper. They were then run (exact same code) on real hardware (a 2.1GHz prototype machine) and those results were presented at the EDGE workshop last week. The hardware performance was pretty close to the simulator (the more computationally bound the kernel, the more accurate the simulator).
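For readers who haven't met it, SpMV computes y = Ax for a sparse matrix A. This is a generic compressed sparse row (CSR) version in plain Python, not the authors' Cell implementation; it shows why the kernel tends to be memory-bound: each stored nonzero is loaded once and used for just one multiply and one add.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for A stored in compressed sparse row (CSR) form.

    values[k]  : k-th stored nonzero, row by row
    col_idx[k] : column of values[k]
    row_ptr[r] : index in values where row r starts (row_ptr[-1] == len(values))
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # one multiply + one add per nonzero
        y[r] = acc
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]]
    values = [4.0, 1.0, 2.0, 3.0, 5.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

With 8-byte values and 4-byte column indices, each nonzero costs at least 12 bytes of memory traffic for 2 flops, so performance tracks memory bandwidth far more than peak arithmetic.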

  57. You don't consider Cray's X1E a vector processor? by SWWilliams · · Score: 1

    The X1E MSP is certainly a vector processor, and we ran the same kernels on it and presented them in the paper. It would certainly not be considered a commodity processor though. We wanted a nice sample set of architectures: superscalar, VLIW, and vector.

  58. That's why F0rtran really doesn't matter here by billstewart · · Score: 1
    There's a lot of scientific programming that's complex, but a lot of it really involves doing lots of setup and transformation twiddling that hands big chunks of data to a standard package like a matrix multiplier or a Fourier Transformer or Linear Programmer etc. that really burns most of the CPU cycles. Or maybe you're doing graphics and it's a ray tracer / shader / lighter / etc., but you've still got one side of your program that's harder-to-parallelize complexity and another that's just raw standard number crunching.

    So if somebody writes a couple of dozen standard routines that crank the number-crunching part of the Cell processor well, and there's a halfway-adequate compiler for the conventional-processing side, you can still get a big win from a small budget.
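The "big win from a small budget" argument is Amdahl's law: if a fraction p of the runtime sits in the library kernels and only those are sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch; the fractions and the 8x kernel speedup below are hypothetical.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only a fraction of the runtime is accelerated."""
    p = accelerated_fraction
    return 1.0 / ((1.0 - p) + p / kernel_speedup)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(f"{p:.0%} of time in tuned kernels, 8x faster kernels -> "
              f"{amdahl_speedup(p, 8.0):.2f}x overall")
```

With 90% of the time in tuned kernels, 8x faster kernels give about 4.7x overall, and the untouched 10% then accounts for nearly half of the new runtime, which is why the conventional-processing side still needs a halfway-adequate compiler.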

    I did a lot of scientific-style programming on VAXes in the early-mid 80s, and my iPod Shuffle has more CPU, more disk-equivalent, faster I/O bus, and probably more RAM (? not sure, but all the non-shuffle versions do.) Our applications sped up by 2 orders of magnitude once we could get enough RAM :-)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks