IBM Releases Cell SDK


Posted by Zonk on Thursday November 10, 2005 @04:27AM from the toys-while-waiting-for-the-next-gen-consoles dept.

derek_farn writes "IBM has released an SDK running under Fedora Core 4 for the Cell Broadband Engine (CBE) Processor. The software includes many GNU tools, but the underlying compiler does not appear to be GNU-based. For those keen to start running programs before they get their hands on actual hardware, a full system simulator is available. The minimum system requirement specification has obviously not been written by the marketing department: 'Processor - x86 or x86-64; anything under 2GHz or so will be slow to the point of being unusable.'"

207 comments

Well . . . by Yocto+Yotta · 2005-11-10 04:28 · Score: 2, Funny

But does it run Linux?

Oh. Well, okay then.

--
A B A C A B B
1. Re:Well . . . by Kasracer · 2005-11-10 04:46 · Score: 1
  
  Next question.... does it run OSX x86?
2. Re:Well . . . by StevoJ · 2005-11-10 05:10 · Score: 1
  
  I know that one! I know that one! No.
  
  --
  That didn't really make sense. But I'm going to post it anyway.
Re:Well . . .Next question by Nom+du+Keyboard · 2005-11-10 04:30 · Score: 1

But does it run Linux?
Well, we know the answer to that. Next we want to know, will it kill Intel?

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Wikipedia article question by goofyheadedpunk · 2005-11-10 04:34 · Score: 2, Insightful

Not knowing too much about the cell processor I read the wikipedia article. I came across this: "In other ways the Cell resembles a modern desktop computer on a single chip."

Why?

--

What if the entire Universe were a chrooted environment with everything symlinked from the host?
1. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 04:39 · Score: 0
  
  cell is gonna be the biggest non-event, just you wait and see.
2. Re:Wikipedia article question by pkvon · 2005-11-10 04:40 · Score: 0
  
  Because the author of the wikipedia article doesn't know better.
  
  The CELL is not made for general purpose computing.
3. Re:Wikipedia article question by Surt · 2005-11-10 04:40 · Score: 1
  
  Because they are offering audio, video, networking on the same chip as the general purpose processing.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
4. Re:Wikipedia article question by AKAImBatman · 2005-11-10 04:43 · Score: 4, Insightful
  
  Um. That's kind of a weird statement. I think they mean to say that it encompasses much of the multiprocessing capabilities of a modern PC in a single chip. i.e. It's your CPU and GPU rolled into one.
  
  Cell processors aren't really anything all that new per say. The multi-core design makes them superficially similar to GPUs (which are also vector processors) with the difference that GPUs use multiple pipelines for parallel processing whereas each cell is a self-contained pipeline capable of true multi-threaded execution. In theory, the interplay between these chips could accelerate a lot of the work currently done through a combination of software and hardware. e.g. All the work that graphics drivers do to process OpenGL commands into vector instructions could be done on one or two cells, thus allowing those cells to feed the other cells with data.
  
  I guess you could say that the cell processor is the start of a general purpose vector processing design. I'm not really sure if it will take off, but unbroken throughput on these things is just incredible.
  
  --
  Javascript + Nintendo DSi = DSiCade
5. Re:Wikipedia article question by l33t-gu3lph1t3 · 2005-11-10 04:45 · Score: 4, Insightful
  
  Easy answer - the wiki article on "Cell" isn't that good. Cell isn't a System-On-A-Chip. It's just a stripped-down, in-order power pc core coupled to 8 single-purpose in-order SIMD units, using an unconventional cache/local memory architecture. It can run perfectly optimized code very, very fast, at extremely low power consumption to boot, but optimization will be/is a bitch. For instance, you have to unroll your "for" loops to start, since those SIMD co-processors can't do loops.
  
  I'm sure IBM and Sony have much better documentation on the CPU than I do, but that's it in a nutshell. Everything else you hear about it is just marketing. Oh yeah, almost forgot. Microsoft's "Xenon" processor for the Xbox360 is pretty much just 3 of those stripped down, in-order PPC cores in one cpu die.
  
  --
  ------- "From bored to fanboy in 3.8 asian girls" ----------
6. Re:Wikipedia article question by stienman · 2005-11-10 04:52 · Score: 1, Informative
  
  The Cell processor is essentially a multi-core chip. It has, IIRC, one "master" CPU, and then multiple slave CPUs on the same die.
  
  A modern desktop computer has one master CPU, then several smaller CPUs each running their own software. Graphics, Sound, CD/DVD, HD, not to mention all the CPUs in all the peripherals.
  
  But the analogy ends there. The Cell has certain limitations and wouldn't be able to operate very efficiently as a full computer system with no other processors. I believe the PS3 has a separate GPU, for instance. And doubtless has many other microcontrollers managing the rest of the system.
  
  -Adam
7. Re:Wikipedia article question by AKAImBatman · 2005-11-10 04:55 · Score: 2, Interesting
  
  Cell isn't a System-On-A-Chip. It's just a stripped-down, in-order power pc core coupled to 8 single-purpose in-order SIMD units, using an unconventional cache/local memory architecture
  
  You know, I'm looking back at all these replies to the poor guy, and I can't help but think that he's sitting in front of his computer wondering, "Can't anyone explain it in ENGLISH?!?" :-P
  
  For instance, you have to unroll your "for" loops to start, since those SIMD co-processors can't do loops.
  
  Actually, we need a new programming model. Instead of using FOR loops, we need a model under which you can say, "Perform these instructions X number of times." One could probably do a bit of guess-work in the compiler based on loops like "for(i=0;i<COUNT;i++)", but that doesn't help cases where the loop uses a more complex conditional statement (or where the test is affected by the loop itself). Thus the language needs to be changed to force the programmer to pre-compute the loop length for maximum performance. For example:
  int i = 0; do(COUNT) { /*code goes here */ i++; }
  
  --
  Javascript + Nintendo DSi = DSiCade
8. Re:Wikipedia article question by plalonde2 · 2005-11-10 04:55 · Score: 4, Informative
  
  You are wrong. These SIMD processors do loops just fine. There's a hefty hit for a mis-predicted branch, but the branch hint instruction works wonders for loops.
  The reason you want to unroll loops is because of various other delays. If it takes 7 cycles to load from the local store to a register, you want to throw a few more operations in there to fill the stall slots. Unrolling can provide those operations, as well as reduce the relative importance of branch overheads.
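
A rough C sketch of the latency-hiding point above. This is ordinary portable C, not SPE code, and the function name `sum_unrolled4` is made up for illustration. The idea is that four independent accumulators give the hardware several loads and adds in flight at once, so the load-to-use delay of one is covered by the others:

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints with the loop unrolled 4x. Each accumulator forms its own
   dependency chain, so a slow load feeding s0 does not stall s1..s3. */
int sum_unrolled4(const int *data, size_t n)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    assert(data != NULL || n == 0);

    /* Main unrolled body: four independent adds per trip. */
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }

    /* Cleanup loop for the 0..3 leftover elements. */
    for (; i < n; i++)
        s0 += data[i];

    return (s0 + s1) + (s2 + s3);
}
```

Whether this actually beats the plain loop depends on the compiler and the target; on an in-order core with a multi-cycle local-store load it is the kind of shape you would aim for.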
9. Re:Wikipedia article question by goofyheadedpunk · 2005-11-10 04:58 · Score: 0
  
  Thanks.
  
  --
  
  What if the entire Universe were a chrooted environment with everything symlinked from the host?
10. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 05:05 · Score: 0
  
  (dotimes i (code-goes-here))
11. Re:Wikipedia article question by Jellybob · 2005-11-10 05:05 · Score: 2, Funny
  
  Looks like Ruby to me, although it's a little too verbose ;)
  
  (0..9).each { |i| puts i }
12. Re:Wikipedia article question by AKAImBatman · 2005-11-10 05:11 · Score: 1
  
  You are wrong. These SIMD processors do loops just fine. There's a hefty hit for a mis-predicted branch, but the branch hint instruction works wonders for loops.
  
  Um... I'm not sure that's what he's trying to say. SIMD by definition is Single Instruction, Multiple Data. i.e. You give it a couple of instructions and watch it perform them on every item in the stream of data. By definition, a loop is an iteration over each instruction, multiple times. a.k.a. Multiple Instruction Multiple Data (MIMD).
  
  What's needed to take full advantage of SIMD is instructions that can be bundled together into one long stream. That way the entire stream can be processed without using a loop. In theory, this is the fastest possible way to process data. Especially with high-latency, high-bandwidth memory. Unfortunately, our programming models aren't designed around such concepts, leaving us relying on optimizing compilers and handcoded assembly.
  
  --
  Javascript + Nintendo DSi = DSiCade
13. Re:Wikipedia article question by AKAImBatman · 2005-11-10 05:14 · Score: 1
  
  Excellent! Now all we need are SIMD optimized LISP Compilers.
  
  (Must (resist (temptation (to (joke (about (syntax))))))) :-P
  
  --
  Javascript + Nintendo DSi = DSiCade
14. Re:Wikipedia article question by morgan_greywolf · 2005-11-10 05:15 · Score: 1, Informative
  
  That looks more like syntactic sugar to me. How is that different? More importantly, how would that translate differently into assembler code? You pretty much will wind up with the same thing, that is: "do your thang, increment the accumulator, if the accumulator equals the count, jump to do your thang."
  
  gcc and other compilers have options such as -funroll-loops, which will unroll loops (no matter how they were specified) for you if the count can be determined at compile time. So you wind up with "Do your thang, do your thang, do your thang, do your thang ... Do your thang". You get the idea.
  
  --
  My blog
15. Re:Wikipedia article question by thexgodfather · 2005-11-10 05:20 · Score: 1
  
  I'm still waiting explination in english -_- Looks like you changed a for loop to a do loop and the other guy said it can't DO loops... frak it just give me a bowl of fruit loops =9
16. Re:Wikipedia article question by jcnnghm · 2005-11-10 05:26 · Score: 1
  
  mov ecx, COUNT
  LOOP_START:
  ;IIRC this is the underlying assembly
  ;construct for looping
  ;
  ;excluding conditional jumps
  LOOP LOOP_START
  
  --
  You don't make the poor richer by making the rich poorer. - Winston Churchill
17. Re:Wikipedia article question by ashSlash · 2005-11-10 05:27 · Score: 1, Informative
  
  It's "per se".
18. Re:Wikipedia article question by AKAImBatman · 2005-11-10 05:29 · Score: 1
  
  a.k.a. Multiple Instruction Multiple Data (MIMD).
  
  Minor correction. That's supposed to be Single Instruction, Single Data. (SISD) My bad.
  
  --
  Javascript + Nintendo DSi = DSiCade
19. Re:Wikipedia article question by MatD · 2005-11-10 05:29 · Score: 1
  
  In most cases, I think template metaprogramming (in C++) is pedantic garbage. In this case however, you could probably use it to great effect (ie, the compiler will unroll your loops for you). The syntax is still pretty horrible though.
  
  --
  Since when did operating systems become a religion?
20. Re:Wikipedia article question by pingveno · 2005-11-10 05:30 · Score: 1
  
  explination
  
  frak
  
  It looks like your not so hot on the English usage yourself. :P
  
  P.S. Just kidding, I've seen worse. Fast typing leads nasty spelling.
  
  --
  "it's not about aptitude, it's the way you're viewed" - Galinda
21. Re:Wikipedia article question by tomstdenis · 2005-11-10 05:31 · Score: 2, Informative
  
  GCC can unroll all loops if you want, including those with variable iteration counts. In those cases it uses a variant of Duff's device. [well on x86 anyways].
  
  As for the other posters, the real reason you want to unroll loops is basically to avoid the cost of managing the loop, e.g.
  
  a simple loop like
  
  for (a = i = 0; i < b; i++) a += data[i];
  
  In x86 would amount to
  
  mov ecx,b
  loop:
  add eax,[ebx]
  add ebx,4
  dec ecx
  jnz loop
  
  So you have a 50% efficiency at best. Now if you unroll it to
  
  mov ecx,b
  shr ecx,1
  loop:
  add eax,[ebx]
  add eax,[ebx+4]
  add ebx,8
  dec ecx
  jnz loop
  
  You now have 5 instructions for two iterations. That's down from 8 you would have before, and so on, e.g.
  
  mov ecx,b
  shr ecx,2
  loop:
  add eax,[ebx]
  add eax,[ebx+4]
  add eax,[ebx+8]
  add eax,[ebx+12]
  add ebx,16
  dec ecx
  jnz loop
  
  Does 7 opcodes for 4 iterations [down from the 16 required previously, i.e. 100% more efficient].
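
The assembly above quietly assumes the count is a multiple of the unroll factor. For variable counts, here is a sketch of the classic hand-written Duff's device in C. It is not a claim about what GCC actually emits, and `sum_duff4` is a name invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints with a 4x unrolled body. The switch jumps into the middle
   of the do-while so the first pass absorbs the n % 4 leftover elements. */
int sum_duff4(const int *data, size_t n)
{
    int sum = 0;
    size_t passes;

    assert(data != NULL || n == 0);
    if (n == 0)
        return 0;

    passes = (n + 3) / 4;  /* trips through the do-while, rounded up */
    switch (n % 4) {
    case 0: do { sum += *data++;  /* fall through */
    case 3:      sum += *data++;  /* fall through */
    case 2:      sum += *data++;  /* fall through */
    case 1:      sum += *data++;
            } while (--passes > 0);
    }
    return sum;
}
```

With n = 5 the switch enters at `case 1`, does one add, then the loop runs one full pass of four adds. Some compilers warn about the interleaved switch and loop, which is expected for this construct.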
  
  Tom
  
  --
  Someday, I'll have a real sig.
22. Re:Wikipedia article question by pdbogen · 2005-11-10 05:37 · Score: 1
  
  Reportedly, the SIMD processors can't do loops. Okay, this probably just means they can't Branch. A loop in assembly basically looks like:
  
  loop: /* do some stuff */
  branch to loop
  
  However, you can "unroll" loops. If you have a loop that always runs 8 times, instead of doing a for loop you can just put the statement there 8 times. It makes the code larger in memory, but it saves processing time since you don't have to check exit conditions or jump around. This would be something done by the compiler, so OP's point that programming on these things is harder because they require unrolled loops is kind of bunk. It might still be necessary to make your loops more unrollable than otherwise.
23. Re:Wikipedia article question by AKAImBatman · 2005-11-10 05:37 · Score: 2, Interesting
  
  mov ecx,b
  shr ecx,2
  loop:
  add eax,[ebx]
  add eax,[ebx+4]
  add eax,[ebx+8]
  add eax,[ebx+12]
  add ebx,16
  dec ecx
  jnz loop
  
  With SIMD instructions, you can execute all four of those adds in one instruction. I wish I knew SSE a bit better, then I could rewrite the above. Sadly, I haven't gotten around to learning the precise syntax. :-(
  
  However, there's a fairly good (if not a bit dated) explanation of SIMD here.
  
  --
  Javascript + Nintendo DSi = DSiCade
24. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 05:38 · Score: 0
  
  The do(COUNT)-construct's inner loop mustn't depend on any other iteration (ie, one iteration may not use results from a previous one).
  So "i++" won't do...
  
  do(COUNT)(int i) {
  /* code goes here, i will be in the range 0...COUNT-1 */
  }
  
  The syntax could of course be improved...
25. Re:Wikipedia article question by Doug-W · 2005-11-10 05:52 · Score: 1
  
  You can do loops on the SPE; it's just that the lack of branch prediction and its dual pipeline nature mean that you're better off unrolling them for speed.
26. Re:Wikipedia article question by Jerry+Coffin · 2005-11-10 05:59 · Score: 1
  
  I suspect the author of the Wikipedia article knows a bit more than he's being given credit for elsethread.
  Each cell processor includes not only the multiple processors mentioned elsethread, but addressable memory, DMA controller, and a controller for what is essentially a proprietary network. The last is somewhat open to argument -- for example, current AMD CPUs include HyperTransport controllers, which are somewhat similar.
  In any case, IBM does (e.g. here) talk about the Cell as a System on a Chip, though IMO, this is a stretch -- a PS3 system includes quite a few other chips, some of them pretty significant (e.g. the GPU). In fact, I find it somewhat difficult to contemplate a system that would make good use of a Cell without a pretty fair number of other chips. OTOH, as the Wikipedia article suggests, it does include a number of elements that are normally implemented in separate chips on a PC.
  
  --
  The universe is a figment of its own imagination.
27. Re:Wikipedia article question by AKAImBatman · 2005-11-10 06:00 · Score: 1
  
  Thanks. That's one of those I keep getting wrong. Keep reminding me and I'll remember at some point. :-)
  
  --
  Javascript + Nintendo DSi = DSiCade
28. Re:Wikipedia article question by BarryNorton · 2005-11-10 06:06 · Score: 1
  
  This would be something done by the compiler[...] It might still be necessary to make your loops more unrollable than otherwise
  Perhaps something like writing in tail recursive style to help out an optimising compiler?...
29. Re:Wikipedia article question by plalonde2 · 2005-11-10 06:06 · Score: 1
  
  The SIMD in question here is the AltiVec/SSE style also called SWAR (SIMD Within A Register); the instruction set has many ops for applying the same operation over each of the 4 (or 8, or 16) elements within a 128 bit register. It's not the streaming type of SIMD.
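
To make "the same operation over each element within a register" concrete, here is a software-only sketch in C. It packs four 8-bit lanes into one 32-bit word, which is an assumption of the example; real AltiVec/SSE/SPU hardware uses 128-bit registers and does this with a single instruction. All function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Add four packed 8-bit lanes at once, wrapping each lane mod 256,
   without letting a carry leak from one lane into the next. */
uint32_t swar_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7f7f7f7fu;  /* low 7 bits of every lane */
    const uint32_t top  = 0x80808080u;  /* top bit of every lane */

    /* Add the low 7 bits; any carry stops at bit 7 of its own lane. */
    uint32_t sum = (a & low7) + (b & low7);

    /* Fold in the top bits with XOR, which discards the carry out. */
    return sum ^ ((a ^ b) & top);
}

/* Lane-by-lane reference version, for comparison. */
uint32_t scalar_add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xffu;
        uint32_t y = (b >> (8 * lane)) & 0xffu;
        out |= ((x + y) & 0xffu) << (8 * lane);
    }
    return out;
}

/* Sanity check that both versions agree on a given pair of inputs. */
void check_against_scalar(uint32_t a, uint32_t b)
{
    assert(swar_add_u8x4(a, b) == scalar_add_u8x4(a, b));
}
```

The scalar version takes four trips through a loop; the SWAR version is a handful of ALU ops with no loop at all, which is the whole appeal.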
30. Re:Wikipedia article question by farquharsoncraig · 2005-11-10 06:07 · Score: 1
  
  you mean per se. (-: I can't point to any specific rule, but I've seen it often enough to know that foreign words or phrases are italicized when used in English, and latin is no exception.
31. Re:Wikipedia article question by JWW · 2005-11-10 06:10 · Score: 1
  
  I'm pretty sure he spelled frak right.
  
  Frak is actually a made up swear word from Battlestar Galactica.
  
  It is sometimes used by the super geeky. Like er, um, er.. me.
32. Re:Wikipedia article question by tomstdenis · 2005-11-10 06:11 · Score: 1
  
  Yes, and unrolling would speed that up in the same fashion.
  
  iirc the instruction is "paddd", and you'd do four parallel adds then shuffle and add twice to get the single sum.
  
  Tom
  
  --
  Someday, I'll have a real sig.
33. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 06:23 · Score: 0
  
  Wait, if the Xbox 360 processor contains the same cores as Cell, does that mean Sony gets license revenue (the core being co-developed with IBM) for every Xbox sold? Mwahahahaha. The irony.
34. Re:Wikipedia article question by adam31 · 2005-11-10 06:25 · Score: 1
  
  You were off to a really good start! But a couple of things:
  Optimization won't be a problem. At least it won't be the main problem. The instruction set is rich enough to provide scalar and vector integer/fp/dp operations along with both conditional branching and conditional assignment. And it can be programmed in C using intrinsics for SIMD instead of assembly. That brings up the really important part-- 128 128-bit registers. Current x86 compilers suck balls at intrinsics mostly because SSE is such a register-starved instruction set. 128 registers allows lots of unrolling without any premature loads/stores from the stack.
  The main problem is memory. Or structuring a memory-flow so that everything ends up in the local store when it is needed. Many current programs are written pretending to have zero-latency access to everywhere in memory, or they follow several levels of indirection to get to the data they process. Those need to be re-thought so that memory access patterns are predictable and "the meat" gets to the store before it's needed.
  The other memory problem is that these SPEs have their best bandwidth when they talk to each other, and not when they DMA from main memory. However, it's very unclear how to leverage that bandwidth. Certainly the complexity of memory patterns that programmers will have to deal with to get maximum performance dwarfs the problems they will have in optimizing the code that processes the data.
35. Re:Wikipedia article question by Hal_Porter · 2005-11-10 06:26 · Score: 1
  
  I remember some early Risc chips that didn't have branch prediction hardware - they would simply predict a backward branch as taken, and a forward one as not.
  
  Which would be ok for most loops.
  
  http://en.wikipedia.org/wiki/Branch_prediction
  
  Incidentally, SPEs have rather short pipelines, so a mispredicted branch is not the catastrophe it would be on a desktop CPU.
  
  http://www.realworldtech.com/includes/templates/articles.cfm?ArticleID=RWT021005084318&mode=print
  
  --
  echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
36. Re:Wikipedia article question by mightypenguin · 2005-11-10 06:40 · Score: 1
  
  Actually, what's really needed is a really smart AI that you can just talk to and say "Yo, computer, mess with this stuff and make it cool". But seriously, I think the trend towards more conversational :) programming is leading to a place where we will eventually just use voice commands to get the computer to do most common programming tasks. Obviously the real science related heavy math alg. stuff will still need to be programmed by hand until one of the sentient AIs decides it's time to take over.
37. Re:Wikipedia article question by taracta · 2005-11-10 06:46 · Score: 1
  
  SPUs on the CELL processor are not SIMD units but fully fledged VECTOR processors. Don't confuse the two. Whatever gave you the impression that the SPUs cannot do loops? Don't let the fact that they are IN-ORDER processors confuse you. That was a space saving issue, not an instruction limitation. Sure, you would like to unroll loops, just as you would like to avoid writing code with lots of branches for a P4; it's a matter of efficiency for a particular processor, but you don't have to. Especially if it is going to make your code unreadable.
38. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 06:48 · Score: 0
  
  Actually despite the most common claim to fame of Cell being its insane floating point power, the real power of Cell chips is the memory architecture.
  
  The most common use for SPEs will be a cascading approach to computation, with each SPE double buffering data into local store. An SPE program will usually begin an async transfer of data into one half of its local store, go to work on the other half, and then ping-pong back and forth. In parallel, the next SPE performs the next sequence of operations on the newly processed data in the same manner.
  
  The effect is for well written SPE code that memory latency is reduced to zero on average.
  
  The internal memory architecture of Cell chips makes writing optimally memory efficient code massively easier than on old style x86 chips. This is one of the common themes you are hearing from developers working on the PS3 when they talk about how easy it is to code for Cell systems.
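
A minimal C sketch of the ping-pong pattern described above, under loud assumptions: `fetch_chunk` is a hypothetical stand-in for an asynchronous DMA transfer and is implemented with a plain synchronous `memcpy`, so nothing actually overlaps here, and `CHUNK` is kept tiny on purpose. It shows the control flow only, not real SPE or SDK code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per local buffer; deliberately tiny for the sketch */

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical stand-in for an async DMA "get" into local store.
   Real hardware would start the transfer and return immediately. */
static void fetch_chunk(int *local, const int *src, size_t count)
{
    memcpy(local, src, count * sizeof *local);
}

long sum_double_buffered(const int *main_mem, size_t n)
{
    int local[2][CHUNK];  /* the two halves of "local store" */
    long total = 0;
    int cur = 0;          /* index of the buffer being processed */
    size_t cur_count = min_size(n, CHUNK);
    size_t fetched = cur_count;

    assert(main_mem != NULL || n == 0);
    if (cur_count > 0)
        fetch_chunk(local[cur], main_mem, cur_count);

    while (cur_count > 0) {
        /* Start filling the other buffer with the next chunk... */
        size_t next_count = min_size(n - fetched, CHUNK);
        if (next_count > 0)
            fetch_chunk(local[cur ^ 1], main_mem + fetched, next_count);
        fetched += next_count;

        /* ...while working on the chunk that is already local. */
        for (size_t i = 0; i < cur_count; i++)
            total += local[cur][i];

        /* Swap roles and go again. */
        cur ^= 1;
        cur_count = next_count;
    }
    return total;
}
```

With a genuinely asynchronous transfer you would also need a wait-for-completion step before the swap; that part is omitted here because the stand-in copy has already finished by the time it returns.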
39. Re:Wikipedia article question by bmh129 · 2005-11-10 07:13 · Score: 1
  
  A "For Loop" is a compiler trick. Assembly code on the RISC processors I've programmed doesn't include For Loops. Loops in assembly are handled with combinations of bit tests and gotos (program counter manipulation), or, in the case of For Loops, executing an instruction, incrementing a counter, and then comparing the counter to the number of times execution should take place.
  It stands to reason that a properly optimized Cell compiler would have no problem executing For Loops.
  
  Never say "Never."
40. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 07:27 · Score: 0
  
  Almost every English word is foreign. There are only about fifty words in common use that actually come from early English.
  When a foreign word (or phrase) has been used for long enough that it's accepted as English you stop italicizing it. And according to the Oxford English Dictionary, "per se" falls into this category.
41. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 07:29 · Score: 0
  
  You, Sir, are an ass.
42. Re:Wikipedia article question by niteice · 2005-11-10 07:48 · Score: 1
  
  From what I understand, the 360 actually uses a processor that's pretty close to a triple-core PowerPC 970. So no, no Cell royalties, just to IBM for the PowerPC processor. I think.
  
  --
  ROMANES EUNT DOMUS
43. Re:Wikipedia article question by hr+raattgift · 2005-11-10 07:59 · Score: 3, Informative
  
  Perhaps something like writing in tail recursive style to help out an optimising compiler?...
  
  You have this backwards. Optimizing compilers will turn tail-recursive style source into "normal" loops.
  
  You can write a loop recursively, so that:
  foo() { int x=8; int b=1; while(x > 0) { b <<= 1; --x; } return b; }
  becomes
  foo() { return foo-helper(8, 1); } foo-helper(int x, int b) { if(x <= 0) return b; else return foo-helper(x - 1, b << 1); }
  Recursion in foo-helper is in the tail position. That is, foo-helper only calls itself as the final operation before returning.
  
  Compiling this naively involves a function call per recursion, which on most architectures results in pushing data onto the stack. However, because we are doing tail-recursion, we can do a tail call elimination optimization.
  
  How this works is that the "return" before the recursion is taken to mean that any automatic variables are dead, any stack space used for the arguments is reusable, and the recursive call is really a jump.
  
  That is, when foo-helper calls itself, it really does an argument rewrite and jump, which in effect "pretends" that foo-helper was called with different arguments in the first place.
  
  In other words, tail call elimination turns recursive loops into iterative loops.
  
  Writing in "tail-recursive style" just means making sure your recursion is done in tail position (i.e., attached to a "return"). Some compilers for a variety of languages can identify recursion which is not done in the tail position, and reorder the recursion into tail position (and then the tail calls are eliminated into iterative loops). However, many compilers can't, and many more don't do tail-call elimination at all. :-(
  
  Once you've optimized recursive loops into iterative ones, you can optimize iterative loops however you like, including partially or fully unrolling them.
  
  In summary, recursion is a way of looping, but function calls are not free. In particular, they usually consume stack space. If you only return the result of your recursion, then you are tail-recursing. Tail recursion can be turned into code which does not incur function-call overhead.
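
For anyone who wants to compile it, here is a small C sketch of the same transformation. The names `pow2_helper`, `pow2_recursive` and `pow2_loop` are invented for this example, and the second function is a hand-written picture of what tail call elimination effectively does, not the output of any particular compiler:

```c
#include <assert.h>

/* Tail-recursive form: the recursive call is the very last thing done,
   so nothing in this frame is needed once the call is made. */
static int pow2_helper(int x, int b)
{
    if (x <= 0)
        return b;
    return pow2_helper(x - 1, b << 1);  /* call in tail position */
}

int pow2_recursive(int n)
{
    assert(n >= 0 && n < 31);  /* keep the shifted value inside a 32-bit int */
    return pow2_helper(n, 1);
}

/* After tail call elimination: rewrite the arguments in place and
   jump back to the top instead of pushing a new stack frame. */
int pow2_loop(int n)
{
    int x = n, b = 1;

    assert(n >= 0 && n < 31);
    for (;;) {
        if (x <= 0)
            return b;
        b = b << 1;  /* new value of argument b */
        x = x - 1;   /* new value of argument x */
    }
}
```

Both return 256 for n = 8. Whether a given C compiler turns the first into the second depends on the optimization level, since C does not guarantee tail call elimination.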
44. Re:Wikipedia article question by thexgodfather · 2005-11-10 08:01 · Score: 1
  
  You no speak good english? Me help you =) Spelling is over-rated and Battle Star Galactica is not!
45. Re:Wikipedia article question by BarryNorton · 2005-11-10 08:12 · Score: 1
  
  You have this backwards. Optimizing compilers will turn tail-recursive style source into "normal" loops.
  Thanks, no I don't. I said "like writing in tail recursive style" - I know what it means.
  My point is that, just like one can write recursions in a form that a compiler can turn them into something more (stack) efficient, so one might write iterations in a style that they can be unwound more easily (like using a primitive type as the counter, rather than an OO-style iterator)...
46. Re:Wikipedia article question by hr+raattgift · 2005-11-10 08:13 · Score: 2, Funny
  
  (dotimes i (code-goes-here))
  
  Ack, pfft, says the evil Schemer. This is just insipid syntactic sugar for what you really mean:
  (let loop ((i number-of-iterations))
    (if (= i 0)
        #f ;; because CommonLisp dotimes returns NIL
        (begin (code-goes-here)
               (loop (- i 1)))))
  instead of whatever dark magic your buggy
  (dotimes (i number-of-iterations) (code-goes-here))
  ends up being mangled into by your CommonLisp compiler because it can't do a safe-for-space tail recursion.
47. Re:Wikipedia article question by BarryNorton · 2005-11-10 08:23 · Score: 1
  
  Did you perhaps misunderstand 'like' as introducing an example, rather than an analogy? ('Optimising compiler' was the clue that I was talking about a different scenario - the compiler tricks being discussed in the general context are not optimisations...)
48. Re:Wikipedia article question by adisakp · 2005-11-10 08:28 · Score: 1
  
  Not knowing too much about the cell processor I read the wikipedia article. I came across this: "In other ways the Cell resembles a modern desktop computer on a single chip."
  
  Why?
  
  Actually each of the SPU's resemble a system-on-a-chip. They each have local memory, CPU and I/O. The Cell itself actually resembles a network-on-a-chip (or in slashdotology, a Beowulf-Cluster-on-a-Chip) if you consider main memory to be I/O storage.
49. Re:Wikipedia article question by hr+raattgift · 2005-11-10 08:59 · Score: 2, Interesting
  
  Ah, OK, I had to think about this a bit... please correct me if I'm still misunderstanding you.
  
  I now think you were using a simile or making an analogy to argue that compilers can benefit from careful construction of loops in the source code.
  
  If so, then of course I agree with you.
  
  Saying this in a much more general way: careful choice of syntax can make the semantics more clear to the compiler.
  
  A high level language with "dotimes (count) { action }" syntax lets the compiler make good choices about loop unrolling and the counter's type.
  
  A language where you have to test and modify your own counter lets the writer make good or incredibly awful choices about loop unrolling and the counter's type.
  
  This version:
  foo() { double d = 1.0; int x=1; while(d > 0) { x = x << 1; d -= 0.1; } return x; }
  is semantic brain-damage on a system with very slow very IEEE doubles, and loop-unrolling this naively is not going to help.
  
  A compiler which realizes that this is a loop whose length is constant can unroll the loop fully, partially, or simply use a better/faster iterator like an integer. But should we end up with 0x400 or 0x800?
  
  Haha, now throw side-effecting at your smart compiler by
  inserting a debugging
  printf("d: %G, x: %x\n", d, x);
  into the while loop ... how should it optimize that?
  ...
  d: 0.2, x: 100
  d: 0.1, x: 200
  d: 1.38778E-16, x: 400
  d: -0.1, x: 800
  Right?
  
  Anyway, I think we're not really disagreeing. You can write loops stupidly, whether they're iterative (as above) or whether they're recursive. A compiler probably can't save you if you are particularly stupid. It might even make things worse.
  
  For what it's worth, when I say your sentence to myself, I want to make the like bold, I guess to emphasize the simile.
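
A small C sketch of the floating-point counter problem from the loop above, with hypothetical function names. It counts loop trips instead of shifting, and assumes plain IEEE 754 double arithmetic with no extended intermediate precision:

```c
#include <assert.h>

/* Count trips of a loop driven by a double counter stepping down to 0. */
int trips_with_double_counter(double start, double step)
{
    int trips = 0;

    assert(step > 0.0);  /* otherwise the loop would never end */
    for (double d = start; d > 0; d -= step)
        trips++;
    return trips;
}

/* The same loop driven by an integer counter: the trip count is exact. */
int trips_with_int_counter(int count)
{
    int trips = 0;

    for (int i = count; i > 0; --i)
        trips++;
    return trips;
}
```

Under those assumptions, `trips_with_double_counter(1.0, 0.1)` makes 11 trips rather than 10, because ten subtractions of 0.1 leave a tiny positive remainder (the 1.38778E-16 in the output above). That is why x ends up as 0x800 instead of 0x400. A step that is exactly representable, such as 0.25, gives the expected 4 trips.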
50. Re:Wikipedia article question by hr+raattgift · 2005-11-10 09:03 · Score: 1
  
  Yes :-)
51. Re:Wikipedia article question by BarryNorton · 2005-11-10 09:10 · Score: 1
  
  For what it's worth, when I say your sentence to myself, I want to make the like bold, I guess to emphasize the simile
  Quite (hence my second post) - I couldn't work out what you thought I meant - at first I wondered if you thought I meant that iterations could be turned into recursions by the Cell compiler (i.e. the opposite to the normal optimisation, which is why I was trying to make it clear that I know what direction this happens in), then I realised you'd mistaken my analogy for an example... Rather, I was asking you for an example - i.e. I meant: some organisation of program flow (like restricting recursions to tail form), or some typing discipline, or what?...
52. Re:Wikipedia article question by BarryNorton · 2005-11-10 09:16 · Score: 1
  
  I could have been clearer, looking back :)
53. Re:Wikipedia article question by einolu · 2005-11-10 13:55 · Score: 1
  
  I thought the ps3 uses multiple cell processors?
54. Re:Wikipedia article question by ameline · 2005-11-10 15:57 · Score: 1
  
  Try:
  
  mov ecx, b
  shr ecx, 2
  pxor xmm1, xmm1
  loop_: // you can unroll this loop...
  movdqa xmm0, [ebx] // aligned-- use movdqu for unaligned
  paddd xmm1, xmm0
  add ebx, 16
  dec ecx
  jnz loop_
  // now just need to do a horizontal add...
  // but there is no horizontal integer add instr, so...
  movdqa xmm0, xmm1 0123
  pshufd xmm0, xmm0, 1 // xmm0 >>= 64 shift instr can only do max of 16
  paddd xmm0, xmm1
  movdqa xmm1, xmm0
  pshufd xmm0, xmm0, 2 // xmm0 >>= 32
  paddd xmm0, xmm1
  // result is in the last dword of xmm1
  movd result, xmm0
  // I'm pretty sure those shuffle constants are correct...
  
  --
  Ian Ameline
55. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 16:12 · Score: 0
  
  those SIMD co-processors can't do loops.
  
  Think about this a bit. Every interactive program is a loop. A processor that can't do loops is worthless.
56. Re:Wikipedia article question by RogonIII · 2005-11-10 20:06 · Score: 1
  
  Unrolling on CELL is slightly more involved than on x86 machines. I took the liberty of trying to code the above C-loop in SPU assembly, and I got the following results, assuming a large byte-array: 0.1132 cycles/byte!! Beat that!
  
  Assume data is unsigned bytes, and that we have lots of data. First thing to notice is that the SPU has a SUMB instruction, that sums the bytes in a quadword. I.e.: If we have two registers R0 and R1 containing 16 bytes:
  
    R0 = [a0,b0,c0,d0; e0,f0,g0,h0; i0,j0,k0,l0; m0,n0,o0,p0]
    R1 = [a1,b1,c1,d1; e1,f1,g1,h1; i1,j1,k1,l1; m1,n1,o1,p1]
  
  then SUMB R2, R0, R1 gives the 8 half-words
  
    R2 = [a0+b0+c0+d0, a1+b1+c1+d1, e0+f0+g0+h0, e1+f1+g1+h1, i0+j0+k0+l0, i1+j1+k1+l1, m0+n0+o0+p0, m1+n1+o1+p1]
  
  This instruction has throughput of 1 cycle, but a latency of 4 and is an even instruction.
  
  We then have to add the individual half-words together, R2 = [h0,...h7]. This can be done by interpreting R2 as 4 32-bit words, masking off the high and low parts of R2 and adding them into R3: R3 = (R2 >> 16) + (R2 & 0xFFFF), for each 32-bit word in R2. However, since we are even-bound, we can instead use two shuffles and an add. We can use this as a balancing technique. R3 will contain:
  
    R3 = [h0 + h1, h2 + h3, h4 + h5, h6 + h7] = [w0, w1, w2, w3]
  
  We will accumulate all of the sums into these 4 components, and then when the loop is over, add them together to make the final tally. Notice that we could have skipped the half-word to word reduction, but we want to make sure that we can count up to more than 65535.
  
  The final loop would look something like this (w/branch hint penalty on the last loop). It reads in 256 bytes worth of data, (unrolled 8 times), and adds them together into a single vector accumulator. Since there are 29 even/odd pairs of instructions, this loop should take 29 cycles per 256 bytes, i.e., give a speed of 0.1132 cycles/byte. Of course, there are setup penalties and a penalty for missing the last branch. Still, this should show the power of the SPU.
  
  Loop:
    {e2} a s0, l0, h0        {o4} shufb h4, s4, s4, m_hiwords
    {e2} a s1, l1, h1        {o4} shufb h5, s5, s5, m_hiwords
    {e2} a s2, l2, h2        {o4} shufb h6, s6, s6, m_hiwords
    {e2} a s3, l3, h3        {o4} shufb h7, s7, s7, m_hiwords
    {e2} a s4, l4, h4        {o6} lqx r2, base0x20, index
    {e2} a s5, l5, h5        {o6} lqx r3, base0x30, index
    {e2} a s6, l6, h6        {o6} lqx r4, base0x40, index
    {e2} a s7, l7, h7        {o6} lqx r5, base0x50, index
    {e2} a s0, s0, s1        {o6} lqx r6, base0x60, index
    {e2} a s1, s2, s3        {o6} lqx r7, base0x70, index
    {e2} a s2, s4, s5        {o6} lqx r8, base0x80, index
    {e2} a s3, s6, s7        {o6} lqx r9, base0x90, index
    {e2} a s0, s0, s1        {o6} lqx ra, base0xa0, index
    {e2} a s1, s2, s3        {o6} lqx rb, base0xb0, index
    {e2} a acc, acc, s0      {o6} lqx rc, base0xc0, index
    {e4} sumb s0, r0, r1     {o6} lqx rd, base0xd0, index
    {e2} a
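For anyone who wants to check the reduction described in this comment on an ordinary machine, here is a minimal C sketch. It is not SPU code: `sumb_emulated` only imitates the SUMB result layout as the comment describes it, and the function names, the 32-byte block size, and the periodic flush of the 32-bit lanes are assumptions made for illustration. The C loop the comment refers to is not shown in this thread, so a plain byte-summing loop is assumed as the reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed reference: a plain scalar loop, one byte per iteration. */
uint64_t sum_bytes_scalar(const uint8_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Imitates the SUMB layout from the comment: eight 16-bit sums, each over
 * four adjacent bytes, alternating between r0 and r1
 * (out[0] = a0+b0+c0+d0, out[1] = a1+b1+c1+d1, out[2] = e0+f0+g0+h0, ...). */
static void sumb_emulated(uint16_t out[8],
                          const uint8_t r0[16], const uint8_t r1[16])
{
    for (int g = 0; g < 4; g++) {
        out[2 * g] = (uint16_t)(r0[4 * g] + r0[4 * g + 1] +
                                r0[4 * g + 2] + r0[4 * g + 3]);
        out[2 * g + 1] = (uint16_t)(r1[4 * g] + r1[4 * g + 1] +
                                    r1[4 * g + 2] + r1[4 * g + 3]);
    }
}

/* Sums bytes 32 at a time: SUMB-style half-word sums, then the half-word
 * to word step (h0+h1, h2+h3, ...) into four 32-bit lanes. */
uint64_t sum_bytes_blocked(const uint8_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    uint32_t acc[4] = {0, 0, 0, 0};
    uint32_t blocks = 0;
    size_t i = 0;

    for (; i + 32 <= n; i += 32) {
        uint16_t h[8];
        sumb_emulated(h, data + i, data + i + 16);
        for (int w = 0; w < 4; w++)
            acc[w] += (uint32_t)h[2 * w] + h[2 * w + 1];

        /* Each block adds at most 2040 per lane, so flushing every 2^20
         * blocks keeps the 32-bit lanes from wrapping. */
        if (++blocks == (1u << 20)) {
            for (int w = 0; w < 4; w++) {
                total += acc[w];
                acc[w] = 0;
            }
            blocks = 0;
        }
    }
    for (int w = 0; w < 4; w++)   /* final tally of the four lanes */
        total += acc[w];
    for (; i < n; i++)            /* leftover bytes that do not fill a block */
        total += data[i];
    return total;
}
```

Comparing `sum_bytes_blocked` against `sum_bytes_scalar` on the same buffer is a quick way to convince yourself that the half-word and word reductions lose nothing. The sketch says nothing about the cycle counts claimed in the comment; those depend on the SPU's dual-issue pipeline.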
57. Re:Wikipedia article question by Anonymous Coward · 2005-11-10 20:25 · Score: 0
  
  Perhaps one could say that you swung and MISD.
58. Re:Wikipedia article question by be-fan · 2005-11-11 04:21 · Score: 1
  
  IIRC, the SPEs have 18 stage pipelines. Okay for an FP pipe, but quite long for an INT pipe.
  
  --
  A deep unwavering belief is a sure sign you're missing something...
59. Re:Wikipedia article question by farquharsoncraig · 2005-11-11 06:05 · Score: 1
  
  That is fascinating. So if only fifty words remain from "early English" what does that say about the lifetime of words? Or have there been more formal studies into the turnover of words?
60. Re:Wikipedia article question by LWATCDR · 2005-11-11 17:07 · Score: 1
  
  You don't write code do you?
  Think about it just in broad terms. Computer programming is like math. It really is best expressed visually. Think of a math class with no white board and just someone lecturing. Pretty useless. So even if you have an AI as smart or smarter than a person they will probably still want to see what you're talking about.
  Not to mention that AIs as smart as a Hamster are still years or decades away.
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
61. Re:Wikipedia article question by Hal_Porter · 2005-11-12 02:18 · Score: 1
  
  The Real World tech article I linked to has a rather hard to understand diagram, but it looks to me as if the effective pipeline depth for integer ops is really low, 2-4 for most integer stuff.
  
  --
  echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
62. Re:Wikipedia article question by Anonymous Coward · 2005-11-12 10:38 · Score: 0
  
  It's not really about the longevity of words. Early English was nearly obliterated by the Norman invasion, but the core survived.
Is this the same Cell processor used in the PS3? by Spy+der+Mann · 2005-11-10 04:36 · Score: 1

Just to clarify.
Re:Is this the same Cell processor used in the PS3 by Spazntwich · 2005-11-10 04:39 · Score: 5, Funny

No. In our insanely litigious society, a company has graciously allowed another to create and market a different processor by the same exact name.
Re:Is this the same Cell processor used in the PS3 by Anonymous Coward · 2005-11-10 04:39 · Score: 1, Informative

Yup, it is.
Re:the inevitable question by Anonymous Coward · 2005-11-10 04:39 · Score: 0

Yes. Since Linux runs on POWER and the Cell supports the PowerPC ISA, Linux has been able to run on Cell-based computers for a long time now.

Of course this does not mean that programs can take advantage of the Cell's odd multicore architecture. Programs would have to be specifically written, or use libraries that have been modified, to use those SPU cores.
Unproductive? by RManning · 2005-11-10 04:40 · Score: 5, Funny

My favorite quote from TFA...

...in addition, the ILAR license states that "You are not authorized to use the Program for productive purposes" -- so make sure that your time spent with these downloads is as unproductive as possible.
1. Re:Unproductive? by Anonymous Coward · 2005-11-10 04:48 · Score: 0
  
  yeah i just saw that myself, and was gonna copy 'n paste. glad you beat me to it.
  
  fucking bizarre.
2. Re:Unproductive? by Kayamon · 2005-11-10 05:13 · Score: 2
  
  Sounds like my job. I don't think there'll be any problems there. :-)
  
  --
  Kayamon
3. Re:Unproductive? by lcllam · 2005-11-10 15:46 · Score: 1
  
  Right! Pr0n it is then!
Since the submitter didn't bother to explain... by frankie · 2005-11-10 04:41 · Score: 4, Informative

...the Cell processor is an upcoming PowerPC variant that will be used in the PlayStation 3. It's great at DSP but terrible at branch prediction, and would not make a very good Mac. If you want to know full tech specs, Hannibal is da man.
1. Re:Since the submitter didn't bother to explain... by imroy · 2005-11-10 04:55 · Score: 1
  
  I'm just wondering what information you have on the Cell being "terrible at branch prediction"? I don't know about using it in a Mac, but IBM seems eager to run Linux on it. They've even demonstrated a prototype Cell-based blade server system running Linux, back in May.
2. Re:Since the submitter didn't bother to explain... by Anonymous Coward · 2005-11-10 05:09 · Score: 0
  
  Uhm, no, "The Cell processor" as a whole is not terrible at branch prediction. It consists of one PowerPC core, much like those used in current Macs, plus 8 so-called "synergistic processors". The latter have been optimized for DSP/multimedia-like algorithms, with extensive use of SIMD instructions. These processors are, however, much worse at branch prediction, and will not run general-purpose code very well.
  What does all of this mean? If your machine will be used for running, say, database applications, you will probably be better off with an SMP solution. But if your machine must run games, or video processing, or audio processing, or ..., the Cell processor would make a tremendous CPU. In that case, all general-purpose stuff, such as Linux etc., would actually run on the PPC core, but specific multimedia threads could be run on the synergistic cores.
  
  Ruben
3. Re:Since the submitter didn't bother to explain... by nutshell42 · 2005-11-10 05:17 · Score: 1
  
  Even though the GP linked to an article that greets you with Inside the Xbox 360, Part II: the Xenon CPU there are links to some informative articles about the Cell architecture further down.
  Short story: The cool thing about the Cell are the SPEs that are the best thing since sliced bread if you have lots of matrix-vector operations to perform but more or less useless otherwise.
  IBM is eager to run Linux on it because the Cell could make one hell of a supercomputing grid. (Although it loses lots of flops if you need double-precision but even then it's quite fast and in many calculations there are large parts where you can go single-precision without losing digits in the end result anyways)
  
  --
  Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
4. Re:Since the submitter didn't bother to explain... by Anonymous Coward · 2005-11-10 05:22 · Score: 0, Interesting
  
  Where these dumb comments come from:
  
  1) Apple tries to lowball IBM on the mobile 970 design
  2) IBM gives Apple the finger - they account for less than five percent of IBM's chip volume
  3) Steve goes out on stage and pretends like he has made the 'choice' to move to Intel
  4) With Cell processors in Macs no longer an option for Apple, the sour grapes meme that the idiot above parroted starts to make its rounds in Mac circles.
  5) Intel's processor roadmap fiasco continues, but what is funny is how Intel's roadmap for future chips years down the road has chip designs that look very close to STI's Cell chips that are being made today.
  
  Enjoy your h.264 encoding times on those wonderful Intel SSE chips Mac crazies!
5. Re:Since the submitter didn't bother to explain... by Anonymous Coward · 2005-11-10 05:36 · Score: 0
  
  Uh, getting your console info from hannibal on ars is a lot like watching Dr. Phil for your emotional problems. Probably harmless, but certainly worthless.
  
  There is more than enough info on Cell, the PS3, other Cell based products out there from Sony and IBM talks and patents that even non-technical people can follow.
6. Re:Since the submitter didn't bother to explain... by poot_rootbeer · 2005-11-10 06:07 · Score: 1
  
  It's great at DSP but terrible at branch prediction
  
  With 8 or more semi-independent "Synergistic Processing Unit" pipelines, it doesn't really need to have a lot of complex branch prediction logic. It could adopt a bit of a quantum methodology and assign a different SPU to proceed for each possible outcome of a compare/branch instruction, and then once the correct outcome has been established, discard the "dead-end" pipelines.
  
  Then again, I learned microprocessor design principles back when the PPC 601 was state-of-the-art, so my +1 Insightfulness may vary.
7. Re:Since the submitter didn't bother to explain... by Glock27 · 2005-11-10 06:09 · Score: 1
  
  3) Steve goes out on stage and pretends like he has made the 'choice' to move to Intel
  You're bitter about something. Care to share?
  Steve most certainly made a decision to go Intel. No "pretending" involved. Just what dollar value do you ascribe to "5% of IBM's chip volume", BTW?
  4) With Cell processors in Macs no longer an option for Apple, the sour grapes meme that the idiot above parroted starts to make its rounds in Mac circles.
  Cell wouldn't be that great for its clock speed, but it would certainly work. I'm pretty sure Pentium-M and descendants will beat it for GP computation, and without learning an involved new programming approach.
  5) Intel's processor roadmap fiasco continues, but what is funny is how Intel's roadmap for future chips years down the road has chip designs that look very close to STI's Cell chips that are being made today.
  Really, care to provide a link? What are you claiming corresponds to SPEs in Intel's designs? I've heard nothing about this.
  
  --
  Galileo: "The Earth revolves around the Sun!"
  Score: -1 100% Flamebait
8. Re:Since the submitter didn't bother to explain... by MaestroSartori · 2005-11-10 09:17 · Score: 1
  
  That's not entirely true. The PPE can do branch prediction, the SPEs can't. Whether the PPE's branch prediction is any good or not, I don't know... :)
  
  --
  Game dev and music blog
9. Re:Since the submitter didn't bother to explain... by raftpeople · 2005-11-10 09:43 · Score: 1
  
  Not "useless otherwise."
  
  I am working on a project that is cpu bound with mostly non-"matrix-vector" math, but is highly "parallelizable" (I know, it's probably not a word). I'm looking forward to the raw speed (of instructions and memory xfer) and multi-processing from the cell.
10. Re:Since the submitter didn't bother to explain... by Hast · 2005-11-15 23:12 · Score: 1
  
  I don't really think that idea would work. It could work for a super scalar processor like the Itanium where they do this (but on the same die). But in that case you already have the data loaded locally and it's a lot easier to roll back the changes when one branch is rejected. I'm sure that just rejecting and rolling back changes with multiple SPEs would kill any benefit from running multiple branches in different SPEs.
  
  They just hope that the compiler will fix as much as possible for them. And other than that they hope that other problems will be fixed by brute force.
  
  It's a really interesting design though, it will be interesting to see what will be done with it.
Re:Is this the same Cell processor used in the PS3 by Anonymous Coward · 2005-11-10 04:44 · Score: 0

No. In our insanely litigious society, a company has graciously allowed another to create and market a different processor by the same exact name.

nice smart a** post.

Think about his question this way, "is the PowerPC that is running in the XBox the same as the one used in my Mac G3?". The obvious answer (no, it's not yours) is "kinda". They are obviously related, though, in this case, they are not exactly the same chip. Just as PowerPC refers to a family of chips that share certain characteristics (ISA being one of the biggest), Cell refers to a family of processors that have certain characteristics. So the emulator that IBM has may not be emulating the exact configuration of chip that is running in a PS3.
Source for actual chips? by mustafap · 2005-11-10 04:47 · Score: 3, Interesting

That's great news, but as an embedded systems designer and eternal tinkerer, where will I be able to buy a handful of these processors to experiment with? Without having to dismantle loads of games machines ;o)

--
Open Source Drum Kit, LPLC deve board - mjhdesigns.com
1. Re:Source for actual chips? by Wesley+Felter · 2005-11-10 05:37 · Score: 1
  
  Unfortunately for you, you don't "tinker" with Cell. Since all the I/O is multi-GHz exotic Rambus signaling, you probably have to be an expert board designer to do anything with it. Not to mention that you have to get the processor, southbridge, and RAM from three different companies, probably signing a stack of NDAs in the process.
2. Re:Source for actual chips? by mustafap · 2005-11-10 05:40 · Score: 1
  
  Ah. When I read 'the size of your fingernail' I assumed it would be like an ARM core. Oh well. :o(
  
  Thanks,
  
  Mike.
  
  --
  Open Source Drum Kit, LPLC deve board - mjhdesigns.com
3. Re:Source for actual chips? by AKAImBatman · 2005-11-10 06:48 · Score: 1
  
  All CPUs are the size of your fingernail. It's the packaging that makes them appear large. :-)
  
  --
  Javascript + Nintendo DSi = DSiCade
4. Re:Source for actual chips? by mustafap · 2005-11-10 07:09 · Score: 1
  
  The Intel P4 is 15x15mm. You must have very large fingernails.
  
  --
  Open Source Drum Kit, LPLC deve board - mjhdesigns.com
5. Re:Source for actual chips? by AKAImBatman · 2005-11-10 07:27 · Score: 1
  
  I don't have a ruler with me at the moment, but that looks pretty close to right for my thumbnail. What can I say? I'm a big guy. :-)
  
  --
  Javascript + Nintendo DSi = DSiCade
6. Re:Source for actual chips? by mustafap · 2005-11-10 07:37 · Score: 1
  
  well I won't argue with a big guy :o)
  
  I'd forgotten that these processors are not made on the 3 micron processes like the chips I used to work on!
  
  --
  Open Source Drum Kit, LPLC deve board - mjhdesigns.com
7. Re:Source for actual chips? by AKAImBatman · 2005-11-10 07:49 · Score: 1
  
  I'd forgotten that these processors are not made on the 3 micron processes like the chips I used to work on!
  
  3 microns? Wow. That's huge! The top of the line chips these days are easily below 0.5 microns. (The PIV chips are 0.18 and 0.13 microns!) I know I was just shocked when I got my Spartan III FPGA kit. I couldn't believe how small the thing was in its packaging. I can't even imagine how small the actual die must be!
  
  --
  Javascript + Nintendo DSi = DSiCade
8. Re:Source for actual chips? by mustafap · 2005-11-10 08:20 · Score: 1
  
  >I know I was just shocked when I got my Spartan III FPGA kit.
  
  cool! I just got a Spartan III dev board in the post last week too. First thing I did was hook it up to a monitor and twiddle a few buttons :o)
  Fancy chatting about it by email?
  
  mikehibbett at oceanfree (dot) net
  
  Mike.
  
  --
  Open Source Drum Kit, LPLC deve board - mjhdesigns.com
What about a PPC SDK and simulator? by kuwan · 2005-11-10 04:49 · Score: 4, Interesting

As the Cell is basically a PPC processor I find it strange that the SDK is for x86 processors. Fedora Core 4 (PowerPC), also known as ppc-fc4-rpms-1.0.0-1.i386.rpm is listed as one of the files you need to download. Maybe it's just because of the large installed base of x86 machines.

It'd be nice if IBM released a PPC SDK for Fedora, it would have the potential to run much faster than an x86 SDK and simulator.

--
infested with jello like fishes no melotron wishes
1. Re:What about a PPC SDK and simulator? by antifoidulus · 2005-11-10 05:07 · Score: 1
  
  Not sure how much faster it would be really (though I'm writing this from a PowerBook and I really wish they would release some PPC stuff). A PPC chip acts as the controller but the actual processing is done on chips with architectures vastly different from both x86 and PPC. For instance, they aren't superscalar so they do no branch prediction like both x86 and PPC do... so really the emulation speed is pretty independent of its host architecture. I suppose they could use the AltiVec found on Apple's CPUs to simulate some of the SIMD instructions better, but remember the only CPU that IBM makes with an AltiVec unit is the G5 it makes for Apple, not exactly its core business.
  
  --
  Monstar L
2. Re:What about a PPC SDK and simulator? by Jozer99 · 2005-11-10 05:47 · Score: 1
  
  I don't know how much of a performance boost you would get. Despite the fact that it has a PowerPC processor on it, it is a very different PowerPC. It does not have out of order execution, and has all those pretty vector processors dangling off of it. I think emulation would be 2x faster, at most.
3. Re:What about a PPC SDK and simulator? by jmorris42 · 2005-11-10 06:13 · Score: 1
  
  > Maybe it's just because of the large installed base of x86 machines.
  
  Got it in one try. Anyone interested in actually using this thing has a spare PC to load FC4 on, almost none has a spare top of the line PowerMac in the closet. Heck, most don't have a top of the line Powermac period.
  
  --
  Democrat delenda est
4. Re:What about a PPC SDK and simulator? by bhima · 2005-11-10 07:20 · Score: 1
  
  Well, I have a dual 2.5 G5 and as easy as it is to dual boot with OS X I'd devote a firewire disk to it for a while.
  I keep having this fantasy that a PCI-E development board will come out and I'll be able to do something interesting with it (what I have no idea but I'm open to suggestions). I'd really like OS X development environment for it to tinker with.
  
  --
  Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
5. Re:What about a PPC SDK and simulator? by Anonymous Coward · 2005-11-10 07:59 · Score: 0
  
  The simulator for ppc is coming soon. For everything else, the stuff is already there: just use the packages in the development section of the Barcelona web site.
  
  Arnd
6. Re:What about a PPC SDK and simulator? by pbohrer · 2005-11-10 14:18 · Score: 3, Informative
  
  The simulator is actually maintained on a number of different platforms within IBM. Since the rest of the SDK team (xlc, cross-dev gcc, sample & libs, etc) chose Fedora Core 4 on x86 as a means of enabling the most number of people, we didn't want to confuse too many people by supplying the simulator on a variety of platforms for which the rest of the SDK is not supported. This was somewhat of a big-bang release of quite a bit of software to enable exploration of Cell. Now that we have this released and the open source side of the SDK is available on the web, I am sure people will have no problem adapting that build environment to be hosted on Linux/PPC. In support of that, we will be providing a Linux/PPC version of the Cell simulator soon on alphaWorks.
  
  --
  --Pat IBM Austin Research Lab Performance and Tools, Mgr pbohrer@us.ibm.com
7. Re:What about a PPC SDK and simulator? by bmoore · 2005-11-10 14:40 · Score: 1
  
  Will you also be releasing the OSX version of Mambo/SystemSim, or just the Linux/PPC version?
8. Re:What about a PPC SDK and simulator? by pbohrer · 2005-11-10 18:10 · Score: 1
  
  We use it all the time here in the lab to run disk images and executables that are already built. We don't have a Linux/PPC cross-dev toolkit hosted on OSX though. So I am curious if there is really a demand for an OSX version of the simulator without a development environment on OSX ?
  
  --
  --Pat IBM Austin Research Lab Performance and Tools, Mgr pbohrer@us.ibm.com
GNU toolchain by lisaparratt · 2005-11-10 04:50 · Score: 5, Interesting

The software includes many gnu tools, but the underlying compiler does not appear to be gnu based.

Is this any surprise? My understanding was the Cell's a vector processor, and despite the recent upgrades to GCC, it's still fairly awful at autovectorisation.

Can anyone clarify?
1. Re:GNU toolchain by Have+Blue · 2005-11-10 05:24 · Score: 3, Informative
  
  IBM may have run into the same problems with the Cell that they did with the PowerPC 970: the chip breaks some fundamental assumptions GCC makes, and to add the best optimization possible it would be necessary to modify the compiler more drastically than the GCC leads would allow (to keep GCC completely platform-agnostic).
2. Re:GNU toolchain by Wesley+Felter · 2005-11-10 05:32 · Score: 3, Informative
  
  The SDK includes both GCC and XLC. GCC's autovectorization isn't the greatest, but Apple and IBM have been working on it. I think if you want fast SPE code you'll end up using intrinsics anyway.
3. Re:GNU toolchain by Anonymous Coward · 2005-11-10 08:13 · Score: 0
  
  Actually, the GNU tool chain has been available for over a month now, so it was not announced separately.
  
  Arnd
4. Re:GNU toolchain by advocate_one · 2005-11-10 09:46 · Score: 1
  
  IBM may have run into the same problems with the Cell that they did with the PowerPC 970: the chip breaks some fundamental assumptions GCC makes, and to add the best optimization possible it would be necessary to modify the compiler more drastically than the GCC leads would allow (to keep GCC completely platform-agnostic).
  
  who the heck says they have to keep the GCC they distribute with the software development kit platform agnostic??? what a stupid concept. It has absolutely NOTHING to do with the GCC leads... they can't stop you from modifying GCC for your own uses such as creating a platform specific development kit... they just don't have to accept back any modifications you have made. All you have to do if you do distribute it externally like they are here, is to make sure either the full source code to your version of GCC is included, or you provide it upon request.
  
  --
  Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
5. Re:GNU toolchain by Anonymous Coward · 2005-11-10 14:03 · Score: 0
  
  >> who the heck says they have to keep the GCC they distribute with the software development kit platform agnostic???
  
  The accountants. The only reason they would be using GCC in the first place is because it's a cheap port. If they're going to make radical changes to the compiler, they might as well just write one from the ground up (which IBM likely is doing).
Echoes of Redhat by delire · 2005-11-10 04:51 · Score: 3, Insightful

Why Fedora is so often considered the default target distribution I don't know. Even the project page states it's an unsupported, experimental OS, and one now comparatively marginal when tallied.

Must be a case of 'brand leakage' from a distant past, one that held Redhat as the most popular desktop Linux distribution.

Shame, I guess IBM is missing out on where the real action is.
1. Re:Echoes of Redhat by LnxAddct · 2005-11-10 06:10 · Score: 4, Insightful
  
  Fedora overtook Suse within a year and a half in terms of users. It is now a close 3rd to Debian which is a far second from Red Hat (Red Hat and Fedora together have around 3 times the market share of Debian, check netcraft to confirm those numbers). The numbers on distrowatch are not downloads or users, that number is how many people clicked on the link to read about Ubuntu. Mark Shuttleworth is obscenely good at getting press about Ubuntu so the Ubuntu link gets a lot of click throughs, and now that it is at the top, it is kind of self fulfilling as interested people want to read about the top distro so they click on that more.
  
  When it comes down to it, Fedora is the most advanced Linux distribution out there. It comes standard with SELinux and virtualization. It uses LVM by default, integrates exec-shield and other code fortifying techniques into all major services. It has the latest and greatest of everything. Things just work in Fedora because a large portion of that code was coded by Red Hat. Red Hat maintains GCC and glibc, they commit more kernel code than anyone else, they play a large role in everything from Apache and Gnome to creating GCJ to get Java to run natively under Linux. Whether you like it or not, Fedora is the distro most professionals go with, despite what the Slashdot popular opinion is and despite the large amounts of noise that a few Ubuntu users create.
  
  Out of the big two, Novell and Red Hat, Novell has never been worse off and Red Hat has never been healthier. Red Hat doesn't officially provide support for Fedora, but it is built and paid for by Red Hat and their engineers (in addition to the community contributions). By targeting Fedora, IBM knows that they are targeting a stable platform with the largest array of hardware support. IBM is in bed with both Novell and Red Hat, they didn't choose Fedora because they were paid to or something... they chose Fedora based on technical merits. Claiming that Fedora is unstable is no different than claiming GMail is in beta, both products are still the best in their respective industries. Why do people go spreading FUD about such a good product when they've never used it themselves? Whether you want to admit it or not, Fedora is the platform to target for most. It is compatible in large part with RHEL, so you're getting the most bang for your buck.
  
  IBM doesn't just shit around, or make decisions for dumb reasons. If Fedora is good enough for IBM it is good enough for anyone. Apparently this is a common opinion as more and more businesses switch to Fedora desktops. Here is one recent story of a major Australian company, Kennards, replacing 400 desktops with Fedora. Don't be so closed-minded or you might be left behind.
  Regards,
  Steve
2. Re:Echoes of Redhat by Anonymous Coward · 2005-11-10 06:19 · Score: 1, Interesting
  
  Must be a case of 'brand leakage' from a distant past, one that held Redhat as the most popular desktop Linux distribution.
  
  uh, or maybe... 1) it's because IBM has a partnership with RedHat, 2) Fedora runs on PPC (which CBE is based on) so i'm sure it's easy for them to modify, 3) there's a good chance this was developed using FC4, so it's just easy to release it for FC4
3. Re:Echoes of Redhat by dominator · 2005-11-10 06:37 · Score: 1
  
  Fedora's #4 ranking on Distrowatch can hardly be called "marginal". Nevermind that one should also question the site's "page hit ranking" methodology before passing it off as representative, much less authoritative.
4. Re:Echoes of Redhat by Anonymous Coward · 2005-11-10 07:03 · Score: 0
  
  Too bad Fedora is ass ugly and has a general feel of being clunky, to me. I tried Fedora and didn't like it. Of course, I don't really like any of RedHat's offerings (and I'm forced to use RHES3 daily as our primary target platform).
5. Re:Echoes of Redhat by CyricZ · 2005-11-10 07:22 · Score: 0
  
  Indeed. Fedora Core, even FC4, is hardly a usable system for anything serious. The last time I tried FC4, the installer crashed.
  
  The future is Ubuntu, whether Red Hat likes it or not. It's too bad that IBM does not recognize that fact, as well.
  
  --
  Cyric Zndovzny at your service.
6. Re:Echoes of Redhat by Anonymous Coward · 2005-11-10 08:26 · Score: 0
  
  actually only 2) and 3) were the reasons, plus the fact that Debian does not have a proper ppc64 target, while biarch ppc32/ppc64 works very well on Fedora.
  
  We did the development on fc3 with our own kernel and tool chain packages and only minor configuration tweaks, but most of us actually use something different on our desktops.
  
  Arnd
7. Re:Echoes of Redhat by pembo13 · 2005-11-10 10:36 · Score: 1
  
  I see you're being a good troll. I don't use Ubuntu, but I don't go trolling it either.
  
  --
  "Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
8. Re:Echoes of Redhat by CyricZ · 2005-11-10 12:35 · Score: 1
  
  Your ad hominem attack actually proves my point. Fedora is inferior to Ubuntu.
  
  You cannot prove Fedora better because it just plain is not. It is impossible to debate against the truth. Thus you must resort to ad hominem attacks, which instantly prove that I am the victor in this debate.
  
  --
  Cyric Zndovzny at your service.
9. Re:Echoes of Redhat by Anonymous Coward · 2005-11-10 16:22 · Score: 0
  
  Too bad your mother is ass ugly and has a general feel of being clunky, to me. I tried your mother and didn't like her. Of course, I don't really like any of your mother's offerings (and I'm forced to use your mother daily as my primary target platform).
10. Re:Echoes of Redhat by heson · 2005-11-11 04:13 · Score: 1
  
  The "unsupported, experimental" is just FUD to keep their customers on RHEL.
11. Re:Echoes of Redhat by CronoCloud · 2005-11-11 13:27 · Score: 1
  
  Sony likes Red Hat too. Linux on the Playstation 2 is Red Hat.
  
  Maybe they just like rpm.
Re:Is this the same Cell processor used in the PS3 by Anonymous Coward · 2005-11-10 05:01 · Score: 0

and you can totally play your downloaded ps3 games on it
New & Improved by Doc+Ruby · 2005-11-10 05:02 · Score: 1, Funny

I dunno - telling people they have to upgrade their PC to run the SDK for a new PC architecture seems like a marketer's job.

--
--
make install -not war
Branch Prediction by Anonymous Coward · 2005-11-10 05:05 · Score: 0

What branch-prediction means is that a JUMP instruction cannot be made directly in a single atomic instruction. Instead the CPU must single-step forward until the appropriate offset is reached, rather than just incrementing the Program Counter register in one go. In the CELL's case, it cannot jump in one go because it is RISC and there are not enough bits left after the opcode bits to fit the offset.

And yes, this does mean that code must be optimised so that JUMPs are very small.
1. Re:Branch Prediction by farnz · 2005-11-10 05:51 · Score: 1
  
  Branch prediction is when the CPU guesses whether a conditional jump will be taken or not taken, in order to keep the pipeline full (if the pipeline empties, even for one instruction's worth of time, the CPU runs slower). Usually, this guess is partly based on static rules (if it would branch backwards, it'll probably be taken, while forward jumps are probably not taken), and partly on dynamic rules (the last four times I saw this conditional jump, I took it, so I'll assume it'll be taken again this time).
  In Cell and other IBM PPE designs, there is no dynamic prediction hardware, so the CPU makes the same guess every time, even when it's obvious to a human that it's wrong. This costs in performance for code like AI, where the chances of taking the jump vary while the game is running.
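To make the static/dynamic distinction above concrete, here is a minimal C sketch that replays one branch's history through both kinds of guess. The dynamic side uses a 2-bit saturating counter, which is a common textbook scheme swapped in for illustration rather than the "last four times" rule mentioned above, and nothing here is claimed to match any real Cell hardware. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Static rule: backward branches (negative offset, typically loops) are
 * guessed taken, forward branches are guessed not taken. */
bool predict_static(long branch_offset)
{
    return branch_offset < 0;
}

/* Dynamic rule: a 2-bit saturating counter per branch.
 * States 0 and 1 guess "not taken"; states 2 and 3 guess "taken". */
typedef struct {
    unsigned state; /* always in 0..3 */
} counter2;

bool predict_dynamic(const counter2 *c)
{
    assert(c != NULL);
    return c->state >= 2;
}

/* Nudge the counter toward the outcome that actually happened. */
void train(counter2 *c, bool taken)
{
    assert(c != NULL);
    if (taken) {
        if (c->state < 3)
            c->state++;
    } else {
        if (c->state > 0)
            c->state--;
    }
}

/* Count wrong guesses of the static rule over a recorded history. */
size_t mispredicts_static(long branch_offset, const bool *outcomes, size_t n)
{
    size_t wrong = 0;
    bool guess = predict_static(branch_offset); /* same guess every time */
    for (size_t i = 0; i < n; i++)
        if (outcomes[i] != guess)
            wrong++;
    return wrong;
}

/* Count wrong guesses of the 2-bit counter, starting weakly "not taken". */
size_t mispredicts_dynamic(const bool *outcomes, size_t n)
{
    counter2 c = {1};
    size_t wrong = 0;
    for (size_t i = 0; i < n; i++) {
        if (predict_dynamic(&c) != outcomes[i])
            wrong++;
        train(&c, outcomes[i]);
    }
    return wrong;
}
```

Feeding both functions a history that is taken for a while and then not taken shows the point: the static rule is wrong for one whole phase, while the counter is wrong only for a couple of branches around each change of behaviour.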
  
  --
  I appear to have a blog. Odd.
2. Re:Branch Prediction by imroy · 2005-11-10 05:52 · Score: 1
  
  Wow, you could not be more wrong. See the wikipedia article on branch misprediction. You should probably read up on exactly what RISC means as well. I have the "SPU assembly language" document here from IBM (can't remember where I got it from, sorry). The branch instructions (not JUMP) can jump to any location stored in any 32-bit register, minus the two least significant bits. It is a RISC CPU after all. Or it can branch relative to the current PC using an 18-bit direct value. Considering the first generation of Cells have 256KB of local addressable memory per SPU, that's half the available memory in a relative jump. And most of that memory is probably going to be used by data anyway. So no, JUMPs do not have to be small. This is not your dad's SIMD computer, this is a pretty general RISC processor with vector extensions.
3. Re:Branch Prediction by lmsig · 2005-11-10 06:04 · Score: 1
  
  Has any CPU ever had a mechanism for the user to hint to the CPU whether or not the branch will be taken? Perhaps just another branch instruction that hints to the CPU that it is very likely for a branch to be taken.
  
  This way the compiler could insert the appropriate optimization depending on the situation (or we could even allow #pragma type statements so a programmer could tell the compiler which way to hint!)
  
  Granted, most of the time the compiler could decide; or it doesn't matter, so you could just use the same simple rules that the CPU might use (or just defer to the CPU as we do now). However, we've all been stuck in a situation where we end up with an if statement inside of a for loop. It'd be nice to be able to tighten a loop like that up in the rare situation where it matters.
  
  --
  .plan!! what plan?
4. Re:Branch Prediction by TurkishGeek · 2005-11-10 06:57 · Score: 1
  
  This is exactly what the Cell SPEs have. The SPE compiler inserts "branch hints", which the programmer can guide using the GCC builtin "__builtin_expect". Take a look at the "SPU C/C++ Language Extensions" document that was released a while back by the Cell team.
  
  Most of the other posters have no idea what they are talking about. The PPE is a fully PowerPC compliant two-way SMT processor and absolutely has a branch predictor. It is the SPEs (SIMD vector units) that do not have branch prediction, but they do have branch hints. A tacit assumption in the SPE design is that the vector code used in the SPEs will not have too many branches to begin with.
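  
  For the "if inside a for loop" case asked about above, here is a minimal sketch in plain GCC-style C. The likely/unlikely macros and the sum_valid function are made-up names for illustration; how a given compiler turns the hint into an actual hint instruction is not shown and will vary:

```c
#include <stddef.h>

/* Common wrapper macros around the GCC builtin; the names are a convention,
 * not part of the compiler. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum the non-negative samples. Negative samples are assumed to be rare,
 * so the skip branch is marked as unlikely. */
long sum_valid(const int *samples, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(samples[i] < 0))
            continue;
        total += samples[i];
    }
    return total;
}
```

  The hint changes nothing about the result, only which path the compiler lays out as the expected one, so a wrong hint costs speed rather than correctness.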
  
  --
  Zigbee Central: A Zigbee weblog
5. Re:Branch Prediction by raftpeople · 2005-11-10 09:51 · Score: 1
  
  But the PowerPC core does have a more limited pipeline and branch prediction logic than some of the other Power chips. I believe this simplification was to make room for other "stuff."
6. Re:Branch Prediction by TurkishGeek · 2005-11-10 10:21 · Score: 1
  
  Agreed, the PPE core only has a 4KB by 2-bit BHT (branch history table). Note that the PPE pipeline depth is only 23 stages (i.e. branch misprediction penalty is 23 cycles), so a misprediction penalty is comparable to designs that run at far, far slower clocks. I am not sure if the main motivation was making chip real estate available for other things: The recent IBM Journal of R & D paper by Kahle et al. is an excellent read to gain insight into the design decisions they took, and I believe they were confident that a more sophisticated branch predictor was unnecessary considering the elegant PowerPC core that they had in their hands (23 FO4 PPE pipeline depth, 11 FO4 SPE pipeline depth, can run at 4GHz plus!).
  
  --
  Zigbee Central: A Zigbee weblog
7. Re:Branch Prediction by be-fan · 2005-11-11 04:12 · Score: 1
  
  I'd hardly call a CPU with a 23-stage pipeline and no out of order execution 'elegant'. Maybe to a hardware guy, but all I see when I look at Cell is absolutely atrocious integer performance.
  
  --
  A deep unwavering belief is a sure sign you're missing something...
8. Re:Branch Prediction by TurkishGeek · 2005-11-11 06:53 · Score: 1
  
  I am a hardware guy, and the design is far more elegant and simpler than most of the competing CPUs out there, mainly as a result of the push to get it to work at the 4GHz+ frequency range.
  
  I think it's very early to talk about the integer performance of Cell. I have been working on Cell for a few months now, and all I can say is that the integer performance of the PPE core is on par with the competition; and it beats them handily using hand-written code to take advantage of the SPEs.
  
  --
  Zigbee Central: A Zigbee weblog
9. Re:Branch Prediction by be-fan · 2005-11-11 11:06 · Score: 1
  
  On par with what? I'm a Lisp guy/compiler enthusiast. I like processors with out-of-order execution that don't care about code scheduling, have excellent branch prediction, have low memory latency, etc. Basically, my ideal processor is an Opteron. It's all about perspective, hence my criticism of your use of the word "elegant".
  
  --
  A deep unwavering belief is a sure sign you're missing something...
Very Similar to PS3 SDK by leather_helmet · 2005-11-10 05:09 · Score: 0

We have been working with the PS3 SDK for just about a month now and have begun to run our preliminary game code on it. One complaint we have is that it took us a few weeks to read through the documentation, sample code, etc. before we became remotely comfortable with the hardware, whereas with 'other' next-gen SDKs we were able to pretty much hit the ground running. This was expected, however, and in the end the PS3 has got a LOT more upside potential. With the other SDK we know pretty much what we have to work with; the PS3 looks like it will take a few years before developers really begin to tap into its somewhat convoluted power and architecture.
1. Re:Very Similar to PS3 SDK by Anonymous Coward · 2005-11-10 05:31 · Score: 0
  
  "somewhat convoluted power and architecture"
  
  So you mean that it isn't the retarded desktop x86 peecee architecture that the 360 and your computer at home are like?
  
  Perhaps you should stick to cutting and pasting directx code?
  
  Let this guy be an example to everyone who loves to quote 'developers' who talk about a system being 'hard to program' - invariably it is some clown whose world revolves around Microsoft and directx.
Obligatory by Anonymous Coward · 2005-11-10 05:15 · Score: 0

Imagine a beowulf cluster of these...
Re:Is this the same Cell processor used in the PS3 by Spazntwich · 2005-11-10 05:16 · Score: 0

Actually, you simp, the problem with your argument is PowerPC and Cell are not analogous names. PowerPC denotes certain things, but it doesn't specify the final type. There are PowerPC G3s, G4s, G5s, and a host of others I've forgotten that predate those. Thus far sir, there is only one iteration of a cell processor, meaning any cell you find out there right now is the same one in anything else with a cell processor.
Re:Is this the same Cell processor used in the PS3 by Anonymous Coward · 2005-11-10 05:17 · Score: 2, Funny

I not get mine run. Please send exact instruction how downloaded PS3 games play can?
Re:Is this the same Cell processor used in the PS3 by Chris+Redfield · 2005-11-10 05:19 · Score: 1

which would be why he responded in such a sarcastic manner, and why everyone accused him of "trolling"
I should have added... by mustafap · 2005-11-10 05:20 · Score: 1

That I am in the UK, although I don't think that will make much difference :o)

But I would like to know.

Mike.

--
Open Source Drum Kit, LPLC deve board - mjhdesigns.com
I agree! by BobPaul · 2005-11-10 05:27 · Score: 1

Give me a nice clean distro like Gentoo any day. I can't stand that a Fedora install requires 5 CDs and installs some 600 packages that I will never use. Why do I need so many text editors, etc? I get lost and nervous in the Applications menu. Sure, I tried 30 text editors before I found the one I wanted, but that's all I install on my box during a reinstall or upgrade.

BTW, this parent might be offtopic, but he is no troll. Shame on you mods!
1. Re:I agree! by raverbuzzy · 2005-11-10 05:58 · Score: 1
  
  You don't have to install the packages. We use Fedora at work and only use the 1st CD to build the system. Anything else gets installed only if and when it's needed using yum.
2. Re:I agree! by LnxAddct · 2005-11-10 06:16 · Score: 1
  
  If you do a minimal install of Fedora, it requires only the first CD and gives you just the essentials. Anything else can be installed with Yum. Gentoo is horrific as far as security goes. Fedora takes security very seriously (SELinux, exec-shield, code fortification, quick patching, etc...), and it is also great on performance too. It focuses on ease of use, yet targets power users and server operators as well. It is quite an impressive distro and I've seen it used all throughout companies, including a few Fortune 500s. If your doing anything other then little kid stuff, use a real distro like Fedora.
  Regards,
  Steve
3. Re:I agree! by drinkypoo · 2005-11-10 07:39 · Score: 1
  
  If "your" (sic) doing anything other than little kid stuff, use a real distro like RHEL. Fedora is unsupported and is permanently unstable because it's a testing ground... for RHEL.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
4. Re:I agree! by BobPaul · 2005-11-10 07:59 · Score: 1
  
  Gentoo is horrific as far as security goes.
  
  Compile the SELinux profile into your kernel and then follow the gentoo security handbook.
5. Re:I agree! by LnxAddct · 2005-11-10 08:19 · Score: 1
  
  Security should be easy. Fedora comes with it preconfigured and has an excellent default policy that you can modify at your desire. The more obstructions in the way, the less likely the security features are to be used; Fedora literally removes all obstructions. Gentoo has significantly more vulnerabilities than Fedora, even if you add up all the vulnerabilities for all 4 cores (not that those raw numbers really matter in the end as long as they all get patched). The difference is that most of the vulnerabilities announced in Fedora can't actually be used against the system for a couple of reasons, including SELinux, exec-shield, randomized memory mappings and compiling with fortify source for all major public facing services. All of this is done and implemented without the user having to do anything; that is the way to implement security (if you want it to be used). Keep it simple, and keep it out of the way. In addition to this, Fedora/RHEL have the fastest rate of getting patches out, followed by Novell, which is sometimes 2 or 3 days later, and then followed by others (it's been a few months since I've read the report, but it was on /.). I'm not knocking Gentoo, it has its place, but it isn't really made for the desktop or the server; it is a really good learning tool that some people use as their main OS (of course it also has other purposes). I've run Gentoo before, and learned a lot about my system, but as far as speed goes I noticed nothing above Fedora (which has all optimized packages). In fact the only distro that I've ever used where there was a huge noticeable speed difference over the others was Yoper; unfortunately its development seems stagnant.
  Regards,
  Steve
6. Re:I agree! by Anonymous Coward · 2005-11-10 09:13 · Score: 0
  
  ...anything other than little kid stuff, use a real distro like RHEL.
  
  Oh, God, no. RHEL is... well, Hell. It's so boring and out of date. Far too grown up for my liking, not a technical distribution by any stretch of the imagination. Then again maybe it's because my local sysadmins don't have a clue when it comes to modern Linux (hence the anonymity...).
7. Re:I agree! by BobPaul · 2005-11-11 07:40 · Score: 1
  
  Gentoo has significantly more vulnerabilities than Fedora, even if you add up all the vulnerabilities for all 4 cores (not that those raw numbers really matter in the end as long as they all get patched)
  
  Well, first I'd like to reiterate what you already pointed out, that neither has unpatched vulnerabilities.
  
  Second, you're comparing EVERY release of Gentoo ever to Fedora Core 4.0. Notice how Fedora Core 4.0 doesn't have any vulnerabilities before Feb 2005? That's because it didn't exist much before then.
  
  You forgot the 186 patched vulnerabilities in FC 3, the 132 patched vulnerabilities in FC 2, and the 74 patched vulnerabilities in FC 1.
  
  No, that 448 patched vulnerabilities is much less than the 746 vulnerabilities for Gentoo, but that's a stupid rubric anyway. 746 vulnerabilities covers the entire portage tree, whereas 448 vulnerabilities only covers those packages distributed on the RedHat installation media.
  
  Keep your meta distribution, it's no skin off my nose. But at least attempt to make like comparisons in your arguments.
Linux on PS3? by deadline · 2005-11-10 05:28 · Score: 1

This is very interesting. The Cell has a very non-standard architecture, but it can be used in a very powerful way. The key is software, and thus an emulation SDK is really important for a new architecture. From an HPC (High Performance Computing) perspective, these chips could be very powerful.
The real question is whether the PS3 will have a Linux hard disk option like the PS2. If that is the case, it may be the cheapest way to get actual development hardware.

--
HPC for Primates. Read Cluster Monkey
1. Re:Linux on PS3? by MaskedSlacker · 2005-11-10 05:37 · Score: 2, Interesting
  
  Almost definitely. A cheap beowulf of PS3s.
Cell Hardware... by GoatSucker · 2005-11-10 05:35 · Score: 4, Informative

From the article:
How does one get a hold of a real CBE-based system now? It is not easy: Cell reference and other systems are not expected to ship in volume until spring 2006 at the earliest. In the meantime, one can contact the right people within IBM to inquire about early access.

By the end of Q1 2006 (or thereabouts), we expect to see shipments of Mercury Computer Systems' Dual Cell-Based Blades; Toshiba's comprehensive Cell Reference Set development platform; and of course the Sony PlayStation 3.
Rosetta to the rescue? by Caspian · 2005-11-10 05:45 · Score: 2, Interesting

'Processor - x86 or x86-64; anything under 2GHz or so will be slow to the point of being unusable.'

OK, so what they're saying is "it's slow to emulate a PPC variant on an x86 variant". Duh.

But Apple seems to have cooked up something wonderful (or at least licensed something wonderful) in this vein in the form of Rosetta, the tech that lets Mac OS X for x86 run Mac OS X for PPC binaries very fast.

Sony has several metric fucktons of money. Can't they license the Rosetta technology, or pay for it to be basically "ported" from its current state of PPC-on-x86 to Cell-on-x86? Cell is PPC-based, so it shouldn't be so hard, no?

--
With spending like this, exactly what are "conservatives" conserving?
1. Re:Rosetta to the rescue? by Synic · 2005-11-10 05:51 · Score: 1
  
  you ever think that rosetta is more like wine than it is like virtual pc? ie handles the upper API calls by translating them to native lower level?
2. Re:Rosetta to the rescue? by Caspian · 2005-11-10 05:59 · Score: 1
  
  No. Because then, applications like Photoshop would NOT be fast in Rosetta, and they are.
  
  My reasoning for saying this is that CPU-intensive, [presumably] tightly optimized things like Photoshop would not (at least, for the filters and other image operations) use API calls, they'd use hand-optimized raw C/C++ code, or even raw PPC assembly.
  
  The fact that Photoshop performs quickly under Rosetta, to me, indicates that it's not primarily API reimplementation under the hood, but is some advanced form of caching JIT.
  
  --
  With spending like this, exactly what are "conservatives" conserving?
3. Re:Rosetta to the rescue? by Hal_Porter · 2005-11-10 07:04 · Score: 2, Interesting
  
  Apple wrote a great 68K emulator for the PowerPC macs. It was non JIT, and worked like a big jump table. So you took a 16bit 68k instruction, shifted it and jumped to the base of the table + the shifted offset. The code there would essentially be a PowerPC version of the 68K code.
  
  http://www.mactech.com/articles/mactech/Vol.10/10.09/Emulation/
  
  So you end up doing four instructions to decode the 68K instruction, and then whatever it takes to actually do the operation, typically 2-4.
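  
  To make the table-dispatch idea concrete, here is a toy sketch in C. The instruction encoding below is entirely made up for illustration (it is not the 68K encoding, and this is not Apple's emulator); it only shows the "shift the instruction word, index a table, jump to the handler" shape:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy machine state: four registers, a program counter, a halt flag. */
typedef struct {
    int32_t reg[4];
    size_t  pc;
    int     halted;
} toy_cpu;

typedef void (*handler_fn)(toy_cpu *, uint16_t);

/* Hypothetical 16-bit encoding:
 *   bits 15-12 opcode, bits 11-10 register number, bits 7-0 immediate. */
static void op_halt(toy_cpu *c, uint16_t w) { (void)w; c->halted = 1; }
static void op_load(toy_cpu *c, uint16_t w) { c->reg[(w >> 10) & 3] = w & 0xFF; }
static void op_add(toy_cpu *c, uint16_t w)  { c->reg[(w >> 10) & 3] += w & 0xFF; }

/* The jump table: one slot per opcode; unused slots stay NULL. */
static const handler_fn dispatch[16] = {
    [0] = op_halt,
    [1] = op_load,
    [2] = op_add,
};

/* Fetch a word, shift out the opcode, jump through the table. */
void toy_run(toy_cpu *c, const uint16_t *code, size_t len)
{
    while (!c->halted && c->pc < len) {
        uint16_t w = code[c->pc++];
        handler_fn h = dispatch[w >> 12];
        if (h == NULL) {        /* unknown opcode: stop instead of crashing */
            c->halted = 1;
            break;
        }
        h(c, w);
    }
}
```

  Every emulated instruction pays for the fetch, shift and indirect call before doing any real work, which is the fixed decode overhead described above.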
  
  JIT emulators would profile the code and check which bits were frequently executed. Then they would essentially copy the table entries into a buffer. So in a loop, you'd just execute the 2-4 native instructions and skip the table dispatch.
  There's another benefit too, you can skip things like condition code updates, if you know that they will be overwritten by another instruction before they are checked. Plus you can do peephole optimisations, constant folding and so on.
  
  There's a wonderful article here -
  
  http://www.gtoal.com/sbt/
  
  I can easily believe that CPU intensive code like image processing can run at a very impressive speed, especially as top of the range x86 chips have better SpecInt performance than a top of the range PPC.
  
  Incidentally, I read about Apple's second generation 68K emulator being a "dynamic recompiler", so they've been working on this sort of thing for ages.
  
  --
  echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
4. Re:Rosetta to the rescue? by Anonymous Coward · 2005-11-10 07:11 · Score: 0
  
  Did you just mention SPEC as a valid means of measuring chip performance???
  
  Hahahaha!!!
5. Re:Rosetta to the rescue? by seebs · 2005-11-10 08:10 · Score: 1
  
  It's not the PPE cores that are slow to emulate. It's the 8 additional vector-only processors.
  
  This is a sim, not just an emulator. It's not just vaguely implementing the output; it is at least to some extent modeling the instruction pipelining, branch miss penalties, and so on.
  
  --
  My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
Another question about the simulator by Weaselmancer · 2005-11-10 05:50 · Score: 1

I wonder if it'll take advantage of multi-core chips? Might make sense to do so, especially since that's also (sort of) similar to the hardware being simulated.

--
Weaselmancer
rediculous.
Am I really the first to mention it.. by Transcendor · 2005-11-10 05:51 · Score: 1

Imagine that running on a beowulf Cluster of Cell Processors, running Bochs to run... nevermind
1. Re:Am I really the first to mention it.. by Anonymous Coward · 2005-11-10 08:21 · Score: 0
  
  Ha ha...ha...
  
  ...
  
  ha...
Oh come on. by Anonymous Coward · 2005-11-10 05:59 · Score: 0

They give people a free PS3 emulator and think they will ever do anything "productive" on it?!
Re:Is this the same Cell processor used in the PS3 by Anonymous Coward · 2005-11-10 06:12 · Score: 0

Actually, you simp, the problem with your argument is PowerPC and Cell are not analogous names. PowerPC denotes certain things, but it doesn't specify the final type. There are PowerPC G3s, G4s, G5s, and a host of others I've forgotten that predate those. Thus far sir, there is only one iteration of a cell processor, meaning any cell you find out there right now is the same one in anything else with a cell processor.

You're all class :)

The following is taken from "the source" (i.e. IBM) :

The Cell Broadband Engine (CBE) is a new architecture which extends the 64-bit Power Architecture(TM) technology. Ideal for compute-intensive tasks like gaming, multimedia, and physics- or life-sciences and related workloads, the CBE is a single-chip multiprocessor no bigger than a fingernail, with nine processors operating on a shared, coherent memory. The CBE processor contains a Power Architecture-based control processor (PPU) augmented with eight (or more) SIMD Synergistic Processor Units (SPUs) and a rich set of DMA commands for efficient communications between processing elements.

So actually, no, it's not much different than saying POWER or PowerPC. It denotes a family of processors that share certain capabilities. The fact that there is only a single implementation does not detract from this definition (just as when PowerPC came out, one could only initially get their hands on a 601). Plus, unless you're fairly wired in, how do you know exactly which flavour(s) of cell proccies IBM actually has vs what Sony might have? IBM themselves say "8 or more". How do you know that the PS3 version has only eight (and from what I understand some will have specific tasks) while IBM currently has versions with more? That's the point.
Hannibal sucks by Anonymous Coward · 2005-11-10 06:13 · Score: 0

and DSP is wrong. They are processors.
"cell" architecture is all about local memory by Animats · 2005-11-10 06:25 · Score: 4, Informative

The "cell" processors have fast access to local, unshared memory, and slow access to global memory. That's the defining property of the architecture. You have to design your "cell" program around that limitation. Most memory usage must be in local memory. Local memory is fast, but not large, perhaps as little as 128KB per processor.
The cell processors can do DMA to and from main memory while computing. As IBM puts it, "The most productive SPE memory-access model appears to be the one in which a list (such as a scatter-gather list) of DMA transfers is constructed in an SPE's local store so that the SPE's DMA controller can process the list asynchronously while the SPE operates on previously transferred data." So the cell processors basically have to be used as pipeline elements in a messaging system.
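
A toy sketch of that double-buffered pattern, in plain C so it runs anywhere. The memcpy in fetch_chunk stands in for a DMA transfer, which on real hardware would be asynchronous and overlap with the compute loop; the chunk size and all names are illustrative assumptions, not the SDK's API:

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* elements per local buffer; tiny on purpose */

/* Stand-in for a DMA "get" from main memory into a local buffer.
 * Here it is a synchronous copy; returns the element count copied. */
static size_t fetch_chunk(int *local, const int *main_mem,
                          size_t offset, size_t total)
{
    size_t n = (total - offset < CHUNK) ? total - offset : CHUNK;
    memcpy(local, main_mem + offset, n * sizeof *local);
    return n;
}

/* Sum a large array while only ever holding two small chunks locally. */
long sum_streamed(const int *main_mem, size_t total)
{
    int    buf[2][CHUNK];
    size_t count[2] = {0, 0};
    size_t offset = 0;
    int    cur = 0;
    long   sum = 0;

    if (total == 0)
        return 0;

    /* Prime the pipeline with the first chunk. */
    count[cur] = fetch_chunk(buf[cur], main_mem, offset, total);
    offset += count[cur];

    while (count[cur] > 0) {
        int next = cur ^ 1;

        /* Kick off the transfer of the next chunk first... */
        count[next] = (offset < total)
                    ? fetch_chunk(buf[next], main_mem, offset, total)
                    : 0;
        offset += count[next];

        /* ...then work on the chunk that has already arrived. */
        for (size_t i = 0; i < count[cur]; i++)
            sum += buf[cur][i];

        cur = next; /* swap buffers */
    }
    return sum;
}
```

The point of the two buffers is that the processor never waits on main memory as long as the work per chunk takes at least as long as the transfer.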
That's a tough design constraint. It's fine for low-interaction problems like cryptanalysis. It's OK for signal processing. It may or may not be good for rendering; the cell processors don't have enough memory to store a whole frame, or even a big chunk of one.
This is actually an old supercomputer design trick. In the supercomputer world, it was not too successful; look up the nCube and the BBN Butterfly, both of which were a bunch of non-shared-memory machines tied to a control CPU. But the problem was that those machines were intended for heavy number-crunching on big problems, and those problems didn't break up well.
The closest machine architecturally to the "cell" processor is the Sony PS2. The PS2 is basically a rather slow general purpose CPU and two fast vector units. Initial programmer reaction to the PS2 was quite negative, and early games weren't very good. It took about two years before people figured out how to program the beast effectively. It was worth it because there were enough PS2s in the world to justify the programming headaches.
The small memory per cell processor is going to be a big hassle for rendering. GPUs today let the pixel processors get at the frame buffer, dealing with the latency problem by having lots of pixel processors. The PS2 has a GS unit which owns the frame buffer and does the per-pixel updates. It looks like the cell architecture must do all frame buffer operations in the main CPU, which will bottleneck the graphics pipeline. For the "cell" scheme to succeed in graphics, there's going to have to be some kind of pixel-level GPU bolted on somewhere.
It's not really clear what the "cell" processors are for. They're fine for audio processing, but seem to be overkill for that alone. The memory limitations make them underpowered for rendering. And they're a pain to program for more general applications. Multicore shared-memory multiprocessors with good caching look like a better bet.
Read the cell architecture manual.
1. Re:"cell" architecture is all about local memory by Anonymous Coward · 2005-11-10 06:36 · Score: 1, Informative
  
  Actually, it's 256k. One for each SPU. That's 2MB total. Not bad if you ask me..
2. Re:"cell" architecture is all about local memory by Naysayer · 2005-11-10 07:36 · Score: 1
  
  That 256k has to hold the program that the SPE is running, as well as all the data. For fast DMA, though, your data is probably double-buffered so divide your data space in half, and hey, you might want a little space for stack / other dynamic memory usage.
  
  Suppose your program is 48k, you use 32k of memory dynamically, that leaves 176k for data, which is double-buffered, which means the program can only process 88k of data at a time.
  
  But it sure can do it fast.
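  
  The budget is easy to recompute for other program sizes. A small sketch in C; the struct and function names are made up for illustration, and all sizes are in KB:

```c
#include <stddef.h>

typedef struct {
    size_t data_kb;       /* local store left over for data */
    size_t per_buffer_kb; /* size of each half when double-buffered */
} ls_budget;

/* Subtract code and dynamic (stack/heap) usage from the local store,
 * then split what remains into two buffers. Returns zeros if the
 * program does not fit at all. */
ls_budget local_store_budget(size_t total_kb, size_t code_kb, size_t dynamic_kb)
{
    ls_budget b = {0, 0};
    if (code_kb + dynamic_kb <= total_kb) {
        b.data_kb = total_kb - code_kb - dynamic_kb;
        b.per_buffer_kb = b.data_kb / 2;
    }
    return b;
}
```

  With a 256 KB local store, 48 KB of code and 32 KB of dynamic memory, 256 - 48 - 32 = 176 KB is left for data, or 88 KB per buffer.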
3. Re:"cell" architecture is all about local memory by taracta · 2005-11-10 08:03 · Score: 2, Informative
  
  I think too much emphasis is being placed on "slow" access to system memory for the CELL processor when it is "slow" only relative to access to the local memory of the SPUs. Please remember that system memory for the CELL is about 8 times faster than the memory in today's high end PCs, with lower latency. XDR is by far the best memory type available; unfortunately nobody likes RAMBUS the company. So please, when you are speaking about access to system memory, keep in mind that the CELL processor has about the same memory bandwidth as top of the line graphics cards, and probably lower latency. Don't you wish your PC had the bandwidth of top of the line graphics cards?
4. Re:"cell" architecture is all about local memory by taracta · 2005-11-10 08:14 · Score: 1
  
  Actually IMHO I would use the local store primarily for instructions and stream the data via DMA, which at 25GB/s should be more than enough bandwidth for data. 256KB plus fast access to the memory of the other SPUs (~2MB +/- 256KB) should be enough for decent program(s) to run efficiently.
5. Re:"cell" architecture is all about local memory by frostfreek · 2005-11-10 08:19 · Score: 2, Informative
  
  > It's not really clear...
  
  There was a Toshiba demo, showing 8 Cells; 6 used to decode forty-eight HDTV MPEG4 streams, simultaneously, 1 for scaling the results to display, and one left over. A spare, I guess?
  
  This reminds me of the Texas Instruments 320C80 processor; 1 RISC general purpose cpu, plus four DSP-oriented CPUs. Each had an on-chip memory chunk. 4KB. 256KB would be fantastic, after the experience of programming for the C80. 256KB will be plenty of memory to work on a tile of framebuffer.
  
  1. DMA tile -> local RAM
  2. render to local...
  3. ???
  4. Profit!
  
  Whoops, where was I going with that, again?
6. Re:"cell" architecture is all about local memory by Anonymous Coward · 2005-11-10 09:34 · Score: 0
  
  The PS3 has a GPU from nVidia in it - the Cell won't be doing the rendering itself, so it's free to do things like AI and physics calculations.
7. Re:"cell" architecture is all about local memory by Anonymous Coward · 2005-11-10 10:14 · Score: 1, Informative
  
  "The PS3 has a GPU from nVidia in it - the Cell won't be doing the rendering itself, so it's free to do things like AI and physics calculations."
  
  WTF?
  
  Just what the world needs, another clown from the peecee world talking about the PS3.
  
  There is no 'GPU' in the PS3. The entire Cell+RSX unit is used to render. The RSX would best be described as the PS3's rasterizer, but even that isn't entirely accurate since there is a large amount of painting/modifying pixels that the SPEs do. Physics and graphics data is unified and processed on the Cell side of the system, although the RSX does have vertex transform capabilities itself.
  
  PS3 rendering is best described as a hybrid rendering system where rendering is load balanced between the internal components on the fly depending on the unique characteristics of the scene and world data being processed.
  
  So, no, there isn't a NVidia 'GPU' in the PS3...
8. Re:"cell" architecture is all about local memory by tbird20d · 2005-11-10 10:35 · Score: 1
  
  So, no, there isn't a NVidia 'GPU' in the PS3...
  Wrong. Here's something from eurogamer.net, from last year:
  "NVIDIA's graphics processing unit (GPU) for PlayStation 3 will be completed by the end of 2005, according to US Internet reports citing comments made by chief executive Jen-Hsun Huang to investment firm Morgan Stanley this week...."
  Please at least google "nvidia ps3" before responding.
9. Re:"cell" architecture is all about local memory by Anonymous Coward · 2005-11-10 10:39 · Score: 0
  
  Everything you need to know about the PS3 rendering system was just laid out for you.
  
  If any other PS3 developers would like to make additions or comments, jump right in.
  
  Fanboys should keep their mouths shut.
10. Re:"cell" architecture is all about local memory by Anonymous Coward · 2005-11-10 11:30 · Score: 0
  
  The PS3 *HAS* a GPU by NVidia called the RSX. Even though you could render on the Cell, this is not the path expected to be taken by developers. You're expected to build display commands on the CPU and push these to the GPU. Just like the PS2, or the Xbox or any PC with a video card.
  
  I believe it was initially planned that the Cell would do the rendering, but Sony came to their senses and slapped a GPU on it. You can tell it's kind of a last minute thing because the way the GPU is attached to the Cell is kind of awkward.
11. Re:"cell" architecture is all about local memory by Anonymous Coward · 2005-11-10 14:51 · Score: 0
  
  Just FYI, the spelling of PC is PC, and PC implies it is pronounced as P C.
  
  peecee doesn't mean anything at all. Attempting to phonetically spell out letters is idiotic, and makes _you_ look like an idiot. The correct way to pronounce 'A' is 'A', and 'B' is 'B'
  The average 3 year old knows how to pronounce, and read 'A' and 'B' before they could possibly interpret 'ay' and 'bee'
  
  Now, back to the topic - rather than ridiculing, how about providing references, since this is a fairly common belief, and one that seems to be the line that Sony and nVidia are giving out. So some actual references would be very good.
12. Re:"cell" architecture is all about local memory by Anonymous Coward · 2005-11-10 15:33 · Score: 0
  
  "Even though you could render on the Cell"
  
  "You're expected to build display commands on the CPU and push these to the GPU."
  
  Just fucking stop.
  
  It is sad to think there are people as ignorant as you out there when, even though a good deal of PS3 info is under NDA, there is more than enough public info to understand the PS3 rendering system.
  
  Repeat after me dimwit:
  
  "There is no 'GPU' in the PS3."
  "There is no 'GPU' in the PS3."
  "There is no 'GPU' in the PS3."
  "There is no 'GPU' in the PS3."
  "There is no 'GPU' in the PS3."
In South Korea... by Anonymous Coward · 2005-11-10 06:29 · Score: 0

Only old people abuse memes.
Application for Neural Net Simulations...? by CnlPepper · 2005-11-10 07:03 · Score: 1

I wonder if anyone has considered using cell processors to run large neural network simulations. The SPEs would churn through node calculations at an incredible rate, and you wouldn't need any greater accuracy than single precision.

It would be an interesting application.
1. Re:Application for Neural Net Simulations...? by icesprite · 2005-11-10 17:50 · Score: 1
  
  The SPU SIMD engine is actually a terrific fit for neural network evaluation. However, the most time consuming stage is the NN training part. It is usually done off-line and only once.
Not a PPC Processor by MJOverkill · 2005-11-10 08:36 · Score: 2, Informative

Once again, the cell is not a PPC processor. It is not PPC based. The cell going into the playstation 3 has a POWER based PPE (power processing element) that is used as a controller, not a main system processor. Releasing an SDK for Macs would not give any advantage over an X-86 based SDK because you are still emulating another platform.

Wiki
1. Re:Not a PPC Processor by Anonymous Coward · 2005-11-10 08:42 · Score: 0
  
  Perhaps you should find a new subject to lecture people on.
  
  Linking to a wiki article, that was probably written by someone with as little knowledge as yourself, when there are people here who work for IBM, Sony, and/or are actually working on Cell systems is not a good idea.
2. Re:Not a PPC Processor by MJOverkill · 2005-11-10 09:37 · Score: 1
  Nice troll, I don't know why I'm bothering to respond, but just in case anyone else cannot find the "External Links" and "Articles" sections at the bottom of wikipedia articles, here is a sample from the Cell page:
  
  IBM - Cell Project
  
  Sony - Cell Broadband Engine
  
  IBM CBE Resources
  
  Wikipedia is a good starting point to learn about a topic, but it is not definitive. That is why wikipedia qualifies (i.e. sources) its material. If you actually took some time to read wikipedia pages, you would know this. Next time, read the link before trolling.
3. Re:Not a PPC Processor by Wesley+Felter · 2005-11-10 11:51 · Score: 1
  
  Well, the PPE is a PowerPC. Whether you call the PPE a "controller" or "main system processor" is really just a matter of definitions (I think both terms are simultaneously applicable).
4. Re:Not a PPC Processor by MJOverkill · 2005-11-10 12:19 · Score: 1
  
  It's a Power core, not a PowerPC.
  
  IBM CBE Architecture
  
  The first type of processor, the PPE, is a 64-bit Power Architecture core. It is fully compliant with the 64-bit Power Architecture specification and can run 32-bit and 64-bit operating systems and applications.
5. Re:Not a PPC Processor by Wesley+Felter · 2005-11-10 13:30 · Score: 2, Informative
  
  "Power Architecture" is PowerPC.
  
  What is Power Architecture technology?
  
  "Power Architecture is an umbrella term for the PowerPC® and POWER4(TM) and POWER5(TM) processors produced by IBM, as well as PowerPC processors from other suppliers."
6. Re:Not a PPC Processor by Anonymous Coward · 2005-11-10 19:50 · Score: 0
  
  PowerPC is a subset of Power (as in, instructions, etc.). Not the same.
License Issues by Anonymous Coward · 2005-11-10 08:45 · Score: 0

It's notable that a large number of components in this (like a lot of IBM's software) are not GPL'ed and the code is not available. Smacks of standard IBM hypocrisy. The community really needs to be careful about IBM taking us for fools.
From the article:
The binary packages released in this downloads package are licensed under the IBM International License Agreement for Early Release of Programs, or ILAR. You can check out the terms of it (and other IBM "base licenses") from the IBM base licenses page. alphaWorks downloads are typically limited-time (usually 90 days); in addition, the ILAR license states that "You are not authorized to use the Program for productive purposes" -- so make sure that your time spent with these downloads is as unproductive as possible. Licensing conditions and commercial licensing options for the alphaWorks downloads are discussed in Resources.
Don't be ridiculous by Anonymous Coward · 2005-11-10 14:55 · Score: 0

Instead the CPU must single-step forward until the appropriate offset is reached, rather than just incrementing the Program Counter register in one go.

Christ, that's the most ludicrous comment I've heard in a long time.

For goodness sakes just keep quiet when you have no clue about how processors work. The comment is not only 100% wrong, but totally ridiculous to boot.
why? by PopCulture · 2005-11-10 15:08 · Score: 1

I'm very excited about this project, and even spec'd out a new Dell to handle it. But before I lay down the cash, I just wonder: why?

Why? Is the Cell processor expected to go anywhere beyond the PS3? There is obviously no OS port planned, and I have no access to the PS3 game SDK. I have read some pretty awesome posts regarding the technical details of Cell vs. x86 or Mac architectures, but none that would encourage me to download, install, and play around with this in the hope of ever making a buck.

--

Here's to finally giving Bush his exit strategy in November
1. Re:why? by seebs · 2005-11-10 18:29 · Score: 1
  
  Blade servers have already been announced.
  
  I would buy one of these, and no, I don't plan to get a PS3.
  
  --
  My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
2. Re:why? by Anonymous Coward · 2005-11-10 20:07 · Score: 0
  
  I will probably buy one when they are available in 1Q 2006 from Mercury Systems. I have a personal project that requires a lot of parallel processing, and Cell is perfect for my needs.
  
  So that's one reason.
Sony has poisoned by quarkscat · 2005-11-10 21:50 · Score: 1

the "Cell" well, as far as I am concerned. They seem to be totally unremorseful regarding their music CD DRM (aka rootkit). At one point I considered the purchase of a PS3 in order to gain experience with the Cell Processor. Today, I would not consider the purchase of ANYTHING with Sony's name on it, regardless of how "geeky" it might be.

Purchasing IBM's (or perhaps Mercury Computer's) reference CBE-based platform is now my only choice. Sony's NRE for the PS3 might make their platform a "best buy" price-wise because of the manufacturing volume. But between their heavy involvement in the MPAA, the RIAA, and this DRM issue that makes customers' computers extremely vulnerable, there is no longer any compulsion to give Sony anything other than a "loud, wet raspberry".
will it kill Intel? by Anonymous Coward · 2005-11-10 23:31 · Score: 0

No way. As long as IBM doesn't recognize that these days nobody wants to, or can afford to, maintain several different development lines (except maybe Microsoft), practically nobody will port their applications. And nobody will port their applications to Java either. So the only solution to this problem is wxWidgets (http://www.wxwidgets.org/), because it allows merging all platforms into one single development tree. And the easiest way to start this transition is through wyoGuide (http://wyoguide.sf.net/).
The NVidia GPU in the PS3 by Animats · 2005-11-12 09:49 · Score: 2, Informative

That's not what Sony is saying:
SCEA press release:
SONY COMPUTER ENTERTAINMENT INC. AND NVIDIA ANNOUNCE JOINT GPU DEVELOPMENT FOR SCEI'S NEXT-GENERATION COMPUTER ENTERTAINMENT SYSTEM
TOKYO and SANTA CLARA, CA
DECEMBER 7, 2004
"Sony Computer Entertainment Inc. (SCEI) and NVIDIA Corporation (Nasdaq: NVDA) today announced that the companies have been collaborating on bringing advanced graphics technology and computer entertainment technology to SCEI's highly anticipated next-generation computer entertainment system. Both companies are jointly developing a custom graphics processing unit (GPU) incorporating NVIDIA's next-generation GeForce(TM) and SCEI's system solutions for next-generation computer entertainment systems featuring the Cell* processor".
1. Re:The NVidia GPU in the PS3 by game+kid · 2005-11-20 06:05 · Score: 1
  
  Damn. You win my inaugural Lifetime Parent-Post-Pwnage Award.
  
  --
  You can hold down the "B" button for continuous firing.