IBM Full-System Simulator Team Speaks Out

Posted by ScuttleMonkey on Tuesday November 29, 2005 @11:04AM from the from-the-horses-mouth dept.

Shell writes "The IBM Full-System Simulator for the Cell Broadband Engine (Cell BE) processor, known inside IBM by the code name Mambo, is a key component of the newly posted offerings on alphaWorks. Meet some of the members of the team that pulled it together, and hear about the simulator in their own words."

115 comments

PS3? by raingrove · 2005-11-29 11:05 · Score: 3, Funny

Does this mean we can emulate PS3? lol
1. Re:PS3? by Anonymous Coward · 2005-11-29 11:15 · Score: 0
  
  > Does this mean we can emulate PS3? lol
  
  No, it means you can simulate the PS3 . . . Simulation is a really, really painful version of emulation that's mostly used by architects.
2. Re:PS3? by garrett714 · 2005-11-29 11:16 · Score: 5, Informative
  
  Yes and no.
  
  While this "simulator" is basically an emulation of the Cell hardware, it won't allow people to run games at full speed. It's more of a developer tool that allows programmers to start coding for the PS3 when they don't actually have the hardware yet. Still, it is reasonable to believe that emulation of the PS3 will be viable in the future (although not for a long time).
3. Re:PS3? by Anonymous Coward · 2005-11-29 11:31 · Score: 0, Informative
  
  There is virtually zero chance that any x86 system will ever be able to emulate even the first-generation Cell chip that is in the PS3 and in the IBM and other companies' server products that are starting to show up now.
  
  First, neither Intel nor AMD will be shipping anything that comes even close to the ~256 GFLOPS (and whatever the integer performance number is) of the latest version of the Broadband Engine.
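  
  For what it's worth, the ~256 GFLOPS figure matches a simple peak-rate calculation, assuming 8 SPEs clocked at 4 GHz, each completing one 4-wide single-precision fused multiply-add (2 flops per lane) per cycle. Those constants are the assumptions here, not measurements, and a peak rate ignores memory and DMA limits entirely. The 3.2 GHz line is included only to show how the number scales with clock.
  
  ```c
  #include <stdio.h>
  
  /* Peak single-precision rate of the SPE array under the stated assumptions. */
  static double peak_gflops(int spes, double clock_ghz, int simd_lanes)
  {
      /* A fused multiply-add counts as two floating-point operations per lane. */
      const int flops_per_lane_per_cycle = 2;
      return spes * clock_ghz * simd_lanes * flops_per_lane_per_cycle;
  }
  
  int main(void)
  {
      printf("8 SPEs @ 4.0 GHz: %.1f GFLOPS\n", peak_gflops(8, 4.0, 4));
      printf("8 SPEs @ 3.2 GHz: %.1f GFLOPS\n", peak_gflops(8, 3.2, 4));
      return 0;
  }
  ```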
  
  Second, x86 chips will never be able to emulate the internal ring bus in Cell chips. The killer ring bus inside the chip is really the key to the crazy performance people are getting out of Cell systems.
  
  Intel and AMD pretty much have nothing but slapping additional cores together for the next decade on their roadmaps. And even if they could finally manage to get enough of their x86 cores onto one chip with the same amount of computational performance years from now, they will have nothing like the internal ring bus.
  
  In other words, don't hold your breath waiting to emulate PS3 games on any x86 system...ever.
4. Re:PS3? by garrett714 · 2005-11-29 11:38 · Score: 0, Troll
  
  Did I ever say that it would be possible soon, or worse, possible on x86 hardware? I never made this claim. I was simply saying that emulation of the PS3 in the future is a possibility, and if you are going to be a naysayer and claim it's not possible, remember that pretty much every single game console to this day has been emulated (if not perfectly, at least to some extent; even the PS2). Anything is possible :-)
5. Re:PS3? by Kuciwalker · 2005-11-29 11:59 · Score: 0, Funny
  
  Yes, but you only get decent fps on a 6.8GHz 1TB RAM 2TB HD laptop.
6. Re:PS3? by moonbender · 2005-11-29 12:08 · Score: 0, Troll
  
  Okay, he never said anything about the "coming years" or about running it on x86. Here, I'll repeat it because it's so easy using copy and paste. He never said anything about the "coming years" or about running it on x86. One more time? Nah, I think two should be enough. Well, three, because he already repeated it. And he also said it in the first place. Or didn't say it. See above. Now, please either make a point why we will never be able to emulate the Cell on any other architecture ever or, well, you get the drift. No offense.
  
  --
  Switch back to Slashdot's D1 system.
7. Re:PS3? by joto · 2005-11-29 12:10 · Score: 0, Troll
  
  The architectures are so dissimilar that even innovative emulator techniques like dynamic recompilation wouldn't be able to achieve reasonable performance.
  Ouch, does that count as innovative these days?
8. Re:PS3? by garrett714 · 2005-11-29 12:13 · Score: 0, Troll
  
  Except... the PS3 is like no other console seen before. So your argument doesn't stand.
  
  First, there is no "argument." All I am arguing is that anything is possible. Am I saying it's likely to happen soon? Not at all.
  
  Maybe we could emulate the Cell arch on x86, but the fact of the matter is that it's not reasonable to even hope that any time in the foreseeable future we'll be able to run a PS3 game in an emulated environment at playable speed.
  
  Why is it not reasonable to hope for that? I never said it would be happening anytime soon, I'm just leaving the possibility open. People said the PS2 could never be emulated, yet there are (somewhat) working emulators for it.
  
  The architectures are so dissimilar that even innovative emulator techniques like dynamic recompilation wouldn't be able to achieve reasonable performance.
  
  Honestly I don't know what you are talking about here... emulators "emulate" the hardware of a specific device, allowing *gasp* native binaries to run on a non-native architecture. x86 and PowerPC aren't very similar, yet we have Bochs and PearPC (and many others.)
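  
  To make the fetch-decode-execute idea concrete, here is a minimal interpreter for a made-up four-instruction register machine. The ISA, opcodes, and names are all hypothetical and have nothing to do with Cell or PowerPC; the point is only that guest instructions are read as data and carried out by host code, which is why pure interpretation costs many host cycles per guest instruction. Dynamic recompilation reduces that cost by translating blocks of guest code into host code once and reusing them, instead of decoding every instruction each time it runs.
  
  ```c
  #include <stdint.h>
  #include <stdio.h>
  
  /* Hypothetical toy ISA: each instruction is {opcode, a, b, c}. */
  enum { OP_LOADI, OP_ADD, OP_JNZ_DEC, OP_HALT };
  
  typedef struct { uint8_t op, a, b, c; } Insn;
  
  /* Interpret a guest program on a 4-register machine; returns r0 at HALT. */
  static int32_t run(const Insn *prog, int32_t regs[4])
  {
      size_t pc = 0;
      for (;;) {
          Insn in = prog[pc++];              /* fetch */
          switch (in.op) {                   /* decode + execute */
          case OP_LOADI:                     /* ra = immediate b */
              regs[in.a] = in.b;
              break;
          case OP_ADD:                       /* ra = rb + rc */
              regs[in.a] = regs[in.b] + regs[in.c];
              break;
          case OP_JNZ_DEC:                   /* ra -= 1; if ra != 0, jump to b */
              if (--regs[in.a] != 0)
                  pc = in.b;
              break;
          case OP_HALT:
              return regs[0];
          default:                           /* unknown opcode: give up */
              return -1;
          }
      }
  }
  
  int main(void)
  {
      /* Guest program: r0 = 0; r1 = 5; r2 = 3; do { r0 += r2; } while (--r1); */
      const Insn prog[] = {
          {OP_LOADI, 0, 0, 0},
          {OP_LOADI, 1, 5, 0},
          {OP_LOADI, 2, 3, 0},
          {OP_ADD, 0, 0, 2},                 /* index 3: loop body */
          {OP_JNZ_DEC, 1, 3, 0},
          {OP_HALT, 0, 0, 0},
      };
      int32_t regs[4] = {0};
      printf("r0 = %d\n", (int)run(prog, regs));  /* 5 * 3 = 15 */
      return 0;
  }
  ```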
  
  Look at the Intel and AMD roadmaps as the grandparent noted; we aren't going to have even comparable native performance in the coming years, much less the level of performance that would be required to emulate the platform at full speed.
  
  Once again, I never said it would happen anytime soon, or that any of the current Intel/AMD processors (or near-future ones, for that matter) would be able to emulate it. I'm talking a LONG TIME before anything would be possible. C'mon, back when the SNES was released, do you think people imagined that years later we would be able to play any SNES game on a home computer? But give anything time, and someone will code it.
9. Re:PS3? by oGMo · 2005-11-29 12:47 · Score: 1
  
  While this "simulator" is basically an emulation of the Cell hardware, it won't allow people to run games at full speed.
  
  Yeah, remember, the Cell is just one component; you've got the GPU to worry about too, and you have to match the performance of the other system components (RAM bus and the like). Not impossible, but consider that it took/takes a 100-200MHz Intel system to emulate a 3MHz SNES. While other techniques are available (like dynamic recompilation), these only go so far. If you could get the same performance out of lesser parts, there would be no reason to upgrade.
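  
  Taking the post's rough figures at face value, a 100-200MHz host emulating a ~3MHz guest spends about 33 to 67 host cycles per guest cycle. The sketch below does that division and then, purely as an illustration, applies the lower ratio to an assumed 3.2 GHz guest clock. The 3.2 GHz figure is an assumption, and the extrapolation ignores the SPEs, the GPU, multiple host cores, and any gains from dynamic recompilation.
  
  ```c
  #include <stdio.h>
  
  /* Host clock cycles available per emulated guest clock cycle. */
  static double host_cycles_per_guest_cycle(double host_mhz, double guest_mhz)
  {
      return host_mhz / guest_mhz;
  }
  
  /* Host clock needed to keep up with a guest if the overhead ratio stayed fixed. */
  static double required_host_ghz(double guest_ghz, double ratio)
  {
      return guest_ghz * ratio;
  }
  
  int main(void)
  {
      const double guest_mhz = 3.0;  /* rough SNES CPU clock quoted in the post */
      double lo = host_cycles_per_guest_cycle(100.0, guest_mhz);
      double hi = host_cycles_per_guest_cycle(200.0, guest_mhz);
      printf("overhead: %.1f to %.1f host cycles per guest cycle\n", lo, hi);
  
      /* Illustrative only: same ratio applied to an assumed 3.2 GHz guest core. */
      printf("3.2 GHz guest at the low ratio: ~%.0f GHz host\n",
             required_host_ghz(3.2, lo));
      return 0;
  }
  ```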
  
  --
  Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
10. Re:PS3? by Poltras · 2005-11-29 14:55 · Score: 1
  
  New in 2011: 8-core x86 4.0GHz Pentium ZZZ. Can we say more than enough? No? Oh well, make it new in 2015, following Moore's law: 16-core x86 6GHz Pentium Mothafucka... want me to go on?
  
  --
  Of Code And Men
11. Re:PS3? by Anonymous Coward · 2005-11-29 16:57 · Score: 0
  
  Do you actually understand what emulation is?
  
  There is no reason why current hardware can't emulate Cell. No reason at all.
  
  It might be slow. It might be fucking slow. It might even be Really. Fucking. Slow. But there's no reason why it can't be done, other than nobody having written the emulator yet.
12. Re:PS3? by ghukov · 2005-11-30 01:38 · Score: 0
  
  256 Gflops ought to be enough for anybody....
  
  --
  ...because Plutonians are teh suck
You WANT A Cell System... by Anonymous Coward · 2005-11-29 11:16 · Score: 1, Interesting

Running Linux on one of these things is simply INSANE.

I have been through a lot of chip transitions over the years and been impressed with the leaps each new generation has made.

But Cell is something entirely different. It is such a HUGE leap in performance beyond x86 systems that going back to an x86 machine is unthinkable now for me. I almost feel drunk from the power I have at my hands...

Read up all the Cell info you can at IBM's site and read the various patents IBM, Toshiba, and Sony have out there. And find some way to get your hands on one of these...

I can now see why the PS3 stuff we are seeing is so amazing...
1. Re:You WANT A Cell System... by smashr · 2005-11-29 11:29 · Score: 3, Insightful
  
  Running Linux on one of these things is simply INSANE.
  
  I have been through a lot of chip transitions over the years and been impressed with the leaps each new generation has made.
  
  But Cell is something entirely different. It is such a HUGE leap in performance beyond x86 systems that going back to an x86 machine is unthinkable now for me. I almost feel drunk from the power I have at my hands...
  
  Read up all the Cell info you can at IBM's site and read the various patents IBM, Toshiba, and Sony have out there. And find some way to get your hands on one of these...
  
  I can now see why the PS3 stuff we are seeing is so amazing...
  
  Sure, the cell is amazing, IF you are doing the right things. You say that you simply want to leave the old x86 architecture behind but the truth of the matter is that the two do not even begin to compare.
  
  It is not simply a matter of saying "OMG my cell has 8 cores at 4GHz". The main Power Processing Element is crippled at best for simple single threaded applications -- roughly equivalent to a PowerPC of the G3 era, but with strictly in-order execution. The SPEs (the other 8 cores) are essentially mini vector computers. They can perform a massive amount of floating point calculations in parallel; however, they do not enjoy an innate ability to deal well with all sorts of code as a standard x86 CPU could.
  
  The cell designers have completely sacrificed instruction level parallelism in exchange for thread level parallelism. It is certainly a valid and interesting way to achieve speed, but not for single threaded applications. -- Don't throw out your x86 just yet.
2. Re:You WANT A Cell System... by game+kid · 2005-11-29 11:33 · Score: 1
  
  Since parent is an Anonymous Coward, it's a bit hard to believe, but if it does have one PowerPC-like thingy controlling 7 or 8 SPEs, it will be teh h0ttz0r (if used correctly - that PS3 better have good antialiasing this time; I notice the jaggies with the PS2 and I don't like 'em).
  
  (offtopic: To whoever made post #14142136 (sqrt(2)*10000000 rounded to the nearest integer): please reply, and congratulations. Hopefully you'll get the 31415927th too.)
  
  --
  You can hold down the "B" button for continuous firing.
3. Re:You WANT A Cell System... by garrett714 · 2005-11-29 11:34 · Score: 1
  
  Here's an article about Sony possibly using Linux on the PS3. The chances of this happening are good, we all remember how Sony released the Linux kit for the PS2.
4. Re:You WANT A Cell System... by Anonymous Coward · 2005-11-29 11:34 · Score: 0
  
  Oh god.
  
  Let me guess...you get your Cell info from teamxbox and Intel/AMD fanboy sites like aceshardware...
  
  Right?
  
  "They can perform a massive amount of floating point calculations in parrallel, however they do not enjoy an inante ability to deal well with all sorts of code as a standard x86 cpu could"
  
  Oh please! There is no excuse for such a silly statement, all the info you need is out there.
  
  Read!
5. Re:You WANT A Cell System... by adisakp · 2005-11-29 11:37 · Score: 5, Informative
  
  Running Linux on one of these things is simply INSANE.
  
  I almost feel drunk from the power I have at my hands
  
  Here's some advice from someone who has access to a REAL CELL chip. I hate to disappoint you, but aside from custom libraries specifically optimized for CELL, Linux ain't going to run fast on this machine. All the generic open source code targeted at the general-purpose CPU is going to run faster on a dual-core Intel or a dual-processor/dual-core Mac. The actual CPUs in this machine are simple pipelined designs (think Pentium I level of optimizations) versus current-gen CPUs (P4 has out-of-order execution, speculative execution, register renaming, branch prediction, etc.). While simple C code runs at roughly the same speed, complicated C++ constructs run 2-10X slower on CELL's simplified PowerPC core than on the G5s you'll find in a Mac.
  
  Code needs to be rewritten specifically to take advantage of the actual SPEs/SPUs (Synergistic Processing Engines/Units - I prefer SPE since Sony calls their PS1/PS2 sound chip the SPE). Until those Linux libraries appear, CELL isn't going to run anything faster. Not to mention that these will have to be custom code libraries that DON'T run on the MAIN CPU, since the SPEs execute different machine code.
6. Re:You WANT A Cell System... by F_Scentura · 2005-11-29 11:38 · Score: 1
  
  "I can now see why the PS3 stuff we are seeing is so amazing..."
  
  Because it's prerendered on Cell processors, naturally ;)
7. Re:You WANT A Cell System... by Anonymous Coward · 2005-11-29 11:39 · Score: 0
  
  Running Linux on one of these things is simply INSANE. LOL - the port will be done before the hardware :)
8. Re:You WANT A Cell System... by Anonymous Coward · 2005-11-29 12:17 · Score: 0
  
  Calm down, calm down, don't buy into all the hype, especially if you're thinking about a general purpose OS like Linux.
  
  IMHO, Microsoft took a better approach with the Xbox 360: it uses three conventional (but fast) PPCs sharing memory (each of which has the AltiVec instructions). We all know how to program this; it's just like most dual-processor PC motherboards.
  
  With Cell, the one PPC gets just a corner of the chip, so how good is it going to be? You are going to be forced to use the SPEs: but wait, these are not GP CPUs and can only DMA to main memory so it's not going to be pleasant. The SPEs don't have any OS support: meaning no multi-tasking and no memory protection!
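  
  The "can only DMA to main memory" constraint is what usually pushes SPE code toward double buffering: while the core computes on one local buffer, the next chunk is being transferred into the other. The sketch below models only that structure in portable C. memcpy stands in for an asynchronous DMA transfer and a small static array stands in for the SPE local store; the names and chunk size are hypothetical, and this is not the real SPE DMA API. On real hardware the transfer would overlap with the computation, whereas memcpy here simply completes immediately.
  
  ```c
  #include <stddef.h>
  #include <stdio.h>
  #include <string.h>
  
  #define CHUNK 256  /* elements per transfer; arbitrary for this sketch */
  
  /* Stand-in for an asynchronous DMA "get" from main memory into local store.
     Real SPE code would start the transfer here and wait on it later. */
  static void fake_dma_get(float *local, const float *main_mem, size_t n)
  {
      memcpy(local, main_mem, n * sizeof *local);
  }
  
  /* Sum an array that lives in "main memory" using two local buffers. */
  static double sum_double_buffered(const float *main_mem, size_t total)
  {
      static float local[2][CHUNK];  /* models the small SPE local store */
      double acc = 0.0;
      int cur = 0;
  
      if (total == 0)
          return 0.0;
      fake_dma_get(local[cur], main_mem, total < CHUNK ? total : CHUNK);
  
      for (size_t off = 0; off < total; off += CHUNK) {
          size_t n = total - off < CHUNK ? total - off : CHUNK;
          size_t next = off + CHUNK;
  
          /* Start fetching the next chunk into the other buffer first... */
          if (next < total)
              fake_dma_get(local[cur ^ 1], main_mem + next,
                           total - next < CHUNK ? total - next : CHUNK);
  
          /* ...then compute on the chunk that has already arrived. */
          for (size_t i = 0; i < n; i++)
              acc += local[cur][i];
  
          cur ^= 1;  /* swap buffers */
      }
      return acc;
  }
  
  int main(void)
  {
      static float data[1000];
      for (size_t i = 0; i < 1000; i++)
          data[i] = 1.0f;
      printf("sum = %.1f\n", sum_double_buffered(data, 1000));
      return 0;
  }
  ```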
  
  Cell uses Rambus "XDR" memory with high-speed I/Os: this, IMHO, is the biggest mistake. The memory will cost twice as much, and according to Samsung's XDR website, they only support point-to-point signaling, meaning the chips can't be bussed, which makes me think that the maximum memory configuration is going to be pretty limited.
  
  Cell uses Rambus "Flex-I/O" for I/O as well, so what kind of chips can be hooked up to it? I just don't get why people fall for Rambus. It's interesting that the Flex-I/O bandwidth is so high (3x the XDR bandwidth): a hint that it's better to have the "SPEs" in the graphics chip.
  
  To get all the wonderful performance, you are forced to use the SPEs, which means a radically different software architecture. Thus, Xbox 360 has made the '05 Christmas season, but PS-3 has missed it: no software yet, I bet.
  
  I wonder how much power Cell uses? Where are the instruction timing specs for the chip? It would be interesting to try to extract these from the released simulator, assuming it's cycle-accurate.
  
  Note that IBM makes money whether PS-3 or Xbox-360 wins. I bet the politics between the different groups within IBM are fun :-)
9. Re:You WANT A Cell System... by Anonymous Coward · 2005-11-29 12:22 · Score: 0
  
  "Because it's prerendered on Cell processors, naturally ;)"
  
  God it must suck to be you.
10. Re:You WANT A Cell System... by FatherOfONe · 2005-11-29 13:07 · Score: 1
  
  Ok, you appear to be a developer who has experience with all the "major" chips. My question is this:
  For gaming, specifically games with a 3D engine, will the CELL be better than a top-of-the-line P4 or Athlon 64? Let us assume that the entire code has been engineered for every chip. I believe the question that a lot of people have is whether the XBOX chip is less powerful than the Cell chip in the PS3. Again, they want to know: if someone wrote World of Warcraft or EQII for both platforms, and optimized both as well as they could for the hardware, which would be better and by how much?
  
  I have seen the demos of MGS4 and GT5 for the PS3, and in my opinion they simply blow away anything I have seen or played on the new XBOX. But it makes me wonder how much of this is just pre-rendered stuff. If it isn't pre-rendered, then I can't wait for a PS3.
  
  Thanks.
  
  Now personally I just want a JVM for whatever chip is out there. :-) It makes comparing systems much easier for me :-)
  
  --
  The more I learn about science, the more my faith in God increases.
11. Re:You WANT A Cell System... by goMac2500 · 2005-11-29 13:10 · Score: 1
  
  Except Cell doesn't support out-of-order execution, making it unsuitable for any normal operating system... i.e. Linux. This is the same reason why Apple rejected the Cell.
12. Re:You WANT A Cell System... by Anonymous Coward · 2005-11-29 13:52 · Score: 0
  
  I think that, indeed, being a God is not an easy life. Decisions, decisions...
13. Re:You WANT A Cell System... by The+Warlock · 2005-11-29 14:02 · Score: 1
  
  Every last bit of every demo for the PS3 was pre-rendered, or at least, that's the safest assumption to make. Sony did this with their last two systems, too. To be honest, I don't think we'll know how the games really look until the machine is released. I would be very surprised if it was significantly better than the other next-gen consoles. It's going to be about the same, just with more hype, as usual.
  
  --
  I've upped my standards, so up yours.
14. Re:You WANT A Cell System... by epine · 2005-11-29 14:28 · Score: 4, Insightful
  
  The cell designers have completely sacrificed instruction level parallelism in exchange for thread level parallelism. It is certainly a valid and interesting way to achieve speed, but not for single threaded applications.
  
  This analysis is incorrect, because it fails to recognize the fixed point. By sacrificing the out-of-order (OOO) mechanisms (which are brutal for heat production), they gained enough thermal headroom to effectively double the clock rate. In the same thermal envelope, you either get an OOO processor running at 2GHz with three or four issue pathways (three has been the rule under x86) and a very deep pipeline, or you get a processor running at 4GHz with two issue pathways and a relatively short pipeline.
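  
  The thermal trade-off can be framed with the textbook first-order model of CMOS dynamic power, where alpha is the switching activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. It ignores leakage and is offered only as an illustration of the argument, not as a model of Cell's actual power budget.
  
  ```latex
  P_{\mathrm{dyn}} \approx \alpha\, C\, V^{2} f
  \qquad\Longrightarrow\qquad
  f_{\max} \approx \frac{P_{\mathrm{budget}}}{\alpha\, C\, V^{2}}
  ```
  
  Holding the power budget and V fixed, halving the switched alpha*C (for example, by removing OOO scheduling logic) roughly doubles f_max. In practice higher clocks usually also require a higher V, so this overstates the gain.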
  
  A deep pipeline grants (partial) immunity from stalls and bubbles. A short pipeline grants (partial) immunity from branch misprediction effects. To make the deep pipelines work well, huge investments are required in the branch-prediction unit, which is also infamous for throwing off a lot of heat.
  
  The main Power Processing Element is crippled at best for simple single threaded applications ...
  
  Fortunately for Cell, this is also the wrong denominator for use in this discussion. Applications might be single threaded, but systems are hardly ever single threaded. While the SPU processors handle audio, video, encryption, block I/O and other compute/bandwidth intensive primitives that most systems engage, they also off-load cache pollution from the main Cell processor threads, both in the data space and in the task scheduling space.
  
  Nothing will ever best the Pentium IV for single thread peak performance with no calorie spared. News flash: Intel has already given up on this flawed approach. The Pentium IV could easily beat the Opteron by cranking itself up to 6GHz if there was any practical way to extract 200W from a small core with no hot spots.
  
  OOO served its purpose in the era where cycle time was paramount and the processor to cache cycle time ratios were in closer balance. Now that heat has become the limiting factor, we'll be seeing a lot less of that from all parties.
  
  The reality in silicon is that we need to start rethinking those portions of the code base which only perform well under an OOO execution regime.
  
  This can be accomplished at so many different levels. The entire OpenSSL library can be recoded for SPU coprocessors with massive speed gains. Existing code can be recompiled with modern compilers which exploit large register sets to offset lack of hardware-level OOO. Key algorithms in system libraries can be recoded using better algorithms or memory access patterns.
  
  To those of you who insist on putting all your eggs into one 100W single-threaded basket: it's time to step off the Moore's law express train. Hope you enjoy the milk run.
15. Re:You WANT A Cell System... by epine · 2005-11-29 14:56 · Score: 1
  
  P4 has out-of-order execution, speculative execution, register renaming, branch prediction
  
  All of those features were introduced with the Pentium Pro, which was savaged at the time relative to the Pentium (which is far more like the Cell) because the pre-NT Windows codebase ran like crap in that regime (one factor was partial register stalls, but there were many issues). A decade later, the compilers and the general codebase have become extremely tweaked in the other direction.
  
  After the new code optimization framework in GCC 4.x has time to mature and fully target Cell, I'd be surprised if those loss factors of 2-10 don't settle down into the range of 1-3.
  
  To see loss in the 2-10 range suggests to me that the Cell is blocking on memory loads far more often than it should be, which could be a compiler fault.
  
  Here is a sequence that's hard to handle at the compiler level lacking OOO in hardware:
  
  a = **p0;
  b = **p1;
  c = **p2;
  
  If one of *p0, *p1, *p2 is an L1 cache miss, an OOO processor will still schedule two of **p0, **p1, **p2 while waiting for the cache miss to complete. This is impossible for a compiler to achieve on non-OOO hardware unless the compiler knows in advance which of those pointers will miss. I tend to refer to this class of optimizations as "stalling in parallel". I suspect C++ does a lot of double indirections to implement vtable mechanics. Maybe this is more of an issue than I thought.
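  
  The vtable suspicion is easy to picture in C. A C++ virtual call typically compiles to something like the hand-rolled dispatch below: load the object's vtable pointer, load the function pointer out of the table, then call through it, so every call is a chain of dependent loads. The struct and function names are hypothetical, and real C++ ABIs differ in the details; this only illustrates the load chain.
  
  ```c
  #include <stdio.h>
  
  /* Hypothetical hand-rolled "vtable", laid out roughly like C++ virtual dispatch. */
  struct Shape;
  typedef struct {
      double (*area)(const struct Shape *self);
  } ShapeVtbl;
  
  typedef struct Shape {
      const ShapeVtbl *vtbl;  /* hidden first field in common C++ ABIs */
      double w, h;
  } Shape;
  
  static double rect_area(const Shape *s) { return s->w * s->h; }
  static double tri_area(const Shape *s)  { return 0.5 * s->w * s->h; }
  
  static const ShapeVtbl rect_vtbl = { rect_area };
  static const ShapeVtbl tri_vtbl  = { tri_area };
  
  /* A "virtual call": load s->vtbl, then load vtbl->area, then call through it.
     The second load cannot start until the first one has completed. */
  static double shape_area(const Shape *s)
  {
      return s->vtbl->area(s);
  }
  
  int main(void)
  {
      Shape shapes[] = { { &rect_vtbl, 2.0, 3.0 }, { &tri_vtbl, 4.0, 5.0 } };
      double total = 0.0;
      for (int i = 0; i < 2; i++)
          total += shape_area(&shapes[i]);
      printf("total area = %.1f\n", total);  /* 6.0 + 10.0 = 16.0 */
      return 0;
  }
  ```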
16. Re:You WANT A Cell System... by sukotto · 2005-11-29 14:58 · Score: 1
  
  So... when can I buy a video card with one of these on it?
  
  --
  Come play free flash games on Kongregate!
17. Re:You WANT A Cell System... by F_Scentura · 2005-11-29 15:52 · Score: 1
  
  Oh, terribly.
18. Re:You WANT A Cell System... by RzUpAnmsCwrds · 2005-11-29 17:13 · Score: 2, Interesting
  
  "The Pentium IV could easily beat the Opteron by cranking itself up to 6GHz if there was any practical way to extract 200W from a small core with no hot spots."
  
  Not the case. Among other things, modern code is highly dependent on memory latency. P4 as of late hasn't even been getting 60% of clock; Opteron gets nearly 95%.
  
  Your whole argument is why Intel developed the Itanium. The idea of producing a simpler CPU that is thermally more efficient is a novel one, but time and again we find that you can't erase the last 15 years of CPU innovation. We're still driving gasoline cars, we're still using paper money, and the Opteron still remains highly competitive with the Itanium at a fraction of the transistor count.
19. Re:You WANT A Cell System... by sl3xd · 2005-11-29 18:39 · Score: 1
  
  Considering nVIDIA's engineers have publicly stated that they hadn't finished designing (let alone debugging/testing) the silicon when the PS3 videos were made public, I'd say it's a safe bet to say there's no way in hell the graphics weren't pre-rendered. If the graphics chip wasn't designed yet, then it's not possible to have one fabricated and rendering the movies for E3.
  
  Nintendo has already said that while the Revolution will definitely be an improvement over the GameCube, it won't have the kind of quality you see in the 360 or the PS3.
  
  I have no real love for any of the console makers, but I will say this: I've grown ever so tired of fanboys making up factoids to try to claim their favorite system is better than the rest (especially when the numbers aren't in their favor, and neither is a side-by-side comparison).
  
  I can honestly say I'm most sick of the Sony fanboys. But this is probably as much a function of the market share as it is of anything else. (And don't try for one second to convince me that the PS2 has better graphics than the GameCube or the X-Box; it's obvious the PS2 is lacking in that area, especially to someone who has all three systems. Can't the Sony fanboys figure out that a year's development goes a long way in the graphics department?)
  
  By this metric, I would say the PS3 will probably have better graphics than the 360 -- simply because the PS3 will have newer technology than the 360. The difference between a six-month-old graphics card and a brand-new one is significant enough to substantiate that belief.
  
  OTOH, the Revolution will probably come out last, and will have the worst graphics of the bunch -- but I'm also confident it will have the highest profit margins by far. There's something to be said about being 'good enough' and inexpensive.
  
  --
  -- Sometimes you have to turn the lights off in order to see.
20. Re:You WANT A Cell System... by faragon · 2005-11-29 20:21 · Score: 1
  
  I prefer SPE since Sony calls their PS1/PS2 sound chip the SPE
  
  As far as I know, Sony calls their PS1/PS2 sound chips the SPU and SPU2.
21. Re:You WANT A Cell System... by encia · 2005-11-29 22:07 · Score: 1
  
  This analysis is incorrect...
  
  Note that this dual-issue PPE core has a 21-stage pipeline (similar to the P4 Northwood), while AMD's K8 is a 12-stage integer and 17-stage FP combination. The PPE is neither a PPC 7447A nor a PPC 750FX.
  
  "you either get an OOO processor running at 2GHz with three or four issues pathways (three has been the rule under x86)"
  
  Not quite: a K8 macro-op (fixed length) fuses two instructions (one of which must be an address-type instruction), and the K8 issues three macro-ops. Also, dual-core K8 currently clocks at 2.6GHz with the Opteron SE, not 2GHz.
22. Re:You WANT A Cell System... by encia · 2005-11-29 22:20 · Score: 1
  
  > Your whole argument is why Intel developed the Itanium. The idea of producing a simpler CPU that is thermally more efficient is a novel one, but time and again we find that you can't erase the last 15 years of CPU innovation
  
  Itanium still has instruction fusing (i.e. three instructions fused into a single instruction issue) and extensive HW branch support.
23. Re:You WANT A Cell System... by TheRaven64 · 2005-11-30 00:14 · Score: 1
  
  The entire OpenSSL library can be recoded for SPU coprocessors with massive speed gains.
  A minor point, but this probably isn't a sensible thing to do. OpenSSL already supports crypto accelerators, so it would be better to write a kernel module that provides /dev/crypto using an SPU or two (or more, in very high-load situations, like an eCommerce server).
  
  --
  I am TheRaven on Soylent News
24. Re:You WANT A Cell System... by TheRaven64 · 2005-11-30 00:22 · Score: 2, Informative
  
  All of those features were introduced with the Pentium Pro, which was savaged at the time relative to the Pentium
  The Pentium Pro ran Windows NT much faster than an equivalent-speed Pentium. A lot of the old 16-bit instructions, however, were microcoded rather than natively executed, and took a few clocks longer. Since much legacy code at the time (games, anything with Win16 roots, including Windows 95) made use of 16-bit instructions, it ran slower. Comparing Windows NT 4 on a 200MHz Pentium Pro and a 200MHz Pentium (which wasn't available for a few years), the Pentium Pro won hands down. By the time the Pentium II (i.e. Pentium Pro MMX) was released, everyone was running 32-bit apps - the only 16-bit apps left were so old that people didn't mind that they were slower than native ones, since they were still much faster than they had been on any CPU designed to run them.
  The only differences between the Pentium Pro and the Pentium II were the addition of MMX and the move of the cache from a separate die in the same package to a separate package on the same board, which allowed the cache and CPU cores to be tested independently, improving yields.
  
  --
  I am TheRaven on Soylent News
25. Re:You WANT A Cell System... by Thing+1 · 2005-11-30 01:33 · Score: 1
  
  Existing code can be recompiled with modern compilers which exploit large register sets to offset lack of hardware-level OOO.
  
  I saw this quote, and wondered why CPU manufacturers don't create a chip that is flexible. So instead of 8 registers, or 32, or 64, it would allow the programmer to address L1 cache as "registers" and to set aside a variable portion of L1 cache for the program's needs.
  
  --
  I feel fantastic, and I'm still alive.
26. Re:You WANT A Cell System... by s0me1tm · 2005-11-30 02:12 · Score: 1
  
  *cough* 8051 *cough*
27. Re:You WANT A Cell System... by Ilgaz · 2005-11-30 02:25 · Score: 1
  
  It is not the only reason. If it were the only reason, there would be an AMD option, at least for workstations.
  
  Apple made an exclusive agreement with Intel.
  
  Half of the stuff Steve Jobs says is lies.
  
  If Apple hadn't become a white-box Intel builder, there would be a desktop variant of the Cell processor. Instead they chose to bitch about PowerPC and remove legit benchmark results that embarrassed Intel's CISC stuff from their site.
  
  They trusted the "cult like" zealotry behind them. They were proven right.
28. Re:You WANT A Cell System... by pkhuong · 2005-11-30 02:47 · Score: 1
  
  *cough* 6502 *cough* (the 6502's the one with page 0 and all)
  
  Interestingly enough, it might very well lower performance, rather than improve it, since that makes much, much more state to save during thread switches. It would also just about kill any chance of older programs (even for the same arch) running as well as new ones on each new iteration.
  
  --
  Try Corewar @ www.koth.org - rec.games.corewar
29. Re:You WANT A Cell System... by FatherOfONe · 2005-11-30 05:44 · Score: 1
  
  This is not true. Almost the exact opposite is true. Sony originally told everyone that there was no pre-rendered footage shown, but then found out that some of it was. They informed people, but they couldn't comment on what was and what wasn't. But understand that Sony went out of their way to state that the stuff shown was NOT pre-rendered. They did not do this with the PS2. Yes, this is Sony, but it would look very bad for them to lie about this. Time will tell.
  
  --
  The more I learn about science, the more my faith in God increases.
30. Re:You WANT A Cell System... by adisakp · 2005-11-30 05:55 · Score: 1
  
  Yes, you're correct... that was a typo on my part. I prefer SPE for the Cell Synergistic unit so it *DOESN'T* conflict with the current SPU term we use for the PS1/PS2 sound chips. And while the PS2's official name for it is SPU2, nearly all developers (i.e. at PS2 Devcon) simply refer to it as SPU when discussing PS2 (that extra "2" is annoying to say a hundred times in a speech).
31. Re:You WANT A Cell System... by adisakp · 2005-11-30 07:41 · Score: 1
  
  FWIW, another next-gen game system :) has very similar problems with a different compiler, since the simplified PPC cores are nearly identical. Complicated C++ code simply runs 2-10X slower on these simple pipelined chips. Straight "C" code runs at nearly the same speed.
  
  To see loss in the 2-10 range suggests to me that the Cell is blocking on memory loads far more often than it should be, which could be a compiler fault.
  
  Here is a sequence that's hard to handle at the compiler level lacking OOO in hardware:
  
  a = **p0;
  b = **p1;
  c = **p2;
  
  If one of *p0, *p1, *p2 is an L1 cache miss, an OOO processor will still schedule two of **p0, **p1, **p2 while waiting for the cache miss to complete. This is impossible for a compiler to achieve on non-OOO hardware unless the compiler knows in advance which of those pointers will miss.
  
  Actually, this isn't completely true. If a, b, and c are local variables that have no chance of aliasing (not references) each other or p0, p1, and p2, *AND* p0, p1, and p2 are not volatile, then the compiler can generate implicit intermediates ta=*p0, tb=*p1, tc=*p2. All three of these reads can proceed in interleaved fashion before the dependent reads a=*ta, b=*tb, c=*tc. If any of the first three results in an L1 cache miss, the processor should still be able to proceed with the other reads under a "hit under miss" cache load. This shaves a couple of clock cycles off the execution.
  
  Compilers which perform loads like this also take into advantage the fact that a register which has just been loaded is often not available on the next clock cycle for another dependant load and thus the interleaved reading is faster.
  
  In pseudo assembler...
  
  load t0,(p0)
  load t1,(p1)
  load t2,(p2)
  load a,(t0)
  load b,(t1)
  load c,(t2)
  
  If any of the first three loads stalls on a cache miss, the other loads should continue as "hit under miss" loads until a dependent load is reached somewhere in the second set of loads. At that point a CPU that can't do OOO (Out-Of-Order) execution will stall. Note that a CPU without OOO execution can still complete loads out of order in this case; to force in-order loads you usually need either uncached addresses or some sort of sync instruction.
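  A toy model makes the point concrete. The Python sketch below is an illustration, not a model of any real PPE: the latencies (3 cycles for a hit, 20 for a miss) are made up, it issues one load per cycle in program order, treats loads as non-blocking ("hit under miss"), and stalls only when a load's address register isn't ready yet.

```python
# Toy model of an in-order CPU with non-blocking ("hit under miss") loads.
# Assumptions (hypothetical, not measured on any real chip): one load issues
# per cycle, in program order; a load stalls only while its address register
# is not ready; each load's latency is looked up by its destination register.
from typing import Dict, List, Tuple

Load = Tuple[str, str]  # (destination register, address register)


def completion_cycle(program: List[Load], latency: Dict[str, int]) -> int:
    """Return the cycle at which the last load's result is available."""
    ready: Dict[str, int] = {}  # register -> cycle its value becomes available
    next_issue = 0
    for dest, addr in program:
        # p0/p1/p2 are never written here, so they count as ready at cycle 0.
        start = max(next_issue, ready.get(addr, 0))
        ready[dest] = start + latency[dest]
        next_issue = start + 1  # in-order: the next load cannot issue earlier
    return max(ready.values())


# a = **p0; b = **p1; c = **p2; written naively vs. with interleaved loads.
NAIVE = [("t0", "p0"), ("a", "t0"), ("t1", "p1"), ("b", "t1"), ("t2", "p2"), ("c", "t2")]
INTERLEAVED = [("t0", "p0"), ("t1", "p1"), ("t2", "p2"), ("a", "t0"), ("b", "t1"), ("c", "t2")]

ALL_HITS = {r: 3 for r in ("t0", "t1", "t2", "a", "b", "c")}
MISS_ON_P0 = dict(ALL_HITS, t0=20)  # the load of *p0 misses L1

for name, lat in (("all hits", ALL_HITS), ("*p0 misses", MISS_ON_P0)):
    print(name, completion_cycle(NAIVE, lat), completion_cycle(INTERLEAVED, lat))
```

  With the assumed numbers, the interleaved order finishes in 8 cycles instead of 14 when everything hits, and in 25 instead of 31 when *p0 misses. In the miss case the in-order core still stalls at a = *t0, which is exactly where an OOO core would carry on with the b and c loads.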
32. Re:You WANT A Cell System... by faragon · 2005-11-30 07:54 · Score: 1
  
  I'm sorry too; I noticed it was a typo after posting. Please accept my apologies, I do not enjoy pointing out the obvious.
mambo? by donour · 2005-11-29 11:24 · Score: 1

I thought Mambo was just a generic PowerPC machine emulator, not the Cell...
1. Re:mambo? by donour · 2005-11-29 11:27 · Score: 2, Informative
  
  I'm a moron. I should have read the link closer.
2. Re:mambo? by geekoid · 2005-11-29 11:51 · Score: 1
  
  Mod up as +1 informative. ;)
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
True, however by geekoid · 2005-11-29 11:33 · Score: 3, Insightful

When the speed is high enough that single-threaded applications run fast enough, even if technically crippled, will it matter?

If Cell is what it claims to be, developers will create new multi-threaded applications. Compared to 15 years ago, multi-threading is a snap.

1. Re:True, however by shut_up_man · 2005-11-29 12:19 · Score: 1
  
  What is this "fast enough" that you speak of?
2. Re:True, however by joto · 2005-11-29 12:42 · Score: 1
  
  When the speed is high enough that single-threaded applications run fast enough, even if technically crippled, will it matter?
  If performance doesn't matter, it doesn't matter. The discussion is moot. Go and buy a cheap 386.
  If Cell is what it claims to be, developers will create new multi-threaded applications. Compared to 15 years ago, multi-threading is a snap.
  There seems to be a difference between what the Cell claims to be and what you perceive it to be. The SPUs of the Cell will make some specialized things go fast. Gamers will love it. It could also be a musician's or video editor's dream machine. And quite likely we'll find it in a lot of specialized embedded hardware... But the SPUs of the Cell are not designed as, and cannot be used as, general-purpose CPUs. No matter how many SPUs are in the Cell processor, they won't make your builds go any faster or allow you to serve more web pages, database clients, or whatever... For that we need general-purpose CPUs.
  Even the supercomputing people won't be able to use the cell (yet), as it's only got 32-bit floating point.
3. Re:True, however by TinyManCan · 2005-11-29 15:01 · Score: 1
  
  Numerical computing can deal with the 32-bit floating point issue pretty easily. Do you think no one did high precision mathematics on 16-bit CPUs? The techniques are old and well understood. Sure it costs some extra cycles, but when you have 8x4ghz going for you, you can easily afford it.
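  As one example of the kind of technique meant here, compensated ("float-float") summation keeps the rounding error of each single-precision add in a second float, so the pair behaves roughly as if the sum were carried out in twice the working precision. The Python sketch below emulates IEEE single precision with round-to-nearest by rounding doubles through `struct`; that rounding assumption matters, because error-free transformations like TwoSum are only exact under round-to-nearest, and the Cell's 32-bit arithmetic is not fully IEEE compliant.

```python
# Sketch: compensated ("float-float") summation in emulated single precision.
# Assumption: f32() models IEEE single precision with round-to-nearest; real
# SPU single-precision behaviour would need to be checked separately.
import struct
from typing import Iterable, Tuple


def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's TwoSum in single precision: s + e == a + b exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def naive_sum(values: Iterable[float]) -> float:
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


def float_float_sum(values: Iterable[float]) -> Tuple[float, float]:
    """Return (hi, lo); the sum is represented by hi + lo."""
    hi, lo = 0.0, 0.0
    for v in values:
        hi, err = two_sum(hi, f32(v))
        lo = f32(lo + err)  # accumulate the rounding errors separately
    return hi, lo


# Each 1e-8 is below half an ulp of 1.0 in single precision.
data = [1.0] + [1e-8] * 10_000
hi, lo = float_float_sum(data)
print(naive_sum(data), hi + lo)
```

  The naive single-precision sum never moves off 1.0, while hi + lo lands close to 1.0001.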
  
  Personally I think that anyone who does similar operations on a large set of data will LOVE the cell. If you can get a pipeline going where each SPU does one step of a larger algorithm, you can stream the data right through.
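  The streaming pipeline idea can be sketched with threads and small bounded queues standing in for SPEs and their local buffers. This is an assumption-laden illustration in Python, not Cell code: Python threads don't run CPU-bound work in parallel, so it shows the dataflow and backpressure rather than the speedup, and the three stage functions are hypothetical.

```python
# Sketch of "each SPU does one step, stream the data through": threads play
# the SPEs, bounded queues play their small local buffers.
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel that marks the end of the stream


def _stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the end-of-stream marker downstream
            return
        outbox.put(fn(item))


def run_pipeline(stages: List[Callable[[Any], Any]], items: Iterable[Any]) -> List[Any]:
    # Small bounded queues: a fast stage blocks instead of buffering everything.
    queues = [queue.Queue(maxsize=4) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()

    def feed() -> None:
        for x in items:
            queues[0].put(x)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    feeder.join()
    for w in workers:
        w.join()
    return results


# Three hypothetical steps of a larger algorithm, one per "SPU".
print(run_pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], range(5)))
```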
  
  And I KNOW that security folks are going to love the Cell. Each of the SPUs can be made into encryption/decryption units, all working at the same time. I bet John the Ripper would really chew through some DES password hashes pretty damn quickly.
  
  So, I believe that the Cell will find wider and wider popularity as people get around to thinking about their problems a bit differently, and as the tools begin to automate that process.
4. Re:True, however by Savantissimo · 2005-11-29 18:49 · Score: 1
  
  Yes, it'll be great for many things, but the Cell is not IEEE compliant in its 32-bit arithmetic, so algorithms that depend on denormalized numbers or infinities matching the spec will break. This actually matters for a graphics approach that would otherwise be attractive for the Cell, conformal geometric algebra, where plane primitives are infinite spheres and line primitives are infinite circles.
  
  --
  "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
5. Re:True, however by mwvdlee · 2005-11-29 21:07 · Score: 1
  
  If you have a single CPU which can only run an ADD operation, it doesn't really matter how many thousand times faster it does it than any other processor; you still won't be able to outperform those other processors.
  
  The "problem" is that the Cell architecture is highly specialized; it may take much more code to do more generic stuff, perhaps enough to render it useless. Otherwise, why would they need a PowerPC core on the die as well?
  
  Cell is certainly interesting, and I expect a lot from its performance, but we still have to see how it performs in real life.
  
  --
  Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
6. Re:True, however by joto · 2005-11-30 11:49 · Score: 1
  
  Numerical computing can deal with the 32-bit floating point issue pretty easily.
  I am no numerical analyst. But it seems to me that when you really need 8x4GHz, you are doing some spiffy stuff. The more spiffy stuff you do, the higher precision you are going to need. While I agree that there is plenty of literature on "stable" numerical algorithms, after enough iterations anything will become less accurate. Add to that the extra cost of developing the program for the Cell (lots of workarounds for weak precision and non-IEEE floating-point semantics, as well as the (intended) complexity of developing for the Cell), and very few people are going to be interested.
  Do you think no one did high precision mathematics on 16-bit CPUs?
  Of course somebody did them. But most people doing numerical work back then would buy a better computer (or at least a floating point co-processor).
  The techniques are old and well understood. Sure it costs some extra cycles, but when you have 8x4ghz going for you, you can easily afford it.
  So what good is 8x4GHz going to do you if you have to use a software emulation of an FPU? Are you even sure that the SPU can be used efficiently for emulating FPU instructions? Add to that the extra cost of converting your algorithms from a normal computer, and any gains seem pretty small.
  So, I believe that the Cell will find wider and wider popularity as people get around to thinking about their problems a bit differently, and as the tools begin to automate that process.
  So do I. But markets are hard to predict. At the very least, someone must offer it to normal computer users in a more convenient form than a PlayStation console (such as a PCI Express card). And the architecture must evolve to give people what they want: a line of binary-compatible Cell processors with different price/performance ratios, giving at least some hope of further evolution so people can justify the initial costs of developing software for it.
7. Re:True, however by Gr8Apes · 2005-12-02 09:27 · Score: 1
  
  No matter how many SPUs are in the cell processor, it won't make your builds go any faster, allow you to serve more webpages, database-clients, or whatever... For that we need general purpose CPUs.
  Actually, the new Sun T1 processors (8 integer cores sharing 1 floating-point unit) are what you'd want to serve more web pages, for instance. Certainly not that 10GHz Intel processor coming out any day now [Tm]
  Supercomputing folks can use it, but only for 32-bit operations. It depends on the need, not solely on the bits. Not all SC users need 64 bits.
  
  --
  The cesspool just got a check and balance.
Where is my workstation! by Chayak · 2005-11-29 11:49 · Score: 1

It's great that we keep hearing about these things, and we know they're out there with some great PS3 demos, but it all comes down to this: I'm tired of hearing about them until I can turn on my own Cell workstation. The news I want is the workstation release!
1. Re:Where is my workstation! by garrett714 · 2005-11-29 11:58 · Score: 1
  
  As the other posters have made clear, the PS3 / Cell architecture is not designed for workstations or PCs. It is designed specifically for gaming and for graphics. Yes, you could probably make a good graphics workstation out of the cell architecture, but will it compete with an SGI box? Not likely.
2. Re:Where is my workstation! by captain+igor · 2005-11-29 12:13 · Score: 1
  
  Then why are they releasing a cell based blade server next spring?
3. Re:Where is my workstation! by Wesley+Felter · 2005-11-29 12:16 · Score: 1
  
  That will appeal to the same people who have been buying VME blades full of DSPs or G4s to run their custom signal-processing code.
4. Re:Where is my workstation! by garrett714 · 2005-11-29 12:24 · Score: 2, Informative
  
  I stand corrected. Here is a link to info about the cell based blade servers. One interesting thing to note is at the bottom of the page: "The OS used was Linux 2.6.11" So I guess that kinda disproves all the people saying Linux won't run well on the Cell.
5. Re:Where is my workstation! by the.Ceph · 2005-11-29 15:31 · Score: 1
  
  Not to flame but no one is really saying that. The PlayStation 3 either comes with Linux on it or will have it on the SDK available for it.
Mambo - LOL by ECXStar · 2005-11-29 12:00 · Score: 1

Mambo is the name of an opensource CMS http://www.mamboserver.com./ You would think these guys would get out on the net and do a little research before naming a product.
1. Re:Mambo - LOL by Bobke · 2005-11-29 12:15 · Score: 1
  
  From TFA:
  It had to be called something. Before, it was based on a previous product called SIM OS for PowerPC®, and we had to have a new name for it when we made it an IBM-only, proprietary tool. So, it was just a name that didn't have the word SIM in it, since there are so many simulators that have 'SIM' in their name. Then, for alphaWorks, we were forced to give it a more docile name. So, on alphaWorks I guess there is a reference that internally we call it Mambo, but it's called the IBM Full-System Simulator for Cell, or systemsim.
2. Re:Mambo - LOL by FooAtWFU · 2005-11-29 12:18 · Score: 2, Funny
  
  It's called a 'codename'. The real name is apparently 'IBM Full-System Simulator for the Cell Broadband Engine processor'.
  Yes, all of IBM's products are named like that. I mean, every now and again they try to go for something neat and spiffy sounding like "WebSphere", but then they have to munge it all up with "Websphere Application Server" (WAS) and "Websphere Client Technologies Mobile Edition" (WCTME) and so on and so forth. This is normal for IBM, and this is why they really need code-names.
  A related story out of IBM from a distinguished engineer I once made the acquaintance of... He's walking along one day and runs into one of his boss's boss's bosses or something like that. So he says, "I know how we can win the war on drugs." He explains: "We make all drugs legal... and assign exclusive marketing rights to the OS/2 marketing team." Boss-dude tells him he's an asshole; he shrugs: "But you got my point."
  
  --
  The World Wide Web is dying. Soon, we shall have only the Internet.
3. Re:Mambo - LOL by Wesley+Felter · 2005-11-29 12:20 · Score: 1
  
  Every word is already used as the name of a product. And the Mambo simulator was named around 2000 or 2001; when did the Mambo CMS start?
4. Re:Mambo - LOL by garrett714 · 2005-11-29 12:34 · Score: 1
  
  I know this is offtopic, but this whole "Mambo" name reminded me of a funny website my friend showed me years and years ago.
  
  Has anyone else been to www.zombo.com? The infinite is possible at zombocom! The unattainable is unknown at zombocom! Welcome to ZOMBOCOM!!
  
  LOL it's the most pointless site on the web outside of a good laugh, but the funniest thing is that it's been up for years, I wonder who pays for the hosting?
5. Re:Mambo - LOL by Anonymous Coward · 2005-11-29 12:38 · Score: 0
  
  Mambo is the name of an opensource CMS http://www.mamboserver.com./ You would think these guys get out on the net and do a little research before naming a product.
  
  Quick, call the lawyers! Call CNN and Connie Chung! This is an OOOOUTRAAAGGGEEE!
  
  Intellectual Property is being violated and repeatedly sodomized!! Where is the open source community in all of this??!?!?!
  
  Mod Parent Down! Parent is a KNOWN TROLL and an Enemy of the Open Source R-E-V-O-L-U-T-I-O-N-!
6. Re:Mambo - LOL by sukotto · 2005-11-29 15:07 · Score: 1
  
  Distinguished Engineer. IBM-speak for "This guy is so valuable that he can do anything he wants, go anywhere he wants, study anything he wants... and write his own paycheque"
  
  Only the very best get that designation.
  
  --
  Come play free flash games on Kongregate!
7. Re:Mambo - LOL by ECXStar · 2005-11-30 02:21 · Score: 1
  
  Yeah. I'm an enemy of OS and a troll, troll! You're funny.. I've been developing OS software for years.. Get a life.
I dunno... by maynard · 2005-11-29 12:13 · Score: 1

...that 256KB local store for each SPU looks like a pretty severe bottleneck. You'll have to limit your execution code and data to this window, otherwise you'll take a severe penalty on fetches to main memory. The PPU isn't much to brag about in comparison to a modern G4 or G5, so your task damn well better make use of those SPUs or performance will seriously suck in comparison to a modern CPU. So it looks to me like this thing will be amazing for lots of small jobs, like several tiny Monte Carlo sims each running in an SPU. But for real data analysis, it's going to depend on the project requirements, which could easily demand more than 256KB of local store. Then you're SOL....

Would love to read some folks' posts on how they plan to use the broadband interconnect to chain code and data for solving larger problems, and what limitations they see in this arch. --M
1. Re:I dunno... by Anonymous Coward · 2005-11-29 16:30 · Score: 0
  
  You are assuming a static program and data in the local store. But Cell was designed for streaming applications. Specifically, if the program fits in the local store and you use the DMA properly to stream in/out the data, you do not have an LS bottleneck. The DMA is, in a sense, async load/store and therefore not only relieves the LS bottleneck, but also can relieve the SPU of a lot of code to move the data around. There's already some discussion of this in the IBM discussion forum for Cell: http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?forum=739&thread=97698&cat=46
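  A minimal sketch of the double-buffering pattern described here, in Python: a one-worker thread pool plays the DMA engine, and `dma_get` is a hypothetical stand-in for an asynchronous transfer into local store, not a real Cell API. While the kernel runs on one chunk, the next chunk is already in flight.

```python
# Double-buffering sketch: compute on chunk i while chunk i+1 is being fetched.
# dma_get() is a hypothetical stand-in for an asynchronous DMA into local
# store; a one-worker thread pool plays the role of the DMA engine.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def process_stream(main_memory: Sequence[int], chunk: int,
                   kernel: Callable[[int], int]) -> List[int]:
    def dma_get(start: int) -> List[int]:
        return list(main_memory[start:start + chunk])

    out: List[int] = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(dma_get, 0)  # prime the first buffer
        for start in range(0, len(main_memory), chunk):
            buf = pending.result()  # wait until the current buffer has arrived
            if start + chunk < len(main_memory):
                pending = dma.submit(dma_get, start + chunk)  # overlap next fetch
            out.extend(kernel(x) for x in buf)  # compute on the current buffer
    return out


print(process_stream(list(range(10)), 3, lambda x: x * x))
```

  On the real chip you would size `chunk` so that two buffers plus the code fit in the 256 KB local store.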
Praise for Cell by acidblood · 2005-11-29 12:34 · Score: 5, Informative

I've been running the simulator here, and managed to port the distributed.net client to it. The performance of the existing cores on the PPE is so-so (worse than the G4 in my Mac Mini), although I'm sure it would improve with proper optimization. The SPE is a completely different matter though. I wrote an RC5-72 core for it that should achieve ~190 Mkeys/s on 8 SPEs at 3.2 GHz, which by itself is almost ten times faster than the current fastest processor (a G5 at 2.7 GHz, which clocks at 20 Mkeys/s, IIRC). For embarrassingly parallel applications like key cracking, this thing is a dream.
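The numbers in this post are easy to sanity-check, and the reason key cracking fits so well is that the keyspace splits into independent ranges. The Python sketch below uses only the figures quoted above; the even split is a hypothetical illustration, not how the distributed.net client actually hands out work.

```python
# Back-of-the-envelope check of the quoted figures, plus a sketch of splitting
# a keyspace into independent contiguous ranges, one per SPE (hypothetical).
from typing import List, Tuple

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def split_keyspace(bits: int, workers: int) -> List[Tuple[int, int]]:
    """Divide [0, 2**bits) into `workers` contiguous half-open ranges."""
    total = 1 << bits
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def years_to_exhaust(bits: int, keys_per_second: float) -> float:
    return (1 << bits) / keys_per_second / SECONDS_PER_YEAR


cell_rate = 190e6  # ~190 Mkeys/s quoted for 8 SPEs at 3.2 GHz
g5_rate = 20e6     # ~20 Mkeys/s quoted for a 2.7 GHz G5
print(cell_rate / 8 / 1e6, cell_rate / g5_rate)  # per-SPE Mkeys/s, speedup
print(split_keyspace(72, 8)[:2])
print(round(years_to_exhaust(72, cell_rate)))
```

That works out to 23.75 Mkeys/s per SPE and a 9.5x speedup over the quoted G5 figure. It also shows why RC5-72 is a distributed effort: at 190 Mkeys/s a single chip would need roughly 790,000 years to cover all 2^72 keys.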

Some technical details: the SPE's instruction set could be thought of as `Altivec plus'. It has most of the functionality of Altivec (so far I've only missed a byte addition instruction), but quite a few improvements, like immediate operands for many instructions, immediate loads with much better range than Altivec's splat instruction, the addition of double precision floating point operations, etc. I'm sure there are more improvements, but these are the ones I noticed from my limited experience with Altivec.

Instruction scheduling for this processor is remarkably similar to that of the first Pentium: it's dual issue with static scheduling, there are some conditions on pairable instructions and their ordering to ensure dual issue, and so on. The high latencies for instructions (2 cycles for most integer arithmetic, 4 for shifts and rotates) are problematic, but the huge register file of 128 entries is very helpful for implementing techniques like software pipelining, which help mask these latencies.

The local store is a mixed bag. Dealing with arrays larger than the local store should be challenging, but if you don't have to worry about that, it's great to have a fixed latency of 6 cycles for loads and stores, with no need to worry about cache effects and so on. Actually, the local store behaves a lot like a programmer-addressable cache, which has some benefits compared to a traditional cache: specifically, less control overhead per memory cell (so more logic can be packed in the same space) and, as a consequence, the potential for higher speeds and/or lower latencies.
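The latency point can be illustrated with a toy model of a reduction. The Python sketch below assumes one instruction issued per cycle, in order, and ignores dual issue and the final step of combining the partial results; it uses the 2- and 4-cycle latencies mentioned above. Splitting the work over several independent accumulator chains is a simpler relative of software pipelining, but it relies on the same thing: enough registers to keep several operations in flight.

```python
# Toy model: a long latency hurts a single dependent chain; spreading the work
# over independent chains (cheap with a 128-entry register file) hides it.
# Assumptions: one instruction issues per cycle, in order; dual issue and the
# final combine of the partial results are ignored.
from typing import List


def reduction_cycles(n: int, latency: int, chains: int) -> int:
    """Cycles until all n operations, split round-robin over chains, complete."""
    ready: List[int] = [0] * chains  # cycle when each chain's value is ready
    next_issue = 0
    for i in range(n):
        c = i % chains                     # round-robin over independent chains
        start = max(next_issue, ready[c])  # wait for this chain's previous result
        ready[c] = start + latency
        next_issue = start + 1
    return max(ready)


for lat in (2, 4):  # latencies quoted in the post: integer ops vs. shifts/rotates
    print(lat, [reduction_cycles(64, lat, k) for k in (1, 2, 4, 8)])
```

Once the number of chains reaches the latency, the loop sustains one operation per cycle: 64 two-cycle operations drop from 128 cycles on a single chain to 65 with two chains.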

Overall, I'm very impressed with Cell, but for now I've only programmed toy examples and I'm sure to hit some limits of the architecture once I start looking at real-world code.

--
Join the NFSNET. Our prime goal is making little numbers out of big ones. http://www.nfsnet.org/
1. Re:Praise for Cell by maynard · 2005-11-29 13:11 · Score: 1
  
  Hi.
  
  Could you speak more to performance issues when dealing with code/data that exceeds the 256K SPU local store? It looks to me like fetches from RAM are a real bottleneck, so if you want performance you need to keep code/data within each SPU. If you can chain a series of algorithms and move data down the chain this is a win. But if you need to manipulate a huge data block you're SOL. I can see the Cell being a huge win for, say, a series of Monte Carlo sims running in each SPU, but it looks like a loss once you exceed local store. But you seem to be saying that idle fetch cycles aren't so bad. Would love to get some input on this issue from someone actually coding the chip. (I already posted a comment to this effect elsewhere, but you seem to be one of the few with real world experience posting here.) Thanks a bunch! --M
2. Re:Praise for Cell by imsabbel · 2005-11-29 13:17 · Score: 1
  
  Well, RC5 IS pointless and embarrassingly parallel...
  (I still remember when distributed.net was running RC-56 or something for 8 months on 100k machines, and some people just made some ASICs that had 100+ parallel key pipelines and built a machine that could exhaust the keyspace in 3 days or so...)
  
  So I wouldn't be too optimistic because of that little performance point....
  
  --
  HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
3. Re:Praise for Cell by Anonymous Coward · 2005-11-29 13:40 · Score: 0
  
  Dude, I have no idea what the fuck you just wrote, but, damn, it made you sound smart.
4. Re:Praise for Cell by acidblood · 2005-11-29 13:58 · Score: 1
  
  May I suggest that you attend a computer architecture class? Most of these things should be discussed even in undergraduate-level computer architecture classes.
  
5. Re:Praise for Cell by acidblood · 2005-11-29 14:06 · Score: 1
  
  Well, RC5 IS pointless
  
  Not as much as it seems at first glance. But that's a discussion for another day.
  
  (I still remember when distributed.net was running RC-56 or something for 8 months on 100k machines, and some people just made some ASICs that had 100+ parallel key pipelines and built a machine that could exhaust the keyspace in 3 days or so...)
  
  The chips were built by the EFF, and actually they cracked not RC5-56 but DES, which is also a 56-bit-key cipher but far more widespread. Also, by the time of the last DES contest, the processing power of DES Cracker and the distributed.net network was in the same ballpark, so that actually they teamed up and ended up breaking DES in less than 24 hours.
  
  So I wouldn't be too optimistic because of that little performance point....
  
  It is a pretty important point in my neck of the woods. I wouldn't be surprised if an AES implementation (running in counter mode, of course, it'd be pointless otherwise) clocked at terabits/s with this thing. And considering it has an interconnect to match (100 GB/s, which is 0.8 Tb/s), this is certainly something to be impressed about.
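  The reason counter mode matters here is that each block's keystream depends only on the key, the nonce and the block index, so blocks can be spread across SPEs with no dependency between them. The Python sketch below shows that structure; `keystream_block` uses truncated SHA-256 as a stand-in PRF purely for illustration. It is not AES and should not be used to protect real data.

```python
# Why counter mode parallelizes: the keystream for block i depends only on
# (key, nonce, i), so any worker can process any range of blocks on its own.
# keystream_block() is a hypothetical SHA-256 stand-in, NOT AES.
import hashlib

BLOCK = 16


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()[:BLOCK]


def ctr_xor(key: bytes, nonce: bytes, data: bytes, first_block: int = 0) -> bytes:
    """Encrypt or decrypt (the same operation in CTR mode)."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, first_block + i // BLOCK)
        out += bytes(x ^ k for x, k in zip(data[i:i + BLOCK], ks))
    return bytes(out)


def ctr_xor_split(key: bytes, nonce: bytes, data: bytes, workers: int) -> bytes:
    """Same result, but each worker handles its own contiguous run of blocks."""
    nblocks = -(-len(data) // BLOCK)  # ceiling division
    per_worker = -(-nblocks // workers)
    parts = []
    for w in range(workers):
        b0 = w * per_worker
        piece = data[b0 * BLOCK:(b0 + per_worker) * BLOCK]
        parts.append(ctr_xor(key, nonce, piece, first_block=b0))
    return b"".join(parts)


key, nonce = b"k" * 16, b"n" * 8
msg = bytes(range(256)) * 3 + b"tail"
assert ctr_xor_split(key, nonce, msg, 8) == ctr_xor(key, nonce, msg)
assert ctr_xor(key, nonce, ctr_xor(key, nonce, msg)) == msg
```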
  
  Moreover, if you want more `useful' results, go have a look at the figures for matrix multiplication and FFT code written by IBM. They're as impressive within their domain as this RC5 figure is.
  
6. Re:Praise for Cell by acidblood · 2005-11-29 14:28 · Score: 2, Informative
  
  Could you speak more to performance issues when dealing with code/data that exceeds the 256K SPU local store?
  
  I'll try, but take my opinion with a grain of salt as I didn't do anything beyond coding an RC5-72 core, which doesn't involve external memory accesses.
  
  It looks to me like fetches from RAM are a real bottleneck, so if you want performance you need to keep code/data within each SPU. If you can chain a series of algorithms and move data down the chain this is a win. But if you need to manipulate a huge data block you're SOL.
  
  Sure it'd be impossible to keep this thing completely fed, but I hear the RAM specs are pretty impressive, using some new-fangled XD-RAM technology from Rambus. Still, the computational power of the SPEs is huge and it's sure to be RAM-starved unless the programmers take a lot of care.
  
  Do realize though that this thing has a monster 100 GB/s interconnect. I would gather sending reasonable amounts of data back and forth between the SPUs is feasible, so perhaps operating on 8*256 KB = 2 MB datasets might be possible.
  
  Beyond this, I think programmers would look at the Cell like they do at a NUMA box or clusters -- assume fetching remote data is costly and program to that paradigm. Not as costly as it is for clusters, even those with fancy interconnects; more like NUMA boxes. Hence, lots of blocking algorithms and stuff like 4-step FFTs. IBM is suggesting techniques using double-buffering which seem to be working well.
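  A minimal sketch of the blocking idea, using matrix multiplication in plain Python. It is illustrative only: on Cell, each tile of A, B and C would be brought into local store by DMA (ideally double-buffered) before the inner loops run, and the 64x64 single-precision tile size in the last lines is just an assumed example of sizing tiles against the 256 KB local store.

```python
# Blocked ("tiled") matrix multiply: the inner loops only touch one tile of
# A, B and C at a time, which is what keeps the working set small.
from typing import List

Matrix = List[List[float]]


def matmul_blocked(A: Matrix, B: Matrix, tile: int) -> Matrix:
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Only these three tile x tile blocks are touched here.
                for i in range(i0, min(i0 + tile, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(k0, min(k0 + tile, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(j0, min(j0 + tile, p)):
                            Ci[j] += a * Bk[j]
    return C


def matmul_naive(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[i + 2 * j for j in range(5)] for i in range(7)]
B = [[i * j - 1 for j in range(3)] for i in range(5)]
assert matmul_blocked(A, B, tile=2) == matmul_naive(A, B)

# Rough local-store budget, assuming 4-byte single-precision floats:
tile = 64
print(3 * tile * tile * 4 // 1024, "KB for one A, B and C tile")
```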
  
  I can see the Cell being a huge win for, say, a series of Monte Carlo sims running in each SPU, but it looks like a loss once you exceed local store.
  
  That depends on your workloads, in particular your access patterns. Sequential and blocking access patterns should do just fine.
  
  What makes me pretty hopeful about the potential performance of Cell is that we're currently getting by pretty well with our CPUs with fast L2 cache of similar size (256 KB was pretty common 3 or 4 years ago) and slow memory accesses. The situation is pretty similar with Cell, save that the local store is directly addressable as opposed to transparent like caches are, and I see that as a big win actually -- being able to manage the local store and only make explicit memory accesses should help spot and fix bottlenecks, without the need to worry whether the target CPU will have 512 KB or 1 MB or 2 MB of cache. Of course, having 8 high-clocked SPEs processing 128-bit vectors will impose a much higher burden on memory than your run-of-the-mill Pentium 4 currently does, but I'm hoping that XD-RAM will be up to the challenge.
  
  But you seem to be saying that idle fetch cycles aren't so bad.
  
  You may be mixing things up. What I said was that local store accesses had a fixed latency of 6 cycles.
  
  you seem to be one of the few with real world experience posting here
  
  I don't think a couple of afternoons writing code qualifies as real-world experience, but there you go.
  
7. Re:Praise for Cell by TheRaven64 · 2005-11-30 00:38 · Score: 1
  
  Just for anyone wondering what the real-world point of such crypto power is outside of specialised circles:
  Using an SPE initialised with an AES decoder and encoder would mean that every single block loaded from or stored to the disk (including the swap file) could be AES encrypted with very little performance penalty. This would be a very nice feature in a laptop, since anyone who stole it would have no way of accessing the original user's files.
  
  --
  I am TheRaven on Soylent News
Mambo? by highwaytohell · 2005-11-29 12:40 · Score: 1

Mambo is the name of a clothing brand as well. The company is run by Reg Mombassa who was in an Australian band called Mental As Anything. In retrospect it sounds apt. The whole six degrees of separation thing works! You'd have to be "mental as anything" to think that cell is going to make x86 obsolete so easily.
Amazing Cell Demo by doctor_no · 2005-11-29 12:53 · Score: 5, Interesting

Here is an impressive "virtual mirror" demo using the Cell processor, put on by Toshiba. Using a video camera, it builds a 3D model of the person in front of the camera on the fly. Then it can manipulate the 3D model to change make-up, hair styles, etc.: basically a virtual magic mirror. It really demonstrates the unique features these more powerful processors will offer.

http://techon.nikkeibp.co.jp/lsi/images/toshiba_cell.mpg

http://techon.nikkeibp.co.jp/english/NEWS_EN/20051013/109623/
1. Re:Amazing Cell Demo by Anonymous Coward · 2005-11-29 16:52 · Score: 0
  
  That just broke my brain.
2. Re:Amazing Cell Demo by DigiShaman · 2005-11-29 17:06 · Score: 2, Funny
  
  Damn!! Was that really real-time? I almost want to call its bluff and say it was all choreographed. With the right AI program optimized for multi-threading, we could have HAL if enough Cell chips were thrown at it. It may be crude, but it's worth a shot! Imagine the real-world applications.
  
  "HAL: how much unread e-mail do I have?"
  
  "HAL: please set my alarm for 7:30am"
  
  "HAL: using google maps, please tell me how many miles and ETA it will be going from X to Y"
  
  and my favorite...
  
  "HAL: based on historical trends in the stock market, what do you calculate as being the best investment for quick returns"
  
  --
  Life is not for the lazy.
3. Re:Amazing Cell Demo by Anonymous Coward · 2005-11-29 19:32 · Score: 0
  
  Imagine the sex games this will make possible. Those SPEs are very good at compression and will make responsive netplay with dildonics a reality. Dildonics systems are still fairly primitive for both males and females at this point, but there is plenty of money to be made from portable equipment that can emulate penetration from both perspectives. It's just a matter of when.
  
  Holy crap, I want one NOW!
4. Re:Amazing Cell Demo by Anonymous Coward · 2005-11-29 21:01 · Score: 1, Informative
  
  Apologies for A/C. This is probably a little less than a full 3D model construction. Having seen a real-time demo of a "morphable model", I'd say they almost certainly use priors on face shape.
  
  "First, the applications capture a user's face with a camera and detect the position of key features of the face, including the eyes, nose and mouth, using image recognition technology."
  
  this can be done real time quite effectively right now:
  
  http://citeseer.ist.psu.edu/rd/95418640%2C476373%2C1%2C0.25%2CDownload/http%3AqSqqSqwww.merl.comqSqpeopleqSqviolaqSqresearchqSqpublicationsqSqICCV01-Viola-Jones.pdf
  
  "By matching the 2D positions of these key features to a computer graphic image using a 3D face model, the applications estimate what direction the user is facing and the 3D positions of the face's 500 features."
  
  Having seen a real-time morphable model demo from Toshiba at ICCV2003, I'd guess this is a similar approach to this one:
  
  http://gravis.cs.unibas.ch/Sigg99.html
  
  (my PhD thesis includes this area - not on my site yet, but I have a paper on MM fitting at )
  http://www.robots.ox.ac.uk/~jamie/paterson03.html
  
  Cheers.
x86, x86_64, or PPC best for mambo simulator? by Legendre · 2005-11-29 12:58 · Score: 1

I don't own a machine that meets the simulator's minimum system requirement (namely, 2.0GHz or higher), but I'm so curious about it, that I'm willing to buy a new box just to try Mambo with CBE sim. So, what hardware platform is best for the simulator software?
1. Re:x86, x86_64, or PPC best for mambo simulator? by garrett714 · 2005-11-29 13:32 · Score: 1
  
  PPC would probably be the best as it's the closest relative, however I think you are missing the point. Unless you are a developer with lots and LOTS of experience coding, this simulator would most likely be worthless to you. It's not going to show you anything meaningful about the PS3 (once again unless you are a developer) and you are most likely to become confused by even trying to run it. I'm not trying to say that I personally would have any better luck running it. It's just this really isn't a "toy" to play with unless you know what you are doing.
2. Re:x86, x86_64, or PPC best for mambo simulator? by Legendre · 2005-11-29 15:09 · Score: 1
  Thanks for the suggestion, Mr. Garrett. I'll go pick up a PPC. That's also the processor used inside BlueGene, no?
  As for my plans for the Cell, I was thinking of writing either:
  
  A toy hard real-time OS or
  
  A toy Fortran compiler for it
  
  I don't care for the PS3/games, although I'll probably pick one up as well, just to get inside the Cell. There should be a stand-alone Cell workstation, really.
  (Ex-IBMer here! Good job guys!).
3. Re:x86, x86_64, or PPC best for mambo simulator? by jjd1_dement · 2005-11-29 16:44 · Score: 2, Informative
  
  Actually the 2GHz requirement is overstated. We (ich bin ein IBMer) have run the simulator on laptops in the 1GHz range without any problems. But don't let me ruin your excuse to get a nice new computer!
4. Re:x86, x86_64, or PPC best for mambo simulator? by Ilgaz · 2005-11-30 02:17 · Score: 1
  
  I wonder about that "GHz" too. Is it Intel GHz or PowerPC GHz?
  
  A 1600 MHz G5 can easily count as a 2 GHz P4, for example.
  
  (Don't tell Mr. Jobs about it)
Re:Attn Slashdot readers by Anonymous Coward · 2005-11-29 13:06 · Score: 0

I put my balls on your face. WHILE YOU SLEEP!
2.8GHz Athlon 64 by Wesley+Felter · 2005-11-29 13:33 · Score: 1

The highest-clocked K8 is probably your best bet; a 3.8GHz Pentium 4 probably wouldn't be bad either.
Summary of Article by Anonymous Coward · 2005-11-29 15:12 · Score: 0

The CBE from IBM is based on SIM.
It has SLBs and TLBs for the PPEs, and SPE for modelling on the EIB.
STI uses an API and TCL for creating SPE or SPU RTEs on AIX and PS3.
thanks a bunch! /nt by Anonymous Coward · 2005-11-29 15:24 · Score: 0

. ..
Freeze! by Anonymous Coward · 2005-11-29 16:29 · Score: 0

Take the reefer out of your mouth and put it down, turn off the Footloose DVD, step away from the google, and put both of your hands where the men in little white coats can see them.
connotations by penguin-collective · 2005-11-29 18:20 · Score: 1

The term "speaks out" has connotations, like revealing a dirty secret, which doesn't seem to be the case here. I think it would be prudent to choose one's headlines a little more carefully.
Obligatory by Bombula · 2005-11-29 18:34 · Score: 1

Daddy loves mambo...
/sorry

--
A-Bomb
OOO isn't going away... by YesIAmAScript · 2005-11-29 19:00 · Score: 2, Insightful

I do agree with your assessments of the value of non-OOO (in-order) processors.

But there's one thing OOO processors do that these processors never will: efficiently run code that was not properly scheduled.

Now, why would you generate code with the wrong scheduling? Well, you wouldn't do so on purpose. But in the field, PCs encounter it frequently: code that was scheduled for a different processor. As instruction latencies, CPU clocks, and memory latencies change, the optimal instruction order changes too.

So on any system which has to run legacy code, OOO is necessary to have good performance.

And that means PCs are unlikely to move to non-OOO processors soon. No company wants to be afraid to release a new processor because existing versions of Windows (or Mac OS X), not having been recompiled with new scheduling, would run worse on it than on older machines. Remember what happened to the Pentium Pro? It didn't run legacy (16-bit) code well, and unfortunately the popular OS at the time (Windows 95) was all legacy code.

On the other hand, it makes total sense for a system like the PS3 or Xbox 360, where there are a large number of identical units, down to the RAM timings, and the code run on them was compiled specifically for that hardware.
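The scheduling argument can be made concrete with a toy model. This is a minimal Python sketch under invented assumptions: a hypothetical single-issue core where only the load latency changes between chip generations. The latencies, register names, and both schedules are made up for illustration and were not measured on any real CPU. The "out-of-order" model is idealised, with an unlimited window, no false dependencies, and oldest-ready-first issue.

```python
from typing import Dict, List, Tuple

# One toy instruction: (destination register, source registers, operation kind).
# Every destination is written once, so there are no false dependencies to rename.
Instr = Tuple[str, Tuple[str, ...], str]


def in_order_cycles(program: List[Instr], latency: Dict[str, int]) -> int:
    """Single-issue in-order core: an instruction issues only after the one
    before it has issued AND its source operands are ready."""
    ready: Dict[str, int] = {}  # register -> cycle its value becomes available
    next_slot = 0               # earliest cycle the next instruction may issue
    finish = 0
    for dest, srcs, kind in program:
        # Live-in registers (never written by the program) are ready at cycle 0.
        issue = max([next_slot] + [ready.get(s, 0) for s in srcs])
        ready[dest] = issue + latency[kind]
        finish = max(finish, ready[dest])
        next_slot = issue + 1
    return finish


def out_of_order_cycles(program: List[Instr], latency: Dict[str, int]) -> int:
    """Idealised single-issue OOO core: each cycle, issue the oldest pending
    instruction whose operands are ready (unlimited window)."""
    produced = {dest for dest, _, _ in program}
    ready: Dict[str, int] = {}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        for i, (dest, srcs, kind) in enumerate(pending):
            # A source is usable if it is a live-in or its producer has completed.
            if all(s not in produced or ready.get(s, float("inf")) <= cycle for s in srcs):
                ready[dest] = cycle + latency[kind]
                finish = max(finish, ready[dest])
                del pending[i]
                break
        cycle += 1
    return finish


OLD_CHIP = {"load": 2, "add": 1}  # hypothetical older chip: short load latency
NEW_CHIP = {"load": 4, "add": 1}  # hypothetical newer chip: longer load latency

# Compiled for OLD_CHIP: each load is consumed two slots after it issues.
old_schedule: List[Instr] = [
    ("a", ("p",), "load"),
    ("b", ("k",), "add"),
    ("c", ("a",), "add"),
    ("d", ("q",), "load"),
    ("e", ("m",), "add"),
    ("f", ("d",), "add"),
]

# The same work recompiled for NEW_CHIP: both loads hoisted ahead of the adds.
new_schedule: List[Instr] = [
    ("a", ("p",), "load"),
    ("d", ("q",), "load"),
    ("b", ("k",), "add"),
    ("e", ("m",), "add"),
    ("c", ("a",), "add"),
    ("f", ("d",), "add"),
]

print(in_order_cycles(old_schedule, OLD_CHIP))      # 6
print(in_order_cycles(old_schedule, NEW_CHIP))      # 10: stalls on both loads
print(in_order_cycles(new_schedule, NEW_CHIP))      # 6
print(out_of_order_cycles(old_schedule, NEW_CHIP))  # 7
```

On the slower-load chip, the old binary costs 10 cycles on the in-order core and the recompiled one 6. The idealised OOO core recovers most of the loss (7 cycles) without any recompilation, which is the poster's point about legacy code on PCs versus code built for one fixed console.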

Additionally, to mix in other arguments, I agree P IV could generate significant performance if it didn't run out of thermal headroom. You would need good caches and such, but despite what the other poster says, both Intel and AMD are affected similarly by memory latency and bandwidth issues. Perhaps AMD fares somewhat better. But not so much better that a P4 running at double its current clock rate wouldn't mop the floor with the AMD.

--
http://lkml.org/lkml/2005/8/20/95
1. Re:OOO isn't going away... by Gr8Apes · 2005-12-02 09:56 · Score: 1
  
  > Additionally, to mix in other arguments, I agree P IV could generate significant performance if it didn't run out of thermal headroom. You would need good caches and such, but despite what the other poster says, both Intel and AMD are affected similarly by memory latency and bandwidth issues. Perhaps AMD fares somewhat better. But not so much better that a P4 running at double its current clock rate wouldn't mop the floor with the AMD.
  
  And you could make the exact same argument about AMD mopping the floor even in that case, by cranking up the clock on the AMD. People have run 144/146 Opterons stably overclocked to 3.2GHz. I'd love to see how AMD's best would compare with Intel's best overclocked.
  
  And with AMD moving to DDR2 support, since we're in hypothetical territory anyway, how would the top-end AMD chip run overclocked, given that it already mops the floor with Intel's best in stock configurations? Never mind the lopsided dual-core comparisons.
  
  Dollar for dollar, AMD's chips really are your only choice these days, and I plan on having one in my next PC, as my P4 2.4 is nearing the end of its useful gaming life. Looking forward to the Q1 price drops.
  
  --
  The cesspool just got a check and balance.
Sex Games by EMIce · 2005-11-29 19:38 · Score: 1

Leave it up to the Japanese to come up with this one.
Buy where??? by Anonymous Coward · 2005-11-29 21:28 · Score: 0

So what? I can't buy a Cell processor anyway. Let them first sort out all the details, like the main system/board architecture, compilers, and software, and then we will see.

I think IBM still has a long way to go. What's their timeline on this one, 1 light year?

And the PS3 still isn't out, or is it??

To me this is still a lot of shouting (about nothing) with no real substance I can use, like a complete computer based on Cell (no, the PS3 doesn't count).

My 2 cents,
M
1. Re:Buy where??? by McPolu · 2005-11-30 02:50 · Score: 1
  
  > What's their timeline on this one, 1 light year?
  
  Actually, 1 light year is exactly the same amount of time as 1 tortoise year.
2. Re:Buy where??? by faragon · 2005-11-30 03:47 · Score: 1
  
  Well, with the emulator available, one year of waiting is not such a huge amount of time, unless you've caught the GFLOP obsession ;-)
  
  P.S. Are you McPolu from barrapunto.com? Nice to see you here.
3. Re:Buy where??? by McPolu · 2005-11-30 04:10 · Score: 1
  
  Hello, yes, my name is Conner McPolu of the clan McPolu and I was born on the shores of barrapunto in the year of our lord 1524 and I cannot die :P
  
  It looks like barrapunto.com is not enough fun for me while I am waiting for my unit tests to complete ;-)
First Post! by Anonymous Coward · 2005-11-29 22:34 · Score: 0

FP
Yeah, but... by Maizdog · 2005-11-30 00:51 · Score: 1

it still takes 15 seconds to open Adobe Acro... Oh, never mind. -M
Half of Mac community repeats in mind by Ilgaz · 2005-11-30 02:21 · Score: 1

"This story has nothing to do with mactel"

When an IBM story that is SORT OF about PowerPC (read: for the zealots) like this one comes along, you have to concentrate hard not to think about Mactel.

It is my personal point of view, and I am kind of embarrassed that the whole Mac community became Intel zealots overnight.
Any chance of seeing Cell on a PCI-X card? by CTachyon · 2005-11-30 05:38 · Score: 2

As everyone seems to agree that running general-purpose code (e.g. Linux) on a Cell is going to be unpleasant thanks to the dumbing down of the PowerPC at the core, I was wondering what the odds are of seeing this as an add-on for doing vector-friendly operations. While I don't see people rushing out to install a Cell just for the hell of it, what are the chances that e.g. future crypto-offload accelerators or even 3D video cards might use one of these puppies?

--
Range Voting: preference intensity matters
you could make that argument... by YesIAmAScript · 2005-12-02 16:06 · Score: 1

Except unlike P IV, AMD's chips were designed properly.

P IV was designed to run at 6GHz or something, and gate-delay-wise they could probably get there with minimal changes. The problem is that at those clocks, transistor switching produces so much heat that the chip can't be cooled properly.
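The heat argument can be made a bit more concrete with the standard first-order model of CMOS dynamic (switching) power, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The 2× clock and 20% voltage bump below are a hypothetical illustration, not P4 data:

```latex
P_{\text{dyn}} \approx \alpha C V^{2} f
\qquad
\frac{P_{\text{dyn}}(2f,\ 1.2V)}{P_{\text{dyn}}(f,\ V)} = 2 \times 1.2^{2} = 2.88
```

Doubling the clock alone doubles switching power; if the higher clock also needs, say, 20% more supply voltage, power nearly triples, which is the kind of thermal wall described here.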

AMD's chips, however, were designed to run at the speeds they are running at. Making them run at 4.4GHz would require a redesign. But yes, they would also be much faster at those speeds.

So, the argument could be made for AMD, but it's not as valid.

Now, despite all this, AMD's design is the better one: the chip can reach its potential. P IV can't, really.

I'm sure AMD's new DDR2 chips will be very fast, as their current DDR ones are.

AMD's chips are definitely the price/performance leader in single-core right now. In dual-core, Intel is faster per dollar in the low-end configuration. But despite this, my current machine is an AMD A64 X2 4200+. I love it: it works great, it's really fast, and it doesn't run too hot. My previous machine was a 3.0GHz/800FSB (Northwood) P4, and it was fast too (though significantly less so), and it ran a heck of a lot cooler than the machine before that, an Athlon XP 1700+, despite being a lot faster.

--
http://lkml.org/lkml/2005/8/20/95