Slashdot Mirror


Ars Technica's Hannibal on IBM's Cell

endersdouble writes "Ars Technica's Jon "Hannibal" Stokes, known for his many articles on CPU technology, has posted a new article on IBM's new Cell processor. This one is the first part of a series, and covers the processor's approach to caching and control logic. Good read."

17 of 449 comments

  1. Apple? by tinrobot · · Score: 3, Insightful

    Why do I have the sneaking suspicion that, if successful, this processor will eclipse the PowerPC on the Mac in the next few years?

    1. Re:Apple? by Anonymous Coward · · Score: 1, Insightful

      You sure can, on a basis that you're missing. COST. What a lot of people fail to realize in these sorts of comparisons is that there is only one quantitative metric, and that is price. What features does X solution give you over Y at price point Z? If Apple were selling an iBook at a price that would buy you a FooGHz pBlahBlah, then comparisons between the gX and the pY would be warranted on an economic (and thus utilitarian) basis.

  2. My article on the new cell processor: by tod_miller · · Score: 2, Insightful

    I want 2 of them, yesterday.

    Aside from my own (competent) review of the cell processor, the article is possibly one of the most insightful and technically balanced articles posted on Slashdot in a long while!

    I'll cover more of the Cell's basic architecture, including the mysterious 64-bit POWERPC core that forms the "brains" of this design.

    Looking forward to that... I think that many people will be moving to Mac ... on cell... likely?

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
  3. The real value of the x86 by argoff · · Score: 4, Insightful

    Is that the 386 instruction set and architecture is so non-proprietary. What made it so popular certainly wasn't that it was better. If I had the dough, I could literally make one, and my own fab, without asking a single soul. A lot of times it seems companies try to gather into consortiums to mimic the same effect and gather market momentum, but these are doomed to failure because the more valuable the technology becomes, the greater the pressure to differentiate and fence off some "territory" for themselves. We saw this happen first hand with UNIX, where all the flavors would constantly try to group under these unified standards - and they made little progress until Linux came along. The CPU world needs something similar to protect people from patent harassment over design, cores, and fabrication.

    1. Re:The real value of the x86 by jd · · Score: 3, Insightful
      True, but at the time it came out, Intel did everything short of pay the US Govt. to take the clone manufacturers out with tac nukes.


      As I recall, at the time, there were lawsuits aplenty by Intel, claiming microcode copyright violations for the most part. The majority of clone makers, though, were making money off the maths co-processor, as Intel's 387 sucked. It was the slowest out there, expensive, with only eight entries on a linear stack.


      By moving the coprocessor into the main CPU, Intel tried to destroy clone makers. Anyone who made just 386 clones or 387 clones would be out of business, and those who made both would be years behind combining them on the same die.


      Well, history shows that far fewer clone makers existed in the 486 era. Wonder why. But even that wasn't apparently good enough, with Intel trying to claim the chip ID was trademarked. The courts threw that one out, which is why Intel switched to using names. You can't trademark a number.


      The Pentium also took some time to clone. No, not because of all the random bugs in the design, but because that's when Intel switched to a hybrid RISC/CISC design. Although it seems to have largely been a cosmetic change, to cash in on the massive publicity surrounding RISC designs at the time, it did put up a major challenge to clone makers, who - for the first time - couldn't just throw the chip together half-assedly and hope to be an order of magnitude faster than Intel.


      Intel DID do a few things, around this time, that were puzzling. Their 486DX-50 was never clock-doubled or clock-quadrupled, the way the DX-33 was. The DX-50 placed far higher demands on the surrounding components, true, but it also gave you higher real-term performance than the DX2-66, because the DX2 wasn't able to drive anything any faster than the DX-33. All it could do was run those instructions it had a little faster.
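      The DX-50 versus DX2-66 point can be made concrete with a toy timing model. This is only a sketch: it assumes run time splits cleanly into core cycles and bus cycles, and the cycle counts below are invented for illustration, not measurements of either chip.

```python
def run_time_us(core_cycles, bus_cycles, core_mhz, bus_mhz):
    """Microseconds spent: core work at the core clock plus bus traffic at the bus clock."""
    return core_cycles / core_mhz + bus_cycles / bus_mhz

# Hypothetical workload, evenly split between computing and waiting on the bus.
CORE_CYCLES = 1000
BUS_CYCLES = 1000

# DX-50: core and bus both at 50 MHz. DX2-66: 66 MHz core on a 33 MHz bus.
dx50 = run_time_us(CORE_CYCLES, BUS_CYCLES, core_mhz=50, bus_mhz=50)
dx2_66 = run_time_us(CORE_CYCLES, BUS_CYCLES, core_mhz=66, bus_mhz=33)

# The same model with no bus traffic at all, where the faster core wins.
dx50_compute_only = run_time_us(CORE_CYCLES, 0, core_mhz=50, bus_mhz=50)
dx2_66_compute_only = run_time_us(CORE_CYCLES, 0, core_mhz=66, bus_mhz=33)
```

      Under these made-up numbers the DX-50 finishes first on the mixed workload, while the DX2-66 only pulls ahead when the work never leaves the core. How a real machine behaves depends on cache hit rates and the rest of the board.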


      Intel are still playing these numbers games, which is why their multi-gigahertz processors aren't noticeably any faster. The bottleneck isn't in the computing elements, so faster computing elements won't make for a faster chip.


      IBM's "cell" design seems to be working much more on the bottlenecks, which means that GHz-for-GHz, they should run faster than Intel's chips for the same tasks.


      I think IBM could go further with their design - I think they're being far more conservative than they need be. When you're working in a multi-core environment, you don't always want all parts of the CPU to be in lock-step. It's not efficient to force things to wait, not because of anything they are doing but because some totally unrelated component works at a certain speed and no faster.


      It would make sense, then, for the chip to be asynchronous, at least in places, so that nothing is needlessly held up.


      However, I can easily imagine that a hybrid synchronous/asynchronous chip that is already a hybrid multi-core DSP/CPU would be a much harder sell to industry, so I can see why they'd avoid that strategy. On the other hand, if they could have pulled that off, this could have been a far more amazing press release than it already is.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  4. Perhaps I don't quite understand by Anonymous Coward · · Score: 2, Insightful

    Who would conceivably have enough money to build microchip fabrication facilities but not enough money to license the PowerPC architecture?

    "Reverse engineered implementations exist" is not really much of a meaningful strength if you don't own one such reverse engineered implementation already. You say you can potentially build a 386 chip fab, but the thing is you aren't going to build a 386 chip fab, you're going to just keep on buying Intel and AMD chips, the only noteworthy people currently making x86 chips, because if you built a 386 what would you do with it? It's a 386. The ISA has moved on.

  5. Re:More info in these slides by Effugas · · Score: 1, Insightful

    No subsidies required. PS3 will sell enough to write its own ticket. No need to hope others pick up the slack.

  6. Re:If Sony can, Apple can by Namarrgon · · Score: 2, Insightful
    If Sony can fit it in a console and sell a hundred million of them in a year, I'm sure Apple can...

    Sony may be able to do that with the 65nm final design, when it arrives some time in 2006. Then we'll see.

    Even then, there are other considerations that may make it a less-than-ideal fit for a general purpose computer - all those vector units are great for number crunching, but how much of that do you do each day? And when you're not, that's 3/4 of the cost of your chip sitting around idle. There are more cost-effective alternatives.

    64-bit PPC on it has VMX. That's Altivec, baby. Sure, the SPE's don't have the full functionality of VMX but so what.

    Read Part II of the article - it's not a full implementation of VMX (the SPEs don't have VMX at all - they have a different instruction set altogether). Hannibal believes the weak VMX implementation will be a major downside for Apple. Then there's the lack of out-of-order execution etc.

    The biggest issue I see is that the Cell's design requires the programmer to have full control of the machine.

    Not so. That's what operating systems are for. SPEs would be treated as a shared resource - you ask the OS to loan you one, and if you get it, you run your code on it. Or, you ask the OS to run your code, and it schedules it onto an available SPE when it can.
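    A minimal sketch of the "loan" model described above, assuming a fixed pool of SPEs and a fallback to the main core when none is free. The `SpePool` class and its methods are hypothetical names for illustration, not any real Cell or OS API, and the "SPE" here is just a label on an ordinary function call.

```python
import queue

class SpePool:
    """Toy stand-in for an OS that hands out a fixed number of SPEs."""

    def __init__(self, count):
        self._free = queue.Queue()
        for spe_id in range(count):
            self._free.put(spe_id)

    def acquire(self):
        """Borrow an SPE id, or return None if every SPE is busy."""
        try:
            return self._free.get_nowait()
        except queue.Empty:
            return None

    def release(self, spe_id):
        """Give a borrowed SPE back to the pool."""
        self._free.put(spe_id)

    def run(self, kernel, data):
        """Run kernel on a borrowed SPE, or on the main core if none is free."""
        spe_id = self.acquire()
        if spe_id is None:
            return ("ppe", kernel(data))
        try:
            return (f"spe{spe_id}", kernel(data))
        finally:
            self.release(spe_id)
```

    A caller would use `pool.run(sum, [1, 2, 3])` and get back both the result and a label saying where it ran. A real scheduler would also have to deal with queuing, priorities and preemption, which this sketch ignores.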

    --
    Why would anyone engrave "Elbereth"?
  7. Re:How do I code this thing?? by grammar+fascist · · Score: 2, Insightful

    If that turns out to be the case, then PS2 programming is a hint towards how it'll work. On the PS2, you generally configured the DMA controller to upload mini programs to the vector units, then DMA-chained data as streams from RAM through the just-uploaded program and onto the destination (usually the GS which rasterised the display).

    Sounds a lot like pixel/vertex shaders. Is this how we're going to get around all our bandwidth problems now? Slice up our programs into little independent fragments and upload them to the CPU to run concurrently?
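    The upload-then-stream pattern can be sketched roughly as follows. This is plain Python standing in for hardware: `dma_chain`, `stream_through` and `scale_by_two` are made-up names, the "kernel" is an ordinary function rather than an uploaded vector-unit program, and the sink list plays the part of the destination.

```python
def dma_chain(source, chunk_size):
    """Yield fixed-size chunks, standing in for a chain of DMA transfers from RAM."""
    for start in range(0, len(source), chunk_size):
        yield source[start:start + chunk_size]

def stream_through(kernel, source, chunk_size, sink):
    """Feed each chunk through the kernel and append the results to the sink."""
    for chunk in dma_chain(source, chunk_size):
        sink.extend(kernel(chunk))
    return sink

def scale_by_two(chunk):
    """Example mini program: an independent per-element transform."""
    return [2 * x for x in chunk]

framebuffer = stream_through(scale_by_two, list(range(10)), chunk_size=4, sink=[])
```

    The kernel only ever sees one chunk at a time and keeps no state between chunks, which is what makes the fragments independent enough to run side by side.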

    --
    I got my Linux laptop at System76.
  8. Doomed until parallel programming is common by rufusdufus · · Score: 3, Insightful

    The difference is that instead of the compiler taking up the slack (as in RISC), a combination of the compiler, the programmer, some very smart scheduling software

    Requiring programmers to learn how to write parallel code that makes good use of this processor seems pretty dicey to me. Few programmers have been trained to write parallel code (most struggle with threading). The fact that no popular programming language has a good parallel model is also a big stumbling block.

    This problem seems to be looming for all the dual core processors, but I haven't seen a big effort to teach programmers how to adapt.

  9. Re:Workstation? by PurpleFloyd · · Score: 4, Insightful
    The Cell workstation in question is not a home/office computer; not running Linux because it's hard to install or a scanner won't work is not an issue. The workstation is closer to a Sun or SGI system - very expensive, and faster than almost anything in the x86 world.

    The target market is not home users but rather scientists, animators, engineers, and others who need raw power and aren't concerned with the fact that Word won't work on it; many customers will probably have a cheap PC sitting next to it for office tasks, freeing up the workstation to do nothing but grind through computations. In this world, various Unices are the only serious choice; SGIs run IRIX or Linux, Suns run Solaris or Linux, and IBMs run AIX or Linux.

    Take into account IBM's commitment to Linux, and the fact that many of their customers already use it, and it's almost certain that Linux will be a major OS choice for Cell workstation customers, particularly those working in a mixed-architecture environment. While it's likely to run AIX and a Windows port is possible, it's almost certain that a majority of Cell workstations will be running Linux.

    --

    That's it. I'm no longer part of Team Sanity.
  10. Re:Not useful for scientific computing by taniwha · · Score: 2, Insightful

    The problem is that a multiplier's area is roughly proportional to the square of the width of the things being multiplied. Assuming a 64-bit float's mantissa is twice the size of a 32-bit one, it's going to take 4 times the area (or twice the area of a pair of single-precision multipliers), and of course it will eat into your cycle time (both in gate delay and in wire delay).
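    The arithmetic can be checked quickly under the stated assumption that area grows with the square of the operand width. The function name is made up for illustration; the 24-bit and 53-bit figures are the IEEE 754 significand widths including the implicit leading bit, which the comment rounds to "twice the size".

```python
def relative_multiplier_area(bits, baseline_bits):
    """Area ratio of two multipliers, assuming area scales with width squared."""
    return (bits / baseline_bits) ** 2

# The comment's simplification: the wider mantissa is exactly twice the narrower one.
simplified = relative_multiplier_area(2, 1)

# IEEE 754 significand widths, counting the implicit leading bit.
SINGLE_BITS = 24
DOUBLE_BITS = 53
ieee = relative_multiplier_area(DOUBLE_BITS, SINGLE_BITS)

# Cost of one double-precision multiplier relative to a pair of single-precision ones.
versus_pair = ieee / 2
```

    With the real widths the ratio comes out a little under 5 rather than 4, so the rough estimate is, if anything, slightly generous to the double-precision unit.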

  11. Re:Workstation? by TheRaven64 · · Score: 3, Insightful
    The last time Apple tried licensing the OS, it almost killed them. They licensed it completely indiscriminately and lost out at the low end because clones were built using cheaper components and at the high end because SMP clones were cheaper. Licensing to Sony or IBM remains a possibility if the licensing agreement contained some kind of non-competition clause - Apple primarily target the home user, and so would be happy to let IBM have the corporate market if it meant paying them a royalty on every sale and a whole load of free publicity for OS X.

    Apple at the moment is two companies. One is primarily a computer hardware company that makes software to drive hardware sales and sells the entire package as user experience. The other is a consumer electronics company. Last year, the profits made by both companies were about the same. Whether they wish to transition to being a software and consumer electronics company that also makes some niche hardware is a decision they will have to make.

    --
    I am TheRaven on Soylent News
  12. Re:If Sony can, Apple can by TheRaven64 · · Score: 3, Insightful
    Hannibal believes the weak VMX implementation will be a major downside for Apple.

    I am not convinced by this argument. A lot of OS X code uses AltiVec, but very little actually uses it directly. Apple has spent a lot of effort producing libraries that people can use which wrap AltiVec into something higher level (e.g. QuickTime, vDSP). Most of these could potentially be ported to the SPEs. Things like CoreVideo could also make use of the SPEs.

    all those vector units are great for number crunching, but how much of that do you do each day? And when you're not, that's 3/4 of the cost of your chip sitting around idle.

    90% of the time, my 1.5GHz G4 is sitting at 20% utilisation or less. You could argue that 80% of the power of the chip is wasted. However, when I am doing things that tax it they are almost always things that would support a large degree of parallelism.

    --
    I am TheRaven on Soylent News
  13. Re:How do I code this thing?? by TheRaven64 · · Score: 2, Insightful

    First, you will use a language that supports a vector type. The languages used for GPU programming do, and there is a vector extension to C supported by GCC. You will write code that manipulates vectors instead of scalars. And that's about it. You try to keep your working set small, and your compiler will try to fit in the local memory.
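    As a rough illustration of "manipulate vectors instead of scalars" and "keep your working set small", here is a sketch in Python rather than the C vector extension mentioned above. `Vec4`, `saxpy` and `fits_in_local_store` are hypothetical names, the 256 KB local memory size is an assumption that does not come from this thread, and the 64 KB set aside for code is an arbitrary guess.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec4:
    """Four-lane vector value; every operation acts on all lanes at once."""
    x: float
    y: float
    z: float
    w: float

    def __add__(self, other):
        return Vec4(self.x + other.x, self.y + other.y,
                    self.z + other.z, self.w + other.w)

    def scale(self, k):
        return Vec4(self.x * k, self.y * k, self.z * k, self.w * k)

LOCAL_STORE_BYTES = 256 * 1024   # assumed local memory size per SPE
BYTES_PER_VEC4 = 16              # four single-precision floats

def fits_in_local_store(vector_count, reserved_for_code=64 * 1024):
    """Check whether a working set of Vec4 values fits beside the code (reserve is a guess)."""
    return vector_count * BYTES_PER_VEC4 <= LOCAL_STORE_BYTES - reserved_for_code

def saxpy(a, xs, ys):
    """Compute a*x + y per vector, with no scalar lane-by-lane loop in user code."""
    return [x.scale(a) + y for x, y in zip(xs, ys)]
```

    In a real toolchain the compiler would map a type like this onto vector registers. Here it only shows the shape of the code you would write.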

    --
    I am TheRaven on Soylent News
  14. Division of labor by chiph · · Score: 3, Insightful

    Reading the article, it reminds me of the typical mainframe architecture, where you have a central supervisory CPU, but most of the specialized work is done by the channel processors.

    In the Cell, the main PPC CPU appears to identify a piece of work that needs to be done, schedule it to run on an SPE, upload the code snippet to the SPE's LS via DMA transfer, and then go off and do something else worthwhile while the SPE munches on it. I presume there's an interrupt mechanism to let the PPC know that an SPE has some results to return.
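    That supervisor-and-workers flow looks roughly like the sketch below, with a thread pool standing in for the SPEs and a completion callback standing in for the presumed interrupt. The `offload` helper and the worker count of 8 are assumptions for illustration; nothing here models the DMA upload itself.

```python
from concurrent.futures import ThreadPoolExecutor

def offload(pool, kernel, data, on_done):
    """Hand work to a worker and register a completion callback (the 'interrupt')."""
    future = pool.submit(kernel, data)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future

results = []
with ThreadPoolExecutor(max_workers=8) as spes:
    job = offload(spes, sum, range(1000), results.append)
    other_work = [n * n for n in range(5)]   # the main core keeps busy meanwhile
    job.result()                             # block only when the answer is needed
```

    Leaving the `with` block waits for the workers to finish, so by then the callback has delivered the sum into `results`.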

    Compiler writers ought to be able to handle this new architecture well enough -- it's sort of like the current CPU/GPU split, where you've got the main program running on the system CPU, and specialized graphical transform programlets running on the GPU. There may need to be macros or code section identifiers in the source to let the compiler know which to target for that bit of code.

    Obviously, this is just the first iteration of the Cell processor. I can see them widening the SPE from single precision to double precision (for the scientific market -- the game market probably doesn't need it), and going to a multi-core design to reduce the die size.

    Chip H.

  15. Re:Top 7 Myths of the New Cell Processor: by Modab · · Score: 3, Insightful
    You bring up a good point. I gloss over it because the Emotion Engine would have had some of the same problems, yet developers eventually figured out how to use it... it all depends on the tools Sony ships to work with the platform, and also on how you view this parallel code executing.

    Comparing it with trying to work with threads definitely brings up nightmare conditions. But I don't think it has to be a nightmare. We use mammoth parallelization all the time and with great success. We hand off all the rendering chores to the GPU when we give it a pointer to data and say "hey, display this", or more modernly a bunch of vectors n' stuff to send down the hardware accelerated pipeline.

    The Cell hardware has the capability to get a developer in trouble, especially if you're trying to write data concurrently, or if you started from a design not specifically made for this chip. But if you focus on pipelines, with a design that avoids simultaneous writes, a lot of problems should vanish, and I believe this is the path people will take, if only because everyone seems to be viewing it as a glorified vector processor from a GPU.
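    One way to read "a design that avoids simultaneous writes" is to give every worker its own slice of the output, so no two workers ever touch the same element. A minimal sketch, using threads as stand-ins for SPEs; `process_slice` and `parallel_square` are made-up names and the squaring step is just a placeholder for real work.

```python
from concurrent.futures import ThreadPoolExecutor

def process_slice(src, dst, start, stop):
    """Read the shared input freely, but write only to dst[start:stop]."""
    for i in range(start, stop):
        dst[i] = src[i] * src[i]

def parallel_square(src, workers=4):
    """Square every element, splitting the output into disjoint slices per worker."""
    dst = [0] * len(src)
    step = max(1, -(-len(src) // workers))   # ceiling division, at least 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_slice, src, dst, s, min(s + step, len(src)))
            for s in range(0, len(src), step)
        ]
        for f in futures:
            f.result()   # re-raise any exception from a worker
    return dst
```

    Because the slices never overlap, no locks are needed. The only synchronisation point is waiting for all the workers at the end.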


    That last myth is a good one. I had no idea!