That first sentence made no sense, and the second was pointless because gaming performance is a function of the CPU + GPU, so you can't use it to measure CPU performance alone.
I never said otherwise, you twit. Sony claimed that the EE would be twice as fast as a 733MHz P3. It is at least twice as fast as a 733MHz P3. The fact that the EE is more burdened in the PS2 really doesn't change that.
So the CPU is just a normal POWER, right?
No. Each Cell has one main (controller) CPU called a PU, and up to 8 separate vector CPUs called SPEs. The main CPU is a regular 64-bit POWER processor (with SMT --- IBM's equivalent of hyperthreading), while the SPEs are very simple processors with a lot of execution resources and insane bandwidth. Such processors are known as "stream processors" in the literature, because they are designed to handle streams of data.
it's just a different brandname, right?
Yes, "AltiVec" (like "G5") is an Apple/Motorola trademark, so IBM can't use it. And you're right, the AltiVec unit is on the PU.
For what purposes is the VMX more suited?
It's there most likely because if you're running some code that isn't suitable for the SPEs, but does need to do vector computations, you don't have to send it off to the SPEs.
Will the SPEs have this same starvation problem?
Potentially, but probably not. AltiVec on the G4 was starved because the G4's bus was exceedingly slow. The SPEs are supposed to be on a shared 128GB/sec internal bus, and the Cell has 100GB/sec of bandwidth to main memory.
That each of the SPEs has 256k of private memory to work with?
Yes. In the Cell model, you design your code in "cells". A cell is a clump of code and data that's copied to the SPE's local memory. The code then runs, streaming in additional data from memory, and using the local memory as a workspace.
Can SPEs freely read other SPEs' "local memory", or only their own? And who fills up this memory initially, and who deals with it once it's done?
The SPEs' local memories are not connected to each other, so each SPE can only read from its own local memory. The memory is filled up by the PU when a cell is loaded onto the SPE. The SPE then runs autonomously, and when it finishes, sends the results back to the PU via main memory.
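To make that load/run/write-back cycle concrete, here's a toy model in Python. This is purely a conceptual sketch: `ToySPE`, the dict layout of a "cell", and the `result_key` field are all made up for illustration and have nothing to do with any real Cell toolchain. The only figure taken from the discussion is the 256KB local memory size.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local memory size quoted in the thread


class ToySPE:
    """Toy stand-in for one SPE: a private local store, no view of other SPEs."""

    def __init__(self):
        self.local_store = None

    def load(self, cell):
        # The PU copies the cell (code plus initial data) into local memory.
        if cell["size_bytes"] > LOCAL_STORE_BYTES:
            raise ValueError("cell does not fit in local store")
        self.local_store = dict(cell)

    def run(self, main_memory):
        # The SPE runs on its own, then hands the result back via main memory.
        cell = self.local_store
        result = cell["code"](cell["data"])
        main_memory[cell["result_key"]] = result


main_memory = {}  # hypothetical shared main memory
spe = ToySPE()
spe.load({"code": sum, "data": [1, 2, 3, 4], "size_bytes": 64, "result_key": "out0"})
spe.run(main_memory)
print(main_memory["out0"])  # 10
```

Note that nothing in `ToySPE` can reach another instance's `local_store`; the only shared thing is `main_memory`, which is the point of the model.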
I.e., do the SPEs have access to main or video memory or other hardware, or do they need the CPU to shuttle data to keep them fed?
The SPEs and the PU all talk to a single DMAC, which has access to main memory.
But then the article seems to be saying that SPE access to memory is limited, i.e. it can only be done in block load/stores.
Yes. The DMAC, actually, can only read/write in 1024-bit blocks. This isn't really a big deal if you think about it. When a regular CPU reads a memory address, it doesn't read a byte at a time. It loads a whole cacheline at a time. So a P4, for example, usually reads a 128-byte (1024-bit) block at a time from memory anyway.
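If it helps, here's what block-granular access means in practice, sketched in Python. The function name `blocks_touched` and the example addresses are my own; the 128-byte (1024-bit) block size is the one mentioned above. The same rounding applies whether the block is a DMA transfer unit or a cacheline.

```python
BLOCK_BYTES = 128  # 1024 bits, the block size mentioned above


def blocks_touched(addr, length, block=BLOCK_BYTES):
    """Return the (start, end) byte range of whole blocks covering [addr, addr + length)."""
    start = (addr // block) * block               # round down to a block boundary
    end = -(-(addr + length) // block) * block    # round up to a block boundary
    return start, end


start, end = blocks_touched(addr=1000, length=4)
print(start, end, (end - start) // BLOCK_BYTES)  # 896 1024 1
```

So a 4-byte read at address 1000 still moves one full 128-byte block, and a read that straddles a boundary (say 4 bytes at address 1022) moves two.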
Does each of the 8 SPEs actually independently load its own instruction stream?
Yes. All the processor units run separate instruction streams. Each "software cell" runs in its own thread, if you will.
IBM has developed Cell for a wide range of uses. They are talking about a digital content creation (DCC) workstation with Cell, as well as rackmountable units for scientific computing.
If the POWER chip in the thing runs at a decent clockspeed, it should.
Sorry, I mean the second article out of the two new ones. Blachford's is an older one. Anyhow, I don't see how these aren't real facts. They cite some very important points about the architecture, and to someone who knows how CPUs work, they are quite useful in predicting the performance of the thing. In particular, the fact that the SPEs are in-order tells you a whole lot about how important instruction scheduling is going to be.
With regards to the EE, it wasn't a failure at all. The EE delivered exactly what it promised --- 6.2 gigaflops at 300MHz. The only people who saw other promises were the ones who bought into the crap they read on CNET or some other piece-of-shit site, and people who have sense have learned to filter those sources out instinctively. In nearly every example of supposed "hype" I've seen about the EE, it was the result of a journalist trying to "put the numbers in context" for their clueless readers.
I don't get your point. Yes, IBM is citing theoretical peaks for Cell. That's standard procedure in the industry. I mean, consider all the supercomputers --- the press releases before they are built always show the theoretical peaks!
Where does it make any statements in there that can't be backed up? The EE *is* twice as fast as a 733MHz P3 --- the only reason the Xbox has better graphics is its NVIDIA GPU. It does actually get 6 gigaflops, and it can actually draw 70M polygons per second. It does have a DVD drive, and can connect to the internet. And yeah, it's black. How is any of that hype? Certainly, none of it is at the level of what the OP was claiming, that Sony said the EE would render Toy Story in real time.
Cell != PS3. It will be the PS3 processor, but this article is about Cell, not the PS3. So texture filtering really has nothing to do with this article, because that's a function of the rasterizer Sony puts in the PS3.
The real-time architecture makes sense --- the Cell has timing circuits to enable precise control of when computation happens. It also has a real-time OS that allows programs to make demands about when computations need to be completed.
Did you RTFA? From the second article:
Die size: 221mm^2
Transistor count: 234m
SPE Size: 2.5x5.81mm
SPE Interconnect: 4x128bit ring bus
SPE local memory: 256KB
SPE decode rate: 2 insns/cycle
SPE resources: 7 execution units (unspecified type)
They also mention the core voltage of the CPU (1.3V) and the fact that the memory has been tested to 5.4GHz, detail the temperature monitoring scheme, and note that the SPEs are in-order chips. This is all new information.
I'd love to see where Sony said the EE could "render Toy Story in real time", as opposed to some dittohead journalist. These are presentations at an engineering convention, not some stupid "tech" articles in USA Today. Look at the specs, evaluate the thing on its own merits. Don't make baseless conjectures (which is precisely what you're doing).
Prescott has 125M transistors, while the GeForce 6800 has 222M transistors. And on a tangent: this is typical Slashdot. IBM and Sony announce a 256 gigaflop chip, and Slashdotters' first reaction is to bitch about how hot and noisy it will be! Where are the real nerds in the audience?
The math makes sense now. 8 APUs, 2 pipelines per APU, 4 operations (SIMD) per clock, at 4GHz. That gives 256 gigaflops!
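Spelled out as a quick Python sketch, in case anyone wants to plug in other numbers. The function and parameter names are mine, and the inputs are just the figures quoted above; this is a theoretical peak (every pipeline issuing a full SIMD op every cycle), not a measured throughput.

```python
def peak_gflops(units, pipelines_per_unit, simd_ops_per_clock, clock_ghz):
    """Theoretical peak in gigaflops, assuming every pipeline is busy every cycle."""
    return units * pipelines_per_unit * simd_ops_per_clock * clock_ghz


cell_peak = peak_gflops(units=8, pipelines_per_unit=2, simd_ops_per_clock=4, clock_ghz=4)
print(cell_peak)  # 256
```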
Because "assistive" is a positive word, while "disabled" is a negative word. "Assistive Technology" is the standard term in the industry; "Disabled Access" (is it some sort of security thing?) would just confuse those who actually knew what they were talking about.
Actually, Apple's Tiger will get an auto-vectorizing compiler courtesy of the public GCC 4.0 release. The auto-vectorizer wasn't developed in Apple's version of GCC. IBM's GCC team at the Haifa Research Lab developed the vectorizer in the public LNO (loop nest optimization) branch of GCC 4.0. I'm not trying to minimize Apple's contribution here (one of their developers did work on the team), but let's give credit where credit is due.
The set of operations offered by OpenGL is very different from the set of operations offered by the hardware. The drivers have to do an enormous amount of work to convert OpenGL operations into commands the hardware understands. Modern OpenGL drivers include an entire compiler for the GL shading language! Look at the size of an OpenGL driver sometime --- they are several megs of code and data.
Oh, I forgot in my other post. With regards to non-font rendering, GNOME is already planning on using Cairo (a vector-graphics library) for its core drawing routines.
Fonts are already vectorial. If you've got a high-resolution screen, just go to "Preferences", then "Fonts", click the "Details" button, and set the "Resolution" spinner to get the fonts to the right size. For a UXGA laptop, presuming a 15" screen, 133dpi is the proper resolution, and 8-9 is the right-size for the UI font. I have such a screen, and I use Albany AMT 9pt at 130dpi, with sub-pixel anti-aliasing disabled (but the auto-hinter enabled). I've also heard of good results with 8pt or 9pt Tahoma at 133 dpi with sub-pixel AA enabled.
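For anyone wondering where the 133dpi figure comes from, it's just diagonal pixels over diagonal inches. A quick sketch in Python (the function name is mine; UXGA is 1600x1200, and I'm assuming the full 15" is viewable):

```python
import math


def screen_dpi(width_px, height_px, diagonal_inches):
    """Pixels along the diagonal divided by the diagonal length in inches."""
    return math.hypot(width_px, height_px) / diagonal_inches


dpi = screen_dpi(1600, 1200, 15.0)
print(round(dpi))  # 133
```

Plug in your own panel's resolution and size to get the number for the "Resolution" spinner.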
"Assistive Technology" is technology to help the disabled use computers. Stuff like screen magnifiers, screen-readers, high-contrast colors and icons, etc.
Microsoft waits for full releases to support new buses, and they have a much slower release cycle than Linux. For example, Linux supported USB 2.0 (EHCI controllers) natively several months before Microsoft added support to Windows XP in service pack 1. More recently, the latest Linux version just added support for InfiniBand, while Windows still doesn't have native support (vendors like Mellanox have to ship their own drivers for it). I think Linux might have supported 1394 or 1394b before Windows too.
I think it's just your particular crowd, because nearly all online polls I've seen have GNOME and KDE combined having the majority of the market, with WindowMaker having something like 10% or so. I've seen this on both an OSNews and a DebianPlanet poll.
The ACPI spec that's available is not the ACPI spec that's implemented. The Windows ACPI implementation is broken, because it has to run on real hardware, most of whose ACPI implementations are broken. That's the problem Linux has to deal with. That's why ACPI is taking so long, in comparison to stuff like USB, FireWire, or InfiniBand, which Linux often supported before Windows did.
Culture and history are only important if you're a fan of war? War is a part of the human condition, but it's hardly the only part!
Ha ha, bitter Trek loser! The only good Trek in recent memory was DS9, and all the recent movies blow. The franchise is dead, and frankly, it was never very good anyway!