Clearspeed Makes Tall Claims for Future Chip
Josuah writes "ClearSpeed Technology announced today a new multithreaded array processor named the CS301. Their press release states the chip can achieve 25Gflops for only 3W of power. New Scientist and TechNewsWorld have articles on this chip, each with more information. I'm wondering if this is too good to be true."
The key phrase is in the Wired story: "Soon to be in prototype, the chip...". "Soon to be in prototype" is synonymous with "does not exist".
The basic idea is to have lots of "processing elements" that are basically ALUs with a bit of additional smarts (mainly for branches). Each PE has its own memory. The main processor (probably not the PC CPU) tells each PE what to do. Thus the name: Single Instruction, Multiple Data. Things are a bit more complex than this (branches, pointers, and a few other things cause some problems), but not too much worse. PE-to-PE communication is also interesting (the MasPar was a toroid, as I recall).
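To make the "one controller, many PEs" idea concrete, here is a toy sketch in Python. It is not ClearSpeed's design or instruction set; the names `PE`, `broadcast`, and `simd_abs` are made up for illustration. It only shows the general SIMD pattern of a controller issuing one instruction to PEs that each work on private memory, with a branch turned into an enable mask.

```python
class PE:
    """One processing element: a tiny ALU with private memory and an enable flag."""
    def __init__(self, value):
        self.mem = {"x": value}   # each PE owns its memory
        self.enabled = True       # cleared when the PE must sit out an instruction


def broadcast(pes, op):
    """The controller issues ONE instruction; every enabled PE applies it to its own data."""
    for pe in pes:
        if pe.enabled:
            op(pe.mem)


def simd_abs(pes):
    """abs(x) on every PE. The 'if x < 0' branch becomes a mask: PEs failing the test idle."""
    for pe in pes:
        pe.enabled = pe.mem["x"] < 0
    broadcast(pes, lambda m: m.update(x=-m["x"]))
    for pe in pes:
        pe.enabled = True         # re-enable everyone after the masked region


pes = [PE(v) for v in [3, -1, 4, -5]]
broadcast(pes, lambda m: m.update(x=m["x"] * 2))  # same instruction, different data
simd_abs(pes)
result = [pe.mem["x"] for pe in pes]
```

The masking step is where branches get expensive: while the negative-valued PEs negate their data, the others do nothing useful for that instruction.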
The two basic problems with this type of a design are:
There are also a huge number of other problems. Caches don't generally do a darn thing for massive SIMD computers (if one processing element misses, they all do). The memory usually has two types of pointers (one to the PE memory and one to global memory). I may contact the company to see if they want to hire a short-term consultant. Hmmm... Have PhD, will travel?
As Nietzsche famously said, "If you stare too long into the Abyss, 1d4 Tanar'ri of random type will attack you."
... parallel processing units may perform a lot more ops/sec/watt than one single unit. The speed of a processor depends on the time required to charge and discharge the stray capacitances of its interconnects. The impedance of its transistors increases as the drive voltage decreases, so the RC time constant goes up and the speed goes down. However, the energy required to charge the capacitance scales as voltage squared, so by accepting a hit on the speed (due to the voltage drop) you can do the same calculation with less energy. Clearspeed seems to be taking parallelism to the sub-processor level in order to reduce heat loads; their operations may take longer to complete, but they can do more operations in the same time as long as the code can use the processors in parallel. Thus the emphasis on "multi-threaded", because it wouldn't work otherwise.
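A back-of-the-envelope version of that argument, as a Python sketch. The model is deliberately crude and the assumptions are mine, not Clearspeed's: switching energy per operation is C*V^2, clock frequency scales linearly with voltage, and leakage and threshold voltage are ignored. The function names and normalized units are made up for illustration.

```python
def power(n_units, voltage, freq, cap=1.0):
    """Dynamic power of n identical units: n * C * V^2 * f (normalized units)."""
    return n_units * cap * voltage**2 * freq


def throughput(n_units, freq):
    """Operations per unit time, assuming the work parallelizes perfectly."""
    return n_units * freq


def scaled_design(n_units, v0=1.0, f0=1.0):
    """Run n units at voltage v0/n. ASSUMPTION: frequency drops in proportion to voltage."""
    v = v0 / n_units
    f = f0 * (v / v0)
    return throughput(n_units, f), power(n_units, v, f)


single = scaled_design(1)   # one fast unit at full voltage
quad = scaled_design(4)     # four slow units at a quarter of the voltage
```

Under these assumptions the four slow units match the single unit's throughput while the V^2 term cuts total power sharply. Real silicon will not scale this cleanly, and the whole thing falls apart if the code cannot keep all units busy.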
Scientists restrict study to entire physical universe; creationist
Some of the hardware design came from engineers in Bristol, UK, from companies like Division and INMOS (anyone remember the T800 and T9000 transputer, and a Microway board for parallel computing on a PC more than a decade ago?). The other half of the design team came from the UNC computer graphics lab in Chapel Hill, home of the well-known PixelFlow and PixelPlane machines. Add to that a Taiwanese fab plant that would produce these SIMD processors with extra PEs (SIMD Processor Engines) to compensate for manufacturing errors. E.g., let's say the chip needs 100 PEs, so they would manufacture it with 120 PEs. Those that didn't work they'd switch off, and they wouldn't have to throw away the entire chip.
The story of PixelFusion was unfortunate. They could have rocked the computer graphics world with their scalable tile-based rendering technology and efficient manufacturing methods. The programmable PEs would be able to handle both DirectX and OpenGL. I suppose now they are trying to focus their investment and IP on more generic applications. I find their claims to be plausible because they have demonstrated innovative chips in the past.
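To see why the spare PEs matter, here is a rough yield estimate in Python. The 100 and 120 figures come from the example above; the 99% chance that any one PE works is a number I made up, and the model assumes defects hit PEs independently, which real fabs will not guarantee. `chip_yield` is a hypothetical helper, not anything from PixelFusion.

```python
from math import comb  # Python 3.8+


def chip_yield(built, needed, p_good):
    """Probability that at least `needed` of `built` PEs work (independent defects)."""
    return sum(
        comb(built, k) * p_good**k * (1 - p_good) ** (built - k)
        for k in range(needed, built + 1)
    )


P_GOOD = 0.99  # ASSUMPTION: per-PE chance of working, purely illustrative

no_spares = chip_yield(100, 100, P_GOOD)    # every single PE must work
with_spares = chip_yield(120, 100, P_GOOD)  # up to 20 dead PEs can be switched off
```

With no spares, roughly 37% of chips survive under this made-up defect rate; with 20 spares, nearly all of them do. That is the whole trick: a little extra die area buys back most of the chips you would otherwise throw away.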
My 2 cents
Without a working prototype they have nothing.
With a working prototype they still have not much.
With a working, and cost-efficient manufacturing process, they have something.
When there are compilers that actually can use this kind of thing, it starts to be something that is real.
My guess is they are about a decade from a reliable, usable and cheap product. Suddenly these numbers do not sound impressive at all...
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
If I understand the article correctly, it looks like they're implementing a much more powerful version of Apple's Altivec SIMD technology. My question is, if computing power increases 500x using this technology, don't memory bandwidth and system bus speed have to increase dramatically as well just to realize any gains?
It seems like putting one of these cards in a PC with today's technology would be like sticking a mainframe behind a 300 baud connection: sure it can handle millions of transactions a second, but you'll never actually see that kind of throughput because memory is so slow.
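The worry can be put into numbers with a simple bound: you can't compute faster than you can feed the chip. A Python sketch follows; the 25 Gflops peak is from the press release, but the 1 GB/s bus figure and the flops-per-byte ratios are assumptions I picked for illustration, not measurements of any real system.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Upper bound on sustained speed: the lesser of compute peak and what memory can feed."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


PEAK = 25.0  # Gflops, the claimed figure
BUS = 1.0    # GB/s, ASSUMED host bus bandwidth (hypothetical)

# ASSUMPTION: streaming code doing one flop per 8 bytes fetched (two 4-byte operands).
streaming = attainable_gflops(PEAK, BUS, 1 / 8)

# Code that reuses each fetched byte for many operations can approach the peak.
high_reuse = attainable_gflops(PEAK, BUS, 100.0)

# Flops per byte of bus traffic needed before the bus stops being the limit.
break_even = PEAK / BUS
```

So on streaming workloads the bus sets the speed, not the chip. The claimed peak only shows up if the data can stay on the card and be reused many times, which presumably is what per-PE local memory is for.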
"When the president does it, that means it's not illegal." - Richard M. Nixon