Clearspeed Makes Tall Claims for Future Chip
Josuah writes "ClearSpeed Technology announced today a new multithreaded array processor named the CS301. Their press release states the chip can achieve 25Gflops for only 3W of power. New Scientist and TechNewsWorld have articles on this chip, each with more information about the chip. I wondering if this is too good to be true." The key phrase is in the Wired story: "Soon to be in prototype, the chip...". "Soon to be in prototype" is synonymous with "does not exist".
Besides, it isn't real until it is used in some computer somewhere
Today it was announced that Duke Nukem Forever would be optimized to run on the new CS301 processor developed by a new firm called ClearSpeed Technology. It is said that with this newfound processing speed, Duke Nukem Forever will be the most realistic game ever released.
---- Move SIG...For great justice!
Why would they release a story on something that isn't even in prototype? Seems silly to me. I have plans for a 200 GHz chip, but I still have to make a prototype, film at 11!
It would be interesting though.
Death is the only way out of here!
Oh, right on. It's about time someone started developing a mass-market Loch Ness monster.
We could put 32 or so of these in a computer and generate the same amount of heat as, say, a Pentium IV, but with almost a teraflop of performance? This strikes me as too good to be true...
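For what it's worth, the back-of-the-envelope numbers behind that can be sketched as below. This assumes the announced peak figures (25 GFLOPS and 3 W per chip) and perfect scaling across chips, which no real system would give you; the function name is just made up for illustration.

```python
# Peak figures quoted in the announcement, per chip.
GFLOPS_PER_CHIP = 25.0
WATTS_PER_CHIP = 3.0


def cluster_totals(n_chips):
    """Return (peak GFLOPS, watts) for n_chips, assuming perfect scaling."""
    return n_chips * GFLOPS_PER_CHIP, n_chips * WATTS_PER_CHIP


peak, watts = cluster_totals(32)
print(f"32 chips: {peak:.0f} GFLOPS peak at {watts:.0f} W")
```

So 32 chips would be 800 GFLOPS peak at roughly 96 W, on paper.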
Chips are virtually fabricated and tested well before the first bit of silicon is etched....you can actually be pretty sure of both a chip's performance and reliability just from simulations these days. Also, having to etch development chips constantly is both expensive and time consuming....so the longer you can leave a design in virtual space, the better.
-psy
The key phrase is in the Wired story...
No, the key phrase is this is too good to be true
SpamNet - a spam blocker that really works
Saturday.. Saturday..Saturday!!
Clearspeed will tear through the competition with their awesome megachip the CS301!!
3 WATTS of ThrowYouBackInYourSeat Power! Twice the speed! Twice Twice! Twice!
When it comes to market, the chip will likely be sold to consumers as a co-processor -- an add-on PCI card that works in parallel with a PC's main processor
It's not replacing our current processors. It is just helping them with intensive floating-point calculations. Is that really going to be helpful to the average user? Keith
I thinking it is!
Slashdotters are stupid and biased.
As well as the fact that I've seen this press release trolled by AC's on Slashdot.
25Gflops on 3W? That must be some unorthodox technology at work there. Anyone hear anything about some research corporation finding an amazing processor in a robot from the future?
...
Only $16,000! I'll take two!
But where's the desktop bus bandwidth supposed to come from? I think it'll choke on my PC133 RAM. Whatever desktop machine they're targeting is what I want for Christmas.
of disk space will be all anyone would ever need. But really, what requires 25G flops?
Maybe if we decide to model "Life, the Universe and Everything?"
.... of Starbridge systems....
Jaysyn
There is a war going on for your mind.
I'm reminded of all the promises we heard for the Transmeta chip, only a fraction of which are being realized. And they have an actual product to demonstrate, mind you.
Yeah, it sounds like wishful thinking. I have little faith in processors from unknown companies that claim to do what Intel, AMD and IBM combined haven't yet been able to achieve.
I can't wait until they come out with diamond-based processors. They can provide performance in the 200 GHz range, now that fabrication methods to produce flawless diamonds have been perfected. Read the article in Wired a few months ago for more info about this.
So, in that kind of light, can anybody here with thermodynamic knowledge compare the total number of phase changes required for this speed versus the energy which has been claimed it needs?
And it's still an article?
Slow news day I guess...
It looks like a math coprocessor, to be used in combination with a regular CPU. Remember how, back in the day, math coprocessors were an option for your Macintosh or 286 PC? Kind of like that.
No more having to wait a minute for it to load!
now they just need to hire Linus Torvalds and we'll be good to go!
Not just any old vaporware, HARD vaporware!
Sorry, got a 500 Server error when posting that, and didn't Plain Text it.
ClearSpeed Announces CS301 Multi-Threaded Array Processor to Deliver High Performance Computing and Power Efficiency
October 14th, 2003
Highest Floating Point Performance Chip Executes Complex Mathematical Applications in a Fraction of the Power and Time.
SAN JOSE, Calif., October 14th, 2003 -- ClearSpeed Technology, a leading provider of high performance, low power chip-based solutions, today announced the ClearSpeed CS301, a multi-threaded array processor that enables dramatic improvements in performance and power consumption for intensive floating point applications. At over 25 GFLOPS peak performance, the new chip provides more than twice the processing speed of competitive products. At 10 GFLOPS per Watt, power consumption is also twenty times more efficient. As a result, the CS301 delivers up to a ninety percent reduction in purchase price and running costs, making high performance computing affordable and available to companies of all sizes.
"With conventional processor design, increasing performance has tended to come with real penalties in power consumption and heat dissipation, to the point where computing cannot keep up with the demands of today's emerging applications and rapidly increasing volumes of data," said Tom Beese, CEO of ClearSpeed Technology. "The CS301 is designed specifically to meet those needs with high performance, power efficiency and full programmability in C combined into a single chip. The CS301 is the first in a family of ClearSpeed microprocessors that we believe will challenge present day thinking by creating a world where scientists, bio-informaticians, engineers and content creators alike can have access to high performance computing anywhere, anytime."
The CS301 is based on a multi-threaded array processing (MTAP) architecture and includes 64 processing elements, 384 Kbytes of on-chip SRAM and I/O ports interconnecting through ClearSpeed's ClearConnect(R) bus. Each processing element has its own floating point units, local memory and I/O capability, making the CS301 ideally suited for applications which have high processing or bandwidth requirements. The ClearConnect bus is a packet switched network that provides high bandwidth and low power consumption, supporting multiple concurrent transfers giving even higher aggregate bandwidth.
As a result, complex mathematically based applications such as computational biology and drug discovery, digital content creation, nanotechnology development, scientific research and financial modelling can now be executed in a fraction of the time.
"We are gratified to see the immediate high level of interest displayed by OEM's in the overall system improvements enabled by the CS301," said Mike Calise, president of ClearSpeed U.S. "The dual benefit of performance and efficiency is empowering companies to accelerate existing applications as well as inspiring them to explore new applications that were previously inaccessible."
The CS301 can serve either as a co-processor alongside an Intel or AMD CPU within a high performance workstation, blade server or cluster configuration, or as a standalone processor for embedded DSP applications like radar pulse compression or image processing. In applications where the CS301 is acting as a co-processor, dynamic libraries offload an application's inner loops to the CS301. Although these inner loops only make up a small portion of the source code, these loops are responsible for the vast majority of the application's running time. By offloading the inner loops, the CS301 can bypass the traditional bottleneck caused by a CPU's limited mathematical capability, executing the core of the application more than twice as fast as anything else in the marketplace.
"To deliver such high levels of performance with full programmability and outstanding gains in power efficiency is a very significant achievement," said Chris Piercy, president and chairman of the Northern Californ
....The announcement might be describing vaporware, but 25 Gflops at 3 W isn't so amazing that it definitely indicates vaporware. The ARM VFP9-S co-processor is about 0.4 Gflops for about 0.8 watt (about 1.5 Gflops for 3 watts). Keep in mind that it was introduced in 2001. 4 years and 15 fold improvement seems possible.....
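A quick sketch of that comparison, using only the figures quoted in the comment (0.4 Gflops at 0.8 W for the VFP9-S, 25 Gflops at 3 W claimed for the CS301). Both are vendor-style peak numbers, so treat the ratio as rough at best.

```python
def gflops_per_watt(gflops, watts):
    """Peak efficiency in GFLOPS per watt."""
    return gflops / watts


arm_vfp9s = gflops_per_watt(0.4, 0.8)   # figures quoted in the comment
cs301 = gflops_per_watt(25.0, 3.0)      # figures claimed in the press release

# How much better the claimed part would be at the same power budget.
improvement = cs301 / arm_vfp9s

print(f"VFP9-S: {arm_vfp9s:.2f} GFLOPS/W")
print(f"CS301 (claimed): {cs301:.2f} GFLOPS/W")
print(f"Improvement: about {improvement:.1f}x")
```

That works out to roughly a 16 to 17 fold gap, in line with the "15 fold" ballpark.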
So they now think we can go from the NASA Space Shuttle to Star Trek: The Next Generation in terms of computational speed. I find it very hard to believe that they can bring us this far in technology in such a short jump. To me it's like building the first jet and then going straight to the speed of a bullet with the SR-71 Blackbird. Maybe I haven't done enough research into their methodologies, but I'm very doubtful.
... best case, and 128 K of cache.
Unless this thing is working on highly specialized data sets, it doesn't matter how much data the core can mow through if it can't get the data fast enough. Why do you think AMD and Intel are so obsessed with their memory interfaces? There's little difference between the Athlon and the Athlon 64 besides large data width and fancy memory / SMP interfaces.
It would be quite difficult to get high performance IO through a PCMCIA bus. I can see its use for large matrix computations, but it is not generally useful as an "OS" CPU.
Unless they have some monster NUMA architecture RAM access will also suffer dramatically. RAM contention on a 64-way system would be AWFUL and there is NO WAY in hell they would access system RAM through PCMCIA. It is certainly an interesting idea but I do not see a way for this technology to be useful without lots of changes to existing software and program design. This is NOT like adding 64 CPUs to your home machine.
----- Refactoring is the reason why man does not mistake himself for a god.
I just heard it is going to be used in the Infinium Labs Phantom Console!
Manipulate the moderator system! Mod someone as "overrated" today.
If, on the other hand, they do in fact have a chip that is not too hard to program and can pound crunch numbers that quickly, it will certainly bring with it a revolution in high performance computing, and probably change the world as we know it.
Sounds like it might make a very nice DSP chip. However, lots of simple, non-contentious, non-overlapping floating-point computation is really not a problem most desktop or notebook users are struggling with. In fact, it's really not a problem that supercomputers are struggling with. There have been PCI cards with PowerPC chips on them for years. Curiously enough, these cards haven't ended up being used in many Top500 supercomputers.
A fast low-voltage DSP chip is interesting for a lot of applications, but not in the way that this press release describes the product.
you can actually be pretty sure of both a chips performance and reliability just from simulations these days
yes... it's true... imagine: they build the first bunch of prototypes, and with all that power, even if they don't achieve the 25 Gflops at that stage, those prototypes can be used to simulate and tune the second generation of chips and so on ... just a dream/thought
Cheers...
PS: I was tempted to say 'imagine a beowulf cluster of these'... yikes... I said it
Knight Rider, a shadowy flight into the dangerous world of a man who will soon be in prototype.
Gonzo Granzeau
"Nothing the god of biomechanics wouldn't let you into heaven for.." -Roy Batty
It's going to power the phantom.
"The market alone cannot provide sufficient constraints on corporation's penchant to cause harm." -- Joel Bakan
As always, these things will have to be fed data at a high rate in order to be completely utilised. I don't see current PC memory subsystems being able to do it, and as for a PCMCIA card, well forget it!
You'll probably only ever see a tiny fraction of their claimed performance.
to see if clearspeed can develop a CPU that can survive /.
In other news, slashdot.org announces that now allows the use of the html tag "
" in the comments and introduced a "Preview" button that shows the user how the comment will appear before the actual posting.
The chip is a vector processor like major 3D graphics cards.
Graphics cards used for general computing: it's a great idea and something that people on Slashdot, including me, have talked about every time a powerful graphics card has come out.
The compiler will be key.
Anything that can steal the thunder of Intel & AMD is fine by me. Their processors are heat-emitting loads of crap. If you noticed the low wattage that this new chip uses, it's because of the architecture and not fancy transistor technology.
Now you all know that Intel, Sun, IBM, and AMD do not make efficient processors.
There seems to be wild confusion from ClearSpeed about pricing. In the New Scientist article they quote a starting price of $16,500/chip, but in the Wired article they state you could get a PC with 24 of these chips in it for $25,000.
But either of those prices is pretty high for your average home user. Hopefully someone can give them strong competition without violating their patents.
From their press release:
ClearSpeed Technology, a leading provider of high performance, low power chip-based solutions, today announced the ClearSpeed CS301, a multi-threaded array processor that enables dramatic improvements in performance and power consumption for intensive floating point applications.
But what are these other "high-performance, low-power solutions"? Looking at their web site, the CS301 is the company's only product.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
You can make theoretical things on a VHDL simulator that you'll never be able to make into actual silicon. The real magic of companies like Intel, IBM, AMD, etc. isn't designing an uber powerful chip, it's designing an uber powerful chip that can actually be realized in silicon, and at a cost that makes it worth selling.
There has been more than one firm that has suffered from simulator disease. They get so caught up in making an awesome, ass-kicking theoretical design that will eclipse everything and everybody that they forget about the physical limits of actual silicon. They then find, when they try to really implement the chip, it just can't be done.
The basic idea is to have lots of "processing elements" that are basically ALUs with a bit of additional smarts (for branches mainly). Each PE has its own memory. The main processor (probably not the PC CPU) tells each PE what to do. Thus the name Single Instruction Multiple Data. Things are a bit more complex than this (branches, pointers, and a few other things cause some problems), but not too much worse. PE to PE communication is also interesting (the Maspar was a toroid as I recall).
The two basic problems with this type of a design are:
There are also a huge number of other problems. Caches don't generally do a darn thing for massive SIMD computers (if one processing element misses, they all do.) The memory usually has two types of pointers (one to the PE memory and one to global memory). I may contact the company to see if they want to hire a short-term consultant. hummm.... Have PhD will travel?
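To make the "one controller, many PEs, each with its own memory" idea concrete, here is a toy sketch. It is not ClearSpeed's design (nothing public says how theirs handles branches); the class and method names are made up, and it models the usual SIMD trick of turning a branch into a per-PE enable mask.

```python
class SimdArray:
    """Toy SIMD machine: one value of local memory per processing element."""

    def __init__(self, local_values):
        self.local = list(local_values)        # each PE's private memory
        self.mask = [True] * len(self.local)   # which PEs are enabled

    def set_mask(self, predicate):
        # Every PE evaluates the same test on its own data.
        self.mask = [predicate(v) for v in self.local]

    def broadcast(self, op):
        # The controller issues ONE instruction; only enabled PEs apply it.
        self.local = [op(v) if on else v
                      for v, on in zip(self.local, self.mask)]


def absolute_value(array):
    """'if v < 0: v = -v' written as mask, then broadcast."""
    array.set_mask(lambda v: v < 0)
    array.broadcast(lambda v: -v)
    array.set_mask(lambda v: True)  # re-enable every PE afterwards
    return array.local


print(absolute_value(SimdArray([3, -1, 0, -7])))
```

Note that the PEs masked off during the negate step sit idle, which is one reason branchy code runs poorly on this kind of machine.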
As Nietzsche famously said, "If you stare too long into the Abyss, 1d4 Tanar'ri of random type will attack you."
For a rousing discussion of what applications this chip could possibly have. It's being sold as a CO-processor, but what kind of bus will it be used on? It would seem that all variants of PCI/SBUS would choke.. we really need more information.
/* * pope1 */
The CS301 can serve either as a co-processor alongside an Intel or AMD CPU within a high performance workstation, blade server or cluster configuration, or as a standalone processor for embedded DSP applications like radar pulse compression or image processing.
..
... let's see
so there, it's a dsp. let's do some simple fictional estimates
strip a 3ghz p3 down to the XMMS core, make the ops 1 cycle/instruction, restructure the pipeline to loop only (no branches)
3ghz minus some overhead.. let's say 2ghz effectively, 2ghz*4 operands per cycle.. 8 gflops, and this strange beast wouldn't have been designed for it from the ground up anyway.
move along, nothing special to see here, dsps like that were able to encode/decode mpeg1 constrained in realtime when 386s were state of the art.
unfortunately for them, the proof is too big for them to fit in this margin...
-1, "1337" speak
it doesn't appear that the felonious wons are planning to delete themselves/surrender to the light? how long until the planet/population rescue is complete, & we're free (from the hostage taking scams of the corepirate nazis) again?
... parallel processing units may perform a lot more ops/sec/watt than one single unit. The speed of a processor depends on the time required to charge and discharge the stray capacitances of its connects, and the impedance of its transistors increases as the drive voltage decreases so the RC time constant goes up and the speed goes down. However, the energy required to charge the capacitance scales as voltage squared, so by accepting a hit on the speed (due to the voltage drop) you can do the same calculation with less energy. Clearspeed seems to be taking parallelism to the sub-processor level in order to reduce heat loads; their operations may take longer to complete, but they can do more operations in the same time as long as the code can use the processors in parallel. Thus the emphasis on "multi-threaded", because it wouldn't work otherwise.
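Here is a rough sketch of that argument in numbers. It leans on two textbook simplifications that are assumptions, not anything ClearSpeed has stated: dynamic power goes as C * V^2 * f, and the achievable clock scales linearly with supply voltage (no threshold voltage, no leakage). Units are normalized, so only the ratios mean anything.

```python
def dynamic_power(n_units, voltage, freq, capacitance=1.0):
    """Total switching power of n identical units: n * C * V^2 * f."""
    return n_units * capacitance * voltage ** 2 * freq


def parallel_equivalent(n_units, base_voltage=1.0, base_freq=1.0):
    """Power of n slow units matching the throughput of one fast unit.

    Each unit runs at base_freq / n_units. Assumption: clock is
    proportional to voltage, so the voltage drops by the same factor.
    """
    freq = base_freq / n_units
    voltage = base_voltage * (freq / base_freq)
    return dynamic_power(n_units, voltage, freq)


single = dynamic_power(1, 1.0, 1.0)
for n in (1, 2, 4, 8):
    print(f"{n} unit(s): {parallel_equivalent(n) / single:.4f} of the single-unit power")
```

Under these idealized assumptions the power for the same throughput falls as 1/n^2. Real silicon is much less generous, but the direction of the effect is the point.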
Scientists restrict study to entire physical universe; creationist
Aren't these used in the Phantom Game console?
Xaotik Designs
Graphics cards used for general computing: http://www.gpgpu.org/
Apple = altivec =vector processor.
Maybe if we decide to model "Life, the Universe and Everything?"
No, just modelling the surf breaking on a beach would need several beowulf clusters of these chips. Or the flow of gas through an airplane turbine. Or the weather in a small region of planet Earth. There are many non-linear systems whose simulation requires a lot more CPU power than is likely to be available in the near future.
And what about the human brain itself? Our current computers are still so far from the data processing capability in our brains that many people doubt it will be possible at all. Assume we have about 100 billion (10^11) neurons, and each neuron has about a thousand synapses. Assume the simulation of each synapse would need one hundred floating point operations per second. Therefore, to simulate the operation of a typical human brain one would need ten million Gflops, equivalent to a Beowulf cluster of 400000 of these chips. That's what it'll take to do the AI in Duke Nukem Whenever...
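The arithmetic in that estimate checks out. A minimal sketch, using only the assumptions stated in the comment (neuron count, synapses per neuron, operations per synapse) plus the claimed 25 Gflops per chip:

```python
NEURONS = 10 ** 11                 # about 100 billion neurons
SYNAPSES_PER_NEURON = 10 ** 3      # about a thousand synapses each
FLOPS_PER_SYNAPSE = 100            # assumed operations per synapse per second
CHIP_GFLOPS = 25                   # claimed CS301 peak

total_flops = NEURONS * SYNAPSES_PER_NEURON * FLOPS_PER_SYNAPSE
total_gflops = total_flops // 10 ** 9
chips_needed = total_gflops // CHIP_GFLOPS

print(f"Brain estimate: {total_gflops:,} Gflops")
print(f"Chips needed at peak: {chips_needed:,}")
```

That gives ten million Gflops and 400,000 chips, assuming every chip sustains its peak, which it would not.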
The 256K limit was in DOS 1.0 and the quoted person was Bill Gates. And I am that old. The first serious machine I worked with had 16K of usable memory.
Looks like middle age hasn't been kind to action hero Duke Nukem. In a prerelease press preview, presented by Joe Siegler, the studly hero is bald with a huge beer-gut.
"We wanted to flesh out the character of Duke", Siegler said, "we want to make him more a character that his fans can directly relate to".
In the new title, Duke is in a custody dispute with his ex-wife. Apparently, since he lost his job, he's in arrears on his child-support payments. When his (alien) wife kidnaps their kids and leaves for her mother's on Alogl II, it's butt-kicking time!
http://techzone.pcvsconsole.com/news.php?tzd=1973
that's right. as the lights come up, you might note the creator's interaction with yOUR environment, as well as interventions on the planet/population life0cide perpetrated by the phonIE felons/walking dead.
what happens next, is ALL about yOUR intentions/motives/behaviours.
you can pretend some more if you want/need to, but that doesn't really help. see you there.
Stock. Price.
The Kruger Dunning explains most post on
This technology has been around for years; US GOVERNMENT individuals have used this to crack codes and such. The add-on expansion card pops into a normal desktop slot or a laptop slot and gives them Super Computer Power. It was only about half as powerful as these, but it's been around... ;)
Onyxruby's law:
The amount of hype per inch produced by marketing doubles every 18 months.
With apologies to Moore
;) /me is reminded of when Apple tried claiming the iMac was a supercomputer.
Microsoft's next game engine is said to be based on Excel.
Not actually all that impressive. Texas Memory Systems has been selling comparable chips for several years. There are others.
http://www.superdsp.com/products/tm44.asp
Multi-Threaded Array Processor
ClearSpeed's multi-threaded array processor can be applied to any high performance computing application where large volumes of data can be processed in parallel.
The performance of conventional processors has been primarily driven by higher clock speeds. This inevitably means higher power dissipation and the resulting problems of system cost and reliability. These designs can no longer meet the requirements of emerging applications or keep up with the demands presented by the explosive growth of data volumes.
The multi-threaded array processor architecture provides an exceptionally powerful and scalable processing solution, based on an array of tens to thousands of Processing Elements (PEs). Each PE has its own local memory and I/O capability, making the architecture ideally suited for applications which have high processing and/or bandwidth requirements. The inherently scalable array architecture is also highly area and power efficient.
The processor can be used either as a co-processor sitting alongside an Intel or AMD CPU within a high performance workstation, blade server or cluster configuration, or as a standalone processor for embedded DSP applications such as radar pulse compression or image processing. In applications where it is acting as a co-processor, dynamic libraries off-load an application's inner loops to the processor. Although these inner loops only make up a small portion of the source code, these loops are responsible for the vast majority of the application's running time.
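The inner-loop offload pitch is basically Amdahl's law, so here is a small sketch of what it implies. The 90% inner-loop fraction is a made-up illustrative number, not a ClearSpeed figure; the 2x factor is the "more than twice as fast" claim from the press release.

```python
def amdahl_speedup(offloaded_fraction, accel_factor):
    """Overall speedup when a fraction of runtime is sped up by accel_factor."""
    serial = 1.0 - offloaded_fraction
    return 1.0 / (serial + offloaded_fraction / accel_factor)


def speedup_ceiling(offloaded_fraction):
    """Best possible speedup even with an infinitely fast co-processor."""
    return 1.0 / (1.0 - offloaded_fraction)


# Hypothetical application: 90% of runtime in offloadable inner loops.
print(f"2x co-processor:   {amdahl_speedup(0.9, 2.0):.2f}x overall")
print(f"100x co-processor: {amdahl_speedup(0.9, 100.0):.2f}x overall")
print(f"Ceiling:           {speedup_ceiling(0.9):.1f}x")
```

The part of the program left on the host CPU (and any time spent moving data to the card, which this ignores) caps the benefit no matter how fast the co-processor is.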
Imagine a beowulf cluster of these things!
Hasta la vista, Karma!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
When they have a device that delivers 200 GFLOPS with 64 threads, then I'll be interested.
You can certainly throw a bunch of ALUs on a grid (it's not so difficult) and claim GIPS, GFLOPS or whatever ... but you won't get similar speedups on real-life benchmarks, because no program will be able to use them all at the same time (btw, only professional bullshitters - read marketing people - use MIPS/GIPS/GFLOPS these days. The computer architecture community has realized for a LOONG time that they're not a good indicator of performance).
The Raven
There are too many uses for large scale computing which is plagued by huge power hungry systems.
TechNewsWorld had this as their last paragraph. If anything indicates the complete bullshit smell of this announcement, attaching it to a similarly wildly overhyped fad tech would be it.
I bet it might hit 25 gigaflops with an "optimized demonstration algorithm" with no cache misses, no branch misses, and heck, all the data is in the registers at all times, so it doesn't even wait for the cache.
Hey, I'm just your average shit and piss factory.
Getting high performance out of a chip really isn't that difficult (I know I'm understating a lot of the real knowledge underneath); however, the trick is doing it reliably. An Intel or AMD processor must be able to change between a wide variety of states (fixed to floating to OS commands) and be able to recover from any invalid state, so a lot of the chip is tied up in ensuring consistent operation. As I gather, they're just basically making an optimized floating point coprocessor (can you say 387? I knew you could) and I can bet that feeding it bad instructions would do bad things. This is why video cards are capable of doing such tremendous amounts of calculations far and above CPUs: consistent types of instructions and the attitude "screw up and you lock up the video card, so don't do that."
Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
"Soon to be in prototype" is synonymous with "does not exist"
Ah, but what the hell, let's post it to the front page anyway.
Computational Fluid Dynamics.
-- HG Pennypacker, wealthy industrialist and philanthropist
This looks to me like a venture backed bullshit company. Here are some clues:
Management team includes 2 people who are nobodies, just investors "putting together advisory boards..." and whatnot...
They are not hiring. WHAT!!! If they are really doing this, there is no frigging way they wouldn't be hiring.
The "about this company" says that they are a "leading provider of..." - a dead ringer give-away that the company is a tale spinner, since they have never provided a thing.
These news stories have been seeded in order to attract attention. The attention will be used to try to leverage a buyout (of patents, probably) or large-scale investment. I doubt they even have the silicon designed. Just BS.
Nothing to see here. Move along.
It should read: "Clearspeed Makes Tall Claims for Fictional Chip"
Been reading the Inquirer again too much? You smell a little bit cynical across the pond, you see.
michael tell me!
http://www.resconsys.com/products/controllers/CS30 .htm
My dad is the smartest person in the world.
Does anyone know if this company employs the same marketing/PR firm that handled the BitBoys?
Getting data in and out fast enough to feed the thing will be a problem. It will probably only achieve its rated speed when it's working intensively on small data sets. That's a typical DSP application. This might be a useful part for a software radio. They mention radar applications, which are basically software radios.
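A rough way to put numbers on the feeding problem is below. It assumes the card sits on plain 32-bit/33 MHz PCI (about 133 MB/s peak), which is a guess based on the "add-on PCI card" line in the coverage; the actual bus has not been announced, and the function names are made up.

```python
PEAK_FLOPS = 25e9            # claimed CS301 peak
PCI_BYTES_PER_SEC = 133e6    # assumed: 32-bit/33 MHz PCI peak bandwidth


def required_flops_per_byte(peak_flops=PEAK_FLOPS,
                            bus_bytes_per_sec=PCI_BYTES_PER_SEC):
    """Work per byte crossing the bus needed to keep the chip at peak."""
    return peak_flops / bus_bytes_per_sec


def sustained_gflops(flops_per_byte, peak_flops=PEAK_FLOPS,
                     bus_bytes_per_sec=PCI_BYTES_PER_SEC):
    """Bus-limited throughput for a workload with a given flops/byte ratio."""
    return min(peak_flops, flops_per_byte * bus_bytes_per_sec) / 1e9


print(f"Needed to hit peak: {required_flops_per_byte():.0f} flops per byte")
for intensity in (1, 10, 100, 1000):
    print(f"{intensity:>5} flops/byte -> {sustained_gflops(intensity):.2f} GFLOPS")
```

Unless the working set lives in the on-chip SRAM or on-card memory and gets reused heavily, the bus decides the speed, not the chip.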
That ratio of MFLOPS/watt would help for graphics processors, but they need to be architected differently. Still, if they've figured out a way to get FPU power consumption down, that's helpful.
Thus, it's not a useful device for desktops.
As for it being vaporware, it sounds like they have it running in simulation but haven't had it fabbed yet. They're offering it as "intellectual property", which means "buy the VHDL file". If they have good VHDL, and are honest about the simulation results, they could have valid performance numbers. But from the information given, we don't know that.
I'm sure that cyberdyne chip is working out well for them... But what are they going to do with the arm? Juggling? Labyrinth-esque sphere stuff? I kinda shudder to think...
The big print giveth and the small print taketh away - Tom Waits
MicroSoft the master of pre-announcing vaporware. Sometimes it eventually does work!
Some of the hardware design came from engineers in Bristol, UK, from companies like Division and INMOS (anyone remember the T800 and T9000 transputers, and a Microway board for parallel computing on a PC more than a decade ago?). The other half of the design team came from the UNC computer graphics lab in Chapel Hill, known for the PixelFlow and Pixel-Planes machines. Add to that a Taiwanese fab plant that would produce these SIMD processors with extra PEs (SIMD Processor Engines) to compensate for manufacturing errors. E.g., let's say the chip needed 100 PEs; they would manufacture it with 120. The ones that didn't work they'd switch off, and they wouldn't have to throw away the entire chip.
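The spare-PE trick can be illustrated with a simple yield model. Everything here is an assumption for illustration: PEs fail independently, and the 99% per-PE success rate is a made-up number, not PixelFusion or ClearSpeed data.

```python
from math import comb


def chip_yield(n_built, n_needed, p_good):
    """Probability that at least n_needed of n_built PEs work.

    Assumes each PE works independently with probability p_good.
    """
    return sum(
        comb(n_built, k) * p_good ** k * (1.0 - p_good) ** (n_built - k)
        for k in range(n_needed, n_built + 1)
    )


P_GOOD = 0.99  # hypothetical per-PE success rate

print(f"100 built, 100 needed: {chip_yield(100, 100, P_GOOD):.1%} of chips usable")
print(f"120 built, 100 needed: {chip_yield(120, 100, P_GOOD):.1%} of chips usable")
```

With no spares, every PE has to be perfect and most chips get thrown away; with 20 spares nearly all of them are sellable, at the cost of some extra die area.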
The story of PixelFusion was unfortunate. They could have rocked the computer graphics world with their scalable tile based rendering technology and efficient manufacturing methods. The programmable PEs would be able to handle both Direct X and Open GL. I suppose now they are trying to focus their investment and IP into more generic applications. I find their claims to be plausible because they have demonstrated innovative chips in the past.
My 2 cents
The chip will have 64 parallel FPU's. If it can complete one floating point operation per cycle, it will only need to run at about 350 to 400Mhz to reach 25GFLOPS (latency and pipeline issues aside, of course). Even if it requires 2 clock cycles, or the first 32 FPU's feed the second, we're talking about 700 to 800Mhz.
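The clock rates implied by that reasoning, sketched out. The only inputs are the announced 25 GFLOPS peak and 64 processing elements; one FPU per PE and the cycles-per-operation values are the comment's assumptions, not published specs.

```python
def required_clock_mhz(peak_gflops=25.0, n_fpus=64, cycles_per_flop=1):
    """Clock needed for n_fpus units to reach peak_gflops in total."""
    flops_per_fpu = peak_gflops * 1e9 / n_fpus
    return flops_per_fpu * cycles_per_flop / 1e6


for cycles in (1, 2):
    mhz = required_clock_mhz(cycles_per_flop=cycles)
    print(f"{cycles} cycle(s) per flop: {mhz:.1f} MHz")
```

So roughly 390 MHz at one operation per cycle and about 780 MHz at two, both of which are modest clocks by 2003 standards.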
I'm not certain, but I thought I ran across similar number crunching capabilities in Integer OPS. It seems to me to have been in regards to fibre fabric and switching.
Or I could be on crack.
Hm.
Moekandu
Mediocrity knows nothing higher than itself; but talent instantly recognizes genius. -- Sir Arthur Conan Doyle
Article also gives an SRP of $16,500; for that much it had better be fast! (Think of the Linux boxen cluster you could build for 16G.) It would be nice to be able to do a full Navier-Stokes simulation on my PC; maybe Santa will send me one...
Since you can get 6 GFLOP in a conventional x86 compatible CPU, why go to incompatible technology for a 4X speed improvement?
I will tell you in confidence that the US GOVERNMENT (as you so capitalized, would you care to be more specific?) is the sole entity keeping SGI alive.
No one would pop it into their desktop. Most spooks with desktops wouldn't be allowed to open their damn computers, lest they violate the service contract with whomever their superiors pork-rolled.
No, that kind of thing stays in the data center, where it can be bought by writing a few short purchase orders for inordinate amounts of budget padding money.
Fuck Beta. Fuck Dice
I wonder how many people work for Slashdot and own shares in Transmeta, which is coming out with the TM8000 right now, and is announcing earnings tomorrow. Full disclosure: I own Transmeta shares too.
Now, usually Slashdot greets these RSN products with glee and neglects to mention that they are vapor. Not this time, nosiree. Why? Because if it were true it would compete with Transmeta.
Not accusing anybody of anything wrong here... just... well... I've drawn my conclusions. You draw yours.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
I don't have a question on how these chips will plug in. Most likely their card will contain 2-8 of these chips, plus a controller and specialized RAM, all interconnected by their proprietary bus (mentioned in the press release). It will do a large chunk of the processing in isolation from the CPU and other system components, sending back aggregate results through the PCI-X or Hypertransport system bus.
I can't say they'll build a world-beating supercomputer out of them. They seem like a way of bringing "sort-of supercomputer" power to environments where having a few $65k 100 gflop workstations running specialized apps makes sense.
That's, of course, if they're not vaporware designed to generate another round of VC funding before they slip the projected ship date.
- Greg
Start a happiness pandemic
I remember the transputer. Good points. Didn't know they were a former graphics card company.
Explains their vector parallel processing knowledge.
But there could be a developing problem for them.
New Nvidia and ATI graphics cards can compete with this chip.
This chip sounds like a big parallel DSP. All those transistors on a Pentium 4 that go into the virtual memory system or the branch prediction or out of order pipeline juggling, in the DSP are dedicated to number crunching. I don't know how much the crunch power of this chip exceeds those of a current high end graphics chip (NVidia, etc). but it's probably not that big a ratio. The graphics chip also beats the heck out of a Pentium 4 in raw parallel arithmetic speed. The graphic chip is of course very specialized (for crunching display lists) while the DSP is a little more general (crunching 64-element vectors, like a Cray-1). It's likely that some multimedia applications like MPEG encoders can be sped up a lot by this chip, but don't expect it to make MS Word or GCC run any faster.
Here's the link.
graphics cards as general processors
You only have to cool it down to -1 Kelvin temperature and send it to sun's orbit with 1.1 times the speed of light.
love slashdot. populate it. use it. abuse it. hate it. kill it. miss it. stop following links, they only kill servers.
Intel has it on its P4, IBM has it on its RS64III and POWER4, Sun will have it on its Niagara... everyone is going the same way. But no one has found a way to transform serial code into parallel code. There's a simple Turing machine that converts parallel graphs to serial; no one has ever come up with a Turing machine that takes serial code and makes it "perfectly" parallel (every possible degree of parallelism). That's the holy grail that seems unreachable... until then, only some apps can scale with those CMT processors.
Maybe they can incorporate the "Not Yet in Prototype" Bit Boys Oy! Advanced graphics chip technology....
:)
That will be one awesome "Does Not Yet Exist" chip!
ClearSpeed Technology Inc. 20 North Santa Cruz Ave, Suite C Los Gatos CA 95030-5917
The key phrase is in the Wired story...
Ah, the key phrase here is that it came from Wired magazine, the abhorred wannabe magazine that has wannabe tech articles.
In other words... cutting edge is going to be expensive forever? I am very happy that today I can buy a great machine for doing CFD calculations for only a few hundred dollars, even cheaper if I put it together myself. Considering the length of the grant cycle, I usually have outdated technology... I suppose if this sort of technology becomes the leader, I would still have to have the resources of a large research university to get anything done. At least the way it is now, my home machine is faster than the one at work!
These guys have been around before.
I don't remember anything that wasn't vaporware..
http://www.model.com/news_events/pr/pixel.asp
What happened to this thing anyway??
Hey psy,
This is true iff the chip is using standard/existing fabrication tools, processes and development/layout tools. Looking at the articles, it seems like the chip is designed using traditional methods, so except for the "ClearConnect Bus", there doesn't seem to be any groundbreaking technology. I would be interested in seeing how a packet-based network linking 64 32-bit processors would be implemented on a standard piece of silicon.
From this chip's perspective I would like to understand how data will be shoveled in and out of the chip to allow it to run at full 25 GFLOPS performance - will the performance of the multiple "processing elements" be hindered by the lack of bus bandwidth? 384k ain't a whole bunch of memory for data AND programs for the 64 "processing elements" (only 6k per processor).
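A rough sketch of the arithmetic behind that worry. The 25 GFLOPS, 64 PEs and 384k figures come from the thread; the 4 bytes of fresh off-chip data per floating-point operation is purely an assumption for a worst-case streaming workload, not anything ClearSpeed has stated.

```python
# Figures quoted in the discussion.
PEAK_FLOPS = 25e9      # claimed peak floating-point rate
NUM_PES = 64           # processing elements on the chip
ON_CHIP_KB = 384       # total on-chip memory, in kilobytes


def memory_per_pe_kb(total_kb, num_pes):
    """On-chip memory available to each processing element."""
    return total_kb / num_pes


def streaming_bandwidth_gb_s(peak_flops, bytes_per_flop):
    """Off-chip bandwidth needed if every flop pulls in fresh data.

    bytes_per_flop is an assumed workload property (hypothetical),
    not a published spec of the chip.
    """
    return peak_flops * bytes_per_flop / 1e9


print(memory_per_pe_kb(ON_CHIP_KB, NUM_PES))        # kilobytes per PE
print(streaming_bandwidth_gb_s(PEAK_FLOPS, 4))      # GB/s under the assumption
```

Code that reuses each operand many times out of the local 6k would need far less bandwidth, which is presumably the kind of workload the chip is aimed at.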
myke
Mimetics Inc. Twitter
Sounds to me like a Blitter chip! I better check and see if this'll work with the TOS 2.5 ROM upgrade for my Atari 1040ST! Bonus points if it'll plug into the cartridge slot! I can't wait for the ST to beat the G5 using Cubase Audio now... :)
"Right now, somewhere in this world, Scott Baio is plowing a woman he doesn't love," - Peter Griffin, *Family Guy*
Without a working prototype they have nothing.
With a working prototype they still have not much.
With a working, and cost-efficient manufacturing process, they have something.
When there are compilers that actually can use this kind of thing, it starts to be something that is real.
My guess is they are about a decade from a reliable, usable and cheap product. Suddenly these numbers do not sound impressive at all...
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
It ain't going to run Windows or even Linux. But it will accelerate some difficult real world problems. There's probably even money to be made there - only a tiny percentage of CPUs sold go into PCs after all.
The key thing is low power. And low cost. OK. The two key things are low power and low cost. And scalability. And a compiler. And...
Who needs supercomputers when you can have this on your desktop.
And... I hear they have working silicon already....
If I understand the article correctly, it looks like they're implementing a much more powerful version of Apple's Altivec SIMD technology. My question is, if computing power increases 500x using this technology, doesn't memory bandwidth and system bus speed have to increase exponentially as well just to realize any gains?
It seems like putting one of these cards in a PC with today's technology would be like sticking a mainframe behind a 300 baud connection: sure it can handle millions of transactions a second, but you'll never actually see that kind of throughput because memory is so slow.
"When the president does it, that means it's not illegal." - Richard M. Nixon
you sure know alot about gay
I get the impression that they may have chips, but not ready-to-use boards for PCs, and that *those* are the prototypes they speak of. Other than that, this just sounds like the old "Transputer" idea recycled.
Heck, when the NSF built the five National Centers for Supercomputing Applications, one of the facilities used a bunch of FPS array processors to give them the number crunching power instead of a traditional IBM, CDC or Cray supercomputer.
Nothing special about these guys and I'd take what they have with a grain of salt. Unless they're offering half a teraflop for $500K out-the-door in real world workloads (fluid dynamics, oil exploration, weather simulations, finite element analysis, etc), forget it. 25GFlops peak? Big deal. It's what it does on real work that matters, not the amount of useless calculations you can perform. Plus, they don't even have pre-production silicon. The problem is the compiling and the programming. Unless you can get those tasks down and write code that can harness that power efficiently, they're nothing but expensive serial computers at best and boat anchors at worst.
Coincidentally, I have been building my own computing cluster at home. It will involve 7 P4 2.4GHz processors. Extrapolating from benchmarks on other clusters, I estimate the 7 nodes to be capable of about 30 gigaFLOPs.
Now they claim (2nd article above - TechNewsWorld) 64 processing elements to get an aggregate of 25 gigaFLOPs. A little quick math shows that each processing element has to be about equivalent to a 220 MHz P4 FPU. Definitely doable.
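The "quick math" can be reproduced roughly as follows. This is a sketch using only the numbers in this comment (7 nodes at 2.4 GHz estimated at 30 gigaFLOPs, versus 64 PEs claimed at 25 gigaFLOPs); the function name is made up for illustration, and it assumes P4 floating-point throughput scales linearly with clock.

```python
def p4_equivalent_mhz(chip_gflops, num_pes,
                      cluster_gflops, cluster_nodes, node_clock_mhz):
    """Clock a P4 would need to match one processing element.

    Assumes (hypothetically) that P4 flops scale linearly with clock speed.
    """
    flops_per_pe = chip_gflops * 1e9 / num_pes
    flops_per_node = cluster_gflops * 1e9 / cluster_nodes
    # Sustained flops delivered per Hz of P4 clock, per the cluster estimate.
    flops_per_hz = flops_per_node / (node_clock_mhz * 1e6)
    return flops_per_pe / flops_per_hz / 1e6


# Numbers from the comment above.
print(p4_equivalent_mhz(25, 64, 30, 7, 2400))
```

With those inputs the result lands a little under 220 MHz, which matches the figure quoted.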
But 3 watts? Hmmm, that might be stretching technology a little. VIA's processor chips, the lowest power general purpose CPUs I know of, achieve (they claim) about 7 watts for 900 MHz operation. Now, granted, the FPU seems about 1/2 as fast as P3 or P4 FPUs, clock for clock, but there's a helluva lot more than just an FPU on any of these chips. Maybe doable.
But the price! Jeeez, Louise! from the NewScientist article:
Beese says a single chip will initially cost around $16,500.
for 25 gigaFLOPs! My cluster is coming in at around $1500 for about the same performance. Admittedly, there are classes of problems that will perform better on such chips (anything that requires tightly coupled data, usually solved with shared memory among many processors) but I think they are off by a factor of 10!
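A small sketch of the price comparison, using only the figures in this comment ($16,500 for 25 gigaFLOPs peak versus roughly $1,500 for an estimated 30 gigaFLOPs). Both performance numbers are claims or estimates, so treat the ratios as ballpark.

```python
def dollars_per_gflop(price_usd, gflops):
    """Price divided by (claimed or estimated) gigaFLOPs."""
    return price_usd / gflops


chip = dollars_per_gflop(16500, 25)      # ClearSpeed chip, quoted price
cluster = dollars_per_gflop(1500, 30)    # home P4 cluster, estimated

print(chip, cluster)
print(16500 / 1500)       # raw price ratio
print(chip / cluster)     # price-per-gigaFLOP ratio
```

Either way you slice it, the gap is on the order of the "factor of 10" mentioned above.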
So I'll summarize some interesting key points:
1. The chip is fully programmable and an SDK including C compiler is available now.
2. The chip will be marketed as a coprocessor.
3. They expect to start selling them for around $16,000 in a few months.
Fun with Anagarams! LADS HOST, SHALT DOS. HAS DOLTS. AD SLOTHS, HATS SOLD. ASS HO, LTD.
Clicky, clicky.
Building multiprocessor chips, or chips from arrays of processors has become a fairly hot design approach. There are a number of companies using it. It seems to be especially popular in the reconfigurable computing area. There is an interesting paper here. These processors go well beyond the current crop of dual CPU core chips like the P4, Power 5, and Ultrasparc IV.
Clearspeed's chip is a static 64 processor array chip aimed at FPU intensive applications, but there are many more things that you can do with array designs.
Mathstar is building a reconfigurable chip with hundreds of elements available in various mixes of processors, memory blocks and other components. They are trying to replace ASICs and FPGAs as a platform for some part needs. There was a story on their architecture in EE Times a couple of months ago.
Intel is working on an array based processor aimed at the radio / communications market. I will be interested to see if their work with these chips ends up being used in other Intel chips. That could be deadly. So, the Pentium-X sucks at that task today? [Morph] Not now!
Philips has what they call Silicon Hive technology which is another reconfigurable processor of functional blocks.
There have been plenty of companies using arrays and reconfigurable techniques too, like Altera and Chameleon.
Sun bought up a start-up and is developing massively multithreaded processors based on the start-up's technology. They call it Throughput Computing. They claim that in about two years they will have a chip 30x faster than today's designs. I'll be very interested to see if they can do that.
The next couple of years should be very interesting on the processor front.
The T-800 transputer, you say? This must be proof that somebody indeed has found a neural-net CPU from the future.
Wait, wait, based on what we've seen of it running, that means the only effective language to program it in is gonna be a division of COBOL. Noooo!
Suicide.
Actually, this isn't terribly surprising if you look at the specs. It's a vector processor with 64 processing elements. Each PE has an FPU. The 25 gigaflop theoretical rating probably comes from FPUs * Clock_Speed, so the thing probably runs about 400 MHz. You have to understand that this isn't a general purpose processor --- you just send it some numbers to crunch, and it sends numbers back to you.
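That guess can be checked by inverting the FPUs * Clock_Speed formula. A minimal sketch, assuming each FPU retires one floating-point operation per cycle (the ops-per-cycle value is an assumption, not a published spec):

```python
def implied_clock_mhz(peak_gflops, num_fpus, flops_per_cycle=1):
    """Clock speed implied by peak = FPUs * flops_per_cycle * clock.

    flops_per_cycle=1 is an assumption; a fused multiply-add unit
    counted as two flops per cycle would halve the implied clock.
    """
    return peak_gflops * 1e9 / (num_fpus * flops_per_cycle) / 1e6


print(implied_clock_mhz(25, 64))      # one flop per FPU per cycle
print(implied_clock_mhz(25, 64, 2))   # if each FPU counts two flops per cycle
```

The one-flop-per-cycle case comes out just under 400 MHz, consistent with the estimate above.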
A deep unwavering belief is a sure sign you're missing something...
I just witnessed a machine that could simulate the motions of trillions of trillions of water molecules under different temperature configurations. It can do this at speeds so incredibly fast - billions of samples per second can accurately be made against any of the particles. Currently, the scientists are now working on a way of reading this data as fast as possible.
Let me describe the machine:
It's a plastic cube filled with water.
I work for the government.
Well, actually I work for the contractor that provides information systems to the DoD, IRS, Customs, Coast Guard, etc.
Honest to god... it's bunk. Paranoid ramblings.
COTS is the big push, or big iron. You have to realize that the end users are not that bright. They pay us to figure out what can make it simplest.
Put a card in each person's machine? SUPPORT NIGHTMARE. Not going to happen. They want us to deliver "something", a nice packaged-up solution. Usually it's based off commodity parts, so they can drop and switch vendors on a dime.
The NSA is the only group that might have been using something like that so secretly, but a little birdie tells me that they are basically the only people that buy the 9 micron Alpha EV7 chips, and the sole reason HP keeps that contract alive.
Those monsters would eat your "tv card" alive. ^_^;;;
Oh, and speaking of proprietary programming languages: who's going to teach them to learn it?
You might be confusing that with Ada (which the US Gov still likes for some reason), or Smalltalk. Powerful languages that only still exist in the government nowadays.
Did you know that Microsoft has its own programming language that looks like Objective C that they program all their operating systems in? And that every developer's computer is used in parallel when making daily builds of software!?
THAT's probably the closest thing to what you are implying.
Guys with the money in government are suspicious of technology, and costs. That kind of fantasy story is NOT characteristic of even crazy, tech-happy places like ONR or JPL.
Sorry. If you worked for SAIC or some place like that for a few weeks, you'd get the idea.
Take off the tin foil hat, man. I'm not saying the US Government DOESN'T have huge computing power, but it's not so silly sounding. Kind of boring, really. But very expensive, don't kid yourself.
The thing you might want to start worrying about is "netted sensors". I keep hearing that word thrown around a lot, and it's going to become real hard to sneak out of the country in a few years, I GUARANTEE THAT.
Fuck Beta. Fuck Dice
It is a SIMD machine. It looks like they've put some real thought into the software, which is the hard part in something like this. The debugger certainly looks pretty.
The ported C code on slide 13 is a bit scary. The intermediate language appears to rely on the compiler to distribute the workload to the PEs (otherwise why is the loop the same in both?). I'd much prefer the intermediate language give you complete control of the PEs rather than letting the compiler do it for you.
It does look like there are actual dies out there. Maybe not functional, but built. Also, it looks like the PE communication is more limited than I'd like. There are only two communication ports. I'd expect four if they want this architecture to scale past 64 PEs.
Other comments.
Given the above issues I don't think this thing is going to take off anytime soon for "super computer" purposes. The big win is high FLOPS per Watt, which isn't all that important for SCs (well, not that important). As part of a graphics processor or DSP I could see potential. I still think that in a few years (say 5 to 10) this type of thing will be a coprocessor on a fair number of those Linux clusters used for scientific computing. But this one isn't there yet.
I've seen people toss around $16,000 as a price point, but I can't find that anywhere. I assume I'm missing something obvious. At that price it is useless. It needs to be under $1,000, and really wants to be a lot cheaper than that to be interesting.
As Nietsche famously said, "If you stare too long into the Abyss, 1d4 Tanar'ri of random type will attack you."