Supercomputer On-a-Chip Prototype Unveiled
An anonymous reader writes "Researchers at University of Maryland have developed a prototype of what may be the next generation of personal computers. The new technology is based on parallel processing on a single chip and is 'capable of computing speeds up to 100 times faster than current desktops.' The prototype 'uses rich algorithmic theory to address the practical problem of building an easy-to-program multicore computer.' Readers can win $500 in cash and write their names in the history of computer science by naming the new technology."
What's wrong with Supercomputer On-a-Chip (c) ?
~
I call the "supercomputer on a chip" the "Cell microprocessor". Of course, next year, it won't be so super. But there will be a new one that's really super.
--
make install -not war
"Readers can win $500 in cash and write their names in the history of computer science by naming the new technology."
Is "Clippy" taken?
Vote monkeys into Congress. They are cheaper and more trustworthy.
We have microcomputers and supercomputers and nothing in between? Seems to be a bit of hyperbole involved here.
"National Security is the chief cause of national insecurity." - Celine's First Law
Looks like a cluster on a single board. The cleaning analogy is kind of stupid. If I had 100 people cleaning my house at the same time, they wouldn't get shit done. New twist on old technology.
I'm not sleeping, I passed out from holding my breath.
Signed,
The Cowardly Lion
'Space Heater'
Future Slashdotting in the Waiting (FSW).
CDE open sourced! https://sourceforge.net/projects/cdesktopenv/
I RTFA... It seems to handwave so much about parallel computing that it seems they haven't discovered anything. All I see is 'clock frequency can't increase, so we're going parallel'.... Surely, this can't be the extent of their research. The article claims it's 'easy to program', but there are zero specifics about why that would be the case. Can anyone tell me what they've done here (if anything)?
Assuming this actually works as detailed and the fine print on the claim isn't too onerous, there are three practical problems:
1. Many applications are limited by the speed of the user, not the computer. You can only type or click so fast.
2. Hardware would have to catch up to drive this beast. This would max out all known memory and storage systems. Not to mention your internet connection.
3. As has been mentioned time and again, until developers actually embrace multi-threading this will be relatively useless. Tests from various hardware sites have shown that going from the Core 2 Duo to the Core 2 Quad offers very little benefit except for a very small subset of users... who should probably be running workstations anyway (Video editing, 3D rendering, etc.)
However, I have a ton of HD content on my MythTV box that I would like to turn this processor and h264 loose on :) Maybe by the time this is a viable commercial product it will have more practical uses. (Remembering LOGO on my TI-99/4A... we've come a long way baby)
Vaporac. Vaporlon. Vaporium. Whatever...
Strange things are afoot at the Circle-K.
Supercomputing 2.0. Now, I'd like that 500 bucks in twenties, please.
Hard to tell from some of those "papers" since they seem to be written for kindergarteners - or journalists. But with that much parallelism I'm guessing that these computers basically allow "dataflow" style programming, with a certain amount of automatic decomposition, similar to the way PC chips decompose assembly into a simpler language on-chip.
But I want those $500. Maybe I could use it to buy a board with a chip that will actually provide some routine functionality on a short-term scale. Wouldn't that be the ultimate irony?
Bob
All the processors in the world won't do you any good if you can't write the software to harness them, and conventional lock-based techniques are really really easy to screw up. I'm really curious to see what those 'rich algorithmic' solutions they've got are.
My blog
Oh, for god's sake. I don't understand why this is getting so much press. It was stupid when it went up on Digg, and it's stupid that it's showing up here. This isn't substantially different from any of the other parallel architecture and programming work that's been going on for the last two decades. Their benchmarks are against embarrassingly parallelizable algorithms like matrix multiplies and randomized quicksort, things that any half-intelligent lemur (with a math and cs class or two) could get to run quickly. The hard part is speeding up your average desktop application which, I guarantee you, is not spending the majority of its time doing matrix multiplies.
On top of that, their "parallel extension of von Neumann" amounts to adding primitives to start and stop threads into the language. Again, any half-intelligent lemur (with a slightly different skill set from the first) could have done that. And I think a few actually have (at the risk of comparing language researchers to lemurs). It doesn't solve the underlying problem.
Oh, and did we mention no floating point and the lack of any memory bandwidth to get data into and out of this thing?
This is over-hyped research and shameless self-promotion, and for some weird reason the press seems to be buying it. Stop it.
I think SOC would SUCK as a product name.
This issue is a bit more complicated than you think.
kobatan.
I wonder if they can get the domain cheaply?
"Pulling together is the aim of despotism and tyranny. Free men pull in all kinds of directions." -TP
Brilliant! Even my mother had not thought of such an idea.
"OMG I gotta have It (TM)" or Deep Silicon :)
You know, autovectorization looks good on paper. But for most tasks, it really doesn't net you any benefit unless you can separate all your work into non-overlapping chunks. You can't have any interdependencies in your working set (or you risk expensive, non-scalable locking), and if you're all pulling from a single data source to split up the analysis work you'll spend a lot of time in contention for the pipe to that resource.
For example, it wouldn't make searching a database (scratch that, searching any data set) any faster unless the index was already pre-split among the processing units.
In this architecture the processing units have the same bus to RAM and disk on the front and back ends and have to deal with contention.
Your system is only as fast as the slowest serial part. Typically this is storage media, a network connection, or a memory crossbar. Processors really are fast enough for the non-embarrassingly parallel stuff. They are at the right ratio with respect to the other, slower busses to do most general purpose work.
If you want to do more than that, then it's other things -- storage media, memory, I/O busses -- that need to be multiplied in density and number. Only then can we see higher throughput.
Autovectorization is only good for things we already have offloading for anyway (TCP encryption, graphics, sound)... and for those general purpose cases like in Game AI where you might want a linear algebra boost NVidia has beaten these guys to the punch with the GP stream processing in the newest chips and the very flexible Cg language/environment.
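The "only as fast as the slowest serial part" point is usually written down as Amdahl's law. A minimal Python sketch follows; the 90% parallel fraction is an illustrative assumption, not a figure from the article, and `amdahl_speedup` is just a name picked for this example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many workers exist.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed workload: 90% of the run time parallelizes perfectly.
    for n in (1, 2, 4, 64, 1024):
        print(n, round(amdahl_speedup(0.90, n), 2))
```

Under that assumption the bound never reaches 10x, however many processing units are added, which is why the serial parts (storage, network, memory) end up being what matters.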
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
All entries become the property of the University and will not be returned. By participating, entrants agree to abide by and be bound by these Official Rules and the decisions of the University, which shall be final and binding with respect to all issues relating to this Contest. It is your responsibility to ensure that you have complied with all of the conditions contained in the Official Rules. The University is not responsible for any lost, late, misdirected, stolen, illegible, incomplete entries, or for any computer, online, telephone or technical malfunctions that may occur. The University is not responsible for any incorrect or inaccurate information, whether caused by website users, any of the equipment or programming associated with or utilized in the Contest, or any technical or human error which may occur in the processing of submissions in the Contest. The University assumes no responsibility for any error, omission, interruption, deletion, defect, delay in operation or transmission, communications line failure, theft or destruction or unauthorized access to, or alteration of, entries. The University is not responsible for any problems, failures or technical malfunction of any telephone network or lines, computer online systems, servers, providers, computer equipment, software, email, players or browsers, on account of technical problems or traffic congestion on the Internet, at any website, or on account of any combination of the foregoing. The University is not responsible for any injury or damage to participants or to any computer related to or resulting from participating or downloading materials in this Contest. 
If, for any reason, the Contest is not capable of running as planned, including infection by computer virus, bugs, tampering, unauthorized intervention, fraud, technical failures, or any other causes beyond the control of Contest which corrupt or affect the administration, security, fairness, integrity or proper conduct of this Contest, the University reserves the right at its sole discretion to cancel, terminate, modify or suspend the Contest and select winners from among all eligible entries received prior to the cancellation. Persons found tampering with or abusing any aspect of this Contest, or whom the University believes to be causing malfunction, error, disruption or damage will be disqualified. CAUTION: ANY ATTEMPT BY AN ENTRANT OR ANY OTHER INDIVIDUAL TO DELIBERATELY DAMAGE ANY WEBSITE OR UNDERMINE THE LEGITIMATE OPERATION OF THE CONTEST MAY BE A VIOLATION OF CRIMINAL AND CIVIL LAWS. SHOULD SUCH AN ATTEMPT BE MADE, SPONSOR RESERVES THE RIGHT TO SEEK DAMAGES FROM ANY SUCH PERSON TO THE FULLEST EXTENT PERMITTED BY LAW. The University reserves the right to correct any typographical, printing, computer programming or operator errors.
By redefining it.
Data parallel programming is a significant subset of parallel programming in general but it is relatively easy to get right to start with, so I don't see how XMT-C is such an advance.
Second paragraph of the rules:
THE FOLLOWING CONTEST IS INTENDED FOR PLAY IN THE UNITED STATES AND SHALL ONLY BE CONSTRUED AND EVALUATED ACCORDING TO UNITED STATES LAW. DO NOT ENTER THIS CONTEST IF YOU ARE NOT LOCATED IN THE UNITED STATES.
Even though there is a country field in the form. WTF?
They don't mention that on the form page, either. It peeves me just a little bit that they would do that, I mean, how many people actually read these conditions things, anyway? Can't say I'm surprised, though.
Call it Grendel - it has no ARM
how about naming it Vizi?
I could get more than that for naming my neopet.
How about....Glumphoof
An algorithm came up with it.
Pretty much the same with any multi-processor technology: shared resources like buses are the major limitation.
Engineering is the art of compromise.
But I doubt that's worth $500...
Skynet or Borg: both great, recognizable names referring to a massive supercomputer, or perhaps a massive cluster of nodes. Either way, both those names would pwn. Resistance is futile.
http://interserver.net/
I dub thee: SKYNET
"Most. Insightful. Post. Ever. ;)"
*smirk*
For all you youngsters, there is minicomputer.
Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
Transparel.
I will either nominate the name, "Giant Douche," or, "Turd Sandwich," depending on which one slashdotters vote for.
http://www.dinigroup.com/index.php?product=DN8000k 10pci
There you go! It's just a Virtex-4 development board. Nothing special. I mean, if they had used this graphic http://www.dinigroup.com/DN9000k10PCI.php it would have been a little more impressive.
"VaporWire"
"Parallel Lies Processor"
"iProcessor"
iPerbole©
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
They describe the same old massively parallel computing idea but gloss over the problems involved. This old chestnut keeps coming to the surface every few years but nobody ever seems to show any working hardware...
What about iCPU? has some other company already done the 'i' prefix thing? i mean like iPod or something like that?
Deep CPU
MP (Moon MacroProcessor)
MPU (Macrohard Processing Unit)
http://en.wikipedia.org/wiki/Transputer
"It doesn't cost enough, and it makes too much sense."
'capable of computing speeds up to 100 times faster than current desktops.'
So, how many laptop miles are this? If it has more power than one laptop mile, they could name it 'Milestone Computer'!
xyzzy :)
that word no one can pronounce.
from advent
How about "Puter," as in, "What? Did your mother purchase for you a PUTER for Christmas?"
A supercomputer on a chip....so it should be named Altivec?
Anyone remember the hype of the i860? Great on paper, but not so great in reality. I really hope this works though; the von Neumann architecture was always supposed to be a stop-gap (even vN said so, I think).
Bitter and proud of it.
It appears to be a few FPGAs. With FPGAs, you can optimize the logic to represent algorithms for faster execution than on general purpose processors. Simply put, you use more of the gates available on the chip. That appears to be what these guys are doing. It also appears that there is a single memory controller (I think that is what the QuickLogic chip is) and there is only one DRAM module installed on the board. It would be interesting if the board had a unified memory architecture. There is a separate Xilinx Spartan FPGA on the board that does who-knows-what, but I wouldn't be surprised if it was involved in communication with the processing chips. Of course, this is speculation, but it would seem logical for a board layout.
Just my thoughts.
The stupid web form always complained about illegal characters in a field without specifying which one.
'Once scientists, even the dim-witted social scientists, get muzzled, the Western Civilization is finished.' - oldhack
Why is this guy able to get patents on research financed by NSF and DOD? These should be assigned to the US Government or simply become public domain.
I think it goes Micro->Transputer->Mini->Supercomputer. Could be wrong.
On the other hand, most of these technologies seem kind of obsolete now, as distinctions are falling away.
Is how it benchmarks against, say, an nVidia Tesla (a GeForce 8800, with more, faster memory and no DVI connectors). I mean ok, you want to limit to just parallel kinds of benchmarks I can live with that, after all it is ok to design more specialized chips. However then let's see it go against a chip designed for that. Ya, an 8800 will eat shit on a calculation that's a single thread with a lot of branching. However you give it a task that can be highly parallelized and is straight through computation (like, say, 3D graphics hence the reason it is designed this way) and it flies like you can't believe. We are talking in the realm of 400-500 gigaflops (single precision) when it is crunching an ideal problem. That's your competition if you want to make a specialized parallel processor. As you noted, a desktop processor is a different animal, hence why a system that has an 8800 would still want a Core 2. What the Core 2 is good at, the 8800 is not.
Since of course that breaks down. Actually maybe it isn't so retarded since the same thing is true in many computing problems.
For example if you take the cleaning situation sure, adding a second cleaner will nearly double the speed it gets cleaned at. Adding four will probably close to quadruple it. However, it starts to break down after a while. At first the gains just start slowing down, as there's more people they have to spend more time talking and dividing up who does what than actually working, as well as doing work others have done because of a miscommunication. Eventually you have so many people that you start actually slowing down with each person you add, because they are getting in each other's way and taking up too much time with non-work.
That's fairly similar to what you get with a lot of problems in computation. You split the task in half, you can have 2 processors/cores/whatever execute it and nearly double your speed. However after a point you find that you can't split the task more, or that even if you can, it takes more time getting it all sync'd up than you gain from the multiple execution, or that contention in other parts of the system (like memory) holds things back.
The concept of "two is better than one, so 100 must be better than 10" doesn't hold up. There are almost always limits to how much you can divide up a task. Sometimes those limits are extremely high, but they are there. Unfortunately, for many tasks, the limits are pretty low.
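The cleaning argument can be turned into a toy model: the work divides evenly, but every pair of workers costs a bit of coordination time. A rough Python sketch, where the 300 minutes of work and the 0.05 minutes of overhead per pair are assumptions chosen only to make the shape visible:

```python
def run_time(workers: int, work: float = 300.0, overhead_per_pair: float = 0.05) -> float:
    """Minutes to finish: evenly divided work plus pairwise coordination cost."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    pairs = workers * (workers - 1) / 2  # every pair has to coordinate
    return work / workers + overhead_per_pair * pairs


def best_worker_count(max_workers: int = 100) -> int:
    """Worker count in 1..max_workers with the shortest modeled run time."""
    return min(range(1, max_workers + 1), key=run_time)


if __name__ == "__main__":
    for n in (1, 2, 4, 10, 50, 100):
        print(n, round(run_time(n), 2))
    print("best:", best_worker_count())
```

In this model the first few extra workers help almost linearly, the gains flatten out, and past the sweet spot each added worker makes the job slower. Real coordination costs will not follow this exact formula; it is only meant to show the general curve.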
There is nothing fundamentally new or effective here. First, there is no indication of performance, but like many other similar solutions it can probably handle only DGEMM, FFT, and crypto effectively; sorry, but for the thousands of other algorithms there is no future for you. The number one problem you need to solve to claim to be 100 times faster is to increase bandwidth by 100, and Amdahl's Law works against you. My bottleneck is the network or the disk and, if I could look at the memory bus, the memory. So do you deliver 100 Gbytes of network? 5 GB/s disks? 600 GB/s memory DIMMs? In fact you need all of that, since I stress all of it on my home PC. My 2 CPUs are idle 99% of the time, and when not idle they use only 10% of the cycles on average. Now, maybe you want me to buy this machine and give the cycles to SETI@home. A real alternative is that you made a major breakthrough in compiler technology. But without massive data flow, I don't see what new problem you can solve, even assuming you made a breakthrough on at least the 3 algorithms I cited above. So next time, instead of photos of prototypes that prove nothing, please offer diagrams of the architecture and details on bandwidth paths, latency, ... and explain where the innovation is.
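The bandwidth argument can be put into numbers with a roofline-style bound: attainable throughput is the smaller of peak compute and memory bandwidth times the kernel's arithmetic intensity. A rough Python sketch; every number here is made up for illustration and none comes from the article or the prototype.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline-style bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


if __name__ == "__main__":
    # Hypothetical desktop: 10 GFLOP/s peak, 5 GB/s memory bandwidth.
    # Hypothetical kernel: 0.25 floating-point operations per byte moved.
    baseline = attainable_gflops(10.0, 5.0, 0.25)
    # 100x the compute, same memory system.
    more_compute = attainable_gflops(1000.0, 5.0, 0.25)
    # 100x the compute and 100x the bandwidth.
    more_both = attainable_gflops(1000.0, 500.0, 0.25)
    print(baseline, more_compute, more_both)
```

With these assumed figures the kernel is memory-bound, so multiplying compute by 100 changes nothing until the bandwidth grows with it.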
For people interested in massive multicores, look at what Intel is doing. This is more serious, and they are at least trying to solve the memory bandwidth issue. This is a horribly hard problem, and it can ONLY be solved with massive investment and research that only a very few large companies can afford. All the others will disappear, because there is no market for a processor that costs more than $10 when we plan to have 32 on a motherboard.
Who can develop that in 32nm?
Again, this is NOT just hardware; it is even more a software problem.
How many people know how to program 2 Woodcrests effectively?
How many people will be able to extract more than 1% from those future massive multicores?
Does Windows use multicores effectively? And why not?
So what is the market for massive multicores?
There is a chance that Moore's Law becomes ineffective, not because it cannot deliver on the promise of doubling the number of transistors every 2 years, but because it is useless to do so.
I Can't Believe It's Not A Beowulf Cluster
DEC and DG are no longer. Today's marketeers allow no middleground - it's either a microcomputer (implying small), or a supercomputer (implying powerful). The terms used to have some meaning - now it's just marketing fluff.
"National Security is the chief cause of national insecurity." - Celine's First Law
How about Sybil, based on the chick with 16 personalities. personality one: I am the world's fastest parallel processing super computer personality two: I am a toaster personality three: I am a yo yo personality four: syntax error personality five: yes but do I run Linux? etc.......
The Tits. "You get that new supercomputer on a chip?" "Yeah, its The Tits." "Awesome."
You're doing it wrong--http://youredoingitwrong.mee.nu
Oh fer...
An article about a supercomputer on a chip and nobody memes the hell out of this one? The article doesn't even answer what I thought must be the most OBVIOUS question!
"But does it run linux?"
Yeesh!
how is babby formed?
Let's program an FPGA and write a cheezy 'spawn' scatter/join function to allow "desktop applications" to benefit from parallel processing.
And then let's tell the world we actually accomplished something that's in any way useful or new by doing this...
Someone's ego is in serious need of a parallel deflating algorithm.
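For anyone wondering what a scatter/join helper of that kind looks like, here is a minimal Python sketch. The name `spawn` and its signature are made up for this example and are not taken from XMT-C; it only shows the shape of the idea (run a body once per index, then wait for all of them).

```python
from concurrent.futures import ThreadPoolExecutor


def spawn(low, high, body, max_workers=8):
    """Run body(i) for every i in [low, high] on a thread pool, then join.

    Returns the results in index order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the whole iterator, so this line blocks until every
        # task is done and re-raises any exception thrown inside body.
        return list(pool.map(body, range(low, high + 1)))


if __name__ == "__main__":
    squares = spawn(0, 9, lambda i: i * i)
    print(squares)
```

In CPython the global interpreter lock means threads like these will not speed up CPU-bound work, so this illustrates the programming model only, not the performance claim.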
Plus, as a bonus, it connects to Monty Python via the Cheese Sketch.
Wish me luck.
How the hell is this single chip? TFA says the prototype has 64 processors. Speculating that the prototype may eventually be produced on a single piece of silicon is commonplace. People have speculated that just about everything would end up on a single chip by now.
"Suppose you hire one person to clean your home, and it takes five hours, or 300 minutes, for the person to perform each task, one after the other," Vishkin said. "That's analogous to the current serial processing method. Now imagine that you have 100 cleaning people who can work on your home at the same time! That's the parallel processing method."
100 people trying to clean my house at the same time would be slower than 1, because no one would be able to move or breathe. Which is exactly what makes parallel computing hard.
It's not wasting time, I'm educating myself.
At the moment, our software is mostly designed as a script. 1, 2, 3 we push the instructions onto the CPU. As you say, sequential.
But we already have a different way of thinking about getting information, client/server. With the Internet, millions of people get the information they need by asking a server somewhere. Instead of applications running sequentially on a cpu, shouldn't they be parallel by default, little bits of client code querying and updating little bits of server code.
Right. Not sure I'm with you there. 256 cores is a lot, and I doubt that the infrastructure of (e.g.) memory bandwidth and power supply would be able to keep up with such demands.
Right. You know, I'm sure the fastest desktop processor you could buy in June 2003 had a clock speed of about 3GHz. Clearly I'm imagining the availability of 4GHz chips on the market today. Yeah, sure, it's slowed down. It hasn't stopped, though. I'm also clearly imagining that Core2 chips achieve more calculations per core-second than Pentium 4 chips running at significantly faster clock speeds.
Basically, the entire thesis here is that improvement in individual processor core performance has been halted for the last 4 years. This blatantly is not the case.
Can somebody help me out here. I've never heard of this "random access machine" model. Are we talking about a von Neumann machine, or something else?
Well, duh. Thanks for enlightening us. To improve performance, minimize the number of steps that must be taken sequentially, and perform as many as possible in parallel, but don't make too many parallel ones either. Clearly revolutionary thinking, there.
Err, OK. Me, I thought starting several threads and making them all write to the same location would result in an unpredictable choice of the values written being stored in that location in standard languages. But then I don't have a PhD in parallel programming techniques, just ten years of industry experience writing multithreaded software, so what do I know?
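The situation described there (several threads all storing to one location with no synchronization) is easy to reproduce. A small Python sketch, with `concurrent_write` being a name invented for this example:

```python
import threading


def concurrent_write(values):
    """Start one thread per value; every thread stores its value in the same slot."""
    slot = [None]

    def writer(value):
        # No lock: whichever store happens last wins, and the scheduler
        # decides which thread that is.
        slot[0] = value

    threads = [threading.Thread(target=writer, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slot[0]


if __name__ == "__main__":
    print(concurrent_write(range(8)))
```

In CPython each store is atomic, so the result is always one of the written values, but the language does not promise which one (in practice it is often the last thread started). In C or C++ the equivalent unsynchronized writes to a plain variable are a data race, which the standards treat as undefined behavior.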
All successful general-purpose computers since the 1940s rely on the so-called von Neumann apparatus. Is there a way to upgrade, rather than completely replace, this successful apparatus to handle parallelism?
Yes. You place multiple von Neumann machines with access to the same memory, and provide structures for sending control signals between them so that a thread on one processor can start a new thread on another one. This is generally called SMP, and has been used extensively since a long time before this paper was written, so why are you even asking the question? There are of course alternative approaches (e.g. NUMA) that can provide better efficiency in some cases, but the basic question is answered already.
I've always wanted a computer named Steve...
"Glory is fleeting, but obscurity is forever." - Napoleon Bonaparte
Chuck Norris does not sleep. He waits.
This could be the bestest thing in supercomputing EVAR!!1!one!1
"Glory is fleeting, but obscurity is forever." - Napoleon Bonaparte
Here's the website of a class at Berkeley that is designing a totally new chip architecture, something actually innovative and quite interesting in my opinion. http://research.cs.berkeley.edu/class/fleet/ It's still a few years away from being practical, but they are hoping to have in-silicon test chips very soon now.
Your claims are valueless because 'yer anonymous, coward. I made it up!
illegitimii non ingravare
Tim?
All that really matters here is how fast it runs Microsoft Word and Excel. You may not like it. You may want to mod me Troll or Flamebait, but to 80%+ of the population, as long as their PC brings up e-mail faster than they can type, shows movies without dropped frames, and quickly runs Word and Excel, that's all they care about. Blazing Folding@Home scores simply don't translate to a computing experience improvement. It's either faster enough in MSOffice, or it isn't. Sad, but very true.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
what do you call a dog with no ears..... "it dont matter cuz he aint hear you no how"
--as told by my co-worker
> parallel processing on a single chip and is 'capable of computing
> speeds up to 100 times faster than current desktops.'
Toshiba plans on releasing a laptop in six months, complete with 448MB of RAM (512 - 64MB for shared video RAM).
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
AJAX / Web 2.0 stuff is mostly about re-creating the exact same apps that used to be built natively on the desktop, i.e., a new, more CPU/memory-hungry way of doing the same old stuff, but nothing fundamentally new. A blog is a web page that someone else hosts for you and that you can create without needing to know HTML. Social networking actually is looking to be pretty revolutionary, sociologically speaking, but technologically, it's just email and web pages.
It's (probably) true that there's a limit to how many GHz and GBs we can soak up by finding new, less-efficient ways to do the same things over and over again. So yes, a 100x speedup might seem unneeded. But that's failing to take into account the really new technologies that [could|will] come along -- stuff that would be as revolutionary as the CLUI->GUI change.
It's not even that hard to guess what sort of app this might be: offhand, I can suggest fully-immersive virtual reality and speech-controlled UIs, which have been predicted in sci-fi for decades. The only reason they haven't arrived yet is that our computers are still too damn slow to do it properly. (Voice recognition is just about now crossing the threshold of being "accurate enough", but that's just for recognizing the words you're speaking -- for a UI revolution to happen, we also need some major natural-language processing and AI advances.)
In short: Yes, there's plenty that we could do with a 100x faster CPU.
David Gould
main(i){putchar(340056100>>(i-1)*5&31|!!(i<6)<< 6)&&main(++i);}
... oops, sorry; those are already taken...
"Rich algorithmic theory" probably means "you cannae afford it".
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
I'm thinking Die on Fire would work nicely.