Supercomputer to Hit 1.6 Petaflops With 16,000 Cell Chips
tygerstripes writes, "IBM has announced that they are gearing up to build the world's fastest supercomputer, more than four times faster than the reigning champ, IBM's BlueGene/L. Nicknamed 'Roadrunner,' the new machine will be a hybrid of off-the-shelf CPUs and Cell chips designed for the PS3. Roadrunner is to be installed at Los Alamos National Laboratory, occupying 1,100 square metres of floorspace (that's a square about 110 feet on a side). According to the BBC: 'The computer will contain 16,000 standard processors working alongside 16,000 Cell processors... each Cell is capable of 256 billion calculations per second.'"
OS/2 compiles your homemade C code faster than you've ever seen before!
now we know why they cut shipments by 1 mil units. IBM wanted to build 62 supercomputers.
Just in time for the Vista RC1 release!
I guess.
IBM is also building a slightly slower computer, called "Wile E. Coyote", which is slightly slower. They are currently attempting to work out the bugs, as it keeps crashing...
Zagreus sits inside your head, Zagreus lives among the dead, Zagreus sees you in your bed and eats you in your sleep.
So, is this the reason why the PS3 release has been delayed?
The owls are not what they seem
This was reported a couple of days ago on el reg http://www.theregister.co.uk/2006/09/05/ibm_roadrunner_amd/
That is absolutely ridiculous. I want it!
For one, they gloss over whether they mean floating point operations or "calculations" per second. The article seems to equate a flop with "calculations per second". The flop, of course, came from floating point operation. Even then it's vague--is it single, double or double-extended?
Yes, it's certainly better than the old "megahurts" races. But I think they could come up with something better.
British 'billions' are American 'trillions'. So this may be running three orders of magnitude faster than you first expect.
Abacus to the millionth power!
but will it run Linux?
or...
but will it play ogg files?
has been identified as sub-standard components delivered by a third party company called "acme".
These components had a tendency to either explode at inopportune moments, or behave in a manner that, while true to the letter of their description, was totally ineffective for the desired purpose.
At the moment each side is gathering its hordes of lawyers and all involved are jumping up and down, waving thigh-bones in the air and screaming incomprehensible abuse at each other.
I am Slashdot. Are you Slashdot as well?
Roadrunner is to be installed at Los Alamos National Laboratory, occupying 1,100 square metres of floorspace (that's a square about 110 feet on a side)
.. enough for 4 interns including desks.
Why mix the units like that? It's either 33 meters a side, or it's 12,100 square feet. Mixing units is the sort of thing that can only lead to errors.
And for the record, sqrt(1,100 m²) = 33.17 meters = 108.83 feet a side. 110 feet per side gets you an extra 24.13 square meters.
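For anyone who wants to re-run that arithmetic, here is a quick Python sketch. The variable names are mine, not from the article, and the only outside fact used is the standard definition 1 foot = 0.3048 m. Note that carrying the unrounded square root through gives roughly 108.81 feet; the 108.83 figure comes from rounding to 33.17 m first.

```python
import math

M_PER_FT = 0.3048  # international foot, exact by definition

area_m2 = 1100.0                      # floor area quoted by the BBC
side_m = math.sqrt(area_m2)           # side of a square with that area
side_ft = side_m / M_PER_FT           # same side expressed in feet

# The summary rounds the side to 110 feet; see how much area that adds.
rounded_side_m = 110 * M_PER_FT
extra_m2 = rounded_side_m ** 2 - area_m2

print(f"side: {side_m:.2f} m = {side_ft:.2f} ft")
print(f"extra area from rounding to 110 ft: {extra_m2:.2f} m^2")
```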
http://twitter.com/onion2k
A Beowulf Cluster of these!
16,000 * $600 = $9.6 million. That doesn't seem like much for the biggest supercomputer.
God spoke to me.
The roadrunner is also the state bird of New Mexico, location of LANL.
http://en.wikipedia.org/wiki/Roadrunner_(bird)
It was always ironic to see them running up and down the road in front of my grandparents home.
The world is made by those who show up for the job.
The new machine will be a hybrid of off-the-shelf CPUs and Cell chips designed for the PS3.
So you're saying that if I buy one of these systems I'm going to be locked into the Blu-Ray format?
Interesting sidenote in the article not mentioned here:
"The laboratory is owned by the US Department of Energy (DOE). Eventually the machine could be used for a programme that ensures the US nuclear weapons stockpile remains safe and reliable, the DOE said in a statement."
Why do I get a weird feeling that I've seen this sort of thing in one too many movies?
1,000,000 - 16,000 more cells for IBM to fab by the end of the year
Change your name to Homer Junior! Your friends can call you Hoju
I've often heard that modern games have huge computing demands. But I didn't know it's that bad!
The Tao of math: The numbers you can count are not the real numbers.
And this is from BBC News, no less. <sigh>
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
And still it only runs F.E.A.R. at 25fps... weak...
I Like Pie...
I thought BlueGene/P was targeting a petaflop?
I don't think this Cell based thing is its replacement. If BGP is still coming, it should be coming soon:
link
http://news.zdnet.com/2100-9584_22-6112975.html
we'll laugh at such a large room full of computer equipment, the equivalent of which will be powering our mobile communications devices in a 150mm x 150mm package.
I know the cluster has ~2,200 4-socket dual-core 2.2 GHz Opteron systems (x3755) in aggregate, each socket tied to 8 GB consisting of eight 1 GB DIMMs. Each node will be connected to a blade using InfiniBand, the blade having 2 first-gen (i.e. PS3) Cell processors. I'm not sure how the second-gen Cell processor part will be (that's when it is supposed to get interesting for 64-bit precision operations, the only ones that count for top500). BTW the systems will also all be connected to each other by an InfiniBand fabric. I don't know if ultimately the Cell processors add up to 16,000 chips, but I do know the number of physical AMD parts will be about 8,000 or so, though you could either say 8,000 processors or 16,000 depending on how you count dual core...
Of course, there may be other challenges top500 wise beyond the first-gen cell limitations. I know the cluster is supposed to have some bits operate on classified problems and that will begin before the entire setup is there, while other bits are to remain working on unclassified stuff. I don't know how that impacts them, someone at LANL may be able to answer as to whether they could run a big linpack run once complete across the typically distinct units of the cluster. Of course, the first gen cell blades will not deliver remotely impressive top500 numbers, only 32-bit precision operations. The 1.6 petaflops number I'm not sure is intended to be a 64-bit precision number, and therefore isn't necessarily directly comparable with the BlueGene numbers on Top500.
Also, I'm not sure if the Cell blade is a proven platform for IO performance (i.e. pushing the InfiniBand). The blade is largely based on the PS3 reference implementation, afaik, and of course in designing that they didn't necessarily worry too much about high-speed interconnects. Of course the Cell blades have no high-speed graphics to worry about, so whatever communication mechanism is used for that may be redirected for inter-blade communications.
Other tidbits, the x3755 is a 4U box, and they have no more than 6 per rack (to leave room for bladecenters), and this means on the order of 400 racks or so. It will be running linux (that's nearly a given in the top500 nowadays). For a cell processor to qualify to be in one of the blades, all 8 SPEs must be workable, and all 8 will be usable by developers/users, the core os generally running only on the modest PPC core, unlike the PS3 which will contain a single cell part that may contain a failed SPE, and Sony reserves the use of one of the others at all times, limiting application developers to 6 SPEs, but the Cell blade of course doesn't have anything but serial console, so no gaming on cell blades....
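As a sanity check on the counts in this comment, a small Python sketch using only the figures given above (~2,200 four-socket dual-core x3755 systems, at most 6 per rack). The variable names are made up for illustration, and the system count is itself approximate, so treat the results as rough.

```python
import math

systems = 2200           # approximate number of x3755 nodes
sockets_per_system = 4   # 4-socket boxes
cores_per_socket = 2     # dual-core Opterons
systems_per_rack = 6     # "no more than 6 per rack"

sockets = systems * sockets_per_system        # physical AMD parts
cores = sockets * cores_per_socket            # if you count every core
racks = math.ceil(systems / systems_per_rack) # minimum racks for the x3755s

print(f"{sockets} sockets, {cores} cores, at least {racks} racks")
```

That lands close to the "about 8,000 parts, or 16,000 depending on how you count" and "on the order of 400 racks" estimates.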
But it costs $600 and doesn't even come with an HDMI cable and looks like a George Foreman Grill!!!!!!one111!
Oh wait, sorry, I just saw PS3 and had a Zonk-attack, my bad.
Poor Coyote, Road Runner will kick his ass as always. Is there a justice in this world?
Did the thought of this monster computer give anyone else an erection? Um...me neither...
I love Slashdot.
Though not necessarily 64-bit precision flops, as are required for top500 scores... The cell isn't impressive double-precision wise.
XML is like violence. If it doesn't solve the problem, use more.
This just means there will be one more annoying asshat bragging about his Counterstrike framerate.
Wow that's a lot of Playstation 3s.
Notice it says "Supercomputer to Hit 1.6 Petaflops With 16,000 Cell Chips" ... "To Hit" meaning, "Up to"? There's no way that all 8 cores in those 16,000 chips are going to be working 100%... Are they accounting for this factor?
just a thought... but would it be possible somehow to use old cellphones to build some kind of supercomputer???
"Yeah, but will it run Linux"?
"Imagine a BeoWulf cluster of these"
But is it fast enough to figure out the Answer to Life, the Universe, and Everything?
Introducing Microsoft Vacuum 1.0 The first Microsoft product that doesn't suck.
Can it run Windows Vista Ultimate
Reason #32767 not to use VB6: Integers are 2 bytes... Think about it!
I know the Opteron servers are going to be 35 million dollars of good ole taxpayer money, I think it's about 35 million for the first wave of cell blades, and 35 million more for the second and last wave of cells. Keep in mind that there is a lot more infrastructure, cost for high speed interconnects, and that the cells used in this must have all 8 SPEs pass as opposed to the PS3 cells which tolerate a failed SPE.
Let's see the Xbox 360 do that...
http://www-128.ibm.com/developerworks/power/cell/
The toolchain and a simulator are freely available and run on Fedora Core 5 systems. Take a look for yourself.
occupying 1,100 square metres of floorspace (that's a square about 110 feet on a side)
It is not abnormal to present both English and metric units in a presentation such as this when you have a diverse audience (the source was the BBC, although it appears the article submitter did his own math to get the side length). 110 - 108.83 = 1.17 feet, which is 1.06% off. That is more than acceptable when talking to the "common man" ... now when ordering the carpeting, on the other hand ...
I can't believe the scoop failed to mention if it will run Linux. I can't believe I haven't seen any posts asking that question, either. Nor have I seen anyone imagining Beowulf clusters of these. What's happened to /.?!?!
Please correct me if I got my facts wrong.
So this is where all the chips are going.
...Imagine a beowulf cluster of those.
They were responsible for the development of the Cell processor alongside Sony.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Mmmm......sacrelicious.
I'll take part of the bait - anonymously anyway...
Compiz - nothing else comes close to virtual desktop in 3D, Video demonstration of Compiz on Xgl (linked at the bottom of wikipedia page direct to video)
I've been using Windows since 95 on a P1 all through 98, NT4, ME, 2000, XP, and my current XP2003 with 200 and 300 gig hdd's, Athlon 64 X2 4200, XFX GeForce 7900 GT, 4 gigs of ram, and twin/dual display 17" lcd's, - and Linux is great...
BTW: did I mention I also put Linux on my laptop (an Athlon 3200 with a gig of ram). Not everybody that runs linux cheaps out on hardware - in fact, more windows users cheap out on machines than linux - what can you say about Dell's running XP? not shit. On the other side, what do you have that can compete with supercomputers such as the one this article is supposed to be about - IBM's next 1.6 petaflop supercomputer - which will run what? LINUX!!!
It's big. It's powerful. But, in simple human talk, how much GHz does that thing have?
*cough* XGL/Compiz *cough*
And a Google search for 'linux games' should probably shut you up about Linux having no games; change 'no games' to 'has very few commercially developed games', and you're right. But there are many games for Linux.
I know I'm not meant to reply to trolls, but I couldn't resist. I know that each OS has its strengths and weaknesses, and so should you. Tossers like you just piss me off. I have a dual-boot Windows XP/Ubuntu 6.06 system. I get the best of both worlds, so you can't give me shit like that and say that one is t3h suckzorz. If you want to just use Windows, go ahead. Tell everyone you like Windows in an appropriate manner. But no one cares that you hate Linux when all you have are flamebait arguments to use to try and put it down. If you actually have any decent arguments, maybe someone would listen. But, at the moment, as I said: No one cares. Being good with a computer is recognising different needs/wants and being able to find and use or make programs to sort out those needs/wants. If the want is games, then Windows is probably better. If the want is a usually rock-steady, fairly secure system for everyday use, Linux could be the way to go. If you like open-source, you have one real choice. If you'd prefer closed source (for whatever reason), Windows is the way. Or have a dual boot, and your problems are sorted. Or, take an alternative route and get a Mac or use a BSD. Whatever, just recognise that different pieces of software are better at different things.
How about a nice game of chess?
Great, maybe the beginning of a new family of RISC processors which could be widely used and, much more relevant, is cheap enough to be buyable by an average/power user for home usage.
But the real question is: where is the Linux/UNIX-powered Cell workstation, so I can run such an elegant baby on my desktop? It was announced years ago.
Anyone else bothered by the fact that the Novell demonstration of Compiz shows pirated movies on the computer being used for the demo?
"IBM says it will start shipping the new supercomputer later this year."
:)
You can preorder at http://www-03.ibm.com/systems/clusters/
...that Windows has become so bloated in the previous 20 years that it still just barely crawls on our pocket supercomputers.
...we're planning on sending it to Mars .
This space intentionally left (almost) blank.
I can do 1.8pflops with a #2 pencil, some scratch paper, and a few grams of peyote.
Building a machine that has several petaflops is one thing... somehow I haven't heard anyone talk about programming the bloody beast. I'm not afraid of multithreading, but keeping 16,000 processors busy in a meaningful way (emphasis on the last four words, i.e. not as a simple load sharing farm) sounds like a non-trivial problem to me...
Somehow I'm less worried about finding problems for it to chew on. I think that works a bit along the lines of build the biggest cruiseship and people will be buying tickets just because they want to make a trip on the biggest cruiseship (even if they know in advance that the second biggest cruiseship is probably a lot cheaper and maybe even more fun because it's less crowded (although I am at a loss when asked to explain why people even want to go on a cruiseship at all)).
That's what I really want to know: how many FPS does it give us?
* Winners compare their achievements to their goals, losers compare theirs to that of others.
Imagine a .... posting like this gets by without a single "beowulf cluster" comment. Wouldn't that be something.
I heard a rumor that this computer is being sold to Sony as a prototype for the Playstation 4. It's supposed to be totally teh r0x0rz and only cost a gabrillion dollars. Another rumor says that the PS4 prototype may be portable which could explain why Sony is receiving large orders of batteries.
In the near future, there will be an article on the PS3's hardware specs.
Then some unimaginative Slashdotter will say "Imagine a Beowulf cluster of these!"
Then a (slightly) more imaginative Slashdotter will direct them to this article.
"When Roadrunner is finished in 2008 it will cover 12,000 square feet (1,100 square metres) of floor space at Los Alamos National Laboratory. IBM says it will start shipping the new supercomputer later this year." I'd like to order one myself. Do they need to use the Airbus 380 for delivery?
An SPE is essentially a speedy stream processing unit, kinda like an FPU. You give it a stream of data and tell it to perform operations. It has a limited scope of functionality. Each Cell unit has at its core a low-speed PPC core for general processing and managing the SPEs. The user software requests operations be done on the data by SPEs, like FPU programming (or, more applicably, SSE instruction-style stuff), but generally a processor has one FPU, so leveraging multiple SPEs is a tad more complicated... I think that's a fairly mundane rough explanation....
Now installation and integration of Linux onto a system like this isn't as esoteric as many would think. There are some complexities due to the sheer number of systems, but generally in clustering you have a fairly normal Linux running on a node. Generally installed via network (can't imagine installing hundreds to thousands of nodes via physical media), but the end result on a typical cluster is a bunch of Linux boxes installed similarly to a normal server system, with probably a lot fewer packages though (no DHCP server, etc etc). The 'magic' of clustering is almost always done mostly in userspace, with schedulers/resource managers kicking off commands on each node (sorta like knowing the right time to ssh into each node and type commands manually and doing it fast), and some MPI implementation provides an API and mechanism for processes on disparate nodes to send data between each other. Common free ones are mpich and lam/mpi. A lot of cluster interconnect vendors (Pathscale (now Qlogic), Myrinet, Voltaire, Topspin (now Cisco)) ship MPI implementations specifically optimized to leverage their hardware to achieve really fast inter-node communication (mpich on ethernet may get about 50 us latency or so in a decent environment, Infinipath can get about 1.3 us latency for example).
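To give a feel for why those latency numbers matter, here is a back-of-the-envelope Python sketch. The two latencies are the rough figures quoted above; the message count is a purely hypothetical workload I picked for illustration, and the model ignores bandwidth, overlap, and everything else a real MPI job would care about.

```python
# Rough per-message latencies from the comment above, in seconds.
ETHERNET_LATENCY_S = 50e-6     # mpich over ethernet, "about 50 us"
INFINIPATH_LATENCY_S = 1.3e-6  # Infinipath, "about 1.3 us"


def latency_cost(n_messages: int, latency_s: float) -> float:
    """Total time spent purely on latency for n small back-to-back messages."""
    return n_messages * latency_s


n = 1_000_000  # hypothetical number of small messages a process exchanges

eth = latency_cost(n, ETHERNET_LATENCY_S)
ib = latency_cost(n, INFINIPATH_LATENCY_S)
ratio = ETHERNET_LATENCY_S / INFINIPATH_LATENCY_S

print(f"ethernet:   {eth:.1f} s of pure latency")
print(f"infinipath: {ib:.1f} s of pure latency")
print(f"ratio:      {ratio:.1f}x")
```

For chatty codes that exchange many small messages, that factor goes straight into wall-clock time, which is part of why the interconnect gets so much attention.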
This *particular* config is a little more complicated as among the 2,200 or so x3755 systems, only 16 of them will have any local storage, the rest will be booting diskless, running a small image in RAM root and utilizing network filesystems for other stuff (and just sending the data across IB for another node to record, I don't know specifically how they will handle their data, only that they won't have disks). See http://www.warewulf-cluster.org/cgi-bin/trac.cgi for a system for managing diskless images in a way amenable to clustering. Diskless is appealing particularly at scale as you can centralize your rotating disks into one place and manage the higher failure component more easily. Additionally, being a confidential site, physical security is also easier when the nodes have no persistent storage that someone can yank or misplace without thinking. With this, the only thing left in the nodes that moves are fans. BTW redundant fans are omitted from this config, which also makes sense at scale because the scheduler can work around nodes that are down and not be too terribly impacted by individual node outages.
There is MOSIX clustering which has a lower level kernel based system that makes a bunch of different nodes look like a big NUMA box, and applications with normal multithreading may run on different nodes without making special API calls, but at large scale this is less popular. It's a little less manageable in some respects, it's harder to programmatically recognize the difference between intra-system and inter-system communication (applications may be written to MPI and use multithreading and leverage processes on the same node differently than processes external to the node), and developers generally would have to tailor the app for massively parallel execution anyway, so using a different API isn't as much of an impact. At low scale MOSIX is a neat way to run a lot of multithreaded non-interactive applications (media transcoding and the like, generally multithreaded but not usually written against MPI or even PVM).
Here's an explanation. Keep this in mind whenever you read PR about vapor hardware... Most likely the confusion between FLOPS and "calculations per second" is not unlike the confusion between peak PR numbers, peak Linpack results, sustained Linpack results, and sustained application FLOPS. For example, no Cell processor ever reaches the impossible speed of 360 GFLOPS on any real world scientific application because of the real world problems of a slow interface to memory, storage, network, etc. which all chips have to contend with. When numbers are being used in a press release, all vendors in the industry benefit greatly from using whichever number is the largest and most impressive to the reader, even if it is completely impractical to a supercomputer user. Also, there can only be theoretical Linpack numbers for a machine that isn't built yet, so they have a rationale to explain such behavior.
-Those who would give up essential liberty to purchase temporary safety deserve neither. -Ben Franklin
I think this is an admission from IBM that the Cell processor has some good ideas, but is not an ideal implementation. I bet that is why they need many other general purpose CPU's in the computer.
The one main "CPU"(PPE) in the cell processor is far too weak, I believe we will find game developers harping on that for the next few years.
Maybe the next version of the cell will have a main processor that is much wider and more robust. Maybe even two of them?
You hope... Or we could be hitching horses to a buckboard to ride into town when the cheap oil runs out and there is no good energy substitute available.
Tired of all the isms, don't exploit people as an employer, or a government, mmmmK?
but will it play Counter-Strike?
"One man's "magic" is another man's engineering."-- Robert A. Heinlein
You have a source on that flamebait?
How to use coral cache: http://slashdot.org.nyud.net:8090/~oscartheduck
As always, all IMO. Insert "I think" everywhere grammatically possible.
Now we know why the PS3 is going to be late. Component shortages, according to Sony. 16,000 working Cell processors for this, and I'll bet it doesn't even play Halo!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Yes, it is, in fact, running Linux.
According to today's Austin American Statesman article, the other 16,000+ CPUs in this machine will be AMD Opterons.
And, the article also confirms that the machine will indeed be running Linux.
These '16,000 standard processors working alongside 16,000 "cell" processors' are Opteron processors. ..and it will run Linux.
http://www-03.ibm.com/press/us/en/pressrelease/20210.wss
Ok, going to assume DP flops because that's more fun...
First, they have 16,000 2.2 GHz Opteron cores, at 2 DP flops/clock, that's... 70 teraflops Rpeak from Opterons.. ok, not that exciting, but it would count...
I know their first cells will be the PS3 variety cells, so no DP fun there...
But yes, the final wave of cells are intended to be DP calculation oriented for this cluster...
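The Opteron Rpeak estimate above is just cores x clock x flops per clock. A minimal Python sketch of that, using the figures from this comment (16,000 cores, 2.2 GHz, 2 double-precision flops per clock); the names are mine, and the comparison against 1.6 petaflops assumes the headline number is a peak figure of the same kind, which nobody has confirmed.

```python
cores = 16_000            # Opteron cores, as counted in the comment
clock_hz = 2.2e9          # 2.2 GHz
dp_flops_per_clock = 2    # double-precision flops per core per clock

rpeak_flops = cores * clock_hz * dp_flops_per_clock
rpeak_tflops = rpeak_flops / 1e12

# Share of the advertised 1.6 petaflops (assumed comparable, see above).
share_of_headline = rpeak_flops / 1.6e15

print(f"Opteron Rpeak: {rpeak_tflops:.1f} teraflops")
print(f"Share of 1.6 petaflops: {share_of_headline:.1%}")
```

So on these assumptions the Opterons would account for only a few percent of the headline figure, with the rest expected from the Cells.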
Does it run Windows for Supercomputers?
My new blog
You must be new here...
Funny, in the article when they talk about Folding@Home on the PS3:
The Stanford researchers say that 10,000 consoles running the program would give a performance equivalent to one petaflop. The team hopes eventually to enlist 100,000 machines.
Odd how they need 16,000 Cells and 16,000 Opterons to get a theoretical 1.6 petaflops, maybe the supercomputer is using Cells with only 5 or 6 SPEs enabled?
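Working out the per-chip numbers implied by both claims suggests they may be closer than they look. A rough Python sketch, assuming (my assumption, not something either article states) that the whole 1.6 petaflops is credited to the Cells and that one PS3 counts as one Cell:

```python
PETA = 1e15
GIGA = 1e9

# Folding@Home claim: 10,000 consoles ~ 1 petaflop.
folding_per_console = 1 * PETA / 10_000

# Roadrunner headline: 1.6 petaflops from 16,000 Cells (Opterons ignored).
roadrunner_per_cell = 1.6 * PETA / 16_000

# BBC figure from the summary: 256 billion calculations per second per Cell.
bbc_total = 16_000 * 256 * GIGA

print(f"Folding@Home: {folding_per_console / GIGA:.0f} gigaflops per console")
print(f"Roadrunner:   {roadrunner_per_cell / GIGA:.0f} gigaflops per Cell")
print(f"16,000 Cells at the BBC's per-chip figure: {bbc_total / PETA:.3f} peta-ops")
```

Under those assumptions both claims work out to the same 100 gigaflops per chip, while the BBC's 256 billion per Cell would imply about 4.1 petaflops, so the headline number is evidently not simple peak multiplication.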
How many football fields is that computer?
Is it just a coincidence that you didn't mention that the "off the shelf" chips are AMD's, and that it's because of AMD's Torrenza that they'll be able to join two different architectures in one single computer?
with the power like this, you DON'T have to imagine a Beowulf cluster of these.
You can probably run all games released for the last 20 years on this sucker IN PARALLEL.
With all that speed, I can just imagine what a gamer would do with it. Probably faint at first, get up, and faint again....
_____
"Well Mr. Smarty Pants if that was so easy, then why did it take you 5 hours to figure it out?!"
"I hope you'll be happy when you run Linux and its cli interface"
Yeah, why do they use all those buttons in planes and stuff!!!?!?!! It would surely be a lot more efficient if they just used a one button mouse to fly them? In fact, get rid of the buttons altogether!! Evil!
which is totally what she said
With supercomputers, the massive interconnect system is easily more expensive than the processors and the memory.
Of course, that's why they beat cluster computers or distributed apps silly in their home turf tasks. BTW, big props to BBC for a delightfully clear explanation of this key difference, right in TFA!
PowerPC or x86 or something else?
Like this...?
Meta will eat itself
Yeah, like that, but much more intuitive! Input is formed from a simple, easy to learn set of 102 mouse gestures!!
which is totally what she said
This thing is powered by roadrunners with friggin' lasers in their heads!
I love the /. main page right now: IBM announce a 1.6Pflop supercomputer with 32,000 chips. Meanwhile, over here, a team of volunteers have recreated part of the Enigma-cracking Bombe !!!
Eat our high-tech dust yankees !
Mark, from somewhere in the UK
As the CEC (Chief Operating Coyote) and primary beta tester for ACME (A Company Making Everything) I will soon have in my possession a clone of the hardware which I intend to XOC while loaded with Vista RC1. Beep beep my ass, Roadrunner will finally be mine, mine I say, all mine, hehehehe !!!!!
/. OOPS! /.
wilec (super genius)
.
.
ACME is a wholly owned subsidiary of Microsoft Corp.
God... raytracing 3D animations used to really suck on my old abacus.