NCSA To Build $53 Million, 13-Teraflop Facility
Quite a few readers submitted news of a distributed system to be built by four U.S. institutions (mostly) out of IBM computers, and paid for with a whopping grant. DoctorWho and november write: "'The National Science Foundation has awarded $53 million to four U.S. research institutions to build and deploy a distributed terascale facility...' A link to the press release is here." An anonymous reader contributed a link to coverage on Wired, and GreazyMF to coverage of this story at the New York Times.
It's a $53 million (US$) project. Using about a thousand CPUs at, say, $1,000 (US$) each, you have an expense of $1 million. Clearly, the cost of the CPUs is not where the project will be limited; it's the cost of integration. More processors that are individually cheaper will likely have a higher integration cost and therefore be more expensive, not less. The real question is why they chose Itanium, which is really an unproven technology.
I assume we've never really used AIX?
Intel had an article on this on their internal web site today; it went on and on about the Itaniums used in this system. But not once did they mention the OS used!! I don't think Intel wants to be associated with Linux.
Custom systems- whether completely novel, or a
scale up of a commercial system- always have
very high overheads.
First, you have a dedicated hardware and software
support crew. A production system amortises
this over multiple deliveries.
Second, you are pushing the envelope. Though it
looks possible on paper, you don't always know
what won't scale up properly in a cutting edge
system.
Third, educational institutions (U of I) charge
large overheads (around 50%) for existing buildings/staff.
The largest systems just don't get built
unless the government subsidizes some of the costs.
If you are lucky, the contracting company learns
new things to help its commercial side.
I do think it is:
I (gasp) -- LOVE (huuugh) -- THIS (aaarrr) -- COMPANY (shhhhlop)
The machine in Britain would barely rank #48 on top500.org, so what's your point?
120 characters isn't enough to explain it.
13 teraflops is a pretty big toy.
"If Microsoft had offered common external interfaces in the first release of NT, and not those bloated buggy proprietary standards years later, they might actually have managed to produce a useable OS that enterprises could then integrate into their existing data centres, rather than boxes that perform tasks in independent installations."
Ah, but then there would be no incentive in the future to replace those machines. Microsoft, as the subscription based licenses show, cannot merely sell a product and live off the income. That's not how you maximize profit. You keep them paying, and make sure they can't pay anyone else. That's how a monopoly works - you don't play nice with anyone else.
"I object to doing things that computers can do." -- Olin Shivers, lispers.org
In 2003 IBM brings its petaflop Blue Gene computer into operation at a cost of 100 million... that would bring the price of a quality 10-teraflop machine into the 1 million range... I think that makes for some interesting possibilities...
This project will flop most terably
sorry
A former co-worker of mine could make an NT server scream. It had uptimes of a year or more.
:)
I bet he just used VMWare to run Linux on top of it
-Billco, Fnarg.com
Because most (if not all) of the applications of this super-cluster will probably be research. Scientific research. Scientific research that requires insanely high-precision numbers. 64-bit processors go a long way toward high precision without using any scary high-precision math libraries. Or the scary high-precision math libraries that you do use can be tweaked for 64-bit processors, resulting in faster math. That's the name of the game here. Faster math.
Beyond that, you really need as high a processing-power to memory-transfer cost ratio as possible. When you are dealing with highly coupled simulations (such as wireless simulations) you pay dearly for cross-processor memory IO.
something clever
With 450 Terabytes, we can give almost 7 generations of people music for a lifetime, without repeating...
Let us examine:
450 Terabytes
4.5 MB per MP3 (average)
That's 100,000,000 MP3s
Let's take the average length of a song = 3.5 min.
That's:
350000000 Minutes
5833333.33 Hours
243055.55 Days
34722.22 Weeks
667.73 Years
Are there any venture capitalists interested in this idea? I think that this could be one great consumer service!!
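For anyone who wants to re-run the numbers in the comment above, here is a minimal sketch. It assumes decimal units (1 TB = 1,000,000 MB), the poster's 4.5 MB and 3.5 minutes per track, and the poster's 52-week year; the function and constant names are made up for illustration.

```python
# Back-of-envelope check of the "450 TB of MP3s" estimate.
# All inputs are the comment's own assumptions, not measured data.
TOTAL_STORAGE_MB = 450 * 1_000_000  # 450 TB expressed in MB (decimal units)
MB_PER_TRACK = 4.5                  # assumed average MP3 size
MINUTES_PER_TRACK = 3.5             # assumed average track length


def playback_estimate(total_mb=TOTAL_STORAGE_MB,
                      mb_per_track=MB_PER_TRACK,
                      minutes_per_track=MINUTES_PER_TRACK):
    """Return how many tracks fit and how long they take to play once each."""
    tracks = total_mb / mb_per_track
    minutes = tracks * minutes_per_track
    hours = minutes / 60
    days = hours / 24
    weeks = days / 7
    years = weeks / 52  # same 52-week year the comment uses
    return {
        "tracks": tracks,
        "minutes": minutes,
        "hours": hours,
        "days": days,
        "weeks": weeks,
        "years": years,
    }


if __name__ == "__main__":
    for unit, value in playback_estimate().items():
        print(f"{unit:>8}: {value:,.2f}")
```

Switching the last step to 365.25-day years shaves a couple of years off the total, which does not change the "several lifetimes of music" conclusion.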
is that I can't remember the last time I heard the words "microsoft" and "windows 2000" and "cluster" used in the same sentence.
It's all linux baby!
McKinley is the second generation Itanium CPU which is at least a year away from production. The SGI cluster is using the first generation Itanium CPU (also known as "Merced") which is actually just a technology demonstration, and not a full-blown product from Intel.
Saying Linux isn't tested nor built for clusters this big is a little like saying that sand isn't meant to go in car windows.
Linux has ten years and millions of manhours' worth of development and refinement that has gone into it. You wanna do WHAT from scratch?? PASS!
A cluster is still a machine-by-machine entity, which is the level that the OS is working at; it's the "hooks" you create that facilitate cluster behavior. If you want to write an actual "cluster OS," i.e., one that does not have a context on a single machine, then by all means, go for it, but don't blame these guys for building something by integrating mostly pre-existing parts in order to get the behavior they seek.
Forgive me for the harsh subject line; it's been a long week!
I should be able to build a single machine this fast for about $1,000 in 10 years. Do you think they'll be done by then?
These opinions guaranteed or your money back.
Most people can visualize a hundred or so boxen a lot easier than a thousand or so. It gets a little unreal. So the Brit site with pretty pictures of the system is a good site for those not familiar with the larger systems.
They have other pretty pictures from their work as well.
"It is a greater offense to steal men's labor, than their clothes"
So since we don't have faster processors (relatively) we will have more and more processors.
I do not advocate spending Billions on teaching how parallel programming works and how to use PVM and MPI effectively, but I do think it is time that it become a standard theme at the college level CS world. That means that the professors learn how it works and then have access to equipment that allows everyone to have the experience.
-- Multics
If they're IBM machines, most likely they're going to use Linux... IBM is making a company-wide push to the Linux platform.
Now my only question is... where can I get a Beowulf cluster of these babies? That would be sweet...
can't sleep slashdot will eat me
In fairness, they're going to be used for the next generation of particle physics experiments at CERN, Fermilab, SLAC and a couple of other places, some bio work on protein folding and a few other things.
While I'm sure many members of the audience would like to see the NSA's hand in here somewhere, the processing power is needed since CERN's sending out data from the experiments at 40TB a second (ok, ok, I know it gets filtered down to only 100MB/sec).
Which is the problem: while these 4 systems make a nice addition to the GRID, we need more supercomputers!!!
But how do I get the monitor? :)
-- "Other than that, how was the play Mrs. Lincoln?"
11. yeah, it can do all that, but can it get me pr0n?
The current number of procs is 3300. The machines they are using are 16-headed, and they are installing 1024 of them. That means they are only populating about 3 procs in each box, so over time they can expand the system to over 16,000 procs (1024 × 16). Sounds like a good way to go. It will allow expansion with time and money.
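A quick sanity check of the expansion headroom, as a sketch only: it takes the comment's figures (1024 boxes, 16 processor slots per box, 3300 processors installed) at face value, and none of them are confirmed by the press release. The names below are illustrative.

```python
# Headroom check using the comment's (unverified) hardware figures.
BOXES = 1024            # machines being installed, per the comment
SLOTS_PER_BOX = 16      # "16 headed" machines, per the comment
INSTALLED_PROCS = 3300  # processors in the initial configuration


def expansion_headroom(boxes=BOXES, slots=SLOTS_PER_BOX, installed=INSTALLED_PROCS):
    """Return (max processors when fully populated, average installed per box)."""
    max_procs = boxes * slots
    avg_per_box = installed / boxes
    return max_procs, avg_per_box


if __name__ == "__main__":
    max_procs, avg = expansion_headroom()
    print(f"fully populated: {max_procs} processors")
    print(f"currently about {avg:.1f} processors per box")
```

Under these assumptions the ceiling is the number of boxes times the slots per box, so roughly a fivefold expansion over the initial 3300 processors.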
One could also calc PI on a Beowulf now. So yes, I would like to see a Beowulf programming class in the college courses. Have it as intro, hardware setup, software setup, programming, advanced topics (weather).
make Linux, not Microsoft. sin(beast) = -0.809016994374947424102293417182819
Actually, it would probably cost more per instruction/sec. to use cheaper processors since each one (pair? 8-way?) will need its own motherboard, RAM, etc.
Always nice to see professionals understand the benefits of open source that no closed source movement could possibly replicate.
While I am in broad agreement, do not take the announcement of this machine as another blast in the direction of Micro$haft, or another nail in their corporate coffin. If a closed-source system is built correctly, and presents consistent and well-documented interfaces to the outside world, then it can be just as effective.
Business didn't employ Unix because they could get the source code, they bought it because it followed interface standards, and it was thus easier to get your Unix boxes to talk to your S390s and your Unisys 2200s and your VAXs etc etc etc
If Microsoft had offered common external interfaces in the first release of NT, and not those bloated buggy proprietary standards years later, they might actually have managed to produce a useable OS that enterprises could then integrate into their existing data centres, rather than boxes that perform tasks in independent installations.
karb: I'm going to kuro5hin!!
Jack Valenti and the MPAA are to technology as the Boston strangler is to the woman home alone
The first thing I looked for was what OS it used. Linux seemed like a good choice, but being no expert I wonder if even Linux can efficiently utilize 1300+ Itanium processors. I realise that Linux (being a big supporter myself) will have the wanted customizability, but wouldn't making an OS from scratch (Linux-like if that's best) be better? After all, Linux isn't tested nor built for clusters this big.
Look a monkey!
I used to think AIX was a lame flavor of Unix... until after I'd been the sysadmin for 5 years for a govt organization that runs a mixture of AIX, Solaris, HP-UX, Linux, *BSD, and NT. I used to prefer Solaris, but now AIX is my favorite. It's the most stable by far, and performance is top notch for the hardware it runs upon. True, it's got its quirks and weirdnesses, but they all do. You just get used to them over time. The AIX LVM/JFS and memory management are the finest of all.
to eliminate the tyranny of time and space limitations.
This time and space flaming has got to end. Granted, time and space have a monopoly on time and space, but it is a *benevolent* monopoly, which is ok with every legislative body in the world except the EU. Time and space have prevailed as the primary purveyors of time and space through quality, perseverance, and generous donations to any political party that would take their money. So, lay off, slashdot!
Jack Valenti and the MPAA are to technology as the Boston strangler is to the woman home alone
-- "Other than that, how was the play Mrs. Lincoln?"
"Scientists involved in the project said the facility would help researchers understand the origins of the universe, cure cancer, unlock secrets of the brain, predict tornadoes, and save lives in an earthquake" yeah, but can it find me pr0n?
$53 million for a cluster to provide that power is dramatically less than it would cost from a vector outfit like Tera, NCR, Fujitsu, SGI, Cray, etc. However, you can get more bang for the buck. I priced building a cluster, with gigabit switches and all that, for 13 teraflops around 8 months ago, and it came to around 20-25 million. Prices on processors have dropped dramatically since then. As mentioned in a previous post, use cheaper processors; Itaniums don't have the price/performance ratio an Athlon 1.4GHz or an Intel P3 would have. Using the newest technology isn't always worth it.
Jeff Knox
"Quite a few readers submitted news of a distributed system to be built by four U.S. institutions..."
Looks like our "Slashdot Distributed Story Submission" (SDSS) is working quite nicely.
No, I'm not a zealot. AIX does, however, provide me a nice comfortable living... it can't be all bad.
-- "Other than that, how was the play Mrs. Lincoln?"
You mean the Globe is not an OS? Think about it for a while - you can set your own environment in which you operate and it is a complex system.
how long will it take this thing to decode one data block from seti@home?
We REALLY need to find a different term for measuring floating point operations. Anyone from the country, or who has spent time there, can tell you that a cowflop, sometimes shortened to just flop, is the result after a cow is finished with the grass it ate. I see the term terraflop and frankly I reach for my boots, figuring this is going to be a big one....
Papa Legba come and open the gate
Aww, shit. nevermind.
NetInfo connection failed for server 127.0.0.1/local
I think any OS can be an enterprise-level OS in the hands of the right person (even M$ Windows and OS/2). A former co-worker of mine could make an NT server scream. It had uptimes of a year or more. Very stable, very reliable---in his hands. We had a similar box for in-house purposes. Almost the same hardware. It went up and down like a damned yo-yo----in our hands. A similarly gifted AIX person can do similar things. The average Joe can't though. The average Joe can't make termcaps work right in AIX, let alone secure the box. I'd love to run PPC Linux on our 6k's. It would really make those boxes scream. Anything is faster than AIX on those boxes.
I would personally love to have the time to get really good Solaris experience. Sure I probably wouldn't use it in the end unless I became the admin of a number of Solaris boxes but still I'd like the experience. I'd like to shadow a good Solaris admin for a couple weeks.
BTW, the original post was 90% humor and 10% sarcasm.
--Mike--
If you want to compare, a better match is what NCSA is already running. 1024 processors, over half a TFLOP sustained, a full TFLOP at peak.
"I am a cipher, a cipher, wrapped in an enigma, smothered in secret sauce" -Jimmy James
Interesting points, but you do have to remember that massively parallel systems aren't for the masses anyway, and normal programmers don't wrestle with the "0.0001%" of problems that demand this kind of power. The fact is that this small percentage of problems aren't always trivial theoretical problems with no impact on our lives, but are more often things of practical importance to scientists and the military. Nuclear reaction simulations (both weapons and energy), protein folding, DNA sequencing, molecular simulations... all very, very intense computing problems that demand powerful computers to produce better and better simulations.
We need more programmers to program the machines? Maybe. This is an important but niche market, and throwing billions into education so that kids with bachelor's can call themselves super-computer programmers isn't the answer. The systems are already programmed by brilliant people researching these problems, doctorates all around. This isn't work for your average 15 year old 3r33t haXor, you know?
120 characters isn't enough to explain it.
Then maybe the government would discover some intelligent life, because they obviously don't have any.
~ now you know
Check this out: The software They're running
Big deal, the article claims it's going to use McKinley-based Itanium processors, which are at least 2 years away from production. Plus they are using 1300 processors, while the one in Britain only has 152 processors. Quite a bit of a difference if you ask me :)
Let's just hope it doesn't run AIX. When you don't understand Unix, you probably run AIX.
the "cosmology machine" is small fry compared to the Cray T3E we have in Manchester (www.csar.cfs.ac.uk)
and that's our old machine...
but it probably won't pay their first loan payment on that behemoth. skye
We're using a cluster very similar to this to run our wall.
- Win the RSA factoring challenge, put the money in a swiss bank account, and feed Illuminati(tm) back the account number.
- Use genetic programming to predict the stock market, making billions of dollars from the $500,000 won in the factoring challenge.
- Buy and sell peoples lives, based on loyalty to myself and Illuminati(tm).
- Voila, world domination
Pinky will probably screw it up, as usual.
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
This is not a big SMP machine - the kernel does not have to manage all 1300 CPUs at once. Instead, there will be 1300 copies of Linux running (in the long run, you don't really want the OS involved much anyway).
It totally depends on exactly what they'll run on it, but based on what's currently running on the NCSA machines the concerns will be a high-speed, low-latency network (which they got in Myrinet - note that I didn't say cheap) and a good MPI implementation to take advantage of it. Both LAM and MPICH have Myrinet-aware implementations, and they're both pretty fast.
while open source is useful here, you shouldn't use this argument as a justification for the GPL. the BSD license would more than suffice for these purposes.
The GPL seriously undermines the commercial viability of software.
Hmmm, take a look at this Slashdot story, also on today's front page. It looks like somebody just built the first Itanium cluster. That's really impressive if the chip doesn't exist for another two years...
"I am a cipher, a cipher, wrapped in an enigma, smothered in secret sauce" -Jimmy James
Since my suggestions are not welcome, I am leaving slashdot. I would have figured Troll -1, but off topic??? Adios.
To-do List: Receive telemarketing call during a tornado warning. Check.
Very little of the system time is likely to go to waste. I'd say a likely down time is only a couple of % since there's always a long queue waiting to get on and there's a lot of stuff being done at the moment in this area. Or to put it another way - it's not going to waste
If you look at the details of the system, it doesn't handle email or web traffic, just physics programs, which will be submitted through a single node that then distributes them out to the 128 processors. So there won't be any user data on the machines, just temp files from the data being run on each processor.
Backing up data is likely to occur through the huge amount of storage currently being purchased for the UK-GRID and tape. What is there to protect? Monte Carlo simulations of cosmology experiments? this isn't personal or corporate data, one bogus result is unlikely to throw the experiment off.
Anyway, this is only one of a few new systems in the UK which are getting announced at the moment, so although they aren't as large as the ones being *talked* about in the states they're here, now and working while it'll take 2 years before the american ones come online.
Yeah that's why they're making all the details so public. >SMACK<
On the other hand, think Contact: government philosophy: "Why buy one when you can get two at twice the price."
m00.
... and I will be using this cluster for my distributed.net client...
FLR
The HitchHikers Guide to the Galaxy is a couple of years behind, technology has advanced since then, even my pocket calculator can answer that question.
it's 42.
Modding down an AC is like kicking a baby in the face.
frist prosts r kewl
NCSA is certainly an important part of this partnership, but they're neither the only part nor the lead site.
so if there's a distributed client app that also allows for file sharing, everyone could download it and we'd all have supercomputers. I saw 40TB of content on LimeWire yesterday, granted it was mostly music and not scientific data. but after decoding a music stream and loading a webpage, what do you do with all those extra clock cycles anyhow? how about providing a Globus interface in the major Linux distros, so you could subscribe to the grid along with system updates and support options. sure it'd piss off my ISP but what the hell do I care?
"The Most Fun Possible on 4 wheels" is at SunBuggy in Las Vegas
... yes, but you'll need a Microsoft Passport account to gain access!
I've heard that the algorithms to calculate tomorrow's weather exist, but today's supercomputers take two days to run them ("And yesterday's weather was: %s" % calculate_weather()). Will this do it? If so, they'll need two, one for the weather and one for all the stuff they planned to use this for.
Look a monkey!
Myrinet is not cheap. If you look at their prices, 16 cards and a 16-port hub will set you back around 30 grand. Assuming dual-proc systems, that's only a 32-processor node. :P It does, however, have killer bandwidth (254 megabytes/second, 1.96 Gbps) and extremely low latency, which makes me drool. The klat (I think that's what it was) cluster, which used a genetic algorithm to design the network, with 3-4 cheap NICs in each machine and wire-speed switches, was a pretty good idea. Semi low cost (20-dollar NICs, and the switch), and the speed rivalled gigabit solutions for a LOT lower cost.
Jeff Knox
Every cluster I know of (around 20 systems, 14 sites) is not for want of cycles, they need programmers to write the codes to eat the cycles. There are not enough small 'education' clusters to allow everyone the education & experience.
Even just $1m of that could be much better spent in education instead of feeding the 0.0001% of computer problems that currently need this class of hardware.
-- Multics
Seems to me... fast networking, collaborative computing, peer-to-peer information sharing, autonomous virus communities. We're heading towards a massive parallel global computing system controlled by no single entity.
My blog
if the NSA would build such a computer, you think they would announce it to the world ?
it may already be out there
</paranoid>
This is impressive, but the NASA machine will blow it out of the water.
"It is a greater offense to steal men's labor, than their clothes"
Actually, I didn't stick around for as much of the press conference, 'cuz I had WORK to do! Many press releases on the DTF make it sound like it's one cluster in one lab's basement, and that ain't right. As importantly as looking at the distributed nature of the project, look at what each institution is contributing- this isn't a homogeneous wide-area cluster. I don't have a big part in it, and my internship is almost over, but I'd like to think that what I've been working on for over a year may become well-known soon. So yeah, while the press conference was going on I was in the next room working on enhancing a visualization library to work on tiled displays, (which has been news on /. recently. Too bad few managed to find our work here- We gots neet stuph).
Now an obligatory Oh, puh-leeze! RC-5 cracking? Quake? We've already seen Quake3 in the CAVE. Listening to conversations at the reception, there are much cooler things coming..
Cryptomancer, working the magic on code
Yes - using Linux is all very fine and well, but it has some nasty surprises. For example, on RedHat 6, upgrading to the next version of Sun's JDK (in this case 1.3) requires an upgrade to a new version of certain libraries and the recompiling of most of the software on the system.
While this is fine on a home hobbyist machine it is not very good if you have multiple users and especially not if you are selling computer time to companies. And why do you need Java 1.3 you ask? You need it because the Globus CoG toolkit needs it.
2**128 = 3.4e38
13 teraflops = roughly 1e13 instructions per second
Assume 1 trial decryption per instruction,
which is of course an unrealistically low cost per trial.
You still need 3.4e25 seconds or about 1e18 years to search that keyspace.
Sorry, no cigar...
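The estimate above can be reproduced with a few lines of Python. This is only a sketch under the same generous assumption (one trial decryption per floating-point operation), using 1.3e13 operations per second rather than the rounded 1e13; the function and parameter names are made up for illustration.

```python
# Rough brute-force time for a 128-bit keyspace on a 13-teraflop machine.
# Assumption (from the comment, and wildly optimistic): one key trial per operation.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(key_bits=128, ops_per_second=13e12, ops_per_trial=1):
    """Years needed to try every key of the given size at the given rate."""
    trials = 2 ** key_bits
    seconds = trials * ops_per_trial / ops_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    worst_case = brute_force_years()
    print(f"full keyspace: {worst_case:.2e} years")
    print(f"expected (half the keyspace): {worst_case / 2:.2e} years")
```

Either way the answer lands on the order of 1e18 years, so the conclusion stands: a machine of this size does not make a dent in 128-bit keys by brute force.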
will it be running the NCSA server software or will they finally switch to Apache? ;) ;)
Top Most Bizarre/Disturbing Error Messages
Actually it's in use where I work and personally I can't stand the damned thing.
I wonder how much of that power goes wasted into the regular administration of the site, idle time, everyone's web and email traffic, and storing employees' pr0n pix and mp3s, instead of the science it is intended for. It is my experience in a corporate environment that no one ever cleans up disk or mail boxes and they don't consider the impact of running non-essential processes on compute servers.
Also, what are they doing to protect and backup that much data?
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
Because that's some powerful encryption breaking power... if you know what I mean...
Someone out there must be able to do the maths and figure out how fast you can crack 128-bit encryption at 12 teraflops? (and you know the US govt. has a centre like this already, right?) Matt
The hope is that -- as an open source network using Linux and standard IBM servers -- it will be easily expandable and able to follow a similar trajectory to the Internet.
"The only way to do this project is open source," project director Stevens said.
Interesting that researchers know that open source projects are the only way they can control all the variables. After all, if you don't control the OS, you can't be sure some little bug in the code isn't screwing with your data. Universities have long understood this principle, which is why Unix is so popular. Now our millions of taxpayer dollars will be spent on research rather than licensing costs, plus the research is controlled, scalable, and open to peer review. Always nice to see professionals understand the benefits of open source that no closed source movement could possibly replicate.
Lawrence Lessig is my personal hero.