Slashdot Mirror


NASA To Get 10,240 Node Itanium 2 Linux Cluster

starwindsurfer writes "US space agency Nasa is to get a massive supercomputing boost to help get its shuttle missions back in action after the 2003 shuttle disaster. Project Columbia, a collaboration with two technology giants, will mean Nasa's computing power will be ramped up by 10 times to do complex simulations."

59 of 249 comments

  1. Geez, that's pretty impressive... by Skyshadow · · Score: 4, Funny

    ...but someone ought to tell them that Doom 3 runs pretty well just on moderately-new hardware...

    --
    Every year during my review, I just pray the words "slashdot.org" aren't mentioned.
    1. Re:Geez, that's pretty impressive... by AIX-Hood · · Score: 4, Funny

      Going on the mention of the previous shuttle disaster, I think they're trying to avoid doom.

    2. Re:Geez, that's pretty impressive... by gl4ss · · Score: 4, Funny

      well.. they would need to get the shuttle flying..

      and get that mars plan underway as well. no way in hell i'm signing up for UAC's mars base though no matter how exciting archeological findings...

      --
      world was created 5 seconds before this post as it is.
    3. Re:Geez, that's pretty impressive... by Keruo · · Score: 5, Funny

      nah, they're just preparing for longhorn

      --
      There are no atheists when recovering from tape backup.
    4. Re:Geez, that's pretty impressive... by Naffer · · Score: 2, Informative

      Because for what they're designed to do, the Itanium 2 is a damn fast processor that no Opteron could keep up with. It's only at 32-bit processing that the Itaniums suck.

  2. Dupe? by Gothmolly · · Score: 5, Informative
    --
    I want to delete my account but Slashdot doesn't allow it.
  3. As the server? by lacrymology.com · · Score: 3, Funny

    Well, I guess they're not using it to serve that webpage.

    -m

    --

    #
    # Modus Ponens
    #
  4. Nice...but a dupe. by Agent+Green · · Score: 4, Informative
    --
    // Agent Green (Ian / IU7 / KB1JQO)
    // IEEE 802.3: All 10base Are Belong To Us
  5. Are we gonna get another, by b00m3rang · · Score: 2, Funny

    "are we gonna get another barrage of (insert slashdot cliche') posts" again?

    Damnit!

  6. What Would SCO's Take Be Worth? by geomon · · Score: 4, Funny

    About $7.2 Million.

    Talk about a software tax!

    --
    "Rocky Rococo, at your cervix!"
    1. Re:What Would SCO's Take Be Worth? by geomon · · Score: 2, Interesting

      Moderators must be having a bad day. I've seen several other attempts at humor moderated 'offtopic'.

      I wonder if this is a Monday phenomenon? I wonder what the distribution of 'Funny' moderation is through the week.

      --
      "Rocky Rococo, at your cervix!"
    2. Re:What Would SCO's Take Be Worth? by Anonymous Coward · · Score: 3, Funny

      I wonder if this is a Monday phenomenon? I wonder what the distribution of 'Funny' moderation is through the week.

      Sounds like the moderators are having a case of the Mondays?

    3. Re:What Would SCO's Take Be Worth? by geomon · · Score: 2, Funny

      Actually I was referring to other topic areas other than this one. I wasn't even referring to my own attempt at humor. I was making a rather broad commentary about moderations in general and how those marked 'funny' are running rather low today...

      ....you humorless fuck.

      --
      "Rocky Rococo, at your cervix!"
  7. Should help in units conversion ... by xmas2003 · · Score: 4, Funny

    This should help 'em convert feet to meters ... ;-)

    --
    Hulk SMASH Celiac Disease
    1. Re:Should help in units conversion ... by noselasd · · Score: 2, Interesting

      You know, the actual error was some engineers assuming specs
      from other engineers were in pounds while they really were in newtons
      (1 pound == 4.45 newton)
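
A minimal sketch of the conversion factor quoted above, assuming the standard definition of the pound-force (one avoirdupois pound under standard gravity). The function name is made up for illustration.

```python
# Pound-force to newton conversion.
# Assumption: 1 lbf is 0.45359237 kg accelerated at standard gravity (9.80665 m/s^2).
LBF_TO_NEWTON = 0.45359237 * 9.80665  # roughly 4.448 N per lbf


def lbf_to_newtons(force_lbf: float) -> float:
    """Convert a force given in pounds-force to newtons."""
    return force_lbf * LBF_TO_NEWTON


if __name__ == "__main__":
    # A number recorded in lbf but read as newtons is understated by this factor.
    print(f"1 lbf = {lbf_to_newtons(1.0):.2f} N")
```

Rounded to two decimals this gives the 4.45 figure from the comment.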

    2. Re:Should help in units conversion ... by xmas2003 · · Score: 2, Informative
      Yea, it was actually a "force" measurement (you did say Newtons and (I assume meant) pounds-force) - see attached snippet from one writeup ... plus the incorrect deviations from the flight path weren't noticed, which is arguably a distance measurement (there was a fair amount of miscommunication going on too, so lotta blame/mistakes on this one unfortunately) ... but I simplified to feet/meters in my attempt at humor. NASA has (obviously) done a GREAT job with the current Mars Landers, but boo-boos happen.

      Engineers on the ground calculated the size of the rocket-firing using feet-per-second of thrust, a value based on the English measure of feet and inches.

      However, the spacecraft computer interpreted the instructions in Newtons-per-second, a metric measure of thrust. The difference is 1.3 metres a second.

      --
      Hulk SMASH Celiac Disease
  8. Good news for Intel by thebra · · Score: 5, Funny

    This is great news for intel. They will double the number of itanics shipped in a single deal!

    Hahaha, my comment is a dupe!

  9. NASA vs RIAA/MPAA by grunt107 · · Score: 5, Funny

    The system will have 500 terabytes of storage, the equivalent of 800,000 CDs.

    In related news, the RIAA has filed a writ of discovery for illegal downloads of 'Major Tom' at NASA.
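
A quick sanity check of the quoted storage figure, as a rough sketch. It assumes "terabyte" could mean either the decimal (10^12 bytes) or binary (2^40 bytes) unit, since the article does not say; the function name is hypothetical.

```python
# How big is each "CD" if 500 terabytes equals 800,000 of them?
TOTAL_TERABYTES = 500
CD_COUNT = 800_000


def bytes_per_cd(total_tb: int = TOTAL_TERABYTES,
                 cd_count: int = CD_COUNT,
                 binary: bool = False) -> float:
    """Return the implied capacity of one CD in bytes."""
    bytes_per_tb = 2**40 if binary else 10**12
    return total_tb * bytes_per_tb / cd_count


if __name__ == "__main__":
    print(f"decimal units: {bytes_per_cd() / 10**6:.2f} MB per CD")
    print(f"binary units:  {bytes_per_cd(binary=True) / 2**20:.2f} MiB per CD")
```

Either reading lands in the neighbourhood of an ordinary data CD, so the article's comparison is at least self-consistent.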

  10. I hope technology will help by Anonymous Coward · · Score: 5, Insightful

    But I wonder if moving from a spreadsheet to a supercomputer simulation will make it any more likely that engineers with concerns will whistleblow to non-responsive management. This is a government bureaucracy problem, not a technical problem.

    1. Re:I hope technology will help by GGardner · · Score: 2, Interesting

      Instead of saying "In my best engineering and technical judgement, based on years of training and experience, I think this is a bad idea", the engineers can say "Our really expensive computer thinks this is a bad idea".

    2. Re:I hope technology will help by Anonymous Coward · · Score: 3, Insightful

      The two biggest failures at NASA, Challenger and Columbia, absolutely would not have been fixed with more computing power.

      In the case of Challenger, engineers whose opinions should have had the most weight were ignored when they expressed concerns about the seals on the solid fuel rocket boosters. The decision was made by bureaucrats who didn't have the technical savvy required to even form an opinion.

      In the case of Columbia, many engineers at NASA were concerned about possible damage to tiles and requested some (any!) possible surveillance to get a look at the possibly affected areas of the shuttle. They were overruled and there wasn't even any attempt to get a look though such a look might have been at least possible if inconclusive.

      Take some of the brightest minds on the planet, put idiots in charge and this is exactly what you get. This is a government bureaucracy problem, not a technical problem, and all the supercomputers in the world will not help!

    3. Re:I hope technology will help by Lumpy · · Score: 2, Insightful

      You hit the head on the nail.. Along with the supercomputer they need to put in place an oversight committee of engineers that can take anonymous comments from other engineers and completely override and bypass NASA administration.

      99% of the time major failures lie in the hands of management, or the failure of management. Yes going to space is hard and dangerous, but they KNEW that something went wrong on launch and management chose to ignore it.

      Don't believe me? Show me one corporation failure that was NOT the fault of management.

      --
      Do not look at laser with remaining good eye.
  11. Tax payer. by BrookHarty · · Score: 4, Interesting

    I'm rather mad at this idea, the system costs more than an opteron system, costs more to run (heat/power) and is slower. But it at least runs linux.

    Also, why is the BBC the first news tidbit about NASA's new supercomputer?

    1. Re:Tax payer. by geomon · · Score: 4, Insightful

      Also, why is the BBC the first news tidbit about NASA's new supercomputer?

      Science isn't sexy news in America.

      Not unless they declare they've created a satellite system that will track and kill bin Laden.

      --
      "Rocky Rococo, at your cervix!"
    2. Re:Tax payer. by legoleg · · Score: 4, Funny

      Just wait till the last week of October... I'm sure he'll conveniently pop up around then.

    3. Re:Tax payer. by I+confirm+I'm+not+a · · Score: 2, Informative

      Science isn't sexy news in America.

      To be fair, science isn't exactly sexy news in the UK, either. The BBC covers stuff like this because (a) it's mandated to, and (b) there's no profit motive keeping the unsexy news off the (metaphorical) frontpages. Which is nice[1].

      [1] ...provided there remain alternative broadcasters to keep the Beeb on its toes.

      --
      This is where the serious fun begins.
    4. Re:Tax payer. by dr_dank · · Score: 2, Funny

      Science isn't sexy news in America.

      When Paris Hilton has nightvision camera sex with the Hubble Space Telescope, you'll be singing a different tune.

      --
      Where does the school board find them and why do they keep sending them to ME?
    5. Re:Tax payer. by flabbergast · · Score: 4, Insightful

      Because you can't buy an Opteron system with NUMAlink (3.2 GB/s between bricks) and you can't simply build a 500 TB data cluster by purchasing some CAT5, 100 250GB drives, 10 Gigabit ethernet cards and call it a day. SGI thrives because it can put together a clustered supercomputer and has the technology to build a 500 TB data center. This one is 20 Altix racks with 128 Altix bricks/rack (4 processors/brick X 128 = 512 proc), and it has globally shared memory thanks to NUMAlink. This means that even though each brick can run independently, you can also build a 512 proc system with a single Linux system image that has the combined memory of all the bricks (thanks SHUB and NUMAlink!). So, when you can build a 512 processor, global shared memory system out of Opterons, then you go ahead and sell it. This is a clustered supercomputer where each cluster is a supercomputer.

    6. Re:Tax payer. by shiftless · · Score: 2, Informative

      Right, 'cause not only would it make more sense to wait until 95% of America has it beat into their heads that Bush sucks before bringing Bin Laden out rather than bringing him out as soon as he's captured and using it to Bush's political advantage, but also there's no chance that the soldiers who supposedly captured him already would EVER talk or tell anyone about it.

      Did I miss anything? Oh, yeah:

  12. imagine... by pb · · Score: 4, Funny

    ...a Beowulf cluster of slashdot dupes.

    --
    pb Reply or e-mail; don't vaguely moderate.
  13. Re:AGG! by foidulus · · Score: 4, Funny

    So Hrathgar carries Hrunting into a bar. The bartender asks him "Why the long face", and Hrathgar cuts his head off with Hrunting singing, "A hrunting we will go, a hrunting we will go!"
    *Rimshot

  14. Article not written by a technical person.. by cbreaker · · Score: 2, Interesting

    .. or a very good writer.

    "They can also be modelled over a time period of weeks or months instead of over just a few days."

    Ohh sweet, so then what used to take days now takes months?

    And at one point in the article, it says "20 nodes" and then at another part it says "512 nodes." So like, what is it?

    You know what, I don't even care.

    --
    - It's not the Macs I hate. It's Digg users. -
    1. Re:Article not written by a technical person.. by orbit0r · · Score: 2, Informative

      And at one point in the article, it says "20 nodes" and then at another part it says "512 nodes." So like, what is it?

      Read the article:
      "It is using an off-the-shelf system and taken that and built a powerful system around 512-processors which are then hooked together to give considerable power."

      512 processors * 20 nodes = 10240
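
The multiplication above can be spelled out as a small sketch. The per-brick numbers (4 processors per brick, 128 bricks per 512-processor system) come from another comment in this thread, not from the article, so treat them as an assumption; the names are made up for illustration.

```python
# Processor count for the cluster as described in the thread.
PROCESSORS_PER_NODE = 512  # one single-system-image Altix node
NODE_COUNT = 20            # nodes clustered together


def total_processors(per_unit: int = PROCESSORS_PER_NODE,
                     units: int = NODE_COUNT) -> int:
    """Multiply processors per unit by the number of units."""
    return per_unit * units


if __name__ == "__main__":
    # Cross-check the node size from the brick figures quoted elsewhere.
    print("per node:", total_processors(per_unit=4, units=128))
    print("whole cluster:", total_processors())
```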

    2. Re:Article not written by a technical person.. by Anonymous Coward · · Score: 2, Informative

      Article not read by a technical person.

      The article is stating that the weather pattern studies would now be able to simulate activity periods of weeks or months rather than just days - NOT that the simulation runs themselves would take months rather than days!

  15. Cluster != Supercomputer by ChaosMt · · Score: 2, Insightful

    I can understand the BBC making this mistake, but slashdot?! I'm sure this was also noted in the dupe.

    1. Re:Cluster != Supercomputer by telemonster · · Score: 2, Insightful

      It is a cluster of supercomputers :-)

      Seriously, the way the Altix is laid out... I believe it is a cluster of 512 processor supercomputers.

      This isn't uncommon. Look at ASCI BLUE, or some of the other large IBM SP2 based systems.

      --
      Southeastern Virginia REPRESENT!
  16. VT paid for the G5s by tentimestwenty · · Score: 2, Informative

    Apple didn't give VT any computers, they paid for them because they were the cheapest solution.

    1. Re:VT paid for the G5s by iamdrscience · · Score: 2, Interesting

      Yes, that's why I said that Virginia Tech bought them. The point was that they were cheapest because Apple gave them a huge price break, presumably for the promotion it gave them (i.e. "Holy shit! Our computers are so fast and awesome that they're using them in supercomputer clusters!").

      You'll notice that no large clusters have been built out of G5s since, and it's because nobody else is going to get price breaks significant enough to make it the cheapest solution.

    2. Re:VT paid for the G5s by djward · · Score: 2, Informative

      Apple gave them no price break on the G5 towers. The systems were purchased at EDU pricing straight from the Apple Store, online.

      Apple DID cut them some slack on the additional RAM, charging industry-norm prices for the memory instead of their usual markup. They probably saved them some money on the sidegrade to Xserves, too, but I don't know the details.

      Anyway, when the initial cost assessment was done, the G5s were cheapest not because of a price break, but because they were... well... cheapest.

    3. Re:VT paid for the G5s by psbrogna · · Score: 2, Informative

      not quite true: The US Army bought 1,500 xServes recently for a G5 based supercomputer.

      ref: http://apple.slashdot.org/apple/04/06/22/0222210.shtml?tid=137&tid=179&tid=185&tid=190

  17. Okay, that's big but... by Hamlin · · Score: 3, Informative

    if they'd gone with G5 Xserves they could have had 23,888 Dual 2GHz systems with 17.916 Petabytes of storage (assuming they just went stock on the high-end systems).

    Okay and one question about the article. Was he saying 1000 Gb of RAM per system or 1000GB per system?

  18. Re:Some CLusters by DAldredge · · Score: 2, Insightful

    Does anyone know of a better site than /. to discuss tech related topics because, as the post I am responding to shows, /. users are getting dumber and dumber as time goes on.

  19. Re:Itanium? by djohnsto · · Score: 5, Informative

    Think of it less as a win for Itanium and more as a win for SGI Altix (that happens to use Itanium). The SGI Altix machines have a single system image with 512 processors (there are 20 of these clustered together). As far as I know, this is actually the cheapest and highest performing system that can use 512 nodes in a single system image. Other choices (which I'm not even sure scale to 512 processors) include Sun (slow), Power (expensive), and MIPS (SGI predecessor to the Altix - slower). Also, they are working on methods to increase single system image size to 2048 nodes, I believe an industry first. Some workloads just like running in single system images much better than on clusters.

    As for Itanium vs. Opteron - the Itanium kicks the Opteron's ass in floating point. Since NASA is presumably going to be doing a lot of engineering simulations, good FP performance is highly desirable. Having 6 MB of cache per node probably helps the Itanium beat out the Opteron for large memory footprint workloads as well.

    Basically, until Cray releases Red Storm (not sure if they'll stay in business that long), an Opteron system doesn't exist that can match the performance of the SGI Altix.

    Finally, Itaniums are NOT "ridiculously more" compared to the 8xx Opteron line (which is the Itanium's real competitor in this area).

    --
    Dan
  20. Re:Itanium? by harlows_monkeys · · Score: 4, Informative
    Can anyone point out any significant advantage of the Itanium that justifies the fact that it costs ridiculously more than its competition (i.e. AMD Opteron)?

    It does floating point a lot faster than Opteron.

  21. Re:Spin by flaming-opus · · Score: 2, Informative

    Why use itaniums? Because itaniums are very fast at floating point math, and have 9MB of cache. It's not a perfect CPU, but it's not bad. Nasa is more than willing to optimize their code extensively. (Yes the optimizing compilers ARE available, just not in gcc. Both intel and hp have very good compilers for ia64) The IBM power architecture is also a very good architecture, but they are also VERY expensive.

    Mostly they use Itaniums because they are buying an SGI solution. NASA Ames has been a long time SGI customer. The cluster of itanium/linux altix machines is simply a kicker to their previous cluster of mips/irix origin 3000 systems, which replaced a cluster of o2000s, which replaced a cluster of power-challenge boxes. That's one of the reasons this purchase happened so quickly. All the physical/technical/knowledge/business infrastructure was in place.

    If you read the sgi press release, they are also cutting nasa a huge break on the price to win the contract. It's about $2million each for those altix boxes including fibre channel cards, switches, and storage. I can't believe SGI is making any money on the deal.

  22. where is SUN? by linuxislandsucks · · Score: 3, Funny

    where is SUN Microsystems?

    well someone had to ask :)

    --
    Don't Tread on OpenSource
  23. Re:Irony emulator by kenaaker · · Score: 5, Interesting
    I worked on the space shuttle simulator (lo, these many years ago), and the shuttle computers are derivatives of the computers that IBM originally used in the B52's. They were called AP-101's, and if I remember correctly were Harvard Architecture systems with separate instruction and data store memories. I think they had 128K (32 bit?) words for instructions and 64K (16 bit?) words for data.

    The simulator originally ran on IBM System 360 mod 75's (serial numbers 1, 4, and 5). When I was working on it, the simulator was running on an IBM 3033 (370 architecture) machine running MVS, and had a hardware interface that attached 3 AP101's to the system IO channels. The shuttle hardware outside of the AP101's and environment were modelled in the 3033, even including the "slosh dynamics" of the fuel in the external tank. The simulator was written in 370 Assembler with macros for the programming control structures.

    One of the funniest things about running the simulator came out of the major failure tests. The simulator had a distinct "abend" that indicated that the vehicle had a position that was below the surface of the earth.

  24. the article is severely misleading by halfelven · · Score: 4, Informative

    It makes you believe this supercomputer is made out of commodity components.
    That's blatantly false.

    The SGI systems are highly proprietary equipment that provides very large bandwidth between the nodes, extremely low latency and tight integration. They're not regular Beowulf clusters. They really are single systems with hundreds or thousands of CPUs, all of them running the same single instance of the OS (as opposed to typical clusters which run one OS instance per node).
    Because of the tight integration, the software does not have to obey the same constraints as when running on commodity clusters. Especially the requirement for total parallelization does not stand anymore.
    Therefore, problems which cannot be translated into 100% parallel algorithms, and therefore do not run efficiently on commodity clusters, are easily tackled on SGI supercomputers.
    That's why they can charge a high price on their systems - because they can solve problems that are not accessible to "normal" computers.

    That being said, the system at NASA is indeed a cluster, but it's a "small" cluster (a handful of nodes), each node being a supercomputer with hundreds of CPUs. It's a hybrid that provides the best of both worlds.

  25. Re:10240 is a strange number? by halfelven · · Score: 4, Informative

    Because it's a 20-node cluster, each node being a supercomputer with 512 CPUs.

    The article was written, unfortunately, by a rather clueless journalist. Here's a link to the proper information:

    http://www.sgi.com/newsroom/press_releases/2004/july/supercomputing_ctr.html

  26. Re:Here's hoping by halfelven · · Score: 4, Informative

    The "firmware" (the equivalent of BIOS) they have on the Altix is pretty damn smart, it's like an OS of its own. It can do diagnostics, and inventory and a truckful of other things.
    Powering up a huge complex beast such as an Altix is no easy task. You need lots of "intelligence" at the hardware level to do that.

  27. There are limits by sakusha · · Score: 4, Insightful

    There is a limit to what computer power can do for you. I'd rather see the money being spent on human resources: people who know what they're doing. There's an old saying in the business world, I wish I knew who first said it, "for any technological problem, the limiting factor is never technology, but rather, human resources." In other words, if your technology has problems, throwing more tech at it is unlikely to solve the problems. Only more human intelligence applied to the situation will improve things.
    Having the fastest supercomputer in the world won't help you one bit if nobody thinks to run a simulation of what happens when a chunk of foam blows a hole in a wing. I keep thinking about Frank Borman's statements to the Apollo 13 Commission, he said it wasn't a failure of technology, it was a failure of imagination, nobody ever imagined there could be a problem. Computers have no imagination. They give answers, but nobody's asking the right questions.

  28. Re:Spin by halfelven · · Score: 2, Informative

    How did this comment get moderated "informative"? There's definitely something wrong with the moderators today.

    SGI Altix uses the Intel compilers. They're pretty damn good on IA64. They're available today.

    Also, the massively parallel software is already up'n'running. NASA has been using SGI supercomputers for decades - traditionally it's been the MIPS/Irix architecture. A while ago, when SGI told NASA that they were going to migrate to Intel/Linux, NASA simply recompiled their software for Linux, which is not too difficult, since Irix is pretty much standard Unix (I did some porting from Linux to Irix and often the software simply compiles with no change).
    Also, Altix systems are essentially the same hardware architecture as MIPS-based SGI Origin with the exception of the CPU (and a different OS on top), so the differences are really not that big; it's just the transition from Irix to Linux.

  29. SCO Tax by Nonillion · · Score: 2, Funny

    They better pay their $7,157,760 ($699/CPU) in SCO tax or McBride is going to be stomping around saying "NASA is screwing us!"
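
The figure above checks out as simple arithmetic. A small sketch, taking the $699-per-CPU fee and the 10,240 CPU count from this thread as given; the function name is hypothetical.

```python
# Per-CPU licence fee multiplied across the whole cluster.
SCO_FEE_PER_CPU_USD = 699  # fee quoted in the comment above
CPU_COUNT = 10_240         # CPU count from the story headline


def license_total(cpus: int = CPU_COUNT,
                  fee_usd: int = SCO_FEE_PER_CPU_USD) -> int:
    """Return the total fee in whole dollars."""
    return cpus * fee_usd


if __name__ == "__main__":
    print(f"${license_total():,}")
```

That is also where the "about $7.2 million" estimate earlier in the thread comes from.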

    --
    "I bow to no man" - Riddick
  30. Imagine.... by enginuitor · · Score: 2, Funny

    Imagine a Beowulf cluster of... oh, wait...

  31. Re:only 10 times faster? by nboscia · · Score: 2, Interesting

    The issue has become space to put the machines. The pre-existing supercomputers are being moved to other rooms, and there is barely enough space to accommodate the 10k as it is. Many supercomputing facilities have a similar problem. There are not many rooms that have the environmental controls needed to run such massive systems.

    As for the comment on making it 11x faster - the other systems serve a different purpose (customer base and funding source)... and they were moved to another location to make room for Columbia.

    On a cool note, it looks like they are filming the building of this system so we can see one of those time-lapse videos.

  32. Re:pork by nboscia · · Score: 3, Informative

    NASA is not the sole user of the system. Anyone within the U.S. can use it. We support many university projects that require the use of supercomputers. This purchase is a benefit to the entire country, not just NASA.

    It is most unfortunate that people are not aware of all that NASA does for them. A majority of all research projects are in collaboration with industry vendors, universities, non-profit organizations, scientific corporations, and so on. There are few that are specific only to NASA. The range of customer database is wonderful and there is such variety in the areas of research (not just aeronautics and space technology, but biology, earth science, nanotechnology, optics, and so on). We all help each other to advance our knowledge, and computers like this make it a lot faster.

  33. NASA continues to miss the point by Hugh-know-who · · Score: 3, Interesting

    The Columbia disaster was not due to a lack of computing power, but rather to a culture of denial. The failure of mid-level and senior management to listen to their people prevented any action being taken until it was too late. In a way, this mirrors the broader American culture of the late 20th and early 21st Centuries, typified by a complete refusal of individuals - particularly but not exclusively individuals in powerful positions - to take responsibility for their own actions, inactions or failures of any kind.

  34. Re: NASA does not care about money. by nboscia · · Score: 2, Informative

    NASA does not care about money. It's US taxpayers' money

    Thanks for the troll post.. you're a wonderful example of how uninformed citizens can be. FYI: The government defines NASA's budget each year, so there is a very high concern for money. It takes months upon months of civil servants fighting for funding out of that money pool. There are a lot of research programs, and not nearly enough money to fund them. Particularly, in the case of Columbia, there were massive layoffs to fund this. I'd like to see you make your statement to all those who now do not have jobs because of the lack of money (many were needed operational engineers, not just research staff). It's sad when people lose their jobs over something like this, but it did allow something good to happen. It's unfortunate that arrogant fools are blind to such politics.