I doubt it. I bet 95% of the kernel work on Darwin is done by programmers at Apple.
The real benefit is transparency and confidence. As a developer, I already have a limited copy of the Solaris 8 source. It's really useful as a reference and it speeds up my work. However, I have never sent them any patches, or even pointed out any flaws in the software. Most people don't.
The same can almost be said about Linux. Yes, there are a couple hundred programmers who make regular contributions to various pieces of Linux, but one could reasonably employ 95% of the Linux kernel contributors for a few million a year.
Absolutely. The general methods for creating a scalable OS have been known to the Linux kernel folks for a long time. It's just a lot of work, and it requires some difficult design choices. Multiprocessor scalability usually comes at the expense of single-CPU performance.
That said, SGI ProPack Linux will not scale to 512 CPUs on a general-purpose Oracle/SAP workload. Those beasts run because the apps are highly tuned to the environment: they are distributed compute apps that spend a lot of time in the application and threading library, and very little time in the operating system.
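Roughly speaking, that's Amdahl's law. Whatever fraction of the run time stays serialized caps the speedup no matter how many CPUs you add, for example time spent in a kernel path that holds a big lock. Below is a minimal sketch. It assumes the parallel part scales perfectly and ignores communication and memory contention. The fractions in the example are made up and are not measurements of any real Oracle/SAP workload.

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Ideal speedup on n_cpus when only parallel_fraction of the run time scales.

    Assumes the parallel part divides perfectly and ignores communication,
    memory contention and lock overhead, so real machines do worse.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    # Hypothetical parallel fractions; the remainder is serialized time,
    # e.g. time spent inside a single-threaded kernel path.
    for p in (0.90, 0.99, 0.999):
        print(f"parallel fraction {p}: {amdahl_speedup(p, 512):.1f}x on 512 CPUs")
```

Under those assumptions, even 1% serialized time limits a 512-CPU box to roughly 84x. That's why the codes that do scale to that size stay out of the OS as much as possible.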
Linux, especially 2.6, has much better thread granularity than 2.2 did, but it's not as parallel as IRIX or Solaris.
Solaris and IRIX both evolved slowly to run on those huge boxes. First you thread the VM code, then the scheduler, then the buffer cache, then the filesystems, then the SCSI drivers, etc. (That's not a precise list. The point is that Linux has taken the first several steps toward massive scalability, but not every step that Solaris or IRIX have taken.)
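"Threading" a subsystem mostly means breaking one big lock into many smaller ones. Below is a hypothetical sketch of the idea using lock striping in Python. The class name and stripe count are made up for illustration. Because of CPython's GIL, this shows the locking structure, not a real parallel speedup.

```python
import threading
from collections import defaultdict


class StripedCounters:
    """Per-key counters guarded by several independent locks.

    Hypothetical stand-in for a threaded kernel subsystem: instead of one big
    lock around everything, each key hashes to one of n_stripes locks, so
    threads working on keys in different stripes never wait on each other.
    """

    def __init__(self, n_stripes: int = 16) -> None:
        if n_stripes < 1:
            raise ValueError("n_stripes must be at least 1")
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._tables = [defaultdict(int) for _ in range(n_stripes)]

    def _stripe(self, key: str) -> int:
        return hash(key) % len(self._locks)

    def add(self, key: str, amount: int = 1) -> None:
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is held; the others stay free
            self._tables[i][key] += amount

    def get(self, key: str) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)  # .get does not insert the key


if __name__ == "__main__":
    counters = StripedCounters(n_stripes=8)
    keys = [f"inode-{n}" for n in range(100)]

    def worker() -> None:
        for k in keys:
            counters.add(k)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(all(counters.get(k) == 4 for k in keys))
```

The trade-off is the single-CPU cost. Every stripe is one more lock to acquire and release, and that costs something even when nobody is contending for it.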
Sun sells 144-core systems now. Quite a few of them, in fact. Just because Doom 3 doesn't scale to dozens of processors doesn't mean that real-world workloads don't. Web serving, transaction processing, mail servers: these things parallelize very well on an SMP.
Tried it, deployed it, been using it for years.
Not really a comparison, though. I've got a bunch of Xserves in the lab, and they're really cool, but they compete with dual-CPU rack'n'stack Dells. They are definitely an entry-level server: cheap, simple, elegant, but not an answer to every question. I don't think Apple's ready to compete in the mid-range server market. Their focus seems to be on the A/V pro apps.
What are the servers you'd replace? In our lab we have a server dedicated to being a proxy cache for our code-versioning software. We have a server dedicated to being an SSH tunnel. We have a server dedicated to DNS/NIS/NFS, which is terribly under-utilized.
No, 4 POWER5 processors aren't going to replace a dozen maxed-out dual Xeons. More likely, they will replace 2 maxed-out dual Xeons and half a dozen servers that are largely underused. One clever thing they let you do is adjust the allocation of resources between those workloads.
Cray has sold Linux clusters before, and now has 2 products that use Linux in some way (Red Storm and the XD1). They have even done some experiments running Linux on the X1 vector supercomputer. Cray certainly isn't making moves against Linux. They would just prefer you to run Linux on their MPP box rather than on a rack full of Dells.
Mr. Becker has an interest in you using a Penguin Computing setup rather than either Dell or Cray. I must, however, admire the way he didn't get sucked into the interviewer's deceptive question.
Blue Gene is an extremely clever design in that it uses several interconnect networks at once. The main memory-to-memory interconnect is a packetized load-store interconnect arranged in a 3D torus. Each node also has an Ethernet tap for the management network, and there is a very wide tree network for all-reduce calls. They built their networks with MPI in mind.
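As far as I know, the Blue Gene/L point-to-point network is a 3D torus, i.e. a mesh whose edges wrap around. The wraparound roughly halves the worst-case distance along each dimension. Below is a small sketch of the neighbour and hop-count arithmetic. The 8x8x8 shape is just an example.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Six nearest neighbours of node in a 3D torus of shape dims.

    Every coordinate wraps modulo its dimension, so edge nodes have the same
    six links as interior ones (in a plain mesh they would have fewer).
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count from a to b, going the short way round each ring."""
    hops = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi) % n
        hops += min(d, n - d)
    return hops


if __name__ == "__main__":
    dims = (8, 8, 8)  # example shape only
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (7, 7, 7), dims))  # corner to corner
```

On a plain 8x8x8 mesh the corner-to-corner trip is 21 hops. With wraparound it is 3, because each coordinate is only one link away going the other way round.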
The difference between a commodity cluster and something like Blue Gene is only a half-step. The codes that run well on Blue Gene are MORE like the codes run on clusters than those run on a traditional vector super. The CPUs, memory controllers, etc. are commodity parts from the embedded-processor world. They work on high-compute, low-memory-bandwidth tasks with moderate inter-node communication needs. Blue Gene will likely come in at costs somewhere between those of clusters and those of vector supers, or even traditional MPPs like the Altix.
HP has several (6, actually) server product lines. They will probably use Opterons in their high-volume/lower-profit ProLiant server line. However, they have firmly committed to ditching PA-RISC, MIPS, and Alpha in favor of Itanium on their other 5 server platforms. The high-end/high-profit/low-volume systems are largely independent of the ProLiant group.
HP doesn't view Itanium and Opteron as an either/or proposition. Given their product portfolio, it's quite reasonable to use both. Itanium is fast and expensive, a good fit for a 128-way Superdome. Opteron is pretty fast and inexpensive, a good fit for a 4-way ProLiant.
Tell that to IBM. They package 2 cores on a die, and 4 dies in a multichip module, to make up their high-end POWER4- and POWER5-based Unix servers. Sun and HP are both putting 2 cores on a die, and HP even puts 2 Itaniums on a daughter board to approximate a dual-core IA-64 solution. Yes, they do have BIG heat sinks, but these are real servers, not little 1U sleds. Nobody is really worried about fan noise in this setting.
Opteron brings the price point for this down, but they will probably do it at a lower clock speed to fit it into the 1U and 2U systems that dominate the Opteron marketplace.
They should at least come out with an Apple-branded (rebranded) USB TV tuner. Those are available for PCs, but not for the Mac. I think it would be a great $50-70 add-on option.
That said, this thing really should have Wi-Fi and Bluetooth by default. These aren't new or experimental technologies anymore.
With a fan.
Probably more than one.
Is there a need for this? If you need high-speed synchronization, you're not going to use cascaded Ethernet switches in the first place. The assumption is highly parallel code like that used in computer animation.
That, and they are probably using off-the-shelf GigE switches with a 1-gig uplink channel. It's the same switch that sits in a rack, but it's hard-wired.
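Here's a back-of-envelope look at why that uplink matters. It assumes, hypothetically, 12 GigE nodes behind one switch sharing a single 1 Gb/s uplink. The real port count and topology aren't stated.

```python
from typing import Tuple


def uplink_contention(nodes: int, node_gbps: float, uplink_gbps: float) -> Tuple[float, float]:
    """Oversubscription ratio and per-node share of a switch uplink.

    Worst case: every node behind the switch sends off-switch at full rate at
    the same time, and the uplink is shared fairly between them.
    """
    if nodes < 1 or node_gbps <= 0 or uplink_gbps <= 0:
        raise ValueError("nodes and bandwidths must be positive")
    ratio = nodes * node_gbps / uplink_gbps
    per_node_gbps = min(node_gbps, uplink_gbps / nodes)
    return ratio, per_node_gbps


if __name__ == "__main__":
    # Hypothetical: 12 GigE nodes sharing one 1 Gb/s uplink.
    ratio, share = uplink_contention(12, 1.0, 1.0)
    print(f"{ratio:.0f}:1 oversubscribed, ~{share * 1000:.0f} Mb/s per node off-switch")
```

That's 12:1 oversubscription, or roughly 83 Mb/s per node when everyone talks off-switch at once. That's fine for embarrassingly parallel render jobs and bad for anything chatty.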
This whole system looks like a clever use of off-the-shelf components. It's almost exactly a cheap cluster of low-end PC CPUs, industry-standard networks, and an industry-standard OS. Instead of a rat's nest of cables, everything runs over some PCBs. Neat.
On the other hand, 12 Efficeons are going to give you no more performance than a 4-way Xeon or Opteron. Ethernet (1G or 10G) is a lousy interconnect if latency matters at all. A bunch of distributed IDE disks is a far cry from a real SAN. Yes, the box is cheap per flop, but you get what you pay for.
Everyone seems to be ignoring the obvious answer: cost.
A passenger car costs 15-50 thousand dollars. Most of that is things like the body, interior, air bags, brakes, suspension, marketing costs, and (God forbid) profit margin. The engine and transmission can only cost a few thousand dollars. On top of that, cars generally only last twelve years or fewer, and might average 150,000 miles. Railroad engines cost hundreds of thousands of dollars, last for decades, and travel millions of miles before they are discarded. Cars are categorized as "durable goods", but are really halfway between that and "disposable goods". A railway engine is definitely a major "capital investment".
It should be noted that processors with shorter pipelines (POWER5, for example) also benefit from multiple threads. Prescott pays a BIGGER penalty from a branch mispredict, but all CPUs pay a penalty. All CPUs pay a HUGE penalty if they need to reference main memory. I would say it this way:
In other words, SMT is an ingenious method for making up for the fact that CPUs are horribly inefficient.
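Here's a toy model of why that works. It assumes each thread alternates a burst of work with a fixed stall, and that the core can switch perfectly to whichever thread is ready. The cycle counts are hypothetical. Real cores share caches and issue slots, so treat this as an upper bound.

```python
def smt_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles a core spends issuing useful work, toy SMT model.

    Each thread alternates compute_cycles of work with stall_cycles waiting on
    memory or a mispredict; while one thread stalls, another that has work
    ready can issue. Assumes perfect interleaving and no contention for caches,
    issue slots or memory bandwidth.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("compute_cycles must be positive, stall_cycles non-negative")
    single_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * single_thread)


if __name__ == "__main__":
    # Hypothetical: 100 cycles of work, then a 300-cycle trip to main memory.
    for n in (1, 2, 4):
        print(n, smt_utilization(n, 100, 300))
```

With 100 cycles of work per 300-cycle stall, one thread keeps the core busy 25% of the time. In this idealized model, it takes four threads to fill it.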
This big-pile-of-simple-cores idea is the premise behind IBM's Blue Gene (or the Thinking Machines from the early 90s). It works some of the time, but is extremely difficult to program.
Ummmm. Opteron does well in the high-end workstation market. Maybe the high-end gamer market. In terms of server technology it's still a bit of a joke. The biggest boxes are 4-ways. None of them support hardware partitioning, and most don't even do chipkill memory.
Sun has promised to make 8-way and larger Opteron systems, but don't expect them for a couple of years. Opterons don't really compete with Itanium; they compete with Xeon (very well, I might add). In the real server world, Itanium is trying to break into the territory of SPARC, POWER, PA-RISC, and Alpha. Opteron doesn't even exist in that space.
Not yet they haven't. They just released the EV7z on the Alpha front, and will soon release the PA-8900. HP is planning to support VMS and Tru64 on Alpha, HP-UX on PA, and the Tandem OS on MIPS until 2008 at least.
Itaniums sometimes use a shared-bus architecture, but the fancier HP and SGI boxes use NUMA-style crossbars instead. Remember that crossbars scale bandwidth better, but there's a latency penalty. Note that all the big-iron boxes (Sun, IBM, Unisys, Fujitsu, HP, etc.) are crossbars connecting a bunch of nodes, and those nodes all use 2- or 4-processor shared-bus designs.
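The latency side of that trade-off is just a weighted average over where each miss gets served. Below is a sketch with made-up latencies, not figures for any HP or SGI box.

```python
def average_latency_ns(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Mean memory latency when some misses go across the crossbar.

    local_fraction is the share of misses served by the node's own memory;
    the rest pay the remote (crossbar) latency. Ignores queueing under load.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Made-up latencies, for illustration only.
    for f in (1.0, 0.9, 0.5):
        print(f, average_latency_ns(f, local_ns=150.0, remote_ns=400.0))
```

Keeping most misses on the local node is what keeps the crossbar's latency penalty small.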
HP made a calculated decision. It would be VERY expensive for them to continue development of their 5 distinct server lines. Some customers will jump ship to IBM or Sun, but that's something they can't really help. The server business is becoming a mature commodity business. How many companies manufacture airplanes? How many build cars? Most mature markets consolidate down to a half-dozen big players. Some things suck along the way, but it should not be a shock to anyone.
Actually, the FSB is the bottleneck ALMOST ALL THE TIME. It may only be 1-2% of the instructions, but a RAM load takes hundreds or thousands of CPU cycles. That's the very reason for speculative loads on the Itanium: to start the load as far in advance as you can. Modern processor architectures are built around trying to minimize the necessity of RAM loads. This is, of course, a problem of latency and not of bandwidth, though you need that too.
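Here's a rough worked example of how 1-2% of instructions can dominate: add the memory stalls to the base cycles per instruction. This assumes every miss stalls for the full penalty with no overlap or prefetching. The numbers are hypothetical.

```python
def effective_cpi(base_cpi: float, misses_per_instr: float, miss_penalty: float) -> float:
    """Average cycles per instruction once main-memory stalls are added.

    Assumes each miss stalls the pipeline for the full penalty, i.e. no
    overlap between misses and no prefetching; speculative loads and
    out-of-order execution exist precisely to hide part of that penalty.
    """
    return base_cpi + misses_per_instr * miss_penalty


def stall_share(base_cpi: float, misses_per_instr: float, miss_penalty: float) -> float:
    """Fraction of all cycles spent waiting on RAM under the same model."""
    stalls = misses_per_instr * miss_penalty
    return stalls / effective_cpi(base_cpi, misses_per_instr, miss_penalty)


if __name__ == "__main__":
    # Hypothetical: a core that could do 1 instruction/cycle, 1% of
    # instructions go to RAM, 300 cycles per trip.
    print(effective_cpi(1.0, 0.01, 300), stall_share(1.0, 0.01, 300))
```

Take a base CPI of 1, with 1% of instructions going to RAM at 300 cycles per trip. CPI goes from 1 to 4, so about 75% of all cycles are spent waiting on memory.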
That said, mainframes don't have any real solution to the latency problem either (except vector CPUs from NEC or Cray, but that only works for a very limited set of programs).
In x86 land the 486 was also the first to have an on-chip cache (8 KB).
As it turns out, many HP customers are refusing to migrate to Itanium/HP-UX. When one is considering real server iron, the currentness of the processor is not always of utmost importance. If there's a legacy app that runs on Tru64 (I mean Ultrix, I mean OSF) and it's really expensive to port, a lot of shops are just going to keep running Alphas until the wheels fall off and burn. [Look at all the guys still running on Sperry 1100-series machines.]
True, it's a dead-end choice, but one that might limp along for another 6-8 years. Not everyone has the option of migrating NOW. That works if you're talking about Tru64/Apache to Linux/Apache, but not if you're talking about Tru64/legacy-app-from-company-no-longer-in-business to anything else. A migration might cost millions of dollars. A dead-end Alpha server might cost tens of thousands and put off the move for a long time.
My call is that this makes a lot of sense.
Sort of. Windows NT was originally written for the i860, which was abandoned. The first release (3.1) ran on x86, MIPS, and Alpha. The 3.51 release added PowerPC. FYI.
The workstation market is proving less and less profitable for Sun, and for all the other Unix games in town. Since a Linux PC or Mac is so close to a workstation, fewer and fewer people are willing to pay a big premium for one. Thus it's probably not worth Sun's engineering time to continue developing workstations. They will continue to develop their higher-end products from the ground up, but use commodity parts at the low end.
Why they would do this on PowerPC when they already have an Opteron product line, I don't know. I imagine they will not productize any PowerPC systems, and are just doing this to thumb their nose at AIX. Either that, or they are making contingency plans in case they decide to become a pure software company. In any case, I bet you can keep a limited port going with 2-3 engineers, especially since Solaris already runs on SPARC and x86.
You might not want to think about everything in terms of business, but one can, and it's important that he does think about things that way.
Sun is in big trouble. They sell a bunch of decent servers that are not really different from what the rest of the Unix world is selling. They are obviously not able to keep ahead of the competition by making SPARC the best processor around, so they have to come up with some other way to sell something worth paying for. Solaris, for all its issues, is a reliable, scalable OS that runs a lot of applications. Solaris is a great asset to Sun; if they can leverage it on IBM's processor and make money doing so, it would really help the company.
Sun has moved beyond the "we can do everything in house" days, and is trying to figure out which battles are worth fighting. If they choose the wrong battles, they might go the way of DEC, Data General, and Sequent.
All good points.
The memory interconnect is much more important to this system than the CPU. Yes, this CPU may perform foo% better than that one on this benchmark, but 512 of any modern CPU pretty much rocks. Who really cares about the cost of the CPU on a system like this? Other costs are really going to dominate any difference in the cost of the CPU.
The MIPS-based Origin product from SGI scaled to 512 CPUs (1024, I think). The Cray X1 can supposedly scale to 4096 CPUs, but nobody can afford more than 256. Red Storm will do several thousand CPUs, as did its predecessors ASCI Red and the Cray T3E, but those are previous-generation machines.
POWER maxes out at 32 CPUs (soon 64) per OS image. Sun claims to cluster together four 72-processor boxes, but it's not really SSI. Even the Earth Simulator clusters together a bunch of 8-CPU nodes with a shared filesystem.
Oops. $8 million each. My bad.
Even so, calling these machines "off the shelf" is really stretching it. The Linux is close to Red Hat, though they do quite a lot of modifications for the Altix. The processors are Intel, but not mainstream. The chipsets are totally custom, including their CrayLink-derived memory router. The I/O controllers are completely custom until you eventually find a PCI-X slot. They contain a lot of cool technology, but off-the-shelf they aren't.
Why use Itaniums? Because Itaniums are very fast at floating-point math and have 9MB of cache. It's not a perfect CPU, but it's not bad. NASA is more than willing to optimize their code extensively. (Yes, the optimizing compilers ARE available, just not in gcc. Both Intel and HP have very good compilers for IA-64.) The IBM POWER architecture is also very good, but those systems are also VERY expensive.
Mostly they use Itaniums because they are buying an SGI solution. NASA Ames has been a long-time SGI customer. The cluster of Itanium/Linux Altix machines is simply a follow-on to their previous cluster of MIPS/IRIX Origin 3000 systems, which replaced a cluster of O2000s, which replaced a cluster of Power Challenge boxes. That's one of the reasons this purchase happened so quickly: all the physical/technical/knowledge/business infrastructure was in place.
If you read the SGI press release, they are also cutting NASA a huge break on the price to win the contract. It's about $2 million each for those Altix boxes, including Fibre Channel cards, switches, and storage. I can't believe SGI is making any money on the deal.