SGI Installs First Itanium Cluster At OSC
Troy Baer writes: "SGI and the Ohio Supercomputer Center (OSC) have announced the installation of the first cluster using Itanium processors. The system consists of 73 SGI 750 nodes, each with two Itanium 733MHz procs and 4GB of memory, connected by Myrinet 2000 and Ethernet. Software includes Linux/ia64, SGI's ia64 compiler suite, MPICH/ch_gm, OpenPBS, and Maui Scheduler."
The story has been out for months and people obviously haven't been paying attention or at least have failed to see just how concerted this effort is. This technology isn't new, but it's just a small part of the whole if you know where to look.
...
The new systems sound great, but they're tiny compared to what it's going to be like when the GRID is up and running.
What the story fails to mention is that this system is likely to be connected to the other GRID environments in the States and the new ones in Europe, at which stage you won't be talking about just 4 supercomputer centres but nearly a hundred, each with several Tflops of processing power and a few petabytes (10**15 bytes) of storage.
I would suggest that to put this in the proper perspective you also look at IBM's contract to do the same to 4 sites in the Netherlands, the UK GRID which has 9 sites, the German one, which I don't know much about but is fairly advanced, and the CERN DataGrid. These are all interconnected, with the same people working on several at a time.
Or you could have a look at the top500, find all the supercomputers in Europe & the US which aren't classified or commercial and then figure out what their combined processing power is. You should then have a fair idea just how much processing power there will be in a couple of years' time 8)
Now back to the Particle Physics experiments
Sorry, but there's nothing overpriced about the Origin 3000 family. I saw a quote for a 16-processor O3400 with 16 GB of RAM; the bottom line at list price was around $500,000.
That seems totally competitive to me.
And since I'm posting anonymously anyway, you might be interested to know that SGI is planning to release a new product any minute now. (It was announced to the developer and integrator channels this week). It's a four-processor (MIPS, natch) server in a 2 RU package. They're calling it the Origin 300.
But the cool part is going to be an interconnect product codenamed Sprouter. It lets you take 2 to 8 Origin 300 systems with 4 procs each and connect them using NumaLink (formerly CrayLink) into a single system image of up to 32 processors.
At, it's projected, half the price of a 32-processor Origin 3000 system. And for my kind of programming anyway, single-system-image beats the pants off that Myrinet stuff.
The O300 has 2 66/64 PCI slots, so that's enough expansion to let you attach your basic I/O devices like fibre channel RAIDs and high-speed networking and stuff. Each server comes with USCSI3 built-in, if anybody still uses that stuff. ;-)
Not everybody needs a medium-scale single-system-image IRIX machine, but I personally do a lot of ImageVision library programming. And ImageVision, being multithreaded at the core, loves big CPU counts. So for me, and people with needs like mine, it's going to be a very cool fall.
Troy, how much power and A/C is required to run this 146-processor Itanium cluster? I have heard that one Itanium node uses two 800-watt power supplies and generates 6174 BTU/hr. I am guessing that you required 116.8 kilowatts of power and 37.5 tons of A/C. Please correct me if I am wrong. What did you have to do at your site to support this cluster?
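For anyone who wants to check that guess, here is a rough sketch of the arithmetic in C. It uses only the figures quoted in the question plus the standard conversion of 12,000 BTU/hr per ton of cooling. It assumes both supplies in every node run at their full rated 800 W, which is an upper bound and not a measured draw; the function and macro names are made up for illustration.

```c
/* Figures quoted in the question; none of them are measurements. */
#define NODES              73
#define SUPPLIES_PER_NODE  2
#define WATTS_PER_SUPPLY   800.0
#define BTU_HR_PER_NODE    6174.0

/* Standard definition: 1 ton of refrigeration = 12,000 BTU/hr. */
#define BTU_HR_PER_TON     12000.0

/* Worst-case electrical load in kilowatts, assuming every supply
   delivers its full rated output (an assumption, likely pessimistic). */
double cluster_kilowatts(void) {
    return NODES * SUPPLIES_PER_NODE * WATTS_PER_SUPPLY / 1000.0;
}

/* Heat load converted to tons of cooling. */
double cluster_cooling_tons(void) {
    return NODES * BTU_HR_PER_NODE / BTU_HR_PER_TON;
}
```

With the quoted inputs this gives 116.8 kW and a little over 37.5 tons, so the guess is at least internally consistent. Whether the inputs themselves are right is a separate question.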
This was one example of low-level optimization. Another is giving hints for branches (both the target and the outcome of branch conditions). This is also best done by the compiler (at least the branch target hints), and works even better if you can supply the compiler with profiling information. You can also give data prefetch hints and specify which cache level the prefetched data should go into.
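A minimal sketch of what such hints look like from C, using the GCC/Clang extensions `__builtin_expect` and `__builtin_prefetch` (other compilers may not have them). This is not IA-64 specific: the locality argument of the prefetch is only the closest portable analogue to picking a cache level, and there is no C-level equivalent of a branch target hint. The function name and the prefetch distance are hypothetical.

```c
#include <stddef.h>

/* GCC/Clang extensions; other compilers may not provide them. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical prefetch distance; a real value would come from measurement. */
#define PREFETCH_AHEAD 16

/* Sums the non-negative entries of v and counts the negative ones in *bad. */
long sum_nonnegative(const long *v, size_t n, size_t *bad) {
    long total = 0;
    *bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Second argument 0 = prefetch for read; third argument is the
               temporal locality hint (0 = none .. 3 = high). */
            __builtin_prefetch(&v[i + PREFETCH_AHEAD], 0, 1);
        }
        if (UNLIKELY(v[i] < 0)) {   /* outcome hint: rarely taken */
            (*bad)++;
            continue;
        }
        total += v[i];
    }
    return total;
}
```

The hints never change the result, only (possibly) the speed, which is why feeding the compiler real profile data is usually a better bet than guessing by hand.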
Another example of when you might need to do asm is when you do SMP. The reason is that different load and store instructions are given semantics for how they are to behave in a multiprocessor environment: you want acquire semantics on this load, release semantics on this store, fence semantics here, undefined semantics there, etc. I can't see how the compiler would be able to generate correct assembly in this case (unless it is modified so that you can attach some new attributes to your variables and types).
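For what it's worth, the "attach attributes to your variables and types" idea is roughly what the C11 standard later did with `<stdatomic.h>`, which postdates this thread. A minimal sketch under that assumption follows; the variable and function names are hypothetical, and how a given compiler lowers these calls (for example to acquire loads, release stores, and memory fences on IA-64) is up to that compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data, written before publication */
static atomic_bool ready;  /* flag that carries the ordering; starts false */

/* Producer: the release store keeps the payload write from being
   reordered after the flag becomes visible. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read from being
   reordered before the flag check. Returns true and fills *out
   only once the flag has been observed set. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}

/* A standalone full fence, for ordering not tied to a single access. */
void full_fence(void) {
    atomic_thread_fence(memory_order_seq_cst);
}
```

In real use `publish` and `try_consume` would run on different threads; the point here is only that the ordering requirement is stated in the source and the compiler picks the instructions.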
Then there is this whole plethora of floating point stuff that I won't mention because I don't know shit about it.
Hmm, reading your post again I see that I didn't really answer your question, and most of my ranting about doing asm coding ended up with the conclusion that having the compiler do all the nasty stuff is probably better anyway. I guess I'd better shut up now.
Slashdot is *still* using the old cube logo, rather than the new "sgi" logo. Sure the new logo sucks and the old logo is quite cool, but it's time to move on. The old days are long gone. Like the rest of the 'new' SGI, there is nothing special about the SGI 750 Itanium box, it's the same box with the same Intel reference board that HP, Dell, and others are selling.
SiliconGraphics has left the building. The "hip new" SGI is here. Quit using the old logo, it reflects a much cooler company that no longer exists.
It seems the ones who have been faithful to their commitment to Linux are SGI and IBM. The others have tried it and then decided it was not worth the effort to reach such a small segment of the population.
I'm glad there are still big players in the Linux field, though; it helps forward the cause and the OS and lets people know there IS an alternative. By all means, Sun and others, keep your proprietary stuff available and have that as the default, but allow people the option to choose another OS if they so desire.
DanH
Cav Pilot's Reference Page
UNIX - Not just for Vestal Virgins anymore
I had an opportunity to work on them about a year ago (the first one we received was a doorstop, literally... The sucker weighed 73 pounds in its shipping package (I'm NOT KIDDING... They wheeled the box in on a trolley, and I laughed at the guy cuz it looked small enough to carry, but then I tried to pick it up...) and didn't even boot, but Intel shipped them with 2GB of RAM and a kickass SCSI system, so let's just say that my desktop became a SWEEEEEET machine.), but once we got ones that did work, they were sweet machines. I was porting bigint libraries for encryption that I had hand-coded in assembly for the x86 platform, and going from 8 general purpose registers with 1 predicate register (i.e. only 1 carry flag) to having 128 general purpose registers and 63 predicate registers was a GODSEND.. AMAZING... For anyone who's coded math routines in assembly, you know how much of a PITA it is having only one carry register. This was simply amazing. I could do 1024-bit RSA purely in registers, with no memory access outside of the initial read of the data and the final write. Needless to say it flew. It was interesting because you literally wouldn't need a hardware crypto card if you have an Itanium system. So basically Intel really put a lot of good effort into designing this new platform to avoid the pitfalls they experienced with their x86 architecture.
The machines also had 4GB of RAM, so it was fun to do:
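To make the carry problem concrete, here is a portable C sketch of multi-word addition. It is not the hand-coded assembly described above, just an illustration with hypothetical names: in C the carry has to be recomputed with a compare at every step, and on a machine with many predicate registers the result of each such compare can stay in its own register instead of fighting over a single flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb little-endian numbers: r = a + b.
   Returns the carry out of the top limb (0 or 1). */
uint64_t bigint_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* wraps only if a[i] is all ones and carry is 1 */
        uint64_t c1 = s < carry;     /* 1 if that addition wrapped */
        uint64_t t = s + b[i];
        uint64_t c2 = t < b[i];      /* 1 if this addition wrapped */
        r[i] = t;
        carry = c1 | c2;             /* the two wraps cannot both happen */
    }
    return carry;
}
```

A 1024-bit operand is 16 such limbs, which is why it fits comfortably in 128 general purpose registers.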
char * myStr = (char *)malloc(-1);
and have it succeed! (that's a 4GB memory allocation)
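A note on why -1 means "4GB" here: malloc takes a size_t, so -1 is converted to the largest value that type can hold. That is 4 GiB minus one byte only when size_t is 32 bits wide; with a 64-bit size_t the same call asks for 2^64 - 1 bytes and would normally fail. So the anecdote presumably involves a 32-bit size_t, which is an assumption on my part. A small sketch, with hypothetical helper names:

```c
#include <stdint.h>
#include <stdlib.h>

/* The size malloc actually sees when handed -1: converting to the
   unsigned type size_t wraps to its maximum value, SIZE_MAX. */
size_t size_requested_by_minus_one(void) {
    return (size_t)-1;
}

/* Tries an allocation of n bytes and reports whether it succeeded,
   freeing the block straight away if it did. */
int can_allocate(size_t n) {
    void *p = malloc(n);
    if (p == NULL)
        return 0;
    free(p);
    return 1;
}
```

Calling `can_allocate(size_requested_by_minus_one())` reproduces the experiment; whether it succeeds depends on the platform's size_t width and on how the OS handles overcommit.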
If God gave us curiosity
Think again. Apple's current top-end dual 800MHz PowerMac G4 runs AltiVec code at over 5 GFLOPS. My older dual 450MHz can run up to 3.2 GFLOPS... And don't forget, the G4 has a successor due pretty soon now. I think these guys will find the pleasure of being the first will be tempered by eventually having the worst price/performance 64-bit cluster in academia.
The person who submitted the story (Troy Baer) is also the admin of the beast. Troy had an interesting article on the current (previous?) cluster setup at OSC in one of the recent Linux mags (Linux Journal, 2001 July). To call Troy a proud father of this setup might not be too far off. ;)
Overall, a pretty damn sharp guy. He gets to play with Linux/SGI clusters now, I'm stuck with Alphas & an O2000 in a back room somewhere.
I knew Troy from school, admin-ed with him in the Ohio State engineering labs. Ask him what he's doing with that Aero Eng. diploma nowadays..
-'fester
If I remember correctly, in about 1996/97 this guy went to work for SGI. He'd already had similar high-up jobs at two other companies that lasted nearly 6 months, just like his stay at SGI was going to. What he did was to go around all of these companies saying that they needed a "Windows NT strategy".
The strategy was to go NT4.0 on commodity (Intel) hardware. So, SGI announced that it was reducing Irix development and halting development of the MIPS processors (which were an order of magnitude faster than the Pentium of the day and 64-bit to boot).
Very quickly, he was looking for a new job, and SGI had Pentium machines that no one wanted running NT (and Linux for the better-informed customer), had restarted MIPS development, and had continued with Irix. However, by that time they had lost their lead.
I hope he lost all his money in the dot com mania.
However, I think the reason that SGI is not producing a desktop version of Onyx 3000 is obvious - SGI tried to do battle with Sun, etc. and failed. SGI tried to do battle with Dell, etc. and failed. They're not about to do the same thing to NVidia...
The original SGI targeted what was at the time a niche market - 3D graphics. It looks like the new SGI will also retreat into niche markets - very high-end graphics and compute servers.
You're also right about the Itanium server - there's nothing very interesting about it. I believe this machine is only intended as an interim solution to allow developers onto the platform until SN-IA is available (whenever that might be).
Then we'll see something a little more impressive than OEM Itanium boxes with low-bandwidth Myrinet interconnects!