Japan's Newest Linux Supercluster: 13TB RAM
green pizza writes "Following its sale of a 10,240-processor cluster to NASA, Silicon Graphics Inc. has announced that it's supplying a 2,048-processor Altix 3700 Bx2 to the Japan Atomic Energy Research Institute. Aside from running Linux on Itanium2 processors, the beast also features 13 TB of RAM!"
I guess that'll be enough to run Longhorn then.
I remember back in my electronics course when we had to design the flip-flop grid for memory... the teacher said he'd give 100% to anyone that could draw out 64K of memory... 13TB just makes me cringe...
---
Programming is like sex... Make one mistake and support it the rest of your life.
I agree. These rather wasteful supercomputers are getting less and less impressive.
You know what would be impressive? Published results!
I mean they consume gobs of resources [power, material, waste]. That's not impressive. That's an American city block. What would be impressive is having something to show for it at the end of the day.
Tom
Someday, I'll have a real sig.
Do all processors share the 13TB? Because if they don't, the bottleneck is that subprocesses have only 13TB/1024 available (a mere 13GB each) and still have to communicate a lot.
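A quick sanity check of the per-processor figure, under the hypothetical assumption that memory is split evenly and that 1 TB = 1024 GB. Dividing by 1024 gives the 13GB above; dividing by the article's 2048 processors halves it. The helper name `share_per_unit` is purely illustrative.

```python
# Hypothetical even split of the Altix's memory; binary units assumed (1 TB = 1024 GB).
TOTAL_GB = 13 * 1024  # "13 TB of RAM" per the story


def share_per_unit(total_gb: float, units: int) -> float:
    """Return the GB each of `units` equal partitions would get."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_gb / units


for units in (1024, 2048):
    # 1024 partitions -> 13.0 GB each; 2048 partitions -> 6.5 GB each
    print(f"{units} partitions: {share_per_unit(TOTAL_GB, units):.1f} GB each")
```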
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
It's 13TB, not 3TB, according to the article: "over 13 terabytes of memory - the world's largest memory capacity"
Szo
Red Leader Standing By!
The puter will be used for nuclear research (bushspeak: nucjular reesatch) by the Japan Atomic Energy Research Institute. More info about the organisation, their projects, etc. can be found at: http://www.jaeri.go.jp/english/index.cgi.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
2048 processors, 13 terabytes of ram, AND it comes with a smaller, more ergonomic controller.
"If you think you have things under control, you're not going fast enough." --Mario Andretti
A whopping sale of 2048 Itanium2 processors in one shot: is this the BIGGEST sale for the Itanium2 chip so far?
Many thanks, Mr. Edward Snowden!
Haven't we had enough rudeness over the last four years? I happen to be pleased by most of those results (though not, for example, that anyone still uses Windows). But you're a cowardly troll for anonymously posting such off-topic flamebait.
- Get some stones and at least use a pseudonym
- Stay on topic
- Avoid calling people names like "Eurotrash"
- In short, show a little class
sigs, as if you care.
SGI has been working through this in hardware for over 10 years.
The distributed shared memory concept of the Altix (first seen on Origin 200 / Origin 2000 in the commercial space, and previously based on the Stanford DASH/FLASH projects) uses a hardware-based memory router.
Each PE has local RAM, local CPUs, and a "MAGIC" chip that routes cache invalidations, memory block "ownership" messages, etc. to other PEs as necessary. Unlike SMP designs, cache coherency doesn't destroy the whole shebang, because it's not a shared bus; it's a hierarchical directory system. I.e., PE0 knows it only needs to contact PE3, PE6, and PE13 to invalidate a cache block. It turns out that's much more efficient than broadcasting a message to PE0-PE63 saying "invalidate this block!"
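The directory idea can be sketched as a toy model: each memory block's directory entry keeps the set of PEs holding a copy, so a write sends invalidations only to those PEs instead of broadcasting to all of them. This is a hypothetical simplification (the class and method names are invented for illustration), not SGI's actual hardware protocol, which also has to deal with message races, write-backs, and routing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """State for one memory block: which PEs cache it, and who owns it exclusively."""
    sharers: set[int] = field(default_factory=set)
    owner: int | None = None


class Directory:
    """Toy directory-based coherence model (hypothetical; ignores races and write-backs)."""

    def __init__(self, num_pes: int) -> None:
        self.num_pes = num_pes
        self.entries: dict[int, DirectoryEntry] = {}

    def read(self, pe: int, block: int) -> None:
        """Record that `pe` now holds a shared copy of `block`."""
        entry = self.entries.setdefault(block, DirectoryEntry())
        if entry.owner is not None:
            # A read demotes the exclusive owner to an ordinary sharer.
            entry.sharers.add(entry.owner)
            entry.owner = None
        entry.sharers.add(pe)

    def write(self, pe: int, block: int) -> list[int]:
        """Give `pe` exclusive ownership; return the PEs that must be sent invalidations."""
        entry = self.entries.setdefault(block, DirectoryEntry())
        targets = set(entry.sharers)
        if entry.owner is not None:
            targets.add(entry.owner)
        targets.discard(pe)  # the writer does not invalidate itself
        entry.sharers = set()
        entry.owner = pe
        return sorted(targets)


directory = Directory(num_pes=64)
for pe in (3, 6, 13):
    directory.read(pe, block=0x40)
print(directory.write(0, block=0x40))  # [3, 6, 13]: three targeted invalidations
print(directory.num_pes - 1)  # 63: messages for a broadcast to every other PE
```

The point of the design is that invalidation traffic scales with the number of actual sharers rather than with the total PE count.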
Now, as to whether _all_ processors share the full 13TB, I am not sure.
The memory density / system image equation is something of a tradeoff, as more PEs require more router hops in the topology, and more router hops increase latency. SGI has sold 256p and 512p single-image systems, and may have gone up to 1024p or 2048p per system.
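To make the router-hop tradeoff concrete, here is a sketch that assumes an idealized binary hypercube of routers, chosen only for illustration (the Altix's actual interconnect topology isn't described here). In a hypercube, the hop count between two routers is the number of address bits in which they differ, so the worst case grows as log2 of the router count. The function name `hypercube_hops` is hypothetical.

```python
def hypercube_hops(num_routers: int) -> tuple[int, float]:
    """Return (worst-case hops, mean hops between distinct routers) for a binary hypercube.

    Assumes `num_routers` is a power of two. The hop count between routers a and b
    is the Hamming distance of their addresses, i.e. popcount(a ^ b).
    """
    if num_routers < 2 or num_routers & (num_routers - 1):
        raise ValueError("num_routers must be a power of two >= 2")
    dims = num_routers.bit_length() - 1  # diameter = log2(num_routers)
    # By symmetry, distances from router 0 match those from any other router.
    total = sum(bin(b).count("1") for b in range(1, num_routers))
    return dims, total / (num_routers - 1)


for n in (4, 16, 64, 256):
    worst, mean = hypercube_hops(n)
    print(f"{n:4d} routers: worst case {worst} hops, mean {mean:.2f} hops")
```

In this idealized model, each doubling of the router count adds one hop to the worst case, which is the latency creep the comment describes.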
To be perfectly honest, the system-to-system latency is different from the intra-system latency, but it's nothing like what you'd see on an x86-with-Ethernet shared-nothing cluster.
SGI's big installations are cool because they have the advantages of both SMP and MPP designs: each autonomous machine gives you single-image benefits with really high processor counts, and then you link a bunch of those together to get this outrageously sized machine.
My opinions are my own, and do not necessarily represent those of my employer.
13TB of RAM, but how much swap?
The more advanced the technology, the more open it is to primitive attack
Sorry to spoil the excitement for everybody, but Columbia actually far exceeds the Japanese system's memory capacity, at 20 TB. See this description for details of Columbia's config.
Most clusters run the vendor Unix. IBM's run AIX or Linux, SGI's run IRIX or Linux, Alphas run Tru64, and x86 clusters run Linux. The ultra-high-end custom machines run obscure custom Unix ports. Microsoft is trying to break into the HPC market, but so far only Cornell and Rice are buying.
Isn't it recommended that you have 2x RAM as swap? So that'd be *does difficult calculations in head* 26TB of swap. You really don't want the kernel killing off processes because you ran out of RAM... that'd be bad.
And the answer is: it depends on what you're doing with it.
This thing is significantly more tightly coupled than VT's cluster, and uses shared memory as opposed to clustering, so for a lot of tightly coupled problems it will be *far* more efficient.
As for raw processing power, the Itanium2 has the same theoretical peak floating-point performance as a PPC970 at the same clock. In reality the Itanium is likely to come closer to achieving its peak than the PPC970 due to its massive cache (9MB compared to the 970's 512KB). However, the Itaniums in an Altix 3000 are only running at 1.6GHz according to SGI's page, while the 970s in VT's cluster are now at 2.3GHz. So the BigMac would have some advantage on loosely coupled problems that fit in its smaller cache and memory.
So while the BigMac might beat this system at Linpack (the benchmark used to determine the Top500), in the domain this system is to be used for (3D modeling of nuclear blasts), its tighter coupling and greater RAM will make it much faster.
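Since both chips are said to have the same theoretical peak per clock, the peak comparison reduces to arithmetic. This sketch assumes 4 double-precision flops per cycle for both (two fused multiply-add units); that figure is an assumption not stated in the post, and it also assumes all 2048 processors run at 1.6GHz. Sustained rates such as Linpack results come in below these peaks. `peak_gflops` is a hypothetical helper.

```python
# ASSUMPTION: 4 DP flops/cycle per core (two FMA units x 2 flops), for both chips.
FLOPS_PER_CYCLE = 4


def peak_gflops(clock_ghz: float, processors: int = 1) -> float:
    """Theoretical peak in GFLOPS: clock x flops/cycle x processor count."""
    return clock_ghz * FLOPS_PER_CYCLE * processors


itanium2 = peak_gflops(1.6)  # Itanium2 at 1.6GHz, as quoted above
ppc970 = peak_gflops(2.3)  # PPC970 at 2.3GHz in VT's cluster
altix = peak_gflops(1.6, processors=2048)  # the 2048-processor JAERI system

print(f"Itanium2 @ 1.6GHz: {itanium2:.1f} GFLOPS per CPU")  # 6.4
print(f"PPC970 @ 2.3GHz: {ppc970:.1f} GFLOPS per CPU")  # 9.2
print(f"2048 x Itanium2: {altix / 1000:.1f} TFLOPS theoretical peak")  # 13.1
```

Per processor, the 970's higher clock wins on paper; the Altix's case rests on cache, memory size, and coupling, none of which a peak-flops number captures.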
"The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge