As covered earlier here, IBM backed out of the contract because they thought they wouldn't be able to meet the performance requirements for existing codes. They were concerned about clock speeds (POWER7 runs at 4 GHz). POWER7 excels at single thread performance, but also in fat SMP nodes.
What NCSA has ordered now is a system that is pretty much the antipode of the original Blue Waters: the Bulldozer cores are sub-par at floating point performance, so they'll have to rely on the Kepler GPUs. Those GPUs are great, but to make them perform well, NCSA and U of I will have to rewrite ALL of their codes. Moving data from host RAM to GPU RAM over slow PCIe links can be a major PITA, especially if your code isn't prepared for that.
Given the fact that codes in HPC tend to live much longer than the supercomputer they run on, I think it would have been cheaper for them to give IBM another load of cash and keep the POWER7 approach.
Isn't IBM the one who's 4 years ahead? (on The Transistor Wars · Score: 1, Flamebait)
I'm mainly referring to embedded DRAM, which they use for gigantic on-chip caches. Neither Intel nor AMD has this; they have to use SRAM for their caches, which consumes much more space on the die. Sure, IBM is targeting different customers with their pricing strategy, but just imagine having an 8-core POWER7 chip with 32MB on-chip L3 cache in your PC. That thing would just blow any x86 CPU away. Partly because of the gigantic cooler required, I admit.
At least it's that way for us in HPC. Sure, FreeBSD is rock stable and all, but if you run stable, you'll be trailing behind and won't get to use the latest packages. This may be fine for an ordinary HTTP server, but when you need an updated NUMA-aware scheduler for your 48 core/4 socket machine or the latest drivers for your InfiniBand hardware, then you'll happily give up some alleged increase in stability in favor of real performance. The same is true for Debian stable.
I'm doing my PhD in HPC. From my perspective GPUs do indeed offer a lot of GFLOPS, but it's often impossible to max them out. For stencil codes in particular (read: virtually all physical simulation codes) this is hardly possible because of the low operational intensity of stencils. CPUs achieve much higher efficiencies here because they can do cache blocking. The caches on e.g. Fermi are much too small to do that well. So no: in this case a GPU won't necessarily yield higher performance or efficiency.
As for the pricing: in scientific computing we typically care about double precision. For that you have to buy Nvidia Tesla cards, which cost several thousand dollars each. I don't know the prices of the Sparc chips, but I doubt they'll be higher.
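A back-of-the-envelope roofline estimate shows why low operational intensity hurts. This is only a sketch: the 8 FLOPs per cell for a 7-point stencil, the byte counts per cell and the device figures (500 GFLOPS peak, 150 GB/s memory bandwidth) are illustrative assumptions, not measurements of any real card.

```python
def operational_intensity(flops_per_cell: float, bytes_per_cell: float) -> float:
    """FLOPs performed per byte moved to or from main memory."""
    return flops_per_cell / bytes_per_cell


def attainable_gflops(peak_gflops: float, mem_bandwidth_gbs: float,
                      intensity: float) -> float:
    """Roofline bound: the lower of the compute peak and the memory-traffic limit."""
    return min(peak_gflops, mem_bandwidth_gbs * intensity)


# Hypothetical device figures, chosen only for illustration.
PEAK_GFLOPS = 500.0
MEM_BANDWIDTH_GBS = 150.0

# Assumed 7-point stencil in double precision, 8 FLOPs per cell update.
# With cache blocking the neighbours are reused from cache, so each cell
# costs roughly one 8-byte read and one 8-byte write of main memory traffic.
blocked = operational_intensity(8, 16)
# Without reuse every one of the 7 inputs comes from main memory, plus 1 write.
unblocked = operational_intensity(8, 64)

for label, oi in (("cache blocked", blocked), ("no reuse", unblocked)):
    gflops = attainable_gflops(PEAK_GFLOPS, MEM_BANDWIDTH_GBS, oi)
    print(f"{label}: {oi:.3f} FLOP/byte -> at most {gflops:.2f} GFLOPS "
          f"({100 * gflops / PEAK_GFLOPS:.1f}% of peak)")
```

Under these assumptions the kernel stays memory bound either way; cache blocking only decides how far below peak it ends up.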
Fujitsu is fishing in the same waters as IBM does with their BlueGene machines: both lines are designed to deliver 20 PFLOPS and both are traditional systems in the sense that you don't have accelerators like GPUs, which are still awkward to program for the average physicist. Thus, to potential buyers the TCO would be interesting. From what I've heard BlueGene/Q is twice as power efficient as the Sparc VIIIfx design, but those were just 8-cores, not 16-cores.
So, assuming comparable total power consumption and an affordable price tag, Fujitsu could snatch several deals from Big Blue, perhaps even the recently failed Blue Waters, although my money is on Cray for that machine.
After having a look at the PDF I wonder which businesses they asked. First, most large companies that I know of run their servers on Linux; no one would even dare to suggest an MS hypervisor. Second, the hypervisors that I've seen in the wild are (apart from Citrix and some VMware hosts) mostly OpenVZ, Virtuozzo or Xen. Just think of all those root v-servers you can rent for cheap. Xen is big in companies and backed by major players, e.g. IBM. The survey numbers just don't make sense.
Exactly my point: use the file system! Otherwise, you're still free to stream via other means. I use KDE, but stream using xine, which may very well stream from smb shares.
That said, I don't claim KDE is flawless. Clearly a system that copies a 20GB file just to play it once is rubbish in that respect, I'll grant you that.
1. I've been using both KDE 3 and 4 for years. I didn't make the switch because of "versionitis", but because 3.5 wasn't satisfying in various respects (e.g. the broken menu editor, broken USB auto mounting, poor calendar storage format choices... I could continue this list endlessly).
2. What I'm questioning is not whether "Trinity" is good from a technical perspective, but the relevance of the project itself. People are free to do whatever they please in their basements, but not everything is worth being posted on such a high volume/high profile forum. This is a prime example. The work is irrelevant since the reason why it was started (KDE 4.0 did suck, after all) doesn't exist any longer (KDE 4.7 rocks). You'll never get enough manpower to keep up with KDE mainline. Thus, Trinity will trail behind, with the gap growing constantly. Just look at other, larger projects: even Gnome is struggling to keep up the pace. The story just got posted because it triggered the right /. buzzwords: Linux desktop and KDE 3.5 vs. 4.x.
3. Anyone who's used both Qt 3 and Qt 4 will agree with me that Qt 4 was a huge step forward. E.g. the widget rendering infrastructure alone is now so much more elegant that it would be a reason to make the switch.
...and stop posting irrelevant stories like this on the front page. KDE 4.0 was horrible, yes, but it's not like KDE 4 development was halted. The latest release is 4.7 and it's much more stable and feature rich than 3.5 ever was.
Or to put it a different way: a crucial part of a supercomputer is always the network. It sets a supercomputer apart from a bunch of workstations. Sunway uses InfiniBand. Even if the components (NICs and switches) are built by Huawei, it's still not a Chinese design. This isn't bad or dramatic, but it adds to the fact that this isn't a totally "homegrown" machine.
Yes. And in every time step you'll have to sync the ghost zones (or halos). And if the time step is computed faster (because your CPU/GPU has been upgraded), and your network hasn't been equally accelerated, then, at some point, you'll be bandwidth limited. Overlapping of calculation and communication only hides communication time if t_send = (t_latency + size / bandwidth) is smaller than t_compute. Speeding up the CPU/GPU will reduce t_compute. t_send will remain constant.
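The condition above can be played through with a few numbers. A minimal sketch follows; the latency, halo size, bandwidth and compute times are made-up values for illustration, and the model assumes perfect overlap, which real codes only approximate.

```python
def t_send(latency_s: float, size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time for one halo exchange: t_latency + size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def step_time(t_compute: float, t_comm: float, overlap: bool = True) -> float:
    """Wall time of one time step, with or without (idealized) overlap."""
    return max(t_compute, t_comm) if overlap else t_compute + t_comm


def is_network_bound(t_compute: float, t_comm: float) -> bool:
    """True once communication can no longer be hidden behind computation."""
    return t_comm > t_compute


# Assumed values: 2 us latency, 8 MB halo, 4 GB/s link.
comm = t_send(2e-6, 8e6, 4e9)

# Same network, but the node gets 10x faster (10 ms -> 1 ms per step).
for t_compute in (0.010, 0.001):
    print(f"t_compute={t_compute * 1e3:.1f} ms, t_send={comm * 1e3:.3f} ms, "
          f"step={step_time(t_compute, comm) * 1e3:.3f} ms, "
          f"network bound: {is_network_bound(t_compute, comm)}")
```

With these numbers the 10x faster node only delivers about a 5x faster time step, because t_send stays where it was.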
That said, of course there are algorithms which don't have to sync in every time step because they communicate a wider halo, e.g. a halo of width 4 with communication only every fourth step, but those algorithms only reduce the influence of the network latency, not the bandwidth limit.
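Why the wider halo helps with latency only can be seen from a simple per-step cost model. This is a sketch under simplifying assumptions: the halo volume grows linearly with its width (corner regions ignored), the redundant computation inside the wider halo is not counted, and the numbers are invented, with the latency deliberately exaggerated.

```python
def comm_time_per_step(latency_s: float, halo_bytes_width1: float,
                       bandwidth_bytes_per_s: float, width: int) -> float:
    """Average communication time per time step when a halo of `width`
    layers is exchanged once every `width` steps."""
    message_bytes = width * halo_bytes_width1          # one bigger message...
    t_message = latency_s + message_bytes / bandwidth_bytes_per_s
    return t_message / width                           # ...amortized over `width` steps


# Assumed values: 100 us latency, 1 MB per halo layer, 1 GB/s link.
LATENCY, HALO_BYTES, BANDWIDTH = 1e-4, 1e6, 1e9

for width in (1, 4, 16):
    per_step = comm_time_per_step(LATENCY, HALO_BYTES, BANDWIDTH, width)
    print(f"width {width:2d}: {per_step * 1e3:.4f} ms per step "
          f"(latency share {LATENCY / width * 1e3:.4f} ms, "
          f"bandwidth share {HALO_BYTES / BANDWIDTH * 1e3:.4f} ms)")
```

The latency share shrinks as 1/width, while the bandwidth share per step stays at size / bandwidth no matter how wide the halo is.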
Both need care: books may rot, but the Dead Sea Scrolls are a pretty good example of how long paper may last. Then, for digital documents there is a thing called bit rot. Ever tried to read a floppy disk from 1990? Even if you've got the drive, the medium of the floppy may have become unreadable. Or there is no program to open that document format. To avoid bit rot you can expect that you'll have to constantly copy and reformat your books.
The expectations in the 1990s were much higher than today. The expectation was that ultimately electronic paper would replace printed newspapers. You'd only buy one book and download any content to that book -- similar to what people can do with a vanilla Kindle. The Kindle Fire doesn't even use E-Ink, but a standard IPS LCD display. But most people still buy paper books (the offline variant). And any other products using E-Ink are still vaporware: lots of announcements, none available. No wall-sized displays replacing the concrete behind them with tropical islands, no camouflaged tanks, no nothing. For years now.
Who's talking about disk I/O? I'm talking about network bandwidth, which is required for synchronization, e.g. to update ghost zones in stencil codes. The required bandwidth is proportional to the computational power of the nodes. Latency can actually be hidden, too, by overlapping computation and communication -- at least for the aforementioned stencil codes, which represent the largest fraction of simulation codes out there.
That said, disk I/O is still vital, at least if you want to actually see what your supercomputer has computed.
Of course peak performance is never actually achieved, but a 10 PFLOPS machine is useless if production codes all run at 10 TFLOPS. That's why Blue Waters targeted 1 PFLOPS application performance -- that meant more to them than 10 PFLOPS Linpack throughput.
I'd mod this post "+1 informative" if I had resisted posting prior to reading this.
And this is why I run Gentoo Linux. I don't have to roll my own, but still get optimized (and customized) builds, mostly free of effort.
...or boot into Linux directly. :-P
Uhm, you DO have multiple desktops in KDE 4.x.
FWIW, sshfs is pretty usable for Unix-only setups. Don't know of any GUIs for it, though.
It is more mature. It was initially released in 1976, and is still being actively developed, too. :-P
...disqualifies you as a technical reviewer. Sorry. KDE isn't bad just because you can't get it to work with your Windows compatibility network setup.
Just my thoughts. This story is overrated.
Plus paper books last (nearly) forever, you can give them to kids or, if everything else fails, the nazis/communists/whatev0r can burn them.
Yeah, I've been waiting for that since reading William Gibson's Neuromancer as a kid. They could make gazillions with these things.
Not just Sequoia, but also Mira (10 PFLOPS BG/Q @Argonne), Hermit (4-5 PFLOPS Cray XE6 @HLRS, Germany) and... well, Blue Waters seems to have silted up.