Domain: cray.com
Stories and comments across the archive that link to cray.com.
Comments · 231
-
Um, yeah...
-
Re:Kids these days
This PDF from Cray shows that they base at least one of their newest supercomputers on CPUs (Xeons) and use GPUs as accelerators (see p. 3 and p. 7). Also, on their product page, Cray states, "The XC series,and our newest entry the XC50, expands our support for the latest CPU and CPU capabilities with the addition of NVIDIA’s Tesla® P100 GPUs, previously code named “Pascal.”
So... it pretty much mirrors exactly what the article states. Your emphatic statement on architecture appears less than accurate in light of the Cray information. Fortunately, ignorance is relatively easy to remedy, unlike, say, personality defects.
-
Re: Cost
Okay... This is what it's supposed to emulate.
This thing has more than nine hundred thousand processor chips and two petabytes of memory. Current x64 chips are limited to 256 TB of physical address space (per Wikipedia), so either [a] these chips have a larger than usual physical address space (which I doubt), or [b] this isn't a shared-memory system.
So, dumbnuts, this isn't a shared memory system. Go read about the Cray XC40. Or even this document -- clearly showing it's a multi-node system with a fast interconnect. (It talks of each node running different OS images, so that means it isn't one shared OS image - which means it isn't shared memory).
Summary: What evidence do you have that the target system is shared memory? It looks to me like it's non-shared-memory (i.e., message passing); while with an extremely fast interconnect, I'm sure it's still slower than the CPU internal busses. The same is true with this Raspberry Pi - the interconnect (ordinary Ethernet) is still significantly slower than the ARM chip itself; and THAT environment is what's being emulated.
It doesn't really matter that other architectures could be faster - the GOAL is to replicate how the Cray XC supercomputers work - albeit at a fraction of the performance and price.
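The address-space argument above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the quoted 256 TB limit corresponds to a 48-bit physical address (2**48 bytes) and treats the two petabytes as binary units; the variable names are illustrative.

```python
# Back-of-the-envelope check: can 2 PB sit behind one 48-bit physical address space?
# The 256 TB limit and the 2 PB total are the figures quoted in the comment above.

TIB = 2**40                     # one tebibyte in bytes
PIB = 2**50                     # one pebibyte in bytes

address_bits = 48               # assumption: the 256 TB limit is 2**48 bytes
addressable = 2**address_bits   # bytes a single chip can physically address
total_memory = 2 * PIB          # memory of the whole machine

ratio = total_memory // addressable
print(f"addressable per chip: {addressable // TIB} TiB")
print(f"machine total:        {total_memory // TIB} TiB")
print(f"shortfall factor:     {ratio}x")
```

Under those assumptions the machine holds eight times more memory than one chip could address, which is consistent with the conclusion that memory is distributed across nodes rather than shared.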
-
Cray® XC40 utilize the Cray Linux® Envir
-
Re:obligatory
Yes, it runs linux.
Cray Linux® Environment (includes SUSE Linux SLES11, HSS and SMW software)
Extreme Scalability Mode (ESM) and Cluster Compatibility Mode (CCM)
-
Re:Dude, you're getting a CRAY, also error in summ
And neither mentions the CPU architecture, but if you go to the product brochure then you learn that they're Intel Xeon E5s (which doesn't narrow it down much). Interesting that they're using E5s and not E7s, but perhaps most of the compute is supposed to be done on the (unnamed, vaguely referenced) accelerators.
-
Looks
I miss the days when supercomputers looked super. This one looks like a row of drinks machines.
-
Re:Ode to past:
yes it does
http://www.cray.com/Products/C...
has its own version, I believe
-
Re:Throw the book... maybe literally at him.
-
Re:Why .Net?
I'm curious, what do people in HPC think of Chapel? I was curious about it, and it seemed pretty interesting, but still early days. http://chapel.cray.com/
-
Cray
Cray is still around building computers...
http://www.cray.com/Products/P...
...and they started in '72, when Jobs was just 17 years old. They installed their first system at Los Alamos National Laboratory in 1976.
-
Re:Post it to Slashdot!
It looks decent, though I go to the FAQ, and I see "Please look here for a short review of how it relates to the competition.", and I go to that link, and there is no information about "the competition".
Ah, sorry. As the text evolved, that paragraph was buried. I changed the layout and link so that it is more visible.
And... "So far no one has come up with a language/compiler/library that could automatically parallelize any sequential code on any hardware."... have you seen Chapel? It is not perfect, and it looks like you have a nicer polish to some things, but is actually quite good for many things.
Yes, I'm aware of Chapel. This is a good example for the current state of generic auto-parallelization: it works well, as long as the user augments his sequential code so that the compiler/runtime/whatever can distill the parallelism from it. That's still not possible without augmentation. So the user needs to understand how a parallel system works and how his algorithm might be mapped to it. Trivial for someone who does this for his daily living, but difficult for someone who's new to parallel computing.
Also, for many applications the optimal algorithms to be used on the various target hardware architectures differ significantly (e.g. for stencil codes a 2.5D wavefront on multi-cores, but a horizontal iteration with 32-wide stride on GPUs...) Such different algorithms can't be "discovered" by some generic software (at least no one, not even the Chapel developers have achieved this), so those algorithms have to be encapsulated in specialized libraries. Which is what we do for our domain "computer simulations".
(I just code to MPI directly... I don't see what the big deal is for parallel processing for the vast majority of things, but I see why there would be a niche for what you do. Best of luck.)
Thanks.
:-) Users of my library are mostly scientists who want to simulate something big, without having to spend months learning OpenMP and MPI and CUDA and so on. So yeah, there is a niche. And thanks to the stagnating clock speeds and growing heterogeneity of HPC hardware, that niche is growing fast. Exciting times.
-
Re:Post it to Slashdot!
It looks decent, though I go to the FAQ, and I see "Please look here for a short review of how it relates to the competition.", and I go to that link, and there is no information about "the competition". And... "So far no one has come up with a language/compiler/library that could automatically parallelize any sequential code on any hardware."... have you seen Chapel? It is not perfect, and it looks like you have a nicer polish to some things, but is actually quite good for many things. (I just code to MPI directly... I don't see what the big deal is for parallel processing for the vast majority of things, but I see why there would be a niche for what you do. Best of luck.)
-
Re:Uh huh
Really? That's all you got?
Where in your list is Tru64?
In the "dead" list, given that DEC^WCompaq^HP aren't coming out with new releases and given that it runs on an instruction set architecture of which no more implementations are being made and it isn't being ported to new architectures.
Irix?
In the "dead" list, given that SGI aren't coming out with new releases and given that the only instruction set it supports these days, MIPS is now targeted for various flavors of embedded computing rather than general-purpose computing, and it's not being ported to other instruction sets.
UNICOS?
These days, it's called "Linux".
SCO?
Wow, they're still around, not that they're players in the same "enterprise server" market as Oracle/HP/IBM and their respective proprietary Unixes.
Limiting yourself to Unixes you've encountered is a pretty lame standard.
Limiting yourself to Unixes that are still around, if the topic is the decline of "Unix", is, however, a rather reasonable standard.
-
Re:And The software is???
It ships with Compute Node Linux, which is a cut down (lower overhead) version of SLES. It supports several schedulers, but ORNL typically uses Altair PBS on the big systems (http://www.cray.com/Products/XK/Software.aspx). ORNL provides a large number of compilers and libraries that users can use in the form of 'modules' (http://www.olcf.ornl.gov/support/user-guides/titan-user-guide/). And in terms of scheduling/partitioning, the user just requests a specific number of nodes when they submit a job, and they get those nodes to themselves for the allotted time. It's pretty low-impact on the compute nodes, and less exciting than you might think. They don't put much emphasis on the software when reporting on these machines, because it's stripped down as much as possible to allow the user applications to run at peak performance.
-
Re:Products
It certainly didn't help that computer manufacturers have treated AMD as a budget CPU for many years.
Not all of them. Like you say, AMD needs to improve their marketing, right across the performance spectrum.
-
commodity HPC depends on your code
In HPC we call it "pleasantly parallel," nothing is embarrassing about it! =]
If your code:
-scales to OpenCL/CUDA easily.
-does not require high concurrent memory transfers
-is fault tolerant (ie a failed card doesn't hose a whole day/week of runs)
-can use single precision flops
Then you can use commodity hardware like the gtx series cards. I'd go with the gtx 560ti (GF114 gpu).
Make nodes with:
quad core processors (amd or intel)
whatever ram is needed (8GB minimum)
2 x gtx560ti (448) run in SLI (or the 560ti dual from EVGA)
Basically a scaled down Cray XK6 node. http://www.cray.com/Assets/PDF/products/xk/CrayXK6Brochure.pdf
It all depends on your code.
-
Re:Completely different contract/machine/goals
I don't have any knowledge of what those change requests were, so I don't know the answer. Everything I have read indicates that IBM wanted too much money.
From what I have read, it seems that they were. They couldn't keep their costs low enough to justify the expense.
True, but only because of the strict requirements of NCSA. If they had been willing to change them, a BlueGene/Q would have been viable.
Ah, I misunderstood. I don't think directives have been around all that long (PGI's earlier directives and CAPS's directives come to mind) and they certainly weren't standardized. OpenACC, like OpenMP, should allow scientists to write more portable accelerator-enabled code. In fact the OpenACC stuff came out of the OpenMP accelerator committee as explained here. I think it's highly likely some version of it will be incorporated into OpenMP.
The reason why I'm so allergic to annotation based parallelization is the experiences folks had with OpenMP. The common fallacy about OpenMP is that it is sufficient to place a "#pragma omp parallel for" in front of your inner loops and *poof* your performance goes up. But in reality your performance may very well go down, unless your code is embarrassingly parallel. In reality especially simulation codes are tightly coupled and memory bound. The parallelization on GPUs is very different from the code on traditional multi-cores. On the latter you'll want to do pipelined cache blocking, while on the former you'll want to do tiling in the GPU DRAM. These are differences in the high level algorithm of a kernel, something which is beyond the compiler to change. Even with annotations.
Instead of a revamped OpenMP, I expect OpenCL to grab a larger share of the market when it comes to writing portable code. Even though OpenCL code by far isn't write once, run everywhere.
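One way to see why a memory-bound kernel gains little from simply adding threads is a roofline-style bound. This is an illustrative model brought in here, not something from the comment itself, and every hardware number below is hypothetical.

```python
# Roofline-style estimate (an illustrative model, not from the original comment).
# All hardware numbers below are hypothetical.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Upper bound on throughput: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

PEAK_PER_CORE = 10.0      # GFLOP/s per core (hypothetical)
BANDWIDTH = 40.0          # GB/s shared by all cores (hypothetical)
STENCIL_INTENSITY = 0.5   # flops per byte moved (hypothetical stencil kernel)

for cores in (1, 2, 4, 8, 16):
    bound = attainable_gflops(PEAK_PER_CORE * cores, BANDWIDTH, STENCIL_INTENSITY)
    print(f"{cores:2d} cores -> at most {bound:.1f} GFLOP/s")
```

With these made-up numbers the bound stops growing after two cores, because the shared memory bandwidth, not the core count, is the limit. The model only shows a plateau; the slowdowns mentioned above would come from effects it does not capture, such as synchronization overhead and cache contention.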
-
Re:Completely different contract/machine/goals
If they are so willing to adapt, why weren't they willing to accommodate IBM's change requests?
I don't have any knowledge of what those change requests were, so I don't know the answer. Everything I have read indicates that IBM wanted too much money.
It's not like IBM was totally unwilling to build a $200 million machine.
From what I have read, it seems that they were. They couldn't keep their costs low enough to justify the expense.
I was referring to annotations for GPU offloading. Codes that run on GPUs are so common nowadays that you'll be asked at conferences why you didn't try CUDA if you present any performance measurement sans GPU benchmarks.
Ah, I misunderstood. I don't think directives have been around all that long (PGI's earlier directives and CAPS's directives come to mind) and they certainly weren't standardized. OpenACC, like OpenMP, should allow scientists to write more portable accelerator-enabled code. In fact the OpenACC stuff came out of the OpenMP accelerator committee as explained here. I think it's highly likely some version of it will be incorporated into OpenMP.
-
Re:Completely different contract/machine/goals
Well, they won't have to completely rewrite all of their codes thanks to OpenACC. They will probably still have to do a bit of restructuring (and that's not a small task) but the nitty-gritty low-level stuff like memory transfers should be handled and optimized by the compiler.
-
Chapel
I think the language Chapel being developed by Cray is taking concrete steps to make Parallel Programming easier. It is very similar in syntax to C++ and even supports some higher level constructs.
This looks like an advertisement for Chapel but I have no relation to Cray. Having taken a graduate parallel programming course, I cannot agree more with the statement that "Parallel Programming is difficult". I struggled a lot with pthreads and MPI before doing the final assignment in Chapel which was a pleasant surprise. The difference between serial and parallel code was FIVE characters only - and it gave a near linear speedup on three different machines.
-
Re:So why would anyone want to do this?
Also, MS produces a much better integrated and more functional development environment than anything available in the FOSS world.
They're nowhere near the level of depth in the HPC world that linux has, but if they can do for parallel programming what VB did for programming generally (make it accessible to non-programmer domain experts) then it could be a compelling alternative.
If you want that, then you're probably looking for either X10 or Chapel, which offer a huge leap forward in making massive-scale parallel programming simpler and easier. Of course, X10 is from IBM and has Eclipse and plugins as the dev environment, and the closest they offer to a Windows version of the compiler is one compiled against cygwin. Chapel is from Cray, and has practically no Windows support, save for a claim that you can get it working with cygwin. Both languages are excellently supported on Linux (which is, of course, what Cray and IBM supercomputers run these days). So it would seem the future of parallel programming is firmly in the linux camp.
-
Re:Alternatives?
I can't give opinions on all of these (and some are still in development at this time anyway), but here's a list of some languages with parallelism designed in:
- Erlang -- Very popular message passing/actor model based language.
- Scala -- A functional language with actor model concurrency for the JVM.
- Oz -- An exceptionally multiparadigm language.
- Occam-pi -- The modern version of the old occam for transputers; CSP style concurrency (I believe).
- Chapel -- Cray's parallel programming language for supercomputers. Cray's entry into DARPA's HPCS programming language competition.
- X10
- Fortress -- Sun's language for serious scientific computing. It was Sun's entry into DARPA's HPCS programming language competition, but lost and is now open sourced.
- Eiffel SCOOP -- An effort to take a CSP model and make it elegantly compatible with object oriented programming
-
Cray XT5 "Jaguar"
The #1 on the top 500 supercomputer list is using water cooling as well (in combination with phase change cooling). Watercooling whole racks can be done. The only difference from TFA is that it also adds immersion cooling. Immersion cooling has been found to be superior in cooling but comes with (obvious) considerable maintenance problems. The video for this machine shows more or less standard water cooling blocks on the processors, along with various plumbing that keeps the machine chilled.
-
It's no Cray CX1
A key feature instead is the system's ease of use.
But does it provide EASE OF EVERYTHING(TM) like the Cray CX1?
-
Re:Use Cilk
Wikipedia has an ok article here. It has a link to the C++0x standard library implementation of futures which is somewhat limited. It has many of the same problems as the blocks concept - namely, it requires too much programmer interaction. To be done properly, futures should really be understood by the compiler so it can generate all the boilerplate.
I'm most familiar with the Cray XMT implementation of futures as described here. To briefly summarize, a future is a variable linked to the output of some asynchronous operation, usually a function call. When the future's value is set, a thread is spawned to process some task that produces the right-hand side of the future's assignment statement. Later on, when the future is actually used, there is a compiler-generated synchronization point. If the task is complete the parent thread just continues on as normal. If the task is not ready, the parent thread waits at the point of the future's use until the value becomes available. The future's thread can come from anywhere: a thread pool, a new spawn, etc.
So essentially futures are a natural way to express the concept of some side task producing a value needed "sometime later." The XMT has special hardware to assist in the efficient processing of threads and futures but one can implement futures on any machine that provides a threading model. It's a very nice abstraction that allows the programmer to get out from under the gory details and concentrate on the problem being solved.
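The behaviour described above can be sketched with Python's standard `concurrent.futures` module. This is not the Cray XMT implementation: here the synchronization point is an explicit `result()` call rather than something the compiler generates, and `slow_square` is a made-up stand-in for the side task.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(x):
    """Hypothetical stand-in for a side task that takes a while."""
    time.sleep(0.1)
    return x * x

with ThreadPoolExecutor(max_workers=2) as pool:
    # Creating the future hands the task to a worker thread from the pool.
    future = pool.submit(slow_square, 7)

    # The parent thread keeps going with unrelated work.
    other = sum(range(10))

    # Point of use: result() returns at once if the task is done,
    # otherwise it blocks until the value becomes available.
    value = future.result()

print(other, value)
```

The explicit `submit` and `result` calls are exactly the kind of boilerplate the comment argues a futures-aware compiler should generate on the programmer's behalf.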
-
Re:This stuff is so cool
That is a mess of wires obscuring the Cray 3 CPUs to which they are connected. It cost $300 million to develop the first functional system for NCAR before Cray Computer Corp, Seymour Cray's last start-up company, folded. (Not to be confused with Cray Inc. which is still producing new systems.)
This machine required 90,000 watts of power and gave off 310,000 British thermal units of heat per hour, enough to warm six 2,000-square-foot homes. Getting the heat out of the data center would have been a serious problem. I'm sure the whole NCAR building was designed to do just that.
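The two quoted figures are consistent with each other, as a quick unit conversion shows. The sketch below assumes all 90,000 watts end up as heat and uses the standard factor of roughly 3.412 BTU per hour per watt.

```python
# Convert the quoted electrical load to heat output.
# 1 watt is about 3.412 BTU per hour (standard conversion factor).
BTU_PER_HOUR_PER_WATT = 3.412

power_watts = 90_000                      # figure quoted above
heat_btu_per_hour = power_watts * BTU_PER_HOUR_PER_WATT

print(f"{heat_btu_per_hour:,.0f} BTU/h")
```

That works out to about 307,000 BTU per hour, which rounds to the 310,000 quoted.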
DigiBarn has more pictures of the Cray 3 CPUs.
-
Re:Nice, but...
Uhh...Cray is still very much alive. And doing vectors. And threads. And multicore. All long before Intel/AMD.
-
Re:Idiocray in its uttermost level
"Idiocray" ? Is that a really dumb 1970s supercomputer ?
-
Re:Gaming?
Not even close. The heavy lifting for 3D games is done on the GPU, and I'm not aware of any games (except perhaps games that utilize multiple monitors, like flight simulators) that can make use of more than one GPU.
So a single game could potentially drive many monitors, but not do more visually on a single display.
Actually, you can configure the Cray CX-1 with "visualization nodes" that contain GPUs, not just CPUs.
-
cray website requires explorer
I wonder why no one pointed it out before. If you try to customize before you buy, it throws you "This section of the Website is compatible with only Microsoft Internet Explorer 6.x and higher. We are presently working on supporting other browsers. Sorry for this inconvenience." Maybe it's time to rethink the strategy when most of the skilled IT people are moving away from Windows... check for yourself: https://cx1.cray.com/default.asp
-
Re:What's the frame rate and resolution?
-
Cray.com problems using Firefox
Cray has become so Microsoft that you can't configure your CX1 using FF! Check it out yourself here.
-
Non-useless link
instead of bloggy blather, you can go to the source.
-
Re:Look into Fluorinert
Cray has since moved on from the stuff for their middle range computers, but their iconic old big cylinder super computers were completely chock full of the stuff. I found something from 2002 that indicates they still use it in their highest end equipment: http://www.cray.com/downloads/crayx1_dhbrown.pdf
It's also used at SLAC for cooling electronics: http://www-conf.slac.stanford.edu/bfactory-decom/Talks/Wisniewski2.pdf
Looks like it's also used to cool industrial equipment that can't be exposed to reactive chemicals, like wafer ion implantation systems: http://multimedia.mmm.com/mws/mediawebserver.dyn?6666660Zjcf6lVs6EVs666YNqCOrrrrQ-
-
Re:QuickPath vs HyperTransport
the possibility for an external HT3 bus on a machine which could be used to link together multiple physical machines into one giant NUMA beast
That's what the Cray XT5 does - uses Hypertransport on new AMD Quad Core Barcelona to link multiple CPUs via their Seastar chip, and with FPGA accelerators too, sheesh
-
Re:The future of Linux supercomputing
Cray, which still exists and still builds and sells Supercomputers, also runs Linux.
http://www.cray.com/products/xt5/index.html
-
Cray had prior art/implementation a decade earlier
Cray and others were doing all sorts of parallel processing back in the '70s and '80s. Per their history page, the Cray-1 came out in 1976, and various quotes from that page include "first multiprocessor supercomputer (1982) ... multiple 333 MFLOPS processors (1988) ... massively parallel processing (MPP) system (1993)"
What are Sony's lawyers going to patent next - using MPP (multiple parallel painters) to paint a house?
-
Re:download to dev/null
This one (although, strictly speaking, that's a bus controller for interprocessor communication instead of processor-RAM communication).
-
Liquid-cooled computers are so last millennium!
-
Re:more likely...
Cray is actually just about to launch a next generation MTA, the XMT. Interestingly, the processors plug into Opteron sockets and except for the processors themselves all the other hw is from the XT4 (Cray's Opteron-based supercomputer).
-
Re:revolution indeed
Things like this are already commonly done in high performance computing, where you don't want to interrupt the CPU (which is doing real work) to service message passing requests.
One example of a production system doing this is the Cray XT3. You have a PowerPC 440 processor sitting on the card, along with a DMA engine. When a request comes in over the NIC, it is put in the place in memory that you specified earlier.
-
2 month old news
The original press release:
http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=873357
All they do is upgrade to dual-core Opterons, hence the double performance.
-
Yeah, all those Crays don't scale well at all
Yeah Cray can't seem to get them to scale at all.
Seriously though, Newisys and IBM have chipsets to do 32 Opterons, but why? That market doesn't need it for the trouble it would be. Right now, you can do four way glue-less and eight way with little trouble. The next revision, K8L, in the December-to-March-ish timeframe, adds more interconnects, the ability to split HT connections to 8 bits to double connections, and 4 cores per die. This all adds up to 32 way glue-less for a total of 128 cores. The real reason why you don't see large scale single bus style Opterons is that the combination of the current HyperTransport (ver. 1) and NUMA makes for a very chatty bus, which causes performance issues related to scale. The point of HT is that it is routable and switchable by HT chips on the bus-lines, a la Cray. It's just that hardly anybody does it.
They scale fine.
-
Re:Vector Processing?
This isn't an all-purpose petaflop computer (ie. can't be used for protein folding calculations, thermonuclear explosion simulations, weather and climate prediction, etc...).
The first real petaflop computer will be built by Cray and up and running in about 2 years.
-
Re:Its inevitable
There will come day where we expect our compilers to encode parallel information into the code so it will run faster on our 1024 core machines.
Too late. That day has been around for 20 years already.
-
Re:Ease of Programming?
how easy is it to implement an optimizing development system that eliminates the need to hand-optimize the code?
Not much payoff optimizing development systems for slow hardware. Cray tout the X1E as offering "Unrivalled Vector Processing and Scalability for Extreme Performance". These guys smoked one for dinner, woke up the next day, rebuilt their code from the ground up a completely different way and smoked it again for lunch. It took them a month to figure out how to do that, on maybe $3K worth of hardware. Think anybody wants to teach a compiler how to get close? TFP:
Having become experienced Cell programmers, the single precision time skewed stencil -- although virtually a complete rewrite from the double precision single step version -- required only a single day to code, debug, benchmark, and attain spectacular results of over 65 Gflop/s. This implementation consists of about 450 lines, due once again to unrolling and the heavy use of intrinsics.
I'm just a fanboi in this territory, but last I looked the guys who don't quite need to do that just use pre-tuned libraries to get a nice chunk of what's possible. Who really cares how hard it is to tune those, once?
And when they were just doodling, not thinking hard?
These results are conservative given the naive 1D FFT implementation we used on Cell whereas the other systems in the comparison used highly tuned FFTW or vendor-tuned FFT implementations [...] Cell performance is nearly at parity with the X1E in double precision.
They say DP arithmetic is apparently in there as an afterthought -- it's not really necessary for game-quality 3D, after all -- and they think they know how to tweak the pipeline for better than double the throughput.
--
IABCOT!
-
Re:Press Release
Except Cray was sold to Tera Computer Company in March 2000.