Domain: hpcwire.com
Stories and comments across the archive that link to hpcwire.com.
Comments · 48
-
Intel already tried this
Intel was trying to push this when it was clear they weren't making headway in the GPU space and also to push a heavier reliance on CPUs over GPUs (or at least in conjunction with) but it never seemed to gain any traction and was just relegated to tech demos.
https://www.geek.com/games/int...
https://www.hpcwire.com/2010/0...
I guess we'll see how Nvidia does.
-
Re:Likely a new gift for the NSA
You can say "one of" but you can't say "the fastest" petascale machines my friend
http://www.hpcwire.com/2012/11...
I should have added "on a college campus".
My main point is, just throwing more cores at "mostly MPI" weather models is not sustainable. We are going to need to be much smarter about how we parallelize.
-
Re:Cheap or High Performance, PickOne
OK, maybe 100nm was a bit too much, yet I don't see the 28nm coming. Just to give you a comparison: Samsung is manufacturing the brand new Galaxy S3 SOC in 40nm. Why don't they use 28nm? Don't they want it? Hell yes, but it's not that easy. Think about that.
The power argument and the architecture's openness are sensible, I don't argue against that. Yet, the performance per Watt seems grossly inflated. If you look at today's most power efficient HPC chip, the CPU of IBM's Blue Gene/Q, then you'll see that they achieve less than 4 GFLOPS/Watt. Adapteva claims more than 9. So they're twice as good as IBM? Really?
-
there's lots of interest for this
I work in the HPC world, and there's plenty of interest for these skills outside of defense and financial. I agree that a lot of it is gov't funded, but locations like NCSA, which are in the process of finalizing acceptance of a huge system from Cray that has GPUs, some of the DOE national labs, or even NASA, are always looking for people like this and have mostly pure scientific agendas. Also, NVidia has been posting a lot lately looking for applications folks.
There's plenty of interesting work, it just takes a little to find it. Check out HPC specific job boards, like the HPCWire Job Bank, for example, or check out the jobs pages of places like NCSA, the DOE labs (LBL, LLNL, etc), NASA, or companies like Cray, NVidia, even Intel, since MIC is coming soon, and will likely be similar to a GPU in how it's programmed.
-
Re:True
Trust no one. Encrypt everything.
Fact: The NSA has shown itself, historically, to have knowledge and technology 20-30 years ahead of that available to the rest of the human race (case in point, their suggestion to modify DES' S-boxes to strengthen it against differential cryptanalysis, an attack using math no one had even heard of until almost 30 years later).
Fact: A general purpose quantum computer makes all commonly used encryption algorithms worthless (though a number of quantum-resistant algorithms have started to appear, such as http://tbuktu.github.com/ntru/).
Fact: Countless research programs have demonstrated the viability of quantum computing, and you can even buy a 128 bit quantum computer today.
Likely: The NSA already has the quantum equivalent of a Beowulf cluster of these, if not something much, much better.
Conclusion: Using encryption trusts the single least-trustable entity on the planet not to already have the ability to turn it into Swiss cheese. That said, the NSA doesn't give two damns about the "little" things like copyright infringement, kiddy porn, or terrorist plots, so most of us have no reason to care about this. The local doughnut-eaters still have no ability to read your encrypted emails.
-
D-Wave sold a commercial Quantum computer in 2010
Err, uh,
Didn't D-Wave sell a commercial Quantum computer to Lockheed Martin in 2010? Almost a year to the day?
Someone explain to me the difference between this quantum computer and the one they're trying to prove doesn't exist, please.
-
Re:synthesis
See this. I became a little suspicious after reading about the dependency between the value of the invention and the exclusivity of the license or sale.
-
Re:This article is almost painfully dumbed down...
The summary is pulled directly from the top of the article.
Here's the article from HPC Wire and some details from nvidia as well as the nvidia press release
-
A better article
http://hpcwire.com/hpcwire/2011-12-15/bgi_speeds_genome_analysis_with_gpus.html
Excerpt:
At BGI, he says, they are currently able to sequence 6 trillion base pairs per day and have a stored database totaling 20 PB.
The data deluge problem stems from an imbalance between the DNA sequencing technology and computer technology. According to Dr. Wang, using second-generation sequencing machines, genomes can now be mapped 50,000 times faster than just a decade ago. The technology is on track to increase approximately 10-fold every 18 months. That is 5 times the rate of Moore's Law, and therein lies the problem.
Obviously it would be impractical to upgrade one's computational infrastructure at that rate, so BGI has turned to NVIDIA GPUs to accelerate the analytics end of the workflow. The architecture of the GPU is particularly suitable for DNA data crunching, thanks to its many simple cores and its high memory bandwidth.
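The "5 times the rate of Moore's Law" figure in the excerpt seems to compare growth factors over the same 18-month window (10x versus 2x). Here is a rough sketch of how quickly that gap compounds. It assumes Moore's Law is read as a doubling every 18 months, which the excerpt does not actually state, and the function names are made up for illustration.

```python
# Growth factors per 18-month period. The 10x figure is from the excerpt;
# the 2x doubling is an assumed reading of Moore's Law.
SEQUENCING_FACTOR = 10.0
MOORE_FACTOR = 2.0
PERIOD_MONTHS = 18


def growth(factor: float, months: float) -> float:
    """Cumulative growth after `months`, compounding once per 18-month period."""
    return factor ** (months / PERIOD_MONTHS)


def gap(months: float) -> float:
    """How far sequencing throughput outruns compute after `months`."""
    return growth(SEQUENCING_FACTOR, months) / growth(MOORE_FACTOR, months)


if __name__ == "__main__":
    for years in (1.5, 3, 6):
        print(f"after {years} years the gap is about {gap(years * 12):.0f}x")
```

Under these assumptions the gap is 5x after one period and 25x after two, which is the sense in which hardware upgrades alone cannot keep pace.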
-
Re:Completely different contract/machine/goals
This article explains that five years ago when NCSA made the bid, accelerators were very exotic technology. The move toward GPUs was actually at the behest of scientists who now see a way forward to speed up their codes with accelerators. Technology shifts and we adapt.
If they are so willing to adapt, why weren't they willing to accommodate IBM's change requests? It's not like IBM was totally unwilling to build a $200 million machine.
None? I know of several. It's all still in its infancy of course, but I'm convinced it's possible to get good speedup from GPUs on real science codes. It's not applicable to everything, but then that's why they aren't CPUs.
I was referring to annotations for GPU offloading. Codes that run on GPUs are in fact so common nowadays that you'll be asked at conferences why you didn't try CUDA if you present any performance measurement sans GPU benchmarks.
:-)
-
Re:Completely different contract/machine/goals
That's similar to what PGI is doing.
Yes. In fact they've been working together on it.
It's not that simple.
You are absolutely right. That's why I wrote, "not a small task."
You seldom achieve competitive performance with this annotation type parallelization, simply because the codes were written with different architectures in mind.
The nice thing about this is the restructuring one does for GPUs generally also translates into better CPU performance on the same code. So one can enhance the code in a performance-portable way. That isn't possible to do without compiler directives. With directives, one can avoid littering the code with GPU-specific stuff.
They didn't go into that avenue because they knew that their codes wouldn't scale to the number of cores well.
This article explains that five years ago when NCSA made the bid, accelerators were very exotic technology. The move toward GPUs was actually at the behest of scientists who now see a way forward to speed up their codes with accelerators. Technology shifts and we adapt.
None of the supercomputer codes I know uses such a type of parallelization or accelerator offloading.
None? I know of several. It's all still in its infancy of course, but I'm convinced it's possible to get good speedup from GPUs on real science codes. It's not applicable to everything, but then that's why they aren't CPUs.
-
Re:2012 will be a big year for supercomputers.
-
Re:NNSA and IBM Blue Gene
Almost all the rest of the "supercomputers" out there like Cray are basically just PC clusters.
Cray's XT/XE line uses x86 processors, but everything else about them is almost completely custom, both hardware and software. For people who are looking for peak application performance at this kind of scale, the processor turns out to be one of the least important components.
Indeed, it might have been the DARPA-sponsored fully custom network for Blue Waters that sank IBM. They made a business commitment to only pursue HPC projects that turned a profit (not just revenue) last year, and this appears to be the first major casualty of that decision.
-
For a good article on this
-
NASA is using GPUs too
NASA is also using GPUs -- it looks like they are for climate / atmospheric modeling.
-
Re:has HPC really done anything useful?
A few years ago either GM or Ford installed a supercomputer to help with engine casts. They cast an engine using a complete piece of molten metal. There are a lot of parameters involved including the actual mass of the molten metal, metallurgical properties of the piece (carbon, molybdenum, silicon, chromium...) some of which were in trace amounts but ended up being important, then the precise temperature of the material, etc. The problem was that a piece that was cast into an engine block had the potential to crack while being machined, and they wanted to avoid putting money into a piece of metal that they suspected would crack. Only 1 in 10 crack, but it could cost a lot, and if they can weed out 5 of 6, they can save a lot of money. The project to put the supercomputer in cost over $5 million. It had a cost recovery of less than 6 weeks. Then too, there is another example of an ad-hoc supercomputer used to determine the gene sequence of the SARS virus that was rampant a few years ago. It saved lives. There was the computational fluid dynamics model that George W. Bush ignored that showed New Orleans would flood in the event of a category 5 hurricane. It was shown to him about 6 months before Katrina. He couldn't afford Army engineers reinforcing dikes, he had a war to fight. There are countless gene folding simulations that show how disease spreads, and I remember a 17 year old used a supercomputer to solve a piece of the Cystic Fibrosis puzzle. There is also the computational fluid dynamics used to design jet engines, BMW employing one to make their cars more aerodynamic... the list goes on. I know they use them to determine the cheapest path when making new drugs (you can pick fastest, cheapest, and fewest number of steps). How many examples did you want? Oil exploration? Pharmaceuticals? Automobile design/production? Solar cells and semiconductors and superconductors? Disease?
-
Intel and AMD really need to re-think.
"It still needs another decade before China-made chips meet the needs of the domestic market. Hopefully after two decades, we will be able to sell our China-made CPUs to the US just like we are selling clothes and shoes."
All of these companies that work inside of China are only slitting their throats.
-
Re:3 of Top5 Supercomputers already use NVIDIA GPU
Double-precision linpack performance increase over a CPU-only system is ~288GFLOPS per Tesla M2050 card (up to 4 cards per system - adding more doesn't help without going to exotic motherboards. See news report of IBM study). Raw performance is 515GFLOPS/card (double precision), so you're looking at ~56% utilization. Others report 53% overall on a massively parallel setup ( See: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5470353 )
A rough rule of thumb for linpack double-precision is 25% of the theoretical single-precision performance. The gain is 4-8x over a standard processor depending on whether the metric is peak performance, performance/$ or performance/W. If you can live with 1.5GB non-ECC (GTX480), then it's about 8-32x gain depending on what you're comparing it to. Some applications will see a smaller gain, some larger. If it's matrix math under the hood, then linpack should be in the ballpark.
See gpgpu.org for papers and news on different applications.
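The utilization figure and the 25% rule of thumb above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the comment; the single-precision peak of 1030 GFLOPS for the M2050 is assumed here as twice the quoted double-precision peak, and the function names are illustrative.

```python
# Figures quoted in the comment above for the Tesla M2050.
PEAK_DP_GFLOPS = 515.0        # peak double precision per card
MEASURED_GAIN_GFLOPS = 288.0  # Linpack gain per card over a CPU-only system

# Assumption: single-precision peak is twice the double-precision peak.
PEAK_SP_GFLOPS = 2 * PEAK_DP_GFLOPS


def utilization(measured: float, peak: float) -> float:
    """Fraction of the theoretical peak that the benchmark actually reached."""
    return measured / peak


def rule_of_thumb(peak_sp: float, fraction: float = 0.25) -> float:
    """Estimated Linpack DP result as a fixed fraction of the SP peak."""
    return peak_sp * fraction


if __name__ == "__main__":
    print(f"utilization: {utilization(MEASURED_GAIN_GFLOPS, PEAK_DP_GFLOPS):.0%}")
    print(f"rule-of-thumb estimate: {rule_of_thumb(PEAK_SP_GFLOPS):.1f} GFLOPS")
```

The rule of thumb lands somewhat below the measured 288 GFLOPS per card, which fits its description as a rough estimate.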
-
Re:more nukes :/
Nvidia Fermi (GTX 400 series)
GTX 470: 1088.64GFLOPS (32-bit) (215W(mfg. claim) $350; 3e9 transistors; 1280MB GDDR5; 448 Unified Shaders:56 Texture mapping units:40 Render Output units);
GTX 480: 1344.96GFLOPS (32-bit) (250W(mfg. claim)-(500W tested max.) $500; 3e9 transistors; 1536MB GDDR5; 480 Unified Shaders:60 Texture mapping units:48 Render Output units).
Tesla M2050: 1030GFLOPS (32-bit), 515GFLOPS (64-bit), 3GB ECC (M2070 is same but 6GB ECC GDDR5)
IBM linpack test May 2009: $7K Xeon, 48GB: 80.1 GFLOPS, 11GFLOPS/K$
adding $4K of Tesla M2050s (2 cards): 656.1GFLOPS (8.2x performance), 60GFLOPS/K$ (5.45x performance/price), (4.5x GFLOPS/W)
So you're off by quite a bit.
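For anyone who wants to redo the price/performance arithmetic, here is a small sketch using the Linpack scores and prices quoted above. It assumes the accelerated system costs the $7K base plus the $4K of cards ($11K total), which the comment implies but does not state. On that basis the combined system comes to roughly 60 GFLOPS/K$; with unrounded inputs the gain is about 5.2x, and the rounded 60/11 gives the quoted 5.45x.

```python
# Figures quoted in the comment above; prices are in thousands of US dollars.
CPU_ONLY_GFLOPS = 80.1
CPU_ONLY_PRICE_KUSD = 7.0
WITH_GPUS_GFLOPS = 656.1
GPU_PRICE_KUSD = 4.0  # two Tesla M2050 cards


def gflops_per_kusd(gflops: float, price_kusd: float) -> float:
    """Linpack throughput per thousand dollars of hardware."""
    return gflops / price_kusd


def comparison() -> dict:
    """Compare the CPU-only system with the same system plus two GPUs."""
    base = gflops_per_kusd(CPU_ONLY_GFLOPS, CPU_ONLY_PRICE_KUSD)
    # Assumption: total price is the base system plus the cards.
    total_price = CPU_ONLY_PRICE_KUSD + GPU_PRICE_KUSD
    accel = gflops_per_kusd(WITH_GPUS_GFLOPS, total_price)
    return {
        "speedup": WITH_GPUS_GFLOPS / CPU_ONLY_GFLOPS,
        "cpu_gflops_per_kusd": base,
        "gpu_gflops_per_kusd": accel,
        "price_performance_gain": accel / base,
    }


if __name__ == "__main__":
    for name, value in comparison().items():
        print(f"{name}: {value:.2f}")
```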
-
Re:It almost makes me sad to have sold my ps3.
I'd say the only real mistake Sony made there was not continuing to make an OtherOS PS3 and simply sell it at $700 as a "research tool" and be done with it.
The cluster was built from purchases of the PS3 in wholesale lots.
2,000+ units for a single USAF project.
Sony had a brief swing at commercializing PS3 tech in its own HPC product. Sony Unveils Cell-Based Image Processing Appliance
It will be a chilly day in Hell before Sony allows another mass market consumer product to cannibalize its sales in other - more profitable - markets. It will be colder still before the OtherOS makes a return.
I figure it's just a matter of time before we see hacked PS3s all over Craigslist like we see the hacked x360s now.
I have yet to hear that the OtherOS is running on a PS3 Slim.
The Fat is aging and out of production. The Cell never quite lived up to its promise.
The truth of it is that the PS3 is a video game console - and support for the OtherOS is of symbolic importance only. Every vote for the MOVE, for 3D in PS3 gaming, is a vote from the player's wallet for the firmware upgrade.
-
Re:It's now clear where M$ is headed to!
Not a chance in hell...
*sigh*
Guess I’ll just have to be satisfied with the images from TFA:
Image before/after integrating images from mosaic
Screenshot viewing horsehead nebula
-
Not really news...
I remember reading that IBM was planning to put Cell in mainframes and other high-end servers several years ago, supposedly to accrue the same benefits. I don't really know whether or not that was ever followed through with, I haven't kept track of the story.
-
Re:Men...
not to mention that it, like almost all sociology, neurology, etc. are essentially hedge wizards pushing pseudo science.
i think true understanding of how our brains work will come from bottom-up (atomic-to-molecule-to-cellular-to-organ-to-organism) modeling and/or from advances in human computer interaction for the disabled -- computer-brain interfacing.
http://www.hpcwire.com/features/17882844.html
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.9017&rep=rep1&type=pdf
http://en.wikipedia.org/wiki/Brain%E2%80%93computer_interface
some of these studies used animals, sometimes unto their deaths, but at least there is a credible research direction rather than trying to support simpleton broad generalizations of "what people are like".
and all that is driving this tripe is big pharma looking to push more drugs that we don't properly understand -- or even can make credible claims of therapeutic value about.
uggh.
-
Yet Another Personal Supercomputer
Here's just a brief search for personal supercomputers of days gone (not too far) by. Most if not all are cheaper than the SGI. Being older they may not stack up spec-wise, and the definition will always be changing anyway. More than one claim to be 'first', and to SGI's credit they only claim it's 'their' first.
http://tech.slashdot.org/article.pl?sid=08/11/23/068234
http://www.researchchannel.org/prog/displayevent.aspx?fID=569&rID=4263
http://aslab.com/products/workstations/marquisk942.html
http://www.reghardware.co.uk/2006/06/07/tyan_unveils_typhoon/
http://www.hpcwire.com/features/Cray_Unveils_Personal_Supercomputer.html
-
Re:Yeah, but how much useful stuff is happening?
you're right, the Linpack benchmarks are indeed also measured in FLOPS, and that's what the TOP500 rankings are based on. however, when the Dawning is advertised as being capable of 180 TFLOPS or 160 TFLOPS--depending on who you ask--they're probably not referring to the Linpack benchmarks, on which it only peaks at ~11.264 TFLOPS.
-
GPUs
The first supercomputer using nvidia GPUs + CUDA API makes its debut at #29 http://www.top500.org/system/9853
More info here: http://www.hpcwire.com/topic/processors/Tokyo_Tech_Boosts_TSUBAME_Super_with_GPUs.html
-
Re:Not cell-based, cell-assisted
-
Re:Summary should have a shout out
The non-double precision floating point enhanced version's (the version in the PS3) strength is further limited to integer and single precision floating point workloads.
BTW, the original Cell can do Double Precision in hardware. The big limitation it had was the DP was not PIPELINED so all DP instructions caused huge stalls in processing. You can use DP on the PS3 just fine and it's still fairly fast (especially compared to software DP) -- it's just not nearly as fast as SP.
The Cell's double precision hardware attains a very respectable 25 Gigaflops per second (peak), but its single precision performance is a phenomenal 256 Gigaflops (also peak).
The main new features of the PowerXCell 8i Processor are that DP is now fully pipelined and can attain over 100 GFLOPS (about a 5X improvement in DP execution due to stall removal) and that the memory interface now supports industry standard DDR2 memory so 16 GB of RAM per Cell can be used. The memory limitation with XDR was just as bad as the DP in limiting more common use of the Cell since XDR is expensive and hard to come by.
-
Re:The one advantage they may enjoy..
Couldn't see details, but this may use Sun's hypertransport switch as an interconnect.
Sun doesn't make a Hypertransport switch and Ranger uses Infiniband just like other high-end x86 clusters.
An HPCWire article says it uses Sun's new 'Magnum' Infiniband switch.
The blade servers are connected using two Sun DDR InfiniBand "Magnum" switches, which were designed specifically for the highly scaled out cluster architecture of the Constellation line. The switch contains 3,456 ports, allowing the Sun system designers to collapse the hierarchy of switches and reduce the cabling normally required to connect thousands of nodes by a factor of six.
-
Re:Now We Know
There are a lot of HPC projects that were planning to use Barcelona, that were held back by the TLB bug.
An HPCWire article says Ranger uses the processors with the TLB bug, and reportedly it doesn't impact their application performance.
Developed by Sun Microsystems, Ranger was built from 3,936 Constellation blade servers, each containing four quad-core "Barcelona" Opterons running at 2.0 GHz. The 15,744 Barcelona processors in Ranger come from the batch that suffers from the highly publicized translation lookaside buffer (TLB) problem. TACC has used the recommended Linux kernel memory patch to work around that particular problem. Reportedly, the patch has little, if any, impact on performance.
Ranger was apparently one of the "specific customer deals" that continued to receive Barcelona chips after the erratum was discovered.
...spokesman Phil Hughes said AMD is shipping Barcelona Opterons now, but only for "specific customer deals." Industry sources have suggested to TR that those deals are high-volume situations involving supercomputing clusters. Such customers may run workloads less likely to be affected by any workarounds for the erratum that reduce L3 cache performance, and those customers could potentially consume hundreds of thousands of CPUs.
(Since the workaround BIOS patch slows some cache misses, maybe this means (a) HPC codes with large datasets are long running programs that can be and have been optimized to avoid idling CPU cycles while waiting for memory in cache misses by using techniques like cache prefetching hints or hyperthreading, and (b) the rest have small datasets so that with so many processors the dataset fits in all their caches?)
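The processor count in the HPCWire excerpt above is easy to sanity-check, and the same numbers give a rough theoretical peak. A minimal sketch: the blade, socket, core and clock figures come from the excerpt, while the 4 double-precision floating-point operations per core per cycle is an assumption about Barcelona that the article does not state.

```python
# Figures from the HPCWire excerpt.
BLADES = 3936
SOCKETS_PER_BLADE = 4
CORES_PER_SOCKET = 4   # quad-core Barcelona
CLOCK_GHZ = 2.0

# Assumption, not from the article: DP flops per core per cycle.
FLOPS_PER_CYCLE = 4


def processors() -> int:
    """Total CPU sockets in the machine."""
    return BLADES * SOCKETS_PER_BLADE


def cores() -> int:
    """Total cores across all sockets."""
    return processors() * CORES_PER_SOCKET


def peak_tflops() -> float:
    """Theoretical peak under the assumed flops-per-cycle figure."""
    return cores() * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0


if __name__ == "__main__":
    print(processors(), "processors,", cores(), "cores")
    print(f"assumed theoretical peak: {peak_tflops():.1f} TFLOPS")
```

The socket count reproduces the 15,744 processors quoted in the article; the peak figure is only as good as the flops-per-cycle assumption.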
-
SGI bought LNXI for the software
As the HPCwire article noted, SGI was most likely interested in LNXI's software, particularly Clusterworx.
I know personally several Minnesota-based LNXI engineers who accepted offers at SGI and are now working out of the Eagan office.
-
Re:Belly Up?
They have made offers to LNXI staff according to this article: http://www.hpcwire.com/hpc/2135016.html
-
Re:Consumer level
Wonder where the professional level processors are.
You mean these..
http://www.hpcwire.com/hpc/1886368.html
-
Re:GigaFlops
I still wouldn't call the Cell's double precision performance slow.
-
There are many misunderstandings about parallelism
There are many misunderstandings about parallelism, as is evident from this discussion. Some posters said that "my O/S already runs in multicore/multicpu systems, thank you very much"...some others said that some tasks, like word processing, are non-parallelizable etc.
The misunderstandings lie not in what parallelism is, but what Intel and other companies are trying to achieve. The quest for parallelism is not about running lots of programs in parallel, but extracting the parallelism out of sequentially written programs, including kernels. In order to improve performance, any parallelism that lies in our applications but is not exploited right now must be utilized, if we want to see real improvements in the future.
Modern CPUs do many tricks in order to increase parallelism, like pipelining and out-of-order execution. Current research at Intel has produced an 80-core CPU. It is highly unlikely that any of us will run 80 programs simultaneously tomorrow, so either this research is for servers only (highly unlikely because Intel will need to sell a lot of these chips to cover research expenses), or Intel knows that programs can be parallelized even more.
Compilers can not identify all possible parallel paths of execution, because it is an intractable problem. Writing complicated multithreaded programs using threads, semaphores and mutexes is quite difficult. So what is needed is another software architecture that allows programs to be easily parallelized.
Ericsson dealt with these problems a long time ago, and the result was Erlang. Based on the Actor model, Erlang programs can contain thousands of objects working in parallel, and the programmer need not worry about how to lock/unlock resources.
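To make the Actor idea concrete, here is a minimal sketch in Python rather than Erlang. The `CounterActor` class and `demo` function are hypothetical names invented for this illustration. Each actor owns its state and a mailbox, and only the actor's own thread ever touches that state, so the calling code contains no explicit locks. The mailbox itself is a thread-safe queue from the standard library, and Python threads are far heavier than Erlang processes, so this shows the programming model only, not Erlang's performance characteristics.

```python
import queue
import threading


class CounterActor:
    """A tiny actor: private state, a mailbox, and one thread that drains it."""

    _STOP = object()  # sentinel message that tells the actor to shut down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, amount):
        """Asynchronously ask the actor to add `amount` to its counter."""
        self._mailbox.put(amount)

    def stop(self):
        """Let the actor finish queued messages, then return the final count."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return self._count

    def _run(self):
        # Messages are handled one at a time, so no lock guards _count.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self._count += message


def demo(senders=4, messages_each=1000):
    """Several threads send increments to one actor; returns the final total."""
    actor = CounterActor()

    def worker():
        for _ in range(messages_each):
            actor.send(1)

    workers = [threading.Thread(target=worker) for _ in range(senders)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return actor.stop()


if __name__ == "__main__":
    print(demo())
```

Because every update arrives as a message and is applied by a single thread, concurrent senders cannot corrupt the counter, which is the property the comment attributes to Erlang.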
Microsoft's only option is to ditch C and use another language for their O/S. C can not work with the Actor model, unless modified. The best option for them is to make a C-like low-level language that includes the Actor model as part of the language specification. They can combine Cyclone, a version of C which is safe, with the best parts of Ada and Erlang, and come up with a language which allows the easiest possible path to writing parallelizable programs. And there is a big opportunity for them to add bounds checking and garbage collection to all their code, so that two basic problems (buffer overflows and wild pointers) are solved at last.
-
Re:One trillion calculations per second by 2012
DARPA is the primary sponsor...
Check out this writeup at HPC wire. A major design goal of the TRIPS architecture is to support "polymorphism," that is, the capability to provide high-performance execution for many different application domains. Polymorphism is one of the main capabilities sought by DARPA, TRIPS' principal sponsor. The objective is to enable a single processor to perform as if it were a heterogeneous set of special-purpose processors. The advantages of this approach, in terms of scalability and simplicity of design, are obvious.
To implement polymorphism, the TRIPS architecture employs three levels of concurrency: instruction-level, thread-level and data-level parallelism (ILP, TLP, and DLP, respectively). At run-time, the grid of execution nodes can be dynamically reconfigured so that the hardware can obtain the best performance based on the type of concurrency inherent to the application. In this way, the TRIPS architecture can adapt to a broad range of application types, including desktop, signal processing, graphics, server, scientific and embedded.
-
Re:At Least.
I think the Cell processor overrated, and certainly not "the future" any more than transmeta's Crusoe was "the future".
How do your opinions rank against:
IBM
Department of Energy,
Medical device OEMs,
university researchers
and so on...opinion of the Cell processor's potential?
While I'll grant you the frame buffer access on the PS3 sucks, it would only take a driver from Nvidia or Sony to remove that restriction.
-
See also: DARPA HPCS Project
It's worth noting that Sun's Fortress project was not selected for Phase III of DARPA's HPCS project. (And for good measure, a link to a blog at Sun and an FAQ on how Fortress relates to the other HPCS languages/projects.)
-
Re:Question...
Personally, I think Japan is building a gigantic supercomputer out of PS3s ;)
That's precisely what this YDL distribution is aimed at. (I submitted this story here multiple times back when it happened, figuring that eventually it would take priority over the day's Jack Thompson story, but no dice.)
-
Re:Use of GPUs or PS3 chips?
Yes, that's in progress.
-
Using IBM's Cell Processors in scientific HPC
I am surprised that no one has mentioned the Slashdot article on the study at LBL on use of cell processors in High Performance Computing:
The Potential of Science With the Cell Processor
http://science.slashdot.org/article.pl?sid=06/05/28/047223
It references a second article:
Researchers Analyze HPC Potential of Cell Processor
http://www.hpcwire.com/hpc/671376.html
This discusses research at Lawrence Berkeley National Laboratory on using the STI Cell processor for scientific computing. From the article quoting the LBL paper:
"Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency,"
and
"We also conclude that Cell's heterogeneous multi-core implementation is inherently better suited to the HPC environment than homogeneous commodity multi-core processors."
The paper went on to say that while the Cell processor was designed for single-precision 32-bit floating-point performance, with some simple changes to the design it could be optimized for double-precision 64-bit floating-point performance.
This makes a lot of sense if this is the same Cell processor that IBM is using in their blade servers.
Really cheap, really fast 9 core processors!
An interesting read.
RLH
-
HPCWire Interview
http://www.hpcwire.com/hpc/699401.html
There's some additional info about BlueGene and what Livermore thinks of it here. What this interview neglects to mention is the millions of dollars being spent on IBM and internal developers to get this code (and any others) working on BlueGene. I was briefed by the hardware and software teams that built BlueGene and I can tell you, it's no easy task to bring apps to that platform. Kuznezov seems to trivialize it in the interview and I'm gonna have to go back and review the process again. Maybe it has changed since my briefing in early 2004, but somehow I doubt it.
-
Some informed views on WCC
See Scalability.org and the referenced HPCWire. The people writing these have a clue, and are in this market. Ask them what they think.
-
Re:Not able to RTFA, but my perspective...
FYI: "Myricom and Fujitsu Demo Wire-Speed 10 Gigabit Performance"
http://www.hpcwire.com/hpc/633629.html
450 nsec latency
-
"Sensitive but Unclassified" due to J. Poindexter
I haven't seen it mentioned, but this is a Reagan era classification created by Former Admiral John Poindexter (of Iran-Contra scandal fame). Poindexter was hired back into the government by the current administration in February of this year as the new head of the Information Awareness Office. It's no surprise that this label is being misused again.
Good information about this at Dubya Report, Citizen Times and DS Star