gentryx · Slashdot Mirror

Re:That's it? on Researcher Shows How GPUs Make Terrific Network Monitors · 2013-11-22 04:46 · Score: 1

However this is not something I totally fault the author for not using since it is a rather obscure programming technique for GPU's at this time.

Good point. I guess this will change once Kepler GPUs are widely adopted and CUDA 6.0 is published: with Kepler you can spawn kernels from within the GPU, and unified memory will make it easier to push complex data structures into the GPU (according to the poster there appears to be some preprocessing happening on the CPU).

Exactly! on Indonesian Erruption Forces Evacuation of 1300 · 2013-11-04 03:04 · Score: 1

In Dortmund this volcano would be so much more power efficient! It should definitely be moved. Plus, winter is coming, so the locals would be able to save a bit on the heating.

That volcano has a bad efficiency... on Indonesian Erruption Forces Evacuation of 1300 · 2013-11-03 22:29 · Score: 3, Interesting

Slightly off-topic, but this reminded me of how yesterday 20,000 people were evacuated in Dortmund (one of Germany's larger cities). And it didn't even need a full-fledged volcano to prompt this: a mere 4000-pound, ~70-year-old aerial mine was enough. Stuff like this is (still) daily business in Germany, though. They are still far from having cleared up all duds.

Re:Nope, the 320 GFLOPS is per node on Scientists Using Supercomputers To Puzzle Out Dinosaur Movement · 2013-11-01 10:41 · Score: 1

Oh, a joke... Sorry, didn't get it. Some people interpret GFLOPS as Giga Floating Point OPerationS. Which is IMHO sic(k).

Just came here to post exactly this. on Scientists Using Supercomputers To Puzzle Out Dinosaur Movement · 2013-11-01 10:20 · Score: 1

I imagine they tried to let the machine figure out how to press the gazillion buttons to make the dinosaur go by testing thousands of combinations.

Nope, the 320 GFLOPS is per node on Scientists Using Supercomputers To Puzzle Out Dinosaur Movement · 2013-11-01 10:18 · Score: 1

...and the cluster consists of 332 nodes. So according to the lab's homepage the whole cluster is able to deliver 110 TFLOPS (Tera Floating-point operations per second). You'd need to buy a couple of GPUs to equal that.

I don't understand what you mean by acceleration unit. Each node delivers that performance instantly. There is no change over time.

Wat? on GCC 4.9 To See Significant Upgrades In 2014 · 2013-10-26 23:03 · Score: 1, Troll

I'm not sure whether I understood your post correctly as it seems to garbled be yes? If you doubt that RMS is objecting to plugins in GCC then you're apparently new to /. and GCC.

BTW: not just Apple is pushing CLANG (and thereby LLVM), other companies include NVIDIA (CUDA uses LLVM) and IBM (CLANG was ported to Blue Gene/Q), just to name a few.

Biggest boon to GCC: lack of hackability on GCC 4.9 To See Significant Upgrades In 2014 · 2013-10-26 20:55 · Score: 0, Flamebait

...which is exactly why some folks are flocking to CLANG. Sure, not everyone wants to extend/modify his compiler, but actively preventing people from reusing your code isn't exactly what you should do if you want to keep a community thriving.

Great read: The Pragmatic Programmer on What Are the Genuinely Useful Ideas In Programming? · 2013-10-07 16:34 · Score: 3, Interesting

Exactly. Code Complete is a great book. I liked The Pragmatic Programmer -- From Journeyman to Master even better. It's slightly more meta, but the tips inside are really universal.

Some are even applicable beyond software engineering, e.g. "don't repeat yourself": don't store two versions of the same information (e.g. your source in your repository and its documentation on your website) in two different places, because the probability that the two will diverge over time equals 1. It's better to make one the master copy and derive the other from it. I recommend this book to all my students.

Re:Very limited indeed on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-22 21:15 · Score: 1

There is more to it though than just parallelization and vectorization. Are you familiar with cache blocking? If not, here is a great paper on the subject. This is something the compiler won't do for you as it transforms the algorithm. Our library can do this (gives you approx. 2x speedup). The library can do this because it knows more about the problem domain compared to a (generic Fortran) compiler.
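To make the term concrete, here is a minimal sketch of spatial cache blocking (loop tiling) for a 2D 5-point averaging sweep in plain C++. It is a generic illustration and not LibGeoDecomp code: Grid2D, updateTile, sweepNaive, sweepBlocked and the tile parameter are names made up for this example. It also covers only the simplest variant; blocking across time steps, which is where the larger gains usually come from, needs considerably more bookkeeping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D grid in row-major order (x is the fastest index).
struct Grid2D {
    std::size_t nx, ny;
    std::vector<double> data;

    Grid2D(std::size_t nx_, std::size_t ny_, double init = 0.0)
        : nx(nx_), ny(ny_), data(nx_ * ny_, init) {}

    double& at(std::size_t x, std::size_t y) { return data[y * nx + x]; }
    double at(std::size_t x, std::size_t y) const { return data[y * nx + x]; }
};

// Applies the 5-point average to all cells in [x0, x1) x [y0, y1).
// The caller guarantees that the range contains interior cells only.
inline void updateTile(const Grid2D& cur, Grid2D& next,
                       std::size_t x0, std::size_t x1,
                       std::size_t y0, std::size_t y1) {
    for (std::size_t y = y0; y < y1; ++y) {
        for (std::size_t x = x0; x < x1; ++x) {
            next.at(x, y) = (cur.at(x, y - 1) + cur.at(x - 1, y) + cur.at(x, y) +
                             cur.at(x + 1, y) + cur.at(x, y + 1)) * 0.2;
        }
    }
}

// Reference sweep: one pass over the whole interior, row by row.
// Boundary cells of `next` are left untouched.
void sweepNaive(const Grid2D& cur, Grid2D& next) {
    assert(cur.nx == next.nx && cur.ny == next.ny);
    if (cur.nx < 3 || cur.ny < 3) return;  // no interior cells
    updateTile(cur, next, 1, cur.nx - 1, 1, cur.ny - 1);
}

// Blocked sweep: the interior is traversed in tile x tile blocks so that the
// rows a block touches can stay in cache while the block is being processed.
void sweepBlocked(const Grid2D& cur, Grid2D& next, std::size_t tile) {
    assert(tile > 0);
    assert(cur.nx == next.nx && cur.ny == next.ny);
    if (cur.nx < 3 || cur.ny < 3) return;  // no interior cells
    for (std::size_t by = 1; by < cur.ny - 1; by += tile) {
        for (std::size_t bx = 1; bx < cur.nx - 1; bx += tile) {
            updateTile(cur, next,
                       bx, std::min(bx + tile, cur.nx - 1),
                       by, std::min(by + tile, cur.ny - 1));
        }
    }
}
```

Both sweeps compute the same values per cell; only the traversal order differs. A good tile size depends on the cache sizes of the target machine and would have to be tuned.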

Re:No, I'm taking MY word for it. :-) on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-22 21:11 · Score: 1

Co-array Fortran is more generic than LibGeoDecomp, so there are problems you can solve with Coarrays where our library would be of little use. But then again there are algorithms which would require a lot of work with Coarrays alone, but are a breeze with LibGeoDecomp. In short: the two solve different problems, although there is some overlap.

Re: QED on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-22 18:17 · Score: 1

I write a significant amount of stencil code, and I don't see myself using LibGeoDecomp; it seems to be both less efficient and more cumbersome than other solutions.

Well then, what about a challenge? Let's compare code size and performance for a simple example code. You'll use Fortran, I'll use my library.

I suggest a Jacobi-style smoother (v_{t+1}(x, y, z) = (v_{t}(x, y, z-1) + v_{t}(x, y-1, z) + v_{t}(x-1, y, z) + v_{t}(x, y, z) + v_{t}(x+1, y, z) + v_{t}(x, y+1, z) + v_{t}(x, y, z+1)) * (1.0/7.0)) as the benchmark.
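For reference, a minimal sequential sketch of one sweep of that smoother in plain C++. This is neither LibGeoDecomp code nor a tuned benchmark kernel: Grid3D and jacobiStep are names made up for illustration, and carrying the boundary cells over unchanged is an assumption, since the formula does not specify boundary handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense 3D grid stored in a flat vector (x is the fastest index).
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;

    Grid3D(std::size_t nx_, std::size_t ny_, std::size_t nz_, double init = 0.0)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, init) {}

    double& at(std::size_t x, std::size_t y, std::size_t z) {
        return data[(z * ny + y) * nx + x];
    }
    double at(std::size_t x, std::size_t y, std::size_t z) const {
        return data[(z * ny + y) * nx + x];
    }
};

// One time step t -> t+1: every interior cell becomes the average of itself
// and its six face neighbors. Boundary cells are copied over unchanged
// (an assumption made for this sketch).
void jacobiStep(const Grid3D& cur, Grid3D& next) {
    assert(cur.nx == next.nx && cur.ny == next.ny && cur.nz == next.nz);
    next.data = cur.data;
    for (std::size_t z = 1; z + 1 < cur.nz; ++z) {
        for (std::size_t y = 1; y + 1 < cur.ny; ++y) {
            for (std::size_t x = 1; x + 1 < cur.nx; ++x) {
                next.at(x, y, z) =
                    (cur.at(x, y, z - 1) + cur.at(x, y - 1, z) + cur.at(x - 1, y, z) +
                     cur.at(x, y, z) +
                     cur.at(x + 1, y, z) + cur.at(x, y + 1, z) + cur.at(x, y, z + 1)) *
                    (1.0 / 7.0);
            }
        }
    }
}
```

A full run would call jacobiStep repeatedly and swap the two grids after each step, so the old values stay intact while the new ones are written.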

You seem to be an expert on the subject, so I assume it won't be much of an effort or that you'll even have a solution ready at hand.

Re:QED on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-22 08:22 · Score: 1

Sorry, I got overexcited and saw something in your post that apparently wasn't there.

And yet I don't buy into this "OMG, C++ is either clumsy or slow compared to Fortran" FUD (I hope I'm paraphrasing it correctly this time). For a certain (perhaps smallish) domain, LibGeoDecomp is a library that makes it easy to write short, yet (nearly) optimal code with C++.

I don't doubt though that there are use cases where it's hard to come up with a good C++ solution while Fortran would outperform it in both speed and simplicity.

HPC is just a niche market, too on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 22:56 · Score: 1

You're right: the current compute architectures we see in HPC are geared toward data parallel problems of massive size. Clock speeds are stagnating, sometimes even stepping down (e.g. NVIDIA Kepler has its cores actually clocked slower than Fermi with its hot clock for the shaders). Your description sounds like you'd benefit from a single core which is tuned for single thread performance (e.g. with really big caches, a large out-of-order execution window) and runs at 5-10 GHz (which might require liquid nitrogen cooling).

But then again this is another niche, probably even smaller than the current HPC market, so it might not be commercially viable to develop products for it.

Agreed. on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 21:24 · Score: 1

If your code is already parallelized, LibGeoDecomp might not have an awful lot to offer you. The blog post was by no means directed against Fortran as a language. Instead it advocates a way for folks to bring their existing, sequential Fortran codes to supercomputers without having to spend months doing the parallelization manually.

QED on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 21:15 · Score: 1

So you said Fortran codes were faster than C++ codes and now that's not the point any longer as they really aren't? Great, thanks!

The links you provided show that Fortran has some convenience functions for selecting parts of arrays and applying arithmetic to them. What I didn't see is anything you can't do with Boost Multi-Array and Boost SIMD.

No, I'm taking MY word for it. :-) on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 20:56 · Score: 1

Sorry, I should probably have added a disclaimer that I'm involved in the development of the library as my signature apparently doesn't make it obvious enough: I'm the project lead.

So far we've built about a dozen applications with LibGeoDecomp, including porting a dozen large scientific codes to it. You're right that porting a code usually involves debugging. But that's inevitable when parallelizing a previously sequential code anyway. We don't claim to do magic, we just have some cool tricks up our sleeves. And that's a Good Thing(tm). Because those who claim to cast magic usually just spread b/s, while clever tricks can save you weeks (months even) of work. Here is what you don't have to do if you use LibGeoDecomp:

You don't have to write a proven (and correct) parallelization that scales to 1850000 (that's 1.8M) MPI processes.
You don't have to devise your own domain decomposition and load balancing scheme.
You don't have to write scalable parallel IO and application-level checkpoint/restart code.
...and so on and so on. A more complete list is here.

As said, parallelizing a sequential code will almost always involve some sort of debugging, no matter which tool you use. But the library also brings a couple of facilities to ease that transition:
1. You can first adopt the SerialSimulator, which performs no parallelization at all but allows you to check the data transfer and callbacks.
2. You can then transition to those parallelizations that run on a single node only (e.g. the CacheBlockingSimulator or the CudaSimulator) to check that there are no race conditions.
3. After that you move to large-scale systems using e.g. the HiParSimulator (used for full system runs on JUQUEEN, an IBM BG/Q and ATM the fastest European machine) or the HpxSimulator (used for runs on TACC's Intel Xeon Phi equipped Stampede; BTW: it's built on HPX, a parallel runtime for C++).
4. Finally, you can piggy-back the TestCell onto your model, which will use checksums to validate the data the library gives back to your code.

FUD on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 08:55 · Score: 1

Care to back up those claims with actual code/numbers? I'm just asking because my FUD alarm just rang. Part of my job is performance engineering. My experience is that if you use C++ correctly, you get code which at least matches Fortran code.

Did you read TFA? on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 07:50 · Score: 1

Just asking because otherwise you'd have a better idea of how intrusive (or not) this restructuring is. To give some numbers: a while ago we ported a simulation (video here) to the library. The simulation model was about 5000 lines of code. Not much, but the code was highly condensed and had been carefully modeled over the course of 3 years. We ended up having to change less than 100 lines to make it work with LibGeoDecomp. That's a far cry from a rewrite.

Efficient software is more than good assembly on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 07:39 · Score: 1

Your argument seems to focus mainly on how well a compiler can optimize a given code. But writing efficient software takes more. Ever tried to implement AMR or 3D cache blocking in Fortran? It's a pain. Object orientation gives your programmers a huge boost in efficiency. And if they can use this efficiency to implement algorithms which converge faster, then this will ultimately make your code run faster. Even the last piece, the arithmetic kernel, can be done efficiently in C++ if you adopt modern libraries like Boost SIMD.

Re:Author here. on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 06:46 · Score: 1

Just to add another twist: even if you were a native English speaker, I would not be surprised if you misspelled Hertz, since it's a German name. And because Herz and Hertz are pronounced identically in German, it's a common misspelling in Germany, too. :-)

Re:Modern Fortran on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 06:39 · Score: 1

Yeah, if your Fortran code already scales on big iron, then LibGeoDecomp probably doesn't have much to offer for you. This article was rather meant as a primer for those who are working on older, sequential Fortran codes which are not yet parallelized, and who don't want to go through all the pains of building an MPI-enabled parallelization for them.

The trick is to avoid solving the bigger problems on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 06:35 · Score: 1

We're using Boost Multi-array as a multi-dimensional array, so that's not really a problem. And since we call back the original Fortran code, users are still free to use their original libraries (some restrictions apply -- not all of these libraries will be able to handle the scale of current supercomputers).

Regarding the speed issue: yeah, that's nonsense today. It all boils down to writing C++ in a way that the compiler can understand the code well enough to vectorize it.

Very limited indeed on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 06:03 · Score: 4, Informative

I took a look at TFA and followed up by reading the description of LibGeoDecomp:

If your application iteratively updates elements or cells depending only on cells within a fixed neighborhood radius, then LibGeoDecomp may be just the tool you've been looking for to cut down execution times from hours and days to minutes.

Gee, that seems like an extremely limited problem space, and doesn't measure up at all to the title of this Slashdot submission. It might really be a useful tool, but when I clicked to this article I expected to read about something much more general purpose, in terms of 'bringing Legacy Fortran to Supercomputers'.

Correct. We didn't try to come up with a solution for every (Fortran) program in the world. Because that would either take forever or the solution would suck in the end. Instead we tried to build something which is applicable to a certain class of applications which is important to us. So, what's in this class of iterative algorithms which can be limited to neighborhood access only?

cellular automata
stencil codes
lattice Boltzmann methods for computational fluid dynamics (technically a subclass of stencil codes)
particle-in-cell codes
short-range n-body simulations

It's interesting that almost(!) all computer simulation codes fall into one of the categories above. And supercomputers are chiefly used for simulations.

By the way, regarding the use of the word 'codes': I don't think English is the first language of this developer. Cut some slack.

Thanks :-) You're correct, I'm from Germany. I learned my English in zeh interwebs.

Author here. on A C++ Library That Brings Legacy Fortran Codes To Supercomputers · 2013-09-21 05:36 · Score: 3, Informative

The IEEE and Los Alamos National Laboratory seem to have a different opinion on this. And even the Oxford dictionary knows the use of codes. But surely those guys can't even spell gigahertz.

Comments · 237