gentryx · Slashdot Mirror

Re:Post it to Slashdot! on How To Turn Your Pile of Code Into an Open Source Project · 2013-09-20 12:44 · Score: 1

Yes, I'm aware of Chapel. This is a good example for the current state of generic auto-parallelization:

It would be the perfect thing to put into this FAQ... "here's the alternatives... ___ ... here's what we do better than them. here's another alternative ... ___ ... here's what we do better than them. here's an alternative ___ and it's very good, if you want to do this somewhat different thing, maybe see them, but that's not our goal here". When you do this, someone searches for one of these other things, they'll come across your project, and if it's really better, you might get other users to switch. And if you're really better than the others, you won't lose anyone. And then your website becomes a respected resource instead of some niche product.

That's a great suggestion. I've updated the FAQ to include this. Thanks!

Also, I second the comment below about Fortran, especially if you can provide a simple example -- If you're targeting scientists that don't want to learn new libraries or new languages, that would be critical.

I've compiled a quick example of how Fortran kernels can be used from within the library. It's not perfect but should get people started.

Users of my library are mostly scientists who want to simulate something big, without having to spend months learning

...one of these other libraries? (facepalm)

To be fair: learning any of these tools (LibGeoDecomp, Physis, whatever) greatly reduces the effort for parallelizing a simulation code, even IF you already know MPI and/or CUDA. Not to mention if you DON'T know either.

OpenMP and MPI and CUDA and so on

Providing CUDA functionality to existing Fortran code without learning CUDA would be a godsend to many people, and people who use GPUs in science are absolutely the most open to new ideas.

This appears to be really tricky. Currently there is no freely available CUDA-capable Fortran compiler, which is effectively a show stopper for us. PGI is probably the only vendor to offer one. We have a PGI license and could probably build some interface on top of Fortran CUDA, but then: who would use it? Very few, if any. I'd delay this until someone clamors for it.

And as best I understand it, there is no point in combining OpenMP and MPI anymore -- for years MPI has been sufficiently fast on shared memory systems to be indistinguishable in performance -- the advantage to OpenMP is just ease of use.

Right. There is one benefit though: if your code can exploit the shared CPU caches. There are such algorithms for stencil codes, e.g. in this paper.

And finally, one thing that seemed bizarre after looking at your website for a while... there is no list of people involved. One can guess from looking at the publication list and such, but there is no "Developers: John Public, Jane Smith" anywhere.

Good point. Since I'm so involved with the project, this is something I never looked for. I'll add it to the website, too. Thanks!

Excellent question! on How To Turn Your Pile of Code Into an Open Source Project · 2013-09-20 10:48 · Score: 1

I didn't know the answer myself so I investigated. We've never done this before, but yes: with a bit of glue it is possible. Please take a look here. It's just a quick functional prototype. Let me know if you need a boilerplate for a larger code.

Re:Post it to Slashdot! on How To Turn Your Pile of Code Into an Open Source Project · 2013-09-10 21:22 · Score: 3, Interesting

It looks decent, though I go to the FAQ, and I see "Please look here for a short review of how it relates to the competition.", and I go to that link, and there is no information about "the competition".

Ah, sorry. As the text evolved, that paragraph was buried. I changed the layout and link so that it is more visible.

And... "So far no one has come up with a language/compiler/library that could automatically parallelize any sequential code on any hardware."... have you seen Chapel? It is not perfect, and it looks like you have a nicer polish to some things, but is actually quite good for many things.

Yes, I'm aware of Chapel. This is a good example of the current state of generic auto-parallelization: it works well, as long as the user augments his sequential code so that the compiler/runtime/whatever can distill the parallelism from it. That's still not possible without augmentation. So the user needs to understand how a parallel system works and how his algorithm might be mapped to it. Trivial for someone who does this for a living, but difficult for someone who's new to parallel computing.

Also, for many applications the optimal algorithms to be used on the various target hardware architectures differ significantly (e.g. for stencil codes a 2.5D wavefront on multi-cores, but a horizontal iteration with 32-wide stride on GPUs...) Such different algorithms can't be "discovered" by some generic software (at least no one, not even the Chapel developers have achieved this), so those algorithms have to be encapsulated in specialized libraries. Which is what we do for our domain "computer simulations".

(I just code to MPI directly... I don't see what the big deal is for parallel processing for the vast majority of things, but I see why there would be a niche for what you do. Best of luck.)

Thanks. :-) Users of my library are mostly scientists who want to simulate something big, without having to spend months learning OpenMP and MPI and CUDA and so on. So yeah, there is a niche. And thanks to the stagnating clock speeds and growing heterogeneity of HPC hardware, that niche is growing fast. Exciting times.

Re:Post it to Slashdot! on How To Turn Your Pile of Code Into an Open Source Project · 2013-09-10 20:51 · Score: 1

The library is really built with the mindset "one grid to rule them all". Also, it's not limited to solvers. But yeah, you definitely can use it to write multigrid solvers. It's a bit unorthodox though, so I guess anyone trying that should send me an email so I can explain the methodology. The basic idea is simple: 1. define the iteration scheme, 2. create multiple grids (or levels), 3. couple them (for interpolation), 4. run :-)

Re:Post it to Slashdot! on How To Turn Your Pile of Code Into an Open Source Project · 2013-09-10 18:27 · Score: 1

Thanks, I appreciate it!

No worries though. I'm currently in the technology transfer phase of my PhD, meaning that the project has achieved most of its scientific goals and I'm now rolling it out to the users. We already have a couple of users and the project even gathered a certain momentum recently, so I'm fine. But attracting more obviously wouldn't hurt.

Post it to Slashdot! on How To Turn Your Pile of Code Into an Open Source Project · 2013-09-10 17:15 · Score: 4, Funny

*wink* library for computer simulations *wink*

PR stunt unlikely on German Federal Police Helicopter Circles US Consulate · 2013-09-09 20:10 · Score: 1

It's unlikely that this is a PR stunt of the government to soothe the public. To give you some background information: the election campaigns here in Germany are in full swing now. The opposition used the recent revelations by Snowden to accuse the Merkel administration of breaking the constitution and betraying civil rights and values. The strategy of the coalition was to downplay everything, assure everyone that the NSA was not pulling a dragnet through everyone's private data, and that there really was nothing to see. Please move along.

Now, this weird helicopter flight does not reflect that secure, self-reliant stance the coalition has presented before. Instead, it reveals that officials have little clue about what foreign intelligence is really doing on German soil. And so they have to rely on embarrassingly obvious means to gather new intel. This is no display of strength, but of weakness.

The 90s called, they want their Windows back. on The Steady Decline of Unix · 2013-08-19 07:04 · Score: 0

Wat? Replacing a Unix server with Windows boxes? Srsly? Sounds like a stupid idea, especially if you factor in admin costs.

The impact of metadata surveillance on EFF Sues NSA, Justice Department, FBI · 2013-07-16 07:59 · Score: 5, Insightful

Today, most US media seem to be obsessed with pointing fingers at Snowden. What few people realize is how this total surveillance by NSA and GCHQ tilts the balance of power. Using graph theory, it is possible to compute (just from knowing who's talking to whom) who the agitators are in any given movement. If the Brits had had the same technology back in 1770, there would have been no American Revolution. They'd simply have pinpointed and jailed the members of the Committees of Correspondence, leaving the revolution headless. A malevolent government could use this technology to suppress its own people. This is too much power.
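As a rough illustration of the graph-theory point, here is a minimal sketch that counts distinct contacts per person from nothing but (caller, callee) pairs. The names and call records are made up, and counting contacts (degree centrality) is only the simplest possible measure; it is an assumption for illustration, not a description of what any agency actually runs.

```python
def contact_counts(calls):
    """Count distinct contacts per person from (caller, callee) metadata."""
    neighbors = {}
    for caller, callee in calls:
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)
    return {person: len(peers) for person, peers in neighbors.items()}


# Hypothetical call records: no content, only who talked to whom.
calls = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("ann", "dan"),
    ("bob", "cat"),
    ("eve", "ann"),
]

counts = contact_counts(calls)
most_connected = max(counts, key=counts.get)
print(most_connected, counts[most_connected])
```

Even this crude count singles out the person everyone else talks to, without reading a single message.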

Obviously he needs to apply the rule... on Dr. Dobb's Calls BS On Obsession With Simple Code · 2013-06-27 05:26 · Score: 1

...to his writing, too. Or, as Antoine de Saint-Exupéry has put it: "A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."

So the news is... on Dr. Dobb's Calls BS On Obsession With Simple Code · 2013-06-27 05:10 · Score: 1

Your code shouldn't be more complex than it really needs to be. But isn't that the way we thought about simple code all the time?

B and C are good... on Man Of Steel Leaps Over Record With $125.1 Million To Mixed Reviews · 2013-06-17 06:38 · Score: 2

...because a hero that is just and purely good and a villain that is only evil are boring. That's just me generally speaking, I haven't seen the movie. But I like it if characters have flaws and the enemy has good traits. It makes decisions and judgment more difficult. This is no Hollywood invention. Japanese movies have had this since... there have been Japanese movies.

Re:Those who live by the sword... on iPhone 4, iPad 2 Get US Import Ban · 2013-06-04 19:01 · Score: 2

On /. it is.

Re:One Size Doesn't Fit All -- Same in Supercomput on ARM In Supercomputers — 'Get Ready For the Change' · 2013-05-26 16:19 · Score: 1

True, but with limits. There is a reason why LRZ bought SuperMUC without GPUs: a) fewer, faster cores, b) users didn't have to change their codes. Now, machines like BG/Q scale extremely well, despite having such a high core count. But they have the interconnect built right into the chip architecture. We don't have anything comparable on current ARM designs, but hey, the future is gonna be interesting.

Not only Performance per $ on ARM In Supercomputers — 'Get Ready For the Change' · 2013-05-25 19:01 · Score: 2

...but also reliability (because supercomputers are really large and one failed node will generally crash the whole job, thereby wasting gazillions of core hours; that's one reason why SC centers buy expensive Nvidia Tesla hardware instead of the cheaper GeForce series) and IO and memory bandwidth and finally integration density. That one Intel chip can be more tightly integrated as it won't generate as much excess heat per GFLOPS (according to TFA...).

One Size Doesn't Fit All -- Same in Supercomputing on ARM In Supercomputers — 'Get Ready For the Change' · 2013-05-25 17:39 · Score: 4, Informative

There is already one line of supercomputers built from embedded hardware: the IBM Blue Gene. Their CPUs are embedded PowerPC cores. That's the reason why those systems typically have an order of magnitude more cores than their x86-based competition.

Now, the problem with BG is that not all codes scale well with the number of cores. Especially when you're doing strong scaling (i.e. you fix the problem size, but throw more and more cores at the problem), Amdahl's law tells you that it's beneficial to have fewer/faster cores.
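To put numbers on the strong-scaling argument, here is a minimal sketch of Amdahl's law. The 99.9% parallel fraction is a made-up figure for illustration, not a measurement of any real code.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup for a fixed problem size (strong scaling)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical code in which 99.9% of the runtime can be parallelized.
p = 0.999
for cores in (1_000, 10_000, 100_000):
    print(f"{cores:>7} cores: speedup <= {amdahl_speedup(p, cores):.1f}")
```

With a serial fraction of 0.1% the speedup can never exceed 1000, no matter how many cores are added. Past that point only faster cores help, which is the case for fewer/faster cores.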

Finally I consider the study to be fundamentally flawed as it compares the OEM prices of consumer-grade embedded chips with retail prices of high-end server chips. This is wrong for so many reasons... you might then throw in the 947 GFLOPS, $500 AMD Radeon 7970, which beats even the ARM SoCs by a margin of 2x (ARM: ~1 GFLOPS/$, AMD Radeon: ~2 GFLOPS/$).

It's really about the applications on Has Supercomputing Hit a Brick Wall? · 2013-05-15 17:56 · Score: 1

I guess your major misunderstanding is that the applications running on supercomputers could somehow be done in the (loosely coupled) way that Google does its data mining. Since you're a professional, too, please refer to this Wiki article on stencil codes, one of the major classes of codes that run on supercomputers. If you find a way (or at least a pseudo-code formulation) to transform these applications into loosely coupled codes, then I would not be the only one to be curious to hear about it. You'd transform the whole industry. In fact this is not possible, though.
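For readers who haven't met stencil codes, here is a minimal sketch of a 1D three-point averaging stencil. The function name and the tiny grid are illustrative only and not taken from any particular library.

```python
def stencil_step(grid: list[float]) -> list[float]:
    """One time step: each interior cell becomes the mean of itself and its neighbors."""
    new = grid[:]  # boundary cells keep their values
    for i in range(1, len(grid) - 1):
        # Every update reads the *previous* state of both neighbors.
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new


grid = [0.0, 0.0, 9.0, 0.0, 0.0]
for _ in range(2):
    grid = stencil_step(grid)
    print(grid)
```

If the grid is split across nodes, the cells at each cut need their neighbor's value from the other node at every single time step. That per-step dependency is what makes these codes tightly coupled.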

But I agree that software will need to help with reliability and will have to actively manage node eviction/addition.

BTW: comparing Google and Cray is really like comparing apples and oranges: they're in different markets. The market for supercomputers is extremely small, the market for (online) advertising is gigantic.

Re:No? on Has Supercomputing Hit a Brick Wall? · 2013-05-15 17:32 · Score: 1

Supercomputing is different from the web. If one node in a supercomputer fails, the whole system fails.

The harddisks are much slower on Has Supercomputing Hit a Brick Wall? · 2013-05-14 17:47 · Score: 1

...and this is the problem: the time we need to get all the data to disk is closing in on the MTBF. With the current technology an exascale system would suffer node failures even while taking a snapshot.

Google is not a Supercomputer on Has Supercomputing Hit a Brick Wall? · 2013-05-14 17:42 · Score: 1

Whenever someone on /. likens Google's network to a supercomputer, God kills a Pokemon. But honestly: the reason why Google can cope with these massive outages is that they're doing totally different computations from supercomputers. Google's compute jobs are loosely coupled. They do data mining. That is fundamentally different from supercomputing where all compute jobs are tightly coupled. To give you a car analogy:

In the Google case millions of mechanics fix millions of cars in parallel. This is more or less trivial. If one of the mechanics is ill, another one can take over his task, or they simply wait until a replacement arrives.
In supercomputing you try to assign millions of mechanics to fix a single car in just a millionth of the usual time. This gets really tricky because they need to coordinate their actions tightly and if one of the mechanics is ill, others might trip over him and the whole job becomes a mess.

Not a good analogy, but I hope to correct the picture of Google being lightyears ahead of the supercomputing industry: they're simply working on very different problems. I wonder what makes you think that Google/Amazon/Facebook were 10 years ahead of Cray and academia? If they were, they'd simply take over Cray's market. And since Cray competes with IBM and Fujitsu, they'd probably try and claim parts of their market shares, too. This is not happening.

Re: Latency not as important as expected on Has Supercomputing Hit a Brick Wall? · 2013-05-14 17:26 · Score: 1

Interesting! Actually, barriers are today considered non-scalable and thus people just try to avoid them. It's feasible if your code needs only next-neighbor communication. Not all codes satisfy this condition, but then again we build these machines today for a very specific set of applications.

Software is the problem/solution on Has Supercomputing Hit a Brick Wall? · 2013-05-14 07:06 · Score: 1

Yes, in a way. We'll probably never be able to improve the hardware far enough that we can simply rely on it to fail gracefully (i.e. announce its impending death a few seconds in advance). The reason is that ATM our systems contain approx. 20k nodes. Exascale systems will likely push this to 200k. Even if you assume a node will live 10 years on average, you can estimate that every ~26 minutes one node of the system will fail.
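The estimate is simple arithmetic: divide the average node lifetime by the number of nodes. Here is a minimal sketch, assuming failures are independent and evenly spread over time; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def minutes_between_failures(nodes: int, node_lifetime_years: float) -> float:
    """Expected time between node failures somewhere in the system."""
    return node_lifetime_years * MINUTES_PER_YEAR / nodes


# Node counts chosen for comparison; the 10-year lifetime is the assumption from the text.
for nodes in (20_000, 100_000, 200_000):
    print(f"{nodes:>7} nodes: one failure every "
          f"{minutes_between_failures(nodes, 10):.1f} minutes")
```

The interval shrinks linearly with node count, so a tenfold larger machine fails ten times as often even if every node is just as reliable as before.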

My money is on the software: we'll need some kind of redundancy (e.g. a simulation code would need to store its mesh so that each part is held by multiple nodes, a bit like the redundancy we see in BitTorrent and other P2P networks). But that will require applications to be reengineered, and that will be really, really expensive. Considering how the industry is struggling with the (comparatively easy) adoption of GPUs, I don't see this happening anytime soon. Interesting times ahead!

Re:No? on Has Supercomputing Hit a Brick Wall? · 2013-05-14 06:41 · Score: 1

Citation needed? I don't see why nodes would suffer ("exponentially") fewer hardware failures if clocked lower.

Latency not as important as expected on Has Supercomputing Hit a Brick Wall? · 2013-05-14 05:28 · Score: 3

Latency isn't that much of an issue, though: the #1 systems of the last ~3 years all had torus networks (all Blue Genes, all Crays, K computer, too). These networks only perform well for next neighbor communication -- which is fine since most codes running on these machines are simulation codes and they only need this type of communication. If you scale up the system, you'll typically also scale the size of the simulation instance (this is known as "weak scaling").

This means that your program can still spend the same time waiting for the network as it could on a smaller machine. The cables do not need to become shorter.

Re:No? on Has Supercomputing Hit a Brick Wall? · 2013-05-14 05:18 · Score: 5, Informative

Power consumption and MTBF: power consumption (high operating costs) could perhaps be solved by a larger budget, but the mean time between failures (MTBF) means that the machine will fail before it can compute anything meaningful. Right now the machines we build and, even more importantly, the software we build rely on all parts of the machine to function. If even a single node fails, then the data it holds becomes inaccessible and the rest of the compute job collapses like a house of cards.

This can be remedied by taking frequent snapshots and then restarting from the last snapshot, but the time for checkpoint/restart has been continuously growing for the last systems. No one really expects exascale systems to do full system checkpoint/restart in a reasonable time frame. They'd spend more time taking snapshots than actually computing.
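To get a feel for the checkpoint overhead, here is a rough sketch. It uses Young's first-order approximation for the checkpoint interval, sqrt(2 * C * MTBF), which is a common rule of thumb brought in for illustration and not something the comment itself relies on. The checkpoint times and MTBF values are made-up numbers.

```python
import math


def young_interval(checkpoint_minutes: float, mtbf_minutes: float) -> float:
    """Young's approximation for the compute time between two checkpoints."""
    return math.sqrt(2.0 * checkpoint_minutes * mtbf_minutes)


def checkpoint_overhead(checkpoint_minutes: float, mtbf_minutes: float) -> float:
    """Fraction of wall time spent writing checkpoints.

    Ignores restart time and work lost after a failure, so it is optimistic.
    """
    interval = young_interval(checkpoint_minutes, mtbf_minutes)
    return checkpoint_minutes / (interval + checkpoint_minutes)


# Hypothetical (checkpoint time, system MTBF) pairs, both in minutes.
for checkpoint, mtbf in ((2.0, 1000.0), (10.0, 250.0), (30.0, 30.0)):
    share = checkpoint_overhead(checkpoint, mtbf)
    print(f"checkpoint {checkpoint:>4.0f} min, MTBF {mtbf:>6.0f} min: "
          f"{share:.0%} of wall time spent on snapshots")
```

The approximation assumes the checkpoint time is much smaller than the MTBF, so the last case is only indicative. The trend is the point: as checkpoint time approaches the MTBF, snapshots eat a large share of the machine's time.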

Source: I'm doing my PhD in supercomputing.

User: gentryx

Comments · 237