joib · Slashdot Mirror

Re:Strange on Simulate "The Day After Tomorrow" On Your PC · 2004-05-18 02:01 · Score: 1

Considering that Numerical Python is largely implemented in C, and uses LAPACK/BLAS for linear algebra, it's not that bad. As long as you use array expressions, meaning that the loops are done in C, performance is quite good, often only about 2 times slower than a Fortran/C program using the same LAPACK/BLAS library. Which, just for the record, is a lot faster than a C program using some roll-your-own untuned linear algebra.
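To make the "array expressions" point concrete, here is a minimal sketch. It assumes NumPy, the present-day successor of Numerical Python, whose API differs somewhat from the 2004 package; the function names are made up for illustration. Both functions compute the same thing, but only the first one runs its loop in the Python interpreter.

```python
import numpy as np


def saxpy_loop(alpha, x, y):
    """Compute alpha*x + y one element at a time in the interpreter (slow path)."""
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = alpha * x[i] + y[i]
    return out


def saxpy_array(alpha, x, y):
    """Same result as a single array expression; the loop runs in compiled code."""
    return alpha * x + y


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)
a = rng.standard_normal((50, 50))
b = rng.standard_normal((50, 50))

# Dense matrix product: NumPy typically hands this to the BLAS it was built against.
c = a @ b
```

Timing the two functions on large arrays (for instance with the standard timeit module) should show the gap; the exact ratio depends on the machine and the BLAS in use.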

Re:Very great and all... on North America's Fastest Linux Cluster Constructed · 2004-05-14 19:50 · Score: 1

Hmm, that looks about how I would assume a supercomputer would be used.

If only...

\begin{rant_mode}
I think the reason we have a very different usage pattern over here is that the cluster is configured so that us normal users are not allowed to submit jobs using more than one node. Thus we are limited to a maximum of 32 cpu:s (1 node). On top of that, a backfill scheduler is used, meaning that smaller and shorter jobs can jump the queue. In practice this means that an 8-16 cpu job can usually start right away during non-office hours, or with only a few hours of waiting time. By comparison, when I once submitted a 32 cpu job, I got tired of waiting after it had been in the queue for a week..

Now, it makes me wonder why the supercomputer center blew shitloads of tax-payer money on that fancy federation/colony or whatever it's called switch, when no one is allowed to use it.. :(
\end{rant_mode}

I'm not saying the p690 is a bad computer, in fact it's about 4 times faster than the Cray T3E it replaced. I just think that for the same money they could have gotten a much faster computer. Like an Itanium2/Quadrics cluster, for instance (they already have an Origin for shared memory jobs, so IMHO MPI performance should have been their top priority).

I'd rather have a jet powered beer cooler.. on Keeping Your Keg Cool Sans Ice · 2004-05-14 00:52 · Score: 2, Interesting

..like this one. IIRC it was even on /. a few years ago.

Re:before everyone starts shouting at once... on North America's Fastest Linux Cluster Constructed · 2004-05-13 21:17 · Score: 1

I'm somewhat familiar with supercomputing, and my impression is that the only place where you'll find assembler is in the (usually vendor-supplied) BLAS libraries (and perhaps in some of the tuned user-space MPI libraries too). The applications are all in Fortran (and C in some cases).

Given that supercomputer architectures change every few years, nobody has the time to rewrite their apps all the time.

Re:Very great and all... on North America's Fastest Linux Cluster Constructed · 2004-05-13 20:23 · Score: 3, Informative

There is a limit to how much you can effectively parallelize many problems. If that limit is 1, then you need a Cray or something.

Well, Crays are also parallel computers, so they won't help you much in this situation. Some Crays do have vector processors, but that is also a sort of parallelism. It's just that you use that parallelism through tuned BLAS libraries or with a vectorizing compiler (e.g. Fortran 95, HPF and such things), instead of doing it manually with MPI or threads or something like that. So if your problem is totally serial, a vector processor won't help you either.
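One common way to put numbers on that limit is Amdahl's law. The thread doesn't name it, so treat it as an illustration rather than what either poster had in mind. A minimal sketch, with a made-up function name and example fractions:

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup when only `parallel_fraction` of the runtime can be spread over workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# A totally serial problem gains nothing, however many CPUs or vector lanes you add.
serial_case = amdahl_speedup(0.0, 1024)

# A 95% parallel problem stays below 1 / 0.05 = 20x, even on 1024 workers.
mostly_parallel = amdahl_speedup(0.95, 1024)
```

The model ignores communication cost, so real MPI codes usually do worse than this bound.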

(Or you can just take the google route and let it fail and replace the whole box. But that really requires your whole application to be written to accommodate it.)

Not necessarily. Most supercomputers are not used to run a single job taking months, but rather they run lots of smaller and shorter jobs. On the p690 cluster where I do my stuff, I (and apparently most users) mostly run jobs using about 8-16 cpu:s, with a runtime of a few hours to a day. If one node fails, the jobs executing on that node fail too. It's no big deal, just resubmit the job to the queue when you get around to it.

Of course, if you're programming one of the very few and far between applications that has a runtime of months, you certainly want to save intermediate results once in a while. Not only to guard against hardware failure, but also so that the user can check the intermediate result and see if the app is still on the right track. It would be quite a bummer to use months of cpu time only to realize the entire thing is wasted because you specified the initial values wrong.. :-)
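Saving intermediate results can be as simple as the sketch below. The file format (JSON), the function names and the toy "simulation" are all assumptions for illustration; a real application would dump its own arrays in whatever format it already uses. The one design choice worth copying is writing to a temporary file and renaming it, so that a crash in the middle of a write cannot destroy the previous checkpoint.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` to a temp file, then rename it over `path` in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # a crash mid-write never clobbers the old checkpoint


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every=100):
    """Toy 'simulation': accumulate a sum, resuming from the last checkpoint."""
    state = load_checkpoint(path, {"step": 0, "value": 0})
    while state["step"] < total_steps:
        state["value"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Because the checkpoint is a plain readable file, the user can also inspect it while the job is running to see whether the run is still on track.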

Re:Slashdot in the Late Nineties on Secure Architectures with OpenBSD · 2004-05-13 07:23 · Score: 1

Well, every other comment was a link to goatse.cx. And we didn't have those "In Soviet Russia" jokes. And I don't think the "*BSD is dying" trolls were around then either.

And there were lots of articles by some "Jon Katz" fellow that were supposed to be enlightening. *shudder*

Other than that, the same kind of drivel that fills the site today.

Re:The real question on Running Video Cards in Parallel · 2004-05-13 07:10 · Score: 1

Now imagine the same cluster but each machine has 2 or 4 dual-head graphics cards and each algorithm that can be created in Brook or similar is. That gives each machine up to 2 CPU's and maybe 8 GPU's that may be used for processing. The machines are clustered so a group of ~12 commodity machines (1 rack) could have 24 CPU's and 96 GPU's. Now that would be some serious computing power - and relatively cheap too (since 1-generation old dual-head cards are ~$100-$150).

Agreed, that would be cool. Vector processing on a budget, in essence.

By the way, does anyone know if there is any work going on to create toolkits for Octave and/or MatLab which would utilize the processing power of a GPU for matrix math or other common calculations?

Both matlab and octave use the BLAS library, as do most scientific applications needing dense linear algebra. I guess one could do a BLAS implementation for these GPU processors.

What I don't see happening is that scientists would be eager to write applications with Brook or whatever proprietary language these GPU:s support instead of standard Fortran or C. While the need for performance of course is the reason to use parallel programming in the first place, portability is incredibly important for scientific applications. Supercomputer centers don't tend to have very much vendor loyalty, rather they buy whatever gets them the most bang for the buck at the moment. Nobody has the time (nor wants) to rewrite applications every few years.

Now, what I think could happen is that some smart company would make a Fortran 95 compiler, which would offload array operations to the GPU(s) while doing the more complicated logic on the main CPU(s). Of course, such an approach would require quite a lot of bandwidth between system memory and the graphics card memory. In the short term, a BLAS implementation is about the best one can realistically hope for. That alone could produce a huge increase in bang-for-the-buck for a lot of applications.

Re:Intel's Chipset only supports One x16 PCIe on Running Video Cards in Parallel · 2004-05-13 06:05 · Score: 1

My understanding is that it's switch based. I guess the original poster of this particular subthread meant that Intel has saved a couple of cents by making only one port on the PCI Express switch 16x, the others are apparently slower.

Re:Well on Napster Gags University Over Fees · 2004-05-12 21:14 · Score: 1

You're correct, but I don't think this applies here. Secrets have a habit of leaking, especially if you tell the entire student body about it.

Re:50 trillion calcs/sec...how fast really? on World's Fastest Supercomputer To Be Built At ORNL · 2004-05-12 02:47 · Score: 1

Quantum chemistry, or ab initio, calculations tend to be a biggie. I wouldn't be surprised if ab initio alone would account for > 50 % of all supercomputer cpu cycles in the world.

Other big things are weather prediction, fluid dynamics, and classical (i.e. "Newtonian") molecular dynamics with some kind of empirical potentials (e.g. protein folding and stuff can be thought of as MD).

Re:Article puts it all in perspective on Programming As If Performance Mattered · 2004-05-05 17:48 · Score: 2, Informative

WTF are you talking about?

I'm staring at the apt codebase on my screen just now, and it's all C++, baby. Ok, so there is a trivial amount of perl; sloccount summary:

Totals grouped by language (dominant language first):
cpp: 26481 (89.75%)
sh: 2816 (9.54%)
perl: 209 (0.71%)

This is for apt-0.5.14, but I can't imagine that the newest version in unstable (0.5.24) would be that different.

Now, if the rest of your story is true, that's mind-boggling. If the new teacher refused to judge your, from your description very fine, work just because he has a serious hard-on for gentoo, I seriously believe you should have taken it up with the dean of the faculty instead of just swallowing it and later complaining on /..

That being said, why choose apt in the first place? Now, I haven't profiled apt, but I guess it spends the majority of time waiting on network i/o or waiting for dpkg to finish anyway.

Re:The estimates are OK on Projected 'Average' Longhorn System Is A Whopper · 2004-05-04 18:19 · Score: 1

Lowering the voltage is no fast path to enlightenment. Lower the voltage and the current through a transistor will be reduced. Consider that transistors are already getting so small that back-current is becoming a serious problem, and you see why CPU voltages haven't really dropped much in the last few years. CPU voltage was 5 volts for quite a long time, then it dropped relatively quickly to the current levels of 1.0-1.5 volts, and it doesn't seem to be getting lower than that very quickly.

Re:Dead man's handle on What Happens To Your Data When You Die? · 2004-05-03 20:02 · Score: 1

Rumor has it J. Edgar Hoover maintained his position by keeping a file cabinet full of nasty stuff on powerful politicians in his office.

Yes, but then a snobby brit agent called John Mason stole the files, hid the microfilms, but got caught and the feds threw him in Alcatraz, from which he escaped. ;-)

Re:Hibernate? on Geronimo 1.0 Milestone Build M1 Released · 2004-04-29 04:21 · Score: 4, Interesting

I don't think that will happen, since Hibernate is LGPL and Geronimo is ASL.

That being said, Hibernate combined with Spring will do 99% of what EJB is used for, with a significantly reduced amount of pain.

Re:What are the chances... on International Space Station Gyroscope Fails · 2004-04-23 02:25 · Score: 1

... with the famous last words "let's take it for a spin around the block, shall we". ;-)

Re:Some Quotes... on VIA Announces Lead-Free Motherboard · 2004-04-22 19:04 · Score: 1

On the subject of nuclear energy, there are less nuclear reserves available on the planet than fossil fuels. If the planet relied on Nuclear Energy for all electricity, estimates are that we would run out in about 80 years.

IMHO, it's quite naive to assume that the scenario above would ever happen. A sharp increase in the use of nuclear energy would bring about the commercialization of breeder reactors. The technology already exists, but with current fuel prices traditional reactors are cheaper. With breeders, current uranium reserves would last something like 7000 years.

Re:I strongly disagree on Why MySQL Grew So Fast · 2004-04-20 20:19 · Score: 2, Insightful

Actually postgres isn't preforking, it forks an engine for each incoming connection.

As the parent poster said, windows applications tend to use threading more than unix apps, because fork is very inefficient on windows, and I think that IPC is not as good as on unix either (IPC is used for shared caches etc.). On the unix platform, there is no clear-cut winner in the multithreaded vs. multi-process "war".

E.g. apache 2 can be run both in multi-process mode and multithreaded mode. And guess what, on unix platforms the multiprocess mode is the default. If nothing else, this strongly suggests that at least the apache developers think that multi-process is better on unix than multi-threaded.

Now, the one thing where multithreading clearly is better than multi-process is the time to open a new connection (creating a new thread is much faster than forking). And that's why apache uses pre-forking and with postgres you use connection pooling or persistent connections. So in the end the fork overhead doesn't matter at all.
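The pooling idea can be sketched in a few lines. This is a toy, not how any real postgres driver does it: `ConnectionPool` and `fake_connect` are made-up names, and the "connection" is just a placeholder object standing in for whatever is expensive to set up.

```python
import queue


class ConnectionPool:
    """Create `size` connections up front and hand them out for reuse."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # pay the setup cost once, at startup

    def acquire(self, timeout=None):
        """Block until an idle connection is available."""
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Return a connection so that the next request can reuse it."""
        self._idle.put(conn)


created = []


def fake_connect():
    """Stand-in for an expensive connect (for postgres: forking a backend)."""
    created.append(object())
    return created[-1]


pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):  # 100 requests, but only 2 connections are ever created
    conn = pool.acquire()
    pool.release(conn)
```

The setup cost is paid twice instead of a hundred times, which is why the per-connection fork stops mattering once connections are reused.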

Re:Pretty simple. on Why MySQL Grew So Fast · 2004-04-20 19:56 · Score: 1

I think one reason might be that before PgSQL 7.0, pgsql was known for being both unstable and extremely slow. And only with 7.1 did the extremely useful outer joins get included (yes, you can emulate outer joins with subqueries and unions but that leads to quite loooong queries, as anyone who used openacs 3.x can attest).

Anyways, by the time 7.1 was out, which beat mysql in almost every way possible, LAMP had already become the established player.

M-x doctor on Ask the Robotic Psychiatrist · 2004-04-19 05:59 · Score: 1

I'm sure doctor-mode beats this thing.

Re:Qt is almost a like a language on A Taste of Qt 4 · 2004-04-19 01:06 · Score: 1

And yes new() and delete() do exist, sorry.

I stand by my claim: new() and delete() don't exist in standard C++.

Yes, of course new and delete can be overloaded, just like other OPERATORS. And yes, overloading happens to be done with functions having a special name; in the case of new the function is called "operator new", where "operator" is a reserved word in C++. That's why you often see the new operator overloaded as

void* operator new(std::size_t) throw (std::bad_alloc);

Unfortunately, this might also confuse people into believing that the function is actually called "new".

Re:Qt is almost a like a language on A Taste of Qt 4 · 2004-04-18 20:00 · Score: 2, Insightful

I haven't used qt for anything more than some simple Hello world stuff, but IMHO that argument is quite poor. Qt sucks because it makes life easier? Umm, why not do your GUI programming in asm while you're at it, if you like pain?

The same argument could of course equally poorly be made against smart pointers. Why use a smart pointer when you could do manual memory management with new and delete? Oh, what heresy!

PS. new and delete are operators, not functions. There is no such thing as new() and delete().

Re:They're all clusters now anyway on Cray CTO: Linux clusters don't play in HPC · 2004-04-13 21:03 · Score: 1

IMHO, you're only partly correct.

From what I have understood from here on the other side of the pond, the US supercomputing establishment was deeply shocked by the introduction of the earth simulator. For a lot of people, it's a matter of national pride that the US nuke labs have the biggest and baddest computers around.

The current crop of supercomputers, i.e. commodity CPU:s, and their applications, typically communicating via MPI, brought supercomputing to the masses. OTOH, for a typical scientific application, there are only so many MPI processes you can have before the communication overhead gets too large.

I agree with you that clusters are essentially the only way to go for many reasons, but at the same time I think we are going to see a renaissance of the vector processors. With the ever increasing cost of CPU designs, I don't think that we'll see many dedicated supercomputer-only CPU:s. What will happen is that people will start to investigate how to make use of the (so far quite limited, but improving) vector support of commodity processors, i.e. SSE2, 3dnow, altivec etc. Another approach, which I read IBM is working on, is to bunch together some normal CPU:s into one big "virtual vector CPU" (IIRC IBM is trying to bunch together 8 POWER CPU:s into one vector CPU).

With compilers and BLAS libraries supporting this new generation of vector computers, the benefit for the user and application programmer is that you'll get a significant performance boost from the vector stuff before you start suffering the communication overhead of MPI. If this vectoring stuff improves performance 5-10 times without any extra effort for the application programmer, that's nothing to sneeze at, IMHO.

Re:He's wrong, but he's also right. on Cray CTO: Linux clusters don't play in HPC · 2004-04-13 20:35 · Score: 1

In fact, the earth simulator is a cluster too. The difference between the earth simulator and a typical beowulf cluster is that while the nodes in the beowulf today typically contain 2 (x86/IA64/etc.) CPU:s, each node in the earth simulator contains 8 vector processors (made by NEC IIRC).

Even in the earth simulator, AFAIK there is no shared memory between the nodes, they have to use message passing (e.g. MPI) just like the beowulf applications use.

The reason that the earth simulator is so impressive is that if your application starts to have scaling problems with, say, 8 MPI processes, you're limited to 4 nodes in the beowulf. On the earth simulator, you can use 8 nodes and use threads inside the node (yes, you can use threads on the beowulf too, but it's usually not worthwhile when there are only two CPU:s on each node), and furthermore the application can be even more parallelized by using some BLAS that is tuned for the vector processors.
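The arithmetic behind that comparison, as a small sketch. The function name is made up, and the model is deliberately crude: it assumes the only limit is the number of MPI processes and that threads inside a node scale perfectly.

```python
def usable_cpus(max_mpi_processes, cpus_per_node, threads_within_node):
    """CPUs an application can keep busy when MPI stops scaling at `max_mpi_processes`."""
    if threads_within_node:
        # One MPI process per node, one thread per CPU inside the node.
        return max_mpi_processes * cpus_per_node
    # One MPI process per CPU, so the MPI limit is also the CPU limit.
    return max_mpi_processes


# Beowulf: 8 MPI processes on 2-CPU nodes means 4 nodes and 8 CPUs.
beowulf = usable_cpus(8, cpus_per_node=2, threads_within_node=False)

# Earth simulator: 8 MPI processes, one per 8-CPU node, threads inside each node.
earth_sim = usable_cpus(8, cpus_per_node=8, threads_within_node=True)
```

So the same 8-process limit covers 8 CPUs on the beowulf and 64 vector CPUs on the earth simulator, before counting anything gained from a vector-tuned BLAS.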

Re:Oh boy... on Can You Spare A Few Trillion Cycles? · 2004-04-12 21:23 · Score: 2, Informative

This presentation explains the problems with Java floating point.

Incidentally, C99 has very nice support for IEEE 754 (improved numerics support was, in fact, one of the biggest additions compared to the old C89 standard).

Re:Java on top of OpenGL is happening... on The State of OpenGL · 2004-04-09 19:02 · Score: 1

It's not like you can't have unchecked exceptions in Java too. All exception classes that extend RuntimeException are unchecked. I use them a lot, and I agree it's very nice to avoid cluttering the code with throw/try/catch stuff.

User: joib

Comments · 928