Domain: netlib.org
Stories and comments across the archive that link to netlib.org.
Comments · 145
-
Re:It's not a popularity contest
The structure you are currently in, if it was large/complicated enough, was probably designed using structural analysis software written in Fortran (at least the BLAS calls in the solver, if not the UI). Even a Matlab, Python or R program is really Fortran once you get to the inner loops.
-
Re:Anybody know more details about the CPUs?
All the high-level technical details apart from the ISA ("not Alpha") you could want.
-
More info at ...
-
Re:What are you really asking?
I for one would like to see a historic timeline of absolute numbers for CPUs, memory, and mass storage. But that is not so easy to do. I have found little snippets here and there on Wikipedia, but not even a single master list of CPUs, let alone more hardware. There are master lists of CPU benchmarks but not spanning generations and radically different CPU sizes obviously. Here's DDR3 RAM: https://en.wikipedia.org/wiki/... DDR4, not really there in Wikipedia, though there are some articles that talk around the subject: http://www.extremetech.com/ext... A discussion of the LINPACK benchmark: http://www.netlib.org/utk/peop... History of hard drives, plenty of data but not complete and not tabular: https://en.wikipedia.org/wiki/... If you know where to get tables of the raw data without an enormous amount of work, I'd like to see it.
-
Re:Ah, but: how much of this ships to end-users?
Wow, you must be the perfect example of how a low Slashdot ID does not an insightful comment make.
If you even bothered to parse the headline, you'd have noticed that the talk is about scientists using Fortran, not OS kernel hackers, not Web programmers. Fortran is totally NOT the tool for any of the tasks you mention, just like most of the other languages just suck for parallel floating-point operation intensive applications.
But since you're an old hat, I'll take a stroll on your lawn and point out the following:
Do these scientists develop friendly graphical user interfaces for their Fortran programs?
Define "friendly". I have actually seen Fortran programs that print ASCII-art-like diagrams. Sure beats having to look at GB-long tables of numerical data at runtime.
Do these programs have robust and secure handling of all input?
In all my years of experience (>10, so I guess I'm quite the novice in Fortran-years) I have not seen a single security exploit implemented in Fortran. It's just not the tool for the job. BTW if anyone can point out such a thing, please let me know so that I can raise my hat in the right direction. Now, as far as robustness goes, Fortran lets us input data in many creative ways, it would be a shame to take away such a... ahem... feature.
How about configuration: are there dialogs for setting up preferences, which are persisted somewhere?
They are called ASCII files. They are pretty persistent, and they will remember settings that, more often than not, should have been long forgotten.
Do they package up user-friendly installers?
No need. All you need to do is unzip the folder.
How much of their stuff runs on new platforms like tablets and smartphones?
This has actually been done! Check it out here:
http://specificimpulses.blogsp...
What non-Fortran-stuff do these programs integrate with? Anything over a network?
The modern Fortran versions can talk to C. Python uses Fortran libraries for math-intensive stuff. Fortran programs can number crunch across hundreds or thousands of nodes connected with InfiniBand. Network-y enough for you?
Where can I download a scientific Fortran program to evaluate its quality?
Start here: http://www.netlib.org/
And remember kids: If the old man scares you, just kick him in the nuts and run away!
-
Fortran!
"Name these languages that offer more performance AND safety than C++"
Yes, Fortran. Fortran 2003/2008 is a very good conventional straight-forward programming language, except for I/O. It has a better memory and execution model for high-performance computing.
http://en.wikipedia.org/wiki/Fortran#Fortran_2003
And it supports successful code-reuse, proven empirically over decades, even when hobbled with backward compatibility with ancient cruft.
www.netlib.org
Fourier transform routines written in the 1960's are still good. Ugly to look at, but it will compile and link and work just fine.
http://www.netlib.org/go/realtr.f
And of course it doesn't omit obvious things like FUCKING MULTIDIMENSIONAL ARRAYS for real, in the language, and interoperable everywhere. And they can start at 1 or 0.
And a pointer is not the same as an allocatable array/structure.
-
Re:Ummm...wutt?
Is this LINPACK metric something that exercises the Cray's massive pipeline architecture, where huge arrays of numbers (the vectors) were operated on at lightning speed through pipeline (assembly line-style) chip design? Or is it just a looping test?
-
Re:R or WEKA ... Wait, What Exactly Are You Doing?
How about Netlib? Wait, they have a library for statistics and not statistics for a library.
Ultimately, you are a librarian and not a statistician. If you have a background in mathematics that includes statistics, then you might be able to use the "better" products. The problem, though, is that the person interpreting the results will also require an understanding of what the pretty charts mean. It seems unreasonable to send people to a statistics course to understand the kind of data you are collecting.
I do not know how the statistics are being stored or calculated now. If there is a database involved, most of those can do basic statistics directly in the query.
The more in-depth statistics programs are often used by people who want to spend large amounts of time analyzing data. Though they can produce standard charts quickly, it often takes a bit of exploring to find a way to convey the desired message.
"Figures don't lie, but liars figure." - Mark Twain
-
LINPACK/LAPACK/Netlib
right up front: I know about this only because I work for these guys, but...
there's a whole host of Linear Algebra-related software written for high performance computing environments that is attributable largely to various teams of academics throughout the past 30 or so years. It is my understanding that these libraries get used by most anyone doing high-performance computing.
http://www.netlib.org/lapack/ http://en.wikipedia.org/wiki/LAPACK
-
Re:Which aspect ratio?
A Gateway 2000 P5-90 (90 MHz Pentium) running Windows NT could do 11 MFlops.
An Intel Pentium II Xeon (450 MHz) could do 98 MFlops.
A couple of things to keep in mind.
Even if a chip is capable of performing operations at a certain speed, it will slow down if starved of data. (see Amdahl's Law) Cray designed their computers to keep the processors fed. PC manufacturers didn't necessarily do so, for reasons of cost.
Prior to the release of the 68040 and 80486 chips, the floating point unit was strictly optional on Personal Computers. If you bought a workstation, sure the unit would be well integrated into the machine and the OS. But ordinary PCs, particularly inexpensive ones, did without. If necessary, floating point could be emulated, slowly.
Modern PC CPUs do very well indeed. A single core of an Intel Pentium Woodcrest (circa 2006) running at 3 GHz scores 3018 MFlops. Theoretically, it could run at 12000 MFlops, but something is slowing it down. Maybe it needs more cache? Ah, well, the world has moved on since then. Besides, that's only one core, and today's supercomputers contain tens of thousands of cores.
-
Re:Which aspect ratio?
> Nope, it was 1993.
Great. That makes my 1993 cites all the more relevant.
A Sun 3/260 (25 MHz 68020, 20 MHz 68881) was capable of 0.46 MFLOPS on Linpack, while a Cray X/MP/416 (2 proc. 8.5 ns) was capable of 143 MFLOPS. Quite the gulf. Did the Video Toaster have custom chips to speed up renders?
Of course, the Cray was rendering to film resolution, while the Video Toaster only needed to render NTSC frames. And, of course, the graphics industry had learned to update its algorithms in the meantime.
-
Re:Math environments are hackable hobbyist friendl
The OS designed-or-not for mathematics is not compelling. fdlibm, good stuff.
-
The actual benchmark does stress interconnects
Yes, noticed that.
Here's the actual benchmark used for Top500: "HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers". It solves linear equations spread across a cluster. The clustered machines have to communicate at a high rate, using MPI 1.1 message passing, to run this program. See this discussion of how the algorithm is parallelized. You can't run this on a set of machines that don't talk much, like "Folding@home" or like cryptanalysis problems.
Linpack is a reasonable approximation of computational fluid dynamics and structural analysis performance. Those are problems that are broken up into cells, with communication between machines about what's happening at the cell boundaries. Those are also the problems for which governments spend money on supercomputers. (The private market for supercomputers is very small.)
So, quit whining. China built the biggest one. Why not? They have more cash right now.
-
Re:os contribs
If you're into technical programming / numerical stuff, maybe work on the Open Source FORTRANs, as they are poor performers compared to their paid-for counterparts.
If numeric computing is of interest...
Actually there are thousands of Fortran modules that scientists, engineers, and mathematicians still rely on because they are well known, organized, and trusted.
Look for local users who may have some "kludge"-works that they are doing, like MATLAB, an old Fortran module from NetLib, maybe a couple of C or Fortran modules from Numerical Recipes, all bound together with some very brittle Korn Shell scripts. That would be a perfect opportunity to help refine the process into an Octave (or other Free Software alternative) script, and re-write (or quite likely simply replace) the Fortran and C modules with Octave toolboxes (or in Octave or C/C++).
-
Re:That's quite interesting
- A gas is a compressible fluid.
But it cannot be treated as such when the density gets too low. You couldn't treat the edge of the atmosphere as a fluid. I don't care what it is when I use a CFD, only what it behaves like.
- The most general solvers are the most handicapped. Even the ridiculously costly commercial solvers (Ansys, Fluent, etc.) solve a limited number of problems. I was working on a project that attempted to numerically simulate the effect of electromagnetic waves on the brain. Obviously, you need to solve Maxwell's equations in the horrible medium that is your brain. That's when I realized how woefully inadequate the commercial solvers (that claim to simulate the problem) are.
My brain's a medium? I thought I heard some dead people in there.
:) Seriously, I entirely agree. I don't for a moment pretend that I'm anything like up enough on PDEs or high-end maths problems (it's been a while) to identify the best packages either. The best I can do is say such software exists. I'll list here the packages I know and use - not just for PDEs but for maths and logic problems as a whole. I'll leave it to you and others skilled in the subject to pass judgement on their quality.
- ATLAS - A nice, optimized BLAS (Basic Linear Algebra Subprograms) implementation
- HOL4 - Higher Order Logic proof assistant
- Hypre - Preconditioner for linear equation solvers
- LAPack - Linear Algebra Package that runs over BLAS
- ScaLAPack - Subset of LAPack optimized for highly parallel computers
- Overture - A PDE solver
- PHAML - A PDE solver for 2D elliptic partial differential equations
- SUNDIALS - An expansive (and rather nice) PDE solver
- VSIPL++ - Nice little signal processing library
-
Thanks to Sun
Note that the cited paper location is docs.sun.com; this version of the article has corrections and improvements from the original ACM paper. Sun has provided this to interested parties for 20-odd years (I have no idea what they paid ACM for rights to distribute).
http://www.netlib.org/fdlibm/ is the Sun provided freely distributable libm that follows (in a roundabout way) from the paper.
I don't recall if K.C. Ng's terrific "infinite pi" code is included (it was in Sun's libm) which takes care of intel hw by doing the range reduction with enough bits for the particular argument to be nearly equivalent to infinite arithmetic.
Sun's floating point group did much to advance the state of the art in deployed and deployable computer arithmetic.
Kudos to the group (one hopes that Oracle will treat them with the respect they deserve)
-
Re:While there may be "newer" languages
BTW, a very ill-advised design choice of Python: http://www.python.org/dev/peps/pep-0211/ Ask any numerical analyst why it is a terrible idea to solve a linear system with inv(A)*b. But make sure you have at least half an hour free.
To make a long story short: solving Ax=b by calculating x=inv(A)*b is a terrible idea because calculating inv(A) is an inherently difficult thing. While it would be extremely useful to have inv(A), it's not strictly necessary to obtain it in order to solve Ax=b.
At the most basic level, the technique which most would be aware of to solve Ax=b is basic Gauss Elimination, with an augmented matrix and back substitution. In fact, this is often the very first thing people learn how to do in a linear algebra course. It isn't much better than finding the inverse, but it saves a lot of computation in the long run.
Of course there are many other techniques. Happily however, most packages can now automatically make the best choice on which technique to use, depending on the properties of A. In Matlab and Octave, it all boils down to using the left division operator like so
x=A\b
instead of the inverse calculating
x=inv(A)*b
Using the first command, Matlab and Octave will choose a technique that best suits the matrix A. This page has a list of all the techniques that Matlab can use to solve the linear system. To my knowledge, Octave has a number of techniques as well, but I'm not sure if it's as comprehensive as Matlab. Also, Octave's left division operator has been known to have bugs.
And to return to the main topic, Octave and Matlab both use LAPACK extensively, which is written completely in Fortran (and based on BLAS). There's really no other language for linear algebra.
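For anyone who wants to see the basic Gauss elimination with back substitution mentioned above spelled out, here is a minimal pure-Python sketch. It is only an illustration: the function name solve_linear and the 3x3 test system are made up for this example, partial pivoting is added as the usual safeguard, and this is not what Matlab, Octave or LAPACK actually run internally.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Hypothetical helper for illustration; `a` is a list of rows, `b` a list.
    """
    n = len(b)
    # Build the augmented matrix [a | b] as a copy, leaving the inputs untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate everything below the pivot in this column.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


# Small made-up system whose exact solution is (2, 3, -1).
x = solve_linear([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print(x)
```

Note that the inverse is never formed: the right-hand side is carried along during elimination, which is the whole point of the argument above.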
-
Re:Hold the hyperbole
http://www.netlib.org/linpack/
Note that if you've got a vector machine you usually use LAPACK, which is optimized for that architecture.
-
Re:Not so sure its the first
Blue Waters will be the first to deliver a sustained petaflop on "real-world" applications, meaning various scientific simulations. Specifically, the program solicitation required prospective vendors to explain how their proposed systems would sustain a petaflop on three specific types of simulations, one each in turbulence, lattice-gauge quantum chromodynamics, and molecular dynamics.
Granted, Roadrunner was the first machine to deliver a petaflop on the Linpack benchmark (though certainly IBM's own implementation of it). The benchmark does nothing more than set up and solve a system of linear equations. Roadrunner solved a system of 2,236,927 equations (in other words, it had a 2,236,927-by-2,236,927 coefficient matrix) in 2 hours.
But Blue Waters is planned to deliver a petaflop on applications that normally don't sustain >80% of theoretical peak; these applications are lucky to get near 20%.
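As a rough sanity check on the Roadrunner figures quoted above, the arithmetic can be sketched in a few lines. The assumptions here are mine: the run is taken as exactly 2 hours, and only the leading-order (2/3)n^3 operation count of a dense LU solve is used, so the result is a ballpark figure and not the official benchmark number.

```python
def lu_solve_flops(n):
    """Leading-order floating point operation count for a dense n-by-n LU solve."""
    return (2.0 / 3.0) * n ** 3


n = 2_236_927        # number of equations quoted in the comment
seconds = 2 * 3600   # "2 hours", assumed exact for this estimate

rate = lu_solve_flops(n) / seconds
print(f"{rate / 1e15:.2f} Pflop/s")
```

The estimate lands at roughly one petaflop per second, which is consistent with the claim that the machine sustained a petaflop on Linpack.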
-
Astronomy and theoretical physics
From your post I gather that you will primarily use self-study. As regards reading material I suggest you have a look here: http://www.phys.uu.nl/~thooft/theorist.html Prof. 't Hooft is a Nobel-prize winner in physics and he has put together a page with "open source" reading material on physics which he recommends to anyone with aspirations of becoming a theoretical physicist.
As an aspiring astronomer your profile will strongly resemble that of a theoretical physicist. And you'll certainly need to know about just about everything he lists on that page: from classical mechanics, optics, special and general relativity, quantum mechanics, statistical mechanics, thermodynamics, plasma physics, plain old electromagnetism, to electronics. 't Hooft lists freely downloadable high-quality reading material on just about every topic!
And although you didn't ask, don't forget the computational side of things! Most astronomers I know are heavy computer users and very good programmers.
So make sure you know about Fortran and the libraries that are written in it (e.g. have a look at http://www.netlib.org/liblist.html and acquaint yourself with Lapack, Sparsepack, fftpack, cephes etc.). Many of those routines can also be found in Matlab, Octave, Scilab, etc., but if you need full control and a standalone executable on a big supermini you might have to go back to Fortran and C++. And make sure (well ... I hardly need tell a mathematics undergraduate but I can't omit it) that you know about Maple and/or Mathematica.
But ... if I may be so bold ... whilst reading and self-study are an indispensable element of learning physics they are rarely sufficient. You'll also need to learn a special way of thinking that sometimes comes hard to people with a background in mathematics. Which is to know when and where to cut corners and use approximations, and sometimes even go beyond the mathematics you know.
Think of Paul Dirac (of the Dirac Delta function). His "function" isn't a function at all, it's a distribution, and trying to think of it as a function will lead you to contradictions. But Dirac set up a formalism using it (and got the properties right !) without knowing about distributions (they were invented later partly to put what he had done on a firm mathematical basis). He simply let mathematical firmness go hang at the appropriate moment. Now I'm not comparing you to Dirac (and I'm certainly not encouraging you to take liberties with mathematics), but Dirac was a physicist first and a mathematician second. That's what I mean. Someone suggested the Feynman Lectures ... they're great (if sometimes a tough read) exactly because Feynman makes this very point.
You see ... in Physics, the physics comes first and the mathematics second; meaning that in thinking about physics problems you'll have to think in terms of physics (of course greatly helped by the mathematical formalisms in which physical laws are couched) but you'll need to be able to think through a physical line of argument without necessarily working through all the maths. Physicists do this as second nature, and you'll need to learn how.
-
A few things to know
First, the Top500 list has plenty of value. What most people do not realize (or should realize) is that it is one data point on the HPC spectrum. If your HPC program does not perform the same or similar matrix operations as HPL then the ranking is meaningless to you. To some the list has become a public relations contest.
Second, performance is virtually independent of the OS (unless you are using TCP). Most big clusters use InfiniBand and run applications in "user space" by-passing the kernel. The rest of the code is crunching numbers.
Third, for the right cost, anyone can get a system on the Top500 list. It is a rather simple price/performance calculation, by the way. Breaking into the top 10 might be a little more difficult.
Finally, HPC and Linux are synergistic. Take a look at Why Linux on Clusters? to get the full story. The Windows model does not work very well in this space.
-
Linpack? So does it run Linux?
Apparently, not necessarily. It's just some Fortran routines.
So much for that joke.
-
Re:This CELL is not single precision
The difference between the two Cells is not "abysmal"; let's do some comparisons:
a) PS3 "Classic Cell" 1 PPC64 w/ 2 threads and 7 SPEs (8, but one disabled, defective or not):
GFLOPS 32-bit (float): 3.2GHz * 8 FLOPS/Hz * 7 SPEs = 179.2 GFLOPS
GFLOPS 64-bit (double): 3.2GHz * 1 FLOPS/Hz * 7 SPEs = 22.4 GFLOPS (huge penalty, because of simulation via unoptimized single-precision operations)
GFLOPS 64-bit (double) via optimized 32-bit operations: 3.2GHz * 3.9 FLOPS/Hz * 7 SPEs = 87.36 GFLOPS
b) Roadrunner "New Cell" 1 PPC64 w/ 2 threads and 8 SPEs:
GFLOPS 32-bit (float): 4GHz * 8 FLOPS/Hz * 8 SPEs = 256 GFLOPS
GFLOPS 64-bit (double): 4GHz * 4 FLOPS/Hz * 8 SPEs = 128 GFLOPS
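The figures above all follow the same clock * FLOPS-per-cycle * SPE-count formula, so they are easy to re-check. A small sketch (the helper name peak_gflops is made up for this illustration; the inputs are exactly the numbers quoted in the comment, not independently verified hardware specs):

```python
def peak_gflops(clock_ghz, flops_per_cycle, spes):
    """Theoretical peak in GFLOPS: clock rate times per-cycle throughput times SPE count."""
    return clock_ghz * flops_per_cycle * spes


# a) PS3 "Classic Cell": 3.2 GHz, 7 usable SPEs
ps3_single = peak_gflops(3.2, 8, 7)
ps3_double = peak_gflops(3.2, 1, 7)
ps3_double_optimized = peak_gflops(3.2, 3.9, 7)

# b) Roadrunner "New Cell": 4 GHz, 8 SPEs
rr_single = peak_gflops(4.0, 8, 8)
rr_double = peak_gflops(4.0, 4, 8)

for label, value in [
    ("PS3 single", ps3_single),
    ("PS3 double", ps3_double),
    ("PS3 double (optimized)", ps3_double_optimized),
    ("Roadrunner single", rr_single),
    ("Roadrunner double", rr_double),
]:
    print(f"{label}: {value:.2f} GFLOPS")
```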
P.S. ad hoc rewrite, based on my own Journal at Barrapunto (Spanish /.).
-
Re:The CELL processor is single precision
HPL, the code that is run for the Top500 number (or an equivalent implementation, optimized for the target architecture), is double precision. This is perhaps the most important rule of the benchmark: the calculation must be carried out in full double precision. The 1.026 Pflop/s number does represent a double-precision workload.
-
Re:Navigating the Nuances of Non-conformance
Firstly, I want to state that I have no problem with Excel as a tool for the accounting profession. They have been the primary target market for spreadsheets ever since the days of Visicalc and 123, and their feedback has ensured that Excel works well for their needs.
When it comes to banking, I don't think it's a good choice at the scientific modelling end (forecasting, derivative pricing). It's convenient to have one universal tool which can be used for both accounting and forecasting, but the fact is that most scientific calculations have to contend with numerical instabilities, and this cannot be hidden behind a simple user interface.
A classic example is the computation of variance, which if you want a banking application occurs as part of VAR modelling for example. Computing a variance as commonly done is an unstable process, which needs to be done in a non-obvious way.
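To make the variance point concrete, here is a small sketch comparing the textbook one-pass formula with one well-known stable alternative, Welford's running update. Welford's method is my choice for the illustration; the comment does not say which "non-obvious way" it has in mind. The function names and the sample data are invented for the example.

```python
def naive_variance(xs):
    """Textbook formula (sum of squares minus squared sum); prone to cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(xs):
    """Sample variance via Welford's running mean / sum-of-squared-deviations update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Small spread on top of a large offset: the true sample variance is 30.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
print("naive:  ", naive_variance(data))
print("welford:", welford_variance(data))
```

With data like this the naive formula subtracts two numbers of order 1e18 to recover a quantity of order 100, so its result can be badly off, while the running update works only with small deviations.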
Where I think it is dangerous to pick and choose improvements to the floating point standard is that the exact properties of this standard are also the basis for axioms of computer arithmetic, which in turn are used by numerical analysts to identify the properties of common algorithms. Nobody is going to analyse the behaviour of all algorithms on non-standard implementations, and the stabilizing practices identified as working for the standard may not work on variants.
Of course, Microsoft are simply meeting the demands of their users, but that just brings us back to the original point about choosing another tool for real calculations.
(2) Students of math and science should be taught not that there is one particular way that numerical computation is done, but rather should be taught why fixed point / BCD is different than floating point / IEEE 754 so that they can make informed decisions about which tools suffice in which circumstances.
There is often a serious difference in performance and accuracy between a hand rolled method, and a properly debugged and optimized implementation such as can be found on netlib. For historical reasons, those implementations are mostly Fortran and C.
While it's true that in principle one could choose a language and other aspects such as the floating point standard for each application, in practice I think it's much better to know about and rely on existing code that works. That also means interfacing with old languages, etc. and may well look like dogma to some extent.
-
Argonne Jet Lag Diet
This looks like it might be related to research (PDF) done a few years back by the Argonne National Laboratory on diet and jet lag. There's a summary of their diet here. It's more complex than just fasting. I've used it travelling to Israel, Russia and Burma, and it's worked well for me.
-
BLAS libraries
The history of the BLAS libraries dates back to 1979 (http://www.netlib.org/blas/faq.html#1.2); their netlib implementation probably contains some routines that have remained unchanged since then. They are still widely used.
-
Re:My Guess its at Netlib or at NIST
I was looking for some mathematical routines to port into Python and ended up poking around at http://www.netlib.org/ and http://www.nist.gov/ where there are huge repositories of mathematical functions, most written in Fortran.
One of the most interesting things after perusing much of the code I was looking for, was that instead of using integration routines for calculating things like Bessel functions, Hankel functions, and other differential equation related functions, they simply used look up tables and curve fitting.
BINGO! The math routines used to compute some special statistical functions in early versions of Excel, for example, area under the normal curve and its inverse, go back to Hastings approximations from the mid 50s. They are rational function approximations. I first saw them back in the 60s as cited in the National Bureau of Standards "Handbook of Mathematical Functions". People still use these approximations today.
-
My Guess its at Netlib or at NIST
I was looking for some mathematical routines to port into Python and ended up poking around at http://www.netlib.org/ and http://www.nist.gov/ where there are huge repositories of mathematical functions, most written in Fortran.
One of the most interesting things after perusing much of the code I was looking for, was that instead of using integration routines for calculating things like Bessel functions, Hankel functions, and other differential equation related functions, they simply used look up tables and curve fitting.
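As an illustration of that curve-fitting style, here is a sketch of the classic polynomial approximation to the area under the normal curve, of the kind attributed to Hastings and tabulated in the "Handbook of Mathematical Functions" (formula 26.2.17, with a stated absolute error below 7.5e-8). It is not taken from any particular netlib routine; the function names are made up, and the comparison against math.erf is only there to show how close the fit is.

```python
import math

# Coefficients as commonly quoted for the five-term approximation.
_P = 0.2316419
_B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)


def normal_cdf_hastings(x):
    """Approximate the standard normal CDF with a polynomial in t = 1/(1 + p|x|)."""
    t = 1.0 / (1.0 + _P * abs(x))
    poly = sum(b * t ** (k + 1) for k, b in enumerate(_B))
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    upper_tail = density * poly
    # Use symmetry for negative arguments.
    return 1.0 - upper_tail if x >= 0 else upper_tail


def normal_cdf_reference(x):
    """Reference value via the error function from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


grid = [i / 10.0 for i in range(-50, 51)]
max_err = max(abs(normal_cdf_hastings(x) - normal_cdf_reference(x)) for x in grid)
print(f"max abs error on [-5, 5]: {max_err:.2e}")
```

A handful of multiplications and one exponential buys roughly seven correct digits, which is why fits like this were attractive on slow machines.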
I suppose in the 1960's that made perfect sense as computers were so slow. But even today, I don't know why I shouldn't do the same thing. With EM and circuit simulation software it's GIGO. There are so many parasitics to model, that you can only ever get an approximation anyway, so what difference does it make if you get a tiny error from a look up table, vs. the "exact" integration routine value?
-
PVM - 1989
Forget SETI at home look at PVM. First release 1989 !!
http://en.wikipedia.org/wiki/Parallel_Virtual_Machine
Description here
http://www.netlib.org/pvm3/book/node17.html
Main channel is to pvmd. "backchannel" is the process to process communication.
--
The PVM system is composed of two parts. The first part is a daemon , called pvmd3 and sometimes abbreviated pvmd , that resides on all the computers making up the virtual machine. (An example of a daemon program is the mail program that runs in the background and handles all the incoming and outgoing electronic mail on a computer.) Pvmd3 is designed so any user with a valid login can install this daemon on a machine. When a user wishes to run a PVM application, he first creates a virtual machine by starting up PVM. (Chapter 3 details how this is done.) The PVM application can then be started from a Unix prompt on any of the hosts. Multiple users can configure overlapping virtual machines, and each user can execute several PVM applications simultaneously.
--
The general paradigm for application programming with PVM is as follows. A user writes one or more sequential programs in C, C++, or Fortran 77 that contain embedded calls to the PVM library. Each program corresponds to a task making up the application. These programs are compiled for each architecture in the host pool, and the resulting object files are placed at a location accessible from machines in the host pool. To execute an application, a user typically starts one copy of one task (usually the ``master'' or ``initiating'' task) by hand from a machine within the host pool. This process subsequently starts other PVM tasks, eventually resulting in a collection of active tasks that then compute locally and exchange messages with each other to solve the problem. Note that while the above is a typical scenario, as many tasks as appropriate may be started manually. As mentioned earlier, tasks interact through explicit message passing, identifying each other with a system-assigned, opaque TID.
--
-
Re:Is this that silly..
Let us understand something. There is a free (public domain) set of software that does this very thing. This library is basically a reimplementation of some of the same APIs, optimized for the AMD processor. Note that Intel has a similar offering already called Intel Math Kernel Library. The APL has always been a free download for evaluation purposes. Now, companies like The Mathworks can use the library for free.
-
Re:Good and bad news
Wow. The leagues of uninformed.
You think there are two things here, Matlab and Octave. Matlab is proprietary, and Octave followed it. It's as simple as that to you.
But wait, where does much of the meat in Matlab come from? Netlib. OPEN SOURCE! HAHAHAHA (Some of the Netlib code has license restrictions, some does not.)
http://www.netlib.org/
What does Matlab use for optimized BLAS routines to run super-quick on your Windows/Linux/Mac? ATLAS. Check out the Sourceforge page:
http://math-atlas.sourceforge.net/
The really important thing for me is that now that Octave is out there (actually, Octave has been around since about 1994), the explorations that I made in undergrad in Matlab can be done entirely in Octave now and forever. A good tool doesn't get worse as it gets old, it just gets used more.
If there was once a patent on hammers, there is no less usefulness in (but much lower prices on) hammers after the patent expires. Now we get much of Matlab's functionality completely free. Congratulations John Eaton, et al., for giving all who follow another tool to use freely to build bigger and better tools.
And as others have mentioned, if you don't like Matlab/Octave, use another tool that tried to accomplish the task of a high-level numerical tool in a different way. To me, however, I can code up an algorithm, test out concepts, and produce incredibly helpful visualizations in a matter of minutes using Matlab or Octave. Any tool this powerful has a learning curve to get over before it is so efficient, and I climbed that learning curve with Matlab, but I was able to use Octave immediately because I had already gone through that process using Matlab.
If you made a completely innovative new tool, it likely wouldn't be worth it for me to use for a while because I am so fast at coding Matlab/Octave, and the whole point in these tools is to make the programmer's job easy (if I wanted fast code execution, C or Fortran could be used). -
Good and bad news
The good news is that they are doing for free what the Matlab Co. has been charging (a lot!) for, which is distributing an API to use all those libraries the US Federal Government labs give away for free.
The bad news is that they are wasting their time using the Matlab syntax when there is a much better alternative for doing exactly the same thing. Python is universal: if there's anything you can do with a computer, the simplest way to do it is with Python, so why do it the hard way? -
Re:Openness is Fundamental to Mathematics
Anyways, when number-crunching gets tough, when you've got a huge amount of data to process, neither MATLAB nor Mathematica is gonna cut it anyway and, typically, engineers, physicists and applied mathematicians will drop down to that mighty workhorse called Fortran and use routines from something like LAPACK that are, indeed, open source.
In fact, this leads to a further point: not only is it important that this mathematical software be open source, but also that it not be released under a restrictive license such as the GPL. Releasing code under licenses like BSD, or putting code in the public domain, allows these openly tested, scrutinized and widely used routines to be incorporated even in commercial code, thereby guaranteeing quality throughout the user's experience and choices, whether he/she is using proprietary software or not. In this way, the complaints made in the article disappear. -
Try J. Comput. Phys. and J. Sci. Comput.
This isn't my area, but my Ph.D. is in applied and computational math, and I've spent a great deal of time solving first-order hyperbolic problems where characteristics cross. (In my context, level set methods where the zero contours can split and/or merge.)
For a hyperbolic problem like this, you'll want to be careful. Since the waves have variable propagation speeds, there's a possibility of shock formation (characteristics can cross). Think of Burgers' equation as a nice, tangible first-order analog. In such a case, it will be important to choose a numerical method that satisfies some kind of entropy condition to handle the shock. Similar things have been encountered in level set methods, where you solve an equation of the form f_t + V |grad(f)| = 0, where V is the variable speed of an interface that's represented as the zero contour of f.
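As an illustration of what an entropy-respecting method can look like for f_t + V |grad(f)| = 0, here is a minimal 1D sketch of a first-order upwind discretization of the kind commonly used for level set equations. The function names `upwind_step` and `evolve`, the grid, and the boundary treatment (one-sided differences copied at the ends) are assumptions made for the example, not anything prescribed in the comment above.

```python
import math


def upwind_step(f, speed, dx, dt):
    """Advance f_t + V |f_x| = 0 by one explicit first-order upwind step."""
    n = len(f)
    new = list(f)
    for i in range(n):
        # One-sided differences; at the ends reuse the only available one.
        d_minus = (f[i] - f[i - 1]) / dx if i > 0 else (f[1] - f[0]) / dx
        d_plus = (f[i + 1] - f[i]) / dx if i < n - 1 else (f[n - 1] - f[n - 2]) / dx
        v = speed[i]
        if v > 0:
            # Information arrives from the side where f is smaller.
            grad = math.sqrt(max(d_minus, 0.0) ** 2 + min(d_plus, 0.0) ** 2)
        else:
            grad = math.sqrt(min(d_minus, 0.0) ** 2 + max(d_plus, 0.0) ** 2)
        new[i] = f[i] - dt * v * grad
    return new


def evolve(f, speed, dx, dt, steps):
    """Apply upwind_step repeatedly; dt * max|V| / dx should not exceed 1."""
    for _ in range(steps):
        f = upwind_step(f, speed, dx, dt)
    return f


if __name__ == "__main__":
    dx = 0.01
    x = [(i - 150) * dx for i in range(301)]
    f0 = [abs(xi) - 0.5 for xi in x]  # zero contour at x = -0.5 and x = +0.5
    f = evolve(f0, [1.0] * len(x), dx, dt=0.005, steps=50)
    # With V = 1 the interface should move outward by about 0.25.
    print(f[225], f[150])
```

With unit speed and a signed-distance initial condition, the zero contour should move outward at speed V while the kink at the minimum flattens instead of oscillating, which is the behaviour the entropy condition is meant to select.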
Since second-order wave equations are so important in physics, you may want to check out the Journal of Computational Physics. You should probably also try the Journal of Scientific Computing.
As for visualization, you'll probably want to check out the "industry standards" Matlab and Mathematica. You could plot the time evolution of level surfaces of your wave equation, for instance. As for other software, I'd generally advise pulling together what you can find at netlib, although more cutting-edge stuff may require you to roll your own C/C++ or FORTRAN. But any of that stuff will be faster than running in Matlab or Mathematica, and it will take a whole lot less memory.
Best of luck, and have fun!
:-) -- Paul -
Re:How does it compare to a PS3?
Sorry for replying to myself, but I found an interesting paper on the subject. It seems that a PS3 should have an Rpeak of 14 Gflop/s for double-precision floating-point operations. It sounds to me as if, with a proper clustering solution, a four-node PS3 cluster would be significantly faster than Microwulf. And it would probably be smaller, too
:) -
Imagine blah blah cluster
Yeah, it runs linux.
-
Re:The title is misleading
202 of these 1.81 TFlops single precision http://www.pcper.com/article.php?aid=363
for BlueGene 367 TFlops double precision http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html#_Toc27885741
no thanks
-
Re:What kinds of apps does this make reasonable?
Atomistic simulations of biomolecules. Chain a bunch of those together, and you begin to simulate systems on realistic time scales. Higher-resolution weather models, or faster and better processing of seismic data for exploration. Same reason that we perked up when the R8000 came out with its (for the time) aggressive FPU. 125 MFlops/proc@75MHz was nothing to sneeze at 15 years ago. If they can get this chip into production in usable quantities, and if it has the throughput, then they're on to something this time.
Of course, this could just be a single-chip CM2; blazingly fast but almost impossible to program. -
Re:C didn't fail...but it has still not superseded FORTRAN
Well, there are still people who ride horses, so you could also say that cars haven't superseded horses, right? BTW, the all-capital spelling is definitely out these days. According to the latest standards, the language is now called "Fortran".
there are still issues in C that prevent C code from being as heavily optimized as FORTRAN
There are some C compilers that do not generate code that is as optimized as the best Fortran compilers. Also, a mediocre programmer can create a C program that doesn't optimize very well. After all, C is an extremely flexible language that will let you do almost anything you want. But I don't know of any issues that will not let a program written by a competent programmer in C be as optimized as the best Fortran program.
Having said all that, I must admit that I still use Fortran to some extent. For instance, I use LAPACK, in the FORTRAN-77 version. But it's compiled on my machine using the ATLAS optimizer, which is a C program that creates a version of LAPACK optimized for the machine you are using. You see, to get the ultimate performance they had to use C...
Pointer aliasing, it's true, could create problems for optimization in some cases, but that could be circumvented by specific "#pragma" declarations, as I mentioned in my GP post. But the most efficient optimizations, such as using SSE2 in Pentium CPUs or the equivalent instructions in AMD machines, go way beyond what can be used by simple syntax manipulations.
To use the full power of things like vector-like CPU operations you need to use an algorithm suited for the purpose. There is no language or compiler that will change a poorly designed program into a good one. Using the right algorithm is the first step in optimizing, I know of no compiler that will analyze a program and decide, "hmmm, this is actually a Fourier transform, let's use an FFT instead". -
Re:Why downplay it?
Perhaps the CPU manufacturers are desperately competing against the GPU manufacturers for developers of scientific applications? Nvidia just announced their 8800 series GPU's with support for BLAS, a foundation library for intensive engineering calculations.
Engineering, digital content creation, and gaming all use similar algorithms to model/visualize water, fire and smoke. However, engineering does require high precision (64-bit floats) while animation and gaming can get away with lower precision (16-bit floats).
It's really going to be overkill to use a quad-core CPU with 64-bit precision floating-point units to run a single-threaded game engine that does all the animation effects on the GPU. It might still be overkill even if a game engine had separate threads for handling player input, game server communication, physics and rendering. -
Re:Not fair comparison
Can't you fuckers use Google?
Cell in the PS3 has 7 SPE units at 3.2GHz, and an AltiVec unit on the PPE. Add in nVidia's
RSX which is about the same as their GeForce 8xxx series is supposed to be.
http://www.netlib.org/utk/people/JackDongarra/PAPERS/cell-linpack-2006.pdf
There's a good performance white paper on the Cell. Peak performance on the
units put together is something like 250 GFLOP/s. Real world performance is
about 100 GFLOP/s in the standard BLAS benchmark. Page 11 and 13 for pretty
graphics.
http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/Archive/PDF06/06-Cozzini_S_final.pdf
Core 2 Duo; I dunno. But there is an Opteron and Xeon for you. Check page 6 for
the pretties again.
5 GFLOP/s? I think 100 divided by 5 is about 20. Yes, it's the same test, BLAS
is used in LINPACK and LINPACK is how they benchmark supercomputers. It's a
pretty good test.
10 GFLOP/s maximum? It's still 10x the performance in best case for a top end Opteron,
for 8x the power consumption! Core 2 Duo might be a bit faster, but consider.. Core 2
Duo also comes in 1.6GHz chips for laptops (same performance scaled) whereas Cell for
the PS3 comes in ONE 3.2GHz version. You would have to find a nice, 3.2GHz chip from
Intel, with memory bandwidth that doesn't exist, to even get that down to a
single-digit multiplication factor. -
Re:Stream processing is NOT new
Kestrel is recent. I remember ICL's DAP systems back in the late 1970s/early 80s, and it looks as though it wasn't the only one around then: http://www.netlib.org/utk/lsi/pcwLSI/text/node11.html -
How about the FFT algorithm?
The Singleton implementation (and its contemporaries) makes the grade in my opinion. This was software that squeezed every drop of performance out of the primitive machines of the time and opened up many avenues of scientific research that had not been feasible before.
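For readers wondering why the FFT mattered so much on slow machines, here is a minimal sketch comparing a direct O(n^2) discrete Fourier transform with a textbook recursive radix-2 FFT, which needs on the order of n log n operations. This is not the Singleton code; the names `dft_naive` and `fft_radix2` are hypothetical, and the sketch assumes a power-of-two input length.

```python
import cmath


def dft_naive(x):
    """Direct evaluation of the DFT definition: n^2 complex multiplications."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT for power-of-two lengths."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # transform of the even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


if __name__ == "__main__":
    signal = [complex(i % 3, 0) for i in range(16)]
    slow = dft_naive(signal)
    fast = fft_radix2(signal)
    print(max(abs(a - b) for a, b in zip(slow, fast)))
```

Both functions compute the same transform up to rounding; only the operation count differs, and that gap grows quickly with n.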
-
Similar Situation
I've been in a similar situation, except that we chose Java rather than C++ as the "lower"-level language. For the higher-level language, we have both Matlab and R. We're also working in a research setting. Here's my experience...
- Prototypes are best done in a higher-level language, in this case Matlab. Hands down. You want to test new research methodologies fast, and you don't want to get bogged down by unnecessary programming constructs. Moreover, prototyping is the scientist's job (since he invents the algorithm, right?), and scientists are more familiar with Matlab than with C++ or Java.
- Be aware that prototypes are ALWAYS poorly structured. More often than not, they're spaghetti code and/or copy and paste. Prototypes are just prototypes: proofs of concept that a particular method works.
- Consequently, you may want a cadre of C++/Java developers to do the structuring. It's more like 4 low-level developers per scientist doing the prototyping. Oftentimes the scientists don't know low-level languages well, so it's the C++/Java developers' job to figure out the scientist's program. Of course, the scientist will have to explain how the algorithm / source code works. That said, reading Matlab / R is NOT as hard as people here say (they must be smoking crack; don't listen to them). The only trouble is familiarizing yourself with the Matlab / R API, which can be cryptic for some.
- With respect to libraries, Matlab / R have a throng of ready-made scientific libraries. Of course, for C++ you can use GSL, LAPACK, ATLAS, etc. The problem is that the library calls sometimes do NOT correspond one-to-one, so you'll need to tinker to find out how each library call behaves. Moreover, some operations, like matrix / vector operations, are very simple in Matlab / R, since matrices / vectors can be treated as if they were scalars. Be careful when porting those.
- In addition, you'll need to profile the library calls. Make sure you actually GAIN speed with them (or else you'll need to use something else). Speed is NOT the only concern, though. Accuracy is VERY important. You don't want a speedy library at the expense of accuracy. In scientific programs, this tradeoff is OFTEN NOT desirable. Make SURE the libraries have the same accuracy level. This is often grounds to dismiss some unknown library whose only claim is that it's fast. Losing one digit of accuracy is often a BIG issue in a scientific library. For example, a library that is accurate to 10^-16 may not be acceptable when the alternative is accurate to 10^-17. Think of simulations, which may run for millions upon millions of iterations. One digit of difference in accuracy often makes a HUGE difference in the final result. Therefore, it is OFTEN more desirable to obtain an open source library, where you can inspect the algorithm and point out places where a library call may lose accuracy.
- Familiarize yourselves with the many scientific algorithms that improve accuracy. Matlab and R autodetect pathological situations where accuracy can be lost. You'll need to do that manually in C++. For example, if you have a nearly singular matrix, you'll need SVD for better accuracy. In general, you can use QR decomposition. If accuracy is not really an issue, LU decomposition might be enough. Matlab / R can inspect the matrix automatically when you try to invert it and choose the appropriate algorithm. Pay attention to that. Make sure your C++ program behaves similarly.
- You MUST write A LOT of regression tests. If possible, make the prototype run with the same file format as the final product. If that isn't possible, convert the file format first and then confirm with the scientist that both are exactly the same. Make sure that, for all tests, both return the same numbers.
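To make the points about accumulated rounding error and regression tests concrete, here is a minimal sketch under stated assumptions: summing 0.1 a million times stands in for a long-running simulation, `math.fsum` is treated as the reference result, and `naive_sum`, `kahan_sum`, and `matches` are hypothetical helper names. The tolerance of 1e-12 is an arbitrary choice for the example.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation; rounding error grows with length."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carries the lost low-order bits along."""
    total = 0.0
    compensation = 0.0
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y  # what was rounded away in total + y
        total = t
    return total


def matches(reference, candidate, rel_tol=1e-12):
    """Regression check: does the port agree with the prototype's number?"""
    return math.isclose(reference, candidate, rel_tol=rel_tol, abs_tol=0.0)


if __name__ == "__main__":
    values = [0.1] * 1_000_000
    reference = math.fsum(values)  # correctly rounded sum, used as ground truth
    for name, result in [("naive", naive_sum(values)), ("kahan", kahan_sum(values))]:
        print(name, result, matches(reference, result))
```

A check like `matches` is the kind of assertion a regression suite would run for every output the prototype and the port are supposed to share, with the tolerance agreed with the scientist beforehand.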
-
No, this is why we have subroutine libraries
Although I agree with your point that crafting optimised assembly language routines is way beyond most users (and indeed a waste of time for all but an expert), there are certain "standard operations" that
(a) lend themselves extremely well to optimisation
(b) lend themselves extremely well to incorporation in subroutine libraries
(c) tend to isolate the most compute-intensive low-level operations used in scientific computation
SGEMM
If you read the article, you will find (among others) a reference to an operation called "SGEMM". This stands for Single precision GEneral Matrix Multiplication. It is one of the routines that make up the BLAS library (Basic Linear Algebra Subprograms) (see e.g. http://www.netlib.org/blas/). High performance computation typically starts with creating optimised implementations of the BLAS routines (if necessary hand-coded at assembler level), sparse-matrix equivalents of them, Fast Fourier routines, and the LAPACK library.
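For reference, the operation that the GEMM family computes is C := alpha*A*B + beta*C. Below is a minimal, unoptimised sketch of that operation on lists of lists. The function name `gemm` and its argument order are assumptions for illustration only; the actual BLAS routine also takes transposition flags and leading dimensions and works on column-major arrays, all of which are omitted here.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for dense row-major lists of lists."""
    m, k = len(a), len(b)
    n = len(b[0]) if b else 0
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[1.0, 1.0], [1.0, 1.0]]
    print(gemm(2.0, a, b, 3.0, c))
```

The triple loop hidden in this comprehension is exactly the "inner loop primitive" that tuned libraries reorganise around caches and vector units.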
ATLAS
There is a general movement away from optimised assembly language coding for the BLAS, as embodied in the ATLAS software package (Automatically Tuned Linear Algebra Software; see e.g. http://math-atlas.sourceforge.net/). The ATLAS package provides the BLAS routines but produces fairly optimal code on any machine using nothing but ordinary compilers. How? If you run the makefile for the ATLAS package, it may take 12 hours or so to compile (depending on your computer, of course; this is a typical number for a PC). In this time the makefile simply runs through multiple switches for the BLAS routines and runs test suites for all its routines at varying problem sizes. It then picks the best combination of switches for each routine and each problem size on the machine architecture on which it is being run. In particular, it takes account of the size of the caches. That's why it produces much faster subroutine libraries than those produced by simply compiling e.g. the BLAS routines with an -O3 optimisation switch thrown in.
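The "try every variant, time it, keep the fastest" idea can be sketched in a few lines. This is only an illustration of empirical tuning, not how ATLAS itself is implemented: the single tunable parameter here is assumed to be a cache-blocking size, and `matmul_blocked`, `pick_block_size`, and the candidate list are hypothetical.

```python
import time


def matmul_blocked(a, b, block):
    """Square matrix product computed tile by tile with the given block size."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def pick_block_size(n, candidates=(4, 8, 16, 32), repeats=3):
    """Time every candidate on this machine and return (fastest, all timings)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):  # keep the best of a few runs to reduce noise
            start = time.perf_counter()
            matmul_blocked(a, b, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best_block, timings = pick_block_size(48)
    print("fastest block size on this machine:", best_block)
```

The winner depends on the machine and on the problem size, which is the reason the search is run at install time instead of being fixed in advance. In an interpreted language the differences are small; the sketch only shows the shape of the search.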
Specially tuned versus automatic?: MATLAB
The question is of course: who wins? Specially tuned code or automatic optimisation? This can be illustrated with the example of the well-known MATLAB package. Perhaps you have used MATLAB on PCs and wondered why its matrix and vector operations are so fast? That's because for Intel and AMD processors it uses a special (vendor-optimised) subroutine library (see http://www.mathworks.com/access/helpdesk/help/techdoc/rn/r14sp1_v7_0_1_math.html). For SUN machines, it uses SUN's optimised subroutine library. For other processors (for which there are no optimised libraries) MATLAB uses the ATLAS routines. Despite the great progress and portability that the ATLAS library provides, carefully optimised libraries can still beat it (see the Intel Math Kernel Library at http://www.intel.com/cd/software/products/asmo-na/eng/266858.htm).
Summary
In summary:
- large tracts of scientific computation depend on optimised subroutine libraries
- hand-crafted assembly-language optimisation can still outperform machine-optimised code.
Therefore the objections that the hand-crafted routines described in the article distort the comparison or are not representative of real-world performance are invalid.
However
... it's so expensive and difficult that you only ever want to do it if you absolutely must. For scientific computation this typically means that you only consider handcrafting "inner loop primitives" such as the BLAS routines, FFT's, SPARSEPACK routines etc. for this treatment, and that you just don't attempt to do that yourself. -
Re:Why bother?
Why not just use LAPACK (http://www.netlib.org/lapack/), ESSL (http://scv.bu.edu/SCV/IBMSP/ESSL-howto.html) or GSL (http://www.gnu.org/software/gsl/)? Read Numerical Recipes if you want to understand what the code is doing.