sgi _may_ have several gigaflops/node next year? puh-leez! g4s have been doing this since _last_ year!
Maybe, maybe not. The G4's greatly hyped AltiVec vector unit can only do 32-bit floating point arithmetic. "Real" supercomputer sites tend to care much more about 64-bit floating point, and the G4's peak on that is one op per clock cycle (i.e. 400 MHz G4 peaks at 400 MFLOPs 64-bit). On the other hand, the Itanium can issue two 64-bit FP instructions every cycle, and both can be multiply/adds -- you can get up to 4 FP ops every cycle (i.e. 800 MHz Itanium peaks at 3.2 GFLOPS 4-bit). There's also the question of memory bandwidth; the G4 uses the same crappy PC100 memory system that current IA32 boxes use: 800 MB/s, 300-350MB/s sustained.
Disclaimer: I work at a "real" supercomputer site that has prerelease Itanium systems, and my code was one of the ones shown in the SGI booth. I'm not allowed to talk about Itanium performance numbers yet, unfortunately.
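For anyone who wants to check the peak numbers above: theoretical peak is just clock rate times FP ops per cycle, with a multiply/add counted as two ops. A minimal sketch in C; the function names are mine and purely illustrative, and this says nothing about sustained performance.

```c
#include <assert.h>

/* Theoretical peak floating-point rate in MFLOPS:
 * clock rate (MHz) times FP operations completed per cycle. */
double peak_mflops(double clock_mhz, double fp_ops_per_cycle)
{
    assert(clock_mhz >= 0.0 && fp_ops_per_cycle >= 0.0);
    return clock_mhz * fp_ops_per_cycle;
}

/* FP ops per cycle when every issued FP instruction is a multiply/add,
 * which counts as two operations (one multiply, one add). */
double fma_ops_per_cycle(int fp_instructions_per_cycle)
{
    return 2.0 * fp_instructions_per_cycle;
}
```

With the figures quoted above, peak_mflops(400.0, 1.0) is the 400 MHz G4 case and peak_mflops(800.0, fma_ops_per_cycle(2)) is the 800 MHz Itanium case (3200 MFLOPS, i.e. 3.2 GFLOPS).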
This is very useful for production systems with very large amounts of memory. For instance, Cray systems have a capability where bad bits in memory can be "flawed out" on the fly. Extending Linux to support the same kind of thing (especially in combination with ECC memory!) would be very useful for shops that have big memory requirements and need as many 9s of uptime as they can get.
Do 'they' ever do anything useful with these things? It's fun drooling, but I personally think something with this much power would be well-suited to intense graphics work or something equally resource-consuming. So far all I've heard is done with supercomputers is number-crunching.
That's because that's what supercomputers do -- crunch numbers. Mostly big physics, chemistry, and engineering problems. Where I work (a supercomputer center), that IS what's considered useful. The target audience for this thing is NSF-funded scientific researchers, and they (NSF) didn't pay $36M+ for a really nice Quake server...
That's the only way to know for sure which one's faster.
In my experience, SGI has a better HPC software development environment (read: Fortran compiler) than HP, but then again I haven't touched an HP system since we decommissioned our Exemplar 2.5 years ago.
Does anyone know if this system will support OPENMP?
Probably. The building blocks are ES-40s, which are 4-way SMP systems. The individual ES-40s are connected via Quadrics, which is a fairly fast (and *very* expensive) network fabric. One way you could write a parallel application for such a system is to break your problem up across boxes with MPI, and then use OpenMP to parallelize the loop structures within the program running on each box. I've written a couple codes this way, and it's not really any harder than doing pure MPI.
OTOH, you wouldn't be able to use more than 4 processors on it using just OpenMP (unless Quadrics does some funky shared-memory-between-boxes stuff I don't know about). To get larger processor counts for a purely OpenMP application, you'd need a large SMP or ccNUMA system like a Compaq GS320 (up to 32 CPUs), a Sun UE10k (up to 64 CPUs), or an SGI Origin (up to 512 CPUs).
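Here's roughly what the MPI-across-boxes, OpenMP-within-a-box split looks like, as a hedged sketch rather than code from any real application. To keep it compilable without an MPI installation, the rank and size are plain arguments; in an actual MPI program they would come from MPI_Comm_rank and MPI_Comm_size, and the per-box partial sums would be combined with something like MPI_Allreduce. The names block_range and local_sum are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Split indices [0, n) into `size` contiguous blocks and return the
 * half-open range [*lo, *hi) owned by `rank`. The first n % size ranks
 * each get one extra element. In an MPI code, rank and size would be
 * the values reported by the MPI library for this process. */
void block_range(size_t n, int rank, int size, size_t *lo, size_t *hi)
{
    assert(size > 0 && rank >= 0 && rank < size);
    size_t base = n / (size_t)size;
    size_t extra = n % (size_t)size;
    size_t r = (size_t)rank;
    *lo = r * base + (r < extra ? r : extra);
    *hi = *lo + base + (r < extra ? 1 : 0);
}

/* Sum x[lo..hi) using the CPUs inside one box. The pragma is ignored
 * (and the loop simply runs serially) if the compiler is not invoked
 * with its OpenMP option. */
double local_sum(const double *x, size_t lo, size_t hi)
{
    double sum = 0.0;
    long i;
    #pragma omp parallel for reduction(+:sum)
    for (i = (long)lo; i < (long)hi; i++)
        sum += x[i];
    return sum;
}
```

The point of the design is that the OpenMP part never needs to know about the other boxes: each process only parallelizes the loop over its own block.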
gnuplot is insanely powerful and flexible, because it's scriptable and supports a ton of output formats. I don't know if it can do the shaded area between two curves bit, though. It also has some limited support for 3D plots, although if you're serious about 3D you should really look at IBM's OpenDX.
grace is also a good choice if you like GUI plotting tools, but I'm so used to gnuplot that grace seems awkward...
The O2s with R10ks and R12ks are stuck as 32-bit machines because of the way the memory system was designed, according to a friend of mine who used to work for SGI as a field service critter. Apparently the O2 was originally designed for the R5k, and somebody in marketing decided it was a good idea to shoehorn the R10k in there... R10k O2s were sometimes slower than R5k O2s at the same clock!
I hadn't realized until I checked the SGI website today that an R12k-based O2 was ever made... I thought they'd been discontinued when the Intel-based VWs came out. The O2 kinda sucks as a 3D graphics machine, too; it was designed as a media editing workstation for audio and video, and the only hardware 3D support it has is Z-buffering...
Apple PowerMac with PowerPC G4 (with AltiVec extensions) would have been the optimum choice.
Oh really? Care to cite some real world benchmarks that show that?
The G4 and its vector unit are cute and all, but there are two big problems with using them for HPC/supercomputing applications:
Memory bandwidth: Memory bandwidth is probably the most important thing for single processor performance on scientific applications; a fast processor is useless if you can't keep it fed with data. The G4s use standard PC100 memory, which means that they have a theoretical peak memory bandwidth of 800 MB/s and sustained (measured) memory bandwidth in the 300-350 MB/s range. The 21264-based Alpha systems I've seen have sustained memory bandwidth in excess of 1 GB/s, which is a big part of why they scream for number crunching.
Compiler support: Scientific applications are generally written in Fortran; don't bother whining that Fortran's a crappy language, because for number crunching apps it isn't (and AFAIK FSL's main application is a big Fortran code anyway). As far as I know, nobody makes a Fortran compiler that can take loops and convert them directly into AltiVec instructions -- and without that AltiVec unit, a PowerPC's floating point performance is comparable to an Intel Pentium III at the same clock (i.e. not that great).
The G4 could be a great scientific platform... if these two problems get fixed. Till then it's an also-ran.
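If you want to see what "sustained memory bandwidth" means in practice, the usual approach is a STREAM-style kernel. The following is a rough sketch in the spirit of the STREAM triad, not the official benchmark; the function names and the 1 MB = 1e6 bytes convention are my assumptions. For a meaningful number the arrays need to be much larger than the caches, and an aggressive optimizer may need to be stopped from eliminating the repeated passes.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One pass of a STREAM-style "triad" kernel: a[i] = b[i] + q * c[i]. */
void triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Bytes moved by one triad pass, counting two loads and one store
 * of a double per element. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Run `reps` passes and return the measured rate in MB/s
 * (1 MB = 1e6 bytes). Uses clock(), i.e. CPU time, which is only a
 * reasonable stand-in for elapsed time in a single-threaded run.
 * Returns 0.0 if the run was too short for the timer to resolve. */
double triad_mb_per_s(double *a, const double *b, const double *c,
                      size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        triad(a, b, c, 3.0, n);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return triad_bytes(n) * reps / secs / 1.0e6;
}
```

Comparing the number this reports against the theoretical peak of the memory system is exactly the gap between "800 MB/s" and "300-350 MB/s" described above.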
Greg pointed to the PBS batch system which has a liberal license, but unfortunately requires user registration before downloading.
I'd just like to point out that there is an excellent GNU licensed queuing system that I've used in the past called Generic NQS. It certainly is worth a go if you're building clusters. Having said that, I'd like to see a product comparison of the various versions of queuing systems.
I looked at Generic NQS briefly when we started working on our cluster at OSC. My understanding of GNQS was that it did not deal well with multiple execution nodes and parallel jobs, whereas PBS does a pretty good job of this. You can also purchase a PBS support contract from MRJ, which is a big plus in a production HPC environment.
(a) A static compiler must compile for a lowest common denominator platform (typically Pentium). A JIT can determine the processor type at run-time and produce processor-tuned code.
That depends largely on your environment and how widely you plan to disseminate binaries. Where I work, we regularly compile codes where we need good performance with the highest possible optimization targeted to the specific CPU type on the machine, but we're hardly a typical shop.
Simple fact. For raw computation that doesn't incur array bounds checking, Java is as fast or faster today than C/C++. (And static compiling doesn't help you with the bounds checking penalty.)
That's a pretty bold assertion. What kind of "raw computation" applications were these: number crunching, DB access, GUI desktop apps, what? Also, what architecture or architectures were these comparisons done on? If you're talking number crunching, will it outperform a good optimizing Fortran 77 or Fortran 90 compiler? (Note: Neither g77 nor the Sun Fortran compilers constitute a "good optimizing Fortran compiler". I'm talking about something like the Portland Group's Fortran compilers for IA32 or the SGI MIPSpro Fortran compilers for MIPS. Compaq's Visual Fortran may or may not qualify; I've never used it.)
The reason I ask is, I work in high performance computing, and I deal with trying to optimize codes all the time. I find it difficult to believe that JIT compiling is the panacea you describe it as, but if you've got numbers to back it up I'd love to see 'em, if for no other reason than to dissuade our physicist users from trying to program in C++...
No mention of the IBM mainframes of the 60s and 70s, the DEC PDP and VAX series, Seymour Cray and his supercomputers, or the workstation explosion of the 80s. This seemed very focused on PCs to the exclusion of everything else. That's kind of sad, really; there's a hell of a lot more to computing than PCs.
In this case, you can reverse engineer a piece of proprietary, licensed software whose license terms prohibit that activity, and you can make your own DVD player. But you don't necessarily have that right under our current set of laws and are technically in violation of the license agreement and probably several intellectual property laws as well.
Show me the clause in a DVD's license which says you have to use a licensed player, as I have yet to see one. (I'm not trying to be argumentative here; I honestly haven't seen one.) If the license doesn't require this, how can the MPAA and the DVD CCA require it and have a legal leg to stand on?
AMD processors with SMP Linux, what a joke. Can you say PowerPC 7400 G4's with OS 10 or another UNIX variant.
If the version of OS X Server I saw last spring is any indication, OS 10 is a total non-competitor. It had serious problems even compiling fairly generic ANSI C code (lmbench, MPICH).
And the G4 is not all it's cracked up to be. There's not enough memory bandwidth on the PC100 bus to sustain anything close to the FP rates Motorola and Apple like to point at. There are also no vectorizing compilers for the PPC 7400; the Metrowerks compiler will do inline AltiVec assembler, but it doesn't recognize vectorizable loops automatically and it doesn't support the lingua franca of scientific computing (i.e. Fortran).
(Disclaimer: I work for Ohio Supercomputer Center but don't speak for them, yada yada yada...)
This seems to be aimed more at the high availability (HA) market than the high performance computing (HPC) market. Comparing with a Compaq Himalaya is *not* a way to win points with HPC centers, because HPC centers don't buy Himalayas -- they buy mostly various breeds of Crays, SGI Origins, and IBM SPs, with a smattering of Beowulf clusters and large Sun configurations as well. The Patmos site also doesn't talk about floating point performance, which the HPC centers consider critical.
The Patmos site never really describes their systems as "supercomputers" (although the phrase "super system" is used once or twice), so this seems like bad reporting and/or a misunderstanding of what a supercomputer really is on CNN's part.
(In case you're wondering what I consider a supercomputer, I personally think a super is anything capable of multiple GFLOPS that is used for scientific computations.)
Often, when pondering for no reason, I wonder how many "state of the art" programming techniques that have existed in CS (fuzzy logic, neural networks, hello world in every language) have been utilized by the meteorological sciences people?
Probably not many. This is not necessarily a bad thing; many "state of the art" programming techniques result in lousy performance, and part of the point of the weather simulations is to get things out as quickly as possible.
I'm not knocking their abilities for development of software to solve their problems, but I always come to a single issue: they are primarily meteorologists who have learned to program, as opposed to programmers whose primary focus is meteorology. What if they had an influx of people whose background is entirely programming? People who program because they focus on programming.
The problem with this is that people "who program because they focus on programming" usually don't have much background in the underlying physics that the weather models simulate. These weather models are essentially fluid dynamics simulations: big, nasty, coupled sets of nonlinear partial differential equations that have been approximated in some way to make them solvable. Most of the CS folks I know simply don't know enough about either the physics or the math needed to approximate the physics -- it's not something they're normally exposed to.
These models are typically written in Fortran -- not because the meteorology people are computing troglodytes, but because Fortran is still the best option for scientific computing. The issues for generating optimized Fortran code are very well understood; C and C++ are much more difficult because of all the pointer chasing. There's also a huge body of scientific libraries for Fortran that C and C++ simply don't have by virtue of not being around as long.
Now, it looks like I'm bashing CS people. I'm not, and in fact there is room for a lot of work from CS folks on front-end integration stuff. Here's what I mean: There are on the order of a half dozen model codes the NWS uses for forecasting. Each one generates Lord only knows how much data per run. Correlating all this data and presenting it in a cogent, easily understandable format (for the expert, not necessarily the layman) is something the scientific computing community in general really needs the CS folks for. Another thing CS faculty could do for the scientific computing community is teach more about code optimization methods for modern cache-based architectures (taking advantage of data locality for cache reuse, minimizing memory references, etc.). These topics usually aren't even touched upon in a CS curriculum except possibly a graduate level high performance computing class, and they really should be discussed in the introductory computer architecture classes.
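To make the data locality point concrete, here's the classic textbook example, written as a small C sketch (the function names are mine). Both routines compute the same sum over a matrix stored row-major; the only difference is the loop order, and therefore the stride of the memory accesses. On a cache-based machine the unit-stride version typically runs much faster for large matrices, although how much faster depends on the hardware and I'm not quoting any numbers here.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an nrows x ncols matrix stored row-major in m.
 * The inner loop walks consecutive addresses (stride 1), so every
 * element of each cache line fetched gets used before it is evicted. */
double sum_row_order(const double *m, size_t nrows, size_t ncols)
{
    assert(m != NULL);
    double s = 0.0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            s += m[i * ncols + j];
    return s;
}

/* Same sum with the loops interchanged. The inner loop now jumps
 * ncols elements between accesses, so for large matrices most
 * accesses touch a different cache line than the previous one. */
double sum_col_order(const double *m, size_t nrows, size_t ncols)
{
    assert(m != NULL);
    double s = 0.0;
    for (size_t j = 0; j < ncols; j++)
        for (size_t i = 0; i < nrows; i++)
            s += m[i * ncols + j];
    return s;
}
```

Note that Fortran stores arrays column-major, so in Fortran the advice is reversed: the first index should vary fastest in the innermost loop.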
I imagine a BSD variant would be best - still open source, but the TCP/IP stack is faster, so you'd probably lose less in inter-processor communication.
If you're running a private gigabit-class network (GigE, Myrinet, Giganet, etc.) and have a separate control network (typically Fast Ethernet), there's no reason to run TCP/IP over the high-speed network. In that case, you could bypass the TCP/IP stack entirely and have the message passing system (typically an MPI implementation) talk directly to the hardware -- the "user space"/"OS bypass" approach. This is what Myricom's GM and the various VIA implementations let you do. Most of the larger Beowulf cluster installations are going with something like this.
I must admit that I find it very surprising that they're going to the trouble of buying fast DEC Alphas and then connecting them with something as pokey as Fast Ethernet. I hope their RMHD and other calculations are pretty close to embarrassingly parallel (i.e. almost no IPC), or the network will definitely end up being a performance bottleneck.
DMF is an SGI/Cray product that does transparent file migration between disk and tape. We make pretty extensive use of it on our mass storage server (8 CPU Origin 2000 + ~1TB FC RAID + ~40TB tapes in an IBM 3494 robot).
Unfortunately, I don't think there's anything like that just yet for Linux. SGI's OpenVault may do some of the things you want, but it's mostly the device communication layer and not a complete solution.
I'd like to be able to turn off the starting of a new window from a hyperlink (i.e. ignoring the "target=new" attribute to the HTML tag). If I want to open something in a new window, I'll do it myself, thanks.
I'd like to be forewarned when a page with JavaScript is "mined" with something that fires off multiple top-level windows when you back out of or close the page.
(I'd like to see the "Open Link in This Window" option that was in the right-click menu in Netscape 3 return, as well. I used that a lot, and it was removed in Netscape 4 for reasons I've never understood.)
A supercomputer doesn't have what you'd consider an operating system. It's a front-end computer that does all I/O, provides the usual operating system services, and controls the supercomputer. Linux is perfectly practical for the front-end. It would be nice to see a Linux in there.
Bruce, what are you talking about? I suspect you haven't seen a Cray machine for quite some time.
Most of the current Cray vector machines (like our T90) have their own OS (UNICOS) and IO subsystem. The IO's in a physically separate box, like in a classical mainframe, but it's still part of the machine. There are generally a couple workstations (usually Suns) attached directly to the machine, but those are system consoles and monitoring stations. A Linux box might be appropriate for one of these monitoring stations, but that's about it. And if you think a Linux machine could handle the I/O that a Cray's capable of, you're insane. We're talking multiple GB/s.
The idea for the SV2 (as it was explained to us at a Cray User Group workshop last fall) is that it piggybacks on a MIPS-based SN1 (next generation SGI Origin ccNUMA machine). That implies IRIX (with features ported from UNICOS), not Linux. I doubt Linux on Intel or MIPS will be ready for the kind of prime-time SGI's going to be selling the SV2 for by the time they ship.
(Before you flame me for putting down Linux in this particular context, consider the following: I've been using Linux as my primary OS at home since '93, and I'm one of the guys working on the Beowulf cluster of SGI 1400Ls at OSC. I'm rooting for Linux too, but it's not always the right answer.)
Wow great, so you managed to cut and snip parts of my post without actually telling me what operating systems Linux has killed.
Coherent, for one. Possibly SGI's planned-at-one-point IRIX/IA64 as well. SCO's various offerings and Solaris/x86 aren't doing so hot either, at least in the area I'm in. I'll be surprised if SCO is still in business at the end of 2001.
you can get up to 4 FP ops every cycle (i.e. 800 MHz Itanium peaks at 3.2 GFLOPS 4-bit)
That should read "64-bit", not "4-bit". We're talking about the Itanium and not Star Bridge's FPGA-based machine, after all. :)
--Troy
...and it only took them, what? A year more than the rest of the industry?
--Troy
AMD processors with SMp Linux, what a joke. Can you say PowerPC 7400 G4's with OS 10 or another UNIX variant.
If the version of OS X Server I saw last spring is any indication, OS 10 is a total non-competitor. It had serious problems even compiling fairly generic ANSI C code (lmbench, MPICH).
And the G4 is not all it's cracked up to be. There's not enough memory bandwidth on the PC100 bus to sustain anything close to the FP rates Motorola and Apple like to point at. There are also no vectorizing compilers for the PPC 7400; the Metrowerks compiler will do inline AltaVec assembler, but it doesn't recognize vectoizable loops autmoatically and it doesn't support the linga franca of scientific computing (i.e. Fortran).
(Disclaimer: I work for Ohio Supercomputer Center but don't speak for them, yada yada yada...)
This seems to be aimed more at the high availability (HA) market than the high performance computing (HPC) market. Comparing with a Compaq Himilaya is *not* a way to win points with HPC centers, because HPC centers don't buy Himilayas -- they buy mostly various breeds of Crays, SGI Origins, and IBM SPs, with a smattering of Beowulf clusters and large Sun configurations as well. The Patmos site also doesn't talk about floating point performance, which the HPC centers consider critical.
The Patmos site never really describes their systems as "supercomputers" (although the phrase "super system is used once or twice), so this seems like bad reporting and/or a misunderstanding of what a supercomputer really is on CNN's part.
(In case you're wondering what I consider a supercomputer, I personally think a super is anything capable of multiple GFLOPS that is used for scientific computations.)
Often, when pondering for no reason, I wonder how many "state of the art" programming techniques that have existed in CS (fuzzy logic, neural networks, hello world in every language) have been utilized by the meteorlogical sciences people?
Probably not many. This is not necessarily a bad thing; many "state of the art" programming techniques result in lousy performance, and part of the point of the weather simulations is to get things out as quickly as possible.
I'm not knocking their abilities for development of software to solve their problems, but I always come to a single issue: they are primarily meteorologists who have learned to program as opposed to program and who's primary focus is meteorology. What if they had an influx of people who's background is entirely programming? People who program because they focus on programming.
The problem with this is that people "who program because they focus on programming" usually don't have much background in the underlying physics that the weather models simulate. These weather models are essentially fluid dynamics simulations: big, nasty, coupled sets of nonlinear partial differential equations that have been approximated in some way to make them solvable. Most of the CS folks I know simply don't know enough about either the physics or the math needed to approximate the physics -- it's not something they're normally exposed to.
These models are typically written in Fortran -- not because the meteorology people are computing troglodytes, but because Fortran is still the best option for scientific computing. The issues for generating optimized Fortran code are very well understood; C and C++ are much more difficult because of all the pointer chasing. There's also a huge body of scientific libraries for Fortran that C and C++ simply don't have by virtue of not being around as long.
Now, it looks like I'm bashing CS people. I'm not, and in fact there is room for a lot of work from CS folks on front-end integration stuff. Here's what I mean: There are on the order of a half dozen model codes the NWS uses for forecasting. Each one generates Lord only knows how much data per run. Correlating all this data and presenting it in a cogent, easily understandable format (for the expert, not necessarily the layman) is something the scientific computing community in general really needs the CS folks for. Another thing CS faculty could do for the scientific computing community is teach more about code optimization methods for modern cached-based architectures (taking advantage of data locality for cache reuse, minimizing memory references, etc.). These topics usually aren't even touched upon in a CS curriculum except possibly a graduate level high performance computing class, and they really should be discussed in the introductory computer architecture classes.
I imagine a BSD variant would be best - still open source, but the TCP/IP stack is faster, so you'd probably lose less in inter-processor communication.
If you're running a private gigabit-class network (GigE, Myrinet, Giganet, etc.) and have a separate control network (typically Fast Ethernet), there's no reason to run TCP/IP over the high-speed network. In tht case, you could bypass the TCP/IP stack entirely and have the message passing system (typically an MPI implementation) talk directly to the hardware -- the "user space"/"OS bypass" approach. This is what Myricom's GM and the various VIA implementations let you do. Most of the larger Beowulf cluster installations are going with something like this.
I must admit that I find it very surprising that they're going to the trouble of buying fast DEC Alphas and then connecting them with something as pokey as Fast Ethernet. I hope their RMHD and other calculations are pretty close to embarrassingly parallel (i.e. almost no IPC), or the network will definitely end up being a performance bottleneck.
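For a rough feel of why the interconnect matters, here is a back-of-the-envelope Python sketch. It is not from the original post: the message size is a hypothetical example, and the model counts only nominal wire time, ignoring latency and protocol overhead, so real numbers will be worse.

```python
FAST_ETHERNET_MBPS = 100   # nominal Fast Ethernet rate, megabits per second
GIGABIT_MBPS = 1000        # nominal rate of a gigabit-class link


def transfer_seconds(message_bytes, link_megabits_per_s):
    """Ideal wire time for one message, ignoring latency and overhead."""
    bytes_per_second = link_megabits_per_s * 1_000_000 / 8
    return message_bytes / bytes_per_second


if __name__ == "__main__":
    # Hypothetical message: one million 64-bit values exchanged per step.
    message = 8 * 1_000_000
    for name, rate in (("Fast Ethernet", FAST_ETHERNET_MBPS),
                       ("gigabit-class", GIGABIT_MBPS)):
        print(f"{name}: {transfer_seconds(message, rate):.3f} s per exchange")
```

If a code exchanges data like this every time step, the wire time is paid on every step; an embarrassingly parallel code barely pays it at all.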
But, I would expect that the performance from an 8 or 16 box cluster of G4's with Gigabit Ethernet would pretty much blow away a Beowulf cluster in both the performance and price categories.
I seriously doubt that. To use the AltiVec part of the G4 (which is what gives its absurdly high peak performance), you need to be either hand-writing PPC/AltiVec assembly code or using a vectorizing PPC/AltiVec compiler, and I have not heard of *any* of the latter. Also, the memory system on the G4 isn't much (if any) better than that on a standard Pentium III, which frankly sucks (~300MB/s). A Beowulf cluster comprised of Alphas with a Myrinet network will likely wipe the walls with a similarly sized G4 cluster with Gigabit Ethernet, and will cost about as much -- large GigE switches are expensive.
8 DS10 1Us (@$3k) + 8 Myrinet cards (@$1.4k) + 1 16-port Myrinet switch (@$4k) = $39.2k
8 G4s (@$2.5k) + 8 Gigabit Ethernet cards (@$0.7k) + 1 8-port Gigabit Ethernet switch (@$15k) = $40.6k
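The two totals above can be checked with a few lines of Python. This is just a sketch of the arithmetic; the function name is made up, and the prices are the ones quoted in the post (in thousands of dollars).

```python
def cluster_cost_k(nodes, node_price_k, nic_price_k, switch_price_k):
    """Total cluster cost in $k: one NIC per node plus a single switch."""
    return nodes * (node_price_k + nic_price_k) + switch_price_k


if __name__ == "__main__":
    # 8 DS10 1Us + 8 Myrinet cards + one 16-port Myrinet switch
    alpha_myrinet = cluster_cost_k(8, 3.0, 1.4, 4.0)
    # 8 G4s + 8 Gigabit Ethernet cards + one 8-port GigE switch
    g4_gige = cluster_cost_k(8, 2.5, 0.7, 15.0)
    print(f"Alpha + Myrinet: ${alpha_myrinet:.1f}k")  # $39.2k
    print(f"G4 + GigE:       ${g4_gige:.1f}k")        # $40.6k
```

Note how the switch dominates the GigE side: it is more than a third of the G4 cluster's total.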
So where's the Linpack #'s???
We're workin' on it. There should be something official announced at SC99 next week.
DMF is an SGI/Cray product that does transparent file migration between disk and tape. We make pretty extensive use of it on our mass storage server (8 CPU Origin 2000 + ~1TB FC RAID + ~40TB tapes in an IBM 3494 robot).
Unfortunately, I don't think there's anything like that just yet for Linux. SGI's OpenVault may do some of the things you want, but it's mostly the device communication layer and not a complete solution.
I'd like to be able to turn off the starting of a new window from a hyperlink (i.e. ignoring the "target=new" attribute to the HTML tag). If I want to open something in a new window, I'll do it myself, thanks.
I'd like to be forewarned when a page with JavaScript is "mined" with something that fires off multiple top-level windows when you back out of or close the page.
(I'd like to see the "Open Link in This Window" option that was in the right-click menu in Netscape 3 return, as well. I used that a lot, and it was removed in Netscape 4 for reasons I've never understood.)
Not too many computer science people program in it, but it's still *very* popular in the scientific computing community. --Troy
A supercomputer doesn't have what you'd consider an operating system. It's a front-end computer that does all I/O, provides the usual operating system services, and controls the supercomputer. Linux is perfectly practical for the front-end. It would be nice to see a Linux in there.
Bruce, what are you talking about? I suspect you haven't seen a Cray machine for quite some time.
Most of the current Cray vector machines (like our T90) have their own OS (UNICOS) and IO subsystem. The IO's in a physically separate box, like in a classical mainframe, but it's still part of the machine. There are generally a couple workstations (usually Suns) attached directly to the machine, but those are system consoles and monitoring stations. A Linux box might be appropriate for one of these monitoring stations, but that's about it. And if you think a Linux machine could handle the I/O that a Cray's capable of, you're insane. We're talking multiple GB/s.
The idea for the SV2 (as it was explained to us at a Cray User Group workshop last fall) is that it piggybacks on a MIPS-based SN1 (next generation SGI Origin ccNUMA machine). That implies IRIX (with features ported from UNICOS), not Linux. I doubt Linux on Intel or MIPS will be ready for the kind of prime-time SGI's going to be selling the SV2 for by the time they ship.
(Before you flame me for putting down Linux in this particular context, consider the following: I've been using Linux as my primary OS at home since '93, and I'm one of the guys working on the Beowulf cluster of SGI 1400Ls at OSC. I'm rooting for Linux too, but it's not always the right answer.)
Wow great, so you managed to cut and snip parts of my post without actually telling me what operating systems Linux has killed.
Coherent, for one. Possibly SGI's planned-at-one-point IRIX/IA64 as well. SCO's various offerings and Solaris/x86 aren't doing so hot either, at least in the area I'm in. I'll be surprised if SCO is still in business at the end of 2001.