Domain: beowulf.org
Stories and comments across the archive that link to beowulf.org.
Stories · 9
-
10 Years of Beowulf Clustering
Quirk writes "Wired News has a blurb celebrating the 10th birthday of the Beowulf cluster. Attendees recalled the initial fear and loathing the Beowulf project had to overcome. The Beowulf project takes its name from an epic poem penned circa 1000 A.D." -
Open Source Distributed Shell Tools?
ColonelForbin74 asks: "While some may assume that most larger server clusters run advanced / custom software (e.g., Beowulf, cfengine, OSCAR), many of those stuck in the not-research-this-site-runs-production world know this simply isn't the case. Many people like myself are working with medium-to-large scale clusters with little help other than shell for() loops and some SSH trusted keys. What application-level tools are out there that might help SysAdmin / AppSupport types like myself run commands across a given cluster, push files out, etc? In my desperation to have some sort of tool in my toolbox, I've actually created one. However, I have a hard time believing this is the best thing out there, and would appreciate all the ideas and links I can get!" -
Imagine a Beowulf Cluster of Penguin Computers
Pii writes "News.com is running a story about Penguin Computing acquiring Scyld Computing, a company founded by Donald Becker, of Linux Ethernet driver and Beowulf cluster fame. Becker will stay on as Penguin's Chief Technology Officer, and the companies claim they don't expect any layoffs as a result of the merger." -
Linux Clusters Finally Break the TeraFLOP barrier
cworley submitted - several times - this well-linked submission about a slightly boring topic - fast computers. "Top500.org has just released its latest list of the world's fastest supercomputers (updated twice yearly). For the first time, Linux Beowulf clusters have joined the teraFLOP club, with six new clusters breaking the teraFLOP barrier. Two Linux clusters now rank in the Top 10: Lawrence Livermore's "MCR" (built by Linux NetworX) ranks #5, achieving 5.694 teraFLOP/s, and Forecast Systems Laboratory's "Jet" (built by HPTi) ranks #8, reaching 3.337 teraFLOP/s. Other Linux clusters surpassing the teraFLOP/s barrier include: LSU's "SuperMike" at #17 (from Atipa), the University at Buffalo at #22 and Sandia National Lab at #32 (both from Dell), an Itanium cluster for British Petroleum Houston at #42 (from HP), and Argonne National Labs at #46 (from Linux NetworX), which reached just over the one teraFLOP/s mark with 361 processors. In the previous Top500 list compiled last June, the fastest Intel-based Netfinity 1024-processor clusters from IBM were sub-teraFLOP/s, and the University of Heidelberg's AMD-based "HELICS" cluster (built by Megware) held the top tux rank at #35 with 825 GFLOP/s." -
Ask Donald Becker
This is a "needs no introduction" introduction, because Donald Becker is one of the people who have been most influential in making GNU/Linux a usable operating system, and is also one of the "fathers" of Beowulf and commodity supercomputing clusters in general. Usual Slashdot interview rules apply, plus a special one for this interview only: "What if we made a Beowulf cluster of these?" is not an appropriate question. -
Update From Cray World
rchatterjee writes "Cray, the only mainstream recognizable name in supercomputing, has been busy lately. Their totally new MTA-2 supercomputer design will use an UltraSPARC-III-powered Sun Fire 6800 server just to feed data to the MTA-2's processor. They're also refocussing on vector supercomputers and are going to release their first new vector supercomputer since Tera Computing bought them, the SV-2, in 2002. And if that wasn't enough, they have a deal with API Networks to develop Alpha-processor-based Beowulf clusters of Linux machines that, as a cluster, will run the same operating system as Cray's T3E supercomputers. Seymour Cray would be proud. You can get a quick overview of all the latest Cray developments from this article on Cnet." -
Answers About The New NOAA Massive Linux Cluster
On May 23 we requested questions for Greg Lindahl, chief designer of the new NOAA Forecast Systems Laboratories massive Alpha Linux Cluster. Here are his answers. Fascinating stuff for people interested in big-time parallel computing.
Who Else?
(Score:4, Insightful)
by Alarmist
You've built a large cluster of machines on a relatively pea-sized budget.
Are other government agencies going to duplicate your work? Have they already? If so, for what purposes?
Greg:
There are a lot of government agencies building large clusters, such as the Department of Energy's Sandia National Lab, which has the 800+ processor CPlant cluster today, with another 1,400 processors on the way. Like FSL, they use their cluster for scientific computing. The well-known Beowulf clusters started within NASA, another U.S. government agency.
However, the Forecast Systems Lab (FSL) system is a bit different from these other clusters: it's intended to be a production-quality "turn key" supercomputer, and it contains all the things supercomputer users are used to, such as a huge robotic tape storage unit (70 terabytes of tapes) and a fast disk subsystem (a bandwidth of 200 megabytes/second). The FSL system is also much more reliable than your average cluster -- in its first three months of operation, it was up 99.9% of the time. During that time we had quite a few hardware failures (due to power supply problems), but no work was lost, because of our fault-tolerant software.
Beowulf in General
(Score:4, Interesting)
by BgJonson79
How do you think the new wave of Beowulf clusters will affect all of supercomputing, not just forecasting?
Greg:
The kinds of problems that scientists solve have different computational needs. In the mid-1970s, the most cost-effective machine to use for just about any problem was a Cray supercomputer. These days, desktop PCs are far cheaper per operation than the "big iron", which is why this interest in clusters has sprung up. The availability of production-quality commodity clusters like the FSL machine is a new development in the field.
IBM already sells IBM's idea of a commodity cluster; it uses IBM's RS/6000 business servers as building blocks. I think commodity clusters can deliver far more bang for the buck than an IBM SP supercomputer, but then again I am a cluster evangelist.
In the beginning...
(Score:5, Interesting)
by zpengo
How did you come to be the project's chief designer? I'm curious to know the background of anyone who gets to work on such an interesting project.
Greg:
Well, let's see: I'm a dropout from an Astronomy PhD, and for fun I dress up in funny clothes (I'm the one in yellow) and play the hurdy gurdy. I've only taken one computer science class since I started college. I assure you that you're never going to meet anyone much like me in this field.
Seriously, I've worked in scientific computing for quite a while, and I've had a chance to work with a lot of people and learn from them. I also learned quite a bit about distributed systems while working on IRC and, later, the Legion distributed operating system. The art of designing a system like this is understanding the customer's needs, understanding what solutions are possible, and understanding what can actually be delivered, be made reliable, and hit the budget.
In addition, it's worth pointing out what this sort of project involves. Most of the interesting development parts are done by other people. Compaq designed the Alpha processor, and they and legions of Linux hackers provided Linux on the Alpha. Compaq supplied their extremely good compilers (FSL mostly uses Fortran.) Myricom supplied the interconnect and an MPI message-passing library which was optimized for their interconnect. HPTi provided the software glue that turned all this into a complete, fault-tolerant system. Without all these great building blocks, we would never have been able to produce this system.
The Future of the Control Software
(Score:5, Interesting)
by PacketMaster
I built a Beowulf-style cluster this past semester in college for independent study. One of the biggest hurdles we had was picking out a message passing interface such as MPI or PVM. Configuring across multiple platforms was then even worse (we had a mixture of old Intels, Sun SPARCs and IBM RS/6000s). What do you see in the future for these interfaces in terms of setup and usage, and will cross-platform clusters become easier to install and configure in the future?
Greg:
We provided an easy-to-use set of administrator tools so that the Forecast Systems Lab (FSL) cluster can be administered as if it were a single computer. This is fairly difficult to do if you have a big mix of equipment, but the FSL system will never become that complex. There's already been a lot of development of programs for administering large clusters of machines; they just tend to not get used by other people. I'll admit that I'm part of that problem; I took some nice ideas from other people's tools, added some of my own, and re-invented the wheel slightly differently from everyone else.
Beowulf Alternatives?
(Score:5, Interesting)
by vvulfe
Before deciding on a Beowulf cluster, what different options did you explore (Cray? IBM?), and what motivated you to choose the Beowulf System?
Additionally, to what would you compare the system that you are planning to build, as far as computing power is concerned?
Greg:
The company I work for, HPTi, is actually a systems integrator, so we didn't decide to go out and build our own solution until we had checked out the competition and thought they didn't have the right answer. For the computational core of the system, Alpha and Myrinet were much more cost effective than the Cray SV-1, the IBM SP, and the SGI O2000. A more cost-effective machine gives the customer more bang for their buck.
I'd compare the system that we built to the IBM SP or the Cray T3E, as far as computing power is concerned. Both are mostly programmed using the same MPI programming model that FSL uses, which is the main programming model that we support on our clusters.
Biggest whack in the head?
(Score:5, Insightful)
by technos
Having built a few small ones, I got to know quite a bit about Linux clusters, and about programming for them. Therefore, this question has nothing to do with clusters.
What was the biggest 'WTF was I thinking' on this project? I'd imagine there was a fair amount of lateral space allowed to the designers, and freedom to design also means freedom to screw up.
Greg:
We actually didn't make that many mistakes in the design. We had some wrong guesses about when certain technology was going to be delivered -- the CentraVision filesystem (more about that below) for Linux arrived late, and we had to work with Myricom to shake out some bugs in their new interconnect hardware and software. Our biggest problem with our stuff was actually getting the Ethernet/ATM switches from Fore Systems to talk to each other!
Imagine ...
(Score:4, Interesting)
by (void*)
... a beowulf of these babies - oh wait! :-)
Seriously, what was the most challenging of the maintenance tasks you had to undertake? Do you anticipate a trade-off point where the number of machines makes maintenance impossible? Do you have any pearls of wisdom for those of us just involved in the initial design of such clusters, so that maintaining them in the future is less painful?
Greg:
Hardware maintenance of the FSL machine actually isn't hard at all. If a computational node fails, we have a fault tolerance daemon which removes the failed node from the system and restarts the parallel job that was using that node. The physical maintenance of a few hundred machines actually isn't so bad; these Alphas came with three-year on-site service from Compaq. (Hi, Steve!)
More interesting than hardware maintenance is software maintenance. You can imagine how awful it would be to install and upgrade 276 machines one by one. Instead, we have an automated system that allows the system admin to simultaneously administer all the machines. We suspect that these tools could scale to thousands of nodes; after all, they're just parallel programs, like the weather applications that the machine runs.
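As a rough illustration of what "simultaneously administer all the machines" can look like, here is a minimal Python sketch that fans one command out to many hosts in parallel and collects the results. It is not HPTi's tool: the function names `run_on_all`, `ssh_runner` and `failed_hosts` are made up for this example, and it assumes plain `ssh` with key-based login is available on every node.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command):
    """Run `command` on `host` over ssh; return (exit_code, combined_output).

    Assumes key-based login; BatchMode stops ssh from prompting for a password.
    """
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def run_on_all(hosts, command, runner=ssh_runner, max_workers=32):
    """Run `command` on every host concurrently; return {host: (code, output)}."""
    hosts = list(hosts)

    def safe(host):
        try:
            return runner(host, command)
        except Exception as exc:  # one broken node must not abort the sweep
            return 255, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with hosts.
        return dict(zip(hosts, pool.map(safe, hosts)))


def failed_hosts(results):
    """Hosts whose command exited non-zero, sorted for stable reporting."""
    return sorted(host for host, (code, _) in results.items() if code != 0)
```

The `runner` argument is injectable so the fan-out logic can be exercised without a real cluster; something like `failed_hosts(run_on_all(node_names, "uptime"))` would then list the nodes that need attention.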
Question about maintenance.
(Score:5, Interesting)
by Legolas-Greenleaf
A major problem with using a beowulf cluster over a single supercomputer is that you now have to administer many computers instead of just one. Additionally, if something is failing/misbehaving/etc., you have to determine which part of the cluster is doing it. I'm interested in a] how much of a problem this is over a traditional single machine supercomputer, b] why you chose the beowulf over a single machine considering this factor, and c] how you'll keep this problem to a minimum.
Besides that, best of luck, and I can't wait to see the final product. ;^)
Greg:
You haven't described a problem, you've described a feature.
We've provided software that allows administration of the cluster as if it were one machine, not many. This software also allows FSL to test new software on a portion of the machine, instead of taking the whole thing down. The software on the machine can also be upgraded while the machine is running, instead of requiring downtime.
Since the hardware is fairly simple, it's actually quite easy to find a misbehaving piece of hardware. And in this kind of system, a hardware failure only takes out a small portion of the machine.
For example, on an SGI O2000 or similar large shared-memory computer, a single CPU or RAM chip failure takes out the entire machine. The interconnect on an O2000 is not self-healing like the interconnect we used, Myrinet. These features make a cluster more reliable than a "single machine".
Why alpha?
(Score:5, Insightful)
by crow
Why did you choose Alpha processors for the individual nodes? Why not something cheaper with more nodes, or something more expensive with fewer nodes? What other configurations did you consider, and why weren't they as good?
Greg:
We did a lot of benchmarking before settling on Alphas for this particular system -- in general we're processor agnostic, happily using whatever gives the highest performance for each customer. We could have bought more nodes if we had gone with Intel or AMD, but the total performance would have been much lower for this customer.
The Future of Scientific Programming?
(Score:5, Interesting)
by Matt Gleeson
The raw performance of the hardware being used for scientific and parallel programming has improved by leaps and bounds in the past 10-20 years. However, most folks still program these supercomputers much the same way they did in the 80's: Unix, Fortran, explicit message passing, etc.
You have worked in research with Legion and in industry at HPTi. Do you think there is hope for some radical new programming technology that makes clusters easier for scientists to use?
If so, what do you think the cluster programming environment of tomorrow might look like?
Greg:
Actually, at the end of the 1980s, Unix was new in the supercomputing scene, and most sites still used vector machines. It's only in the 1990s that microprocessors and MPI message-passing have become big winners. And that's because of price-performance, not because it's easier to use than automatic vectorizing compilers. Ease of use for supercomputers reached its peak around 1989.
I do think there's hope of new approaches, however. One great example is the SMS software system developed at FSL. This software system is devoted to making it easy to write weather-forecasting style codes, and involves adding just a few extra lines of source code to parallelize a previously serial program. The result can sometimes scale efficiently to hundreds of processors and can still run on only one processor, and FSL has enough experience with non-parallel-programming users to know that they can change working programs and end up with a working program. (If you've ever heard of HPF, then this is somewhat like HPF, except it actually works.)
Today, the best programming environments are ones that hide message-passing, either in specialized library routines or using a preprocessor approach like SMS. By the way, Legion allows you to program distributed objects with minimal source code changes. I expect more of the same thing in the future.
My crystal ball isn't good enough to tell me what the next revolutionary change will be. I'm actually pretty happy with the evolutionary changes I've seen recently.
Job management
(Score:4, Interesting)
by gcoates
One of the weaknesses for beowulfs seems to me to be a lack of decent (job) management software. How do you split the cluster's resources? Do you run one large simulation on all the CPUs, or do you run 2 or 3 jobs on 1/2 or 1/3 of the available CPUs?
Is there provision for shifting jobs onto different nodes if one of them dies during a run?
Greg:
We use the PBS batch system to manage jobs; it handles splitting the cluster resources among the jobs. At FSL, there are typically 10+ jobs running at the same time; the average job uses around 16 out of the 264 compute nodes.
If a compute node dies during a run, an HPTi-written reliability daemon marks the dead node as "off-line" and restarts the job. The user never knows there was a failure.
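The mark-off-line-and-restart behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the HPTi daemon: `run_with_restart` and the `run_job` callback (which returns `None` on success or the name of the node that failed) are hypothetical names invented for this example.

```python
def run_with_restart(job, nodes_needed, online, run_job, max_attempts=5):
    """Run `job` on healthy nodes, restarting it whenever a node dies.

    `online` is the set of node names believed to be up.
    `run_job(job, nodes)` returns None on success, or the failed node's name.
    Returns (nodes_used, offline_nodes, attempts).
    """
    offline = set()
    for attempt in range(1, max_attempts + 1):
        available = sorted(online - offline)
        if len(available) < nodes_needed:
            raise RuntimeError("not enough healthy nodes left for this job")
        nodes = available[:nodes_needed]
        failed = run_job(job, nodes)
        if failed is None:
            return nodes, offline, attempt
        # Mark the dead node off-line so the restarted job avoids it.
        offline.add(failed)
    raise RuntimeError("job kept failing; giving up")
```

From the user's point of view only the final, successful run is visible, which matches the "never knows there was a failure" behaviour; a real system would also have to checkpoint or requeue through the batch system, which this sketch leaves out.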
Weather forecasting in general.
(Score:5, Interesting)
by Matt2000
Ok, a two parter:
As I understood it, weather models are a fairly hard thing to paralleliz (how the hell do you spell that?) because of the interdependence of pieces of the model. This would seem to me to make a Beowulf cluster a tough choice, as its inter-CPU bandwidth is pretty low, right? And that's why I thought most weather prediction places chose high end super-computers, because of their custom and expensive inter-CPU I/O?
Greg:
Weather models are moderately hard to parallelize; in order to process the weather in a given location, you need to know about the weather to the north, south, east, and west. For large numbers of processors, this does require more bandwidth than fast ethernet provides, and that's why we used the Myrinet interconnect, which provides gigabit bandwidth, and which scales to thousands of nodes with high bisection bandwidth, unlike gigabit ethernet.
As far as disk I/O goes, yes, most clusters are fairly weak at disk I/O compared to traditional supercomputers from Cray. We are using the CentraVision filesystem from ADIC along with fibre channel RAID controllers and disks. This is more expensive than normal SCSI or IDE disks, but provides much, much greater bandwidth for our shared filesystem.
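The north/south/east/west dependence mentioned in the answer above is what forces neighbouring processors to trade boundary data on every time step. The following Python sketch shows the idea on a toy grid: it is not weather code, and `step` and `step_decomposed` are names invented here. The grid is cut into horizontal strips, and each strip is handed one extra "halo" row from each neighbour (the data that would travel over the interconnect) before it is updated.

```python
def step(grid):
    """One 4-neighbour averaging step; cells on the outer edge stay fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            new[r][c] = (grid[r - 1][c] + grid[r + 1][c]
                         + grid[r][c - 1] + grid[r][c + 1]) / 4.0
    return new


def step_decomposed(grid, parts):
    """The same step, computed strip by strip as separate nodes would do it."""
    rows = len(grid)
    bounds = [rows * p // parts for p in range(parts + 1)]
    out = []
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        halo_lo = max(lo - 1, 0)      # one row "received" from the strip above
        halo_hi = min(hi + 1, rows)   # one row "received" from the strip below
        local = [row[:] for row in grid[halo_lo:halo_hi]]
        updated = step(local)
        # Keep only the rows this strip owns; the halo rows are discarded.
        start = lo - halo_lo
        out.extend(updated[start:start + (hi - lo)])
    return out
```

Each strip needs only two rows from its neighbours per step, however many rows it owns, so communication grows with the strip boundary while computation grows with the strip area. That ratio is why the interconnect starts to dominate as the same grid is spread over more processors.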
Second part: Is weather prediction getting any better? Everything I've read about dynamic systems says that prediction past a certain level of detail or timeframe is impossible. Is that true?
Greg:
The quality of a weather prediction depends on a lot of things: the quality of the input data, which has gotten a lot better with the new satellites and other data collection systems recently deployed; the speed of the computer used to run the prediction; the quality of the physics algorithms used in the program, which have to get better and better as the resolution gets finer and finer; and the expertise of the human forecaster who interprets what comes out of the machine. All of these areas have limits, and that's why forecasts have limits.
What about a dnet type client?
(Score:5, Interesting)
by x0
I am curious as to whether (no pun intended...:)) or not you have ever done any testing to see if a distributed.net type environment would be useful for your type of work?
It seems to me that there are more than a few people who are willing to donate spare cpu cycles for various projects. At a minimum, you could concentrate on the client side binaries and not worry as much about hardware issues.
Greg:
Most supercomputers, like the FSL system, are in use 100% of the time doing real work. The biggest providers of cycles to distributed.net are desktop machines, which aren't used most of the time. Running distributed.net type problems on the FSL cluster is a bit of a waste, since the FSL cluster has a lot more bandwidth than distributed.net needs.
---------------
In closing, I'd like to thank Slashdot for interviewing me, and I'd like to point out that I got first post on my own interview -- perhaps the only time that this will ever happen in the history of the Universe?
-
Feature:Beowulf, Beyond the Hype
Michael Eilers has written a sort of introduction to Beowulf, what it does, what it doesn't do, and why we should care. It really is a sort of quickie distributed computing FAQ that many of you might enjoy. So hit the link below and find out.
The following is a feature by Slashdot Reader Michael Eilers
Beowulf beyond the Hype
A Quickstart to the Beowulf Concept
During the last weeks the Beowulf project got a lot of attention in the PC press and even on Slashdot. With Red Hat's Extreme Linux CD, the relevant mailing lists show an increasing number of newbie questions. Unfortunately, the information Red Hat provides on their Extreme Linux web pages is less than informative and full of hype. This may result in disappointed users. It seems appropriate to make some comments on hardware and software and give some guidance for the very beginner.
The name Beowulf stems from an Old English tale and was the name of the first example of this class of computers. In fact, a Beowulf is nothing more than a local computer network. You might say: I have a small network in my flat (e.g. an old 486 connected to my newer machine), do I have a Beowulf? The answer is yes. You already own the hardware to start. Even if your connection is via PLIP/SLIP, you can call your construction a Beowulf as soon as ping is successful. Forget all the hype about expensive special networking stuff like switches, Myrinet or SCI. For some tasks it is helpful, but others won't benefit. A 10/100Mb connection seems to be sufficient for a serious start.
A Beowulf is not a solution for all of your speed problems. Building a Beowulf does not have the same effect as, e.g., increasing the clock speed. With a Beowulf you won't see a speedup of your daily software, and the class of software that is already adapted to Beowulfs can have very different speedups.
The hardware part is nothing more than connecting PCs with standard networking hardware. The main idea is to make your PCs talk to each other. The most common solution is message passing. There are two main ways of doing message passing: PVM and MPI. The decision between them is mainly a matter of taste (see here for a comparison paper). I will focus on PVM, but keep in mind that MPI is as good. You can get that stuff here. The pvm3.x.x.tar.gz package has a long history and is rock solid stuff. Since 1993 I have installed it on half a dozen unices and never met a major problem. After unpacking and compiling, play around (yes, there is a "hello world" example). After playing with the examples, you will see that the main commands are pvm_send and pvm_recv. If you think that you understand the examples, start your own programming. If you know what matrix multiplication is, try to implement a parallel version. This is an instructive example. If you have problems, you may ask for a debugging utility. Try to get xpvm-1.2.5.tar.gz from the above URL. It's not a debugger, but it visualises the behaviour of your parallel code. The whole thing may take you two or three evenings. After that you know the basics of the Beowulf concept of making a pile of PCs look like one machine.
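To give a feel for the matrix multiplication exercise before touching PVM, here is a small Python sketch of the same master/worker pattern. It is only an analogy and makes no PVM calls: threads stand in for the tasks running on separate PCs, and `queue.Queue` objects stand in for the send and receive operations. The names `worker` and `parallel_matmul` are invented for this example.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive (row_index, row, B) messages and send back (row_index, result)."""
    while True:
        msg = inbox.get()            # blocking receive
        if msg is None:              # shutdown message from the master
            break
        i, row, b = msg
        result = [sum(row[k] * b[k][j] for k in range(len(b)))
                  for j in range(len(b[0]))]
        outbox.put((i, result))      # send the finished row to the master


def parallel_matmul(a, b, n_workers=3):
    """Master: hand out one row of A per message, then gather rows of C."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i, row in enumerate(a):
        inbox.put((i, row, b))
    c = [None] * len(a)
    for _ in a:
        i, result = outbox.get()     # results may arrive in any order
        c[i] = result
    for _ in threads:
        inbox.put(None)              # tell every worker to stop
    for t in threads:
        t.join()
    return c
```

Because rows come back in whatever order the workers finish, each message carries its row index; the same bookkeeping is needed when the workers are real machines. Shipping the whole of B with every row keeps the sketch short, whereas a real message-passing version would send B to each worker once.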
Now you may have some questions:
Q: Sounds interesting, but I don't have a network at home. What can I do?
A: You have a network (the loopback device) in your Linux box. This means that you can install PVM/MPI and play with it. Of course you won't see a speedup. :-(
Q: I have access to a computer pool but I'm only a common user.
A: You don't have to be "root". You can install PVM/MPI as a normal user and transform this pool into your personal Beowulf.
Q: I'm not a C programmer, but I use [Perl|Tcl|Python]?
A: There are interfaces available at the PVM home page (MPI??).
Q: I'm not a programmer. Are there interesting applications?
A: I'm sure there are plenty of applications that exploit the power of Beowulfs. Most of the stuff lives in the academic environment, and this means that availability and quality differ. I use, e.g., GAMESS, a quantum chemistry program that uses MPI. Maybe one application needs to be specially mentioned. There are two Beowulf-ready patches for the famous POVray ray-tracing program. PVMPOV is more flexible but less robust, and FLY3 is robust but a little inflexible. If you use POVray very often and play with the idea of buying a PII[34]00 MHz, you may rethink this idea once you have checked the povbench results: the fastest rendering was done with a message passing version of POVray.
Q: If you state that building and using Beowulfs is that easy, why aren't there more Beowulfs?
A: I don't know why there aren't more, but I think this situation will change.
Q: I'm a little confused about the many packages that allow computing on Beowulfs?
A: Indeed there is a whole zoo of packages for programming networked workstations. For a first attempt you don't need them, but some of them solve special problems. For an overview of the important ones, check out the "Linux Parallel Processing HOWTO" by Hank Dietz.
Q: Is there a PVM/MPI version for Windows95?
A: Yes, there are Win32 versions of PVM and MPI, but who cares. In fact, every (pseudo)multitasking operating system with network support can in principle be used to build a Beowulf (e.g. there is a Win 3.x port for PVM).
Q: Where can I get more information?
A: As you may have seen from the above, the message passing part is the tricky stuff. You will find links to books and tutorials for parallel programming on the PVM/MPI home pages. Hardware-related information at an introductory level can be found in the "Beowulf HOWTO" by Jacek Radajewski and Doug Eadline, and a comprehensive overview of hardware and software in the "Linux Parallel Processing HOWTO" by Hank Dietz.
Q: Do I need the Beowulf software packages from the Beowulf project at NASA?
A: If you use Linux, then you probably use one of the network drivers developed by Donald Becker. So the Beowulf project is already at your home and in this sense necessary. The rest of NASA's Beowulf software provided for use with clusters helps you to manage a large cluster, but it's not necessary, probably not the first step to take, and beyond the scope of a quickstart. Even the suits at NASA have realized that Beowulfs are a powerful tool, but the shutdown of the Beowulf web pages is like preventing the production of cars by closing a horn factory.
Q: If you can connect local PCs to look like one computer, why not couple computers via the Internet into a supercomputer?
A: Standard message passing software uses communication protocols that are very sensitive to packet loss. But there are activities in this area. Look for the keywords "metacomputing" or "hypercomputing".
Michael Eilers
-
Editorial:Towards World Domination
Chris Tyler has written an excellent piece examining the recent Gartner Group article we mentioned yesterday, and discussing what Linux needs to do in order to achieve Linus' vision of Total World Domination. It's an excellent piece worth your time.
Competing: Pushing Our Products Towards Total World Domination
Chris Tyler, Global Proximity Corporation, June 4, 1998
I recently read a Gartner Group report on free operating system software which was mentioned on Slashdot. It was interesting reading, and although I bristled at some of the assumptions and conclusions, it contained a number of valid points.
Many of us recognize the value of the free/open source model and would prefer to see it become a prominent model for software development and distribution. Our reasons vary widely, but among other things, it's simply more pleasant to work with open source products-- if I have to administer a system, I'd rather administer one that works well and that I can fix myself if something breaks.
So let's take some ownership here. We have the license to the source code, the rights to distribute the software, we've contributed code and documentation and tech support. Let's call open source operating systems "our products" and view the open source community as an entity that is in competition with the proprietary OS vendors.
If we want the open source model to prevail outside of its existing domain (mentioned in the report as "academia, application development, Web servers"-- that is, technical areas and the Internet), and assuming that the paper is valid, then we need to address the issues present in the paper. Here I am primarily addressing the Linux space, because it appears to be the free OS with the largest installed base, but the comments could be applied to our other OS products as well.
Most of the issues raised stem from this paragraph:
"Unix systems at free or minimal charge will lack the performance tuning, scalability and hardware platform support to make them suitable for large commercial applications through 2002 (0.9 probability)."
This statement surprised me in part. Linux appears to be a leader in scalability (from Itsy to Beowulf), has solid hardware support (in many cases, at least as strong as NT), and matches or outperforms other operating systems without tuning (e.g., SAMBA serving).
Looking closer at the report's arguments, though, is revealing. The authors suggest that Linux is weak in the areas of:
- (a) driver support (for newer or proprietary devices, this is indisputably true);
- (b) SMP beyond 4-way support (this is debatable);
- (c) NUMA support (none);
- (d) distributed systems and network management (e.g., OpenView, TME, Unicenter);
- (e) applications development;
- (f) performance tuning for high levels of scalability (">500 concurrent OLTP users").
Items (a) and (e), driver and application development, are somewhat beyond the control of the development community, especially if these are taken to mean proprietary hardware driver support and proprietary commercial package application support.
However, to paraphrase a line from the movie Field of Dreams, "If you build it, they will come"-- if we take care of the other issues, the commercial vendors will add support. We have a couple of vendors porting their products to our OS's now, and this will snowball-- as commercial applications appear, more of our systems will appear in commercial settings, and more vendors will recognize the expanding market. All of the major DBMS vendors have said that they have at least an experimental Linux port in-house; it's just a matter of breaking the dam for the commercial-software-on-free-OS market to explode.
(Please realize: I'm not advocating proprietary vs. open source applications here, just recognizing that there is room in the world for both... but let's at least get them to work together on systems based on open source).
There are a number of things that we can do to address the other issues. Putting SMP and NUMA support aside for the moment (Linus is working hard on the SMP implementation and I think that it's moving well), we should concentrate on items (d) and (f).
Distributed Systems and Network Management: Our products are weak in this area. We don't plug into Unicenter (or TME or OpenView or anything else), and their consoles can't be run on Linux. We can approach this problem in one of two ways: (i) we can write a network / distributed systems management tool; or (ii) we can write Unicenter plug-ins. I think that we should pursue both.
What if we offered to write the Unicenter plug-ins for CA in return for having them port the Unicenter console to Linux? Would they go for it? We won't know until we ask.
High Levels of Scalability: What a wretched phrase! Let's try "Scalability to Very Large systems". If we can create Beowulf/Extreme Linux systems running into the gigaflops, surely we can come up with some amazing tpC figures.
Novell's UnixWare (back when it was Novell's and not SCO's) captured the attention of many IS managers when Oracle and Novell demonstrated record-breaking tpC and $/tpC figures. We should be able to do the same. There are some pieces we need to put together to make this fly:
- raw partition support for DBs would be good (these were used in the UW benchmarks);
- a commercial DBMS would be good (hey, Informix, here's your chance), or we can tweak PostgreSQL and friends into the stratosphere; and
- we'll need a hardware supplier, because the benchmarks should be on a
standard system configuration (could be a mainstream vendor like
Compaq or a Linux HW vendor like Paralogic).
Conclusion: Our products in general and Linux in particular have got what it will take to beat the Gartner Group's predictions. However, we need to get our collective act together to push the envelope a bit in certain directions and then to prove to the world that open source OS's are competitive.
David Cutler, director of Windows NT development at Microsoft (and architect of both VMS and NT) used to say somewhat rudely to his NT development team: "If you break the build, your ass is grass, and I'm the lawnmower."
In a bigger sense, open source is a ride-'em mower being driven by a penguin, a gnu, and a little red guy wearing sneakers. Let's start the engine...