Debian Cluster Replaces Supercomputer For Weather Forecasting
wazza brings us a story about the Philippine government's weather service (PAGASA), which has recently used an eight-PC Debian cluster to replace an SGI supercomputer. The system processes data from local sources and the Global Telecommunication System, and it has reduced monthly operational costs by a factor of 20. Quoting:
"'We tried several Linux flavours, including Red Hat, Mandrake, Fedora etc,' said Alan Pineda, head of ICT and flood forecasting at PAGASA. 'It doesn't make a dent in our budget; it's very negligible.' Pineda said PAGASA also wanted to implement a system which is very scalable. All of the equipment used for PICWIN's data gathering comes off-the-shelf, including laptops and mobile phones to transmit weather data such as temperature, humidity, rainfall, cloud formation and atmospheric pressure from field stations via SMS into PAGASA's central database."
Many distros add kernel patches and put different drivers in the initrd.
Also, the core OS (the most minimal installation) comes with many different tools and libs.
Also, at release time they can pick from many different versions of a single package.
That, in combination with the GCC version and compile flags used, can and does make a huge difference.
And at least with Debian you really do know how the system was built; with Red Hat I still wonder...
Marcel
What was the age and the specs of the SGI being replaced?
Going by Moore's law, a factor of 20 performance improvement takes about 6 to 8 years. If the SGI was at least that old, this isn't news -- it's just the state of the art these days. In other words, small clusters capable of weather forecasting are relatively run-of-the-mill.
Of course, props to Linux for being the enabler in this case.
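For anyone who wants to check the "6 to 8 years" figure, here is a rough sketch of the arithmetic. It assumes performance doubles every 18 to 24 months, the two commonly quoted readings of Moore's law; the function name is just for illustration.

```python
import math

def years_for_factor(factor, doubling_period_years):
    """Years needed for a `factor`-fold gain if performance doubles
    once every `doubling_period_years` (an assumed, idealised rate)."""
    return math.log2(factor) * doubling_period_years

# A 20x gain needs log2(20), roughly 4.3, doublings.
for period in (1.5, 2.0):
    years = years_for_factor(20, period)
    print(f"doubling every {period} years: {years:.1f} years")
```

That works out to roughly 6.5 years with an 18-month doubling period and 8.6 years with a 24-month one, so the estimate above is in the right range.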
Most weather prediction centers have adapted their weather forecast models to use Linux clusters. By running an operational forecast model on a cluster, it's easy for forecasters to scale the models so that they can be run (albeit slowly) on desktop machines and worked on by real meteorologists (versus IT professionals). At my university, we use a large cluster of machines on a RedHat enterprise system, and we are then able to scale the models and run them on multiple processors using MPICH compilers and batch jobs. Really, using a Debian cluster is no different than using a RedHat cluster. My colleague has access to the NOAA machine, which has more processors than you can shake a stick at... he talks about some code that takes 3 days to run on his personal workstation and 2 minutes on 40 processors. With the relatively low cost of a Linux cluster, weather forecasting models can be run quickly and efficiently on numerous processors at a local level. And because a Linux machine is easier to use than some of the supercomputers, it puts the power in the meteorologists' hands to change the model themselves and improve forecasts.
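To put the "3 days versus 2 minutes on 40 processors" anecdote in numbers, here is a quick sketch that takes the quoted figures at face value. The function name is made up for illustration; speedup and parallel efficiency are the standard definitions.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, processors):
    """Return (speedup, parallel efficiency) for a parallel run."""
    speedup = serial_seconds / parallel_seconds
    efficiency = speedup / processors
    return speedup, efficiency

serial = 3 * 24 * 3600   # 3 days on the workstation, in seconds
parallel = 2 * 60        # 2 minutes on the cluster, in seconds

speedup, efficiency = speedup_and_efficiency(serial, parallel, 40)
print(f"speedup: {speedup:.0f}x, efficiency per processor: {efficiency:.0f}")
```

The speedup comes out at 2160x, or 54x per processor. Parallelism alone tops out near 1x per processor, so presumably most of that gap comes from each cluster processor (and its memory and I/O) being far faster than the workstation. That last part is an inference from the arithmetic, not something stated in the comment.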
Debian works out just fine for these kinds of tasks. Here in the Netherlands, the national compute cluster Lisa runs on Debian (http://www.sara.nl/userinfo/lisa/description/index.html) with roughly 800 to 1,000 nodes (I think the page needs updating by now).
Debian will run multiple services reliably under heavy load. From my limited experience, it's one of those distros where you "Set It And Forget It" and that's that.
Once you've got it configured correctly, there's little intervention involved in maintaining it. It'll just keep chugging along. The keyword there is "correctly". Follow the readmes, howtos, and best practices, and you're golden.
It's also one of the oldest distributions, and it has always kept to the spirit of GNU/Linux in general: community development and enrichment. Debian developers pride themselves on that spirit: making the best software for humans. (At least that's what I gather from hanging out with Debian folk.) These people are not only passionate about the software they write, they are humble about it and want nothing in return. To them, the reward is other people using their software and loving it! In my opinion they're not recognized enough.
But what do I know? I just use the software.
The binary package management really says it all... you shouldn't be running anything but software compiled from source on a performance cluster.
/bin/ls though? I don't think it matters to anyone on a high performance cluster. Just so long as the cluster apps are optimised then the rest is just noise - better to have a system that's less work for your administrators so they can concentrate on what's important.
Wow - how many performance clusters do you run again?
Not that I run a "performance cluster" as such - but I do run a bunch of machines that are very busy, all on Debian.
You know what? We compile the couple of programs where CPU is the bottleneck from source. We also compile Cyrus IMAP from source because we apply a pile of patches, but if someone else was packaging up all those patches in upstream, I'd be happy for them to be compiled there. Disk IO is the issue with Cyrus, and a custom compile won't help with that.
Yeah, we build our own kernels as well - that's another point that's worth the effort to customise.
You've got it all wrong; you should be using built-in tools like these:
more weather - For when you need a new update.
less weather - Got too much weather? Reduce it!
vi weather - When you want to change the weather.
emacs weather - When you want to change the weather on 15 separate planets at once.
cat weather - It's raining... oh, never mind.
Not sure why you call Debian a desktop distro, it's much more useful as a server.