Debian Cluster Replaces Supercomputer For Weather Forecasting
wazza brings us a story about the Philippine government's weather service (PAGASA), which has recently used an eight-PC Debian cluster to replace an SGI supercomputer. The system processes data from local sources and the Global Telecommunication System, and it has reduced monthly operational costs by a factor of 20. Quoting:
"'We tried several Linux flavours, including Red Hat, Mandrake, Fedora etc,' said Alan Pineda, head of ICT and flood forecasting at PAGASA. 'It doesn't make a dent in our budget; it's very negligible.' Pineda said PAGASA also wanted to implement a system which is very scalable. All of the equipment used for PICWIN's data gathering comes off-the-shelf, including laptops and mobile phones to transmit weather data such as temperature, humidity, rainfall, cloud formation and atmospheric pressure from field stations via SMS into PAGASA's central database."
yeah, but does it run herd?
You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
Why Debian? A desktop distro? That's got to be one of the least scalable and cluster-friendly distros. If they would invest a little to set things up properly they could get a lot more performance out of their machines.
can anything that is "free" put a dent in ANY budget? if something gets bloated as it ages - dump it and go to OLD VERSION. shiiiiit.
How different can Debian really be compared to RedHat in terms of stability? They both use the Linux kernel, and GNU tools, and follow the LSB, no?
"Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
How is that news? You can get an 8 way workstation for $1,000 nowadays. Big deal. No innovation in networking or storage either. And RedHat is the better distro, good luck managing 2,000 nodes with Debian and have fun when an apt-get update destroys all of them. Piece of cake with RHN. RedHat is better built too and with tested packages that offer consistency. And by the way, no sane person would use gcc to compile his "high performance" code. There is a reason why there are compilers like PGI's and Intel's.
meh!
What was the age and the specs of the SGI being replaced?
Going by Moore's law, a factor of 20 performance improvement takes about 6 to 8 years. If the SGI was at least that old, this isn't news -- it's just the state of the art these days. In other words, small clusters capable of weather forecasting are relatively run-of-the-mill.
Of course, props to linux for being the enabler in this case.
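The Moore's law estimate above can be checked with a few lines of arithmetic. This is only a sketch: the doubling periods of 18 and 24 months are assumptions (the comment does not say which it uses), and `years_for_speedup` is a name made up for this example.

```python
import math

def years_for_speedup(factor, doubling_period_years):
    """Years needed for performance to grow by `factor`,
    assuming it doubles once every `doubling_period_years`."""
    return math.log2(factor) * doubling_period_years

# Assumed doubling periods: 18 months and 24 months.
for period in (1.5, 2.0):
    years = years_for_speedup(20, period)
    print(f"doubling every {period} years: about {years:.1f} years for 20x")
```

Under those assumptions a factor of 20 needs a little over four doublings, which lands in roughly the range the comment quotes.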
Most weather prediction centers have adapted their weather forecast models to use Linux clusters. By running an operational forecast model on a cluster, it's easy for forecasters to scale the models so that they can be run (albeit slowly) on desktop machines, and are easily worked on by real meteorologists (versus IT professionals). At my university, we use a large cluster of machines on a RedHat enterprise system, and are then able to scale the models and run them on multiple processors using MPICH compilers and batch jobs. Really, using a Debian cluster is no different than using a RedHat cluster. My colleague has access to the NOAA machine, which has more processors than you can shake a stick at... he talks about some code that takes 3 days to run on his personal workstation that takes 2 minutes on 40 processors. With the relatively low cost of a Linux cluster, weather forecasting models can be run quickly and efficiently on numerous processors at a local level. With the ease of use of a Linux machine versus some of the supercomputers, it puts the power in the hands of the meteorologists to make those changes to the model so that it can improve forecasts.
oh, wait...
Do not mock my vision of impractical footwear
I tried:
apt-get -f -y install gweather
But it failed with something about "ldconfig: /lib/libearthquake-2.3.so.0 is not a symbolic link"
Is libearthquake in unstable?
Interactive Visual Medical Dictionary
A cron job that echoes "Hot and humid" once a day.
What, only one?
Debian is pretty great, but I noticed that you need a pretty big virtual machine
allocation if you are to use it within VMware, as the log files clog up the system pretty quickly.
There is a setting somewhere, but by default it expects a normal hard drive and not, for example, a 1 GB allocation.
"an eight-PC Debian cluster"
"[we] wanted to implement a system which is very scalable"
8 PCs? 64 cores at most. And they call that scalable? Come on, today's top machines on the Top500 scale to 10,000 cores. They're 15 years late.
From TFA: "What was even surprising to us is that Intel FORTRAN is also free of charge ..."
I bet Intel are surprised too. Their compilers are not that free of charge. The people at the Philippine government's official weather service are hardly "not getting compensated in any form"
http://www.intel.com/cd/software/products/asmo-na/eng/219771.htm
The article lacks, as usual, information about what those machines actually do when they compute together.
What I want to know is: Do they have a big 64 bit addressable RAM image spread over all nodes, communicating with pthreads, like I prefer? Or perhaps they have several 32bit RAM images communicating with some special message protocol. Or perhaps they just have lots of quite independent but equal programs running, as an ensemble. Or perhaps some kind of pipeline where the different parts of the calculation run on different machines.
All those free and commercial producers of supercomputers, why don't they tell us clearly how they are supposed to be used? Personally, I prefer one big image, because I am a physicist kind of person, knowing that this simple computational model will save me lots of work, and also work fast in practice, as long as I do not write overly stupid code, i.e. with tight nonlocal interdependencies. But from what I see, many of them appear to use 32-bit operating systems, which makes this impossible, and they thus have to use message passing protocols, which make everything much more complicated. For instance: I can do a big 3D wave simulation by having a big 3D array spanning several machines, and updating it piecemeal. However, if I have to cut it into 64 sub-cubes, and use message protocols to glue their edges together, then the work required to do this is a significant road block, and extra code like that also introduces bugs.
Could this be solved by something as simple as using an NFS file, memory mapped piecemeal to different machines to do automatic cross-machine data sharing?
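The edge-gluing bookkeeping described above can be sketched in one dimension. This is a toy illustration under stated assumptions, not how any real forecast model is written: the "machines" are just Python lists, the message passing is a plain list lookup, and the function names and the coefficient `r2` are invented for the example. It runs the same 1D wave update once on a single array and once on four chunks that only exchange their edge values.

```python
def step_global(prev, cur, r2=0.25):
    """One leapfrog step of the 1D wave equation on a single array (ends held fixed)."""
    new = cur[:]
    for i in range(1, len(cur) - 1):
        new[i] = 2 * cur[i] - prev[i] + r2 * (cur[i - 1] - 2 * cur[i] + cur[i + 1])
    return new

def split(field, parts):
    """Cut a list into `parts` equal chunks (length must divide evenly)."""
    size = len(field) // parts
    return [field[k * size:(k + 1) * size] for k in range(parts)]

def step_decomposed(prev_chunks, cur_chunks, r2=0.25):
    """Same update, but each chunk sees only its own data plus one ghost cell per side."""
    new_chunks = []
    last = len(cur_chunks) - 1
    for k, (prev, cur) in enumerate(zip(prev_chunks, cur_chunks)):
        # Stand-in for message passing: fetch the neighbours' edge values.
        left_ghost = cur_chunks[k - 1][-1] if k > 0 else None
        right_ghost = cur_chunks[k + 1][0] if k < last else None
        padded = [left_ghost] + cur + [right_ghost]
        new = cur[:]
        for i in range(len(cur)):
            lo, hi = padded[i], padded[i + 2]
            if lo is None or hi is None:
                continue  # physical boundary of the whole domain: value stays fixed
            new[i] = 2 * cur[i] - prev[i] + r2 * (lo - 2 * cur[i] + hi)
        new_chunks.append(new)
    return new_chunks

n, parts = 16, 4
start_prev = [0.0] * n
start_cur = [0.0] * n
start_cur[n // 2] = 1.0  # initial pulse in the middle

g_prev, g_cur = start_prev, start_cur
d_prev, d_cur = split(start_prev, parts), split(start_cur, parts)
for _ in range(10):
    g_prev, g_cur = g_cur, step_global(g_prev, g_cur)
    d_prev, d_cur = d_cur, step_decomposed(d_prev, d_cur)

flat = [x for chunk in d_cur for x in chunk]
print("decomposed run matches single-array run:", flat == g_cur)
```

The physics is identical in both versions; all the extra code in `step_decomposed` is the ghost-cell handling, which is the overhead the comment is complaining about. In three dimensions each sub-cube has six faces to exchange instead of two edge values.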
Kim0
This makes me very happy.
"The bad machine doesn't know he's a bad machine."
I felt a great disturbance in the force - as if millions of geeks had ROFl'ed and then were silent.
"It doesn't cost enough, and it makes too much sense."
The article mentions the Global Telecommunications System (GTS). It would be cool to know
how they get their GTS data; they probably use a satellite downlink. There is a GPL GTS switch that's developed for Debian:
http://metpx.sourceforge.net/
what else is new?
FYI, I am a little biased, but Debian is the distro that constantly gives me the least trouble.
And customise absolutely everything yourself.
There must be a thousand distros out there, so why not?
Everybody uses broad generalizations.
I know Debian is a solid distro that still cares a lot about its philosophy and execution, but aren't we getting a little loose with the "insightful" business? As mods become disconnected from validity, Slashdot continues its slide toward becoming an online ITWeek cum National Enquirer.
I do, however, remember the old days when Slashdot was both irreverent AND relevant, alas.
a Beowulf Cluster?
I know Linux and Debian are very configurable, but I'd be very interested in knowing how they configured a numerical weather prediction model to predict tsunamis. I mean what are the chances that Australian PCMag headline writers have their heads up their asses?
GTS was until recently largely an X.25 PSTN. I learned X.25 helping maintain message-switching software at a military weather forecasting center; we were a subsidiary node of GTS in that capacity.
I know that many nodes in the GTS have gone to FTP or TCP socket streaming over the Internet (or VLANs running under the Internet). Old-sk00l by 'net standards, but Très moderne in the WMO timescale.
Welcome to the Panopticon. Used to be a prison, now it's your home.
The article says that their preference on workstations is Ubuntu which is "basically Debian-based." Ubuntu isn't just Debian-based, it's entirely dependent upon Debian for its continued development.
Dinomite.net
So, x64 processors beat out the MIPS 10000 from 1997? Go figure.
Exactly! MetPX is a TCP/IP-only switch. It implements WMO Manual 386 TCP/IP sockets, as well as file exchanges over FTP and SFTP. It was written to accomplish a transition away from proprietary mainframe stuff exchanging X.25 with the GTS. It also does AFTN (Aviation Fixed Telecommunications Network) over TCP/IP, in contrast to traditional X.25. It is used for the Canadian gateway between GTS and AFTN, as well as the GTS node itself.
Many think of the GTS as an X.25 network, but X.25 is going away. All of the commercial
switch vendors, as well as MetPX, support WMO sockets at a minimum. File-based exchanges are
the new frontier. This software is such a niche application that there isn't a lot of "community" that will be interested. It's kind of a vertical market thing. So it hasn't exactly taken the world by storm.
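For readers wondering what the WMO sockets mentioned above look like on the wire, here is a rough sketch of the framing. It is written from memory and should be checked against the WMO manual: each message is assumed to be prefixed with an 8-digit ASCII length and a 2-character type (for example AN for alphanumeric, BI for binary), followed by the message from SOH to ETX. This is not MetPX code; the function names are invented, and the heading `TTAAii CCCC YYGGgg` is a placeholder template, not a real bulletin.

```python
SOH, ETX = b"\x01", b"\x03"

def frame(envelope: bytes, msg_type: bytes = b"AN") -> bytes:
    """Prefix one message with an 8-digit ASCII length and a 2-character type (assumed layout)."""
    if len(msg_type) != 2:
        raise ValueError("message type must be exactly 2 characters")
    return b"%08d" % len(envelope) + msg_type + envelope

def parse_frames(buffer: bytes):
    """Split a received byte buffer into (type, message) pairs.
    Returns the complete frames plus any trailing partial data."""
    frames, pos = [], 0
    while pos + 10 <= len(buffer):
        length = int(buffer[pos:pos + 8])
        msg_type = buffer[pos + 8:pos + 10]
        end = pos + 10 + length
        if end > len(buffer):
            break  # incomplete message: keep it until more bytes arrive
        frames.append((msg_type, buffer[pos + 10:end]))
        pos = end
    return frames, buffer[pos:]

# Placeholder bulletin: sequence number, heading template, dummy text.
bulletin = SOH + b"\r\r\n001\r\r\nTTAAii CCCC YYGGgg\r\r\nplaceholder text\r\r\n" + ETX
stream = frame(bulletin) + frame(bulletin, b"BI")
frames, leftover = parse_frames(stream + frame(bulletin)[:5])
print(len(frames), "complete frames,", len(leftover), "bytes left over")
```

Because TCP is a byte stream with no message boundaries, a length prefix of this kind is what lets a receiver cut the stream back into individual bulletins, which is why the parser keeps any trailing partial frame for the next read.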
If anyone's interested in setting up a Debian cluster, there's a walkthrough at http://debianclusters.org/