Grid Computing Saves Cancer Researchers Decades
Stony Stevenson writes "Canadian researchers have promised to squeeze "decades" of cancer research into just two years by harnessing the power of a global PC grid. The scientists are the first from Canada to use IBM's World Community Grid network of PCs and laptops with the power equivalent to one of the globe's top five fastest supercomputers. The team will use the grid to analyze the results of experiments on proteins using data collected by scientists at the Hauptman-Woodward Medical Research Institute in Buffalo, New York. The researchers estimate that this analysis would take conventional computer systems 162 years to complete."
If you run it at a low level, you only increase your CPU usage by about 1-2% and still help the project. There is no logical reason to run the client at 100% if it's going to cost you a bomb, whereas at 1-2% you won't win any contests, but you will be helping the project and paying at most a buck or two extra a month on electricity.
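A rough way to check the "buck or two" figure is to turn extra power draw into a monthly cost. This is only a sketch: the 20 W of extra draw, the 0.10 per kWh price, and the function name `extra_monthly_cost` are assumptions for illustration, not measurements from any grid client.

```python
def extra_monthly_cost(extra_watts, hours_per_day=24.0, price_per_kwh=0.10, days=30):
    """Return the added monthly electricity cost, in the same currency as price_per_kwh.

    All default values are illustrative assumptions, not measured figures.
    """
    # Convert watts to kilowatts, then multiply by the hours run in the month.
    extra_kwh = extra_watts / 1000.0 * hours_per_day * days
    return extra_kwh * price_per_kwh


if __name__ == "__main__":
    # Assumed: the client adds about 20 W of draw on a machine left on all day.
    print(round(extra_monthly_cost(20), 2))
```

Plug in your own wattage difference (a plug-in power meter helps) and your utility's rate to see where your machine lands.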
I like muppets.
Every time these "connect desktops to become the fastest computer in the world" articles come up, I have to dust off my Cluster Urban Legends article to clear up the misconceptions that abound. I also did a piece on the Linux Magazine site that debunks much of the spam-bot supercomputer legend (you need to register for that one).
HPC for Primates. Read Cluster Monkey
Folding@home has reached a petaflop with the help of PS3 consoles. Supposedly a record, according to BBC News: http://news.bbc.co.uk/2/hi/technology/7074547.stm
I run their PC software on the systems I keep on. They are getting results and publishing papers based on the research.
The research is being done by scientists at Princess Margaret Hospital in Toronto, a government-run hospital. If you knew anything about health care in Ontario, you'd know that profit is the last thing on their minds.
Some of what I say is fact, some is conjecture, the rest I'm just blowing out my ass...you guess.
"The researchers estimate that this analysis would take conventional computer systems 162 years to complete."
They're always saying, "We've knocked decades off of our work by using the right tool for the job." That's like me saying I knocked decades off of the calculations to run an energy minimization on a hexane molecule by running it on my Core 2 Duo instead of my Atari 800.
I mean, let's face it. They weren't going to let the friggin' program run for 162 years. The problem became solvable when the hardware became available. Hell, within 5 years, that "conventional computer system" will be able to solve it in a fraction of that 162 years, and 5 years later, in a fraction of that. So what do you do? You wait until the hardware catches up with the problem. They haven't saved decades. They probably haven't even saved a decade. Within a decade they'd probably be able to run it in a few days on a conventional computer.
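The "wait for the hardware" argument can be put into numbers. The sketch below assumes conventional hardware doubles in speed every two years (a Moore's-law-style assumption that is not from the article) and asks how long you would wait plus compute in total. The function names and the doubling period are hypothetical; change them to match your own guess.

```python
def total_years(wait_years, job_years_today=162.0, doubling_years=2.0):
    """Years until the answer arrives if you wait first, then run the job.

    Assumes (hypothetically) that speed doubles every `doubling_years`.
    """
    speedup = 2.0 ** (wait_years / doubling_years)
    return wait_years + job_years_today / speedup


def best_wait(job_years_today=162.0, doubling_years=2.0, max_wait=40):
    """Whole number of years to wait that minimizes wait time plus run time."""
    return min(
        range(max_wait + 1),
        key=lambda w: total_years(w, job_years_today, doubling_years),
    )


if __name__ == "__main__":
    for wait in (0, 5, 10, best_wait()):
        print(wait, round(total_years(wait), 1))
```

How much a grid actually saves depends entirely on the doubling period you believe in, which is why the claim is so easy to argue about.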
Linux's CPU frequency scaler has this option. For example, the 'conservative' governor has the file /sys/devices/system/cpu/cpu0/cpufreq/conservative/ignore_nice_load. When it is set to 1, a program running at lower-than-default priority (a positive nice value) will not cause the CPU frequency to increase.
I use a script to handle CPU frequency changes. When I'm at home with my laptop, I use the "ignore nice" option which in practice will turn the fan off. YMMV. When I go somewhere, I can set the CPU to full steam.
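A script along those lines might look like the sketch below. It is not the poster's script: the profile names "home" and "away", the function names, and the choice of the 'performance' governor for "full steam" are assumptions. The sysfs path is the one quoted above; on some kernels the tunable lives elsewhere, and writing to it needs root.

```python
from pathlib import Path

# Path taken from the comment above; may differ on other kernels (assumption).
CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def set_governor(name, cpufreq_dir=CPUFREQ):
    """Select a cpufreq governor by writing its name to scaling_governor."""
    (Path(cpufreq_dir) / "scaling_governor").write_text(name + "\n")


def set_ignore_nice(enabled, cpufreq_dir=CPUFREQ):
    """Tell the conservative governor to ignore (1) or count (0) niced load."""
    tunable = Path(cpufreq_dir) / "conservative" / "ignore_nice_load"
    tunable.write_text("1\n" if enabled else "0\n")


def apply_profile(profile, cpufreq_dir=CPUFREQ):
    """Apply a hypothetical 'home' (quiet) or 'away' (full speed) profile."""
    if profile == "home":
        # The governor must be active before its tunable directory exists.
        set_governor("conservative", cpufreq_dir)
        set_ignore_nice(True, cpufreq_dir)
    elif profile == "away":
        set_governor("performance", cpufreq_dir)
    else:
        raise ValueError("unknown profile: " + profile)
```

Calling `apply_profile("home")` as root would leave niced grid clients running at the lowest clock; `apply_profile("away")` pins the CPU at full speed. This only touches cpu0, so a multi-core machine would need the same writes for each core.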
Escher was the first MC and Giger invented the HR department.
Not quite. The machine learning bit comes second. You have to spend the CPU cycles to extract features from the images first. Only then can your favourite ML technique tell you if the features are predictive. The first ~1000 features (already computed, locally) show some promise, and that's why this project will explore the image feature space a bit more (~12000 features). Once we get Grid results back from our human-scored image set, any features that are a clear waste of time will be dropped.
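The two-stage order described here (extract features first, then ask whether they predict anything) can be sketched in a few lines. Everything in this example is a stand-in: the three toy features, the function names, and the correlation cutoff are assumptions, and a plain Pearson correlation against human scores substitutes for whatever ML technique the project actually uses.

```python
from statistics import mean, pstdev


def extract_features(image):
    """Stage 1: compute toy features from an image given as rows of gray levels."""
    pixels = [p for row in image for p in row]
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return {"mean": mean(pixels), "stdev": pstdev(pixels), "edge_energy": mean(diffs)}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def screen_features(images, labels, min_abs_corr=0.5):
    """Stage 2: keep only features whose correlation with the labels is strong."""
    rows = [extract_features(img) for img in images]
    kept = {}
    for name in rows[0]:
        r = pearson([row[name] for row in rows], labels)
        if abs(r) >= min_abs_corr:
            kept[name] = r
    return kept
```

With a human-scored set where label 1 means "crystal" and 0 means "not crystal", `screen_features(images, labels)` returns the features worth computing at scale and silently drops the rest.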
"Third, the techniques selected are basically arbitrary. Most egregiously, there appear to be NO Fourier transforms included in the analysis!!"
Again, not really. The techniques selected are based heavily on our own research and on successful methods drawn from the literature. I can confirm that no Fourier analysis is done. Fourier analysis can tell you that there are high-frequency components in the image. So can simple edge detection. And a Radon transform will find the straight edges of a protein crystal. Publish your Fourier-based method of distinguishing amorphous precipitate from protein crystal, and I will include it in Phase II of the project. Before you do that, maybe also read up on wavelets.
"So why are they taking 5 hours per unit? It appears that they have chosen to implement an exhaustive GLCM search that is an order of magnitude slower, rather than using existing estimation procedures that are ~98.5% accurate."
More time = more exploration of feature space. Show me proof of a "98.5% accurate" approximation method, and I will make sure that gets into Phase II as well.
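For readers unfamiliar with the term, a GLCM (gray-level co-occurrence matrix) counts how often gray level i sits at a fixed offset from gray level j, and texture features such as contrast are computed from those counts. The sketch below is a textbook version in plain Python, not the project's code; the function names, the four gray levels, and the single offset are assumptions.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Count co-occurrences of gray levels (i, j) for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[image[y][x]][image[y2][x2]] += 1
    return counts


def contrast(counts):
    """Contrast feature: co-occurrence counts weighted by squared level difference."""
    total = sum(sum(row) for row in counts)
    weighted = sum(
        c * (i - j) ** 2 for i, row in enumerate(counts) for j, c in enumerate(row)
    )
    return weighted / total


if __name__ == "__main__":
    img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
    print(glcm(img), contrast(glcm(img)))
```

Each matrix costs one pass over the image per offset, so sweeping many offsets, directions, and gray-level quantizations multiplies the work quickly. That is one plausible reading of where the hours go in an exhaustive search.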
By the way, I noticed that the "Slashdot Users" team on the World Community Grid is ranked #4. You guys are huge contributors. Whether you contribute to this project, or Dengue Fever, or whichever, thanks.
Christian Cumbaa
Research Associate
Ontario Cancer Institute
Here is a more complete story: between changing compilers, moving from the development platform to the target platforms, and identifying some redundant computation in one corner of the algorithm, we were able to reduce the run-time from about six hours to five minutes. This allowed us to undo some rather brutal compromises (accuracy for speed) we had made in a previous stage of development, when we thought the analysis was running unacceptably long for Grid purposes.
The extra hours are not busy work.
Christian Cumbaa
Research Associate
Ontario Cancer Institute