Grid Computing Saves Cancer Researchers Decades
Stony Stevenson writes "Canadian researchers have promised to squeeze 'decades' of cancer research into just two years by harnessing the power of a global PC grid. The scientists are the first from Canada to use IBM's World Community Grid, a network of PCs and laptops with power equivalent to one of the world's five fastest supercomputers. The team will use the grid to analyze the results of experiments on proteins, using data collected by scientists at the Hauptman-Woodward Medical Research Institute in Buffalo, New York. The researchers estimate that this analysis would take conventional computer systems 162 years to complete."
Wanna bet they discover that maple syrup or Canadian back bacon cures cancer?
Kevin Smith on Prince
...as a competition with friends. But then I realized that I didn't really need to use my computers as heaters... so I did one for the planet and closed the client.
Obligatory blog plug: http://www.caseybanner.ca/
If they wanted to knock that 10 years down to 5, they could just buy a chunk of the Storm Worm botnet!
insight through the mind
I hope they're using programs that have had a few computer scientists' eyes on them. One of the issues I see with supercomputing is that people tend to see it as a way to get around dumb code(1): if the computer's fast enough, you can implement *five* infinite loops, have an exponential-time algorithm, and still get the calculations done before dinner!
(1) although from their point of view, it's just slow code.
Ask me about repetitive DNA
Okay, not that I'm knocking how cool this grid computing is, but that estimate of 162 years without grid computing can't possibly take into account the acceleration of computing power. Maybe with today's computers it would take 162 years, but after the first couple of years you could just get a new computer and cut the remaining time in half.
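The "get a new computer and cut the time in half" argument can be checked with a little arithmetic. This is a minimal sketch, not anything the researchers published, assuming hypothetically that hardware speed doubles every two years, that a new machine is bought every two years, and that the job can be checkpointed and moved to the new machine without losing work.

```python
def years_with_upgrades(work_years=162.0, doubling=2.0, upgrade_every=2.0):
    """Years to finish a job that needs `work_years` of today's machine time.

    Hypothetical assumptions: new hardware is 2x faster every `doubling`
    years, a new machine is bought every `upgrade_every` years, and the job
    is checkpointed and migrated to it rather than restarted.
    """
    t = 0.0
    remaining = work_years
    while True:
        speed = 2 ** (t / doubling)       # relative speed of the machine bought at time t
        capacity = speed * upgrade_every  # work it can finish before the next upgrade
        if capacity >= remaining:
            return t + remaining / speed  # job finishes on this machine
        remaining -= capacity
        t += upgrade_every


print(years_with_upgrades())                   # upgrade every two years: 12.5625
print(years_with_upgrades(upgrade_every=1e9))  # never upgrade: 162.0
```

Under these assumed numbers, the 162-year job shrinks to about twelve and a half years without any grid at all.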
:(
Which reminds me of how, toward the end of my grad school career, I ran hours-long simulations that would have taken weeks at the beginning of grad school. I was in grad school a long time.
Free the Quark 3 from asymptotic confinement! Bring your charm! Don't get down! All colours and flavours welcome!
But do we see a chunk of the profit they'll be making off the cancer drugs developed from data that OUR computers analyzed, drugs that are eventually sold back to us at too-high-to-afford prices?
Every time these "connect desktops to become the fastest computer in the world" articles come up, I have to dust off my Cluster Urban Legends article to clear up the misconceptions that abound. I also did a piece on the Linux Magazine site that debunks much of the spam-bot supercomputer legend (you need to register for that one).
HPC for Primates. Read Cluster Monkey
Folding@home has reached a petaflop using PS3 game consoles. Supposedly a record, according to BBC News: http://news.bbc.co.uk/2/hi/technology/7074547.stm
I run their PC software on the systems I keep on. They are getting results and publishing papers based on the research.
I'm very glad to help cancer research, but will this also result in the development of drug patents that (a) bankrupt some patients, and (b) prevent other researchers from improving on those drugs?
Because that would make me feel a little less charitable with my computing power. (Only a little, though.)
"The researchers estimate that this analysis would take conventional computer systems 162 years to complete."
They're always saying, "We've knocked decades off of our work by using the right tool for the job." That's like me saying I knocked decades off of the calculations to run an energy minimization on a hexane molecule by running it on my Core 2 Duo instead of my Atari 800.
I mean, let's face it. They weren't going to let the friggin' program run for 162 years. The problem became solvable when the hardware became available. Hell, within 5 years, that "conventional computer system" will be able to solve it in a fraction of that 162 years, and 5 years later, in a fraction of that. So what do you do? You wait until the hardware catches up with the ability to solve the problem. They haven't saved decades. They probably haven't even saved a decade. Within a decade they'd probably be able to run it in a few days on a conventional computer.
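The "wait until the hardware catches up" strategy has a simple optimum. If a job needs C years on today's hardware and speed doubles every d years, idling for w years and then running gives a total of w + C * 2^(-w/d); setting the derivative to zero gives w* = d * log2(C * ln 2 / d). The sketch below assumes, hypothetically, C = 162 and d = 2, and ignores cost, power, and the chance that the problem itself changes in the meantime.

```python
import math


def finish_time(wait, work_years=162.0, doubling=2.0):
    """Total years if you idle for `wait` years, then buy the newest machine
    and run the whole job on it (assumed 2x speedup every `doubling` years)."""
    return wait + work_years / 2 ** (wait / doubling)


def best_wait(work_years=162.0, doubling=2.0):
    """Wait that minimises finish_time: solve d/dw [w + C * 2**(-w/d)] = 0."""
    w = doubling * math.log2(work_years * math.log(2) / doubling)
    return max(w, 0.0)  # for short jobs, starting immediately is best


w = best_wait()
print(round(w, 1), round(finish_time(w), 1))  # 11.6 14.5
print(finish_time(10.0))                      # 15.0625
```

Under these assumed numbers, the best single-purchase plan finishes after roughly 14.5 years, and starting after exactly ten years still leaves about five years of computing. A faster assumed doubling rate would shorten both.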
According to the World Community Grid website:
;-)
World Community Grid is making [this] technology available only to public and not-for-profit organizations to use in humanitarian research that might otherwise not be completed due to the high cost of the computer infrastructure required in the absence of a public grid. As part of our commitment to advancing human welfare, all results will be in the public domain and made public to the global research community.
WCG uses the Berkeley Open Infrastructure for Network Computing (BOINC) client, an open source software project that runs on Linux, Mac and Windows. Headline should read "Open Source Software Cures Cancer".
BoincStats shows you who is contributing to World Community Grid projects. Check it out...and ask yourself why you aren't contributing.
How could they knock decades of research off when we are less than 10 years (TM) away from a cure?
Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
We "the people" run the software and pay the millions of dollars in hardware and electricity costs. When the problem is solved, the university patents everything (thank you, suckers) and licenses the technology for a small fortune to some backstabbing Megacorp (TM) drug company. So when "we the people" get sick, we have the wonderful knowledge that we have paid twice for the rip-off drugs. So, all things being fair, if you want my spare CPU time, I want a part of the license fees to pay for the drugs that cost a house when I get sick.
I know this research, and the people involved in it very, very well, and I think this project is a very sad, very large waste of computing time.
Let me back up and explain what the project is doing. To simplify a little bit, the vast majority of "work" in the cell is done by proteins. While DNA can be thought of as something like a simple "string", proteins have complex three-dimensional shapes. Knowing those 3D shapes is of great interest to biologists. There are several reasons for that. One is that it can allow easier design of drugs targeted at a specific part of the protein. Another is that by seeing the shape, we can understand how all the mutations that occur in disease might be affecting its function.
The primary way to determine a protein's shape is to grow the protein into an ordered crystal. You can then shine an X-ray beam through the crystal, and the diffraction pattern that emerges can, through some very complex math, be reverse-engineered into a 3D structure. Typically the most difficult part of this process is finding the specific chemical conditions that will allow a crystal to grow. These conditions differ from protein to protein.
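The "very complex math" is, at its core, a Fourier relationship. In standard X-ray crystallography notation, the electron density rho at fractional coordinates (x, y, z) in a unit cell of volume V is a Fourier series over the structure factors F(hkl) of the measured reflections. The detector records only intensities, which give the magnitudes |F(hkl)| but not the phases phi(hkl), so the phases have to be recovered separately (the "phase problem").

```latex
% Electron density as a Fourier series over reflections (h, k, l)
\rho(x, y, z) = \frac{1}{V} \sum_{h,k,l} F(hkl)\, e^{-2\pi i (hx + ky + lz)}
% Measured intensities fix only the magnitudes, not the phases
F(hkl) = |F(hkl)|\, e^{i\varphi(hkl)}, \qquad I(hkl) \propto |F(hkl)|^{2}
```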
This project is not "solving cancer" by any means. Rather, the people in Buffalo have developed a high-throughput way of screening different chemical conditions to determine which ones might allow a protein to crystallize. They use robotics to screen about 1000 conditions and take pictures of each one. The question then becomes: can you automatically process the pictures to find crystals? That's the goal of this project: to help automatically identify crystals in this screen.
So why do I object so strongly to this work? There are three reasons.
First, the project has nothing to do with cancer. In fact, the proteins being analyzed are not in any way "cancer-specific proteins" -- many of them are not even human!! This "cancer" pitch is a sales job, and nothing but a sales job. As a cancer researcher, it offends me that people try to use the disease to justify research that is this unrelated.
Second, the project is ill-conceived, technically. In no way did the group in question (Igor Jurisica's lab, in Toronto) carefully select a machine-learning approach to identify good ways of analyzing images. Instead, they have just selected something like 1000 different techniques, and are running *all* of them on every image they have. It's a fishing expedition, with the hope that one of those thousand metrics they return will be a useful predictor.
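The risk of a fishing expedition can be illustrated with a generic multiple-comparisons simulation. This is not a model of the Toronto lab's pipeline: the 50 labeled images, the random labels, and the noise "metrics" are all hypothetical. Every metric below is pure noise, yet the best of 1000 will still look correlated with the crystal labels.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_chance_correlation(n_images=50, n_metrics=1000, seed=0):
    """Strongest |correlation| between crystal labels and metrics that are pure noise."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_images)]  # hypothetical: half "crystal", half "no crystal"
    rng.shuffle(labels)
    best = 0.0
    for _ in range(n_metrics):
        metric = [rng.gauss(0.0, 1.0) for _ in range(n_images)]  # unrelated to the labels
        best = max(best, abs(pearson(metric, labels)))
    return best


print(round(best_chance_correlation(n_metrics=1), 2))  # one noise metric
print(round(best_chance_correlation(), 2))             # best of 1000 noise metrics
```

With a single metric the correlation is usually small; with 1000, the best one typically lands well above 0.3 purely by chance, which is why a metric found this way needs validating on images it was not selected on.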
Third, the techniques selected are basically arbitrary. Most egregiously, there appear to be NO Fourier transforms included in the analysis!! Further, the images generated by the software appear to be transforms of something called "gray-level co-occurrence matrices" (GLCMs), and those can be estimated in no more than five minutes. So why are they taking 5 hours per work unit? It appears that they have chosen to implement an exhaustive GLCM search that is an order of magnitude slower, rather than using existing estimation procedures that are ~98.5% accurate. Is that an excuse to use more computer time? Is there any scientific merit to that? Why aren't Fourier transforms included, since they are a standard technique for image analysis?
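For readers unfamiliar with the term, a gray-level co-occurrence matrix counts how often gray level i appears next to gray level j at a fixed pixel offset; texture statistics such as contrast and energy (Haralick-style features) are then computed from it. This is a plain-Python illustration on a hypothetical 4x4 image with four gray levels, not the project's code; it shows that one exact GLCM for one offset is a single pass over the pixels.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalised gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip neighbours outside the image
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:                       # count the pair in both directions
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights co-occurrences by squared gray-level difference."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of squared probabilities; 1.0 for a perfectly uniform texture."""
    return sum(v * v for row in p for v in row)


img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
p = glcm(img, levels=4)
print(round(contrast(p), 4), round(energy(p), 4))  # 0.5833 0.1458
```

Analyses that sweep many offsets, directions, quantization levels, and derived statistics repeat this pass many times over, which is one plausible place for run time to grow; libraries such as scikit-image also ship GLCM routines.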
I have a number of computers that I run various BOINC projects on, but this will NEVER be one. It's a fishing expedition, being sold as cancer research, and that is a sad way to deceive the public.