Here is a more complete story: between changing compilers, moving from the development platform to the target platforms, and identifying some redundant computation in one corner of the algorithm, we were able to reduce the run-time from about six hours to five minutes. This allowed us to undo some rather brutal compromises (accuracy for speed) we had made in a previous stage of development, when we thought the analysis was running unacceptably long for Grid purposes.
The extra hours are not busy work.
Christian Cumbaa
Research Associate
Ontario Cancer Institute
In no way did the group in question (Igor Jurisica's lab, in Toronto) carefully select a machine-learning approach to identify good ways of analyzing images. Instead, they have just selected something like 1000 different techniques, and are running *all* of them on every image they have. It's a fishing expedition, with the hope that one of those thousand metrics they return will be a useful predictor.
Not quite. The machine learning bit comes second. You have to spend the CPU cycles to extract features from the images first. Only then can your favourite ML technique tell you if the features are predictive. The first ~1000 features (already computed, locally) show some promise, and that's why this project will explore the image feature space a bit more (~12000 features). Once we get Grid results back from our human-scored image set, any features that are a clear waste of time will be dropped.
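To make the two-stage order concrete, here is a minimal Python sketch. It is an illustration only, not the project's code: the two features (`mean_intensity`, `edge_energy`), the toy images, and the human scores are all invented, and a plain Pearson correlation stands in for whatever ML technique you prefer.

```python
from math import sqrt

def mean_intensity(img):
    """Average pixel value of an image given as a 2-D list of numbers."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

def edge_energy(img):
    """Mean absolute difference between horizontally adjacent pixels."""
    diffs = [abs(row[i + 1] - row[i]) for row in img for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)

# Hypothetical feature set; the real project computes thousands of features.
FEATURES = {"mean_intensity": mean_intensity, "edge_energy": edge_energy}

def extract_features(images):
    """Stage 1 (the CPU-heavy part): one feature vector per image."""
    return [{name: f(img) for name, f in FEATURES.items()} for img in images]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_features(vectors, scores):
    """Stage 2: rank features by how strongly they track the human scores."""
    ranked = [(name, abs(pearson([v[name] for v in vectors], scores)))
              for name in FEATURES]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Invented toy data: two flat images (score 0) and two high-contrast ones (score 1).
images = [
    [[5, 5, 5], [5, 5, 5]],
    [[9, 9, 9], [9, 9, 9]],
    [[0, 9, 0], [9, 0, 9]],
    [[0, 10, 0], [10, 0, 10]],
]
human_scores = [0, 0, 1, 1]

vectors = extract_features(images)
ranking = rank_features(vectors, human_scores)
```

On this toy set `edge_energy` ranks above `mean_intensity`. A feature that sat near the bottom of such a ranking on the real human-scored set would be a candidate for dropping.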
Third, the techniques selected are basically arbitrary. Most egregiously, there appear to be NO Fourier transforms included in the analysis!!
Again, not really. The techniques selected are based heavily on our own research and on successful methods drawn from the literature. I can confirm that no Fourier analysis is done. Fourier analysis can tell you that there are high-frequency components in the image. So can simple edge detection. And a Radon transform will find the straight edges of a protein crystal. Publish your Fourier-based method of distinguishing amorphous precipitate from protein crystal, and I will include it in Phase II of the project. Before you do that, maybe also read up on wavelets.
So why are they taking 5 hours per unit? It appears that they have chosen to implement an exhaustive GLCM search that is an order of magnitude slower, rather than using existing estimation procedures that are ~98.5% accurate.
More time = more exploration of feature space. Show me proof of a "98.5% accurate" approximation method, and I will make sure that gets into Phase II as well.
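For readers who have not met a GLCM (grey-level co-occurrence matrix), the following Python sketch shows roughly what one matrix and one texture statistic derived from it look like. It is a sketch under stated assumptions, not the Grid code: the `glcm` and `contrast` helpers, the 4-level test image, and the four offsets are all hypothetical choices made for illustration. In this sketch, every additional offset costs another full pass over the image.

```python
def glcm(img, dx, dy, levels):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(img), len(img[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= ny < height and 0 <= nx < width:
                counts[img[y][x]][img[ny][nx]] += 1
    return counts

def contrast(counts):
    """Contrast statistic: (i - j)^2 weighted by normalised co-occurrence."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    weighted = sum((i - j) ** 2 * c
                   for i, row in enumerate(counts)
                   for j, c in enumerate(row))
    return weighted / total

# Tiny 4-level image, invented for illustration.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

# Each offset yields its own matrix, and therefore its own set of features.
offsets = [(1, 0), (0, 1), (1, 1), (-1, 1)]
features = {off: contrast(glcm(image, off[0], off[1], levels=4)) for off in offsets}
```

Here the horizontal offset `(1, 0)` produces 12 pixel pairs and a contrast of 7/12. Multiply the offsets, distances, and statistics and the feature count grows quickly.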
By the way, I noticed that the "Slashdot Users" team on the World Community Grid is ranked #4. You guys are huge contributors. Whether you contribute to this project, or Dengue Fever, or whichever, thanks.
Christian Cumbaa
Research Associate
Ontario Cancer Institute