Folding@Home - Yet Another Distributed Client
braind writes: "The Stanford group has developed a new way to simulate protein folding ("distributed dynamics") which should remove the previous barriers to simulating protein folding. However, this method is extremely computationally demanding and we need your help.
You can read more on the site." It's interesting seeing all these projects coming out - just a reminder that distributed is still around and we can always use more on our team. *grin* [addendum from timothy:] Note that the SDK used for this project was discussed here a few days ago, so you can even roll -- err, fold -- your own.
Whilst projects like distributed.net and Seti@Home have clocked up shocking amounts of processor time (410497.11 years on Seti@Home), they're still running on the 'cool factor' of having your machine break codes or search for aliens.
Sites such as ProcessTree, and others, have been talking of paying for your computer time with micropayments, but so far nothing seems to have got off the ground.
Presumably with the added incentive of cash, the number of computers taking part will rocket. Does anyone have any firm information on the progress of these schemes?
(1) proteins are not static structures; they tend to change conformation in response to stimuli such as binding to a ligand, or changes in the electrostatic microenvironment around them.
(2) many proteins don't like to fold in isolation; they require the presence of other proteins that they naturally interact with.
(3) protein sequence is linear (so-called primary structure); while local structural details may be predictable with some reliability (the so-called secondary structure, things like alpha helices and beta sheets), ultimately it is the 3D fold with its long-range interactions (tertiary and higher structures) that forms the final structure. You can imagine that the longer the protein, the harder it is to fold, due to the increased number of potential tertiary interactions.
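To put a rough number on point (3): as a simplifying assumption (not something the comment states), treat every pair of residues as a potential long-range contact. The count of pairs then grows quadratically with chain length. The function name below is made up for illustration.

```python
def residue_pairs(n: int) -> int:
    """Number of distinct residue pairs in a chain of n residues: n(n-1)/2."""
    return n * (n - 1) // 2


# Pair counts for a few illustrative chain lengths.
for n in (50, 100, 300, 1000):
    print(f"{n:5d} residues -> {residue_pairs(n):7d} possible pairs")
```

This only counts pairs; it says nothing about which contacts are physically plausible, which is the hard part.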
Determination of the structure of a protein, and even of relatively large protein complexes, is not as technically challenging for biophysicists as it used to be. Tom Steitz's group at Yale has managed to crystallize and solve the structure of the large ribosomal subunit (a **HUGE** molecule as far as the average biological molecular complex goes) at 2.4 angstrom resolution, which in itself is a monumental feat. I would not be surprised if Steitz is in contention for the Nobel prize for this work.
The holy grail is eventually being able to reverse engineer a protein or ligand that is able to bind to part of a particular protein, using rational design. This is much harder than solving a structure. Pharmaceutical companies would love to be able to design this type of molecule for use as designer drugs, since it would take away much of the cost of R&D through trial and error. Big companies such as Merck basically screen for drugs the way Thomas Edison used to test materials: by having a warehouse full of stuff and testing it all.
That being said, it's still a cool project :)
NO CARRIER
Does anyone know exactly what models they will be using? Because there are only a few ways to actually go about this:
1 - Use a known protein structure that is similar to the one under study, but slightly different. You can also look for common motifs in a structure / sequence to compare the two. Basically you look at the sequences, and say, "Hey, those two proteins have similar sequences, so they probably look the same too."
2 - Good old ab initio methods where you reduce the conformational energy to the optimal folding pattern. This is basically looking only at the sequence and saying "If I were a protein, what would I look like."
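As a minimal sketch of the "similar sequences" comparison in method 1, here is percent identity between two sequences that are assumed to be already aligned to the same length, with "-" marking gaps. The function name and the two sequences are made up for illustration; real homology modelling uses proper alignment tools and substitution scoring, not this bare count.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned (non-gap) positions where both sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical ten-residue sequences differing at one position.
print(percent_identity("MKTAYIAKQR", "MKTAHIAKQR"))
```

A high identity is only a hint that two proteins share a fold; it is the starting point for method 1, not the answer.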
Both are relatively time consuming, but I'm not sure how suited distribution is to this task. The first method requires a great deal of database lookups, and the second requires a lot of computing power under the hood. With distribution, you don't have the database backend to work with, so it must be the brute force method. But I have yet to see any studies where ab initio methods have been anywhere near a 95% level of accuracy (compared to x-ray crystal structures). The best I've seen is around 75%. This isn't quite as helpful as it might sound. You can get some good results and working models this way, but you can't do a great deal with drug design with an inaccurate model.
They had links to the papers citing their algorithms, but the links were not yet active... If they have a better way to do this, I'll be quite impressed, but for now, I think that a machine like IBM's Blue Gene has a better chance.
And neither of these methods really takes into account post-translational modifications, phosphorylations, cleavage, activation, etc... (basically all the extra stuff your cells do to proteins before they are "activated").
"If we knew what we were doing, it wouldn't be called research." - Einstein
A few years ago I worked in computational chemistry for a pharmaceutical company. Determining the conformation of a molecule is a *hard* problem. We're dealing with quantum mechanics (QM) rather than classical mechanics, and many-body QM problems are notoriously difficult. For example, if you have just *two* particles, the space of possible configurations is six-dimensional (in this simple example you can use symmetry to simplify things). The wavefunction is a function on a six-dimensional space. For a protein you might want to deal with hundreds or thousands of nuclei and many more electrons. You might be determining a wavefunction on a 100,000-dimensional space.
Let me give a taste of how big that is. Imagine we discretise this space so that we only have *ten* steps along each dimension. Then the space has 10^100,000 discrete points: a 1 with 100,000 zeros after it. That's *big*. So clearly any attempt to solve this problem on a classical (i.e. non-quantum) computer is a gross approximation. I have serious doubts about our ability to solve this problem today, even with a billionfold increase in the power of computers.
When I worked in this computational chemistry department, all of the molecular modelling packages had parameters you could tune. A computational chemist would run a simulation. If the result wasn't to their taste they'd tweak the parameters and run it again. Then they'd run it a few more times. As X-ray data came in they'd fine tune their parameters to make their simulated model match. Eventually they'd give a seminar showing how their simulation matched the real results, when in fact all they'd done was find the set of simulation parameters that matched reality. These parameters were purely hacks tweaked to make things look like the experimental results. They had no a priori worth. If you took these tweaked parameters and tried them on the next, different simulation, guess what! They wouldn't work.
And this was for relatively simple biologically active compounds - not entire proteins. This is a problem that grows exponentially with the number of bodies. Thankfully some of these people realised that what they were doing was no better than Voodoo. So I hope someone can convince me that there have been big improvements before we collectively build the world's biggest waster of CPU time. Keep your cycles for SETI@home - at least then they might be useful.
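The grid-size estimate above is easy to check. This sketch works with the base-10 logarithm of the point count rather than the number itself, and also shows how little a billionfold (10^9) speedup takes off the exponent. The function name is made up, and "one grid point per unit of work" is an assumption for illustration only.

```python
import math


def log10_grid_points(steps_per_dimension: int, dimensions: int) -> float:
    """log10 of the number of points in a grid with the given resolution."""
    return dimensions * math.log10(steps_per_dimension)


# Two particles: a 6-dimensional configuration space, ten steps per axis.
print(log10_grid_points(10, 6))

# The 100,000-dimensional case from the comment.
exponent = log10_grid_points(10, 100_000)
print(exponent)

# A billionfold faster computer (10**9) only lowers the exponent by 9.
print(exponent - 9)
```

Even after the speedup the exponent is still on the order of 100,000, which is the point of the argument: brute-force enumeration of the full configuration space is out of reach, so any practical method has to approximate.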
--
-- SIGFPE
From their site:
Presumably if you volunteer to port to system x they'll have to let you see the source code. They might even let you see it if you ask nicely for all I know.
As for SETI, I don't know if their code is available at all (I think not --at least officially); but I know they do not want any unofficial versions around and that they've even refused assistance to produce versions optimized for the 3DNow extensions in AMD chips (none exist now AFAIK).
> I hope they come out with a version that can work without the screensaver.
Yeah, you're not alone, and we do have one (for Linux and Windows): check out the Folding@home site and go to the download page, sign up, and then download.