Supercomputer Sets Protein-Folding Record
Nicros writes with this snippet from Nature News:
"A specially designed supercomputer named Anton has simulated changes in a protein's three-dimensional structure over a period of a millisecond — a time-scale more than a hundred-fold greater than the previous record. ... The simulations revealed how the proteins changed as they folded, unfolded and folded again. 'The agreement with experimental data is amazing,' says Chandra Verma, a computational structural biologist at the Bioinformatics Institute of the Agency for Science, Technology and Research in Singapore. Simulating the basic pancreatic trypsin inhibitor over the course of a millisecond took Anton about 100 days — roughly as long as computers spent toiling over previous simulations that only spanned 10 microseconds."
..it's a rather poor article. It talks in very basic terms about proteins and their folding, talks a bit more about the scientist who founded the institute behind the computer, and says fuck-all about the construction of the computer itself.
Bah. For a publishing house of Nature Publishing Group's (intellectual and economic) muscle, one should expect more.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
It's about someone (a rich someone) building a really big computer to tackle a really, really, really, really, really, really, really complex physical/chemical problem that we currently know dick all about.
If protein folding was equivalent to fluency in English, we'd be at "bwawubda?"
over a period of a millisecond — a time-scale more than a hundred-fold greater than the previous record
This phrasing always confuses me where they say "It's this much faster so it's x times greater!"
So they're a hundredfold greater and they're at a millisecond...? Does that mean the other guy managed 1/100th of a millisecond?
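To make the arithmetic concrete, here is a quick sanity check of the ratio. It uses only the two figures from the summary (1 millisecond for Anton, about 10 microseconds for the earlier simulations); the variable names are just for illustration.

```python
# Figures taken from the quoted summary; names are illustrative only.
US_PER_MS = 1_000                      # microseconds in one millisecond

anton_simulated_us = 1 * US_PER_MS     # Anton: 1 millisecond of simulated time
previous_simulated_us = 10             # earlier runs: about 10 microseconds

fold_increase = anton_simulated_us / previous_simulated_us
print(f"Anton covers {fold_increase:.0f}x more simulated time")
```

So yes, going by the summary, the earlier runs covered roughly 1/100th of a millisecond, and did so in about the same 100 days of wall-clock time.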
The performance of a 512-node Anton machine is over 17,000 nanoseconds of simulated time per day for a protein-water system consisting of 23,558 atoms
So... how many libraries of congress per second??
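No Library of Congress conversion here, but the quoted throughput can at least be turned into wall-clock days. This is a rough sketch that takes the 17,000 ns/day figure and the "about 100 days" from the summary at face value; the function names are made up for the example.

```python
NS_PER_MS = 1_000_000                  # nanoseconds in one millisecond

def days_to_simulate(target_ns, ns_per_day):
    """Wall-clock days needed to reach target_ns of simulated time."""
    return target_ns / ns_per_day

def implied_rate(target_ns, days):
    """Simulated nanoseconds per day implied by a run of the given length."""
    return target_ns / days

# Quoted figure: over 17,000 ns/day for a 23,558-atom protein-water system.
days_at_quoted_rate = days_to_simulate(NS_PER_MS, 17_000)

# Summary figure: one millisecond took Anton about 100 days.
rate_from_summary = implied_rate(NS_PER_MS, 100)

print(f"{days_at_quoted_rate:.1f} days at the quoted rate")
print(f"{rate_from_summary:.0f} ns/day implied by the summary")
```

At the quoted rate a millisecond would take a bit under 59 days, while the 100-day figure implies about 10,000 ns/day. Nothing in the thread says why the two differ; a different system size or setup is one possibility, but that is a guess.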
This research is extremely important for finding new drugs, and therefore I applaud the originators of the project, especially D.E. Shaw, who apparently also put a lot of funding into it. I wish more (rich) people put their money into such immensely useful projects. It is not just a noble thing to do, it is also smart, since we all could one day benefit from this kind of research.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
This has been the promise of computer simulation - "in silico" drug design - for decades. It hasn't panned out. And I say this as someone who makes a living doing exactly what these folks have done. High throughput bench work is far more efficient, time and money wise, than computer simulation. Hard to say when or if that will change.
46 & 2
So where do I send my cheque?
"Did you exchange a walk on part in the war for a lead role in a cage?"
I love it how simple-minded tech geeks, usually IT guys, programmers and even people who should know better like electrical engineers, think that the internet is more complex than the human body... Here we have ONE molecule, simulated for a lousy millisecond, and it took more than THREE MONTHS. How many molecules in the human body? Our body is performing a truly staggering amount of computation. Actually, every bit of matter is, everything including "inanimate" matter, it's really all the same. We just happen to be more complex.
I wonder how accurate this is? (Information Processing in Human Body) And when we do start understanding how the huge number of molecules in one cell behaves, we can maybe start understanding how the huge number of cells becomes US. Including things like diseases and aging. Once that is done, hello life-extension! Isn't that more interesting than tin cans floating in a vacuum? I think so. But then again, I'm crazy; I think wanting to have more time is the same as wanting to have more space. It's humans exploring the universe, it's just that we need to live longer than we do if we think we really are going to explore the universe. After all, can a mayfly explore a city? It'll be dead in three days. That's us, in space.
That's a little unfair to Folding@Home. Shaw has a lot of resources to pour into this project - he's lured faculty members away from universities to work for him instead and has the equivalent of several large labs' worth of advanced researchers. He also has an immensely larger budget than most non-profit labs, and he's self-employed so he doesn't have to answer to granting agencies or tenure committees. I think what he's doing is great but he's really one of the only people who could have pulled this off. It's difficult to know what approach will work best in advance, and both Shaw and Vijay Pande have been very innovative in approaching the problem from completely different angles.
By the way, this approach has been tried before with less stellar results - I'm thinking of the MD-GRAPE project in Japan. You're also assuming that every problem is equally well suited towards custom ASICs, but actually, molecular dynamics is far easier to do this with than many other methods. For instance, Rosetta (Rosetta@Home and Fold.It) is doing structure prediction, not folding, using a mostly statistics-based energy function and Monte Carlo sampling, and this isn't something you can trivially offload to a specialized chip. In that case, distributed computing is by far the most efficient solution.
From experimental evidence we know the folding rates of certain proteins at various temperatures, we know the flow rates for ion channels, and so on. A lot of these macro-properties can't be tested in the short simulations that current computers can do, but they can easily be reached by the DE Shaw machine.
Good thing F@H runs on the GPU, which is many times faster than the CPU at these operations.
Also, don't forget what it takes to build a supercomputer capable of doing this, and that resources put into building supercomputers are then not available for the consumer market. Distributing this stuff allows for a compromise between absolute best performance and letting people have powerful computers at home.
The physicists have been doing protein folding for decades now. We know the basics of the physics but it requires a lot of computer power to perform useful simulations. This article is probably just another small step along the way. But if they cannot explain exactly what they have done that is new, then they probably haven't done so much that is new.
Actually, Folding@Home can also simulate these time scales by means of Markov state models. The trajectory is pieced together out of data collected from many short simulations, whereas the Anton trajectory is generated from a single MD run, but in practice that distinction is usually irrelevant. Protein dynamics are stochastic, so for any time scale longer than about 1 ns, both approaches give equally "realistic" or "valid" trajectories.
That's not to criticize Anton. It's an amazing piece of hardware and they're doing amazing work with it. But of the two approaches, Markov state models are probably going to prove more valuable in the end. They make more efficient use of whatever computational resources you have available, they give more insight into the structure of the folding pathway, and they can be run on commodity hardware that many more people have access to. David Shaw has even admitted they'll eventually have to start using them. By the third generation of Anton, he expects to have hit limits on how far they can parallelize a single MD run, so Markov state models will be the only way they can keep adding processing power.
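For anyone who hasn't seen a Markov state model, the "pieced together out of many short simulations" idea above can be sketched in a few lines. This is a toy illustration only, not Folding@Home's actual pipeline: it assumes the short runs have already been discretized into a handful of states, and the function names, the three-state labeling, and the sample data are all invented for the example. Real MSM work also involves clustering structures and validating the lag time, which is skipped here.

```python
import random

def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts observed at a fixed lag (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Pair each frame with the frame `lag` steps later in the same run.
        for src, dst in zip(traj[:-lag], traj[lag:]):
            counts[src][dst] += 1
    matrix = []
    for state, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption for this sketch: a state never seen as a source stays put.
            matrix.append([1.0 if j == state else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix

def sample_trajectory(matrix, start, n_steps, rng):
    """Stitch together one long state sequence by sampling the model."""
    states = [start]
    for _ in range(n_steps):
        weights = matrix[states[-1]]
        states.append(rng.choices(range(len(weights)), weights=weights)[0])
    return states

# Toy data: short, already-discretized runs
# (0 = unfolded, 1 = intermediate, 2 = folded; labels are hypothetical).
short_runs = [[0, 0, 1, 1], [1, 2, 2, 0]]
T = estimate_transition_matrix(short_runs, n_states=3)
long_run = sample_trajectory(T, start=0, n_steps=50, rng=random.Random(0))
print(T)
```

The long sequence is far longer than any single input run, which is the whole point: the short runs only have to be long enough to observe individual transitions.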
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
I didn't RTFA since I've already heavily researched these guys. D.E Shaw is the kind of billionaire I would be.
Summary: The actual atomic interaction equations are simulated very fast. Quickly distributing the results of a local interaction to the rest of the simulation is hard.
http://www.deshawresearch.com/publications/Simulation%20and%20Embedded%20Software%20Development%20for%20Anton,%20a%20Parallel%20Machine%20with%20Heterogeneous%20Multicore%20ASICs.pdf
http://cacs.usc.edu/education/cs653/Shaw-msMD-SC09.pdf
The fact that it takes 100 days to simulate a single millisecond of molecular activity hints at the potential speed of future computers. I know the actual process isn't precisely analogous to the computation, but I suspect there are more elegant ways to compute than the methods we use today. Our brains "outperform" the best supercomputers, with energy requirements supplied by a bowl of oatmeal for a few hours of activity. The mind boggles at the possibilities.
This and no other is the root from which a tyrant springs; when first he appears as a protector - Plato (c. 428 to 348 BC)
For instance, Rosetta (Rosetta@Home and Fold.It) is doing structure prediction, not folding, using a mostly statistics-based energy function and Monte Carlo sampling, and this isn't something you can trivially offload to a specialized chip. In that case, distributed computing is by far the most efficient solution.
Right on the money. Because most of its applications use Monte Carlo as you mention, Rosetta requires lots of independent trajectories anyway. It's trivially parallelizable (embarrassingly parallel if you prefer) so distributed computing is the solution we use for pretty much everything. The Baker lab has the BOINC Rosetta@home and the rest of us use university-size clusters.
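To show what "embarrassingly parallel" means in practice, here is a minimal sketch of independent Metropolis Monte Carlo trajectories. It is not Rosetta code: the double-well `energy` function, the parameter values, and the function names are placeholders chosen for the example. The only point is that each run depends on nothing but its own seed, so the runs can be handed to any kind of worker.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def energy(x):
    """Toy double-well energy; a stand-in for a real scoring function."""
    return (x * x - 1.0) ** 2

def run_trajectory(seed, n_steps=2000, step=0.5, kT=0.1):
    """One independent Metropolis Monte Carlo run; returns the lowest energy seen."""
    rng = random.Random(seed)          # private RNG, so runs share no state
    x = rng.uniform(-3.0, 3.0)
    e = energy(x)
    best = e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        delta = e_new - e
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, e = x_new, e_new
            best = min(best, e)
    return best

seeds = range(8)
serial = list(map(run_trajectory, seeds))
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(run_trajectory, seeds))
print(serial == threaded)
```

The threaded and serial results match because no trajectory talks to any other. Threads are used here only to show the structure; in CPython a real speedup for CPU-bound work needs separate processes or separate machines, which is exactly the niche a cluster or a BOINC-style project fills.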