Slashdot Mirror


Google's DeepMind Predicts 3D Shapes of Proteins (theguardian.com)

Google's DeepMind is using an AI program, called AlphaFold, to predict the 3D shapes of proteins, the fundamental molecules of life. "DeepMind set its sights on protein folding after its AlphaGo program famously beat Lee Sedol, a champion Go player, in 2016," reports The Guardian. The company says "It's never been about cracking Go or Atari, it's about developing algorithms for problems exactly like protein folding." From the report: DeepMind entered AlphaFold into the Critical Assessment of Structure Prediction (CASP) competition, a biennial protein-folding olympics that attracts research groups from around the world. The aim of the competition is to predict the structures of proteins from lists of their amino acids which are sent to teams every few days over several months. The structures of these proteins have recently been cracked by laborious and costly traditional methods, but not made public. The team that submits the most accurate predictions wins. On its first foray into the competition, AlphaFold topped a table of 98 entrants, predicting the most accurate structure for 25 out of 43 proteins, compared with three out of 43 for the second placed team in the same category.
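The report doesn't say how "most accurate" is scored. One of the main metrics CASP assessors use is GDT_TS: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the fraction of residues whose predicted position lies within that cutoff of the experimental one. The sketch below is a simplified version that assumes the prediction has already been superimposed on the reference structure (the real GDT calculation searches over many superpositions), and the coordinates are made up for illustration.

```python
import math

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in Angstrom used by GDT_TS


def gdt_ts(predicted, reference):
    """Simplified GDT_TS on a 0-100 scale.

    Assumes `predicted` is already superimposed on `reference`, with one
    C-alpha coordinate per residue in the same order. Real GDT searches over
    superpositions to maximise each fraction; this sketch skips that step.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same number of residues")
    deviations = [math.dist(p, r) for p, r in zip(predicted, reference)]
    n = len(deviations)
    # Fraction of residues within each cutoff of their reference position.
    fractions = [sum(d <= cutoff for d in deviations) / n for cutoff in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Hypothetical 4-residue example: per-residue errors of 0.5, 1.5, 3.0 and 10 Angstrom.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]
print(gdt_ts(predicted, reference))
```

A perfect prediction scores 100; a residue that is badly misplaced (more than 8 Å off) contributes nothing at any cutoff.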

To build AlphaFold, DeepMind trained a neural network on thousands of known proteins until it could predict 3D structures from amino acids alone. Given a new protein to work on, AlphaFold uses the neural network to predict the distances between pairs of amino acids, and the angles between the chemical bonds that connect them. In a second step, AlphaFold tweaks the draft structure to find the most energy-efficient arrangement. The program took a fortnight to predict its first protein structures, but now rattles them out in a couple of hours.
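The two-step idea above (predict pairwise distances, then adjust a draft structure until it agrees with them) can be illustrated with generic distance geometry. This is a minimal sketch, not DeepMind's code: the "predicted" distances are simply measured from a made-up toy chain, the predicted bond angles are ignored, and plain squared error stands in for whatever energy function AlphaFold actually optimises. The function names are hypothetical.

```python
import math
import random


def pairwise_distances(coords):
    """Euclidean distance between every pair of points (a full n x n matrix)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def stress(coords, target):
    """Sum of squared errors between current and target distances, each pair counted once."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = math.dist(coords[i], coords[j]) - target[i][j]
            total += diff * diff
    return total


def fold_from_distances(target, steps=5000, lr=0.01, seed=0):
    """Gradient descent on 3D coordinates so their pairwise distances match `target`.

    Returns the final coordinates and the stress before and after optimisation.
    """
    n = len(target)
    rng = random.Random(seed)
    # Random draft structure: n points in a small cube.
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    losses = [stress(coords, target)]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = max(math.dist(coords[i], coords[j]), 1e-9)  # avoid divide-by-zero
                # d/dx_i of (d - t)^2 is 2 (d - t) (x_i - x_j) / d; x_j gets the opposite sign.
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grad[i][k] += g
                    grad[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grad[i][k]
    losses.append(stress(coords, target))
    return coords, losses


# Hypothetical toy chain whose distances stand in for the network's predictions.
true_coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 1.3, 0.0], [1.8, 2.5, 1.0], [0.5, 2.9, 1.8]]
target = pairwise_distances(true_coords)
coords, losses = fold_from_distances(target)
print(f"stress before: {losses[0]:.3f}, after: {losses[-1]:.6f}")
```

Distances don't change under rotation, translation or reflection, so the recovered coordinates can match the toy chain's shape without matching its exact position or handedness; a real pipeline needs extra information (such as the angles mentioned above) to pin that down.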

5 of 51 comments

  1. They took our jobs! by Anonymous Coward · · Score: 3, Interesting

    DeepMind is moving out of the realm of curiosity (games) to things that employ people with a high degree of specialization. Google's team of 10 people produced a better result with 2 years of work than the entire academic field has been able to produce in the last 30. Granted, they had prior work to inform them. Anyway, this is interesting because this kind of development can put the PhDs in my lab out of a job - and they thought the truck drivers would be first to get automated!

    1. Re:They took our jobs! by bill_mcgonigle · · Score: 2

      You think that's bad? Radiologists are already significantly better with AI, and give it a few more iterations and you'll only need a few of the best radiologists to handle the edge cases; then it's all machine learning on outliers.

      Sorry about that fellowship you did - back to primary care with you - don't forget to swap out that BMW for a Prius.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    2. Re:They took our jobs! by ragahast · · Score: 2

      Google's team of 10 people produced a better result with 2 years of work than the entire academic field has been able to produce in the last 30

      That's not a correct reading of the results. First, previous efforts are based on putative understanding about how proteins fold. Obviously, this understanding is incomplete - or the physics-based methods would perform better. (Even statistical potentials like those in Rosetta are physics-based in important ways.) Second, DeepMind isn't even on the radar in the server component of CASP. The server competition is intrinsically more difficult because it requires robust software that isn't highly dependent on user parameters. Rosetta, for example, is ~20th in the general competition and 4th in the server competition.

      Finally, DeepMind has not demonstrated the historical performance of their approach. They should see how well novel protein folds solved after, e.g., 2005 are predicted using only structures solved before 2005 to train. To the extent that Rosetta works, it works in such an environment. In fact, one of its first results was a novel fold (Top7).

      --
      .:Semper Absurda:.
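The retrospective test ragahast proposes is a temporal holdout: train only on structures solved before a cutoff year, then evaluate only on later structures whose fold never appears in the training set. A minimal sketch of that split, assuming each structure carries a solve year and a fold label; the `Structure` record, the IDs and the fold labels are all hypothetical, and 2005 is just the commenter's example cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    pdb_id: str       # hypothetical identifier
    year_solved: int  # year the experimental structure was solved
    fold: str         # fold classification label


def temporal_split(structures, cutoff_year):
    """Train on structures solved before `cutoff_year`; test only on structures
    solved at or after it whose fold is absent from the training set."""
    train = [s for s in structures if s.year_solved < cutoff_year]
    seen_folds = {s.fold for s in train}
    test = [
        s for s in structures
        if s.year_solved >= cutoff_year and s.fold not in seen_folds
    ]
    return train, test


# Hypothetical data: toy-3 reuses a fold seen before the cutoff, so it is excluded.
structures = [
    Structure("toy-1", 1998, "fold-A"),
    Structure("toy-2", 2003, "fold-B"),
    Structure("toy-3", 2007, "fold-A"),
    Structure("toy-4", 2009, "fold-C"),
]
train, test = temporal_split(structures, cutoff_year=2005)
print([s.pdb_id for s in train], [s.pdb_id for s in test])
```

Filtering on fold, not just on date, is what makes this a test of genuinely novel folds rather than of recognising structures similar to ones already in the training set.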
  2. Re:Research Paper Needed by rkordmaa · · Score: 2

    It's not a physics model, well duh. The other competitors were physics models, though, and they came second no matter how much compute power they happened to have. Sure, physics models might be able to solve many different problems with fewer modifications, but if AI can be trained to solve a specific type of problem more efficiently than physics solvers could... Well, a more efficient solution is still a more efficient solution, even if it's not an answer to the ultimate question of life, the universe, and everything. If you can solve a problem better than anyone, then that's hardly a parlor trick.

  3. Re:Research Paper Needed by rl117 · · Score: 2

    Protein structure is intrinsically a computational chemistry / physics modelling problem. It requires modelling of solvent and solute interactions with the protein structure, van der Waals, dipole, ionic and other electrostatic interactions, hydrophobic/hydrophilic interactions with the solvent and itself (and lipid bilayer for membrane proteins), minimising free energy of the whole structure, of which there may be multiple stable variants under different conditions, and interactions between different subunits as well as multimeric associations. It's hard, and it's computationally expensive. I used to work with a team of such modellers, all hardcore physics and maths people, in the computational biology department where I worked. "AI" might be able to recognise certain patterns. And it might be able to predict certain structural motifs with a reasonable degree of accuracy. But it will always be limited by the training dataset, as the parent tried to explain. It's not modelling, it's guessing by interpolation. There's no intelligence; it can't extrapolate to make predictions which it hasn't been trained specifically for. And there's no guarantee that the structures it predicts will be stable or valid in any way. "AI" of this sort isn't magic, and it's certainly not intelligent. It's questionable whether it's even a valid technique for good science, because science and data analysis should be understandable, not a black box.
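To make two of the interaction terms rl117 lists concrete, here is a minimal sketch of a pairwise non-bonded potential energy: a 12-6 Lennard-Jones term for van der Waals interactions plus a Coulomb term for electrostatics, with Lorentz-Berthelot combining rules for mixed atom pairs. It is only an illustration under stated assumptions: the atom parameters are hypothetical, the Coulomb constant is an approximate value in kcal·Å/(mol·e²), and it omits bonded terms, solvent, hydrophobic effects and entropy, so it computes a potential energy, not the free energy the comment refers to.

```python
import math
from dataclasses import dataclass
from itertools import combinations

COULOMB_K = 332.06  # approximate Coulomb constant in kcal*Angstrom/(mol*e^2)


@dataclass(frozen=True)
class Atom:
    xyz: tuple      # position in Angstrom
    charge: float   # partial charge in units of e
    epsilon: float  # Lennard-Jones well depth in kcal/mol (hypothetical values below)
    sigma: float    # Lennard-Jones zero-crossing distance in Angstrom


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: zero at r = sigma, minimum -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj, dielectric=1.0):
    """Coulomb energy between two point charges, screened by a uniform dielectric."""
    return COULOMB_K * qi * qj / (dielectric * r)


def nonbonded_energy(atoms, dielectric=1.0):
    """Sum of LJ + Coulomb energies over all atom pairs (no cutoffs, no exclusions)."""
    total = 0.0
    for a, b in combinations(atoms, 2):
        r = math.dist(a.xyz, b.xyz)
        sigma = 0.5 * (a.sigma + b.sigma)          # Lorentz rule
        eps = math.sqrt(a.epsilon * b.epsilon)     # Berthelot rule
        total += lennard_jones(r, eps, sigma) + coulomb(r, a.charge, b.charge, dielectric)
    return total


# Hypothetical pair of oppositely charged atoms 3 Angstrom apart.
cation = Atom((0.0, 0.0, 0.0), 1.0, 0.2, 3.0)
anion = Atom((3.0, 0.0, 0.0), -1.0, 0.2, 3.0)
print(nonbonded_energy([cation, anion]))
```

Even this toy version scales with the number of atom pairs, which hints at why the full physics-based treatment described above gets computationally expensive for real proteins in solvent.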