Google's DeepMind Predicts 3D Shapes of Proteins (theguardian.com)

Posted by BeauHD on Monday December 3, 2018 @09:00PM from the medical-progress dept.

Google's DeepMind is using an AI program, called AlphaFold, to predict the 3D shapes of proteins, the fundamental molecules of life. "DeepMind set its sights on protein folding after its AlphaGo program famously beat Lee Sedol, a champion Go player, in 2016," reports The Guardian. The company says "It's never been about cracking Go or Atari, it's about developing algorithms for problems exactly like protein folding." From the report: DeepMind entered AlphaFold into the Critical Assessment of Structure Prediction (CASP) competition, a biennial protein-folding olympics that attracts research groups from around the world. The aim of the competition is to predict the structures of proteins from lists of their amino acids, which are sent to teams every few days over several months. The structures of these proteins have recently been cracked by laborious and costly traditional methods, but not made public. The team that submits the most accurate predictions wins. On its first foray into the competition, AlphaFold topped a table of 98 entrants, predicting the most accurate structure for 25 out of 43 proteins, compared with three out of 43 for the second-placed team in the same category.

To build AlphaFold, DeepMind trained a neural network on thousands of known proteins until it could predict 3D structures from amino acids alone. Given a new protein to work on, AlphaFold uses the neural network to predict the distances between pairs of amino acids, and the angles between the chemical bonds that connect them. In a second step, AlphaFold tweaks the draft structure to find the most energy-efficient arrangement. The program took a fortnight to predict its first protein structures, but now rattles them out in a couple of hours.
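
As a rough illustration of the second step described above, the sketch below refines a noisy draft structure by gradient descent until its pairwise distances match a set of target distances. It is a toy under stated assumptions, not DeepMind's method: the target distances stand in for the network's predictions, the squared-mismatch "stress" stands in for whatever energy function AlphaFold actually minimizes, and the helix coordinates, function names, step count and learning rate are all invented for the example.

```python
import math
import random


def pairwise_distances(coords):
    """Return the full matrix of Euclidean distances between points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def stress(coords, target):
    """Sum of squared differences between current and target distances."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
    return total


def refine(coords, target, steps=2000, lr=0.01):
    """Plain gradient descent on the stress function (hypothetical helper)."""
    coords = [list(p) for p in coords]
    n = len(coords)
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(coords[i], coords[j])
                if d == 0.0:
                    continue  # gradient is undefined for coincident points
                coeff = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = coeff * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords


# Made-up "true" structure: eight points along a helix.
true_coords = [[math.cos(i), math.sin(i), 0.5 * i] for i in range(8)]
# Assumption: these distances play the role of the network's predictions.
target = pairwise_distances(true_coords)

# Draft structure: the true coordinates plus random noise.
random.seed(0)
draft = [[c + random.uniform(-0.3, 0.3) for c in p] for p in true_coords]
refined = refine(draft, target)

print("stress before:", stress(draft, target))
print("stress after: ", stress(refined, target))
```

Distances alone fix a shape only up to rotation, translation and mirror image, so the refined coordinates need not coincide with `true_coords` even when the stress is close to zero.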

51 comments

They took our jobs! by Anonymous Coward · 2018-12-03 21:25 · Score: 3, Interesting

DeepMind is moving out of the realm of curiosity (games) to things that employ people with a high degree of specialization. Google's team of 10 people produced a better result with 2 years of work than the entire academic field has been able to produce in the last 30. Granted, they had prior work to inform them. Anyway, this is interesting because this kind of development can put the PhDs in my lab out of a job - and they thought the truck drivers would be first to get automated!
1. Re:They took our jobs! by rkordmaa · 2018-12-04 00:28 · Score: 1
  
  That's a ridiculous position to take. Science is about working on things we are somewhat aware of but don't yet understand properly, and every question you answer opens up ten new ones. Nobody cracks protein structures for the sake of cracking protein structures; these problems are solved because they are prerequisites to solving other problems, like looking for an Alzheimer's cure. It's not about solving the riddle, it's about the things you can do once you have solved the riddle.
2. Re:They took our jobs! by 110010001000 · 2018-12-04 00:37 · Score: 0
  
  Yeah, no. Protein folding computer programs have been around for years. It wasn't done by hand. More hype from the AI nutters.
3. Re:They took our jobs! by rkordmaa · 2018-12-04 00:54 · Score: 1
  
  Well no, it's not hype. They aced the competition by a good margin, clearly they are doing something right. It's not like the other competitors are solving these problems manually.
4. Re:They took our jobs! by bill_mcgonigle · 2018-12-04 01:11 · Score: 2
  
  You think that's bad? Radiologists are already significantly better with AI and give it a few more iterations and you'll only need a few of the best radiologists to handle the edge cases, then it's all machine learning on outliers.
  Sorry about that fellowship you did - back to primary care with you - don't forget to swap out that BMW for a Prius.
  
  --
  My God, it's Full of Source!
  OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
5. Re:They took our jobs! by Anonymous Coward · 2018-12-04 01:30 · Score: 0
  
  Since when was 25 out of 43 "aced"?
  When I was in school that would have been called "failed". Just because it got 22 more correct than second place doesn't change the fact that it got less than 60% correct.
  All this really tells us is that the expensive and laborious methods for determining protein folding are still necessary, as the simulation methods aren't very good.
6. Re:They took our jobs! by Anonymous Coward · 2018-12-04 02:10 · Score: 0
  
  google deep mind has IQ of 56, I'm sorry for those proteins....
7. Re:They took our jobs! by rkordmaa · 2018-12-04 02:26 · Score: 1
  
  Heh, sure, Google's result obviously doesn't mean that the work is done and over with, with nothing else there worth bothering with. But it's still a much better tool than what was available before for the given task. And that's just the first result; AlphaGo took quite a while until it was capable of conclusively beating the world master. I would expect their results for protein folding to improve, especially if they combine it with existing models to cover any blind spots.
8. Re:They took our jobs! by 140Mandak262Jamuna · 2018-12-04 02:49 · Score: 1
  
  Come on, you can do better than this. I finally looked you up and you got 2^8 +5 comments. You can do much better than this. Why obsess with AI nutters, space idiots and Tesla fanbois?
  Who is a bigger idiot, these nutjobs or the one with the mission to correct every nutjob on the net? Just chill. Why waste good time chasing these nutjobs.
  Best of luck buddy.
  
  --
  sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
9. Re:They took our jobs! by Anonymous Coward · 2018-12-04 02:57 · Score: 0
  
  Google's result is definitely interesting. In particular, it suggests that the protein folding problem might possibly not be completely hopeless.
  But competitions like CASP are a very low bar. Very few labs are seriously doing protein structure prediction because the overwhelming consensus is that predicting the structure of a truly novel protein sequence (where there are no known structures of similar distantly related proteins) is impossible. Occasionally, a lab might sacrifice a grad student to a protein structure prediction project (kind of like buying a lottery ticket): you know that the grad student will certainly fail but you don't really like the grad student anyway and you like the fantasy of solving the protein folding problem. So, these days, the only people (trying) to do protein structure prediction are a few grad students that nobody likes and then some cranks and crackpots at the fringes of science with enough tenure to not need to make any real scientific discoveries.
  Google "winning" at the CASP protein structure prediction competition is like some teenagers "successfully" beating up a drunk homeless guy. Not a high bar at all even slightly. But if it turns out that Google is really onto something - that it's really possible to predict the structures of truly novel protein sequences - then this is a huge discovery and very much worthy of multiple Nobel prizes.
10. Re:They took our jobs! by Anonymous Coward · 2018-12-04 06:01 · Score: 0
  
  And they sucked by comparison. Try not to call people who are smarter than you "nutters". You're showing your ass in public.
11. Re:They took our jobs! by Anonymous Coward · 2018-12-04 06:03 · Score: 0
  
  The people that he calls "nutjobs" are anything but. Limited minds misinterpret the world around them. Limited minds with poor personalities lash out at the people who are smarter than them.
12. Re:They took our jobs! by ceoyoyo · 2018-12-04 06:21 · Score: 1
  
  More like swap out that Maserati for a BMW. The primary care types do pretty well too, but it's hard to match the throughput of a good radiologist.
  Problem with the primary care physicians is that the part of their job that's not vulnerable to machine learning is done better by nurses. Surgeons should have job security for a while.
13. Re:They took our jobs! by 140Mandak262Jamuna · 2018-12-04 06:58 · Score: 1
  
  You are an anonymous Coward. He has level 35 achievements. Not going to treat him like just another troll. Actually regret calling him a dimwit. He will come around, once he sorts out his demons.
  
  --
  sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
14. Re:They took our jobs! by ragahast · 2018-12-04 07:06 · Score: 2
  
  Google's team of 10 people produced a better result with 2 years of work than the entire academic field has been able to produce in the last 30
  That's not a correct reading of the results. First, previous efforts are based on putative understanding about how proteins fold. Obviously, this understanding is incomplete - or the physics based methods would perform better. (Even statistical potentials like in Rosetta are physics based in important ways). Second, DeepMind isn't even on the radar in the server component of CASP. The server competition is intrinsically more difficult because it requires robust software that isn't highly dependent on user parameters. Rosetta for example is ~20th in the general competition and 4th place in the server competition.
  Finally, DeepMind has not demonstrated the historical performance of their approach. They should see how well novel protein folds solved after e.g. 2005 are predicted using only structures solved before 2005 to train. To the extent that Rosetta works, it works in such an environment. In fact, one of its first results was a novel fold (Top7).
  
  --
  .:Semper Absurda:.
15. Re:They took our jobs! by nospam007 · 2018-12-04 08:25 · Score: 1
  
  "Since when was 25 out of 43 "aced"?
  When I was in school that would have been called "failed". Just because it got 22 more correct than second place doesn't change the fact that it got less than 60% correct."
  Second place had 7% correct. The winner got more than eight times as many correct answers as second place.
  That's not bad IMHO.
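
For reference, the figures being argued over can be worked out directly from the two counts in the summary (25 and 3 out of 43). The variable names are only for illustration.

```python
total = 43
first, second = 25, 3  # most-accurate predictions: AlphaFold vs. second place

first_pct = 100 * first / total               # share of targets won by first place
second_pct = 100 * second / total             # share of targets won by second place
ratio = first / second                        # how many times as many wins
pct_more = 100 * (first - second) / second    # the same gap as "percent more"

print(f"first place:  {first_pct:.1f}%")
print(f"second place: {second_pct:.1f}%")
print(f"ratio:        {ratio:.2f}x, i.e. about {pct_more:.0f}% more")
```

Both readings hold at once: the winner was best on fewer than 60% of the targets, and it was best on more than eight times as many targets as the runner-up.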
16. Re:They took our jobs! by Anonymous Coward · 2018-12-04 11:49 · Score: 0
  
  And the pocket calculator has crunched more numbers in a few decades than all the computers of human history.
  * "computer" used to mean someone (usually a girl) who sat at a desk and did the actual pencil + paper arithmetic
TRUMP says this is China, taking our jerbs by Anonymous Coward · 2018-12-03 21:28 · Score: 0

and for once, he is right. Am I right!
Research Paper Needed by BSalita · 2018-12-03 21:28 · Score: 1

I'm looking forward to the research paper to address key questions. What resources (training, inference) did Google use and how do they compare to the competition? Was this mostly a machine learning problem with big data, or a big data problem with some machine learning? Is there a GitHub yet?
1. Re:Research Paper Needed by MrMr · 2018-12-03 22:18 · Score: 1
  
  Normally the CASP proceedings appear more than a year after the meeting. There is some info on their own website: https://deepmind.com/blog/alph...
  An interesting question is the claim that they generate shapes ab initio, but using a neural network. I wonder how much the network has been trained to recognize existing (evolutionary dependent) protein families and their patterns vs. a new random sequence folder. The former may be just as useful in practice but may teach us a bit less about the mechanics of folding.
  Looking forward to the publication.
2. Re:Research Paper Needed by Anonymous Coward · 2018-12-03 23:36 · Score: 0
  
  How do research papers work when the researcher doesn't know how it works?
  The AI has the logic not the researcher. They don't know how it does it, they only know in generic ways how their neural net works.
  A lot of these DNN papers are going to be boilerplate NN configuration data and lists of training sets.
3. Re: Research Paper Needed by Anonymous Coward · 2018-12-03 23:53 · Score: 0
  
  The problem here is that what we generally want to do is simulate proteins and their behaviour. The simulation programs are complex, the models are highly parametrised and they need lots of computer power. So, the big question is how good are they? The problem is that there is a paucity of experimental data for direct comparison.
  The point of this challenge isn't so much that we need to predict the fold of random proteins - that's of limited interest. The point is that if your physics-based program can get the right answer it's a good sign that the approximations that you've had to make are reasonable. From this point of view the Google solution is completely useless - it gives you the answer, but we were interested in the process that created the answer and not the answer itself.
  If the Google crowd want to do something actually useful they should investigate predicting small molecule crystal structures instead.
4. Re:Research Paper Needed by rkordmaa · 2018-12-04 00:30 · Score: 1
  
  Does it matter how NN gets the result? You only need to check if it's correct or not.
5. Re:Research Paper Needed by Anonymous Coward · 2018-12-04 00:37 · Score: 0
  
  Yes. Because science is in the business of making predictions, and NNs do not work outside the bounds of their training set. As an exercise for the reader: take some 1-dimensional data that randomly samples a sine function, then train a NN to represent that data. A simple Linear-ReLU-Linear-ReLU-Linear network will do fine. You'll see that the trained NN function is just a bunch of line-segment approximations of the true function. You'll also see that once you move beyond the bounds of the training data, it diverges into something useless.
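
The exercise above can be sketched without any training at all, which keeps it deterministic. The code below is a simplification, not the Linear-ReLU-Linear-ReLU-Linear network from the comment: it builds a single hidden ReLU layer (plus an affine term) whose weights are set by hand so that it passes exactly through evenly spaced samples of a sine wave on [0, 2*Pi]. The sample count, the even spacing and the function names are assumptions made for the example. A trained network would land on different line segments, but it shares the property on display: the function is piecewise linear, so outside the sampled range it just continues along its last segment.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_interpolant(xs, ys):
    """Return f(x) = y0 + s0*(x - x0) + sum_i (s_i - s_{i-1}) * relu(x - x_i).

    A one-hidden-layer ReLU network with hand-set weights that passes through
    every (x, y) sample. xs must be strictly increasing.
    """
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    knots = xs[1:-1]  # one hidden unit per interior sample point
    jumps = [slopes[i] - slopes[i - 1] for i in range(1, len(slopes))]

    def f(x):
        out = ys[0] + slopes[0] * (x - xs[0])
        for knot, jump in zip(knots, jumps):
            out += jump * relu(x - knot)  # each unit bends the line at its knot
        return out

    return f


# "Training data": 41 evenly spaced samples of sin(x) on [0, 2*pi].
n = 40
xs = [2 * math.pi * i / n for i in range(n + 1)]
ys = [math.sin(x) for x in xs]
f = build_relu_interpolant(xs, ys)

# Interpolation: worst-case error on a fine grid inside the sampled range.
grid = [2 * math.pi * i / 1000 for i in range(1001)]
in_range_error = max(abs(f(x) - math.sin(x)) for x in grid)

# Extrapolation: error far outside the sampled range.
far_x = 100 * math.pi
far_error = abs(f(far_x) - math.sin(far_x))

print("max error inside [0, 2*pi]:", in_range_error)
print("error at 100*pi:           ", far_error)
```

Inside the sampled range the line segments track the sine closely; at 100*Pi the network is still following the slope of its final segment while the true function has kept oscillating between -1 and 1.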
6. Re: Research Paper Needed by Anonymous Coward · 2018-12-04 00:56 · Score: 0
  
  Thought experiment guy will no doubt be doing crystals shortly given there's a crystal on the front of a circular polarizer filter he mentioned, that crystal slows light in one axis different from the other. The only known field in light is electric oscillating so the interaction with that crystal is electric oscillation.... probably his f2 donuts lining up in 1 axis and f1s in another or some such....see his proton and electron models..... but he's probably struggling with computer power right now and needs to do another reduction to go even attempt molecules. Just guessing. Man that guy is a kook. Hope he doesn't explain diffraction in some simple easy to understand way that also applies to gravitational lensing next week. The bastard. Man I hate that troll.
7. Re:Research Paper Needed by rkordmaa · 2018-12-04 00:59 · Score: 1
  
  They had the best predictions in the entire competition though, by a good margin too.
8. Re:Research Paper Needed by Anonymous Coward · 2018-12-04 01:02 · Score: 0
  
  Because it’s only interpolating within the bounds of its training set. Give it data well outside and it will fail miserably.
9. Re:Research Paper Needed by rkordmaa · 2018-12-04 01:14 · Score: 1
  
  Well, duh. It's trained to predict protein folding, it wouldn't be much good for that if it were trained to play go. But it clearly outperforms the physics model based solutions. Which in turn outperform what you can pull off by scratching your head at the problem. It still wins, which is what the competition is about.
10. Re:Research Paper Needed by Anonymous Coward · 2018-12-04 01:19 · Score: 0
  
  I clearly have to explain this to you because you are a retard. If you have 1d data at 1,3,5,6,8,10. You can make a fairly accurate predictions at 2,4,7,9. We call that interpolation. What we cannot predict is points at 14,20,200,-50. Because that requires extrapolation.
  NNs can only do interpolation. They cannot by their very nature do extrapolation. And no, Google's parlor trick does not outperform a full physics model.
11. Re: Research Paper Needed by rkordmaa · 2018-12-04 01:22 · Score: 1
  
  More accurate predictions are useful for verifying actual protein structures though, which you need to know to make simulations about their behaviours. Solving more protein structures very much does have value on its own. And just because they tried their hand at protein structures doesn't mean they won't do crystal structures next.
12. Re:Research Paper Needed by rkordmaa · 2018-12-04 02:09 · Score: 2
  
  It's not a physics model, well duh. The other competitors were physics models though and they came second no matter how much compute power they happened to have. Sure, physics models might be able to solve many different problems with lesser modifications, but if AI can be trained to solve a specific type of problem more efficiently than physics solvers could... Well, a more efficient solution is still a more efficient solution, even if it's not an answer to the ultimate question of life, the universe, and everything. If you can solve a problem better than anyone, then that's hardly a parlor trick.
13. Re:Research Paper Needed by rl117 · 2018-12-04 03:25 · Score: 2
  
  Protein structure is intrinsically a computational chemistry / physics modelling problem. It requires modelling of solvent and solute interactions with the protein structure, van der Waals, dipole, ionic and other electrostatic interactions, hydrophobic/hydrophilic interactions with the solvent and itself (and lipid bilayer for membrane proteins), minimising free energy of the whole structure, of which there may be multiple stable variants under different conditions, and interactions between different subunits as well as multimeric associations. It's hard, and it's computationally expensive. I used to work with a team of such modellers, all hardcore physics and maths people, in the computational biology department I used to work in. "AI" might be able to recognise certain patterns. And it might be able to predict certain structural motifs with a reasonable degree of accuracy. But it will always be limited by the training dataset as the parent tried to explain. It's not modelling, it's guessing by interpolation. There's no intelligence; it can't extrapolate to make predictions which it hasn't been trained specifically for. And there's no guarantee that the structures it predicts will be stable or valid in any way. "AI" of this sort isn't magic, and it's certainly not intelligent. It's questionable that it's even a valid technique for good science. Because science and data analysis should be understandable, not a black box.
14. Re:Research Paper Needed by rockmuelle · 2018-12-04 04:29 · Score: 1
  
  I wish I had mod points for this.
  Related: I wonder how well humans familiar with folding motifs and all the confounding factors present in nature would do vs. the models. While most chemists rely on modeling, NMR, and crystallography, the techs running these systems all have intuitions built up from years of generating structures.
  Would some of them outperform the models in the same way Google's approach did?
  -Chris
15. Re: Research Paper Needed by Anonymous Coward · 2018-12-04 05:37 · Score: 0
  
  At the last CASP, the Baker lab (the team associated with Foldit) had the highest overall score; this time they got destroyed by other methods - so no, they wouldn't do better.
16. Re:Research Paper Needed by ragahast · 2018-12-04 07:12 · Score: 1
  
  I wonder how much the network has been trained to recognize existing (evolutionary dependent) protein families and their patterns vs. a new random sequence folder.
  That's why they should use the historical validation approach! Train on structures solved before 2005, then predict only novel folds solved after 2005. Perform well in that context and I'll be impressed.
  
  The former may be just as useful in practice but may teach us a bit less about the mechanics of folding.
  Unlike the physics-based and statistical potential methods, can the DeepMind approach ever contribute to understanding how proteins fold? IMHO that's an open question, and one that's critical to their presumably forthcoming publication. For example, do their feature weights say something interesting about cation-pi interactions? Rosetta infamously ignores cation-pi because of overfitting concerns (even though cation-pi can be very structurally important).
  
  --
  .:Semper Absurda:.
17. Re:Research Paper Needed by Mab_Mass · 2018-12-04 08:30 · Score: 1
  
  You'll also see that once you move beyond the bounds of the training data, it diverges into something useless.
  
  This, right here.
  A.I. is not some kind of magic bullet that solves all problems. Far from it, since all models depend deeply upon the set of training data that gets fed to it. In this simple sine wave example, it is trivial to come up with something outside of the training data, which shows quite clearly that not all problems are well-suited for machine learning.
  In terms of AlphaFold, the set of training data is almost certainly the set of solved structures, with appropriate management of redundant/overly similar structures. Now, how they manage to bin/aggregate/select portions of this data to work around the variable length of protein sequences is not clear without seeing a detailed publication. These are the very important details that make or break machine learning.
  Taking a step back, however, this work isn't quite as groundbreaking as it may seem to a person unfamiliar with the field. From the brief descriptions on the AlphaFold blog, it looks like they are using the NN to predict contact maps and bond torsion angles, followed by some kind of minimizer. These techniques in general are well-established tools in the field of structural biology. The real innovation is using their custom deep NN framework.
  Don't get me wrong, though. This problem is hella hard, and kudos to the authors for beating out the Zhang lab for the top spot. The Zhang lab has been working intensely on this problem for a long time. More than anything, that shows how powerful the deep NN approach can be.
18. Re:Research Paper Needed by Anonymous Coward · 2018-12-04 19:20 · Score: 0
  
  "the techs running these systems all have intuitions built up from years of generating structures."
  It's almost like they trained their neurons with repeated input.
STOP GOOGLE NOW by Anonymous Coward · 2018-12-03 22:00 · Score: 0

Stop Google now - before it's too late!
I love AI by Anonymous Coward · 2018-12-03 22:02 · Score: 0

Where can I buy its stocks!
1. Re:I love AI by mermeid007 · 2018-12-03 22:46 · Score: 1
  
  On a popular website, if at all. Hurry! It's almost Christmas!
Best Post by Anonymous Coward · 2018-12-03 22:03 · Score: 0

Perfect Post for 3d Shapes of Protein.
Thank you for sharing this post
1. Re:Best Post by Anonymous Coward · 2018-12-03 22:07 · Score: 0
  
  https://www.indianpremierleague-2019.com/
fortnight? by Anonymous Coward · 2018-12-04 00:20 · Score: 0

Does it use furlongs as well?
Google's algorithms do what algorithms have always by Anonymous Coward · 2018-12-04 02:08 · Score: 0

Fixed it. Verbiage matters.
Dont use relu by Anonymous Coward · 2018-12-04 02:15 · Score: 0

If you want nonlinear behavior you need more than ReLUs. If the training set contains all the magic rules (even if it doesn't contain all possible outcomes), a DNN should find that magic. But the researchers won't know those rules; they end up with a magic black box.
I don't think peer review is relevant here in the real world. Results matter more than a 'peers' opinion of the results. Here Google has a concrete result.
1. Re:Dont use relu by Anonymous Coward · 2018-12-04 02:54 · Score: 0
  
  No amount of ReLUs will help you predict at 100*Pi when your training data only extends from 0 to 2*Pi.
2. Re:Dont use relu by ragahast · 2018-12-04 07:20 · Score: 1
  
  Results matter more than a 'peers' opinion of the results.
  You misunderstand the process. Peer opinions are based on the results. They are also based on years of study leading to an appreciation of what results are actually 1) interesting and 2) useful. These are crude words for the distinction, but to illustrate, if AlphaFold were to work perfectly it would only be useful. It wouldn't improve understanding and thereby advance science beyond making some specific current task potentially easier. (Even if it might be really great for engineering).
  
  If the training set contains all the magic rules
  There's good reason to think this training set doesn't contain all the magic rules. What the AlphaFold team should do is use structures solved before 2005 to train their model, and structures with novel folds solved after 2005 to test. If they can achieve very high absolute performance in that context, all critics will be silenced.
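
The validation scheme proposed above amounts to a temporal split with a novelty filter, which can be sketched as follows. Everything here is hypothetical: the `Structure` record, its field names, the fold labels and the five example entries are made up for illustration and do not correspond to real database entries or to anything DeepMind has published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    name: str          # hypothetical identifier
    fold: str          # hypothetical fold label
    year_solved: int   # year the experimental structure was solved


def temporal_split(structures, cutoff_year=2005):
    """Train on structures solved before the cutoff; test only on structures
    solved after it whose fold never appears in the training set.
    Structures solved in the cutoff year itself are left out of both sets."""
    train = [s for s in structures if s.year_solved < cutoff_year]
    known_folds = {s.fold for s in train}
    test = [
        s for s in structures
        if s.year_solved > cutoff_year and s.fold not in known_folds
    ]
    return train, test


# Made-up example data.
structures = [
    Structure("s1", "fold_a", 1998),
    Structure("s2", "fold_b", 2003),
    Structure("s3", "fold_a", 2010),  # solved later, but the fold is already known
    Structure("s4", "fold_c", 2012),  # solved later with a novel fold
    Structure("s5", "fold_b", 2005),  # solved in the cutoff year
]

train, test = temporal_split(structures)
print("train:", [s.name for s in train])
print("test: ", [s.name for s in test])
```

The novelty filter is the important part: without it, a model could score well on the later structures simply by recognizing folds it had already seen during training.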
  
  --
  .:Semper Absurda:.
Wake up! This isn't important! by Anonymous Coward · 2018-12-04 03:56 · Score: 0

What's important is whether this program meets my definition of what an AI is (which is "exactly a human in every way, but instead of hating me like a regular person, it should love me and mod up my Slashdot posts"). Until AI can do that, I'm going to stand watch here, posting this desperately stupid comment on every story that references AI, and also on some that don't.
Thank you all for staying with me in these challenging times.
-10100100100100
"who cares how" by Anonymous Coward · 2018-12-04 04:59 · Score: 0

neural nets are just crystal balls with a memory and dumb as a dog.
It only works in cases that don't matter as much by goombah99 · 2018-12-04 16:51 · Score: 1

The method they used only works when there are a gazillion similar sequences. It doesn't work for a unique sequence. So it's not an "ab initio" method, it's a fold recognition method done by recognizing the contacts then free form folding to fit that. But it can't infer contacts without massive sequence alignments to other proteins. Thus it has great value in those cases but other methods work in all cases not just that special case.

--
Some drink at the fountain of knowledge. Others just gargle.