New Deep-Learning Software Knows How To Make Desired Organic Molecules (nature.com)
dryriver shares a report from Nature about a neural network-based, deep-learning software that is as good as trained chemists in figuring out what reagents and reactions may lead to the successful creation of a desired organic molecule: Chemists have a new lab assistant: artificial intelligence. Researchers have developed a "deep learning" computer program that produces blueprints for the sequences of reactions needed to create small organic molecules, such as drug compounds. The pathways that the tool suggests look just as good on paper as those devised by human chemists. The tool, described in Nature on March 28, is not the first software to wield artificial intelligence (AI) instead of human skill and intuition. Yet chemists hail the development as a milestone, saying that it could speed up the process of drug discovery and make organic chemistry more efficient. Chemists have conventionally scoured lists of reactions recorded by others, and drawn on their own intuition to work out a step-by-step pathway to make a particular compound. They usually work backwards, starting with the molecule they want to create and then analyzing which readily available reagents and sequences of reactions could be used to synthesize it -- a process known as retrosynthesis, which can take hours or even days of planning. The new AI tool, developed by Marwin Segler, an organic chemist and artificial-intelligence researcher at the University of Munster in Germany, and his colleagues, uses deep-learning neural networks to imbibe essentially all known single-step organic-chemistry reactions -- about 12.4 million of them. This enables it to predict the chemical reactions that can be used in any single step. The tool repeatedly applies these neural networks in planning a multi-step synthesis, deconstructing the desired molecule until it ends up with the available starting reagents.
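The "work backwards until you reach available reagents" loop described in the summary can be sketched as a small recursive search. This is only a toy illustration, not the authors' method: the `SINGLE_STEP` table stands in for the neural networks that propose single-step reactions, the molecule names and the `AVAILABLE` set are made up, and the plain depth-first search is an assumption chosen for brevity.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data. In the real tool, a neural network would propose
# candidate single-step disconnections; here a lookup table plays that role.
SINGLE_STEP: Dict[str, List[Tuple[str, ...]]] = {
    "target": [("intermediate_A", "reagent_1")],
    "intermediate_A": [("unobtainable",), ("reagent_2", "reagent_3")],
}

# Hypothetical set of readily available starting reagents.
AVAILABLE: Set[str] = {"reagent_1", "reagent_2", "reagent_3"}

Step = Tuple[str, Tuple[str, ...]]  # (product, precursors)


def plan(molecule: str, max_depth: int = 5) -> Optional[List[Step]]:
    """Return the synthesis steps for `molecule` in forward order,
    or None if no route to available reagents is found within max_depth."""
    if molecule in AVAILABLE:
        return []  # nothing to make, it can be bought
    if max_depth == 0:
        return None  # give up on overly long routes
    for precursors in SINGLE_STEP.get(molecule, []):
        route: List[Step] = []
        for precursor in precursors:
            sub_route = plan(precursor, max_depth - 1)
            if sub_route is None:
                break  # this disconnection is a dead end, try the next one
            route.extend(sub_route)
        else:
            # Every precursor was resolved; the final step makes `molecule`.
            return route + [(molecule, precursors)]
    return None


if __name__ == "__main__":
    for product, precursors in plan("target") or []:
        print(" + ".join(precursors), "->", product)
```

With the toy table above, the search rejects the dead-end disconnection of `intermediate_A` and falls back to the second candidate, which is the same backtracking a chemist does by hand during retrosynthesis.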
Yeah, finding syntheses of controlled substances from uncontrolled precursors is an obvious use of this tech.
What good does it do you to be able to computer-generate more compounds?
As a chemist, I can tell you that there are different bottlenecks at different stages of development. Early on in the process, when you are working at lab bench scale and synthesizing in the gram range, you can probably work around some issues. Some key reactions may still be a problem, and you may have trouble synthesizing the right compounds. Also, there's the problem of how many sequential steps you need. If your compound is simple and you only need to react two different compounds, then a 90% yield is fine. But if each step gives you 90% yield and you have 10 steps in a row, you're down to about 35% overall. Now imagine how many steps a compound like Taxol would take if you started from scratch...
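The 35% figure above is just the per-step yield multiplied by itself once per step (0.9 to the 10th power is roughly 0.35). A minimal sketch of that arithmetic, assuming every step has the same yield and nothing is lost between steps (the function name `overall_yield` is made up for illustration):

```python
def overall_yield(step_yield: float, steps: int) -> float:
    """Overall yield of a linear synthesis where every step has the same yield."""
    return step_yield ** steps


if __name__ == "__main__":
    # Show how quickly a 90% per-step yield erodes as the route gets longer.
    for n in (1, 2, 5, 10, 20):
        print(f"{n:2d} steps at 90% each: {overall_yield(0.9, n):.1%} overall")
```

This is why shaving even one or two steps off a long route matters so much more than it first appears.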
Finally, there's the large-scale process, the one that lets you produce at industrial scale (think kilograms or tons) the molecules that survived the trial process. Here, you have a bunch of other considerations: health hazards of the precursors (it's much simpler to work with stuff that has no environmental or health hazards than to run a fully sealed line), safety hazards (if the reagent burns in contact with air or water, it's probably a bad option), cost, yield, etc. At this stage the reactions are typically redesigned, because what works well at bench scale often doesn't work at industrial scale. So if you can do this step with a computer and it gives you a better option, it's a benefit, even if you're applying it only to molecules you already know.