Toward a 3D Search Engine
Plasma Droid writes "NewScientistTech has a story about a 3D molecular search engine that is over 1,500 times faster than anything previously developed. The researchers, from Oxford University, developed a lightning-fast way to match 3D shapes mathematically. This could not only speed up searches for new drugs but also lead to 3D search engines for finding objects uploaded to platforms such as Google Earth, they say." The problem will be jump-starting the supply of 3D data about molecules and everything else.
I've always been of two minds about whether the drug industry is a good example of patents being cost-effective, because I suspect very good technology will soon emerge that makes pharma R&D less expensive by turning it primarily into a data-processing (especially simulation) problem. Could this tech be the first piece of that puzzle?
This is a really cool advance when working with molecules whose shape you already know, but it still doesn't solve the problem of determining a molecule's shape in the first place. A protein molecule will naturally collapse into the shape with the lowest energy. If there are 100 atoms in the main chain, that's 97 torsion angles it could vary (each torsion is defined by four consecutive atoms), so nearly 100 degrees of freedom; even allowing only three preferred values per torsion gives 3^97, roughly 10^46, possible conformations. I hear that genetic algorithms are pretty good at finding the most likely shape, though, so this may not be as big a problem as it used to be.
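To make the genetic-algorithm idea concrete, here is a minimal Python sketch. It assumes a made-up toy energy function (`torsion_energy`, hypothetical and nothing like a real force field) over a list of backbone torsion angles, and a plain generational GA with tournament selection, uniform crossover, Gaussian mutation and elitism. `genetic_search` and all of its parameters are illustrative choices, not any particular published method.

```python
import math
import random


def torsion_energy(angles):
    """Toy energy (hypothetical, not a real force field).

    A threefold torsional term favours 60, 180 and 300 degrees, and a
    coupling term favours neighbouring torsions being equal. Every term is
    non-negative, so the global minimum is 0.
    """
    e = sum(1.0 + math.cos(3.0 * a) for a in angles)
    e += 0.5 * sum(1.0 - math.cos(a - b) for a, b in zip(angles, angles[1:]))
    return e


def genetic_search(n_torsions, pop_size=60, generations=200,
                   mutation_rate=0.1, mutation_sigma=0.3, seed=0):
    """Minimise torsion_energy with a simple generational GA."""
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    # Each individual is one conformation: a list of torsion angles (radians).
    pop = [[rng.uniform(0.0, two_pi) for _ in range(n_torsions)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Rank once per generation; index 0 is the lowest-energy conformation.
        ranked = sorted(pop, key=torsion_energy)
        children = [ranked[0][:]]  # elitism: the best survives unchanged
        while len(children) < pop_size:
            # Tournament selection: the best rank among 3 random picks wins.
            p1 = ranked[min(rng.sample(range(pop_size), 3))]
            p2 = ranked[min(rng.sample(range(pop_size), 3))]
            # Uniform crossover: each torsion is inherited from one parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation, wrapped back into [0, 2*pi).
            child = [(a + rng.gauss(0.0, mutation_sigma)) % two_pi
                     if rng.random() < mutation_rate else a
                     for a in child]
            children.append(child)
        pop = children
    best = min(pop, key=torsion_energy)
    return best, torsion_energy(best)
```

For the 100-atom chain above you would call `genetic_search(97)`. Elitism keeps the best conformation found so far in every generation, so the best energy never gets worse, but the GA still gives no guarantee of finding the global minimum.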
Currently, the most common way to find the 3D shape of a particular molecule within a database is to superimpose a candidate over the query molecule and see how much of it overlaps. But this is time-consuming, partly because it requires both molecules to be precisely aligned.
Yes, that's currently "the most common way" because at least you can tell what you're getting: when you get a match, you can actually say how close the different shapes are to one another.
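The superposition approach, and the "how close" number it gives you, can be sketched with the Kabsch algorithm, which finds the rotation that best overlays one set of atom positions on another and reports the leftover RMSD. This is a minimal sketch assuming NumPy and assuming the atom-to-atom correspondence between the two molecules is already known; working out that correspondence, or measuring volume overlap as the article's wording suggests, is not covered here.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two conformations after optimal rigid superposition.

    P and Q are (n, 3) coordinate arrays. Row i of P is assumed to be the
    same atom as row i of Q, i.e. the correspondence is already known.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation: centre both point sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P_c.T @ Q_c)
    # Flip the last axis if needed so R is a proper rotation (no mirroring).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the remaining per-atom deviation.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

The reflection check matters: without it, a molecule could be "superimposed" on its mirror image, which chemically is a different molecule.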
The new technique uses a different approach. It analyses the position of the different atoms within a molecule to understand its shape. These relative positions can be mapped and stored in a molecular database.
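The article doesn't give details, but one way to encode shape from relative atom positions without any alignment is to take the distribution of atomic distances from a few reference points and summarise each distribution by its moments. The Python sketch below (NumPy assumed) follows that idea; the four reference points, the three moments and the `shape_similarity` score are illustrative assumptions, not a reproduction of the Oxford group's method.

```python
import numpy as np


def _moments(d):
    """Summarise a distance distribution by three moments, all in distance
    units: mean, standard deviation, and the signed cube root of the third
    central moment (this particular choice is an assumption)."""
    mu = d.mean()
    sd = np.sqrt(((d - mu) ** 2).mean())
    skew = np.cbrt(((d - mu) ** 3).mean())
    return [mu, sd, skew]


def shape_descriptor(coords):
    """12-number, alignment-free shape descriptor from atom positions."""
    X = np.asarray(coords, dtype=float)
    ctd = X.mean(axis=0)                       # molecular centroid
    d_ctd = np.linalg.norm(X - ctd, axis=1)
    cst = X[np.argmin(d_ctd)]                  # atom closest to the centroid
    fct = X[np.argmax(d_ctd)]                  # atom farthest from the centroid
    ftf = X[np.argmax(np.linalg.norm(X - fct, axis=1))]  # farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        # Distances are unchanged by rotation and translation, so no
        # superposition step is needed.
        desc.extend(_moments(np.linalg.norm(X - ref, axis=1)))
    return np.array(desc)


def shape_similarity(a, b):
    """1.0 for identical descriptors, falling towards 0 as they diverge."""
    return float(1.0 / (1.0 + np.mean(np.abs(a - b))))
```

Because only distances are used, comparing two molecules reduces to comparing 12 numbers, which is where the speed comes from. The flip side is that a molecule and its mirror image get identical descriptors, and different shapes can end up with similar numbers, so some of the precision of the superposition approach is lost.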
That's actually not a "new technique"; it's an old one. It's what people did before they tried to overlay 3D shapes accurately, and they did it that way because computers were too slow for the accurate comparison.
As the article points out, very little 3D shape information is available at all. Few people need to run 3D queries right now, and there is little data to run them on, so optimizing for speed is the wrong priority; we need to optimize for accuracy and scientific relevance.
Hmmm. Maybe it depends on whether you can convert from internal coordinates to a 3D structure. What you seem to be suggesting is moving through structure space, matching as you go along.
So at any point, you have to generate images of the 'neighbours' of the current structure. It could work. Maybe.
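Converting internal coordinates (bond lengths, bond angles, torsions) to a 3D structure is a standard step; one common way is the NeRF construction, which places each atom from the three before it. A minimal NumPy sketch, assuming an unbranched chain; `torsion_neighbours` is a hypothetical way to generate the "neighbours" of a structure by nudging one torsion at a time, and `dihedral` is included only to check the round trip.

```python
import math

import numpy as np


def place_atom(a, b, c, bond, angle, torsion):
    """Place atom D given the three previous atoms A, B, C (NeRF method).

    bond = |CD|, angle = angle B-C-D, torsion = dihedral A-B-C-D, in radians.
    """
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    # Local frame: bc along the B->C bond, n normal to the plane ABC.
    m = np.column_stack((bc, np.cross(n, bc), n))
    d_local = bond * np.array([-math.cos(angle),
                               math.sin(angle) * math.cos(torsion),
                               math.sin(angle) * math.sin(torsion)])
    return c + m @ d_local


def build_chain(bonds, angles, torsions):
    """Internal coordinates -> Cartesian coordinates for an unbranched chain.

    For n atoms: len(bonds) = n-1, len(angles) = n-2, len(torsions) = n-3.
    """
    x = [np.zeros(3), np.array([bonds[0], 0.0, 0.0])]
    # Third atom in the xy-plane, making angles[0] at atom 1.
    x.append(x[1] + bonds[1] * np.array([-math.cos(angles[0]),
                                         math.sin(angles[0]), 0.0]))
    for i in range(3, len(bonds) + 1):
        x.append(place_atom(x[i - 3], x[i - 2], x[i - 1],
                            bonds[i - 1], angles[i - 2], torsions[i - 3]))
    return np.array(x)


def dihedral(a, b, c, d):
    """Signed dihedral angle A-B-C-D in radians, used as a round-trip check."""
    b1 = (c - b) / np.linalg.norm(c - b)
    v = (a - b) - np.dot(a - b, b1) * b1
    w = (d - c) - np.dot(d - c, b1) * b1
    return math.atan2(np.dot(np.cross(b1, v), w), np.dot(v, w))


def torsion_neighbours(torsions, step=math.radians(10)):
    """Hypothetical 'neighbour' moves: nudge one torsion by +/- step."""
    for i in range(len(torsions)):
        for delta in (step, -step):
            t = list(torsions)
            t[i] += delta
            yield t
```

Each neighbour's torsion list can be passed back through `build_chain` to get coordinates for shape matching. Rebuilding the whole chain for every neighbour is simple but wasteful, since changing one torsion only moves the atoms downstream of it.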
This is quite an interesting achievement. The tools I am familiar with can only search for 2D structures such as functional groups (alcohol groups, aromatic rings, etc.). At best, they might let you search for R- and S-stereoisomers, but that is it. This is good enough for tasks like solvent design, which are quite common in the chemical process industry, but pharmaceutical R&D needs more powerful tools.
Take enzymes as a simple example. These nice molecules catalyze reactions of vital importance to the modern pharmaceutical industry by providing a chemical "lock" into which the "keys" (i.e. the reacting molecules) dock. This lets them react and form a new molecule, which then undocks from the enzyme, leaving the "lock" free for the next pair.
These "locks" are actually 3D structures of appropriately aligned molecular groups. This is where this search ability comes in: the chemist has an idea of what the appropriate lock for catalyzing his reaction would look like (a 3D alignment of functional groups), much like someone has an idea of the right keywords for a Google search. He then feeds this to the machine and gets back the molecules that are likely to help in his work. After that, he can run experiments to see whether these enzymes actually work.
This should speed up biochemical research considerably: less literature research and fewer failed experiments.
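The chemist's "lock" query can be written as a simple pharmacophore: a few feature types at fixed relative positions, matched against the features of each candidate by comparing pairwise distances. This is a brute-force Python sketch; the feature labels, the data layout and the 0.5 distance tolerance are hypothetical, and real tools use much faster matching than trying every permutation.

```python
import itertools
import math


def matches_pharmacophore(query, molecule, tol=0.5):
    """query: list of (feature_type, (x, y, z)) describing the desired 'lock'.
    molecule: list of (feature_type, (x, y, z)) features of one candidate.
    Returns a tuple of molecule-feature indices matching the query, or None.
    """
    n = len(query)
    for combo in itertools.permutations(range(len(molecule)), n):
        # Feature types must agree position by position.
        if any(molecule[j][0] != query[i][0] for i, j in enumerate(combo)):
            continue
        # Every pairwise distance must agree within tol (same units as coords).
        if all(abs(math.dist(query[a][1], query[b][1])
                   - math.dist(molecule[combo[a]][1], molecule[combo[b]][1]))
               <= tol
               for a, b in itertools.combinations(range(n), 2)):
            return combo
    return None


def search(query, database, tol=0.5):
    """Names of all database entries whose features satisfy the query."""
    return [name for name, feats in database.items()
            if matches_pharmacophore(query, feats, tol) is not None]
```

Matching on distances alone makes the query independent of how each molecule is oriented, but it cannot tell a "lock" from its mirror image, so a real search would need an extra chirality check.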
So the summary says it's 1500 times faster. OK then, if I double the number of items in the database and compare again, is it still 1500 times faster? What if the database is a million times larger?
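One way to answer this, assuming (the summary doesn't say) that both methods compare the query with each of the N molecules once, at a fixed cost c per comparison:

```latex
\[
S(N) = \frac{T_{\text{old}}(N)}{T_{\text{new}}(N)}
     = \frac{c_{\text{old}}\,N}{c_{\text{new}}\,N}
     = \frac{c_{\text{old}}}{c_{\text{new}}} \approx 1500
\]
```

Under that assumption, doubling the database, or making it a million times larger, multiplies both times by the same factor and the ratio stays the same. The speedup would change with N only if the two methods scale differently, for example if one of them used an index that avoids touching every entry.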
It's nice to know what shape a molecule is. It would be even nicer to be able to make a molecule in a particular shape. If you map an enzyme's active site -- its topology, charge distribution over the surface, possibility for organometallic or hydrogen bonding -- you have a much better chance of finding some interesting analog to the enzyme's substrate that'll make the system do something new. Even better, you could take an existing molecule that you *want*, and form an enzyme surface so that two cheap molecules, exposed to your new enzyme surface, will find it thermodynamically favorable to become the molecule you want, and suddenly you're in a very profitable business: you can breed chemical engineering factories rather than having to build them.
This poses a problem similar to the (unstated) problem posed by the molecular printers in Neal Stephenson's The Diamond Age: what happens when this sort of technology becomes widely available and people start engineering enzymes, or instructing their printers, to produce, say, heroin or TNT? With molecular printers, presumably the first versions would only be able to produce structural items: printing bicycles, not martinis. But if we get to the point where we can design enzymes for a desired substrate -> product reaction, we have a real problem, because it's all wet chemistry and there isn't an obvious hardware/firmware way to block people from making anything their inventive, twisted little minds can come up with.
Mind you, I think that's great. I miss the days when I could order almost any chemical I wanted without having to wade through masses of paperwork, tracking, and laws intended to ban any drug analog that might have pharmacological activity. But it is going to have some very exciting side effects.
Nostalgia's not what it used to be.
This makes me wonder whether this could evolve into more general-purpose 3D searches, such as facial recognition, searching for a specific shape of car, or identifying a suspect in a crowd from a combination of body shape, face, etc.
Go to http://shape.cs.princeton.edu/search.html/, select "Protein Database" from the drop-down list, and enter "random" as the keyword. The "find similar shape" links then do full 3D feature-vector matching against a database of 16900 protein molecule models in a fraction of a second. Yet this new method is supposedly "1500 times faster than anything previously developed"? Maybe the authors never checked the current 3D shape-matching literature?
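Why a fraction of a second is plausible: once every model is reduced to a fixed-length feature vector, answering a query is a single pass over an N × d array. A NumPy sketch with synthetic random vectors standing in for real descriptors; the 16900 rows match the figure quoted, while the 12 dimensions and the mean-absolute-difference distance are assumptions, not details of the Princeton engine.

```python
import numpy as np


def top_k_matches(query, database, k=5):
    """Brute-force nearest neighbours by mean absolute difference.

    query: (d,) feature vector; database: (N, d) array, one row per model.
    Returns the indices of the k closest rows, closest first (needs k < N).
    """
    dist = np.mean(np.abs(database - query), axis=1)  # one vectorised pass
    idx = np.argpartition(dist, k)[:k]               # k smallest, unordered
    return idx[np.argsort(dist[idx])]                # sort just those k


# Usage with synthetic stand-in data (not real protein descriptors).
rng = np.random.default_rng(42)
db = rng.random((16900, 12))
hits = top_k_matches(db[123], db)
```

`np.argpartition` avoids fully sorting all 16900 distances when only the top few hits are needed.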
Okay, I just read the original research article published by the Royal Society. I'm struck by three things: 1) the guys who did this are big players in the business; 2) the work is startlingly unoriginal and shows no awareness of work outside their narrow community, in other areas where geometric hashing on moments is routine; 3) they don't even seem to appreciate what is interesting about their own work. It isn't the speed (all geometric hashes are that fast); the only interesting thing is why their ad hoc, and not particularly imaginative, feature vectors may empirically beat other proposals. Since they only compare against some ancient ones, one can't really tell whether these feature vectors are better or computers have simply gotten faster since 1992.