Toward a 3D Search Engine

Enter Search Term: by LiquidCoooled · 2007-03-08 05:32 · Score: 5, Funny

Boobies, extra large please.

--
liqbase :: faster than paper

Re:Enter Search Term: by xtracto · 2007-03-08 05:56 · Score: 0

Oh come on mods!

This was funny... please buy a sense of humour...

--
Ubuntu is an African word meaning 'I can't configure Debian'

3d search engine? by Anonymous Coward · 2007-03-08 05:33 · Score: 0

this could revolutionize pr0n.

WOO HOO! by Lumpy · 2007-03-08 05:37 · Score: 2, Funny

Finally I can search for Dodecahedron porn!

--
Do not look at laser with remaining good eye.

Re:WOO HOO! by Kenja · 2007-03-08 05:38 · Score: 5, Funny

Hot molecule on molecule action! See uncensored carbon bonding!

--

"Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"

Shape versus negative space by goombah99 · 2007-03-08 05:37 · Score: 5, Informative

It's pretty easy to geometrically hash or construct reduced feature vectors for matching. People (like me) have been doing this for years. It's much harder to know if a molecule will fit into a crevice or negative space. The latter is probably more important to drug design. The reduced feature vectors let you know quickly if two molecules are similar in shape, which is what the article's title refers to. But then this is discussed in the context of drug targets, which is a harder problem. What may be new or clever here is that they found a very useful set of feature vectors.
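
For readers wondering what a "reduced feature vector" might look like in practice, here is a minimal sketch in Python. It is an illustration only, not the descriptor from the article: it assumes a molecule is just a list of (x, y, z) atom coordinates and summarizes the distances of the atoms from the centroid with three numbers. The names shape_descriptor and similarity are made up for this example.

```python
import math

def shape_descriptor(coords):
    """Return a small feature vector that ignores rotation and translation."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    dists = [math.dist(c, centroid) for c in coords]
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # Signed cube root keeps the skew term in the same units as the distances.
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return (mean, math.sqrt(var), skew)

def similarity(a, b):
    """Score in (0, 1]; 1 means the two descriptors are identical."""
    diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + diff)

square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
# The same square, rotated 90 degrees about z and shifted.
moved = [(-y + 5, x + 2, z - 1) for x, y, z in square]
line = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]

print(similarity(shape_descriptor(square), shape_descriptor(moved)))
print(similarity(shape_descriptor(square), shape_descriptor(line)))
```

Comparing two such vectors is a handful of subtractions, which is why this kind of matching is fast. It says nothing about whether a molecule fits into a cavity, which is the harder problem described above.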

--
Some drink at the fountain of knowledge. Others just gargle.

Re:Shape versus negative space by Anonymous Coward · 2007-03-08 05:44 · Score: 4, Funny

It's pretty easy to geometrically hash or construct reduced feature vectors for matching. People (like me) have been doing this for years

I bet you have to beat the chicks away with a stick.
Re:Shape versus negative space by Anonymous Coward · 2007-03-08 07:05 · Score: 0

I would say that it is relatively easy to do a search once such a set of reduced feature vectors has been created, but finding the correct set of features automatically and without prior suppositions or assumptions about the data is not as easy. Finding the most appropriate representation is key, and relies on finding an appropriate data set on which to do the analysis which is representative enough but not too large.

With drug discovery it is also important to look at potentially more than the 3D shape, though, such as charge distributions.

For an alternative technology see www.cs.york.ac.uk/auramol
Re:Shape versus negative space by certain+death · 2007-03-08 07:21 · Score: 0

Well...I am sure he is beating SOMETHING :o) bump, bump, bump...is this thing on?

--
"My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus

so? by mastershake_phd · 2007-03-08 05:39 · Score: 1

It's going to be full of spam in under a year. You can't stop those guys.

--
Libertarian Leaning Political Discussion Forum.

Re:so? by Bat+Country · 2007-03-08 07:13 · Score: 2, Funny

Great, I can finally search for the chemical formula for C14L11S, which honestly has been puzzling me for some time. Apparently it affects the molecule P3N1S.

--
The land shall stone them with the bread of his son.
Re:so? by iago-vL · 2007-03-08 09:11 · Score: 2, Funny

Am I the only one who had to stop and think, "Ok 14 atoms of carbon combined with..... what the hell element is 'L'?"

--
http://www.skullsecurity.org/blog/
Re:so? by The+Great+Pretender · 2007-03-08 10:30 · Score: 1

No, but the 14-year-old gamer inside of me quickly burst out, slapped my forehead and set me straight.

--
A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.

Impact on Pharma (esp. patents) by Mateo_LeFou · 2007-03-08 05:40 · Score: 4, Interesting

I've always been of two minds about whether the drug industry was a good example of patents being cost-effective, because I suspect that very good technology will soon emerge that makes pharma R&D less expensive, by making it primarily a data-processing (esp. simulation) issue. Seems like this tech might be the first piece of that puzzle?

--
My turnips listen for the soft cry of your love

Re:Impact on Pharma (esp. patents) by Anonymous Coward · 2007-03-08 05:57 · Score: 0

People have been thinking that for forty years, and we're still a long, long way from it being the case. That garbage can now be generated 1500 times faster isn't going to suddenly change that.
Re:Impact on Pharma (esp. patents) by ThosLives · 2007-03-08 06:20 · Score: 2, Insightful

The problem isn't that it takes a while to find new stuff. The problem is the barriers to entry are so high that sufficient competition can't take place, hence there is no pressure to work quickly. Basically the medical industry is *not* a free market.
Now, I don't think the barriers need to be removed, because most of the high barrier is to ensure that treatments are effective without nasty side effects. About the only part of the barrier I can see being removed is somehow changing the liability laws, but I don't know what would be acceptable.

--
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraprhase)
Re:Impact on Pharma (esp. patents) by Anonymous Coward · 2007-03-08 06:26 · Score: 0

I agree. A drug is usually designed to interact with a single protein, but the cell consists of many different proteins, fatty acids and other molecular substances that can possibly also interact with the drug. There's no quick way of telling with which ones the drug will interact. Also, a drug is usually administered systemically, by ingestion or through infusion. You can't predict the effect of the drug in all the cells that the drug reaches.
Re:Impact on Pharma (esp. patents) by Anonymous Coward · 2007-03-08 07:06 · Score: 0

GIGO. They can (and do) run a LOT of simulations. The problem is the mathematical models used in molecular modeling of drug/enzyme interactions aren't good enough yet to actually predict much. Ab initio models (from first principles, just using physical laws and constants) can only work with tiny molecules in very simple environments right now. Anything bigger and they need to add multiple fudge factors, which work in only limited situations.
Right now, they're still years away from getting all the kinks out of modeling how water (three atoms) works, let alone molecules with thousands of atoms floating around in a complex soup.
And as far as getting a drug to market - finding a drug that interacts with your target is the EASY part.
Re:Impact on Pharma (esp. patents) by Red+Flayer · 2007-03-08 07:14 · Score: 2, Insightful

The problem is the barriers to entry are so high that sufficient competition can't take place, hence there is no pressure to work quickly.

Except the barriers to entry are mostly not regulatory in nature. As with most advanced R&D-based industries, the barriers are brainpower and equipment. There's plenty of capital out there to handle the hit-and-miss nature of drug design, and the regulatory restrictions on drug production and marketing are not barriers to entry for research.

IMO, what is truly limiting the pharma industry is profit incentive. Big pharma researches the things that will make them the most money -- which, BTW, are not cures for diseases, but rather treatments for conditions.

The 'competition' you speak of has nothing to do with R&D of new drugs. Barriers to entry prevent new entrants from producing and selling a commodity good, and new drugs are by no means commodities (patents have a lot to do with that). If you're talking about R&D as a commodity, that's a whole different discussion -- but again, it's brainpower and equipment that are the limiting factors causing the barriers to entry.

As for there being no pressure to work quickly, that is not the case. There is definitely an incentive to work quickly as there is competition from all the big companies -- look at the COX2 inhibitors that were all the rage as low-side-effect NSAIDs a couple years ago until certain really bad interactions manifested. Merck, Schering-Plough, everybody was in the game when the new sub-class was discovered. It was literally a rush to market, which is why the adverse effects weren't recognized until post-phase 4 trials.

--
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
Re:Impact on Pharma (esp. patents) by ponos · 2007-03-09 00:00 · Score: 2, Interesting

IMO, what is truly limiting the pharma industry is profit incentive. Big pharma researches the things that will make them the most money -- which, BTW, are not cures for diseases, but rather treatments for conditions.
This is not entirely accurate. From a business standpoint, if you sell a cure and your competitor sells a "treatment", you'll erase them from the map. So they would definitely like to "cure" things. However, most of the rich, western people do not suffer from diseases per se, but from "risk factors" like hypertension, diabetes, hypercholesterolemia etc etc. The treatments for these conditions are extremely effective but a cure is almost impossible (unless you manage to install a new pair of kidneys or a new pancreas etc).
Except the barriers to entry are mostly not regulatory in nature. As with most advanced R&D-based industries, the barriers are brainpower and equipment. There's plenty of capital out there to handle the hit-and-miss nature of drug design, and the regulatory restrictions on drug production and marketing are not barriers to entry for research.

FDA approval is a regulatory barrier and demands very lengthy, very expensive and time consuming pre-clinical and clinical testing. You can't just stab someone with a syringe full of X just because the computer said it works. You need to go through all proper procedures, including testing in mice, primates, healthy volunteers, otherwise healthy patients (i.e. patients that don't have anything other than the disease you want to treat) and the general patient population. You also have to determine lethal doses, drug interactions with a billion other things (foods? additives? common drugs?), allergic reactions etc etc.

My point is that the "hit and miss" process is not just a wasted stack of paper or some CPU cycles but a process involving real patients, possible deaths, legal battles. After that you'll need a host of research publications to persuade the medical community, marketing exposure etc. A "miss" is a very, very costly thing. Take Merck and Vioxx for example.

P.
Re:Impact on Pharma (esp. patents) by Red+Flayer · 2007-03-09 05:01 · Score: 1

FDA approval is a regulatory barrier and demands very lengthy, very expensive and time consuming pre-clinical and clinical testing.

But it's not a barrier to entry, since established companies must also comply with FDA regulations. Barriers to entry imply that only new entrants face the barrier.

Take Merck and Vioxx for example.
That is exactly what I was referring to with the COX2 inhibitors... Vioxx is the specific example.

From a business standpoint, if you sell a cure and your competitor sells a "treatment", you'll erase them from the map. So they would definitely like to "cure" things.
Recent examples? What is happening now is that they primarily try to put out a competing product, rather than a cure. It's not about putting your competitor out of business, it's about maximizing your profits. The two are not the same, since again, R&D (and drug products!) are not a commodity good where you maximize profits by removing competitors.

My point is that the "hit and miss" process is not just a wasted stack of paper or some CPU cycles but a process involving real patients, possible deaths, legal battles.
Agreed. But patients' lives, legal battles, etc, boil down to fiscal liability and cost... from the business standpoint, which is what the drug companies consider.

--
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
Re:Impact on Pharma (esp. patents) by Anonymous Coward · 2007-03-09 20:56 · Score: 0

"But it's not a barrier to entry, since established companies must also comply with FDA regulations."

If you are a big company with an established stream of revenue you have the financial reserves to take a new potential drug through trials and this is a financial base that new companies don't tend to have. Thus it is difficult for a new company to launch itself as one initially researching and producing new drugs as the regulatory framework has the unfortunate side effect of creating a barrier to entry. The need for trials is, of course, crucial.

The easier routes are now to start up a company producing generics to get the revenue stream in place giving the financial reserves to assemble an R&D team and get new drugs through trials. This is beginning to happen in India, for example. Alternatively you can set up a company that does just the initial (pre-trials) phases of discovery and then passes on the potential drugs to a big pharmaceutical company to take the risk on the trials and marketing phases. There are a number of companies on this latter path.

I'll bring the Hot Grids by Mateo_LeFou · 2007-03-08 05:43 · Score: 2, Funny

couldn't resist

--
My turnips listen for the soft cry of your love

Good, but just one tiny bit of the problem by filthWisard · 2007-03-08 05:43 · Score: 5, Interesting

This is a really cool advance when working with molecules you already know the shape of, but it still doesn't get around the problem of what shape a molecule is in the first place. A protein molecule will naturally collapse into the shape with the lowest energy. If there are 100 atoms in the main chain, that's 99 different angles that it could have, that's 99 degrees of freedom. I hear that genetic algorithms are pretty good at finding the most likely shape though, so this may not be as big a problem as it used to be.
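
To put rough numbers on the degrees-of-freedom point, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each of the 99 angles can take only a few discrete values and that a computer checks a billion shapes per second; the function name, the 3 values per angle and the checking rate are all assumptions, not figures from the article.

```python
def conformation_count(n_angles, values_per_angle):
    """Number of distinct shapes if every angle is sampled independently."""
    return values_per_angle ** n_angles

count = conformation_count(99, 3)      # assumed: 3 allowed values per angle
seconds = count / 1e9                  # assumed: one billion shapes checked per second
years = seconds / (3600 * 24 * 365)

print(f"{count:.2e} conformations, about {years:.1e} years to enumerate")
```

Even this coarse sampling gives on the order of 10^47 shapes, so nobody enumerates them. That is why heuristics such as genetic algorithms, and any restriction on the allowed angles, matter so much.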

Re:Good, but just one tiny bit of the problem by GMO · 2007-03-08 06:03 · Score: 1

It's not a protein search engine, it's for small molecules.

Also, the search space for polypeptides is more restricted than that. There are only so many allowed torsion angles.
Re:Good, but just one tiny bit of the problem by picob · 2007-03-08 06:53 · Score: 2, Interesting

Usually the amino acid sequence is known, and you can find structures of similar amino acid sequences in databases using BLAST (a sequence search algorithm). If that doesn't turn up a protein whose structure has been determined (preferably from a crystal, otherwise by NMR), you can try to predict the protein structure: proteins have domains, small subsequences of which the shape is known. Many domains are known that have a particular shape. If you have determined a few of these then it becomes a lot easier to determine the rest of the protein.
Re:Good, but just one tiny bit of the problem by illerd · 2007-03-08 06:55 · Score: 1

Sequence similarity tends to imply structural similarity. Find another protein with a similar peptide sequence and a known structure, use this structure as your search query, and you've got a pretty good guess of what your protein might look like. Better yet, you've got a good starting point for your hackish protein folding method (monte carlo, genetic algorithm, neural networks, whatever).
Re:Good, but just one tiny bit of the problem by tfoss · 2007-03-08 10:53 · Score: 1

I hear that genetic algorithms are pretty good at finding the most likely shape though, so this may not be as big a problem as it used to be.

They may be *better* at predicting structure, but they are still a shit long way from being any good. Remember that whole big Blue Gene deal, building the biggest baddest computer out there, that was done pretty much to be able to predict protein structure, and (last I heard) they still aren't even close. Every so many years a new technique for prediction comes out (Ohh, threading! *wait x years* Ohh, genetic algorithms! etc etc) with big expectations that works for a few proteins and that's about it.

-Ted

--
-=-=- Quantum physics - the dreams stuff are made of.

Comment removed by account_deleted · 2007-03-08 05:47 · Score: 3, Insightful

Comment removed based on user account deletion

Re:Problem...? by LordPhantom · 2007-03-08 05:51 · Score: 2, Insightful

No, that will be a problem. Once you have the database, what exactly am I supposed to input for searching? Will I need to learn how to create a 3D model in order to search for similar objects?
The rest of your comments are pretty valid, however in this case that would seem to be beside the point. Searching objects in this fashion would be as simple as metadata that is appropriate for 3d model searches. Rather than provide a base model, you could search the metadata supplied with/for/generated for shapes, and once you have a few from the library, use THOSE as searches for -similar- or combined models. It's actually quite possible, if of questionable use - not to mention your criticism could be thrown back at you by simply saying "What!??! A search engine for sound? That will never work, I'd have to learn how to whistle".

Re:Problem...? by drinkypoo · 2007-03-08 05:53 · Score: 1

No, that will be a problem. Once you have the database, what exactly am I supposed to input for searching? Will I need to learn how to create a 3D model in order to search for similar objects?

Even if you do, you can use a sketching tool (like google sketchup... mmm, sketchup) to whip out a basic 3d model.

Also, it could be done through a tree-selection process - where you pick from perhaps 9 images the model that looks the most like the one you want, and you continue in this vein until you find (or don't find) the one you're looking for. I don't know if their software would work well with this approach, though.
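
As a rough feel for how many clicks the tree-selection idea would take, here is a small Python sketch. It assumes the idealized case where every choice among 9 images splits the remaining models into 9 equally sized groups; the function name is made up for the example, and a real shape database would not split this evenly.

```python
def rounds_needed(n_models, choices_per_round=9):
    """Count selection rounds until a single model remains."""
    rounds = 0
    remaining = n_models
    while remaining > 1:
        # Ceiling division: the user's pick keeps one group out of nine.
        remaining = -(-remaining // choices_per_round)
        rounds += 1
    return rounds

for n in (9, 81, 1_000_000):
    print(n, rounds_needed(n))
```

Under that assumption a million models need only 7 rounds, since the count grows with the logarithm of the database size.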

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Speed versus Thoroughness by wsherman · 2007-03-08 05:57 · Score: 3, Insightful

NewScientistTech has a story about a 3D molecular search engine that is over 1,500 times faster than anything previously developed.

The implication both from the summary and from the article itself is that this new search is just as thorough as other search methods but much faster. To prove thoroughness they would have had to show that anything found by other search methods will also be found by their new, much faster, search method. I doubt very much that they were able to prove this rigorously.

That's not to say that the problem of matching 3D molecular shapes is not important or that their research is not valuable. I would say, though, that it is misleading to claim that they have solved the 3D search problem with a much faster algorithm. There are many different measures of 3D similarity and, for many measures of similarity, the only way to guarantee an optimum match is by exhaustive search.

Note that, in general, every search will be exhaustive in the sense that the query must be compared to every entry in the database. The problem is that many measures of similarity have additional parameters that must be optimized by exhaustive enumeration for each comparison. The classic example is a measure of 3D similarity that pairs each atom in the query with an atom from the structure in the database. In the general case, all possible pairings must be tried through an exhaustive enumeration.

Re:Speed versus Thoroughness by drinkypoo · 2007-03-08 06:08 · Score: 1

In the general case, all possible pairings must be tried through an exhaustive enumeration.

Why should that be true? We are able to categorize textual content and build indexes based on word structure. Why couldn't we do the same thing with 3d objects, and thus be able to discard a large number of comparisons up front?
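
One very simple index of the kind asked about here can be sketched in Python: bucket the database by atom count, so a query is only compared against molecules of the same size. This is a toy under stated assumptions (the function names and the dictionary layout are invented for the example), and it only helps when a match means the whole molecule, not a part of one.

```python
from collections import defaultdict

def build_index(database):
    """Group molecule names by atom count. database maps name -> list of atoms."""
    index = defaultdict(list)
    for name, atoms in database.items():
        index[len(atoms)].append(name)
    return index

def candidates(index, query_atoms):
    """Return only the molecules with the same number of atoms as the query."""
    return index.get(len(query_atoms), [])

database = {
    "water": ["O", "H", "H"],
    "carbon dioxide": ["C", "O", "O"],
    "methane": ["C", "H", "H", "H", "H"],
}
index = build_index(database)
print(candidates(index, ["N", "O", "O"]))   # methane is discarded without any comparison
```

The expensive 3D comparison then runs only on the surviving candidates.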

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Speed versus Thoroughness by wsherman · 2007-03-08 06:36 · Score: 1

In the general case, all possible pairings must be tried through an exhaustive enumeration.
Why should that be true?

For some measures of 3D similarity there are shortcuts and for other measures there aren't shortcuts. For example, what happens if part of our query molecule is very similar to part of a molecule in the database we are searching? Does that count as a match or not? If the answer is that it does not count as a match, then we could sort our search database by number of atoms - only those molecules that have the same number of atoms as the query need to be considered. If the answer is that it does count as a match then all parts of our query molecule need to be compared to all parts of every molecule in the database.
One of the most common methods for comparing molecules is to pair atoms in the query molecule with atoms in the molecule from the database and then add up some measure of the distance between the pairs of atoms. The most common measure of distance is root mean square (RMS) deviation. The problem with a pair-distance similarity measure is that changing even a single pairing can dramatically change the best alignment (i.e. a rotation and a translation). The consequence of this is that the only way to be sure that the best pairings have been found is to try all possible pairings.
The deeper problem is that it's not clear that a rigid alignment is desirable. Many molecules are quite flexible. In that case, an optimal search would consider all possible pairings and all possible molecular conformations. Obviously, this is quite a large search space. A search algorithm that could find a guaranteed optimal pairing and conformation without exhaustive search would be a huge innovation. That doesn't seem to be what the people in the article have done, though. For that matter, it may even be impossible.
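
The cost of trying every pairing can be made concrete with a small Python sketch. It is deliberately simplified: it assumes both molecules are already superimposed (no rotation or translation is optimized, so this is not a Kabsch-style alignment), it assumes equal atom counts, and the function names are made up for the example.

```python
import itertools
import math

def rmsd(query, target):
    """Root mean square deviation between atoms paired by position in the lists."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(query, target))
    return math.sqrt(total / len(query))

def best_pairing(query, target):
    """Try every one-to-one pairing of target atoms with query atoms."""
    best_perm, best_score = None, math.inf
    for perm in itertools.permutations(range(len(target))):
        score = rmsd(query, [target[i] for i in perm])
        if score < best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

query = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
target = [(0, 2, 0), (0, 0, 0), (1, 0, 0)]   # same atoms, listed in a different order
print(best_pairing(query, target))
print(math.factorial(10), math.factorial(20))  # pairings to try for 10 and 20 atoms
```

The loop visits n! pairings, so 10 atoms already mean 3,628,800 evaluations and 20 atoms about 2.4 x 10^18, before any rotation, translation or flexibility is taken into account.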
Re:Speed versus Thoroughness by illerd · 2007-03-08 06:48 · Score: 1

The implication both from the summary and from the article itself is that this new search is just as thorough as other search methods but much faster. To prove thoroughness they would have had to show that anything found by other search methods will also be found by their new, much faster, search method. I doubt very much that they were able to prove this rigorously.

...the only way to guarantee an optimum match is by exhaustive search...

I haven't read the paper, but I don't think this (a thorough comparison) is as hard as you think it is. The bioinformatics community is pretty good about sharing datasets and software. There are benchmark datasets that researchers use for comparing shape-matching techniques. Pick, say, 100 query molecules and a database of 10,000 molecules. Search the database for each query: that's 1,000,000 comparisons, multiplied by the number of techniques you're comparing. Not that much work. Throw in Kabsch-style cRMS matching as a ground truth, and you're standing on pretty solid ground. Like I said though, I haven't read it, so who knows if they did this.

For any difficult optimization problem, there's bound to be a hack that works very nicely. Maybe they found the hack.
Re:Speed versus Thoroughness by wsherman · 2007-03-08 07:33 · Score: 1

I haven't read the paper, but I don't think this (a thorough comparison) is as hard as you think it is.

What I was referring to was guaranteeing that a particular search method can find the best match. If I understand what you're saying, it may not be that important to guarantee a best match - which is a good point.
With respect to guaranteeing that a search has found a best match, there are two problems. The first problem is that the search method may not reflect what is actually desired. If you want to find the inhibitor that binds most tightly to an enzyme then finding the molecule that has the smallest RMS deviation from a rigid alignment to a known inhibitor may not give the tightest binding. The second problem is that even if you restrict yourself to rigid RMS deviation, the only way to guarantee the best RMS deviation is to use that as your search method.
Re:Speed versus Thoroughness by Anonymous Coward · 2007-03-09 09:33 · Score: 0

There are sets of standard results for these sorts of comparisons which are used as the gold standard for performance which we used to cross validate the system that I worked on and developed. These standard sets of results often use a slow but sure method (e.g. Bron-Kerbosch) with sometimes additional verification by hand. This is not to say that a new method necessarily gets exactly the same results - often there are a few extra matches (which may be false positives), or maybe a few rejections (false negatives), but if the overall rate of these is low and at the margins of possible matches then the method can be viewed as acceptable. Often (when looking at drug discovery for example) it is better to not have false negatives at the expense of a few false positives as it is better not to miss the new wonder drug even if it means an extra couple of days discounting the false positives by cross-validating with other methods.

Typically with this sort of work the fast method is not the last method employed, but it makes employing the slow 'gold standard' methods on a handful of matches a much more tractable problem, or even allows eyeball examination of the matches by experts.
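
The comparison against a gold standard described above can be sketched in a few lines of Python. This is only an illustration: the molecule identifiers are invented, and the function name and the returned dictionary layout are assumptions, not part of any real benchmark suite.

```python
def compare_to_gold(fast_hits, gold_hits):
    """Compare a fast method's matches with the slow 'gold standard' matches."""
    fast, gold = set(fast_hits), set(gold_hits)
    return {
        "false_positives": sorted(fast - gold),   # extra matches to weed out later
        "false_negatives": sorted(gold - fast),   # missed matches, the costly kind
        "recall": len(fast & gold) / len(gold) if gold else 1.0,
    }

gold = ["m1", "m2", "m3", "m4"]          # hypothetical gold-standard matches
fast = ["m1", "m2", "m4", "m7"]          # hypothetical output of the fast method
print(compare_to_gold(fast, gold))
```

In the drug discovery setting described here, a method that keeps false_negatives empty is preferable even if false_positives grows a little, because the slow method can be rerun on the short list.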

they got it backwards by oohshiny · 2007-03-08 05:58 · Score: 3, Interesting

Currently, the most common way to find the 3D shape of a particular molecule within a database is to superimpose a candidate over the query molecule and see how much of it overlaps. But this is time consuming, partly because it requires both molecules to be precisely aligned.

Yes, that's currently "the most common way" because at least you can tell what you're getting: when you get a match, you can actually say how close the different shapes are to one another.

The new technique uses a different approach. It analyses the position of the different atoms within a molecule to understand its shape. These relative positions can be mapped and stored in a molecular database.

That's actually not a "new technique", it's an old technique. It's what people used to do before they tried to overlay 3D shapes accurately. They used to do that because computers used to be too slow to do the accurate comparison.

As the article points out, there is only limited 3D shape information available at all. Few people need to do 3D queries right now, and there is little data to do them on, so optimizing speed is the wrong thing to do; we need to optimize accuracy and scientific relevance.

Re:they got it backwards by Anonymous Coward · 2007-03-09 09:45 · Score: 0

"Few people need to do 3D queries right now"

It's very important for pharmaceutical companies.

"and there is little data to do them on"

There are many databases of 3D representations of molecules.

Here's a little one to play around with (180MB uncompressed)

ftp://helix.nih.gov/ncidata/3D/nciopen3d.mol.Z

I am sure there are many more.
Re:they got it backwards by oohshiny · 2007-03-09 13:42 · Score: 1

It's very important for pharmaceutical companies.

I didn't say it wasn't important, I said few people are doing these searches. The reason that's important is because it means that users can generally run this stuff on their desktops for hours, which is a lot more compute power available than, say, for your average web query.

There are many databases of 3D representations of molecules.

There are indeed. But the actual number of comparisons you need to do numbers in the thousands, not in the billions, as it is for other kinds of content.

Here's a little one to play around with (180MB uncompressed)

That is a negligible amount of data compared to other search tasks. I have 10000 times more text and image data than that sitting around on my desktop alone.

The fact remains: we don't need ultra-fast 3D comparisons at this point, we need ultra-accurate comparisons, because even ultra-accurate comparisons would be fast enough to solve the 3D search problems people actually have.

Hack the gibson! by Anonymous Coward · 2007-03-08 06:00 · Score: 1, Funny

We had 3d search engines over a decade ago: http://imdb.com/title/tt0113243/

Not enough structures? by ajax142 · 2007-03-08 06:00 · Score: 4, Insightful

The author lists an apparent problem of this 3D search as a lack of molecular structures and calls for a "jump start" in the supply of 3D data. I call BS on this claim. A quick look at the Cambridge Structural Database shows 400,977 structures of 363,931 different molecules. There are another 89,064 structures of inorganic molecules in the Inorganic Crystal Structure Database. On the biological side there are 3,425 structures of Nucleic Acids in the NDB as well as 42,082 structures of proteins and polypeptides in the PDB. If that still isn't enough for the authors, fire up any number of ab initio quantum chemistry programs and in a short time you can create a library of good guesses for the structure of small molecules.

I tend to think the authors of the article are referring to the problems of a "useable form" for the structures and easy access to many of these databases. The first problem is merely a problem of converting between the various structural file formats out there, something a good programmer (or grad student) can solve in a few weeks or less. The second is a bureaucratic issue and not a scientific one.

Re:Not enough structures? by at0mjack · 2007-03-09 00:34 · Score: 1

No, completely wrong, I'm afraid :). The context here is virtual screening in drug discovery: you either have a protein cavity of known shape or you have a known inhibitor of a protein in an (either known or modelled) bound conformation. The question is "Which other molecules could fit the cavity?". The problem is that molecules are flexible. The average drug-size molecule has 6-10 rotatable bonds, and anywhere from 50 to several thousand different plausible 3D shapes. Crystallographic data from the CSD doesn't help: that tells you what structure each molecule takes up in a solid crystal, which will be completely unrelated to the shape it may adopt inside a protein active site. You mention QM programs: these are still quite a few orders of magnitude too slow to do conformation searches on databases of drug-sized molecules. There are programs to do this using classical models, but all of them have issues, and the size of the databases becomes an issue. We (http://www.cresset-bmd.com/) have an in-house database holding up to 50 conformations on 4 million molecules: this is heading towards a terabyte of data and took a reasonable-sized Linux cluster a month to generate. That database is simply all of the compounds you could buy: if you wanted instead to search all compounds you could plausibly make in 2 reactive steps from commercially-available reagents you'd have a database with more than 10^20 compounds.
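
The storage figures can be sanity-checked with a few lines of Python. The 1 TB total is an assumption standing in for "heading towards a terabyte", and 50 conformations per molecule is the stated upper bound, so the per-conformation size below is only a rough estimate.

```python
molecules = 4_000_000
conformations_per_molecule = 50        # "up to 50", so an upper bound
total_bytes = 10**12                   # assumption: roughly one terabyte

conformations = molecules * conformations_per_molecule
bytes_per_conformation = total_bytes // conformations

virtual_library = 10**20               # compounds reachable in 2 reactive steps
virtual_bytes = virtual_library * conformations_per_molecule * bytes_per_conformation

print(conformations)                   # 200000000
print(bytes_per_conformation)          # 5000
print(virtual_bytes // 10**12)         # virtual library size in terabytes
```

At roughly 5 KB per conformation, the 10^20-compound library would need on the order of 10^13 terabytes, which is why it cannot simply be enumerated and stored the way the purchasable compounds were.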

Lots of 3D bio data out there by ghoti · 2007-03-08 06:00 · Score: 1

The problem will be in jump-starting the supply of 3D data about molecules and everything else.

Well the RCSB Protein Data Bank would be a start, and there are tons of molecule data bases with 3D data that are only waiting to be thoroughly mined. The pharmaceutical companies have them, and there are free ones too.

In fact, the motivation for this research undoubtedly was the abundance of data that is out there but can't/could not be searched efficiently.

--
EagerEyes.org: Visualization and Visual Communication

Re:Lots of 3D bio data out there by at0mjack · 2007-03-09 00:36 · Score: 1

Firstly, only some families of proteins have any x-ray structural data about them: there are whole families that are effectively uncrystallisable.
Secondly, the protein's 3D shape is only half the battle. Small molecules are generally highly flexible, so to search them in 3D you need to enumerate their potential shapes first. That's not trivial for large sets of compounds.

Re:Problem...? by GMO · 2007-03-08 06:10 · Score: 2, Interesting

Hmmm. Maybe it depends on whether you can convert from internal coordinates to a 3D structure. What you seem to be suggesting is moving through structure space, matching as you go along.

So at any point, you have to generate images of the 'neighbours' of the current structure. It could work. Maybe.

Quite interesting by excelsior_gr · 2007-03-08 06:14 · Score: 3, Interesting

This is quite an interesting achievement. The tools that I am familiar with can only search for 2D structures like functional groups (alcohol groups, aromatic rings, etc). At their best, they might give the ability to search for R- and S- stereoisomers, but that is it. This is quite enough for tasks like solvent design that are quite frequent in the chemical process industry, but in pharmaceutical R&D they need more powerful tools.

I will give a simple example of an enzyme: These nice molecules catalyze reactions of vital importance in the modern pharmaceutical industry by providing a chemical "lock" where the "keys" (i.e. the reacting molecules) will dock. This enables them to react and form a new molecule that will then undock from the enzyme, leaving the "lock" free for the next pair.

These "locks" are actually 3D structures of appropriately aligned molecules. This is where this search ability comes in: The chemist suspects what the appropriate lock would look like for catalyzing his reaction (3D alignment of functional groups), much like someone suspects what the right keywords for a Google search are. Then he feeds the data to the machine and gets the molecules that are likely to be of assistance in his work. After that, he can run experiments testing these enzymes to see if they actually work.

This should speed things up very much in biochemical research. It means less literature research and fewer failed experiments.

Ehm... it's how much faster? by lagfest · 2007-03-08 06:17 · Score: 2, Interesting

So the summary says it's 1500 times faster. OK then, if I double the number of items in the database and compare again, is it still 1500 times faster? What if we do a million times the number of items?

Great by organgtool · 2007-03-08 06:24 · Score: 1

So now whenever I search for information about caves or black holes, I'll get sent to goatse.

related problem by smellsofbikes · 2007-03-08 06:32 · Score: 2, Interesting

It's nice to know what shape a molecule is. It would be even nicer to be able to make a molecule in a particular shape. If you map an enzyme's active site -- its topology, charge distribution over the surface, possibility for organometallic or hydrogen bonding -- you have a much better chance of finding some interesting analog to the enzyme's substrate that'll make the system do something new. Even better, you could take an existing molecule that you *want*, and form an enzyme surface so that two cheap molecules, exposed to your new enzyme surface, will find it thermodynamically favorable to become the molecule you want, and suddenly you're in a very profitable business: you can breed chemical engineering factories rather than having to build them.

This poses a problem, similar to the (unstated) problem posed by the molecular printers in Neal Stephenson's Diamond Age: what happens when this sort of stuff starts to become widely available and people start engineering enzymes or instructing their printers to produce, say, heroin, or TNT? With molecular printers, presumably the first versions would only be able to produce structural stuff: printing bicycles, not martinis. But if we get to the point where we can design enzymes for a desired substrate -> product reaction, we have a real problem because it's all wet chemistry and there isn't an obvious hardware/firmware way to block people making anything their inventive, twisted little minds can come up with.

Mind you, I think that's great. I miss the days where I could order almost any chemical I wanted without having to wade through masses of paperwork, tracking, and laws intended to ban any drug analog that might have pharma activity. But it is going to have some very exciting side-effects.

--
Nostalgia's not what it used to be.

Possible application? by MercBoy · 2007-03-08 06:40 · Score: 2, Interesting

This makes me wonder if this could evolve to more general purpose 3-D searches, such as facial recognition, searching for a specific shape of car, suspect identification in a crowd based upon a combination of body shape, face, etc.

Re:Possible application? by excelsior_gr · 2007-03-08 07:07 · Score: 1

I suppose yes. After all, in the article it says that they are looking at the position of specific points in the general 3D structure and check their geometrical characteristics (skewness, relative distances, etc). This is what face-recognition software does in 2D right?

Re:Problem...? by zippthorne · 2007-03-08 06:51 · Score: 1

fine, hum then. the union of {people who can whistle} and {people who can hum} is quite large. Even if you only consider the subset of each who {would like to find random songs from vague recollections}

--
Can you be Even More Awesome?!

Re:I can do that! by drinkypoo · 2007-03-08 07:07 · Score: 1

That's great! Now if you could just do that 750,000 times in the next fifteen seconds, and tell me which shape in the set is most similar to this thing in my pocket...

(cue dick size jokes in 3...2...1)

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

FFT by gr8_phk · 2007-03-08 07:14 · Score: 1

OK, so would it be helpful to do a 3D FFT of the density of the space containing the molecule centered at the CG ?? The frequency content is invariant under rotation, and the lowest spatial frequencies should be representative of the overall shape of the molecule. Just asking if you've tried this and how well it worked. It's just off the top of my head, but very old-school for image processing. I also suspect it may have some usefulness in matching molecules with the inverse space of other molecules.

Re:FFT by goombah99 · 2007-03-08 09:13 · Score: 1

Quick answer: yes, variations on FFT have been tried out the wazoo. They are indeed very successful for some kinds of docking problems.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:FFT by Anonymous Coward · 2007-03-09 02:09 · Score: 0

Using the centroid as the base point can make it difficult to find partial matches (e.g. finding whether part of one molecule matches parts of another) or to determine if a particular small fragment is contained within a larger molecule, but the particular implementation at Oxford may work around this. The big advantage of starting things up using the centroid rather than trying to do other forms of more complex feature extraction in the initial stage is that it is computationally cheap to do. If the next stages of the process can cope with the data and also produce good results cheaply then it may not be worth doing the computationally expensive feature extraction upstream. Often tools have a series of stages, pipelined, with the tools that use more computationally expensive methods towards the end of the chain, but operating on a smaller number of potential matches passed through from the previous stage.
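
As a rough illustration of why centroid-based features are cheap, here is a minimal sketch that reduces a set of atom positions to two numbers: the mean and the spread of the atom-to-centroid distances. It is not the Oxford implementation; the function names and the four-atom coordinates are invented for the example. The point is only that such features need one pass over the atoms and do not change when the molecule is rotated or moved.

```python
import math


def centroid(atoms):
    """Mean position of a list of (x, y, z) atom coordinates."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))


def centroid_descriptors(atoms):
    """Mean and standard deviation of atom-to-centroid distances."""
    c = centroid(atoms)
    dists = [math.dist(a, c) for a in atoms]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return mean, math.sqrt(var)


def rotate_z_and_shift(atoms, angle, shift):
    """Rotate about the z axis by `angle` radians, then translate by `shift`."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [
        (ca * x - sa * y + shift[0], sa * x + ca * y + shift[1], z + shift[2])
        for x, y, z in atoms
    ]


# Hypothetical four-atom "molecule"; the coordinates are made up.
molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (3.0, 1.0, -0.4)]
moved = rotate_z_and_shift(molecule, 0.7, (5.0, -2.0, 1.0))

original = centroid_descriptors(molecule)
after = centroid_descriptors(moved)
# The two descriptor pairs agree up to floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, after)))
```

The flip side is the one raised above: a fragment taken out of a larger molecule has a different centroid, so its descriptors will not simply match those of the parent.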

I don't get it by Pedrito · 2007-03-08 07:18 · Score: 1

I guess what's unclear is what kinds of molecules they're trying to match. I work part-time in a university lab doing drug research. We synthesize variants of existing molecules and test them for efficacy in various diseases, though we do mostly work on cancer-related drugs. Some of the molecules we work with are very large and very complex. But finding what else is out there isn't generally that difficult. Molecules are divided into a number of families and families of molecules are generally pretty similar to each other in shape. Searching by family name or for molecular sub-parts generally works pretty well, I've found.

But there's a lot more to the chemistry than just the shape of the molecule. When it comes to drugs, you're often looking for something that will bind with a given protein and while shape plays a part in that, the functional groups on the molecule are major drivers in whether or not the molecule will actually do its work.

They don't really give enough specifics in the article to know how valuable this really is.

Typically by HomelessInLaJolla · 2007-03-08 07:26 · Score: 1

Proteins are typically characterized through X-ray crystallography. The drawback with X-ray analysis is that the protein must be in a crystallized form--this typically means that millions of occurrences of the same protein are crystallized together. The shape that a protein takes such that it can form a crystal may not be the shape that the protein takes when in the heterogeneous solution of a cell. Fesik, at Abbott Laboratories, made groundbreaking advances in the realm of solution phase study of the shape of proteins--SAR by NMR analysis. Still, concerns remain because the solutions used for NMR analysis are not the same as the heterogeneous environment within a cell.

This creates a much larger problem in drug design. The medicinal chemists design molecules to fit active sites of proteins and enzymes but the shape of that active site is only determined from X-ray, NMR, or computer generated lowest energy conformations. It is no surprise then that 3/4 of molecules which are advanced to clinical trials fail efficacy studies: in short, they simply do not work. Looking back it's quite logical that they do not work because they were designed to fit a shape that was not a proper representation of the shape which the protein takes within the actual cells, in vivo.

Making note of this was usually received with extreme vitriol by the management.

--
the NPG electrode was replaced with carbon blac

Great... by The+Orange+Mage · 2007-03-08 07:29 · Score: 2, Funny

Just what we need...another dimension to lose things in.

Anchoring by HomelessInLaJolla · 2007-03-08 07:31 · Score: 1

While it is fairly easy to predict the geometric shape of a small molecule the more difficult question is one of alignment. If an entire set of molecules, typified as more than one hundred, is considered then how are all of them aligned in 3D space such that they can be properly fit into the target active site?

I'm disappointed that I cannot read the actual article. While at Abbott (informally) and while at Battelle (in formal intellectual property documentation), I proposed a vector (the term "vector" was in my IP release forms) for describing molecules in 3D space based on electronegativity, electrophilicity, nucleophilicity, entropy (freedom of motion), and bulk (volume).

--
the NPG electrode was replaced with carbon blac

really? by GMO · 2007-03-08 07:55 · Score: 2, Insightful

Although the crystal structure is not the same as the structure in solution, it can't be that far off.

Crystals are pretty watery, much like the cell. Unless packing contacts are altering the active site, they are unlikely to be much different.

Also, the bulk of the structure is there to keep the active site residues in a particular orientation.

Perhaps management vitriol was partially justified? :) Only joking, you may be right. I don't work on drug design, only backbone structure.

Re:really? by HomelessInLaJolla · 2007-03-08 08:37 · Score: 1

The particular 3D crystalline form can differ even from one recrystallization solvent to another. In extreme cases a different configuration at even one rotatable center may significantly affect the shape of the rest of the protein.

The hope is that a given protein remains within a particular probability space and that the shape of the active site, refined gradually over millions of years, is highly stable. When 3/4 of drugs entering phase I clinical trials fail efficacy, though, the numbers speak for themselves.

--
the NPG electrode was replaced with carbon blac

Distorted Expectations by Jekler · 2007-03-08 08:23 · Score: 1

It's announcements like these that cause me to ponder just how far behind we are in terms of software development progress.

Back in 1993 I had a whole suite of MS Flight Simulator programs. (different cities were packaged separately. To the best of my recollection, I had Chicago, New York, LA, and Paris). Obviously the game detail was limited, this was before 3D accelerators, but the buildings were still 3D and key locations had fairly accurate roads. I remember reading in more than one computer magazine that these flight simulators were just the beginning, in 5 years (1998) we'd have 3D maps of the whole world. Looking for directions would be a thing of the past, we'd all have programs that could visually tour every nook and cranny of every location in the world.

It's astounding that computers were set to have a virtual earth in 1998. It's 14 years after I read those articles and we're not even close. Google Earth, the closest representation of such a vision, is about 1% of the way towards it.

I'm also reminded of the rise of VRML/3DML back in 1996. There was a site run by Superscape (vwww.com the Virtual World Wide Web), with links to hundreds of 3D web sites. Deployment of the 3D web was imminent! I thought it was the wave of the future, it was just a matter of time and refinement. 11 years later, we've all but tossed VRML/X3D/3DML in the toilet. The progress those technologies have made is absolutely minimal, not what you'd expect as a result of over a decade of work.

So am I excited about a 3D search engine? Not really. I don't even see it happening in my lifetime, never mind the next few years.

Re:I can do that! by blakmac · 2007-03-08 08:29 · Score: 0

nice. he replies to my post, and I get modded down for redundancy. in soviet russia, you mod slashdot -1 redundant!

--
http://wstewart.php0h.com - the sugarbuzz project blog

Re:Problem...? by Doctor+Memory · 2007-03-08 08:39 · Score: 1

No, that will be a problem. Once you have the database, what exactly am I supposed to input for searching? Will I need to learn how to create a 3D model in order to search for similar objects?

Depends. Did you have to learn how to spell in order to use a text search engine?

The people who are going to be using this sort of database are going to already have tools available to create their models. People have been creating MOL and PDB files for quite a while now, and if there isn't a file converter/importer then I'm sure there will be soon. Plus, researchers often want to just search for things that are similar to something they're already looking at. So what they'll do is take whatever model they're currently playing with, lop off chunks of it, and submit the remaining bit to the search engine to see if they've got anything similar on file. So it's not like anyone's going to have to sit down and drag-n-drop individual atoms until they have their model built up...

--
Just junk food for thought...

existing 3D molecule search engine by dr_blurb · 2007-03-08 08:46 · Score: 2, Interesting

Go to: http://shape.cs.princeton.edu/search.html/ and select "Protein Database" from the drop down list, and enter "random" as the keyword. Next, the "find similar shape" links do full 3D feature vector matching against a database of 16900 protein molecule models, in a fraction of a second. But apparently this new method is "1500 faster than anything previously developed"? Maybe the authors never checked the current 3D shape matching literature?

Re:existing 3D molecule search engine by dr_blurb · 2007-03-08 09:40 · Score: 1

The link is http://shape.cs.princeton.edu/search.html (without the trailing slash)
Re:existing 3D molecule search engine by at0mjack · 2007-03-09 00:40 · Score: 1

The Oxford group's technique is looking at a different problem: small molecule 3D shape matching. Surprisingly, this is actually harder than protein shape matching: proteins have a defined 3D shape, but small molecules are flexible and can adopt a variety of shapes. So, you either need to have a flexible fitting method, or you need to enumerate 'example' shapes for each molecule you want to search against.
Compare your search against ~17K protein structures to a search across the roughly 4 million commercially-available compounds, each with 100 example conformations stored. You can see why algorithm speed becomes an issue.
Re:existing 3D molecule search engine by Anonymous Coward · 2007-03-13 09:42 · Score: 0

Princeton's 3D Shape Search engine is indeed a high quality shape matching method, but it is hard to tell how fast it is exactly from reading the relevant papers or making the suggested web-based query.

In any case, if you read the paper published at the Royal Society by these Oxford scientists, you will see that their method has a comparison rate of about 14,000,000 conformers per second (where a conformer is a 3D realization of a molecule, sometimes loosely speaking called simply a molecule).

This can still be three orders of magnitude faster than Princeton's 3D Shape Search engine.
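
To put the numbers quoted in this thread side by side, here is a back-of-the-envelope calculation. It is only a sketch: it assumes a single query scanned linearly over every stored conformer, and it assumes the slower method is slower by the same constant factor of 1500 at this database size.

```python
compounds = 4_000_000          # commercially available compounds (from the thread)
conformers_each = 100          # stored example conformations per compound
rate_per_second = 14_000_000   # quoted comparison rate, conformers per second
speedup = 1500                 # headline factor from the summary

total_conformers = compounds * conformers_each
fast_seconds = total_conformers / rate_per_second
slow_hours = fast_seconds * speedup / 3600

print(f"{total_conformers:,} conformers")
print(f"fast method: about {fast_seconds:.0f} s per query")
print(f"1500x slower method: about {slow_hours:.1f} h per query")
```

Under those assumptions one query takes roughly half a minute instead of roughly twelve hours, which is the difference between an interactive search and an overnight job.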

Crappy research by Anonymous Coward · 2007-03-08 09:23 · Score: 1, Interesting

Okay I just read the original research article in the Royal Society. I'm struck by three things 1) the guys who did this are big players in the business 2) the work is startlingly unoriginal and seems to have no reading outside their narrow community in other areas where geometric hashing on moments is routine. 3) They don't even seem to appreciate what is interesting about their own work (the speed--no, all geometric hashes are that fast). But rather the only interesting thing is why their ad hoc, and not particularly imaginative, feature vectors empirically may beat other proposals. Since they only compare it against some ancient ones, one can't really decide if these feature vectors are better or if computers got better since 1992.

Ewww... by Anonymous Coward · 2007-03-08 09:56 · Score: 0

I knew carbon was unique, but four-way bonding? That's just wrong...

Re:Ewww... by Anonymous Coward · 2007-03-08 12:37 · Score: 0

I knew carbon was unique, but four-way bonding? That's just wrong...
The front, the back, the top and the bottom. That doesn't seem too different from other forms of bonding.

Re:Problem...? by Anonymous Coward · 2007-03-08 11:01 · Score: 0

Check out http://www.vizseek.com/
searching a database of 3d objects with hand-sketches... Not a total search solution, but it does well for actual 3d models of objects.

UP TO 1000 times faster by Anonymous Coward · 2007-03-08 12:37 · Score: 0

Sorry, but sounds like BS to me. When was the last time in the field of technology something actually went 1500 times faster for any reason?

Just once..

Yet... they constantly claim these amazing performance increases. This appears to more practically be one of those cases where the best possible scenario is paraded as the expected average.

The claim that someone writes a program that is realistically 1500 times faster than its leading competition is more or less ridiculous. Perhaps the problem is in how we measure speed/performance, but such a jump in performance is almost certainly not true in a realistic manner. For the claim to be reasonably true, it would have to come with a stipulation like the technology isn't actually useful for another 10 years.

If it's too good to be true... it's not true. I don't think we've EVER in the history of math or computers seen a 1500 TIMES improvement in anything, much less that improvement being overnight. The claim has to be flawed to appear more impressive or just flawed period. Maybe it searches 1500 times faster but doesn't actually find anything :P or lacks significant detail. I just ask slashdot readers to realistically fathom how much 1500 times faster would actually be and then reason about the statistical possibility of that claim being true. I doubt my Core 2 system is actually 1500 times faster than an NES or perhaps even Atari. It could be for certain uses, but overall I can't see the performance of even 20+ years having created a 1500 times performance leap. Don't bother using FLOPS to estimate such performance however, as it is a pathetic representation of real world performance. MHz is pretty useless also, but a basic rule of thumb is that non-x86 processors dust x86 processors on a MHz-to-MHz level. That said, it would be realistic to say a Core 2 system is far less than 1500 times as powerful as an NES. While this supposed breakthrough was made through software I guess.. I find that even less likely to be true. The search routine they are comparing it to would have to be so badly written to run that much slower, it's just ridiculous to think drug companies are really that stupid. They aren't using a search routine that's 1500 times slower. There is just no way that is a realistic claim.

Re:UP TO 1000 times faster by Anonymous Coward · 2007-03-13 09:56 · Score: 0

Sorry, but you demonstrate that you know very little about science...

I'll make it nice and easy for you. Take your fancy Core 2 system and your favourite programming language. Now define a float matrix of say 100,000 rows and 12 columns, take any row as a query (that is, a vector of 12 floats) and compare the query against the rest of the rows (vectors).

This is precisely what the Oxford Scientists have done and I have obtained the same published comparison rate. Then, check the rates previously reported in the chemoinformatics literature et voila: you got the 1500 fold improvement!

The hard bit of course is to come up with a methodology to encode the shape of a 3D object in such a compressed way (just 12 float numbers or descriptors, as they call them) and demonstrate, as they did in the paper, that very similar shapes to that of the query are found.
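
For anyone who wants to try the experiment, the brute-force comparison can be sketched in a few lines. This is a minimal sketch and not the Oxford group's code: the data are random, the database is shrunk to 1,000 rows so it runs instantly, and the scoring function (the inverse of one plus the mean absolute difference) and the names `similarity` and `best_matches` are assumptions made for illustration.

```python
import random

N_DESCRIPTORS = 12  # from the post: each shape is encoded as 12 floats


def similarity(query, candidate):
    """Score in (0, 1]; 1.0 means identical descriptor vectors.

    Assumed scoring: inverse of (1 + mean absolute difference).
    """
    total = sum(abs(q - c) for q, c in zip(query, candidate))
    return 1.0 / (1.0 + total / len(query))


def best_matches(query, database, top=3):
    """Return (index, score) pairs for the `top` most similar rows."""
    scored = [(i, similarity(query, row)) for i, row in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


random.seed(0)
# Hypothetical database: 1,000 rows here (the post suggests 100,000).
database = [[random.random() for _ in range(N_DESCRIPTORS)] for _ in range(1000)]
query = database[42]          # use one row as the query
hits = best_matches(query, database)
print(hits[0])                # the query row matches itself with score 1.0
```

Each comparison is a dozen subtractions and additions, so the cost per conformer is tiny; the interesting question is whether 12 numbers capture enough of the shape.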

Musical pattern searching by smtrembl · 2007-03-08 13:38 · Score: 1

One thing more important and easier to do than 3D mesh matching is musical pattern matching---like searching on consecutive notes or chords or rhythms. It would be really easy to find a song with relative tone, and music is easy to index and search by interfacing over MIDI. Is Google listening to us, musicians? Simon.
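
A minimal sketch of the relative-tone idea: turn each melody into the sequence of semitone steps between consecutive notes, then look for the query's steps inside the song's steps, so the match works in any key. The function names and the note lists are invented for the example, and rhythm is ignored entirely.

```python
def intervals(notes):
    """Semitone steps between consecutive MIDI note numbers."""
    return [b - a for a, b in zip(notes, notes[1:])]


def find_melody(query, song):
    """Return start indices in `song` where `query` occurs in any key."""
    q, s = intervals(query), intervals(song)
    if not q:
        return []
    return [i for i in range(len(s) - len(q) + 1) if s[i:i + len(q)] == q]


# Hypothetical data: MIDI note numbers (60 = middle C).
song = [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60]
query = [62, 62, 69, 69, 71, 71, 69]   # same opening, a whole tone higher

print(find_melody(query, song))  # [0]
```

A real system would also need to tolerate wrong or missing notes, which this exact-match sketch does not attempt.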

For robots by Anonymous Coward · 2007-03-08 14:19 · Score: 1, Insightful

Great for robot AI technology. With a couple cameras and some laser equipment, get a good 3D representation of what it's looking at, then run it through the list and find a match.

Re:For robots by Anonymous Coward · 2007-03-13 10:11 · Score: 0

How right you are!

That was called the generic 3D object recognition problem by Rodney Brooks (MIT), who highlighted it in New Scientist as one of the most important problems for the next 50 years.

I would love to read about the state of the art in this matter, can anyone recommend a good review paper or book?

83 comments