Datamining Medline for Gene Interactions - Pubgene
An Anonymous Coward wrote: "According to an article in the 5 May 2001 issue of New Scientist, biologists in Norway have developed a computer program to datamine Medline to predict interactions between genes. Some of the relationships hadn't been predicted before and were found to be real. The scientists' PubGene database and tools are available for experimentation." Wow.
Also, a paper was published recently in Bioinformatics which tries to extract protein interactions. They used a dictionary of words related to interactions, then looked for proteins mentioned in the same sentence as one of those dictionary words, along with part-of-speech analysis to improve accuracy.
Something like that.
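For the curious, here's a minimal sketch of the same-sentence idea described above. It is not the paper's actual method: the word list, the protein names (protA, protB, ...) and the function name are all made up for illustration, and the part-of-speech step is left out entirely.

```python
import re
from itertools import combinations

# Hypothetical dictionary of interaction words; the real paper's list is not given here.
INTERACTION_WORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}


def find_candidate_interactions(text, protein_names):
    """Return (protein, protein, word) triples for sentences that mention
    at least two known proteins and at least one interaction word."""
    names = {n.lower() for n in protein_names}
    hits = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence.lower()))
        mentioned = sorted(names & tokens)
        words = sorted(INTERACTION_WORDS & tokens)
        if len(mentioned) >= 2 and words:
            for a, b in combinations(mentioned, 2):
                hits.append((a, b, words[0]))
    return hits


sample = "protA binds protB in vitro. protC was sequenced. protD inhibits protC."
hits = find_candidate_interactions(sample, ["protA", "protB", "protC", "protD"])
```

The middle sentence is skipped because it names only one protein and no interaction word. A real system would need far better tokenizing and name matching than this.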
Would it be of any assistance to set up something similar to distributed.net or SETI, and just have people designate their idle cycles to processing all sorts of genetic interactions? For example, combining human DNA with DNA of other creatures to provide cures for various diseases. It'd be interesting to cross some human DNA with that of a lizard and get some regeneration action goin' on (a la Spider-Man -- the doctor dude who turns into an alligator when it's least convenient). ;)
-kidlinux.
Dacels Jewelers can't be trusted.
we could get onto Enkephalonetics,
Which is......?
How soon until I get this sort of thing to write my English papers for me?
But it's already been done! Quick, go to megahal.sourceforge.net and grab the latest source code. Build it, and then run some sample English papers through it. Feed it some other miscellaneous stuff for good measure (how about the script to the movie Terminator and a few of the better Slashdot trolls), and let 'er rip! Sure, it may not make *logical* sense, but since this is for an English course, you'll get graded higher for originality and "thinking outside the box". Just don't mention my name when they come to take you away to your padded cell.
-1: Sleep Deprived
It's only software!
I believe that this is similar to that which Granny Weatherwax refers to as "headology".
"Enkephalos" is ancient Greek for "Head" or "cranium". Therefore, enkephalonetics = using your head. =)
How to Lobby Politicians http://www.zeta.org.au/~aldis/lobby.html
This reminds me of one of the final-year comp sci projects here, although AFAIK that one is far simpler. One idea that was brought up during the student's presentation, though, is that this might work very well in a distributed computing situation.
Perhaps this will be the next SETI@home ?
Female Prison Rape in NY
So I guess you haven't heard of psychohistory?
How we know is more important than what we know.
Or worse, get good enough that they will replace researchers as well as be a tool for them. When one of these programs, while its operators aren't looking, deduces the existence of the scientific journals and starts trying to publish papers on its own, they'll have to worry.
I'm wondering if it's much different from what Google is doing with web page data, on a larger scale?
Why is this moderated to Troll?
Though doctors should ultimately bear the weight, pharmacists are the ones who are supposed to screen for the interactions. Anyone can pass out meds; a pharmacist is schooled for a reason.
Hmm, shades of HAL? Let's hope when he's eventually developed that the developers (or creators?) incorporate Asimov's 3 Laws of Robotics into this thing.
Wouldn't want these smart AI programs to "suggest" something that would be potentially harmful.
-Cyc
/.'s 10 Millionth
Hmm. I think what you describe already exists. Unfortunately, nobody seems to know. Well, I left the company one year ago, but please have a look at www.kelman.de
As many of you may know, the drug/biologics industry has fought bitterly to protect ANY and ALL information relating to gene therapy trials as trade secrets. In fact, they protected such trade secrets about a certain adenovirus vector so well that none of the right people knew it had killed monkeys and seriously injured several humans before it killed Jesse Gelsinger last year (with a little help from a U. Penn research egotist). The upshot is that when these greedy moron subhuman bastards seek to protect prior evidence of toxicity as "trade secrets," people die for lack of the suppressed knowledge. It's shameful. We'll have to hope FDA and HHS have the balls to get these gene cowboys in line before they kill again.
Eloi are stupid, throw morlocks at them!
I think that the main thing this demonstrates is how poorly scientists choose to represent their data in the first place. Not only do we choose to put all this vital stuff into something which is totally unamenable to computation, but we sign the copyright over to various commercial interests.
Phil
This is why the Internet is so much more than a bubble, why there really IS a new economy with new laws, and why old farts in their fifties should get a clue.
If I hear another lame-ass comment about how the Internet is just like the tulip bubble in the 17th century, I am going to send them this link.
And, oh yeah. Perl is not just for script-kiddies either. So there.
Will code a sig generator for food
something like this, perhaps?
check out compare-stuff.com/pubmed to analyse relative co-occurrence in PubMed articles.
you can compare more than just gene names too: disease/condition names, reagents, techniques, author's addresses, whatever...
reload the entry page to see different examples of what you can compare
or ask similar questions on the web at large with the vanilla version: compare-stuff.com
here's the/an answer
quick plug for compare-stuff.com/pubmed. see my other /. posts for more info.
Maybe I'm missing something here, but doesn't the fact that the other genes are being mentioned in the same article as the first gene already imply a relationship between the two? Why else would the authors mention them in the same article?
Another thing to consider is that scientists don't just go around randomly picking a gene and studying it. There are generally reasons why the gene is interesting, and those genes are studied more than others. There is a whole field of the sociology of science that deals with how the way scientists go about doing science influences the results that they find. It annoys the heck out of most scientists.
It is good that there were some relationships that the program found that had not been previously found, but essentially the program is an automated review article generator with a meta analysis component to organize and sort the data.
The idea posted above, that drug interaction would be a good thing to do as well, I can heartily agree with. Have the program go through not only the literature but also the PDR (Physician's Desk Reference), categorize pharmacological responses (e.g. what drugs cause blood pressure to rise by what mechanisms) and not only could we possibly avoid some nasty drug interactions, but perhaps we could find where some drugs act synergistically with each other to generate greater or new results that were not previously thought of.
If you can't beat them, embrace and extend them.
Now, what we need is for some of the slashdot karma geeks to get off their arses and write an open source dataminer for slashdot articles and posts. Of course it must have a NL front end and be able to answer questions like "how many dumb stories has Commander Taco posted?" or "how many /. users are communists or libertarian, or into goat sex?" Just a thought.
How soon until I get this sort of thing to write my English papers for me?
I'm betting, 25 years, too late for me.
I'd like to make something that compiles an essay from paragraphs and phrases in other works, that could be made in the next 2 years I think.
Too busy staying alive... ~ R.A.
In regards to your first point... I've done a pretty large amount of work with Medline. Certainly the technique does to some extent rely on a standard nomenclature. This is probably not such a hurdle, though. Each citation indexed in Medline is tagged with particular MeSH headings. MeSH is a controlled vocabulary of medical terms, with quite extensive supplements that include genetic and chemical information. The most relevant part here is that each heading is associated with a number of synonyms. So in addition to each article being indexed against a controlled vocabulary (by a trained human indexer), that vocabulary itself provides relationship information between various terms, both internal and external to the actual vocab. Also, there's the whole Unified Medical Language System, but I'm not really up to speed on that. It's pretty much independent from MeSH, and it's not used directly in Medline, AFAIK.
Steven N. Severinghaus
One question... are the results of this project public domain (in some way), or are they going to be snapped up and made proprietary? SETI is one thing (can't really make that a business), but I'm leery of donating my time to make someone else's fortune.
Sure structures and motifs are good to have, but there are a lot of structures out there that we don't know much more about. And the issue here is about interactions, beyond simple statements like "this is a catalytic protein" or whatever.
Can one give Pubgene a pdb or fasta file-- and find papers on homologous genes or structurally similar proteins-- or must one use BLAST, or a fold recognition algorithm prior to searching Pubgene?
No, you are supposed to have a set of gene names that you are working with. Homology can be assessed elsewhere. What you can ask this system is about known and inferred interactions out there.
Lars
__
Reality or nothing.
As a lot of people here have noticed, the basic technique used in PubGene is quite simple. The novelty of their work is mostly in how to evaluate co-citation of genes, and perhaps also in the quite comprehensive setup. Several other systems have been suggested and set up for discovering protein-protein interactions, gene interaction networks, and also automatic discovery of keywords to be associated with genetic conditions.
More elaborate techniques have also been suggested for learning about the interactions. By simple text analysis, you can deduce with fair (but not perfect) certainty whether a gene is up- or down-regulating another gene. Other systems try to find support for hypotheses on interaction networks by doing PubGene-like analysis. If your experiments support many tentative networks, you can let the vast amounts of knowledge in the published literature dismiss the bad suggestions.
The need for systems like this is huge. More articles than ever are being published, and there is no way a researcher can keep up with the information flow. New technology also admits large-scale genome-wide experiments that generate enormous amounts of data. Such data needs to be analysed automatically, and if we can tie in the published knowledge, the value of the data increases.
If you are interested in systems like these, look up the works of Andrade, Valencia, Bork, Ouzounis, and their collaborators!
Lars
__
Reality or nothing.
Uh, yeah, but why not go directly to PubMed? OK, you don't get relative co-occurrence and those nice little charts, but on the other hand, I cannot come up with a research question where you would want the relative co-occurrence.
Lars
__
Reality or nothing.
Sure, a drug interactions database would be useful. No question. Relevance to the topic? Hmmm, a tangent at best. OK. Whatever.
But what got my goat was the claim that "more than 100,000 deaths per year are caused by adverse drug reactions" and yet "By contrast, deaths due to traditional herbal remedies are so rare they're hard to find."
This is such blindingly bad use of statistics that I have to howl. It isn't so much like comparing apples with oranges, as like comparing apples with trilobites. Consider the populations: why are people taking traditional herbal medicines? For colds, indigestion, general malaise. Not for heart disease, strokes, cancer or anything life threatening. People at risk of death are a lot more likely to risk dangerous combinations of drugs. Well, derrr.
No matter how cynical you become, it's never enough to keep up.
If this is actually READING articles and then making insights about their content, this could be a revolutionary search tool for any field! These guys should contact Lexis-Nexis or some other fact finding service.
Finding information is a hell of a skill - I know that a lot of my time as a grad student has been spent on literature reviews.
+++ ATH0 +++
By contrast, deaths due to traditional herbal remedies are so rare they're hard to find. I'm not dismissing modern medicine entirely - far from it - I'm just pointing out some disturbing facts.
So why are gene interactions so hot, yet medicine interactions so neglected in research? And why, for that matter do so few people know that they could substantially reduce their risk of heart disease and cancer by going vegetarian or vegan? Surely the governments of the world should be funding research and education on these two topics on a massive scale - it could save thousands upon thousands of lives - and even from a callous economic point of view, the savings in terms of medicaid and lost economic productivity due to ill-health would be huge! In fact, official guidelines still endorse a meat-based diet despite the well-known health risks, and there is NO serious attempt to co-ordinate drug safety information between regulatory bodies internationally. That's right, none - regulatory bodies in the UK often ignore bans in the US, and vice-versa. What's more, the support for even collating data on side effects of medicines at a government level is poor - particularly in the UK.
The reason is the same in both cases, and it's very simple. Profit. Profit for the drugs companies, to be precise. Pharmaceutical corps profit from ill-health, and they don't exactly relish the idea of their drugs getting banned or contraindicated for safety reasons, either. Campaign funds, and the revolving door between the FDA and the drugs/biotech industries, help keep the government in line. For more info see http://www.drrath.com/
Female Prison Rape in NY
Bill Gates suggested something similar in this very lame book (after The Road Ahead, I really expected better), of course, he figured people would be using Excel pivot tables.
How we know is more important than what we know.
Yeah, a nice way to improve it would be to do something like a genscan search, where you take the raw protein sequence (or gene sequence) and look for conserved portions that are known to interact with other conserved portions (kinda like the zinc finger motif and such). Then go through the literature like this project has done, or a new kind of database of genechip-type data, showing which genes are expressed together, and correlate the data together. Then go from there with things like Chromatin Immunoprecipitation (ChIP) to find out what's really interacting. The future's looking good for us biologists! :-)
"I may not have morals, but I have standards."
Medicine-interaction-related deaths are caused by incompetent doctors, patients not disclosing medications fully, or foolish patients not reading warning labels. It's basically idiocy and carelessness that causes this. When taking medication, remember that you are putting a chemical into your body, and we have a pretty good idea as to how they interact with other ones clinically, if not molecularly (experimenting with that's probably unethical anyhow). Medical interaction research isn't neglected, but is a standard and critical part of getting a drug approved for use. If you want to know more, go to www.fda.gov and look at their massive database on drugs. They've got info on interactions aplenty, particularly their MedWatch database.
As for why gene interaction is so hot: it's the real key to a lot of problems. You thought that the human genome was it? No no no... that was only the beginning... it was the map for gene interactions. The genes are worthless if we can't figure out what they do and how they interact. I mean, we can't even tell you how an E. coli works even though we've got the genome. There will be a lot of profit out of finding protein interactions, sure, but it'll be to find cures. I work in a lab that's trying to figure out gene therapy in prostate cancer. We need to know the genetic mechanisms for therapy to be effective. Or don't you want cancer cured?
"I may not have morals, but I have standards."
Hehe, not to diminish what you did (very cool project that I thought about writing myself last year) but just that we're heading into major league waters here, and it's pretty exciting. Punnett squares are an important part of genetics because of inheritance, but the stuff now is all gene expression and interaction. It's pretty terrifying, because that's where the real work is all going to be, but it's also incredibly exciting, because bio is going to be the science of this century.
:-) If you're at a university that's paying online fees, you can read journal articles that they link to from university IPs as well.
If you're interested in slightly higher level concepts, I just found this website on my college's webserver (it's a class I had to take, intro to Molecular Bio) and it looks like it's got some good info through the flash animations. If you want the hardcore stuff, go to the NCBI site where you can browse the genome, search for proteins and genes, and do all the stuff real biologists do.
"I may not have morals, but I have standards."
No offense, but this is really really different from an 8x8 Punnett square (which isn't really that bad, I've done dozens of 'em by hand). This is hardcore datamining the scientific literature, involving lots and lots of parsing keywords and finding gene interactions. What I'd like to see is a tool to do this with raw gene and protein sequences (thinking... possible cool project for me!) Then a tool to combine those would be sweet! Mmmm... genes....
"I may not have morals, but I have standards."
The article says that the algorithm assumes that two genes interact when both are mentioned in the same paper. Imagine that the paper actually shows "gene 1 and gene 2 do NOT interact". Nevertheless this new algorithm perpetuates and extends the mistaken idea that they do.
Some leap forward: "Information in, Error out"!
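To make the objection concrete, here is a rough sketch of plain co-occurrence counting as described in the article summary. This is an illustration under my own assumptions, not PubGene's actual code: the function name and the toy gene symbols are made up, and each paper is reduced to the set of gene names it mentions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count, for every pair of genes, how many papers mention both.

    `papers` is an iterable of collections of gene symbols, one per paper.
    """
    counts = Counter()
    for genes in papers:
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    return counts


papers = [
    {"gene1", "gene2"},           # could be a paper saying they do NOT interact
    {"gene1", "gene2", "gene3"},
    {"gene3"},
]
counts = cooccurrence_counts(papers)
```

Note that the counter never looks at what the paper says about the pair, which is exactly the weakness: a negative result adds to the count just like a positive one.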
to err is human, to forgive is divine, to forget is... umm...
Naw, I got my GED after my junior year of high school. Now I'm just the average working stiff. :-P
I thought about college, but after high school, and with what I've heard about how colleges treat undergrads (required to live in the dorms with crappy Internet access, kicked out if you post Bad Things, no privacy, uninterested professors and dumb students), I have no desire to pay ridiculous amounts of money for college when I'd rather be learning.
It annoys me that it seems to be impossible to do anything between having an extremely casual interest in something and making it your whole career. You can't just go take classes that interest you, because they have prerequisites, and general education requirements, and all sorts of hassle. If I wanted to actually do anything related to genetics, for example, I'd have to spend at least 4 years in school studying it, and then get a low-level job at some place, and then decide that I'm not that interested in it after all, and what then?
(As an aside, why is it that the simplest things are always overlooked by beginner's resources? Why, for example, don't they introduce all the basic terminology and notation for a topic as soon as the topic appears? I hate having to refer to a portion of the thing I'm working on as "that thingy over there", especially if I'm asking for help. I've seen this in computer science, physics, chemistry, and biology books. They don't even have a "notation" section in the back, or if they do, it's next to useless. (And this problem may be more limited to high school, but when I would ask the teachers, they would actually tell me "don't worry about that". Or they wouldn't know.))
If you know of any entry-level resources for learning various sciences, I'd be most interested. I'll be sure to check out those sites if I'm ever at a computer with Flash, and I may play around with making a Punnetizer Deluxe or something :)
--
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
And no, the Punnett square really isn't hard as such, but I was pretty sure the teacher's motive was to catch as many students in fatigue or misalignment errors as possible. Also keep in mind the fact that >60% of the students were still confused by phenotypes.
Regardless, writing code to do it for me transformed the assignment from painful drudgery into a fascinating exercise. I was especially proud of realizing — on the way to gym class, no less — that it could all be represented as bitmasks. (I think this was when I first truly grokked the power of C.)
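For anyone wondering what a bitmask representation might look like, here's a small sketch. It is not The Punnetizer's code (that was C, and its internals aren't described here); the encoding, the function names and the assumption of simple complete dominance are all mine.

```python
from collections import Counter


def gametes(allele_a, allele_b, n_traits):
    """All 2**n_traits gametes of a parent whose two allele copies are bitmasks.

    Assumed encoding: bit i set means the dominant allele for trait i.
    """
    full = (1 << n_traits) - 1
    result = []
    for pick in range(1 << n_traits):
        # Per trait, take the allele from copy A where `pick` has a 1 bit
        # and from copy B where it has a 0 bit.
        result.append((allele_a & pick) | (allele_b & ~pick & full))
    return result


def phenotype_counts(parent1, parent2, n_traits):
    """Fill in the Punnett square and tally phenotypes.

    With complete dominance, a trait shows as dominant if either gamete
    carries the dominant allele, so the phenotype is just a bitwise OR.
    """
    counts = Counter()
    for g1 in gametes(*parent1, n_traits):
        for g2 in gametes(*parent2, n_traits):
            counts[g1 | g2] += 1
    return counts


# Two parents heterozygous for three traits: an 8x8 square, 64 cells.
hetero = (0b111, 0b000)
counts = phenotype_counts(hetero, hetero, 3)
```

Each cell of the square costs one OR, which is presumably why the bitmask idea is attractive for large numbers of traits.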
Genetics was really the only thing that captured my interest in biology; sadly, the class didn't linger long on that topic. I'm still interested in it, all from an amateur perspective, of course. If I get time I think I'd like to make some new software that does multiple generations and traits requiring more than one gene.
--
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
Thus was The Punnetizer born. Once I had the basic functionality working, I went hog-wild with output formats. So you can have your Punnett squares in ASCII text, HTML, LaTeX, and CSV. What was really fun was running it on a StarFire with 2GB of RAM with the maximum number of traits. The output HTML was something like 347MB. :P
Anyway, that was one of the few times we impressed Cowell. He actually volunteered to give us extra credit. Of course, he graded our next assignment extra tough, but oh well. :P
--
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
If you were writing an article about a gene that regulates insulin production, you probably wouldn't be mentioning a gene that produces monoamine oxidase. In fact, the program relies on the fact that there will be some relationship between the genes. Otherwise, it's all random.
I'd say you lose, but as you posted anonymously, that's a given.
If you can't beat them, embrace and extend them.
--
You sure got a purty mouth...
>>>Enkephalonetics
>>Which is?
>using your head
Or, more generally, using anyone's head.
Computers are cute toys, but we've already seen wetware being used to mediate the control of mechanisms. If we can use it to mediate information processing, computers will be relegated to the status of diagnostic tool, low-end user interface, and arithmetic calculator.
Cybernetics is machines that think.
Encephalonetics will be brains used as machines.
The spelling with the k's (kibernetics, enkephalonetics) is just how you say it if you're actually ancient Greek.
--Blair
What we need now is a way to re-animate Norbert Wiener.
We can tell him, "you know back when you said that machines could do the thinking? Called it 'kibernetics'? Well, it turns out we couldn't do that, so we've adapted humans to do the thinking and we feed it into machines so they can digest it seven times better than fishing around the Science Citation Index. It's only half as good as experimentation but at a micro-fraction of the cost..."
I think at that point he'd understand the human mind and we could get onto Enkephalonetics, which is where this little electromechanical distraction is really leading us.
--Blair
It's like Google for genes!
Don't be dumb... you have to print today's paper yesterday and next week's magazine last week, otherwise people realize that today's paper covers yesterday and this week's magazine is all about the trends going on last month just by looking at the date. With the current system they only catch on to the information lag time if they are otherwise sentient and informed....
er... no offense... 8-)
--
Rob White,
Cv - Cv = 0 Therefore there is an absolute frame of reference.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
Interesting technique, but it depends in large part on the use of a standard nomenclature. If a protein is known as "p89" in one article, and as "acetylcholinesterase II" in another article, a link cannot be established so easily.
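A minimal sketch of why a synonym table helps here. The table itself is hypothetical: "PROTEIN_0001" is a made-up canonical ID, and the two names are just the ones from the example above, not a claim about any real protein.

```python
# Hypothetical synonym table mapping lowercased names to a made-up canonical ID.
SYNONYMS = {
    "p89": "PROTEIN_0001",
    "acetylcholinesterase ii": "PROTEIN_0001",
}


def canonical_name(mention):
    """Normalize case and whitespace, then map known synonyms to one ID."""
    key = " ".join(mention.lower().split())
    return SYNONYMS.get(key, key)


def same_entity(a, b):
    """True if two mentions resolve to the same canonical name."""
    return canonical_name(a) == canonical_name(b)
```

With a table like this, `same_entity("p89", "Acetylcholinesterase II")` comes out true, so the two articles can be linked. Without one, the names never match and the link is lost.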
This technique, moreover, appears only to collate published interactions-- helpful, perhaps, in guiding the conduct of basic research, and the avoidance of duplicate studies-- but less useful when the goal of a researcher is determining the function of unknown genes, or putative protein products. In those cases, protein fold databases or motif databases are much more useful.
Can one give Pubgene a pdb or fasta file-- and find papers on homologous genes or structurally similar proteins-- or must one use BLAST, or a fold recognition algorithm prior to searching Pubgene?
This is interesting. I used to work at a place trying to do "meaning based search" in the medical field. They were working on, among other things, ontology-based search and a search for protein-gene relationships.
/labbook and a host of others are working on this stuff to sell to pharma companies, to do better search and allow quicker, more accurate drug creation.
There was a paper in the office of some professor who used a Brill learning algorithm with existing genes and then had it try to guess what random genes did. It did very well in the test despite the "primitive" AI.
3rdmill and spotfire
There is a lot of computing power in the life sciences field, and a lot of data created with gene chips and assays. People can't sort it all out anymore; some computer analysis makes everything faster. Look at the human genome.
It's good that this is done in the open. I would be uncomfortable with genetic engineering being an open source discipline (ie. downloading a gene set and coding your own modifications to it then compiling it into a living creature), but I'd much rather have the knowledge out in the open than locked up for who knows what to happen to it.
Even Slashdot wants to hide some things
Genome@home
First off, yeah... that reaction pretty much sums it up. "Wow."
But I find it interesting that their method was so simple. It didn't involve any really complicated methods... basically a glorified text scanner. Yet it was able to predict some new interactions that hadn't been predicted before. Still, it was only 7 times better than random guessing... I wonder if that could be improved any?
Humorless sig goes here.