International Challenge To Computationally Interpret Protein Function
Shipud writes "We live in the post-genomic era, when DNA sequence data is growing exponentially. However, for most of the genes that we identify, we have no idea of their biological functions. They are like words in a foreign language, waiting to be deciphered. The Critical Assessment of Function Annotation, or CAFA, is a new experiment to assess the performance of the multitude of computational methods developed by research groups worldwide and to help channel the flood of data from genome research into deducing the function of proteins. Thirty research groups participated in the first CAFA, presenting a total of 54 algorithms. The researchers took part in blind-test experiments in which they predicted the functions of protein sequences whose functions are already known but have not yet been made publicly available. Independent assessors then judged their performance. The challenge organizers explain: 'The accurate annotation of protein function is key to understanding life at the molecular level and has great biochemical and pharmaceutical implications; however, with its inherent difficulty and expense, experimental characterization of function cannot scale up to accommodate the vast amount of sequence data already available. The computational annotation of protein function has therefore emerged as a problem at the forefront of computational and molecular biology.'"
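The summary doesn't say how the assessors scored the predictions. Purely as an illustration, here is a minimal Python sketch modeled on the Fmax measure commonly used for this kind of assessment: the best F-measure over all confidence thresholds, with precision averaged over proteins that received predictions and recall averaged over every benchmark protein. This is an assumption, not the official CAFA assessment code. The function name `fmax`, the input layout, and the toy "GO:" terms are made up, and real assessments also account for the ontology structure, which this ignores.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Best F-measure over confidence thresholds tau = 1/steps, 2/steps, ..., 1.

    predictions: protein -> {function term: confidence in [0, 1]}
    truth:       protein -> set of experimentally confirmed terms
    """
    best = 0.0
    for i in range(1, steps + 1):
        tau = i / steps
        precisions = []      # only proteins with at least one prediction >= tau
        recall_total = 0.0   # summed over every benchmark protein
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            if true_terms:
                recall_total += hits / len(true_terms)
        if not precisions:
            continue  # nothing predicted at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_total / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy benchmark.
preds = {"P1": {"GO:A": 0.9, "GO:B": 0.4}, "P2": {"GO:C": 0.8}}
truth = {"P1": {"GO:A"}, "P2": {"GO:C", "GO:D"}}
print(round(fmax(preds, truth), 3))  # 0.857
```

Sweeping the threshold rewards methods whose confidence scores are well ordered, not just methods that happen to pick a good cutoff.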
Here I was, hoping for another Folding@Home.
Your similes are magnificent. They are like eggs on a pancake, butternut waiting to be waffle-ironed.
It's about time we started focusing on the future and on what we know will work. Understanding how matter organizes itself into life is one of the biggest challenges ahead. I propose that we understand how life works and how to extend it before this decade is out.
Without a good plan, we'll be at it for decades. Here's what I think genomic researchers should do.
Genes (and proteins) are obviously organized hierarchically. Which means there must be a control hierarchy in there somewhere. To unravel and properly classify the genome, researchers must first identify and understand the hierarchical control system. Only then can they begin to populate the branches with the correct genes.
After the tree is completely built and all the genes have found their correct locations on the tree, then it's a matter of going through the tree from the top down and switching the branches of the tree off/on one at a time to see what happens. It's hard but it can be done.
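Taking the proposal above at face value, and assuming the hierarchy really is a clean tree (which the replies dispute), the top-down knockout pass might be sketched in Python like this. Everything here is hypothetical: `Node`, `knockout_scan`, and the `phenotype` callback stand in for a real organism and a real assay. Switching a branch off is modeled as disabling that node plus everything beneath it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    """One branch point in the (assumed) control tree."""
    name: str
    children: List["Node"] = field(default_factory=list)


def subtree_names(node: Node) -> Set[str]:
    """Names of this node and every node beneath it."""
    names = {node.name}
    for child in node.children:
        names |= subtree_names(child)
    return names


def knockout_scan(root: Node, phenotype: Callable[[Set[str]], str]) -> Dict[str, str]:
    """Visit the tree top-down; switch off each branch in turn and record changes."""
    baseline = phenotype(set())
    effects: Dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        result = phenotype(subtree_names(node))  # this branch off, everything else on
        if result != baseline:
            effects[node.name] = result
        stack.extend(reversed(node.children))  # preorder: parents before children
    return effects


# Hypothetical toy organism and assay.
tree = Node("master", [Node("growth", [Node("cell_wall")]), Node("motility")])


def toy_phenotype(off: Set[str]) -> str:
    if "cell_wall" in off:
        return "dead"
    if "motility" in off:
        return "immobile"
    return "normal"


print(knockout_scan(tree, toy_phenotype))
```

The toy result already shows a weakness of the plan: switching off "master" or "growth" looks identical to switching off "cell_wall", so a high-level knockout says little about which gene underneath it actually matters.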
Unfortunately, there doesn't have to be "a" control hierarchy: each subsystem can have its own hierarchy (or none) that uses its own unique control mechanisms. They don't have to operate by the same rules, and they can mess with each other by lots of different ad hoc means. And that's just the genes: the proteins are much harder to model, at least as far as useful predictions go.
It's been ad hoc with no code review for over 3 billion years.
Stunning. Absolutely astounding. Yet another AC has taken Science by the balls and shaken the Universe to it's core. Dizzying intellect, artistic prose. He's probably six feet tall, blonde and with the chiseled features of a Grecian statue.
Oh. Wait.
Faster! Faster! Faster would be better!
This is the dumbest thing, not related to football, that I have read all day. "Obviously" hierarchical? That's utterly idiotic. And I mean utterly, betraying a complete lack of any experience with metabolic processes. Many, perhaps even most, proteins do many things in many circumstances and have dynamic equilibria within more than one metabolic chain, as do many of the small molecules that are produced.
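The point about molecules sitting in more than one metabolic chain can be made concrete. In a tree every node has at most one parent, so a single shared node is enough to break the "obviously hierarchical" picture. A minimal Python sketch with a toy, hand-picked edge list (the pathway labels are illustrative, not a curated data set; `shared_nodes` is a hypothetical helper):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shared_nodes(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each node with more than one parent to its parents.

    A non-empty result means the parent -> child graph cannot be a tree.
    """
    parents: Dict[str, List[str]] = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return {child: ps for child, ps in parents.items() if len(ps) > 1}


# Toy edges: pyruvate is both produced in glycolysis and made from alanine.
edges = [
    ("glycolysis", "pyruvate"),
    ("alanine_metabolism", "pyruvate"),
    ("glycolysis", "ATP"),
]
print(shared_nodes(edges))  # {'pyruvate': ['glycolysis', 'alanine_metabolism']}
```

Real metabolic maps also contain cycles and feedback loops, which rule out a tree for a second, independent reason.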
Fugue for Aaron Swartz
"It's been ad hoc with no code review for over 3 billion years." This, again, is immensely stupid. First, natural selection is constantly weeding out undesirable variations, and second, the genome is highly tectonic, constantly removing or altering pathways. It's not teleological, but DNA is the coding mechanism precisely because it is not a passive storage medium.
So much sarcasm, and yet you can't even tell it's from its...
Don't discount that as stupid. Most of what he said is true. Evolution makes you write code that works, not good or clean code, just code that works. The only time evolution comes into play is when the code can't even compile.
http://tinyurl.com/42geekcode
Nature doesn't design out of Knuth, and it is a big mistake to act or think like we will find nice analogs of human type design.
AC ==?
Genes (and proteins) are obviously organized hierarchically. Which means there must be a control hierarchy in there somewhere.
Obvious nonsense. If not, then point to the "control hierarchy" in an ant colony (no, the queen ant does not issue orders to the soldiers and workers).
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
He/she didn't say the output wasn't reviewed. It is obviously held to a rather stringent standard. They said the *code* was ad hoc and not reviewed for 3 billion years. As someone who works with said code on a daily basis, I thought that summed it up rather nicely. It isn't judged for readability; it's judged for getting the job done under a fair degree of urgency. Imagine a shop coding in Perl under a continuous deadline of "now", with no comments.
Don't discount that as stupid. Most of what he said is true. Evolution makes you write code that works, not good or clean code, just code that works. The only time evolution comes into play is when the code can't even compile.
Indeed, there's even some selective pressure for code obfuscation. Viruses, for example, take advantage of compression. New functions usually evolve from faulty events in old genes. There's no pressure to remove accidental calls to the wrong subroutine if they don't matter, so a lot of messages go to the wrong place as well as the right place. You see this even in higher animals: a dog's leg that starts scratching when you scratch its ribs is probably some back-propagation in the nerve network that never needed to be removed for the dog to work properly.
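The compression point is easiest to see with overlapping reading frames, where some compact viral genomes use the same stretch of nucleotides to encode more than one protein. This toy Python sketch only shows how one made-up sequence splits into different codons depending on the starting offset; it doesn't translate the codons or model real gene structure, and `codons` is a hypothetical helper.

```python
from typing import List


def codons(seq: str, frame: int) -> List[str]:
    """Split seq into complete triplets starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


seq = "ATGGCTTAAGC"  # made-up sequence, not a real gene
for frame in range(3):
    print(frame, codons(seq, frame))
# 0 ['ATG', 'GCT', 'TAA']
# 1 ['TGG', 'CTT', 'AAG']
# 2 ['GGC', 'TTA', 'AGC']
```

Every nucleotide in the middle of the sequence belongs to a codon in all three frames, so a single point mutation can change more than one "message" at once.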
Some drink at the fountain of knowledge. Others just gargle.
I am not a biologist, so forgive my ignorance, but when people say that DNA is the blueprint for an organism, I never understand how a bunch of proteins can determine an organism's shape and behavior. Aren't there more factors that determine those things, like the surroundings in which the DNA is used: the chemicals the growing organism is surrounded by, the temperature, etc.?
-- Cheers!
That is all nice, but most of these prediction algorithms are based on one or more of the following assumptions, which are not always true:
So any prediction should be taken with a grain of salt and experimentally verified, which brings us back to " ... with its inherent difficulty and expense, experimental characterization of function cannot scale up to accommodate the vast amount of sequence data already available ...."
Do these algorithms autonomously file for patents on their findings and issue legal threats to competing algorithms as well?
Computational power can scale infinitely, and scales geometrically with time.
No, it can't. There are fundamental limits to information storage and computation. Those limits are a lot better than we can achieve, but they exist.
Or make the problem more efficient.
A better algorithm always works. It's worth noting here that at worst, one can just make the protein physically and see what happens in real time. So it can't be that hard computationally.
I just realized that the word assassin has ass in there twice. Thank you /. for limiting how much I can write in the title.
"There are fundamental limits to information storage and computation."
Do those limits rely on assumptions about dimensions and time? Might dark energy and/or dark matter change some of those assumptions and thus make limits that feel so fundamental now evaporate?
640k ought to be enough for anybody.
No, it can't. There are fundamental limits to information storage and computation. Those limits are a lot better than we can achieve, but they exist.
What are these fundamental limits?
So what we need now is some kind of all seeing/knowing entity to sort it out? ;p
...a post-genomic world be one in which we had stopped fiddling with genes and DNA and such?
Aren't we more in the midst of a Genomics Revolution?
Or more accurately, we are in the infancy of the Genomics Revolution.
THINK! It's patriotic
This TED.com talk by Danny Hillis is informative on this topic: http://www.ted.com/talks/danny_hillis_two_frontiers_of_cancer_treatment.html "Danny Hillis makes a case for the next frontier of cancer research: proteomics, the study of proteins in the body. As Hillis explains it, genomics shows us a list of the ingredients of the body -- while proteomics shows us what those ingredients produce. Understanding what's going on in your body at the protein level may lead to a new understanding of how cancer happens."
Do those limits rely on assumptions about dimensions and time?
No. They derive from the second law of thermodynamics, which assumes very little...
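The best-known concrete example is Landauer's principle: irreversibly erasing one bit of information must dissipate at least k_B T ln 2 of energy, a consequence of the second law. It bounds irreversible operations only; reversible computing can in principle avoid it. Here is a short Python calculation of that floor at roughly room temperature (300 K is just an example value, and `landauer_limit` is a hypothetical helper name):

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact in the 2019 SI)


def landauer_limit(temperature_k: float) -> float:
    """Minimum energy (joules) dissipated by irreversibly erasing one bit: k_B * T * ln 2."""
    return BOLTZMANN * temperature_k * math.log(2)


per_bit = landauer_limit(300.0)
print(f"{per_bit:.3g} J per bit erased")  # 2.87e-21 J per bit erased
print(f"{1 / per_bit:.3g} bit erasures per joule")
```

Today's hardware dissipates far more than this per operation, which fits the point that the limits are much better than what we can achieve, but still finite.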
Do those limits rely on assumptions about dimensions and time?
Yes. But these are assumptions borne out by our observations of our reality.
Might dark energy and/or dark matter change some of those assumptions and thus make limits that feel so fundamental now evaporate?
No. Dark energy is somewhat relevant in that an expanding universe has an easier time dissipating heat and a higher theoretical limit on the information that can be packed into a cosmologically large space-time ball of given radius: the ball's surface area, which is proportional to the maximum information a space can contain, becomes an exponential function of the radius rather than a fixed power. Against this, you have the problem of greatly reducing the number of states of the universe that you can access and change.
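The surface-area scaling mentioned above is the holographic bound: the entropy inside a region is at most A / (4 l_P^2) in units of k_B, which is A / (4 l_P^2 ln 2) bits, where l_P is the Planck length. This Python sketch evaluates it for an ordinary sphere in flat space. It does not model the expanding-universe case the comment describes; the radii are example values and `holographic_bits` is a hypothetical helper name.

```python
import math

PLANCK_LENGTH = 1.616255e-35  # metres (CODATA 2018 value)


def holographic_bits(radius_m: float) -> float:
    """Holographic upper bound on the bits a sphere of this radius can hold: A / (4 l_P^2 ln 2)."""
    area = 4 * math.pi * radius_m ** 2
    return area / (4 * PLANCK_LENGTH ** 2 * math.log(2))


print(f"{holographic_bits(1.0):.2e}")
print(holographic_bits(2.0) / holographic_bits(1.0))  # area scaling: about 4
```

For a 1 m radius this comes out at roughly 1.7e70 bits, and doubling the radius quadruples the bound rather than multiplying it by eight, which is the area-versus-volume scaling the comment relies on.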
So as I understand it, with dark energy, you can cram more information into a given space, but you have less information to cram.
Dark matter has little bearing except being something which can occupy the same space as your computational system and hence, reduce the maximum theoretical information density before a black hole is formed (since that bit of space has both information and dark matter in it).
So what you're saying is http://xkcd.com/224/
The second law of thermodynamics is statistical (the Fluctuation Theorem). Can we exploit statistics to find ways to violate the second law consistently enough to expand the current "fundamental" limits?
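The fluctuation theorem also quantifies how rare such violations are. In its detailed form, where it applies, the probability of a total entropy change of -A relative to +A is exp(-A / k_B). Reversals are therefore common only when A is a few k_B, that is, in very small systems over short times. Here is a small Python sketch of that ratio, with the entropy change given either in units of k_B or in bits (the function names are hypothetical):

```python
import math


def reversal_ratio(delta_s_kb: float) -> float:
    """P(entropy change = -A) / P(entropy change = +A) = exp(-A / k_B); A given in units of k_B."""
    return math.exp(-delta_s_kb)


def reversal_ratio_bits(n_bits: float) -> float:
    """Same ratio for an entropy change worth n bits (A = n * k_B * ln 2), i.e. 2 ** -n."""
    return reversal_ratio(n_bits * math.log(2))


for n in (1, 10, 100):
    print(n, reversal_ratio_bits(n))
```

For a change worth 10 bits, the reversed process is already about 1000 times less likely than the forward one. For anything macroscopic, the exponent is astronomically large.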
So, at the very least, our "fundamental limits" might not hold, if our observations about reality turn out to be like the flatlanders', and there are really more dimensions than we can sense?