Human Genome More Like a Functional Network
bshell writes "An article on a science blog says we may have to rethink how genes work. So-called "junk DNA" actually appears to be functional. What's more, it works in a mysterious way involving multiple overlaps that seem to be connected in some sort of network." From the article:
"The ENCODE consortium's major findings include the discovery that the majority of DNA in the human genome is transcribed into functional molecules, called RNA, and that these transcripts extensively overlap one another. This broad pattern of transcription challenges the long-standing view that the human genome consists of a relatively small set of discrete genes, along with a vast amount of so-called junk DNA that is not biologically active.
The new data indicates the genome contains very little unused sequences and, in fact, is a complex, interwoven network. In this network, genes are just one of many types of DNA sequences that have a functional impact. "Our perspective of transcription and genes may have to evolve," the researchers state in their Nature paper, noting the network model of the genome "poses some interesting mechanistic questions" that have yet to be answered."
It's what we in the programming field would call the Data Segment.
It's somewhat funny - I remember having this exact discussion with my genetics professor. I was a chem major who is now a developer.
... but perhaps I've just looked at too much disassembler. I will feel a little vindicated if this is proven.
It seems to me that DNA/RNA is "machine code" and data which runs on the laws of nature. It's a layer removed from silicon design, more akin to a self-modifying FPGA.
In other words, we've so far only looked at the boot code and associated data. The "program" is what we were calling junk.
And it makes sense - if you think of the program as a massive recursion network which builds common parts (stem cells) and then organizes and specializes.
I know that's a simple bastardization
I said no... but I missed and it came out yes.
After assembling something, if there are any parts left over I simply declare them to be extra junk. With scientists declaring the same thing about DNA they can't identify, I guess the old saw is true, great minds do think alike.
I've always suspected that "junk DNA" was the key to micro-evolution and speciation. I read an article once about how bacteria that could not metabolize lactose were cultured in a lactose-rich liquid. After about 60 generations, some bacteria that could metabolize lactose appeared. It turns out, they had non-functional genes for metabolizing lactose in their junk DNA, and somehow those genes were re-activated.
He who lights his taper at mine, receives light without darkening me.
Why was it called junk before, you ask? Because our definition of what is useful wasn't all that accurate. Just looking at so-called open reading frames and declaring everything else to be junk does not work. There is also the problem of insertions in a gene sequence that are either not used or alternatively used. There are plenty of sequences that are never translated (no proteins are made from them), BUT without them we would be missing a big chunk of regulators etc. 'Recent' findings like ribozymes, IRES elements, attenuation elements etc. are all not translated into a protein, yet serve a very specific function. Some of this 'junk' also serves as an insulator / separator between various sequences. We may never be able to map every nucleotide to some function, but declaring it junk from the get-go was just asking to be proven wrong. Just look up NCBI and look for some good reviews on this topic ;)
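To make the "open reading frames only" shortcut concrete, here is a rough Python sketch of what such a naive scan might look like. It is only an illustration: the function names, the `min_codons` threshold, and the toy sequence are all made up for this example, and it looks at the forward strand only. Real gene finders are far more involved.

```python
# Naive ORF scan: a stretch counts as "coding" only if it runs from ATG
# to an in-frame stop codon. Names and thresholds here are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) spans of ATG...stop in all three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end is exclusive, stop included
                start = None
    return sorted(orfs)


def orf_fraction(seq, min_codons=3):
    """Fraction of bases covered by at least one ORF; the rest would be labelled junk."""
    covered = set()
    for start, end in find_orfs(seq, min_codons):
        covered.update(range(start, end))
    return len(covered) / len(seq) if seq else 0.0


if __name__ == "__main__":
    toy = "CCCC" + "ATGAAATTTTAG" + "GGGGGGGG"  # made-up sequence
    print(find_orfs(toy), orf_fraction(toy))
```

Anything outside the reported spans would be written off by this approach, which is exactly where ribozymes, IRES elements and other untranslated regulators live.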
Whenever I read something like this, I get a reminder how poor is biologists' comprehension of Computer Science, Information Theory, and languages.
Whenever I read a post like this, I get a reminder how poor is most techies' comprehension of biology, and more specifically, what biologists do.
Third, why this obsession with zeroing in on a magic gene that causes X? Do they think the language of DNA is context free? Defects could indeed be expected to have no context, but for the rest-- which genes determine a person's blood type? Eye color? Skin color? Going about that task by trying to find the magic gene for something like that is like a person who never learned to read trying to figure out the plot of a book by trying to recognize patterns of letters.
Okay, why do we care? Because finding the genes (note my use of the plural there) that influence certain traits is the first step toward understanding the overall processes that create them. Obviously this is most critical in the area of genetic disease, although it's interesting for everything else too. We've known for decades that most traits, including diseases, aren't controlled by a single "magic gene." What statistical geneticists try to do is find locations on the genome which have a strong relationship to the trait of interest. And we know perfectly well that there will be a whole bunch of these locations for most traits, and that some of them may represent genes and some may represent something else. The purpose is basically to give the wet-lab biologists something to zero in on.
Second, two of the examples you chose -- blood type and eye color -- are really terrible ones for your argument, because genetically speaking they're very simple traits (two or three loci each, IIRC) and, at least in the case of blood type, we know exactly where they are in the genome. Eye color I'm not sure about, and skin color is a little more complicated, but not a whole lot more so.
Please do not confuse the pop-sci "scientists seek gene for X" writeups with what really goes on in the world of genetic research. It has exactly as much to do with real science as TV portrayals of hackers have to do with real computing.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Why would we evolve to lose the appendix? Evolution doesn't work that way. It's not causing people to die, so it's gonna stay there. The only way evolution would get rid of it is if people mutated to have no appendix and they were somehow better able to reproduce. Human society being how it is, there isn't much that's gonna make you unable to reproduce. That's probably part of the reason we have so many genetic diseases now - they can be treated, so they don't kill you, so they get passed on.
Being one of the 0.1% of /.ers that believe God created mankind, (and that we have been in slow genetic decline ever since),
I thought when this 'Junk DNA' was mentioned many years ago that, given time, that opinion would be reversed.
Thus there was an advantage for ID biologists, who would have the opinion, 'cells are an incredible biological computer with beautiful design, this is great fun reverse engineering it all, and there won't be Junk DNA because that goes against God creating life, so let's keep looking for its purpose'
flame away
As a Grad student in CS whose research is in computational genetics, I get to attend some really "fun" seminars that sit on the fence between Genetics and CS.
The problem with pure CS profs is that they all want to abstract the problem so as to have a nice mathematical definition. Well, if we could do that properly, the problem would solve itself. It's funny, since it is mostly the AI profs who want to get into computational genetics.
You're right, the geneticists probably don't have a solid understanding of the underlying mathematics (statistics) or CS (algorithms/random processes), but their intuition is invaluable in most cases.
Oh, by the way, before you get on your high horse (as I am on mine), you should check out the human epigenomics project (contrasted to the HapMap project) and you would realize that they don't view the system as static.
"junk DNA" reminds me of the mysterious "dark matter", or "god" or whatever words we use to name something we know nothing about and don't understand, to give them some sort of magical status. It would probably be better to call it "unknown DNA", or "DNA Incognita", or even why not "Here be Dragons", to better remind us of how ancient maps were conceived (answer : it took ages to "publicly" discover all continents and isles).
One thing I'm sure of is that Nature doesn't waste resources, only Humans do, so each yet-unknown thing certainly has a very good reason to be there.
Vote green: crap in the ballot box!
crashfrog, you may have to correct me, but here's a start...
There's really almost no selection pressure against extra DNA sequences,
This refers to the process in evolution where an organism fails to reproduce due to having a disadvantage that the other critters in the species don't have. So if a pig that has useless DNA sequences tacked on in its genome has a statistically lower chance of having piglets, there's pressure against those useless DNA sequences.
crashfrog is saying that for a reason he explains (below) extra DNA isn't going to have any effect on the organism's chances of reproducing.
particularly ones with no associated promoter.
A promoter is a marker in the DNA strand. The protein "machine" (a transcription factor) that gets the "data" off the DNA and into the cell's outside chemistry has a "socket" that matches the "plug" formed by the specific pairs of the "promoter" marker. It's like the transcription factor searches for #! /bin/perl and that's how it knows to start copying off DNA code. (While on the subject, just because it has #! /bin/perl doesn't mean it will get executed, and even after it's been executed it might get a SIGKILL.) Promoters are not just found in DNA, but read on Wikipedia for more on that.
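The plug-and-socket picture can be sketched in a few lines of Python. This is a toy under loud assumptions: the marker `TATAAA` is a simplified stand-in for a promoter, the function name and the fixed `length` are invented for the example, and real transcription starts some distance downstream of the marker and is regulated by much more than one motif.

```python
# Toy model: scan for a marker, then "copy off" what follows as RNA.
# PROMOTER, transcribe_from_promoters and length are illustrative assumptions.
PROMOTER = "TATAAA"


def transcribe_from_promoters(dna, promoter=PROMOTER, length=12):
    """Return RNA copies of the `length` bases following each promoter match."""
    dna = dna.upper()
    transcripts = []
    pos = dna.find(promoter)
    while pos != -1:
        start = pos + len(promoter)
        # Copy the coding strand, swapping T for U as RNA does.
        transcripts.append(dna[start:start + length].replace("T", "U"))
        pos = dna.find(promoter, pos + 1)
    return transcripts


if __name__ == "__main__":
    with_marker = "GGGC" + "TATAAA" + "ATGGCC" + "TTTT"
    without_marker = "GGGCATGGCCTTTT"
    print(transcribe_from_promoters(with_marker, length=6))
    print(transcribe_from_promoters(without_marker, length=6))
```

In this toy, the same downstream bases are simply never read when the marker is missing, which is the sense in which a sequence with no associated promoter just sits there.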
One of the proofs of this is the fact that the human genome is comprised more of endogenous retroviruses than actual functional sequences.
I'm not sure if I can do this last sentence piece by piece, so here goes...
An endogenous retrovirus is a kind of virus that infects DNA. So when the cell splits, the virus gets copied along with it. For instance, some scientists think Multiple Sclerosis is one of these retroviruses that has infected our DNA. So when we look at the entire human genome, all the pairs in the whole DNA sequence, and we look at where all the promoters are, it seems (according to current theory -- we may learn more about this!) at first glance there are some pretty long stretches with no promoters. That is to say, they are either empty sectors on the disk, or some of them look like retrovirus DNA code.
How'd I do at explaining that? Like I said, crashfrog should probably amend my explanation...
Be careful here--you might just show your own ignorance. "Biologists" is a very broad term that covers a vast array of topics. Sure, ecology might not require much knowledge of computers and information theory, but such things are required reading for fields like molecular biology or modern genetics.
Not necessarily. Sure, that may be the case for single-celled organisms that rapidly reproduce, whose selective forces dictate sheer metabolic efficiency, but for multi-cellular organisms, like mammals, there are good reasons to believe that that simply isn't true.
Evolution isn't like a programmer. It isn't some transcendental force guiding a species to some aesthetically "perfect" design. The result of natural selection frequently isn't the "best" solution but rather whatever happens to work. In fact, many times adaptations based upon the selective pressures of the present are, in time, ultimately maladaptive for the species. A classic example of this is the trait for the disease sickle-cell anemia in humans which originally served to offer slight resistance to malaria but otherwise causes health problems and even death.
A more efficient genome doesn't necessarily mean greater fitness. Consider the following example. For a large multi-cellular organism, which do you think has more reproductive/survival significance: (1) a mutation that deletes a few bases of non-coding DNA OR (2) a mutation that brightens a metabolically-wasteful, colorful marking that attracts mates?
OR that they are mostly random. The current model of DNA/genetics states that most of the DNA in the human genome is non-coding, not (significantly) subject to evolution. As such, it gets shuffled around (i.e. randomized) during cross-over events and mutations. That being the case, one wouldn't expect it to be very redundant or compress very well.
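The compression point is easy to poke at with a quick experiment. Below is a minimal Python sketch using the standard `zlib` module; the sequences are synthetic (a seeded random string and a pure repeat), and the helper names are made up for the example, so it says nothing about any real genome. Note that with a four-letter alphabet stored as ASCII, even a perfectly random sequence shrinks to roughly a quarter of its size (2 bits per base instead of 8), so redundancy only shows up as compression beyond that.

```python
import random
import zlib


def random_sequence(n, seed=0):
    """Synthetic DNA: n bases drawn uniformly from ACGT (hypothetical test data)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def compression_ratio(seq):
    """Compressed size divided by raw ASCII size; lower means more redundancy."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    shuffled = random_sequence(10_000)
    repetitive = "ACGT" * 2_500
    print("random:    ", round(compression_ratio(shuffled), 3))
    print("repetitive:", round(compression_ratio(repetitive), 3))
```

Running the same `compression_ratio` on a real coding region and a real non-coding region would be the honest way to settle who is right here; the sketch only shows how to measure it.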
In short, because that's what's easiest. A holistic approach to genomics research like you're describing is not currently technologically, academically or economically feasible for a myriad of reasons. The science just is not there yet.
As an aside, I suspect we'll start to see a more integrated approach to genomics once the relatively low-hanging fruit of the one-gene --> one-protein research lines is thoroughly covered. However, I wouldn't expect such things to happen in our lifetimes given the difficulty of that aforementioned task and the sheer profitability of more conventional approaches. But what do I know? I'm "just a biologist." =P
-Grym
Whenever I read something like this, I get a reminder how poor is biologists' comprehension of Computer Science, Information Theory, and languages. So, 90% of genes aren't "junk" after all. To anyone who does know something about the aforementioned topics, duh!
If they hadn't suspected it, multiple groups around the world wouldn't have worked on this thing for such a long time. It's one thing to have a theory, another to prove it, despite what creationists may say.
First, evolution would weed that sort of thing out in a hurry. Two organisms with genes that achieve the exact same thing, but one has a more efficient encoding? No contest!
Actually, generally no, and genome sizes can vary a lot. There are a great many things that can complicate this. But you do see effects like this in cases like viruses that have limited space to pack DNA in the virus capsid. Not only do these viruses not have junk DNA, but they even use some compression-like techniques.
Second, ever tried compressing a DNA sequence? They don't compress very well! Meaning, they don't have much redundancy.
I think you are thinking of the coding regions. Redundancy is a notable feature of many non-coding regions.
Third, why this obsession with zeroing in on a magic gene that causes X? Do they think the language of DNA is context free? Defects could indeed be expected to have no context, but for the rest-- which genes determine a person's blood type? Eye color? Skin color? Going about that task by trying to find the magic gene for something like that is like a person who never learned to read trying to figure out the plot of a book by trying to recognize patterns of letters.
I think you've chosen very poor examples to illustrate your point. Those are all features controlled by a very small number of genes or a single gene. In other contexts, though, this could be an important way of thinking. For example, cell machinery matters too. Kinda like software vs hardware.
To match your analogy, if you can't read you have no hope of understanding the plot. First you have to figure out how to read. You might be able to figure out words from patterns of letters though. You have to start somewhere.
The magic gene thing is a matter of hoping for a solution that is actually simple and viable. If it's one gene, a single drug has a good chance of working. There are many diseases that actually work this way, so why wouldn't you look for a simple answer first. Things that involve lots of genes, like cancers, haven't had much success.
Yes, and yes.
Anyway, it made sense to focus on the almost-understood parts first, since the mapping techniques were very limited (though far more efficient each year) and the task was so massive that it would have been stupid not to limit the first steps to a better understanding of the easiest purpose of DNA, which is protein encoding.
Fully understanding DNA will take decades, if not centuries, and maybe someday scientists will be sure some parts of the DNA are actually useless, but that "90% junk" claim looks like that thing about neurons maybe not being the only kind of cells participating in intelligence.
Just remember that scientists are human. They are trying hard to understand the unknown, but that doesn't prevent them from making mistakes or false assumptions, quite the contrary.
Only to idiots, are orders laws.
-- Henning von Tresckow
And the wikipedia Teleological Argument article links to Argument from poor design, which gives examples of poor design.
One of which is "Portions of DNA -- termed "junk" DNA -- that do not appear to serve any purpose."
Great Windows SFTP Server!
Machine simulation of genetic/evolutionary algorithms often produces so-called "junk" which, when analysed further, frequently proves to be tied to the function of the overall organism in mysterious ways. I'm sure that leading GA researcher John Koza made this observation in early papers, but it's something that anyone playing with genetic algorithms will encounter sooner or later.
I couldn't find the quote I was looking for, but only this broad statement from Genetic Programming: Biologically Inspired Computation that Creatively Solves Non-Trivial Problems, Koza (1998):
you had me at #!
The result of natural selection frequently isn't the "best" solution but rather whatever happens to work.
Exactly, and it's a popular misconception that evolution is always about the "best" and anything that is 1% "better" is going to dominate. Which simply isn't true, or our appendix would have vanished long ago. The fact is that appendicitis isn't enough of a problem to select against it strongly. The appendix just doesn't help, so the genes to maintain it aren't selected for either, resulting in the slowly fading vestigial organ.
To emphasize this fact, I like to describe natural selection not as "survival of the fittest" but rather "survival of the sufficiently fit".
The enemies of Democracy are