I'm not really a fan of mass environmental sequencing, because quite frankly it's downright cavalier. There's definitely this misguided undercurrent in bioinformatics (of which Craig Venter seems to be the somewhat unwitting originator) wherein the "grab it all at random, then just keep grabbing until you probably have it all" mentality gets applied to everything. I remember being taught in my fourth-year Computational Biology course by a very well-meaning medical informatician that committing "post hoc ergo propter hoc" was what it meant to perform boolean gene network analysis. Of course, I could tell that this was mostly miscommunication and not really what she intended to say (although seeing her stumble through Word a week later made me somewhat concerned) but I'm pretty sure my classmates (themselves mostly more interested in medical imaging than biology) swallowed it hook, line and sinker.
Collecting data that you don't have enough dimensions to resolve is always a stupid idea, unless you want to answer a very specific question through it (or no question whatsoever) and are certain this will be no problem. Environmental sequencing certainly accomplished the goal of appreciating diversity, as you put it, but I think anyone claiming that it could help with making sense of what's actually going on is probably not being very logically rigorous. I think the best use for the technique would be akin to cDNA library generation, except with species instead of mRNAs, and unique species identification tags instead of ESTs. Now all we need is a really, really big microarray...
Nah, it's less exciting than that. We're pretty sure that most of the non-functional DNA has no purpose other than functioning as spacer material. When a chromosome is functioning normally in a cell, it forms a roughly spherical shape comprised of many loops that go out toward the edge and then back in toward the middle. The outermost parts of these loops are the starting positions of very important genes, which makes them more accessible to the proteins that are supposed to make use of them. The non-functional DNA provides enough flex room to let the chromosome get bent like this. The bends themselves are accomplished by proteins called histones. One of the ways in which the cell can effect gene regulation is by changing how far out a particular gene's promoter region is.
For this DNA, the important thing about it is that we know the sequence does not matter (or, at least, matters very little.) The repetitive elements that comprise this DNA may provide a gripping point (like handles in a rock climbing gym) but they're not really important themselves. Since humans are so complex and we're so good at finding and storing food, the cell has every motivation to find little tricks like this to streamline these complex processes. Bacteria, by contrast, have absolutely no ability to support the stress caused by chromatin remodelling, and the gaps between genes are typically less than a thousand nucleotides.
That being said, on occasion some of the dead and inactive DNA rises to the challenge and becomes useful. Among the most abundant components of the non-functional human genome are the corpses of retroviruses that integrated their payloads with us and then became inactive due to mutation. We've co-opted these genes on a few occasions, including once to anchor the placenta to the uterine lining during pregnancy, an adaptation that allows higher mammals to support much larger fetuses. (For comparison, mice lack this.) The cervical plug that forms during pregnancy to protect the fetus from the exterior environment is also composed predominantly of malformed and misshapen bits of random viral capsid (shell) proteins.
A lot of people assume the genome has to be this nice, neat, clear-cut thing, and get indignant at the shortsightedness of scientists who seem to be arrogant about what is important and what isn't. The truth is that we know a lot more now, and the genome is really more like a giant vibrating box of LEGO bricks that sometimes assembles random bits of useful stuff out of itself.
Well, the trick to getting a fat cancer research grant (FCRG) is that you have to be able to contort your pet project into something that somehow conveniently plays into the hands of human cancer research. Photosynthesis might be a bit too far off, but I'm sure you could get good bucks for studying root nodules or the effects of unengineered Agrobacterium. Then again, one of the labs I worked in as an undergrad studied the neurodevelopment of C. elegans under an FCRG, so perhaps I'm being too stringent in my definition of contortability.
5% of the whole human genome is under evolutionary pressure (i.e., we know it does something.) 1% of it contains protein-coding genes, most of which are non-negotiable and critical to living. It's actually only about 0.1% of the total DNA that changes from person to person.
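To put rough numbers on those percentages, here's a back-of-the-envelope sketch in Python. It assumes the 3.1 billion base total quoted elsewhere in this thread; the names `GENOME_BASES` and `bases_for` are made up purely for illustration.

```python
# Assumption: total human genome size of ~3.1 billion bases, as quoted elsewhere in this thread.
GENOME_BASES = 3_100_000_000

# Fractions of the genome taken from the comment above.
FRACTIONS = {
    "under evolutionary pressure": 0.05,
    "protein-coding": 0.01,
    "varies from person to person": 0.001,
}


def bases_for(fraction, genome_bases=GENOME_BASES):
    """Convert a fraction of the genome into an approximate number of bases."""
    return round(fraction * genome_bases)


for label, fraction in FRACTIONS.items():
    print(f"{label}: ~{bases_for(fraction):,} bases")
```

So even the "tiny" 0.1% that differs between people still works out to a few million bases.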
The reason I say exhaustive physical simulation is the silver bullet to biological problems is that, while it may be fantastically slow and inefficient, it will finally get us an honest and reliable model of what's really happening. We can then study that model, and determine what optimisations and simplifications we can reliably make.
Traditionally, when biologists try to simplify a system (such as just looking at enzyme binding graphs, an act that makes me nauseous because it's so abstract and useless) we do so with radical reductionist objectives (i.e., only looking at the interactions of a couple of genes and their products) and a grossly incomplete picture (of the physics involved in those phenomena). Further, most such studies are done with empirical measurements, which are really a game of broken telephone and fraught with error margins. It's taking the problem of determining which straw was responsible for breaking the camel's back, and putting it in a wind tunnel.
The modern bioinformatics perspective, of holistic collection followed by logically-backed reduction, avoids all these problems. I'm not saying we'll be simulating biological organisms with absolute fidelity forever, merely that we can't really appreciate the biology until we've seen it in its native environment.
They're handy when you need to exaggerate something.
You may still fly to Mars! But best give up on Jupiter; your heart is crap.
The term "junk DNA" is now only used by shoddy science journalism. We're quite comfortable with how DNA and RNA do what they do. There's a mystery about what happens on the protein side, and the question of which functional bits of RNA (called microRNA) interact with which genes is purely a matter of ridiculously abstruse combinatorics. Say whatever else you will about them, fat cancer research budgets have taught us a lot about the essentials.
Yes, but it's fraught with errors, and making a single molecule longer than a few thousand bases costs a great deal. A bacterial genome is 0.4-3 million bases; humans are 3.1 billion in total. That's why only the Venter Institute has done it.
There are a few key puzzles we need to crack—protein dynamics, mostly—and then we'll have the ultimate acceleration method: we'll be able to simulate it all. Right now we're content with trying to sort out those challenges, and fixing the disorders we already know how to recognize. It'll keep us busy for a long time. In the interim we'll just play with increasingly clever tricks to sort out the patterns in the sequences, and catalogue lots of people so we have things to work with when the time comes.
You archive it until you need it. A situation-specific microarray might cost a hundred dollars; those tend to stack up with every hospital visit. With whole exome sequencing like this, you pay the fee once, and have all* the data medical science will ever need about you.
* Not counting repetitive elements, promoter regions, UTRs, spacer DNA, or the epigenome, all of which are known to describe at least a few diseases.
PacBio promises a trillion unicorn farts per second. It's hard to take them seriously. As far as medical applications are concerned, the read length in the Ion Torrent system is ten times the size it needs to be, since most (known) diseases occur due to mutations in the very specific and non-repetitive exome, or in its close vicinity. No one (that I know of) has ever seriously proposed using this hardware for de novo sequencing of large eukaryotes, especially since the machine currently on offer doesn't have the well capacity to sequence the whole human genome!
That being said, there are always paired-end reads. I'm guessing the protocol for doing so with this system doesn't exist yet, but they tend to solve most of the repetitiveness problems for shorter read lengths.
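For anyone wondering why pairing helps with repeats, here's a toy sketch in Python. It is heavily simplified and not how a real aligner works: it assumes exact matches, both reads on the same strand, and a mate sitting at an exactly known start-to-start distance (`insert_size`). The sequences and the function names `find_all` and `place_pair` are invented for illustration.

```python
def find_all(genome, read):
    """Return every start index where `read` occurs in `genome` (overlaps allowed)."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def place_pair(genome, read1, read2, insert_size):
    """Keep only the placements of read1 whose mate read2 starts exactly
    insert_size bases downstream (a toy assumption; real insert sizes vary)."""
    mate_hits = set(find_all(genome, read2))
    return [pos for pos in find_all(genome, read1) if pos + insert_size in mate_hits]


# Made-up genome containing the same 8-base repeat twice.
REPEAT = "ATCGGCTA"
genome = "GGTTCA" + REPEAT + "TTGACCGT" + REPEAT + "CGATTAGC"

read1 = REPEAT       # falls entirely inside the repeat, so it is ambiguous alone
read2 = "CGATTAGC"   # mate from unique sequence next to the second repeat copy

print(find_all(genome, read1))              # two candidate placements
print(place_pair(genome, read1, read2, 8))  # the mate leaves only one
```

On its own the first read fits both copies of the repeat; the mate, anchored in unique sequence a known distance away, picks out the right one.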
Gene hacking already is the next nerd occupation (I should know; I'm in the middle of it. Mozilla even funds projects for it.) Here's one starting place if you're really interested.
About the read/write thing: synthesizing large amounts of DNA from scratch still costs ungodly amounts of money. Further, the ABI Ion Torrent system being advertised here is a destructive read; you have to treat a blood sample with a large number of chemicals and then stuff it in a big machine. It's no Star Trek scanner.
That too.
You mean like observing it? :) Our most remote radio transmissions have only gotten as far as a few stars in our near vicinity, and they're statistically indistinguishable from background noise by the time they get there. If we are to blame, it's probably bad breath.
Actually, it's not only hard but, at present, downright impossible to make such predictions. However, being pessimistic about our effect on the planet in this regard isn't really a big deal, since we already know quite certainly that we affect the environment in some catastrophically bad ways; we are, for example, currently overbooking the planet by two and a half binary orders of magnitude (i.e., about six times) its ability to sustain first-world lifestyles. Only once we've got some of these more apparent and easily rectified problems under control should we allow ourselves to stop being paranoid about our impact on the world around us and reassess. While it might seem like a good idea to make sure we know exactly what our target is before marching off, we unfortunately cannot sit around twiddling our thumbs forever while waiting to do so; environmental contamination has already driven countless exotic species to extinction (one of the most prominent but least photogenic examples being the Yangtze river dolphin), and hesitation can really only make things worse.
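In case "binary orders of magnitude" looks like hand-waving, the conversion is just a power of two. A quick Python check (nothing here beyond the standard library):

```python
import math

binary_orders = 2.5            # "two and a half binary orders of magnitude"
factor = 2 ** binary_orders    # each binary order of magnitude is a doubling

print(round(factor, 2))        # 5.66, i.e. roughly six times

# Going the other way: how many doublings is a factor of exactly six?
print(round(math.log2(6), 2))  # 2.58
```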
Sounds like we're gonna need pretty big shoes. :)
(And this comment, too.)
Yes. That being said, all known living organisms are more error-prone than they need to be. See this comment.
Carl Sagan agreed with this; in Cosmos he pointed out that interstellar wars would be rare because the technology differences would generally amount to a matter of no contest. How do we know when we have technology strong enough, though, without a point of reference?
In addition to all the other responses, the prisoners have to be treated in hospitals. The disease can (and does) spread to the rest of the population from there.
Besides the obvious responses to such an inhumane and unenlightened question, prisoners end up treated in normal hospitals, through which the disease can spread to other patients. In fact, the documentary we watched in first-year biology introducing the subject was predominantly about a woman who was infected through that exact vector.
Will fix!
For those interested in exactly how prevalent this sort of thing is, be aware that drug-resistant TB is present in almost every country in the world; it's just really bad in those particular three countries. This journal article from 2006 has maps showing the incidence rates per country.
Sorry, but your eugenics program will have to wait for another day. Drug-resistant TB is everywhere.