First Sequencing Of Plant Genome
cthugha writes: "The genome of Arabidopsis thaliana has just been completely sequenced, making it the first plant species to have its genome fully sequenced. The fact that we have animal and plant genomes now should give us greater insight into the common aspects of eukaryotic life. Nature has good coverage here. The ABC has a shorter and easier-to-digest report, but the emphasis is on the fact that Australian scientists could not participate due to lack of funding rather than on the technical details."
Mapping and sequencing are different. Mapping is basically identifying major landmarks in the genome and their relationship to one another. Mapping is an essential first step to sequencing, but sequencing goes into greater detail. It identifies every "letter" of the genome. However, while it is a major achievement, merely having the full DNA sequence of an organism is only part of the puzzle. We still have to figure out what it all means and how it all works. What part of the genome codes for genes? What part of the genome regulates expression of which genes? How do those gene products fold into functional proteins... So, by mapping & sequencing the genome, we are neither figuring out what every gene does in a plant, nor are we figuring out in a vague way what groups of genes do. Instead, we are documenting significant amounts of information which will assist in both those pursuits.
WARNING: according to some mail I subsequently received from the investigators at the Max Planck Genome Institute, the above sequence is incomplete and was intended only as a private communication within their research group. Please don't download it.
If you want to, um, compile your own version of A. thaliana, see
ftp://warthog.mips.biochem.mpg.de/pub/cress/MAR
Haploid Genome Sizes (collected from various sources):
Homo sapiens (human): 3.3 x 10^9 bp, # of genes unknown
Drosophila melanogaster (fruit fly): 1.8 x 10^8 bp, 13,601 genes (if you believe Celera has sequenced it all)
Caenorhabditis elegans (worm): 95.5 x 10^6 bp, 19,820 genes.
Saccharomyces cerevisiae (yeast): 12 x 10^6 bp, 5,885 genes.
E. coli (bacterium): 4,639,221 bp, 4,377 genes.
Hemophilus influenzae (simpler bacterium): 1,830,138 bp, 1,738 genes.
Arabidopsis thaliana: 1.17 x 10^8 bp, ~25,000 genes.
Wheat: 16 x 10^9 bp, ~30,000 genes.
Let's try not to let fact interfere with our speculation here, OK?
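For anyone who wants to play with the figures listed above, here is a rough sketch that turns them into "base pairs per gene". It takes the numbers at face value (they are "collected from various sources", so treat the output as ballpark only), and it leaves out human because the gene count is listed as unknown. The names `genomes` and `bp_per_gene` are just made up for this example.

```python
# Genome sizes (bp) and gene counts copied from the list above.
# Human is omitted because its gene count is given as unknown.
genomes = {
    "Drosophila melanogaster": (1.8e8, 13601),
    "Caenorhabditis elegans": (95.5e6, 19820),
    "Saccharomyces cerevisiae": (12e6, 5885),
    "E. coli": (4639221, 4377),
    "Hemophilus influenzae": (1830138, 1738),
    "Arabidopsis thaliana": (1.17e8, 25000),
    "Wheat": (16e9, 30000),
}


def bp_per_gene(size_bp, genes):
    """Average number of base pairs per gene (a crude density measure)."""
    return size_bp / genes


# Print from the most gene-dense genome to the least gene-dense one.
for name, (size_bp, genes) in sorted(
    genomes.items(), key=lambda item: bp_per_gene(*item[1])
):
    print(f"{name:26s} {bp_per_gene(size_bp, genes):>12,.0f} bp per gene")
```

The bacteria come out at roughly a thousand base pairs per gene, while wheat comes out at several hundred thousand, which is one way of seeing that genome size and gene count are only loosely related.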
While I certainly agree that analyzing genomes will be something that will take decades, it is somewhat misleading to say that a whole organism is thousands of times more complicated than a virus. Yes, viruses only have a few genes, but this is only because most of their functions are handled by the host organism.
By mapping the genome, are we actually figuring out the underlying structure of what every gene serves to do in a given plant? (more like a decision tree) or are we just figuring out in a vague way what groups of genes do what (more like a bayesian belief net)?
Neither, unfortunately. Basically a genome is analogous to the binary code of an executable -- you can't just look at it and follow the logic of the program (or organism, as the case may be). However, there is a field of study called bioinformatics which attempts to extract useful information from the raw genomic data, and in order to do this, many techniques from AI and machine learning are used, such as Hidden Markov Models.
Simply put, with finding out what whole genomes do, you get a pretty precise roadmap of what's going on. Not only that, if you can't zoom in too much on one part of the map, you can go find another map that has a similar part and zoom in on that. Got it?
Well, that's a bit of a stretch. For example, to really understand what's going on you have to have gene expression information, and you can't get that from the genome -- you have to use microarray data ("gene chips"). And even then you can argue that what you really want to look at is the complete set of proteins and their abundances (the "proteome") and not the genome at all.
The human genome has yet to be fully sequenced. What you are thinking of is the announcements of several draft sequences, with many missing and erroneous areas. The complete sequence won't be available until next year at the earliest.
Secondly, although I'm all for enthusiasm for genomics, the human genome actually will be (at least for the foreseeable future) one of the *least* useful genomes. Why? Because we can't do experiments on humans. When we have sequenced many plants and animals and gotten a good idea of how they work from experiments, then (and only then) will the human genome be of any practical use.
It was picked because
1) it has a small genome -- many plants actually have genomes longer than the human genome.
2) Arabidopsis is a small, fast-growing plant, well suited for experimentation.
It is important that people realize that sequencing a genome is a beginning and not an end. Having a genome means that more sophisticated studies can be done -- it doesn't mean that we now know everything about the plant.
Wheat is a hexaploid plant, meaning 6 duplicated sets of each chromosome. It is thought that all wheat is descended from three individuals about 10 years ago. By multiplying chromosome sets new species can arise extremely quickly in plants.
So, that's why wheat has a lot of genes.
If tits were wings it'd be flying around.
Well, I meant 10 THOUSAND years ago, not 10 years. I remember eating bread sometime before 1990 on more than one occasion.
The current paradigm in genetics is that life is
the genetic code and essentially information.
And it is very complicated to unravel, using one
of the largest supercomputing configurations on
the planet.
It's been fully sequenced for a decade, yet
people don't fully understand how it works.
Imagine understanding plants or animals with tens of thousands of genes.
Quantity is confusing in the genetic world.
Wheat has 16 billion base pairs, or five times as many as a human.
Plant genes tend to duplicate a lot, according to the first plant genome.
With regard to animals, the fly genome has only about two-thirds as many genes as the worm genome.
The low end of the human estimates, 35,000
genes, is not much more than these plants or animals.
Lemme know when they map that sucker, and figure out how to make regular lawn grass produce 60% THC....hehehe
Your Momma's so fat she makes emacs look like nano!
Last I heard we had the human genome completed. Why all of a sudden move on to the plant? We were all hyped about what the discovery could do for medicine and how it would change our lives. It seems as if they just filed it away and moved on. I for one would like to see some real-life applications of what these scientists are doing. It would be as if I wrote a program that could recover ANY Windows crash and just saved it on CD and threw it in my filing cabinet.
...grow your own Drosophila!"
I've done that, and you can too. Just leave a banana sitting on a countertop for a couple weeks.
Yeah, I know, spontaneous generation was debunked over a century ago, and baby Drosophila (Drosophilae?) come from mommy and daddy Drosophila(e). So please no "what are you, stupid?" rants.
Thank you for providing a concrete example of the high costs of research. However I don't see that it supports your argument. Oil exploration is expensive too, but you don't see oil companies patenting the use of oil as fuel. (Maybe they just wish they'd thought of it sooner)
Research is damned expensive and without assurance of return, it's not entirely feasible to invest the resources.
Assurance? What assurance? There aren't any assurances of a return on investments (for projects like the sequencing of Arabidopsis). There's the off-chance of a return, and big payoffs get more probable when patents are awarded, but it's still not much of an assurance.
Patents are a means of holding information hostage. I understand the need for them but I don't have to like them. I particularly don't like them when what is being patented is a process that a living organism has been doing for free since time immemorial.
If I invent a new widget, I have the right to hold the schematics hostage, releasing them only to those who pay. But who invented the gene that codes for usefulase? I don't claim to know, but I'd bet money it's not the one applying for the patent!
IANAL.
Further analysis is needed to figure out what molecules are created by each gene and under what circumstances. For example, neurons have on part of their surface a receptor for serotonin. This "receptor" is a molecule of a certain shape which the serotonin molecule fits into, and when this happens the receptor causes a change in behavior in the cell. There's a gene sequence someplace which builds the receptor molecule and adds it to the surface of the cell -- but this level of genetic map doesn't tell us exactly where this gene sequence is and what the shape of the receptor is. Further research is needed to find the location of this genetic sequence, to analyze the exact genetic code, and to work out what molecules that code can build.
Even that won't tell us everything about a cell -- some drugs work by fitting into a receptor near a receptor whose action they are targeted to block, and the drug works because the rest of its physical shape crowds the target receptor so what usually activates that target receptor cannot reach the receptor. It takes a lot of study to figure out the 3-D shape of the surface of a cell to understand what can be going on in the molecular soup of life.
Research is damned expensive and without assurance of return, it's not entirely feasible to invest the resources. Nobody has ever worked solely for the public good, you know. Either they are seeking fame or they are being funded by someone higher up who has a vested interest in their results.
I am an active researcher in the field of plant biology, so I can speak with some authority on the costs involved in doing basic research. I have an assay that I do routinely to directly measure the rate of transcription of a single gene, called a nuclear run-on assay. Each one of these assays costs around 300 dollars to run and takes about two weeks from start to finish. To get statistically valid numbers, I need to repeat each experiment twice, effectively tripling the cost (to 900 dollars) and the time to over a month (and this does not count the cost of paying me). If I want to ask any meaningful set of questions, I am going to need to run a lot more of these under different conditions. Can you see how the cost adds up? It would cost even more if I didn't make a lot of my own materials from scratch.
Other assays and techniques are equally expensive. A friend of mine is getting ready to clone a "promoter", which is the part of a gene that actually controls how it's expressed. The minimum cost for cloning and sequencing this promoter will be around 2000 dollars. Actually doing experiments on it later will cost even more.
Hi,
> I particularly don't like them when what is being patented is a process that a living organism has
> been doing for free since time immemorial.
I couldn't agree more. There is however one class of patents I can "kind of tolerate". I think research done with taxpayers' money (i.e. at the (state?) universities and gov. organizations) should be protected from the greed of corporations (pharmaceutical and bio|agro-tech in particular).
Regards,
kovi
Like others have said, other organisms can (and many do) have more base pairs than we do - just like they have more chromosomes. For instance, a fern plant has something in the ballpark of 1200 chromosomes! Compared to us, you would think the fern is a super-being. However, there is much less information per chromosome in the fern, whereas in a human chromosome, the information is much more dense. It is nature's way of making things more efficient perhaps. Just because there is more "Stuff" there, that doesn't mean there is more information in the stuff. Remember, quality, not quantity. :)
Man is born free; and everywhere he is in chains.
I was working there for a little while during the summer. The way they explained it, Arabidopsis is a very genetic plant and can be thought of as the mother plant or something like that. Cornell's department was trying to see links between Arabidopsis and the tomato plant.
And as someone else pointed out earlier, sequencing is half the job. The actual research comes in when the sequences are clustered and compared against other sequences. Anyway, good day for plant science.
This is an honest question.. If there are thousands, millions, or billions of genes in an organism, and I decide to make a subtle change to one of those genes, how can we predict the effects of that change on the organism as a whole??
Let's simplify the problem a bit.. Let's say there are 10 genes in the organism of interest. We have no equation, or set of equations, to govern the response of the system to a change.. So the best option we have requires some sort of empirical approach. So I gather up a large subset of the population, examine each, and note their differences. Then I'd compute something like a two-point correlation tensor.. So if I have 10 genes and, coincidentally, I find 10 differences, that's a 10x10 matrix of relations between the "causes" and "effects". So that's not bad, right?
Wrong.. This is a non-linear system, and a subtle change in one gene may have an enormous impact on a handful of genes. Then another subtle change in a second gene, may negate the effects of the first change.. I'm curious what sort of training the geneticists working on these sequencing problems have with non-linear systems?
This is not an attack: I'm curious.. I've spent a lot of time over the last few years trying to get a handle on the inner workings of simple turbulent flows, and even with a set of governing equations to guide my efforts, the problem of predicting how subtle changes will affect my flow is non-trivial!
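To put the worry above into something runnable, here is a toy sketch. It is not a model of any real organism: the two-gene `phenotype` rule (a second change cancels the first) is a made-up assumption chosen to mirror the description, and all the names are hypothetical. It shows two things: the 10x10 pairwise table is tiny compared with the number of joint on/off states of 10 genes, and a gene can have a large effect in every individual while its averaged effect is zero.

```python
N_GENES = 10

# A 10x10 "cause vs. effect" table has this many entries...
pairwise_entries = N_GENES * N_GENES
# ...but 10 genes that can each be changed or unchanged have this many joint states.
joint_states = 2 ** N_GENES


def phenotype(g1, g2):
    """Hypothetical non-linear rule: a change in either gene alone shows up,
    but changing both cancels out (exclusive or)."""
    return g1 ^ g2


def average_effect(gene_index):
    """Mean change in phenotype from switching one gene on,
    averaged over both states of the other gene."""
    effects = []
    for other in (0, 1):
        if gene_index == 0:
            off, on = phenotype(0, other), phenotype(1, other)
        else:
            off, on = phenotype(other, 0), phenotype(other, 1)
        effects.append(on - off)
    return sum(effects) / len(effects)


print(pairwise_entries, joint_states)
print(average_effect(0), average_effect(1))
```

In this toy case each gene flips the phenotype every single time it is changed, yet the averaged one-gene-at-a-time effect is zero, so a purely pairwise, linear summary would report "no relation".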
Aside from what's already been mentioned, scientists have traditionally had certain 'model organisms' which are intensively studied and then used as models for other organisms. White mice, E. coli, Arabidopsis etc.
There are a disproportionate number of papers detailing the workings of E. coli.
I'm not sure how long Arabidopsis has been a model plant organism- whether before the start of the human genome project or not.
___
It's the end of my comment as I know it and I feel fine.
Didn't you see all those pictures of tobacco plants that had the luciferase gene from fireflies? Water the plants with water containing luciferin and they glow. Now why they couldn't also put a gene in so that you didn't have to water them with luciferin, I'll never know. I would have bought one.
To state it in a form that may be relevant to a programming mentality; how else would you go about testing a program which was essentially self modifying?
i.e. some genes can regulate the expression of other genes.
You can figure out that some variables may affect a particular portion of the program.
Of course, just because one gene produces one protein, this doesn't mean that one protein only has one use. Just as variables can affect several parts of a program, enzymes can take part in a number of very different metabolic processes.
A lot of biological systems tend to 'normalize' themselves. Inject yourself with 1 cc of sugar water and your body will return itself to normal in less than an hour, most likely. I'm assuming that would make initial conditions in the physics sense a little less important, especially since biological output tends to be much more fuzzy to begin with. Anyone want to contradict me here? I'm not a biologist, though I ended up taking an ungodly number of life science courses in school. They've come in quite useful on Slashdot :)
Cornell researchers have used the genome sequence of the Arabidopsis to obtain information on its origins as a species. See here.
5 years? Can we expect banana plantations buzzing away towards the local refuse depot? And who'll be eating those bananas anyway?
Or maybe we will have "fruit fly seed pots -- grow your own Drosophila!" or so...
On one hand, we have the potential for greater understanding.
On another, we have the potential for some crazy shit.
On still another hand (for you freaks that have three hands, heh), neither of the two cases could be the case, in which case it is neither good nor bad, but just another tidbit of information to be archived on /. and eventually float off into cyberspace...
Why not? :) (but in this case, it doesn't...)
The Nature article talks about giving away 5000 CDs containing the data, and mentions somewhere that the dataset is 120 Megabytes.
No, it said 120Mb, which is 120 mega base pairs... geneticists don't talk about DNA in megabytes :) The ABC article seems to be off by 3 orders of magnitude. I think the human genome is around 3 billion base pairs, so it's probably right about that.
If you want to think of one base as being the same as two bits with four possible states, that's fine. In class, they told us to think of it more as an alphabet with four letters, but that's just another way to visualize the unvisualizable.
What makes it tricky is that it's a group of three bases together that actually codes for anything. Therefore, with a three-base word you can express up to 64 different items. Each "word" codes for one amino acid, and there are only twenty-odd of them known. Whatever makes a protein more unique than the steak I'm grilling now is the number of amino acids present and the order they're in.
And if I got anything wrong, it's because I'm an ecologist and not a geneticist, but /. never does anything on centrarchid feeding behavior.
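The counting in the comment above is easy to check. The sketch below enumerates every three-base word over the four-letter alphabet and packs each one into 6 bits (2 bits per base). The particular bit assignment in `BITS` is an arbitrary choice for illustration, not a standard, and `pack_codon` is a made-up helper name.

```python
from itertools import product

BASES = "ACGT"

# Every possible three-base "word" (codon): 4 * 4 * 4 of them.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Arbitrary 2-bit code per base, assumed only for this example.
BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_codon(codon):
    """Pack a three-base codon into a 6-bit integer, 2 bits per base."""
    value = 0
    for base in codon:
        value = (value << 2) | BITS[base]
    return value


print(len(codons))
print(pack_codon("AAA"), pack_codon("TTT"))
```

So a codon is a 6-bit word rather than a 3-bit one, and 64 possible words for twenty-odd amino acids means several different words can stand for the same amino acid.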
Alrighty then....
First off (and I'm not being deliberately snotty here), we're not talking about physics here. Current biology has nowhere near the decimal point accuracy, etc. that modern physics does.
Let's talk about bacteria (since it's a simpler problem - but most applies with minimal changes to studies of other organisms). Let's, furthermore, say I am interested in something like nutrient uptake. There are proteins on the cell surface which are involved in either passing (or not passing) external molecules to the cell interior. It is possible (let's not get into details) to get a good idea as to which surface proteins are involved with passing different classes of external molecules into the cell.
ok then. I have a protein of interest, I have a 'behavior' of interest, what next? believe it or not, the next step is usually trial and error. I induce mutations in the bacterium (by x-raying it or adding some chemical to a culture, etc.) and look for colonies that do weird things vis-a-vis my system of interest (in this case uptake of some particular nutrient - since this is /., let's say caffeine).
As bacteria reproduce like crazy and I have induced mutations in a population of, literally, millions of individual bacteria - there are bound to be some which do funky things as regarding caffeine uptake. I cannot attribute this necessarily to some change in my protein, but I can check the interesting mutants to see if my protein is different from the 'normal' sequence. If it is not - well then, no change in this protein is directly involved in the funky behavior. If it is changed, there is still much more to be done.... because, of course, it may be that some other mutation elsewhere produced the new, funky behaviour, and not my new, improved protein of interest.
One then zeros in on the effect of the changed protein by inserting or otherwise point mutating 'wild-type' bacteria to attempt to determine what effect the changes in sequence have... etc..
of course, having written all this, I realize that I haven't answered the question you asked. And the answer is that biologists, especially molecular types like me, don't predict!
We create mutants and see what happens. You would be astounded at the number of different mouse lineages out there with specific mutations and disease susceptibilities. If something is eventually to be used in humans, you start by seeing what it does to mice, move on to monkeys, then move on to human cells in vitro (cells in a tube, basically), and finally if animals/cells are not dying, etc. move on to trials in humans.
Prediction would be nifty, but even with whole genomes, it's just not in the cards for the near future.
I thought the public could get copies of this stuff. Where is it?
Eat right. Stay fit. Die anyway.
By mapping the genome, are we actually figuring out the underlying structure of what every gene serves to do in a given plant? (more like a decision tree) or are we just figuring out in a vague way what groups of genes do what (more like a bayesian belief net)?
(Obviously, having the understanding at the "neural net" level implies no mapping at all, so it can't be like that.)
-winter fantom
The plant can't have more genetic information than us
well it can, a lot of species have more base pairs than a human (if I remember my biology class correctly). It's all about redundancy, and also a lot of info isn't used at all (well, you could say there's a whole lot of cruft in us)
"Mommy, mommy! The garbage man is here!" "Well, tell him we don't want any!" -- Groucho Marx
"These genome projects are the way to gather intellectual property positions, for example if we identify the function of a useful gene we could patent it. Without participation in this type of pure research, we will be left behind."
This is a shame. All that scientists are worried about these days is patenting the genome of something so they can get rich. Whatever happened to research for the benefit of mankind? Whatever happened to putting politics aside when it came to science? A damn shame.
--
The World is Yours.
Au contraire, I read both the Nature pdf file end to end and the ABC article. I did not come across anything that would answer my questions. I was a physicist, not a biologist, nor do I remember my grade school biology teaching us the difference in the number of bases between species (doesn't mean they didn't tell us, just means I don't remember every sentence 12 years later).
But I will agree, my post didn't deserve a 4. When I wrote it, I distinctly thought to myself "this isn't worth any points, but some of the answers might...", so I of course didn't use my +2.
There are some strange contradictions in the ABC article.
It first claims that "The sequencing of 118.7 billion base pairs of the nuclear genetic complement of a model plant is enormously significant". Then it says something near the bottom regarding "the 3.2 billion base pairs of the human genome". So what's going on here? The plant can't have more genetic information than us.
The Nature article talks about giving away 5000 CDs containing the data, and mentions somewhere that the dataset is 120 Megabytes. So I presume that is compressed, down from the 3.2(*2) billion bits that ABC quotes. Are these numbers accurate? (And just how much information is there per base pair? Is my translation of four nucleotides to 4 possible states (2 bits) correct?)
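A back-of-the-envelope sketch of the sizes being asked about, assuming nothing fancier than a fixed number of bits per base: 2 bits for a packed four-state code, or 8 bits if each base is stored as one plain-text character. It ignores file headers, annotation, and real compression, and `storage_bytes` is a made-up name for this example.

```python
def storage_bytes(base_pairs, bits_per_base=2):
    """Bytes needed to store a sequence at a fixed number of bits per base."""
    return base_pairs * bits_per_base / 8


MB = 1e6  # decimal megabytes, to keep the arithmetic simple

# About 120 million base pairs for the plant, 3.2 billion for human (figures from the thread).
print(storage_bytes(120e6) / MB)                   # packed, 2 bits per base
print(storage_bytes(120e6, bits_per_base=8) / MB)  # one text character per base
print(storage_bytes(3.2e9) / MB)                   # human, packed
```

Under these assumptions the plant sequence is about 30 MB packed, or about 120 MB as plain text, and the human sequence about 800 MB packed.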
Is this interesting? Does this imply we can already understand the human genome? No.
Doing a comparison with computers: if you had the binary executable of a program for an architecture you don't know, how would you work out what every bit of this file means? And, most importantly, how would you discover the instructions this processor can understand?
The "solution" is to search for species with small sequences of DNA and compare them to others. Finally, you could try to modify some of this to see what changes in the final individual. But we won't get anything in the near future; perhaps we won't see any real use for this in our lifetimes.
--
To visit or not to visit: findusclub.com
The opinions in this comment are subject to GPL, you can copy, modify and redistribute freely (as in speech).
I just finished with a class that touched on the genome mentioned here. Most of the gene functions are, as has been mentioned earlier, deduced by comparison to other genes of known function in other species. This is usually done using algorithms like BLAST/PSI-BLAST/Gapped BLAST (Basic Local Alignment Search Tool) that compare sequences in question to data stored in a large database like GenBank. More information on these topics can be found at the National Center for Biotechnology Information, run by the National Library of Medicine at the National Institutes of Health.
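To give a feel for what "comparing a sequence against a database" means, here is a small sketch. It is not BLAST: it uses a plain Smith-Waterman local alignment score, which is the exhaustive method that BLAST-style tools approximate with faster heuristics. The scoring values, the two-entry `database`, and the sequence names are all invented for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical toy "database" of sequences with known function.
database = {
    "known_gene_1": "ATGGCGTACGTTAGC",
    "known_gene_2": "TTTTCCCCGGGGAAAA",
}
query = "GCGTACGTT"

# Rank database entries by how well the query aligns to each of them.
hits = sorted(
    database,
    key=lambda name: local_alignment_score(query, database[name]),
    reverse=True,
)
print(hits[0], local_alignment_score(query, database[hits[0]]))
```

The best-scoring entry is the one whose known function you would tentatively assign to the query. Real searches also need statistics to decide whether a score is better than chance, which this sketch leaves out.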
...but was there a reason why this plant was picked? Some obscure scientific reason? Because it is a 'simple' plant? Because some scientist was on a walk and decided to pick the first interesting looking weed he found?
Kierthos
Mr. Hu is not a ninja.
When you say similar what exactly are you comparing? Physical appearance? Certain characteristics? (if so which ones?) or is it sequential data (i.e. DNA or something along those lines) within these genes.
I am not a biologist and the realities of this concept elude me.
I'm a writer, a poet, a genius, I know it. I don't buy software, I grow it.
We now have the book of life, let's learn to read.
HOW? I can't! Please use tt as well.
Hope you all get first posts and trolls for Christmas
Thanks for modding me as a troll. I'm so happy I'll forget all the markup now.