Celera Completes Human Genome. Sorta.
kovacsp was the first to write to us about the announcement from Celera that they had completed mapping of the human genome. Note: This is /not/ the be-all, end-all. They have finished *mapping* one person's genes. With Celera's approach, this means that they now need to begin assembling the information they've gathered. All in all, Celera plans to do the same process with four other people. The Human Genome Project, using a more traditional approach, is still a couple of years away, but the race is still pretty close.
Great, now how much longer until a public beta release?
------------
a funny comment: 1 karma
an insightful comment: 1 karma
a good old-fashioned flame: priceless
this sig limit is too small to put anything good h
Is anyone else bothered by the fact that the first group to have a complete sequencing of the human genome is a private company? If anything ought to be in the public domain, all other arguments about software, music, etc... aside, it is the human genome. After all, everybody already has their very own. Celera deserves to reap the benefits of getting there first, but only until somebody else can get there as well. If another group finishes the sequencing, they have just as much right to use it as Celera. It's not like Celera has created an original work-- they've just finished reading through the genome first.
I really hope that the HGP places this information in the public domain as soon as possible, and refrains from signing any exclusionary deals with Celera that would prevent this information from being free.
www.eFax.com are spammers
Oh give me a clone
/||\
Of my own flesh and bone
With her Y chromosome changed to X
And when she is grown
My very own clone
She will be of the opposite sex (hurray!)
Clone, clone of my own
With her Y chromosome changed to X
And when she is grown
Since her mind is my own
She'll be thinking of nothing but sex!
(written by Robert A Heinlein)
__
(oO)
Hand me that airplane glue and I'll tell you another story.
SEQUENCING means creating a complete list of the nucleotides in order. If you had this information, you could actually synthesize the entire genome of the individual. [There are some sophisticated niceties like methylation that distinguish the synthesized version from one extracted from a human, but it's essentially complete.] There are other factors (like which regulatory binding sites are actually bound, by what proteins; exact state of histone supercoiling, etc.) that control gene expression enough to keep this from being a working human genome, but it's awfully close.
MAPPING means determining distances between known genes. Using this information, you can deduce where the various genes are, the approximate location of specific unknown genes, and many other useful facts. A detailed map is a good starting place for hunting down a gene, so you can locate and sequence it; it also can tell you what traits are likely to be inherited together, etc.
A "sequence" is a complete blueprint (though there are details that aren't covered by sequence alone). A map is like a geographical map that shows where all the cities and large towns are. There are still many factories, facilities, and industrial complexes off that map -- not to mention all the roads, rivers, mountains, utility lines, etc.
A sequence is a lot more information, and a wonderfully compact database - at 2 bits per base pair (4 possible cases), you could fit a complete human genome in under a gigabyte. (That's only one human, however.)
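To make the 2-bits-per-base arithmetic concrete, here is a minimal Python sketch. The helper names (pack, unpack, packed_size_bytes) are made up for illustration, and it assumes a clean A/C/G/T alphabet with no ambiguity codes, which real sequence data does not guarantee.

```python
# Illustrative only: hypothetical helpers, assuming a pure A/C/G/T alphabet.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    return "".join(
        BITS_TO_BASE[(data[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(n_bases)
    )


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to a whole byte."""
    return (n_bases + 3) // 4


GENOME_BASES = 3_000_000_000  # the 3x10^9 figure quoted in this thread

print(packed_size_bytes(GENOME_BASES) / 1e9)  # 0.75, i.e. about 750 MB
```

The base count has to be stored alongside the packed bytes, since the last byte may be only partly used.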
Naturally, even once we had the genome (or preferably a few thousand individuals, to let us get a real handle on variations), we could still spend decades or centuries figuring out what it all meant. 3x10^9 bases is a lot of info. You thought it was hard trying to trace western civilization in the first million digits of pi.
I am not a Molecular Biologist - anymore. But I was, about 10 years ago.
__________
If you can go to bed, knowing you did a valuable thing today, you're very lucky. If you can't... it's not bedtime
Any reasonable person would define "complete" as this: there are three billion bases of human DNA in 24 different linear chromosomes. The sequence is complete when you can give me a DVD with 24 files on it, each of which contains a contiguous sequence of a human chromosome.
That may never happen for any large animal or plant genome. Too many regions of a genome sequence are an ungodly mess, repetitive and difficult to sequence.
The public worm (C. elegans) project, at 98 million bases, defined "essentially complete" as "we've come as close as we can to complete using existing technology". We have 97 million bases sequenced and about 50-100 remaining gaps.
The fly (Drosophila melanogaster) project, at 180 million bases in size, was recently declared "substantially complete" by Celera. They have 120 million bases of sequence, with several thousand gaps. The fly has more extensive regions of repetitive sequence than the worm.
The human, at 3 billion bases in size, is nowhere near complete, either by the public (us) or by Celera, no matter what Celera press releases say.
You need the following steps to get close:
1. Shotgun coverage. Technology limits us to reading ~500 bases of sequence at a time, so we have to blow the genome to bits, sequence millions of fragments, then assemble it all back (computationally) into a contiguous sequence. Because a successful assembly relies on deeply redundant overlap amongst the fragments, we need ~8-10x shotgun coverage (24 to 30 billion bases) to try to assemble the human genome. The fly genome was shotgunned to 12x coverage to achieve the results Celera reports.
2. Assembly. Once you've got shotgun data, you can try to assemble the genome from those fragments.
3. Finishing. The automated assembly (like the fly genome now) will have a great number of gaps. These must now be closed, more manually, by expert molecular biologists; the gaps represent regions that are biologically difficult to sequence.
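For anyone who wants to see the shotgun-then-assemble idea in miniature, here is a toy Python sketch. It is a naive greedy overlap merge written for illustration, not the assembler either project uses; the function names, the tiny 15-base "genome", and the min_len cutoff are all assumptions. It ignores sequencing errors, reverse strands, and repeats, which are exactly what make the real problem hard.

```python
import random


def shotgun(genome: str, read_len: int, coverage: float,
            rng: random.Random) -> list[str]:
    """Sample random fixed-length reads totalling ~coverage x the genome."""
    n_reads = int(coverage * len(genome) / read_len)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a that is a prefix of b; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the two contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that sit entirely inside another read.
    contigs = [c for c in contigs
               if not any(c != d and c in d for d in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to join; more than one contig = gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs)
                if idx not in (best_i, best_j)]
        # Anything now contained in the merged contig is redundant.
        contigs = [c for c in rest if c not in merged] + [merged]


# Three hand-picked overlapping reads from a made-up 15-base sequence.
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads, min_len=3))

# Remove the middle read and the assembly falls into two contigs: a gap.
print(greedy_assemble(["ATGGCGTA", "ACGTTAGC"], min_len=3))
```

The second call is the toy version of step 3: with too little overlap the assembler has no way to join the pieces, and someone has to go back and sequence across the gap.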
The actual science behind the Celera press release is that they have partially completed step 1. They currently have 4-5x shotgun coverage of the human genome, about half of what they need for a proper assembly. They intend to get the other 4-5x coverage from the public "rough draft", which is at about the same stage as Celera's project.
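The coverage numbers are easy to check with back-of-the-envelope arithmetic. A small Python sketch, using the 3 billion base genome size and ~500 base read length quoted above (the function names are hypothetical, and real reads vary in length):

```python
import math

GENOME_BASES = 3_000_000_000  # human genome size used in this thread
READ_LEN = 500                # approximate bases per sequencing read


def bases_needed(coverage: float) -> float:
    """Total bases that must be sequenced for a given fold coverage."""
    return coverage * GENOME_BASES


def reads_needed(coverage: float) -> int:
    """Number of ~500-base reads required for a given fold coverage."""
    return math.ceil(coverage * GENOME_BASES / READ_LEN)


for cov in (4, 5, 8, 10):
    print(f"{cov:>2}x: {bases_needed(cov) / 1e9:.0f} billion bases, "
          f"{reads_needed(cov) / 1e6:.0f} million reads")
```

At 8-10x that works out to 24-30 billion bases, or roughly 48-60 million reads; 4-5x is 12-15 billion bases, which is where the "about half" comes from.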
The two projects (Celera and public) are neck and neck in this "race". The difference is that we acknowledge that our sequence is a rough draft at this stage, whereas Celera claims that their sequence is complete. Celera has every right to spin their project to their investors any way they feel is appropriate, but scientifically, they are being rather disingenuous if not dishonest.
conflicting oblig. disclaimers: I'm a co-PI on the public project, and I (accidentally, through an acquisition) also hold substantial stock in CRA.
Although DNA fingerprinting is mostly accurate, it is based on differences in the introns, which are highly variable. As far as exons are concerned... You've probably heard that chimps and humans are 98% or so genetically similar, and humans and hamsters are 95% genetically similar.
If you compared the genes (exons) of any 2 people, you'd find them to be 99.99999999% or more similar. The differences are very slight. What makes people unique is not the genes so much as which ones are expressed.