Celera Completes Human Genome. Sorta.
kovacsp was the first to write to us about the announcement from Celera that they had completed mapping of the human genome. Note: This is /not/ the be-all, end-all. They have finished *mapping* one person's genes. With Celera's approach, this means that they now need to begin assembling the information they've gathered. All in all, Celera plans to do the same process with four other people. The Human Genome Project, using a more traditional approach, is still a couple of years away, but the race is still pretty close.
The way to succeed in the biological sciences is to study what the media is interested in, not necessarily what's practical. Gene sequencing has the media's attention today, yet it is impossible to get grants for bioinformatics and protein modeling, the foundation that gene sequencing rests on. Since the DNA sequence is useless unless you can process the data, we see how important media coverage is in the world of biology, even though none of the data can be used for anything.
Inbreeding is only an issue if there are bad recessive genes. Heinlein himself addressed this at the end of _Time Enough for Love_, where Lazarus Long was finally jumped by his twin female clones.
:}
/||\
Now THAT is what I call a rich fantasy life.
__
(oO)
Hand me that airplane glue and I'll tell you another story.
According to this week's Time Magazine, the Human Genome Project is now estimating that they will finish in November of this year, unless I grossly misunderstood the article.
> Wake up and smell the coffee.
I think your tin foil cap is slipping. Better adjust it - FEMA is trying to take control of your mind so that they can use your genes to create a race of superhuman zombie soldiers.
It seems unlikely, but it isn't.
I forget what 8 was for.
This is what the outcry is over. It's also why their (and many other 'biotech' firms') stock valuations are soaring through the roof.
After the HGP had run for a few years, a group splintered off to pursue it purely for profit. That's Celera. They've promised to allow access to the information for researchers, but have never detailed to what extent. They obviously aren't going to allow outside research to be done with any genes they claim a patent on, and that's been the sticking point in most of the past cooperation talks.
Also remember that the announcement made by Clinton & Blair a few months ago guaranteeing the freedom of the genome only applied to the HGP. They would have had to have completed the first map in order for it to mean anything. Celera's rather amazing accomplishment now will mean a real big headache for everyone who's not one of their investors.
A possible scenario is that you fully develop a way to clone yourself, but can't because certain genes giving you immunity to some diseases are protected by a patent. It's really horrifying. Medicine is about to get as nasty as computer software. Everyone is going to sue everyone over everything, all trying to get any slight advantage they can. And people will be dying so stock prices can rise a few tenths of a point...
The most interesting article in the March 24, 2000 issue of Science, which described the recent
fly sequencing, is the one on comparative protein complexity.
Once you have the genome, you can start deducing better the mix of proteins in organisms. Proteins do most of the work of life and are harder to analyze than DNA. Only a few percent in humans are understood.
These three organisms: worm, fly and yeast
were the first three complex organisms to be
fully sequenced. (Mouse, human, dog, corn, rice, and tobacco are in the works.)
It turns out that the worm is slightly more complex
than the fly, and both are about twice as complex
as yeast. It is expected humans will come in
about twice as complex as a worm. We'll know in
a few months.
Protein complexity is not necessarily the same
thing as organism complexity.
All organisms on earth have been evolving for
four billion years, so have the same chance
at complexity.
Genetic mechanisms for managing complexity have
been evolving too.
It may be humbling to find that from a genetic
measure, humans are simpler than some other
plants and animals.
__________
If you can go to bed, knowing you did a valuable thing today, you're very lucky. If you can't... it's not bedtime
For example, for diabetes, it might be enough to get the gene (under a proper control sequence)
into a fraction of your pancreatic beta cells.
For other diseases, like phenylketonuria (the inability to process certain amino acids like phenylalanine), it might be enough to get the enzyme gene (suitably activated) into a relatively small number of cells, anywhere in the body. Here the goal is just to break down enough of the a.a. by a harmless pathway to keep the toxicity down.
Sometimes, you don't even need to change the cells in the body. For example, a permeable container of genetically engineered cells implanted in the body would work for some diseases.
2) It isn't going to be easy. A test subject for a genetic modification died last month of unexplained liver failure after being exposed to a usually harmless virus loaded with a human gene. The other test subjects were fine. No one knows why.
You have the intron argument exactly backwards. When you read those statistics about 98% similarity, it includes the total genome (introns, exons, non-coding 'junk DNA', telomeric tails and other repeating sequences).
How could we compile a 98% index of similarity in introns? We don't even know a full 98% of the human genes yet, much less their introns? Much less the monkey genes/introns to compare them with?
Even after we have the genome sequenced, it will be many years before we find all the sequences that act as genes, much less the methods of their processing and expression (like introns).
These much-bandied numbers came (years ago) from random sampling techniques, and the sequences of (then) known genes. Predictably, we sequence the important and easily located genes first. These numbers are inaccurate, and should be shot on sight, because, as I will explain, there will never be a single accurate meaningful percentage number for "how much like the chimps are we." Never.
Important proteins are usually more highly conserved (don't change much) because changing them is often life-threatening. Most changes adversely impact the organism. [Histones, for example, are so highly conserved that they differ by only a few base pairs in man, cow, and pea. Such conservation is rare, however.]
99.9999999% the same? Give me a break. If that were true, human individuals would only vary by four base pairs on average. Watch your numbers, willya?
In fact (for reasons I will cite below), any two random cells in your own body are probably not 99.99999999% identical.
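For anyone who wants to check the percentages being thrown around, here is a rough back-of-envelope sketch. It assumes the roughly 3 billion base pair genome size quoted elsewhere in this thread; the function name is made up for illustration, and `Decimal` is used only so that long strings of nines don't get mangled by floating point.

```python
from decimal import Decimal

# Assumption: a haploid human genome of roughly 3 billion base pairs,
# the figure used elsewhere in this discussion.
GENOME_BP = Decimal(3_000_000_000)

def differing_bases(percent_identical):
    """Expected number of differing base pairs for a given percent identity.

    Pass the percentage as a string so no digits are lost.
    """
    fraction_different = (Decimal(100) - Decimal(percent_identical)) / Decimal(100)
    return GENOME_BP * fraction_different

for pct in ("98", "99.9999999", "99.99999999"):
    print(pct + "% identical ->", differing_bases(pct), "differing base pairs")
```

With these assumptions, 99.9999999% comes out to about three base pairs across the whole genome (the same ballpark as the "four" above), and 99.99999999% comes out to less than one. 98% is about 60 million.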
So how much is the difference between humans?
There are roughly 10,000 genes in the human genome (an estimate widely used in the field). Since I can name, off the top of my head, a few dozen common variable alleles (e.g. AB blood type, minor blood types, eye color, etc.), I'd be very surprised if there weren't hundreds of lesser-known common variable alleles (100/10,000 = 1%). So I doubt most humans are 98% identical on an ALLELE level, and 95% may be pushing it. (ALLELES are 'different gene forms' like blue vs brown eyes, or Rh+ vs Rh-.)
But you're talking on a BASE PAIR level, and that's purely a philosophical question, not a matter of strict numbers as you suggest. If you drop a single base pair, all the subsequent amino acids will be TOTALLY different (this is called a frame-shift mutation, and in fact the gene will usually become nonfunctional because an accidental 'stop codon' [3 of the 64 codons are stop codons] will likely be created within a short distance of the change).
One could argue that this is a one base pair change in the gene, but it wipes the gene out entirely.
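A toy sketch of the frame-shift effect, for the programmers in the audience. The codon table is the standard genetic code written in the usual compact TCAG order; the short "gene" is invented purely for illustration and is not a real sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate from the first base, three bases at a time, until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical 33-base gene, made up for this example.
gene = "ATGGCTGAAACCTTAGGCAAATGCCGTGGATAA"
# Drop a single base (index 5) to shift the reading frame.
mutant = gene[:5] + gene[6:]

print(translate(gene))    # MAETLGKCRG
print(translate(mutant))  # MAKP
print(sum(1 for aa in AMINO if aa == "*"), "of", len(AMINO), "codons are stops")
```

One deleted base out of 33 changes every downstream amino acid and runs into a stop codon after only four residues, which is the point being made above.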
Another type of mutation is a base substitution, where an A becomes a C, etc. You almost certainly carry thousands of base pair substitutions compared to your ancestors, but they have little or no effect on your genes, their products, or how well the proteins do their jobs.
And how do you count transversion? If a big chunk of a monkey liver enzyme gene is now used in a human brain gene? Is that a match or not? Or if the entire monkey enzyme is now never used in the human liver, but only in the human brain, is that a match? Or what if an enzyme splits into two forms that are used in different tissues and are very similar, and perhaps sometimes even combined (e.g. creatine kinase)? What's the frequency, Kenneth?
Therefore, counting random base pair homology (similarity) is an irrelevant exercise in today's science. If we need to count (and what would we do with that info, except supply ignorant science writers with sound bites?) we need to specify the proper comparative index: functional allele differences, marker mutations for genealogy, population divergences, identifying founder effect gene fixations, etc.
In fact, even counting alleles is a matter of philosophy. Is 'redhead' really a different gene from blonde or brunette if the base pairs turn out to be 99% the same (they aren't)? On the other hand, your immune system may run on HLA27, while your brother's runs on HLA8 -- entirely different genes serving the same function, and it won't matter unless one of you needs a transplant (may the Gods of Immunology forgive me for that oversimplification!)
Basically, an 'allele' (different gene form) is whatever we say it is, whatever is important for the specific question we are investigating.
The 95% (98%, 99.999%) number is useless and will always be useless except to hack science writers -- though the underlying principle of the commonality of genomes is useful. I've come to believe that the *number* is downright harmful to readers of hack science writers.
Suffice it to say that the human DNA copying mechanism has an error rate of roughly one per billion base pairs, and the human genome is roughly 3 billion base pairs. Every time a human cell divides, the daughter cells probably are a few base pairs different. The cells in your body now are typically dozens of generations away from your embryonic state and are not exactly identical -- but their divergent mutations are probably less than 1 part per million.
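The arithmetic behind that paragraph, as a quick sketch. The error rate and genome size are the round numbers quoted above; the figure of 40 cell generations is only a stand-in for "dozens" and is an assumption, not a measured value.

```python
from fractions import Fraction

ERROR_RATE_PER_BP = Fraction(1, 10**9)   # roughly one copying error per billion base pairs
GENOME_BP = 3 * 10**9                    # roughly 3 billion base pairs
GENERATIONS = 40                         # assumption: stand-in for "dozens" of cell divisions

# Expected new copying errors each time the genome is duplicated.
errors_per_division = ERROR_RATE_PER_BP * GENOME_BP

# Expected errors accumulated along one lineage of cell divisions.
accumulated = errors_per_division * GENERATIONS

# Express the divergence from the original genome in parts per million.
divergence_ppm = accumulated / GENOME_BP * 10**6

print("errors per division:", errors_per_division)
print("errors after", GENERATIONS, "generations:", accumulated)
print("divergence in ppm:", float(divergence_ppm))
```

Under these assumptions a cell picks up about 3 changes per division and around 120 over its lineage, which is a divergence of about 0.04 parts per million, comfortably under the 1 ppm bound.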
Even most genes you'd die without ('important' genes) only have a few critical regions, and can mutate to varying degrees in the rest of the gene. Think of it this way: binding sites may have to be very precisely conserved, but the 'bricks' that hold them the right distance apart, and at the right orientation aren't so important.
In fact there are entire families that are hypervariable: Immunoglobulin genes (antibodies) are different, even between identical twins -- so are olfactory receptor genes (though there may be some fixed 'common' olfactory receptors).
A lot of the confusion arises when people learn that (for example) a single tiny change in the B chain for hemoglobin can cause sickle cell. But that change alters the geometry at a 'corner' of the protein that throws the entire protein off.
[It has been suggested that sickle cell hemoglobin is so widespread because it protects against malaria, and therefore served a valuable function. Malaria has been one of the biggest killers of humans since pre-history, possibly *the* biggest.]
'Variability of expression' is not the primary reason for the large differences. Subtle differences in genes (and the interactions between their products) can produce significant effects.
Check out this Globe and Mail story about Sick Childrens Hospital in Toronto sorting the map - with a supercomputer. I've seen pictures - SGI Origins all over the place. Cool hardware - now let's hope they "Do no harm" with any knowledge they gain.
"Depression is merely anger without enthusiasm." - Anonymous
I did hear at one point that they were going to gang together 400 or so 4-processor AlphaServer ES40s specifically to handle either the assembly or analysis portion. The 400 servers x 4 600 MHz EV6 Alpha chips would give you 1200+ CPUs in the cluster...the final version of this system is what they claim will be the 2nd fastest civilian-owned supercomputer on earth.
I don't work for Celera so mistakes made above are my own. I'm just a bioinformatics hardware geek and a big supporter of Alpha-for-life-science-research type projects. From an infrastructure geek's perspective what Celera is doing is just amazing...
10 years ago the state of the art was pretty poor. The HGP estimates were based on that technology.
Celera's relationship with PE allowed them to get their hands on tons of the new 3700 series DNA sequencers. Without them Celera's effort would have been impossible.
So -- Venter does deserve some credit -- he was smart enough to realize that the revolution in sequencing (plus a cozy relationship with PE) had changed things enough to make a large-scale private effort possible.
just my $.02
Let's say Celera does finish the mapping before the Genome project can. Celera then sells data from the mapping to researchers. What happens when the Genome project completes and gives the information away? Can Celera call it theft of intellectual property?
I found this article on the cataloguing of the Genome Database in the Globe and Mail. It talks about the why and how of the database. Here's a quote: In the one year since Canada took possession of the Genome Data Base -- which shares a small room with a large air conditioner -- the on-line system has logged 20 million hits. Estimates suggest it serves more than 1,000 scientists in 50 countries.
IMHO, as per,
J:)
Oh well, no point in steering now.
... what's the "largest private supercomputer" Celera claims to have used? (quote is from the Wired article). Anybody got any info?
engineers never lie; we just approximate the truth.
Since they're the first to do it, how can anyone be sure of the veracity of their claims?
Should I RTFM? IS there a FAQ on this somewhere?
-pbk
Well, now they think they can produce a carbon copy of a human being at will -- or worse yet, a modified copy which has exactly the characteristics they want: Obedience to secular authorities, low IQ, lack of imagination, lack of faith, etc.
This Black Helicopter moment has been brought to you by Genetic Engineering. At GE, we bring good things To Life.
But somewhere out there there's a person who's about to become the benchmark human that we're all going to be measured against .....
The problem is, Celera and its friends are going to be the only companies with the full genome available for several months. Patent law says you can't patent something obvious to someone experienced in the field, but while the genome isn't widely available, Celera may be able to circumvent that clause temporarily.
What I'm sure many people are wondering right now is: once the HGP completes its sequence, will these patents on medical knowledge derived from Celera's work be revoked on the grounds that the method has become obvious through independent public research?
--
The shareholder is always right.
This brings up the whole issue of patenting and the intellectual property ownership of part/all of the human genome. It is my understanding that parts of the human genome have been patented already. Does this mean we no longer own ourselves?
If I were to marry and have a child, would I be violating their patent by using the patented parts of the human genome? OK, so you could argue that a natural process can't be patented and the patents are for secondary uses, but what about medical cloning? If experiments continue into therapeutic cloning for the production of stem cells and potentially organs, would this non-natural process not be a patent violation?
If by chance God does turn up for his thousand-year reign, could he not claim some kind of prior art? OK, so that's probably not an issue the patent courts are going to have to deal with in the near future, but the fact that patents can be granted on existing genetic material does raise some interesting questions. By granting patents in this manner, the patent office is dismissing the idea that the genetic makeup of us/animals/plants may have been designed, and is thereby encroaching on an area of belief held dear by a lot of people.
Basically they are betting that at least some scientists will pay for their map because of the way in which it is rendered: maybe it will be easier to use, look prettier, run sequence search algorithms faster, or something similar. But two independent copies of the genomic map (done in two scientifically proven methods) can only be better than one, so many scientists will probably end up using both maps for some percentage of their work/research/experimentation.
You will just have to pay for the bells and whistles (and speed of release) of Celera's map.
On a bit of a serious note, if that sex led to actual childbearing, wouldn't there be a MASSIVE inbreeding effect?
I mean, if having children with family whose genes are merely close to yours produces it, then what would happen if one was to have a child with someone that had the EXACT SAME genes as themselves, only a different sex?
-- Dr. Eldarion --
Celera mapped the genetic structure of the fruit fly recently. According to an article at CNN, they claim that they will have the sequenced genes of their human subject assembled in three to four weeks.
I stole this sig from a more creative user.
Last time I checked the human 'rough draft' from the public project had about 80% of the sequence complete in draft form and in the public domain. The Celera project has nothing in the public domain except a few press releases.
I update my databases every night from the HGP. It is doubling in data volume approximately every 7 months and the doubling time is getting shorter.
Moore's Law, eat your heart out.
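To put a number on that comparison, here is a small sketch. The 7-month doubling time is the figure from this comment; the 18-month doubling time for Moore's Law is just the commonly quoted rule of thumb and is an assumption here, as is the 3-year horizon.

```python
def growth_factor(months_elapsed, doubling_time_months):
    """How many times larger something gets if it doubles every doubling_time_months."""
    return 2.0 ** (months_elapsed / doubling_time_months)

HORIZON_MONTHS = 36          # assumption: compare over three years
SEQUENCE_DOUBLING = 7        # months, as reported above for the public data
MOORE_DOUBLING = 18          # months, the usual rule-of-thumb figure (assumption)

data_growth = growth_factor(HORIZON_MONTHS, SEQUENCE_DOUBLING)
chip_growth = growth_factor(HORIZON_MONTHS, MOORE_DOUBLING)

print("sequence data grows about %.0fx in 3 years" % data_growth)
print("transistor counts grow about %.0fx in 3 years" % chip_growth)
```

On those assumptions the data grows roughly 35-fold in three years against roughly 4-fold for the hardware, which is why analysis is the bottleneck.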
The HGP is providing us with data faster than we can analyse it, and really opening up a whole new level of understanding of how things work. One of my colleagues complained to me after I had given a seminar on genome analysis that his lab's old, laborious techniques of analysing family pedigrees and carefully selecting regions to look for genes were being blown apart by the public sequencing projects.
We are entering a new era of biology, one in which a biologist will need to be as handy with a keyboard as with a pipette. If you want to be a successful molecular biologist you will either need to be very, very good or have good data analysis skills.
Enough of a whinge. Any open source programmers out there fancy getting involved in writing code to help with the human genome analysis? Plenty of odd tasks to go round.
Dr. David Martin European Molecular Biology Network node manager.
--- Four bases should be enough for any genetic code
Celera is actually doing two things here:
1. Getting the raw sequence of the human genome and marking off all the genes we already know, as well as some "best guess" genes that are similar to other organisms that have been sequenced. This will be available from them for free to everyone.
2. They then plan to go after genes we don't really know. A little explanation:
Genes are how the body stores the information to make proteins (which get made into enzymes, cell signalling molecules, whatever...). They also make other things, but I don't want to complicate this. Largely, it's proteins that scientists are interested in because they are the machinery through which the body works. Cancer, for example, is caused largely by proteins that misbehave and refuse to do their jobs.
Just knowing the sequence of the human genome tells you little about the functions of the genes. The proteins made from those genes must be studied and characterized. This is where Celera's business model kicks in. They plan to identify and characterize as many proteins as possible. This is a non-trivial task, given that some molecular biologists spend their entire lives working on one protein. Celera plans to look at the protein-protein interactions as well as their locations within the cell to get an overview of what all the genes in the human body are actually doing. It's real "big picture" stuff, meant to serve as a starting point for future research. It is likely that many of these proteins will have value as targets for drugs, and I think Celera plans to patent these genes to make money. They will at the very least charge a subscription fee to look at all the protein data they have collected. I am fairly certain that other companies have already patented human genes...without the patents, there is not a whole lot to protect a drug from being stolen by competitors.
All of Celera's research will be at an ENORMOUS cost to the company. Should they make all the info free? The bottom line is that realistically, you and I are not going to develop the cure for cancer because we ran a Perl script on the human genome. It takes a pharmaceutical company with deep pockets to pay for all the FDA trials and get the drug ready for "prime time". Celera knows this, and they know these companies will shell out wads of cash to get info about as many proteins as possible. It is possible that university researchers will not have the money to pay for this information. But there is so much research to be done, the big pharmas will likely fund projects at universities to look into some of these genes more closely, so many of them will get what they want anyway.
http://www.pecorporation.com/press/prccorp011000.html
[excerpt] "Celera's mission is to become the definitive source of genomic and related agricultural and medical information. Celera's information will be available on a subscription basis to academic and commercial institutions who will have access to tools for viewing, browsing, analyzing, and integrating data in a way that will assist scientists in accelerating their understanding of the human genetic code."
And then there's the courts and governments. Both the UK and the US govs. had indicated that they might not be too happy about a company attempting to patent the human genome. It certainly isn't too clear what it means to patent a gene sequence; that simpler issue is yet to be sorted out.
It should be remembered, however, that genes are not an exact blueprint that we will follow. They allow for the expression of a trait. They do not guarantee that it will be expressed. Your dad could be Michael Jordan, but if all you do is sit at your computer and eat junk food, you won't make the NBA. Too often, it seems that genes are portrayed as concrete instructions on who we will be, which sometimes leads to them being used as excuses for personality/traits. It is the classic nature/nurture debate, I suppose. It is just worth remembering, though, that although not totally irrelevant, genes are not the sole determinant of one's self. I am in no way doubting the significance of this medically in curing certain genetic diseases, but I am wary of the way I see genes being portrayed by the general public in terms of their effect on who we are as people. ---Lane
What's the point of moderating?!
I'm curious: how well does gene sequence information compress? Is this effectively random data, or are there patterns?
I can see the disclaimers now:
The GeneStor PeopleBackup(tm) device can now store the complete* contents of your genetic makeup!!
(* Storage assumes 2x data compression. Results may vary.)
At least someone now has the technology to do offsite-backups of people... granted, there'll be a certain amount of data loss since the backup fileset was created (birth), but now, at least, there is the beginnings of real disaster recovery technology.
Imagine: Our friends at Legato could license Celera's technology and produce "WetWorker" -- with the ability to put your genetic data on CD-ROM for easy transport to offsite storage. Then, when your friendly, egocentric ocean liner captain decides to go "All Ahead Full" on a foggy night in the North Atlantic AFTER receiving an iceberg warning, you can rest confident that your family can always recover you from archival backup.
I'm aware that there are shortcomings (especially the part about "loss of all data accumulated since birth"), but after all, the centerpiece of any backup software isn't ease of recovery, it's ease of deployment. The data can always be reconstructed from "incrementals" (it pays to take good notes...).
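On the compression question, one way to get a feel for it is to run a general-purpose compressor over sequence text. The sketch below uses synthetic data only: a seeded uniformly random ACGT string and a trivially repetitive one, both made up here. It says nothing definite about a real genome, which would be expected to land somewhere between these two extremes.

```python
import random
import zlib

def compression_ratio(sequence):
    """Compressed size divided by original size for a one-byte-per-base string."""
    raw = sequence.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic stand-ins, not real genomic data.
random_seq = "".join(rng.choice("ACGT") for _ in range(100_000))
repeat_seq = "GATTACA" * (100_000 // 7)

print("random ACGT:", round(compression_ratio(random_seq), 3))
print("pure repeat:", round(compression_ratio(repeat_seq), 3))
```

A uniformly random four-letter sequence carries 2 bits per base, so a byte-per-base text file of it cannot shrink below a quarter of its size no matter how clever the compressor. Anything better than that on real data would have to come from actual patterns such as repeats.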
This is my opinion and my opinion only. Incidentally, IANAL.
MOO;IANAL.
There used to be a picture linked here.
I don't think you could just "change" a Y chromosome to an X chromosome. I believe they're entirely different. Think about it, Y chromosomes give you the ability to drive well, while X chromosomes just give you the sense not to wear the same socks two days in a row.
-B
-rpl
Literacy is in short supply amongs most around here.
Q.E.D.
The human gene sequence is in the public domain and will remain there - anything else would be ludicrous (although I agree that when it comes to the law, ludicrous seems to be perfectly acceptable. Witness patenting software algorithms). What I believe they will get the patent on is their process for deriving the gene sequences - which is perfectly acceptable. They will also have the rights to their database of human gene information, which they can license the access rights to. The Human Genome Project will be making its results publicly available, so it might become a matter of whose database provides the most ancillary information.
"The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
Unfortunately, the whole gene patent scandal is because of this point exactly... they *are* being awarded exclusive rights because they got there first. Bad, bad, bad, bad, bad.
I have every right to do whatever I want with any gene I've sequenced myself, damnit! I shouldn't have to pay royalties to someone because they sequenced it first!
The analogy is the Spanish and Portuguese "claims" to the Americas, not a translation of Vergil.
-- Still waiting for the Nike endorsement
So, say you could change something cosmetic about yourself genetically for a reasonable price. For example, what if a virus were available that triggered a whole-body genetic mutation, and the end result was a change in your genetic hair color?
So this raises an interesting question that I've wanted an answer to for some time. [side note: I am not a biologist, nor do I play one on the Net, so please excuse me if this is a dumb question].
One of the much touted advantages of genetic engineering is the ability to cure genetic problems in living humans. This is distinct from altering the genetic code in cells that will go on to form a viable human foetus.
So, say I have some genetic disease caused by an unfortunate sequence in my DNA. Assume we know what replacement sequence would cure this problem. On an engineering level, how would I go about making the change in every cell in my body? This is what I would have to do, right? Is this an area where nanotechnology and genetic engineering meet? Or could genetically-modified viruses really perform this task?
I assume that now we are closing in on getting detailed genetic information about humans, people are starting to think about how gene therapy might be applied in practice. Does anyone have anything they can share with us on this subject?
Sailing over the event horizon
Great, now how much longer until a public beta release?
------------
a funny comment: 1 karma
an insightful comment: 1 karma
a good old-fashioned flame: priceless
this sig limit is too small to put anything good h
Is anyone else bothered by the fact that the first group to have a complete sequencing of the human genome is a private company? If anything ought to be in the public domain, all other arguments about software, music, etc... aside, it is the human genome. After all, everybody already has their very own. Celera deserves to reap the benefits of getting there first, but only until somebody else can get there as well. If another group finishes the sequencing, they have just as much right to use it as Celera. It's not like Celera has created an original work-- they've just finished reading through the genome first.
I really hope that the HGP places this information in the public domain as soon as possible, and refrains from signing any exclusionary deals with Celera that would prevent this information from being free.
www.eFax.com are spammers
Oh give me a clone
Of my own flesh and bone
With her Y chromosome changed to X
And when she is grown
My very own clone
She will be of the opposite sex (hurray!)
Clone, clone of my own
With her Y chromosome changed to X
And when she is grown
Since her mind is my own
She'll be thinking of nothing but sex!
(written by Robert A Heinlein)
SEQUENCING means creating a complete list of the nucleotides in order. If you had this information, you could actually synthesize the entire genome of the individual. [There are some sophisticated niceties like methylation that distinguish the synthesized version from one extracted from a human, but it's essentially complete.] There are other factors (like which regulatory binding sites are actually bound, by what proteins; exact state of histone supercoiling, etc.) that control gene expression enough to keep this from being a working human genome, but it's awfully close.
MAPPING means determining distances between known genes. Using this information, you can deduce where the various genes are, the approximate location of specific unknown genes, and many other useful facts. A detailed map is a good starting place for hunting down a gene, so you can locate and sequence it; it also can tell you what traits are likely to be inherited together, etc.
A "sequence" is a complete blueprint (though there are details that aren't covered by sequence alone) A map is like a geographical map that shows where all the cities and large towns are. There are still many factories, facilities, and industrial complexes off that map -- not to mention all the roads, rivers, mountains and utility lines. ETC.
A sequence is a lot more information, and a wonderfully compact database - at 2 bits per base pair (4 possible cases), you could fit a complete human genome in under a gigabyte. (That's only one human, however.)
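A minimal sketch of the 2-bits-per-base idea, for the curious. The particular bit assignment (A=0, C=1, G=2, T=3) and the function names are arbitrary choices made for this example, not any standard file format, and ambiguous bases like N are ignored.

```python
# Arbitrary 2-bit code chosen for this example.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTER = "ACGT"

def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)

def unpack(data, n_bases):
    """Recover the first n_bases bases from packed bytes."""
    return "".join(LETTER[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n_bases))

def packed_size_bytes(n_bases):
    """Bytes needed at 2 bits per base, rounded up."""
    return (n_bases * 2 + 7) // 8

packed = pack("GATTACA")
print(len(packed), unpack(packed, 7))            # 2 GATTACA
print(packed_size_bytes(3_000_000_000))          # 750000000
```

At 3 billion bases that is 750 million bytes, so "under a gigabyte" holds with room to spare.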
Naturally, even once we had the genome (or preferably a few thousand individuals, to let us get a real handle on variations), we could still spend decades or centuries figuring out what it all meant. 3x10^9 bases is a lot of info. You thought it was hard trying to trace western civilization in the first million digits of pi.
I am not a Molecular Biologist - anymore. But I was, about 10 years ago.
Any reasonable person would define "complete" as this: there's three billion bases of human DNA in 24 different linear chromosomes. The sequence is complete when you can give me a DVD with 24 files on it, each of which contains a contiguous sequence of a human chromosome.
That may never happen for any large animal or plant genome. Too many regions of a genome sequence are an ungodly mess, repetitive and difficult to sequence.
The public worm (C. elegans) project, at 98 million bases, defined "essentially complete" as "we've come as close as we can to complete using existing technology". We have 97 million bases sequenced and about 50-100 remaining gaps.
The fly (Drosophila melanogaster) project, at 180 million bases in size, was recently declared "substantially complete" by Celera. They have 120 million bases of sequence, with several thousand gaps. The fly has more extensive regions of repetitive sequence than the worm.
The human, at 3 billion bases in size, is nowhere near complete, either by the public (us) or by Celera, no matter what Celera press releases say.
You need the following steps to get close:
1. Shotgun coverage. Technology limits us to reading ~500 bases of sequence at a time, so we have to blow the genome to bits, sequence millions of fragments, then assemble it all back (computationally) into a contiguous sequence. Because a successful assembly relies on deeply redundant overlap amongst the fragments, we need ~8-10x shotgun coverage (24 to 30 billion bases) to try to assemble the human genome. The fly genome was shotgunned to 12x coverage to achieve the results Celera reports.
2. Assembly. Once you've got shotgun data, you can try to assemble the genome from those fragments.
3. Finishing. The automated assembly (like the fly genome now) will have a great number of gaps. These must now be closed, more manually, by expert molecular biologists; the gaps represent regions that are biologically difficult to sequence.
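The steps above can be sketched in a few lines, with heavy caveats. The coverage arithmetic uses the numbers from step 1. The assembler is a toy greedy overlap-merge on error-free reads from a made-up 15-base sequence; it is only meant to show the idea of stitching fragments by their overlaps, and is not how Celera's or the public project's assemblers actually work. All function names are invented for this example.

```python
def shotgun_bases_and_reads(genome_bp, coverage, read_len=500):
    """Total bases to sequence and number of reads for a given fold coverage."""
    total_bases = genome_bp * coverage
    reads = -(-total_bases // read_len)  # ceiling division
    return total_bases, reads

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left overlaps: the leftover pieces are separated by gaps
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

print(shotgun_bases_and_reads(3_000_000_000, 8))   # (24000000000, 48000000)
print(shotgun_bases_and_reads(3_000_000_000, 10))  # (30000000000, 60000000)
print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]))  # ['ATGGCGTACGTTAGC']
```

So 8-10x coverage at ~500 bases per read means something like 48 to 60 million reads. The toy assembler also hints at why repeats hurt: if two different regions share the same stretch of sequence, the overlaps become ambiguous and a greedy merge can join the wrong pieces.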
The actual science behind the Celera press release is that they have partially completed phase 1. They currently have 4-5x shotgun coverage of the human genome, about half of what they need for a proper assembly. They intend to get the other 4-5x coverage from the public "rough draft", which is at about the same stage Celera's project is in.
The two projects (Celera and public) are neck and neck in this "race". The difference is that we acknowledge that our sequence is a rough draft at this stage; whereas Celera claims that their sequence is complete. Celera has every right to spin their project to their investors any way they feel is appropriate, but scientifically, they are being rather disingenuous if not dishonest.
conflicting oblig. disclaimers: I'm a co-PI on the public project, and I (accidentally, through an acquisition) also hold substantial stock in CRA.
Although DNA fingerprinting is mostly accurate, it is based on differences in the introns, which are highly variable. As far as exons are concerned... You've probably heard that chimps and humans are 98% or so genetically similar, and humans and hamsters are 95% genetically similar.
If you compared the genes (exons) of any 2 people, you'd find them to be 99.99999999% or more similar. The differences are very slight. What makes people unique is not the genes so much as which ones are expressed.