Celera Opens Up DNA Database
greenplato writes "Thirty billion base pairs from the sequences of humans, mice, and rats that were available only by subscription to Celera's DNA database are being put into the public domain. Celera will donate this information to a 'federally run database,' presumably GenBank. Francis Collins, head of the National Human Genome Research Institute, notes that 'data just wants to be public.' Stories in BusinessWeek and The New York Times."
Celera is pretty evil as an employer. At one time the company had an insane stock valuation. They realised that the genome database profits would end soon and that the "synergies" with its own drug research would not happen. So they fired the genome people and used the stock proceeds to buy up biological instrument companies and some small biotech companies. Making instruments and biology tools is what actually produces income for them.
I worked for a small biotech company that became a part of Celera. They are doing good research, but the upper management is rotten. I was not there before Celera took over, but my understanding is that the new management made all the changes for the worse. Now the bullshit there is deeper than the ice in Antarctica.
I doubt that we will ever figure out - and I suspect that even if we did figure out we couldn't do much about it
In a word, no.
You can't generally patent "found" sequences. You have to create or assemble something novel. The raw sequence of the human genome is not patentable. Inserting novel or transgenic genes into the human genome might be, but that's still science fiction.
Secondly, there is the free/open culture within universities that almost punishes commercial ventures
I would not have stated it that way. The real reason is that academics hate to leave anything unpublished. If they're constrained by copyright law or some NDA, they can't tell everyone about the fabulous new work they've been doing - or at the very least, it becomes much more difficult.
I worked in bioinformatics at a university for several years, and much of what we did was take existing databases and analyze them, then publish the results online as our own database of annotations. As part of this, we reproduced much of the original database in modified form - and all we had to do was cite the original authors and describe our methods/sources. If the databases we used had not been public, none of these projects would have happened. In some cases, we had to ignore private databases that we had limited access to because we were not allowed to reproduce any of their data.
This is only cultural to the extent that academia thrives on publications. We're not out to punish anyone for trying to make an honest buck (lots of people here collaborate with or consult for companies), but we literally can't afford, professionally, to limit ourselves in accordance with restrictions on databases. So why pay money for something we can't legally use in the manner to which we're accustomed?
Genomes are available at http://www.ensembl.org/ . I know I've said this before, but I feel it can't be overemphasized. Ensembl is so incredibly cool. I imagine Celera is releasing their data because no one wants to pay for it when Ensembl has it for free. Additionally, Ensembl has tools that provide so much more than just genome sequence-scanning. And they use open source projects like BioPerl and use Wiki for documentation! I think this is just a PR stunt for Celera.
"fist in the air in the land of hypocrisy"
They did swear under oath that they would release their data without restrictions. They also told Congress (under oath) that their strategy would end speculative patenting of the human genome, whereas in fact they've applied for thousands and thousands of speculative patents.
Shame on them.
Good guess, but only part of that sequence is actually in the human genome, in chromosome 20 (with one error):
Query: 103 catcagctactatgtagctacgatc 127
Sbjct: 84163 catcagctactttgtagctacgatc 84187
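For anyone who wants to locate the "one error" without squinting at the alignment, here is a minimal sketch in Python. The two strings and the start coordinates (103 and 84163) are copied from the Query and Sbjct lines above; the function name `mismatches` is just an illustrative choice, and the code assumes an ungapped alignment of equal-length strings.

```python
# Aligned segments copied from the BLAST output above.
query = "catcagctactatgtagctacgatc"
sbjct = "catcagctactttgtagctacgatc"
QUERY_START = 103      # coordinate of the first query base in the alignment
SBJCT_START = 84163    # coordinate of the first subject base in the alignment


def mismatches(a, b):
    """Return (offset, base_in_a, base_in_b) for every position that differs.

    Assumes an ungapped alignment, so both strings must be the same length.
    """
    if len(a) != len(b):
        raise ValueError("ungapped alignment expects equal-length strings")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatches(query, sbjct)
identity = (len(query) - len(diffs)) / len(query)

for offset, q_base, s_base in diffs:
    print(f"query {QUERY_START + offset}: {q_base}  vs  "
          f"subject {SBJCT_START + offset}: {s_base}")
print(f"identity: {len(query) - len(diffs)}/{len(query)} = {identity:.0%}")
```

The offsets are 0-based within the aligned segment, so adding them to the start coordinates gives the positions in the original sequences.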
The quality of match is rated at E=0.65, which means that you would expect about 0.65 matches this good to turn up by chance in a search of this size, so the hit is not significant. (E value will change slightly if you search different databases.)
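A small aside on reading that number: strictly speaking, an E value is an expected count of chance hits, not a probability. Under the usual assumption that chance hits follow a Poisson distribution, the probability of seeing at least one hit this good is 1 - exp(-E). The sketch below does that conversion; the function name is made up for illustration and is not part of any BLAST tool.

```python
import math


def prob_at_least_one_hit(e_value):
    """Convert a BLAST-style E value into P(at least one chance hit).

    Assumes the number of chance hits is Poisson-distributed with mean
    equal to the E value, so P = 1 - exp(-E).
    """
    if e_value < 0:
        raise ValueError("E value cannot be negative")
    # expm1 keeps precision when the E value is very small.
    return -math.expm1(-e_value)


p = prob_at_least_one_hit(0.65)
print(f"E = 0.65 -> P(at least one chance hit) = {p:.3f}")
```

For very small E values the two numbers are nearly identical, which is why they are often used interchangeably; they only drift apart once E gets close to 1 or above.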
Try searching for the sequence yourself here under Nucleotide-nucleotide BLAST (blastn)
If you want to see the real thing, you can browse one version of the "real" human genome here. If you click on the blue chromosome 1, and then "Download/View Sequence/Evidence", then "display", you can see the repeating "telomere" sequence at the beginning of chromosome 1.
Both sides had a difficult time assembling the sequence. Celera's data was of higher quality because their method provided for better coverage AND they could use the public data to clear up any ambiguities.