Slashdot Mirror


Celera Opens Up DNA Database

greenplato writes "Thirty billion base pairs from the sequences of humans, mice, and rats that were available only by subscription to Celera's DNA database are being put into the public domain. Celera will donate this information to a 'federally run database,' presumably GenBank. Francis Collins, head of the National Human Genome Research Institute, notes that 'data just wants to be public.' Stories in BusinessWeek and The New York Times."

3 of 181 comments

  1. Re:Free data - or unable to sell it? by the gnat · · Score: 4, Informative

    Secondly, there is the free/open culture within universities that almost punishes commercial ventures

    I would not have stated it that way. The real reason is that academics hate to leave anything unpublished. If they're constrained by copyright law or some NDA, they can't tell everyone about the fabulous new work they've been doing - or at the very least, it becomes much more difficult.

    I worked in bioinformatics at a university for several years, and much of what we did was take existing databases and analyze them, then publish the results online as our own database of annotations. As part of this, we reproduced much of the original database in modified form - and all we had to do was cite the original authors and describe our methods/sources. If the databases we used had not been public, none of these projects would have happened. In some cases, we had to ignore private databases that we had limited access to because we were not allowed to reproduce any of their data.

    This is only cultural to the extent that academia thrives on publications. We're not out to punish anyone for trying to make an honest buck (lots of people here collaborate with or consult for companies), but professionally we literally can't afford to limit ourselves to whatever restrictions a database imposes. So why pay money for something we can't legally use the way we're accustomed to?

  2. It's already free by jezmund · · Score: 4, Informative

    Genomes are available at http://www.ensembl.org/ . I know I've said this before, but I feel it can't be overemphasized: Ensembl is so incredibly cool. I imagine Celera is releasing their data because no one wants to pay for it when Ensembl has it for free. Ensembl also has tools that do much more than scan genome sequence. And they use open source projects like BioPerl and a wiki for documentation! I think this is just a PR stunt for Celera.

    --

    "fist in the air in the land of hypocrisy"
  3. Re:In case it gets slashdotted.... by jcomand · · Score: 4, Informative

    Good guess, but only part of that sequence is actually in the human genome, on chromosome 20 (with one mismatch):
    Query: 103   catcagctactatgtagctacgatc 127
                 ||||||||||| |||||||||||||
    Sbjct: 84163 catcagctactttgtagctacgatc 84187
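    To pin down the single mismatch in that hit, here is a minimal Python sketch that compares the two aligned strings position by position. It assumes the alignment is ungapped (true for this hit) and that both coordinates run forward (plus/plus strands); the helper name `compare` is hypothetical and not part of BLAST.

```python
# Compare the ungapped blastn hit quoted above, base by base.
query = "catcagctactatgtagctacgatc"   # Query positions 103-127
sbjct = "catcagctactttgtagctacgatc"   # Sbjct positions 84163-84187
QUERY_START, SBJCT_START = 103, 84163

def compare(a: str, b: str):
    """Return (identities, 0-based mismatch offsets) for two equal-length aligned strings."""
    if len(a) != len(b):
        raise ValueError("an ungapped alignment needs strings of equal length")
    mismatches = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return len(a) - len(mismatches), mismatches

identities, mismatches = compare(query, sbjct)
print(f"Identities = {identities}/{len(query)} ({100 * identities // len(query)}%)")
for i in mismatches:
    # Map the offset back to the coordinates BLAST printed (assumes plus/plus strands).
    print(f"mismatch: query {QUERY_START + i} '{query[i]}' vs subject {SBJCT_START + i} '{sbjct[i]}'")
```

    Running it prints "Identities = 24/25 (96%)" followed by "mismatch: query 114 'a' vs subject 84174 't'".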
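    The quality of the match is rated at E=0.65, which means that in a search of this size you would expect to find about 0.65 matches this good purely by chance; in other words, it is not a significant hit. (The E-value will change if you search a different database, because it scales with the size of the database searched.)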
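    An E-value is an expected count of chance hits, not a probability. Under the Poisson assumption used by BLAST statistics, it converts to the probability of seeing at least one chance hit this good via P = 1 - e^(-E); for small E the two are nearly equal. A minimal Python sketch using the E=0.65 figure from the search above (the function name `evalue_to_pvalue` is hypothetical):

```python
import math

def evalue_to_pvalue(e: float) -> float:
    """Probability of at least one chance hit scoring this well,
    assuming chance hits are Poisson-distributed with mean e."""
    return 1.0 - math.exp(-e)

e = 0.65
print(f"E = {e}: expect about {e} chance hits; P(at least one) = {evalue_to_pvalue(e):.2f}")
```

    This prints "E = 0.65: expect about 0.65 chance hits; P(at least one) = 0.48", so a hit like this one would turn up by chance in roughly half of such searches.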
    Try searching for the sequence yourself here, under Nucleotide-nucleotide BLAST (blastn).

    If you want to see the real thing, you can browse one version of the "real" human genome here. Click the blue chromosome 1, then "Download/View Sequence/Evidence", then "display", and you can see the repeating telomere sequence at the beginning of chromosome 1.
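    Human telomeres are built from tandem repeats of the hexamer TTAGGG. Depending on which strand a browser displays, the repeat can appear as its reverse complement, CCCTAA. A minimal Python sketch (the function name `longest_telomeric_run` is hypothetical) that reports the longest uninterrupted run of either motif in a pasted sequence:

```python
import re

def longest_telomeric_run(seq: str, motif: str = "TTAGGG") -> int:
    """Length in bases of the longest tandem run of `motif` or its reverse complement."""
    complement = str.maketrans("ACGT", "TGCA")
    revcomp = motif.translate(complement)[::-1]   # TTAGGG -> CCCTAA
    seq = seq.upper()
    best = 0
    for m in (motif, revcomp):
        # (?:m)+ matches one or more back-to-back copies of the motif.
        for run in re.finditer(f"(?:{m})+", seq):
            best = max(best, len(run.group(0)))
    return best

print(longest_telomeric_run("acgt" + "ccctaa" * 5 + "gattaca"))
```

    The example prints 30 (five copies of CCCTAA). Runs are counted only in the motif's exact phase, so a stretch starting mid-repeat is measured from its first complete copy.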