Slashdot Mirror


Genetic Database Hits One Billion Entries

ChocSnorfler writes to tell us that the Sanger Institute is reporting that their Genetic Record Database has hit one billion entries, making it the world's largest. From the announcement: "The Trace Archive is a store of all the sequence data produced and published by the world scientific community, including the Sanger Institute's own prodigious output as a world-leading genomics institution. To grasp how much data is in the Archive, if it were printed out as a single line of text, it would stretch around the world more than 250 times. Printing it out on pages of A4 would produce a stack of paper two-and-a-half times as high as Mount Everest. The Archive is 22 Terabytes in size and doubling every ten months."

11 of 189 comments

  1. Dubious claims by Dr. Photo · · Score: 3, Interesting

    if it were printed out as a single line of text, it would stretch around the world more than 250 times. Printing it out on pages of A4 would produce a stack of paper two-and-a-half times as high as Mount Everest.

    Such claims should be taken with a grain of salt until they reveal what fonts and point sizes they use.

  2. How do they map their function? by bubulubugoth · · Score: 1, Interesting

    This is a real question...

    How do the scientists do that?

    Do they wiggle this gene and see what happens?
    How do they apply the "scientific method" of experimentation?

  3. Re:For God's sake, don't print it! by kahanamoku · · Score: 3, Interesting

    Printing would be an issue in itself.

    By the time you successfully print the 22TB of data, you would no doubt have passed the 10-month doubling threshold. Once you start printing, you'd never stop!

    Then again, it's a new challenge for Epson/HP etc.: develop a printer robust enough to print a paper Mount Everest!

    --
    ----- Concentrate on promoting more than demoting.
  4. Re:i love meaningless data by borisborf · · Score: 3, Interesting

    Well, according to Wikipedia, it is estimated that the print holdings of the Library of Congress would, if digitized and stored as plain text, constitute 17 to 20 terabytes of information. Remember, this is without images or diagrams. Just plain text.
    So this is roughly the size of the TEXT in the Library of Congress.

  5. Do the math by Kickboy12 · · Score: 2, Interesting

    1 billion entries = ~22 Terabytes
    1 billion x 1,000 Bytes = ~0.9 Terabytes

    Which means, on average, your genetic code can be stored in 22KB.

    Just an interesting thought.
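The division behind the 22KB figure can be sketched as follows. The two inputs are the figures quoted in the announcement; treating a terabyte as 10^12 bytes is an assumption, and the result is the average size of one archive entry, whatever an entry turns out to represent.

```python
# Figures quoted in the announcement; decimal (SI) units are an assumption.
ARCHIVE_BYTES = 22 * 10**12   # 22 terabytes
ENTRIES = 10**9               # one billion entries

# Average storage per archive entry, in bytes.
bytes_per_entry = ARCHIVE_BYTES / ENTRIES

print(f"{bytes_per_entry / 1000:.0f} KB per entry")  # prints "22 KB per entry"
```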

    1. Re:Do the math by Wabin · · Score: 3, Interesting
      Except that each entry is not an individual. It is usually a trace from a sequencing rig, which means that it is usually 500-1000 bases of sequence (with a bunch of other info there as well... it is not just the As, Ts, Gs and Cs, but also sequence quality and such). The human genome is roughly 3 billion bases. So they have the equivalent of, say, 200x the genome of an individual. Of course, the data they have is probably much more concentrated on some areas, with thousands of traces for some and very few for others.

      Anyway, the point is you are not about to be able to fit a genome on a floppy disk. Not even close.

      --
      Most exciting phrase in science: not "Eureka!" but "Hmm... That's funny..." -Asimov (abridged for \. limits)
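The "say 200x" estimate in the reply above can be checked with a rough sketch. It assumes every one of the billion entries is a trace of 500 to 1000 bases and that all of them map onto a single 3-billion-base genome. The reply itself notes that coverage is uneven, so this is only an order-of-magnitude check.

```python
# Rough coverage estimate under the simplifying assumptions stated above.
TRACES = 10**9                 # one billion entries, each treated as one trace
GENOME_BASES = 3 * 10**9       # approximate size of the human genome

def coverage(bases_per_trace: int) -> float:
    """Total sequenced bases divided by genome size (fold coverage)."""
    return TRACES * bases_per_trace / GENOME_BASES

low = coverage(500)     # shortest typical trace length mentioned in the reply
high = coverage(1000)   # longest typical trace length mentioned in the reply

print(f"between {low:.0f}x and {high:.0f}x")
```

Under these assumptions the range brackets the 200x figure given in the reply.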
  6. Re:22TB is nothing. by Endymion · · Score: 2, Interesting

    Seriously... I've personally added at least that much to NCBI's archive...

    I guess it depends on what they mean by "genetic data", exactly. If they are including the traces, that's not much.

    --
    Ce n'est pas une signature automatique.
  7. Re:For God's sake, don't print it! by queenb**ch · · Score: 3, Interesting

    If it doubles every 10 months, in about 8 years we should no longer have enough hard drive space to store it.

    2 cents,

    Queen B

    --
    HDGary secures my bank :/
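The growth claim above can be projected with a short sketch. It takes the 22 TB size and the ten-month doubling time from the announcement and assumes the doubling rate stays constant for the full eight years, which is a simplification.

```python
# Exponential growth projection; a constant doubling time is an assumption.
START_TB = 22            # archive size quoted in the announcement
DOUBLING_MONTHS = 10     # doubling time quoted in the announcement
YEARS = 8                # horizon used in the comment above

doublings = YEARS * 12 / DOUBLING_MONTHS      # 9.6 doublings in 8 years
growth_factor = 2 ** doublings
projected_tb = START_TB * growth_factor
projected_pb = projected_tb / 1000            # decimal petabytes

print(f"{doublings} doublings, about {projected_pb:.0f} PB")
```

That works out to a growth factor of several hundred and a projected size in the low tens of petabytes.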
  8. On the other hand... by Chris Snook · · Score: 3, Interesting

    ...the entire database would fit on just one sheet of A(-24) paper. (Yes, I actually did the math.)

    --
    There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
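One way the A(-24) math could go is sketched below. It extends the ISO 216 halving rule to negative sizes, which the standard does not define. It also assumes a sheet thickness of 0.1 mm, single-sided printing, and an Everest height of 8,848 m. None of these come from the comment, so this is a plausibility check rather than a reconstruction of the poster's calculation.

```python
# Each A-series step halves the area, so A(n) holds 2**(4 - n) A4 sheets.
# Negative sizes are an extrapolation of that rule, not part of ISO 216.
def a4_sheets_in(a_size: int) -> int:
    """Number of A4-sized areas that tile one sheet of size A(a_size)."""
    return 2 ** (4 - a_size)

# Stack height from the announcement: 2.5 times Mount Everest.
EVEREST_M = 8848              # assumed height in metres
SHEET_THICKNESS_UM = 100      # assumed 0.1 mm per sheet, in micrometres

stack_um = 25 * EVEREST_M * 10**5            # 2.5 * 8848 m in micrometres
sheets_in_stack = stack_um // SHEET_THICKNESS_UM

a4_sheets_per_big_sheet = a4_sheets_in(-24)
fits = sheets_in_stack <= a4_sheets_per_big_sheet

print(sheets_in_stack, a4_sheets_per_big_sheet, fits)
```

With these assumptions the stack is about 221 million A4 pages, and one A(-24) sheet has the area of about 268 million, so the claim holds.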
  9. Re:The amazing thing is how SMALL it is. by The Step Child · · Score: 4, Interesting

    Just as amazing is that there are only about 25,000 protein-coding genes in the entire human genome (though obviously there are more proteins possible through splicing and post-translational modification, but I digress). Also amazing is the precision with which the chromosomes wind up all that DNA. Imagine taking a piece of yarn miles and miles long and compacting it into something that could fit into a paper bag. Now imagine someone asking you to take out a VERY specific piece of that yarn and expose it from your roll, disturbing the rest of the yarn as little as possible, then put it back exactly as it was before when they're finished with it... that's basically what each chromosome has to do when genes are expressed. And it's all mediated by proteins coded in that very DNA.

  10. 2 columns by Narc · · Score: 2, Interesting

    I can't confirm this, maybe someone can though. I had an Oracle training course last year and the instructor told us she had someone from Sanger working on the human genome stuff, and their database was something daft like 2 columns wide. It was used in an example to explain the intricacies of hot backups and such.

    Interesting if it's true!