Genetic Database Hits One Billion Entries
ChocSnorfler writes to tell us that the Sanger Institute is reporting that their Genetic Record Database has hit one billion entries, making it the world's largest. From the announcement: "The Trace Archive is a store of all the sequence data produced and published by the world scientific community, including the Sanger Institute's own prodigious output as a world-leading genomics institution. To grasp how much data is in the Archive, if it were printed out as a single line of text, it would stretch around the world more than 250 times. Printing it out on pages of A4 would produce a stack of paper two-and-a-half times as high as Mount Everest. The Archive is 22 Terabytes in size and doubling every ten months."
if it were printed out as a single line of text, it would stretch around the world more than 250 times. Printing it out on pages of A4 would produce a stack of paper two-and-a-half times as high as Mount Everest.
Such claims should be taken with a grain of salt until they reveal what fonts and point sizes they use.
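For what it's worth, the "around the world" figure can be turned into an implied character width. This is only a rough sketch: it assumes every byte of the 22 TB prints as one character (the archive holds trace data, so that is certainly too generous), uses decimal terabytes, and takes the equatorial circumference of about 40,075 km. The function name is made up for illustration.

```python
# Rough sketch: how wide would each printed character be if the archive,
# printed as a single line, wrapped the Earth a given number of times?
# Assumption: one printed character per byte, decimal terabytes.

EARTH_CIRCUMFERENCE_KM = 40_075  # equatorial circumference, approximate


def implied_char_width_mm(laps, n_chars):
    """Width per character in mm for a line that circles the Earth `laps` times."""
    line_length_mm = laps * EARTH_CIRCUMFERENCE_KM * 1_000_000  # km -> mm
    return line_length_mm / n_chars


if __name__ == "__main__":
    width = implied_char_width_mm(laps=250, n_chars=22e12)
    print(f"Implied character width: {width:.2f} mm")
```

Under those assumptions the characters come out well under a millimetre wide, so the font question is a fair one.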
This is a real question...
How do the scientists do that?
Do they wiggle this gene and see what happens?
How do they apply the "scientific method" of experimentation?
Printing would be an issue in itself.
By the time you had successfully printed the 22TB of data, you would no doubt have passed the 10-month threshold at which the archive doubles in size. Once you start printing, you'd never stop!
Then again, it's a new challenge for Epson/HP etc.: develop a printer robust enough to print a paper Mount Everest!
----- Concentrate on promoting more than demoting.
Well, according to Wikipedia, it is estimated that the print holdings of the Library of Congress would, if digitized and stored as plain text, constitute 17 to 20 terabytes of information. Remember, this is without images or diagrams. Just plain text.
So this is roughly the size of the TEXT in the Library of Congress.
1 billion entries = ~22 Terabytes
1 billion x 1,000 Bytes = ~0.9 Terabytes
Which means, on average, each entry in the archive takes up about 22KB.
Just an interesting thought.
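The division above can be checked in a couple of lines. A minimal sketch, using only the two figures from the announcement (1 billion entries, 22 Terabytes); whether "Terabyte" means 10^12 or 2^40 bytes is not stated, so both are shown, and the function name is just for illustration.

```python
# Average size of one archive entry, from the announced totals.
# Assumption: the announcement does not say whether "Terabyte" is
# decimal (10**12 bytes) or binary (2**40 bytes), so both are computed.

ENTRIES = 1_000_000_000
ARCHIVE_TB = 22


def avg_bytes_per_entry(archive_tb, entries, binary=True):
    """Return the mean number of bytes per entry."""
    bytes_per_tb = 2**40 if binary else 10**12
    return archive_tb * bytes_per_tb / entries


if __name__ == "__main__":
    for binary in (False, True):
        label = "binary" if binary else "decimal"
        size = avg_bytes_per_entry(ARCHIVE_TB, ENTRIES, binary=binary)
        print(f"{label} terabytes: about {size / 1000:.1f} thousand bytes per entry")
```

Either way the answer lands in the low twenty-thousands of bytes per entry, which matches the 22KB estimate.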
google.slashdot
Seriously... I've personally added at least that much to NCBI's archive...
I guess it depends on what exactly they mean by "genetic data". If they are including the traces, that's not much.
This is not an automatic signature.
If it doubles every 10 months, in about 8 years we should no longer have enough hard drive space to store it.
2 cents,
Queen B
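The doubling claim is easy to project forward. A minimal sketch, assuming the growth stays perfectly exponential at the announced rate (22 TB now, doubling every ten months); the function name is hypothetical, and real growth will not be this tidy.

```python
# Project the archive size forward, assuming steady exponential growth.
# Assumption: the "doubling every ten months" rate holds indefinitely.


def projected_size_tb(current_tb, months, doubling_months=10):
    """Size in TB after `months`, doubling once every `doubling_months`."""
    return current_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    for years in (1, 2, 4, 8):
        size = projected_size_tb(22, months=12 * years)
        print(f"after {years} year(s): about {size:,.0f} TB")
```

Eight years is 96 months, or 9.6 doublings, so the projection ends up in the tens of petabytes.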
HDGary secures my bank
...the entire database would fit on just one sheet of A(-24) paper. (Yes, I actually did the math.)
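The poster's working isn't shown, so here is one way to sanity-check it. A rough sketch, assuming the nominal ISO 216 rule (A0 is one square metre and each step up in number halves the area, so negative numbers double it), one printed character per byte, and decimal terabytes. The function names are made up for illustration.

```python
# How dense would the print have to be for 22 TB to fit on one A(-24) sheet?
# Assumptions: nominal ISO 216 areas (A0 = 1 m^2, each step halves the area),
# one printed character per byte, decimal terabytes.

A0_AREA_M2 = 1.0


def a_series_area_m2(n):
    """Nominal area of an A<n> sheet; negative n gives sheets larger than A0."""
    return A0_AREA_M2 / 2 ** n


def chars_per_a4_equivalent(n, n_chars):
    """Characters that must fit in each A4-sized patch of a single A<n> sheet."""
    a4_patches = a_series_area_m2(n) / a_series_area_m2(4)
    return n_chars / a4_patches


if __name__ == "__main__":
    area_km2 = a_series_area_m2(-24) / 1_000_000
    density = chars_per_a4_equivalent(-24, 22e12)
    print(f"A(-24) covers about {area_km2:.1f} square km")
    print(f"Needed density: about {density:,.0f} characters per A4-sized patch")
```

Whether it "fits" then comes down to how small a font you are willing to read.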
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Just as amazing is that there are only about 25,000 protein-coding genes in the entire human genome (obviously more proteins are possible through splicing and post-translational modification, but I digress). Also amazing is the precision with which the chromosomes wind up all that DNA. Imagine taking a piece of yarn miles and miles long and compacting it into something that could fit into a paper bag. Now imagine someone asking you to take out a VERY specific piece of that yarn and expose it from your roll, disturbing the rest of the yarn as little as possible, then put it back exactly as it was before when they're finished with it. That's basically what each chromosome has to do when genes are expressed. And it's all mediated by proteins coded in that very DNA.
I can't confirm this, though maybe someone else can. I had an Oracle training course last year, and the instructor told us she had once taught someone from Sanger who was working on the human genome stuff, and their database was something daft like 2 columns wide. It was used as an example to explain the intricacies of hot backups and such.
Interesting if it's true!