Slashdot Mirror


Encrypted DNA Storage Investigated by DOE Researchers (darkreading.com)

Biological engineers at a Department of Energy lab "are experimenting with encrypted DNA storage for archival applications." Slashdot reader ancientribe shares an article from Dark Reading: Using this method, the researchers could theoretically store 2.2 petabytes of information in one gram of DNA. That's 200 times the printed material at the Library of Congress... Instead of needing a 15,000 square-foot building to store 35,000 boxes of inactive records and archival documents, Sandia National Laboratories can potentially store information on much less paper, in powder form, in test tubes or petri dishes, or even as a bacterial cell... "Hard drives fail and very often the data can't be recovered," explains Bachand. "With DNA, it's possible to recover strands that are 10,000 to 20,000 years old... even if someone sneezes and the powder is lost, it's possible to recover all the information by just recovering one DNA molecule."

42 comments

  1. Mutation by Anonymous Coward · · Score: 1, Insightful

    You'd need robust error detection and correction because of mutation and damage.

    But copying seems trivial.

    1. Re:Mutation by Anne+Thwacks · · Score: 1

      LTO tape seems a better choice if all you want is long term storage.

      --
      Sent from my ASR33 using ASCII
    2. Re:Mutation by ShanghaiBill · · Score: 4, Funny

      You'd need robust error detection and correction because of mutation and damage.

      We already have that. There are a few billion years of prior art.

      But copying seems trivial.

      The hard part is writing the device driver to interface the ribosome to /dev/dna.

    3. Re:Mutation by TFAFalcon · · Score: 3, Insightful

      I think mutation isn't really that much of an issue if the DNA isn't actually doing anything (being duplicated or transcribed to RNA).

      It's supposed to be one of the more stable ways of storing data, much better than tape in fact. What I'd be more worried about is reading it again - current ways of reading DNA can misread it and have problems with long sequences of the same base pair, so some kind of an encoding to avoid those would be needed.

    4. Re:Mutation by Theaetetus · · Score: 1

      But copying seems trivial.

      The hard part is writing the device driver to interface the ribosome to /dev/dna.

      Will we rephrase Darwin Awards as storing to /dev/null?

    5. Re:Mutation by Anonymous Coward · · Score: 0

      The linked article does not mention tape storage but a very vague comparison "can stay intact ... approximately 10 times longer than storage media used in the past like floppy discs or CD".

      I claim that I'm much better at reading than you in fact.

    6. Re:Mutation by Anonymous Coward · · Score: 0

      1) Forward error correction helps. Even a CD-ROM/DVD/BluRay wouldn't work vaguely as well as it does without it.

      2) The sequences with long stretches of the same base pair are quite rare, so you don't lose that much by avoiding them: using each base pair to code for 2 bits, you'd need a 4-mer to encode 8 bits; if you avoid the AAAA, AAA*, *AAA, TTTT, TTT*, *TTT, etc. combinations, you'll basically lose less than 5% of your encoding space and eliminate any stretch of base pairs larger than 2. If you only need to eliminate stretches larger than 3, you'll only lose 1.5% of your encoding space. No biggie.

      3) You're not going to store a *single copy* of your DNA; you're always going to store millions, billions, trillions of copies, and stored in different aliquots, in different physical locations. Having redundant backups works the same with magnetic or chemical storage. So, the same way RAM, SSDs and plain-old hard drives have write/read error rates and failure rates that have to be accounted for and "engineered around", DNA storage would have to work the same. DNA read errors can be estimated, so then you just have to maintain enough redundancy to "ensure" data recovery.

      The question is not whether it can be made as good and as reliable (or even better) than magnetic storage: the question is if the benefits outweigh the costs of developing and applying such technology.
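The encoding-space figures in point 2 can be checked by brute force. This is a minimal sketch, assuming runs are only counted inside a single 4-mer (runs that span two adjacent 4-mers are ignored); the names `longest_run` and `fraction_lost` are made up for the illustration.

```python
from itertools import product

BASES = "ACGT"

def longest_run(seq):
    """Length of the longest stretch of one repeated base in seq."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def fraction_lost(k, max_run):
    """Share of all k-mers that contain a run longer than max_run."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    rejected = [s for s in kmers if longest_run(s) > max_run]
    return len(rejected) / len(kmers)

# Dropping every 4-mer with a run of 3 or more (AAAA, AAA*, *AAA, ...)
lost_runs_over_2 = fraction_lost(4, 2)
# Dropping only the four 4-mers made of a single base (AAAA, CCCC, ...)
lost_runs_over_3 = fraction_lost(4, 3)
print(f"{lost_runs_over_2:.1%} {lost_runs_over_3:.1%}")
```

Under this counting, 28 of the 256 possible 4-mers contain a run of 3 or more (about 11%), and 4 of 256 contain a run of 4 (about 1.6%).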

    7. Re:Mutation by TFAFalcon · · Score: 1

      It claims DNA can remain stable for more than 500 years. And the life expectancy of tape only appears to be 30 years or so (found from other sources).

      What I find surprising is that (printed) CDs don't have a much longer lifespan, but it seems they are prone to corrosion.

    8. Re:Mutation by Anonymous Coward · · Score: 0

      it isn't mutation, it is spontaneous chemical degradation - ozone, hydrogen peroxide, oxygen radical, stuff like that causes DNA to slowly degrade
      In your body, in every cell, every minute, there are dozens of DNA repair enzymes repairing damage that occurs every minute; look up 8-oxoguanine if you don't believe me
      Dry storage helps, but drying makes long DNA very hard to resuspend

      here
      http://www.ncbi.nlm.nih.gov/pubmed/20399712
      The estimated number of chemical events causing just 8-oxoguanine, out of hundreds of possible chemical alterations, is about 1,000 per cell per day
      we think DNA is stable; this is partially an illusion because in all living organisms there are very robust error correction mechanisms

    9. Re:Mutation by healyp · · Score: 1

      I read a similar article in ACM Communications. In order to combat the misreads I believe they were suggesting using base-3 with parity encoding.
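For illustration only, here is one way a base-3 code with parity could look. It is a sketch under assumptions, not the scheme from the article: each trit (0, 1 or 2) selects one of the three bases that differ from the previous base, so the same base never appears twice in a row, and a single mod-3 check trit is appended. The function names and the starting base `"A"` are hypothetical.

```python
BASES = "ACGT"

def add_parity(trits):
    """Append one check trit so that the sum of all trits is 0 mod 3."""
    return trits + [(-sum(trits)) % 3]

def parity_ok(trits):
    """True if the mod-3 checksum still holds."""
    return sum(trits) % 3 == 0

def trits_to_dna(trits, prev="A"):
    """Map each trit to one of the three bases that differ from the last one."""
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]  # always exactly 3 options
        prev = choices[t]
        out.append(prev)
    return "".join(out)

def dna_to_trits(dna, prev="A"):
    """Inverse mapping; raises ValueError if a base repeats its predecessor."""
    trits = []
    for b in dna:
        choices = [c for c in BASES if c != prev]
        trits.append(choices.index(b))
        prev = b
    return trits

message = [0, 0, 0, 2, 1]
strand = trits_to_dna(add_parity(message))
decoded = dna_to_trits(strand)
print(strand, decoded, parity_ok(decoded))
```

A single misread base changes one or two decoded trits, so the checksum catches many (not all) such errors; a real scheme would need stronger error correction.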

  2. And nobody around to honor the warranty by Anonymous Coward · · Score: 0

    OTOH they are the DOE so maybe they can sustain a 20000 year project?

  3. Oh, great... by Anonymous Coward · · Score: 0

    DMCA takedown notice for getting infected with a Kanye West "Music" Bacteria...

  4. The sheer scale of it by dhaen · · Score: 2

    I deal in archiving film and video by the petabyte. At a storage symposium a couple of years ago I met my equivalent in the DNA research sphere, his data requirements blew me away. And all encoded in my cells.

    1. Re:The sheer scale of it by Anonymous Coward · · Score: 0

      Fortunately, the design document isn't available.

      Too much tech industry disruption, having something that advanced.

      Probably we'd have to start with... a simplified allegorical presentation, or something...

    2. Re:The sheer scale of it by ShanghaiBill · · Score: 1

      At a storage symposium a couple of years ago I met my equivalent in the DNA research sphere, his data requirements blew me away. And all encoded in my cells.

      If he is storing human DNA data, he is doing something wrong. A human has about 4 billion base pairs, which are roughly 2 bits each, so that is 500 MB. You could fit that on a CDROM with room to spare. But humans share 99% of their DNA, so you would really only need to store the diffs. 1% is 5 MB. But even that is overkill, since humans don't differ from each other randomly, but in common sequences where you have either one sequence or another across wide segments of the population. So a human's DNA can probably be stored in less than a megabyte per person, and with good compression maybe far less.

    3. Re:The sheer scale of it by dhaen · · Score: 1

      I take your point, though I'd argue that 700MB is closer. Also, both he and I agreed that bit-rot made compression moot. What I don't remember - it was 5 years ago - was how many records he mentioned.

    4. Re:The sheer scale of it by dhaen · · Score: 1

      OK it was between 2 and 5 years ago...

    5. Re:The sheer scale of it by Anonymous Coward · · Score: 0

      Nobody said anything about human. This DNA would not make an animal or plant. It is simply data encoded in a DNA molecule. Make the string as long or short as required. Up to some theoretical maximum per strand, whatever that may be, but it has nothing to do with humans.

    6. Re:The sheer scale of it by Anonymous Coward · · Score: 0

      Perhaps you should start your post from a simplified allegorical foundation, or something. Because although I think I know what you might be trying to say, your rambling is nigh incomprehensible.

    7. Re:The sheer scale of it by Anonymous Coward · · Score: 0

      How many cells do you have in your body? Do they all have identical DNA? Until you can answer these questions, try to keep dumb comments to yourself.

    8. Re:The sheer scale of it by SNRatio · · Score: 2

      Sequencing DNA these days means creating a library of millions of short segments (100-300 bp) of DNA from your sample, and then assembling the data into longer fragments by finding the segments that overlap and stringing them together. To sequence 4 billion base pairs they actually read about 120 billion base pairs (multiple reads are needed to eliminate errors, generate overlaps, etc). And that raw data is not 2 bits per base: it's an intensity level from the machine and a probability score that the algorithm has called the correct base for that position in the image, plus all of the associated indexing. About 40 bits per "base" for Illumina sequencing. Illumina X-10 sequencers can generate ~10 petabytes of data per year - each.

      The final archived data, what you might use for clinical purposes, could indeed be a diff file more or less. But in the meantime the world was generating 1 zettabyte of DNA sequencing data per year in 2015, and the rate doubles every ~7 months.

      http://www.ncbi.nlm.nih.gov/pm...
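The gap between "2 bits per base" and the raw sequencer output follows from the numbers in this comment. A rough back-of-envelope sketch, using only the figures quoted above (4 billion base pairs, 120 billion bases read, 40 bits per raw base); the variable names are illustrative.

```python
GENOME_BP = 4_000_000_000        # genome size as used in this thread
BASES_READ = 120_000_000_000     # bases actually read, per the comment
BITS_PER_RAW_BASE = 40           # intensity, quality score, indexing
BITS_PER_PACKED_BASE = 2         # A/C/G/T with nothing else

coverage = BASES_READ / GENOME_BP                      # reads per position
raw_bytes = BASES_READ * BITS_PER_RAW_BASE // 8        # raw run size
packed_bytes = GENOME_BP * BITS_PER_PACKED_BASE // 8   # final sequence only

print(f"coverage: {coverage:.0f}x")
print(f"raw data: {raw_bytes / 1e9:.0f} GB")
print(f"packed sequence: {packed_bytes / 1e9:.0f} GB")
print(f"ratio: {raw_bytes // packed_bytes}x")
```

With these inputs the raw data comes out several hundred times larger than the packed sequence, which is the point the comment makes.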

    9. Re:The sheer scale of it by Anonymous Coward · · Score: 0

      That's okay. The maximum of you possibly retained being your DNA pattern, leaves your comment at the appropriate level of irrelevancy.

    10. Re:The sheer scale of it by Anonymous Coward · · Score: 0

      Multiply it by infinity, and take it to the depth of forever, and you will still have barely a glimpse of what I'm talking about

    11. Re: The sheer scale of it by Anonymous Coward · · Score: 0

      Hey. All of this has happened before. Just ask the Cylons.

    12. Re:The sheer scale of it by RDW · · Score: 1

      A human has about 4 billion base pairs, which are roughly 2 bits each, so that is 500 MB. You could fit that on a CDROM with room to spare. But humans share 99% of their DNA, so you would really only need to store the diffs. 1% is 5 MB.

      A copy of the (haploid) reference genome encoded as 2 bits per base pair comes in at about 800MB:

      http://hgdownload.soe.ucsc.edu...

      Run that through something like Z-zip and you can store it in less than 640MB, so it will indeed fit on a CD. Each of us has a diploid genome, though (a copy from each parent), so you really need to store double that if you take no account of the high level of similarity between both copies. If we assume a known reference genome, however, the 'diffs' are as you suggest very small - one paper reports compression down to 4MB, small enough to email:

      http://www.ncbi.nlm.nih.gov/pu...

      Lots of analyses are done with lists of variants with respect to a reference genome, but the raw data generally comes from 'next generation sequencing' platforms, where every base needs to be sequenced many times over before bases can be called confidently, and quality scores of base calls need to be stored. The raw data usually needs to be kept since alignment and variant calling algorithms are improving all the time. Storage requirements are something like 80GB+ compressed.
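The 2-bits-per-base-pair encoding mentioned above is simple to sketch. The bit assignment (A=00, C=01, G=10, T=11) and the function names are assumptions made for the example, not the layout of any particular file format, and the 3.2 billion base count is picked only because it reproduces the roughly 800MB figure.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit assignment
LETTER = "ACGT"

def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out)

def unpack(data, n):
    """Recover the first n bases from packed bytes."""
    return "".join(
        LETTER[(data[i // 4] >> (6 - 2 * (i % 4))) & 3] for i in range(n)
    )

def packed_size(n_bases):
    """Bytes needed for n_bases at 2 bits each."""
    return (n_bases + 3) // 4

seq = "GATTACA"
packed = pack(seq)
print(len(packed), unpack(packed, len(seq)))
print(packed_size(3_200_000_000))   # assumed haploid length
```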

  5. Harry Harrison had it decades ago by msk · · Score: 1

    see subject

  6. "Recover all data by recovering a single molecule" by Anonymous Coward · · Score: 0

    Sounds suspiciously like xor compression.

  7. But we'll ban encryption by Anonymous Coward · · Score: 0

    So then what?

  8. DNA storage capacity seems to be wildly overstated by ffkom · · Score: 0

    Whenever the press covers the "data storage in DNA"-topic, they boast about huge storage capacities based on the assumption that you can basically store 2 bits per base pair. But DNA has not quite evolved to be a long-term mass-storage device. DNA is rather an energy-efficient way to store relatively small amounts of data (~0.8 GB of very redundant data in a human) that exists in so many copies (billions in a human) that it doesn't matter too much if millions of those billions of copies suffer some "bit rot" over time, and also the DNA storage needs a living organism around it to sustain constantly ongoing activities to repair or sort out damaged data. Also, DNA is meant to be variable over time, as mutation is important for ongoing success of a species.

    I don't think that DNA based storage will ever beat simple, inorganic storage in terms of providing reliable long-term mass storage. It's just not optimized for that purpose.

  9. Re:DNA storage capacity seems to be wildly oversta by Anonymous Coward · · Score: 0

    It's kind of like SOLAR ROADWAYS. SOLAR FREAKING ROADWAYS. (Maybe also flying cars.) All Bullshit ideas if you're an engineer.

  10. Re:DNA storage capacity seems to be wildly oversta by jcochran · · Score: 1

    True enough. Although looking at the figures given in the summary, there's one hell of a lot of redundancy in their 2.2 petabyte/gram estimate. Looking up the molecular masses of the base pairs plus the sugar chain to make up a DNA molecule and assuming 2 bits per base pair, I get approximately 160,000 petabytes per gram of material (no redundancy), so the estimate given in the summary has a redundancy factor of about 73,000.
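The estimate can be redone in a few lines. This sketch assumes an average mass of about 650 g/mol per base pair (a common rule-of-thumb value, not necessarily the figure the comment used) and decimal petabytes; the function name is illustrative.

```python
AVOGADRO = 6.02214076e23   # molecules per mole

def petabytes_per_gram(bp_mass_g_per_mol=650.0, bits_per_bp=2):
    """Raw capacity of one gram of double-stranded DNA, no redundancy."""
    base_pairs = AVOGADRO / bp_mass_g_per_mol   # base pairs in one gram
    total_bytes = base_pairs * bits_per_bp / 8
    return total_bytes / 1e15                   # decimal petabytes

raw = petabytes_per_gram()
print(f"raw capacity: {raw:,.0f} PB per gram")
print(f"implied redundancy vs 2.2 PB/g: {raw / 2.2:,.0f}x")
# Ratio using the comment's own 160,000 PB/g figure:
print(f"comment's ratio: {160_000 / 2.2:,.0f}x")
```

The result lands in the same order of magnitude as the figure above; the exact value shifts with the mass assumed per base pair and with whether a petabyte is counted as 10^15 or 2^50 bytes.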

  11. Too long to handle? by Razed+By+TV · · Score: 1

    The internet tells me that human genome weighs 3.59 x 10^-12 grams.
    1 gram of dna * 1 complete strand of dna / (3.59 x 10^-12 grams) = 278 x 10^9 strands = 278,000,000,000 strands of dna.

    Length of human dna stretched out: about 2 meters
    (278 x 10^9 strands) * (2 meters / strand) = 556 x 10^9 meters

    I can't conceive of how you can organize that in order to read it.

    Then again, I don't know the length of a blu-ray, if you could unravel it and stretch it out straight. Or that of a record.

    1. Re:Too long to handle? by Anonymous Coward · · Score: 0

      Well, since a blu-ray has an areal density of about 12.5 Gb per square inch and contains about 25 GB of data, it's a simple matter of doing the math. Going to assume the lineal density is the square root of the areal density. Then do a bit of simple multiplication and division. The end result is about 28.25 miles ... or to put into metric 45.5 km.
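Spelled out, the math in this reply looks roughly like the sketch below. It uses the same inputs and the same simplifying assumption (lineal density equals the square root of areal density); the variable names are illustrative.

```python
import math

AREAL_DENSITY = 12.5e9        # bits per square inch, from the comment
CAPACITY_BITS = 25e9 * 8      # 25 GB expressed in bits
INCHES_PER_MILE = 63_360
METERS_PER_INCH = 0.0254

lineal_density = math.sqrt(AREAL_DENSITY)       # bits per inch (assumed)
track_inches = CAPACITY_BITS / lineal_density   # total track length
miles = track_inches / INCHES_PER_MILE
km = track_inches * METERS_PER_INCH / 1000

print(f"{miles:.1f} miles, {km:.1f} km")
```

This comes out within rounding of the numbers quoted in the reply.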

  12. Store it. Do nothing about it. Die. Go to Hell. by Anonymous Coward · · Score: 0

    Not useful data to people.

  13. Already done by TeknoHog · · Score: 1

    It's called "junk DNA".

    --
    Escher was the first MC and Giger invented the HR department.
  14. Great by Anonymous Coward · · Score: 0

    Now everyone is potentially possessing child porn or terrorist documents.

  15. Quartz Glass by simpz · · Score: 3, Informative

    Is the potential of Quartz Glass Storage for archiving not better? http://themindunleashed.org/2014/02/data-storage-crystal-quartz-will-change-everything.html Stable for longer, and it won't get eaten by bacteria.

  16. DNA degrades after just a few years by Tony+Isaac · · Score: 2

    I work for a DNA lab. After about 10 years, DNA samples that have been sent to us are basically unusable because they degrade over time. Sure, it might be possible to still read some strands of the remaining DNA, but significant percentages are lost. DNA archaeologists don't mind, because they are looking for whatever fragments they can still read. But if they required most of the DNA to be readable after long periods of time, they would be out of luck.

    1. Re:DNA degrades after just a few years by Anonymous Coward · · Score: 1

      Reading around a bit I think you must be receiving DNA from live samples? Every article on this subject refers to various materials in organisms that will be mixed with samples that will cause DNA to degrade.

      You do however read about ideal conditions. Those would be the conditions these DNA data storage schemes are talking about. The DNA is synthesized and the end product is just the DNA.

      Also, the lengths involved aren't going to be huge in the schemes I've read about. You're going to have lots of short bits of maybe 200bp. Part of each bit is going to be the "address" and all these schemes involve error correction methods.

  17. DNA reading techniques require massive redundancy by Tony+Isaac · · Score: 1

    Today's DNA reading techniques begin with PCR, a process that multiplies small amounts of DNA so that millions of copies are made. These copies are needed to be accurately read by the equipment, in order to distinguish between "good" copies and noise. Getting the results amounts to statistical analysis of the number of A, T, C, or G results read at a certain location; a "call" can be made only if a high enough percentage of the results agree.

    The bit density claims are massively overstated, and reading the data would not be trivial!
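The "call" step described above is essentially a per-position vote with a threshold. A minimal sketch, assuming the reads are already aligned and of equal length; the 90% default threshold and the function names are made up for the example, and "N" is used for a position where no call can be made.

```python
from collections import Counter

def call_base(column, min_fraction=0.9):
    """Return the majority base if enough reads agree, else 'N' (no call)."""
    base, count = Counter(column).most_common(1)[0]
    return base if count / len(column) >= min_fraction else "N"

def consensus(reads, min_fraction=0.9):
    """Call every position across a set of aligned, equal-length reads."""
    return "".join(call_base(col, min_fraction) for col in zip(*reads))

reads = ["ACGT", "ACGT", "ACGA", "ACGT"]   # third read has one misread base
print(consensus(reads, min_fraction=0.7))  # 3 of 4 agree at the last position
print(consensus(reads, min_fraction=0.9))  # same data, stricter threshold
```

The stricter the threshold, the more copies have to be read before a position can be called, which is where the redundancy cost comes from.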

  18. doesn't anyone know the error rate in sequencing ? by Anonymous Coward · · Score: 0

    to recover the information, you have to sequence the DNA
    That is, determine the physical order of A,T,C,G
    The error rates are high with current technology
    and they are a LOT Higher if you start with one molecule of DNA

    When people talk about recovering 10,000 year old DNA, they are talking 50% -80% recovery
    is that really what you want for storage of financial records ?

    I assert that anyone using DNA for industrial purposes is stupid or a grant whore

    and yes, I know about illumina, PACBIO, Oxford nanopore, etc
    I know about WGA error rates, and deamination and all that good stuff

  19. 10,000 times slower/costly to write than read DNA by peter303 · · Score: 1

    Mainly because scientists have focused on reading and invented clever technologies to do so. The guy who made the reading breakthrough, Craig Venter, is also a writing pioneer in his synthetic biology work. Earlier this year there was a secret meeting run by a Harvard prof to launch the DNA-WRITE project to improve write technology. The meeting was secret because it was feared that anti-GMO groups and general Frankenstein fear might quash the efforts prematurely. P.S. Some computer write memory technologies are also much slower than reading.