Slashdot Mirror


Researchers Achieve Storage Density of 2.2 Petabytes Per Gram of DNA

SternisheFan sends news of researchers who encoded an MP3, a PDF, a JPG, and a TXT file into DNA, along with another file that explains the encoding. The researchers estimate the storage density of this technique at 2.2 petabytes per gram (abstract). "We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let's break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn't allow repeats. That way, you would have to have the same error on four different fragments for it to fail – and that would be very rare," said one of the study's authors. "We've created a code that's error tolerant using a molecular form we know will last in the right conditions for 10 000 years, or possibly longer," said another.
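
The no-repeat idea in the quote can be illustrated with a rotating base-3 code. The data is written as base-3 digits (trits), and each trit selects one of the three nucleotides that differ from the previous one, so a run of the same letter cannot occur. The sketch below also cuts the strand into overlapping windows tagged with an index and reverse-complements every other one. It is a minimal illustration under stated assumptions, not the researchers' implementation. The following choices are all hypothetical: the rotation order, the 6-trits-per-byte conversion, the plain-integer index, and every function name. The window size of 100 bases with a 25-base step is also assumed; it was picked so each interior base falls in four windows, echoing the "four different fragments" in the quote.

```python
# Sketch only: the rotation table, fragment sizes and function names are
# illustrative assumptions, not the published encoding.

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_trits(data: bytes) -> list[int]:
    """Write each byte as 6 base-3 digits (3**6 = 729 >= 256), most significant first."""
    trits = []
    for byte in data:
        value, digits = byte, []
        for _ in range(6):
            value, r = divmod(value, 3)
            digits.append(r)
        trits.extend(reversed(digits))
    return trits


def from_trits(trits: list[int]) -> bytes:
    """Inverse of to_trits: read the trits back six at a time."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Each trit selects one of the three bases that differ from the previous base,
    so the output can never contain a run of the same letter."""
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna: str, prev: str = "A") -> list[int]:
    """Decode by asking which of the three allowed bases was used at each step."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def fragments(dna: str, length: int = 100, step: int = 25) -> list[tuple[int, str]]:
    """Overlapping windows tagged with their start offset as a plain-int index.
    With length/step = 4, each interior base lands in four windows; every other
    window is stored reverse-complemented ("going in both directions")."""
    frags = []
    for i, start in enumerate(range(0, len(dna), step)):
        piece = dna[start:start + length]
        frags.append((start, reverse_complement(piece) if i % 2 else piece))
    return frags


strand = trits_to_dna(to_trits(b"Hello, DNA!"))
assert from_trits(dna_to_trits(strand)) == b"Hello, DNA!"
```

Because each base is constrained only by its predecessor, decoding needs nothing but the previous base and the same rotation order. A single misread base therefore affects at most the trit at that position and the one after it, which is where the overlapping copies come in.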

24 of 136 comments

  1. Memory upgrade of the future by DougOtto · · Score: 2, Funny

    Memory upgrade kits of the future could just be a razor blade and a plastic bag. Bleed your own upgrade!

    --
    Solving Unix problems since 1989...
    1. Re:Memory upgrade of the future by Anonymous Coward · · Score: 5, Funny

      So my "thumb drive" will really be my thumb?

  2. Where's the important information? by Bobfrankly1 · · Score: 3, Funny

    How fast does it spin? What's the IOPS on something like that? How fast will Windows 7 boot on it?

  3. New error correction scheme? by JMZero · · Score: 5, Insightful

    That way, you would have to have the same error on four different fragments for it to fail

    I understand they wanted the overall system to be fault tolerant, but it might be better to leave that part to established computer science. I understand DNA might be uniquely prone to certain types of errors or reading problems - but there's a lot of computer science theory (and practice) established here that would likely make the overall system more robust than what looks like a fairly simple redundancy scheme.
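
    The "same error on four different fragments" property amounts to a per-position majority vote over overlapping copies. That is essentially a repetition code, the kind of simple redundancy scheme contrasted here with established error-correcting codes. Below is a minimal Python sketch. It assumes the reads have already been placed by their index and turned back to the forward strand. `consensus` is a hypothetical helper, not the paper's decoder.

```python
from collections import Counter


def consensus(length: int, reads: list[tuple[int, str]]) -> str:
    """Majority vote per position over (start, sequence) reads on the forward strand.

    Positions no read covers become "N". Ties go to the base seen first
    (Counter.most_common keeps insertion order for equal counts), which shows
    how plain repetition can fail when errors coincide.
    """
    votes = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if start + offset < length:
                votes[start + offset][base] += 1
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)


truth = "ACGTACGTAC"
bad = "ACGAACGTAC"  # one substitution at position 3
print(consensus(10, [(0, truth), (0, bad), (0, truth), (0, truth)]))  # one bad copy is outvoted
print(consensus(10, [(0, bad), (0, bad), (0, truth), (0, truth)]))    # a 2-2 tie is not
```

    With four copies, two matching errors already produce a tie that plain voting cannot resolve. Codes such as Reed-Solomon instead add structured parity that corrects a known number of errors per block.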

    --
    Let's not stir that bag of worms...
  4. Call me when they can encode video... by ddxexex · · Score: 5, Funny

    I can't wait to see what happens when a video stored on DNA goes viral...

    *ducks*

    1. Re:Call me when they can encode video... by tragedy · · Score: 3, Informative

      Well, this SMBC comic addresses that, except that it's stored in bacterial DNA.

  5. 0.0005% of potential storage by ShanghaiBill · · Score: 4, Informative

    Each DNA nucleotide has a molecular weight of about 150 g/mol. So a gram of DNA should contain about 6e23/150 = 4e21 bases. At two bits per base, that is 1e21 bytes. These guys are getting 2e15. So, in theory, they are getting only about one part in half a million of the potential storage, or roughly 0.0002%.
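
    The back-of-the-envelope estimate above can be rechecked in a few lines of Python. The constants are the comment's own rough figures (150 g/mol per base, 2 bits per base) plus the reported 2.2 PB/g. A later reply points out that 150 g/mol is too low, so the theoretical figure here is, if anything, an overestimate.

```python
AVOGADRO = 6.022e23      # molecules per mole
MW_PER_BASE = 150        # g/mol per base: the comment's rough figure
BITS_PER_BASE = 2        # four letters -> at most 2 bits per base
ACHIEVED = 2.2e15        # reported density, bytes per gram

bases_per_gram = AVOGADRO / MW_PER_BASE
bytes_per_gram = bases_per_gram * BITS_PER_BASE / 8
fraction = ACHIEVED / bytes_per_gram

print(f"{bases_per_gram:.1e} bases/g, {bytes_per_gram:.1e} bytes/g")
print(f"achieved fraction: {fraction:.1e} ({fraction:.5%})")
```

    Recomputed this way, the achieved fraction comes out at about 2.2e-6, i.e. roughly 0.0002%.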

    1. Re:0.0005% of potential storage by Forty Two Tenfold · · Score: 2
      --
      Upward mobility is a slippery slope - the higher you climb the more you show your ass.
    2. Re:0.0005% of potential storage by reverseengineer · · Score: 3, Informative

      These are artificial DNA oligos, so there shouldn't be any of those sorts of modifications. However, a figure of MW 150 per base leaves out the sugar-phosphate backbone, and doesn't account for this being double-stranded DNA. Molecular weight per base pair should be around 700 g/mol.

      Of course, that's really nitpicking. What really accounts for the low ratio of achieved versus theoretical density is that they made "~1.2x10^7 copies of each DNA string."

      They go on to explain in the supplementary materials that "With the latest platform, up to 244,000 unique sequences are synthesized in parallel and delivered as ~1-10 pmol pools of oligos... In our experiment, three runs were used to synthesize 153,335 designs, leading to the higher figure of ~12-120x10^6 (= 3-30 x 10^-12 x6.02x10^23/153,335)." A more accurate assessment of their coding scheme is that they used 153,335 strings of 117 nucleotides (17,940,195 total) to encode 5,165,800 bits of Shannon information, or about 0.29 bits per nucleotide.
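
      The figures quoted above can be reproduced directly. This sketch assumes, as the quoted formula does, that the pool's molecules are split evenly across the 153,335 designs. `copies_per_design` is a hypothetical helper name.

```python
AVOGADRO = 6.02e23          # as used in the quoted formula
DESIGNS = 153_335           # unique oligo designs synthesized
OLIGO_LEN = 117             # nucleotides per oligo
PAYLOAD_BITS = 5_165_800    # Shannon information encoded


def copies_per_design(pool_pmol: float) -> float:
    """Molecules in the pool divided evenly among the designs."""
    return pool_pmol * 1e-12 * AVOGADRO / DESIGNS


total_nt = DESIGNS * OLIGO_LEN
bits_per_nt = PAYLOAD_BITS / total_nt

for pmol in (3, 30):
    print(f"{pmol} pmol -> {copies_per_design(pmol):.2e} copies per design")
print(f"{total_nt:,} nt, {bits_per_nt:.2f} bits per nucleotide")
```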

      The fact that they made ten million copies of each string is more a current technical limitation of DNA oligo synthesis and automated DNA sequencing than a limit on the efficiency of the encoding itself. With the appropriate technology, you could make a few thousand copies (for appropriate error correction) instead of ten million, and your mass of DNA would be measured in picograms instead of hundreds of nanograms.

      --
      "FDA staff reviewers expressed concern about the number of patients who were left out of the study because they died."
  6. Redundancy by Hatta · · Score: 5, Insightful

    It's 2.2 petabytes per gram, but only if you don't mind that it contains a billion copies of the same 2.2 megabytes. Making lots of copies of a short DNA sequence is easy. Making a whole gram of unique DNA sequences is much, much harder. What's the non-redundant storage density of this process?

    --
    Give me Classic Slashdot or give me death!
  7. Re:Latency and bandwidth? by Anonymous Coward · · Score: 5, Informative

    Huge latency and low bandwidth. From the abstract:

    DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving

  8. Re:Latency and bandwidth? by Alomex · · Score: 4, Interesting

    Not if it is for archival purposes, like Amazon storage.

  9. "very rare"? by Anonymous Coward · · Score: 2, Insightful

    How rare is "very rare"? If they have that 2.2 petabyte gram of storage, and "rare" means 0.0001% of the time, that's still 9 billion failures in your archived data.
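
    The 9 billion figure follows if the 0.0001% rate is read as a per-base error rate and each byte takes four bases at the ideal 2 bits per base. The comment does not say which unit the rate applies to, so both readings below are assumptions. These are raw error counts before any of the redundancy described in the article is applied.

```python
CAPACITY_BYTES = 2.2e15       # 2.2 PB stored in one gram
FAILURE_RATE = 0.0001 / 100   # "0.0001% of the time"

# Assumption 1: the rate applies per base, with 4 bases per byte (2 bits per base).
per_base = CAPACITY_BYTES * 4 * FAILURE_RATE
# Assumption 2: the rate applies per bit instead.
per_bit = CAPACITY_BYTES * 8 * FAILURE_RATE

print(f"per base: {per_base:.1e} failures, per bit: {per_bit:.1e} failures")
```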

  10. Re:Latency and bandwidth? by Anonymous Coward · · Score: 4, Interesting

    It's not useless. One interesting part is how long it holds up in storage. There isn't any effective storage medium available today that lasts for 10k+ years. Another is how high the information density is.

  11. Re:Please use a real unit of measure by Krazy Kanuck · · Score: 4, Interesting

    225.28 Libraries of Congress per gram, based on the highly inaccurate assumption that the size of the Library of Congress is 10 terabytes.
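
    The 225.28 figure comes from using binary prefixes (1 PB = 1024 TB) together with the comment's assumed 10 TB for the Library of Congress. With decimal SI prefixes, the same division gives 220.

```python
PB_PER_GRAM = 2.2
LOC_TB = 10   # the comment's (admittedly rough) size for the Library of Congress

loc_per_gram_binary = PB_PER_GRAM * 1024 / LOC_TB   # 1 PB = 1024 TB
loc_per_gram_decimal = PB_PER_GRAM * 1000 / LOC_TB  # 1 PB = 1000 TB (SI)

print(round(loc_per_gram_binary, 2), round(loc_per_gram_decimal, 2))
```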

  12. Re:Latency and bandwidth? by plover · · Score: 4, Insightful

    No, it's only useless for the specific application you're imagining, not "useless" in general. A jet airliner may be really, really fast in comparison to my car, but is useless if my task is to get to the grocery store for milk and eggs. That doesn't invalidate the usefulness of jet airliners.

    --
    John
  13. Re:Where does it all end? by cervesaebraciator · · Score: 5, Insightful

    Hard to say whether we should or shouldn't. But it's worth noting that there are at least two possible important differences between IBM's experiments and Monsanto's:

    1) Monsanto's experiments are often self replicating.

    2) IBM isn't trying to sell us MP3 files as food.

  14. Major challenge: Retrieval and storage by robbyjo · · Score: 3, Interesting

    Okay, storing is "solved". How about retrieval? Especially random-access retrieval that is simple and fast enough (relatively speaking) to make such a storage medium practical? Certainly not with DNA sequencing, which can take weeks to complete?

    The second problem: DNA denatures and fragments at room temperature. Certainly a -80 °C lab freezer for storage wouldn't be practical.

    Third problem: DNA secondary and tertiary structure. The coding scheme must also solve the problem of DNA's tendency to form secondary structures (like hairpins) or tertiary structures (like supercoils) that can hamper reading of, or access to, the information. I think this is the reason why the storage uses short sequences. But short DNA sequences like the ones proposed (~100 bp, from the figure) could still form such structures.
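
    A crude way to screen a candidate strand for hairpins is to look for a short stretch whose reverse complement reappears further along the same strand, leaving room for a loop. The sketch below is a simplistic string-matching heuristic with assumed parameters (a 6-base stem, a loop of at least 3 bases). It uses a hypothetical `hairpin_stems` helper. Practical design tools estimate folding energetics rather than requiring exact matches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_stems(seq: str, stem: int = 6, min_loop: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) pairs where seq[i:i+stem] could pair with seq[j:j+stem].

    A stretch can fold back on itself if its reverse complement appears further
    along the same strand, separated by at least `min_loop` unpaired bases.
    """
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        for j in range(i + stem + min_loop, len(seq) - stem + 1):
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits


print(hairpin_stems("ACGGTA" + "TTTT" + "TACCGT"))  # [(0, 10)]
```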

    --
    Error 500: Internal sig error
  15. Re:Please use a real unit of measure by Anonymous Coward · · Score: 3, Insightful

    Please wait until you sober up before posting again.

  16. Uh oh. by viperidaenz · · Score: 2

    You copied an MP3? Expect to be sued by the RIAA and their European buddies.

  17. Re:Please use a real unit of measure by shaitand · · Score: 4, Funny

    We should redefine the gram to match the amount of DNA it takes to store a LOC. Then people would have an easier time switching to metric.

  18. Re:Latency and bandwidth? by webmistressrachel · · Score: 2

    Yet surely the hard drive is less likely to be mistaken for a tasty snack?

    --
    This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
  19. Re:Latency and bandwidth? by lannocc · · Score: 2

    It may very well not be cost-effective, but that's outside the scope of what I was addressing. "Too slow for any use" was just such an absolutist statement that I had to provide a counter-example.

  20. Re:Does it run exFat 1.0? by inamorty · · Score: 2

    Sorry to hear that. You would have killed it.