Slashdot Mirror


Researchers Achieve Storage Density of 2.2 Petabytes Per Gram of DNA

SternisheFan sends news of researchers who encoded an MP3, a PDF, a JPG, and a TXT file into DNA, along with another file that explains the encoding. The researchers estimate the storage density of this technique at 2.2 petabytes per gram (abstract). "We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let's break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn't allow repeats. That way, you would have to have the same error on four different fragments for it to fail – and that would be very rare," said one of the study's authors. "We've created a code that's error tolerant using a molecular form we know will last in the right conditions for 10 000 years, or possibly longer," said another.
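One way to get a code in which "a run of the same letter would be impossible" is a rotating base-3 code: each ternary digit picks one of the three nucleotides that differ from the previous one. The sketch below is a minimal illustration under that assumption. The function names (`encode_trits`, `decode_trits`, `has_repeat`) and the virtual starting base are hypothetical, and the overlapping fragments and indexing described in the summary are not modelled.

```python
BASES = "ACGT"

def encode_trits(trits, prev="A"):
    """Map base-3 digits to nucleotides so that no base ever repeats.

    `prev` is a virtual starting base (an assumption of this sketch).
    """
    out = []
    for t in trits:
        # The three candidates exclude the previous base, so a repeat is impossible.
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)

def decode_trits(dna, prev="A"):
    """Invert encode_trits by replaying the same rotation."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

def has_repeat(dna):
    """True if any two adjacent bases are identical."""
    return any(a == b for a, b in zip(dna, dna[1:]))

if __name__ == "__main__":
    message = [0, 0, 0, 1, 2, 2, 1]
    strand = encode_trits(message)
    print(strand, has_repeat(strand), decode_trits(strand) == message)
```

Because each position carries one of three choices rather than four, a code like this tops out at log2(3), about 1.58 bits per base, before any redundancy is added.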

85 of 136 comments

  1. Please use a real unit of measure by Anonymous Coward · · Score: 1

    How many Libraries of Congress is that?

    1. Re:Please use a real unit of measure by Krazy+Kanuck · · Score: 4, Interesting

      225.28 based on the highly inaccurate assumption that the quantitative size of the library of congress is 10 terabytes.

    2. Re:Please use a real unit of measure by Anonymous Coward · · Score: 3, Insightful

      Please wait until you sober up before posting again.

    3. Re:Please use a real unit of measure by Luckyo · · Score: 1

      1/225.28 grams.

    4. Re:Please use a real unit of measure by shaitand · · Score: 4, Funny

      We should redefine the gram to match the amount of DNA it takes to store a LOC. Then people would have an easier time switching to metric.

    5. Re:Please use a real unit of measure by WhackAttack · · Score: 1

      Yea...Seriously.

    6. Re:Please use a real unit of measure by miserere+nobis · · Score: 1

      Likely enough LoC's to fill several Manhattans.

    7. Re:Please use a real unit of measure by grumpy_old_grandpa · · Score: 1

      This is why I will always come back to Slashdot. There will always be some guy, like yourself, delivering the most absurd, but at the same time spot on comment, managing to combine wit, insight, technical details, sarcasm, cynicism, and reality into a beautifully constructed one-liner (which, as you can see, I am not capable of), and make me laugh out loud! -- Congratulations! Keep 'em coming!

  2. Latency and bandwidth? by loufoque · · Score: 1, Insightful

    It's useless unless it's reasonably fast.

    1. Re:Latency and bandwidth? by Anonymous Coward · · Score: 5, Informative

      Huge latency and low bandwidth. From the abstract:

      DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving

    2. Re:Latency and bandwidth? by Alomex · · Score: 4, Interesting

      Not if it is for archival purposes, like Amazon storage.

    3. Re:Latency and bandwidth? by Phrogman · · Score: 1

      Until someone mistakes it for a snack and pops it into the microwave :P

      --
      "The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
    4. Re:Latency and bandwidth? by Anonymous Coward · · Score: 4, Interesting

      It's not useless. One interesting part is how long it holds up in storage. There isn't any effective storage medium available today that lasts for 10k+ years. Another is how high the information density is.

    5. Re:Latency and bandwidth? by plover · · Score: 4, Insightful

      No, it's only useless for the specific application you're imagining, not "useless" in general. A jet airliner may be really, really fast in comparison to my car, but is useless if my task is to get to the grocery store for milk and eggs. That doesn't invalidate the usefulness of jet airliners.

      --
      John
    6. Re:Latency and bandwidth? by Forty+Two+Tenfold · · Score: 1

      I have an infinite backup storage. I save all my copies in /dev/null

      --
      Upward mobility is a slippery slope - the higher you climb the more you show your ass.
    7. Re:Latency and bandwidth? by suutar · · Score: 1

      true, but actual numbers describing "reasonably fast" are domain-dependent. I doubt it will ever be fast enough to use as RAM, much less cache, but I could see it being an alternative to tape at some point.

    8. Re:Latency and bandwidth? by LordLimecat · · Score: 1

      I think a hard drive would fare at least as badly in that scenario.

    9. Re:Latency and bandwidth? by loufoque · · Score: 1

      If it takes 1 day per byte, then sorry, it's too slow for any use.

    10. Re:Latency and bandwidth? by Dekker3D · · Score: 1

      The latency must be horrible though!

    11. Re:Latency and bandwidth? by modmans2ndcoming · · Score: 1

      Really? a very large storage medium that does not degrade (theoretically) for 10K years....I see no use here....you are right.

    12. Re:Latency and bandwidth? by loufoque · · Score: 1

      If it takes more than 10k years to actually write stuff to it, surely you can see the problem?

    13. Re:Latency and bandwidth? by lannocc · · Score: 1

      If it takes 1 day per byte, then sorry, it's too slow for any use.

      Not quite. You could, for example, store a daily temperature reading in one byte per day.

    14. Re:Latency and bandwidth? by loufoque · · Score: 1

      And how is that cost-effective compared to other more flexible solutions?

    15. Re:Latency and bandwidth? by imikem · · Score: 1

      Write performance is awesome. I suppose I'd better check read performance again. ... ... ...
      Hmm, that disk array doesn't seem to help much here. I want a refund!!

      --
      Perscriptio in manibus tabellariorum est.
    16. Re:Latency and bandwidth? by webmistressrachel · · Score: 2

      Yet surely the hard drive is less likely to be mistaken for a tasty snack?

      --
      This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
    17. Re:Latency and bandwidth? by lannocc · · Score: 2

      It may very well not be cost-effective, but that's outside the scope of what I was addressing. "Too slow for any use" was just such an absolutist statement that I had to provide a counter-example.

    18. Re:Latency and bandwidth? by ZombieThoughts · · Score: 1

      If I used raid and built a giant striped array weighing in at only 1 million grams (1000 kg or 2200 lb, less than your average car)...

      2.2 million petabytes = 2.2 zettabytes

      With a write speed of 1 MiB per day. (I hate it when they started doing the MiB, no more powers of 2)

      Think of what we used to put on 5.25in floppies at 360k per disk.

    19. Re:Latency and bandwidth? by schneidafunk · · Score: 1

      This article claims a storage device made from sapphire & platinum that will last 10 million years, although I do not see where they come up with that estimate.

      --
      Some people die at 25 and aren't buried until 75. -Benjamin Franklin
    20. Re:Latency and bandwidth? by modmans2ndcoming · · Score: 1

      Don't confuse the tech used to read and write the data with the tech to store the data....the benefits of the storage medium are promising enough that we should invest in the research needed to read and write to it efficiently.

  3. Memory upgrade of the future by DougOtto · · Score: 2, Funny

    Memory upgrade kits of the future could just be a razor blade and a plastic bag. Bleed your own upgrade!

    --
    Solving Unix problems since 1989...
    1. Re:Memory upgrade of the future by Anonymous Coward · · Score: 1

      "Hold on, mum, the internet hasn't quite finished downloading into my hair yet"

      Oh yeah, I can't wait :)

    2. Re:Memory upgrade of the future by Anonymous Coward · · Score: 5, Funny

      So my "thumb drive" will really be my thumb?

    3. Re:Memory upgrade of the future by ZombieThoughts · · Score: 1

      5 blades used in an array.

    4. Re:Memory upgrade of the future by RivenAleem · · Score: 1

      Check out the character QiRia in "The Hydrogen Sonata" by Iain Banks. The character is 10,000 years old and has converted much of his body into additional storage for memories.

  4. Where's the important information? by Bobfrankly1 · · Score: 3, Funny

    How fast does it spin? What's the IOPS on something like that? How fast will Windows 7 boot on it?

    1. Re:Where's the important information? by SternisheFan · · Score: 1

      "Soylent Green" hard drives...

    2. Re:Where's the important information? by idontgno · · Score: 1

      "It's people. WD Green is made out of people."

      If anyone from Western Digital or MGM/UA is listening, it's PARODY. Thank you.

      --
      Welcome to the Panopticon. Used to be a prison, now it's your home.
  5. New error correction scheme? by JMZero · · Score: 5, Insightful

    That way, you would have to have the same error on four different fragments for it to fail

    I understand they wanted the overall system to be fault tolerant, but it might be better to leave that part to established computer science. I understand DNA might be uniquely prone to certain types of errors or reading problems - but there's a lot of computer science theory (and practice) established here that would likely make the overall system more robust than what looks like a fairly simple redundancy scheme.

    --
    Let's not stir that bag of worms...
    1. Re:New error correction scheme? by vlm · · Score: 1

      same error on four different fragments for it to fail

      swap usenet article for dna fragment and right there, they've done a crappy job of reinventing the PAR2 file.
      There's probably some analogies from the tape sort/merge era although that's slightly before my time.

      SSSS, Shamir's Secret Sharing Scheme (Shamir being the S in RSA): just tell it how many slices you want and how many must be present and error free to reconstruct the data, and you're done. You'd be using it for redundancy in this case rather than security.

      ECC is a pretty well worn path in CS.

      A real hack would be writing DNA that expresses a protein which folds itself into a really little QR code. Those (can) have quite a bit of ECC. Now that would be completely useless, yet impressive.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  6. Call me when they can encode video... by ddxexex · · Score: 5, Funny

    I can't wait to see what happens when a video stored on DNA goes viral...

    *ducks*

    1. Re:Call me when they can encode video... by tragedy · · Score: 3, Informative

      Well, this smbc comic addresses that, except that it's stored in bacterial DNA.

    2. Re:Call me when they can encode video... by Anonymous Coward · · Score: 1

      *ducks*

      *geese*

  7. 0.0005% of potential storage by ShanghaiBill · · Score: 4, Informative

    Each DNA nucleotide has a molecular weight of about 150. So a gram of DNA should contain about 6e23/150 = 4e21 bases. At two bits per base, that is 1e21 bytes. These guys are getting 2e15. So, in theory, they are getting about a half millionth of the potential storage, or 0.0005%.
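The estimate above can be rerun as a short calculation. This sketch uses only the comment's own round inputs (Avogadro's number rounded to 6e23, 150 g/mol per base, two bits per base) plus the article's 2.2 petabytes per gram; the variable names are illustrative.

```python
AVOGADRO = 6.0e23        # molecules per mole, rounded as in the comment
MW_PER_BASE = 150.0      # g/mol per base, the comment's assumption
BITS_PER_BASE = 2        # four possible bases
ACHIEVED_BYTES = 2.2e15  # 2.2 petabytes per gram, from the article

bases_per_gram = AVOGADRO / MW_PER_BASE
theoretical_bytes = bases_per_gram * BITS_PER_BASE / 8
fraction = ACHIEVED_BYTES / theoretical_bytes

if __name__ == "__main__":
    print(f"bases per gram:       {bases_per_gram:.1e}")
    print(f"theoretical bytes/g:  {theoretical_bytes:.1e}")
    print(f"achieved/theoretical: {fraction:.1e} ({fraction * 100:.5f}%)")
```

With these inputs the ratio works out to roughly 2e-6, or about 0.0002%, the same order of magnitude as the figure quoted. A heavier per-base mass lowers the theoretical ceiling and raises the ratio accordingly.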

    1. Re:0.0005% of potential storage by Forty+Two+Tenfold · · Score: 2
      --
      Upward mobility is a slippery slope - the higher you climb the more you show your ass.
    2. Re:0.0005% of potential storage by Forty+Two+Tenfold · · Score: 1
      --
      Upward mobility is a slippery slope - the higher you climb the more you show your ass.
    3. Re:0.0005% of potential storage by reverseengineer · · Score: 3, Informative

      These are artificial DNA oligos, so there shouldn't be any of those sorts of modifications. However, a figure of MW 150 per base leaves out the sugar-phosphate backbone, and doesn't account for this being double-stranded DNA. Molecular weight per base pair should be around 700 g/mol.

      Of course, that's really nitpicking. What really accounts for the low ratio of achieved versus theoretical is that they made "~1.2x10^7 copies of each DNA string."

      They go on to explain in the supplementary materials that "With the latest platform, up to 244,000 unique sequences are synthesized in parallel and delivered as ~1-10 pmol pools of oligos... In our experiment, three runs were used to synthesize 153,335 designs, leading to the higher figure of ~12-120x10^6 (= 3-30 x 10^-12 x6.02x10^23/153,335)." A more accurate assessment of their coding scheme is that they used 153335 strings of 117 nucleotides ( 17940195 total) to encode 5165800 bits of Shannon information, or about 0.29 bits per nucleotide.

      The fact that they made ten million copies of each string is more of a current technical limitation of DNA oligo synthesis and automated DNA sequencing than a limit on the efficiency of the encoding itself. With the appropriate technology, you could make a few thousand copies (for appropriate error correction) instead of ten million, and your mass of DNA would be in the femtograms instead of hundreds of picograms.
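The bits-per-nucleotide figure quoted from the supplementary materials can be checked directly. This sketch only reuses the numbers given in the comment; the constant names are illustrative, and the two-bit ceiling is the naive four-letter maximum.

```python
STRINGS = 153_335      # synthesized designs
NT_PER_STRING = 117    # nucleotides per string
INFO_BITS = 5_165_800  # bits of Shannon information encoded

total_nt = STRINGS * NT_PER_STRING
bits_per_nt = INFO_BITS / total_nt
# Share of the naive 2-bits-per-base ceiling actually carrying payload.
share_of_ceiling = bits_per_nt / 2

if __name__ == "__main__":
    print(f"total nucleotides:   {total_nt}")
    print(f"bits per nucleotide: {bits_per_nt:.2f}")
    print(f"share of 2-bit max:  {share_of_ceiling:.1%}")
```

The remainder of each string is taken up by overlap between fragments, indexing, and the no-repeat constraint rather than by payload.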

      --
      "FDA staff reviewers expressed concern about the number of patients who were left out of the study because they died."
  8. Where does it all end? by Dan+Hayes · · Score: 1

    This seems like an amazing development, but just today we've had a story about Monsanto and how well their error correction is going despite having the best in Western thinking available to them. Why should we trust that IBM's procedures are any better?

    1. Re:Where does it all end? by cervesaebraciator · · Score: 5, Insightful

      Hard to say whether we should or shouldn't. But it's worth noting that there are at least two possible important differences between IBM's experiments and Monsanto's:

      1) Monsanto's experiments are often self replicating.

      2) IBM isn't trying to sell us MP3 files as food.

    2. Re:Where does it all end? by SkimTony · · Score: 1

      Okay, but imagine if they did encode MP3 files as food. And then people started sharing that (self replicating?) data as food.

      Just think: Coming soon to a courthouse in East Texas: Monsanto vs. the RIAA...

    3. Re:Where does it all end? by MLBs · · Score: 1

      We've had that back in the seventies

  9. Redundancy by Hatta · · Score: 5, Insightful

    It's 2.2 petabytes per gram, but only if you don't mind that it contains a billion copies of the same 2.2 megabytes. Making lots of copies of a short DNA sequence is easy. Making a whole gram of unique DNA sequences is much, much harder. What's the non-redundant storage density of this process?

    --
    Give me Classic Slashdot or give me death!
    1. Re:Redundancy by butalearner · · Score: 1

      Phew! Until you pointed this out, I was worried that we were all walking around with over a hundred petabytes of random data in our bodies (assuming approximately 50 grams of DNA per person). If that were the case there would be a pretty solid chance that, with the right decoder, we're all infringing on somebody's copyright. Thank evolution we're running RAID 1000000000 instead.

    2. Re:Redundancy by Hatta · · Score: 1

      It's perfectly acceptable to store multiple copies of the same data. You just have to divide your quoted storage density by the number of copies. You don't say a RAID1 array made of 2 3TB drives is a 6TB array, and you shouldn't say this is 2.2PB/gram either.

      --
      Give me Classic Slashdot or give me death!
    3. Re:Redundancy by LordLimecat · · Score: 1

      Something tells me you don't understand how RAID levels are designated.

      Hint: No one in their right mind would run something named "RAID 1000000000", unless they didn't care in the least whether their data was retrievable.

      Hint 2: It has an array failure rate of ~14% over 3 years, assuming standard drive failure rate of 5% over 3 years. ( 1 - ( 0.95 ) ^ 9 ) ^ 2
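The 14% figure follows from treating the array as two mirrored 9-disk stripes. A minimal sketch, assuming independent drive failures and the 5% three-year drive failure rate quoted above; the function names are hypothetical.

```python
def stripe_failure(p_drive, n_drives):
    """A RAID 0 stripe is lost if any one of its drives fails."""
    return 1 - (1 - p_drive) ** n_drives

def mirror_failure(p_side, n_sides=2):
    """A RAID 1 mirror is lost only if every side fails."""
    return p_side ** n_sides

# Two mirrored stripes of nine drives each, 5% failure per drive over 3 years.
p_array = mirror_failure(stripe_failure(0.05, 9))

if __name__ == "__main__":
    print(f"array failure over 3 years: {p_array:.1%}")
```

Real drives in one enclosure tend to fail in correlated ways, so the independence assumption makes this an optimistic estimate.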

    4. Re:Redundancy by LordLimecat · · Score: 1

      Correction: Was assuming a 9-disk RAID 0. I think the actual failure rate for 9-levels of nested RAID0, RAID1'd, would be
      99.99999999921383102043346279157%.

    5. Re:Redundancy by Pawnn · · Score: 1

      Whoa, I just got a cease and desist order from Adam and Eve saying they're being represented by a Mr. Serpent...

    6. Re:Redundancy by butalearner · · Score: 1

      Meh, RAID 1 with a billion mirrors didn't seem as funny to me...I didn't think using more than two numbers made sense anyway. I should have taken my audience into consideration, though.

  10. Or maybe not by JMZero · · Score: 1

    It could be they are already using a fancier scheme - it's hard to tell what's real details of their method, and what's pop-sci "summary". So I apologize if I'm not giving them deserved credit here.

    --
    Let's not stir that bag of worms...
  11. "very rare"? by Anonymous Coward · · Score: 2, Insightful

    How rare is "very rare"? If they have that 2.2 petabyte gram of storage, and "rare" means 0.0001% of the time, that's still 9 billion failures in your archived data.
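The "9 billion" figure depends on what is being counted. The sketch below assumes the 0.0001% rate applies independently per unit and that each base carries two bits; both are assumptions made for illustration, not details from the article.

```python
CAPACITY_BYTES = 2.2e15    # 2.2 petabytes in one gram
ERROR_RATE = 0.0001 / 100  # 0.0001% expressed as a fraction
BITS_PER_BASE = 2          # assumed, as in the naive four-letter code

# Expected failures if the rate applies per byte.
byte_failures = CAPACITY_BYTES * ERROR_RATE

# Expected failures if the rate applies per base.
bases = CAPACITY_BYTES * 8 / BITS_PER_BASE
base_failures = bases * ERROR_RATE

if __name__ == "__main__":
    print(f"per-byte failures: {byte_failures:.1e}")
    print(f"per-base failures: {base_failures:.1e}")
```

Counting per base gives about 8.8e9, in line with the 9 billion quoted, while counting per byte gives about 2.2e9.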

    1. Re:"very rare"? by wed128 · · Score: 1

      So uhhh....parity?

  12. Human data carriers? by futhermocker · · Score: 1

    Think about it, saving your stuff in a dna tattoo... very cool and very creepy at the same time.

    --
    KERNEL PANIC -SIGFAULT AT ADDRESS #51A54D07
  13. Re:Keanu Reeves movie "Johnny Mnemonic" by Anonymous Coward · · Score: 1

    Review I read of that movie: "Keanu Reeves is miscast as someone with too much information in his head."

  14. Re:How do you... by Anonymous Coward · · Score: 1

    DNA isn't "alive," it's a really big molecule.

  15. Ok.. digital data on DNA.... by wierd_w · · Score: 1

    So, while I realize that the intent here is not to put it inside a living organism.... some part of me wants to know what would happen if the data for various Windows malware packages was encoded, and injected into bacterial hosts.

    Think of all the new diseases that could come about from pure happenstance, coincidence, and Murphy's law!

    Kind of a "throw stuff at the wall and see what sticks" silliness side effect of using DNA for data storage.

    1. Re:Ok.. digital data on DNA.... by plover · · Score: 1

      You mean like Snow Crash?

      --
      John
  16. Highly unlikely, but I can't help but to wonder: by kheldan · · Score: 1

    DNA is the machine language of biological life. What happens if it starts perpetuating itself?

    --
    Are YOU using the TOOL, or is the TOOL using YOU? Think about it!
  17. Re:How do you... by tragedy · · Score: 1

    Synergy?

  18. Major challenge: Retrieval and storage by robbyjo · · Score: 3, Interesting

    Okay, storing is "solved". How about retrieval? Especially random-access retrieval that is simple and fast enough (relatively speaking) to make such a storage medium practical? Certainly not DNA sequencing that can take weeks to complete?

    The second problem: DNA denatures and fragments at room temperature. Certainly a -80C lab freezer for storage wouldn't be practical.

    Third problem: DNA secondary and tertiary structure. The coding scheme must also solve the problem of DNA's tendency to form secondary structures (like hairpins) or tertiary structures (like supercoils) that can hamper reading of or access to the information. I think this is the reason why the storage uses short sequences. But short DNA sequences like the ones proposed (~100 bp, from the figure) could still form such structures.

    --
    Error 500: Internal sig error
  19. Transfers by RNLockwood · · Score: 1

    "That was the best sex ever and BTW, I just gave you copies of all my videos".

    --
    Nate
    1. Re:Transfers by Inigo+Montoya · · Score: 1

      That's a 9 month transfer. Those videos are old already...

  20. Uh oh. by viperidaenz · · Score: 2

    You copied an MP3? Expect to be sued by the RIAA and their European buddies.

  21. prior art by boldi · · Score: 1

    The question is: What if others have already used a similar method to send messages to us? How would you find that out? Has anybody tried to find out? Considering the possibility we are not alone...

  22. Re:Keanu Reeves movie "Johnny Mnemonic" by oodaloop · · Score: 1

    I think the file he uploaded in the beginning was something like 200 megabytes. Or maybe 20. It's certainly chuckle inducing to watch these days.

    --
    Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
  23. another flap by fyngyrz · · Score: 1

    ok, those were both really fowl.

    --
    I've fallen off your lawn, and I can't get up.
  24. Re:First practical use... by viperidaenz · · Score: 1

    Porn is already encoded in DNA. Sometimes a bit of silicone is added.

  25. what's in our DNA then? by dogganos · · Score: 1

    Should we start checking our own DNA for encoded files from our deep ancestors?

  26. Does it run exFat 1.0? by FatLittleMonkey · · Score: 1

    I was actually trying to come up with a ReiserFS gag.

    --
    Science is all about firing a drunk pig out of a cannon just to see what happens.
    1. Re:Does it run exFat 1.0? by inamorty · · Score: 2

      Sorry to hear that. You would have killed it.

  27. But DNA has a Half-Life of 521 years by sidevans · · Score: 1

    Slashdot told me so

    10,000 years my ass....

    --
    I'm not signing anything
  28. Great. by Legion303 · · Score: 1

    So now I can store my entire porn collection in one spurt.

    Science!

  29. Prior art by Demat · · Score: 1

    I filed a patent along similar lines back in 2006, IIRC. Although long since lapsed, it did include more sophisticated error correction and compression. The text of the patent can be found here: https://docs.google.com/file/d/0BwCRbg2GjBaddHU5UnRYTWJUS3c/edit

  30. Re:Keanu Reeves movie "Johnny Mnemonic" by damien_kane · · Score: 1

    Or someone encodes a copyright-protected song into that DNA, and it starts replicating, thus committing millions upon millions of acts of infringement, which wipes out the debt the Refined League owes to Earth (if we ourselves ever become refined)

  31. DNA reading/writing rather slow by peter303 · · Score: 1

    It's like a millisecond per base pair or a kilobyte or two per second. However a cell may have tens of thousands of ribosomes to parallelise this function.

  32. At Last! by museumpeace · · Score: 1

    A medium that is DRM free because the rip-and-burn tools cost about a billion dollars. I for one would not want to carry around a box of test tubes with gelatinous MP3s of every note and recording humanity has ever emitted...gimme my iPod.

    --
    SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
  33. This is Fucking Awesome by omfglearntoplay · · Score: 1

    Sorry, this is just the best news I've read in ages. Fucking AWESOME!!!