Slashdot Mirror


Four New DNA Letters Double Life's Alphabet (nature.com)

Joe_NoOne (Slashdot reader #48,818) shares this update from Nature: The DNA of life on Earth naturally stores its information in just four key chemicals -- guanine, cytosine, adenine and thymine, commonly referred to as G, C, A and T, respectively. Now scientists have doubled this number of life's building blocks, creating for the first time a synthetic, eight-letter genetic language that seems to store and transcribe information just like natural DNA.

In a study published on 22 February in Science, a consortium of researchers led by Steven Benner, founder of the Foundation for Applied Molecular Evolution in Alachua, Florida, suggests that an expanded genetic alphabet could, in theory, also support life. "It's a real landmark," says Floyd Romesberg, a chemical biologist at the Scripps Research Institute in La Jolla, California. The study implies that there is nothing particularly "magic" or special about those four chemicals that evolved on Earth, says Romesberg. "That's a conceptual breakthrough," he adds... Benner says that the work shows that life could potentially be supported by DNA bases with different structures from the four that we know, which could be relevant in the search for signatures of life elsewhere in the Universe...

The researchers call the resulting eight-letter language 'hachimoji' after the Japanese words for 'eight' and 'letter'. The additional bases are each similar in shape to one of the natural four, but have variations in their bonding patterns. The researchers then conducted a series of experiments that showed that their synthetic sequences share properties with natural DNA that are essential for supporting life... Benner's group previously showed that strands of DNA that included Z and P were better at binding to cancer cells than sequences with just the standard four bases. And Benner has set up a company which commercialises synthetic DNA for use in medical diagnostics.

67 comments

  1. hachimoji??? by jasnw · · Score: 0

    Good god man, don't tell Apple about this!

    1. Re:hachimoji??? by ShanghaiBill · · Score: 0

      This is like going from ASCII to Unicode. But it isn't really "doubling". With four nucleotides, each base pair encodes 2 bits. With eight pairs, it is 3 bits each. So this is only a 50% improvement in information density.

      But I am not sure how much this helps. With four nucleotides we already have 64 triples, but there are only 20 amino acids. Add stop and start codons, and that is only 22. So there is already plenty of redundancy.

      At least for now, I am sticking with four nucleotides.
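
      A quick sketch of the arithmetic behind the "50%" figure, assuming every base is equally likely at each position, so one position carries log2(alphabet size) bits. The function name is made up for illustration.

```python
import math

def bits_per_base(alphabet_size: int) -> float:
    """Bits carried by one position drawn uniformly from the alphabet."""
    return math.log2(alphabet_size)

natural = bits_per_base(4)    # G, C, A, T
hachimoji = bits_per_base(8)  # the natural four plus four synthetic bases

# Relative gain in bits per position when the alphabet doubles from 4 to 8.
gain = hachimoji / natural - 1

print(f"{natural:.0f} bits -> {hachimoji:.0f} bits per base, a {gain:.0%} gain")
```

      Doubling the number of letters adds one bit per position, which is why the density goes up by half rather than doubling.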

    2. Re:hachimoji??? by Anonymous Coward · · Score: 0

      iOS 13 will include hachimoji or will switch to Android. I demand this as a nerd consumer.

    3. Re:hachimoji??? by Livius · · Score: 2

      With four nucleotides, each base pair encodes 2 bits. With eight pairs, it is 3 bits each. So this is only a 50% improvement in information density.

      3 bits is twice as much information as 2 bits.

    4. Re:hachimoji??? by Anonymous Coward · · Score: 1

      3 bits is twice as much information as 2 bits.

      By that measure, 100,000,001 bits per second is twice as much information as 100,000,000 bps. So you must be really happy when your ISP gives you an upgrade.

      Well there's an incredible lack of understanding if I ever saw one.

      With 2 bits, you can count four possibilities.

      00, 01, 10, 11.

      With three bits, you can count eight possibilities, twice as many. (All four of the above with a zero added, and all four above with a one added.)

      000, 001, 010, 011, 100, 101, 110, 111.

    5. Re:hachimoji??? by Livius · · Score: 1

      I think you're mixing two notions of 'bit', and squaring versus doubling (which are the same for 2 but no other number). Eight possibilities per genetic base-pair is twice as many as four. Equivalent representation in binary is another matter, but it would be a whole other coding step.

    6. Re:hachimoji??? by K.+S.+Kyosuke · · Score: 1

      That's twice as many options, but amount of information is proportional to the logarithm of the number of options.

      --
      Ezekiel 23:20
  2. 6 to 8 by Artem+S.+Tashkinov · · Score: 3, Informative

    Not meaning to downplay the significance of this breakthrough but early last year synthetic biologist Floyd E. Romesberg already announced the creation of two additional synthetic letters.

    1. Re:6 to 8 by Anonymous Coward · · Score: 0

      Not to play up the importance of reading, but this is a follow-up to that story. Now there are four additional "letters" total.

    2. Re:6 to 8 by Red_Forman · · Score: 1

      So is the new total number of letters 4+2+4, or 4+2+2?

    3. Re:6 to 8 by Anonymous Coward · · Score: 0

      You're demonstrating the cognitive consequences of a synthetic genome for us, eh? 4 additional synthetic keys total. Stop eating plastic, it's bad for you despite what Luckyo says.

    4. Re:6 to 8 by Anonymous Coward · · Score: 0

      No, it's 1 + 2 + 2 + 1!

    5. Re:6 to 8 by Anonymous Coward · · Score: 0

      I just created 9! I am guessing there is a good reason life settled on ACGT.

    6. Re:6 to 8 by Goldsmith · · Score: 1

      Don't mistake an entertainment focused presentation like a TED talk for an original research presentation.

      FAME and Steve Benner have been the leaders in this field for a long time. Benner made the first expansion from 4 to 6 bases about 30 years ago. SRI and Floyd Romesberg are very good, but they are following Benner here (and Romesberg is a better public speaker).

      You're right, though, that this is from 6 to 8.

    7. Re: 6 to 8 by Anonymous Coward · · Score: 0

      Thanks for the Clue.

  3. What will my alphabet soup say today? by Anonymous Coward · · Score: 0

    Kill yourself. Wait, what? FU Campbells.

    1. Re:What will my alphabet soup say today? by Anonymous Coward · · Score: 0

      mine says "OOOOOOOOOOOOOO". oh wait, it's spaghetti-O's.

  4. Interesting! by oldgraybeard · · Score: 2

    Life is 8 bit
    How soon before we move to 16 -> 32 -> 64?

    Just my 2 cents ;)

    1. Re:Interesting! by Artem+S.+Tashkinov · · Score: 1

      Life on Earth (DNA code) is still four-bit. Synthetic organisms can't yet produce synthetic base pairs.

    2. Re:Interesting! by Red_Forman · · Score: 5, Funny

      How soon before we move to 16 -> 32 -> 64?

      It depends if you rely on the marketing departments of NEC, SEGA or Intel.

    3. Re: Interesting! by Anonymous Coward · · Score: 0

      Kill yourself, stupid "2 cents" imbecile.

    4. Re:Interesting! by Tough+Love · · Score: 1

      Life is 8 bit

      It's not. A codon consists of 3 nucleotides, each of which has 4 different values (i.e., distinct chemical composition) equivalent to a two bit encoding. Total: six bits. The artificially extended version adds one bit to the nucleotide range, times 3 nucleotides, total 9 bits. So there is no way to interpret your joke as correct, sorry. It's not about lacking a sense of humour, it's just that the numbers need to add up. For example, "life is braille". Which is actually close to the truth when you look at the way a ribosome works.

      Anyway, like almost anything in nature, it's way more complicated than that because the natural "design" is pretty sloppy, with a lot of redundancy that serves no apparent purpose, so that the 6 bit natural code actually encodes only 20 different protein peptides, plus start and stop, so there are 42 redundant codes out of 64. You could attempt an analogy with ECC, also redundant, but it's not like that, it's more like an ECC designed by somebody who read the first paragraph of the Wikipedia page then set out bravely to design their own error correction scheme without bothering to learn the underlying math, giving up in frustration halfway through and leaving it for a bunch of code monkeys to make the demented scheme work anyway. Kind of like most software projects, actually. The end result is something that looks like it ought to have some underlying mathematical pattern, but the closer you look at it, the more you want to go back for a do-over.

      Well. That would be a moot point if there was nothing we could do about it, but as these researchers demonstrate, there clearly is. Just to be clear about what they actually did: what defines the genetic code? It's not just four distinct nucleotides, you also need a decoder, which in nature is a ribosome. This research did not go that far. I'm sure they hope to one day, but engineering a 3-bit-per-nucleotide ribosome is a whole lot harder than coming up with 4 new nucleotides. One thing they did get for free is the ability to transcribe 3-bit DNA to 3-bit RNA, because this mechanism doesn't involve any coding, it just requires that the nucleotides pair up uniquely, as they say, like lego bricks.

      Some other researchers previously repurposed some of those redundant natural codes to code for non-natural peptides, another way to extend the genetic code. Another, much harder way, would be to increase the nucleotides per codon. Then, getting back to your joke, we really could create 8 bit life. Left as an exercise for the interested reader to determine whether we want that.
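
      The codon arithmetic above can be checked in a few lines. This is only a sketch: the count of 22 meanings (20 amino acids plus start and stop) is taken from the comments in this thread, and the helper names are hypothetical.

```python
import math

CODON_LENGTH = 3   # nucleotides per codon in the natural code
MEANINGS = 20 + 2  # 20 amino acids plus start and stop, as counted in the thread

def codon_count(alphabet_size: int, length: int = CODON_LENGTH) -> int:
    """Number of distinct codons of the given length."""
    return alphabet_size ** length

def codon_bits(alphabet_size: int, length: int = CODON_LENGTH) -> float:
    """Bits per codon, assuming each nucleotide is equally likely."""
    return length * math.log2(alphabet_size)

natural_codons = codon_count(4)         # 64
redundant = natural_codons - MEANINGS   # 42
hachimoji_codons = codon_count(8)       # 512

print(codon_bits(4))            # 6.0 bits: natural 3-letter codon
print(codon_bits(8))            # 9.0 bits: 3-letter codon over 8 bases
print(codon_bits(4, length=4))  # 8.0 bits: 4-letter codon over 4 bases
```

      The last line is the "8 bit life" case: four natural nucleotides per codon rather than more letters in the alphabet.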

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    5. Re:Interesting! by Tough+Love · · Score: 1

      Hmm, correction, the ribosome actually doesn't need to change much if at all for their 3 bit code, mostly they just need some new tRNAs that incorporate their new nucleotides. The natural machinery should be able to handle it with minimal modification. Yikes.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    6. Re:Interesting! by Tough+Love · · Score: 1

      OMG, two-bit. Or six-bit per codon.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    7. Re:Interesting! by meglon · · Score: 1

      If it was AT&T we'd already be at 8 bit with only the original 4.

      --
      Fascism: An authoritarian and nationalistic right-wing system of government and social organization. See also: NAZI's
    8. Re: Interesting! by Anonymous Coward · · Score: 0

      If I wanted to kill myself I'd climb up your ego and jump down to your IQ. Toodles.

      My 2 cents. ;)

    9. Re:Interesting! by Anonymous Coward · · Score: 0

      Total: six bits

      That's not entirely true, young man. You're forgetting that
      1 bits is used as the parity peptide, so in truth it's 5 bits.
      But there are combinations that chemically don't work
      together, effectively leaving it at 3.5 bits. So there.

      CAP === 'subnet'

    10. Re:Interesting! by Tough+Love · · Score: 1

      Please do provide an example of a chemical combination that doesn't "work together", thus reducing the number of bits per codon. (troll detected)

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    11. Re:Interesting! by rgmoore · · Score: 1

      It's not just four distinct nucleotides, you also need a decoder

      You need more than just a decoder. You also need a supply of the nucleotides. If you're doing this as a proof of concept experiment in vitro, you can get away with synthesizing them chemically, but if you want it to work in vivo, you would need a whole set of enzymes to synthesize, metabolize, and transport those nucleotides. That would probably require dozens of new enzymes, which is way beyond the current state of the art.

      --
      There's no point in questioning authority if you aren't going to listen to the answers.

    12. Re:Interesting! by Anonymous Coward · · Score: 0

      Please do provide an example of a chemical combination that doesn't "work together", thus reducing the number of bits per codon.

      Sure, third paragraph.

      (troll detected)

      Are you admitting to trolling?

    13. Re:Interesting! by Tough+Love · · Score: 0

      What an ass.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
  5. Re-reporting news by WoodstockJeff · · Score: 1

    https://science.slashdot.org/s...

    At least it's not on the front page anymore...

    1. Re:Re-reporting news by Tough+Love · · Score: 1

      Are you implying it's a dup? It's not. Different group, different research. Seems to take a similar idea much further, to a specific application.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    2. Re:Re-reporting news by WoodstockJeff · · Score: 1

      If it's a different group, they share the same lead researcher and the name for their new 8-code DNA strands.

  6. 8 POSITION, not 8 bit. by Anonymous Coward · · Score: 2, Informative

    You can only have one of any of the 8 occupying a given strand pair position, meaning that it is 2^3 or 3 bit resolution, up from 2 bit with the previous base pairs. The rest of what you say is correct however.

    1. Re:8 POSITION, not 8 bit. by Tough+Love · · Score: 1

      The rest of what you say is correct however.

      Right, when you double numbers they do get bigger, and the 2 cents is correct.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
  7. the evolution revolution by Anonymous Coward · · Score: 0

    if we get it right we could all grow wings & each have our own planet? beware falling gargoyles, & other stuff..

    1. Re:the evolution revolution by Red_Forman · · Score: 2

      Isn't it easier to just drink Red Bull?

    2. Re:the evolution revolution by Tablizer · · Score: 1

      Mormonism?

  8. meanwhile by Anonymous Coward · · Score: 0

    we could discover our ability to acquire/restore abandoned property & recycle it ourselves?

  9. One step closer.. by Rick+Schumann · · Score: 1

    ..to the zombie apocalypse.

    1. Re:One step closer.. by PPH · · Score: 1

      ..to Leeloo.

      --
      Have gnu, will travel.
    2. Re:One step closer.. by jth1234567 · · Score: 1

      I, for one, welcome our new S, B, P and Z(ombie) overlords.

    3. Re:One step closer.. by Megane · · Score: 1

      For that they would need to make bakemoji instead of hachimoji or mojibake.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    4. Re:One step closer.. by Anonymous Coward · · Score: 0

      Roy Batty says: I want more life, fucker.

  10. When Slashdot doesn't dupe by Crashmarik · · Score: 1

    That's noteworthy.

  11. Other letters were probably discarded by Solandri · · Score: 5, Insightful

    Humans have created a variety of languages, from Chinese with thousands of different characters (one for each word), to English with 26 characters which are combined to make different words, to binary with just 2 characters which are combined to make words. When researchers started playing around with compression algorithms, they got to wondering - what's the optimal number of characters in an alphabet for maximizing compression? That is, minimizing the size of the words, while also minimizing the space taken up by each character. With binary, you minimize the space needed to encode each character, but it comes at the cost of lengthening the size of each word. With Chinese you minimize the size of each word, but it comes at the cost of increasing the space needed to encode each character. How many letters in an alphabet results in the most compact language?

    The answer turns out to be e. 2.718. An alphabet with e characters allows you to represent data the most efficiently and compactly. Obviously you can't have a non-integer number of characters, so the optimal number of characters for a compact language is 3.

    Which is probably why DNA only codes 4 different molecules. Since a double helix with conjugate pairs can't be coded with 3 letters, 4 end up being the next step. Likely, DNA/RNA with more base pairs have developed naturally before (probably several times), but were eventually selected out after having to compete with 4-base pair DNA. So as interesting as this is, it probably isn't the first time it's happened like TFA states.
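
    The 2.718 figure matches what is usually called radix economy. A minimal sketch, assuming the cost of one character is proportional to the number of symbols in the alphabet (an assumption of the model, not something the article states): a value N then needs about ln(N)/ln(b) characters in base b, for a total cost of b * ln(N)/ln(b). Dropping the common ln(N) factor leaves b / ln(b). The function name is made up for illustration.

```python
import math

def cost_per_unit_information(base: float) -> float:
    """Asymptotic radix economy: alphabet size divided by ln(alphabet size).

    Assumes each character costs an amount proportional to the number of
    possible symbols, which is the modelling assumption behind the 'e' result.
    """
    return base / math.log(base)

# Compare a few integer alphabet sizes.
for b in (2, 3, 4, 8):
    print(b, round(cost_per_unit_information(b), 3))

# Cheapest integer alphabet size under this cost model.
best_integer = min(range(2, 17), key=cost_per_unit_information)
print("best integer base:", best_integer)
```

    Under this model 3 comes out cheapest among the integers, and 2 and 4 tie exactly, since 4 / ln(4) equals 2 / ln(2). Whether the same cost model applies to nucleotides is a separate question.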

    1. Re:Other letters were probably discarded by dfghjk · · Score: 1

      "Likely, DNA/RNA with more base pairs have developed naturally before (probably several times), but were eventually selected out after having to compete with 4-base pair DNA."

      And why would it be selected out? Are you claiming that it could not compete with 4-base pair due to encoding efficiency? Please.

    2. Re:Other letters were probably discarded by Anonymous Coward · · Score: 0

      The biological equivalent has also been analysed, where "efficiently and compactly" have very physical characteristics, like molecule sizes and energy requirements. Not really my field, but I remember finding this interesting:

      https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2003.2355

      IIRC, the use of 4 letters was established back in the RNA world before DNA. The ideal number of codes for use in RNA lands somewhere between 4 and 5, and since they operate in pairs that makes 4 indeed turn out to be the fittest choice (over 2, 6 or 8, according to whatever simulation was done). The paper then suggests that in "modern" DNA cells with high-fidelity copying, 6 codes could actually have been a better fit. (Presumably: had the machinery based around 4 codes not already evolved as it had.)

      This "fidelity" of copying gets into the fascinating area of error correction: I forget the exact numbers (from one intro systems-biology text I read) but the initial replication of DNA makes a surprisingly high number of mistakes, like 1 in 10,000 or something. It requires secondary proof-reading machinery to come along and correct the copies to bring the error rate back down to a reasonable level. (Like one single letter mistake per cell copy for a typical cell.)

      So for cells with more than 4 codes to actually replicate/divide, it's not just a matter of throwing in some extra nucleotides and the enzymes that synthesize them, transport them etc, but then you'd also need those hellishly complex molecules that perform the proof-reading and error-correcting to be updated accordingly.

      So I agree creating these 4 new well-behaved nucleotides sounds like an exciting and significant advance over prior attempts (e.g. that they also work in RNA), but you also can't overstate this quote from the article: "but there is still a substantial distance to go before reaching a true eight-letter synthetic genetic system" :-)

    3. Re:Other letters were probably discarded by Anonymous Coward · · Score: 0

      How many letters in an alphabet results in the most compact language? The answer turns out to be e. 2.718. An alphabet with e characters allows you to represent data the most efficiently and compactly. Obviously you can't have a non-integer number of characters, so the optimal number of characters for a compact language is 3. Which is probably why DNA only codes 4 different molecules. Since a double helix with conjugate pairs can't be coded with 3 letters, 4 end up being the next step. Likely, DNA/RNA with more base pairs have developed naturally before (probably several times), but were eventually selected out after having to compete with 4-base pair DNA. So as interesting as this is, it probably isn't the first time it's happened like TFA states.

      As you alluded to yourself, there are ultimately only 2 actual characters in DNA due to encoding in strict base pairs, making it binary. I guess the number "42" (4 molecules, 2 letters) really was the meaning of life and everything in the universe!

    4. Re:Other letters were probably discarded by Anonymous Coward · · Score: 0

      RNA and DNA share 3 bases, so it kinda does make sense. in an RNA world 3 bases might work, but once DNA started going with 4, RNA would have followed suit. a billion years ago it woulda been kinda hard for RNA-based life like viruses to infect DNA based life when you're short a base.

    5. Re:Other letters were probably discarded by Anonymous Coward · · Score: 0

      The difference between complementary pairs matters, dude.

      C
      G
      T
      A

      is different from

      G
      C
      A
      T
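
      A small sketch of the same point, using only the standard Watson-Crick pairing (A with T, C with G) and ignoring strand direction for simplicity. The names are hypothetical. Each position on a given strand can still hold any of four bases, because a C-G pair and a G-C pair are distinguishable by which strand carries which base.

```python
# Standard pairing for natural DNA; the synthetic bases are left out here.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Base-by-base complement of a strand (strand direction ignored)."""
    return "".join(COMPLEMENT[base] for base in strand)

top = "CGTA"
bottom = complement(top)

# Ordered (this strand, other strand) pairs that can occupy one position.
ordered_pairs = {(base, partner) for base, partner in COMPLEMENT.items()}

print(top, bottom)         # CGTA GCAT
print(len(ordered_pairs))  # 4 distinct states per position, i.e. 2 bits
```

      The two columns above are complements of each other, yet they are different sequences, so pairing does not collapse the alphabet to two letters.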

    6. Re:Other letters were probably discarded by Anonymous Coward · · Score: 0

      Where did 2.718 come from? Your imagination?

    7. Re:Other letters were probably discarded by Anonymous Coward · · Score: 0

      I'd add that we need pairs, so 2, 4, 6... is OK, but there is something else: in nature, where "natural selection" governs, we have up to 5 "identical items", as with 5 fingers (extremely rarely 6), so yes, 4 5, it seems that 4 is the "golden number".
      Next should be to check whether the new "letters" are really letters and not sci-fi, so they need to create "life" with the new "characters", life with descendants.
      I profoundly doubt this, since technology manipulates the existing code but is not able to create something NEW even with the ONLY 4 letters there are...

    8. Re:Other letters were probably discarded by Anonymous Coward · · Score: 0

      The answer turns out to be e. 2.718. An alphabet with e characters allows you to represent data the most efficiently and compactly.

      [citation needed]

  12. who's CAPTCHA: foregone by Anonymous Coward · · Score: 0

    So the A, G, C, and T, molecules are produced by cells from genes made out of A, G, C, and T. Which came first?

  13. Nothing "magic"... yeah, sure by Anonymous Coward · · Score: 0

    "The study implies that there is nothing particularly "magic" or special about those four chemicals that evolved on Earth"

    Nothing we know about, that is. While humankind is capable of doing some amazing things, let's not let our heads get so swollen that we think we really know much of anything.

  14. Not Sure That Makes Sense by Anonymous Coward · · Score: 0

    I'm not sure that logic results in the outcome we see. You assume the double helix is the preferred molecular configuration. Why?

    Seems to me the more likely outcome would be (just spitballin' here) a system of 3 bases, a triad. They can be combined ABC, BCA, and CAB. I'm assuming that base reversed triads (e.g. CBA for ABC) either get excluded or are treated as equivalents. The goal is, of course, to round e to the nearest integer which is 3.

    This may be chemically impossible. It may challenge the origins of life too greatly. It might be excluded on the basis of Occam or Newton's Second Law or Entropy. I Am Not A Chemist (IANAC).

  15. search for signatures of life elsewhere... by Ken+McE · · Score: 1

    Benner says that the work shows that life could potentially be supported by DNA bases with different structures from the four that we know, which could be relevant in the search for signatures of life elsewhere in the Universe...

    Not sure why anyone would expect to find terrestrial DNA anywhere but Terra. Totally unrelated creatures would use a totally different system. If they did find something like ours, that would mean we were relatives.

    1. Re: search for signatures of life elsewhere... by phantomfive · · Score: 1

      Because if DNA is the only way to do life, then anywhere life exists, it will have DNA. They've shown that there's at least one potential alternative.

      --
      "First they came for the slanderers and i said nothing."
    2. Re:search for signatures of life elsewhere... by Antique+Geekmeister · · Score: 1

      Might I suggest you look up "panspermia"? In its fullest theories, it's the idea that the most elementary building blocks of life were seeded on Earth from interstellar dust or asteroids, and that they will be seeded on other worlds the same way.

      Even if panspermia did not happen, that does not mean significant amounts of biochemistry, perhaps including some form of DNA, could not exist elsewhere. The nature and extent of parallel evolution surprised Charles Darwin when he originally wrote about evolution. If DNA is efficient and effective at coding complex biological structures, it might be quite likely compared to other useful structures. It would only need to have some genuine advantage to out-evolve other structures. And this raises a very interesting question. Is there some advantage, or disadvantage, to the newly described bases? Is there a compelling reason that DNA uses the 4 bases normally used?

  16. Um, sequel, please? by Anonymous Coward · · Score: 0

    Finally! A sequel to GATTACA! Been waiting for fucking ever, mate. Oy.

  17. ERRATA by Anonymous Coward · · Score: 0

    the " 4 5 " - please read " 4 less than 5 "
    TY!