Slashdot Mirror


Computational Genomics

blamanj writes "Scientists at UC Santa Cruz have been using computational techniques to 'reverse engineer' the DNA of extinct species. David Haussler and colleagues created a hypothetical portion of ancestral mammalian DNA and let a computer model simulate the process of evolution. Then they made their algorithm work backward from these descendants, to see if it could recreate the original ancestor."

34 comments

  1. This will have many applications by orthogonal · · Score: 0, Flamebait

    "Scientists at UC Santa Cruz have been using computational techniques to 'reverse engineer' the DNA of extinct species."

    The scientists added that the Bush Administration's environmental policies will helpfully provide many more extinct species for the new techniques to be tested on.

  2. Reverse enginering by Ender_Stonebender · · Score: 2, Interesting

    Does this seem like "we'll get the original order of a list based on the sorted order and knowing how long the sort algorithm took to run" (in other words, bound to be so wrong as to be useless)?

    Or is it just me?

    --Ender

    --
    Loose things are easy to lose. You're getting your hair cut. They're going there to see their aunt.
    1. Re:Reverse enginering by Smidge204 · · Score: 1

      Almost, but not quite.

      It would be closer to taking several outputs of a semi-random "black box" function which is reasonably well understood, and trying to determine the common input that generated the various outputs.

      Even that's not a terrific analogy, really, but it's a little closer.
      =Smidge=

    2. Re:Reverse enginering by Lenale · · Score: 3, Insightful

      It does sound a bit fishy... I just attended a lecture on DNA-focused biophysics the other day, and they were all about "we won't be able to compute it for years, but..." And by the way, as the article said, we're quite a bit behind the rodents in losing bases... let's make babies :)

    3. Re:Reverse enginering by Daniel+Dvorkin · · Score: 1

      Biophysics isn't quite the right comparison, since biophysicists tend to deal with problems that are computationally hard. (And I mean "hard" in the strict CS-y sense, not in the "I'm having trouble coding this particular function" sense.) Genomics, while there is plenty of core algorithm work to be done, is computationally much easier -- there are well-known polynomial-time sequence comparison and reconstruction algorithms, for instance.

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
    4. Re:Reverse enginering by Anders+Andersson · · Score: 2, Interesting
      And by the way, as the article said, we're quite a bit behind the rodents in losing bases...

      The article explained the difference in mutation rates by referring to the shorter reproduction cycles of rodents. However, as I understand the process of transferring DNA from one generation to the next, mutations may occur whenever a cell splits in two, not only when the animal reproduces. I seem to recall from Sykes' book The Seven Daughters of Eve that the average number of successive cell divisions in the reproductive organs from one human generation to the next is around 20 (or perhaps less). Do the corresponding cells in rodents really divide more often than in humans, just because they reproduce faster?

    5. Re:Reverse enginering by Anders+Andersson · · Score: 3, Insightful

      I would compare it to analyzing languages spoken today to determine how the language they descend from (such as Proto-Indo-European) may once have sounded. While many Indo-European languages are mutually unintelligible today, they share certain fundamental elements that are best explained by them having been present from the start. It's not an exact science, of course.

    6. Re:Reverse enginering by fbjon · · Score: 3, Funny

      Fishy, you say? I immediately thought of Hollywood science.

      "Detective, we have a new computer program that can predict the path of any bullet...."
      "Yeah, so?"
      "So we tried running it backwards, and we just found out where the suspect bought the ammo!"

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    7. Re:Reverse enginering by daymitch · · Score: 2, Interesting

      Think about this for a minute. Rat cells don't have to divide any faster than human cells for rats to have had more cell divisions since our divergence. They reproduce more often, so they have more generations per unit time than we do. The rat's somatic cells are also probably just 20 divisions or so away from the original zygote. Just multiply it out and you have more cell divisions per unit time in the rats.
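
      The "multiply it out" step can be sketched as follows. The figure of 20 divisions per generation is the one quoted in this thread; the generation times (0.5 years for a rat, 25 years for a human) and the time span are illustrative assumptions, not data from the article.

```python
# Figure quoted in the thread; treated here as an unverified assumption.
DIVISIONS_PER_GENERATION = 20

def divisions_per_lineage(years, generation_time_years):
    """Cell divisions accumulated along one lineage over a span of years."""
    generations = years / generation_time_years
    return generations * DIVISIONS_PER_GENERATION

span_years = 1_000_000                             # arbitrary span for comparison
rat = divisions_per_lineage(span_years, 0.5)       # assumed rat generation time
human = divisions_per_lineage(span_years, 25)      # assumed human generation time
print(rat / human)  # 50.0
```

      With the per-generation count held equal, the ratio is simply the ratio of generation times, which is the point being made above.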

    8. Re:Reverse enginering by Anonymous Coward · · Score: 0

      Hey, I just saw a CSI episode where they did the exact same thing.

      And then Grissom said something witty.

    9. Re:Reverse enginering by Anonymous Coward · · Score: 0
      Does this seem like "we'll get the original order of a list based on the sorted order and knowing how long the sort algorithm took to run" (in other words, bound to be so wrong as to be useless)?

      Details are lacking, but my guess as a biologist is that things are better than you may think.

      A more accurate comparison is "We'll be able to get a noiseless photo based on a large number of noisy images and a knowledge of how the noise was added."

      The trick is that you start with a number of DNA sequences which are known to be derived from the same source. Over time, each gets mutated, but (here's the key part) each in different locations. With enough genomes, it's likely that two or more will be identical in some positions. If it's statistically unlikely that those two parts became identical by random chance, you can conclude they probably derive from the common origin. By teasing out the portions that are presumed to be from the original sequence, you get a good idea of what the original looks like.

      I'm sure that their method is more complex and involves things like analyzing the expected frequency of mutations, along with other analysis, but the basic premise is sound.
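
      A minimal sketch of that basic premise, assuming the descendant sequences are already aligned and equally long: take the most common base at each position. This is only a per-position majority vote for illustration. It ignores the family tree, insertions and deletions, and mutation rates, and is not the method the UC Santa Cruz group describes. The function name and the toy sequences are made up.

```python
from collections import Counter

def consensus(descendants):
    """Return the per-position majority base across aligned sequences."""
    if not descendants:
        return ""
    length = len(descendants[0])
    if any(len(seq) != length for seq in descendants):
        raise ValueError("sequences must be aligned to the same length")
    # For each column of the alignment, keep the base that occurs most often.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*descendants)
    )

# Toy descendants of the made-up ancestor "ACGTACGA",
# each mutated at a different position (or not at all).
descendants = [
    "ACGTTCGA",  # position 4 changed
    "ACGTACGA",  # unchanged
    "ACCTACGA",  # position 2 changed
    "ACGTACTA",  # position 6 changed
]
print(consensus(descendants))  # ACGTACGA
```

      Because each copy is damaged in a different place, the undamaged copies outvote the damaged one at every position, which is the noisy-photo idea in miniature.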

    10. Re:Reverse enginering by Mattcelt · · Score: 1

      mutations may occur whenever a cell splits in two, not only when the animal reproduces

      This is true, but don't forget that most of these mutations are completely lost right away; only those that are passed on to future generations by reproduction are able to persist.

      A mutation which starts in the brain but doesn't make its way to the reproductive organs prior to procreation doesn't get passed on to any future generation.

  3. 98 percent? by kureido · · Score: 3, Interesting

    From the article: "Then they made their algorithm work backward from these descendants, to see if it could recreate the original ancestor. The ancestor the algorithm came up with had a sequence that was 98% accurate..."

    Human and chimpanzee DNA are about 98% similar, too. In that context, 98% similarity doesn't seem that impressive. Maybe someone needs to invent a new benchmark for sequence comparison for species that are already similar?

    1. Re:98 percent? by Anders+Andersson · · Score: 2, Insightful

      Since the accuracy with which the artificial genome was recreated in the simulation isn't compared with that of other methods for doing the same thing, the 98% figure doesn't tell us much. For all I know, that could be the accuracy you would get using any method (but I suppose the scientists actually have more simulation data than was presented in the article).

      Likewise, comparing that number to the degree of genome similarity between humans and chimps isn't very meaningful either. Since the article doesn't mention chimpanzees but rather rats and pigs, I suppose the research is focused on longer periods of evolution than the few million years that have passed since the split between humans and chimps.

      By the way, is the 98% similarity in relation to all human DNA, or merely to the part of the genome that is identical among all humans? I don't know how much of a difference that makes, but I believe there is a difference.

    2. Re:98 percent? by Jormundgandr · · Score: 1

      Forget the 98% figure. It honestly doesn't mean anything at all. The tests are very accurate, but their results are meaningless. They amount to comparing two strings of binary written with a language and compiler that were lost 3 billion years ago, for a computer we don't understand. Measuring how every base pair matches up to every other base pair is useless. There are pieces of DNA that are far more important than others, like the ones that turn other genes on and off, or the ones that code for important enzymes.

      The figure for how many identical base pairs we share doesn't mean much because we don't understand most of what the DNA is doing. A switch controlling production of an important chimp hormone could be switched off in humans, turning off the thousands of base pairs involved in creating that hormone. Read by this type of analysis, however, that system would read as only 1% different, rather than 100% different, because the base pairs stay on the chromosome even though they don't do anything.

      Also, chimps don't even have the same number of chromosomes we do, so comparing the DNA strands in their 48 chromosomes to those in our 46 is to ignore significant changes in the overall organization of genes.

      So please, stop asking for clarification on the percentage, because it just doesn't matter. All it tells us is that we are closely related, but not how close, or in what way.

      --
      -sig removed for tax purposes-
  4. Hmm.. by LGEKoji · · Score: 1

    Apply this to birds and we'll get the gene sequence for raptors and other dinos..

    Who's up for Jurassic Park? Anyone?

  5. Wow...cool name.... by cephyn · · Score: 1

    Wasn't "Computational Genomics" in the Sid Meier's Alpha Centauri Tech Tree?

    --
    Moo.
  6. How does this compare to Bayesian analysis? by poincaraux · · Score: 1

    I read the Nature summary, but no real articles .. anyone know how they do the "working backwards" thing? I would've guessed some sort of Bayesian analysis, just like most people use to come up with phylogenetic trees, but it sounds like there's something more interesting going on here.

  7. Jurassic Park by Anders+Andersson · · Score: 2, Interesting

    I don't know how well understood the lineage from dinosaurs to modern birds is, but I suspect you would need the genomes from a few species that are not descended from dinosaurs (say, mammals) as well, for interpolation rather than extrapolation of the dinosaur genome.

    Even if we could recreate dinosaur DNA in this way, I doubt we have the technology to turn that DNA into a live animal, or even do a computer simulation of that process. Is anybody working on an open-source biochemical simulator?

  8. Algorithm testing. by Fortran+IV · · Score: 3, Interesting
    The process is interesting, but their description of how they tested their algorithm is less than confidence-inspiring.
      1) Manually create a set of hypothetical data.
      2) Run a mathematical algorithm to generate new data.
      3) Run the converse of the algorithm on the generated data.
    If an algorithm is truly reversible then, without the necessary randomization, such a process is likely to generate the original data with 100% accuracy. I'd have felt much better if they'd run two independent algorithms against each other: create descendants with ForwardA() and extract ancestors with BackwardB(), then do the same thing with ForwardB() and BackwardA().
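
    A rough sketch of such a cross-test harness, under loud assumptions: forward_a and forward_b are two made-up, independent mutation simulators, backward_vote is a made-up majority-vote reconstructor, and none of them correspond to anything in the article. The point is only that the reconstructor never sees how the simulator works.

```python
import random
from collections import Counter

BASES = "ACGT"
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

def forward_a(ancestor, rng, rate=0.05):
    """Hypothetical simulator A: a mutated site becomes any other base."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in ancestor
    )

def forward_b(ancestor, rng, rate=0.05):
    """Hypothetical simulator B: a mutated site undergoes a transition only."""
    return "".join(
        TRANSITION[base] if rng.random() < rate else base for base in ancestor
    )

def backward_vote(descendants):
    """Hypothetical reconstructor: per-site majority vote over descendants."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*descendants))

def accuracy(forward, backward, length=1000, n_descendants=7, seed=0):
    """Fraction of ancestral sites recovered by `backward` from `forward` output."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    descendants = [forward(ancestor, rng) for _ in range(n_descendants)]
    guess = backward(descendants)
    return sum(a == b for a, b in zip(ancestor, guess)) / length

# Score the same reconstructor against two independent simulators.
for name, forward in [("A", forward_a), ("B", forward_b)]:
    print(name, accuracy(forward, backward_vote))
```

    A second reconstructor could be passed in the same way to fill out the full A/B grid.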
    --
    I figure by 2030 or so my 6-digit UID will be something to brag about.
    1. Re:Algorithm testing. by Anders+Andersson · · Score: 2

      I agree that the way this is expressed in the article leads to your interpretation:

      To assess their method, they created a hypothetical portion of ancestral mammalian DNA and let a computer model simulate the process of evolution, to generate sequences for its descendants.

      Then they made their algorithm work backward from these descendants, to see if it could recreate the original ancestor.

      However, I seriously doubt they actually reversed the simulation algorithm. Reading the entire article, it sounds more as if the algorithms for reverse engineering DNA have been under development for a long time, and that they wrote a separate simulation program to produce test data for evaluation of the newest version of the algorithm.

      A geneticist isn't necessarily a good computer scientist, and mistakes do happen in science, but somehow I doubt a mistake like that would slip by reviewers unnoticed. Maybe the algorithms are described in more detail in the associated papers (linked from the article).

  9. Planet seeding by Associate · · Score: 3, Interesting

    Once we get things like this under control along with terraforming, we can seed barren planets. We can walk the universe like gods. Probably have to kick the old ones out first.

    --
    Someone hates these cans.
  10. And next thing you know... by nystagman · · Score: 1

    ...Barclay's turned into a spider.

    --
    Theory and practice are the same in theory, but different in practice.
  11. Of mice and men by Anders+Andersson · · Score: 1

    I believe the answer is in line with your explanation, but I still can't really visualize the process well enough to understand it. If the female rat is one year old and the human woman is 30, how come their respective egg cells are both 20 "cell generations" younger than those of their mothers?

    While the entire rat population will experience a higher number of cell divisions (and thus a proportionally higher number of mutations) per unit time due to its size, those mutations will normally end up in different lineages rather than accumulate in the same lineage, and thus not contribute to a higher number of mutations in the same individual...

    Ah! I think I get it now. Sykes was discussing mtDNA evolution, where lineages split, but never merge (as mitochondrial DNA is inherited along the maternal line only). This is of course different from nuclear DNA, which is combined from the DNA of two parents, thereby allowing mutations from both to merge in the same individual. The mutation rate is the same, but mtDNA mutations are more easily lost due to some mothers having sons only, no daughters. While any single nuclear mutation from either parent runs a 50% risk of being eliminated in a child, this is made up for by one couple having more than two children, whether male or female. This effect is way more noticeable with rats and their explosive reproduction.

    It's late here, but I hope I didn't mess that reasoning up completely and there is still a grain of truth to it... :-)

    1. Re:Of mice and men by Anonymous Coward · · Score: 0
      If the female rat is one year old and the human woman is 30, how come their respective egg cells are both 20 "cell generations" younger than those of their mothers?

      You're probably forgetting differing cell division rates. It's "well known" that female mammals are born with all the eggs they'll ever have in their lifetime. So while still in the womb, both rat and human have 20 divisions from egg to egg. Then they're born, and the cells don't split again - be it for one year or thirty.

    2. Re:Of mice and men by Anders+Andersson · · Score: 1

      Thanks, I wasn't aware of that fact (possibly forgotten, more likely never learned).

    3. Re:Of mice and men by Mattcelt · · Score: 1

      There has been some recent research to contradict this. Our cells seem a little more willing to replace themselves than we thought. First neuroplasticity, now this... hmmm.

    4. Re:Of mice and men by Anders+Andersson · · Score: 1

      That's an interesting find, but it doesn't invalidate the argument that most egg cells have already been created when the female mammal is born, and thus the average cell division rate would still be close to 20 divisions per generation (don't take that exact number for granted; I just seem to remember it off the top of my head). Even if a 45-year-old woman produces an egg cell that is twice as many cell divisions removed from her own conception (say, 40) as the egg cells produced during her teenage years, she is far behind a rat, which is already reproductive before one year of age.

      Interestingly, this constant "20" appears to put an upper limit to the theoretical number of offspring a single mammal could produce, namely around a million if no cells are wasted on non-reproductive organs... :-)
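
      The "around a million" figure follows from repeated doubling, assuming every division yields two cells and all of them end up as egg cells (an idealization, as the comment itself admits). The helper name is made up for illustration.

```python
def max_cells(divisions):
    """Cells produced from one cell after the given number of doublings."""
    return 2 ** divisions

print(max_cells(20))  # 1048576
```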

  12. Whoops, should've posted here by Jormundgandr · · Score: 1

    Forget the 98% figure. It honestly doesn't mean anything at all. The tests are very accurate, but their results are meaningless. They amount to comparing two strings of binary written with a language and compiler that were lost 3 billion years ago, for a computer we don't understand. Measuring how every base pair matches up to every other base pair is useless. There are pieces of DNA that are far more important than others, like the ones that turn other genes on and off, or the ones that code for important enzymes.

    The figure for how many identical base pairs we share doesn't mean much because we don't understand most of what the DNA is doing. A switch controlling production of an important chimp hormone could be switched off in humans, turning off the thousands of base pairs involved in creating that hormone. Read by this type of analysis, however, that system would read as only 1% different, rather than 100% different, because the base pairs stay on the chromosome even though they don't do anything.

    Also, chimps don't even have the same number of chromosomes we do, so comparing the DNA strands in their 48 chromosomes to those in our 46 is to ignore significant changes in the overall organization of genes.

    So please, stop asking for clarification on the percentage, because it just doesn't matter. All it tells us is that we are closely related, but not how close, or in what way.

    --
    -sig removed for tax purposes-
    1. Re:Whoops, should've posted here by kureido · · Score: 1

      So please, stop asking for clarification on the percentage, because it just doesn't matter. All it tells us is that we are closely related, but not how close, or in what way.

      That was exactly my point. Why bother to run expensive computational genetics experiments if the result comes out as "Well, we know it's close, but not how close, or in what way"? The scientists might as well give a picture of an animal to a sketch artist and say, "Draw this, except more primordial."

  13. Accuracy of reverse engineering by Anders+Andersson · · Score: 1

    The percentage that doesn't matter is the similarity between human and chimp DNA, not the accuracy with which the artificial sequence was reconstructed after simulated mutations. While it's true that there is little if any correlation between DNA sequence similarity and similarity of the resultant physiologies, the simulation was only concerned with the DNA sequences themselves, not their manifestations as living creatures.

    For all we know, the initial DNA sequence used for the simulation may have been entirely random, not related to the DNA of any organism alive today or in the past. The purpose of the experiment was to determine whether the reverse-engineering algorithm would be able to "undo" the simulated mutations, which it managed to do with 98% accuracy. This is a strictly quantitative measure, unrelated to the biochemical results of placing those DNA sequences in live cells.

    One thing I doubt the simulated evolution can have taken into account is natural selection due to lethal mutations. Well, the simulation may have considered some percentage of mutations lethal and dropped them, but it could hardly have predicted which parts of the artificial DNA sequence would kill the embryo if they were mutated. Thus the resulting sequences were obviously even more artificial than the sequence they started with, but I doubt this mattered to the reverse-engineering algorithm. The scientists are merely comparing blueprints with each other, not the houses built from said blueprints. If you are interested in blueprint evolution, it doesn't matter what the houses look like or how similar they are to each other (assuming those blueprints are based on other blueprints only, not on actual houses previously built).

  14. Mutation rates by Anders+Andersson · · Score: 1
    A mutation which starts in the brain but doesn't make its way to the reproductive organs prior to procreation doesn't get passed on to any future generation.

    Of course, but this doesn't affect the mutation rate, and thus won't explain the differing mutation rates between rats and humans. A mutation in a rat's brain cell is no more likely to make it to the reproductive organs than a mutation in a human brain cell, in spite of the rat being a lot smaller (it supposedly has fewer of all kinds of cells, not smaller cells).

    Instead, the number of mutations passed on to future generations relates to the number of times the DNA helix is split and copied between two successive conceptions, ignoring what happens in non-reproductive organs of the body. As others have explained, that number appears to be constant with mammals, resulting in faster mutation rates for species with shorter generation spans, although this theory is probably not cast in stone yet.