Slashdot Mirror


Machine Learning Reveals Genetic Controls

An anonymous reader writes with this quote from Quanta Magazine: Most genetic research to date has focused on just 1 percent of the genome — the areas that code for proteins. But new research, published today in Science, provides an initial map for the sections of the genome that orchestrate this protein-building process. "It's one thing to have the book — the big question is how you read the book," said Brendan Frey, a computational biologist at the University of Toronto who led the new research (abstract).

For example, researchers can use the model to predict what will happen to a protein when there’s a mistake in part of the regulatory code. Mutations in splicing instructions have already been linked to diseases such as spinal muscular atrophy, a leading cause of infant death, and some forms of colorectal cancer. In the new study, researchers used the trained model to analyze genetic data from people afflicted with some of those diseases. The scientists identified some known mutations linked to these maladies, verifying that the model works. They picked out some new candidate mutations as well, most notably for autism.

One of the benefits of the model, Frey said, is that it wasn’t trained using disease data, so it should work on any disease or trait of interest. The researchers plan to make the system publicly available, which means that scientists will be able to apply it to many more diseases.

3 of 14 comments

  1. cis and mi regulation is not "bad" code by WillAffleckUW · Score: 3, Interesting

    See, the problem is many of you don't get that what you think of as "noise" in the DNA is actually code. Shifted code. The internal mechanisms use cis regulation and miRNA, mRNA, cRNA to adapt to things going on in the environment.

    It's not noise code, or broken code.

    It's designed to do that.

    If anyone had taken assembler and machine coding back in the old days of computing, they'd get it. You only have so much to code with, so you make it do multiple things.

    --
    -- Tigger warning: This post may contain tiggers! --
    1. Re:cis and mi regulation is not "bad" code by rockmuelle · Score: 2

      For small genomes, yes, but for large genomes, there is a lot of "unused" material.

      Only about 6-10% of the human genome is transcribed into RNA, either the protein-coding kind or the non-coding types used in regulation. (Small genomes are almost always entirely coding and even include overlapping coding regions; large genomes are the ones that have "junk" DNA in them.)

      Transcription is most closely related to a processor reading machine code and doing something with it. In a computer program, we know that we can safely remove dead code paths and the code will still function. This is not true for DNA. Remove a portion of someone's genome and they usually die.

      It's much more likely that the "junk"/"noise" regions of the genome are structural and help the DNA conform so the chromosomes can specialize for different functions. DNA folds differently depending on the cell type in multicellular organisms. Because the nucleus of a cell is a fairly crowded place, the way the DNA folds determines which sites on it are even accessible for transcription. Muscle cells expose one set of gene coding regions, fat cells expose another.

      Taken from this perspective, large genomes are more akin to an origami fortune teller than machine code. Depending on the series of folding/unfolding events, a specific fortune is revealed. The fortunes are encoded directly onto the paper, but the paper also forms the structure used to access the fortunes. Another actor reads the instructions and acts on them (a person in the origami case or polymerase for DNA).
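To make the "overlapping coding regions" point from this thread concrete, here is a minimal sketch of how one stretch of DNA can be read in three different frames and spell out three different amino-acid sequences. The `translate` helper and the nine-base example sequence are made up for illustration; only the standard genetic code table is real, and the sketch ignores strands, start codons, and splicing entirely.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna starting at offset `frame` (0, 1, or 2).

    Hypothetical helper for illustration: trailing bases that do not
    fill a whole codon are dropped.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    seq = "ATGGCCTAA"  # made-up example sequence
    for frame in range(3):
        print(frame, translate(seq, frame))
```

Shifting the starting offset by a single base changes every codon downstream, which is how a compact genome can pack more than one product into the same bases.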

  2. Junk DNA by SupraTT+GOP · · Score: 3, Insightful

    Junk seems to be amazingly capable. I seem to be learning of its doing more and more with each passing day. Impressive stuff.