Slashdot Mirror


Genome Methods Applied to Reverse-Engineering

L1TH10N writes "Wired News has an article on a truely innovative approach to network protocol reverse-engineering. Marshall Beddoe, a security analyst, is using algorithms from bioinformatics to analyse closed-source and secret network protocols, an approach he calls "Protocol Informatics". According to Beddoe, network conversations are full of "junk" (usually the actual data being sent), which interferes with analysis of the occasional command sequences that control what to do with that junk. This parallels bioinformatics, which has to deal with a similar problem: finding known DNA sequences separated by long gaps of unknown data. Biologists have devised complex algorithms to discover whether DNA sequences are descended from the same ancestors by comparing the genetic differences with the known mutation rates of certain DNA components. Beddoe applied the same principles to the mutating network conversations of evolving network protocols."
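The article doesn't include Beddoe's code, so here is only a rough sketch of the general idea under stated assumptions: Needleman-Wunsch, a standard bioinformatics global-alignment algorithm, applied to two protocol messages instead of two DNA strands. The scoring values (match +1, mismatch -1, gap -1), the function names align() and consensus(), and the sample messages are all hypothetical choices for illustration, not taken from Protocol Informatics. Columns where both messages agree are candidate fixed fields (commands); columns that differ or contain gaps are candidate "junk" (variable data).

```python
# Sketch: Needleman-Wunsch global alignment over bytes instead of DNA bases.
# Scoring values are illustrative assumptions, not Beddoe's parameters.
MATCH, MISMATCH, GAP = 1, -1, -1


def align(a: bytes, b: bytes) -> tuple[str, str, int]:
    """Return one optimal global alignment of a and b, plus its score."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)

    # Trace back from the bottom-right cell to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(chr(a[i - 1]))
            out_b.append(chr(b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(chr(a[i - 1]))
            out_b.append("-")  # gap in b
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(chr(b[j - 1]))
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def consensus(x: str, y: str) -> str:
    """Keep columns where both aligned messages agree; mark the rest '?'."""
    return "".join(c1 if c1 == c2 and c1 != "-" else "?" for c1, c2 in zip(x, y))


x, y, s = align(b"USER alice\r\n", b"USER bob\r\n")
print(repr(consensus(x, y)), s)  # 'USER ?????\r\n' 2
```

Here the shared "USER " prefix and line terminator survive as the consensus, while the username shows up as the variable region. A real tool would need to align many captured messages, not just two, and would likely need different scoring.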

8 of 94 comments

  1. Re:Illegal in the US.. by ZuperDee · · Score: 3, Informative

    Not quite true--it is still allowed for the purpose of ensuring compatibility, IIRC.

  2. true+ly = ? by kamagurka · · Score: 2, Informative

    it's "truly", damn it! TRULY!

  3. Bioinformatics links by mattr · · Score: 4, Informative
    Yesterday wrapped up over a week of intense bioinformatics seminars, poster sessions, exhibitions, and brain-busting study at Bio Japan in Tokyo (related links below). I just saw a presentation on the H-Invitational database, which, although based in Japan, also integrates the content of foreign databases. It is extremely impressive, and it includes lots of online calculators and result visualizers.

    Figuring out biology also seems to be a lot harder than figuring out networking; there are all kinds of nefarious complications, but also serendipitous discoveries. In one presentation I just heard, a U.S. scientist announced that they had discovered an entire signalling network in human cells similar to one found in yeast cells. And apparently more proteins can be encoded than the number of genes, because of alternate orderings (counting from different displacements in the gene, I think, ask a real bioinformatics expert). One talk from a year ago that stuck with me was by a scientist who had devised a way to find signalling pathways in cells quickly: by forcing the cell to die if certain requirements were not met, he created a parallel computer that allowed him to discover a whole swath of pathways at once. There is also a lot of math and statistics, as well as a lot of biological knowledge, behind it; it is not unusual to see various statistical tests, references to the different computer programs used for analysis, or a mention of simulated annealing (well, maybe not that one so often, though it came up yesterday).

    One interesting thing is that they (the H-Invitational people / Japan Bioinformatics Consortium) have, I believe, twice held what they call annotation jamborees, much like a hackfest! In 2002 they gathered 120 scientists (mostly from Japan, but from all over the world) in a big room with a computer per person. They locked them in for 10 days and annotated, IIRC, over 20,000 genes, basically doing what I figure is several man-years of work in a week and a half, entering the data so it can be searched, analyzed, and cross-referenced.

    They do have a comparison between the mouse and human genomes there. I wonder if something similar could be done in open source: annotating and indexing a library of open source code in different languages, though mapping it all into one pseudo-language would perhaps be more useful. Anyway, biologists are learning from computer scientists, who are learning from mathematicians, and someone famous has said that in the future all science will be computer science.

    Bioinformatics people are doing text mining and data mining, but there are also many flavors of analysis programs designed to penetrate and match up information as encoded by tiny molecules, folded proteins, genes, and so on. Here are some links to get started. Also note the Perl for bioinformatics books; there was also a big O'Reilly bioinformatics conference in 2003 that has been archived, along with other links (see bio.oreilly.org).

    I cannot speak for everyone, but I can convey what I have heard: there have long been communication gaps, really cultural differences, that have held back some of this work. For example, physicists like pure math while biologists deal in dirty, wet things; when people successfully combine different perspectives in this area, more discoveries start getting made. In Japan at least, they are trying to figure out how to grow more bioinformaticists, since students tend to go towards either biology or computer science but not both (why study twice as hard?). But there seems to be a lot of interesting stuff in there for both sides.

    PLoS Bio article
    some clusty
    faq

    1. Re:Bioinformatics links by Anonymous Coward · · Score: 5, Informative
      And apparently more proteins can be encoded than the number of genes, because of alternate orderings (counting from different displacements in the gene, I think, ask a real bioinformatics expert).
      Actually, the increase in the number of proteins compared to the number of encoded genes as you move up the "eukaryotic evolutionary chain" is due to organisms finding novel ways to combine the same proteins, not to different displacements within the same gene. See the Nature paper on the draft human genome analysis: Nature. 2001 Feb 15;409(6822):860-921. See also the draft mouse genome analysis: Nature. 2002 Dec 5;420(6915):520-62

  4. Re:Contrasts: Datastreams to DNA by haluness · · Score: 2, Informative

    > "Junk" in DNA (e.g., "latent" DNA) is probably not junk

    Actually, there's an article in this month's SciAm about exactly this. Very interesting.

    http://sciam.com/article.cfm?chanID=sa006&colID=1&articleID=00045BB6-5D49-1150-902F83414B7F4945

  5. Re:Universal principles of information communicati by pjt33 · · Score: 2, Informative

    "Information theory". If you get too many random pages with that, throw "Shannon" in as well.

  6. Re:Universal principles of information communicati by cougartoo · · Score: 2, Informative

    Shannon's seminal paper created the field of information theory; it's a surprisingly easy read for such an influential paper.
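    As a small taste of what that paper defines, here is a sketch of Shannon entropy, H = sum(p * log2(1/p)) over the observed symbols, computed per byte in Python. The function name shannon_entropy() and the sample inputs are illustrative only. The result is in bits per byte: 0 for a constant string, up to 8 when every byte value is equally likely.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, using observed byte frequencies as p."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    # Summing p * log2(1/p) keeps every term non-negative.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(shannon_entropy(b"aaaa"))            # 0.0: one symbol, no uncertainty
print(shannon_entropy(b"abab"))            # 1.0: two equally likely symbols
print(shannon_entropy(bytes(range(256))))  # 8.0: every byte value once
```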

  7. Re:Contrasts: Datastreams to DNA by pfafrich · · Score: 2, Informative
    > "Junk" in DNA (e.g., "latent" DNA) is probably not junk, we just don't know the function (yet). No scientist worth their salt would admit that (at least not in earshot of a grant proposal review committee!)

    From what I've read, there is a case that there is real junk in the DNA: various sequences that served a purpose at some point in the past but whose original function (as with the human appendix) is no longer relevant. I've also read somewhere that some of the DNA is actually a sort of virus which colonised the DNA sequence eons ago.

    From Junk DNA

    There are many theories about the factors that shaped junk DNA and why it persists in the genome. Speculations are that:
    • These chromosomal regions are trash heaps of defunct genes, sometimes known as pseudogenes, which have been cast aside and fragmented during evolution. Evidence for a related hypothesis suggests that the junk represents the accumulated DNA of failed viruses.
    • Junk DNA acts as a protective buffer against genetic damage and harmful mutations. An overwhelming percentage of DNA is irrelevant to the metabolic and developmental processes, so it is unlikely any single, random insult to the nucleotide sequence will affect the organism.
    • Junk DNA provides a reservoir of sequences from which potentially advantageous new genes can emerge.
    • Junk DNA serves the role as "meta-DNA", being involved in the development of an organism from embryo to adult. Recent results indicate that so-called ultraconserved elements of junk DNA are common to all vertebrates, and this could mean that this part of the genome is essential to our survival.
    It may be that a combination of these are true, or partly true.

    The first of these seems to indicate the possibility of real junk.

    --
    There are four sorts of people in the world: fools, lunatics, idiots and morons. - Umberto Eco, Foucault's Pendulum.