Genome Methods Applied to Reverse-Engineering
L1TH10N writes "Wired news has an article on a truely innovative approach to network protocol reverse-engineering. Marshall Beddoe, a security analyst, is using algorithms from bioinformatics to analyse closed-source and secret network protocols, an approach he calls "Protocol Informatics". According to Beddoe, network conversations are full of "junk" -- usually the actual data being sent -- which interferes with the analysis of the occasional command sequence that controls what to do with that junk. This parallels bioinformatics, which has to deal with the similar problem of finding known DNA sequences separated by long gaps of unknown data. Biologists have devised complex algorithms to discover whether DNA sequences are descended from the same ancestors by comparing the genetic differences with the known mutation rates of certain DNA components. Beddoe applied the same principles to the mutating network conversations of evolving network protocols."
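For anyone wondering what "aligning" two network messages might look like, here is a minimal sketch using Needleman-Wunsch global alignment, a textbook bioinformatics algorithm. The summary does not say which algorithms Beddoe's tool actually uses, so treat the choice of algorithm, the scoring values (match, mismatch, gap) and the helper names align and render as assumptions made purely for illustration.

```python
def align(a: bytes, b: bytes, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two byte strings.

    Returns (score, aligned_a, aligned_b); gaps are represented by None.
    The scoring values are illustrative assumptions, not tuned parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    out_a.reverse()
    out_b.reverse()
    return score[n][m], out_a, out_b


def render(row):
    """Show an aligned row as text, with '-' marking a gap."""
    return "".join("-" if x is None else chr(x) for x in row)
```

Calling align(b"GET /a", b"GET /abc") and passing each returned row to render lines the two requests up column by column. Columns that agree across many captured messages are candidates for fixed command fields, while columns that vary or fall into gaps are more likely to be the "junk" payload.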
I guess we are on our way to finding global laws for everything :)
Trolling using another account since 2005.
If we could find a way to apply said algorithms to spam at the gateway level.
If that could be implemented somehow (an attached appliance or something), it could drastically cut the amount of spam that goes through.
Striking fear in the authors of godawful fanfiction, I am here, appearing in darkness, Tuxedo Jack!
reverse-engineering methods applied to genome
Perhaps these techniques can be applied to the never-ending task of creating an accurate converter for MS Word .doc-uments?
Yes, simple document conversion is possible but until 100% accuracy is possible the race is not won.
This post encoded with ROT26. If you can read it, you've violated the DMCA. Handcuffs please, sergeant.
The Human Brain... the most complex and amazing computer ever built. The more we learn about it and how it works the more we can apply to computers. Imagine the computational power of the mind put to something specific.
I don't know what I'm talking about... but it's cool anyway.
"All I can tell the "lesser of two evils" folks is that if they keep voting for evil, they'll keep getting evil."-Lp.org
Microsoft will finally be able to figure out what is happening in their own network protocols!
Of course, this is illegal in the US. No reverse engineering allowed.
My email addy? should be easy enough.
SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
"Junk" in DNA (e.g., "latent" DNA) is probably not junk, we just don't know the function (yet). No scientist worth their salt would admit that (at least not in earshot of a grant proposal review committee!)
Curb CO2 emissions: Kill yourself today!
That'll come as a relief to Beddoe, who until now assumed that biologists wouldn't pay much heed to his project.
"They're working on uncovering the mysteries of life itself; we're just hacking network protocols," he said. "Which sounds more important to you?"
I don't think Beddoe should cheapen the reverse engineering aspects of networking compared to biology. We may still be years away from finding a cure for cancer, AIDS, etc., and there's a good chance that biology work in this area might not be as fruitful. After all (without getting into a religious debate here), man was not created by man, whereas network protocols are. Because of this, it is relatively easier for us to reverse-engineer something that was created by another human, because we know how they think. Evolution or creation, we don't know much about our own building blocks, because we don't know either how God thinks or how the universe fully works.
While his software is great for "hacking network protocols", the biologists paying attention to his work might not find what they are looking for. The inputs very well may be just too vast for his ideas to provide any help.
On the other hand, the Samba team and the SpamAssassin author will most likely enjoy this.
I think that network protocols are not similar to unmapped genome sequences in that network traffic is metadata and data.
Genome sequences are much more consistent. It's all data, processed by RNA computers.
I'd just grep the stream and be done with it.
it's "truly", damn it! TRULY!
Gary Larson has previously documented this phenomenon: http://home.earthlink.net/~grleone/funny/farside/ginger.gif
I always enjoy such articles.... Technology transfer has been the cornerstone of innovation for how long? Companies study other industries in order to bring innovation to tired processes and technologies. It is responsible for many of today's disruptive technological achievements. Was it Southwest Airlines who did formal research on pit crews at Daytona (or something like that)? Regardless, keep up the good work... who knows, the next great step in reverse engineering might come from examining how Vegas tears down their casinos, or is that just what I'm thinking for Windows. "It is a miracle that curiosity survives formal education." --Albert Einstein --j
"If you're flammable and have legs, you are never blocking a fire exit." - Mitch Hedberg
Both at the gateway and the SMTP server, it seems like sifting through junk to find what matters, and determining common ancestry would be useful anti-spam measures.
At least until the spammers figured out how to make spam look so much like certain types of legit email that we started losing good email...
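A rough sketch of the "common ancestry" idea for spam: group messages whose text is nearly identical, so that mutated copies of one campaign land in the same family. This uses difflib.SequenceMatcher from the Python standard library, which is a general string-similarity measure and not a bioinformatics alignment; the 0.8 threshold and the function names are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def group_similar(messages, threshold=0.8):
    """Greedily group messages that resemble the first member of a group.

    The threshold is an arbitrary illustrative value, not a tuned setting.
    """
    groups = []
    for msg in messages:
        for group in groups:
            if similarity(group[0], msg) >= threshold:
                group.append(msg)
                break
        else:
            # No existing group is close enough, so start a new family.
            groups.append([msg])
    return groups
```

Given two copies of the same pitch that differ by one obfuscated character plus an unrelated legitimate message, group_similar puts the two copies in one group and the legitimate message in another. As noted above, this only holds until spam is written to look enough like legitimate mail to cross the threshold in the other direction.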
Prolog configured with huge stacks does the job with very little code actually written. If you are sufficiently patient.
There you are, staring at me again.
Didn't realize the human Genome could be used as a hammer...
Sweet...time to make some money! I'm gonna sue Beddoe for violating the DMCA because he is re-engineering my genes without my authorization!
Seriously, how much would a Big Red Button have cost?
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
You can't reverse engineer the genome: some of the genes are patented! Never mind the prior art in your mom's nuclei, they literally own your ass - you've just got a limited license to use it. When they release the retrovirus with the broadcast flag flipped on, finally every Slashdotter's dream of "baby licenses" will be possible.
--
make install -not war
Also, figuring out biology seems to be a lot harder than figuring out networking; there are all kinds of nefarious things, but also serendipitous things, to be found. For example, at one presentation I just heard, a U.S. scientist announced that they had discovered an entire signalling network in human cells that was like the one found in yeast cells. And apparently more proteins can be encoded than there are genes, because of alternate orderings (counting from different displacements in the gene, I think; ask a real bioinformatics expert). One talk I heard a year ago that stuck with me was by a scientist who had devised a way to find signalling pathways in cells quickly: by forcing the cell to die if certain requirements were not met, he created a parallel computer that allowed him to discover a whole swath at once. There is also a lot of math and statistics, as well as a lot of biological knowledge, behind it. It is not strange to see various statistical tests, references to the different computer programs used for analysis, or a mention of simulated annealing (well, maybe that one not so often, though it came up yesterday).
One interesting thing is that they (the H-Invitational people / Japan Bioinformatics Consortium) have, I believe, twice held what they call annotation jamborees, much like a hackfest! In 2002 they had 120 scientists gather (mostly from Japan, but from all over the world) in a big room with a computer per person. They locked them in for 10 days and annotated, IIRC, over 20,000 genes, basically doing some man-years of work in a week or so, inputting data so it can be searched, analyzed, and cross-referenced.
They do have a comparison between the mouse and human genomes there. I wonder if something similar could be done in open source: annotating and indexing a library of open source code in different languages (really, all in one pseudo-language would perhaps be more useful). Anyway, biologists are learning from computer scientists, who are learning from mathematicians, and someone famous has said that in the future, all science will be computer science.
Bioinformatics people are doing text mining and data mining, but there are also many flavors and types of analysis programs designed to penetrate and match up information as encoded by tiny molecules, folded proteins, genes, and so on. Here are some links to get started. Also note the Perl for bioinformatics books, and there was a big O'Reilly bioinformatics conference archived from 2003, and other links too (see bio.oreilly.org link below).
I cannot speak for everyone, but I can convey what I have heard: there have long been communication gaps, really cultural differences, that have held back some of this. For example, physicists like pure math and biologists deal in dirty, wet things. When people successfully combine different perspectives in this area, [more] discoveries start getting made. In Japan at least, they are trying to figure out how to grow more bioinformaticists, since students tend to go only towards either biology or computer science (why study twice as hard?). But there seems to be a lot of interesting stuff in there for both sides.
PLoS Bio article
some clusty
faq
"Information theory". If you get too many random pages with that, throw "Shannon" in as well.
Cheers - I appreciate the suggestion
I work right in the middle of all that is biology at MIT (Center for Cancer Research, Biology, BioInformatics, Chemistry, Biological Engineering, Brain and Cog, Mathematics, Physics, Computer Science, etc.), and the geeks in each department are aware of the advancements made in other departments and how they can help themselves. In fact, MIT created something called CSBi, the Computational and Systems Biology Initiative (csbi.mit.edu), which has professors and students from all the departments listed above, and more. They collaborate, share students and projects, and organize retreats and conferences. There's even a degree program in systems biology.
The majority of study is computer research applied towards biological methods and models, but I'm sure some of the CS geeks will be reading this article and grabbing the work done by the bio geeks.
And in the end, we will all have the best mouse trap ever.
Do you see the sig? Do you have it in your sights? Why yes, Miss Moneypenny...
Shannon's seminal paper created the field of information theory. It's a surprisingly easy read for such an influential paper.
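As a small taste of what information theory gives you here, the sketch below computes the Shannon entropy of a byte string, H = -sum(p_i * log2(p_i)), in bits per byte. Using it to tell high-entropy "junk" payload apart from repetitive command fields is my own guess at an application, not something taken from the article, and the function name is made up for the example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A field that is the same byte repeated scores 0.0, two equally likely byte values score 1.0, and all 256 byte values appearing equally often scores the maximum of 8.0. Compressed or encrypted payloads tend toward the high end, while text commands usually sit well below it.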
For those "evolving" protocols...
http://www.ietf.org/rfc.html
There's a pdf here on the subject or you could read the google html version here.
Si la vida me da palo, yo la voy a soportar Si la vida me da palo, yo la voy a espabilar
So... I did this with intrusion detection (masquerade detection, actually) about a year and a half ago. Just FYI ...
http://www.acsac.org/2003/beststud.html
I guess he should write a script to create a huge amount of very similar programs, and compile them all to create binary trees. Are there standard methods for analyzing such a data set? Is it just simple multivariate statistics?
Information theory and statistics/probability theory.
"Perhaps these techniques can be applied to the never-ending task of creating an accurate converter for MS Word .doc-uments?"
Or reverse-engineer the Nvidia driver.
Junk DNA acts as a protective buffer against genetic damage and harmful mutations. An overwhelming percentage of DNA is irrelevant to the metabolic and developmental processes, so it is unlikely any single, random insult to the nucleotide sequence will affect the organism.
I read something about this in NewScientist a while ago. Blocks of a certain base (guanine?) either side of important regions of DNA, which are more susceptible to damage (by free radicals?), serve to protect the important code, by being damaged first. Anyway, I thought it was really cool because it's basically analogous to bolting blocks of more easily oxidizable metal onto the hull of a ship, to prevent the hull from corroding. (What is this process called, anyone?)
You should read Gödel, Escher, Bach by Douglas Hofstadter. The ability of DNA to hold both data and metadata about itself is the single most amazing thing about it. It is a code that contains the instructions to build itself, and the source of that which makes us (to some extent at least) understand ourselves - our brain.
DNA demonstrates a programming system that defies Gödel's second incompleteness theorem, which says that no consistent formal system powerful enough to express arithmetic can prove its own consistency.
Definitely worth looking at from a programming perspective, I'd say.
At which point we apply the spammers' techniques to genome research! :D
Wouldn't that be ironic, that spam actually DID provide a cure for cancer or some other disease? And you wouldn't even have to read it or buy anything!
Keith D.
Sounds exciting - applying one science to another - I think this is the basic foundation on which Science builds itself up, isn't it?
Is our genome protected under the DMCA, or is that around the corner? Hope I didn't give them any ideas....
http://tinyurl.com/3t236