Biohackathon
wjv writes: "Open source Bioinformatics hackers from around the world are meeting in the
first ever Biohackathon to hack, eat, hack, sleep, hack... The South African
Business Day has the scoop, or see our weblog. The
event is co-sponsored by my employer
and O'Reilly. I'm typing this from the
hackathon, and you wouldn't believe the buzz... or the scenic venue!"
Interesting venue for a biologically minded event. Many Capetonians will not go to the Oudekraal hotel. When the hotel was developed about 3 years ago, there were large protests against building on that part of the mountain because of its ecological sensitivity, the fact that it is one of the last undeveloped stretches of the coastline, and its proximity to a kramat (the burial place of a Muslim holy man). They also demolished a historic homestead to build the thing...
I am the director of a core molecular biology laboratory with a focus on agricultural genotyping at a major midwestern university. I am happy to see that there is an interest in providing better downstream tools for data analysis.
My main area of concern, however, is the lack of good tools to take the raw data from sequencing machines and produce genotypes. Most of the available software is vendor specific, closed source, not very robust and extremely expensive. The closed source, vendor-specific solutions lock up the data in proprietary databases, making it difficult to migrate to equipment from other vendors in the future and to organize the data across many projects. All the software I have evaluated (including the few open source projects that exist) has the same basic flaw: it starts from the premise that the lab will use it to screen against an existing database of genotypes (for disease or pedigree). These are fine for the medical applications they were developed for, but they do not address the needs of the basic research laboratory, which is working to discover the genotypes to begin with.
I would like to build an open source application that gives the user the freedom to choose the data collection platform, the freedom to move the data from one application to another and the freedom to improve and expand the application itself. I face two challenges: 1) An administration that says, "Open source? Why would we want to use shareware?" I'm addressing this one by building the information infrastructure on Linux. 2) Finding qualified programmers who would like to work on the project. (I'm a pretty good admin, but I am not a programmer.)
The need for this work is great. In talking with other people in my field, I've found that the key thing they want to know is what software we are using to do the raw analysis. No one is satisfied with the current situation, but most of them are old school and don't know anything about open source software. I've shown them that we can use existing open source software to run the lab. I'd like to show them that we can develop our own software to do some of the basic work. Any volunteers?
http://www.extremeprogramming.org
You wouldn't believe the lack of anarchy among these people. They sound young, but there is a lot of personal discipline in that room.
The best product is the one that is tested and evolves with that experience - and this is working code, used in anger by the human genome project.
Hey, check out http://www.ensembl.org and see what you think.
As much as bioinformatics tries to combine biology, computer science, and mathematics (which no one has mentioned yet but which is as important as the other 2 disciplines), they do stay quite separate when it comes to the actual programs written. Imagine a biologist running a bioinformatics lab. He may come up with a problem that computers would be well suited to solving. So, does this biologist write the program himself? No. He tells the computer scientist who either works for him or collaborates with him what he wants, and the programmer programs it. Perhaps he has a mathematician there somewhere too to help out with the algorithms, but in the end he does no 'real' work himself except to come up with the idea.
Computer scientists, as you say, don't really care about the data and, by their training, are not able to think about biological processes with the same expertise as a biologist. Vice versa with the biologist. So, at some point you still need experts in each individual field, as opposed to trying to merge 3 disciplines into one.
I say this as a Ph.D. student in bioinformatics with a BS in biology and a very good computer science background. To be honest, my CS background is of much more use to me than my biology degree, since the biology we work in is specific (and thus easy to learn), as in most bioinformatics laboratories. Many people can write scripts to get the data they need, but a good CS background is the difference between a program running for 3 weeks and one running for 3 hours.
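To make that last point concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the function names, the random "reads" and the sizes are assumptions, not code or data from any real lab. Both functions find the sequences that two lists have in common. The first rescans the second list for every item, so its running time grows with the product of the two list sizes; the second builds a hash-based set once, so it grows with their sum.

```python
import random
import time


def shared_naive(a, b):
    """For each sequence in a, scan the whole list b (quadratic time)."""
    return [seq for seq in a if seq in b]


def shared_hashed(a, b):
    """Build a set from b once, then do constant-time lookups (linear time)."""
    b_set = set(b)
    return [seq for seq in a if seq in b_set]


def random_reads(n, length=20, seed=0):
    """Generate n random DNA strings; purely synthetic test data."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


if __name__ == "__main__":
    a = random_reads(2000, seed=1)
    # Copy a few reads from a into b so that there is something to find.
    b = random_reads(2000, seed=2) + a[:100]

    for fn in (shared_naive, shared_hashed):
        start = time.perf_counter()
        result = fn(a, b)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {len(result)} shared reads in {elapsed:.4f}s")
```

Both versions return the same answer. On a couple of thousand reads the gap is already visible, and it widens quickly as the lists grow, which is roughly how a script that works fine on a test file ends up running for weeks on real data.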