Biohackathon
wjv writes: "Open source Bioinformatics hackers from around the world are meeting in the
first ever Biohackathon to hack, eat, hack, sleep, hack... The South African
Business Day has the scoop, or see our weblog. The
event is co-sponsored by my employer
and O'Reilly. I'm typing this from the
hackathon, and you wouldn't believe the buzz... or the scenic venue!"
An interesting venue for a biologically minded event. Many Capetonians will not go to the Oudekraal hotel. When the hotel was developed about 3 years ago, there were large protests against building on that part of the mountain because of its ecological sensitivity, the fact that it is one of the last undeveloped stretches of the coastline, and its proximity to a kramat (burial place of a Muslim holy man). They also demolished a historic homestead to build the thing...
> Ultimately, the question is whether it is more efficient to teach a computer science student biology or teach programming to a biology student.
... something many genomics folks are only really starting to grok. There are only 30k or so genes, so how do you get human diversity from only 30k inputs? Hint: it's not going to be some simple linear model.
Bzzz. Too narrow. The real question is: how do you find smart, tech-savvy people and turn them on to the questions at hand? The class of people you want are the ones who never let school get in the way of their education. I'd argue that you even want to pull in people whose background is neither bio nor CS: finance, physics or fluid mechanics, for example. Why? Finance folks do some wicked nasty statistics and modelling -- they eat and sleep stochastic calc. Physicists make their livings, fundamentally, by measuring and interpreting complex systems. Fluid mechanics deals with lots of non-linear systems.
I am the director of a core molecular biology laboratory with a focus on agricultural genotyping at a major midwestern university. I am happy to see that there is an interest in providing better downstream tools for data analysis.
My main area of concern, however, is the lack of good tools to take the raw data from sequencing machines and produce genotypes. Most of the software available is vendor-specific, closed source, not very robust and extremely expensive. The closed source, vendor-specific solutions lock up the data in proprietary databases, making it difficult to migrate to equipment from other vendors in the future and to organize the data across many projects. All the software I have evaluated (including the few open source projects that exist) has the same basic flaw: it starts from the premise that the lab will use it to screen against an existing database of genotypes (for disease or pedigree). That is fine for the medical applications it was developed for, but it does not address the needs of the basic research laboratory, which is working to discover the genotypes to begin with.
I would like to build an open source application that gives the user the freedom to choose the data collection platform, the freedom to move the data from one application to another and the freedom to improve and expand the application itself. I face two challenges: 1) An administration that says, "Open source? Why would we want to use shareware?" This one I'm addressing by building the information infrastructure on Linux. 2) Finding qualified programmers who would like to work on the project. (I'm a pretty good admin, but I'm not a programmer.)
The need for this work is great. In talking with other people in my field, I've found that the key thing they want to know is what software you are using to do the raw analysis. No one is satisfied with the current situation, but most of them are old school and don't know anything about open source software. I've shown them that we can use existing open source software to run the lab. I'd like to show them that we can develop our own software to do some of the basic work. Any volunteers?
Hmmm. I fail to see the problem with hypothetically low wages. 50-80k is plenty for me to live on, and more than someone would have to pay me to do work which I actually wanted to do. And aren't people who share a genuine interest in their work better workers than those after a raise?
My plan is to get a master's in bioinformatics, then perhaps look for a 50-80k job. In the meantime, I'll be happy paying for school, assuming I can find a good program, because it's something I want to study.
So, OT question for those who know: where are the best bioinformatics graduate programs? (My particular interest is in proteomics.) And what should I look at when considering schools?
http://www.extremeprogramming.org
You wouldn't believe the lack of anarchy among these people. They sound young, but there is a lot of personal discipline in that room.
The best product is the one that is tested and evolves with that experience - and this is working code, used in anger by the human genome project.
Hey, check out http://www.ensembl.org and see what you think.
As much as bioinformatics tries to combine biology, computer science, and mathematics (which no one has mentioned yet but which is as important as the other 2 disciplines), they do stay quite separate when it comes to the actual programs written. Imagine a biologist running a bioinformatics lab. He may come up with a problem that computers would be well suited to solving. So, does this biologist write the program himself? No. He tells the computer scientist who either works for him or collaborates with him what he wants, and the programmer programs it. Perhaps he has a mathematician there somewhere too to help out with the algorithms, but in the end he does no 'real' work himself except to come up with the idea.
Computer scientists, as you say, don't really care about the data and, given their training, are not able to think about biological processes with the same expertise as a biologist. Vice versa for the biologist. So, at some point you still need experts in each individual field, as opposed to trying to merge 3 disciplines into one.
I say this as a Ph.D. student in bioinformatics with a BS in biology and a very good computer science background. To be honest, my CS background is of much more use to me than my biology degree, since the biology we work on is specific (and thus easy to learn), as in most bioinformatics laboratories. Many people can write scripts to get the data they need, but a good CS background is the difference between a program running for 3 weeks and one running for 3 hours.
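To make the 3 weeks versus 3 hours point a bit more concrete, here is a minimal Python sketch, not anyone's actual lab code. It assumes a toy task (finding where fixed-length queries occur in a reference string), and the names `find_naive`, `build_kmer_index` and `find_indexed` are made up for illustration. The first version rescans the whole reference for every query; the second builds a hash index once and then answers each query with a single lookup.

```python
def find_naive(reference, queries):
    """Rescan the reference for every query.

    Work grows with len(reference) * len(queries).
    """
    hits = {}
    for q in queries:
        hits[q] = [
            i
            for i in range(len(reference) - len(q) + 1)
            if reference[i:i + len(q)] == q
        ]
    return hits


def build_kmer_index(reference, k):
    """Record every start position of every length-k substring, in one pass."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def find_indexed(index, queries):
    """Answer each query with one dictionary lookup.

    Assumes every query has the same length k used to build the index.
    """
    return {q: index.get(q, []) for q in queries}


if __name__ == "__main__":
    reference = "ACGTACGTGA"
    queries = ["ACG", "TGA", "GGG"]
    index = build_kmer_index(reference, 3)
    print(find_naive(reference, queries))
    print(find_indexed(index, queries))
```

Both functions return the same positions; only the amount of repeated work differs, and that gap widens as the reference and the query list grow.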
i guess it depends on what it is you really want to do in the field..._very_ basically, there are two areas in bioinformatics: 1. the programmer who creates (possibly enterprise-level) tools as directed by the needs of the scientists and, 2. the bioinformatics researcher/scientist who also develops tools at need, but also analyzes the data and makes conclusions and uses those conclusions/interpretations to guide wet lab work. and then, the results from the wet lab work come back to the bioinfo scientist who then incorporates the data to refine their ideas or to develop new ones which then go back into the lab. it's a very nice positive feedback loop when it works.
i fall in the latter category, which i like to call "genome hacking." the programming focus is to get the data and process it rather than making a tool that looks pretty, is user friendly, etc.
what i have found most useful in this regard is an extensive background in molecular/cellular biology (i have ~10 years of wet lab experience interspersed with my bioinformatics work (i've been full time bioinfo since '95/'96)). since molecular/cellular biology data is inherently noisy, i find that experience actually working with it and interpreting it has a profound impact on how i do my computational research as not only do i know what the wet lab is capable of doing, but i am also able to analyze wet lab data and make informed decisions based upon it...many times, this noise i spoke of has a story to tell...and sometimes it does not. it is experience that allows one to make the differentiation.
as to the type 1 bioinfo type, i always think that it is a good idea to have a working knowledge of the type of data you're processing--not just the form the data take (ie this is a text file, this is an image, etc.), but rather "this is a DNA sequence that may have errors in it and i need to be aware of that and know the types of errors that can occur so that i can include provisions for that." of course, it's more complicated than that, but i think you get the idea. of course, the best way to learn this is by doing...reading some basic molecular biology texts wouldn't hurt either. ;)
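a rough python sketch of what "including provisions" for sequence errors might look like. everything here is an assumption for illustration only: the names `summarize_read` and `matches_with_errors`, the choice to treat `N` as "base could not be called", and the default of one tolerated mismatch. real pipelines deal with far more than this (insertions, deletions, quality scores).

```python
VALID_BASES = set("ACGT")
AMBIGUOUS_BASE = "N"  # assumption: N marks a position the machine could not call


def summarize_read(seq):
    """Count called, ambiguous and unexpected characters in a read."""
    counts = {"called": 0, "ambiguous": 0, "invalid": 0}
    for base in seq.upper():
        if base in VALID_BASES:
            counts["called"] += 1
        elif base == AMBIGUOUS_BASE:
            counts["ambiguous"] += 1
        else:
            counts["invalid"] += 1
    return counts


def matches_with_errors(read, target, max_mismatches=1):
    """Compare two equal-length sequences, tolerating a few substitutions.

    Positions where either sequence has N are skipped rather than
    counted as mismatches. Insertions and deletions are not handled.
    """
    if len(read) != len(target):
        return False
    mismatches = 0
    for a, b in zip(read.upper(), target.upper()):
        if a == AMBIGUOUS_BASE or b == AMBIGUOUS_BASE:
            continue
        if a != b:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


if __name__ == "__main__":
    print(summarize_read("acgtnnx"))
    print(matches_with_errors("ACGT", "ACGA"))
```

the point is only that the code asks "how noisy is this read?" before trusting it, instead of treating the file as clean text.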
james
If the problem is just a common data issue, why don't you use a common database on the backend? Now I have no idea of the type of application you are dealing with but many programs today allow you to connect to some relational database via ODBC or a native driver. That's where you want your data anyway...sitting happily organized in a relational database, waiting for your next query.
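A minimal sketch of that idea in Python, using the standard library's `sqlite3` module as a stand-in for whatever relational database the lab would actually run. The table name `genotype_calls`, its columns, and the helper names `open_store`, `add_call` and `calls_for_marker` are all hypothetical; a real schema would depend on the instruments and markers involved.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a vendor-neutral store for genotype calls."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS genotype_calls (
            sample_id TEXT NOT NULL,
            marker    TEXT NOT NULL,
            allele_1  TEXT NOT NULL,
            allele_2  TEXT NOT NULL,
            platform  TEXT NOT NULL,  -- which instrument produced the call
            PRIMARY KEY (sample_id, marker)
        )
        """
    )
    return conn


def add_call(conn, sample_id, marker, allele_1, allele_2, platform):
    """Insert one genotype call; the with-block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO genotype_calls VALUES (?, ?, ?, ?, ?)",
            (sample_id, marker, allele_1, allele_2, platform),
        )


def calls_for_marker(conn, marker):
    """Return (sample_id, allele_1, allele_2) rows for one marker."""
    cur = conn.execute(
        "SELECT sample_id, allele_1, allele_2 FROM genotype_calls "
        "WHERE marker = ? ORDER BY sample_id",
        (marker,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_store()
    add_call(conn, "S2", "M1", "A", "G", "vendor_x")
    add_call(conn, "S1", "M1", "A", "A", "vendor_y")
    print(calls_for_marker(conn, "M1"))
```

Because the platform is just another column, calls from different vendors' machines can sit in the same table and be queried together.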
The front-end can be locked and proprietary and you can point it at any database you need. I would be skeptical that the software you use doesn't allow even this (although I know bastard companies like this exist =). It seems trivial to program a frontend that does the number crunching based on queries from a relational database...I would suspect that organizing the data would be the hard part. Maybe I should finish my Perl for Bioinformatics book before I oversimplify. =)
Jayson
Never go to sea with two chronometers; take one or three.