Bioinformatics Books for the Technically Inclined?
bookEnders asks: "I hold a BS and MS in Biology. For the past 6 years, I have worked as a computer programmer, but not in the field of Biology. I have an upcoming interview (several weeks from now) for a Bioinformatics programmer position. It appears to be a great job for me - a marriage of University training and professional experience. As LISP is a requirement, I have been burrowing through David Lamkins's Successful LISP tutorial. However, I am having trouble finding Bioinformatics books that are geared toward my skills: most are written for Biologists who don't know Linux or PERL. Others are written for Computer Scientists who don't know squat about Biology. I know enough about both that neither set of these books is very valuable. Can someone (hopefully those in the field) suggest reference or tutorial materials to help me prepare for this interview?"
If there aren't any books on the subject, I seriously doubt very many people know as much as you. I seriously doubt the people interviewing you will know more. So why worry? You're probably exactly what they are looking for, and anything that you do need to know, I'm sure they will teach you on the job. It's the general knowledge and ability to learn that companies look for; they prefer it if they can teach you something.
I pose the question: how well can you really know Perl if you call it PERL?
If you type perldoc perlfaq1, you'll see, among other things:
What's the difference between "perl" and "Perl"? :-) Larry now uses "Perl" to signify the language proper and "perl" the implementation of it, i.e. the current interpreter. Hence Tom's quip that "Nothing but perl can parse Perl." You may or may not choose to follow this usage. For example, parallelism means "awk and perl" and "Python and Perl" look OK, while "awk and Perl" and "Python and perl" do not. But never write "PERL", because perl isn't really an acronym, apocryphal folklore and post-facto expansions notwithstanding.
One bit. Oh, you weren't talking ASCII?
Get busy living or get busy dying. Carpe diem.
What area of biology does the job involve? With that, people could give you more specific pointers. Failing that, I'd suggest going to some web sites -- NCBI, ensembl.org, genome.ucsc.edu -- and looking at what's around. (Of course, my list is biased towards sequence-based genomics. If the job you're eyeing is in proteomics or arrays or some other functional genomics, it won't help as much, which is why it would be useful to know more about the job.)
http://www.biolisp.org has a lot of information about Lisp and bioinformatics on their site...resources and code that you can play with. Franz Inc. also has a free "Basic Lisp Techniques" book that can be downloaded, and BioDB-Loader, a toolkit created by Peter Karp of SRI for loading and querying databases.
See http://www.cs.ucr.edu/~stelo/pattern.html#Resources
under "Books". I agree that there is no book that covers 100% of Bioinformatics, but a subset of these will definitely do.
I particularly like the book by Gusfield for the algorithms.
Regarding Perl, you are probably aware of a new book from O'Reilly, "Beginning Perl for Bioinformatics":
http://www.oreilly.com/catalog/begperlbio/
Regards,
Stefano
There's a new book by O'Reilly about bioinformatics - "Developing Bioinformatics Computer Skills" - see http://www.oreilly.de/catalog/bioskills/
If you are reading Lamkins's "Successful Lisp", I want to point out a few other resources that you might find valuable.
First, there is "Structure and Interpretation of Computer Programs", which is, in the opinion of many, the best introductory computer science book ever written. A pleasant "side-effect" of this book is that the reader comes away with a really good understanding of how Scheme in particular (a close Lisp dialect) works, in the context of general programming language concepts. The full text is available online at the MIT Press website.
Paul Graham's "On Lisp: Advanced Techniques for Common Lisp" also deserves special mention. It is out of print (check out your local university library or try on-line used book shops), but definitely worth the read if you are really going to delve into some serious Lisp programming.
I also recently stumbled across a little book by Gary Knott called "Interpreting Lisp" (which can be downloaded off the web ... visit www.lisp.org's bibliography). This book culminates in the C code for a basic Lisp interpreter, which is thoughtfully discussed and thoroughly examined in the preceding chapters. Studying a well-documented implementation has been very helpful.
There are probably many other good Lisp references out there; I just wanted to share the ones that have been particularly useful for me (two of which are freely available).
Sorry I can't help you more on the biology side of things. I am also very interested in bioinformatics. I recently saw a research-level text on the mathematical problems arising out of genome sequencing that looked very interesting, called "Algorithms on Strings, Trees, and Sequences" by Gusfield.
My understanding is that computational research in genome sequencing techniques has crested, and that the more important (and more difficult) problems now lie in predicting structure. For the latter, check out the lecture notes of Bonnie Berger at MIT, which are freely available on her webpage. She also has links to other papers and conference proceedings in this area that people interested in folding might find useful.
Good luck with your interview!
"Biological Sequence Analysis" by Durbin et al. (Cambridge) is a good bet. It's mostly about the central algorithms (Smith-Waterman, Baum-Welch, etc.) -- as a LISP wonk, you'll be able to implement them efficiently.
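For readers who have not met Smith-Waterman, here is a minimal sketch of the local alignment score in Python (chosen only for brevity; nothing in this thread uses it). The function name and the scoring values (match, mismatch, and a linear gap penalty) are illustrative assumptions, not taken from any of the books mentioned, and real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions; gaps use a simple
    linear penalty rather than the affine penalties used in practice.
    """
    # prev is the previous row of the dynamic programming table.
    # Row 0 and column 0 are all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere,
            # which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed scores, two identical four-letter sequences score 8, and two sequences with no letters in common score 0. Keeping only two rows of the table is enough for the score; recovering the alignment itself would need the full table plus a traceback.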
O'Reilly seems to be getting into bioinformatics pretty heavily and while I don't think any of their more advanced books will be out in time for you, a search for 'bioinformatics' on their website turns up a rather large list of hits.
:)
I agree with the other poster who said you're probably perfect for the job as it is. Sounds like you've got good qualifications, and in my opinion anybody who gets a degree in biology can learn anything they put their mind to.
I sat in on a Bioinformatics course last spring. We used texts by Pevzner and Gusfield. I would recommend looking at Gusfield. It's definitely from the CS side, but that's probably more appropriate anyway, since you said the job required you to know Lisp, not how to run gels. Pevzner tries to straddle both sides, but doesn't always succeed. I would also second the Lisp in Biology site. P.
Ontologies for Ethology.... PEM
There are hardly any trained professionals in the field. These guys probably have something similar to your background. Just promise to be flexible and to learn whatever language/skills the job requires.
I got into the biz three years ago (as a biologist) and I'm still learning new languages.
Mostly C and XML transformation stuff, 'cause I honestly believe Perl & Lisp are not up to the challenge of massive amounts of data.
IANAL, but imagine a beowulf cluster of in Soviet Russia all your belong are base to us welcoming the new SCO overlords.
I highly recommend "Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins" by Baxevanis and Ouellette (Wiley-Interscience); "Bioinformatics: Methods and Protocols" by Misener and Krawetz (Humana Press); and "Bioinformatics: The Machine Learning Approach" by Baldi and Brunak (MIT Press).
There are three ebooks on the subject at www.mightywords.com
Data Analysis for Bioinformatics : Part 1: Probability, Statistics, Information Theory, Clusters
by Arun Jagota $10.00 Publisher: Arun Jagota Pub Date: 05/09/00
Data Classification for Bioinformatics : Supervised Methods
by Arun Jagota $10.00 Publisher: Arun Jagota Pub Date: 07/14/00
Perl for Bioinformatics
by Arun Jagota $9.95 Publisher: MightyWords Inc. Pub Date: 03/15/01
For my entire undergrad career, while I was a double major in Math and Bio, people kept asking me wtf I was doing. Now that these stupid idiots know what bioinformatics is too, I have to compete with them for jobs that they are not qualified for. Here are my demands:
1.) An O'Reilly book deal. I know a *lot* of programming and UNIX system administration. A lot more than the crack users who they got to write these dreadful bioinformatics books.
2.) A lot of money. Especially in a position where I get to boss around straight-up bio geeks. My first order of business will be banning the reading of slashdot at work.