Bioinformatics in the Post-Genomic Era
Bioinformatics is the science of biological information, namely sequences and metadata about organisms and sequences. What makes this field interesting to many people, both inside the sciences and outside them, is the large volume of data that gets analyzed and the results that emerge on a daily basis. Beyond the obvious interest in medical advances and the rapidly growing life sciences business, a complex field has developed over the past ten years or so. And following the sequencing of the human genome, new challenges have arisen for everyone involved. Augen's Bioinformatics provides a good introduction to this new field of research for students in the sciences and for anyone with a decent undergraduate education in modern biology. I think the accessibility of the material is one of the book's biggest winning points.
After an introduction to the book and the subject area of bioinformatics (chapters 1 and 2), Augen begins at the level of the structure of a gene (chapter 3). Here, anyone with an undergraduate level understanding of genetics or molecular biology can begin using the book and bridging the gap to the new areas of modern bioinformatics. Augen then describes how basic sequence analysis is performed at the DNA sequence level (in chapter 4). The material in Bioinformatics covers some of the higher-level methods for sequence analysis, including hidden Markov models, neural networks, and pattern discovery, and introduces some of the common algorithms found to do this analysis.
Chapter 5 then covers transcription, the process of going from DNA to mRNA. Beginning with the biology behind this activity (the ribosome and the larger "transcriptome"), Bioinformatics then describes how you would perform transcriptional analysis. Here, Augen shows how you go from a wet lab to a computational lab and describes what classes of experiments you perform to gather data and then what kinds of analysis you perform on it. This chapter introduces some of the more common clustering techniques for data aggregation and understanding.
The next step in the DNA -> RNA -> protein chain is found in chapter 6, which covers the translation process. Coupled to chapter 7, which describes protein structure prediction and searching, these two chapters bridge the next gap between laboratory data and computational analysis. Protein folding and structure analysis was one of my pet areas of study as a graduate student, and Augen's text does a decent summarization of the field to date. The resources listed and techniques described are definitely on par with the common practices in the field.
Finally, Bioinformatics gets into the next major area of bioinformatics, medical databases. Augen's bridge from genetics to medical science is complete, and he discusses how medical professionals utilize databases and can begin to predict disease, for example, based on data mining. The final chapter, "New Themes in Bioinformatics," covers exactly that, but also what Augen refers to as "workflow computing," or basically going about being a bioinformatics scientist. One of my favorite emerging areas in bioinformatics, metabolic pathway elucidation, is also covered briefly.
I've shared this book with a few friends who are all studying computer science or practicing computer scientists. I did so because Augen's material does a good job of explaining my background and introducing them to some of the analysis forms I introduce into my own work. It does a good job of that, and gets them quite excited. Bioinformatics really bridges a number of fascinating areas of computer sciences, including data mining and high performance algorithms. Augen's Bioinformatics is a good introduction to the field for them, and really anyone who has studied a couple of biology courses in college.
Where the book falls short, however, can be grouped into two main areas. The first is the failure of Augen's presentation of the algorithms. While the methods used to describe computational algorithms in Bioinformatics are common for non-computer scientists, they're completely unusable for computer scientists, who are used to a specific algorithm presentation style that looks more like pseudocode than rambling text. The ambiguities this presents for a technical reader are unfortunate, especially if anyone studying bioinformatics is supposed to be computer science literate. The book itself assumes a life science literacy, so this isn't an unreasonable expectation of the reader.
The second area that consistently falls short in the book is in the utility of the information given. While I am significantly happier with the quality and depth of material presented in Augen's book than in the O'Reilly bioinformatics series, where the book fails to deliver is in showing the reader how to actually use the data they gather. After all, the book shows various sequence analysis algorithms and discusses tools available to do this work, but it only devotes a few pages (out of over 370 in total) to a workflow that can be used. Also, the book fails to point the reader at very worthwhile web resources sometimes, including meta sites like the SDSC Biology Workbench site, and just says "some Perl scripts" for local data analysis. As such, you'll have to go a few extra miles on your own to make use of the data sources.
I guess a third complaint of the book for me is that Augen has ignored or omitted significant bodies of research that fit squarely into the scope of the book. For example, Ken Dill's research into protein folding models, as well as Martin Karplus' work on the subject, receives no mention, nor does the topic of Bayesian network analysis when Augen discusses time series data analysis. These aren't new, they've been around for many years and influenced most of the field, and their absence is noted. The book's spotty coverage in some places, like these, is noticeable.
Bioinformatics does a few things well, but overall reads too much like a biology textbook to be useful to the average computer scientist. More emphasis on the practice of bioinformatics and data analysis would have made this book stronger and complemented the substantive background material well. Finally, using an approach more similar to the computer science approach would have been a tremendous benefit, since the material really is computer science in part. That said, I think this is probably the best introduction to this exciting area of science that I have yet seen.
You can purchase Bioinformatics in the Post-Genomic Era from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Uh, genomics isn't going anywhere.
It's my feeling from working in EE that the dying fields are EE and software; the future is in the hands of the bio guys. So why did you leave? I'd give everything to get rid of my floaters, but don't give two hoots about the latest hardware. I don't think I'm alone in waiting for the sci-fiesque promises of advanced biotech.
Mostly random stuff.
Here is the TOC:
Preface.
1. Introduction. Overview. Computationally Intense Problems: A Central Theme in Modern Biology. Building the Public Infrastructure. The Human Genome's Several Layers of Complexity. Toward Personalized Medicine. Illnesses are Polygenic. New Science, New Infrastructure. The Proactive Future of Information-Based Medicine.
2. Introduction to Bioinformatics. Introduction. The Emergence of Bioinformatics. The Public Database Infrastructure. Building Database Infrastructure for Bioinformatics. Traditional Bioinformatic Tools and Algorithms. Summary.
3. Gene Structure. Introduction. The Central Dogma of Molecular Biology. The Genetic Code. Structure and Content of the Genome. Computational Techniques for the Identification of Genomic Features. High-Throughput Gene Sequencing. Summary.
4. Computational Techniques for Sequence Analysis. Introduction. Hidden Markov Models. Perceptrons and Neural Networks. Pattern Discovery, Single Nucleotide Polymorphisms, and Haplotype Identification. Summary.
5. Transcription. Introduction. The Transcriptome. Technologies for Transcriptional Profiling. Hierarchical Clustering. Summary.
6. Overview of the Proteome and the Protein Translation Process. Introduction. Ribosomal Structure and the Protein Translation Process. Special Features of the Eukaryotic-Translation Process. Summary.
7. Protein Structure Prediction. Introduction. Overview of Ab Initio and Database-Driven Approaches. Overview of Protein Structure. Protein Structure Databases. Ab Initio Structure Prediction. Predicting Lead-Target Interactions. Summary.
8. Medical Informatics and Information-Based Medicine. Introduction. The Continuous Evolution in Understanding that Leads to Genomic Medicine. Electronic Medical Records. Grid Computing and Medical Informatics. Modeling and Predicting Disease. Summary.
9. New Themes in Bioinformatics. Introduction. Overview of Parallel Computing and Workflow Distribution in Bioinformatics. Workflow Computing. High-Performance Computing and Systems Biology. The Delineation of Metabolic Pathways. Systems Biology. Summary.
Further Reading. Index.
fuvoo: watch something
The ambiguities this presents for a technical reader are unfortunate, especially if anyone studying bioinformatics is supposed to be computer science literate. The book itself assumes a life science literacy, so this isn't an unreasonable expectation of the reader.
In bioinformatics, science literacy is so much more important than computer literacy. Computer scientists rarely become good bioinformaticians. This is the primary reason almost every single piece of commercial bioinformatics software is a complete piece of shit. And why the free stuff is hacky but gets the job done. The free stuff was written by life scientists, the commercial stuff was written by computer scientists with no domain knowledge of the question they were trying to answer.
Bioinformatics is not something you 'just get into.' And it is not a natural path to go from CS to bioinformatics.
bioinformatics is more bio than informatics...
---- Where is my mind?
"Post-Genomic Era"? What, is Jon Katz back, and ghostwriting this time?
Note: IAAB (I am a bioinformaticist)
:)
Having been in the field for 5 years or so, and matriculating for my PhD next year, I know something about the subject. Unfortunately, the subject "bioinformatics" is way too broad to ever make for a good book.
For example, applying for PhD programs, I found myself looking at program names such as: Biophysics, Bioinformatics and Integrative Genomics, Biomedical Informatics, Computational and Systems Biology, and of course Bioinformatics. And the terms meant something different to each professor I spoke to, and are still changing over time. Biomedical informatics definitely implies medical databases and EMRs (electronic medical records), while Biophysics implies more of a, well, physical approach (x-ray crystallography, cell movement and membrane forces).
But Bioinformatics and computational biology encompass them all--including other topics such as protein folding, genomics, proteomics, sequence alignment, paper-mining, evolution. Each of these touches on a vastly different aspect of biology and/or computer science and to different degrees. A good book (and plenty long enough for a textbook, I assure you) could be written on any single sub-subject. A book titled bioinformatics isn't going to be worth your while.
My 2 cents and rant. Thanks for bearing with me
I'm a Math major, Comp Sci/Physics minor out of university, and I've been working in computer programming and database administration for the past 9 years, but I have been looking seriously at changing careers and moving into bioinformatics.
Perhaps it's the DB admin that's getting to me, but I've enjoyed being able to work with enormous data sets and putting puzzle pieces together.
It's a big leap. I'm 30. I only have first year chemistry under my belt (no university level biology) and having kids, a mortgage and my own health and sanity to take into account, it seems an enormous career change.
I've started to look into the field by checking out about a couple dozen books on the subject from my university library. (I've since whittled the pile down to just a few books!) I'm plodding along and what I've read to date is really intriguing, even if I'm taking a bizarre Math approach to understanding genetics.
I'm concerned that I have a naive approach to the field: looking at genomics, proteomics and bioinformatics as the biggest and coolest LEGO puzzle ever devised. Yet most books (especially the "Programming for Bioinformatics" types) seem to focus solely on data storage and not actually *using* the data.
Has anyone else here moved from Computing or Mathematics into Bioinformatics? Was the experience what you expected?
Ah yes, here we go:
http://www.chaosmatrix.org/library/humor/pshift.txt
Looks like this guy has a newer version, I don't see a "bioinformatics" option.
Not that this wasn't entirely predictable.
"Bioinformatics : a practical guide to the analysis of genes and proteins"
Had much better sections in the third edition, which I got fresh out of the UW Library when it came in, on PSI-BLAST and BioPerl and suchlike.
The only downside to a textbook in our field is that half the database practical sections become out of date within a year or two.
-- Tigger warning: This post may contain tiggers! --
A kid interested in biology will get taken to the psychologist if he takes the neighborhood squirrel apart. Sadly, it's when you're young that it's the best time to learn by tinkering, so if you do like bio, you'll only get to tinker in your 20s when you hit university and put mice in the blender.
Like the other poster mentioned, the local priest also doesn't care if you take a radio apart and engineer it into a television, let's say. The freaky 12th-century religious fuckwits that dominate the US (it seems) are the biggest problem. (Yet they are the first to demand extraordinary measures for keeping corpses alive.)
Mostly random stuff.
Anything that happened after the agricultural revolution?
It seems a lot of cool stuff never appears in the dead sequence, but in the runtime environment. Functional genomics, with all that gooey chromatin and methylated DNA, is where all the action is, literally.
And structural genomics more than functional genomics.
I spend my day covered in protein sequences and worried about docking configurations and charges, quite frankly, working on drug design targets to help cure malaria and other nasty beasties.
-- Tigger warning: This post may contain tiggers! --
Mike
"Not an actor, but he plays one on TV."
Here is a link to the UM online MSC program...
http://www.bioinf.man.ac.uk/education/MSc.shtml#course
love is just extroverted narcissism
If you're looking for a book about bioinformatics then consider Bioinformatics by Baldi and Brunak.
Keep in mind that many if not most such jobs require a master's degree in the field.
I'm currently finishing such a degree. I'm an engineer with a strong interest in molecular biology and I've taken enough math credits to have a Bachelor's degree in mathematics. I think it's good to have a strong background in mathematics, software development and molecular biology to succeed in this field.
Seems like any other book on bioinformatics - it's either too heavy on biology and not enough on quantitative factors, or too heavy on statistics and barely touches the significance of biology. I'd give my kingdom for a book that is bio enough to explain the importance of genetic markers, binding sites, etc, and how to spot them but CS enough to give pseudo-code for pairwise alignments and HMMs.
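For a sense of what the CS-style presentation of a pairwise alignment could look like, here is a minimal sketch of the Needleman-Wunsch global alignment score in Python. It is not taken from any of the books discussed; the function name and the scoring values (match, mismatch, gap) are illustrative assumptions, and real tools use substitution matrices and affine gap penalties instead.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the Needleman-Wunsch global alignment score of strings a and b.

    The scoring parameters are illustrative defaults, not values from any book.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or open a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Only two rows of the dynamic programming table are kept, which is enough for the score; recovering the alignment itself would need the full table or a traceback structure.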
For those interested in the field, btw, bioinformatics is definitely more toward computational biology/genomics, medical informatics is more EMRs, medical laboratory systems, etc. and (at least IMHO) biomedical informatics encompasses all of the above and others. The days of commercial bioinformatics are over, though. If you're interested in a career in bioinformatics, your best bet is going to be in academia, working with biologists and whatnot. The cash-money nowadays seems more oriented toward medical informatics.
Can we please let the term "Bioinformatics" die already?!
...sigh...
I never understood why people think it's special. We used to call these run-time studies, search algorithms, etc "Computer Science", or maybe just "Informatics".
It seems that biologists decided to learn Perl, and discovered (on their own, maybe!) that you could use it to search these sequence files they generate. Suddenly, they decided they needed to create this entire new field, totally ignoring all of the CS research before them.
It shows in the software they use, too. A huge amount of software that is considered "production" fails in ways you'd expect a freshman CS student to fail.
"Blast" doesn't have consistent return codes!
"cross_match"/"PXM" has no concept of mmap(), and will happily malloc() multi-GB spaces so it can slurp in entire files!
Ok, I'm bitter... Working here, and I see this all the time. It's CS, people! Grrr...
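To illustrate the memory-mapping point, here is a minimal Python sketch that searches a large sequence file through `mmap` instead of reading it all into memory. The function name `count_motif` is hypothetical, and the sketch assumes a non-empty file holding raw sequence bytes with no line breaks inside a motif; it says nothing about how cross_match itself is implemented.

```python
import mmap


def count_motif(path, motif):
    """Count non-overlapping occurrences of motif (bytes) in the file at path.

    The file is memory-mapped, so the OS pages data in on demand rather than
    the program allocating a buffer the size of the whole file.
    Assumption: the file is non-empty (mapping an empty file raises an error).
    """
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(motif)
            while pos != -1:
                count += 1
                # Resume the search just past the previous hit.
                pos = mm.find(motif, pos + len(motif))
            return count
```

A call such as `count_motif("reads.seq", b"ACGT")` (the file name is made up) would scan the file without holding all of it in the process's heap.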
#include [std_disclamer.h]
Ce n'est pas une signature automatique.
Nobody will want to hire us normal humans.
We generally have lots of flaws.
Once made-to-order humans become common, all
of us existing people become obsolete. We'll
be, at best, like chimps or gorillas in the
new world.
Life wouldn't be grand for the new people either,
because then human version 2.1 comes out, etc.
Instead of going for this book, which sounds rather weak, try this syllabus: http://bio5495.wustl.edu/ by Sean Eddy, one of the world's most effective bioinformaticians. You'll learn more. If the biology is incomprehensible, the classic introduction is probably still Watson's Molecular Biology of the Gene, now in its 5th edition: http://www.amazon.com/exec/obidos/tg/detail/-/080534635X
When I started grad school (in biology, but I did computational work on evolution and gene finding) in 1992, we called it "computational biology." I never heard the term "bioinformatics" until the CS people discovered the field after the dot-com bust.
"It's my feeling from working in EE that the dying fields are EE and software; the future is in the hands of the bio guys. "
Type Ontology into a search engine. That will be the next growth field. Making human information more accessible to machines.
"I'm concerned that I have a naive approach to the field: looking at genomics, proteomics and bioinformatics as the biggest and coolest LEGO puzzle ever devised. Yet most books (especially the "Programming for Bioinformatics" types) seem to focus solely on data storage and not actually *using* the data."
http://www.amazon.com/exec/obidos/tg/detail/-/1565926641/qid%3D1112750321/sr%3D11-1/ref%3Dsr_11_1/102-2973245-9165750?v=glance&s=books
The markets are very small, and there are a lot of companies in each market, so they can't actually afford any real programmers to write real algorithms; instead, they take some piece of code written by a grad student, add a fancy but not very usable GUI, and sell the result.
Lots of molecular biologists would say the same thing (perhaps not in the way you meant it). Francis Crick apparently thought genomics was way overhyped.
Seriously though, I sometimes wonder why anyone bothers writing another bioinformatics howto book when Durbin et al (apologies for amazon link) is still unrivalled. Maybe also Felsenstein for phylogeny, MacKay for general probabilistic modeling... anyone recommend anything for the coalescent? Microarrays? Image analysis? I could post book refs for these, but I'm not as fluent in those areas.
[Why are you on /.?] Hmmm. Not sure. Because the converse of my statement is surely: Bioinformatics people have no interest in programming, linux, etc. Actually, it's a little known fact that all bioinformatics is still done with pencil and paper (we can't even use calculators, the Patriot Act forbids it).
...
Shhh. Don't mention that. Next thing you know Congress will outlaw our Sliderules and Pencils.
It's hard enough using Polaroids to take pictures of the gels when we PCR
-- Tigger warning: This post may contain tiggers! --