Slashdot Mirror


Bioinformatics in the Post-Genomic Era

nazarijo (Jose Nazario) writes "As a biochemist by training, Jeff Augen's Bioinformatics in the Post-Genomic Era was very interesting to me. Though I left the field some years ago, I was using the bioinformatics tools that are covered in the book daily and still look in from time to time. Naturally I was curious to see a larger perspective, as well as any progressions, that have occurred in the past few years. Augen's book gave me part of the larger picture, but it could have done more." Read on for the rest of Nazario's review.

Bioinformatics in the Post-Genomic Era
author: Jeff Augen
pages: 388
publisher: Addison-Wesley Longman
rating: 7
reviewer: Jose Nazario
ISBN: 0321173864
summary: Genome, Transcriptome, Proteome, and Information-Based Medicine

Bioinformatics is the science of biological information, namely sequences and metadata about organisms and sequences. What's interesting about this field to many people, both inside the sciences and outside of them, is the large volume of data that gets analyzed and the results that emerge on a daily basis. Beyond the obvious interest in medical advances and the rapidly growing business in the life sciences, a complex field has developed in the past ten years or so. And following the sequencing of the human genome, new challenges have arisen for everyone involved. Augen's Bioinformatics provides a good introduction to this new field of research for students in the sciences, and anyone with a decent undergraduate education in modern biology. I think that this accessibility of the material is one of the book's biggest winning points.

After an introduction to the book and the subject area of bioinformatics (chapters 1 and 2), Augen begins at the level of the structure of a gene (chapter 3). Here, anyone with an undergraduate level understanding of genetics or molecular biology can begin using the book and bridging the gap to the new areas of modern bioinformatics. Augen then describes how basic sequence analysis is performed at the DNA sequence level (in chapter 4). The material in Bioinformatics covers some of the higher-level methods for sequence analysis, including hidden Markov models, neural networks, and pattern discovery, and introduces some of the common algorithms found to do this analysis.

Chapter 5 then covers transcription, the process of going from DNA to mRNA. Beginning with the biology behind this activity (the ribosome and the larger "transcriptome"), Bioinformatics then describes how you would perform transcriptional analysis. Here, Augen shows how you go from a wet lab to a computational lab and describes what classes of experiments you perform to gather data and then what kinds of analysis you perform on it. This chapter introduces some of the more common clustering techniques for data aggregation and understanding.
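
To give a sense of the kind of clustering this chapter introduces, here is a minimal Python sketch of agglomerative (bottom-up) hierarchical clustering with average linkage over expression profiles. It is not taken from Augen's book: the gene names, the expression values, and the function names `average_linkage` and `hierarchical_clustering` are all made up for illustration, and Euclidean distance is an assumed choice of metric.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise Euclidean distance between the members of two clusters."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b]
    return sum(dist(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)


def hierarchical_clustering(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Start with every gene in its own cluster.
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], profiles),
        )
        # combinations() guarantees a < b, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical expression profiles: one value per experimental condition.
example_profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [3.0, 1.0, 0.5],
    "geneD": [2.9, 1.2, 0.4],
}
example_clusters = hierarchical_clustering(example_profiles, 2)
```

This brute-force version recomputes every linkage on each pass, which is fine for a handful of genes but far too slow for real microarray data; it is meant only to make the idea concrete.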

The next step in the DNA -> RNA -> protein chain is found in chapter 6, which covers the translation process. Coupled to chapter 7, which describes protein structure prediction and searching, these two chapters bridge the next gap between laboratory data and computational analysis. Protein folding and structure analysis was one of my pet areas of study as a graduate student, and Augen's text does a decent summarization of the field to date. The resources listed and techniques described are definitely on par with the common practices in the field.
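
The DNA -> RNA -> protein chain itself is easy to sketch in code. The Python below is an illustration only, not something from the book: the function names `transcribe` and `translate` and the example sequence are assumptions, the input is assumed to be the coding (sense) strand, and only the standard genetic code is handled.

```python
from itertools import product

# Standard genetic code, built compactly: codons are enumerated in U, C, A, G
# order and paired with one-letter amino acid codes ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: amino_acid
    for (a, b, c), amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError on non-ACGU input
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example sequence.
example_mrna = transcribe("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
example_protein = translate(example_mrna)
```

Real genes are messier than this (introns, alternative start sites, non-standard codes in some organisms), so treat it as a picture of the idea rather than a tool.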

Finally, Bioinformatics gets into the next major area of bioinformatics, medical databases. Augen's bridge from genetics to medical science is complete, and he discusses how medical professionals utilize databases and can begin to predict disease, for example, based on data mining. The final chapter, "New Themes in Bioinformatics," covers exactly that, but also what Augen refers to as "workflow computing," or basically going about being a bioinformatics scientist. One of my favorite emerging areas in bioinformatics, metabolic pathway elucidation, is also covered briefly.

I've shared this book with a few friends who are all either studying computer science or practicing computer scientists. I did so because Augen's material does a good job of explaining my background and introducing them to some of the forms of analysis I bring into my own work, and it gets them quite excited. Bioinformatics really bridges a number of fascinating areas of computer science, including data mining and high performance algorithms. Augen's Bioinformatics is a good introduction to the field for them, and really for anyone who has taken a couple of biology courses in college.

Where the book falls short, however, can be grouped into two main areas. The first is the failure of Augen's presentation of the algorithms. While the methods used to describe computational algorithms in Bioinformatics are common for non-computer scientists, they're completely unusable for computer scientists, who are used to a specific algorithm presentation style that looks more like pseudocode than rambling text. The ambiguities this presents for a technical reader are unfortunate, especially if anyone studying bioinformatics is supposed to be computer science literate. The book itself assumes a life science literacy, so this isn't an unreasonable expectation of the reader.

The second area where the book consistently falls short is the utility of the information given. While I am significantly happier with the quality and depth of material presented in Augen's book than in the O'Reilly bioinformatics series, where the book fails to deliver is in showing the reader how to actually use the data they gather. After all, the book shows various sequence analysis algorithms and discusses tools available to do this work, but it only devotes a few pages (out of over 370 in total) to a workflow that can be used. Also, the book sometimes fails to point the reader to very worthwhile web resources, including meta sites like the SDSC Biology Workbench site, and just says "some Perl scripts" for local data analysis. As such, you'll have to go a few extra miles on your own to make use of the data sources.

I guess a third complaint of the book for me is that Augen has ignored or omitted significant bodies of research that fit squarely into the scope of the book. For example, Ken Dill's research into protein folding models, as well as Martin Karplus' work on the subject, receives no mention, nor does the topic of Bayesian network analysis when Augen discusses time series data analysis. These aren't new; they've been around for many years and have influenced most of the field, and their absence is noted. The book's spotty coverage in some places, like these, is noticeable.

Bioinformatics does a few things well, but overall reads too much like a biology textbook to be useful to the average computer scientist. More emphasis on the practice of bioinformatics and data analysis would have made this book stronger and complemented the substantive background material well. Finally, using an approach more similar to the computer science approach would have been a tremendous benefit, since the material really is computer science in part. That said, I think this is probably the best introduction to this exciting area of science that I have yet seen.

You can purchase Bioinformatics in the Post-Genomic Era from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

13 of 105 comments

  1. Re:Post Genomics Era? by killtherat · · Score: 5, Informative

    I think it's referring to the fact that mapping genomes is no longer the future (much like we live in a post-modern world).
    Genomics is now part of the game. It used to be that if you sequenced a gene, you could work a PhD off of it. Now that's simply the first step. So now that genomes are a part of every day life science, if you don't know how to run blast, you had better get back to school.

  2. Re:Post Genomics Era? by TheWhaleShark · · Score: 2, Informative

    I thought the same thing when I saw this title. As far as I know, and I'm a biologist by training, we are very much still IN the era of genomics. In fact, it would be rather big news if we ever LEFT said era.

    Yup, still got my genes.

    --
    "It never got weird enough for me." - HST (RIP)
  3. Re:TOC by Anonymous Coward · · Score: 1, Informative

    Here is the TOC (Table of Contents) posted by an AC that knows how to use tabs:

    Preface.
    1. Introduction.
    Overview.
    Computationally Intense Problems: A Central Theme in Modern Biology.
    Building the Public Infrastructure.
    The Human Genome's Several Layers of Complexity.
    Toward Personalized Medicine.
    Illnesses are Polygenic.
    New Science, New Infrastructure.
    The Proactive Future of Information-Based Medicine.
    2. Introduction to Bioinformatics.
    Introduction.
    The Emergence of Bioinformatics.
    The Public Database Infrastructure.
    Building Database Infrastructure for Bioinformatics.
    Traditional Bioinformatic Tools and Algorithms.
    Summary.
    3. Gene Structure.
    Introduction.
    The Central Dogma of Molecular Biology.
    The Genetic Code.
    Structure and Content of the Genome.
    Computational Techniques for the Identification of Genomic Features.
    High-Throughput Gene Sequencing.
    Summary.
    4. Computational Techniques for Sequence Analysis.
    Introduction.
    Hidden Markov Models.
    Perceptrons and Neural Networks.
    Pattern Discovery, Single Nucleotide Polymorphisms, and Haplotype Identification.
    Summary.
    5. Transcription.
    Introduction.
    The Transcriptome.
    Technologies for Transcriptional Profiling.
    Hierarchical Clustering.
    Summary.
    6. Overview of the Proteome and the Protein Translation Process.
    Introduction.
    Ribosomal Structure and the Protein Translation Process.
    Special Features of the Eukaryotic-Translation Process.
    Summary.
    7. Protein Structure Prediction.
    Introduction.
    Overview of Ab Initio and Database-Driven Approaches.
    Overview of Protein Structure.
    Protein Structure Databases.
    Ab Initio Structure Prediction.
    Predicting Lead-Target Interactions.
    Summary.
    8. Medical Informatics and Information-Based Medicine.
    Introduction.
    The Continuous Evolution in Understanding that Leads to Genomic Medicine.
    Electronic Medical Records.
    Grid Computing and Medical Informatics.
    Modeling and Predicting Disease.
    Summary.
    9. New Themes in Bioinformatics.
    Introduction.
    Overview of Parallel Computing and Workflow Distribution in Bioinformatics.
    Workflow Computing.
    High-Performance Computing and Systems Biology.
    The Delineation of Metabolic Pathways.
    Systems Biology.
    Summary.
    Further Reading.
    Index.

  4. Re:Post Genomics Era? by habuji · · Score: 4, Informative

    I think when biologists refer to the "pre-genomic" era, they're talking about the time before the human genome and other genomes were sequenced. Now that many genomes have been sequenced, they call it the "post-genomic era." I think they're referring to the fact that there's not as much sequencing going on. Since there's so much genomic information available, the next step is to weed through it all, searching for gene function, silencing, and other stuff like that.

  5. Too broad in scope by tOaOMiB · · Score: 5, Informative

    Note: IAAB (I am a bioinformaticist)

    Having been in the field for 5 years or so, and matriculating for my PhD next year, I know something about the subject. Unfortunately, the subject "bioinformatics" is way too broad to ever make for a good book.

    For example, applying for PhD programs, I found myself looking at program names such as: Biophysics, Bioinformatics and Integrative Genomics, Biomedical Informatics, Computational and Systems Biology, and of course Bioinformatics. And the terms meant something different to each professor I spoke to, and are still changing over time. Biomedical informatics definitely implies medical databases and EMRs (electronic medical records), while Biophysics implies more of a, well, physical approach (x-ray crystallography, cell movement and membrane forces).

    But Bioinformatics and computational biology encompass them all--including other topics such as protein folding, genomics, proteomics, sequence alignment, paper-mining, evolution. Each of these touches on a vastly different aspect of biology and/or computer science and to different degrees. A good book (and plenty long enough for a textbook, I assure you) could be written on any single sub-subject. A book titled bioinformatics isn't going to be worth your while.

    My 2 cents and rant. Thanks for bearing with me :)

    1. Re:Too broad in scope by Rei · · Score: 2, Informative

      I'll second this. The Biomedical Informatics Research Network, for example, covers everything from studies of how MRI images match up between different scanners at different sites to UMLS mappings of different mouse brain components to developing a distributed filesystem, custom computing racks, and various databases and query tools.

      Quite a diverse collection, really.

      --
      What a crazy random happenstance!
  6. Re:Important point: by Neil+Blender · · Score: 2, Informative

    BTW, the term is "bioinformaticists", not "bioinformaticians"

    Actually, it is you who is wrong. In the world of bioinformatics, "bioinformatician" is more widely used than "bioinformaticist". By the way, I work for a bioinformatics company.

  7. Re:Why would you have left the field? by Anonymous Coward · · Score: 1, Informative

    I don't think I'm alone in waiting for the sci-fiesque promises of advanced biotech.

    I hate to say it, but my opinion is that very few today are going to live to see the promise realized. The last polls of westerners I saw showed an almost universal dislike of the idea of genetic engineering. Have you ever heard the head of the US bioethics council speak? He's a nutjob who thinks humans are some sort of divine creation which stands apart from the animals. Any tinkering with our genetics is, to those who share his line of thinking, an affront to their God and their own sense of self. Just the knowledge that engineering of humans is possible is a violation of everything they hold dear. I think about the only real hope for westerners is that sneaking treatment in another country becomes possible, or that home hacking becomes commonplace and safe enough to be useful.

  8. disagree by mkcmkc · · Score: 2, Informative
    I beg to disagree. Computer scientists (i.e., skilled computer programmers, etc.) and biologists both have substantial domain knowledge that they're bringing to the table. A practitioner from either camp that fails to make use of the skills of a partner from the other is likely to leave a trail of serious messes in their wake. I see this a lot, and I think it really slows science down.

    Mike

    --
    "Not an actor, but he plays one on TV."
  9. Re:Should I go into Bioinformatics? by spin2cool · · Score: 2, Informative

    What you need to pick up really depends on what kind of work you want to do in the field. There are absolutely people with little understanding of biology all over. They typically do things like optimize and translate code or tweak algorithms for biologists. To move up to more interesting problems, though, you'll have to teach yourself quite a bit of biology and chemistry.

    My advice is to start with the basics. Pick up a college-level Intro to Biology textbook and learn the relevant stuff: Biological molecules, Natural selection and evolution, basic Genetics - the whole pathway from gene -> protein -> enzyme. These kinds of concepts are the foundations of biology that you need to understand before you can get into the hardcore stuff.

    If you enjoy chem, keep going through it too - finish general chemistry and work your way up through some organic chemistry and biochemistry. Structural and computational Biochemistry is HUGE right now, and you can definitely choose to go more of a chem path, if that's what floats your boat.

    MIT OpenCourseWare has whole sections devoted to Biology, Chemistry, and Biological Engineering. It's probably worth checking out, if nothing else, to guide you to some topics to look more closely at.

    Lastly, I'm going to encourage you to do your homework and make the jump. Both Universities and corporations are salivating over anyone with knowledge in both the life and computer sciences that can help bridge the gap between the two. (I should know - I'll be working on my PhD in Computational Biology starting this fall)

  10. Re:Post Genomics Era? by glwtta · · Score: 2, Informative
    Now that many genomes have been sequenced, they call it the "post-genomic era." I think they're referring to the fact that there's not as much sequencing going on.

    I'd say that there is far more sequencing going on right now than ever before, in terms of total output. GenBank provides a nice growth summary (note that the human genome was officially "completed" in 2003). It's just that we now have one nearly complete genome (human) and several largely complete, or getting there.

    To me, "post-genomic" sounds like a complete misnomer (probably coined to make it all sound exciting); I mean, finally having a workable genome kinda makes it seem like we just entered the "genomic" era, doesn't it?

    --
    sic transit gloria mundi
  11. Re:Post Genomics Era? by Torst · · Score: 2, Informative
    It's just that we now have one nearly complete genome (human) and several largely complete, or getting there.

    We have far more than one completed genome! The human genome project gets the most publicity of course, but there are hundreds of bacteria, viruses and plants which have been sequenced, see http://www.ncbi.nlm.nih.gov/Genomes/index.html. Many of these genomes have also been annotated by human curators - the so called "meta information".

  12. Bioinformatics book recommendations by Dioscorea · · Score: 2, Informative
    Uh, genomics isn't going anywhere

    Lots of molecular biologists would say the same thing (perhaps not in the way you meant it). Francis Crick apparently thought genomics was way overhyped.

    Seriously though, I sometimes wonder why anyone bothers writing another bioinformatics howto book when Durbin et al (apologies for amazon link) is still unrivalled. Maybe also Felsenstein for phylogeny, MacKay for general probabilistic modeling... anyone recommend anything for the coalescent? Microarrays? Image analysis? I could post book refs for these, but I'm not as fluent in those areas.