Beginning Perl for Bioinformatics

Posted by timothy on Tuesday January 29, 2002 @03:00AM from the listen-up-class dept.

babbage writes: "As the banner above the title of James Tisdall's Beginning Perl for Bioinformatics indicates, this book is 'an introduction to Perl for biologists.' What the banner doesn't mention is that it's also an introduction to biology and bioinformatics for Perl programmers, and it's also an introduction to both Perl *and* biology for people that have never really been exposed to either field. The author has clearly thought a lot about making one book to please these different audiences, and he has pulled it off nicely, in a way that manages to explain basic topics to people learning about each field for the first time while not coming off as condescending or slow-paced to those that might already have some exposure to it." Read on for the rest of his review.

Beginning Perl for Bioinformatics
author: James Tisdall
pages: 400
publisher: O'Reilly & Associates
rating: 8
reviewer: babbage
ISBN: 0-596-00080-4
summary: Well-balanced approach to applying Perl's sorting and analytical abilities to the field of bioinformatics.

Superficially, this book isn't all that different from a lot of introductory Perl books: the Perl material starts out with an overview of the language, followed by a crash course on installing Perl, writing programs, and running them. From there, it goes on to introduce all the various language constructs, from variables to statements to subroutines, that any programmer is going to have to get comfortable with. Pretty run of the mill so far. Tisdall starts with two interesting assumptions, though: [1] that the reader may have never written a computer program before, and so needs to learn how to engineer a robust application that will do its job efficiently and well, and [2] that the reader wants to know how to write programs that can solve a series of biological problems, specifically in genetics and proteomics.

As such, there is at least as much material about the problems that a biologist faces and the places she can go to get the data she needs as there is about the issues that a Perl programmer needs to be aware of. The author introduces the reader to the basics of DNA chemistry, the cellular processes that convert DNA to RNA and then proteins, and a little bit about how and why this is important to the biologist and what sorts of information would help a biologist's research. The main sources of public genetic data are noted, and the often confusing -- and huge -- datafiles that can be obtained from these sources are examined in detail.
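To make the DNA-to-RNA-to-protein step concrete, here is a minimal sketch in Python (not the book's Perl, and not taken from the book). It assumes the input is a coding-strand DNA string and uses the standard genetic code; the names `transcribe` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Return the RNA form of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate coding-strand DNA into one-letter amino acids, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks an unrecognised codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(transcribe("ATGGCC"))          # AUGGCC
print(translate("ATGGCCTGGTAAGGG"))  # MAW
```

Real sequences bring complications this sketch ignores, such as reading frames, the reverse strand, and ambiguity codes.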

With the code he presents for solving these problems, Tisdall makes a point of not falling into the indecipherable-Perl trap: this is a useful language, well-suited to the essentially text-analysis problems that bioinformatics means, and he doesn't want to encourage the kind of dense, obscure, idiomatic coding style that has given Perl an undeservedly bad reputation. Some of Perl's more esoteric constructs are useful, and they show up when they're needed, but they're left out when they would only serve to confuse the reader. This is a good decision.

Rather, the focus is on teaching readers how to solve biological problems with a carefully developed library of code that happens to leverage some of Perl's most useful properties. The result is pretty much a biologist's edition of Christiansen & Torkington's Perl Cookbook or Dave Cross' Data Munging With Perl. The author presents a series of issues that a working bioinformaticist might have to deal with daily -- parsing over BLAST, GenBank, and PDB files, finding relevant motifs in that parsed data, and preparing reports about all of it. If a bioinformaticist's job is to be able to report on interesting patterns from these various sources, then following the programming techniques that Tisdall explains in clear, easy-to-follow prose would be an excellent way to go about doing it.
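As a rough illustration of the "finding relevant motifs" task, the sketch below searches a sequence with a regular expression. It is written in Python rather than the book's Perl, and the function name `find_motif` and the example sequence are assumptions made up for this illustration.

```python
import re


def find_motif(sequence: str, motif_pattern: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched text) for every motif hit, including overlaps."""
    # Wrapping the motif in a lookahead lets overlapping occurrences be reported.
    pattern = re.compile(f"(?=({motif_pattern}))", re.IGNORECASE)
    return [(match.start() + 1, match.group(1)) for match in pattern.finditer(sequence)]


# GAATTC is the recognition site of the EcoRI restriction enzyme.
print(find_motif("GGAATTCCGAATTC", "GAATTC"))  # [(2, 'GAATTC'), (9, 'GAATTC')]
```

Because the motif is an ordinary regular expression, a degenerate site such as `GA[AT]TC` can be passed in the same way.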

And when I say "programming techniques," note that I'm not specifically mentioning Perl. The code in this book is clear and organized, and all programs are carefully decomposed into logical subroutines that are then packaged up into a library file that each later sample program gets to draw from. Each new program typically contains a main section of a dozen lines of code or less, followed by no more than two or three new subroutines, along with calls to routines written earlier and called from the BeginPerlBioinfo.pm that is built up as the book progresses. Each sample is typically preceded by a description of what it's trying to accomplish and followed by a detailed description of how it was done, as well as suggestions of other ways that might have worked or not worked.

This modular approach is fantastic -- too many Perl books seem to focus so heavily on the mechanics of getting short scripts to work that they lose sight of how to build up a suite of useful methods and, from those methods, to develop ever-more-sophisticated applications. It isn't quite object-oriented programming, but that's clearly where Tisdall is headed with these samples, and given a few more chapters he probably would have started formally wrapping some of this code into OO packages.

If I have a complaint with the book, in fact, it's that Tisdall doesn't go any further: everything is good, but it ends too soon. Seemingly important topics such as OO programming, XML, graphics (charts & GUIs), CGI, and DBI are mentioned only in passing, under "further topics" in the last chapter. I also have a feeling that some of the biology was shorted, and the book barely touches upon the statistical analysis that probably is a critical aspect of the advanced bioinformaticist's toolbox. I can understand wanting to keep the length of a beginner's book relatively short, and this was probably the right decision, but it would have been nice to see some of the earlier sample problems revisited in these new contexts by, for example, formally making an OO library, showing a sample program that provided a web interface to some of the methods already written, or presenting code that presented results as XML or exchanged them with a database.

But these are minor quibbles, and if the reader is comfortable with the material up to this point, she shouldn't have a hard time figuring out how to go a step further and do these things alone. It's a solid book, and one that should be able to get people learning Perl, genetics, or both up to speed and working on real world problems quickly.

You can purchase Beginning Perl for Bioinformatics at Fatbrain. Want to see your own review here? Read the review guidelines first, then use Slashdot's webform.

127 comments

if only it was in italian... by joss · 2002-01-29 03:05 · Score: 4, Funny

then I could learn perl, biology, and Italian all at the same time.

--
http://rareformnewmedia.com/
Heh by British · 2002-01-29 03:09 · Score: 3, Funny

"You got your Perl in my biology!"

"You got your biology in my perl!"

Two great interests that interest great together!
Awesome. by sawilson · 2002-01-29 03:10 · Score: 1

I like it when I see a "tie in" to another industry or scientific discipline. I could read this book, learn all about DNA, crack it with a perl script, then get served papers by $DEITY so I can be prosecuted under the DMCA.

--

The most important thing any republican needs to know.
1. Re:Awesome. by TooTallFourThinking · 2002-01-29 05:00 · Score: 1
  
  Maybe this book could create a trend in scientific fields, where the focus of books becomes merging Perl and the discipline. Of course, it isn't as broad as many of the other Perl books out there, but I enjoy this sort of deviation.
  
  And holy crap! A Pixies reference, and an obscure one at that. You get snaps from me.
Biology and Perl... by TheCow · 2002-01-29 03:11 · Score: 1

Now I can convert the code for my Terminator robot from Fortran 77 to Perl! Good bye columns!
statistical approaches by ciole · 2002-01-29 03:12 · Score: 5, Insightful

I felt the same about the lack of statistical approaches. While this book is probably great for biologists just learning to write code, for coders entering the field (bioinformatics) it contains too little biology or math to be really educational. My opinion.

What I'd love would be a dissection of the construction of various motif analysis tools, critiquing various impl's of HMMs, really going into detail. This seems like a perfect complementary work to OSS, so I might even find one, someday...
1. Re:statistical approaches by sql*kitten · 2002-01-29 03:25 · Score: 2
  
  While this book is probably great for biologists just learning to write code, for coders entering the field (bioinformatics) it contains too little biology or math to be really educational. My opinion.
  
  A question for practitioners: why would you want to use Perl over a flat file data set, rather than loading the data into Oracle and using professional data mining (OLAP/DSS/DW) tools? Surely the latter are more mature and comprehensive.
2. Re:statistical approaches by ciole · 2002-01-29 03:44 · Score: 1
  
  IMHO, you wouldn't, although i question the selection of Oracle here. There are a variety of tools suited for the actual kinds of analysis one might do - but i'd be surprised if any "enterprise" DB or general data mining software could be most effectively used. Check out meta-meme as an example of bioinformatics-specific statistical software.
3. Re:statistical approaches by babbage · 2002-01-29 03:50 · Score: 3, Informative
  
  Well of course loading data into a DBMS is the ideal here, it's the loading of the data into one that's the tricky part :)
  Generally, a lot of biological data is publicly available from sources such as NCBI (the US National Center for Biotechnology Information) and EMBL (the European Molecular Biology Laboratory), but it could be coming in as SQL statements ready for loading into your database, CSV or TSV files, any of several annoyingly flexible standard biological data exchange formats, or worst of all something like an Excel spreadsheet or just scraped from a web page somewhere. There is way too much of this stuff to pump it all into your local storage system by hand, so you need something like Perl that can munge it into an intermediate format that can be loaded properly. Once it's actually in there then yeah, you only revert to some sort of flat file system if you want to redistribute data.
  A related but more central problem is in looking for interesting patterns in these huge datasets once you have them locally, whether in flat files or a database or what have you. This is a huge area of research right now, because modern biological lab techniques can slurp up data extremely fast, we have the whole genome decoded but uninterpreted, etc, and now we need computational techniques that can chew through this fire hose of information efficiently.
  A lot of this seems to be unsolvable at the moment, because the algorithmic complexity is up there with the Travelling Salesman problem (e.g. protein folding), so every little bit that can chip away at the difficulty of it helps. Perl is good at this, and a lot of places are using it heavily right now. Being able to work with flat files is only one aspect of it; it just happens to be a useful one to teach with, which is why it was used so heavily in the book, but in actual use the applications of Perl go way beyond simple file manipulation.
  
  --
  DO NOT LEAVE IT IS NOT REAL
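The "munge it, then load it" workflow described in the comment above can be sketched roughly as follows. This is an illustration in Python rather than Perl, and several things are assumptions not named in the thread: the input is FASTA-formatted text, the target is an SQLite database, and the table name `sequences` and both function names are hypothetical.

```python
import sqlite3


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            header = line[1:].split(maxsplit=1)
            identifier = header[0] if header else ""  # first word of the header line
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequence data may span several lines
    if identifier is not None:
        yield identifier, "".join(chunks)


def load_sequences(conn, fasta_text):
    """Insert every parsed record into a (hypothetical) sequences table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences (id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO sequences VALUES (?, ?)", parse_fasta(fasta_text)
        )
    return conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(load_sequences(conn, ">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n"))  # 2
```

Formats such as GenBank carry far richer annotation than this, so a real loader would need a proper parser for each source format.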
4. Re:statistical approaches by Marcus+Brody · 2002-01-29 03:59 · Score: 4, Informative
  
  why would you want to use Perl over a flat file data set
  
  Good Question. Answer is yes and no.
  Flat Files are really quite useful in biology (btw, when a biologist mentions a "database", he almost certainly means a "flatfile"). DNA/RNA/Proteins are just a long sequence of letters, and therefore these are perfectly represented by good ol' ASCII. This is particularly useful for means of distribution etc. When annotations are added to the data, they are traditionally added to the flatfile by way of an "annotation table", to keep the simple ease of ASCII.
  
  However, more advanced ways are used to store annotations of biological data, although traditional databases aren't always that good at expressing the rather messy randomness of biology ;-) Therefore, specialised databases such as acedb are quite useful and intuitive to the biological mind. Furthermore, projects such as ensembl (which ambitiously attempts annotations on the whole genome) store their data in an SQL database. However, they still make extensive use of perl to interact with the database.
5. Re:statistical approaches by dAzED1 · 2002-01-29 05:45 · Score: 1
  
  I felt the same about the lack of statistical approaches. While this book is probably great for biologists just learning to write code, for coders entering the field (bioinformatics) it contains too little biology or math to be really educational. My opinion.
  A coder should already have a very firm grasp on mathematics. If not, they're not gonna do well...
6. Re:statistical approaches by T.+Will+S.+Idea · 2002-01-29 06:44 · Score: 1
  
  What I'd love would be a dissection of the construction of various motif analysis tools, critiquing various impl's of HMMs, really going into detail.
  
  Try Pavel Pevzner's Computational Molecular Biology for an overview of many different algorithms; Durbin et al Biological Sequence Analysis for probabilistic approaches (especially Hidden Markov Models); and Baldi and Brunak's Bioinformatics for a machine learning approach.
  
  Obviously there is overlap in these (they all cover HMMs for example) but they each approach real computer science problems in detail and from different points of view.
  
  --
  If electricity is produced by electrons is morality produced by morons?
7. Re:statistical approaches by Anonymous Coward · 2002-01-29 07:29 · Score: 0
  
  well there's a difference between a firm grasp and what you would need to know to make any headway in real biology, or chemistry, or physics...most coders are happy that they can count, let's leave them at that.
8. Re:statistical approaches by Anonymous Coward · 2002-01-29 07:32 · Score: 0
  
  As a reflection on my research experience (as a biologist doing computational stuff), the following are good books. Note that bioinformatics (in all senses) is a recent topic, so there are few books because everyone has been publishing papers.
  
  For HMMs take a look at "Biological Sequence Analysis" (Durbin, Eddy, Krogh and Mitchison), for stats and a wider range of other methods try "Bioinformatics" (Baldi & Brunak). For a wider range of practical bioinformatics information see "Protein Structure Prediction" (ed. Sternberg, good luck finding a copy). Background in protein structure is well covered by "Introduction to Protein Structure" (Branden & Tooze).
  
  Now go forth and solve the protein folding problem, see you all at CASP... :)
9. Re:statistical approaches by ciole · 2002-01-29 09:14 · Score: 1
  
  Yeah, see, "math" is actually a pretty damn big subject. The kinds of statistics in use for biology are not much like problems of NP completeness, or of hash functions, or ...
10. Re:statistical approaches by JimMcCusker · 2002-01-29 09:24 · Score: 1
  
  I refer you to this excellent paper talking about that very problem: Practical Lessons in Supporting Large-Scale Computational Science (in pdf). The gist of it is the tradeoffs between RDBMS's and custom flat files. It seems that (and I've dealt with this myself, competing in KDD Cup 2001) while a naive set of code does far worse than a database+olap, an indexed and paged data format (memory mapped) does far better, with less overhead. Of course, it's harder to apply your favorite Machine Learning or AI algorithm to stuff that's in a database. I've found that, even when I put it into a database, I pull it back out to perform real computation on it.
11. Re:statistical approaches by Untimely+Ripp'd · 2002-01-30 09:32 · Score: 1
  
  A question for practitioners: why would you want to use Perl over a flat file data set, rather than loading the data into Oracle and using professional data mining
  
  The basic answer to this question is that the problems involved in sequence analysis are large, numerous and lucrative, which means it is economically feasible to develop specialized code rather than applying general data-mining. The most common algorithm used in sequence analysis, BLAST, requires preprocessing of the database into a particularly efficient data structure prior to running queries against it.
  
  At least one company, Accelrys (formerly Genetics Computer Group) has developed a large scale Oracle solution that applies many standard bioinformatics algorithms to the entire Genbank database.
  
  --
  And let the angel whom thou still hast serv'd tell thee ...
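The idea of preprocessing a sequence database into a lookup structure before querying it can be illustrated with a toy k-mer index. This is a deliberately simplified stand-in written for this discussion, not BLAST's actual data structure or algorithm; the function names and the choice of k = 3 are assumptions.

```python
from collections import defaultdict


def build_kmer_index(sequences, k=3):
    """Map every length-k substring to the (sequence name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def seed_hits(index, query, k=3):
    """Return the sorted database locations that share at least one k-mer with the query."""
    query = query.upper()
    hits = set()
    for pos in range(len(query) - k + 1):
        hits.update(index.get(query[pos:pos + k], ()))
    return sorted(hits)


database = {"a": "ACGTAC", "b": "TTACGG"}
index = build_kmer_index(database)  # built once, reused for every query
print(seed_hits(index, "ACG"))      # [('a', 0), ('b', 2)]
```

The point of the sketch is only the cost model: the index is built once, after which each query touches just the locations that share a short word with it instead of scanning every sequence.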
I haven't read it myself but by Theodore+Logan · 2002-01-29 03:15 · Score: 5, Insightful

I have a number of friends in the business who have read that book. In summary:
1) It is good for biologists who want to learn how to program
2) It is not good for programmers who want to learn biology
Obviously, my friends disagree with reviewer Babbage on this point. However, a quick look on Amazon reveals that most reviewers who found the book interesting are biologists with no programming experience instead of the other way round.

--
"If you think education is expensive, try ignorance" - Derek Bok
Flashbacks by keiferb · 2002-01-29 03:17 · Score: 4, Interesting

Seeing a title like this, aiming a particular language at a particular discipline makes me flash back to the college days (last year) where the engineering classes all used fortran. God forbid, if perl gets outdated in another few years, are all the Biologists in the world going to lock themselves into a dead language like those stuffy engineers?
1. Re:Flashbacks by glwtta · 2002-01-29 04:26 · Score: 4, Informative
  
  I've worked in bioinformatics for the last few years, and I can say that there's a bit of a difference between bioinf and perl, and engineering and fortran - perl is suited for bioinformatics far, FAR better than any other language. And so far the benefits of modern languages just can't seem to outweigh this innate suitability.
  Traditionally almost all bioinformatics tools have been done in perl, and they continue to be so, for one very simple reason - bioinformatics, when it comes down to it, is just plain text processing.
  Anyway, about the book itself - it's nice for biologists who want to learn something about programming, but I neither learned much about biology from it, nor am I afraid I will lose my job because all the bio people are gonna start doing their own programming :)
  
  --
  sic transit gloria mundi
2. Re:Flashbacks by Anonymous Coward · 2002-01-29 04:34 · Score: 0
  
  Let's see, what are the options? People can learn something and risk it will become an old, dead language or they can learn nothing.
  
  You may wish to grump about Fortran being outdated. I got news for you, you can re-implement most of the stuff in huge old fortran libraries in whatever Super-Uber-OO language you want. You then get to go off and debug and debug and debug then probably optimize and optimize and optimize -- and when it doesn't work just right you get to defend yourself. Roughly speaking the bits being shoved through the cpu to do a specific job will be the same regardless of language. Save yourself lots of time and headaches, write some wrappers to bind to whatever OO language you want.
3. Re:Flashbacks by lisam · 2002-01-29 06:08 · Score: 1
  
  He's (she's?) right; *everything* in technology has the approximate shelf life of yoghurt. Except maybe a passion for knowledge or a spirit of innovation. If you want to wait for the technology that will never change, you'll be a very old and unproductive programmer.
4. Re:Flashbacks by T.+Will+S.+Idea · 2002-01-29 06:08 · Score: 1
  
  Traditionally almost all bioinformatics tools have been done in perl
  
  Not really. Perl is used a lot for parsing output and tying things together, for the reasons that you have stated above. Also, bioinformatics is very web-centric and a lot of people build web-apps in Perl. Java is used a lot for building GUIs. Most of your heavy duty algorithms are written in C or C++. BLAST is probably the number 1 bioinformatics tool in use today and that is written in C. In short, the programming language chosen is often the one best suited to the job.
  
  Granted, you run into people (primarily in academia) who only program in Perl or Java or Fortran. But the most useful tools are often ported.
  
  --
  If electricity is produced by electrons is morality produced by morons?
5. Re:Flashbacks by glwtta · 2002-01-29 09:34 · Score: 2
  
  I think I was a bit too general in what I said - yes many widespread "standard" bioinf tools are not Perl (I can't remember what GCG is written in, at the moment, but it's probably a mix of C and Perl) - but "on the job" day-to-day work is still mostly done in Perl. You usually don't see too much GUI stuff (and therefore java/swing) because bioinf is still a very command line oriented world (probably because of things like BLAST, mview and so forth). When we did things in java to fit in with everything else at the company, the bulk of work was being done by org.apache.oro.text.perl :)
  
  --
  sic transit gloria mundi
6. Re:Flashbacks by squidfood · 2002-01-29 11:13 · Score: 1
  
  Whaddaya talking about? I'm a biologist, I'm working for the govm't, and we all use fortran. Locked, indeed. My only wasted time in College was learning C.
Strings by Kones · 2002-01-29 03:18 · Score: 0, Flamebait

Unfortunately, after flipping through this book extensively at Barnes & Noble, I found it to be a glorified Perl string manipulation book, applied to strings of DNA info instead of "Hello World!" type data. There was only one decent chapter on specific file format conversion. Not very worthwhile in my opinion.

--
Wouldn't you like to be a pepper, too?
The challenge of Bioinformatics by nesneros · 2002-01-29 03:19 · Score: 5, Informative

Bioinformatics is probably the biggest challenge facing the biological sciences in the next few years. It's becoming more and more apparent that even slight changes in very small elements of a system (i.e., a small sequence of a protein, the behavior of a single neuron within a group of 10,000) can have a drastic effect on the behavior of the entire system. As a result, to really study the problem, you have to acquire massive amounts of data. For example, in our lab we routinely collect data from 64 channels of 16-bit data (monitoring neuron firing in culture) at 1KHz; in addition, we're simultaneously taking calcium imaging video at 100fps at 256x256 (at 256 colors). This results in about 200 MB of data gathered every second. Considering we run tests for over 10 minutes, just acquiring and storing this data is a challenge, but finding useful methods to analyze it is even more difficult. It's refreshing to see texts being written on how to bridge the gap between comp. sci. and biology. I've been working in the area for about 4 years now, and it's really great to see the field growing and getting more mainstream attention.

--
Some men spend their entire lives trying to kill themselves for having been born. --Ross MacDonald
1. Re:The challenge of Bioinformatics by Anonymous Coward · 2002-01-29 03:49 · Score: 0
  
  It's called COMPRESSION, dude.
2. Re:The challenge of Bioinformatics by bdoliver · 2002-01-29 04:16 · Score: 0, Flamebait
  
  Ok I have to... Let's work this out.
  
  channel data = 16 bits * 64 channels * 1KHz = 1,024,000 bit/s (about 1 Mbit/s)
  frame data = 256 * 256 * 8 bits = 524,288 bits
  video data = (524,288 bits) * 100 frames/sec = 52,428,800 bit/s (about 52 Mbit/s)
  
  This then is about 53 Mbit/s, or roughly 7 MB/s. I am not saying this is a small amount of data, but makes me laugh...bio people.
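The arithmetic in this exchange is easy to re-run. The sketch below takes the figures quoted in the parent comment (64 channels, 16-bit samples, 1 KHz, 256x256 video at 256 colors and 100 fps) and assumes 1 KHz means 1,000 samples per second, uncompressed storage, and decimal megabytes; the function names are illustrative.

```python
def sampled_bits_per_second(channels, bits_per_sample, sample_rate_hz):
    """Raw data rate of a multi-channel recording, in bits per second."""
    return channels * bits_per_sample * sample_rate_hz


def video_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Raw data rate of uncompressed video, in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


electrode_bps = sampled_bits_per_second(64, 16, 1_000)  # 1,024,000 bit/s
video_bps = video_bits_per_second(256, 256, 8, 100)     # 52,428,800 bit/s (256 colors = 8 bits)
total_bytes_per_second = (electrode_bps + video_bps) / 8

print(f"{total_bytes_per_second / 1e6:.2f} MB/s")              # 6.68 MB/s
print(f"{total_bytes_per_second * 600 / 1e9:.2f} GB per run")  # 4.01 GB for a 10-minute run
```

Under these assumptions one rig produces roughly 4 GB per ten-minute run.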
3. Re:The challenge of Bioinformatics by babbage · 2002-01-29 04:39 · Score: 3, Informative
  
  Ok, so you can work out about how much data is coming out of that machine. Now assume that the lab in question has several such machines, and that labs all over the world are churning out this degree of output, and maybe your lab needs to keep a local copy of all the relevant data. Go ahead & make up your own numbers if you'd like, but keep in mind that this is a huge field these days, so there are probably hundreds or thousands of such groups working on it all, and they're churning out, by your math, 7 MB per second per machine per lab.
  Now take a step in a different direction, and realize that we don't know what *any* of this stuff means (much less than 1% of it, at a rough estimate). We've got a completed genome project that has produced another mountain of mostly undecoded data.
  Or to go to the central issue, we understand that DNA translates trivially to RNA, then to chains of amino acids that fold up into balls of protein, with secondary, tertiary, (etc) levels of structure. Largely this is determined by how the chemical bonds between each amino acid twist together, and how disparate segments of the chain come close together or far apart. And the effect of this protein chain biologically is determined by which segments of the chain end up at which parts of the knot: the same sequence of amino acids can be neutral or active depending on whether it's near the surface, for example. Finally, go a step further and realize that all these proteins in the body are in a constant state of flux, constantly changing each other, catalyzing each other, restricting each other, and so on. The number of active variables very quickly hits a point that becomes incalculable, and we're down to a new version of the travelling salesman problem, which no contemporary computer system can even dent, nevermind solve.
  Laugh at the stream of data if you want to, but keep in mind that it's not like we're just talking about a piece of network hardware that needs to be able to shuffle this much data around more or less blindly. Rather, any & all of it could be biologically relevant in any given context, and so each bit of that data stream has to be scrutinized, often more than once in different contexts. It is, simply, a *huge* amount of computational work.
  
  --
  DO NOT LEAVE IT IS NOT REAL
4. Re:The challenge of Bioinformatics by LL · 2002-01-29 06:40 · Score: 1
  
  I think this is the beginning of the trend of the hardening of the biological sciences (as in becoming infused with mathematics ... and compsci algorithms in this field are essentially discrete maths). Unfortunately that means many people seeking easy course credits are going to be hit with a shock to the programming/maths skills as biology realigns to more complex techniques in shifting from wet to dry labs.
  
  For people wishing to advance professionally, O'Reilly is now introducing their Bioinformatics Technology Conference (http://conferences.oreilly.com/biocon/) in fact at this moment in Arizona. Also there is the Int. Soc. for Computational Biology (http://www.iscb.org) which organise regular bioinformatics meetings. Open-source tools are a regular part of this.
  
  LL
5. Re:The challenge of Bioinformatics by Anonymous Coward · 2002-01-29 09:58 · Score: 0
  
  in our lab we routinely collect data from 64 channels of 16-bit data (monitoring neuron firing in culture) at 1KHz
  
  I'm curious.....what sort of data is this? Are you constantly recording EPSPs or what? And as for the rest.....100fps I can understand, but what are you using 256 colors for?
6. Re:The challenge of Bioinformatics by oingoboingo · 2002-01-30 00:00 · Score: 1
  
  Unfortunately that means many people seeking easy course credits are going to be hit with a shock to the programming/maths skills as biology realigns to more complex techniques in shifting from wet to dry labs.
  
  This sort of stuff makes me laugh. If only a computer was 1/10th as complicated as a cell. I find it is more the people trained in computer science who experience the shock when they come into a lab environment and see how complex, chaotic and large the biological data sets are that are routinely generated.
  
  In my experience it is a rare computer scientist who can make the transition into bioinformatics and be truly effective, ie: not dependent on having a biologist sit with them 8 hours a day telling them what to do and why they're doing it, and can actually understand the breadth of the problem they're trying to solve, not just "develop a system to correctly pick these monoisotopic mass spectrometer peaks" or "build an algorithm to count the number of protein spots on a 2D gel"
  
  It reminds me of a quote I saw published in a newspaper a while ago by an academic:
  
  Biologists tend to write terrible computer code, but the computer scientists trivialise things and spend vast amounts of time solving problems that don't exist
try it by oo7tushar · 2002-01-29 03:19 · Score: 2

As a CS person about to switch into Biology I found the reviewed book interesting. Even if you have a good handle on Perl and Biology you will find certain elements in the book intriguing.
On a personal experience side note, Perl does seem to handle genetics problems with quite a bit of ease. The ease seems to stem from Perl's obfuscation. (it also seems to confuse my Biology profs quite a bit since my answers are legitimate answers on the exams)

--
internet like monkeys'
And by NMerriam · 2002-01-29 03:20 · Score: 2

They also don't mention it's a great introduction to books for those familiar with perl, biology, and bioinformatics, but not the written word!...

--
Recursive: Adj. See Recursive.
As a biologist... by Ubi_NL · 2002-01-29 03:22 · Score: 3, Interesting

We were just discussing programming languages recently.
We use so-called micro-arrays frequently, which yield so much information it is not possible to go through all that manually (on average you get about 10.000 "genes" that show changes in expression, after which you have to check the interesting ones for functionality).
At the moment we can either mess around with MS excel or buy some serious software which is so incredibly expensive only companies can afford it.
Still I doubt whether Perl should be the language of choice due to it tending to be "write-only code". Maybe this book will change my mind though.

--

If an experiment works, something has gone wrong.
1. Re:As a biologist... by mfarah · 2002-01-29 04:02 · Score: 3, Insightful
  
  Still I doubt whether Perl should be the language of choice due to it tending to be "write-only code". Maybe this book will change my mind though.
  
  FWIW, in my personal experience, I find Perl to lend itself to some very obscure code, worthy of the IOCCC [*], just as easily as to extremely clear code - the latter, though, requires a disciplined programmer and some effort (not much, though) directed to that goal.
  
  [*]: so, when will the first International Obfuscated Perl Code Contest come? Perl poetry is getting kinda old.
  
  --
  "Trust me - I know what I'm doing."
  - Sledge Hammer
2. Re:As a biologist... by Anonymous Coward · 2002-01-29 04:18 · Score: 0
  
  Yo Ubi,
  I switched language when I left uni for the real world (IDL is nice but bloody expensive) and settled for python.
  Actually, more like python+numeric+c
  I usually experiment in pure python+numeric first and then optimise the parts that need in c. I really love how easy it is to interface c and python (especially compared to IDL). No problems passing numeric arrays to and from python, etc, etc...
  Don't forget to check biopython which is always a good place to start. Dunno if there's anything for processing microarrays, though.
3. Re:As a biologist... by Jeremy+Erwin · 2002-01-29 11:45 · Score: 2
  
  biopython is fairly primitive (but their goals are laudable.) bioperl is more advanced, but microarray modules are still on the todo list. I've found bioperl modules to be fairly easy to write code around.
Coming Soon, "ML for Philosophers" by Anonymous Coward · 2002-01-29 03:22 · Score: 4, Funny

This could spawn a great trend in cross-area programming books. Ada for Historians? Smalltalk for Hairdressers?
1. Re:Coming Soon, "ML for Philosophers" by twocents · 2002-01-29 04:19 · Score: 1
  
  Intercal for aspiring politicians?
2. Re:Coming Soon, "ML for Philosophers" by SlideGuitar · 2002-01-29 07:06 · Score: 1
  
  Assembler for Assembly Line Workers?
  C for the blind? Pascal for lamb lovers? Java for the sleepy? Ada for the adled? Perl for oyster divers? Lisp for speech therapists? Fortran for whom?
  
  Never mind.
Alternative book by Theodore+Logan · 2002-01-29 03:25 · Score: 5, Interesting

Instead of just whining, I should really recommend an alternative book for people who (like myself) have their background in CS.
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield is usually well liked by people with a computer science background. And it's not only of use if you want to go into bioinformatics: most algorithms on strings are usable in everyday coding too.

--
"If you think education is expensive, try ignorance" - Derek Bok
1. Re:Alternative book by cenobyte · 2002-01-29 06:10 · Score: 1
  
  I'd agree. The book is excellent, covers string processing in depth, and has the best explanation of dynamic programming (a *very* useful technique) that I have come across. It is rather heavy, but it is worth the effort.
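  For anyone who hasn't met dynamic programming on strings, here is a rough sketch of the classic example, edit distance between two sequences. This is the textbook recurrence written in Python, not code taken from Gusfield's book, and the function name is just for illustration:

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string `a` into string `b`."""
    # previous[j] holds the distance between the prefix of `a` processed
    # so far and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]

print(edit_distance("ACGT", "AGT"))
print(edit_distance("kitten", "sitting"))
```

  Each table cell is computed once from three neighbouring cells, which is the whole trick: the work is proportional to len(a) * len(b) instead of exponential. Sequence alignment uses the same table with different scores.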
More for your library by chundercanada · 2002-01-29 03:31 · Score: 5, Informative

I just spent a couple of days trying to choose a few books in this area. My interest was as a computer guy needing to get filled in on the bio side of things. Here are the books I ended up ordering:
Human Molecular Genetics 2: Looks to be a great primer on all the biology background.
Bioinformatics: A Practical Guide...: This book is a detailed tour of the online databases and existing tools for analysis of genes and proteins.
Algorithms on Strings, Trees and Sequences: This is a book for real computer science types who want to do high-performance implementations of new tools.
Universities going this way by Marx_Mrvelous · 2002-01-29 03:40 · Score: 5, Interesting

At Purdue University, there is a class specifically meant for CS majors and Biology majors, to address this same issue. I wonder if they use this book in the class.

--

Moderation: Put your hand inside the puppet head!
1. Re:Universities going this way by bodyborg · 2002-01-29 17:48 · Score: 1
  
  UC Santa Cruz actually uses this text for a course titled Bioinformatics 101. You may find links to this and other bioinformatics resources at http://medanth.members.easyspace.com/index.html#bioinformatics
For those interested in Biology and Perl by SloppyElvis · 2002-01-29 03:42 · Score: 5, Interesting

The BioPerl project (http://bio.perl.org/) has been going on for some time.

In their own words they are, "The Bioperl Project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research."

There bioinformaticians can find a wealth of useful Perl scripts and modules to use in their efforts.

Yet another example of an open source initiative serving the needs of science!
1. Re:For those interested in Biology and Perl by glwtta · 2002-01-29 04:18 · Score: 2
  
  yep, BioPerl is good, I've used it extensively in the past.
  Also check out BioJava - same concept (probably mostly the same people), for those times when you can't use the bioinformatics language of choice (e.g. if you want someone to maintain your code at some point ;) )
  
  --
  sic transit gloria mundi
Not all biologists are doing genomics! by RevAaron · 2002-01-29 03:45 · Score: 4, Insightful

This book seems to equate biology with genomics/bioinformatics, when that is simply not the case. There are a fair number of scientists in the general school of biology who *are not* bioinformaticians. As a person who does computational ecology, this book really wouldn't help me - and I am a biologist. Sure, DNA is swell, but it won't tell us about the complex interactions between a number of populations of organisms and the environment in which they live; it doesn't provide strategies and formulas (or references to perl modules?) that *other* kinds of biologists use. ...sigh.

--

Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
1. Re:Not all biologists are doing genomics! by SloppyElvis · 2002-01-29 03:55 · Score: 2, Informative
  
  From Gray's Lab Dictionary on medical sciences:
  
  Bioinformatics: The use of computers in solving information problems in the life sciences.
  
  This says nothing about bioinformatics being used solely for genomics, though I hear your gripe, as many think of the two as the same. No doubt, this author has made the same assumption. I speculate it has something to do with money, since genomics are a "hot topic". The point is, you may be a bioinformatician and not even know it.
2. Re:Not all biologists are doing genomics! by jfrumkin · 2002-01-29 04:58 · Score: 2, Insightful
  
  Agreed - I happen to work on a phylogenetic project, which heavily uses PERL and other Open Source technologies. I believe O'Reilly's other book, "Developing Bioinformatics Skills" makes some mention of phylogeny, but it is rather limited, to be sure.
  
  On the other hand, my guess is most of the big money is in genomics at this point, so I can understand the heavy emphasis in that area at this time. Perhaps the increased attention given to this area will allow for increased interest in other biology-related arenas....
  
  --
  
  "What we have here, is a failure to communicate." - Cool Hand Luke
3. Re:Not all biologists are doing genomics! by RevAaron · 2002-01-29 05:50 · Score: 2
  
  Oh yes, I am a bioinformatician, and I know it. However, with this big trend with many dollars behind it, any bioinformatics worth mentioning outside of real biology has to do with computational genomics or molecular biology. It's like telling people you're an anarchist; they think you mean you're one of those 14 year old kids who smoke their dad's cigarettes when he's not home and have the circle-A patch on their backpacks.
  
  --
  
  Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
4. Re:Not all biologists are doing genomics! by RevAaron · 2002-01-29 05:56 · Score: 2
  
  You'd think they'd have a little more on phylogeny, since so much of that is comparisons of different proteins among species. But yeah, not much money in "real" science. ;)
  
  --
  
  Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
For fun or for work? by thelen · 2002-01-29 03:49 · Score: 1

What's the aim of this book, really? Is it meant to give the layperson in either field a hobby in the other? Are you supposed to read this and then go get a job in bioinformatics? As a Perl programmer with an interest in Biology but no formal training in it, I can say with certainty that it's not the latter. To land a job in that field you basically must have a graduate degree in one of the two fields, preferably with significant formal education in the other as well.
I might pick up this book because it sounds genuinely worthwhile, but I fully expect that at the end of it I'd feel more than anything that I needed to go back to school.
1. Re:For fun or for work? by babbage · 2002-01-29 04:12 · Score: 3, Insightful
  
  I would say that it's a crash course in two linked fields, targeted at an audience of people looking for bioinformatics work who might be familiar with one or the other of these fields, but need to get up to speed on the other one quickly.
  And I *do* think it does a good job at this -- I'm a Perl hacker that hasn't taken a biology class since my freshman year of high school (ten years ago, oy vey), but the genomics & proteomics covered in this book did bring me up to speed to the point where I understand the terminology and have a decent grasp of the computational issues involved in doing work in this field, as well as some techniques that can be applied to these issues. After reading this book, I read The Cartoon Guide to Genetics by Larry Gonick -- it's a better introduction to the field than you might expect from a title like that -- and felt satisfied that I had already been exposed to 95% of the material in there, with a significant portion of that coming from this book (and O'Reilly's other bioinformatics book, and skimming over web sites).
  No, it isn't a masters degree by a long shot, but it's a solid start at learning the field, if I choose to follow it that far. And it is enough of a crash course to land you a job, if you feel comfortable with the Perl stuff. You might not be expected to understand all the subtleties of DNA and proteins on your first day on the job, but you will at least come in knowing what your colleagues are talking about, and you'll be able to begin working with it immediately.
  Give it a chance, it's a good book for starting out with. Yes, there's more to learn -- I understand that James Tisdall is doing a followup that'll be more like a "Perl-Bioinformatics Cookbook" for more advanced users, and there are of course other books out there besides the O'Reilly stuff -- but it's a worthwhile & solid start.
  
  --
  DO NOT LEAVE IT IS NOT REAL
2. Re:For fun or for work? by Anonymous Coward · 2002-01-29 04:13 · Score: 0
  
  Have to disagree with you on the requirement of degrees for working in the bioinformatics field. I'm currently without a degree in either and am in my second bioinformatics programming position. Overall, I find my lack of genetic knowledge to be the deficiency, not my software skills. Bioinformatics is a very new field where formal training and study is just beginning to take shape. The most important requirement for a CS person's success in the field (IMHO) is the ability to communicate with biologists to define and design the tools they need. No one expects you, the computo guy/girl to have all the domain knowledge no matter what field you write software for -- but you need to be able to communicate.
Other Recommendations????? by tlh1005 · 2002-01-29 03:53 · Score: 1

Odd for me that this story was on slashdot today. I've spent the last 24 hrs lurking around the net trying to find books that'll give me a little info on bioinformatics. Anyways, I have a CS degree and I am kicking around the idea of taking Biology classes. I know a tiny bit about Biology but not any significant amount at all. I was wondering if you guys could recommend some books for a programmer in terms of bioinformatics?? I've seen the recommendations on bioinformatics.org but I want some feedback from some of you knowledgeable slashdotters. Feel free to send email.....
1. Re:Other Recommendations????? by dekraved · 2002-01-29 08:23 · Score: 1
  
  I searched and there was another discussion about this on /. that had some good-looking recommendations here...
2. Re:Other Recommendations????? by pao93 · 2002-01-29 14:43 · Score: 1
  
  Try this book: Bioinformatics: Sequence and Genome Analysis by David W. Mount. It's a good introduction to the field. A little more in depth, but one that everyone should have: Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin (Editor), S. Eddy, A. Krogh, G. Mitchison. Hope this helps! carl
Human DNA simulator in perl by Anonymous Coward · 2002-01-29 03:54 · Score: 1, Funny

perl -e 'for (1..1000000) { print ${[G,T,C,A]}[int(rand() * 4)] }'
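
For the non-Perl crowd, roughly the same joke in Python. This is only a sketch: it draws the four bases uniformly at random, which is of course nothing like a real genome, and the function name and default length are made up for illustration:

```python
import random

def random_dna(length, rng=random):
    """Return a string of `length` bases drawn uniformly from G, T, C and A."""
    return "".join(rng.choice("GTCA") for _ in range(length))

# The one-liner above prints a million bases; 60 is enough to see the idea.
print(random_dna(60))
```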

-- This is my penis. There are many like it, but this one is mine.
Perl Bioinformatics for AI Neuroscience by Mentifex · 2002-01-29 03:54 · Score: 2, Offtopic

Anyone who wanders into the use of Perl for bioinformatics ought to consider the ultimate plunge into the use of Perl for neuroscientific Artificial Intelligence. Since v.t.y. Mentifex here has been coding the AI Brain-Mind in JavaScript for tutorial purposes and also in Forth for Intelligent Mind Roboinformatics, the switch-over to Perl is advancing so slowly that I must first promulgate some candidate AI module proposals for inclusion among the object-oriented Perl 5 Module List.

The Comprehensive Perl Archive Network (CPAN) contains some not-yet-implemented, suggested AI module namespaces for those who read the Beginning Perl book reviewed here on SlashDot and who may then wish to do some really exciting, wave-of-the-future Perl neuroscience theory and practice work.
1. Re:Perl Bioinformatics for AI Neuroscience by Anonymous Coward · 2002-01-29 05:48 · Score: 0
  
  > object-oriented Perl 5 Module List. [cpan.org]
  
  Coke out the nose, thanks. Note: the filename
  00modlist.long.html, uh, those are zeros, not
  capital Oohs.
God, this book is hard by Anonymous Coward · 2002-01-29 04:03 · Score: 0

I read it and found that it includes about 400 theorems. It is a very good book but it is *hard* for people without a strong math background. Don't expect light reading.
1. Re:God, this book is hard by Anonymous Coward · 2002-01-29 04:17 · Score: 0
  
  Do you know a book with implementations (preferably in C++) of the most important algorithms?
2. Re:God, this book is hard by Anonymous Coward · 2002-01-29 04:31 · Score: 0
  
  numerical recipes in c++ should be available soon maybe?
  http://www.amazon.com/exec/obidos/search-handle-form/104-2385427-3564732
  
  numerical recipes used to have a dope web site with lots of free code if i remember correctly.
  
  If you can't wait, you want it NOW (sorry inside joke) go to Amazon and type algorithms in C++ and you'll see like 50 books to select from.
  
  you're welcome
3. Re:God, this book is hard by KidSock · 2002-01-29 12:48 · Score: 2
  
  It is a very good book but it is *hard* for people without strong math background
  
  Skip the proofs (I believe you Gusfield :~). You don't need to understand the proofs. I think he does a rather good job at providing a high level description for each algorithm. And you can complement this with other material from Algo books. I do NOT have a strong math background but I managed to understand what was being discussed enough to implement the algorithms and understand why you would use one or a permutation of one to solve a particular problem. It's a hard read but not too hard.
Maybe there's a reason... by pongo000 · 2002-01-29 04:03 · Score: 2

If I have a complaint with the book, in fact, it's that Tisdall doesn't go any further: everything is good, but it ends too soon. Seemingly important topics such as OO programming...are mentioned only in passing, under "further topics" in the last chapter.

Maybe that's because Perl's OO support is an extremely kludged-together ugly beast that's undergoing a much-needed facelift in Perl6.

The author actually does the world a favor by not mentioning Perl and OO in the same sentence.
1. Re:Maybe there's a reason... by jslag · 2002-01-29 08:49 · Score: 1
  
  Maybe that's because Perl's OO support is an extremely kludged-together ugly beast that's undergoing a much-needed facelift in Perl6.
  
  The author actually does the world a favor by not mentioning Perl and OO in the same sentence.
  
  Too bad that your aesthetics are so easily offended. Plenty of us in the real world (including pretty much every author of a module on CPAN) find that OO perl is perfectly usable.
2. Re:Maybe there's a reason... by pongo000 · 2002-01-29 09:06 · Score: 1
  
  I use it (and teach about it) quite extensively. That doesn't mean it's any less ugly or kludged-together.
Re: IOPCC by sab39 · 2002-01-29 04:15 · Score: 2

[[ so, when will the first International Obfuscated Perl Code Contest come? Perl poetry is getting kinda old. ]]

<tongue-in-cheek>Wouldn't that be rather like having an International Wet Water Contest?</tongue-in-cheek>

Stuart.
Re:out of touch by pclminion · 2002-01-29 04:26 · Score: 2

The problem with doing OO is it requires you to understand software engineering. Biologists are probably more interested in crunching their numbers than in good OO design. Or am I wrong?
If biologists are learning how to do OO, maybe I should get out my old chemistry set and try some gene-splicing :)
What's with that last couple of sentences? Did you fall out of bed this morning?
Why a scripting language? by pclminion · 2002-01-29 04:31 · Score: 2
Why do scientists gravitate to these scripting languages? My guess is that scripting languages avoid several common things that non-programmers usually have a hard time with:
- Variable declarations
- Memory allocation
- Type conversion
Unless you're using Python in which case you have to do type conversion sometimes...
Really, why scripting languages? It seems like some of these scientists are getting really good at it, using OO and everything. Why not switch over to a native language like C++ (which isn't actually that hideous if you avoid all the stupid features) and do the calculations 50 times faster?
Anyone have input?
1. Re:Why a scripting language? by babbage · 2002-01-29 04:58 · Score: 3, Informative
  
  Why do scientists gravitate to these scripting languages?
  
  For the same reasons that people gravitated to them for internet programming: there is so much ad hoc work to be done that it isn't worth the effort to work "that close to the metal". Perl's text analysis capabilities are so sophisticated that it would be hard to match them with custom written C code -- and if you did manage to pull it off without getting ensnared in infuriating memory leaks and so on, a well designed system will end up approaching Perl anyway. Yeah, Python is well suited towards modularizing systems and reworking bottleneck components in something like C, but Python just isn't as slick at text analysis as Perl is, and this kind of genetic/proteomic work is essentially a text analysis problem.
  I mean, look at it the other way around -- Perl isn't actually that hideous if you avoid all the stupid features, and you can do the development 50 times faster. If it really runs that slowly -- and usually the execution time won't be a problem -- then sure, redo parts in C (or XS), but 99% of the time that really doesn't help very much.
  
  --
  DO NOT LEAVE IT IS NOT REAL
2. Re:Why a scripting language? by Phillip2 · 2002-01-29 05:06 · Score: 2
  
  Because bioinformatics grew out of a bunch of people writing small tools, to do small things. After a while we had a whole load of small tools doing small things, and we wanted to stick them together. So we write small perl scripts to tie them together. Perl is very good at this. Unfortunately it tends to also hide the fact that if we had written some decent libraries in the first place, we wouldn't have needed to stick bits together with perl.
  
  Bioinformatics is in a mess, and it's slowly crawling out of it. To be honest, I think that the last thing that we need is more biologists with a working knowledge of perl.
  
  "It seems like some of these scientists are getting really good at it, using OO and everything. "
  
  Really? What, actually using OO?
  
  The reality is that if you can work out how a cell works, it's easy to write computer programs. The problem is that too many people feel that because it's easy to write programs, it's also easy to write programs well. Which is why we are in such a mess now.
  
  Phil
3. Re:Why a scripting language? by scottcain · 2002-01-29 05:10 · Score: 2, Informative
  
  It's easy to explain really: text manipulation. Bioinformatics is really about moving text around. What are DNA and protein sequences? Text. What are the reports generated by the plethora of analysis programs? Text. And Perl has outstanding and easy to use text manipulation tools. Add to that CPAN and BioPerl, and you have the makings of excellent Bioinformatics tools.
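  To make the "sequences are just text" point concrete, here is a small sketch of two everyday sequence operations done with nothing but string methods. It is written in Python rather than Perl, and the function names are illustrative, not taken from BioPerl or any other library:

```python
# Translation table mapping each base to its complement, in both cases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def gc_content(dna):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not dna:
        return 0.0
    upper = dna.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)

print(reverse_complement("ATGC"))
print(gc_content("ATGC"))
```

  In Perl the same two jobs are usually done with `tr///` and `reverse`, which is a large part of why the language caught on for this kind of work.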
4. Re:Why a scripting language? by jslag · 2002-01-29 08:56 · Score: 2, Informative
  
  scripting languages avoid several common things that non-programmers usually have a hard time with:
  
  * Variable declarations
  
  Actually, most perl programs more than a few lines long (hopefully) use strict; thus requiring variable declarations.
  
  * Memory allocation
  
  Seems like plenty of programmers have trouble with this as well, based on the number of memory leaks out in the wild.
  
  Really, why scripting languages?
  Why not? Hardware is fast and cheap compared to programmer time, so slightly slower (but written!) programs are often better than super-optimized programs that are only half done.
  
  Scripting languages aren't necessarily slower, anyhow. Perl programs, for example, tend to do all their heavy lifting in libraries, with performance-critical parts coded in C. If you're into benchmarks, you can dig some up showing perl outpacing java and c++ at various text-processing tasks.
5. Re:Why a scripting language? by Mike+Buddha · 2002-01-29 11:30 · Score: 3, Interesting
  
  Hmm. I agree with you that Perl is an excellent choice for this task, but I'm wondering if a lexical analyzer generator (like flex or lex) might make a better choice even than Perl? I suppose it would all depend on what exactly was being recognized.
  
  --
  by Mike Buddha -- Someday the mountain might get him, but the law never will.
Bioinformatics is very hot by tony+clifton · 2002-01-29 04:38 · Score: 1

In the San Francisco area, the Biotech companies are on a hiring swing. It's a notoriously hard area for even the strongest programmers to get a job in, unless they've worked in biotech before.

Any indications if this book (or any of the others noted here) would be enough to get someone in the door?
1. Re:Bioinformatics is very hot by ellem · 2002-01-29 05:44 · Score: 1
  
  Oh yeah. Don't even bring your resume, just a well worn copy of a book. You'll totally get the job
  
  --
  This .sig is fake but accurate.
Is there a Beginning Perl for Pornography? by DataSquid · 2002-01-29 04:46 · Score: 1

There's gotta be some legit way to link the two. I aim to be more than just a consumer of both ;) It's time to give a little something back to both communities I feel, it's only polite...

--

DataSquid.net, a little about me.
Fortran by wiredog · 2002-01-29 04:58 · Score: 2

The reason engineers, and physicists, use Fortran is that, until recently, it was the best number crunching language around. C and C++ didn't get math libraries that could compete with Fortran until a couple of years ago, and no one with any sense is going to use an interpreted language for serious number crunching.

--

Best Slashdot Co
Re:out of touch by Phillip2 · 2002-01-29 05:11 · Score: 2

"The problem with doing OO is it requires you to understand software engineering. Biologists are probably more interested in crunching their numbers than in good OO design. Or am I wrong?"

It depends what you want to do. If you are writing a few hundred lines of code, then good software engineering is not that important. Nowadays biologists and bioinformaticists are writing very large code bases, to do very complex tasks. Under these circumstances software engineering becomes necessary.

Programming is not my area of research. Nor is engineering, or architecture. It is however my plumber's spanner. Of course us biologists are learning how to do it, and in many cases how to do it very well.

Phil
useless for protein scientists by ubiquitin · 2002-01-29 05:15 · Score: 1

If you work on or with proteins (structural biology, biophysics, etc.) you will find this book to be largely a waste of time. An earlier slashdotter said it: there is more to biology than genomics. O'Reilly should stick to unix, leave the science for the peer-reviewed journals. Amen.

P.S. If you want an intro to some field in biology, read up on TIBS (Trends in Biological Science for the uninitiated.)

--
http://tinyurl.com/4ny52
1. Re:useless for protein scientists by atomicgirl · 2002-01-29 08:45 · Score: 1
  
  O'Reilly should stick to unix, leave the science for the peer-reviewed journals.
  
  Yeah, where they'll publish a bunch of papers about Excel add-ins (for Windows only). I'm really happy O'Reilly is doing bioinformatics these days. It's exactly the topic that I need to know about as a manager of a lab in need of computing solutions for our data. I'm installing unix bioinformatics programs on our G4 running OS X, and so now it runs EMBOSS and clustalW and phred, and uses its X windowing power to run GCG from a remote Sun server. I convinced my boss to let us buy this book, and now I'm getting to learn about what goes on when I click "assemble."
Additional open source bioinformatics projects by Anonymous Coward · 2002-01-29 05:17 · Score: 0

You can also find a large number of open source bioinformatics projects hosted at
Bioinformatics.org

with links to BioPerl, BioPython, BioXML, BioJava, BioCORBA, and BioRuby projects on the
lower right hand side of their page.
Perl and Bioinformatics by fasta · 2002-01-29 05:21 · Score: 5, Informative

I would like to answer several questions that were raised in this discussion.

(1) How does a CS person learn biology? I recommend "Recombinant DNA, A short Course", as an accessible (Scientific American style) introduction to the cloning breakthroughs and discoveries that led to genome science.

(2) How does a CS person learn "Bioinformatics"? I strongly recommend "Bioinformatics - Sequence and Genome Analysis" by David Mount as an accessible and extremely comprehensive survey of current approaches in Biological Sequence Analysis.

(3) Why do Biologists use Perl? Much of the information Biologists want is on the WWW, and Perl's LWP makes it extremely easy to get it. We don't use Perl for sophisticated text analysis (similarity searching, motif searching, etc) because the algorithms that are appropriate are typically not exact (or even regular expression) matches. But it's difficult to beat Perl for getting stuff off the WWW.

(4) Why do Biologists use Flat files? Several reasons - (a) the most useful information is sequence information, and it can be read much more quickly out of a flatfile (esp. one that is memory mapped) than a DB; (b) flat files solve some versioning problems that DB's make very complex and slow; (c) most data providers only provide flatfiles. This will change, however, over the next 2 - 3 years, as MySQL and PostgreSQL are moving into biology labs.
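
To illustrate point (3), here is a rough sketch of the difference between an exact (regular expression) motif search and a tolerant one. It is in Python, and it uses a plain mismatch count (Hamming distance) as a stand-in for the scoring schemes that real similarity-search tools use, so treat it as an assumption-laden toy; the sequence and motif are invented:

```python
import re

def exact_hits(sequence, motif):
    """Start positions where `motif` occurs exactly (overlaps allowed)."""
    # A lookahead matches without consuming text, so overlapping hits are found.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", sequence)]

def approximate_hits(sequence, motif, max_mismatches=1):
    """(start, mismatches) for every window within `max_mismatches` of `motif`."""
    hits = []
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Invented example data.
sequence = "GGTATAAAGGCTATTAAGG"
motif = "TATAAA"

print(exact_hits(sequence, motif))        # finds only the perfect copy
print(approximate_hits(sequence, motif))  # also finds the one-mismatch copy
```

The regular expression misses the second, slightly mutated copy of the motif, which is the kind of case the inexact algorithms are for.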

It is very exciting that Bioinformatics has high visibility now, and many people with CS background are considering bioinformatics problems. Unfortunately, many of the introductory books on bioinformatics (particularly the O'Reilly books) do not adequately present the substantial foundations of bioinformatics that have been built over the past 15 - 20 years, and some newcomers are misled into believing there are simple problems looking for a few good programmers. Most of the simple problems have been solved; many of the complicated problems are challenging not because we do not know enough CS, but because we do not know enough biology.
Re:out of touch by pclminion · 2002-01-29 05:22 · Score: 2

Have the bioinformaticists (now there's a word) ever thought of outsourcing it? Putting together all the experience gained over the years of scripting, pasting, gluing, etc and writing up a nice requirements document, and handing it off to some software people to write a system?
Or are bioinformaticists so paranoid about others using their techniques that they don't want such a tool to be available commercially? I bet there are a ton of "trade secrets" in some of that Perl code...
Re:out of touch by Anonymous Coward · 2002-01-29 05:47 · Score: 0

At least part of the problem there is that there is no one single "requirement" for bioinformatics as a whole (except for trivial tasks like reformatting files, doing elementary transformations, accessing and retrieving stuff over the net and so on). When it comes to the big problems, every research project is different and it is a genuine handicap to not have a biological background when putting the system together. I'm not saying that a CS person couldn't do it, of course they could (almost certainly better, in fact). However, the lack of biological background would complicate and delay things down to unworkable levels when it comes to knowing where the limits of "meaning" are in the system, if you get my drift.
TIBS? by Cheshire+Cat · 2002-01-29 05:56 · Score: 1

Do you have a link for this TIBS which you speak of? Apparently, I'm far too lazy to use google! ;)

--

Last night I shot an elephant in my pajamas. How he got in my pajamas I'll never know.
Another language for bioinformatics by Jon+Howard · 2002-01-29 05:58 · Score: 2, Informative

Since I'm a Lisp fiend: while we're on the subject of programming for bioinformatics, I'd like to point out that Allegro Common Lisp has been used by a few folks in the field. Here are two links:

Pangea Systems Inc. (now DoubleTwist) for EcoCyc.

MDL Information Systems to design new drugs.
1. Re:Another language for bioinformatics by Jonathan · 2002-01-29 13:36 · Score: 2
  
  Well, at least in the case of Pangea, that's because Larry Hunter was in charge, and he came of age in the Symbolics LISP workstation era. Most bioinformaticians tend to be a bit younger and therefore missed out on the whole LISP-as-mainstream-tool era. Whether this was to their benefit or loss is of course subjective.
PubMed Books online by NullSpaceKid · 2002-01-29 06:19 · Score: 2, Informative

A selection of possibly relevant books (_Introduction to Genetic Analysis_, _Molecular Cell Biology_, etc) can be found at: www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books
NSK
Mod this up!! by ChaosMt · 2002-01-29 06:33 · Score: 1

So far, this is the most useful comment today.

--
Democrats and Republicans only disagree about how to enslave you
Perl has been the bane of bioinformatics for years by Anonymous Coward · 2002-01-29 06:37 · Score: 0

As a professional in the field, I have seen over and over again how using Perl in bioinformatics has crippled efforts towards a real bioinformatics infrastructure. It leads to data islands, lack of interoperability, lack of maintainability, poor code reuse, and slow development. Lack of multithreading makes it difficult to spread jobs out over multiple processors. I think it is popular because it is easy for non-programmers to start spewing out simple text transformations. However that only gets you so far, and creating a real enterprise back end needs a real language. Sometimes they try to patch things together by using the more OO-like features of Perl, but it is a losing battle. Save yourself the grief and use Java or C++.

A little knowledge can be a dangerous thing.
Re:TIBS link by ubiquitin · 2002-01-29 07:25 · Score: 1

http://www.elsevier.nl/locate/tibs or find it at your library.

--
http://tinyurl.com/4ny52
Neuroscience is bioinformatics too! by MrBlic · 2002-01-29 07:44 · Score: 1

I'm a developer working on www.neuroinformatica.com. (online microscope, with analysis and discussion of biological material)
Our customers are looking to teach, research and diagnose all sorts of stuff. We will link with some genomics information, but at the moment there is plenty of anatomy and structure to provide a context for the rest of the information.
In my mind, the goal is to simulate, and therefore understand the processes at an electrochemical level, and by putting everything into the context of a model based on real (digitized) tissue create a serious base of knowledge.
I use java more than perl. I want to be able to maintain the code over the years! I know just enough perl to know that two programmers will seldom agree on a strategy for implementing something. I want my java neuroinformatics project to be timeless.
This is a fascinating time to be alive!

--
Celebrate Excellence!
1. Re:Neuroscience is bioinformatics too! by RevAaron · 2002-01-29 11:21 · Score: 2
  
  I use Squeak Smalltalk (see link below) for the same reason you use Java. But luckily, in my research, the software doesn't need to be really maintained, I just need to use it personally to visualize and generate graphs and HTML.
  
  --
  
  Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
bioinformatics does not equal string manipulation by Anonymous Coward · 2002-01-29 08:36 · Score: 1, Informative

It seems that perl is still being used purely because many bioinformatics departments are full of people who know how to program in perl. And this is because bioinformatics *used* to be pretty much only about string manipulation.
This is just not true any more - proteomics requires in silico trypsin digests and algorithms for protein identification for MALDI mass spec (prediction of protein sequence via analysis of digested protein fragments); microarray experiments require cluster analysis of expression data in order to identify functional relationships. Added to this, there are lots of issues relating to integrating the many, many databases out there.
The systems are becoming bigger and have to deal with lots of other systems around the world. Is Perl the best language for all this? I don't know but languages shouldn't be pushed into unsuitable roles purely for historical reasons and lots of bioinformaticians are trying to do this by trying to cling onto perl.
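
For readers who have not met the "in silico trypsin digest" mentioned above, here is a minimal sketch in Python. It assumes the commonly quoted simplified rule (cleave after K or R unless the next residue is P) and ignores missed cleavages; the function name `trypsin_digest` is made up for this illustration and does not come from any library.

```python
def trypsin_digest(protein):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleave on the C-terminal side of K or R, except when the
    following residue is P. Missed cleavages are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        # Residue after the current one, or "" at the end of the sequence.
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep any trailing fragment that does not end in K or R.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


if __name__ == "__main__":
    print(trypsin_digest("MKWVTFRPLLK"))
```

A real identification pipeline would go on to compute peptide masses and compare them with the observed spectrum; that part is left out here.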

martin
Bioperl.org powered by Python! by Anonymous Coward · 2002-01-29 10:26 · Score: 0

It's funny that the bioperl project is powered by Python!
50 times faster? by Anonymous Coward · 2002-01-29 10:42 · Score: 0

Surely you jest. In my humble experience Perl code is about .9 to .95 the performance of C code when it comes to predominately file I/O and string manipulation tasks. Now when it comes to programmer efficiency there is no contest. A single regexp will replace dozens of lines of C, dynamic arrays and hashes are cumbersome in C, and there are memory worries, etc.
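
To make the "single regexp" point concrete, here is a small sketch. It is written in Python rather than Perl, but the pattern syntax used is the same in both. The pattern and the helper name `find_orfs` are assumptions chosen for illustration: it looks for a start codon, then whole codons, then the first in-frame stop codon, on one strand only.

```python
import re

# ATG, then as few whole codons as possible, then a stop codon.
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(dna):
    """Return non-overlapping ORF-like matches found on the given strand."""
    return ORF_PATTERN.findall(dna.upper())


if __name__ == "__main__":
    print(find_orfs("ccATGAAATGAtt"))
```

Doing the same scan in C would mean hand-written loops over the buffer plus explicit bookkeeping for the reading frame, which is the kind of saving the comment is talking about.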
Perl, Linux, & microarrays, Was:As a biologist by jcmatese · 2002-01-29 10:47 · Score: 1

Hi,
For a free microarray database and software package utilizing Perl and Linux, you might look into the following links.

Stanford Microarray Database [SMD] Package

SMD on Linux

Cheers, jcmatese
Re:bioinformatics does not equal string manipulati by bishbosh · 2002-01-29 12:19 · Score: 1

I agree, but perl is not the only language used in bioinformatics, and some people do know when to use the correct language for the job in hand. There is use of SQL and C / C++ in many bioinformatics projects, not forgetting good old Fortran, which is used to a large extent in the Structural Biology field.
Re:Perl, Linux, & microarrays, Was:As a biolog by Anonymous Coward · 2002-01-29 13:25 · Score: 0

If you need to do heavy-duty microarray analysis, Excel and so on will definitely not do the job (as you hinted at). If you can't afford something like S-Plus to do your analysis, you should check out R and then get over to Terry Speed's website. Lots of good R routines there for analyzing microarray data. As for the topic at hand, I find Perl very good for what we do in microarray analysis. I use it all the time for combining results, adding or updating columns in largish (around 80 MB) data sets and so on. It's great that there's a book out there now for the biologists and I'm definitely going to buy it for at least a few people around here who constantly bug me with simple Perl questions!
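
As a rough illustration of the "adding or updating columns" chore described above, here is a minimal Python sketch that appends a log2 ratio column to tab-separated data. The column names `cy5` and `cy3` and the function name `add_log_ratio` are hypothetical; a real export will have its own headers, and zero or missing intensities would need handling.

```python
import csv
import io
import math


def add_log_ratio(tsv_text):
    """Return the TSV text with an extra log2(cy5 / cy3) column appended.

    Assumption: the input has a header row containing 'cy5' and 'cy3'
    columns holding positive numbers.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + ["log2_ratio"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        ratio = float(row["cy5"]) / float(row["cy3"])
        row["log2_ratio"] = f"{math.log2(ratio):.3f}"
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(add_log_ratio("gene\tcy5\tcy3\nYFG1\t800\t200\nYFG2\t100\t400\n"))
```

Because the rows are streamed one at a time, the same approach works on files of the size mentioned without loading everything into memory, provided the `StringIO` objects are swapped for open file handles.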
Other kinds of Bioinformatics by cookie_cutter · 2002-01-29 14:26 · Score: 1

The type of bioinformatics described in this book deals with processing long strings of symbols, which is how much biological sequence data is represented (e.g. DNA, RNA and protein sequence data).
There is another area of bioinformatics which uses physics-based simulations of biological systems. These types of tasks have little to do with ASCII file processing, and are more sheer number crunching, and involve classic simulation modelling techniques.
Some examples of these types of bioinformatics problems are:
-simulation of protein folding
-simulation of chemical reaction circuits/control mechanisms in a cell or organ system
-cellular automata simulation of a group of cells in a tissue
Because of the number-crunching requirements involved, these types of tasks are usually coded in languages which are good at math and have fast compilers, such as Fortran and C.
I'm just trying to mention what else is out there, so that people don't get the idea that pattern parsing is the only thing bioinformaticists do.
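
To give a feel for the cellular automata item in the list above, here is a toy sketch in Python. It uses Conway's Game of Life rules on a small wrap-around grid as a stand-in; that rule set is an assumption made purely for illustration and is not a model of any real tissue. The function name `life_step` is likewise made up.

```python
def life_step(grid):
    """Advance a 2D grid of 0/1 cells by one generation (Game of Life rules).

    Assumption: the grid wraps around at the edges (a torus).
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        # Count the eight surrounding cells, wrapping at the borders.
        return sum(
            grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )

    new_grid = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            n = live_neighbours(r, c)
            alive = grid[r][c] == 1
            # A cell is alive next step with exactly 3 live neighbours,
            # or if it is already alive and has exactly 2.
            new_row.append(1 if n == 3 or (alive and n == 2) else 0)
        new_grid.append(new_row)
    return new_grid


if __name__ == "__main__":
    blinker = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in life_step(blinker):
        print(row)
```

Even this toy shows why such work is compute-bound rather than text-bound: every step touches every cell and its neighbours, so realistic grid sizes and step counts are usually handled in compiled code.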
Other option for DNA sequencing by Anonymous Coward · 2002-01-29 15:43 · Score: 0

Hi,
Why not use Euphoria?
Since the basic structure of Euphoria is the sequence, then it would be the best option for
DNA sequencing :-)
And it beats Perl in speed!
check:
www.rapideuphoria.com
Re:out of touch by Phillip2 · 2002-01-30 01:41 · Score: 2

Outsourcing is no magic wand. Someone has still got to write the software, and someone has still got to do the research. In all research, requirements analysis is extremely hard. If you knew what you required from the software in advance, then it probably wouldn't be research!

But yes, there are lots of computer scientists in bioinformatics these days. It's a hybrid discipline. Even though my origins are as a bench biologist, I consider myself to be a computer scientist at least in part these days, as well as a programmer.

Research code is often a little flaky. We are not writing finished products to sell to people. We are writing code to do research!

Phil
Re:Perl sucks! by Anonymous Coward · 2002-01-30 14:19 · Score: 0

PHP and Python own Perl in every way. Better structure, cleaner syntax and more robust (without having to hit up CPAN every 5 fuckin minutes for yet another module that has completely different usage syntax than every other module you have).

Perl is ugly and its "TMTOWTDI" design sucks ass.