Slashdot Mirror


Open-Source Bioinformatics Programs?

An anonymous reader asks: "This summer I have the opportunity to work in a bio research lab creating a web site for data about proteins. Part of my job is to do bioinformatic analysis of the proteins to determine what types of support there are for the preliminary gene predictions. I have been using DNA Stryder (a Mac program) for sequence alignments plus translations from DNA sequences to protein sequences, and I was wondering if any of the Slashdot crowd knew of similar programs for Linux? I have looked into Bioperl, Biopython, EMBOSS, and BioConductor, but they seem to be more oriented towards servers and less towards stand-alone applications. What programs would you suggest, especially those that might be geared more towards biologists rather than computer scientists?"

7 of 28 comments

  1. Freshmeat by abradsn · · Score: 2, Informative

    http://freshmeat.net/browse/252/

    If you go here and have a look, you will see some interesting programs that meet your needs. I was looking for some biochem programs on this web site the other day.

  2. Read the O'Reilly book by wayne606 · · Score: 3, Informative

    "Developing Bioinformatics Computer Skills"

    This has lots of useful information and references and is a great starting point. It might be a bit dated, though.

    1. Re:Read the O'Reilly book by dmaduram · · Score: 2, Informative

      Regarding books by O'Reilly, I'd also recommend Beginning Perl for Bioinformatics and, to a lesser extent, Mastering Perl for Bioinformatics. Our lab has been using several custom-built sequencing tools, but I've found that Perl always gets the job done faster.

      PS: I haven't personally checked this out, but you might want to take a gander at O'Reilly's Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases.

      Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases pulls together all of the vital information about the most commonly used databases, analytical tools, and tables used in sequence analysis. The book contains details and examples of the common database formats (GenBank, EMBL, SWISS-PROT) and the GenBank/EMBL/DDBJ Feature Table Definitions. It also provides the command line syntax for popular analysis applications such as Readseq and MEME/MAST, BLAST, ClustalW, and the EMBOSS suite, as well as tables of nucleotide, genetic, and amino acid codes. Written in O'Reilly's enormously popular, straightforward "Nutshell" format, this book draws together essential information for bioinformaticians in industry and academia, as well as for students. If sequence analysis is part of your daily life, you'll want this easy-to-use book on your desk.

  3. OS Bioinformatics software by Neil+Blender · · Score: 3, Informative

    Most useful open source bioinformatics software is going to be geared toward biologists with at least some programming and Unix skills. A lot of it was written by bioinformaticians, who tend to lean more toward the informatics than the bio. They get more caught up in the technical aspects of the field than in the biology of the problem to be addressed. Unfortunately, the same can be said about most commercial bioinformatics software as well.

    On the flip side, when people more interested in the biology than the technology write software, they tend to write just enough to get the job done and then stop. Software from this camp is often buggy and has a bad UI or no UI at all. It gets the job done, but only if you know exactly how to use it.

    Anyway, you might want to take a look at R - http://cran.stat.ucla.edu/. It's more geared towards statistics but it does have some protein modules.

  4. Try Chimera and BioKnoppix by frenchs · · Score: 2, Informative

    Well, I'm 99.9999% sure that you can get BioPerl running on a Linux box. Also, for a fun project, grab a copy of your sequence databases of choice and try to install BLAST on the Linux box.

    That said, take a look at Chimera, which is an app written at UC San Francisco. It is mostly useful for visualizing, but I know there is a sequence viewer, and some other tools in there too.

    Now, for all the aspiring bio geeks I give you BioKnoppix. Go download and burn the ISO. Then use that CD to boot any x86 box into a full Linux install with many of the popular bioinformatics tools already installed.

    Enjoy!

    -Steve

  5. Oriented Towards Servers? by jmt9581 · · Score: 3, Informative

    I don't quite know what you mean when you claim that Bioperl, Biopython, EMBOSS and BioConductor are more oriented towards servers than stand-alone applications. First of all, servers and stand-alone applications don't divide up the application world into mutually exclusive parts. Applications can be stand-alone and run on a server, for example. I've built applications using Bioperl that have a GU interface (take that grammar nazis!), and people are extremely happy with them. So, if you have a Perl guy nearby, I highly recommend talking to them about your problems.

    Secondly, translations? Database searches? Sounds like you're doing some very basic Bioinformatics work. Not to say that your research isn't meaningful, just that the problems you're approaching are easily solved by a computational biologist. For example, here's a snippet of Bioperl code that will read in a set of GenBank sequences, translate them and print the results to a new file:

    use strict;
    use warnings;
    use Bio::SeqIO;

    # Read GenBank records and write their translations to a new file
    my $seqin  = Bio::SeqIO->new( -file => 'myseq.gbk', -format => 'genbank' );
    my $seqout = Bio::SeqIO->new( -file => '>translated.gbk', -format => 'genbank' );
    while ( my $seq = $seqin->next_seq ) {
        my $translated_seq = $seq->translate;
        $seqout->write_seq( $translated_seq );
    }


    Seems pretty simple, right? There are similar, simple wrappers around BLAST, FASTA and some other common algorithms in computational biology. Check out the Beginners HOWTO on the Bioperl website; it explains Bioperl without requiring previous CS experience. I think it's a good intro, but I also wrote it, so I'm slightly biased.
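
    For readers curious about what a DNA-to-protein translation step involves, here is a minimal sketch in plain Python with no external libraries. It is not how Bioperl implements translate; it only illustrates the idea using the standard genetic code. The function name translate and the choice to render ambiguous codons as 'X' are assumptions made for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")  # also accept RNA input
    protein = []
    # Step through complete codons only; a trailing 1-2 bases are ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        # Codons containing ambiguous bases (e.g. N) are reported as 'X'
        # (an assumption of this sketch).
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR*KGAR*
```

    The sketch translates straight through stop codons rather than stopping at the first one, which makes it easy to eyeball where a predicted gene ends.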

    If programming is not your style, check out JEMBOSS. It's a Java-based GUI wrapper for EMBOSS.

    Cheers and good luck.

  6. Useful bioinformatics programs by axolotl_farmer · · Score: 2, Informative

    I have used Clustal for multiple sequence alignments. There is a GUI (ClustalX) and a scriptable command line version (ClustalW). Available for all platforms and source included with the download.

    Also keep an eye on POY, which does direct optimization on sequences. Also available for all platforms with a BSD-style licence.

    For just viewing and manual editing of alignments there is BioEdit. Free, but not open source. Windows only.

    For a general sequence assembly/analysis/kitchen sink approach try the Staden Project. Open source and available for Windows, Linux and OSX.

    Hope this is useful. I have never worked with protein sequences, but I have done a lot of DNA sequencing and alignment!
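
    As a rough illustration of what sequence alignment means computationally, here is a sketch of Needleman-Wunsch global pairwise alignment in plain Python. This is not Clustal's code: Clustal builds multiple alignments progressively, and pairwise comparison is only one ingredient. The function name global_align and the scoring values (match 1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two characters
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = global_align("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

    Real tools use substitution matrices and separate gap-opening and gap-extension penalties rather than a single flat gap cost, so treat the numbers here as placeholders.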