Beginning Perl for Bioinformatics
Superficially, this book isn't all that different from a lot of introductory Perl books: the Perl material starts out with an overview of the language, followed by a crash course on installing Perl, writing programs, and running them. From there, it goes on to introduce the various language constructs, from variables to statements to subroutines, that any programmer has to get comfortable with. Pretty run-of-the-mill so far. Tisdall starts with two interesting assumptions, though: [1] that the reader may never have written a computer program before, and so needs to learn how to engineer a robust application that will do its job efficiently and well, and [2] that the reader wants to know how to write programs that can solve a series of biological problems, specifically in genetics and proteomics.
As such, there is at least as much material about the problems that a biologist faces and the places she can go to get the data she needs as there is about the issues that a Perl programmer needs to be aware of. The author introduces the reader to the basics of DNA chemistry, the cellular processes that convert DNA to RNA and then proteins, and a little bit about how and why this is important to the biologist and what sorts of information would help a biologist's research. The main sources of public genetic data are noted, and the often confusing -- and huge -- datafiles that can be obtained from these sources are examined in detail.
With the code he presents for solving these problems, Tisdall makes a point of not falling into the indecipherable-Perl trap: this is a useful language, well-suited to the text-analysis problems that make up so much of bioinformatics, and he doesn't want to encourage the kind of dense, obscure, idiomatic coding style that has given Perl an undeservedly bad reputation. Some of Perl's more esoteric constructs are useful, and they show up when they're needed, but they're left out when they would only serve to confuse the reader. This is a good decision.
Rather, the focus is on teaching readers how to solve biological problems with a carefully developed library of code that happens to leverage some of Perl's most useful properties. The result is pretty much a biologist's edition of Christiansen & Torkington's Perl Cookbook or Dave Cross' Data Munging With Perl. The author presents a series of tasks that a working bioinformaticist might have to deal with daily -- parsing BLAST, GenBank, and PDB files, finding relevant motifs in the parsed data, and preparing reports about all of it. If a bioinformaticist's job is to report on interesting patterns from these various sources, then following the programming techniques that Tisdall explains in clear, easy-to-follow prose would be an excellent way to go about doing it.
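To give a flavor of the motif-finding task, here is a minimal sketch. It is written in Python rather than the book's Perl, and it is not Tisdall's code: the function name `find_motif` and the sample sequence are invented for illustration. It reports every position, overlapping ones included, at which a regular-expression motif occurs in a DNA string.

```python
import re


def find_motif(sequence, motif):
    """Return the 0-based start positions of every match of `motif`.

    `motif` is treated as a regular expression, so character classes
    such as "GA[TC]" work. Overlapping matches are reported as well.
    """
    # Wrapping the motif in a lookahead makes each match zero-width,
    # so the scan advances one character at a time and overlaps are kept.
    pattern = re.compile(r"(?=(" + motif + r"))", re.IGNORECASE)
    return [match.start() for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    dna = "ATGCGATATATCGTATAGC"  # made-up sequence, for illustration only
    print(find_motif(dna, "TATA"))
    print(find_motif(dna, "GA[TC]"))
```

The lookahead is a design choice: a plain `re.finditer` on the motif itself would skip occurrences that overlap an earlier match, which is usually not what you want when scanning for short repeated motifs.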
And when I say "programming techniques," note that I'm not specifically mentioning Perl. The code in this book is clear and organized, and all programs are carefully decomposed into logical subroutines that are then packaged up into a library file that each later sample program gets to draw from. Each new program typically contains a main section of a dozen lines of code or less, followed by no more than two or three new subroutines, along with calls to routines written earlier and collected in the BeginPerlBioinfo.pm module that is built up as the book progresses. Each sample is typically preceded by a description of what it's trying to accomplish and followed by a detailed description of how it was done, as well as suggestions of other ways that might have worked or not worked.
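The shape of such a program, a few small reusable routines plus a short main section, might look something like the sketch below. It is in Python rather than the book's Perl and is not taken from BeginPerlBioinfo.pm; the helper names `reverse_complement` and `gc_content` and the sample sequence are assumptions made purely for illustration.

```python
# Reusable helpers. In the book's approach, routines like these would
# accumulate in a shared library file as the chapters progress.

# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_content(dna):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not dna:
        return 0.0
    upper = dna.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def main():
    # The main section stays short because the work lives in the helpers.
    dna = "ATGCGC"  # made-up sequence, for illustration only
    print("Reverse complement:", reverse_complement(dna))
    print("GC content: {:.2f}".format(gc_content(dna)))


if __name__ == "__main__":
    main()
```

Later programs would import the helpers instead of redefining them, which is the habit the book is trying to build.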
This modular approach is fantastic -- too many Perl books seem to focus so heavily on the mechanics of getting short scripts to work that they lose sight of how to build up a suite of useful methods and, from those methods, to develop ever-more-sophisticated applications. It isn't quite object-oriented programming, but that's clearly where Tisdall is headed with these samples, and given a few more chapters he probably would have started formally wrapping some of this code into OO packages.
If I have a complaint with the book, in fact, it's that Tisdall doesn't go any further: everything is good, but it ends too soon. Seemingly important topics such as OO programming, XML, graphics (charts & GUIs), CGI, and DBI are mentioned only in passing, under "further topics" in the last chapter. I also have a feeling that some of the biology was shortchanged, and the book barely touches upon the statistical analysis that is probably a critical part of the advanced bioinformaticist's toolbox. I can understand wanting to keep the length of a beginner's book relatively short, and this was probably the right decision, but it would have been nice to see some of the earlier sample problems revisited in these new contexts by, for example, formally making an OO library, showing a sample program that provided a web interface to some of the methods already written, or presenting code that output results as XML or exchanged them with a database.
But these are minor quibbles, and if the reader is comfortable with the material up to this point, she shouldn't have a hard time figuring out how to go a step further and do these things alone. It's a solid book, and one that should be able to get people learning Perl, genetics, or both up to speed and working on real-world problems quickly.
I mean, really! REALLY REALLY SUCKS! It's syntactic sugar stew, and it's worthless for real software.
Go Kathryn Thurber!
la la la laaa la la la
Bioinformatics was reviewed a short time ago
then I could learn perl, biology, and Italian all at the same time.
http://rareformnewmedia.com/
"You got your Perl in my biology!"
"You got your biology in my perl!"
Two great interests that interest great together!
I like it when I see a "tie in" to another industry or scientific discipline. I could read this book, learn all about DNA, crack it with a perl script, then get served papers by $DEITY so I can be prosecuted under the DMCA.
The most important thing any republican needs to know.
Now I can convert the code for my Terminator robot from Fortran 77 to Perl! Good bye columns!
I felt the same about the lack of statistical approaches. While this book is probably great for biologists just learning to write code, for coders entering the field (bioinformatics) it contains too little biology or math to be really educational. My opinion.
What I'd love would be a dissection of the construction of various motif analysis tools, critiquing various impl's of HMMs, really going into detail. This seems like a perfect complementary work to OSS, so I might even find one, someday...
1) It is good for biologists who want to learn how to program
2) It is not good for programmers who want to learn biology
Obviously, my friends disagree with reviewer Babbage on this point. However, a quick look on Amazon reveals that most reviewers who found the book interesting are biologists with no programming experience instead of the other way round.
"If you think education is expensive, try ignorance" - Derek Bok
Is there any way to increase the power or range of a wireless hub?
blah blah blah, I'm right, and all evidence proving I'm wrong is insufficient and false.
Seeing a title like this, aiming a particular language at a particular discipline, makes me flash back to my college days (last year), when the engineering classes all used Fortran. God forbid, if Perl gets outdated in another few years, are all the biologists in the world going to lock themselves into a dead language like those stuffy engineers?
Unfortunately, after flipping through this book extensively at Barnes & Noble, I found it to be a glorified Perl string manipulation book, applied to strings of DNA info instead of "Hello World!" type data. There was only one decent chapter on specific file format conversion. Not very worthwhile, in my opinion.
Wouldn't you like to be a pepper, too?
Bioinformatics is probably the biggest challenge facing the biological sciences in the next few years. It's becoming more and more apparent that even slight changes in very small elements of a system (e.g., a small sequence of a protein, the behavior of a single neuron within a group of 10,000) can have a drastic effect on the behavior of the entire system. As a result, to really study the problem, you have to acquire massive amounts of data. For example, in our lab we routinely collect data from 64 channels of 16-bit data (monitoring neuron firing in culture) at 1 kHz; in addition, we're simultaneously taking calcium imaging video at 100 fps at 256x256 (at 256 colors). This results in about 200 MB of data gathered every second. Considering we run tests for over 10 minutes, just acquiring and storing this data is a challenge, but finding useful methods to analyze it is even more difficult. It's refreshing to see texts being written on how to bridge the gap between comp. sci. and biology. I've been working in the area for about 4 years now, and it's really great to see the field growing and getting more mainstream attention.
Some men spend their entire lives trying to kill themselves for having been born. --Ross MacDonald
As a CS person about to switch into Biology, I found the reviewed book interesting. Even if you have a good handle on Perl and Biology, you will find certain elements in the book intriguing.
On a personal experience side note, Perl does seem to handle genetics problems with quite a bit of ease. The ease seems to stem from Perl's obfuscation. (it also seems to confuse my Biology profs quite a bit since my answers are legitimate answers on the exams)
internet like monkeys'
They also don't mention it's a great introduction to books for those familiar with perl, biology, and bioinformatics, but not the written word!...
Recursive: Adj. See Recursive.
We were just discussing programming languages recently.
We use so-called micro-arrays frequently, which yield so much information that it is not possible to go through it all manually (on average you get about 10,000 "genes" that show changes in expression, after which you have to check the interesting ones for functionality).
At the moment we can either mess around with MS excel or buy some serious software which is so incredibly expensive only companies can afford it.
Still I doubt whether Perl should be the language of choice due to it tending to be "write-only code". Maybe this book will change my mind though.
If an experiment works, something has gone wrong.
This could spawn a great trend in cross-area programming books. Ada for Historians? Smalltalk for Hairdressers?
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield is usually well liked by people with a computer science background. And it's not only of use if you want to go into bioinformatics: most algorithms on strings are usable in everyday coding too.
"If you think education is expensive, try ignorance" - Derek Bok
Human Molecular Genetics 2: Looks to be a great primer on all the biology background.
Bioinformatics: A Practical Guide...: This book is a detailed tour of the online databases and existing tools for analysis of genes and proteins.
Algorithms on Strings, Trees and Sequences: This is a book for real computer science types who want to do high-performance implementations of new tools.
At Purdue University, there is a class specifically meant for CS majors and Biology majors, to address this same issue. I wonder if they use this book in the class.
Moderation: Put your hand inside the puppet head!
The BioPerl project (http://bio.perl.org/) has been going on for some time.
In their own words: "The Bioperl Project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research."
There, bioinformaticians can find a wealth of useful Perl scripts and modules to use in their efforts.
Yet another example of an open source initiative serving the needs of science!
This book seems to equate biology with genomics/bioinformatics, when that is simply not the case. There are a fair number of scientists in the general school of biology who *are not* bioinformaticians. As a person who does computational ecology, this book really wouldn't help me, and I am a biologist. Sure, DNA is swell, but it won't tell us about the complex interactions between a number of populations of organisms and the environment in which they live; it doesn't provide strategies and formulas (or references to Perl modules?) that *other* kinds of biologists use. ...sigh.
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
What's the aim of this book, really? Is it meant to give the layperson in either field a hobby in the other? Are you supposed to read this and then go get a job in bioinformatics? As a Perl programmer with an interest in biology but no formal training in it, I can say with certainty that it's not the latter. To land a job in that field you basically must have a graduate degree in one of the two fields, preferably with significant formal education in the other as well.
I might pick up this book because it sounds genuinely worthwhile, but I fully expect that at the end of it I'd feel more than anything that I needed to go back to school.
Odd for me that this story was on slashdot today. I've spent the last 24 hrs lurking around the net trying to find books that'll give me a little info on bioinformatics. Anyways, I have a CS degree and I am kicking around the idea of taking Biology classes. I know a tiny bit about Biology but not any significant amount at all. I was wondering if you guys could recommend some books for a programmer in terms of bioinformatics?? I've seen the recommendations on bioinformatics.org but I want some feedback from some of you knowledgeable slashdotters. Feel free to send email.....
perl -e 'my @nt = qw(G T C A); print $nt[rand @nt] for 1..1_000_000'
-- This is my penis. There are many like it, but this one is mine.
Anyone who wanders into the use of Perl for bioinformatics ought to consider the ultimate plunge into the use of Perl for neuroscientific Artificial Intelligence. Since v.t.y. Mentifex here has been coding the AI Brain-Mind in JavaScript for tutorial purposes and also in Forth for Intelligent Mind Roboinformatics, the switch-over to Perl is advancing so slowly that I must first promulgate some candidate AI module proposals for inclusion among the object-oriented Perl 5 Module List.
The Comprehensive Perl Archive Network (CPAN) contains some not-yet-implemented, suggested AI module namespaces for those who read the Beginning Perl book reviewed here on SlashDot and who may then wish to do some really exciting, wave-of-the-future Perl neuroscience theory and practice work.
I read it and found that it includes about 400 theorems. It is a very good book but it is *hard* for people without a strong math background. Don't expect light reading.
He will be missed
Show me That Smile (The Growing Pains Theme Song):
Show me that smile again.
Ooh show me that smile.
Don't waste another minute on your crying.
We're nowhere near the end.
We're nowhere near.
The best is ready to begin.
As long as we got each other
We got the world
Sitting right in our hands.
Baby rain or shine;
All the time.
We got each other
Sharing the laughter and love.
Alan Thicke's Journal
My Slashdot ads say "
Maybe that's because Perl's OO support is an extremely kludged-together ugly beast that's undergoing a much-needed facelift in Perl6.
The author actually does the world a favor by not mentioning Perl and OO in the same sentence.
[[ so, when will the first International Obfuscated Perl Code Contest come? Perl poetry is getting kinda old. ]]
<tongue-in-cheek>Wouldn't that be rather like having an International Wet Water Contest?</tongue-in-cheek>
Stuart.
Perl was great for bioinformatics work in the beginning, just like perl was cool for cgi-bin stuff in the beginning of the whole "web app" thing. Now, like all serious apps that need to be OO, extensible, client-server, pick your buzzword the coding is done in java. More and more perl bioinformatics apps are considered "legacy" and most of the perl I have done is to support older apps written in it as well as the crappy little scripting/parsing tasks that perl has always been good for. So.......bottom line, if you think that this is "Perl doing something cool" you were right about 5 years ago but not today. /. fist fuckers, you don't have a prayer of working in bioinformatics. Sweaty bloated sacks need not apply. Now go eat something.
BTW, this book sucks. O'Reilly should stay out of bioinformatics no matter how much of a cash-in they think it will be.
Oh, and to all you sweaty bloated
- Variable declarations
- Memory allocation
- Type conversion
Unless you're using Python in which case you have to do type conversion sometimes...Really, why scripting languages? It seems like some of these scientists are getting really good at it, using OO and everything. Why not switch over to a native language like C++ (which isn't actually that hideous if you avoid all the stupid features) and do the calculations 50 times faster?
Anyone have input?
In the San Francisco area, the Biotech companies are on a hiring swing. It's a notoriously hard area for even the strongest programmers to get a job in, unless they've worked in biotech before.
Any indications if this book (or any of the others noted here) would be enough to get someone in the door?
There's gotta be some legit way to link the two. I aim to be more than just a consumer of both ;) It's time to give a little something back to both communities I feel, it's only polite...
DataSquid.net, a little about me.
Leverage your Perl programming skills to design deadly biological weapons!
The reason engineers and physicists use Fortran is that, until recently, it was the best number crunching language around. C and C++ didn't get math libraries that could compete with Fortran until a couple of years ago, and no one with any sense is going to use an interpreted language for serious number crunching.
Best Slashdot Co
If you work on or with proteins (structural biology, biophysics, etc.) you will find this book to be largely a waste of time. An earlier slashdotter said it: there is more to biology than genomics. O'Reilly should stick to unix, leave the science for the peer-reviewed journals. Amen.
P.S. If you want an intro to some field in biology, read up on TIBS (Trends in Biological Science for the uninitiated.)
http://tinyurl.com/4ny52
You can also find a large number of open source bioinformatics projects hosted at Bioinformatics.org, with links to the BioPerl, BioPython, BioXML, BioJava, BioCORBA, and BioRuby projects on the lower right hand side of their page.
I would like to answer several questions that were raised in this discussion.
(1) How does a CS person learn biology? I recommend "Recombinant DNA, A Short Course", as an accessible (Scientific American style) introduction to the cloning breakthroughs and discoveries that led to genome science.
(2) How does a CS person learn "Bioinformatics"? I strongly recommend "Bioinformatics - Sequence and Genome Analysis" by David Mount as an accessible and extremely comprehensive survey of current approaches in Biological Sequence Analysis.
(3) Why do Biologists use Perl? Much of the information Biologists want is on the WWW, and Perl's LWP makes it extremely easy to get it. We don't use Perl for sophisticated text analysis (similarity searching, motif searching, etc) because the algorithms that are appropriate are typically not exact (or even regular expression) matches. But it's difficult to beat Perl for getting stuff off the WWW.
(4) Why do Biologists use Flat files? Several reasons: (a) the most useful information is sequence information, and it can be read much more quickly out of a flat file (esp. one that is memory mapped) than a DB; (b) flat files solve some versioning problems that DBs make very complex and slow; (c) most data providers only provide flat files. This will change, however, over the next 2 - 3 years; MySQL and PostgreSQL are moving into biology labs.
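To make the flat-file point in (4) concrete, here is a minimal sketch of reading sequence records from a flat file in Perl. It assumes FASTA layout (a `>` header line followed by wrapped sequence lines), which the comment does not name. The `read_fasta` helper and the two demo records are made up for illustration, and it reads line by line rather than using a memory map.

```perl
use strict;
use warnings;

# Read FASTA-style records from a filehandle; returns a hashref of id => sequence.
# (read_fasta is a hypothetical helper, not part of any library.)
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>(\S+)/) {      # header line: ">id description"
            $id = $1;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;         # a sequence may be wrapped over many lines
            $seq{$id} .= uc $line;
        }
    }
    return \%seq;
}

# Hypothetical two-record flat file, held in memory for the demo.
my $flat = ">seq1 first demo record\nACGT\nacgt\n>seq2 second demo record\nGGCC\n";
open my $fh, '<', \$flat or die "cannot open in-memory file: $!";
my $records = read_fasta($fh);
close $fh;

# Print each record id with its sequence length.
printf "%s\t%d\n", $_, length $records->{$_} for sort keys %$records;
```

For a real database dump you would open the file on disk instead of the in-memory string; the parsing loop stays the same.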
It is very exciting that Bioinformatics has high visibility now, and many people with a CS background are considering bioinformatics problems. Unfortunately, many of the introductory books on bioinformatics (particularly the O'Reilly books) do not adequately present the substantial foundations of bioinformatics that have been built over the past 15 - 20 years, and some newcomers are misled into believing there are simple problems looking for a few good programmers. Most of the simple problems have been solved; many of the complicated problems are challenging not because we do not know enough CS, but because we do not know enough biology.
Do you have a link for this TIBS which you speak of? Apparently, I'm far too lazy to use google! ;)
Last night I shot an elephant in my pajamas. How he got in my pajamas I'll never know.
Since I'm a Lisp fiend: while we're on the subject of programming for bioinformatics, I'd like to point out that Allegro Common Lisp has been used by a few folks in the field. Here are two links:
Pangea Systems Inc. (now DoubleTwist) for EcoCyc.
MDL Information Systems to design new drugs.
A selection of possibly relevant books (_Introduction to Genetic Analysis_, _Molecular Cell Biology_, etc) can be found at: www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books NSK
So far, this is the most useful comment today.
Democrats and Republicans only disagree about how to enslave you
As a professional in the field, I have seen over and over again how using Perl in bioinformatics has crippled efforts towards a real bioinformatics infrastructure. It leads to data islands, lack of interoperability, lack of maintainability, poor code reuse, and slow development. Lack of multithreading makes it difficult to spread jobs out over multiple processors. I think it is popular because it is easy for non-programmers to start spewing out simple text transformations. However, that only gets you so far, and creating a real enterprise back end needs a real language. Sometimes they try to patch things together by using the more OO-like features of Perl, but it is a losing battle. Save yourself the grief and use Java or C++.
A little knowledge can be a dangerous thing.
http://www.elsevier.nl/locate/tibs or find it at your library.
http://tinyurl.com/4ny52
I'm a developer working on www.neuroinformatica.com. (online microscope, with analysis and discussion of biological material)
Our customers are looking to teach, research and diagnose all sorts of stuff. We will link with some genomics information, but at the moment there is plenty of anatomy and structure to provide a context for the rest of the information.
In my mind, the goal is to simulate, and therefore understand the processes at an electrochemical level, and by putting everything into the context of a model based on real (digitized) tissue create a serious base of knowledge.
I use java more than perl. I want to be able to maintain the code over the years! I know just enough perl to know that two programmers will seldom agree on a strategy for implementing something. I want my java neuroinformatics project to be timeless.
This is a fascinating time to be alive!
Celebrate Excellence!
It seems that perl is still being used purely because many bioinformatics departments are full of people who know how to program in perl. And this is because bioinformatics *used* to be pretty much only about string manipulation.
This is just not true any more: proteomics requires in silico trypsin digests and algorithms for protein identification from MALDI mass spec (prediction of protein sequence via analysis of digested protein fragments); microarray experiments require cluster analysis of expression data in order to identify functional relationships. Added to this, there are lots of issues relating to integrating the many, many databases out there.
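For readers who have not met the term, an in silico trypsin digest can be sketched in a few lines of Perl. This assumes the usual simplified rule (cleave after K or R, but not when the next residue is P) and ignores missed cleavages and fragment masses, so it is only the first step of the protein identification described above. The `trypsin_digest` name and the peptide are made up for the demo.

```perl
use strict;
use warnings;

# Cleave after K or R unless the next residue is P (simplified trypsin rule).
# The lookbehind/lookahead pair matches the zero-width cut position itself.
sub trypsin_digest {
    my ($protein) = @_;
    return split /(?<=[KR])(?!P)/, uc $protein;
}

# Hypothetical peptide, invented for this example.
my @fragments = trypsin_digest('MKWVTFRPLLKAGR');
print join('-', @fragments), "\n";   # prints "MK-WVTFRPLLK-AGR"
```

The R in `FRPL` is followed by P, so no cut is made there; the string manipulation is still a one-liner, while the identification and clustering steps are where the heavier algorithms come in.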
The systems are becoming bigger and have to deal with lots of other systems around the world. Is Perl the best language for all this? I don't know but languages shouldn't be pushed into unsuitable roles purely for historical reasons and lots of bioinformaticians are trying to do this by trying to cling onto perl.
martin
It's funny that the bioperl project is powered by Python!
Surely you jest. In my humble experience Perl code is about .9 to .95 the performance of C code when it comes to predominantly file I/O and string manipulation tasks. Now when it comes to programmer efficiency there is no contest. A single regexp will replace dozens of lines of C, dynamic arrays and hashes are cumbersome in C, and memory worries are , etc.
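A small illustration of the regexp-plus-hash point, as a sketch only: counting codons in reading frame takes one regular expression and one hash, with no manual memory management. The sequence is made up, and the code assumes an uppercase DNA string whose length is a multiple of three.

```perl
use strict;
use warnings;

my $dna = 'ATGGCCATGAAATGA';   # hypothetical sequence, 5 codons
my %count;

# \G anchors each match where the previous one ended, so the regex
# walks the string one codon at a time.
$count{$1}++ while $dna =~ /\G([ACGT]{3})/g;

print "$_\t$count{$_}\n" for sort keys %count;
```

The C equivalent needs a hash table implementation (or a 64-entry lookup scheme) plus the buffer handling, which is where the dozens of lines go.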
Hi,
For a free microarray database and software package utilizing Perl and Linux, you might look into the following links.
Stanford Microarray Database [SMD] Package
SMD on Linux
Cheers, jcmatese
I agree, but perl is not the only language used in bioinformatics and some people do know when to use the correct language for the job in hand. There is use of SQL and C / C++ in many bioinformatics projects not forgetting good old Fortran which is used to a large extent in the Structural Biology field.
If you need to do heavy duty microarray analysis, Excel and so on will definitely not do the job (as you hinted at). If you can't afford something like S-Plus to do your analysis, you should check out R and then get over to Terry Speed's website. Lots of good R routines there for analyzing microarray data. As for the topic at hand, I find Perl very good for what we do in microarray analysis. I use it all the time for combining results, adding or updating columns in largish (around 80 Mb) data sets and so on. It's great that there's a book out there now for the biologists and I'm definitely going to buy it for at least a few people around here who constantly bug me with simple Perl questions!
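The "adding or updating columns" chore looks roughly like this in Perl. This is a sketch under assumptions: the three-column tab-delimited layout (gene, red intensity, green intensity), the `add_log_ratio` name, and the sample rows are all hypothetical, not taken from any real data set.

```perl
use strict;
use warnings;

# Append a log2(red/green) column to tab-delimited rows.
# Column layout is hypothetical: gene, red intensity, green intensity.
sub add_log_ratio {
    my @rows = @_;
    my @out;
    for my $row (@rows) {
        my ($gene, $red, $green) = split /\t/, $row;
        my $ratio = ($red > 0 && $green > 0)
            ? sprintf('%.2f', log($red / $green) / log(2))
            : 'NA';                    # ratio undefined for non-positive intensities
        push @out, join("\t", $gene, $red, $green, $ratio);
    }
    return @out;
}

# Made-up rows for the demo.
my @table = ("geneA\t800\t200", "geneB\t150\t600", "geneC\t0\t50");
print "$_\n" for add_log_ratio(@table);
```

For an 80 Mb file you would read and print one line at a time in a `while (<$fh>)` loop rather than holding all rows in an array.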
There is another area of bioinformatics which uses physics based simulations of biological systems. These types of tasks have little to do with ascii file processing, and are more sheer number crunching, and involve classic simulation modelling techniques.
Some examples of these types of bioinformatics problems are:
-simulation of protein folding
-simulation of chemical reaction circuits/control mechanisms in a cell or organ system
-cellular automata simulation of a group of cells in a tissue
Because of the number crunching requirements involved, these types of tasks are usually coded in languages which are good at math and have fast compilers, such as fortran and C.
I'm just trying to mention what else is out there, so that people don't get the idea that pattern parsing is the only thing bioinformaticists do
Hi, :-)
Why not use Euphoria?
Since the basic structure of Euphoria is the sequence, it would be the best option for
DNA sequencing.
And it beats Perl in speed!
check:
www.rapideuphoria.com