ewanb · Slashdot Mirror

Perl in bioinformatics on Why Corporates Hate Perl · 2008-08-20 00:04 · Score: 2, Interesting

As someone who has both written and read _a lot_ of Perl, in particular in Bioperl and Ensembl, in bioinformatics I have a rather love/hate relationship with Perl.

I love: the low learning curve for people coming from biology, with a lot of forgiving behaviour (in particular, I think, the auto-creation of data structures as you use the notation to fill in complex anonymous, think pointer-based, structures). This is probably the critical one, which means we can hire a much broader group of people with a much better understanding of biology, and they can be productive far earlier.
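For anyone who has not met it, Perl calls this autovivification: writing to a deeply nested element creates every intermediate hash or array on first use. A rough analogue in Python rather than Perl follows. The gene name and keys are made up for illustration, and this is only a sketch of the behaviour, not how Bioperl or Ensembl store anything:

```python
from collections import defaultdict


def autoviv():
    """Return a dict whose missing keys spring into existence as nested dicts."""
    return defaultdict(autoviv)


# Hypothetical gene/transcript/exon record, filled in with no setup code.
genes = autoviv()
genes["BRCA2"]["transcripts"]["t1"]["exons"] = [(100, 250), (400, 520)]
genes["BRCA2"]["chromosome"] = "13"

# Reading back works like any nested dict.
exon_count = len(genes["BRCA2"]["transcripts"]["t1"]["exons"])
print(exon_count)
```

The flip side is that merely reading a missing key also creates it, which is the same forgiving-but-surprising trade-off you get in Perl.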

I love: the large and robust libraries accessing nearly every sort of database, web-app and other things you need

I love: the consistency of behaviour between systems (don't get me started on Java or porting C++ code between compilers/library systems. Ugh! Unbelievable pain as one starts using those languages and moves between high-end systems. It's C for the fast stuff and Perl for anything else for portability in my book).

I love/hate: The (huge) amount of robust existing Perl code that we have in Ensembl and that works day in, day out on multiple outings

I hate: The lack of clean objects. Why, oh why, oh why?

I hate: The inability to switch on strong typing and bigger checking optionally in libraries - I know you can do more these days, but it is still clunky.

I hate: switching the word "continue" (in C) to "next" (it gets me every time)

I hate: having to always brace if statements

I hate: operators designed for one-liners that get in the way of good readable code - grep and map in complex lines are a pet hate of mine.

I hate: the tortuous cross-language capabilities - compare python's jython and other C-level compilers. Soooo much better.

Interestingly I coded in python for about 6 months in the late 90s - very early on for python - and lots about python appeals to me. But then Perl came along, and lots of bioinformaticians were using it, and systems people were installing it by default on systems...

Roll on Parrot. I want Parrot to be able to run
Perl5 syntax code, Perl6 and Python/Java syntax
all together, with easy ways to load in C level or compiled down libraries. That's what Perl needs to save it.

Re:1 in 2000 people on The 1000 Genomes Project · 2008-01-22 20:38 · Score: 1

Also, don't forget that each person has two haplotypes, one from each parent, so
when one sequences a person, one captures the variation on two human genomes at once.

Of course, this all relies on the coverage you sequence at, and one option for
the 1,000 genomes project is doing this at low (2x?) coverage, using pretty sophisticated
methods to combine statistical power between sample datasets.
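To put a rough number on what low coverage means, here is a back-of-envelope sketch. It assumes read depth at a base is Poisson-distributed and that reads split evenly between the two haplotypes. Both are idealisations of mine, not the project's actual statistical method:

```python
import math


def prob_covered(depth, min_reads=1):
    """P(at least min_reads reads cover a base) if depth is Poisson-distributed."""
    p_fewer = sum(
        math.exp(-depth) * depth**k / math.factorial(k) for k in range(min_reads)
    )
    return 1.0 - p_fewer


total_depth = 2.0                 # the "2x?" figure from the post
per_haplotype = total_depth / 2   # assumed even split between the two haplotypes

p_site = prob_covered(total_depth)    # the site is seen at all
p_hap = prob_covered(per_haplotype)   # one particular haplotype is seen
print(f"site covered: {p_site:.3f}, given haplotype covered: {p_hap:.3f}")
```

Under those assumptions roughly one haplotype in three is missed at any given site in a single sample, which is why combining evidence across samples matters so much at this depth.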

The "1,000" though is more a round number that is in the right range. It might well be
1346 people or something like that (often some multiple of 96, as 96, or 4*96 = 384,
is the standard size of a molecular biology "tray" put into a robotic system).
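The tray arithmetic is simple enough to sketch. The function names here are mine, and the 1,000 target is just the round number from the project title:

```python
import math


def plates_needed(samples, wells=96):
    """Number of plates required to hold the given number of samples."""
    return math.ceil(samples / wells)


def round_to_full_plates(samples, wells=96):
    """Smallest sample count >= samples that fills every plate completely."""
    return plates_needed(samples, wells) * wells


full_96 = round_to_full_plates(1000)         # 11 plates of 96 wells
full_384 = round_to_full_plates(1000, 384)   # 3 plates of 384 wells
print(full_96, full_384)
```

So a "1,000 genomes" run that fills its trays would more naturally land on 1056 or 1152 samples than on exactly 1,000.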

We're going to have a lot of fun at http://www.ensembl.org/ with this...

Re:As a UK local government councillor ... on UK Gov't Considers Expanding Open Source Use · 2003-10-11 22:37 · Score: 1

This is probably why you need a consultancy
firm (dare I say it... IBM or someone) to show
you what is going on. If I had time... I'd be
happy to show you what is going on.

Raw OpenSource generally only appeals to people
who are confident about what they want and understand the IT problem correctly. Then you can
get this stuff for free, off the net and set up
things for just the cost of the time of the guys
who install it. And generally it is far more stable
than any "commercial" solutions.

But, in the absence of someone like that in your
department, ring up IBM or RedHat (or hopefully
they will see your post here, and some salesman
will give you a call). You'll have to spend money
at some point, but your total cost will
be waaaay lower than a heavily marketed (presumably M$oft) "solution".

Don't dismiss open source straight out because
the raw software doesn't come with a fancy brochure.... that's a sign of strength...

(if you would like some more pointers, I can
help you out. But... looking at your web page,
you seem to have a high comfort level with MS
stuff, so I think it would be slightly pointless
unless you really want to learn stuff.

At some point you will be using open source
directly - you already do indirectly via web
sites and email - so, you might as well build
your skill set up sooner rather than later)

Made me smile on Phone Plus Sensory Deprivation Equals... · 2003-09-12 01:58 · Score: 3, Funny

The idea that people would actively get into
a swimming pool and put on a helmet to answer
a work phone call. The mental image... is
quite worrying in some cases.

Though I find the best thing about working from
home is that people don't have my phone number
here, so ... no one calls me. And I go to no
meetings. Magical.

ESTs are hard work on Researchers Revamp Human Gene Count Estimates · 2001-07-12 22:57 · Score: 1

There is a comment somewhere down here which really says that no one knows how to convert a whole bunch of ESTs hitting the genome into genes. The EST data is *very* messy. We've looked at this recently inside Ensembl and don't see a big win from confidently placed ESTs. Our opinion is that the Ohio State thang is just somewhat enthusiastic researchers getting good PR for their work.

Check out http://www.ensembl.org/ for the more sober-headed view of this.

Slashdot PR again! on Fastest Commercial Supercomputer To Be Built · 2000-12-18 01:08 · Score: 1

This annoys me. Slashdot are really happy to pander to the PR that these sorts of companies have but consistently turn down interesting stories about how we are trying to make the human genome open and accessible for all, in projects like Ensembl. What are these guys really going to do with this? Probably nothing. They don't look like they know what they are doing. And yet they get posted to slashdot.

I wish Slashdot was more interested in the real science of the genome and less PR orientated. Slashdot ain't what it used to be...

Open source for genome data on Medicine And Open Source? · 2000-10-24 01:05 · Score: 2

I like the article. The more of these sorts of
articles that are around the easier it is for
people like me to make an impact.

BTW - on topic here somewhat - if you want to see
an open source genome management system, take
a trip over to

http://www.ensembl.org/

for your open source project ...

Wow - CmdrTaco pissed off on Tech Stocks Tumble · 2000-04-15 21:34 · Score: 5

That is an impressive show of being pissed off
by CmdrTaco. I guess it got to him.

Stock Market is a non-story to me as well.
I don't think it should be commented on by slashdot either!

Re:open source genome analysis & annotation tools on Celera Completes Human Genome. Sorta. · 2000-04-06 16:31 · Score: 1

Yo Chris - thanks for the tag. I always feel that
the signal to noise discussions on slashdot
are pretty skewed. Who knows how this is all going to
pan out.

I have to admit I think we have done pretty
well with the latest bioperl. Kudos to you
as well, Chris...

Some web sites...open source as well... on Learning About Genetic Engineering On The Net · 2000-03-12 01:14 · Score: 5

It always amuses me how clueless slashdot generally, as a group, is about these things....
Despite best efforts otherwise. It comes up as
an "Ask Slashdot" related question regularly;
slashdot posts pseudo-science stories or op-ed
about cloning etc, and yet... slashdot hasn't
attempted to *contact the actual scientists*
involved to get their opinion.

Yes - I have suggested this as an interview topic
a number of times. Slashdot editorials are more
interested in "wow-science" stories than real
science. It annoys me. (but I still read slashdot).

Here are some pointers:

The largest public sequencing center in the world

http://www.sanger.ac.uk/

The US biological information portal

http://www.ncbi.nlm.nih.gov/

The European biological information portal

http://www.ebi.ac.uk/

Some open source projects in this area:

(The bio* group.)

http://bio.perl.org/

http://www.biojava.org/

http://www.biopython.org/

http://www.bioxml.org/

Open source genome annotation project

http://www.ensembl.org/

The answer is .... kinkos on Net Access on an American Road Trip? · 2000-02-10 21:01 · Score: 1

I have been a long UK -> US road traveller,
and bizarrely the best thing to do is track
down a kinko's - kinko's offer reasonable
(still pretty steep) cybercafe type access
but they are everywhere (even in Knoxville,
Tennessee, for example)

I never tried to get dhcp into an ethernet
port. I don't think they offered it then (this
summer). But you never know - if enough of us
ask ;)

ewanb

Work with geek girls - and it all works out. on Want More Geek Chicks? · 2000-02-06 07:29 · Score: 1

I have worked (closely) with two female
programmers. One was an ex-physicist with strong
java/perl/c skills and the other was a c programmer who used to code asynchronous signalling stuff.

Both were/are great. And we get on well. And the
work is good.

It confuses me why there are not so many girls
in the industry but I guess the best way to
solve it (like most things) is just to live to
your ideals. So - I don't worry about the
sex/age/culture/race of the people I work with
and that seems good enough for me...

I think talking about it helps air some issues
but doesn't really change much.

Re:Molecular Biology and BioChem for hackers on Distributed Computing and the Human Genome Project · 1999-11-28 19:36 · Score: 1

join in with ensembl and help us out. You
would learn *a lot* of biology v.quickly ;)

Re:This was my idea. on Distributed Computing and the Human Genome Project · 1999-11-28 19:26 · Score: 1

Thanks troc - just got around to reading this
comment.

I have sort of appealed at the top to people
to come along. People seem more interested
in writing about patents than getting down to
nuts and bolts of course....;)

If there is anyone out there who would like to
do this coding, get in touch, as I sure as hell don't know how
to do it ;). But I know what to run...

d.net coders wanted for DNA analysis on Distributed Computing and the Human Genome Project · 1999-11-28 19:17 · Score: 3

It is clear from these postings that people would
like the client to run. If there are people with
experience in writing these sorts of d.net systems
then please drop me a note. We have the problem
for you to work on - it is just a question of
figuring out how to do it.

Drop me a mail (birney@sanger.ac.uk).

Re:I think it's technically unfeasible on Distributed Computing and the Human Genome Project · 1999-11-28 19:15 · Score: 1

There are aspects of the work which have
a good data/cycles ratio. (surprisingly).

I would read about the subject before you pronounce... ;)

Re:warm and fuzzy on Distributed Computing and the Human Genome Project · 1999-11-28 19:01 · Score: 1

Absolutely - see my reply to the post above yours.

Re:warm and fuzzy on Distributed Computing and the Human Genome Project · 1999-11-28 19:01 · Score: 2

Hardware at the moment is generally clusters of alpha boxes or intel boxes (running tru64 or linux respectively).

The two big drainers on CPU for analysis are gene prediction (genscan) and database searching (blast). Database searching can't be distributed easily as you have to worry about the database ;)

However, there are programs like sim4, genewise and est2genome that could greatly help us and could be distributed.

Genewise - you can download (I wrote it) at Wise2. est2genome is somewhere around as well.

For the more general overview of the problem - check out ensembl for an idea of the project.

Re:Difficult to distribute on Distributed Computing and the Human Genome Project · 1999-11-28 18:50 · Score: 2

I assume that the original poster did not understand what was going on ;). Like a lot of slashdot in this case - concerned but not knowledgeable.

Celera always talk about the assembly problem as they have Gene Myers solving it (he has) and think it is pretty cool. It is not trivial, but from my view (an annotation-centric view) not the most important thing.

Re:Difficult to distribute on Distributed Computing and the Human Genome Project · 1999-11-28 18:15 · Score: 3

Lars

This is only for the assembly and not for the analysis. With analysis you have a better data/cycles ratio. Assembly is done at the genome centres anyway...

Re:warm and fuzzy on Distributed Computing and the Human Genome Project · 1999-11-28 18:13 · Score: 4

Consell -

Great that you were following the talk. I thought I put everyone to sleep.

The rate-limiting step at the moment is in fact effectively the mapping, then sequencing. The interesting thing about the analysis is that the amount of CPU is unbounded. If we have more CPU we just use more accurate algorithms. We can do something within the CPU bounds on the Hinxton campus, but if anyone wants to give me a supercomputer, then we could get more accurate analysis.

I can always use more juice!

Re: cycles/data on Distributed Computing and the Human Genome Project · 1999-11-28 18:07 · Score: 1

Bioinformatics generally has a very good cycles to data ratio - ie - we have algorithms that take a lot of cycles for very little data. So it is feasible...

Does anyone want to write it? If so - I have a lot of CPU hungry algorithms to run.
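A minimal sketch of what a good cycles to data ratio looks like, using plain Smith-Waterman local alignment with a linear gap penalty. This is a textbook stand-in, not genewise or any of the other programs mentioned in this thread, and the sequences and scoring values are made up:

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score with linear gaps; also returns the cells filled."""
    prev = [0] * (len(b) + 1)
    best = 0
    cells = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
            cells += 1
        prev = curr
    return best, cells


query = "ACGTTGCA" * 25     # 200 bases, made-up sequence
target = "TTGCAACG" * 50    # 400 bases, made-up sequence
score, cells = local_alignment_score(query, target)

# Work grows with len(query) * len(target); data shipped grows with the sum.
ratio = cells / (len(query) + len(target))
print(f"{cells} cell updates for {len(query) + len(target)} bases sent")
```

Doubling both sequence lengths doubles the data a client has to download but quadruples the work, which is the property that makes this kind of job attractive for a d.net style client.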

Open Source Genome Projects on Distributed Computing and the Human Genome Project · 1999-11-28 17:34 · Score: 5

There are some good open source genome projects for doing this efficiently - and we do welcome help of any kind. Here are some open source projects which I know about/work on:

ensembl is an open source genome project designed to get as much data and software into the public domain as possible
EMBOSS
bioperl

All these are well backed, strong open source projects with different strengths. Every time genome stuff comes up on slashdot I try to point these things out to people, but everything gets lost in the noise about people $%!"'ing on about patents (generally without a lot of knowledge!).

Anyway - check out these projects for more information about real open source efforts in biology.

Not so impressed on SourceForge Goes Public Beta · 1999-11-17 02:11 · Score: 1

I could not submit a bug in the source forge bug report area (doh. Can't even submit a bug that the bug submission does not work!)

Had a dodgy certificate that explorer didn't like...

And the projects that are there seem to be focused on mirroring other projects

Finally - could you/would you trust someone else to keep a server up 24-7 for your source code? My experience of projects is that they need more than cvs/mailinglist. They need a coordinated web site and people close by to make it all work

So. I am not moving from my work machine yet. But I guess this is the way things are going to go

ewan

Re:Perk/TK front end to readseq on New Genetic Information Web Portal · 1999-11-07 04:12 · Score: 1

Check out bioperl. In particular the new 0.6 series (just available via anonymous cvs). Bioperl is more up to date than readseq, and it is in your favourite language.

Bioperl at bio.perl.org
