Slashdot Mirror


Genetic Stone Soup

It's the scientific achievement of our generation; what can you say about the mapping of the human genome? But here's the story behind the story. parvati turned us on to this NYT article about James Kent, who wrote the gene assembly program GigAssembler last June. It turns out that, thanks to his code, the public Human Genome Project had actually finished its work three days before the private effort by Celera Genomics -- a feather in their cap and a boon to public science. The head of Celera was "astonished" to learn of this grad student's genius -- ten thousand lines of C in a month, and why? -- "because of his concern that the genome would be locked up by commercial patents if an assembled sequence was not made publicly available for all scientists to work on." (The debate over public vs. private science continues to rage; see this Seattle P-I article, which discusses, among other things, the ethics of NDA'ing scientific data produced for profit.)

Update: 02/13 02:26 PM by J : Thanks to tlunde for finding the link to GigAssembler and thus clarifying which language it was written in.

1 of 175 comments

  1. Things are not as easy by jw3 · Score: 5
    As many of you probably know, the actual work hasn't started yet. The schedule of a genome project looks like this:

    a) sequencing, that is -- getting the actual sequence. This is almost purely technical work, and definitely not very interesting for a scientist, although you can get a lot of credit for it.

    b) annotating the sequence: finding out where the genes are, what the similarities are between them and between genes known from other organisms, and what can be suggested about their function based on those similarities. This is pure bioinformatics stuff: first finding the "open reading frames" (ORFs), that is -- anything that can be a gene at all: it has to start with an "ATG" (the codon for methionine) and end with a so-called stop codon (TAA, TAG or TGA). This is only the most basic criterion.
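    The basic ORF criterion in b) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GigAssembler or any real annotation tool: the function name `find_orfs` and its `min_codons` threshold are hypothetical, it scans only the forward strand (a real search also scans the reverse complement), and it assumes the stop codons of the standard genetic code.

```python
# Hypothetical ORF scanner: forward strand only, standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) pairs, 0-based and end-exclusive, for
    forward-strand ORFs that open with ATG and close with a stop codon.

    min_codons counts the codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG in this frame opens a candidate ORF
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None  # wait for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

    For example, `find_orfs("CCATGAAATAGGG")` returns `[(2, 11)]`: an ATG at position 2, followed in the same frame by AAA and the stop codon TAG. Real gene finders put much stricter length thresholds and statistical models on top of this rule, since short ATG-to-stop stretches occur by chance all the time.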

    Whatever comes later is called "postgenomics", and it is probably the most exciting stuff in this whole area of research.

    1) in most of the genome projects done until now, as many as half of the proposed genes had not even been assigned a rough function. (The group I'm working in sequenced a bacterial genome back in 1996, and since then the situation hasn't changed much.) Experimental work and more biocomputing are needed to find out what those genes do. The problem with biocomputing isn't a lack of CPU, but a lack of good strategies / models / theory (or rather, not a lack of "good", but a lack of "better" strategies etc.).

    2) knowing what a gene does is, contrary to common belief, only a very small part of the information. You need to know how it is regulated, and this means a lot of tedious and complicated experimental work: two whole areas of postgenomic science deal with that -- transcriptomics (regulation at the RNA level) and proteomics (at the protein level). You have to understand that each gene is regulated on many levels -- transcription of the gene from DNA to RNA, turnover (that is, the rate of degradation) of the mRNA, speed of translation, amino acid composition of the protein, protein turnover. Moreover, the genes are interconnected into networks rather than pathways. Creating a functioning model of a eukaryotic cell will probably be impossible for the next twenty or so years. Among other things, my group works with a little bacterium which has only about 700 genes. And even though it is a couple of orders of magnitude simpler than the simplest eukaryotic cell, it is very, very, very complicated.
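    Several of the levels listed in 2) can be illustrated with a standard textbook two-stage model of gene expression. The model is an assumption brought in for illustration, not something from the comment: mRNA is made at rate k_tx and degraded at rate d_m per molecule, and protein is made at rate k_tl per mRNA and degraded at rate d_p per molecule. The function and parameter names are hypothetical, and the sketch treats a single gene in isolation, so it leaves out the network effects mentioned above.

```python
# Toy two-stage gene expression model (a textbook assumption, not from the comment):
#   dm/dt = k_tx - d_m * m      transcription vs. mRNA turnover
#   dp/dt = k_tl * m - d_p * p  translation vs. protein turnover


def steady_state(k_tx, d_m, k_tl, d_p):
    """Closed-form steady state, found by setting both derivatives to zero."""
    m = k_tx / d_m
    p = k_tl * m / d_p
    return m, p


def simulate(k_tx, d_m, k_tl, d_p, t_end=200.0, dt=0.01):
    """Forward-Euler integration starting from zero mRNA and zero protein."""
    m = p = 0.0
    for _ in range(round(t_end / dt)):
        dm = k_tx - d_m * m       # rate of change of mRNA
        dp = k_tl * m - d_p * p   # rate of change of protein
        m += dm * dt
        p += dp * dt
    return m, p


if __name__ == "__main__":
    print(steady_state(2.0, 0.5, 10.0, 0.25))  # closed form
    print(simulate(2.0, 0.5, 10.0, 0.25))      # numerical check
```

    With k_tx=2.0, d_m=0.5, k_tl=10.0 and d_p=0.25 the closed form gives m* = 4 and p* = 160, and the simulation converges to the same values. Since p* = k_tl * k_tx / (d_m * d_p), doubling the mRNA turnover rate d_m lowers the protein level exactly as much as halving the translation rate k_tl does, so the same protein output can come from changes at quite different levels.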

    Take-home lesson: don't be too enthusiastic. This is not the flight to the moon. This is only the first Sputnik.

    Best regards,

    January Weiner