Domain: sanger.ac.uk
Stories and comments across the archive that link to sanger.ac.uk.
Comments · 60
-
More info on chromosome 22
The Sanger Centre has more info on chromosome 22.
Congratulations to all who participated in its sequencing. We look forward to the first draft of the human genome by spring 2000.
-
hmm
I worked for a bit as a CO-OP student in this area last summer, which is not to say I know anything about this, but.. :]
While distributed computing would probably benefit the HGP, there are a couple of points to take into consideration.
1) How secure is distributed computing? SETI and RC5 aren't really all that concerned with the integrity of the data they are getting back. They can just re-check a data block if it is a sure sign of ET or whatever. Here there will need to be a guarantee that data has not been tampered with.
2) It seemed to me that some of the tools used could do with some open source style improvement by the hacking (coding) community before throwing lots of computing power at them.
As for the patent stuff... bah! Let the lawyers mess around with that, everyone else can concentrate on the advancement of the human race.. or something like that.
-
Re:warm and fuzzy
Hardware at the moment is generally clusters of Alpha boxes or Intel boxes (running Tru64 or Linux respectively).
The two big drainers on CPU for analysis are gene prediction (genscan) and database searching (blast). Database searching can't be distributed easily as you have to worry about the database ;)
However, there are programs like sim4, genewise and est2genome that could greatly help us and could be distributed.
Genewise (I wrote it) you can download at Wise2; est2genome is somewhere around as well.
For the more general overview of the problem - check out ensembl for an idea of the project.
-
warm and fuzzy
It's good that hackers are well-informed and principled enough to think it matters. This happens to be my area of interest; I'm responsible for Bioinformatics at the Institute of Cancer Research in the UK. A couple of weeks back I went to an excellent talk by a clever guy called Ewan Birney from the Sanger Centre near Cambridge, UK. He is writing code to catalogue and annotate the assembled sequences in real time as they come off the mammoth robot sequencing "production line". In one of those rare occasions where the British are leading a "big science" project, the Centre has been responsible for the largest fraction of the Human Genome sequenced at any single institute. The code does stuff like figure out which bits of the sequence are real genes and which bits are that 90%+ of so-called "junk DNA" you might have heard of, and also attempts to assign provisional functions to the genes by various computational means. Eventually people in white coats will have to confirm such assignments properly, but it's important to beat the drug companies to making good guesses.
Ewan's code and all the data are entirely Open Source. If you've got a good reason and a reasonable Pentium with lots of memory and a 30Gb hard disk you could mirror the human genome and get it updated every night. (I feel strange just typing that sentence and I've been following this story for years). The Wellcome Trust and others (including US and European government agencies) funding the project are keeping everything Open because that's the way science is done and because this will subvert commercial attempts to stake a claim on our species' genetic heritage. (Er, go Wellcome!)
Biochemists often talk about the "rate limiting step" in a reaction---the single point which sets the speed of the whole process---like a bottleneck. As far as I understood Ewan's talk (if you're reading this Ewan, please put me right), the rate-limiting step with the Genome Project isn't the assembly of the sequenced stretches of DNA (or "contigs") as the original poster suggests, but the collection of the data in the first place. At the Sanger they have clusters of PCs and Alphas crunching the contigs---distributing the effort would give us all a warm fuzzy feeling, but wouldn't be essential. Again, I may be wrong about this.
One thing that definitely is a priority is making some sense out of all of this information. What would be great would be if members of the global community of hackers started taking molecular biology and biochemistry classes so they could write code to help people like me make sense of the embarrassment of riches that the project is creating. I'm off to Cambridge in two weeks to the Bioinformatics Open Software Development meeting to listen to some project leaders talk and discuss the existing efforts. Personally, I would love to give crash courses in biology to programmers with time on their hands in an effort to harness their collective genius rather than sponsor an effort to write a contig-crunching client to harness their collective spare cycles, but I have no idea how such a thing could be organised. Any ideas?
-
Open Source Genome Projects
There are some good open source genome projects for doing this efficiently - and we do welcome help of any kind. Here are some open source projects which I know about/work on:
- ensembl is an open source genome project designed to get as much data and software into the public domain as possible
- EMBOSS
- bioperl
Anyway - check out these projects for more information about real open source efforts in biology.
-
Re:Wrong wrong..
Essentially, scientists are "reverse engineering" DNA. If we were smart, we'd have DNA put into the public domain.
Unsurprisingly, that's exactly what the public Human Genome Project is doing. Wellcome Trust, by the way, is putting their money where their mouth is; they fund the Sanger Centre which is doing part of the HGP work (along with several sites in the US).
Disclaimer: I work at one of the US sites, but I'm not a biologist, just a sysadmin. The head of my center compares patenting the human genome to patenting the periodic table; yeah, that would have really boosted chemistry....
-
Reference
Check out the Sanger Center's arguments against patenting. Worth a read.
-
some Human Genome Project links for ya
speaking as a biocomputing geek, the NCBI website is a great starting point...
- National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov
- Genome Resources Guide, http://www.ncbi.nlm.nih.gov/genome/guide/
- Human Genome Sequencing Progress
- Links to all the Genome sequencing centers
- The Sanger Center (UK), http://www.sanger.ac.uk/HGP/
- The Whitehead Institute (MIT), http://www.genome.wi.mit.edu/
-
The Link
AceDB is the OO Database used by the Human Genome Project.