Distributed Computing and the Human Genome Project
troc asks: "I was watching a TV programme on UK TV last night about the Human Genome Project and how there was a race to sequence and publish the whole thing before the private companies do it and patent the sequences. Basically lasers are used to break up the strands, these are then read and fed into a computer that tries to match the bits up with other bits like a giant jigsaw puzzle. This requires a lot of computing time.
Is this an opportunity for the open source movement to help decode the sequences and publish the whole thing before it's patented?
<soapbox>
I, for one, don't like the idea of a private company owning my gene sequences. They will be able to limit the use of these so only really rich pharmaceutical companies will be able to develop drugs etc and then sell them at huge profits, which isn't really for the benefit of mankind blah blah blah.
</soapbox>"
I agree. I don't see how information like this can be patented. There is nothing truly proprietary about it, and it would do more good in the public where the benefit can truly be felt.
However, with the computer age, the speed of (dare I say) innovation has been astounding. This has produced two detrimental effects. First, the patent examiners simply don't have the niche expertise to scrutinize patents. I'm sure most of us have seen some of the idiotic patents out there. Second, the term of a patent has become too long for the pace of the field. By the time the patent expires, the invention is often useless.
I sincerely hope that this particular project will be placed under a HUGE spotlight when the patent requests inevitably filter in. I have a feeling it won't hold up, and at the very least, not in some countries.
However, keep in mind that this is scientific information about a human being, not software / computer advances. In that regard, a patent will be a burden, but not a fatal one. The patent (if granted) WILL expire someday. And I'm fairly certain that the information will still be very important and valuable when that day arrives.
Of course I'm all for beating the would-be patenters to the punch, if possible.
Best regards,
SEAL
The only problem I see here is that developing a distributed client for this takes a lot of time and effort, and it is one that definitely cannot be open source!
Two reasons:
So, sorry, folks, but I believe this is one of the few things that open source clearly is not suited for. But it would be kinda cool to have a proggy running on my machine that messed with genes ... ;-)
EagerEyes.org: Visualization and Visual Communication
- ensembl is an open source genome project designed to get as much data and software into the public domain as possible
- EMBOSS
- bioperl
All these are well-backed, strong open source projects with different strengths. Every time genome stuff comes up on Slashdot I try to point these things out to people, but everything gets lost in the noise about people $%!"'ing on about patents (generally without a lot of knowledge!). Anyway - check out these projects for more information about real open source efforts in biology.
First issue: could distributed computing help? My answer is a brief "no". First, the bottleneck is on the experimental side - getting the sequences, and not putting them all together. Second, although you need quite a lot of computing power to do so, much of the job must be revised and checked by humans, i.e. there is a lot of skilled manual work to do - you have to have "an eye" for the sequences. But the first point is more important.
Now, TIGR, the commercial alternative to the Human Genome Project, has sequenced more organisms than any other scientific group in the world. J. Craig Venter seems to be a very efficient and hard-working guy. Even if you don't like the idea of making money with patents in this area, the scientific community owes him a lot - he was the one to sequence the first organism, to sequence Helicobacter pylori and many, many others. On the other side... you know, when the M. pneumoniae sequence was about to be published, it was supposed to be the first Mycoplasma sequence. But Venter was faster with Mycoplasma genitalium - and he kept it quiet, so no one involved in sequencing those organisms actually knew there was a race. Now Venter claims to be able to complete the human genome with much less effort and much less $$, and considerably faster than the HuGeP. I'm not sure whether he is able to do so or not, because it depends chiefly on the "hardware" side - the new Perkin Elmer automated sequencers they are supposed to use.
Anyway, the question is whether it is good or bad if Venter sequences the human genome. In my opinion - it's OK. The HuGeP is somewhat different in its purely scientific interest, and I'm convinced that they will produce data of much higher quality. On the other hand, the human genome has considerable variation, so two genomes are better than one. I would not be very concerned about the patent issue, because it will come anyway (because of **!'*%$! American and international patent law) - even if TIGR did not sequence the genome, someone would take the output of the HuGeP and patent the same sequences Venter would. Venter just wants to gain a little time for evaluating the sequence before releasing it to the public.
And of course, it is not the _sequences_ that are patented - what is patented is the use or modification of a certain sequence for medical purposes, or a certain enzyme as a target of medical treatment.
Regards,
January
It's good that hackers are well-informed and principled enough to think it matters. This happens to be my area of interest; I'm responsible for Bioinformatics at the Institute of Cancer Research in the UK. A couple of weeks back I went to an excellent talk by a clever guy called Ewan Birney from the Sanger Centre near Cambridge, UK. He is writing code to catalogue and annotate the assembled sequences in real time as they come off the mammoth robot sequencing "production line". On one of those rare occasions where the British are leading a "big science" project, the Centre has been responsible for the largest fraction of the Human Genome sequenced at any single institute. The code does stuff like figure out which bits of the sequence are real genes and which bits are that 90%+ of so-called "junk DNA" you might have heard of, and also attempts to assign provisional functions to the genes by various computational means. Eventually people in white coats will have to confirm such assignments properly, but it's important to beat the drug companies to making good guesses.
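For programmers wondering what "figuring out which bits are real genes" might look like in its very crudest form, here is a toy sketch. It is not Ewan's code or anything the Sanger Centre uses; it is a naive open reading frame scan that looks for an ATG start codon followed in the same frame by a stop codon, on the forward strand only. The function name `find_orfs` and the `min_codons` threshold are made up for illustration, and real gene prediction in human DNA has to cope with introns, both strands and a lot of statistics that this ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of ATG...stop runs on the forward strand.

    Toy illustration only: no reverse strand, no introns, no scoring.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                # count codons including the start and the stop codon
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # close the frame and keep scanning
    return sorted(orfs)

if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```

The point of the sketch is only that the annotation step is ordinary string processing at its base; the hard part is the biology that tells you which candidates mean anything.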
Ewan's code and all the data are entirely Open Source. If you've got a good reason and a reasonable Pentium with lots of memory and a 30Gb hard disk you could mirror the human genome and get it updated every night. (I feel strange just typing that sentence and I've been following this story for years). The Wellcome Trust and others (including US and European government agencies) funding the project are keeping everything Open because that's the way science is done and because this will subvert commercial attempts to stake a claim on our species' genetic heritage. (Er, go Wellcome!)
Biochemists often talk about the "rate limiting step" in a reaction---the single point which sets the speed of the whole process---like a bottleneck. As far as I understood Ewan's talk (if you're reading this Ewan, please put me right), the rate-limiting step with the Genome Project isn't the assembly of the sequenced stretches of DNA (or "contigs") as the original poster suggests, but the collection of the data in the first place. At the Sanger they have clusters of PCs and Alphas crunching the contigs---distributing the effort would give us all a warm fuzzy feeling, but wouldn't be essential. Again, I may be wrong about this.
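To give a feel for the "jigsaw" step being discussed, here is a minimal sketch of greedy overlap assembly: repeatedly find the two reads whose suffix and prefix overlap the most and merge them. This is a textbook simplification, not the method used at the Sanger Centre or anywhere else mentioned in this thread. The names `overlap` and `greedy_assemble` and the `min_len` cutoff are assumptions for illustration, and the sketch ignores sequencing errors, repeats and reverse-complement reads, which are exactly the things that make the real job hard.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len characters exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))          # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break                               # nothing left to join: separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Note that the all-pairs comparison is quadratic in the number of reads, which is why this takes a cluster at genome scale, but it is also easy to split across machines, which fits the point above that the computing is not the bottleneck.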
One thing that definitely is a priority is making some sense out of all of this information. What would be great would be if members of the global community of hackers started taking molecular biology and biochemistry classes so they could write code to help people like me make sense of the embarrassment of riches that the project is creating. I'm off to Cambridge in two weeks to the Bioinformatics Open Software Development meeting to listen to some project leaders talk and discuss the existing efforts. Personally, I would love to give crash courses in biology to programmers with time on their hands in an effort to harness their collective genius rather than sponsor an effort to write a contig-crunching client to harness their collective spare cycles, but I have no idea how such a thing could be organised. Any ideas?