I'd love to switch to Thunderbird, from rickety old emacs RMAIL, but one thing keeps stopping me. I get a lot of business email and I need to keep it archived and organized well. My archive is organized by sender and year: about 350 files for different senders each year, averaging maybe 10-100 emails in each file, dating back now over 11 years (about 3000+ files). Keeping this in emacs RMAIL is trivial, because they're all just regular files in my home directory that I can rename or move to new subdirs at will, and I can save emails out of RMAIL just by typing "o" and giving the name of the file. And since Emacs is lightweight enough (!) to run over my DSL connection, I never really need to run an email client anywhere but from my main work machine where my archive is, even when I'm travelling, so I haven't needed IMAP capability.
When I look at Thunderbird and other modern clients, I just don't see a way to keep track of old email as efficiently. I can create "local folders", I guess, but it doesn't appear that Thunderbird is going to treat these as regular files that I can shuffle off into a 2004/ subdirectory at the end of the year. And worse, since Thunderbird is heavyweight enough that I'm not going to run it down a DSL connection, it's going to create them locally, not remotely on my work machine, when I'm reading mail from home or on the laptop while travelling. IMAP seems to be a partial answer but it's going to keep its data on the mail host, not in my home directory, if I understand right.
Surely people have the same problem - how do you solve it?
First of all, for those who aren't in the biotech industry, it should be mentioned that the NIH has an agenda to push...
OK, sure, I'll bite. What's our agenda, in your view? I haven't been to the secret meetings lately, maybe I'm not in the loop - but last I
checked our "agenda" was to facilitate scientific
research, by providing a massively important
basic resource to the entire public for free,
with no restrictions.
The Nature comparison is not on Jim's assembly
on Genetic Stone Soup · Score: 2
The "less biased" comparison is actually not
done using Jim's assembly. The Sanger Centre's
comparison does use Jim's assembly. It's
unclear to me why the authors of the Nature
paper chose to use an "assembly" that's different
from the assembly that we used for the public
human genome project.
BTW, the Nature article is free even to
nonsubscribers.
Re:The line between "public" and "private"
on Profit vs. Science · Score: 1
Great point... one that I'm concerned about a lot.
Bioinformaticians have to realize that our air supply is a freely available international
sequence database. When it comes time to
fight for our air supply, like now, it won't
help if we're viewed as a pack of hypocrites.
Published bioinformatics software has got to
be made open source. As you point out, it's
not much different from asking genome sequencers
to deposit in Genbank.
Neither the journals nor the community have
established a standard of behavior for us yet,
so it's a less clear-cut question than DNA sequence
deposition right now. It will take more time
and work to get the journals on board with
respect to software access.
The line between "public" and "private"
on Profit vs. Science · Score: 2
You're confused, I think, about what it
means to publish a scientific paper.
Nobody's
bothered terribly about whether research is privately or
publicly funded. (Hell, my research is
funded by everything from your taxpayer
money, to Howard Hughes' will, to Sun Microsystems.. and even, gasp, by Celera itself!)
The point is that publishing a scientific paper
entails certain ethical responsibilities, among
which is the free and open disclosure of your
data to other scientists, so they can
effectively build on your work. The community standard
for *both* privately and publicly funded DNA
sequence data is that *when it is published*,
it goes to Genbank, EMBL, or DDBJ.
Companies that feel that disclosure will
negatively impact their business model should
not submit papers on their work, that's all.
They should not seek the rewards of publication
without meeting their responsibilities to
the community of scientists that read their
paper. Otherwise, their paper is an advertisement,
not something that moves the field ahead.
Other genomics companies seem to have no
problem with this -- Incyte and HGSI, for example,
don't try to muddy the waters by submitting
papers on their proprietary genome databases.
Lots of the apologists for Celera say "shouldn't
they be allowed to make money?" Sure they should.
More power to them, my stock will go up, I'll
be happy. But they can't have their cake
and eat it too -- they shouldn't
be able to get away with writing scientific
papers about a proprietary database. It's
not ethical.
"past custom" = "community standard" = "ethics"
on Profit vs. Science · Score: 3
What you deride as a "past custom" is in
fact the community standard for DNA sequences
in published papers.
To publish a paper
and not deposit your DNA data in Genbank, EMBL, or DDBJ is
literally unethical; it is not consistent
with accepted professional standards of
behavior.
Apologists for this deal argue
that little concerns like "ethics" should
be subservient to bigger concerns like "expediency". Where have I heard that
argument before throughout history?
Yes, this is all going to make
a great example for that required course
we teach in research ethics. We'll be able
to shorten the course a lot now. The lesson,
kids, is that if you're big enough, the
rules don't apply to you. Science is no
different than real life. Anyone surprised?
The agreement does *not* give free access
on Profit vs. Science · Score: 5
Oi. You folks saying "read the agreement", you
should, uh, read the agreement.
It's not giving free access to academics, not
in the open source meaning of "free" anyway.
If you want less than 1 Mb (that is, less than 0.03%) of the data,
you agree to a clickwrap license on the Celera
web site.
If you want all the data (about 3000 Mb), you and your institution
cosign a formal license with Celera.
What does this license say, you may wonder? Well,
so do we. Turns out, the details are still being
worked out. But the gist is this: you can use
the data for anything you want, so long as it is
for noncommercial purposes. You can publish
your results freely, with no reachthru rights
being asserted by Celera. And you agree not
to redistribute the data.
Oops. Look at that again. Ever see a scientific
paper where you a) published your results and
b) didn't "redistribute" (i.e., show!) the primary
data? Can someone define the bounds between
publication and redistribution? I can't. Neither
can Science, as of yesterday.
Science and Celera have not yet defined the bounds between
trivial redistributions that Celera doesn't sue you
for ("Figure 1 shows a BLAST alignment to
my gene in the Celera database"), and real redistributions that they do ("Table 1 in
the Web Supplement gives the positions of
every DNA hexamer in the Celera database. Please
don't use it to reconstruct the original data.")
But I'll bet you that pretty much every large
scale bioinformatics/computational biology
analysis of the Celera data would be counted
as a "redistribution"... potentially blocking
the main use of the genome, which is for large-scale genomic analysis. And if the bounds
aren't defined by the agreement, the bounds will
be defined on a case-by-case basis by negotiation
with Celera lawyers. Yes, I'm looking forward
to that, I'll definitely get a lot of human genome
research done.
It's a horrible precedent. Part of the reason for
the success of bioinformatics has been the
public availability of the international DNA
databases. Science and
Celera now threaten to set a precedent
that could change that.
ob. disclaimer: I'm a coauthor on the
competing Human Genome Project paper, and also
a Celera stockholder. I'm conflicted both ways.
I'm either going to be able to do human
genome research freely, or I'll be rich. And I'd
rather do research.
Turns out, 19th century technologies actually work.
There's very little wrong with the punch card
ballots. The problem is that this election
fell within the statistical uncertainties
inherent in any large scale counting process. I don't
care what technology you use to count, there
will always be a +/-0.01% uncertainty.
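To make the statistical point concrete, here's a rough sketch in Python. Only the 0.01% figure comes from the paragraph above; the vote total and margins are made-up illustration numbers, not real election data, and the function names are my own.

```python
def counting_uncertainty(total_votes, error_rate=0.0001):
    """Number of votes that sits inside the counting noise.

    error_rate is the +/-0.01% figure from the post, written as a fraction.
    """
    return total_votes * error_rate


def margin_is_within_noise(total_votes, margin, error_rate=0.0001):
    """True if the winning margin is no bigger than the counting uncertainty."""
    return margin <= counting_uncertainty(total_votes, error_rate)


# Hypothetical numbers, for illustration only.
total_votes = 6_000_000
for margin in (500, 5_000):
    noise = counting_uncertainty(total_votes)
    verdict = "inside" if margin_is_within_noise(total_votes, margin) else "outside"
    print(f"margin {margin}: {verdict} the ~{noise:.0f}-vote counting noise")
```

The point of the sketch is just that when the margin is smaller than the noise, recounting with any technology won't give you a clean answer.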
And
I will bet large amounts of money that
if you computerized elections, you would have
far more massive screwups than we've seen
this year.
My evidence? Witness the computerization of
the GRE (Graduate Record Exam). They no longer
give it on paper. You have to take it by computer.
The net result of the computerization of the
GRE -- my university, and many others, silently
no longer enforce our requirement that applicants
give us GRE scores, because the computerized
system is such a disaster, many students aren't
able to even take the test.
And don't get me started on other well-meaning
but totally screwed up attempts to replace
an "obsolete" but effective system with a
"modern" computerized one that doesn't work
any more.
Think voters are disenfranchised now? Wait
'til we turn an election over to a lot of
bug-ridden hardware and software.
Bah. I say keep using 19th century technology
until you actually need to replace it. Computers
are good at many things - and interacting
100% reliably with the general public isn't one
of them.
The whole purpose of insurance is to spread
a risk across a wide population, so that no single
person bears the full brunt of a rare mishap.
As the industry allows more and more detailed
genetic and statistical analysis, removing the element of chance, and identifying exactly who
the people are who will get hit by an event...
what the heck will the point of having insurance
be?
If I test negative for Huntington's, I would
be an idiot to get insured for it. If I test
positive, the insurance company would be an idiot
to insure me. Therefore, back to square one:
no spreading of the risk, no benefit to
me, or to the society, from
the insurance industry. Now obviously,
Huntington's isn't the only risk that might
affect me, but nonetheless, the direction the industry
is headed doesn't make long-term societal
sense.
I've been in almost the same situation. A company I worked for wanted to apply for a broad patent on a set of technologies that were already well known in the field, and developed by academic basic researchers. We had a novel combination of their ideas but I was very uncomfortable filing a broad patent claim.
I politely discussed my reservations with the
company's patent attorneys and with my CEO. They
ended up agreeing with my position and
we didn't file the patent application.
It's easy to assume that patent lawyers are
ravening inhuman beasts, I guess, but it turns out that they really are human and often quite reasonable. I wouldn't take a hyperlegalistic
or adversarial tack with your old company until I'd actually had
a few polite conversations with the relevant
people to explain your views.
"Venter says it's not a gene monopoly he's after, but information. In fact, he plans to publish all the company's findings on the genome. By immediately publishing their work, Venter and colleagues intend to make the base knowledge of the human genome unpatentable."
Uh-huh. Freedom is slavery. Ignorance is strength.
For those who've been asleep, we in the public Human Genome Project have been fighting tooth and claw to get the genome into the public domain, while Celera and Venter have done their damnedest to make a proprietary product out of it.
The only reason that Celera will make anything available is that we have succeeded, and undercut their position so severely that they will try to rescue in public relations what they've lost in business model.
Venter's quote, therefore, is the purest spin control -- and to those of us on the inside, tantamount to an admission that Celera's attempt to lock up the genome has failed. It's also a lie. Celera has never "immediately published" anything. They do not adhere to the Bermuda Principles for public DNA sequence release. They have not released their human sequence to the public, though they have certainly issued enough press statements implying that they have.
The Human Genome Project looked hard at GPL'ing the human genome sequence -- so yes, first of all, scientists (and our legal advisors) in biotech know about the GPL.
The power of the GPL rests in copyright law. Unfortunately, copyright law doesn't protect many of the things that we'd like to "open genome". US copyright law provides only very weak protection on databases; no copyright law protects inventions/discoveries/natural products. Hence, you're forced to look at patent or contract law. It's hard to develop a system with the awesome viral power of the GPL that exploits patent or contract law.
In the end, the genome project settled for keeping the genome fully public domain, and Clinton and Blair made a high moral ground statement about how that was a Good Thing (tm).
... and the next day, the biotech sector cratered, costing investors billions of dollars. There is a shocking amount at stake when you start discussing "open genome" models for genetic data. It's not a lighthearted discussion. (Speaking from experience, having been attacked in a meeting for my "open genome" views by someone who probably lost on the order of many millions of US$ that day.)
Yup. That's Henry Huang, now here at Washington University.
And every time you read the NY Times caricaturing the Human Genome Project as slick big business versus slow and plodding federal lackeys, remember two things: companies like Celera pay for PR firms, and universities like Washington University in St. Louis have people like Henry who only give a damn about doing the right thing for science.
Sorry, that's a pile of crap -- one of the Big Lies that Celera is telling the public -- and I'll tell you why.
If I told you Gateway was a more efficient PC manufacturer than IBM because I took their total budgets and divided by the number of PCs they sell, you'd tell me I was being simplistic and stupid; IBM has a number of other corporate focuses.
The public genome project involves much more than raw human sequencing. It also funds technology development, physical mapping, genetic mapping, and model organism genome sequencing, amongst other things.
Likewise, Celera is being disingenuous in comparing budgets and timelines. In actuality, we are all using the same basic strategy and the same equipment, so the rate and cost determining factors are identical.
Celera intends to reduce their costs in two simple ways.
First, half their data will be taken from the public domain. Automated scripts from our friends at Celera download data nightly from our anonymous FTP server, a source of great continuing amusement to us, considering the corporate press releases that boldly say that "Celera has never relied on any public resources".
Second, they will not attempt to finish the genome to the high quality that we are aiming for, and it is that high-quality finishing stage that consumes expensive labor.
The combination of these might reduce their costs to about $0.10/base, so they could get away with $300M for the genome, compared to our $1000M. There is no way they can get under 1/10 of our costs; they've already spent ~$200 million or so just in one year of salaries and capital costs, and I have no idea what their supply costs are (but our *major* expense here is supply costs). Their slightly greater speed comes at a substantially greater incremental cost. Don't bullshit people about what a small efficient company they are: they are a big-ass biotech company with about a $6 billion market cap.
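The arithmetic behind those figures, as a back-of-envelope Python sketch. The 3 billion base genome size, the $0.10/base estimate, and the $1000M public budget are the numbers quoted in this thread; the function names are mine, and the per-base figure for the public project is only the total divided by genome size, which overstates it since that budget covers more than raw sequencing.

```python
GENOME_SIZE = 3_000_000_000  # ~3 billion bases


def total_cost(cost_per_base, genome_size=GENOME_SIZE):
    """Total dollars to sequence the genome at a given per-base cost."""
    return cost_per_base * genome_size


def cost_per_base(total_dollars, genome_size=GENOME_SIZE):
    """Naive per-base cost: total budget divided by genome size."""
    return total_dollars / genome_size


celera_estimate = total_cost(0.10)          # roughly $300M
public_budget = 1_000_000_000               # the $1000M figure
ratio = celera_estimate / public_budget     # fraction of the public cost

print(f"Celera estimate: ${celera_estimate / 1e6:.0f}M")
print(f"Public project, naive per-base: ${cost_per_base(public_budget):.2f}")
print(f"Celera / public: {ratio:.0%}")
```

So even on the optimistic estimate, the ratio comes out around three tenths, nowhere near one tenth.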
And if their business model excites you, and convinces you that Celera is so cool, hey, here's some insider info: this November I plan to start Sean Genomics, Inc., and I will sequence the human genome in 1 day for $0, by downloading the data by FTP from WashU and Sanger, and I'll start issuing my own press releases about how I'm a zillion times more efficient than the Human Genome Project. Watch for my IPO!
If we use Celera's definition of "complete" then the public project is already done too.
Any reasonable person would define "complete" as this: there's three billion bases of human DNA in 24 different linear chromosomes. The sequence is complete when you can give me a DVD with 24 files on it, each of which contains a contiguous sequence of a human chromosome.
That may never happen for any large animal or plant genome. Too many regions of a genome sequence are an ungodly mess, repetitive and difficult to sequence.
The public worm (C. elegans) project, at 98 million bases, defined "essentially complete" as "we've come as close as we can to complete using existing technology". We have 97 million bases sequenced and about 50-100 remaining gaps.
The fly (Drosophila melanogaster) project, at 180 million bases in size, was recently declared "substantially complete" by Celera. They have 120 million bases of sequence, with several thousand gaps. The fly has more extensive regions of repetitive sequence than the worm.
The human, at 3 billion bases in size, is nowhere near complete, either by the public (us) or by Celera, no matter what Celera press releases say.
You need the following steps to get close:
1. Shotgun coverage. Technology limits us to reading ~500 bases of sequence at a time, so we have to blow the genome to bits, sequence millions of fragments, then assemble it all back (computationally) into a contiguous sequence. Because a successful assembly relies on deeply redundant overlap amongst the fragments, we need ~8-10x shotgun coverage (24 to 30 billion bases) to try to assemble the human genome. The fly genome was shotgunned to 12x coverage to achieve the results Celera reports.
2. Assembly. Once you've got shotgun data, you can try to assemble the genome from those fragments.
3. Finishing. The automated assembly (like the fly genome now) will have a great number of gaps. These must now be closed, more manually, by expert molecular biologists; the gaps represent regions that are biologically difficult to sequence.
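To put rough numbers on the shotgun coverage in step 1, here's a small Python sketch. The genome size (~3 billion bases) and read length (~500 bases) are the approximate figures from the steps above; the function name is mine, and the read counts are order-of-magnitude estimates only, since real reads vary in length and quality.

```python
GENOME_SIZE = 3_000_000_000  # ~3 billion bases in the human genome
READ_LENGTH = 500            # ~500 bases per sequencing read


def shotgun_requirements(coverage, genome_size=GENOME_SIZE, read_length=READ_LENGTH):
    """Return (total bases to sequence, approximate number of reads)
    needed to reach the given fold-coverage."""
    total_bases = coverage * genome_size
    reads = total_bases // read_length
    return total_bases, reads


for coverage in (4, 8, 10):
    bases, reads = shotgun_requirements(coverage)
    print(f"{coverage}x: {bases / 1e9:.0f} billion bases, ~{reads / 1e6:.0f} million reads")
```

At 8x that works out to 24 billion bases, i.e. tens of millions of individual reads that the assembler then has to fit back together.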
The actual science behind the Celera press release is that they have partially completed phase 1. They currently have 4-5x shotgun coverage of the human genome, about half of what they need for a proper assembly. They intend to get the other 4-5x coverage from the public "rough draft", which is at about the same stage Celera's project is in.
The two projects (Celera and public) are neck and neck in this "race". The difference is that we acknowledge that our sequence is a rough draft at this stage; whereas Celera claims that their sequence is complete. Celera has every right to spin their project to their investors any way they feel is appropriate, but scientifically, they are being rather disingenuous if not dishonest.
conflicting oblig. disclaimers: I'm a co-PI on the public project, and I (accidentally, through an acquisition) also hold substantial stock in CRA.
Indeed, that would be fine, if Celera would be clear about that.
Instead, they have tried to cast themselves as a competitor to the public genome project. Our goal is to provide the human genome sequence as a freely available research resource. Venter initially declared that that was also Celera's goal. Now, though, they're filing thousands of provisional patents and asking for exclusive distribution rights to the human genome. That's not compatible with the public goals. Celera should just admit that they're a business, and stop trying to claim otherwise. The proposed collaboration fell apart because Celera would not accept the terms that *they* initially stated: that the human genome sequence would be made freely available to everyone. (Indeed, Venter testified before the US Senate on this very point.)
Problem with biowarfare is not killing yourself.
on Living Terrors · Score: 2
Sure, growing known biowarfare
agents is "technically easy". However,
growing them without killing yourself
in the process is technically quite challenging,
and requires expensive facilities.
Having grown high-titer stocks of poliovirus
(vaccine strain) myself, under
moderate containment, I can tell you there's no way in hell
that I'd work with a potent biowarfare agent
that required high containment. Suicidal,
unless you're very highly trained. I'd sooner
fill my basement with C-4 than have a single vial of
a biowarfare agent.
Some /.'ers have commented, correctly, that patents are essential to moving research from the basic stage to the useful, applied stage. Blanket condemnation of biotechnology patents is basically naive.
However, another tenet of the patent system is the "fair use" exclusion, in which basic researchers are allowed to use patented technologies without having to pay licensing fees.
The Roche PCR patent case was the first US case in which a company sought to claim that basic researchers were infringing on the patent, and that the "fair use" exception did not apply. Details are available from the Promega web site.
This, more than anything, was the infuriating thing about the Roche patent. Basic researchers were literally threatened by this company, for not paying exorbitant fees on an enzyme that many of us can make for pennies. Roche even produced a ridiculous hit list of basic researchers who were "infringing", based on the Materials and Methods sections of papers written by those researchers.
If the private sector sequenced the human genome to a high standard of completeness and released their data into the public domain, then by all means, the private sector should do it.
A fundamental assumption of the human genome project is that the genome sequence is "precompetitive" information that is best put in the public domain, to spur both additional basic research and commercial innovation. This makes it an obvious target for public, not private investment.
So far, no company has stepped up and said they intend to make the human genome sequence freely available. Celera is even waffling over their promise to release the Drosophila genome sequence. And somewhat understandably so; a company needs a business model.
AFAIK, most of the genome sequence being produced by the HGP is from a single male individual. (Male, because we need to see a Y chromosome too.) I dunno for sure about Sanger, but WashU and Whitehead in the US are working from the same clone library.
The identity of this person is a closely guarded secret, as well it should be: this person's genome sequence will be available on the Internet. We'd like to avoid a nightmare scenario where a well-meaning "genome hacker" discovers a fatal disease gene in the sequence, and calls this guy up out of the blue to tell him.
That's just an extreme example. Basically, there's serious privacy and confidentiality issues. We consider the genome sequence to be a "reference sequence" or a "typical example", and we don't need (or want) to know who it came from.
The human genome project is funded in the US by the National Institutes of Health and the Department of Energy, and in Britain, by the Wellcome Trust, a charitable organization.
Every base that we sequence is put in the public domain.
We strongly oppose the patenting of sequences. Some of our strategies are designed to preempt attempts by companies to patent sequences from the human genome.
You're exaggerating the difference between strategies by a lot. Both the HGP and Celera are using shotgun strategies. Celera's strategy is a "whole genome shotgun", whereas we shotgun 100-200 kb pieces that we've already mapped. Both strategies involve millions of subclones. There's no qualitative difference between the approaches.
And what do you mean by "slow pace"? We're putting out data a hell of a lot faster than Celera. They're struggling to put out the fly genome, whereas we've put out 1 billion bases of the human rough draft so far.
Sounds to me like you've been reading one too many Celera press releases, to tell you the truth.
(And note that it's Celera, not TIGR, that's one of the companies sequencing human. TIGR is a non-profit organization primarily involved in sequencing microbial genomes and Arabidopsis.)
It's naive to believe that just because most crackers cause no physical damage, therefore crackers aren't criminals.
Horseshit. So long as *any* crackers are causing physical damage, *all* crack attempts, unsuccessful or successful, must be followed up on and investigated by any site that values its data integrity.
Investigating a crack takes horrible amounts of valuable time.
It's the *time* that's being stolen by these "innocent" crackers. I'd rather spend the time hacking. It's not OK to force me to spend my time verifying some script-kiddie's benign intentions.
Katz, if you came home and found your door wide open, with a note on the front table saying "hey, noticed the door was unlocked, I just had a look around" -- are you telling me you wouldn't be pissed off, and you wouldn't feel the need to make sure nothing was stolen or damaged? Are you telling me that a defense against trespassing is "I didn't hurt anything"?
I'd love to switch to Thunderbird, from rickety old emacs RMAIL, but one thing keeps stopping me. I get a lot of business email and I need to keep it archived and organized well. My archive is organized by sender and year: about 350 files for different senders each year, averaging maybe 10-100 emails in each file, dating back now over 11 years (about 3000+ files). Keeping this in emacs RMAIL is trivial, because they're all just regular files in my home directory that I can rename or move to new subdirs at will, and I can save emails out of RMAIL just by typing "o" and giving the name of the file. And since Emacs is lightweight enough (!) to run over my DSL connection, I never really need to run an email client anywhere but from my main work machine where my archive is, even when I'm travelling, so I haven't needed IMAP capability.
When I look at Thunderbird and other modern clients, I just don't see a way to keep track of old email as efficiently. I can create "local folders", I guess, but it doesn't appear that Thunderbird is going to treat these as regular files that I can shuffle off into a 2004/ subdirectory at the end of the year. And worse, since Thunderbird is heavyweight enough that I'm not going to run it down a DSL connection, it's going to create them locally, not remotely on my work machine, when I'm reading mail from home or on the laptop while travelling. IMAP seems to be a partial answer but it's going to keep its data on the mail host, not in my home directory, if I understand right.
Surely people have the same problem - how do you solve it?
OK, sure, I'll bite. What's our agenda, in your view? I haven't been to the secret meetings lately, maybe I'm not in the loop - but last I checked our "agenda" was to facilitate scientific research, by providing a massively important basic resource to the entire public for free, with no restrictions.
BTW, the Nature article is free even to nonsubscribers.
Bioinformaticians have to realize that our air supply is a freely available international sequence database. When it comes time to fight for our air supply, like now, it won't help if we're viewed as a pack of hypocrites. Published bioinformatics software has got to be made open source. As you point out, it's not much different from asking genome sequencers to deposit in Genbank.
Neither the journals not the community have established a standard of behavior for us yet, so it a less clearcut question than DNA sequence deposition right now. It will take more time and work to get the journals on board with respect to software access.
Nobody's bothered terribly about whether research is privately or publicly funded. (Hell, my research is funded by everything from your taxpayer money, to Howard Hughes' will, to Sun Microsystems.. and even, gasp, by Celera itself!)
The point is that publishing a scientific paper entails certain ethical responsibilities, among which is the free and open disclosure of your data to other scientists, so they can effectively build on your work. The community standard for *both* privately and publicly funded DNA sequence data is that *when it is published*, it goes to Genbank, EMBL, or DDBJ.
Companies that feel that disclosure will negatively impact their business model should not submit papers on their work, that's all. They should not seek the rewards of publication without meeting their responsibilities to the community of scientists that read their paper. Otherwise, their paper is an advertisement, not something that moves the field ahead. Other genomics companies seem to have no problem with this -- Incyte and HGSI, for example, don't try to muddy the waters by submitting papers on their proprietary genome databases.
Lots of the apologists for Celera say "shouldn't they be allowed to make money?" Sure they should. More power to them, my stock will go up, I'll be happy. But they can't have their cake and eat it too -- they shouldn't be able to get away with writing scientific papers about a proprietary database. It's not ethical.
To publish a paper and not deposit your DNA data in Genbank, EMBL, or DDBJ is literally unethical; it is not consistent with accepted professional standards of behavior.
Apologists for this deal argue that little concerns like "ethics" should be subservient to bigger concerns like "expediency". Where have I heard that argument before throughout history?
Yes, this is all going to make a great example for that required course we teach in research ethics. We'll be able to shorten the course a lot now. The lesson, kids, is that if you're big enough, the rules don't apply to you. Science is no different than real life. Anyone surprised?
It's not giving free access to academics, not in the open source meaning of "free" anyway.
If you want less than 1 Mb (that is, less than 0.03%) of the data, you agree to a clickwrap license on the Celera web site.
If you want all the data (about 3000 Mb), you and your institution cosign a formal license with Celera.
What does this license say, you may wonder? Well, so do we. Turns out, the details are still being worked out. But the gist is this: you can use the data for anything you want, so long as it is for noncommercial purposes. You can publish your results freely, with no reachthru rights being asserted by Celera. And you agree not to redistribute the data.
Oops. Look at that again. Ever see a scientific paper where you a) published your results and b) didn't "redistribute" (i.e show!) the primary data? Can someone define the bounds between publication and redistribution? I can't. Neither can Science, as of yesterday.
Science and Celera has not yet defined the bounds between trivial redistributions that Celera doesn't sue you for ("Figure 1 shows a BLAST alignment to my gene in the Celera database"), and real redistributions that they do ("Table 1 in the Web Supplement gives the positions of every DNA hexamer in the Celera database. Please don't use it to reconstruct the original data.") But I'll bet you that pretty much every large scale bioinformatics/computational biology analysis of the Celera data would be counted as a "redistribution"... potentially blocking the main use of the genome, which is for large-scale genomic analysis. And if the bounds aren't defined by the agreement, the bounds will be defined on a case-by-case basis by negotiation with Celera lawyers. Yes, I'm looking forward to that, I'll definitely get a lot of human genome research done.
It's a horrible precedent. Part of the reason for the success of bioinformatics has been the public availability of the international DNA databases. Science and Celera now threaten to set a precedent that could change that.
ob. disclaimer: I'm a coauthor on the competing Human Genome Project paper, and also a Celera stockholder. I'm conflicted both ways. I'm either going to be able to do human genome research freely, or I'll be rich. And I'd rather do research.
My evidence? Witness the computerization of the GRE (Graduate Record Exam). They no longer give it on paper. You have to take it by computer. The net result of the computerization of the GRE -- my university, and many others, silently no longer enforce our requirement that applicants give us GRE scores, because the computerized system is such a disaster that many students aren't even able to take the test.
And don't get me started on other well-meaning but totally screwed up attempts to replace an "obsolete" but effective system with a "modern" computerized one that doesn't work any more.
Think voters are disenfranchised now? Wait 'til we turn an election over to a lot of bug-ridden hardware and software.
Bah. I say keep using 19th century technology until you actually need to replace it. Computers are good at many things - and interacting 100% reliably with the general public isn't one of them.
As the industry allows more and more detailed genetic and statistical analysis, removing the element of chance, and identifying exactly who the people are who will get hit by an event... what the heck will the point of having insurance be?
If I test negative for Huntington's, I would be an idiot to get insured for it. If I test positive, the insurance company would be an idiot to insure me. Therefore, back to square one: no spreading of the risk, no benefit to me, or to the society, from the insurance industry. Now obviously, Huntington's isn't the only risk that might affect me, but nonetheless, the direction the industry is headed doesn't make long-term societal sense.
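The argument above can be put in numbers. This is only a sketch: the payout and prevalence figures are made up for illustration, and it assumes a perfectly predictive test, which is a simplification.

```python
def fair_premium(probability: float, payout: float) -> float:
    """Actuarially fair premium: the expected payout per insured person."""
    return probability * payout

# Hypothetical figures, chosen only to illustrate the argument.
PAYOUT = 1_000_000   # cost of the insured event
PREVALENCE = 0.001   # fraction of people who will eventually be affected

# Without testing, nobody knows who is at risk, so everyone shares the cost.
pooled = fair_premium(PREVALENCE, PAYOUT)

# With a perfectly predictive test (an assumption), risk is 0 or 1 per person.
tested_negative = fair_premium(0.0, PAYOUT)  # no reason to buy a policy
tested_positive = fair_premium(1.0, PAYOUT)  # premium equals the full cost

print(f"pooled premium:          ${pooled:,.0f}")
print(f"premium, tested negative: ${tested_negative:,.0f}")
print(f"premium, tested positive: ${tested_positive:,.0f}")
```

Once the uncertainty is gone, the only fair premiums left are zero and the full cost of the event, so there is nothing left to pool.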
I politely discussed my reservations with the company's patent attorneys and with my CEO. They ended up agreeing with my position and we didn't file the patent application.
It's easy to assume that patent lawyers are ravening inhuman beasts, I guess, but it turns out that they really are human and often quite reasonable. I wouldn't take a hyperlegalistic or adversarial tack with your old company until I'd actually had a few polite conversations with the relevant people to explain your views.
Uh-huh. Freedom is slavery. Ignorance is strength.
For those who've been asleep, we in the public Human Genome Project have been fighting tooth and claw to get the genome into the public domain, while Celera and Venter have done their damnedest to make a proprietary product out of it.
The only reason that Celera will make anything available is that we have succeeded, and undercut their position so severely that they will try to rescue in public relations what they've lost in business model.
Venter's quote, therefore, is the purest spin control -- and to those of us on the inside, tantamount to an admission that Celera's attempt to lock up the genome has failed. It's also a lie. Celera has never "immediately published" anything. They do not adhere to the Bermuda Principles for public DNA sequence release. They have not released their human sequence to the public, though they have certainly issued enough press statements implying that they have.
The power of the GPL rests in copyright law. Unfortunately, copyright law doesn't protect many of the things that we'd like to "open genome". US copyright law provides only very weak protection on databases; no copyright law protects inventions/discoveries/natural products. Hence, you're forced to look at patent or contract law. It's hard to develop a system with the awesome viral power of the GPL that exploits patent or contract law.
In the end, the genome project settled for keeping the genome fully public domain, and Clinton and Blair made a high moral ground statement about how that was a Good Thing (tm).
And every time you read the NY Times caricaturing the Human Genome Project as slick big business versus slow and plodding federal lackeys, remember two things: companies like Celera pay for PR firms, and universities like Washington University in St. Louis have people like Henry who only give a damn about doing the right thing for science.
If I told you Gateway was a more efficient PC manufacturer than IBM because I took their total budgets and divided by the number of PCs they sell, you'd tell me I was being simplistic and stupid; IBM has a number of other corporate focuses.
The public genome project involves much more than raw human sequencing. It also funds technology development, physical mapping, genetic mapping, and model organism genome sequencing, amongst other things.
Likewise, Celera is being disingenuous in comparing budgets and timelines. In actuality, we are all using the same basic strategy and the same equipment, so the rate and cost determining factors are identical.
Celera intends to reduce their costs in two simple ways.
First, half their data will be taken from the public domain. Automated scripts from our friends at Celera download data nightly from our anonymous FTP server, a source of great continuing amusement to us, considering the corporate press releases that boldly say that "Celera has never relied on any public resources".
Second, they will not attempt to finish the genome to the high quality that we are aiming for, and it is that high-quality finishing stage that consumes expensive labor.
The combination of these might reduce their costs to about $0.10/base, so they could get away at $300M for the genome, compared to our $1000M. There is no way they can get under 1/10 of our costs; they've already spent ~$200 million or so just in one year of salaries and capital costs, and I have no idea what their supply costs are (but our *major* expense here is supply costs). Their slightly greater speed comes at a substantially greater incremental cost. Don't bullshit people about what a small efficient company they are: they are a big-ass biotech company with about a $6 billion market cap.
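The arithmetic behind those figures, as a quick sketch. It assumes a genome of about 3 billion bases (the size used elsewhere in these posts); the function names are just for illustration.

```python
GENOME_BASES = 3_000_000_000  # ~3 billion bases (assumed genome size)

def total_cost_usd(cost_per_base_usd: float, bases: int = GENOME_BASES) -> float:
    """Total cost of sequencing `bases` at a flat per-base cost."""
    return cost_per_base_usd * bases

def cost_per_base_usd(total_usd: float, bases: int = GENOME_BASES) -> float:
    """Per-base cost implied by a total budget."""
    return total_usd / bases

PUBLIC_TOTAL_USD = 1_000_000_000          # the $1000M public figure
celera_total = total_cost_usd(0.10)       # roughly $300M
public_per_base = cost_per_base_usd(PUBLIC_TOTAL_USD)  # roughly $0.33/base
ratio = celera_total / PUBLIC_TOTAL_USD   # roughly 0.3, nowhere near 1/10

print(f"Celera estimate: ${celera_total / 1e6:,.0f}M")
print(f"Public cost per base: ${public_per_base:.2f}")
print(f"Celera/public cost ratio: {ratio:.2f}")
```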
And if their business model excites you, and convinces you that Celera is so cool, hey, here's some insider info: this November I plan to start Sean Genomics, Inc., and I will sequence the human genome in 1 day for $0, by downloading the data by FTP from WashU and Sanger, and I'll start issuing my own press releases about how I'm a zillion times more efficient than the Human Genome Project. Watch for my IPO!
Any reasonable person would define "complete" as this: there's three billion bases of human DNA in 24 different linear chromosomes. The sequence is complete when you can give me a DVD with 24 files on it, each of which contains a contiguous sequence of a human chromosome.
That may never happen for any large animal or plant genome. Too many regions of a genome sequence are an ungodly mess, repetitive and difficult to sequence.
The public worm (C. elegans) project, at 98 million bases, defined "essentially complete" as "we've come as close as we can to complete using existing technology". We have 97 million bases sequenced and about 50-100 remaining gaps.
The fly (Drosophila melanogaster) project, at 180 million bases in size, was recently declared "substantially complete" by Celera. They have 120 million bases of sequence, with several thousand gaps. The fly has more extensive regions of repetitive sequence than the worm.
The human, at 3 billion bases in size, is nowhere near complete, either by the public (us) or by Celera, no matter what Celera press releases say.
You need the following steps to get close:
1. Shotgun coverage. Technology limits us to reading ~500 bases of sequence at a time, so we have to blow the genome to bits, sequence millions of fragments, then assemble it all back (computationally) into a contiguous sequence. Because a successful assembly relies on deeply redundant overlap amongst the fragments, we need ~8-10x shotgun coverage (24 to 30 billion bases) to try to assemble the human genome. The fly genome was shotgunned to 12x coverage to achieve the results Celera reports.
2. Assembly. Once you've got shotgun data, you can try to assemble the genome from those fragments.
3. Finishing. The automated assembly (like the fly genome now) will have a great number of gaps. These must now be closed, more manually, by expert molecular biologists; the gaps represent regions that are biologically difficult to sequence.
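To make the coverage numbers in step 1 concrete, here is a rough sketch. The genome size and read length are the figures quoted above. The `expected_unsampled_fraction` estimate is my addition, not something from the steps above: it is the standard idealized Poisson (Lander-Waterman style) approximation, which assumes reads land uniformly at random and ignores repeats and cloning bias, so real genomes do worse.

```python
import math

GENOME_BASES = 3_000_000_000  # ~3 billion bases in the human genome
READ_LENGTH = 500             # ~500 bases per sequencing read

def shotgun_bases(coverage: float, genome_bases: int = GENOME_BASES) -> float:
    """Total raw bases that must be sequenced for a given fold coverage."""
    return coverage * genome_bases

def reads_needed(coverage: float,
                 genome_bases: int = GENOME_BASES,
                 read_length: int = READ_LENGTH) -> int:
    """Number of reads required to reach a given fold coverage."""
    return math.ceil(coverage * genome_bases / read_length)

def expected_unsampled_fraction(coverage: float) -> float:
    """Idealized chance that a given base is hit by no read at all: e^-c."""
    return math.exp(-coverage)

for c in (4, 5, 8, 10):
    missing = expected_unsampled_fraction(c) * GENOME_BASES
    print(f"{c:>2}x: {shotgun_bases(c) / 1e9:.0f} billion bases, "
          f"{reads_needed(c) / 1e6:.0f} million reads, "
          f"~{missing:,.0f} bases never sampled (idealized)")
```

Even in this best case, 4-5x coverage leaves tens of millions of bases unsampled, while 8-10x brings that down by roughly two orders of magnitude. The regions that are hard to clone or sequence are what is left for finishing.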
The actual science behind the Celera press release is that they have partially completed phase 1. They currently have 4-5x shotgun coverage of the human genome, about half of what they need for a proper assembly. They intend to get the other 4-5x coverage from the public "rough draft", which is at about the same stage Celera's project is in.
The two projects (Celera and public) are neck and neck in this "race". The difference is that we acknowledge that our sequence is a rough draft at this stage; whereas Celera claims that their sequence is complete. Celera has every right to spin their project to their investors any way they feel is appropriate, but scientifically, they are being rather disingenuous if not dishonest.
conflicting oblig. disclaimers: I'm a co-PI on the public project, and I (accidentally, through an acquisition) also hold substantial stock in CRA.
Instead, they have tried to cast themselves as a competitor to the public genome project. Our goal is to provide the human genome sequence as a freely available research resource. Venter initially declared that that was also Celera's goal. Now, though, they're filing thousands of provisional patents and asking for exclusive distribution rights to the human genome. That's not compatible with the public goals. Celera should just admit that they're a business, and stop trying to claim otherwise. The proposed collaboration fell apart because Celera would not accept the terms that *they* initially stated: that the human genome sequence would be made freely available to everyone. (Indeed, Venter testified before the US Senate on this very point.)
Having grown high-titer stocks of poliovirus (vaccine strain) myself, under moderate containment, I can tell you there's no way in hell that I'd work with a potent biowarfare agent that required high containment. Suicidal, unless you're very highly trained. I'd sooner fill my basement with C-4 than have a single vial of a biowarfare agent.
However, another tenet of the patent system is the "fair use" exclusion, in which basic researchers are allowed to use patented technologies without having to pay licensing fees.
The Roche PCR patent case was the first US case in which a company sought to claim that basic researchers were infringing on the patent, and that the "fair use" exception did not apply. Details are available from the Promega web site.
This, more than anything, was the infuriating thing about the Roche patent. Basic researchers were literally threatened by this company, for not paying exorbitant fees on an enzyme that many of us can make for pennies. Roche even produced a ridiculous hit list of basic researchers who were "infringing", based on the Materials and Methods sections of papers written by those researchers.
It's a Good Thing that Promega won this case.
A fundamental assumption of the human genome project is that the genome sequence is "precompetitive" information that is best put in the public domain, to spur both additional basic research and commercial innovation. This makes it an obvious target for public, not private investment.
So far, no company has stepped up and said they intend to make the human genome sequence freely available. Celera is even waffling over their promise to release the Drosophila genome sequence. And somewhat understandably so; a company needs a business model.
AFAIK, most of the genome sequence being produced by the HGP is from a single male individual. (Male, because we need to see a Y chromosome too.) I dunno for sure about Sanger, but WashU and Whitehead in the US are working from the same clone library.
The identity of this person is a closely guarded secret, as well it should be: this person's genome sequence will be available on the Internet. We'd like to avoid a nightmare scenario where a well-meaning "genome hacker" discovers a fatal disease gene in the sequence, and calls this guy up out of the blue to tell him.
That's just an extreme example. Basically, there's serious privacy and confidentiality issues. We consider the genome sequence to be a "reference sequence" or a "typical example", and we don't need (or want) to know who it came from.
Yup, we put everything in the public domain.
The human genome project is funded in the US by the National Institutes of Health and the Department of Energy, and in Britain, by the Wellcome Trust, a charitable organization.
Every base that we sequence is put in the public domain.
We strongly oppose the patenting of sequences. Some of our strategies are designed to preempt attempts by companies to patent sequences from the human genome.
And what do you mean by "slow pace"? We're putting out data a hell of a lot faster than Celera. They're struggling to put out the fly genome, whereas we've put out 1 billion bases of the human rough draft so far.
Sounds to me like you've been reading one too many Celera press releases, to tell you the truth.
(And note that it's Celera, not TIGR, that's one of the companies sequencing human. TIGR is a non-profit organization primarily involved in sequencing microbial genomes and Arabidopsis.)
It's naive to believe that just because most crackers cause no physical damage, therefore crackers aren't criminals.
Horseshit. So long as *any* crackers are causing physical damage, *all* crack attempts, unsuccessful or successful, must be followed up on and investigated by any site that values its data integrity.
Investigating a crack takes horrible amounts of valuable time. It's the *time* that's being stolen by these "innocent" crackers. I'd rather spend the time hacking. It's not OK to force me to spend my time verifying some script-kiddie's benign intentions.
Katz, if you came home and found your door wide open but there's a note on the front table saying "hey, noticed the door was unlocked, I just had a look around" -- are you telling me you wouldn't be pissed off, and you wouldn't feel the need to make sure nothing was stolen or damaged? Are you telling me that a defense against trespassing is "I didn't hurt anything?"