Celera Opens Up DNA Database
greenplato writes "Thirty billion base pairs from the sequences of humans, mice, and rats that were available only by subscription to Celera's DNA database are being put into the public domain. Celera will donate this information to a 'federally run database,' presumably GenBank. Francis Collins, head of the National Human Genome Research Institute, notes that 'data just wants to be public.' Stories in BusinessWeek and The New York Times."
Shouldn't that be "data want to be free?" :)
"data wants to be public!"
the new anti-RIAA/MPAA slogan
Francis Collins, head of the National Human Genome Research Institute, notes that 'data just wants to be public.'
Data hates when you anthropomorphize it.
Will this mean more clones, or that more genetic modification treatments will become available, now that high school students can get hold of this and work with it on their next science fair project?
Saskboy's blog is good. 9 out of 10 dentists agree.
That is so wrong on numerous levels. Hi Evil Corporation, here's ten thousand dollars so I can get a peek at genetic code that I inherently share with every human being in the first place.
Let's see, the one company that pioneered genome research with reliable and extremely efficient shotgun sequencing is now an evil corporation because it wanted to use its investments in research for developing novel therapeutics. Which in the end benefits humankind. Please...
I am defenseless. Use your button. Mod me down with all of your hatred.
Yeah but I don't see signs advertising "fair and balanced" all over the place like Fox News.
Considering the millions of dollars that Celera invested in gene sequencing, it should at least have the opportunity to make back that money. Heaven forbid, they might even deserve to make a PROFIT. Profit is a leading motivation of many corporations, you know...
A guy walks into a bar... well, I forgot the joke, but the punchline is that he's an alcoholic.
They've open sourced me! Does this mean I have to call myself GNU/Steve?
Hasn't much of the human genome been patented by greedy companies?
Powered by caffeine and sugar; BSD
Exactly, if you want to play with the data, you enter a PhD program in genomics or you work for a company that uses the data. Sheesh.
FTA "DNA database are being put into the public domain" Again, we find information and data that SHOULD be in the public domain, yet the patent office, government, and kickbacks protect those that stand to make money? It's time that we, as a populace, stand and shout for the rights of the public to information. Sure, there are those that say that without protection, such innovation would be stifled, and I counter with this... "should such efforts be in the public sector?" Through eminent domain, they can take your property, but if you are a business, there seems to be no such thing. I hear of companies giving to this charity or that... but none are giving to the charity of mankind? Information is power, and in this information age, it is time for those with the information to take power from those that would use it to extort finance and power from those that do not know better. All such information should be in the public domain. Knowledge of the human genome, of anything that affects ALL of us, should be public information. For instance, any method of retrieving emergency information during an emergency should be in the public domain, not a subject of patent worthiness. The entire point of 911 service is to aid the community, not bilk them of dollars. The entire point of scientific discovery is to learn and advance humankind... when it becomes simply a method of making money, the advancement of humankind goes in the trash like yesterday's junk mail. At that point, what is the point of funding science? Think bigger than your new BMW. This might seem altruistic, but what is the point of discovery if your only reason to share is profit? When do you lose respect, when do you stop having authority? The ONLY method of advancing the human race is through sharing, through communal discovery. Perhaps this will advance that purpose, perhaps it won't.
Support NYCountryLawyer RIAA vs People
Who holds the patent for "viewing alpha sequences comprised of the letters G, A, T, and C, superimposed on a dual helix-shaped structure...on the internet"?
IIRC from so many years back, it was the CEO's own genome that was sequenced by Celera (who went by a different name back then, I think.) So in at least that sense, he holds the copyright and is entitled to sell subscriptions.
HSJ$$*&#^!#+++ATH0
NO CARRIER
I wonder why something like this isn't inherently unprotectable, like the contents of the phone book. A DNA sequence is, after all, simply a record of an existing state of things, NOT an original work (barring genetic engineering, which this isn't). If I take your phone number/base pair book and reproduce it... have I broken any laws (apparently the answers are no and yes, in that order)? The precedent for this has existed for decades.
He didn't create that sequence.
:)
He just possessed it and had a "license" to it.
If anyone should hold the copyright it should be God and his parents.
Just because it CAN be done, doesn't mean it should!
I work for a biotech company with a database which we've been trying to sell subscriptions to for a few years. The prevailing experience with trying to sell the database is that people are very reluctant to shell out the cash to access the data.
I think this is a symptom of trying to sell data to academic institutions. The problems with selling to academic institutions are twofold. Firstly, the universities don't have the cold hard cash to spend on the databases, so any cost over free is too expensive. Secondly, there is the free/open culture within universities that almost punishes commercial ventures for trying to build a business around adding some kind of value to the data (such as convenience or quality of data).
Because of the lack of sales for this database, we're considering handing the data over to a large government body so that they can maintain it, because the company simply can't afford to maintain the database - it costs a lot of money to hire talented people to do database curation.
So when Celera say that "data wants to be free", I think they mean "We'd sell you this data to try and recoup our investment, but we're resigned to the fact that you're not going to buy it".
Celera is pretty evil as an employer. At one time the company had an insane stock valuation. They realised that the genome database profits would end soon and the "synergies" with its own drug research would not happen. So they fired the genome people and used the stock proceeds to buy up biological instrument companies and some small biotech companies. Making instruments and biology tools is what produces any income for them.
I worked for a small biotech company that became a part of Celera. They are doing good research but the upper management is rotten. I was not there before Celera took over, but my understanding is that the new management made all the changes for the worse. Now the bullshit there is deeper than the ice in Antarctica.
I doubt that we will ever figure out - and I suspect that even if we did figure out we couldn't do much about it
Sure, the public can view the DNA, but did Celera surrender the patents too?
Does this make Genbank "Internet Explorer" and Celera "Netscape"?
Now what do I do with it?
I wonder if SCO has threatened to sue them?
Personally, I think the real reason is that the companies can't make a profit by simply having the "standard definition", and it's effectively useless to them.
To 99.99999% of the population, these base pair sequences could be random bits, and we wouldn't know a chromosome if it came up and bit us on the ass.
They are holding a single sample of data, when in reality what's needed is the variation patterns based upon this starting point. We could start to see just how different we are from apes, and why behavioral patterns emerge.
liqbase
I hear what you're saying about academic institutions. They're incredibly whiny and expect everything to be free. We make very little money off of them, and they consume a large share of tech support, but we go out of our way to be nice to them because many of the same people later pop up in pharmaceutical companies in control of large quantities of cash.
Celera saw the writing on the wall. Everyone is using the public reference assembly because it's free, and in terms of contents the two are merging toward a complete consensus as they approach total coverage. You can only make money selling this kind of information while vast portions of the genome remain unknown or unavailable, and that's not true anymore.
Plus using a different assembly than other researchers cuts you off. When we import data from dbSNP, for example, we regularly drop references to positions specified in reference to Celera contigs. (Not much of a problem, since they're in the vast minority.) The Celera assembly has not been freely downloadable and redistributable, and we haven't been including a copy of it in our software (we always include a current public assembly build). Now that this has happened, I think the next build of the public assembly is going to be really good.
No more security through obscurity... and if they do have security patches for me, I would rather not have to recompile.
I have freaks! I did something right...
Cool. Now we can make DNA bombs specific for J. Craig's genome! w00t!
Whose DNA is it?
I'll tell you exactly what it wants. Human genome data wants to be anthropomorphised.
Are you sure you don't want to add "make love not war" to your rant?
The data generated would not EXIST had not investors (read: people) put millions of dollars into the company to hire the researchers, buy the equipment, and develop and analyze the data. Odd that, at some point, they'd hoped to get their money back.
Some people, unlike most here it seems, understand that INFORMATION is not free, that it costs time and money and often sweat and tears to create. As such, in many cases it simply can not be given away.
However, if you believe otherwise, there's nothing stopping you from creating your own information and placing that value into the public domain.
Assuming you're capable of doing so, of course.
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
Excellent PBS video on race between government and Celera to crack the human genome:
http://www.pbs.org/wgbh/nova/genome/program.html
Mirrors please..
..with the typical /. groupthink. Everyone around here would like to think that the genome sequence should be free to the public. And liken this to open source software. I don't disagree with this. However, we must remember that one can sell a service. An annotated database of the genome sequence is a service. Although it doesn't contain unique "created" data, annotation and organization are a huge undertaking in themselves. Yes, it's horrible that a company invested money and resources towards capitalizing on something that everyone should own. But it's a fact of life, and we have the publicly funded Human Genome Project that is open to researchers already and obtained in a different manner than the private one.
It's already good. Releases are coming farther apart and there are fewer and fewer changes. The next build should be a true gold standard; you're right about that.
Beware about dbSNP mappings. Many placements are ambiguous (98% alignment to one spot, 96% to another, which one is right?). Some of the data is probably bogus, too. Still, it's pretty good stuff.
I spill^H^H^H^H^H^H open up my DNA database everyday!
All your Sybase are belong to us.
or he'll write a bill preventing the data from being released.
Oh wait, there's no corporation for him to whore himself out to. Maybe this will actually see daylight.
You are in a maze of twisty little passages, all alike.
Then go and have fun reading your DNA. Keep in mind that if they couldn't make money from it then they would never have sequenced it, so either way you don't see it. Why complain?
Bullshit. They decided to try and make money on it instead of assisting in the Human Genome Project that was government funded. Government funded == work has to be available to everyone free of charge. That is a benefit to humankind. An evil corporation only cares about the $$$ it can get from helping humankind benefit. If they can't get enough $$$ then they don't give a damn and will not let anyone benefit.
Celera's "extremely efficient" method only worked because the NIH's freely available genome data was available. Without it Celera's "shotgun" fragments would have been just that - fragments. It took a base of comparison to complete the map.
Celera relied on the "free research" of the NIH. They extended that research with their own technique, and then patented the result of the joint data.
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
to start complaining about how another Hitchhiker's Guide story got posted.
Damned acronyms.
Here's a copy of the data
t atgactgatcggtagcatatattatgctatagctagcgtgtagctagtat cacatcagctactatgtagctacgatcgagcacactgactacgtagctag tagcggatcgatagctgatctgactgactatatatagcgcgcgatatata gcgcgtagatcgtagccgcgcgatgatatataaggagactgactagc...
acgcggcgatgcgtacatagctagcgctgcatagatcgactatgacgat
Does anyone remember the story of the hacker that actually wrote the code that cracked the genome sequencing problem? He is the unsung hero of this whole private vs. public debacle. He wrote a 10,000 line C program to do the sequencing in "rafts" and "contigs" in the space of a few days -- and had to ice his wrists from all the work... it was because of his brilliant work that the race went from being a 20-year thing to a 3-year thing, and of course nobody knows his name. (And I've forgotten it.)
The data was publicly available from Genbank or the public sequencing effort. Heck I can go to about 2 or 3 websites right now and get it.
Celera's advantage was/is that the data was of higher quality and their database was curated better and had a higher reliability.
Now the public databases have become good enough that you don't need to use Celera's tools. I still find that the public databases are a bit of a mess but they are good enough to get the job done.
Anyway, Celera seems to epitomize the way large projects like this become free: they sink billions upon billions of dollars into a project which is soon supplanted by a better free (though, of course, government funded) alternative, and after years of unsuccessfully trying to sell it, release it for free for a bit of good PR.
But then again, they've made a huge contribution to the field overall; Craig Venter may be an arrogant prick, but he gets shit done, while Francis Collins mostly waxes poetic about the bright future of genomics.
Well, that seems like enough venting about the sad state of research.
sic transit gloria mundi
Craig Venter better hope his health/life insurance company doesn't take a closer look at the sequence and drop him for "pre-existing" conditions.
In all seriousness, however, Celera's sequences essentially suck anyway. The public projects have handily beaten them and their sequencing methods have been deemed inferior (see last October's issue of Nature). They are not adding any scientific value by releasing their versions of these three genomes.
So, lemme get this straight: they fired the people in an unprofitable part of their business and expanded into profitable endeavours. God, that sounds absolutely evil. Err... maybe that's just basic sound business practice?
Upper management may or may not be rotten, but you don't really explain what was "evil" about their actions.
No, just the general treatment of people. I am so happy to be out of there. I got Dilbertized there, and the way they fired me when management learned that I was leaving was just a nice example of corporate nastiness. How they dealt with people at their Maryland site, which got summarily closed down after they squeezed all the dough for Celera from them, just seems to fit my experience.
Because of Celera's choice of approach (whole genome shotgun) they could not even successfully assemble the millions of small stretches of sequence into the chromosomes. They resorted to using the public sequence to assemble their own data, very much like using another person's solution to a puzzle to solve it yourself; not a trivial hint.
Celera also wavered on exactly what limitations they would impose on their subscribers. They slowly backed away from unworkably restrictive EULAs but that was only in response to the lack of subscriptions.
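For anyone wondering what "assembling small stretches of sequence" looks like in miniature, here is a toy sketch of greedy overlap merging. It is only an illustration under simplifying assumptions (error-free reads, no repeats, a single strand), not Celera's assembler or anything close to it; the function names and the `min_len` cutoff are made up for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap until none overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: the remaining pieces stay separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


# Three hypothetical reads that happen to tile one short stretch of sequence.
contigs = greedy_assemble(["ATGACTG", "CTGATCGG", "CGGTAGC"])
print(contigs)
```

Real genomes are full of repeats, which is exactly where a naive overlap approach gets stuck and where an independent map of the chromosomes helps.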
Fixed: Car companies rely on the "free roads" of the federal government. They extend that infrastructure with their own cars, and then profit off the result of the joint use.
How evil of them!
I thought it said Caldera there at first. I thought that if I looked too much like my dad, I'd get sued.
Someone has probably already pointed out that human DNA contains 3 billion base pairs and not 30 billion. It is a sad shame that a company as renowned as Celera is overshadowed by blatant misinformation; even from former CEO Craig Venter, who is known for calling archaea a type of bacteria in the December 2004 issue of SCIENCE magazine. Mishaps like this further alienate the real intellectuals who would normally be capable of over-running the Internet towards an information rapture in the scientific community.
-Bio major/Nerd
The book is very readable, and from my own experiences rings of the truth.
The information was most likely taken from a press release by Celera. Press releases tend to lean toward hyperbole so long as it remains technically truthful. Either there were a heck of a lot of mouse and rat genomes, which alongside the human totaled 30 Gbp, or much of the data is redundant.
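A back-of-the-envelope check of that guess, as a rough sketch. The genome sizes below are rounded approximations assumed for illustration (they are not from the article or any press release), and the 30 Gbp figure is the one quoted in the story summary.

```python
# Approximate haploid genome sizes in billions of base pairs (Gbp).
# These are rounded figures assumed for illustration only.
GENOME_GBP = {"human": 3.1, "mouse": 2.7, "rat": 2.8}

# Total quoted in the story summary ("thirty billion base pairs").
DONATED_GBP = 30.0


def implied_redundancy(donated_gbp, genome_sizes):
    """How many times over the donated total covers one copy of each genome."""
    return donated_gbp / sum(genome_sizes.values())


one_copy_each = sum(GENOME_GBP.values())
print(f"One copy of each genome: about {one_copy_each:.1f} Gbp")
print(f"Implied redundancy: about {implied_redundancy(DONATED_GBP, GENOME_GBP):.1f}x")
```

Under those assumed sizes the total works out to several times one copy of each genome, which would fit the "much of the data is redundant" reading, but it is only arithmetic, not a statement about what Celera actually donated.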
You're wrong.
Celera's data was better BECAUSE they used the public data in addition to their own sequences. The Celera assembly simply contained more information (using any definition of the word) than the public project's.
Genomes are available at http://www.ensembl.org/ . I know I've said this before, but I feel it can't be overemphasized. Ensembl is so incredibly cool. I imagine Celera is releasing their data because no one wants to pay for it when Ensembl has it for free. Additionally, Ensembl has tools that provide so much more than just genome sequence-scanning. And they use open source projects like BioPerl and use Wiki for documentation! I think this is just a PR stunt for Celera.
"fist in the air in the land of hypocrisy"
The difference is Celera was able to patent gene sequences as they pieced them together using their own data and that from the NIH. After they patent a sequence, the NIH has to pay royalties to work with it, regardless of the fact that they provided part of the research.
Okay, since this data, too, "wants" to be free, how about posting links to the CVS / rsync / snapshot.bz2 / BitTorrent / ftp site for downloading the database? "I'm okay to go..."
They did swear under oath that they would release their data without restrictions. They also told Congress (under oath) that their strategy would end speculative patenting of the human genome, whereas in fact they've applied for thousands and thousands of speculative patents.
Shame on them.
Celera have long been seen as the Microsoft of the science world, snaffing up patents 'like a powered up pacman'. So I'd say you got the two mixed up there. But Craig Venter (Celera) 'opensourcing' is like Bill Gates stealing your cereal, and never replacing your milk - then one day giving you a cow. They are both pricks, and this gesture doesn't change anything. They are both maligned in different fields.
(I am pretty certain this data has been freely available, but making drugs based on research using it etc. might have been the restricting factor.
Maybe it just wasn't freely available to academia).
'Caldera Opens NDA Database!'
OK, heart rate is lowering now...
... computer hackers have known this for quite a while now.
- "They misunderestimated me."
This brings a whole new meaning to the phrase Identity Theft.
Anyone got a torrent?
I am one of many. My idea is not unique, nor do I expect my voice alone to sway you. I speak in a chorus of opinion.
Both sides had a difficult time assembling the sequence. Celera's data was of higher quality because their method provided for better coverage AND they could use the public data to clear up any ambiguities.
Can anyone tell me if this is that big of a deal? I'm no biologist, just a college kid in a bioinformatics class, but from what I've experienced the major free databases out there like GenBank, EMBL, and DDBJ seem to be pretty comprehensive.
Open Source has nothing to do with GNU.
Profit motivates conservative power-grabbers. It hardly ever motivates creativity or interesting research. That is why the lean, mean, modern corporations so desperately suck at basic research, and almost all cutting-edge work is still coming from universities and national labs.
These genomes are in the latter category: sitting on this information and trying to wring profit out of it will never earn back the investment Celera expended. Publishing it on the other hand will allow it to be used for intangible and tangible benefits to society, some of which will come back to the company.
Someone has probably already pointed out that human DNA contains 3 billion base pairs and not 30 billion. It is a sad shame that a company as renowned as Celera is overshadowed by blatant misinformation...
...
According to my 2004 Bioinformatics in the Post-Genomic Era textbook, there are approximately 100 billion bases - so that's more than 30 billion base pairs (60 billion)
But, hey, so you're working on outdated texts, why should the facts bother you?
-- Tigger warning: This post may contain tiggers! --
Whose DNA is it?
It's mine. Prior art.
All your patents are belong to humanity.