Domain: ensembl.org
Stories and comments across the archive that link to ensembl.org.
Comments · 34
-
Re:So, create a public DNA museum of sequences
-
It's still in the gene databases
Can anybody explain what patenting the genes actually means? The articles aren't too clear. They're still in the public databases: BRCA1 and BRCA2. This includes the sequence, SNPs, transcript information and all the other goodies. In fact, the Ensembl home page still lists BRCA2 as an example for its search box...
I can understand they might patent technology they have developed that is associated with those genes, which seems fair. But if all this information is still available, they haven't really patented the gene itself.
-
Re:1 in 2000 people
Also, don't forget that each person has two haplotypes, one from each parent, so
when one sequences a person, one captures the variation on two human genomes at once.
Of course, this all relies on the coverage you sequence at, and one option for
the 1,000 genomes project is doing this at low (2x?) coverage, using pretty sophisticated
methods to combine statistical power between sample datasets.
The "1,000" though is more a round number that is in the right range. It might well be
1346 people or something like that (often some multiple of 96, as 96, or 4*96, 384
is the standard size of a molecular biology "tray" put into a robotic system).
We're going to have a lot of fun at http://www.ensembl.org/ with this... -
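The "multiple of 96" remark in the comment above can be sketched as a bit of arithmetic. This is only an illustration: the target of 1,000 samples and the plate sizes of 96 and 384 come from the comment, while the function names are made up here.

```python
import math

WELLS_PER_PLATE = 96  # standard plate size mentioned in the comment


def plates_needed(samples, wells=WELLS_PER_PLATE):
    """Smallest number of plates that can hold `samples` samples."""
    return math.ceil(samples / wells)


def round_up_to_full_plates(samples, wells=WELLS_PER_PLATE):
    """Sample count after rounding up to a whole number of plates."""
    return plates_needed(samples, wells) * wells


target = 1000
print(plates_needed(target))                 # 11 plates of 96
print(round_up_to_full_plates(target))       # 1056 samples
print(round_up_to_full_plates(target, 384))  # 1152 samples on 384-well plates
print(2 * target)                            # 2000 haplotypes, two per person
```

So a "1,000 genomes" target run on full plates would more plausibly end up as 1,056 or 1,152 samples than as exactly 1,000.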
Re:Messy Spaghetti Help
We are predominantly Perl programmers, at least on the European side - http://www.ensembl.org/ is the European-based genome browser (probably a million lines of Perl)... plus most of the http://www.sanger.ac.uk/ Wellcome Trust Sanger Institute data manipulation and presentation is in Perl...
See also http://www.bioperl.org/ -
Re:How will it compare?
For those unaware, you can currently browse the genome libraries: http://www.ncbi.nlm.nih.gov/genome/guide/human/resources.shtml
It's not as if the NCBI is the only one publishing genomes, taking a few examples from our useful links page
Google is not even doing something new: type in a human gene (say ABCA1) and you will get taken to the gene data pages anyway
The only reason why they picked on Google is that it would get headlines. Now move along, nothing to see here
-
Re:Which Database?
www.ensembl.org/info/software/index.html
"(...) Ensembl uses MySQL relational databases to store its information. (...)" -
w00t! Opensource genetics!
genetic information of organisms - mice, fish, flies, bacteria and, of course, humans... All the data are freely available to the world scientific community (http://trace.ensembl.org/)
Sweet, now I can finally build myself that fleet of flying super monkeys I've always wanted!
-
Re:I suspect so but didnt know for sure
>> RNAse is the bugbear of RNA work, it's a normal part of every cell and its job
>> is to break up RNA (which it does very well). When it's in the cell it's kept under
>> close control, however if the cell is broken up (to extract RNA for example) the
>> control is broken and it eats any RNA it can find.
>Darned DRM. You'd think I would at least have fair use rights over my own body!
Don't sweat it, the binaries have DRM but the source code is freely available
-
genes run in both directions
and overlap. please see my other post linking to the http://www.ensembl.org/ genome browser.
if you want to see a very dense genome, try looking at some viruses. they take advantage of the fact that each amino acid used to make the protein machinery is encoded using three bases, and so can put three genes almost on top of each other. It's on the level of funkiness of a programmer writing a sequence of bits in machine language where 8 fully functional programs could be derived depending on whether you shift out one to eight bits from the start of the "program" before loading the program onto the stack of a cpu that has an 8 bit opcode system.
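A minimal sketch of the idea in the comment above: because codons are three bases long, the same stretch of DNA can be read in three different frames, and the opposite strand gives three more. The sequence is made up for illustration and the function names are hypothetical, not taken from any library.

```python
# Map each base to its partner on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codons(seq, frame):
    """Split seq into complete 3-base codons, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


dna = "ATGGCCATTGTA"  # made-up example sequence

for frame in range(3):
    print(frame, codons(dna, frame))
# 0 ['ATG', 'GCC', 'ATT', 'GTA']
# 1 ['TGG', 'CCA', 'TTG']
# 2 ['GGC', 'CAT', 'TGT']

print(reverse_complement(dna))  # TACAATGGCCAT
```

Each frame yields a completely different codon list from the same bases, which is what lets overlapping genes share sequence.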
-
Re:I haven't a clue...
I don't really know what these guys are doing with their computing power, but one cool free bioinformatics resource that allows you to browse the genome is
http://www.ensembl.org/
User interface is fairly intuitive and well documented.
You can see that serving this information is a non-trivial engineering problem. -
Re:Who owns the results?
A lot of the analysis software used is also freely available, as is most of the web display code
http://www.ensembl.org/
another sangerite -
Re:Alpha!
Like Gurdy, I work at the Sanger Institute as a sysadmin, in particular I work in the group which maintains the largest cluster. And it isn't Alpha.
Alphas were used in the early days, because of their fully 64-bit nature, and it was known that to represent the entire genome in a single file requires 3GB, and at the time few OS's had 64-bit filesystems, let alone 64-bit processors.
We continued to buy them as server machines, but the compute farm which produces http://www.ensembl.org/ amongst other things, has not had any Alpha CPUs added to it since 2001.
We got into X86-based blade servers in 2002, and currently have well over a thousand of them, mostly running Debian GNU/Linux. -
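The 3GB figure in the comment above can be checked with some quick arithmetic. This sketch assumes one byte per base and file offsets held in signed integers, which is where the old 2GB file limit came from. The constant names are made up for illustration.

```python
GENOME_BASES = 3_000_000_000  # roughly 3 billion bases, one byte each (assumption)

MAX_SIGNED_32 = 2**31 - 1     # largest offset in a signed 32-bit integer
MAX_SIGNED_64 = 2**63 - 1     # largest offset in a signed 64-bit integer


def fits(size_bytes, max_offset):
    """True if a file of size_bytes can be addressed with offsets up to max_offset."""
    return size_bytes <= max_offset


print(MAX_SIGNED_32)                      # 2147483647, about 2GB
print(fits(GENOME_BASES, MAX_SIGNED_32))  # False
print(fits(GENOME_BASES, MAX_SIGNED_64))  # True
```

A single-file genome overshoots the signed 32-bit limit by close to a gigabyte, which fits the comment's point about wanting 64-bit filesystems.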
It's already free
Genomes are available at http://www.ensembl.org/ . I know I've said this before, but I feel it can't be overemphasized. Ensembl is so incredibly cool. I imagine Celera is releasing their data because no one wants to pay for it when Ensembl has it for free. Additionally, Ensembl has tools that provide so much more than just genome sequence-scanning. And they use open source projects like BioPerl and use Wiki for documentation! I think this is just a PR stunt for Celera.
-
On the use of the chicken genome
Several posters seem to assume that the main objective of having the chicken genome available is to make better and cheaper food products. There is of course some truth to that, but there are also other advantages.
Through domestication and long-term (traditional) breeding, the farm chicken has become quite frail, and chickens have genetic predispositions to several problematic conditions. Knowing the genome could help breeding (both traditional and more modern directed breeding) produce a healthier bird. It is worth noting that man's best friend, the dog, also has these problems due to breeding.
The sequenced genome is actually from the wild Red Jungle fowl, and not the domestic chicken, so there will be plenty of "healthy genome" to learn from.
For scientists, finally having a bird genome is also great. It is further away from chimp, mouse, rat, dog, and other "close" genomes, while closer than, say, fly and nematode. It lands somewhere between us and fish, of which we today have something like three genomes (zebrafish, fugu, and tetraodon). A goal when choosing species to sequence today is a good and even sampling of species, which gives what is called comparative genomics better material for comparisons. A nice resource for genomics of higher organisms is Ensembl, where you can get a glimpse of some of the more interesting animal genomes available.
-
Re:For an open source site
you really should have referenced UCSC's annotated genome browser and tools and the genome browser. Incredible array of annotations, and completely in the public domain.
The UCSC browser is not completely in the public domain: a license is required for commercial downloads/installations. The Ensembl project is completely free to all, as it uses an Apache-like license.
-
here's some genomic data
You can always use genomic data - there's plenty of it to go around for everyone. Following is a link to some downloads for MySQL: http://www.ensembl.org/Download/
-
Re:Oh my.
Get your own installation of the ensembl genome browser and related apps. Why? Just because it's cool to have half a dozen genomes in your computer to play with
:) -
Browse its genome
The C. elegans genome may be browsed here...
-
Re:What license?
To quote the Ensembl Project website, where you can get your very own copy of the mouse genome,
"Access to all the data produced by the project, and to the software used to analyse and present it, is provided free and without constraints."
So, it's pretty much license free. See, that's why us bio-geeks are smiling all the time.
-
Re:"Public domain"
In most open source biology the data is in the public domain, so it can't be patented as there is prior art. One of the main reasons Ensembl was set up was so people couldn't patent genes
-
Re:Free Flow of information?
I agree. "Open Source" in biology is more the rule than the exception. I have been astounded by the FREE resources out there available for anyone to use! Databases like Genbank and Swiss-Prot are invaluable to modern molecular work. Pedro's Biomolecular Tools is just a sample of the plethora of free resources available today.
Incidentally, I can't recommend Ensembl highly enough. Not only have I been able to significantly further my research with their tools, but they have open-sourced the entire code behind their site! And the documentation is even in Wiki! I really think what they have done is incredible and should be one of the first projects anyone mentions when expounding the virtues of open-source software as well as sharing information in the field of Biology.
-Ryan -
Re:Free Flow of information?
Another problem is that researchers can go months, even years on wrong information, and theories. If these were published, yes there's a possibility they could be discounted, but they could be perpetuated, with lots of wrong data all over the place.
Agreed. However it's not so much that people don't want to publish things that will be disproven as it is that they don't want to publish negative data. This is a real problem, as the above poster points out. If you have 3 or 4 labs pursuing the same project, you could end up with them all pursuing the same dead-end leads, wasting years of work and hundreds of thousands of public research dollars. If, on the other hand, the first group to get those negative results had published them (and part of the problem is that there isn't really anywhere to publish negative results), then that group would have had another publication on their collective roster and the other groups could have avoided wasting time and money on this line of investigation.
How is this different from any other science? I mean, in physics, there's lots of papers out there that will eventually be shown to be wrong. That's how science is supposed to work.
It's a shame that biology has become so profitable. Hoarding data and discoveries is not how science advances. The histories of chemistry and physics are ample illustrations of that fact.
FWIW there is still plenty of sharing going on in biology. The genome projects (human, mouse, yeast and others) are good examples of that. Sure, there's the private, Celera stuff out there but the public projects at NIH-NCBI and EBI-Ensembl are excellent examples of those.
E -
Re:statistical approaches
why would you want to use Perl over a flat file data set
Good Question. Answer is yes and no.
Flat files are really quite useful in biology (btw, when a biologist mentions a "database", he almost certainly means a "flatfile"). DNA/RNA/proteins are just a long sequence of letters, and therefore these are perfectly represented by good ol' ASCII. This is particularly useful for means of distribution etc. When annotations are added to the data, they are traditionally added to the flatfile by way of an "annotation table", to keep the simple ease of ASCII.
However, more advanced ways are used to store annotations of biological data, although traditional databases aren't always that good at expressing the rather messy randomness of biology ;-) Therefore, specialised databases such as acedb are quite useful and intuitive to the biological mind. Furthermore, projects such as Ensembl (which ambitiously attempts annotations on the whole genome) store their data in an SQL database. However, they still make extensive use of Perl to interact with the database. -
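As a rough illustration of the flat-file-versus-SQL point in the comment above, here is a small sketch that parses a FASTA-style flat file and loads it into a relational table. Everything in it is an assumption made for the example: the records are invented, the one-table schema is a toy and not Ensembl's actual schema, and Python's built-in sqlite3 stands in for MySQL (the comment's tools of choice were Perl and an SQL database).

```python
import sqlite3

# Made-up FASTA-style flat file: a ">" header line, then sequence lines.
FASTA_TEXT = """\
>seq1 hypothetical example record
ATGGCCATTGTA
ATGCCC
>seq2 another made-up record
GGGTTTAAACCC
"""


def _record(header, chunks):
    """Turn a header line and its sequence lines into one (id, description, sequence) tuple."""
    ident, _, description = header.partition(" ")
    return ident, description, "".join(chunks)


def parse_fasta(text):
    """Yield (id, description, sequence) for each record in FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


# Toy one-table schema in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (id TEXT PRIMARY KEY, description TEXT, residues TEXT)"
)
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", parse_fasta(FASTA_TEXT))

rows = conn.execute(
    "SELECT id, length(residues) FROM sequence ORDER BY id"
).fetchall()
print(rows)  # [('seq1', 18), ('seq2', 12)]
```

The flat file is trivial to distribute and read line by line, while the table makes questions like "how long is each sequence?" a single query.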
Nothing is remotely firm yet...
The article (and the writeup here) make it sound like one presumably very accurate estimate has been supplanted by a very different, presumably just as accurate, estimate. The reality is that identifying genes in raw sequence is very much a work in progress. At the annual Genome Sequencing meeting at Cold Spring Harbor in May, a bunch of groups presented different methods that resulted in widely divergent numbers. Everyone's numbers were increasing over the estimates of last year, though.
It'll sort itself out over the next couple of years as the sequence gets better assembled, more non-human sequence is available for comparison and the groups adopt one another's good ideas. In the meantime, it looks like a good PR person at Ohio State managed to make their findings seem more revolutionary than they are.
By the way, if you want to bet on the number, see the GeneSweep page. (Note that bets must be placed in person!) I put my $5 on 44,000 and change.
Unsettling MOTD at my ISP.
-
Slashdot PR again!
This annoys me. Slashdot are really happy to pander to the PR that these sorts of companies have but consistently turn down interesting stories about how we are trying to make the human genome open and accessible for all, in projects like Ensembl. What are these guys really going to do with this? Probably nothing. They don't look like they know what they are doing. And yet they get posted to Slashdot.
I wish Slashdot was more interested in the real science of the genome and less PR orientated. Slashdot ain't what it used to be...
-
Re:Genome Sizes.
Haploid Genome Sizes (collected from various sources):
A more comprehensive list of genome sizes is here:
http://www.cbs.dtu.dk/databases/DOGS/abbr_table.bysize.txt.
These pages show how much of each organism is finished and publicly available:
http://www.ebi.ac.uk/~sterk/genome-MOT/MOTgraph.html
http://www3.ebi.ac.uk/Services/DBStats/
Arabidopsis thaliana: 1.17 x 10^8 bp, ~25,000 genes.
25000 genes is near the low end of the range for the estimates of the number of genes in the human genome:
http://www.ensembl.org/Genesweep/ -
Re:Grrr.
You want an incentive? Isn't the honor of seeing a journal article attached to your name incentive enough? And if you can't get a journal to print your letter or article on the new gene, perhaps your discovery doesn't really merit a patent either.
Scientific research is not driven primarily by commercial institutions. It's driven by academics. Gene licences, patents, and other concepts of intellectual property stifle the academic process.
I'm really surprised that Genset has patents on 36000 sequences. Considering that the median estimate of the number of human genes is about 53 thousand, this seems a bit high. Of course, some of Genset's sequences may duplicate genes. More likely, however, is the possibility that Genset has patented a goodly number of introns (non-coding sequences).
-
Re:Automated sequence annotation?
The value of Doubletwist's database is exactly as you've analyzed, but for smaller biotech firms, it's worth it -- because this type of bioinformatics service is quite expensive right now, due to a lack of people who are capable of doing it.
There are public projects that are working to provide this sort of service for free. Ensembl is one example.
-
Open Alternatives to Commercial Genomes
It seems odd that /. is focussing on the commercial aspects of the HGP again. Especially on a day when the public consortium has made this press release announcing 85% genome completion, which is freely available to the public, and the Ensembl project, an open source project making genome data, annotation, and analysis tools freely available, has reached Milestone 2.
-
open source genome analysis & annotation tools
Ewan Birney will probably chime in shortly :)
Ewan is heading up just such an open source project as you mention. Check out www.ensembl.org.
In a more general way we are also working on tools over at bio.perl.org