Domain: ebi.ac.uk
Stories and comments across the archive that link to ebi.ac.uk.
Comments · 31
-
Re:Grant whores and PR scientists
This. Dunno about other fields, but it's pretty routine these days in bioinformatics and biostatistics for authors to post their data either as supplemental material with the article or on their departmental web site. The problem is that the format for the data they post is generally "whatever format I have it in at the moment" -- if your lab chooses to keep everything in Excel, that's your business, but it's no fun for the rest of us. Microarray data all goes into GEO or ArrayExpress these days, but even there the file specifications are much looser than they should be; and of course microarray data, as important as it is, is only one small portion of the bioinformatics data universe.
-
Re:Are his customers happy?
Because amino acids use electrons to react. Just like any other reaction in Chemistry. Yes, what he's saying is true. How do you think Hydrogen bonds work? Gravity? Jesus. And your whole spiel
Chemical reactions are electrical interactions after all.
Hmm. Not in any meaningful sense, no.
I say this as someone who works in a research group on chemoinformatics, involving comparison and analysis of (bio)chemical reactions. For example, here is a drawing of an atom-atom mapping from my colleague (made with graphics software I wrote):
cinnamate beta-D-glucosyltransferase
Cinnamate (in cyan) is being attached to the sugar (purple). This is carried out by an enzyme, with a precise arrangement of amino acids in an active site. How on earth would 'electrical interactions' (in general) affect this reaction - or any other?
where you're basically just name dropping what your research does? What is this, argumentation by shock and awe?
-
Re:Are his customers happy?
Chemical reactions are electrical interactions after all.
Hmm. Not in any meaningful sense, no.
I say this as someone who works in a research group on chemoinformatics, involving comparison and analysis of (bio)chemical reactions. For example, here is a drawing of an atom-atom mapping from my colleague (made with graphics software I wrote):
cinnamate beta-D-glucosyltransferase
Cinnamate (in cyan) is being attached to the sugar (purple). This is carried out by an enzyme, with a precise arrangement of amino acids in an active site. How on earth would 'electrical interactions' (in general) affect this reaction - or any other?
-
That is tiny
That read head is about the size of 3 big protein molecules side-by-side! (That's what she said. Sorry, I've been watching The Office reruns.)
-
Re:Data availability
The way I see it, most journals (even the closed access ones) actually require that you make your data available. This is especially true for DNA microarray studies, where you will be required to deposit the data in a public database, for example ArrayExpress or the Gene Expression Omnibus at NCBI. Personally I see the publication of the data as a very important way to drive citation of your papers. When I link to data on the department webserver, I group the data into specific directories depending on the area of research. That way a person looking for data from one particular paper will also find data and references to our other papers within that area (for example see: Probe Design datasets and Cell Cycle datasets).
Regarding the fee for Open Access publication: in my personal experience this has not really been a problem. Performing the experimental work behind the datasets has always been the expensive part, and the Open Access fee has been paid from the same grant that paid for the experiment. For non-experimental papers ("pure" Bioinformatics) the department or the University pays the fee (of course it may not be as easy everywhere; I work at the Technical University of Denmark).
It should also be noted that for some new grants you are actually required to publish your findings in an Open Access journal (I think this may be true for the EU grants, but I am not completely sure).
-
Re:An example of what should be done!
There is actually a lot of data associated with human disease that has been made available to the public. There are three main DNA databases throughout the world: NCBI from the US, EMBL from Europe, and DDBJ from Japan. These public sequence databases have a plethora of links associated with them that you can explore to find out more about the biology of human disease, from sequences to academic papers. An example is the Online Mendelian Inheritance in Man. The down side, of course, is that many of the newer papers require a subscription to read in their entirety.
-
Near miss
This will certainly have put the authors' gizzajob plea in front of many eyeballs, and that may be its primary value. A more interesting approach to the harnessing of our pattern recognition abilities to spotting significant sequences in the chromosomes would be to display the genetic code in colours relating to, e.g. the hydrophilic/hydrophobic nature of the encoded amino acids. I agree with earlier posters; anything you spot in an arbitrarily-wrapped 4-colour mapping of bases is so far separated from a meaningful biological message that the site as it stands is just a bit less interesting than zooming in on bits of the Mandelbrot set. FRACTINT, anyone?
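As a rough illustration of the colouring idea above, here is a minimal Python sketch. It assumes the standard genetic code, reading frame 0, and the Kyte-Doolittle hydropathy scale as the measure of hydrophobicity; the function name `colour_codons` and the three colour names are made up for the example, and a real viewer would need to handle all six reading frames and non-coding regions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Kyte-Doolittle hydropathy index (positive = more hydrophobic).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def colour_codons(dna):
    """Return one colour name per complete codon, reading from position 0."""
    dna = dna.upper()
    colours = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on non-ACGT input
        if aa == "*":
            colours.append("black")      # stop codon
        elif HYDROPATHY[aa] > 0:
            colours.append("red")        # hydrophobic residue
        else:
            colours.append("blue")       # hydrophilic residue
    return colours


print(colour_codons("ATTAAATAA"))
```

Cutting the scale at zero is an arbitrary choice made here for brevity; a continuous colour ramp over the hydropathy values would probably show more structure.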
-
Genetic data has always been publicly available!
All available genetic data (and protein data) from every sequenced organism has always been publicly available, whether due to requirements by publishers of the journals that researchers publish their analysis in, a requirement of their funding agencies, or the mere goal of sharing their data with the global scientific community.
Gene sequence databases have been around since 1981:
EMBL: http://www.ebi.ac.uk/embl/
GenBank: http://www.ncbi.nlm.nih.gov/
DDBJ: http://www.ddbj.nig.ac.jp/
HUGO: http://www.gene.ucl.ac.uk/nomenclature/
JGI: http://www.jgi.doe.gov/
Protein sequence/structure data is also publicly available:
Expasy: http://ca.expasy.org/
PDB: http://www.pdb.org/
Their statement "Google is guilty of biopiracy because a searchable database could make it easier for private genetic information to be abused" is flawed on many levels, and is merely an attempt at media hype.
A - If the genetic data is private (i.e. industry funded and not shared with the global scientific community), how will Google get access to it?
B - Searchable databases that contain private/public genetic information have existed since before most other types of searchable databases.
C - Sharing data from biological analyses (whether genetic sequence data, protein sequence data, gene expression data, protein structure data, etc.) is an important aspect of understanding the underlying mechanisms of biological systems.
Many of the medical advances that we've seen these past couple decades have resulted directly from the fact that biological data has been publicly available... facilitating collaborations beyond borders and beyond disciplines.
I look forward to Google's role in facilitating access to this information, and look forward to applying it in future research projects.
Ryan -
Re:So which programs do you use?
I am not sure why simply because it is about one of many available tools, the post is out of place on Slashdot. I am not a member of a huge biochem or medical lab, but I am trying to learn and use biochemistry, so I can use every bit of help.
It's out of place because the announcement is somewhat akin to posting a front page article when some guy releases version 0.1 of a new text editor onto Sourceforge. It's been done a million times before, and it doesn't cover any new ground. It isn't even interesting to people who don't use text editors.
That said, if you're really trying to get a handle on biochem and molecular biology (and the bioinformatics that goes along with it), almost all up to date textbooks on the subject include a section (or more) on bioinformatics. In 2006, knowing how to perform basic analysis on your DNA or protein sequence is just about as important as understanding the concept of a gene, or how the complementary nature of DNA works. If the textbooks you currently have are a little out of date, take a look around the library and grab something more recent. There are also plenty of bioinformatics and sequence analysis textbooks on the shelves now.
If you're looking for some places to get started (and I think someone has already mentioned these), try ExPASy. Although it's more protein oriented, it has an extensive list of links to a very broad cross-section of bioinformatics and sequence analysis tools (along with some tutorials). Also take a look at NCBI, which not only has a range of important tools (like BLAST), but also PubMed. In a similar vein, also explore the EBI site, which has another extensive set of tools and databases.
Since you ask, some of the stuff that I commonly use for bog-standard molecular biology tasks (in addition to the links above) includes PlasMapper (finds restriction sites and generates tasteful plasmid maps) and the New England Biolabs site which has some similar tools (NEBcutter, for example), but also handy information on all the restriction enzymes themselves.
If you're into writing bioinformatics applications yourself, start by looking at something like BioPerl. Just using Perl as an example (since it's very popular in biology), there are pre-existing libraries, all fully open sourced and Free(tm), which do things like reverse translation and interfacing with analysis tools like BLAST already.
That's just the tip of the iceberg. Anyone getting started in molecular biology will discover these kinds of sites very quickly. They're mentioned in the textbooks, they're easily found with Google, and they'll be revealed after a 2 minute conversation with anyone working in the field. That's what makes this story so pointless. There's nothing new here. It's all been done before, and done 500 times before at that. Even outsiders from other sciences will discover this kind of stuff within a day or two if they're actually serious. -
Re:My Car Gets Forty Rod to the Hogsgead
-
Two-way transcription should present a puzzle
According to the NCBI, the chromosome for M. genitalium is circular. Some proteins are produced by transcribing one way around the circle, and others are produced by transcribing the other way.
My question is, does the DNA encoding the counterclockwise proteins overlap with the DNA encoding the clockwise proteins? If so, then you can't rip out one without damaging the other. I randomly looked at a few by clicking on the aforementioned link and I did see some overlaps; for example, MG264 and MG265 overlap.
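The overlap question can be checked mechanically once gene coordinates are in hand. The sketch below is only an illustration: the `Gene` record, the function name, and the gene names and coordinates are all invented, not taken from the NCBI annotation, and it ignores genes that wrap around the origin of the circular chromosome.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" (clockwise) or "-" (counterclockwise)


def opposite_strand_overlaps(genes):
    """List (gene_a, gene_b, overlap_in_bp) for overlapping genes on opposite strands."""
    ordered = sorted(genes, key=lambda g: g.start)
    hits = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                break  # sorted by start, so nothing later can overlap a
            if a.strand != b.strand:
                overlap = min(a.end, b.end) - b.start + 1
                hits.append((a.name, b.name, overlap))
    return hits


# Hypothetical coordinates, for illustration only.
example = [
    Gene("geneA", 100, 500, "+"),
    Gene("geneB", 480, 900, "-"),
    Gene("geneC", 890, 1200, "-"),
    Gene("geneD", 1300, 1500, "+"),
]
print(opposite_strand_overlaps(example))
```

With real annotation loaded into the same record type, an empty result would mean no clockwise gene shares bases with a counterclockwise one.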
According to GeneQuiz, the entire genome of this creature is only 0.58 Mb (which I presume stands for mega-bases). About 3/4 of the genes have guesses about their function, to varying degrees of certainty.
It's also interesting that this bacterium uses a non-standard genetic code. The latter reference above says "UGA, normally a stop codon, in this organism encodes for the amino acid tryptophan." Does anyone know how common this is?
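To make the quoted reassignment concrete (UGA read as tryptophan instead of stop), here is a small Python sketch. It builds the standard codon table and a variant with only that one codon changed; the names `STANDARD`, `MYCOPLASMA` and `translate` are chosen for the example, and the input sequence is a toy, not a real gene.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}
# Same table, except TGA (UGA in RNA) codes for tryptophan (W) rather than stop.
MYCOPLASMA = dict(STANDARD, TGA="W")


def translate(seq, table):
    """Translate DNA or RNA codon by codon; '*' marks a stop codon."""
    dna = seq.upper().replace("U", "T")
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


toy = "AUGUGAUAA"  # toy sequence: start codon, UGA, UAA
print(translate(toy, STANDARD))    # UGA read as stop
print(translate(toy, MYCOPLASMA))  # UGA read as tryptophan
```

The practical consequence is that software assuming the standard table would truncate this organism's proteins at every UGA.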
-
Mammoth DNA
There are only 115 stretches of DNA that are known in public databases. Most of these are not that interesting if you want to make a clone. So there is still a long way to go.
-
Re:Failed pedanticism
Usage as I stated seems to be quite prevalent, regardless of your assertion that my information is out of date. Please see:
- This UK page (now moved here)
- This UK news site. A quote:
Ten thousand million nucleotides The number of nucleotides in the EMBL Database has now exceeded 10,000,000,000.
Seems to indicate that 10^10 = 10 thousand million.
- This site, with quote:
Despite this, the U.S. meaning is still rare outside journalism and finance, its introduction having served merely to create confusion. Throughout the U.K., a common response to the question "What do you understand by 'a billion'?" would be: "Well, I mean a million million, but I often don't know what other people mean." Few schoolchildren are confident of the meaning, though, again, 10^12 seems to be preferred. Many well-educated adults, aware of both meanings, either avoid the term altogether or use it only in the unambiguous phrases "English billion" and "American billion". English-speaking South Africans, Australians, and New Zealanders are similarly reluctant to use a term that has become ambiguous.
Scientists have long preferred to express numbers in figures rather than in words, so it is easy to avoid "billion" in contexts where precision is required. The plural is still used freely with the colloquial meaning of "a very large number".
Publications consulted: OED, Editions 1 and 2. Robert, Dictionnaire historique de la langue francaise. P Pamart, "A propos d'une reforme des mesures legales", in "Vie et Langage", (125)1962, pp 435-437.
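For anyone who wants the arithmetic spelled out, here is a tiny Python check of the EMBL figure against both meanings of "billion". The constant names are just labels for this example.

```python
from fractions import Fraction

SHORT_SCALE_BILLION = 10**9   # "American billion": a thousand million
LONG_SCALE_BILLION = 10**12   # "English billion": a million million

embl_nucleotides = 10_000_000_000  # figure from the EMBL quote above

# "Ten thousand million" is 10 x 1,000 x 1,000,000 = 10^10.
ten_thousand_million = 10 * 1_000 * 1_000_000

in_short_scale = Fraction(embl_nucleotides, SHORT_SCALE_BILLION)
in_long_scale = Fraction(embl_nucleotides, LONG_SCALE_BILLION)

print(embl_nucleotides == ten_thousand_million)
print(in_short_scale, "short-scale billions")
print(in_long_scale, "long-scale billions")
```

So the same count is ten billion on one scale and a hundredth of a billion on the other, which is presumably why the headline avoided the word.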
-
Re:what's left?
my remaining options are perl, tcl or awk.... hmmm.
Too late for perl as well.
PerlOS is already well underway. -
Re:Easy solution
All we need is to plow some of our considerable energies into genetically engineering a giant monster Tux
Unfortunately, (as can be seen here) only very few stretches of DNA are known for penguins!
-
Re:Genome Sizes.
Haploid Genome Sizes (collected from various sources):
A more comprehensive list of genome sizes is here:
http://www.cbs.dtu.dk/databases/DOGS/abbr_table.bysize.txt.
These pages show how much of each organism is finished and publicly available:
http://www.ebi.ac.uk/~sterk/genome-MOT/MOTgraph.html
http://www3.ebi.ac.uk/Services/DBStats/
Arabidopsis thaliana: 1.17 x 10^8 bp, ~25,000 genes.
25000 genes is near the low end of the range for the estimates of the number of genes in the human genome:
http://www.ensembl.org/Genesweep/ -
Re:Okay, that's it!
-
Re:Grrr.
It is open source
:) that's the point of the Human Genome Project. www.ncbi.nlm.nih.gov www.ebi.ac.uk -
This is for MODELING
Does anyone even bother reading the article? This is to use open source to develop further molecular modelling software. Of course, such software would be useful for nanotech, but that's not the point. This type of technology already exists, see for example the Catalogue of Molecular Biology Programs, some of which are open source, like Garlic, and MMTK. The actual creation of nanotech can't be open sourced, since the requirement to create it can not be bought off the shelf. (Well, if you have a few million, you probably could buy it.) The primary prerequisite for open source research is that the materials are relatively cheaply and easily available to the general public. Thalia
-
Re:What Possible Use Would Anybody Have For This?
There are a TON of freely available bioinformatics type programs available, a good start is to browse Biocat. Of course to get any use out of these programs you should have some knowledge of biology & computational biology. Traditionally academic software is very open, though not GPL (I'm beginning to hate all you GPL-wanting whining fuckers (this is not necessarily directed to the poster I am replying to)). An unfortunate trend as of late is servers which provide an application but no binaries to run locally, and no code. Not very scientific if you ask me. Also not helped by the GPL.
one day not too far away kids in highschool will do lab exercises in biology class that involve cloning genes and so forth
In high school I took Advanced Placement Biology (supposed to be equivalent to an intro college course) and one of the labs was to introduce a plasmid into E. coli so that it became resistant to ampicillin, an antibiotic. Genetic experiments are definitely possible at the high school level; it's just a matter of getting the expensive machines and specialized knowledge. Maybe a school district could put its money into a couple of PCR machines and a knowledgeable lab tech?
-
Re:You can't write an OS in Perl so ...
-
Pitiful...
It is truly pitiful that a grad student is asking slashdot for advice. You would be much better off walking down the hall or waiting for a department meeting and asking people what they use. Read papers and see what packages they use. I'll give you advice anyway, because I feel sorry for you. A place to start would be Biocat.
-
Databases are already open to all
-
Re:Open source, patents and scientific community
Thanks for posting this. I agree that the biological side of things could be improved. There are quite a lot of important Open/Free bio-informatics sites and projects. I know that the EBI is pretty committed and there are links available through the Scientific Applications on Linux site. Perhaps it's such a large area that it would be a good thing to have an independent /.-like site that provided forums for these discussions. Taking sci.molbio etc. to a nicer medium could allow for sharing of graphics, which would help some awkward discussions.
-
Re:warm and fuzzy
Hardware at the moment is generally clusters of Alpha boxes or Intel boxes (running Tru64 or Linux respectively).
The two big drains on CPU for analysis are gene prediction (genscan) and database searching (blast). Database searching can't be distributed easily as you have to worry about the database ;)
However, there are programs like sim4, genewise and est2genome that could greatly help us and could be distributed.
Genewise you can download (I wrote it) at Wise2; est2genome is somewhere around as well.
For the more general overview of the problem - check out ensembl for an idea of the project.
-
Re:This was my idea.
I actually submitted this a few weeks ago, but with the huge amount of submissions, things tend to take a while to filter through the system :)
I've had some email from Ewan Birney at ensembl about doing this but it seems they lack experience of client coding! I personally know nothing about that at all; I'm a bender of metals and I can just about write html on a good day. If anyone has any help to offer, you could visit their webpage. I've not added his email address in case he's paranoid, but I can forward stuff to him :)
Cheers
Troc -
Open Source Genome Projects
There are some good open source genome projects for doing this efficiently, and we do welcome help of any kind. Here are some open source projects which I know about/work on:
- ensembl is an open source genome project designed to get as much data and software into the public domain as possible
- EMBOSS
- bioperl
Anyway - check out these projects for more information about real open source efforts in biology.
-
open source Human Genome annotation project
Ewan Birney, bio.perl.org hacker extraordinaire, is heading up a new effort called ensEMBL which is intended to provide a free and open "baseline" annotation of the human genome. You can find more info at http://ensembl.ebi.ac.uk.