Domain: bioinformatics.org
Stories and comments across the archive that link to bioinformatics.org.
Comments · 45
-
some good PHP-based libraries like htmlawed
There are some good PHP-based libraries like htmLawed to protect against such issues.
-
Re:Command line is more error-prone
You can use pipes, loops etc. in a GUI as well. What can you really do with a CLI?
* Activate a built-in command
* Launch a program
* Specify arguments and input to programs
* Pipe the output of programs to other destinations
You can do all those things graphically. These guys made a GUI pipeline:
http://bioinformatics.org/loci/screenshots/
Whether it's faster/easier/"better" than command line is another question, but it is obviously possible.
-
Yes
There are quite a few open source projects on bioinformatics.org. Some of these are little more than quick command line tools. Others are entire frameworks. Personally, I use the following tools on a regular basis: Bioconductor (with R), EMBOSS, Primer3, and ImageJ.
-
I, Librarian seems pretty close
What the submitter needs (and I also need) is an organizer for scientific papers with an interface for standard fields such as authors, journal, title, DOI, HTTP links, etc. I, Librarian seems to fulfill this need; unfortunately, it has direct interfaces (for retrieving the PDF and meta information at the same time) only for PubMed.
If anybody knew of (or planned for) an adaptation to physics (with interfaces to arXiv.org, the APS journals and ideally other journals), I would be very interested (even as a paying customer).
-
I am a Mac BioTech developer
In my experience, there isn't a single place to find Mac OS developers.
Posting a job opening or project on rentacoder.com or dice.com is very often like looking for a needle in a haystack.
Two groups that specialize in Mac OS development jobs are:
- Yahoo - mac-dev-jobs
- LinkedIn - Mac Developer Jobs
As to programming rates, it varies with experience and which part of the world you are dealing with. If you are dealing with programmers in the USA, you will have to pay higher rates for programmers working on the East or West coast because the cost of living is higher. Don't expect experienced programmers to work cheap either! An experienced programmer with 5-10 years of experience will start at $50/hour, with a typical rate of $75-$100 depending on project length and difficulty.
Remember, a good experienced programmer will do the job right the first time. An inexperienced programmer will sometimes take several tries to complete that task and the resulting program will be fragile and difficult to maintain.
A quick check for determining programming experience is to get a development estimate for your project specification. Give the programmer a complete project specification (including screen mock-ups) and have them give you a project development estimate. An inexperienced programmer will typically under-estimate the time and difficulty of the project.
If you are developing "general-purpose, scientific programs developed and released as open source", you should check the BioCocoa site to see if your project can leverage work already done there. If you can do the work in Java, the BioJava project is a good place to look for BioTech related libraries.
Another good place to find more information about doing scientific research using Mac OS X is the Mac Research web site.
-
Some starting points
Here are some resources that might help you out.
Overview of the field:
http://bioinformatics.sdsu.edu/education.htm
News:
http://news.thinkgene.com/
http://www.bioinformatics.org/
Org:
http://www.iscb.org/
I assume you've crawled through Wikipedia already -- they break it down pretty well. Also, remember that a startup can be anything, not just a specific kind of work like you've seen in school. It's common in the biotech industry to make your career out of a string of startups; you'll get a pile of options from each, and most won't go anywhere, but one or two probably will, and you can benefit from that even after you've left.
-
Let's get serious,
Five minutes of thoughtful searching brought up useful, important information for anybody willing to take these sciences and technologies seriously. The National Institutes of Health (NIH) stem cell page has some paper abstracts as well as listed universities with programs in these United States (and some online resources). There are useful sources of information at this bibliography re: human reproductive cloning, at Boston University, and this one. CiteSeer popped up the paper on nuclear transfer / human cloning. Apparently there's at least one dedicated research foundation out there.
Granted, most of these links are preliminary; check those deep databases, like over at PubMed Central, for those detailed reviews of the state of the art. And just for kicks, one last link which (still) impresses me.
-
I'll stick to SMS2
At first blush, GeneDesign 2.0 offers nothing over the long-available, free, web-based or local-mirrorable Sequence Manipulation Suite 2 at http://bioinformatics.org/sms2/. When I start on a molecular bio project, I use a mix of SMS2, BLAST, NEB cutter, IDT's web-tools, and other free online tools to accomplish everything I need, and keep track of my thought process in a simple Word document. This suite adds no functionality I don't have free access to already elsewhere.
-
Get rid of those Black&White goggles
It's not really surprising how deeply rooted the "virus equals evil" idea is in most people, but some of the comments here are nothing more than FUD.
First of all: this virus was never "eradicated". There was no vaccine, no miracle drug. Influenza is an RNA virus, so it mutates very quickly. Many of today's influenza viruses are actually descendants of the Spanish flu. And they're usually more successful because they're not very deadly.
1918 was just the perfect time for a deadly pandemic. The economy was a mess, there was a major war going on and people were hungry all over the Western world. The Spanish flu spread like wildfire and infected virtually everyone, but the human species is genetically diverse enough to (as a whole) survive even the worst viruses. For one, there are hundreds of different versions of Major Histocompatibility Complexes among us.
You have one, 99% chance your neighbour has another, each with a different specialization. The Spanish flu probably killed a lot of people with the MHC especially unsuited for fighting it.
MHC evolution and genetic diversity
It may sound ridiculous, but viruses of the past likely made us human in the first place. The human genome encodes proteins that might have been useful to survive prehistoric plagues, but perform different functions today. For example, you might have a protein that happens to bind rather strongly to a viral anchor protein from 900 million years ago, but with a minor mutation it might just as well facilitate mammalian cellular respiration.
And then there's retroviruses. Since those can be inherited through infected germline cells, it can become very interesting for a virus to become mutually symbiotic (a dead host doesn't reproduce all that well, now does it).
Chimp caught a virus, became human
Some environmentalists like to call the human species a virus. They may be right in an entirely unintended way: a substantial part of our genome consists of viral DNA.
I for one welcome our old virus overlords.
-
Perl is not optional
If you have a real interest in bioinformatics, I cannot stress enough that you should learn Perl. Even if you are a biologist by background, Perl is not like Java or C; it puts more stress on getting things done than on abstract computer science concepts.
Once you learn Perl, using something like BioPerl will give you all you need to handle sequence data. For instance, you could build a data pipeline that you use on all of your sequences of interest, instead of a graphical tool which pretty much forces you to do alignments and such one at a time.
Now there are some tasks that will require a graphical tool (editing alignments is an example), and one free tool you could use is JaMBW. There is also a list of open bioinformatics software for Linux (generally will be Java or Perl, occasionally C) hosted at Bioinformatics.org.
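The "one pipeline for all of your sequences" idea above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than Perl, and it does not use BioPerl or any other library mentioned in the comment; the function names (`parse_fasta`, `gc_content`, `reverse_complement`, `run_pipeline`) and the toy FASTA records are hypothetical, and the input is assumed to be simple DNA in FASTA format.

```python
from typing import Dict, Iterator, List, Tuple

# Complement table for plain DNA (assumption: only A/C/G/T/N appear in the input).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def run_pipeline(fasta_text: str) -> List[Dict[str, object]]:
    """Apply the same steps to every record instead of one at a time."""
    results = []
    for name, seq in parse_fasta(fasta_text):
        results.append({
            "id": name,
            "length": len(seq),
            "gc": round(gc_content(seq), 3),
            "revcomp": reverse_complement(seq),
        })
    return results


# Hypothetical toy input; real data would be read from a file.
EXAMPLE = """>seq1 toy record
ATGCGC
>seq2
AATT
GG
"""

if __name__ == "__main__":
    for row in run_pipeline(EXAMPLE):
        print(row)
```

The point of the design is that each step is a small function, so adding another analysis means adding one more entry in `run_pipeline` rather than repeating manual work per sequence.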
-
Re:Innovators?
I really hate this kind of reasoning because it makes the reasoner unwilling to accept anything open source as innovation. A similar argument is often used in AI -- since many people define intelligence as "that which sets humans apart", if a computer can do it using simple math, it's not intelligence. AI is defined as making computers do that which computers can't do, so nothing remains AI for long.
I've collected a list of Open Source projects that display innovation for situations like this. Here are the best ones:
- Dashboard
- Piper for a while was trying to implement an entire new Unix desktop based on GUI-based command-line scripting, but never quite got off the ground, and eventually abandoned the idea.
- Knoppix and other liveCDs are innovative -- an entire operating system on a CD-ROM! -- though you might quibble with "prior art" in the form of boot disks that you'd use to play your DOS games. They didn't have entire filesystems on them, though, so I'd maintain that this was innovation. A Windows liveCD exists in a primitive form somewhere, I think, but I don't know anything about it.
- gaim and other pluggable communication programs -- Firefox and xchat spring to mind -- are very useful, and you can probably find a plugin on one of those programs that does what you want. To my knowledge, the furthest the proprietary world got in this direction was skinning, but I could be wrong.
- Also in this vein is KDE, specifically the use of DCOP to help automate GUI tasks. DCOP isn't very well known and you have to discover it, but it can be very useful.
- GNU Screen, to my knowledge, is one-of-a-kind software, though you might cite inspiration in terms of VNC programs, which I don't know much about.
- I believe the concept of numerous virtual terminals on the same physical terminal (i.e. Alt-F1, Alt-F2) is not only unique to OSS, but unique to Linux.
Ethan
-
in biology it happens too...
Today biology depends heavily on specific software to analyse lab-generated data. However, even academic, publicly funded software is not open-source. It's a sad situation, but there are efforts like Bioinformatics.Org trying to change it.
-
Bioinformatics.org
Many of these open-source projects are hosted at bioinformatics.org. The site also contains great information in the FAQ such as definitions of bioinformatics, colleges and universities that offer programs, both undergraduate and graduate level, in bioinformatics, and discussion of skills required or suggested for the area of study.
-
Amazing
This is the third or fourth time BIOS has been mentioned on Slashdot, and they haven't even gotten started. Their BioForge idea is not any different from what Bioinformatics.Org has been doing for years, but BioForge has no projects -- Bioinformatics.Org has 192. And the latter has not just Open Source Software projects but Open Access databases and educational websites too.
BIOS may be somewhat different by directly addressing patents, but, again, there's not much on their website.
-
bioinformatics.org?
Doesn't this mostly just duplicate the efforts of bioinformatics.org?
"The Bioinformatics Organization, Inc. (Bioinformatics.Org) was founded to facilitate world-wide communications and collaborations between practicing and neophyte bioinformatic scientists and technicians. The Organization provides these individuals, as well as the public at large, free and open access to methods and materials for and from scientific research, software development, and education. We advocate and promote freedom and openness in the field as well as provide a forum for activities which facilitate the development of such resources."
This is just another example of someone trying to carve out a niche in the "hot" area of bioinformatics - the same way as this profusion of Live-CDs for Bioinformatics. It seems to me it's all quite divisive. Bioinformatics models itself on the OSS movement for the most part, but its inherent bindings with industry mean there seem to be a lot of people trying to make names for themselves with "projects" even if it means duplicating the effort of someone else.
(Yes, I am a bioinformatician.)
-
Well it's ONE view on Grid Computing
This is meant to be a primer, and it just about "primes" the debate on Grid Computing.
The grid discussed here seems only to be built on the OGSA and Globus Toolkit, and Globus has not really covered itself in glory with their poor UIs etc.
Grid seems to address occasional demand for "much more power" from your computing resource, but does not really provide a consistent flexible computing resource.
The academic world uses External Grids to pool resources but private Enterprise has little to gain from these External Grids in exchange for a HUGE security problem.
And Internal Grids? These are so immature as to beggar belief. Why risk investing in these configurations when bang per buck is so uninviting?
/joelethan
-
Bioinformatics links
Yesterday wrapped up over a week of intense Bioinformatics seminars, poster sessions, exhibitions, and brainbusting studying at Bio Japan in Tokyo and related links. I just saw a presentation on the H-Invitational database which, though based in Japan, also combines the content of foreign databases. It is extremely impressive, and they combine lots of online calculators and results visualizers that are really impressive.
Also figuring out biology seems to be a lot harder than figuring out networking, at least there are all kinds of nefarious things but also serendipitous things found. Like one presentation I just heard had a U.S. scientist who announced that they had discovered an entire signalling network in human cells that was like the one found in yeast cells. And apparently more proteins can be encoded than the number of genes, because of alternate orderings (counting from different displacements in the gene, I think, ask a real bioinformatics expert). One talk I heard a year ago that stuck with me was a scientist who had devised a way to find signalling pathways in cells quickly; by forcing the cell to die if certain requirements were not met, he created a parallel computer that allowed him to discover a whole swath at once. There is also a lot of math and statistics, as well as a lot of biological knowledge behind it, it is not strange to see various statistical tests, references to different computer programs they used for analysis, or a mention of simulated annealing (well maybe that one not so often, came up yesterday though).
One interesting thing is that they (the H-Invitational people / Japan Bioinformatics Consortium) have I believe twice held what they call annotation jamborees, much like a hackfest! In 2002 they had 120 scientists gather (mostly from Japan but also from all over the world) in a big room with a computer per person. They locked them in for 10 days, and annotated IIRC over 20,000 genes, basically doing some man-years of work in a week, inputting data so it can be searched, analyzed, and crossreferenced.
They do have a comparison between mouse and human genome there, I wonder if something similar could be done in open source in terms of annotating and indexing a library of open source code in different languages, really all in one pseudo language would be more useful perhaps. Anyway biologists are learning from computer scientists learning from mathematicians, and someone famous has said that in the future, all science will be computer science.
Bioinformatics people are doing text mining and data mining, but also there are many flavors and types of analysis programs designed to penetrate and match up information as encoded by tiny molecules, folded proteins, genes, and so on. Here are some links to get started. Also note the Perl for bioinformatics books, and there was a big O'Reilly bioinformatics conference archived from 2003 and other links too (see bio.oreilly.org link below).
I cannot speak for everyone, but I can convey what I have heard, that there have long been communication gaps that have held back some of this, actually cultural differences. For example physicists like pure math and biologists deal in dirty, wet things.. when people successfully combine different perspectives in this area [more] discoveries start getting made. In Japan at least they are trying to figure out how to grow more bioinformaticists, since students tend to go only towards either biology or towards computer science (why study twice as hard). But there seems to be a lot of interesting stuff in there for both sides.
-
Idea not new...
Lincoln Stein, a very famous bioinformatics programmer and creator of a vast number of Perl modules like CGI and GD, gave a talk about this last month.
The talk was his acceptance speech for the 2004 Benjamin Franklin Award at the BioIT-World conference.
The award was presented by bioinformatics.org. In his speech Lincoln talked about essentially open sourcing the R&D process and leaving the manufacturing and distribution to big Pharma. Thus, in theory, allowing academic R&D to push new drugs towards current public health concerns versus the money making drugs big Pharma produces now.
Not likely to happen but interesting to think about.
-
Check out bioinformatics.org...
....i.e., right here. Looks sort of GForge-ish, although with frames and a custom theme and such-like...
-
Re:Out of bad things to say
That's already happening: Genetic Technologies, a company that patented the "junk DNA" is called the "SCO Group of biotech" in this article.
-
Getting to be a crowded market:
-
"Scientific Applications on Linux" page...
It's not 'hard numbers', but then, a lot of people have already pointed out that hard numbers may not REALLY be what you want. (After all, since when is "Everybody's doin' it" a persuasive argument for a good scientist?)
On the other hand, I see there are still lots of applications listed at the Scientific Applications on Linux site and the NCBI Toolbox of Bioinformatics code compiles and runs just fine on my Linux box, and BioPerl, BioJava, and BioPython all run just fine on Linux (there are even a couple of fledgling BioPHP projects just getting started out there, which will obviously also work).
Disclaimer - both of the semi-active "BioPHP" type projects that I know of - Here and here - were started independently by individual amateurs...and one of them is me. Both projects are still in the early stages (Genephp has more code available at the moment) and have different development approaches, but are slowly working on trying to combine development towards a 'formal' set of "BioPHP" modules. (Blatant plug - if you are interested in helping with friendly advice or actual development or testing, please join the mailing list which both projects use.)
-
Re:Red Hat is Headed for Extinction
I recently evaluated several Linux distros for our Beowulf cluster, and we chose Rocks Linux. This OS is designed specifically to make it dead easy to set up clusters. Rocks is built on top of stock Red Hat. They do some magic with the kickstart installer to automatically set up the compute nodes. All the information about the nodes (MAC, IP address, hostname) goes into a MySQL DB.
We are actually using a derivative of rocks, called BioBrew. BioBrew also comes with software for biologists:
the NCBI toolkit, BLAST, mpiBLAST, HMMER, ClustalW, GROMACS, PHYLIP, WISE, FASTA, and EMBOSS.
-
Re:I've long waited for this
Well I'm afraid I agree with the troll 8-) although I will put it in a less trollish way.
For a tool like a "graphic command line" to achieve widespread acceptance, it has to be both practical and easy to learn. I was thinking of something like Piper when I posted the first comment. Piper has the following "paradigm" to connect different pieces:
- Everything is a component.
- Every component can accept (input) or produce (output) data, or do both.
- Components can be connected or "piped" according to their input and output of data.
- All components have a network "location." Components can therefore be referred to as "loci."
- Nodes are only represented locally, if possible.
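The first three points of that paradigm can be sketched in a few lines. This is only an illustration of the idea of components joined by their inputs and outputs, not Piper's actual API; the `Component` class, the `|` operator overload, and the sample components are all hypothetical, and the network "location" part is not modelled at all.

```python
from typing import Callable, Iterable, List


class Component:
    """A node that accepts an input stream, produces an output stream, or both."""

    def __init__(self, name: str, func: Callable[[Iterable[str]], Iterable[str]]):
        self.name = name
        self.func = func

    def __or__(self, other: "Component") -> "Component":
        # Joining two components yields a new component, like `a | b` in a shell.
        def chained(stream: Iterable[str]) -> Iterable[str]:
            return other.func(self.func(stream))

        return Component(f"{self.name}|{other.name}", chained)

    def run(self, stream: Iterable[str] = ()) -> List[str]:
        """Feed a stream through the component and collect the output."""
        return list(self.func(stream))


# A pure producer: ignores its input and emits data.
source = Component("source", lambda _: iter(["gamma", "alpha", "beta", "alpha"]))
# A filter: consumes and produces.
upper = Component("upper", lambda items: (item.upper() for item in items))
# Another filter, comparable to `sort -u` on the command line.
sort_unique = Component("sort-u", lambda items: sorted(set(items)))

pipeline = source | upper | sort_unique

if __name__ == "__main__":
    print(pipeline.name)
    print(pipeline.run())
```

A graphical front end would draw each `Component` as a node and each `|` as a line between nodes; the data flow underneath stays the same.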
If a project like Piper had a wrapper to access it from Karamba, for example, it would have the eye-candy and the consistent and well-known interface that it currently lacks. That kind of friendliness to the end user is necessary, and I think that if Piper could be used directly from KDE it would have more acceptance.
And why do you call KDE and Gnome proprietary?
;-) They are less proprietary than Unix pipes!
-
Sure
http://bioinformatics.org/
End of thread
-
Bioinformatics runs on Open Source
So don't sit there saying it's hard to get into
:)
Run, don't walk to bioinformatics.org and contribute!
The first O'Reilly bioinformatics conference rocked. Shame I won't make the next one in San Diego - I get to go to Adelaide for the ISMB in June instead :)
-
We Don't Have To Be
John is an accountant. He determines how his company's money adds up. That's what he went to school for, that's why he was hired.
John gets cancer. John goes in for new treatment with new cancer drug. New cancer drug was found because of Free Software written for biological research and improved upon by scientist-programmers all over the world. John's life is extended or even saved because people could all contribute to the software that researchers were able to use to make something valuable to everyone.
Sally is a housewife. She uses a computer to do things for her family. She has no time to write a driver for the new GeForce card, Jimmy's braces are way more important than some piece of software.
Jimmy's orthodontist uses a closed-source OS in the office for everything. This closed-source OS has a security hole. Not only that, but it's a known security hole that the company decided wasn't worth fixing. So even though the computer is regularly auto-updated, this hole remains unpatched because the corporation decided not to fix it. The orthodontist's computer is broken into and Sally's credit card information is stolen, and all the billing records for the orthodontist are stolen. This causes incredible headache for Sally over the next year or more.
We don't all have to be programmers to benefit from freedoms. We don't all have to be writers to benefit from freedom of speech, because we can all read what others have written and learn from it. We don't all have to be recluses to benefit from a right to privacy. Freedoms benefit you in more ways than you can realize, and it is a sign of enslavement when you're willing to sacrifice them for nothing.
-
Open source bioinformatics tools
Get open source bioinformatics tools from:
bioinformatics.org
bioperl.org
biojava.org
and even www.cvbig.org for a talk on bioinformatics with PHP/Ming
-
Additional open source bioinformatics projects
You can also find a large number of open source bioinformatics projects hosted at
Bioinformatics.org
with links to BioPerl, BioPython, BioXML, BioJava, BioCORBA, and BioRuby projects on the lower right hand side of their page.
-
Re:Too much focus on majors nowadays anyway...
I don't agree with the implication that only those who major in a broad field such as CS, English or Biology develop "problem solving skills, observation skills, etc". I'm currently a Bioinformatics student at RIT, and I don't consider myself unable to solve problems or unobservant, nor do I think that I will become so after I get my diploma.
Many fields such as Biology are becoming so broad that it is impossible to have an undergraduate major that sufficiently covers all of the relevant topics. Many colleges offer degrees in Molecular Biology, Pathology, and Biotechnology for students who have research interests that they want to specialize in, or those who want jobs with pharmaceutical companies. Biotech and pharmaceutical companies would much rather hire someone with lab experience in Molecular Biology and a good foundation of the theory behind it than a Biology major who has their ichthyology down cold but wouldn't know a lysozyme from a solution of granzymes and perforins, much less how to use either. The development of more specific majors most likely arises from the fact that colleges realize that there is only so much that they can do. Students are allowed to specialize in things that they will actually use in their future careers, and there is nothing that says that a technologically-geared education precludes them from having brains.
-
Here's another site that follows this stuff
bioinformatics.org
They host a large number of these open-source bio software projects. Really worth a look if you're interested in the topic.
-
Re:Graphical Pipes
I expect somebody can point us at a project that has already done this?
Not sure how far along it is, but a project which I've been following off-and-on for a couple months is located here, called Piper. From their page:
ABSTRACT
Piper is a peer-to-peer (P2P) distributed workflow system. It is an independent, GNU-based project which brings the power and flexibility of the GNU/UNIX command-line interface (CLI) to the graphical user interface (GUI) and Internet-distributed computing.
Networks, programs, files, widgets, and so on, can be Internet-distributed components represented in a GUI as the nodes of a flow chart. The user can join nodes via lines that depict links for data flow, procedural steps, relationships, and so forth.
-
Piper
Piper is a peer-to-peer distributed workflow system that brings the UNIX paradigm to the GUI and GUI features to CLI programs.
It has been called an "Open Source alternative to .NET", although it is by no means a clone. Rather, it focuses on extending existing UNIX features and programs to the Internet, where they haven't been before.
Perhaps we don't need a clone, just as Linux is not a clone of Windows. And it's a good thing it is not.
Here are some articles and mentions of Piper:
Gnome Gnotices (It's interesting to note that the article first posted there referred to Piper as an alternative to .Net. The moderator later changed that. Paranoid minds, such as mine, wonder about this and the future intentions of GNOME with respect to .Net.)
And some other online magazines/forums:
--
This sort of thing has cropped up before. And it has always been due to human error.
-
Conference on Open-Access Publications
Here are some of the issues that scientists (and publishers) are dealing with:
- Copyright on scientific communications (published articles and so forth) belongs to publishing companies and not to authors, for most publications. Scientists wishing to share relevant communications, even their own in some cases, face legal challenges from publishers.
- Publishing companies charge expensive subscriptions to access scientific communications. Scientists in developing countries and poorly-endowed institutions, although intellectually on par with their peers, are severely hindered by this.
- These two problems have prevented scientists from gaining any access, even for simple searches, to the full text of these communications.
- Scientific communications are published in journals segregated by topic. This has resulted in confusion as to the best place to publish, retrieve or extract (using computer automation) information (e.g., mathematical biology communications could be published in either a mathematical journal or a biological one).
- Communications are also published in journals differing by publisher. This has caused the segregation of communications by the prestige of the journal (e.g., how difficult it is to be published in the journal and the composition of the readership). This has also allowed room for personal politics in scientific communication.
- These two problems are compounded by the first two: with a limited budget, to which journals should one subscribe? What we are left with is an artificial selection, by publishers, of which communications are best suited to a scientist's field of study.
- This may be the result of a competitive marketplace for readership, but is there an alternative to profit-based publications? Should there be? Can an alternative publication model be profitable for a publisher?
- Additionally, even with the advent of computers, databases, and the World Wide Web, scientific communications are published as they were 100 years ago: as linear, printable text. And they are archived this way. While this makes good reading, it is not the best format for information retrieval or extraction.
- All of these problems restrict information retrieval, extraction, and scientific inquiry. How do we resolve them? As the ultimate solution, should future communications be published in an "open-access, global knowledge-base"? Before or after information extraction techniques are applied?
Bioinformatics.org, an organization committed to freedom and openness in the field of bioinformatics (a very commercial field), is hosting a joint conference on open-access publications and information extraction in the biological sciences. We have sought several speakers who can address how the above problems might be solved. They come from the Public Library of Science, BioMed Central, and PubGene (mentioned on Slashdot before).
The conference will be in Copenhagen, Denmark, and there is room for more attendees. The first 50 can in fact register for free.
--
This sort of thing has cropped up before. And it has always been due to human error. -
Open Source Bioinformatics
Some people in the field are now releasing their software under Free/Open Source licenses. It may seem odd to non-scientists that the license is an issue. Isn't all scientific work free and open? Far from it, especially in bioinformatics, where, as you may have read, there is a lot of money involved.
A couple of organizations have taken it upon themselves to promote freedom and openness in bioinformatics. One, Bioinformatics.org, runs a modified version of SourceForge so that the community can do project management and collaboration on a community-run website. Bioinformatics.org has other services, such as website hosting, news forums, a software registry and repository, and more to come. The organization currently hosts 27 projects and has over 600 members. (Disclaimer: I am the Director of the organization.)
Another organization, The Open Bioinformatics Foundation, supports the development of several language libraries for bioinformatics, such as the famous BioPerl. They also host the BOSC conference mentioned in the post.
--
This sort of thing has cropped up before. And it has always been due to human error. -
Re:Licensing?
Sadly, given Celera's past history, it will almost certainly be proprietary. Although they have benefited immensely from government-funded research and data collection, they have refused to make their sequence data publicly available in GenBank. Most journals require you to publish your sequence data in GenBank as a condition for publication of papers related to the sequence data. Celera was granted a special exemption to this policy by Science when they published their paper on the human genome recently, and I anticipate a similar special exemption will be allowed for the mouse data as well, though I haven't closely followed what's going on with the mouse genome, since I work on Acetabularia acetabulum (this is my professor's web page, not mine, the views expressed here are not ...and so on)
If you want to analyze publicly available gene sequence data, you can use GenBank at NCBI and software from Bioinformatics.org. There is also a great directory of online molecular biology tools and information here
-
Piper
JXTA looks very much like the Piper project, mentioned on Slashdot before. Piper is developing a text shell very much like JXTA, but the big distinction is Piper's "connect-the-dots" GUI, which brings command-line functionality to the GUI, unlike most modern GUIs, which are really Apple Lisa work-alikes.
Piper is licensed under the GNU LGPL and is a merger between several GNU-licensed projects. It's community-developed, and the programmers are the copyright holders. It's not controlled by a big corp with big name programmers making big bucks. Stop by and lend a hand if you'd like.
--
This sort of thing has cropped up before. And it has always been due to human error. -
Command Compilation in Piper
For a document more directly related to this article (using Piper as a better UNIX GUI), try this link:
http://www.bioinformatics.org/piper/documentation/command-compilation.html
--
This sort of thing has cropped up before. And it has always been due to human error. -
Re:"P2P" my ass...
Uh hummm... Tom gave a rather interesting talk at a BioInformatics Open Source Conference (same mob that's into www.bio{perl|java|xml|python}.org and generic tools for hacking the genome) a month ago where he did discuss some of the relevance of peer-peer. The essence of peer-peer is basically a lack of centralised control (something that quite rightly annoys corporations) and dynamic reconnectivity (create new services by adapting old). Since I was there, I've scribbled down a transcript of his talk which may be of some interest (caveat... it's released under OpenContent but Tom should be given right of first proof to make sure I didn't take down his words in vain :-) so treat it as rough working notes until then). Basically we had the old point-point connectivity (think 1-1, e.g. ftp) of the old days, then the client-server paradigm (think 1-n, e.g. http) currently. Now we have an arbitrary n-n connection pattern where the programming style is not as clear. Different services have different patterns of usage, and new protocols/frameworks like BXXP are currently being explored. However, the value proposition is not gated communities (aka portals) but how many other groups find your services valuable (i.e. commons). You can't churn users through a limited set of data portals (cough*hotmail*cough) and influence/restrict their movement. Remember that commerce is built upon the premise of an economic good which is excludable and rivalrous, and peer-peer sorta tweaks that model quite seriously (hard to stop another peer replicating your "stuff"). This becomes a little more interesting when you're trying to search a couple of hundred terabytes of gene annotations, ESTs, microarray data, etc., as you want to combine both completeness (to maximise success) and a minimal covering set (to save costs).
Why are the big names interested? As ever, they want new drivers of growth (notice the PC market is becoming saturated). As for the buzzword du jour crowd, well that's what a cluebat is for :-).
LL
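To make the "completeness plus minimal covering set" trade-off concrete, here is a rough Python sketch of a greedy cover: keep picking the peer that holds the most still-missing datasets until everything wanted is covered. The peer names, the dataset labels and the `greedy_cover` helper are all made up for illustration, and nothing here comes from the talk. Greedy selection is only an approximation and will not always find the smallest possible set of peers.

```python
def greedy_cover(wanted, peers):
    """Choose peers until every wanted dataset is covered (greedy heuristic).

    wanted: set of dataset labels we need.
    peers:  dict mapping a (hypothetical) peer name to the set of labels it holds.
    Returns (chosen peer names in pick order, labels no peer could supply).
    """
    remaining = set(wanted)
    chosen = []
    while remaining:
        # Sorting the names makes ties resolve alphabetically, so runs are repeatable.
        best = max(sorted(peers),
                   key=lambda name: len(peers[name] & remaining),
                   default=None)
        if best is None or not (peers[best] & remaining):
            break  # nobody holds what is still missing
        chosen.append(best)
        remaining -= peers[best]
    return chosen, remaining


# Hypothetical peers and holdings, purely for illustration.
PEERS = {
    "peer_a": {"annotations", "ESTs"},
    "peer_b": {"ESTs"},
    "peer_c": {"microarray", "ESTs"},
}

if __name__ == "__main__":
    chosen, missing = greedy_cover({"annotations", "ESTs", "microarray"}, PEERS)
    print(chosen, missing)
```

Completeness shows up as an empty second return value; cost shows up as the length of the first. Anything left in the second value is data that no known peer can supply.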
-
I know of at least two X11 implementations...
There's Piper, which is exactly what you are talking about... I think it assumes that programs always have stdin and stdout, and you simply draw a data flow diagram showing the connections between them. And then you can save a configuration for later use.
The older thing like this that I know of is Khoros, a library of image processing utilities, which you could wire together with Cantata. However, Cantata allows each module to have multiple inputs and outputs (for example, there might be a module which takes two images and blends them together). Seems like a really good idea to me, and I think the GUI could be reused for other tasks besides image processing, because Cantata is only a shell for starting up multiple processes and connecting their inputs and outputs together. Khoral Research tries to make money off this product, but it looks like you can still download some stuff from ftp://ftp.khoral.com/pub/khoros/. I was running it on my Linux box in 1996 or so, and prior to that, had used it at ASU on a Sun.
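For anyone who wants to see the underlying idea without a GUI, here is a minimal Python sketch of the "every program has stdin and stdout, just connect them" model. It is not Piper's or Cantata's API. The `run_pipeline` helper and the two filter programs are invented for illustration, and it runs the stages one after another instead of concurrently as a real shell pipe would.

```python
import subprocess
import sys

# Two hypothetical filter programs. Each is a tiny Python one-liner so the
# sketch runs anywhere; any program that reads stdin and writes stdout would do.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
SORT = [sys.executable, "-c",
        "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def run_pipeline(stages, input_text):
    """Run each stage in order, feeding its stdout to the next stage's stdin."""
    data = input_text
    for argv in stages:
        result = subprocess.run(argv, input=data, capture_output=True,
                                text=True, check=True)
        data = result.stdout
    return data


if __name__ == "__main__":
    # The list of stages plays the role of a saved configuration.
    print(run_pipeline([UPPER, SORT], "gattaca\nacgt\n"), end="")
```

Because the pipeline is just a list of argument lists, it is plain data that could be written to a file and reloaded later, which is roughly what saving a configuration amounts to. Modules with several inputs and outputs, as in Cantata, would need a graph instead of a flat list.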
-
more on the genome rape and pillage
We've been following stories on how greedy corporations are trying to lay claim to the information in your genome. You can find excerpts and links at...
BIOINFORMATICS.ORG: The Open Lab
BIOINFORMATICS.ORG: The Open Lab is a non-profit scientific organization for research, development, and information projects in the field of bioinformatics (biological information). We stand for 'open-source science', or the application of open-source ideals to science. Of course, this means we're against patenting scientific information.
Jeff
--
This sort of thing has cropped up before. And it has always been due to human error.