Bioinformatics
tadghin pointed out this Newsweek article on bioinformatics, and also notes: "At O'Reilly, we just published our first bioinformatics book last week, Learning Bioinformatics Computer Skills, by Cynthia Gibas and Per Jambeck, and it immediately rocketed to the top of the Amazon Computer bestseller list. This definitely appears to be a new area for the computer industry that's just starting to hit people's radar big time. I've also made the point to VCs looking at distributed computation startups that what I see on sites like slashdot is a lot of movement by hackers towards new and interesting problems. And science looks a lot more interesting than some of the business computing that's been front and center the past couple of years. And the Biological Open Source Computing Conference I spoke at last year was definitely popping with ideas and excitement. Unfortunately, this year's conference is in Copenhagen, right before the O'Reilly open source convention, but I definitely urge slashdotters to check out this area. Demand for perl expertise is especially high."
No, what you see on sites like Slashdot is a lot of talking by bored sys admins about new and interesting problems they wish they could work on.
I haven't read the book myself, although I did know one of the authors (Per Jambeck) in grad school (in fact I still have his copy of Knuth's "The Metafont Book" if he's looking for it). I doubt the book is fluff; it's just not aimed at CS folk. Like all new sciences, bioinformatics is done by people coming from other areas. If you are looking for a book about bioinformatics for CS folks who are non-biologists, look at Dan Gusfield's "Algorithms on Strings, Trees, and Sequences" (1997), although it is beginning to be a bit dated.
As I mentioned in another posting, Dan Gusfield's "Algorithms on Strings, Trees, and Sequences" is good, although getting a bit dated now. Another excellent book is Durbin et al.'s "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids".
Take some code, say the tiniest known CSS descrambler in C. Maybe compress it into a nice tight .zip/.gz binary. Now convert it to a DNA sequence. (It seems you could actually make a couple of possible sequences by switching around the letters.) I wonder what the odds are of finding one of these sequences in the billions of combinations currently being sequenced?
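For anyone who wants to try the conversion, here is a minimal sketch. It assumes the obvious packing of two bits per nucleotide (so four nucleotides per byte); the bit-to-letter mapping and the function names are my own illustration, not any standard. "Switching around the letters" then just means picking a different ordering of A, C, G, T, and there are at most 24 of those.

```python
from itertools import permutations


def bytes_to_dna(data: bytes, alphabet: str = "ACGT") -> str:
    """Encode each byte as four nucleotides, two bits per base, high bits first.

    The default mapping 00->A, 01->C, 10->G, 11->T is an arbitrary assumption.
    """
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(alphabet[(byte >> shift) & 0b11])
    return "".join(bases)


def all_letter_assignments(data: bytes) -> set:
    """Return the distinct sequences obtained by permuting the four letters."""
    return {bytes_to_dna(data, "".join(p)) for p in permutations("ACGT")}


if __name__ == "__main__":
    payload = b"hello"  # stand-in for the compressed program
    print(bytes_to_dna(payload))
    print(len(all_letter_assignments(payload)), "distinct letter assignments")
```

Any compressed file can be fed in as `payload`; the output length is always four times the number of bytes.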
-------------------
This is my SIG. There are many like it, but this one is mine.
I happen to be involved in the biotechnology industry and I live in Philadelphia, so I know a thing or two about the subject. You, on the other hand, do not. I also went to business school, as in finance, economics, and all that jazz, so you're way off base there as well.
You ignore many fundamental issues in this business:
There is strong competition. This means that it is very rare for any one company to totally dominate a market, especially for a prolonged period of time. From an offensive point of view, this means that a company with its hands on a cure is not choosing between the cure and owning a market outright, but between the cure and owning a sliver of that market, and even the sliver is at risk if the company does not come out with better alternatives as time progresses. With a "cure", a company would:
1) be free to charge a lot for it. HMOs and insurers would prefer to pay for a cure like this, especially when you consider that so many of the costs that they pay go not to any one drug company, but (mostly) to the thousands of other ailments ASSOCIATED with that disease. (e.g., hiring doctors, nurses, medical equipment, etc).
2) have relatively low risk. This, in financial terms, is equivalent to money.
3) have quick turnover. When you compare that to the average 10+ year time to market for the drug companies, that's like a dream come true. Put simply, 7b dollars today is worth a hell of a lot more to any one of these companies than 10b dollars over 5 years. This, again, translates to money. Hint: Those dollars could have been invested in less risky ventures and returned more.
4) would allow the company to take the entire market, rather than just a sliver. Meaning more money...
5) saves on-going R&D dollars
6) establishes a solid reputation...
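Point 3 is a time-value-of-money argument, and it can be checked with a few lines of arithmetic. This is only a sketch under my own assumptions: the 10b arrives as five equal year-end payments of 2b, and the discount rates are picked for illustration. Neither the payment schedule nor the rates come from any real company.

```python
def present_value(cash_flows, rate):
    """Discount a list of year-end cash flows back to today at a fixed annual rate."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    upfront = 7.0          # billions, received today
    spread = [2.0] * 5     # assumed: 10b as 2b per year for 5 years
    for rate in (0.05, 0.10, 0.15, 0.20):  # assumed discount rates
        pv = present_value(spread, rate)
        better = "upfront" if upfront > pv else "spread"
        print(f"rate {rate:.0%}: PV of 10b over 5 years = {pv:.2f}b -> {better} wins")
```

Under these assumptions the comparison hinges on the discount rate: the break-even sits somewhere between 10% and 15%, and the riskier the five-year stream, the higher the rate a company would reasonably apply to it.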
In addition, sitting on a cure also can easily become a defensive problem, when and if competitors find it for themselves. All those minority players in a given market would have plenty of motivation to release a cure if they had it. Meanwhile, the company that sits on it risks losing all their previous sales.
I could go on, but you just don't get it. Now this is not to say that it's so cut and dried that a company would never fail to invest in the discovery of a cure. There are certain times when the alignment of certain circumstances, say, risk, market size, or peculiarities of the disease, may prevent a company from investing large sums of money in a cure, but if you think companies sit on their hands in large and lucrative markets where such an opportunity is clearly exploitable, you're only kidding yourself.
A couple organizations have taken it upon themselves to promote freedom and openness in bioinformatics. One, Bioinformatics.org, has a modified version of SourceForge so that the community can perform project management and collaborations on a community-run website. Bioinformatics.org has other services, such as website hosting, news forums, a software registry and repository, and more to come. The organization currently hosts 27 projects and has over 600 members. (Disclaimer: I am the Director of the organization.)
Another organization, The Open Bioinformatics Foundation, supports the development of several language libraries for bioinformatics, such as the famous BioPerl. They also host the BOSC conference mentioned in the post.
--
This sort of thing has cropped up before. And it has always been due to human error.
HAL9000
It is interesting to see bioinformatics in the spotlight. I used to work at a place trying to do "meaning-based search" in the medical field. They were working on, among other things, ontology-based search and a search for protein-gene relationships for quicker drug discovery.
/labbook and a host of others are working on this stuff to sell to pharma companies to do better search and allow quicker, more accurate drug creation. The thinking is that if you can make a pharma company discover drugs faster than the rest, you can charge a boatload of money for the software. Discovering new drugs while keeping the side effects minimal is non-trivial.
We also had a doctor on board before the money started to run out. It helps because the biology terms are very foreign to computer types (assay, gene clips, etc.).
There was a paper in the office by some professor who trained a Brill learning algorithm on existing genes and then had it try to guess what random genes did. It did very well in the test despite the "primitive" AI.
3rdmill and spotfire
There is a lot of computing power in the life sciences field, and a lot of data created with gene clips and assays. People can't sort it all out by hand anymore, so computer analysis makes everything faster. Look at the human genome. Computers made it happen.
"Sit back and enjoy the chaos" -Unknown
There are two factors that I think are driving the emergence of bioinformatics: culture and data explosion.
When I was in college, the computer science majors "hung out" with the math majors, the physics majors, and the electrical engineering majors. Biologists hung out with the less analytical crowd. Obviously these are generalizations, but I believe a lot of "the problem" is that culturally biologists just don't have very good computer skills. Suddenly it is the case that biology as a science absolutely requires these skills. If you were one of the few (and some do exist) who broke the stereotype, you need to be starting a company about now. Otherwise the race is on for the biologists to learn programming and the CS-math-physics types to learn biology.
Second is the fact that biologists are drowning in data. Projects like the human genome project are producing lots of data, but that's just the tip of the iceberg. There is already an exploding market in high-throughput assays and measurement computation. The result is that the field as a whole simply isn't managing its data well. Often groups store their data in extremely crappy formats: custom text formats, ASN.1, etc. I'm an Oracle programmer, so I expect the kind of solutions that banks and .coms use: big iron data warehouses running heavy-duty RDBMSs like Oracle and DB2. Nope. I have yet to come across a single bioinformatics project that has a clue about data modelling. It's actually much above average to use a database at all, let alone well. If I were head of the NIH, you can bet that freshman biologists would take a class in SQL starting immediately.
When you combine the two factors, culture and data inundation, very strange things start to happen. The data infrastructure just isn't there and, worse, a lot of people just don't realize it. Biology is presenting problems that require massive data warehousing solutions to a field whose main data background is calculating p-values to show that the effect of a drug is significant.
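To make the data modelling point concrete, here is a minimal sketch of what even a toy relational schema buys you over a custom text format. Everything in it is an assumption for illustration: the `gene` and `assay_result` tables, the column names, and the sample values are made up, and it uses Python's built-in sqlite3 module as a stand-in for Oracle or DB2.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table assay schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE gene (
            gene_id INTEGER PRIMARY KEY,
            symbol  TEXT NOT NULL UNIQUE
        );
        CREATE TABLE assay_result (
            result_id  INTEGER PRIMARY KEY,
            gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
            sample     TEXT NOT NULL,
            expression REAL NOT NULL
        );
    """)
    # Made-up rows; real data would come from the instrument output.
    conn.executemany("INSERT INTO gene (gene_id, symbol) VALUES (?, ?)",
                     [(1, "GENE_A"), (2, "GENE_B")])
    conn.executemany(
        "INSERT INTO assay_result (gene_id, sample, expression) VALUES (?, ?, ?)",
        [(1, "s1", 2.0), (1, "s2", 4.0), (2, "s1", 1.5)])
    conn.commit()
    return conn


def mean_expression(conn: sqlite3.Connection) -> list:
    """Average expression per gene, a query a flat text file makes painful."""
    return conn.execute("""
        SELECT g.symbol, AVG(r.expression)
        FROM gene g JOIN assay_result r ON r.gene_id = g.gene_id
        GROUP BY g.symbol
        ORDER BY g.symbol
    """).fetchall()


if __name__ == "__main__":
    for symbol, avg in mean_expression(build_demo_db()):
        print(symbol, avg)
```

The point of the foreign key and the UNIQUE constraint is that the database, not each lab's parsing script, enforces that every measurement refers to a known gene.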
hehe.. I thought they only created vaccines for things that would kill you. That way, by creating the vaccine they will actually make MORE money off you because you will live longer and spend more money on cough syrup and headache pills ;)
There is something to be said for this position (that drug companies can't make money by curing diseases, only by selling drugs that treat symptoms); however, it is a somewhat alarmist position, at least the way it has been expressed here. I don't know why it would be surprising to see a company invest in technology that will generate future profits.
What bothers me about this issue is the futile attempts the federal government has made to regulate biological research with respect to use of the Genome Project data in such morally ambiguous areas as human cloning. The attempts to regulate this field of research are futile, as they are being handled now, since the industry's profit potential is so high that virtually unlimited funds will be expended to house research facilities in places beyond the borders of countries that choose to regulate this field of research.
While on the subject, I'd like to applaud the genome project researchers for embracing the concept of 'Open Source' science. There were a number of firms that actively tried to gather together and copyright genome project data.
Well done gentlemen!
You have allowed the creation of an entirely new field of science. The openness of the research data will reduce the perceived moral ambiguity of the derivative works based on that data.
--CTH
--
--Got Lists? | Top 95 Star Wars Line
Eventually, the proponents of bioinformatics claim, the new field will change health care by allowing pharmaceutical companies to shave years off the drug-discovery process, and letting doctors tailor medicines to an individual's genetic makeup.
Pharmaceutical companies are around to make money. That's why they create drugs that treat symptoms and not drugs that are cures. Now they're investing in ways to make more money from us. Great.
-... ---
Stripped of header and gzipped, I get 366 bytes; at four nucleotides per byte (two bits each), that is 1464 nucleotides.
The probability of any two random sequences of the same length being equal is the inverse of the number of expressible sequences of that length. In this case, the number of sequences is 4^length, so the probability is 1/4^length.
When you are looking for a random sequence (of length N) within a longer sequence (length M), the probability of finding it is roughly the above probability multiplied by M-N (the same chance over and over again for every sub-sequence of length N, assuming you don't count wrapping substrings).
So N=1464, and M equals roughly 3 billion. So the probability is:
(3*10^9-1464)/4^1464
Which is in the neighborhood of 1 in 10^872, far longer odds than even one to a squared googol (10^200).
Of course, that assumes random data, but I figure it's a good enough approximation.
Don't knock yourself out looking for it. It's not there.
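If you want to check the arithmetic yourself, here is a small sketch. It uses the same assumptions as above (uniformly random bases, M-N candidate positions, N = 366 bytes times 4, M of roughly 3 billion) and works in log10, because 4^1464 is far too large to divide by as a float. The function name is just my own label.

```python
import math


def log10_match_probability(n: int, m: int) -> float:
    """log10 of (m - n) / 4**n, the approximate chance of a length-n hit in length m."""
    windows = m - n                      # candidate start positions, as in the formula above
    return math.log10(windows) - n * math.log10(4)


if __name__ == "__main__":
    N = 366 * 4            # nucleotides in the encoded program
    M = 3_000_000_000      # rough genome length used above
    print(f"N = {N}, log10(p) is about {log10_match_probability(N, M):.2f}")
```

The exponent comes out near -872, so the sequence length dominates everything; the 3 billion positions only buy back about nine and a half orders of magnitude.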