Slashdot Mirror


Bioinformatics

tadghin pointed out this Newsweek article on bioinformatics, and also notes: "At O'Reilly, we just published our first bioinformatics book last week, Learning Bioinformatics Computer Skills, by Cynthia Gibas and Per Jambeck, and it immediately rocketed to the top of the Amazon Computer bestseller list. This definitely appears to be a new area for the computer industry that's just starting to hit people's radar big time. I've also made the point to VCs looking at distributed computation startups that what I see on sites like slashdot is a lot of movement by hackers towards new and interesting problems. And science looks a lot more interesting than some of the business computing that's been front and center the past couple of years. And the Biological Open Source Computing Conference I spoke at last year was definitely popping with ideas and excitement. Unfortunately, this year's conference is in Copenhagen, right before the O'Reilly open source convention, but I definitely urge slashdotters to check out this area. Demand for perl expertise is especially high."

51 of 105 comments

  1. Slashdotted already by Anonymous Coward · · Score: 2
    Here is the text.

    Craig Benham has a problem. As a professor at Mount Sinai School of Medicine in New York, he trains students in the exploding new field of bioinformatics, the fusion of high-powered computing and biology that is aimed at revolutionizing the health-care industry. But Benham can't keep a postdoctorate researcher for more than a year. They keep leaving for jobs that pay up to $100,000 at bioinformatics start-ups, giant pharmaceutical companies or technology giants like Motorola and IBM that are targeting the rapidly growing life-sciences field. "These companies need a whole new class of biologists who have training in the computational and mathematical methods," Benham says. "I've got one former student who has been hired four times in three years, increasing his salary 30 percent each time. There's huge demand for these skills." Benham knows of what he speaks: this summer he will join the University of California, Davis, heading up its new $95 million bioinformatics program.

  2. What you see on Slashdot ... by Anonymous Coward · · Score: 3
    what I see on sites like slashdot is a lot of movement by hackers towards new and interesting problems

    No, what you see on sites like Slashdot is a lot of talking by bored sys admins about new and interesting problems they wish they could work on.

  3. Re:AI and Bioinformatics by Tim · · Score: 2

    "A lot of the work that's been done so far has been done by biologists who happen to be able to program, rather than by programmers who have learned the biology. As a result, a lot of the work uses inefficient algorithms, primitive approaches, bad statistics, and the like....Somebody who actually knows interesting new algorithms that can be applied to the problems can do even more."

    This is kind of a bad generalization to make. The software that has achieved notoriety and widespread use, while primitive in method (i.e. dynamic programming--boring, but widespread), is often based on very, very solid statistical theory. To the point where I almost find the "programmer" appeal to computational biology laughable--you'd be better served with some advanced statistics knowledge under your belt, rather than some programming knowledge, frankly.

    Also, as a student in comp bio myself, I can't tell you the number of times I've heard computational "biologists" stand up and give silly lectures on new algorithms to resolve solved problems (but in slightly faster time), or worse, completely abstract away the relevant details of a biological system in order to make new applications for their fancy new methods. While, yes, there's a danger to having a poor grasp on CS skills and doing computational biology work, this danger is significantly smaller than for those who are doing the same work without the biology skills. In my experience, it usually works like this: a biologist who can sort-of program will tend to write ugly code that gets the job done. A computer scientist who sort-of knows biology will get nowhere fast.
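
    For readers who have not met the "boring, but widespread" dynamic programming mentioned above, here is a minimal sketch of a global alignment score in the Needleman-Wunsch style. The scoring values (match, mismatch, gap) are illustrative assumptions, not taken from any particular package, and real tools layer the statistical theory on top of a recurrence like this one.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score only (no traceback); scoring values are illustrative."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

    The table is filled row by row, so the cost is proportional to the product of the two sequence lengths. That is fine for a pair of genes and is the reason database search tools fall back on heuristics.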

    --
    Let's try not to let fact interfere with our speculation here, OK?
  4. Re:Bioinformatics by mvw · · Score: 2
    Some words in advance:

    I worked for a company in cheminformatics, so to speak: we wrote software to gather and evaluate spectral and structural data, and to store and retrieve it from a large database. Then I went to a company that developed software for banks. Today I work in a bioinformatics company.

    The scenario was roughly the same, a lot of data in one or various databases, plus software to browse and manipulate that data. The difference is probably in the scale, the sheer amount of data, which is huge in bioinformatics.

    Compared to the guys from the financial software, the physical chemists had to work really hard!

    The problems were advanced, and the number of customers (large chemical companies) was smaller than the number of financial institutes served by the second company.

    I believe the same will hold for bioinformatics. What I can't tell, however, are the margins. The bankers seemed to have a much better profit margin than the physical chemists. No idea what the bioinformatics customers are willing to pay. I expect pharmaceutical companies to be able to spend more on their tools and services than general chemistry companies.

    On the other hand, the present bioinformatics hype will probably lead to a lot of competition.

    So I am not sure what will happen. Could be a good market, could be a very tough market. What I am sure of however is that the job is very interesting. State of the art software development, state of the art scientific work.

    In addition the skills requirements usually include advanced degrees in biology or statistics, things few average programmers can offer.

    Yes and no. You need a diverse team of specialists. Of course you will have scientists there, some molecular biologists, and experts in genetics, perhaps some mathematicians or computer scientists. But because you need to create good software as well, you need very good software people. Good database people, good GUI programmers, good software architects etc. Even good system admins for the large machines.

    So people need to be specialists in their IT subject plus be able to work in the bioinformatics domain as well. Interesting for me to see that many physicists seem to have this profile.

  5. Re:Applying Open Source philosophy to Bioinformati by mvw · · Score: 2
    In addition the skills requirements usually include advanced degrees in biology or statistics, things few average programmers can offer.

    There are a lot of open source bioinformatics projects. These are typically spawned by university or other public research projects. You mention Python and Perl, so try bioperl or biopython for a start.

    The one thing I didn't like about the biotech industry was how their research and information distribution was tied closely to their purse strings.

    You will have a lot of open source (where the majority of development money will come from public research funds) and a lot of commercial applications.

    It is unlikely that a large bunch of hackers will revolutionize this field. This is because you need a lot of domain specific knowledge and because a lot of work that needs to be done is too tedious or uncool to attract open source people from outside the bioinformatics field.

    Something like the Gimp could be done, because nearly everyone needs such a tool - but who needs for example a multiple sequence alignment editor besides biologists?

    Have you seen any open source satellite control software or hydrodynamic simulation come from outside the engineering communities that use them?

  6. Re:Twenty Points To Whomever Finds DeCSS in DNA by mvw · · Score: 2
    I wonder what the odds are of finding one of these sequences in the billions of combinations currently being sequenced?

    But what is your reference DNA?

    There are regions on the chromosomes that are common to all individuals (like sequences that encode important cell machinery), while there are regions that vary more or less among individuals (e.g. those couple of nucleotides that differ between George Bush jr. and Al Gore :)

    And of course with ongoing research some of the DNA map data gets rewritten with more accurate versions (as has happened with the geographical world map in the past).

  7. Re:Why just Perl? by mvw · · Score: 2
    While Perl is great for cranking out some web sites with high mutation rates anyway, IMHO Perl is a maintenance nightmare.

    Anyone tried to make non-trivial changes to their old Perl programs?

    My Perl programs were those that were the hardest to get understood after I had stopped working with them for some weeks. It is usually easier to rewrite them.

    Nonetheless I valued the good performance of Perl programs and was thus sceptical of the other kids on the scripting language block, like Python.

    Months later, I must say that the much saner syntax of Python, the formidable documentation and the large library have changed my scripting preference from Perl to Python. Like Perl, Python has been ported to a lot of platforms.

    Ruby is a language I have not looked into yet. Its strong Japanese supporter base has led to a lot of FreeBSD ports. So I might have a look soon.

    BTW, there are bioperl, biopython, bioruby and biojava efforts - anyone spotted a bioc or bioc++ one? And some dork registered www.biofortran.org.

  8. Re:perl expertise. sure. by Jonathan · · Score: 2

    Perl (and other scripting languages) can call C/C++ routines fairly easily. I myself prefer to write the bulk of my bioinformatics code in a scripting language for easy modification, and only write the routines that really need speed in C++.

  9. Re:Don't forget the flip side by Jonathan · · Score: 2

    The computer scientists who don't know their biology are just as lost in the field as the biologists who don't know their computer science.

    True. If I have to sit through one more seminar where somebody thinks that they are doing bioinformatics by proving some unrealistic abstraction of a biological problem to be NP-hard, it will be one too many.

  10. Re:perl expertise. sure. by Jonathan · · Score: 2

    Actually there is a BioRuby but it is 1) fairly new and undeveloped and 2) mostly documented in Japanese, as most Ruby modules are. I myself like Ruby and have done several projects in it -- the problem though is that where I am now I have to work in a team, and Perl is the only scripting language that everybody knows.

  11. Re:Book looks like fluff by Jonathan · · Score: 3

    I haven't read the book myself, although I did know one of the authors (Per Jambeck) in grad school (in fact I still have his copy of Knuth's "The Metafont Book" if he's looking for it). I doubt the book is fluff, just not for CS folk. Like all new sciences, bioinformatics is done by people coming from other areas. If you are looking for a book about bioinformatics for CS folks who are non-biologists, look at Dan Gusfield's "Algorithms on Strings, Trees, and Sequences" (1997), although it is beginning to be a bit dated.

  12. Re:similar book for CS people? by Jonathan · · Score: 3

    As I mentioned in another posting, Dan Gusfield's "Algorithms on Strings, Trees and Sequences" is good, although getting a bit dated now. Another excellent book is Durbin, et al's "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids".

  13. Re:AI and Bioinformatics by Lars+Arvestad · · Score: 2
    The software that has achieved notoriety and widespread use, while primitive in method (i.e. dynamic programming--boring, but widespread), is often based on very, very solid statistical theory.

    The software packages that come to my mind when reading this are in fact written by statisticians and/or computer scientists... And if there is a rivalling package by a biologist, you'll see that they have often picked up statistics and methods from their competitors.

    Also, as a student in comp bio myself, I can't tell you the number of times I've heard computational "biologists" stand up and give silly lectures on new algorithms to resolve solved problems (but in slightly faster time), or worse, completely abstract away the relevant details of a biological system in order to make new applications for their fancy new methods.

    I have numerous examples of biologists contending that their heuristic is much better than all the other heuristics (well, at least on their own dataset). There are also excellent examples of biologists promoting their version of a traditional greedy heuristic as a "new algorithm" for solving an NP-complete problem. Have you looked at protein folding? That field is a source for the most repugnant oversimplifications ever made in science.

    My point is that pointing fingers at either discipline is ridiculous. There are offenders on both sides. The real path to successful bioinformatics is cooperation and humility. Biologists need to talk to CS, and CS must talk to biology. I think this is generally well understood these days.

    Disclaimer: I am a computer scientist.

    Lars
    __

    --
    Reality or nothing.
  14. Re:AI and Bioinformatics by Lars+Arvestad · · Score: 2
    This line of debate gets silly fast. Is a "computer scientist" a computer scientist if they focus on biology, or take a biology-centric view of the world? Vice-versa? Yes, you can define people in computational biology as either computer scientists or biologists, depending on how you like to think of them. I'll agree with you that the best researchers are highly competent in both realms. But the absolute best are biologists at heart.

    I'd say that you should ask the scientist, and I don't think Stephen Altschul, Michael Waterman, Gene Myers, Webb Miller, Anders Krogh, David Haussler, David Sankoff, to name a few, would call themselves biologists. And you do agree that they have made significant contributions to computational biology, don't you?

    Please don't go into rating of scientists, because that is silly.

    In my experience, this happens more often among CS researchers in computational biology than it does among biologists.
    [snip]
    Biologists are certainly not innocent--after all, everyone has an ego--but the debates of this variety tend to be among two or three competing alternatives (i.e. the parsimony vs. likelihood debates in phylogeny) that have been accepted by a majority of the researchers in a field.

    Computer scientists are generally interested in methods, so yes, they are more likely to propose their own method than a biologist. New methods are a good thing, although not if they have no biological basis, sure. But most CS people I know actually try to communicate with biologists to try to establish what is relevant or not. It is not easy, especially when biologists are unsure themselves. You mention microarray data: the k-means or hierarchical clustering methods in use today are, in my opinion, quite without biological relevance too. Go look at some of the examples in the literature and start scrutinizing the computed clusters. They can look quite weird.

    The parsimony vs. ML is about whether they have biological relevance and are scientifically sound. If a method has to be accepted before you can start deciding whether it should be thrown out or not, nothing will ever happen.

    If Biologists were studying NP-complete problems, and not biology, then they couldn't be excused for not knowing the existing research in that field of computer science. But they're not. They're studying biology, and they're using whatever computational tools they need to do their job.

    If they write papers that actually are more about describing algorithms than applying them, then they should not be excused. What is wrong with walking over to the CS department and discussing methods a little? Why not try to do just a little bit more than the greedy approach to see if you can get an improvement? Using a hammer on a screw is not very impressive when your neighbour might have a screwdriver.

    Well, if you're going to make that kind of dispersion, you're going to have to be a lot more specific. Yes, there's a lot of bad literature on protein folding. A lot of it comes from polymer physicists.

    I am thinking of the "beads on a lattice" model, and I actually know molecular biologists who have worked on stuff like that, so I don't think you should blame the polymer physicists...

    "Repugnant" might be a bad word, English is not my native tongue, but my opinion is that the logical step needed to make conclusions on real proteins based on the simplistic lattice models is a giant leap that is very hard to defend. CS people have certainly worked on it, but they did not invent the field!

    Certainly. But not equal humility. Computer scientists are entering an entirely new discipline where their own skills are of lesser importance, and they need to understand that. The CS/biology trade-off isn't equal at all, IMO.

    Computer scientists have an interesting field in their own right and they don't need to excuse themselves in any way. Some make forays into computational biology that are not brilliant, but that is no reason to put down all those computer scientists who actually make the effort of learning the biology and, even better, talk to the biologists and make contributions.

    Lars
    __

    --
    Reality or nothing.
  15. Twenty Points To Whomever Finds DeCSS in DNA by VValdo · · Score: 3
    This seems to be a fun application of bioinformatics.

    Take some code, say the tiniest known CSS descrambler in C. Maybe compress it into a nice tight zip/.gz binary. Now convert it to a DNA sequence. (It seems you could actually make a couple of possible sequences by switching around the letters.) I wonder what the odds are of finding one of these sequences in the billions of combinations currently being sequenced? W
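
    One way to sketch the conversion, assuming two bits per base and a placeholder payload rather than the real descrambler: each byte becomes four letters, and "switching around the letters" corresponds to the 4! = 24 ways of assigning the four 2-bit values to A, C, G and T. The function names here are made up for illustration.

```python
import zlib
from itertools import permutations

def bytes_to_dna(data, alphabet="ACGT"):
    """Map each byte to four bases, two bits per base, most significant bits first."""
    return "".join(
        alphabet[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

def all_encodings(data):
    """One sequence per assignment of the four 2-bit values to the four letters."""
    return {bytes_to_dna(data, "".join(p)) for p in permutations("ACGT")}

# Placeholder payload: stands in for the compressed program, which is not reproduced here.
payload = zlib.compress(b"stand-in for the descrambler source")
sequence = bytes_to_dna(payload)
```

    As for the odds: under the crude assumption that bases are uniform and independent, a specific n-base sequence matches at a given position with probability 4^-n. Even a 100-byte payload is 400 bases, which is roughly 10^-241 per position, so the billions of bases being sequenced do not come close. Real genomes are not uniform random text, but that does not rescue the search.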
    -------------------

    --
    -------------------
    This is my SIG. There are many like it, but this one is mine.
  16. Free software. by FallLine · · Score: 2
    Pharmaceutical companies are around to make money. That's why they create drugs that treat symptoms and not drugs that are cures. Now they're investing in ways to make more money from us. Great.
    "Free Software coders are around to get fame and attention. That's why they create software incrementally and not perfect software from the start. Now they're trying to get more attention from us. Great."

    The difference is? Nothing. Both are totally unsubstantiated and ignore well established theory, common sense, and any first hand understanding of the subject matter.
  17. In some ways it is... by FallLine · · Score: 2

    The US and Canada combined account for about 50% of world wide pharmaceutical sales. Africa and Asia (excluding Japan) less than 5%. Others somewhere in between. What's more, this doesn't fully convey the important fact that US (and other high paying) consumers do more than that to carry the market. If the United States (and to a lesser extent other parts of the world) had drug prices as regulated and controlled as they are throughout Europe and even Canada, many drugs would NEVER come to market because the profits aren't enough. The drug companies sell to these countries because the price is just above their variable costs, meaning they make money, but not much.

  18. Re:Actually, yes, I do by FallLine · · Score: 2
    While your points may be sound for the general competitive marketplace, you overlook the clear anecdotal evidence to the contrary. There is case upon case where there are incredibly effective treatments (meaning a cure or one that gives a dramatic improvement to the patient's quality of life) that have *already* been developed and tested...and the product was PULLED.
    "Clear anecdotal evidence" to the contrary? What is that supposed to mean precisely? There are at least two possibilities here that I can think of. A) You're pulling it out of thin air. B) You fail to understand the real issues behind them, since you don't actually work directly with the product. The odds are, in fact, that your statements strongly imply that the extent of your experience with them is academic and well removed. If there are so many examples, name a couple please! That should not be much of a problem, right? Or is this Nth hand knowledge?

    Furthermore, to shorten a potentially infinite thread, you're mostly barking up the wrong tree. I never said these companies do not exist to make profits. I won't even apologize for that. What I will say, however, is that the mere fact that these companies exist, by and large, to increase shareholder wealth does not preclude the efforts to find cures. Quite the contrary, as I've laid out in my previous post, "cures" are a dream for most of the companies' shareholders most of the time. They generally WILL pursue them.

    To briefly address some of your other comments, "breaking even" is not really breaking even in financial terms. If by "breaking even", you mean investing 500m dollars, and getting a return on that investment of 500m (hopefully) some 15 years later, then that's actually LOSING money in financial terms. Besides inflation, you must also take into account opportunity cost. That money could have been invested in other places in the stock market and returned millions more. If you figure a very reasonable number like 10% a year, that's about 1.2 billion dollars. Then you must also factor in risk. If the market can only be 500m dollars, but may be less, then you're asking even more of the company. No matter what you think of these companies, it is simply not up to them; shareholders will simply take their money elsewhere. It's equivalent to asking the shareholders to give their money away.
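
    The opportunity-cost arithmetic can be checked with a short sketch. It assumes the whole 500m is invested up front and held for 15 years at 10% a year; the comment does not say whether interest compounds, so both conventions are shown, and the function names are made up for illustration.

```python
def future_value_compound(principal, rate, years):
    """Value after `years` with interest reinvested every year."""
    return principal * (1 + rate) ** years

def future_value_simple(principal, rate, years):
    """Value after `years` with interest paid on the original principal only."""
    return principal * (1 + rate * years)

principal = 500e6  # the 500m dollars from the comment
compound = future_value_compound(principal, 0.10, 15)  # roughly 2.09 billion
simple = future_value_simple(principal, 0.10, 15)      # 1.25 billion
```

    The "about 1.2 billion" quoted above sits close to the simple-interest figure. With annual compounding the alternative investment grows to roughly 2.09 billion, which only strengthens the point that getting 500m back after 15 years is a loss.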

    In addition, where these situations tend to arise, the benefits to society also tend to be relatively small. Not to mention the most important fact, that RESOURCES are scarce. It may sound horrible that the 10k people in the country with a rare genetic disorder do not receive their treatment, but remember that a decision MUST be made as to where to put it, because there simply is not enough to go around to every cause. The more lucrative markets also tend to be areas that society values more highly, areas where more lives can be saved/improved per dollar spent.

    Lastly, just because some companies are unwilling to pursue certain ventures does not mean other companies and/or the public sector are magically held back. If the other means do not work, that does not mean it is their fault. Do not penalize the only thing that really works. If you want to try to start a "break even" drug company, be my guest, the other drug companies aren't going to stop you from pursuing worthless markets. Or if you want to use the public sector as an alternative, again, be my guest. You won't get very far though, they have a lousy track record when it comes to actually making the end product. Just don't penalize the only system that works, you're only going to be harming those that you think you're trying to help.

  19. Actually, yes, I do by FallLine · · Score: 3

    I happen to be involved in the biotechnology industry and I live in Philadelphia, so I know a thing or two about the subject. You, on the other hand, do not. I also went to business school, as in finance, economics, and all that jazz, so you're way off base there as well.

    You ignore many fundamental issues in this business:

    There is strong competition. This means that it is very rare for any one company to totally dominate a market, especially for a prolonged period of time. From an offensive point of view, this means that a company with its hands on a cure would be choosing not from owning a market outright, but from owning a sliver of it, and even then with risk involved in not coming out with better alternatives as time progresses. With a "cure", a company would:

    1) be free to charge a lot for it. HMOs and insurers would prefer to pay for a cure like this, especially when you consider that so many of the costs that they pay go not to any one drug company, but (mostly) to the thousands of other ailments ASSOCIATED with that disease. (e.g., hiring doctors, nurses, medical equipment, etc).

    2) have relatively low risk. This, in financial terms, is equivalent to money.

    3) have quick turnover. When you compare that to the average 10+ year time to market for the drug companies, that's like a dream come true. Put simply, 7b dollars today is worth a hell of a lot more to any one of these companies than 10b dollars over 5 years. This again translates to money. Hint: Those dollars could have been invested in less risky ventures and returned more.

    4) would allow the company to take the entire market, rather than just a sliver. Meaning more money...

    5) saves on-going R&D dollars

    6) establishes a solid reputation...
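
    Point 3 is a present-value comparison, and a small sketch shows how much it depends on the discount rate. The payment schedule (2b at the end of each of 5 years) and the two rates are assumptions made for illustration; the comment gives neither.

```python
def present_value(cash_flows, rate):
    """Discount end-of-year cash flows back to today."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

spread = [2e9] * 5  # assumption: the 10b arrives as 2b a year for 5 years
pv_at_10 = present_value(spread, 0.10)  # roughly 7.58 billion
pv_at_15 = present_value(spread, 0.15)  # roughly 6.70 billion
```

    Under this schedule the 7b lump sum wins at a 15% discount rate but not at 10%, so the claim holds once the rate, which has to cover risk as well as the time value of money, is somewhere between the two or higher.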

    In addition, sitting on a cure also can easily become a defensive problem, when and if competitors find it for themselves. All those minority players in a given market would have plenty of motivation to release a cure if they had it. Meanwhile, the company that sits on it risks losing all their previous sales.

    I could go on, but you just don't get it. Now this is not to say that it's so cut and dried that a company would never fail to invest in the discovery of a cure. There are certain times when the alignment of certain circumstances, say, risk, market size, peculiarities of the disease, may prevent a company from investing large sums of money in a cure, but if you think companies sit on their hands in large and lucrative markets where such an opportunity is clearly exploitable you're only kidding yourself.

    1. Re:Actually, yes, I do by FallLine · · Score: 3
      So it's more lucrative to charge a person once rather than weekly for the rest of their lives? I can't see how that's possible.
      Why not? Who says that a series of pills must be sold for more than a single one (not that a cure is necessarily a single pill, in fact that's very unlikely)? Who says that the profits on those sales must be more? If you think it's impossible, you have little to no understanding of business, never mind the drug business.

      Ok put it this way, imagine you're Eli Lilly, you're in a drug market and sell 2b dollars a year with 30% of a given market. However, that 2b dollars a year product took 15 years to bring to market. (Hint: This depreciates the value of that return hugely). You've only been on the market 2 or 3 years and your patent will soon expire, meaning that your prices will get cut by 3x at least by the generics. Plus you've got other competitors banging at your door with alternatives today, chipping away at your sales. Furthermore, you should understand that the mere invention of that one drug was by no means assured, it was risky (investors demand a lot more return for taking on that kind of risk). You could very easily find yourself 3 or 4 years down the road without a single hit drug. In fact, to even have a hope of staying on top, you need to spend very substantial sums on R&D and marketing. In fact, only 3 out of 10 drugs on the market meet or exceed their R&D costs. Of those, only a small fraction will really generate your profits. Realistically, you're looking at a profit margin of about 15-9% (9 when you figure in depreciation), when all is said and done (remember only a very small fraction actually make it to market, let alone succeed), on a 2b dollar a year product. The picture I am painting is fairly close to reality.

      Now, imagine you're that same company, and you have a cure at hand (since you imply that they can do either just as easily). You can either continue down that same path (to the extent that you can control it) or you can bring the cure to market. The cure, if it's a given, is a no-brainer. That's about ~7b dollars in revenues in the first year alone if you could sell the "cure" for the cost of one year's worth of drugs, a very reasonable and low number. In fact, the HMOs and insurance companies would be willing to pay much more than this, considering how much they save from other medical bills; the complications alone far, far outweigh the costs. What's more, that money comes relatively risk free. As a percentage of sales you would spend far less on R&D, meaning higher return for the shareholders, and marketing would also be significantly reduced, given that it is a "cure", which would quickly become common knowledge in the medical community. So quick and dirty, ~6b in profit (minimum) for the cure versus 180m a year (figure 9% of 2b) for however many years. It really is a no-brainer.
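
      The quick-and-dirty numbers above can be reproduced in a few lines. This is only a sketch of the arithmetic as stated in the comment (2b a year in sales, a 30% share, a 9% margin, ~6b profit on the cure); the variable and function names are made up, and nothing is discounted.

```python
annual_sales = 2e9   # sales of the existing treatment per year
market_share = 0.30  # share of the market those sales represent
net_margin = 0.09    # margin after depreciation, per the comment
cure_profit = 6e9    # the comment's minimum profit estimate for the cure

market_size = annual_sales / market_share     # whole market, about 6.7b ("~7b")
treatment_profit = annual_sales * net_margin  # about 180m a year

def years_to_match(lump_sum, yearly):
    """Undiscounted years of yearly profit needed to equal a lump sum."""
    return lump_sum / yearly

years = years_to_match(cure_profit, treatment_profit)  # a little over 33 years
```

      Even before discounting, the treatment would need more than three decades of steady sales to match the cure, which is far longer than the patent and competition picture painted above allows.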
  20. Open Source Bioinformatics by Bizzaro · · Score: 3
    Some people in the field are now releasing their software under Free/Open Source licenses. It may seem odd to non-scientists that the license is an issue. Isn't all scientific work free and open? Far from it, especially in bioinformatics, where, as you may have read, there is a lot of money involved.

    A couple organizations have taken it upon themselves to promote freedom and openness in bioinformatics. One, Bioinformatics.org, has a modified version of SourceForge so that the community can perform project management and collaborations on a community-run website. Bioinformatics.org has other services, such as website hosting, news forums, a software registry and repository, and more to come. The organization currently hosts 27 projects and has over 600 members. (Disclaimer: I am the Director of the organization.)

    Another organization, The Open Bioinformatics Foundation, supports the development of several language libraries for bioinformatics, such as the famous BioPerl. They also host the BOSC conference mentioned in the post.

    --
    This sort of thing has cropped up before. And it has always been due to human error.

    HAL9000

  21. Bioinformatics by the+eric+conspiracy · · Score: 2

    Don't quit your day job. While bioinformatics is a very interesting and exciting area, it is also a very small field, with the potential to be maybe a $10 billion industry at most. Bioinformatics companies have a very limited number of potential clients - other pharm companies - for which they perform various services. In addition the skills requirements usually include advanced degrees in biology or statistics, things few average programmers can offer.

  22. Re:Drugs by the+eric+conspiracy · · Score: 2

    There is something to be said for this position (that drug companies can't make money on curing diseases but rather by selling drugs that treat symptoms),

    I don't buy it. Drug companies operate in a competitive marketplace, with very cost-conscious insurance companies footing the bill. If company A has a product that treats only symptoms, its product will soon be replaced by company B's that actually treats the disease. Finally B's product will be driven out of the market by a product from Company C that cures the disease. The body of medical knowledge IS cumulative, and company C's route to profit is to develop a better product than B's.

    Sure, there is a process here, and some diseases may never have a cure, but the fact is that cures do really enter the marketplace, and drive out treatments.

  23. Re:I believe you're wrong by the+eric+conspiracy · · Score: 2

    As a PhD student in bioinformatics, I must say I strongly disagree with you.

    I am not saying it's a bad career choice, or anything like that. But the article made it sound like this was going to cause a big increase in the overall demand for CS types. It just isn't so. The overall projections are not something I just made up, either - there are plenty of well thought out surveys being published as to where this is going. I have advanced degrees in Math and Chemistry and a lot of experience in industrial use of statistics, as well as strong programming skills, and have followed this field with great interest. Living in central NJ, where many of these bioinformatics companies are headquartered, this seemed like a field where I would be able to use my skills. But after some real investigation of the nature of the business I decided that this was not where I wanted to go.

    You talk about 'companies with their foot in biology'; please tell me what exactly those are, except the Pharmas and Agribusinesses? Nobody else is messing with genes.

    Hospitals are not going to have people writing bioinformatics software on staff - there is a little matter of FDA regulations on what they can use. Biotech companies that develop hardware are part of the core bioinformatics industry - their customers are the same Pharmas. Departments at universities are nice IF you can get a faculty position. Otherwise you will be paid $30,000 per year and be living grant to grant. No thanks.

  24. AI and Bioinformatics by acomj · · Score: 3

    This is interesting to see bioinformatics in the spotlight.. I used to work at a place trying to do "meaning based search" in the medical field. They were working on among other things ontology based search and a search for protein-gene relationships for quicker drug discovery.

    We also had a doctor on board before the money started to run out.. It helps because the biology terms are very foreign to Computer types (assay, gene clips etc....)

    There was a paper in the office by some professor who used a Brill learning algorithm with existing genes and then had it try to guess what random genes did. It did very well in the test despite the "primitive" AI.

    3rdmill and spotfire /labbook and a host of others are working on this stuff to sell to pharma companies to do better search and allow quicker, more accurate drug creation. The thinking is that if you can make a pharma discover drugs faster than the rest you can charge a boatload of money for the software. Discovering new drugs while keeping the side effects minimal is non-trivial.

    There is a lot of computing power in the life sciences field, and a lot of data created with gene-clips and assay data. People can't sort it all out anymore, so computer analysis makes everything faster. Look at the human genome. Computers made it happen.

    "Sit back and enjoy the chaos" -Unknown

    1. Re:AI and Bioinformatics by rgmoore · · Score: 2
      There was a paper in the office by some professor who used a Brill learning algorithm with existing genes and then had it try to guess what random genes did. It did very well in the test despite the "primitive" AI.

      I think that this points out an important reason that bioinformatics is such an exciting field for computer people to get into. A lot of the work that's been done so far has been done by biologists who happen to be able to program, rather than by programmers who have learned the biology. As a result, a lot of the work uses inefficient algorithms, primitive approaches, bad statistics, and the like. People are constantly reinventing the wheel, and in many cases are making ones that barely turn. Somebody who comes into the field with a strong computer background can turn out to be a real hero just by cleaning up the useful but inelegant work that's out there already. Somebody who actually knows interesting new algorithms that can be applied to the problems can do even more.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

  25. University of Waterloo strikes again by MrNixon · · Score: 2
    This year, the University of Waterloo started a new program in Bioinformatics, with three ways of getting to that end:

    BSc (Honours Bioinformatics)

    BMath (Honours Computer Science - Bioinformatics option)

    BSc (Honours Biology and Bioinformatics)

    Hooray UW!

  26. Re:Drugs by miahrogers · · Score: 2

    Ever heard of vaccines? People make vaccines, even though they're only required one time (or around once a decade). That's not treating the symptoms, or the cure, it's cutting it off before it even happens, which saves you lots and lots of money. Be thankful.

  27. Re:How many points for telling you the odds? by The+Musician · · Score: 2
    Your analysis is correct, insofar as you require the DeCSS bases to appear, unbroken, as a string within the genome.

    However, perhaps we don't have to require the string to be unbroken. For example, would the pattern "use 100 bases, skip 10, use 100 bases, skip 10..." be an acceptable algorithm for finding DeCSS in the genome? If so, the probability increases combinatorially, so perhaps it isn't as unlikely as you think.

    As the string length gets small enough to be feasible (log4(3*10^9) ~ 16 bases), you have to start using inclusion-exclusion instead of just multiplying by M-N, which I don't feel particularly compelled to do right now.

    My point is just that there are more feasible encodings than a bits-to-bases unbroken string, so the chances are higher when you allow those cases.
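
    The "~ 16 bases" figure above can be checked with a short Python sketch. It assumes a uniformly random genome of 3*10^9 bases, as the thread does; the function name `expected_hits` is made up for illustration.

```python
import math

GENOME_LENGTH = 3 * 10**9  # approximate base count used in the thread


def expected_hits(n, m=GENOME_LENGTH):
    """Expected number of windows in a random m-base string that equal
    one fixed n-base string (uniform, independent bases assumed)."""
    return (m - n + 1) / 4**n


# Length at which the expected number of chance matches falls to about one.
threshold = math.log(GENOME_LENGTH, 4)
print(f"log4(3*10^9) = {threshold:.2f}")

for n in (12, 16, 20):
    print(f"n = {n:2d}: expected chance matches = {expected_hits(n):.3g}")
```

    Below the threshold, chance matches are plentiful; above it they vanish quickly, which is why a 1464-base unbroken string is hopeless while short strings are not.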

  28. Culture clash: biologists and programmers by bwt · · Score: 5

    There are two factors that I think are driving the emergence of bioinformatics: culture and data explosion.

    When I was in college, the computer science majors "hung out" with the math majors, the physics majors, and the electrical engineering majors. Biologists hung out with the less analytical crowd. Obviously these are generalizations, but I believe a lot of "the problem" is that culturally biologists just don't have very good computer skills. Suddenly it is the case that biology as a science absolutely requires these skills. If you were one of the few (and some do exist) that broke the stereotype, you need to be starting a company about now. Otherwise the race is on for the biologists to learn programming and the CS-math-physics types to learn biology.

    Second is the fact that biologists are drowning in data. Projects like the human genome project are producing lots of data, but that's just the tip of the iceberg. There is already an exploding market in high throughput assays and measurement computation. The result is that the field as a whole simply isn't managing its data well. Often groups store their data in extremely crappy formats. Custom text formats, ASN.1, etc... I'm an Oracle programmer, so I expect the kind of solutions that banks and .coms use: big iron data warehouses running heavy duty RDBMSs like Oracle or DB2. Nope. I have yet to come across a single bioinformatics project that has a clue about data modelling. It's actually much above average to use a database at all, let alone well. If I was head of the NIH, you can bet that freshman biologists would take a class in SQL starting immediately.

    When you combine the two factors, culture and data inundation, very strange things start to happen. The data infrastructure just isn't there, and worse, a lot of people just don't realize it. Biology is presenting problems that require massive data warehousing solutions to a field whose main data background is calculating p-values to show the effect of a drug is significant.
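
    As a rough illustration of the kind of data modelling described above, here is a minimal Python sketch using the standard-library sqlite3 module rather than Oracle or DB2. The table names, columns, accession IDs, and values are all hypothetical and not taken from any real project.

```python
import sqlite3

# In-memory database; the schema is purely illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    accession TEXT PRIMARY KEY,
    organism  TEXT NOT NULL,
    residues  TEXT NOT NULL
);
CREATE TABLE assay_result (
    id        INTEGER PRIMARY KEY,
    accession TEXT NOT NULL REFERENCES sequence(accession),
    assay     TEXT NOT NULL,
    value     REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [
        ("SEQ001", "Homo sapiens", "ATGGCC"),
        ("SEQ002", "Drosophila melanogaster", "ATGTTT"),
    ],
)
conn.executemany(
    "INSERT INTO assay_result (accession, assay, value) VALUES (?, ?, ?)",
    [
        ("SEQ001", "binding", 0.5),
        ("SEQ001", "binding", 1.5),
        ("SEQ002", "binding", 2.0),
    ],
)

# One declarative query replaces a pile of ad hoc text parsing.
rows = conn.execute("""
    SELECT s.organism, a.assay, AVG(a.value)
    FROM assay_result AS a
    JOIN sequence AS s ON s.accession = a.accession
    GROUP BY s.organism, a.assay
    ORDER BY s.organism
""").fetchall()

for organism, assay, mean_value in rows:
    print(organism, assay, mean_value)
```

    The point of the sketch is only that sequences and measurements live in related tables with keys, so questions become queries instead of format conversions.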

  29. Not for us... by Porfiry · · Score: 2

    By "us", I mean computer geeks.

    This is meant as an introductory text to computing for biologists. Not vice versa. If you don't understand the biology, it's pretty much meaningless.

  30. Don't miss www.biolisp.org, either by alispguru · · Score: 2

    ... just to chime in. That's www.biolisp.org

    --

    To a Lisp hacker, XML is S-expressions in drag.
  31. Re:perl expertise. sure. by krmt · · Score: 2

    Check this out:

    Bioperl
    Biopython
    Biojava
    BioXML
    BioCORBA

    I couldn't find anything for ruby (either linked from bioperl, as those were, or on their own app list) but you can bet it's coming. I'd personally love to see it. But there's plenty of options for bioinformatics other than perl, although perl's excellent text handling makes it a very suitable choice.

    "I may not have morals, but I have standards."

  32. Re:DNA isn't as hard as everyone says it is. by krmt · · Score: 2

    Yes and no. This is a decent idea for something like Drosophila, where you can mutate the gene and see what happens, but there's no way this would work on humans. If you've got a phenotype, you've got to do massive forward (reverse? I always get them confused...) genetics to find your gene. Cystic Fibrosis took a decade or so.

    If it's the other way around and you've just got a sequence then you've got some different work to do. What if multiple genes in a pathway are mutated? What if there are multiple pathways affected by this gene? What if there is no noticeable phenotype for a mutated gene?

    Sure, with a massive database, this could work (and I mean huge, like multi-century lineage-total-human-population huge) but realistically, linking DNA string object to phenotype object and expecting to elucidate a pathway is pretty insane, even for someone with your nick :-)

    A better way is the genomics approach, where you sit down with the microarray and say "Ok, what's going on here?" The biological systems are too big for just a two variable approach.

    The black box idea is a good one, but not the way you propose. If you use a black box to abstract away what's actually happening (i.e. ignore what you don't need in the microarray) without actually dumping any data (if you need it in the microarray, it's still there) then you have a feasible method. With the two variable approach, you force people to leave out so many variables that the system becomes just theoretical and pretty much impractical.

    "I may not have morals, but I have standards."

  33. Re:Drugs by psin+psycle · · Score: 3

    hehe.. I thought they only created vaccines for things that would kill you. That way, by creating the vaccine they will actually make MORE money off you because you will live longer and spend more money on cough syrup and headache pills ;)

    --
    Need a website host? Try out http://WebQualityHost.net
  34. Re:Drugs by rgmoore · · Score: 2

    It doesn't work as well as you might hope. A big part of the problem is that most doctors don't seem to have the time or inclination to independently research the latest medical findings. Instead they depend on pharmaceutical companies to tell them. The problems with this should be pretty obvious. This is a particularly severe problem when all of the companies have similar treatments for a problem. In that case, none of them wants to push an alternative that will cut into their cash cow. News about alternate therapies can get out, but it's slowed appreciably. And, of course, there's always some reason to doubt the new findings, which the pharmaceutical salesmen will quickly point out when the doctors ask them about it. When that doesn't work, they try pitching directly to patients so that they won't talk to their better informed doctors and find out about available alternatives.

    Peptic ulcers are a classic case of this. For a long time people thought that ulcers were caused by organic problems that caused people to produce too much stomach acid. That suggested that the only treatment was a long-term regimen of antacids or acid-blocking medicines; patients would be stuck taking them for the rest of their lives. This was obviously a lucrative field, so all of the Big Pharma companies started producing acid-blocking medicines. Then somebody discovered that the excess acid production wasn't organic after all, but was caused by a bacterium, Helicobacter pylori, so ulcers could be cured by a short regimen including antacid medication and antibiotics. Naturally, the Big Pharma companies didn't like this and they've tried very hard to keep it out of the public eye. They've tried hard to convince doctors that the new therapies are unreliable and ineffective, and now they're trying to convince people to take over-the-counter forms of their acid-blocking medication instead of talking to their doctors about the problem. It's disgusting, but it's also very profitable, so you can't expect Big Pharma to give it up any time soon.

    --

    There's no point in questioning authority if you aren't going to listen to the answers.

  35. Book looks like fluff by Ars-Fartsica · · Score: 2
    Look at the TOC - chapters like "Can I learn a programming language without taking classes?".

    Obviously this book is for bio folks who are non-programmers.

  36. Re:Why just Perl? by oingoboingo · · Score: 2
    It doesn't just have to be perl...but since so much down-in-the-trenches bioinformatics involves sorting, manipulating and processing text strings of DNA and protein sequences, perl is, for the most part, a perfect fit. there's also a really nice set of perl classes available at bioperl for doing a lot of the more tedious sequence processing jobs.

    also, just because you're dealing with genetic information doesn't necessarily mean you need to use a 'genetic' programming technique either...they're quite different things.
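
    To give a feel for the "tedious sequence processing jobs" mentioned above, here is a minimal sketch in Python rather than Perl. It does not use bioperl; the function names are made up for illustration, and it assumes plain uppercase A/C/G/T input with no ambiguity codes.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


dna = "ATGC"
print(reverse_complement(dna))
print(gc_content(dna))
```

    Most everyday jobs of this kind are a few lines of string handling, which is a large part of why scripting languages caught on in the field.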

  37. Dot Com Refugees by Alien54 · · Score: 2
    Sounds like a good place for all those talented dot com refugees out there.

    VCs should make sure to look out for those who lost them money the first time around. Especially those who were into smoke and mirrors.

    Check out the Vinny the Vampire comic strip

    --
    "It is a greater offense to steal men's labor, than their clothes"
  38. Haiku by 575 · · Score: 2

    Sexuality,
    As bioinformatics
    Was that "grep" or "grope"?

  39. Re:Drugs by danudwary · · Score: 2
    Actually, it's fairly well known now that a significant number of ulcers are caused by a bacterial infection - specifically Helicobacter pylori (hope I spelled that right). Those people taking black market Zantac would be better served by black market antibiotics. :)

    And anyway, if you could develop a drug that cures a hole in your stomach (or anywhere for that matter) without causing cancer, you'd be a very rich multinational corporation.

  40. Re:Why just Perl? by Phillip2 · · Score: 2
    "but since so much down-in-the-trenches bioinformatics involves sorting, manipulating and processing text strings of DNA"

    The reason that perl is so prevalent is that by and large the bioinformatics community has screwed up its data representation. The reason for this is simple. We have the biggest legacy problem in the world. The amount of data has expanded beyond all recognition. We use techniques which were knocked together to represent hundreds of sequences to represent millions.

    The end result of this is that we spend vast amounts of time chopping and changing text formats, which is an absurd way of spending time. Of course perl is great for this, which was why it got used so much. Which leads to the second problem. Many bioinformaticists are converted biologists (myself included). The end result of this is that we are often not terribly good programmers, and have a perhaps greater tendency to stick with the language that we know than we might otherwise do.

    The fact that we are using perl for relatively large projects is really an admission of failure on our part rather than the strength of perl!

    Phil

  41. Re:Drugs by hillct · · Score: 3

    There is something to be said for this position (that drug companies can't make money on curing diseases but rather by selling drugs that treat symptoms), however it is a somewhat alarmist position, at least the way it has been expressed here. I don't know why it would be surprising to see a company invest in technology that will generate future profits.

    What bothers me about this issue is the futile attempts the federal government has made to regulate biological research with respect to use of the Genome Project data to assist in such morally ambiguous areas as human cloning. The attempts to regulate this field of research are futile, as they are being handled now, since the industry has such high profit potential that virtually unlimited funds will be expended to house research facilities in places beyond the borders of countries that choose to regulate this field of research.

    While on the subject, I'd like to applaud the genome project researchers for embracing the concept of 'Open Source' science. There were a number of firms that actively tried to gather together and copyright genome project data.

    Well done gentlemen!

    You have allowed the creation of an entirely new field of science. The openness of the research data will reduce the perceived moral ambiguity of the derivative works based on that data.

    --CTH

    --

    --Got Lists? | Top 95 Star Wars Line
  42. Drugs by swagr · · Score: 3

    Eventually, the proponents of bioinformatics claim, the new field will change health care by allowing pharmaceutical companies to shave years off the drug-discovery process, and letting doctors tailor medicines to an individual's genetic makeup.
    Pharmaceutical companies are around to make money. That's why they create drugs that treat symptoms and not drugs that are cures. Now they're investing in ways to make more money from us. Great.

    --

    -... --- .-. . -.. ..--..
    1. Re:Drugs by the+real+jeezus · · Score: 2
      Zantac is the world's #1 medicine, and also the world's #1 black-market drug. That's because people with stomach ulcers [and there are a *lot* of them] have to take it daily

      Thank you for unwittingly proving the parent's assertion that the drugs treat the symptoms rather than cure the disease.

      The AMA claims that over 90% of ulcer cases are caused by a strain of Helicobacter pylori, which is carried by the housefly. The metal bismuth is toxic to the bacterium, so bismuth-containing OTC drugs such as Pepto-Bismol have been successful in actually curing ulcers (large quantities required, though--see your doc). Other antibiotics do the same thing. Naturopathic physicians have had similar success with such unlikely agents as cabbage juice. In both cases, the cure involves eliminating the colony of H. pylori from your upper GI tract.

      The production of stomach acids is a normal part of the digestive cycle and should not be painful. It should also not be reduced. IIRC, Zantac reduces the secretion of stomach acids, causing the stomach to be less painful--it treats one of the symptoms. A reduction in the quantity of stomach acid causes its own problems: reduced digestion (and the slight malnutrition that obviously accompanies it...) and an increased risk of stomach cancer.

      Like clockwork, the pharmaceutical companies once again push drugs to only treat the symptoms. If this wasn't true, the commercials would say "Ask your doctor about curing your ulcer with large quantities of Pepto-Bismol" instead of "Ask your doctor about Zantac". Their Machiavellian value system ignores the ever-increasing number of side effects of all of their wares, another issue entirely.



      Ewige Blumenkraft!
  43. I work at one by daniel_isaacs · · Score: 2
    www.paragen.com

    Check it out, always looking for talented people in the RTP.

    It is a very cool industry to be in. Most of the people I work with are PhDs. Almost all are Perl hackers. They use Linux. Lots of trendy VC furniture and free beer. Monster hardware to play with.

    It is presently a small industry. But the scope is not limited to Pharmaceutical and Ag companies. Can't say too much (Insider laws and whatnot) but if you think this field is anything less than pre-supernova, you're wrong. Commercial applications of our technologies are just now becoming apparent. There is no end to the potential growth of this field.

    --
    - Dan I.
  44. This is silly. by Flying+Headless+Goku · · Score: 2

    On average, the description of how to construct the data from substrings of a random string will be as long as the data itself. The human genome does not in any meaningful way include the data, any more than does a sufficiently long repeating string of the four bases (AGCTAGCTAGCT...).

    Give it up. (speaking of which, I'm surprised nobody has turned this into an "All your base" joke)
  45. Re:How many points for telling you the odds? by Flying+Headless+Goku · · Score: 2

    First multiplying by M-N will, for a big enough M, give you a probability greater than 1. Clearly this is wrong. What you seem to want is a series of Bernoulli trials where each trial has the probability of randomly matching the N characters.

    True. That was sloppy of me, each successive trial would have its probability of success multiplied by probability of failure of all preceding trials. So the difference only lowers the probability of a match, and not significantly in this case.

    However, there are not going to be M-N independent trials. This is because when checking character 1 through N of the longer sequence with the shorter sequence, there are going to be a lot of matches and mismatches on the individual characters. This is going to impose constraints on getting a match for characters 2 through N+1. So you just can't shift the sequence over one character and get an independent trial.

    This doesn't affect the probability calculation. Each substring of a random string is still a random string, and they are no more probable to be equal to one another than two successive randoms generated independently. If you calculated how the randomly distributed matches and mismatches of one trial change the probability of the next trial, you'd find that on average they don't affect it at all.

    However, it can affect the number of expected matches, if you examine the characteristics of the given short string. A string of one symbol repeated is more likely to get more than one match than a string in which there are no symbol repetitions. But the probability of just "one or more" matches is unaffected.

    (I also goofed about the number of trials, which would be M - N + 1)
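
    The correction discussed in this comment can be sketched in Python. This is a minimal illustration, assuming uniformly random bases and treating the M - N + 1 windows as independent trials, which is an approximation; the function names are made up for the example.

```python
import math


def union_bound(n, m):
    """The simple 'multiply by the number of windows' estimate."""
    return (m - n + 1) * 4.0 ** -n


def p_at_least_one(n, m):
    """P(one or more matches) = 1 - (1 - p)^k for k independent windows.

    log1p/expm1 keep the arithmetic accurate when p is tiny.
    """
    k = m - n + 1
    p = 4.0 ** -n
    return -math.expm1(k * math.log1p(-p))


M = 3 * 10**9
for n in (10, 16, 24):
    print(n, union_bound(n, M), p_at_least_one(n, M))
```

    For short strings the simple product exceeds 1, while the Bernoulli form stays a valid probability; for long strings the two estimates are practically identical, which matches the remark that the difference is not significant in this case.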
  46. How many points for telling you the odds? by Flying+Headless+Goku · · Score: 4

    Stripped of header and gzipped, I get 366 bytes; at 4 nucleotides per byte (2 bits per base), that is 1464 nucleotides.

    The probability of any two random sequences of the same length being equal is the inverse of the number of expressible sequences of that length. In this case, it is 4^length.

    When you are looking for a random sequence (of length N) within a longer sequence (length M), the probability of finding it is the above probability multiplied by M-N (the same chance over and over again for every sub-sequence of length N, assuming you don't count wrapping substrings).

    So N=1464, and M equals roughly 3 billion. So the probability is:
    (3*10^9-1464)/4^1464

    Which is on the order of 10^-872, far longer odds than even one in a squared googol (10^200).

    Of course, that assumes random data, but I figure it's a good enough approximation.

    Don't knock yourself out looking for it. It's not there.
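
    The expression above is far too small for ordinary floating point, so a quick way to check its size is to work with base-10 logarithms. A minimal Python sketch, using the N and M values quoted in the comment and the same random-data assumption:

```python
import math

N = 1464        # nucleotides in the encoded program
M = 3 * 10**9   # approximate length of the human genome in bases

# log10 of (M - N) / 4^N, computed without forming the tiny number itself.
log10_p = math.log10(M - N) - N * math.log10(4)
print(f"probability is about 10^{log10_p:.1f}")
```

    The exponent comes out near -872, so the odds are far longer than one in a squared googol (10^200); the conclusion that the string is not there is unchanged.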
  47. Bioinformatics Books by StevenBrenner · · Score: 2

    I am pleased to see O'Reilly's entrance to the field, as well as the interest on Slashdot.

    My research group studies computational genomics, and I teach two classes in the field. For this reason, I've scoured the earth for suitable books on the topic.

    I have put together a list of 36 books on computational biology. Most of these are suitable only to niche interests, outdated, or simply bad -- and many are intended for the Llama crowd. I've reviewed several proposals for new books, so I expect the offerings to become stronger in the next year or so. Those desiring a brief introduction to the field might want to look at the Trends Guide to Bioinformatics (free, registration required; disclaimer: I was a guest editor). It's intended for biologists, but should be readable by /.ers.

  48. Re:I believe you're wrong by apofex · · Score: 2

    hmm. Note to self: "stay away from medical profession."