Slashdot Mirror


Celera Maps Entire Fruit Fly Genome

cjoh345 wrote: "Celera Genomics has just sequenced all the genes in the fruit fly. Apparently the scientists involved are amazed at the genes that we share with this dorm-room annoyance. This discovery also validates Celera's "shotgun approach" to mapping out this stuff. And yes, the genome is available free of charge via GenBank. Good form, Celera!" What would Mendel have thought of this? How about Watson and Crick? This makes me want to break out my copy of The Double Helix.

15 of 123 comments

  1. Open Source Fruitfly? Bah. Open Source Turlington! by Anonymous Coward · · Score: 3

    Yeah, so they released the Fruitfly source code. Just what we need--soon the OSS community will stomp all the remaining bugs in the bug, and we'll have a stable, reliable, and high performance pest.

    On the other hand, I'll bet you a thousand dollars that they never release the source to a really valuable product, like Christy Turlington. Greedy corporate bastards.

  2. Read Science by Lars Arvestad · · Score: 3
    For a more informative source, check today's issue of Science, which includes a special about the feat. If you register, you get to read abstracts of the articles. This alone is quite informative.


    Lars
    __

    --
    Reality or nothing.
  3. finished my ass by irongull · · Score: 3

    I've spent the better part of the last year massaging the Drosophila genome with Perl scripts. I am intimately familiar with it, and I can tell you that it is NOT complete. While it is 90% sequenced, it is still in a bazillion tiny pieces. This makes it difficult to get complete sequence for many genes, as they may start in one contiguous piece and end in another. From personal experience, I would guess that about 15% of the genes are split like this.

    And even if it were complete, this still wouldn't tell us where all the genes are. It's very difficult to string coding regions together into a complete gene when there may be large introns that confuse the matter. The state of the art still only identifies about 70% of genes correctly, even given complete sequence.
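
    A minimal sketch of why this is hard, assuming the simplest possible gene model: scan each forward reading frame for an ATG followed by an in-frame stop codon. The function name, the min_codons cutoff and the example string are made up for illustration, and this is not how any real gene finder works. A scan like this only sees uninterrupted coding stretches, so an intron in the middle of a gene splits or hides it.

```python
# Toy open-reading-frame (ORF) scan on the forward strand only.
# All names and the example sequence are illustrative assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ATG...stop stretches.

    `end` is exclusive and includes the stop codon. Only the three
    forward frames are scanned; introns are not modelled at all.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    example = "CCATGAAATTTGGGTAACC"
    for start, end in find_orfs(example):
        print(start, end, example[start:end])
```

    On the example string this reports a single ORF covering indices 2 to 17. A real fly gene with introns would show up as several disconnected pieces at best, which is the stitching problem described above.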

    And even if we did know all the genes, we still wouldn't know how they interact. We can make guesses based on previous experiments, but the majority of the genes in a given genome are experimentally uncharacterized. Current attempts at molecular simulations can't even predict how these proteins will fold, let alone with what other proteins they interact, or what they do.

    There is still a lot of work to be done - getting the genomic sequence is only the beginning. A significant start, but only the first step. Could you reverse engineer the entire Linux kernel if you were only given the binary? Probably not. I assure you that deducing the operation of an organism given the raw DNA code is much more difficult. The complete sequence of E. coli (an intestinal bacterium) has been available for a couple of years, and we haven't even begun to understand it. E. coli is single celled, and the genome is only 4.5Mb (that's megabases). Drosophila is very complicated, and the genome is about 120Mb. Don't look for anyone to 'solve the fly' any time soon.

    It's a great time to be in bioinformatics - tons and tons of data that no one understands. If we did understand it all, I'd be out of a job. I'm not worried.

    ted

  4. Well, actually . . . by Graham Clark · · Score: 3

    human genome (more than 200 trillion base pairs)

    A bit over 3,000 million, actually.

    To compare, the government-funded Human Genome Project has so far spent over 10 years on the same job.

    There's been a lot of mapping done that Celera is actually using at one remove, because they're picking up our data for their assembly. Celera's job would have been a lot harder if our work hadn't been available to them.

    Celera is doing a 4x oversampling on the human genome, unlike the HGP, which does 10x oversampling. This is possible because Celera is sequencing DNA from one single individual (most likely Craig Venter?), thus avoiding the uncertainty of whether differences are due to sequencing artifacts or personal variations.

    The extra depth is more to do with accuracy. Incidentally, does anyone know when Celera ditched their aim to collect lots of variation data? That was going to be their great contribution to human knowledge, and their main selling point. When did they change their minds?

    Plus, Celera is using our (PE Biosystems) 3700 DNA Analyzer (fully automated, unattended operation 24 hours per day), whereas the Human Genome Project mostly uses our older 377 DNA Sequencer, which requires manual reloading of samples after each run (every 2-3 hours).

    As I've pointed out, here at the Sanger we've got more than 100 3700s; other institutions have gone for MegaBACE machines instead.

  5. The next step by RJ11 · · Score: 3

    I guess the next logical step would be to mass produce giant mutant fruit flies the size of cows and then harvest them for food?

  6. Initial Reconstruction by yuriwho · · Score: 3
    In what they call an ``initial reconstruction,'' the researchers state up front that they must still analyze their map and that their work is far from done.

    From their press release... they have a map; they don't have the sequence yet. A map is a guide with landmarks showing where major chunks of sequence fall within the genome. Celera has pioneered the approach of shotgun cloning: they randomly capture chunks of the genome and sequence them. If they do this enough times (10-50X genome size), they will ultimately have the entire genome after some sophisticated algorithms sort the data and place it onto the map. They probably have most of the genome and have a few difficult bits to figure out (some sequences are harder than others to get).
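
    To make the idea concrete, here is a toy greedy assembler. It is a sketch under heavy assumptions (error-free reads, one strand, no repeats, no map) and is not Celera's actual algorithm; overlap, greedy_assemble and the example reads are hypothetical names and data.

```python
# Toy shotgun assembly: repeatedly merge the two reads with the longest
# suffix/prefix overlap. Assumes error-free, same-strand, repeat-free reads.


def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily and return the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # nothing left to join: these are the contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATCGGA", "ACGTTGCA", "TGCAATCG"]))
    # Without the middle read the two ends cannot be joined.
    print(greedy_assemble(["ACGTTGCA", "ATCGGA"]))
```

    With all three reads the pieces collapse into one contig; drop the middle read and two unconnected contigs remain, which is what low coverage looks like in miniature.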

    Some searching reveals the NCBI press release. Looks like they have most of the sequence together. I'll bet Berkeley provided the map which is allowing Celera to put their info together.

    The big questions are: Has Celera already filed patents on every ORF it has found? Will the patent office grant the patents? Will Celera get patents on the human homologs of these genes (they have identified most of the homologous sequences from ESTs in the human genome project)?

    And you thought software patenting was fusked.

    --
    no sig.
  7. Wow, great post! by yuriwho · · Score: 3

    You are absolutely correct! The next mission (after sequencing the genome) is to figure out what all of those genes (proteins) are doing. So, onto the human structome project (structures of every protein that has been inferred through sequence) and the functome (functions of all genes that have been inferred from sequence data). I think 50 years is a reasonable estimate. Things may accelerate a little once physicists (and coders) get involved and start creating models of signaling networks within cells. Models that predict real cellular responses to the additional expression of genes X & Y will be the ones that let us begin to understand how cells operate.

    Just think how different coding will be 50 years from now.

    --
    no sig.
  8. Re:Good, but the hard work remains to be done. by rgmoore · · Score: 3
    Similarly, we can now move on to the next step in understanding biological creatures - trying to figure out what all of the proteins do, and how the systems built from them operate and interact with other such systems.

    This would not be an easy task under the best of circumstances. It's made worse by the fact that evolution puts little value on modularity - the systems will interact with each other to such a degree that it will be difficult to even define individual systems within the chaos that is an evolved being.

    Fortunately, things aren't quite as bleak as you portray them. Many, many proteins do in fact have a single, clearly defined primary function, either by themselves or as part of a larger complex. Those proteins can have their function inferred either by watching them catalyze reactions, deleting them and seeing how it affects cellular function, or comparing them to similar proteins from the same or other species.

    More promisingly, new techniques of functional genomics and proteomics are being developed to analyze protein function by looking at more subtle factors. To find the function of a protein of unknown function, you can find out what other proteins it interacts with and infer what role it plays. You can also grow cells or organisms under different conditions and look for changes in levels of gene or protein expression to determine what proteins are associated with specific metabolic or other life states. Some very interesting work is also being done by determining the 3D structure of proteins (either by analysis or simulation) and predicting function based on structure.

    The tools for the next big thing are out there. It's just a matter of going through the long grind of applying them. It's going to be a very long road, probably much longer than the process of sequencing the genome, but finding out (to a rough and ready approximation, at least) what every protein does is an accomplishable goal.

    --

    There's no point in questioning authority if you aren't going to listen to the answers.

  9. Eat that, Clinton & Blair! by Tor · · Score: 4

    Actually the sequencing part was completed earlier this year (giving tremendous subsequent rise to Celera stock). What they now did was the mapping - i.e. piecing the small fragments together.

    Moreover, Celera has also completed more than 90% of the human genome within the last year or so. This is an indication that the time before the human genome (more than 200 trillion base pairs) is mapped will be shorter than originally anticipated. To compare, the government-funded Human Genome Project has so far spent over 10 years on the same job.

    Celera is doing a 4x oversampling on the human genome, unlike the HGP, which does 10x oversampling. This is possible because Celera is sequencing DNA from one single individual (most likely Craig Venter?), thus avoiding the uncertainty of whether differences are due to sequencing artifacts or personal variations.

    Plus, Celera is using our (PE Biosystems) 3700 DNA Analyzer (fully automated, unattended operation 24 hours per day), whereas the Human Genome Project mostly uses our older 377 DNA Sequencer, which requires manual reloading of samples after each run (every 2-3 hours).

    As originally stated when Celera was created two years ago, the data is going to be publicly available - a point that has gotten lost among very opinionated but not so informed readers of Slashdot. There will be a 3-month lag period, to ensure accuracy of the data, and to see if there is any information that could be used for patentable drugs & applications.

    (Mostly, Celera's business model is based on providing the tools that will give access to this database).

    And I have stock options! :-)

    -tor

    1. Re:Eat that, Clinton & Blair! by Lars Arvestad · · Score: 4
      The comparisons you make are quite unfair. First of all, the Drosophila project was done in cooperation with the US-funded Berkeley Drosophila Genome Project. That alone should keep you from barking at the government-funded projects.

      Second, the fact that HGP has been going on for 10 years (how long has Celera been going?) means nothing when the sequencing capacity seems to double every year. This means that in one year you can recover what has accumulated over several years of work!

      Third, HGP is making their data public within 24 hours. You think Celera doesn't make use of that data?

      I am also uncertain about your oversampling claims. If Celera is content with 4x and does not use public sequences (which they can't if they are supposed to be ahead), then they will have a serious problem actually connecting the pieces. Granted, you can still go gene-mining and make important discoveries. Also, I think (due to the competition) that the HGP has settled for 5x oversampling to get a rough draft available later this year. Whether there is one or more individuals sampled doesn't really matter. You are fighting statistics, which say that you need 10x if you are going to have any hope of connecting the pieces. Notice that they used 14x for Drosophila. Actually, this came up in an earlier /. discussion where it was claimed that the HGP uses a single individual as well.
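
      The statistics in question can be sketched with the idealized Lander-Waterman model, in which reads land uniformly at random, the expected fraction of bases left uncovered at coverage c is e^-c, and the expected number of contigs is roughly N * e^-c for N reads. The 500-base read length below is an assumption, the 3,000-million-base genome size is the figure quoted elsewhere in this thread, and the model ignores repeats, minimum overlap length and clone-pair information, so it illustrates the trend rather than settling the 4x versus 10x argument.

```python
import math

# Idealized Lander-Waterman estimates. READ_BP is an assumed read length;
# HUMAN_BP is the "a bit over 3,000 million" figure quoted in the thread.
HUMAN_BP = 3_000_000_000
READ_BP = 500


def lander_waterman(genome_bp, read_bp, coverage):
    """Return (uncovered_fraction, expected_contigs) for a given coverage.

    Assumes uniformly random reads and that any overlap, however short,
    is enough to join two reads.
    """
    n_reads = coverage * genome_bp / read_bp
    uncovered_fraction = math.exp(-coverage)
    expected_contigs = n_reads * math.exp(-coverage)
    return uncovered_fraction, expected_contigs


if __name__ == "__main__":
    for c in (4, 5, 10, 14):  # coverage depths mentioned in the thread
        frac, contigs = lander_waterman(HUMAN_BP, READ_BP, c)
        print(f"{c:>2}x: uncovered fraction {frac:.2e}, expected contigs {contigs:,.0f}")
```

      Under these assumptions roughly 1.8% of bases are missed at 4x, while at 10x the missed fraction falls below 0.01%; the gap count drops by a similar exponential factor.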


      Lars
      __

      --
      Reality or nothing.
    2. Re:Eat that, Clinton & Blair! by jor-el · · Score: 4
      this is misinformation tor,

      1) Much like all the comments here, your posting neglects to mention Berkeley's fruit fly genome sequencing project, which did a vast amount of work and without which Celera's data wouldn't have been nearly so useful. It certainly wouldn't have made it to finished so quickly without the mapped BACs, would it? Which leads to point

      2) This crap about Celera mapping 90% in one year when the public efforts spent 10 years, blah blah blah. This really ticks me off. From the very start, the plan with the public effort was to spend the vast majority of time developing the technology and techniques necessary to sequence rapidly and accurately. The accelerated curve has been known for ages, and our lab went from sequencing approx. 2 Mb/year to 20 Mb in (I think it was) '98 to over 350 Mb now (seq stats).
      Considering that massive purchases of 377 sequencers and scientific collaborations by the HGP contributed VASTLY to the development of the 3700, it's rather crass to read the crowing about how Celera's kicked ass while the public effort allegedly just sat around twiddling their thumbs. The press releases from the formation of Celera at least give credit to the planning of the HGP:
      Since the inception of the Human Genome Project (HGP) in 1990, a major shift in technology has been anticipated that would allow the entire sequence to be completed

      3) The HGP is now likely sequencing FASTER than Celera. I know the DOE has 80 MegaBACEs (equivalent to the 3700), Sanger has 100 3700s and a ton of 377s, that's only 40% of the genome project, and Celera has what, 230 3700s? Hrmm, rough unsubstantiated calcs would put total human effort near (at least) 450 3700s ... ouch! Plus MIT now has more than any other group, I believe (although they aren't all working on human), and there's the vast capacity at WashU.
      While it's true Celera has the largest private supercomputer, and that will help with assembling, the DOE started the human genome project, is still involved, and just happens to have the largest supercomputers, period.
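
      One reading of that back-of-envelope calculation, as a sketch: treat the DOE's 80 machines and Sanger's 100 as 3700-equivalents, take them to be the 40% share mentioned, and scale up. The variable names are made up, the 377s are ignored, and the inputs are the unsubstantiated figures from the comment itself.

```python
# Back-of-envelope capacity estimate using only figures quoted in the comment.
# Counting every machine as one "3700-equivalent" is the comment's own assumption.
DOE_UNITS = 80        # DOE machines, treated as 3700-equivalents
SANGER_UNITS = 100    # 3700s at the Sanger Centre
SHARE_PERCENT = 40    # share of the public project those two represent
CELERA_UNITS = 230    # 3700s attributed to Celera


def public_total_units(known_units, share_percent):
    """Scale a known machine count up to the whole public effort."""
    return known_units * 100 / share_percent


if __name__ == "__main__":
    total = public_total_units(DOE_UNITS + SANGER_UNITS, SHARE_PERCENT)
    print(f"public effort: about {total:.0f} 3700-equivalents")
    print(f"ratio to Celera: about {total / CELERA_UNITS:.1f}x")
```

      On those inputs the public effort comes out at about 450 machine-equivalents, a bit under twice Celera's 230.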

      4) Where you get the 10x oversampling number I haven't a clue. The goals are laid out here, and additionally a figure of 6x is generally aimed for before trying to finish the clone; finished is still the Bermuda definition of, I believe, no more than 1 error per 10k.

      yeah, I'll bet you have stock options and I'm sure they don't bias your postings and don't influence your continued use of outdated figures

  10. More detailed link by Brento · · Score: 4

    Here is a much more detailed link of the story from Celera's site, talking about the similarities between our genes and the fruit fly's. (I've got a dollar that says their computers are all Celerons, ha ha ho ho.)

    --
    What's your damage, Heather?
  11. Actually no they haven't sequenced the genome by ozborn · · Score: 4

    Celera hasn't even by their own definition sequenced the entire genome of Drosophila. What they have done is sequenced most of, or all of, the euchromatic region. The highly repetitive heterochromatic DNA that is clustered around the centromeric regions and makes up an estimated 30% of the Drosophila genome is not sequenced. There may even be some B-heterochromatic regions which are also unclonable, which is a serious problem in trying to sequence highly repetitive DNA. While these regions don't have the glamour of the gene-rich euchromatin (it is often referred to as junk DNA for that reason), they can affect everything from gene expression to chromosome pairing.

    And to the few posts I have read which think that this is some sort of private enterprise success story versus a slow blundering government, it isn't. It is a classic example of business going after the highly profitable bits and leaving the taxpayer to fund the basic research. The same basic research which incidentally made it all possible in the first place.

  12. Good, but the hard work remains to be done. by Christopher Thomas · · Score: 5

    This is a wonderful accomplishment - now we can get started on the main problem.

    Having a complete map of a creature's DNA tells us, in principle, all of the proteins that it can synthesize throughout its lifetime. This gives us the building blocks that the creature uses to build things, and the chemical signals that it uses to direct internal operations.

    This is wonderful, and essential. To use an analogy, this is like a Victorian scientist, after years of studying a 1999 notebook computer, managing to deduce how transistors and the wires that connect them work.

    He still needs to deduce a lot about capacitance, resistance, and inductance to tell how signals will propagate and influence each other, and needs to build up from scratch all of the disciplines involved in integrated circuit design before he can understand how it works, but it's a start.

    Similarly, we can now move on to the next step in understanding biological creatures - trying to figure out what all of the proteins do, and how the systems built from them operate and interact with other such systems.

    This would not be an easy task under the best of circumstances. It's made worse by the fact that evolution puts little value on modularity - the systems will interact with each other to such a degree that it will be difficult to even define individual systems within the chaos that is an evolved being.

    I wish them luck. They have opened the door, and made available for study the vast landscape of interacting systems that we'll have to understand to truly understand how living creatures work.

  13. Here it is by MortimerK · · Score: 5

    -----BEGIN FRUITFLY-----
    GATCGATCGATGCTAGCTACGATCTGATCGATCGATCGTAGCTAGCTA
    ATCGTAGCTAGCTGACTATCGTAGCTAGCTAGCGTATCGTAGCGATCG
    GCATCGTAGCTAGCTAGCTACGTAGCTAGCTAGCGATCGTACGATCGA
    CGTAGCATCGTAGCTAGCTACGTACGATCGATCGATCGTAGCTAGCTA
    GACTAGCTAGCTAGCTAGCTAGCTACGTAGCGCGATCGATTCGATCGC
    AGCTTGACTGATCGGATCGTGCTACGGACTGTACGATCGTACGATCGC
    ------END FRUITFLY------