Slashdot Mirror


Decoding the Genome: Serious Infrastructure

Roland Piquepaille writes "The Wellcome Trust Sanger Institute is one of the largest genomics data centers in the world. In "The Hum and the Genome," the Scientist writes about the IT infrastructure needed to handle the avalanche of data that researchers have to analyze. With its 2,000 processors and its 300 terabytes of storage, the data center uses today about 0.75 megawatts (MW) of power at a cost of 140,000 per year (about $170K). But the data center will need more than a petabyte of storage within three years, and its yearly electricity bill will reach 500,000 (more than $600K) for about 1.4 MW, enough to power more than a thousand homes. The original article gets all the facts, but this summary contains all the essential numbers."

30 of 175 comments

  1. Breaking news from our reporter Roland Piquepaille by JamesD_UK · · Score: 5, Funny

    Lots of computers use lots of power which costs lots of money!

  2. Amazing! by poopdeville · · Score: 2, Funny

    The Wellcome Trust Sanger Institute is amazing it will-

    - optimize seamless communities
    - generate vertical e-services
    - leverage synergistic convergence

    and best of all

    - engage e-business content Perfect solution

    --
    After all, I am strangely colored.
  3. Decoding the gnome? by LiquidCoooled · · Score: 5, Funny

    I misread that and thought it involved a spotlight and torture methods to a poor garden gnome :(

    "You will tell us what we need to know. WHERE IS THE LAWN MOWER!"

    --
    liqbase :: faster than paper
  4. Who owns the results? by Dancin_Santa · · Score: 5, Interesting

    The idea behind all this mapping is to find genetic sequences that can be used to mend ailing people. Using a computer to throw every single combination possible against the wall and seeing what sticks is certainly a way to go about this, but it also raises the spectre of a single large company owning all these combinations. This wouldn't be such a terrible thing if there was some sort of actual science involved, but by brute-forcing results, they are doing nothing more complicated than running a counting program with an infinite number of bits.

    So each result is directly traceable to a number. Will these companies own these numbers? Can you even take out a patent on a number? In the DeCSS case, it was argued that the decoding algorithm was protected even though some implementations of it were nothing more than a carefully crafted prime number.

    I don't like the idea of someone owning numbers, any more than I think someone should be denied the fruits of their own work. This whole patent "creation/reward" system is getting turned on its head because of the power of computers. What would have been prohibitive even 10 or 15 years ago is possible (even easy) now. How can we keep our rights without sacrificing the progress of science and the arts?

    1. Re:Who owns the results? by Hittite+Creosote · · Score: 3, Informative

      The centre is funded by the Wellcome Trust and the UK's Medical Research Council. The Wellcome Trust Sanger Institute is a non-trading, non-profit making registered charity. And they tend to make their results open - these are the people who said that the genome should belong to no one individual or company. In other words, if you want to keep your rights without sacrificing the progress of science - we need more places like the Sanger centre.

    2. Re:Who owns the results? by Gurdy · · Score: 5, Informative

      > "it also raises the spectre of a single large company owning all these combinations."

      You might be interested to read our data release policy http://www.sanger.ac.uk/Projects/release-policy.shtml which describes how the finished data is made publicly available, to all, no charge.

      (I work at the Sanger Centre.)

      Dave

    3. Re:Who owns the results? by dan+dan+the+dna+man · · Score: 2, Informative

      Well the nice thing about the Wellcome Trust is that they are an independent charity and the largest non-corporate non-governmental source of biomedical research funding in the UK.

      Maybe you'd like to read their constitution: here

      Sure, there's a chance that things can get tied up in the hands of companies - but let's look at the human genome project. The best data came out of the academic sector, the private data (held by Celera) didn't turn out to be too profitable after all (or even better quality) and is now in the public domain. I worry about the commercialisation of science as much as the next man, but let's face it, business just doesn't care unless there's a drug to sell at the end. Data is still just data.

      --
      I don't read your sig, why do you read mine?
    4. Re:Who owns the results? by Anonymous Coward · · Score: 2, Informative

      A lot of the analysis software used is also freely available, as is most of the web display code

      http://www.ensembl.org/

      another sangerite

  5. Enough to power a thousand homes by jurt1235 · · Score: 2, Interesting

    Doing some quick math here: 2000 processors+1petabyte, divide by 1000=
    2 processors + 1TB per house.
    In processors: Way past it
    In storage: Getting there (quick count of harddisks lying around= 750GB at least)

    Since my energy bill is lower, even with the hardware running 24/7/365, are they buying their energy too expensive or what?
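
    The division above can be written out as a quick sketch. The totals are the ones quoted in the summary; treating a petabyte as 1000 TB and splitting the projected 1.4 MW evenly over 1000 homes are simplifying assumptions made here for illustration.

```python
# Totals quoted in the story summary; the per-home split is a
# back-of-envelope division, not a measured figure.
PROCESSORS = 2000
STORAGE_TB = 1000            # 1 petabyte, taken as 1000 TB (assumption)
PROJECTED_POWER_MW = 1.4     # projected draw "within three years"
HOMES = 1000


def per_home(total, homes=HOMES):
    """Divide a data-center total evenly across a number of homes."""
    return total / homes


print("processors per home:", per_home(PROCESSORS))
print("storage per home (TB):", per_home(STORAGE_TB))
print("power per home (kW):", round(per_home(PROJECTED_POWER_MW * 1000), 3))
```

    The last line is the one that matters for the electricity bill: each "home" in the comparison corresponds to a continuous draw of roughly 1.4 kW.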

    --

    My wife's sketchblog Blob[p]: Gastrono-me
    1. Re:Enough to power a thousand homes by Tassach · · Score: 2, Insightful
      It is more about the power consumption
      Yep.

      First off, utility companies generally charge a higher rate for business/industrial power than they do for residential power; so even if all things were equal, they'd still be paying more per kWh than you.

      Secondly, you can't compare a couple of desktop machines running in a home office to a datacenter with multiple fully-populated 72U racks. Running 2 or 3 computers in a 120 ft^2 room isn't going to require any additional cooling. Running 2000 machines in (say) a 1000 ft^2 data center is going to require heavy-duty air conditioning. Finally, remember that enterprise-grade hardware generally has redundant power supplies, 15,000 rpm SCSI disks, and more powerful fans -- all of which draw more power (and throw off more heat) than a typical desktop system.

      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
  6. Windows by Elshar · · Score: 2, Funny

    They must be using Windows ClusterFun edition.

    1. Re:Windows by gstoddart · · Score: 3, Funny
      They must be using Windows ClusterFun edition.

      I think you mis-spelled that. :-P
      --
      Lost at C:>. Found at C.
  7. Big computers = big power by goneutt · · Score: 4, Insightful

    TANSTAAFL. This post seems drawn into the spinning power meter dials and not caring about what the computer is. If you want a lot of power, you need a lot of power. Chip scale efficiency could reduce their bill, but it's a research foundation crunching numbers all day. If they need more money they just ask their contributors politely.
    How's this stack up with Google's server farm bill?

    --
    Bacardi + slashdot = negative karma.
  8. Re:Roland by eclectro · · Score: 5, Funny

    What's the deal with this roland guy

    They're trying to decode his genome to find the missing link.

    Which will lead to his website, of course.

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
  9. have they heard of the petabox? by itsme · · Score: 5, Interesting

    http://www.archive.org/web/petabox.php

    it uses only 60 kW for 1 petabyte

  10. Whats with the emphasis on power and its costs? by manavendra · · Score: 3, Insightful

    What about the costs of scaling and maintaining such an infrastructure? The routine administrative tasks, reporting, etc? The costs for someone actually looking at the generated results to see if they are meaningful at all, and if it is all going in the right direction?

    --
    http://efil.blogspot.com/
  11. Math by Alphanos · · Score: 4, Interesting

    Cost of 0.75 MW: ~$170K
    $/MW: ~$227K

    Cost of 1.4 MW: >$600K
    $/MW: >$429K

    Why the difference?
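
    The arithmetic behind the question, as a minimal sketch. It uses the rounded dollar figures from the summary, and since the projected bill is only given as "more than $600K", the second result is a lower bound. The function name is made up for this example.

```python
def cost_per_mw(annual_cost_kusd: float, power_mw: float) -> float:
    """Annual cost, in thousands of dollars, per megawatt of draw."""
    return annual_cost_kusd / power_mw


today = cost_per_mw(170, 0.75)      # current: ~$170K for 0.75 MW
projected = cost_per_mw(600, 1.4)   # projected: >$600K for 1.4 MW
ratio = projected / today

print(f"today:     ~${today:.0f}K per MW")      # ~$227K per MW
print(f"projected: >${projected:.0f}K per MW")  # >$429K per MW
print(f"increase:  x{ratio:.2f}")               # x1.89
```

    So the projected cost per megawatt is nearly double today's, which is the gap the replies below try to explain.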

    --
    Alphanos
    1. Re:Math by Walkiry · · Score: 2, Interesting

      >Why the difference?

      Presumably, the infrastructure to get 1.4 MW safely inside the same building and distribute it is more complicated and expensive than what two independent .75 MW would be. Things tend to go down in price when you buy in bulk, until you reach a point where the amount you're asking for is giving more trouble than what is usually dealt with.

      --
      ---- Take the Space Quiz!
    2. Re:Math by Renraku · · Score: 2, Insightful

      Diminishing returns.

      You've gotta have a lot of infrastructure outside the facility to be able to support 1.4MW. Infrastructure that is probably taken care of by the power company, for a fee.

      And the more power you push down the line, the more power that is lost to the environment. Especially if you're overcharging the lines, which causes acceleration of the loss the more power you pump into them.

      --
      Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
  12. Re:Fuck Roland by frakir · · Score: 5, Informative

    Mod parent up.

    Just have a look on http://www.google.com/search?query=Roland+Piquepaille&as_sitesearch=slashdot.org/ or search slashdot articles on roland piquepaille.

    Real whore here is Timothy. I bet he'll post an ad for your site for some change, too.

  13. Re:Breaking news from our reporter Roland Piquepai by matt+me · · Score: 2, Funny

    enough to power 1000 homes with the equivalent power of distributed computing software?

    probably not.

  14. Units by Hank+Chinaski · · Score: 2, Insightful

    They use megawatts as a measurement of energy consumption? Shouldn't that be megawatt-hours? P.S.: Don't click the link. Editors could at least include a "Signup required" warning.
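
    On the units: a megawatt is a rate (power), and the energy that gets billed is power multiplied by time, in megawatt-hours. A rough sketch of the conversion follows, assuming the data center draws its quoted 0.75 MW continuously all year, which the summary does not actually state. The helper names are invented for this example.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_energy_mwh(power_mw: float, hours: int = HOURS_PER_YEAR) -> float:
    """Energy in MWh for a constant draw of power_mw over the given hours."""
    return power_mw * hours


def implied_price_per_kwh(annual_cost_usd: float, power_mw: float) -> float:
    """Average price per kWh implied by a yearly bill and a constant draw."""
    kwh = annual_energy_mwh(power_mw) * 1000
    return annual_cost_usd / kwh


print(annual_energy_mwh(0.75))                           # MWh per year today
print(round(implied_price_per_kwh(170_000, 0.75), 4))    # dollars per kWh
```

    Quoting megawatts is therefore reasonable for describing the load; megawatt-hours only enter when working out the bill.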

    --
    IAAL
  15. Genome - the dog chasing its tail? by Wayne247 · · Score: 3, Insightful

    The interesting bit about genome research is that suppose we do find what the human genetic code all means. We can then start treatments to correct genetic problems, right? If we do so, and say we correct illness X on some kid. When this kid grows up, becomes an adult and has kids of his own, what kind of genetic heritage will he give his own kids? Will these kids inherit the original bad gene of their parent? If so, we'd be running at a loss since defects would multiply across generations...

    1. Re:Genome - the dog chasing its tail? by J.+Random+Luser · · Score: 2, Informative

      To correct the kid's kids, you need to make the correction in the gamete, before the original kid is conceived. Maybe I'm not reading enough lately, but from Huxley to Gattaca, I don't recollect anyone actually trying that method...

  16. Re:Exchange Rate by Anonymous Coward · · Score: 2, Informative
    Ways to put the Euro symbol in webpages:
    • Hex code 0xA4 (decimal 164) in codepage 8859-15 is what you get when you press AltGr+e. This happens to be the general currency symbol in 8859-1, so it's not a good choice if you can't make sure that the document comes with the correct encoding declaration. ""
    • HTML entity &euro; "€"
    • Unicode character reference &#8364; ""
    • Hexadecimal Unicode character reference &#x20AC; ""
    As you can see, Slashcode filters all but the html entity, so that's your only choice here if you have to have the symbol. Most people simply use EUR.
  17. Re:Fuck Roland by MrNonchalant · · Score: 3, Funny
    According to the Google ads the joke might be on him:

    Roland On Sale
    Low Prices, Free Shipping
    12 Months To Pay, Always In Stock
    www.SamAsh.com

    Roland in stock
    Roland sale
    up to 80% off Liquidation Sale
    www.infinitemarketplace.com


    Anyone else want to buy Roland and make him shut up?
  18. Re:Roland by metlin · · Score: 2, Funny


    Awww, he's just French... =)

  19. Re:Some computers use more power and do less by jordie · · Score: 2, Informative

    Good to see you've got your facts straight before you posted.

    AMD does not have a CPU running anywhere NEAR 4GHz, you're thinking of Intel.

    As far as power consumption..
    "Even the Athlon 64 X2 4800+ consumes less power than all single core 90nm Pentium 4 CPUs" - Anandtech

    For more information please see this and this

    For less power, better performance use AMD.

  20. We can do either by cookie_cutter · · Score: 3, Informative
    Will these kids inherit the original bad gene of their parent?

    It depends. If you are doing somatic cell genetic engineering, then you only fix those cells in the patient in which the defect manifests itself, and not the germ-line cells (ie, sperm and eggs), so the 'fix' is not passed on to the next generation. If instead you modify the germ-line cells as well, then the 'fix' is passed on to the next generation.

    One of the main reasons for doing the somatic fix rather than the germ-line fix is that we're still pretty damned new to this genetic engineering thingy, so it's probably a good idea to not fuck with the genetic heritage of future generations just to cure a patient today. However, as the science and technology develops, and we gain more experience with it, our self-assuredness in our abilities will likely increase, and we'll think we know what we're doing enough to risk making 'permanent' changes to the germ-line. I put 'permanent' in quotes, because if we make genetic changes one way, we should be able to turn them back if and when we decide they are mistakes.

  21. Re:I haven't a clue... by oneandoneis2 · · Score: 5, Insightful
    No, no!

    There are 23 chromosomes in the human genome. Those chromosomes are a pair of the genes. I understand that each gene is one of four DNA molecules called A,G,C & T. There are 16 combinations of those molecules and I can map those out with a pencil and paper, I can produce all 23 sets with desktop computing power.

    There are 23 chromosomal pairs. Each half of each pair contains the same (more or less) information - you could think of it as a genetic back-up system. (Except for the XY chromosomal pair in males). At the start, one chromosome is maternal, the other is paternal. But over time, they actually swap bits around until there's a mixture.

    Each chromosome contains one immensely long strand of DNA, a double-helix. This double helix is NOT redundant, only one of the two strands contains genetic information: The other strand is only there to make it easier to copy the helix.

    The human genome is approximately 3 billion bases long, and it takes three bases (known as a codon) to code one amino acid. 4 x 4 x 4 = 64 possible codons (although they only actually code for 20 or so amino acids). Then you have to filter out all the codons that don't actually code for anything, and are discarded before the gene is translated into a protein.

    NOW do the math!
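
    For anyone who does want to do the math, here is a minimal sketch of the counts mentioned above. The 3-billion-base genome length comes from the comment; packing each base into 2 bits is an assumption made here for illustration, and it says nothing about raw sequencing reads, quality scores or annotation.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triple of bases is a possible codon: 4 x 4 x 4 of them.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]

GENOME_BASES = 3_000_000_000   # approximate length quoted in the comment
BITS_PER_BASE = 2              # 4 symbols fit in 2 bits (assumed encoding)

# Size of one genome copy if stored as tightly packed bases.
genome_bytes = GENOME_BASES * BITS_PER_BASE // 8

# Upper bound on codons if the whole sequence were read in a single frame.
max_codons = GENOME_BASES // 3

print(len(codons))     # number of possible codons
print(genome_bytes)    # bytes for one packed genome
print(max_codons)      # codon-sized triplets in one reading frame
```

    One packed copy of the sequence is well under a gigabyte, so the storage figures in the story are evidently about much more than holding a single finished genome.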

    --
    So.. it has come to this