Decoding the Genome: Serious Infrastructure
Roland Piquepaille writes "The Wellcome Trust Sanger Institute is one of the largest genomics data centers in the world. In "The Hum and the Genome," The Scientist writes about the IT infrastructure needed to handle the avalanche of data that researchers have to analyze. With its 2,000 processors and its 300 terabytes of storage, the data center today uses about 0.75 megawatts (MW) of power at a cost of £140,000 per year (about $170K). But the data center will need more than a petabyte of storage within three years, and its yearly electricity bill will reach £500,000 (more than $600K) for about 1.4 MW, enough to power more than a thousand homes. The original article has all the facts, but this summary contains all the essential numbers."
A beowulf cluster of these... Oh, wait, never mind!
Rrrrrr ... I love it when you talk dirty
Lots of computers use lots of power which costs lots of money!
The Wellcome Trust Sanger Institute is amazing it will-
- optimize seamless communities
- generate vertical e-services
- leverage synergistic convergence
and best of all
- engage e-business content Perfect solution
After all, I am strangely colored.
I misread that and thought it involved a spotlight and torture methods to a poor garden gnome :(
"You will tell us what we need to know. WHERE IS THE LAWN MOWER!"
liqbase
I think I'm immune to large numbers. This article summary has absolutely no effect on me whatsoever.
The idea behind all this mapping is to find genetic sequences that can be used to mend ailing people. Using a computer to throw every single combination possible against the wall and seeing what sticks is certainly a way to go about this, but it also raises the spectre of a single large company owning all these combinations. This wouldn't be such a terrible thing if there was some sort of actual science involved, but by brute-forcing results, they are doing nothing more complicated than running a counting program with an infinite number of bits.
So each result is directly traceable to a number. Will these companies own these numbers? Can you even take out a patent on a number? In the DeCSS case, it was argued that the decoding algorithm was protected even though some implementations of it were nothing more than a carefully crafted prime number.
I don't like the idea of someone owning numbers any more than I think someone should be entitled to the fruits of their own work. This whole patent "creation/reward" system is getting turned on its head because of the power of computers. What would have been prohibitive even 10 or 15 years ago is possible (even easy) now. How can we keep our rights without sacrificing the progress of science and the arts?
With its 2,000 processors and its 300 terabytes of storage, the data center uses today about 0.75 megawatts (MW) of power at a cost of £140,000 per year (about $170K)
I just use 11 stone of meat and goop to deal with that very same data. It's been running 24 hours a day for 30 years now with no serious bugs... Yet...
Unfortunately, I am not Wil Wheaton
I wish I could get the submitter's exchange rate. I'd be rich rich rich. It's currently around 1.9 dollars to the pound, meaning annual running costs are more like $260k, which could rise to around $1m.
Having said that, everything is cheaper on the US side of the pond, so the submitter is probably about right. Sigh.
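For anyone who wants to redo the conversion, here is a minimal sketch. It uses only the pound and dollar figures from the summary and the 1.9 rate quoted above; the function names are made up for illustration.

```python
# Figures quoted in the story summary and in the comment above.
BILL_NOW_GBP = 140_000      # current yearly power bill in pounds
BILL_FUTURE_GBP = 500_000   # projected yearly power bill in pounds
SUMMARY_USD_NOW = 170_000   # dollar figure given in the summary
RATE_USD_PER_GBP = 1.9      # exchange rate quoted by the commenter


def to_usd(pounds, rate=RATE_USD_PER_GBP):
    """Convert a pound amount to dollars at the given rate."""
    return pounds * rate


def implied_rate(usd, pounds):
    """Exchange rate that would make a dollar figure match a pound figure."""
    return usd / pounds


if __name__ == "__main__":
    print(f"Current bill:   ${to_usd(BILL_NOW_GBP):,.0f}")
    print(f"Projected bill: ${to_usd(BILL_FUTURE_GBP):,.0f}")
    rate = implied_rate(SUMMARY_USD_NOW, BILL_NOW_GBP)
    print(f"Rate implied by the summary: {rate:.2f} dollars to the pound")
```

At 1.9 dollars to the pound the two bills come to about $266K and $950K, while the summary's $170K figure implies a rate of roughly 1.21.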
I used to have a better sig but it broke.
Doing some quick math here: 2000 processors+1petabyte, divide by 1000=
2 processors + 1TB per house.
In processors: Way past it
In storage: Getting there (quick count of harddisks lying around= 750GB at least)
Since my energy bill is lower, even with the hardware running 24/7/365, are they buying their energy too expensive or what?
My wife's sketchblog Blob[p]: Gastrono-me
They must be using Windows ClusterFun edition.
He gets a shitload of ad revenue from slashdot due to him linking to his own site in every fucking story he posts here. And like by magic, every fucking story he posts is accepted by the "editors".
TANSTAAFL. This post seems drawn into the spinning power meter dials and not caring about what the computer is. If you want a lot of power, you need a lot of power. Chip-scale efficiency could reduce their bill, but it's a research foundation crunching numbers all day. If they need more money they just ask their contributors politely.
How does this stack up against Google's server farm bill?
Bacardi + slashdot = negative karma.
What's the deal with this Roland guy?
They're trying to decode his genome to find the missing link.
Which will lead to his website, of course.
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
http://www.archive.org/web/petabox.php
It uses only 60 kW for 1 petabyte.
What about the costs of scaling and maintaining such an infrastructure? The routine administrative tasks, reporting, etc? The costs for someone actually looking at the generated results to see if they are meaningful at all, and if it is all going in the right direction?
http://efil.blogspot.com/
DO NOT CLICK ON THE SUMMARY.
It leads to primidi.com, which is his 'blog'. If nobody goes to his site, he gets no money.
MOD PARENT UP!!!
I'm still in stitches
Cost of 0.75 MW: ~$170K
$/MW: ~$227K
Cost of 1.4 MW: >$600K
$/MW: >$429K
Why the difference?
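A quick way to recheck those ratios, as a rough sketch: the dollar amounts and megawatt figures come straight from the summary, and the helper name is invented for this example. It only confirms the arithmetic; it does not explain why the projected cost per megawatt is higher.

```python
def cost_per_mw(yearly_cost_usd, megawatts):
    """Yearly cost in dollars divided by the power draw in megawatts."""
    return yearly_cost_usd / megawatts


now = cost_per_mw(170_000, 0.75)     # current setup, figures from the summary
future = cost_per_mw(600_000, 1.4)   # projected setup, figures from the summary

if __name__ == "__main__":
    print(f"Today:    ${now:,.0f} per MW per year")
    print(f"Tomorrow: ${future:,.0f} per MW per year")
    print(f"Ratio:    {future / now:.2f}x")
```

The projected figure works out to a bit under twice the current one per megawatt.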
Alphanos
I don't know which is the best in the ix86 market at the moment, possibly Via, possibly Intel, used to be Transmeta.
It's one of the things many geeks tend not to consider when they're dreaming up their ideal ultra powerful, ultra cheap beowulf cluster. The fact that you need a megawatt's worth of power and a megawatt's worth of cooling to go along with those $400 1U high density servers running the latest 4GHz AMD CPUs. Suddenly those cheap servers don't look so cheap.
Deleted
Right, I'm not going to this fuckwit's blog. Can anyone tell me what sort of kit they're using?
How do you know every submitted article is accepted?
We only see his ACCEPTED submissions.
Chimps on bicycles...
I look forward to the replies.
Quit badmouthing my man Roland! He provides a valuable service to slashdot, and he even gave me a blowjob for only $10! Roland will do ANYTHING for money, god bless 'im.
Mod parent up.
Just have a look at http://www.google.com/search?query=Roland+Piquepaille&as_sitesearch=slashdot.org/ or search slashdot articles on roland piquepaille.
Real whore here is Timothy. I bet he'll post an ad for your site for some change, too.
enough to power 1000 homes with the equivalent power of distributed computing software?
probably not.
not surprising really with webpages like this
--------
Decoding the Genome Needs Superpower
The Wellcome Trust Sanger Institute is one of the largest genomics data centers in the world. In "The Hum and the Genome," The Scientist writes about the IT infrastructure needed to handle the avalanche of data that researchers have to analyze. With its 2,000 processors and its 300 terabytes of storage, the data center today uses about 0.75 megawatts (MW) of power at a cost of £140,000 per year (about $170K). But the data center will need more than a petabyte of storage within three years, and its yearly electricity bill will reach £500,000 (more than $600K) for about 1.4 MW, enough to power more than a thousand homes.
Below is a small diagram showing the current IT infrastructure of the Wellcome Trust Sanger Institute, used by the Human Genome Project (Credit: Wellcome Trust Sanger Institute).
The current IT infrastructure of the Wellcome Trust Sanger Institute
Now, let's look at this IT infrastructure in detail.
* Computers
    o Today: The datacenter hosts about 2,000 Alpha processors, originally designed by Digital Equipment (DEC), before its acquisition by Compaq, and later by Hewlett-Packard (HP).
    o Tomorrow: The Sanger Institute is looking at cheaper solutions, especially now that HP has officially stopped any development on the Alpha front.
* Storage
    o Today: Three different computer rooms have a total capacity of about 300 terabytes.
    o Tomorrow: The IT management forecasts about a petabyte within three years -- at least.
* Databases
    o Today: There are about 40 different databases, and only two of them are in the 50 terabytes area.
    o Tomorrow: One of the databases, the Trace sequence archive, currently contains about 700 million entries, and it doubles every 10 months.
* Power bills
    o Today: The current equipment needs about 0.75 megawatts for a cost of £140,000 per year (about $170K).
    o Tomorrow: The new setup will need about 1.4 megawatts, which will raise the yearly bill to about £500,000 (about $615K today).
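To get a feel for what "doubles every 10 months" means for the Trace sequence archive, here is a small sketch. It assumes the doubling rate stays constant, which the article does not promise, and the function name is hypothetical.

```python
def projected_entries(months, start=700e6, doubling_months=10):
    """Entries after `months`, assuming steady doubling every `doubling_months`.

    start=700e6 is the 'about 700 million entries' figure from the list above.
    """
    return start * 2 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (0, 10, 20, 36):
        billions = projected_entries(months) / 1e9
        print(f"after {months:2d} months: {billions:.2f} billion entries")
```

Under that assumption the archive would hold roughly 8.5 billion entries after three years, about twelve times its current size.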
The supercomputer vendors can say all they want about diminishing costs. But they almost never talk about the power bills...
Sources: Stuart Blackman, The Scientist, Volume 19, Issue 11, Page 15, June 6, 2005; and various websites
...do they run on Linux??
*ducks*
They use Megawatts as a measurement of energy consumption? Shouldn't that be megawatt-hours? P.S.: Don't click the link. Editors could at least include a "Signup required" warning.
IAAL
It seems all their boxen are based on Alpha processors. Why? Simply because, even today, you can get the most flops per clock tick out of Alpha. It's a shame such a wonderful architecture was buried.
:)
Anyway, I think I'll be the first in line when they decide to retire their GS320 servers
The interesting bit about genome research is this: suppose we do find out what the human genetic code all means. We can then start treatments to correct genetic problems, right? Say we do so, and we correct illness X in some kid. When this kid grows up, becomes an adult and has kids of his own, what kind of genetic heritage will he give his own kids? Will these kids inherit the original bad gene of their parent? If so, we'd be running at a loss, since defects would multiply across generations...
n/t
Why is it that the processor is not named in the articles if it is Alpha or Itanium based? If this was Power/PowerPC/Cell based, there would be a nice big IBM logo on the article.....
Imagine a beowulf cluster of these.
Generating enough heat to provide for the winter time needs of all europe
-Shaunak
Don't bother decoding the thing!! Ask God for the password!!
Is he digitally signed ?
I wonder what the target of this research is. Daily I hear news on TV about people dying of hunger in Africa and other parts of the world. Can't this money be used there? Or am I nuts to think that way.
a beowulf cluster with these..
BTW, mod parent up.. (oops, no parent)
FUCK OFF TIMOTHY
Hasn't anyone told them that decoding a genome is in violation of the DMCA?
It's an interesting story, which I wanted to see.
The fact that the same person who submitted it also submitted a whole bunch of other stories is beside the point.
My Journal
That remark should be sufficient here. I mean... whoa...
Physics is the universe's operating system.
Do YOU hate Roland Piquepaille? It doesn't have to be so. With my scientifically proven brainwashing program, you can rid yourself of piquephobia forever!
http://www.bemmu.com/pique/
My car requires 1.21 jigawatts and a flux-capacitor.
In C++, friends can touch each others private parts.
Anyone else want to buy Roland and make him shut up?
I'll take stories like this (Roland and all) over the consistently boring "Here's what Apple/Microsoft/SCO/Sony/USPTO is doing today!" stories we're inundated with otherwise.
At least this story is interesting. Why does it piss you off so much that someone makes some money off finding this story? If Roland makes some coin because he's bothered to pay attention to news sites I don't read and report interesting articles to a site I do read, by all means, more power to him! I'm glad he's doing the legwork so I don't have to.
Jeremy
Looking for a Python IRC bot?
Awww, he's just French... =)
I've suggested an option that would let users filter, but it seems to have been ignored.
Your hair look like poop, Bob! - Wanker.
Don't tell me endlessly repeated combinations of the same four base pairs need 300 TB...
Friend of mine manages a cluster that models the world's oceans. One thing they forgot about when planning it was the cooling needs. That added a nice chunk to the budget.
I doubt they even looked at the power requirements.
But it is cool to have access to a super computing cluster.
Those stats sound roughly comparable to, if anything slightly lower than, what a private company I know of runs for seismic data processing.
11*43+456^2
It depends. If you are doing somatic cell genetic engineering, then you only fix those cells in the patient in which the defect manifests itself, and not the germ-line cells (ie, sperm and eggs), so the 'fix' is not passed on to the next generation. If instead you modify the germ-line cells as well, then the 'fix' is passed on to the next generation.
One of the main reasons for doing the somatic fix rather than the germ-line fix is that we're still pretty damned new to this genetic engineering thingy, so it's probably a good idea to not fuck with the genetic heritage of future generations just to cure a patient today. However, as the science and technology develops, and we gain more experience with it, our self-assuredness in our abilities will likely increase, and we'll think we know what we're doing enough to risk making 'permanent' changes to the germ-line. I put 'permanent' in quotes, because if we make genetic changes one way, we should be able to turn them back if and when we decide they are mistakes.
Seriously, I'm putting "127.0.0.1 primidi.com" in my hosts file TODAY.
So much for the whole, "only as complex as a fruit fly" blurb which people use to say humans are simple creatures.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
as to what they are actually doing with all this computing power.
OK, I broadly understand that 'sequencing the human genome' is mapping out all the combinations of genes. There are 23 chromosomes in the human genome. Those chromosomes are pairs of genes. I understand that each gene is one of four DNA molecules called A, G, C & T. There are 16 combinations of those molecules and I can map those out with pencil and paper; I can produce all 23 sets with desktop computing power.
So why does it take so much computing power?
What are they really doing with it?
Why do they dumb these stories down so much?
Facts and figures don't make a science story/article!
Roland can't hold a candle to the master.
Unfortunately I'm lying
The only things certain in war are Propaganda and Death. You can never be sure which is which though
Only one person in the world has ever claimed to have met him - in the pressroom at Microsoft Devshed Conference in Boston complete with a Roland Pipaquelle badge - and described him as a fortyish reddish-blonde who giggled a lot.
;P
Oh yeah? Wonder what cold crème he uses. Rolland Pipaquelle is a 61-year-old Jehovah's Witness who lives in a shabby genteel garden apartment in desperate need of an interior decorator on a heavily trafficked commercial road at nnnn XXXXXXXXX XXX. XXXXXX, New York. XXXXXX is in XXXXXXXXXXX and XXXXXXXXXXX is Microsoft territory.
[snip]
Stop that! This is silly! Really! There is no room for this kind of silliness on Slashdot! Now... go home! And stop it!
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
and so are fruit flies.
An Arabidopsis plant on the other hand, that like most plants survives by modifying its cells rather than running away from danger, that's complex.
It's kind of sad that the datacenter I work in does nothing anywhere near as important as genome number crunching.....yet uses a TON more power, and has WAY more storage than this genome DC in this story...
Instant Karma's gonna get you...
I have a Sony DMS-8400 petabyte storage array sitting in storage. They should buy it from me.
You wouldn't believe how hard it is to sell something like this. It seems like any of the companies that need it have the money to purchase it new. Argh!
Need Free Juniper/NetScreen Support? JuniperForum
and overlap. please see my other post linking to the http://www.ensembl.org/ genome browser.
If you want to see a very dense genome, try looking at some viruses. They take advantage of the fact that each amino acid used to make the protein machinery is encoded using three bases, and so can put three genes almost on top of each other. It's on the level of funkiness of a programmer writing a sequence of bits in machine language where 8 fully functional programs could be derived depending on whether you shift out one to eight bits from the start of the "program" before loading the program onto the stack of a CPU that has an 8-bit opcode system.
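The overlapping trick is easier to see with a toy example. Because a codon is three bases, the same stretch of DNA can be read starting at offset 0, 1 or 2, giving three different codon sequences. The sketch below just splits a made-up 12-base string into codons for each offset; the sequence and the function name are illustrative only and not taken from any real virus.

```python
def codons(seq, frame):
    """Split seq into complete 3-base codons, starting at offset frame (0, 1 or 2)."""
    usable = seq[frame:]
    # Stop early enough that every slice is a full three bases.
    return [usable[i:i + 3] for i in range(0, len(usable) - 2, 3)]


# Hypothetical 12-base sequence, chosen only to show the idea.
SEQ = "ATGGCCATTGTA"

if __name__ == "__main__":
    for frame in range(3):
        print(f"frame {frame}: {' '.join(codons(SEQ, frame))}")
```

Each frame reads the same bases but groups them differently, which is how one region can encode more than one protein.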
Why does it piss [some people] off so much that someone makes some money off finding this story?
When Jon Katz left they decided to hate Michael Sims. Now Sims is gone so they need a new target. Not that Katz or Sims offered any great insight or content to slashdot, but the hatred and paranoia against them is beyond reason. That said, I don't much like Piquepaille's site and don't click on his links. But he does offer the service of collecting and collating technology information - a service not much different from slashdot I might add - and clearly, some people find it interesting and worthwhile. The claims of some kickback scheme going on really reek of defamation though. Get some proof or STFU. --M
I'd like to know the load average on all of those processors or if, for the most part, they're just spinning their electronic wheels.
Be careful not to get a Flux Capacitor too close to this
-- www.globaltics.net
Political discussion for a new world
The Wellcome Institute... I wonder how they get their samples?
I don't like the idea of someone owning numbers any more than I think someone should be entitled to the fruits of their own work.
I'm not crazy about the patent system either but is that what you really meant? People are not entitled to the fruits of their own work?
What are you? Some sort of freaking communist?
They should check out the going-out-of-business sale at Transmeta. Pick up a few dumpsters of low power chips.
Using insults and insinuation only serve to make you look bad.
If you want better content, go submit it.
Information: "I want to be anthropomorphized"
That's some cheap electricity. I pay > $600/y for electric in my apartment. They must have their own power plant (no I didn't RTFuckingA, thank you very much.) I wonder how much of that waste heat they use for heating in the winter. ;)
"A shift from proprietary to commodity hardware will also help keep costs down, as will the planned move away from a proprietary 64-bit operating system to open-source Linux. Though the move chimes with Sanger's open-source ethos for its sequence data, Butcher cites solid practical reasons for the change. "HP [Hewlett Packard] pulled the plug on the Alpha chip," he says, "so we have nowhere to go." Moving to another proprietary system means it could happen again, he says. "I want something we can rely on and have control over our destiny for a good few years into the future.""
Linux rules again, proprietary fanboys!
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
I just saw a commercial the other day from IBM. In the commercial, two scientists were looking for computers to help them map the human genome and replicate the folding of proteins.
The IBM representative said "Here is Gene, it is able to fold proteins and map the entire human genome". It was a cluster of IBM systems (maybe 40 total).
I just laughed and tried to explain to my wife how much BS this was (which basically describes all marketing).
Oh, I'm terribly sorry! I clicked the link!
I admit, I wasn't going to, until you and your nazi friends convinced me that this Roland fellow must have something important to say, since everyone wants to repeal his freedom of speech.
Congratulations! You achieved exactly the opposite of your intentions.
Some social pointers:
1. Using bad language does NOT make you cool.
2. If someone has a different point of view than you do, it's NOT OK to assault them, verbally or otherwise.
3. If Roland and Slashdot have a financial arrangement, it's none of your business. Implying that you should have been informed of such an arrangement is conceited.
Remember, Slashdot (like life) is what you make of it.
"I'd rather win in an ugly car than lose in a pretty car" - Jari Lahdenpera
Friend of mine manages a cluster that models the world's oceans. One thing they forgot about when planning it was the cooling needs. That added a nice chunk to the budget.
I used to work at the Sanger Institute (before they were quite so big). One year after the main building was occupied, there was a fire in the main server room, filling the informatics corridor with black smoke. It turned out to have been started by a fan that had been left on all year...
1.4 kW is about 2 horsepower. At 110 V, 1.4 kW is a current draw of 12 A. (At 220 V, it's 6 A.) I guess "over 1000 houses" sounds much better than "a few hundred houses."
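Here is the per-house arithmetic spelled out, as a rough sketch. It takes the 1.4 MW and "a thousand homes" figures from the summary, uses current = power / voltage, and assumes 745.7 W per mechanical horsepower; the function names are invented for the example.

```python
WATTS_PER_HP = 745.7  # mechanical horsepower, to one decimal place


def per_home_watts(total_megawatts, homes):
    """Average share of the total draw per home, in watts."""
    return total_megawatts * 1e6 / homes


def amps(watts, volts):
    """Current drawn at a given voltage (I = P / V), ignoring power factor."""
    return watts / volts


def horsepower(watts):
    return watts / WATTS_PER_HP


if __name__ == "__main__":
    share = per_home_watts(1.4, 1000)
    print(f"Per home:   {share:.0f} W ({horsepower(share):.2f} hp)")
    print(f"At 110 V:   {amps(share, 110):.1f} A")
    print(f"At 220 V:   {amps(share, 220):.1f} A")
```

That is a continuous draw of roughly 1.9 horsepower, or a little under 13 A at 110 V, for every one of the thousand homes.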
Unlimited growth == Cancer.
I want their electricity provider. This works out to roughly $.026 per kilowatt-hour. Large industrial plants do not get that low a rate.
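The per-kilowatt-hour rate can be rechecked with a few lines. This is only a sketch: it assumes the equipment draws its full quoted power around the clock for all 8,760 hours of a year and uses the dollar figures from the summary rather than the pound amounts. The helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming constant draw all year


def usd_per_kwh(yearly_cost_usd, megawatts):
    """Effective electricity price given a yearly bill and a constant draw."""
    kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR
    return yearly_cost_usd / kwh_per_year


if __name__ == "__main__":
    print(f"Today:    ${usd_per_kwh(170_000, 0.75):.4f} per kWh")
    print(f"Tomorrow: ${usd_per_kwh(600_000, 1.4):.4f} per kWh")
```

With the summary's numbers the current rate comes out at about 2.6 cents per kWh and the projected one at about 4.9 cents.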
300 terabytes, advancing to a petabyte in three years? I can see it now.
"Yeah! We got it! The whole Human Genome! We scanned that sucker in... wait, what? You meant... /all/ the genome? Okay. Uhm. Well, we already got Alice scanned in. I guess we can have Bob done in another couple years or so, maybe Charlie after that, but I think we're going to need to look at how we're doing this..."
Yahoo! Pipes are awesome. How awesome? http://pipes.yahoo.com/jesdynf/slashdot
$200 billion for war in Iraq and nothing to show for it. That's the real waste. Military spending. It sucks the life blood.
$200 billion new highway bill. Waste of cash.
Science is the seed corn of the future. You are not nuts, just not completely informed.
However, if I were doing this seriously, I would be shopping here.
DS-96
96 CPU's (Efficeon)
230 GFlops
1500W Max Power
Is the hardware cheaper? No. But once you factor in the power and cooling infrastructure and costs, it's definitely the way to go. Especially when you consider that 22 of the above machines will give you 2112 CPU's cranking away with a max power draw of 33,000 Watts, instead of 750,000 Watts.
Granted, with WTSI's setup, there's still the consideration of power and cooling requirements for the storage arrays. Not an insignificant issue.
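The totals for 22 of those boxes can be rechecked as below. This is a back-of-envelope sketch using only the DS-96 specs listed above and the 0.75 MW figure from the summary. Keep in mind that the 0.75 MW covers the whole data center, storage included, and that an Efficeon CPU is not assumed to match an Alpha in performance, so the ratio is not a like-for-like comparison.

```python
# Per-machine specs as listed in the comment above.
CPUS_PER_BOX = 96
GFLOPS_PER_BOX = 230
MAX_WATTS_PER_BOX = 1500

CURRENT_WATTS = 750_000  # the 0.75 MW quoted for today's whole data center


def cluster_totals(boxes):
    """CPU count, peak GFlops and maximum power draw for `boxes` machines."""
    return {
        "cpus": boxes * CPUS_PER_BOX,
        "gflops": boxes * GFLOPS_PER_BOX,
        "watts": boxes * MAX_WATTS_PER_BOX,
    }


if __name__ == "__main__":
    totals = cluster_totals(22)
    print(totals)
    print(f"Power ratio vs. today: {CURRENT_WATTS / totals['watts']:.1f}x")
```

Twenty-two machines give 2,112 CPUs at a maximum of 33,000 W, which is about 23 times less than the data center's current draw.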
Mediocrity knows nothing higher than itself; but talent instantly recognizes genius. -- Sir Arthur Conan Doyle
How many Library of Congresses would this power?
I am a comp sci major, just finished my BSc and continuing towards data mining and bioinformatics for a graduate degree... I do not know as much as I would like to know; however, I can tell you that the best algorithms are typically brute-force solutions. Why? Because the amount of data and the precision that we require does require all combinations to be processed and analyzed. This all comes back to algorithm efficiency and analysis (big-O, a.k.a. asymptotic growth rates). Many of these algorithms are of the NP-Hard class, meaning that a polynomial-time (read: 'efficient') algorithm in terms of the input (an n x m matrix: n DNA sequences, each at most m bases long) is very, very, very unlikely.
There are proofs showing that the algorithms are the best possible (even though they are brute force)... so one way or another someone will have to do the 'infinite computations'
Might as well get started!