Green Grid Argues That Data Centers Can Lose the Chillers
Nerval's Lobster writes "The Green Grid, a nonprofit organization dedicated to making IT infrastructures and data centers more energy-efficient, is making the case that data center operators are operating their facilities in too conservative a fashion. Rather than rely on mechanical chillers, it argues in a new white paper (PDF), data centers can reduce power consumption via a higher inlet temperature of 20 degrees C. Green Grid originally recommended that data center operators build to the ASHRAE A2 specifications: 10 to 35 degrees C (dry-bulb temperature) and between 20 and 80 percent humidity. But the paper also presented data that a range of between 20 and 35 degrees C was acceptable. Data centers have traditionally included chillers, mechanical cooling devices designed to lower the inlet temperature. Cooling the air, according to what the paper originally called anecdotal evidence, lowered the number of server failures that a data center experienced each year. But chilling the air also added costs, and PUE numbers would go up as a result."
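For anyone unsure why chillers push PUE up: PUE is total facility power divided by the power that reaches the IT equipment, so every kilowatt spent on cooling raises the ratio. A minimal sketch follows; all the kilowatt figures are made up for illustration and do not come from the white paper.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical loads, chosen only to show the effect.
it_load_kw = 1000.0   # servers, storage, network gear
overhead_kw = 200.0   # lighting, power distribution losses, air handlers
chiller_kw = 400.0    # mechanical chillers

with_chillers = pue(it_load_kw + overhead_kw + chiller_kw, it_load_kw)
without_chillers = pue(it_load_kw + overhead_kw, it_load_kw)

print(f"PUE with chillers:    {with_chillers:.2f}")
print(f"PUE without chillers: {without_chillers:.2f}")
```

A PUE of 1.0 would mean every watt goes to the IT load, so lower is better.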
Of course they are wasting energy keeping it that cold. I am surprised the servers aren't frozen keeping it below 32 like that!
65 to 70 is plenty at my data center.
Yeah right, I'm not running a data center at 35 degrees C. People do have to go inside there and they shouldn't have to die from heat stroke. And, it would probably heat up any other rooms/offices it's next to.
If the owners of the building could run cooler I would think they would. Heat is expensive and building owners are cheap; if it is possible to spend less I would think that owners would.
"Maybe this world is another planet's hell"
Aldous Huxley
There is a flip side to the coin.... Higher inlet temperature can cause higher leakage current, resulting in lower efficiency. Some extra electricity will be lost to this effect. Also, in such conditions, thermal throttling can occur, reducing performance and particularly performance per watt, so more energy is required for the same amount of work. Finally, there is some loss of longevity, with components failing ahead of expectations.
A way of getting the similar energy benefit without the risk would be something like what SuperMUC (http://en.wikipedia.org/wiki/SuperMUC) does: run water direct to the components. The problem is the upfront cost is generally not worth it except in places where energy costs are high enough to recoup that cost.
We looked for where the fibermap ran over a mountain range, and was near a hydroelectric plant. Our data center is cooled without chillers, simply by outside airflow 6 months of the year and with only a few hours' use of chillers per day for another 3 months. I know this won't help people running a DC in Guam, but for those who have a choice, location makes a world of difference.
Can the baby seals insulate the data center?
Tree huggers telling an IT manager it's OK for his servers to burn up to save a baby seal.
Well, Google has already started running their data center much warmer than many data centers of the past, apparently with no ill effect.
It has nothing to do with hugging trees, simply hard nosed economics. If 5 degrees induces 3 more mother board failures in X number of months and you already have the fail-over problem handled it only takes a few seconds on a hand held calculator to figure out that trees have nothing to do with it.
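The hand-calculator math above can be sketched in a few lines. Every number here is a placeholder I made up (cooling savings, failure counts, cost per failure); plug in your own measured figures.

```python
def net_annual_saving(cooling_saving: float,
                      extra_failures: float,
                      cost_per_failure: float) -> float:
    """Cooling money saved minus the cost of the extra hardware failures."""
    return cooling_saving - extra_failures * cost_per_failure


def break_even_failures(cooling_saving: float, cost_per_failure: float) -> float:
    """Number of extra failures per year at which running warmer stops paying."""
    return cooling_saving / cost_per_failure


# Hypothetical inputs, per year.
cooling_saving = 50_000.0   # lower utility bill from the higher setpoint
extra_failures = 12.0       # additional board failures caused by the heat
cost_per_failure = 800.0    # replacement part plus labor, fail-over already handled

net = net_annual_saving(cooling_saving, extra_failures, cost_per_failure)
limit = break_even_failures(cooling_saving, cost_per_failure)

print(f"Net saving per year: ${net:,.0f}")
print(f"Break-even at {limit:.1f} extra failures per year")
```

With these placeholder numbers you would need more than 62 extra failures a year before the warmer room costs more than it saves.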
The rules were written, as the article explains, based on little if any real world data, designed for equipment that no longer exists, built with technology long since obsolete. It was probably never justified, and even if it was back in the 70s and 80s, it isn't any more.
Google and Amazon and others have carefully measured real world data taken from bazillions of machines in hundreds of data centers. They know how to do the math.
Sig Battery depleted. Reverting to safe mode.
November 2012 Wired covers "hot" machine rooms in its paean to Google's data centers. Usually by the time they've picked up a story, it's done.
Vision with execution is hallucination.
I am bad in physics so I might say something stupid. But does it actually make a difference? I feel like the temperature of the hot components are WAY over 20C. So whatever energy they output is what you need to compensate for. In the steady state you need to cool as much as they heat. Isn't that constant whatever the temperature the datacenter is run at?
I've been an operator and sysadmin for many years now, and I've seen this experiment done involuntarily a lot of times, in several different data centers. Trust me, even if you accept 35 C, the temperature goes well beyond that in a big hurry when the chillers cut out.
Heat is death to computer hardware. Maybe not instantly, but it definitely causes premature failure. Just look at electrolytic capacitors, to name one painfully obvious component that fails with horrifying regularity in modern hardware. Fifteen years ago, capacitors were made with bogus electrolyte and failed prematurely. Some apparently still do, but the bigger problem NOW is that lots of items are built with nominally-good electrolytic capacitors that fail within a few months, precisely when their official datasheet says they will. A given electrolytic capacitor might have a design half-life of 3-5 years at temperatures of X degrees, but be expected to have 50/50 odds of failing at any time after 6-9 months when used at temperatures at or exceeding X+20 degrees. Guess what temperature modern hardware (especially cheap hardware with every possible component cost reduced by value engineering) operates at? X+Y, where Y >= 20.
Heat also does nasty things to semiconductors. A modern integrated circuit often has transistors whose junctions are literally just a few atoms wide (18 is the number I've seen tossed around a lot). In durability terms, ICs from the 1980s were metaphorically constructed from the paper used to make brown paper shopping bags, and 21st-century semiconductors are made from a single layer of 2-ply toilet paper that's also wet, has holes punched into it, and is held under tension. Heat stresses these already-stressed semiconductors out even more, and like electrolytic capacitors, it causes them to begin failing in months rather than years.
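On the capacitor point, a commonly quoted rule of thumb for aluminum electrolytics is that expected life roughly halves for every 10 degrees C of extra temperature. The sketch below applies that rule; the 2000-hour, 105 C rating is a hypothetical datasheet value, and real parts also depend on ripple current and applied voltage, which this ignores.

```python
def estimated_life_hours(rated_life_hours: float,
                         rated_temp_c: float,
                         operating_temp_c: float) -> float:
    """Rule-of-thumb estimate: life doubles per 10 C below the rated
    temperature and halves per 10 C above it."""
    return rated_life_hours * 2 ** ((rated_temp_c - operating_temp_c) / 10.0)


# Hypothetical part: rated for 2000 hours at 105 C.
for temp_c in (65, 85, 105):
    hours = estimated_life_hours(2000, 105, temp_c)
    print(f"{temp_c:>3} C: ~{hours:,.0f} h (~{hours / 8760:.1f} years)")
```

By this rule, running 20 degrees hotter cuts the estimated life to a quarter, whatever the starting point X is.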
Yes, it's generally in the nature of these companies to spend unneeded money. They hire people whose exact job is to make data centers as efficient as possible. Even to the extent that Facebook and others are open sourcing their information to try to get others involved in improving data center design. I say generally as I'm sure most have seen the story on here recently about Microsoft wasting energy to meet a contract target; that, however, is a totally different kettle of fish.
Well, Google has already started running their data center much warmer than many data centers of the past, apparently with no ill effect.
This is an understatement. Google increased the temp in their data centers after discovering that servers in areas with higher temps had fewer hard errors. So they went with higher temps across the board, saved tons of money on lower utility bills, and have fewer hard errors.
Back in the 1950s, early computers used vacuum tubes, which failed often and were difficult to replace. So data centers were kept very cool. Since then, data centers have continued to be aggressively cooled out of tradition and superstition, with little or no hard data to show that it is necessary or even helpful.
And our customers (the telcos and enterprise) don't care enough about power savings for our management to pay me to work on it.
So our systems run with C-states disabled and no frequency/voltage stepping when idle.
The board of directors of the "Green Grid" is composed almost entirely of the companies that would benefit if data centers had to buy more computing hardware more frequently, rather than continued paying for cooling equipment.
Liberty in your lifetime
You can go a couple degrees warmer than in the "old days" (ten years ago). Things like bearings in fans and drives will fail. Capacitors will fail. Data centers produce LOTS of heat. I don't believe that the coin counters figured in the staff to replace the failed parts or the extra staff and time needed when manual procedures are used due to a downed system.
I have been running several passive heatsink cooled servers for 5+ years on ambient temps that get as high as 85 F during the day while the AC is off and I'm at work. IMO, money is better spent on lower TDP components. Generally for server CPUs you have a choice of a lower TDP more $ cpu vs a higher TDP but cheaper CPU of similar power. The lower TDP CPU will use less electricity, plus generate less heat, which amounts to less cooling, so you theoretically cover the extra cost over time.
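Here is a rough sketch of that payback calculation. The wattages, price premium, electricity price, and cooling overhead are all assumptions for illustration, and it treats the CPU as drawing its full TDP around the clock, which overstates the saving on a mostly idle box.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(watts: float, price_per_kwh: float,
                      cooling_overhead: float) -> float:
    """Yearly cost of a constant load, including the cooling needed to remove
    its heat. cooling_overhead is watts of cooling per watt of load (assumed)."""
    kwh_per_year = watts / 1000.0 * HOURS_PER_YEAR
    return kwh_per_year * price_per_kwh * (1.0 + cooling_overhead)


def payback_years(price_premium: float, watts_saved: float,
                  price_per_kwh: float, cooling_overhead: float) -> float:
    """Years until the pricier low-TDP part pays for itself."""
    return price_premium / annual_power_cost(watts_saved, price_per_kwh,
                                             cooling_overhead)


# Hypothetical comparison: a 95 W part versus a 60 W part costing $150 more.
watts_saved = 95.0 - 60.0
yearly_saving = annual_power_cost(watts_saved, price_per_kwh=0.12,
                                  cooling_overhead=0.5)
years = payback_years(150.0, watts_saved, price_per_kwh=0.12,
                      cooling_overhead=0.5)

print(f"Saving per year: ${yearly_saving:.2f}")
print(f"Payback time:    {years:.1f} years")
```

With these made-up inputs the premium is recovered in under three years, which fits inside a typical 5 year replacement cycle.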
Additionally, the lower TDP means you have less likelihood of heat related failure. In my own limited experience, if you have something that gets so hot that it needs a fan to stay alive, then that moving part is what is going to fail before any solid state component fails, and thus you will lose more components to fan failures than anything else. Therefore it's better to get a CPU that has a low enough TDP to allow passive cooling. Usually I do have case fans in place, but no single fan is responsible for a single component, so if one fails the others will keep things from dying until it is replaced.
Additionally, you don't have a crisis when the chiller dies and you find that the recent maintenance has left the backup in a state where it is not operable, or some such nonsense(personally witnessed this kind of thing on more than one occasion). I am a programmer, so am usually just an observer to these things in production environments.
Low TDP CPUs + SSDs is the way to go IMO. After accounting for electricity/cooling savings, I believe an SSD is on par with or cheaper than a 15k RPM HDD. Most scenarios will not reach the write limits of the SSD in 5 years (theoretically, time will tell on SSD reliability, as right now anecdotal evidence seems to suggest they have a high rate of DOA, but that may be due to poor QA).
*Pretty isolated anecdotal account, but I speculate that a setup that eliminates component specific fans (i.e. cpu fans) and focuses on lower TDP passively cooled components will suffer fewer failures, reduce electricity and cooling costs, and decrease the risks posed by fan failures, chiller failures, heat related failures, or mechanical failures (HDDs). Fewer failures and fewer risks will save man hours replacing hardware or planning for those contingencies.*
I really wish there were some flexible heatpipes on the retail market (they do exist but only when ordered in large quantities), so we could have large heatsink plates on the side of the server chassis, and attach the flexible heatpipes to transfer heat from internal components to the external heatsink. Similar to some of the custom built cases out there which use custom made heatpipes to create silent PCs, where the entire outside of the case is metal with large fins (usually used in audio studios). These cases though are built for specific components that fit the heatpipe configuration. Flexible heatpipes would fit into current cases/form factors with only the addition of the external heatsink. Additionally several units could share a large heatsink, which would be more effective, since with several smaller separate heatsinks much of the area is wasted whenever some of the servers are idle. I don't know how reliable heatpipes are though. If there was an internal DOA defect (maybe fluid leaked out), it won't be as obvious without doing testing on them, compared to a fan that simply doesn't spin.
Consolidated redundant power supplies would be another thing that would reduce heat, but these are super expensive and usually specialized to specific brands of servers+chassis. I don't know if it's an economy of scale thing, or economy of it's-for-enterprise-and-there-is-no-standard-so-you-won't-find-a-comparable-option-so-we'll-make-as-much-profit-as-we-can-get-away-with. Not that I object(sic?) to that, but usually puts it out of reach of all but the very large enterprises.
Computers crash/fail when overheating and in a datacenter that can happen very fast. You absolutely must keep the temperatures from getting too hot. Some datacenters can get away with minimal cooling. Some datacenters need chillers and tons of money invested in keeping things at a low enough temperature where computers won't randomly lock up on you from the heat. There must be some datacenters that have too much cooling, but to say that datacenters in general don't need chillers demonstrates a lack of understanding of what a datacenter is: they are not all the same size, nor is the hardware in them the same or all generating the same predictable temperatures.
http://interserver.net/
Well.. yes and no. Can you build servers that can take the heat? Sure. But that's not what most datacenters have. Sure, processors and maybe (and it's a big maybe) memory can take the heat... but in general, those 15K rpm disk drives are not going to like the extra heat. They have enough problems dissipating heat currently.
So.. possible, sure. But it does require some extra work. Your off the shelf HP, Dell or IBM, I wouldn't recommend it.
You do lower the life span of the equipment by placing it under enormous heat stress.. life could be reduced by several years if assuming a 10 year lifespan. If you have a 5 year life cycle, you may have to consider a 4 year life cycle. And even then, I'd avoid the 15K rpm drives and other things that aren't cooled very well (e.g. gobs of memory and even chipsets on some designs). That passive heat sink on your fibre channel card and/or 10Gbit ethernet... probably not going to cut it anymore... so those are also changes you'd have to make... there are many.
Again, you CAN do it.. but design has to be done... it's done with intent, not through random experimentation (unless you have money to burn).
My own experience with mostly-passively-cooled modern PCs is that while temperatures within remain low enough that everything continues to work fine on a hot day (if I switch off the window AC when I'm out for the day, things can exceed 110F ambient inside), there are localized failures of capacitors.
Specifically, the hottest caps fail first. Cooler caps fail later, or not at all, or are shown to be visibly in a lesser state of failure.
The hot caps are right next to and/or above the passive heatsink. The cooler caps are a little farther away, and/or lower (in terms of gravity).
In many cases, these capacitors are identical and wired in parallel.
So, electrically, things are exactly the same. The only difference is temperature.
Just throwing that out there. (I try to keep cooling and airflow to a minimum to reduce noise.)
Kid-proof tablet..
Why not ask the engineers who designed the chips? They have quite a bit of data on how their transistors and wires behave at specified temperatures. It's only the latest (untested) materials that have unknowns. This trial and error approach is like playing darts in a dark room: you may hit the target once in a while but your overall accuracy will suck.
Therefore if you run hotter, the cooling of that hotter air or extraction to work is better.
Heat your company's hot water tank from the hot air from the server room and you save energy twice.
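To put a number on why hotter air is cheaper to cool: the ideal (Carnot) coefficient of performance for a cooling machine is T_cold / (T_hot - T_cold), with temperatures in kelvin. The sketch below uses that textbook upper bound with made-up setpoints; real chillers achieve far less, but the trend is the same.

```python
def carnot_cooling_cop(cold_c: float, hot_c: float) -> float:
    """Ideal upper bound on heat removed per unit of work when pumping heat
    from a space at cold_c to surroundings at hot_c (both in Celsius)."""
    if hot_c <= cold_c:
        # Outside is cooler than the room: no heat pump needed (free cooling).
        raise ValueError("no mechanical cooling needed when outside is cooler")
    cold_k = cold_c + 273.15
    return cold_k / (hot_c - cold_c)


# Hypothetical: heat is rejected to 35 C outside air.
outside_c = 35.0
for setpoint_c in (20.0, 25.0, 30.0):
    cop = carnot_cooling_cop(setpoint_c, outside_c)
    print(f"setpoint {setpoint_c:.0f} C: ideal COP ~{cop:.1f}")
```

The smaller the gap between the room and the outside air, the less work each unit of removed heat costs, and once the setpoint is above the outside temperature no chiller work is needed at all.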