The Risks and Rewards of Warmer Data Centers
1sockchuck writes "The risks and rewards of raising the temperature in the data center were debated last week in several new studies based on real-world testing in Silicon Valley facilities. The verdict: companies can indeed save big money on power costs by running warmer. Cisco Systems expects to save $2 million a year by raising the temperature in its San Jose research labs. But nudge the thermostat too high, and the energy savings can evaporate in a flurry of server fan activity. The new studies added some practical guidance on a trend that has become a hot topic as companies focus on rising power bills in the data center."
Locate the server farm in Antarctica!
http://www.geoffreylandis.com
1. Get a thermostat you can control with a computer
2. Give the computer inputs of temperature and energy use, and output of heating/cooling
3. Write a program to minimize energy use (genetic algorithm?)
4. Profit!!
Possible problem: do we need to factor in some increased wear & tear on the machines for higher temperatures? That would complicate things.
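Steps 1 to 3 above could be roughed out like this. It is only a sketch: the power model and every number in it are invented, and a plain sweep over candidate setpoints stands in for the genetic algorithm. In a real setup `total_power_kw` would be replaced by actual meter readings taken at each setpoint, and it does not account for the wear-and-tear question.

```python
def total_power_kw(setpoint_c):
    """Hypothetical model of total power draw at a given room setpoint.

    All coefficients are made up for illustration only.
    """
    # Assumed: cooling load falls linearly as the room is allowed to run warmer.
    cooling = max(0.0, 100.0 - 3.0 * (setpoint_c - 18.0))
    # Assumed: server fans start ramping up above 25 C and cost more the hotter it gets.
    fans = 10.0 + 0.5 * max(0.0, setpoint_c - 25.0) ** 2
    return cooling + fans


def best_setpoint(candidates, measure):
    """Return the candidate setpoint with the lowest measured power."""
    return min(candidates, key=measure)


# Candidate setpoints from 18.0 C to 32.0 C in half-degree steps.
candidates = [18 + 0.5 * i for i in range(29)]
best = best_setpoint(candidates, total_power_kw)
print(best, total_power_kw(best))
```

With this toy model the minimum sits at the point where extra fan power starts to outweigh the cooling savings, which is the trade-off the summary describes.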
No they didn't - what they did do is figure out that increased temperature is not correlated to higher failure rates - the failure rates don't magically decrease as it gets hotter.
Here's the link for your review: http://hardware.slashdot.org/story/07/02/18/0420247/Google-Releases-Paper-on-Disk-Reliability
I realise that this is not something that could be done quickly; it would require co-operation from all major vendors, and then only if it would actually end up being more efficient overall. There would be lots of hurdles to overcome too... Efficient ducting (no jagged edges or corners like in domestic HVAC ductwork), no leaks, easy interconnects, space requirements, rerouting away from inactive equipment, etc. You would still need some AC in the room, as there is bound to be heat leakage from the ductwork, as well as heat given off from less critical components, but the level of cooling required would be much less if the bulk of the heat was ducted straight outside.
So I know the implementation of something like this would be monumental, requiring redesigning of servers, racks, cabinets and general DC layout. It would probably require standards to be laid out so that any server will work in any cab etc (like current rackmount equipment is fairly universally compatible), but after this conversion, could it be more efficient and pay off in the long run?
Just thinking out loud.
Tom...
Well, if you have a large cluster, you can load balance based on CPU temp to maintain a uniform junction temp across the cluster. Then all you need to do is maintain just enough A/C to keep the CPU cooling fans running slow (so there is excess cooling capacity to handle a load spike, since the A/C can only change the temp of the room so quickly).
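A minimal sketch of temperature-aware placement, assuming each node reports a CPU temperature and each job warms its node by a fixed amount. The node names and the `heat_per_job_c` figure are hypothetical; a real scheduler would re-read the sensors instead of guessing.

```python
def pick_coolest(temps_c):
    """Return the name of the node with the lowest reported CPU temperature."""
    return min(temps_c, key=temps_c.get)


def assign_jobs(temps_c, jobs, heat_per_job_c=2.0):
    """Greedily place each job on whichever node is currently coolest.

    heat_per_job_c is a crude assumed temperature rise per job.
    """
    temps = dict(temps_c)  # work on a copy so the caller's readings are untouched
    placement = {}
    for job in jobs:
        node = pick_coolest(temps)
        placement[job] = node
        temps[node] += heat_per_job_c
    return placement


readings = {"a": 50.0, "b": 44.0, "c": 47.0}
print(assign_jobs(readings, ["j1", "j2", "j3"]))
```

Placing work on the coolest node first pulls the temperatures toward each other, which is the "uniform junction temp" idea in its simplest form.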
Or, you can just bury your data center in the antarctic ice and melt some polar ice cap directly.
I know it was meant as a joke, but moving to colder climates may not be such a bad idea. Moving to a northern country such as Canada or Norway, you would benefit from the colder outside temperature in the winter to keep the servers cool, and then any heat produced could be funnelled to keeping nearby buildings warm. The real challenge will be keeping any humidity out, but considering how dry the winter air can get there, it may not be an issue.
All this said and done, it is important to work out the sweet spot between cooling the room less to save energy and keeping the server fans from turning on. I would be curious to know whether any solutions exist that link the system temperature monitors into a central system, which is in turn linked to the room's climate control system.
Jumpstart the tartan drive.
If you save energy by having warmer data centers, but it shortens the MTBF, is it really that big of a deal?
Let's say the hardware is rated for five years. Let's say that running it hotter than the recommended specifications shortens that to three years.
But in three years, new and more efficient hardware will probably replace it anyway because it will require, let's say, 150 watts instead of 200 watts, so the old hardware would get replaced anyway because the new hardware will cost less to run in those lost two years.
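For scale, here is the arithmetic behind that 200 W versus 150 W comparison, assuming the box runs flat out around the clock. The electricity price is a hypothetical parameter, and cooling overhead and purchase cost are left out.

```python
HOURS_PER_YEAR = 24 * 365


def energy_kwh(watts, years):
    """Energy used by a constant load over the given number of years, in kWh."""
    return watts * HOURS_PER_YEAR * years / 1000.0


def cost(kwh, price_per_kwh):
    """Electricity cost for the given energy at an assumed flat price."""
    return kwh * price_per_kwh


# The two "lost" years from the example: old 200 W server vs new 150 W server.
saved_kwh = energy_kwh(200, 2) - energy_kwh(150, 2)
print(saved_kwh)               # kWh saved per server over two years
print(cost(saved_kwh, 0.10))   # at an assumed 0.10 per kWh
```

That works out to 876 kWh per server over the two years, so whether the swap pays for itself depends on the price of power and of the replacement hardware.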
"Sure, the fans kick in and you aren't saving as much, but are you still saving? I suspect you still are, there is a reason you are told to run ceiling fans in your house even with the AC on."
If only someone would do a study based on real-world testing, we could be sure... Oh, wait...
There are several differences between ceiling fans and server fans. You can't use one to make predictions about the other. "Using one large fan to increase airflow in a room is a more efficient way for people to feel cooler than using AC to actually drop the temp a few extra degrees", but this does not imply that "running a bunch of little fans to individually increase heat sink efficiency in each of a number of computers would be more efficient than just keeping the room cool enough for those heat sinks to do their job in the first place".
Fahrenheit backwards? That shit was metric before the Metric System even existed.
To wit:
0F is about as cold as it gets, and 100F is about as hot as it gets.
See? Metric.
The studies were not long enough to constitute a very in-depth analysis. It would take a multi-month study, or even a year-long one, to analyze all the effects of raising temperatures.
For example, little consideration was given to:
1) Mechanical Part wear (increased fan wear, component wear due to heat)
2) Employee discomfort (80 degree server room?)
3) Part failure*
*If existing cooling solutions had issues, it would be a shorter time between the issue and additional problems since you have cut your window by ~15 degrees.
It's all fun and games till someone divides by 0. Then it's hilarious.
For starters, people sweat and computers do not. So, airflow helps cool people by increasing evaporation, in addition to direct thermal transfer. Even when you think you aren't sweating, your skin is still moist and evaporative cooling still works.
Unless someone invents a CPU swamp cooler, that's just not happening on a computer. You do need airflow to keep the hot air from remaining close to the hot component (this can be convection or forced), but you don't get that extra... let's call it "wind chill" effect that humans feel.
Fahrenheit backwards? That shit was metric before the Metric System even existed.
To wit:
0F is about as cold as it gets, and 100F is about as hot as it gets.
You're right for the 40th parallel or so. But there are parts of the world that routinely dip below 0 deg F (-18 deg C) and other parts that routinely climb above 100 deg F (38 deg C). Things like that are why SI switched from Fahrenheit and Rankine to Celsius and Kelvin.
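The parenthetical conversions above are easy to check with the standard Fahrenheit-to-Celsius formula; the function name is just for illustration.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


for deg_f in (0, 32, 100, 212):
    print(deg_f, "F =", round(f_to_c(deg_f), 1), "C")
```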
And yet the temperature here measured in F gets negative every winter. And where I previously lived it got above 100F every summer (and it also does where I am now, but only a day or three each year).
But in both those places a temperature of 0C was the freezing point of water, and 100C the boiling point. Yes that 100C one isn't so useful in terms of daily temperature, the 0C is though since whether water will freeze or not is the main transition point in daily temperature.
I'm less concerned with the fine-tuning of the environment for servers than I am with getting the basics right. How many bad server room implementations have you seen?
I'm sitting in one. We used to have a half-dozen built-for-the-purpose Liebert units scattered around the periphery of the room. The space was properly designed and the hardware maintained whatever temp and humidity we chose to set. They were expensive to run and maintain but they did their job and did it right.
About seven years ago, the bean-counting powers-that-be pronounced them "too expensive" and had them ripped out. The replacement central system pumps cold air under the raised floor from one central point. Theoretically, it could work. In practice, it was too humid in here the first day.
And the first week, month, and year. We complained. We did simple things to demonstrate to upper management and building management that it was too humid in here, things like storing a box of envelopes in the middle of the room for a week and showing management that they had sealed themselves due to excessive humidity.
We were, in every case, rebuffed.
A few weeks ago, a contractor working on phone lines under the floor complained about the mold. *HE* got listened to. Preliminary studies show both Penicillium (relatively harmless) and black (not so harmless) mold in high concentrations. Lift a floor tile near the air input and there's a nice thick coat of fluffy, fuzzy mold on everything. There's mold behind the sheetrock that sometimes bleeds through when the walls sweat. They brought in dehumidifiers that are pulling more than 30 gallons of water out of the air every day. The incoming air, depending on who's doing the measuring, is at 75% to 90% humidity. According to the first independent tester who came in, "Essentially, it's raining" under our floor at the intake.
And the areas where condensation is *supposed* to happen and drain away? Those areas are bone dry.
IOW, our whole system was designed and installed without our input and over our objections by idiots who had no idea what they were doing.
So, my fellow server room denizens, please keep this in mind - When people (especially management types) show up with studies that support the view that the way the environment is controlled in your server room can be altered to save money, be afraid. Be very afraid. It doesn't matter how good the basic research is or how artfully it could be employed to save money without causing problems, by the time the PHBs get ahold of it, it'll be perverted into an excuse to totally screw things up.
I was at a Google presentation on this last night. If I remember correctly, they found the 'ideal' temperature for running server hardware without decreasing lifespan to be about 45 C.
In the beginning, there was null.
If there is a failure of AC ... that is, either Air Conditioning OR Alternating Current, you can see a rapid rise in temperature. With all the systems powered off, the latent heat inside the equipment, which is much higher than the room temperature, emerges and raises the room temperature rapidly. And if the equipment is still powered (via UPS when the power fails), the rise is much faster.
In a large data center I once worked at, with 8 mainframes and 1800 servers, power to the entire building failed after several ups and downs in the first minute. The power company was able to tell us within 20 minutes that it looked like a "several hours" outage. We didn't have the UPS capacity for that long, so we started a massive shutdown. Fortunately it was all automated, and the last servers finished their current jobs and powered off in another 20 minutes. In those 40 minutes, the server room, normally kept around 17C, was up to a whopping 33C. And even with everything powered off, it peaked at 38C after another 20 minutes. If it weren't so dark in there I think some people would have been starting a sauna.
We had about 40 hard drive failures and 12 power supply failures coming back up that evening. And one of the mainframes had some issues.
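A back-of-the-envelope look at those numbers, assuming the room heats at a constant rate (it won't, exactly). The 25C starting point is a hypothetical warmer setpoint, not something from the outage described.

```python
def rise_rate_c_per_min(start_c, end_c, minutes):
    """Average temperature rise in degrees C per minute over an interval."""
    return (end_c - start_c) / minutes


def minutes_until(limit_c, start_c, rate_c_per_min):
    """Minutes to reach limit_c from start_c, assuming a constant rise rate."""
    return (limit_c - start_c) / rate_c_per_min


on_ups = rise_rate_c_per_min(17, 33, 40)       # equipment still running on UPS
powered_off = rise_rate_c_per_min(33, 38, 20)  # residual heat after shutdown

# Time to hit 33C from the normal 17C room vs a hypothetical 25C room.
window_from_17 = minutes_until(33, 17, on_ups)
window_from_25 = minutes_until(33, 25, on_ups)
print(on_ups, powered_off, window_from_17, window_from_25)
```

At roughly 0.4C per minute, a room that starts 8 degrees warmer would reach the same 33C in about half the time, which is the shrinking safety window mentioned earlier in the thread.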
now we need to go OSS in diesel cars
UPS batteries are sealed lead-acid and they definitely benefit from being kept cooler. It's also good to keep them in a separate room, usually close to your main power switching. As far as servers are concerned, I've always been happy with an ambient room temp of about 22 or 23, provided air-flow is good so you don't get hot-spots, and it makes for a more pleasant working environment (although with remote management I generally don't need to actually work in them for long periods of time).
I am a little skeptical, since most hard drive failures I've had have been right after an air conditioning outage. The Google paper uses temperature obtained from SMART, which is usually 10 to 15C higher than the ambient temperature in the room, and the tail of their sample falls off rapidly over 40C. What would the SMART temperature be if the ambient temperature was 40 or so? Probably 60 or above. Their graphs don't go that high.
But we're talking raising the temperature of a data center only 2 or 3 deg. Meat lockers are not helpful. Moral of the story? Maybe spend your cooling bucks on your storage, then let the rest of your systems eat their exhaust. I have some new Juniper routers, no moving parts inside except fans - the yellow alarm doesn't kick off until 70C and the machine doesn't shut down until 85C.
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
"Wouldn't it be fun to be a head engineer at one of the bigger companies and be able to test it out :)"
Oh really?
Let's see your proposal, your test criteria, your plan.
Let's see your budget... cut it in half
Now for risk analysis, what if you're right and the servers all fail sooner than expected (i.e. sooner than budgeted)?
Spend 3 weeks filling out red tape
Spend 2 weeks waiting.
OK, you can run your study. Set up two racks in a closet and take measurements every day for a year.
Now write up the review.
Alright, thanks for your study, but our lawyers have advised us that it wasn't peer reviewed and published in a respected compsci journal and therefore we can't do anything with it, or the insurance wouldn't cover us and we'd be liable for deaths resulting from servers or something.
File in circular file or far back of filing cabinet never to be seen again until you're clearing out your office because they had to let you go because server replacement costs were too high to keep you on the payroll.
-1 disagree is not a modifier for a reason. -1 troll, flamebait, redundant, overrated are NOT acceptable substitutes.