The Risks and Rewards of Warmer Data Centers
1sockchuck writes "The risks and rewards of raising the temperature in the data center were debated last week in several new studies based on real-world testing in Silicon Valley facilities. The verdict: companies can indeed save big money on power costs by running warmer. Cisco Systems expects to save $2 million a year by raising the temperature in its San Jose research labs. But nudge the thermostat too high, and the energy savings can evaporate in a flurry of server fan activity. The new studies added some practical guidance on a trend that has become a hot topic as companies focus on rising power bills in the data center."
Locate the server farm in Antarctica!
http://www.geoffreylandis.com
1. Get a thermostat you can control with a computer
2. Give the computer inputs of temperature and energy use, and output of heating/cooling
3. Write a program to minimize energy use (genetic algorithm?)
4. Profit!!
Possible problem: do we need to factor in some increased wear & tear on the machines for higher temperatures? That would complicate things.
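For what it's worth, step 3 can be sketched without a genetic algorithm: with only one knob (the setpoint), a plain search over candidate temperatures is enough. Everything below is a toy model. The functions `cooling_power_kw` and `fan_power_kw` and all their coefficients are made-up assumptions, standing in for whatever the real temperature and energy inputs from step 2 would tell you.

```python
# Toy setpoint search. All coefficients are illustrative assumptions,
# not measurements from any real data center.

def cooling_power_kw(setpoint_c):
    """Assumed chiller draw: falls linearly as the setpoint rises."""
    return max(0.0, 100.0 - 3.0 * (setpoint_c - 18.0))

def fan_power_kw(setpoint_c):
    """Assumed server fan draw: flat up to 25 C, then grows with the cube of the excess."""
    excess = max(0.0, setpoint_c - 25.0)
    return 10.0 + excess ** 3

def total_power_kw(setpoint_c):
    """Total modelled draw: chiller plus server fans."""
    return cooling_power_kw(setpoint_c) + fan_power_kw(setpoint_c)

def best_setpoint(candidates):
    """Return the candidate setpoint with the lowest modelled total power."""
    return min(candidates, key=total_power_kw)

if __name__ == "__main__":
    candidates = [18.0 + 0.5 * i for i in range(29)]  # 18.0 C to 32.0 C
    best = best_setpoint(candidates)
    print(f"best setpoint: {best:.1f} C, total {total_power_kw(best):.1f} kW")
```

The wear-and-tear worry could be folded in as one more cost term inside `total_power_kw`, if anyone had numbers for it.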
Google did a study that said the MTBF for HDD decreases significantly with each warmer degree of temperature.
I read that as "The Risks and Rewards of Warner Data Centers", see the previous news item, "Time Warner Cable Modems Expose Users"
I'm getting old.
so, this is all good and well, but for us simple sysadmins who run just a few servers in a closet room, what does it mean?
in my case, I have 8 servers and 1 12k BTU AC, currently set at 22 degrees Celsius. is this in line with the recommendations?
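A rough sanity check on the closet setup above, as a sketch. It assumes "12k BTU" means a rated 12,000 BTU/h of cooling and that nearly all the power the servers draw ends up as heat in the room. The function names are made up for illustration. It says nothing about whether 22 degrees is the right setpoint, only how much heat the unit can nominally remove.

```python
BTU_PER_HOUR_TO_WATTS = 0.29307107  # 1 BTU/h expressed in watts

def cooling_capacity_watts(btu_per_hour):
    """Convert a rated cooling capacity from BTU/h to watts."""
    return btu_per_hour * BTU_PER_HOUR_TO_WATTS

def per_server_budget_watts(btu_per_hour, servers):
    """Average heat each server may produce before the AC is at its rated limit."""
    return cooling_capacity_watts(btu_per_hour) / servers

if __name__ == "__main__":
    total = cooling_capacity_watts(12_000)
    each = per_server_budget_watts(12_000, 8)
    print(f"capacity: {total:.0f} W, about {each:.0f} W per server")
```

That comes to roughly 3.5 kW in total, so 8 servers leave a bit under 440 W each on average, before counting lights, UPS losses, or heat leaking in through the walls.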
Sure, the fans kick in and you aren't saving as much, but are you still saving? I suspect you still are, there is a reason you are told to run ceiling fans in your house even with the AC on.
The thermal modeling for all this isn't that difficult. You can get power consumption, fan speeds, temp, etc. and feed them into a pretty accurate plant model that should be able to adjust temperature on the fly for optimal efficiency. Or I guess we can hire a company to form a bunch of committees to do a bunch of studies and come up with a bunch of papers that state the obvious.
I'll bite... We all know that a couple of degrees can save a good bit over a year's time. (Somewhere around 5 to 10% IIRC) Will a couple of degrees make that much of a difference? Likely not in the general lifespan of the equipment. Will A LOT of degrees make a difference? I'm willing to bet so.
oh wait, you mean the coal, chemical, oil, and nuclear companies already have heated the rivers up and killed a bunch of fish? damnit.
maybe we can use people as a heatsink? 'help wanted, must love drinking water and pee-ing'
I thought the internet was free (or so people keep telling me). You mean it actually costs these companies money to maintain the connections??? Wow. I guess my $15/month bill actually serves a purpose after all.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
80 whats? Obviously they mean 80F (running a temperature at 80K, 80C or 80R would be insane), but you should always specify units (especially if you're using some backwards units like Fahrenheit!)
IranAir Flight 655 never forget!
I realise that this is not something that could be done quickly; it would require co-operation from all major vendors, and then only if it would actually end up being more efficient overall. There would be lots of hurdles to overcome too... Efficient ducting (no jagged edges or corners like in domestic HVAC ductwork), no leaks, easy interconnects, space requirements, rerouting away from inactive equipment etc etc etc. You would still need some AC in the room as there is bound to be heat leakage from the duct-work, as well as heat given off from less critical components, but the level of cooling required would be much less if the bulk of the heat was ducted straight outside.
So I know the implementation of something like this would be monumental, requiring redesigning of servers, racks, cabinets and general DC layout. It would probably require standards to be laid out so that any server will work in any cab etc (like current rackmount equipment is fairly universally compatible), but after this conversion, could it be more efficient and pay off in the long run?
Just thinking out loud.
Tom...
Well, if you have a large cluster, you can load balance based on CPU temp to maintain a uniform junction temp across the cluster. Then all you need to do is maintain just enough A/C to keep the CPU cooling fans running slow (so there is excess cooling capacity to handle a load spike since the A/C can only change the temp of the room so quickly)
Or, you can just bury your data center in the Antarctic ice and melt some polar ice cap directly.
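A minimal sketch of the load-balancing idea above (the serious one, not the ice cap): hand each new job to whichever node currently reports the lowest CPU temperature. The node names, the per-job temperature bump `heat_per_job_c`, and the function `assign_jobs` are all hypothetical. A real scheduler would re-read sensor values instead of guessing the bump.

```python
def assign_jobs(temps_c, jobs, heat_per_job_c=3.0):
    """Greedy placement: each job goes to the currently coolest node.

    temps_c maps node name -> current CPU temperature in C.
    heat_per_job_c is an assumed temperature rise per job, for illustration only.
    Returns (placement, projected_temps).
    """
    temps = dict(temps_c)  # copy so the caller's readings are untouched
    placement = {}
    for job in jobs:
        node = min(temps, key=temps.get)  # coolest node right now
        placement[job] = node
        temps[node] += heat_per_job_c
    return placement, temps

if __name__ == "__main__":
    readings = {"node-a": 50.0, "node-b": 55.0, "node-c": 60.0}
    placement, projected = assign_jobs(readings, ["j1", "j2", "j3", "j4"])
    print(placement)
    print(projected)
```

The point is only that the spread between the hottest and coolest node shrinks as jobs are placed, which is what lets the room A/C run with less margin.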
We have replaced Tom's Decaf with DOUBLE ESPRESSO this morning, let's see if he's noticed the difference..
They did try using 40 year old female virgins as heat sinks but there was too much icing.
I used to have a Pentium 4 Prescott, and the truth is processors can run significantly above spec (hell, the thing would go above the "max temp" just opening Notepad). It's already been shown that higher temps don't break HDDs; are the downsides of running the processor a few degrees hotter significant, or can they be ignored?
I know it was meant as a joke, but moving to colder climates may not be such a bad idea. Moving to a northern country such as Canada or Norway, you would benefit from the colder outside temperature in the winter to keep the servers cool, and then any heat produced could be funnelled to keeping nearby buildings warm. The real challenge will be keeping any humidity out, but considering how dry the air can get there during the winter, it may not be an issue.
All this said and done, trying to work out the sweet spot between not cooling a room to save energy and not having the server fans turn on is important. I would be curious to know whether any solutions exist that allow the system temperature monitors to be linked into a central system, which is then linked to the room's climate control system.
Jumpstart the tartan drive.
If you save energy by having warmer data centers, but it shortens the MTBF, is it really that big of a deal?
Let's say the hardware is rated for five years. Let's say that running it hotter than the recommended specifications shortens that to three years.
But in three years, new and more efficient hardware will probably replace it anyway because it will require, let's say, 150 watts instead of 200 watts, so the old hardware would get replaced anyway because the new hardware will cost less to run in those lost two years.
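To put a number on those two lost years, here is the arithmetic for the 200 watt vs. 150 watt example as a sketch. The electricity price of $0.10 per kWh is purely an assumption for illustration, and the function names are made up. Cooling overhead and the price of the replacement hardware are left out.

```python
HOURS_PER_YEAR = 24 * 365

def energy_saved_kwh(old_watts, new_watts, years):
    """Energy saved by running the lower-wattage hardware continuously."""
    return (old_watts - new_watts) * HOURS_PER_YEAR * years / 1000.0

def cost_saved(old_watts, new_watts, years, price_per_kwh):
    """Electricity cost saved at an assumed flat price per kWh."""
    return energy_saved_kwh(old_watts, new_watts, years) * price_per_kwh

if __name__ == "__main__":
    kwh = energy_saved_kwh(200, 150, 2)
    dollars = cost_saved(200, 150, 2, price_per_kwh=0.10)  # assumed price
    print(f"{kwh:.0f} kWh saved, about ${dollars:.2f} per machine")
```

So the 50 watt difference works out to 876 kWh per machine over two years. Whether that pays for the replacement depends on the price per kWh, how much extra cooling each watt costs, and what the new box costs.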
The studies were not long enough to constitute a very in-depth analysis. It would take a multi-month study, possibly up to a year, to analyze all the effects of raising temperatures.
For example, little was considered with:
1) Mechanical Part wear (increased fan wear, component wear due to heat)
2) Employee discomfort (80 degree server room?)
3) Part failure*
*If existing cooling solutions had issues, it would be a shorter time between the issue and additional problems since you have cut your window by ~15 degrees.
It's all fun and games till someone divides by 0. Then it's hilarious.
I've RTFA and the article lacked most of the information that was discussed in the summary. It doesn't really explain about the many risks of higher temperatures, only about the cost savings of raising the temperature.
With modern cooling infrastructure, are cooling costs that high? At my datacenter, cooling isn't that expensive. The chiller units are expensive to buy, but the price for electricity and chilled water isn't that high.
Don't people know what happens when computer equipment is exposed to high temperatures? Hardware failures increase, hard drives may fail sooner, and fans will be running way faster (TFA mentions that one).
Running a datacenter too cool isn't good, either. Your staff will be freezing and you'll be wasting money on chiller maintenance.
Yes, but if you have the room at the tipping point what does this do to your ability to recover from a fault? I know one reason many datacenters have experienced outages even with redundant systems is that the AC equipment is almost never on UPS and so it takes some time for them to recover after switching to generators. If you are running 10F hotter doesn't that mean you have that much less time for the AC to recover before you start experiencing problems? For a large company with redundant datacenters or in Cisco's case where they are mostly development labs it probably is worth the risk, but for your average small to midsized corporate datacenter it's probably smarter to stay with the tried and true.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
The use of SSDs in data centers can dramatically impact power usage and temperature management costs:
"The power savings for the SSD-based systems is about 50 percent, and the overall cooling savings are 80 percent, according to the white paper. These savings are significant for a datacenter that spends 40 percent of its budget on power and cooling, and they're bound to make other datacenter operators sit up and take notice." http://arstechnica.com/business/news/2009/10/latest-migrations-show-ssd-is-ready-for-some-datacenters.ars
While MTBF and unit cost are still concerns, the potential savings will likely see more centers moving in this direction.
I'm less concerned with the fine-tuning of the environment for servers than I am with getting the basics right. How many bad server room implementations have you seen?
I'm sitting in one. We used to have a half-dozen built-for-the-purpose Liebert units scattered around the periphery of the room. The space was properly designed and the hardware maintained whatever temp and humidity we chose to set. They were expensive to run and maintain but they did their job and did it right.
About seven years ago, the bean-counting powers-that-be pronounced them "too expensive" and had them ripped out. The replacement central system pumps cold air under the raised floor from one central point. Theoretically, it could work. In practice, it was too humid in here the first day.
And the first week, month, and year. We complained. We did simple things to demonstrate to upper management and building management that it was too humid in here, things like storing a box of envelopes in the middle of the room for a week and showing management that they had sealed themselves due to excessive humidity.
We were, in every case, rebuffed.
A few weeks ago, a contractor working on phone lines under the floor complained about the mold. *HE* got listened to. Preliminary studies show both penicillium (relatively harmless) and black (not so harmless) mold in high concentrations. Lift a floor tile near the air input and there's a nice thick coat of fluffy, fuzzy mold on everything. There's mold behind the sheetrock that sometimes bleeds through when the walls sweat. They brought in dehumidifiers that are pulling more than 30 gallons of water out of the air every day. The incoming air, depending on who's doing the measuring, is at 75% to 90% humidity. According to the first independent tester who came in, "Essentially, it's raining" under our floor at the intake.
And the areas where condensation is *supposed* to happen and drain away? Those areas are bone dry.
IOW, our whole system was designed and installed without our input and over our objections by idiots who had no idea what they were doing.
So, my fellow server room denizens, please keep this in mind - When people (especially management types) show up with studies that support the view that the way the environment is controlled in your server room can be altered to save money, be afraid. Be very afraid. It doesn't matter how good the basic research is or how artfully it could be employed to save money without causing problems, by the time the PHBs get ahold of it, it'll be perverted into an excuse to totally screw things up.
I was at a Google presentation on this last night. If I remember correctly, they found the 'ideal' temperature for running server hardware without decreasing lifespan to be about 45 C.
In the beginning, there was null.
If there is a failure of AC ... that is, either Air Conditioning OR Alternating Current, you can see a rapid rise in temperature. With all the systems powered off, the latent heat inside the equipment, which is much higher than the room temperature, emerges and raises the room temperature rapidly. And if the equipment is still powered (via UPS when the power fails), the rise is much faster.
In a large data center I once worked at, with 8 mainframes and 1800 servers, power to the entire building failed after several ups and downs in the first minute. The power company was able to tell us within 20 minutes that it looked like a "several hours" outage. We didn't have the UPS capacity for that long, so we started a massive shutdown. Fortunately it was all automated and the last servers finished their current jobs and powered off in another 20 minutes. In that 40 minutes, the server room, normally kept around 17C, was up to a whopping 33C. And even with everything powered off, it peaked at 38C after another 20 minutes. If it weren't so dark in there I think some people would have been starting a sauna.
We had about 40 hard drive failures and 12 power supply failures coming back up that evening. And one of the mainframes had some issues.
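The numbers in that story give a rough heating rate, which is the thing to worry about when running warmer. A sketch, assuming the rise was roughly linear over those 40 minutes (it probably wasn't) and picking 35 C as a purely hypothetical "start shutting down" limit. The function names are made up for illustration.

```python
def rise_rate_c_per_min(start_c, end_c, minutes):
    """Average temperature rise per minute over an observed interval."""
    return (end_c - start_c) / minutes

def minutes_until_limit(start_c, limit_c, rate_c_per_min):
    """Time before the room reaches limit_c, assuming a constant rise rate."""
    return max(0.0, (limit_c - start_c) / rate_c_per_min)

if __name__ == "__main__":
    rate = rise_rate_c_per_min(17.0, 33.0, 40.0)   # figures from the outage above
    limit = 35.0                                   # assumed limit, not from the story
    for start in (17.0, 22.0):
        left = minutes_until_limit(start, limit, rate)
        print(f"start {start:.0f} C: about {left:.1f} min to {limit:.0f} C")
```

At that average rate of 0.4 C per minute, a room that starts at 22 C instead of 17 C has roughly 12 fewer minutes before hitting the same limit.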
now we need to go OSS in diesel cars
UPS batteries are sealed lead-acid and they definitely benefit from being kept cooler. It's also good to keep them in a separate room, usually close to your main power switching. As far as servers are concerned, I've always been happy with an ambient room temp of about 22 or 23, provided air-flow is good so you don't get hot-spots, and it makes for a more pleasant working environment (although with remote management I generally don't need to actually work in them for long periods of time).
describes temperatures using the Fahrenheit scale.
This is slashdot, OF COURSE you should use Nagios!
And to increase your /. Kung-Fu, buy an EM01;
http://www.nagios.org/products/environmental
Learn Nagios the FAN way;
http://fannagioscd.sourceforge.net/drupal/
or play with GroundWork, they're awesome;
http://www.groundworkopensource.com/community/community-edition.html
(Yes, I actually run this in a real data center, we eat our own dog food.)
we're damn tired of seeing that lose/loose error, in particular
Just spell it "luse", and everybody wins.
Yes, he got promoted to a higher level of management. At least he's now one step further removed from the actual facilities he manages and can no longer screw things up quite as directly as he did in the past.