Google Demands Higher Chip Temps From Intel
JagsLive writes "When purchasing server processors directly from Intel, Google has insisted on a guarantee that the chips can operate at temperatures five degrees centigrade higher than their standard qualification, according to a former Google employee. This allowed the search giant to maintain higher temperatures within its data centers, the ex-employee says, and save millions of dollars each year in cooling costs."
Wouldn't Intel run into physical limitations that simply don't allow chips to run at that low a temperature? I'm surprised Google isn't considering moving some of its data centres to Arctic locations where you get cool temperatures year-round. We've seen reports of appealing places like that on Slashdot before. (Of course, that would just be a short-term fix before we move the Earth to a farther orbit around the sun to avoid suffocating in our own waste heat like the Puppeteers in Niven's Ringworld.)
Uhhhh. Wouldn't making chips a bit more efficient be better, as opposed to making them "less likely to burn out at higher temps"?
Seems that Google's not really thinking green in this case (despite the pretension to do so in others), unless they plan on making use of the datacenter heat elsewhere.
If you don't have the clout of a Google-sized organization to buy higher-rated chips from Intel, I wonder if you can basically achieve the same thing by underclocking. An underclocked chip will run cooler, but I don't know if it'll run more stably at higher temps, although I think it would.
Does anyone have any experience with doing this?
I think it'd be interesting to see whether the cost savings in power and cooling are offset by the cost of the performance losses.
You see? You see? Your stupid minds! Stupid! Stupid!
Not mentioned in the story. What CPU are they talking about, and what is the upper end Google is looking for?
(and this having to wait five minutes between posts is moronic. Look at my posting history, and all of them from the same IP address. Tell me why I have to wait this long to post.)
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
Under-clocking them a bit can't be that hard to do.
When in college, I heated my crappy little shack by putting 150W bulbs in every light. It was like my own little Easy-Bake oven.
So Google claims they're more environmentally friendly... but burn through chips faster.
When you are a big company that spends enough money, you can ask for this sort of thing and your demand will be met.
"Guarantee us a higher temp CPU or we will switch to AMD...and tell everyone about it."
I have a feeling that the CPUs can handle a bit more temp than they are rated as a CYA move by Intel, anywho.
Bearded Dragon
This sounds like a scenario where lawyers are trying to act as engineers. That works about as well as you might expect.
There are these engineering things, amusingly called "shmoo plots", that map out a chip's operating envelope of voltage versus speed versus temperature. From those an engineer can foresee how hot you can run a chip before its rise and fall time margins start to get marginal.
There is very little Intel can do to stretch things by another 5 degrees. It's not something that can be imposed by fiat. Intel engineers have already juggled all the variables to come up with the best performance possible. SOMETHING is going to have to give. Either the chips will have to be selected and graded for speed, lowering the overall envelope for the chips everyone else gets, or they'll have to fudge some other parameters, hoping nobody will notice, or worse yet they'll tweak some variable right to the edge of raggedness, resulting in worse reliability down the road.
Lawyers and accountants generally don't know you can't have everything. Let's hope the engineers educate them.
Other businesses have this same questionable practice. For example, Walmart requires special packaging from its suppliers that is not normally afforded to other retailers. Broadcom, Microsoft, and Nvidia likely have a few cozy agreements that are exclusive and hushed. A possible example here might be the ACPI standard and how it seems to "just work" in Windows but struggle in some cases with *nix.
It certainly gives Google a cost advantage, and I can imagine why they vehemently deny it in TFA as I glance over the justice department article. Although whatever gains Google makes in cooling, they may just as easily have lost in a more power-hungry architecture overall:
http://people.freebsd.org/~brooks/papers/bsdcon2003/fbsdcluster.pdf has experienced it, and his 2007 update also confirms.
I'm left wondering what AMD might do for its biggest customers?
Good people go to bed earlier.
just buy Intel?
What?
Odds are this is being driven by a data-center engineering team, who are looking at the cost savings of running their data center 5 degrees hotter.
You don't get what you don't ask for.
Intel will do exactly as much engineering as necessary to keep their target market up, and no more.
If the market wants chips that operate 5 degrees hotter.. the engineers will do their job and see if it can be done. Intel will charge a premium for this.
That's business.
Google said recently that it runs its data centers at 80 degrees Fahrenheit as an energy-saving strategy, so chips that support higher temperatures would mean fewer hardware failures in their data center. Most data centers operate in a temperature range between 68 and 72 degrees, and I've been in some that are as cold as 55 degrees. Lots of companies are rethinking this. In the Intel study on air-side economizers, they cooled the data center with air as warm as 90 degrees. ASHRAE is also considering a broader temperature range for data center equipment.
RichM
Data Center Knowledge
Hmm, they don't say if this is commercial (0..70) or industrial (-40..85) temperature range - I guess Intel chips are normally commercial range, so they've bumped them up to 75.
Then they can set the temperature to whatever they want. ;)
Most of the power supply systems for my servers, which are HP G3-5 systems of various U sizes, tend to waste more power as temperature goes up.
This has nothing to do with CPUs though. It is the power supplies on the machines. As temperature goes up, efficiency goes down. At around 80 degrees I noticed a significantly larger draw on the power supply with my ammeter.
I had a gaming system with two ATI 4870's and the 800 Watt power supply would crash my machine if I did not run the air conditioner and keep the room at 70 degrees after some fairly long Supreme Commander runs.
I noticed that the amperage would go up, and the power output would go down as temperature would go up.
I have not conducted any experiments in a lab setting with this stuff, but from experience, jacking the temperature up usually makes power supplies work harder and makes them less efficient.
-gc
Got Geometrodynamics? Awe, too hard to figure out? Too bad.
Yes, but the way I see this is:
Intel isn't arbitrarily going, "man, we could make chips that run ok 5 degrees hotter, but we're gonna piss everyone off by demanding more cooling. Just because we can." Most likely Intel is already doing the best it can, and getting a bunch of chips which vary in how good they are. And they're getting the same bunch of chips regardless of whether Google demands higher temps or not.
Google just gets a cherry-picked bunch, but the average over Intel's production is still the same. Hence everyone else is getting a worse selection. They get what remains after Google took the best.
It's a zero-sum game. The total load on the planet is the same. The same total bunch of chips exits Intel's fabs. On the total, no energy was conserved.
So Google's "going green" is at the cost of making everyone else less "green". They can willy-wave about how energy efficient they are, by simply dumping the difference on someone else.
That's not "going green", that's a predatory approach. Your computers could require on the average an extra 0.5W in cooling, so Google can willy-wave that theirs uses 1W less. They just dumped their costs and their "eco burden" to someone else.
It's akin to me willy-waving that I'm so green and produce less garbage than you... by dumping some of my garbage in random other people's garbage bins across the town. Yay, I'm so green now, you all can start worshipping me. Well, no, on the total the same amount of garbage is being produced, I just dumped it and the related costs on other people. That's not going green, that's being a predator.
I can see why a business might want to cut their own costs, and not care about yours. That's, after all, the whole "invisible hand" theory. But let me repeat it: on the whole no energy was conserved. They just passed off some of their cooling costs (energy _and_ money) to someone else.
A polar bear is a cartesian bear after a coordinate transform.
Chips near the centre of the wafer are higher quality. All Google is asking for are these chips instead of a random mix of those from all over the wafer. This is why some chips overclock far better than others even though they were produced in the same week at the same plant. They can presumably ask for this because they are buying such large quantities. It's quite a novel way of saving money though.
If they want the chips to run hotter, why not just use poorly conducting heat sinks?
I might be missing something here, but why would Google be demanding "higher" chip temps to save on cooling??
Surely they should be demanding lower chip temps.. or is it just a mistake in the headline?
...using more effective means to extract the waste heat from the processors they already have. Lower thermal resistance equals lower operating temperatures. As many boxes as they have, maybe they should invest in a large-scale refrigerant-based cooling system with tiny heat exchangers for each CPU. I envision overhead refrigerant lines with taps coming down into each cabinet to serve the procs in it. Each server could have quick-disconnect lines on the back for easy removal. No need to cool all that air, and you'd get very good thermal resistance figures.
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
Right on.
Of course, Intel will give them whatever they want because Google is such a large customer. And will then pay in terms of higher failure rates, hence warranty costs. And Google will notice the same thing, assuming they do decent data gathering on failures, and find out that this is a really bad idea because those failures cost them even more than Intel.
Seems to me like bean counters are trying to beat physics.
thegodmovie.com - watch it
I wouldn't say the post is off topic. When I read it first, I thought the title suggested Google wanted the processors to produce more heat.
I have to say I am a bit surprised. A CPU operating at a higher temperature will draw more power and thus produce more heat at the same performance point. This is one of many temperature dependencies in silicon circuits. Now, it's possible that Google's demand is that they can run at the same speed and power at the higher temperature, which means in reality they are underclocking a faster chip to run it warmer.
> There is very little Intel can do to stretch things by another 5 degrees. It's not something that can be imposed by fiat. Intel engineers have already juggled all the variables to come up with the best performance possible. SOMETHING is going to have to give. Either the chips will have to be selected and graded for speed, lowering the overall envelope for the chips everyone else gets, or they'll have to fudge some other parameters, hoping nobody will notice, or worse yet they'll tweak some variable right to the edge of raggedness, resulting in worse reliability down the road.
In the real world, processors don't fail (barring power spikes/motherboard failures that fry them). The consideration here is much more likely to involve legal concerns about the warranty or the temperature at which thermal throttling or shutdown occur. Most likely Google and Intel were both able to confirm that the processors would not fail during their expected lifetimes in Google's datacenters even when operating continuously at this new maximum load, which is why they agreed to amend the processor specifications. I sincerely doubt these CPUs are different from others in any way other than possibly the thermal protection setpoints that are pre-configured.
>> So Google's "going green" is at the cost of making everyone else less "green". They can willy-wave about how energy efficient they are, by simply dumping the difference on someone else.
The difference is that Google is going to actively exploit the ability of those hand-picked CPUs to run hotter. Chances are that the users who would have otherwise received those chips would not reap any energy savings from the capabilities.
At a minimum, Google is contributing here by forcing a vendor to differentiate chips that have a capability of running hotter from ones that don't. No matter who uses that capability, it's a benefit to the planet (versus the alternatives at least).
MadCow.
I used to have a sig, but I set it free and it never came back.
Hard disks. In fact, I am typically far more concerned with long-term issues with my data than with the computing itself. Not to mention, the CPU is NOT the only chip that can suffer from heat issues.
Alfred Spector (Google VP of Research & Special Initiatives) recently spoke at a recruiting event at my school. While going over some of the work he is in charge of, he mentioned that they had found that the optimal temperature to minimize hard disk failure was actually higher than what is generally accepted in most datacenters. Because of that, Google keeps their datacenters a few degrees warmer than most other companies do. Sounds like that's what this is about. Not necessarily saving on energy costs, but improving the reliability of their hard disks (and we know that Google has a huge interest in doing that)
There are two issues with higher operating temp.
One is that you get less drive current from your transistors, so you get less performance (which everyone seems to understand), but this is usually a fairly small effect for 5 degrees C.
The _big_ deal with 5 degrees C would be electromigration in interconnect metal, which goes up very quickly with temperature. So the difference in failure rates might be quite large.
If there was any deal at all, it's likely that the Intel engineers tried to remove some conservatism from their temperature estimates to see if they could squeeze out 5 degrees from the thermal budget, or perhaps information on the workload itself to get Intel to "bless" the higher data center temperature.
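To put a rough number on the electromigration point above: under Black's equation, lifetime scales with exp(Ea / kT), so the ratio of lifetimes at two temperatures is a plain Arrhenius factor. What follows is only a sketch. The 0.7 eV activation energy and the 70 C to 75 C step are assumptions picked for illustration, not Intel figures, and the function name is made up.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def em_acceleration(t_low_c, t_high_c, activation_ev=0.7):
    """Ratio of electromigration MTTF at t_low_c to MTTF at t_high_c.

    Uses only the Arrhenius term of Black's equation (same current density).
    activation_ev = 0.7 is an assumed value, not a vendor figure.
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (activation_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k)
    return math.exp(exponent)


if __name__ == "__main__":
    factor = em_acceleration(70.0, 75.0)
    print(f"Lifetime at 70 C is about {factor:.2f}x the lifetime at 75 C")
```

With those assumed inputs the factor lands near 1.4, which is why 5 degrees can matter much more for wear-out than for speed. A different activation energy moves the number a lot, so treat it as an order-of-magnitude feel only.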
Then just leave the windows open.
a *hot* topic, Google raises a cool argument." But, because you remind us of the Arctic option, i'll have to say YOU raise a cool argument/reminder.
But, is there a feasible way for various sloped shafts to be cored (or existing ones used, such as the former Super Conducting Semi Collider (or is it Semi Conducting Super Collider?), etc.) such that filtered draft air (sounds like beer, huh?) is blown past the chassis?
And, isn't there a way to decouple the processors from such numerically high boards? Can't these processors be (to bring up images of the Star Trek USS Enterprise (NCC-1701 D) central computer core, or even the main warp core, with vertical shafting, but horizontal/azimuthal projections) attached to shafts, and the ancillary wiring be attached down/up stream? Then, the cooling air could be better directed, controlled and overall flow demands reduced, to in essence, cut the high energy costs.
For visuals, see:
http://startrekspace.blogspot.com/2007/01/geordi-la-forge-and-his-warp-core.html
http://www.loony-archivist.com/lowerdecks/life.html
http://www.ussenterprise.co.uk/enterprise/entd/
I would envision that at the very least, Google can -- or already has underway -- plans to exploit polar or Canadian, or cold North Dakota type environs in which to shaft-locate their computers.
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"
Temperature is just the stable point across a heat flow. You could immerse an entire data center in liquid nitrogen, and you still only need to remove energy at the same rate it goes in. One problem with a lower temperature (relative to outside) is that the rate of leakage from outside is higher. That's fixed by having more/better insulation, in addition to raised solar panels covering the entire building.
Higher temperatures also increase resistance in conductors and electrical parts. That leads to more I²R power losses, which is energy that doesn't get used but still has to be removed. Superconductors would be a plus for some of the core power feeds, too (you still do have to cool them).
Computers should be fed power at the highest practical voltage to keep current low (I²R losses, again). Most power supplies are more efficient at the lower current levels. That means at least 230 volts, if not higher. Commodity power supplies handle that fine. It wouldn't take much design change to bring them up to 277 volts (US) or 347 volts (Canada). And of course there might be even better efficiency involved in delivering 340 volts DC right to the PSU designed for it (most AC ones have that voltage as an intermediate step, anyway).
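The voltage argument is just current-squared-times-resistance arithmetic: for a fixed load, current is power divided by voltage, so feed loss falls with the square of the voltage. Here is a minimal sketch, assuming a purely resistive feed and ignoring power factor. The 5 kW load and 0.05 ohm feed resistance are invented for the example.

```python
def feed_loss_watts(load_watts, volts, feed_ohms):
    """I^2 * R loss in a feed carrying load_watts at the given voltage.

    Assumes a purely resistive feed and ignores power factor.
    """
    current_amps = load_watts / volts
    return current_amps ** 2 * feed_ohms


if __name__ == "__main__":
    # Hypothetical rack: 5 kW load, 0.05 ohm of wiring resistance.
    for volts in (120.0, 230.0, 277.0):
        loss = feed_loss_watts(load_watts=5000.0, volts=volts, feed_ohms=0.05)
        print(f"{volts:.0f} V: {loss:.1f} W lost in the feed")
```

Doubling the voltage cuts the wiring loss to a quarter for the same load, and every watt not lost in the feed is a watt the cooling plant doesn't have to remove.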
And efficient coding can be a factor here, as well, at the scale Google operates. Fewer cycles used for a given transaction means less energy needed to carry it out. Then the CPU goes back to idle and the clock can be kicked back to low frequency. And fewer machines needed to do searching means more machines can be completely off.
now we need to go OSS in diesel cars
to supply Google with water that boils at 5 degree C less. That way Google will save millions of dollars on costs of making coffee & tea for their employees.
Switch to AMD, the servers run on lower power RB-DDR2 not FB-DDR2 and the chipsets are much cooler.
Replace the CPU coolers with more efficient TI to AIR coefficient sinks, thus allowing a warmer room temp as the sink is able to move heat more efficiently. Opterons also take advantage of load distribution, moving the running processes from core to core, allowing overall cores to remain cooler yet with warmer room temps.
It's a zero sum game if Intel does nothing within its capabilities to shift its production to higher temperature chips. Since they are getting paid a premium to provide them, they are motivated to tweak their processes to make all their CPUs more efficient. And if that's the case, your premise is completely wrong.
Isn't Google close enough to the ocean to pump cold water out of the depths to help pre- or post- cool air ?
Nullius in verba
> There is very little Intel can do to stretch things by another 5 degrees.
They'll either agree to the guarantee with their standard chips or they'll bin the chips just like they do with speed grades. My company does the same thing with the chips that it produces. We have a commercial temperature range and an industrial temperature range - they're the same chip, but some are binned for the higher temperature.
What? American businesses like saving money almost as much as they like making it. Environmentalism is not as big a motivator as profit, at least in the US. Make being efficient profitable long term, and some businesses will do it. Make it profitable short term and businesses will fall over one another to do it.
“Common sense is not so common.” — Voltaire
Intel demands that Google no longer require any sort of internet connectivity for the use of their search engine, as Intel spends hundreds of thousands a year in bandwidth costs.
>One way around it could be to locate datacenters at locations with natural cooling available like rivers and larger lakes.
Google has a large data center on the bank of the Columbia River in Oregon now...
It sounds like what Google's looking for isn't so much a technical change as a contractual one.
They basically want Intel to replace burnt-out CPUs that are run over spec. Chips are usually spec'd lower than what they're physically capable of (see also: Overclocking) to extend their MTTF to a point just beyond their warranty.
From the article:
"If the chips failed prematurely at these higher temperatures, the former Googler says, Intel was obliged to replace them at no extra charge."
CPUs aren't the only point of failure in an over-hot system. Thanks to recent reformulations of solder, a lot of the contact points on modern electronics easily separate at lower temperatures (see also: Xbox360 RROD)
Actually you could easily do this on the Norwegian Svalbard islands. Not only do they have Arctic conditions, reliable power supplies and high-speed fiber connections to mainland Norway/Europe, but it's also a special-status island group where citizens of any nation are free to live and work. Being part of Norway, it's under a stable, free and democratic government.
Because of its position close to the North Pole it's heavily used as a satellite communications site by NASA, JAXA and the European Space Agency.
There's even a University there.
Svalbard Satellite station (SvalSat) was established in 1997 and the rapid expansion of the ground station is changing this perspective. SvalSat is recognized, not only as the northernmost, but also the best-located ground station in the world. The extreme northern location on the Svalbard archipelago, at 78°13' N, gives SvalSat its unique and favourable position. The satellite coverage at this latitude holds unique opportunities and SvalSat is the only commercial ground station in the world able to provide all-orbit-support (14 of 14 orbits) to owners and operators of polar orbiting satellites.
I thought everyone knew we don't say centigrade anymore. It's Celsius since 1948, sheesh.
AC
ps: captcha was appropriately "apology"
If it were that warm, those bears would be doing something else.
This is called 'screening' and is common in the electronics components industry. Your 2.66 GHz CPU is the same as a 2.8 GHz CPU but during screening started to generate errors at 2.8 and was simply sold as a 2.66
Google is paying for (or insisting on) having all their processors screened at +5 degC. They may pay a couple of dollars more for this, or may hold the fear of using AMD processors over Intel's head to get this testing at no additional charge.
Screening is done in the simplest technologies including resistors. The difference in a 1% and 0.01% 100 Ohm resistor is the price paid for screening, not necessarily the quality of the component...
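The screening idea is easy to sketch: test each part, record the highest temperature it passed at, and sort it into a grade. The version below is only illustrative. The chip ids, the tested limits and the 70/75 C thresholds are all hypothetical, and a real test flow would look at far more than one number per part.

```python
def bin_chips(max_stable_temp_c, standard_c=70.0, premium_c=75.0):
    """Sort chips into bins by the highest temperature each one passed at.

    max_stable_temp_c maps a (hypothetical) chip id to its tested limit.
    The 70/75 thresholds are illustrative, not real qualification limits.
    """
    bins = {"premium": [], "standard": [], "reject": []}
    for chip_id, limit_c in sorted(max_stable_temp_c.items()):
        if limit_c >= premium_c:
            bins["premium"].append(chip_id)
        elif limit_c >= standard_c:
            bins["standard"].append(chip_id)
        else:
            bins["reject"].append(chip_id)
    return bins


if __name__ == "__main__":
    tested = {"A": 78.0, "B": 71.5, "C": 75.0, "D": 66.0}
    print(bin_chips(tested))
```

The same silicon comes off the line either way. The premium bin is just the subset that happened to test well, which is also why pulling it out leaves a slightly weaker mix in the standard bin.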
But why don't they just use mil-spec grade chips? Aren't they already graded for -55 to +125C operation? They're usually radiation hardened too.
Coming soon - pyrogyra
...so it must be true...interesting...
Why higher temps? Google should be demanding chips that run cooler. Cooler running chips save more money than running a data center at a higher temp.
If you were an engineer, you would realize they optimize all the variables including cost of production -- who's to say they made a conscious decision not to run hotter simply because they would have to charge X amount of dollars more per chip? For a company specializing in data-centers, trading in a variable cost for a fixed cost, it's a win-win situation ;)
When are you going to stop being such a troll?
OK, could someone please explain to me how running hotter processors saves money on cooling costs? It seems that it would actually increase cooling costs.
Higher temp does not equal more heat. Just thought you should know that.
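For anyone still puzzled: the servers dump the same watts either way. What changes is how far the cooling plant has to pump that heat "uphill" from room temperature to outside temperature. Below is a minimal sketch using the ideal (Carnot) coefficient of performance. Real chillers need several times more power than this bound, and the 1 MW load and 35 C outdoor temperature are assumptions, so only the trend is meaningful.

```python
def ideal_cooling_watts(heat_watts, room_c, outside_c):
    """Minimum (Carnot) power needed to pump heat_watts from room to outside.

    Real chillers need several times this; only the trend matters here.
    """
    if outside_c <= room_c:
        return 0.0  # outside is cooler: in the ideal case no heat pump is needed
    room_k = room_c + 273.15
    cop = room_k / (outside_c - room_c)  # ideal coefficient of performance
    return heat_watts / cop


if __name__ == "__main__":
    # Hypothetical 1 MW heat load with 35 C outdoor air.
    for room_c in (20.0, 25.0):
        watts = ideal_cooling_watts(1_000_000.0, room_c, 35.0)
        print(f"room at {room_c:.0f} C: at least {watts / 1000:.1f} kW of cooling power")
```

Same heat, smaller temperature lift, less work. And once the room is allowed to run warmer than the outside air, you can in principle skip the chiller and just move air, which is the economizer idea mentioned elsewhere in the thread.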
I think we should support Google's quest to have hotter servers so they can harness the heat output to power the hover craft engines for their floating data centre ... on the sea ... of the moon ... of Neptune ... in a parallel universe. Why would Google want to do this? Because they can. Why should we support them? Because that would be neat.
Centigrade is one hundredth of a grade which is an angle
Not everyone is in Denmark and not all of Google's servers are in the Silicon Valley, ya know? Equally, now some of those chips which need less cooling will go to some Google server in Ireland (check it out, they're about as high up as you are on the map), and some which need more cooling will go to some poor sods in Mexico or Saudi Arabia or Israel or God knows which other hot place.
A polar bear is a cartesian bear after a coordinate transform.
Finally, thank you! Centigrade!
No more Fahrenheit in frontpage articles please.
Hivemind harvest in progress..
I don't think you've got the full understanding. The envelope for a product has to take into account the fact that they need to be able to produce that product in large volumes. There will be many parts that test out as being able to operate at a higher temperature, but perhaps not enough to justify an entire new model of the processor. So to fulfill an order from a single customer for parts rated to a higher temp, all they have to do is set aside some chips that exceed the standard product specs. These parts always exist, you just never have any idea if you've got one or not. Google apparently had the clout to basically get a special processor model available just to them.
If we put all those hot chips up there, we will melt the snow faster and then people will complain. The real trick would be to take the heat from the chips and use it to generate energy to run the chips. Sort of a perpetual motion deal.
I do, and I implied as much by admitting to my initial confusion with the title and specifically using the word heat.