Slashdot Mirror


Google Reveals "Secret" Server Designs

Hugh Pickens writes "Most companies buy servers from the likes of Dell, Hewlett-Packard, IBM or Sun Microsystems, but Google, which has hundreds of thousands of servers and considers running them part of its core expertise, designs and builds its own. For the first time, Google revealed the hardware at the core of its Internet might at a conference this week about data center efficiency. Google's big surprise: each server has its own 12-volt battery to supply power if there's a problem with the main source of electricity. 'This is much cheaper than huge centralized UPS,' says Google server designer Ben Jai. 'Therefore no wasted capacity.' Efficiency is a major financial factor. Large UPSs can reach 92 to 95 percent efficiency, meaning that a large amount of power is squandered. The server-mounted batteries do better, Jai said: 'We were able to measure our actual usage to greater than 99.9 percent efficiency.' Google has patents on the built-in battery design, 'but I think we'd be willing to license them to vendors,' says Urs Hoelzle, Google's vice president of operations. Google has an obsessive focus on energy efficiency. 'Early on, there was an emphasis on the dollar per (search) query,' says Hoelzle. 'We were forced to focus. Revenue per query is very low.'"

19 of 386 comments

  1. The New Mainframe by AKAImBatman · · Score: 5, Insightful

    Most people buy computers one at a time, but Google thinks on a very different scale. Jimmy Clidaras revealed that the core of the company's data centers are composed of standard 1AAA shipping containers packed with 1,160 servers each, with many containers in each data center.

    Mainstream servers with x86 processors were the only option, he added. "Ten years ago...it was clear the only way to make (search) work as free product was to run on relatively cheap hardware. You can't run it on a mainframe. The margins just don't work out," he said.

    I think Google may be selling themselves short. Once you start building standardized data centers in shipping containers with singular hookups between the container and the outside world, you've stopped building individual rack-mounted machines. Instead, you've begun building a much larger machine with thousands of networked components. In effect, Google is building the mainframes of the 21st century. No longer are we talking about dozens of mainboards hooked up via multi-gigabit backplanes. We're talking about complete computing elements wired up via a self-contained, high speed network with a combined computing power that far exceeds anything currently identified as a mainframe.

    The industry needs to stop thinking of these systems as portable data centers, and start recognizing them for what they are: Incredibly advanced machines with massive, distributed computing power. And since high-end computing has been headed toward multiprocessing for some time now, the market is ripe for these sorts of solutions. It's not a "cloud". It's the new mainframe.

    1. Re:The New Mainframe by AKAImBatman · · Score: 4, Insightful

      By some measurements they exceed the computing power of a mainframe, by others they don't.

      A fair point. However, I should probably point out that mainframe systems are always purpose built with a specific goal in mind. No one invests in a hugely expensive machine unless they already have clear and specific intentions for its usage. When used for the purpose this machine was built for, these cargo containers outperform a traditional mainframe tasked for the same purpose.

    2. Re:The New Mainframe by divisionbyzero · · Score: 4, Insightful

      Not quite. While these server farms in a box are fault-tolerant they are not fault-tolerant in the same way as at least some mainframes where the calculations are duplicated. With mainframes you'd have wasted resources (doing every calculation twice) with lower latency. With server farms in a box you get, arguably, better resource utilization (route around something that is broken but wait till it breaks before doing so) but higher latency. The difference is incorporating the way the internet works into "mainframe" design.

    3. Re:The New Mainframe by DerekLyons · · Score: 3, Insightful

      When used for the purpose this machine was built for, these cargo containers outperform a traditional mainframe tasked for the same purpose.

      Well, I think it goes without saying that machine A (designed for a specific type of computing) will outperform machine B (not so designed) - and this will remain true whether A is a server cluster and B is a mainframe, or vice versa. And you need to keep in mind there are significant design differences between a server cluster and a mainframe, even when the mainframe is itself a clustered machine.
       
       

      However, I should probably point out that mainframe systems are always purpose built with a specific goal in mind. No one invests in a hugely expensive machine unless they already have clear and specific intentions for its usage.

      Huh? Here in the real world, mainframes are as generic as desktops - what determines what they can do is the OS and the applications. People buy mainframes because they need a mainframe's capability. (And container data centers aren't exactly cheap either - nobody is going to buy them without a use in mind either.)

    4. Re:The New Mainframe by Znork · · Score: 4, Insightful

      by others they don't.

      Seriously, I've fairly recently gone through every single benchmark, comparison, inference, etc, that I've been able to find on the subject (they're not exactly sprinkled all over the place) and I can't find any indications anywhere that mainframe hardware can surpass modern commodity hardware on any measurement. On price/performance variants it's not rare to see it outclassed by more than an order of magnitude, and in absolute performance, well, there's very little magic hardware in the mainframe either anymore, it's pretty much the same silicon as anywhere else: Power CPUs, DDR InfiniBand, CPU to SC bandwidth almost equivalent to HyperTransport, same SAN as is used anywhere else, and as far as I can tell, to my horror, DDR2 533 memory(??). Please, correct me if I'm wrong and I very well may be, because actual specs aren't exactly flaunted. I mean, it's nice enough, but it's hardly magic.

      Sure, there's the old trick of moving system and IO load into extra dedicated CPUs, but that's becoming less and less relevant as pretty much any significant IO load has long since moved to dedicated ASICs that do DMA on their own without any CPU cost, and things like encryption accelerators aren't that hard to find. And it's not like you're not paying for the assist processors.

      Two or three years ago it might have been conceivable that it could have had at least a possibility of being superior in consolidation capabilities like being able to have the most unused OS instances running at a time, but with paravirtualized xen-derived tech commodity x86 hardware can accomplish the same or higher density. I can't say I've tried running 1500 instances, but for fun I did try running 100 instances on 5 years old junked x86 hardware which went fine until I ran out of memory at 6GB on the (like I said, junk) hardware in question. No significant performance degradation in relation to load versus what could be expected of the hardware, all 100 instances fully loaded both IO and CPU for a week to test for any throughput issues or over-time degradation, but that worked as well.

      I.e., no practical limit for any non-contrived consolidation situation, and I have no doubt that it scales fine up to 1500 instances on reasonably modern hardware as well as it did on that hardware (and if you need higher density than that you should seriously be considering why you're using that number of OS instances that don't appear to actually be doing anything or consider moving to system-level virtualization like vserver or openvz).

      So have you found any measurements that I couldn't find that you could point out that demonstrate lingering categories in which a mainframe might consistently outperform commodity hardware (ie, any measurement that is or can be compared to another at least somewhat related measurement on commodity hardware which demonstrates an advantage for the mainframe)?

      Outside pure performance there is the in-system redundancy which is nice in theory but which in practice seems to rarely result in higher actual uptime (mainframes appear to require an inordinate amount of scheduled service time and admins often engage in a disturbingly high IPL frequency).

      There is also the consistent load levels they tend to get (which seems to be largely due to culture, load selection and ROI requirements, rather than any inherent capacity), but beyond that it seems that the remaining aura of capability doesn't have much basis in reality anymore.

    5. Re:The New Mainframe by Anonymous Coward · · Score: 3, Insightful

      Disclaimer: I work at Google, though the stuff below is something anyone from a large web company could tell you.

      Actually the argument depends on the application, and Google does have some applications that make different tradeoffs. For search, availability is more important than consistency: A search on 99% of the data is still better than a 404 any time you don't have all of your servers available. However, for something such as billing (which occurs on every single ad click for pay-per-click ads), you'd better achieve consistency. Billing lets you sacrifice short-term availability however, since few people will notice if they get billed an hour later than usual.
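The search side of that tradeoff can be sketched roughly as follows. This is only an illustration of "availability over consistency", not a description of Google's actual system: the shard functions, the `search_available_shards` name, and the use of `ConnectionError` to signal a dead shard are all hypothetical.

```python
def search_available_shards(shards, query):
    """Return hits from every shard that answers, plus a count of skipped shards."""
    hits, skipped = [], 0
    for shard in shards:
        try:
            hits.extend(shard(query))
        except ConnectionError:
            # Availability over completeness: skip the dead shard and keep going.
            skipped += 1
    return hits, skipped


# Hypothetical stand-ins for real index shards.
def healthy_shard(query):
    return [f"{query}-hit"]


def down_shard(query):
    raise ConnectionError("shard unavailable")


hits, skipped = search_available_shards(
    [healthy_shard, down_shard, healthy_shard], "servers"
)
print(hits, skipped)
```

A billing path would presumably make the opposite choice: fail the whole operation and retry later rather than record a partial result.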

      Hardware reliability is a somewhat different issue; here it is really a question of scale. If you have a couple of servers, it's worth it to go for 5+ nines of reliability, because you get almost that reliability from your system, and you don't have to go through the engineering expense and software complexity of fault tolerance in the face of frequent failures. However, if you have 10k+ servers, your reliability is (0.99999^10k = 0.90), which implies you'll have to build fault tolerance into the software anyway. The "light bulb moment" is when you realize that once you've built fault tolerance into your apps, you can buy machines with 4 nines of reliability, achieve the same results, and save a bunch of money.
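The arithmetic in that paragraph is easy to check. A minimal sketch, assuming machine failures are independent (a simplification; real failures are often correlated) and treating "reliability" as the probability that a given machine is up; the function name is made up for illustration:

```python
def fleet_all_up_probability(per_machine_reliability: float, machines: int) -> float:
    """Probability that every machine is up at once, assuming independent failures."""
    return per_machine_reliability ** machines


five_nines = fleet_all_up_probability(0.99999, 10_000)
four_nines = fleet_all_up_probability(0.9999, 10_000)

print(f"5 nines across 10k machines: {five_nines:.2f}")
print(f"4 nines across 10k machines: {four_nines:.2f}")
```

Under these assumptions the five-nines fleet has everything up only about 90% of the time, and the four-nines fleet about 37% of the time. Either way the software has to tolerate failures, which is the point the comment is making.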

      So, it turns out the crotchety old UNIX admin with 12 machines and the Web2.0 hipster with 1000s of cheap commodity machines are actually both right. They might not agree in a forum, but that's because they don't contemplate how the tradeoff changes with scale. Btw, this is something a lot of startups really need to pay attention to -- when you grow traffic 100x, it might not be wise to get 100x of the same hardware design, since at some point sticking to the old tradeoffs can become an expensive mistake.

  2. Re:Hey google, want to save some money? by Bill,+Shooter+of+Bul · · Score: 5, Insightful

    Google claims they did the math and found it was cheaper with commodity hardware. I advise everyone else to do the same and run the calculations for themselves to determine the optimal hardware for their particular load. Without the specifics of their situation, it's difficult to criticize in an intelligent fashion, other than a more generalized statement expressing surprise at their configuration.

    --
    Well.. maybe. Or Maybe not. But Definitely not sort of.
  3. Re:Onboard UPS not new by geekoid · · Score: 5, Insightful

    A patent is an implementation of an idea.

    You can have the idea of how to put an UPS in a computer one way, and I can do it another way and both be valid patents.

    I do know this gets abused, and companies try to sue because it's their 'idea', but that's not how it works.

    If you find a different way to do a hard drive plugin board, then yes you can patent it. I would advise you only do it if it's better in some way, and there is a demand.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  4. Re:Hey google, want to save some money? by EvilMonkeySlayer · · Score: 4, Insightful

    I've a few questions, if the data centre is built in the desert don't you have a number of issues?

    * Latency, if you have all your data centres located in essentially a single part of the USA (let's ignore the rest of the world for this.. regardless that there are no deserts in Europe for example) won't that increase latency quite a bit to the farther away places that want the search results?
    * Bandwidth/redundancy, if you have all your eggs in one basket as it were aren't you going to have to pay extra to have lots of extra fibre laid down to be able to handle all that extra traffic? What about natural disasters, if you have all your data centres in a single location then surely you run the risk of things going pear shaped if it burns down, suffers earthquakes, aliens destroy the building etc.
    * Cooling, because it's in the desert isn't a lot of the electricity that is generated going to be cooling not only the building because of the outside heat, but also the heat generated by the servers? Surely it makes more logical sense to build in a colder climate say further north and use hydroelectricity? (if you're talking of using exclusively non active polluting (and non radioactive) natural electricity solutions)

  5. Re:No way by Anonymous Coward · · Score: 4, Insightful

    Greater than 99.9% efficiency? They likely made a mistake in their measurements.

    Maybe they measured 99.92% efficiency.

    That is greater than 99.9% efficiency and they aren't breaking any laws of thermodynamics.

  6. Always wondered.... by zogger · · Score: 3, Insightful

    ...why desktops didn't have a built in battery deal that lived in an expansion bay. If you could even keep RAM alive for extended periods even with the machine shut down that would be spiffy as an option, let alone as a little general UPS.

  7. 99.9% efficiency by Anonymous Coward · · Score: 4, Insightful

    This is a questionable number. The best DC-DC conversion is around 95% so they aren't including voltage conversions from the battery to what the system is actually using.
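To make the objection concrete, here is a rough sketch of how stage efficiencies multiply. The 99.9% and 92 to 95% figures come from the story summary and the 95% DC-DC figure from the comment above; the 1000 W load and the function names are assumptions for illustration only.

```python
from math import prod


def chain_efficiency(stage_efficiencies):
    """Overall efficiency of power stages connected in series."""
    return prod(stage_efficiencies)


def wasted_watts(load_watts, efficiency):
    """Input power minus delivered power for a given load."""
    return load_watts / efficiency - load_watts


battery_stage = 0.999                               # figure quoted in the summary
with_dc_dc = chain_efficiency([battery_stage, 0.95])  # add a 95% DC-DC stage

print(f"battery stage + DC-DC: {with_dc_dc:.3f}")
print(f"waste at 1000 W, 92% UPS:   {wasted_watts(1000, 0.92):.1f} W")
print(f"waste at 1000 W, 99.9% UPS: {wasted_watts(1000, 0.999):.1f} W")
```

If the 99.9% figure covers only the battery stage, the end-to-end number is bounded by the worst stage in the chain, which seems to be what the poster is getting at.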

  8. A quick peek at the pictures says a lot by Khopesh · · Score: 3, Insightful

    This is composed purely of commodity parts. The power supply is the same thing you'd buy for your desktop, those are SATA disks (not SAS), and that looks like a desktop motherboard (see the profile view where all the ports on the "back" are lined up in the same manner they would need for a standard desktop enclosure).

    Only the battery is custom (or even non-consumer grade), and you can note that since the power goes through the PSU first, that's DC power. DC is significantly better than AC, since the PSU then has to convert AC-to-DC (which wastes power and generates needless heat). While you can get DC battery supplies for server-grade systems, these are not server-grade systems. Built-in DC battery backup therefore affords them the ability to keep the motherboards cheaper. Very smart.

    Also, if you recall from a few months ago, Google has applied pressure on its suppliers (I'm not sure why Dell comes to mind...) to develop servers that can tolerate a significantly higher operating temperature (IIRC, they wanted a 20 degree (Fahrenheit?) boost). I wouldn't be surprised if the higher temperature cuts down on operating expenses more than smarter battery placement.

    --
    Use my userscript to add story images to Slashdot. There's no going back.
  9. Re:Who swaps out all those dead batteries? by WPIDalamar · · Score: 4, Insightful

    Or maybe they think bigger...

    They're deploying containers of servers. Maybe when a container gets to a certain age or a certain failure rate, they replace/refurbish the entire container.

    I doubt they care if some of their nodes go down in a power outage as long as some percentage of them stay up.

  10. Re:Who swaps out all those dead batteries? by mlwmohawk · · Score: 4, Insightful

    Hundreds of thousands of servers == thousands of dead batteries each month, since those batteries don't last more than a few years.

    I would imagine that the battery replacement schedule mimics the server obsolescence perfectly.

    LOL, when the battery catches fire, time to replace the server.
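The "thousands of dead batteries each month" estimate quoted at the top of this comment can be sanity-checked with a steady-state calculation. The fleet size (200,000) and battery life (3 years) below are placeholder assumptions, since the thread only says "hundreds of thousands" and "a few years", and the sketch assumes batteries were installed evenly over time.

```python
def monthly_battery_replacements(servers: int, battery_life_years: float) -> float:
    """Steady-state replacements per month if installs are spread evenly over time."""
    return servers / (battery_life_years * 12)


per_month = monthly_battery_replacements(200_000, 3)
print(f"{per_month:.0f} batteries per month")
```

With these placeholder numbers the result is in the thousands per month, consistent with the quoted estimate. If servers are retired on roughly the same schedule as the batteries wear out, as the reply suggests, few of those replacements would need to happen separately.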

  11. Re:Hey google, want to save some money? by Anonymous Coward · · Score: 3, Insightful

    How did this get marked informative?

    I mean it's certainly true that Deserts are defined by lack of rainfall but since the GP said
    "Build your data center in the desert and build 150 MW industrial solar thermal system to power it."
    I think it's fair to assume they were talking about the stereotypical sunny and hot desert.

    Secondly the reason it's cool underground is because soil is generally a very good insulator. I would suggest that it's a really bad idea to put things that are going to get hot inside a huge lump of insulating material.

  12. Re:Hey google, want to save some money? by TheSunborn · · Score: 3, Insightful

    Have you ever even seen Mainframe pricing? No really have you?

    It will cost you at least $10,000 to match the power of a single quad core intel/amd cpu.

    And you do not want to run a mainframe (or other computer that has a cpu bound task) for a decade. I think my current desktop computer has more power than the avg mainframe from a decade ago, and when I buy a new development workstation in the next decade, it will most likely have more cpu power than a $1 million mainframe you could buy today.

    Just to set things in perspective: I am pretty sure that google has more cpu power, more ram, more hd space and more aggregate io than all mainframes in the USA combined.

  13. Re:Date centre fire risk? by T-Ranger · · Score: 3, Insightful

    In Google's case, I'd say they just seal off the container and be done with it. If there is a fire, they bring in a new (40') box.

    But anyway. A rack mount HP UPS I installed in the past year has a stand-off that you can hook into the "Big Red Button System". I'm guessing such hookups are either standard on rack mount units, or at least it wouldn't be hard to find models with that feature.

  14. Re:They are computers, no more advanced than befor by petermgreen · · Score: 3, Insightful

    Modern high speed chips (which draw the bulk of the power in a typical PC) run their core logic at much lower voltages. Typically somewhere between 1V and 2V though I think some may have gone below a volt now. These very low voltages have to be produced very close to the chip that uses them to avoid huge losses.

    This means that modern PC motherboards take most of their power at 12V anyway. The 5V and 3.3V lines really only serve to power the low speed chips and some of the interfaces between chips.

    Given that I doubt there would be too much efficiency loss from making a 12V only board. You could probably even design it to happily deal with an input that was only approximately 12V without losing too much (since most of that 12V power is going to the input of switchers anyway).
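The reason low voltages have to be generated close to the chip follows from resistive loss, P = I^2 * R. A small worked sketch, where the 60 W load and the 1 milliohm path resistance are made-up illustrative values and the load is treated as drawing a fixed power regardless of the drop along the path:

```python
def delivery_loss_watts(load_watts, volts, path_resistance_ohms):
    """Resistive loss (I^2 * R) in the path feeding a load at a given voltage."""
    current_amps = load_watts / volts
    return current_amps ** 2 * path_resistance_ohms


loss_at_12v = delivery_loss_watts(60, 12.0, 0.001)
loss_at_1v2 = delivery_loss_watts(60, 1.2, 0.001)

print(f"loss delivering 60 W at 12 V:  {loss_at_12v:.3f} W")
print(f"loss delivering 60 W at 1.2 V: {loss_at_1v2:.3f} W")
print(f"ratio: {loss_at_1v2 / loss_at_12v:.0f}x")
```

Cutting the voltage by a factor of ten raises the current tenfold and the loss a hundredfold over the same path, so it is cheaper to distribute 12V across the board and convert down right next to the chip.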

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register