Slashdot Mirror


Inside Amazon's Cloud Computing Infrastructure

1sockchuck writes: As Sunday's outage demonstrates, the Amazon Web Services cloud is critical to many of its more than 1 million customers. Data Center Frontier looks at Amazon's cloud infrastructure and how the company builds its data centers. Its global network includes at least 30 data centers, each typically housing 50,000 to 80,000 servers. "We really like to keep the size to less than 100,000 servers per data center," said Amazon CTO Werner Vogels. Like Google and Facebook, Amazon also builds its own custom server, storage and networking hardware, working with Intel to produce processors that can run at higher clock rates than off-the-shelf gear.

5 of 76 comments

  1. Re:What Does This Mean by Spy+Handler · · Score: 4, Insightful

    Probably means they buy in bulk, so they get to pick the more overclockable chips.

    Say the Core i7 xxxx runs at 3.0 GHz and the i7 yyyy runs at 3.4 GHz. Intel makes a batch of i7s and tests them at 3.4 GHz. Some barely pass QC and are sold as retail i7 yyyy. Some fail at 3.4 GHz, so they're marked as i7 xxxx at 3.0 GHz. Some pass at 3.4 GHz with flying colors; these are the ones overclockers want the most. Retail buyers like us don't get to pick which ones we get when we buy the i7 yyyy, but Amazon might.
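
    The binning idea above can be sketched in a few lines of Python. This is only an illustration of the guess in this comment, not Intel's actual process: the Chip record, the max_stable_ghz measurement, and the bin_chip and pick_best helpers are all hypothetical, and the 3.0/3.4 GHz thresholds are just the made-up numbers from the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds taken from the example numbers in the comment.
HIGH_BIN_GHZ = 3.4  # "i7 yyyy"
LOW_BIN_GHZ = 3.0   # "i7 xxxx"


@dataclass
class Chip:
    serial: str
    max_stable_ghz: float  # assumed result of a stability test


def bin_chip(chip: Chip) -> str:
    """Assign a chip to a product bin based on its highest stable clock."""
    if chip.max_stable_ghz >= HIGH_BIN_GHZ:
        return "i7 yyyy"
    if chip.max_stable_ghz >= LOW_BIN_GHZ:
        return "i7 xxxx"
    return "reject"


def pick_best(chips: list[Chip], count: int) -> list[Chip]:
    """Cherry-pick the high-bin chips with the most headroom,
    which is what a bulk buyer might be allowed to do."""
    high_bin = [c for c in chips if bin_chip(c) == "i7 yyyy"]
    return sorted(high_bin, key=lambda c: c.max_stable_ghz, reverse=True)[:count]


batch = [
    Chip("A", 3.41),  # barely passes at 3.4 GHz
    Chip("B", 3.20),  # fails at 3.4 GHz, sold as the 3.0 GHz part
    Chip("C", 3.90),  # passes with flying colors
    Chip("D", 2.80),  # fails both bins
]
best = pick_best(batch, 1)
```

    A retail buyer of the i7 yyyy could receive either chip A or chip C; in this sketch the bulk buyer gets to ask for C.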

  2. Re:What Does This Mean by bobbied · · Score: 4, Insightful

    They are building custom hardware, and a lot of it, so they get a bit of special treatment from Intel.

    You engineer the thermal paths and better control how you get rid of heat. You tweak the board layout for the best performance of the chipset and CPU, and run closer tolerances on voltages and clock frequencies while keeping it small. Buying in bulk also lets you customize the chipset and CPU packaging to get better performance/watt and higher density by eliminating all the "fluff" you really don't want on a cloud machine. In a high-density server farm, who needs all the USB controllers, PCIe buses, and sound cards you find in your average server chassis? They just take up space and suck power. Just give me a couple of NICs, a SATA connection, a serial console, and a way to reset an individual system, and I have what I need to stand up an OS and grant somebody external access to it.

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101

  3. Re:AWS' problem is not the infrastructure... by turbidostato · · Score: 3, Insightful

    "It's the fact that they only focus on infrastructure"

    You are not looking carefully enough.

    "In that sense, Microsoft is far, far ahead of the others"

    You know what happens to the ones who are too far, far ahead of the others? In the future, people raise statues in their honor, but they usually die poor and/or too young.

    It's quite funny that you mention Microsoft since, back in the day, Novell was the one far, far ahead of Microsoft on PC-based client/server deployments. And you know what? Microsoft not only didn't give a damn, they mocked Novell as too complex. And they were right: most people weren't ready for Novell forests and inherited/nested permissions, and Windows for Workgroups was all they could cope with. Then they grew up to "classic" domains, still a tad simpler than Novell while being "good enough" for their customer base (in fact, not only "good enough" but "top notch", since for most of them it was all they knew; in practical terms, Microsoft itself was the one "educating" them).

    Eventually, Novell died and, who would have thought!? the very next day Microsoft came up with their new and shiny Active Directory, which was basically what Novell had been doing for the previous ten years: now, somehow, that wasn't "too complex" anymore but the only true way.

    I'd say Amazon is on exactly the same track today. On one hand, most people, as you say, are not yet ready for higher abstraction levels like PaaS; IaaS is good enough and growing strongly. On the other hand, the PaaS market is far from mature: code written against any public API today is guaranteed to need rewriting before the provider even gets to declare it non-beta.

    And there's more: it's said that in the gold rush, the only ones consistently making money were the shovel shops, not the miners. Nowadays the "hardware store" is Amazon, and the people building on top of AWS are the ones taking the real risks of doing business. And Amazon is not just watching time go by: a few years back they offered pretty simple virtual machines; now they offer quite a complex landscape with databases, routing, DNS, load balancing, tiered persistent storage... They are the Microsoft of today, mocking the ones too far, far ahead while, at the same time, cultivating their own customer base to make them ready for their future products and services.

  4. Re:What AWS outage demonstrates .. by Anonymous Coward · · Score: 2, Insightful

    I thought the outage demonstrated the relative unreliability of Amazon cloud Services.

    Incorrect. What was demonstrated was the inability of AWS customers to design fault tolerant systems. Any system that cannot tolerate any downtime should be multi-region.
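
    A rough back-of-the-envelope sketch of why multi-region helps, in Python. It assumes regions fail independently of each other, which real outages do not always respect, and the 99% per-region figure is a made-up number for illustration, not an AWS statistic. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_region: float, regions: int) -> float:
    """Availability of a service that stays up as long as at least one
    region is up, assuming regions fail independently."""
    if not 0.0 <= per_region <= 1.0:
        raise ValueError("per_region must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # The service is down only when every region is down at once.
    return 1.0 - (1.0 - per_region) ** regions


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


single = combined_availability(0.99, 1)
dual = combined_availability(0.99, 2)
print(f"one region:  {expected_downtime_hours(single):.2f} h/year down")
print(f"two regions: {expected_downtime_hours(dual):.2f} h/year down")
```

    Under these assumptions a second region cuts expected downtime by a factor of 100, but only if the application can actually fail over between regions.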

  5. Re:What AWS outage demonstrates .. by lgw · · Score: 4, Insightful

    Well, it's their second major outage in the ~10 years of AWS. Far better than any in-house IT department I've ever seen.

    --
    Socialism: a lie told by totalitarians and believed by fools.