A Look At the Workings of Google's Data Centers

Posted by Soulskill on Saturday May 31, 2008 @12:07AM from the we're-gonna-need-a-bigger-boat dept.

Doofus brings us a CNet story about a discussion from Google's Jeff Dean spotlighting some of the inner workings of the search giant's massive data centers. Quoting: "'Our view is it's better to have twice as much hardware that's not as reliable than half as much that's more reliable,' Dean said. 'You have to provide reliability on a software level. If you're running 10,000 machines, something is going to die every day.' Bringing a new cluster online shows just how fallible hardware is, Dean said. In each cluster's first year, it's typical that 1,000 individual machine failures will occur; thousands of hard drive failures will occur; one power distribution unit will fail, bringing down 500 to 1,000 machines for about 6 hours; 20 racks will fail, each time causing 40 to 80 machines to vanish from the network; 5 racks will "go wonky," with half their network packets missing in action; and the cluster will have to be rewired once, affecting 5 percent of the machines at any given moment over a 2-day span, Dean said. And there's about a 50 percent chance that the cluster will overheat, taking down most of the servers in less than 5 minutes and taking 1 to 2 days to recover."
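
For a rough sense of scale, the first-year figures quoted above can be turned into average rates. This is only a back-of-the-envelope sketch in Python: it assumes failures are spread evenly over the year, which the summary does not claim, and the names are mine.

```python
# Average rates derived from the first-year figures quoted in the summary.
# Assumption: events are spread evenly over a 365-day year.
DAYS_PER_YEAR = 365

MACHINE_FAILURES_PER_YEAR = 1_000   # "1,000 individual machine failures"
RACK_FAILURES_PER_YEAR = 20         # "20 racks will fail"


def per_day(events_per_year: float) -> float:
    """Average number of events per day."""
    return events_per_year / DAYS_PER_YEAR


def days_between(events_per_year: float) -> float:
    """Average number of days between two events."""
    return DAYS_PER_YEAR / events_per_year


machine_failures_per_day = per_day(MACHINE_FAILURES_PER_YEAR)
days_between_rack_failures = days_between(RACK_FAILURES_PER_YEAR)

print(f"machine failures per day: {machine_failures_per_day:.2f}")
print(f"days between rack failures: {days_between_rack_failures:.2f}")
```

Under that assumption a single new cluster loses between two and three machines on an average day, which fits Dean's "something is going to die every day".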

160 comments

And the Network That Connects These Clusters? by eldavojohn · 2008-05-31 00:09 · Score: 4, Insightful

A surprisingly lengthy and revealing blog posting indeed. Quite informative and interesting.
While Google uses ordinary hardware components for its servers ... I would like to point out that the networking details were largely overlooked. Information about the servers is interesting, but when you're networking such a vast number of computers together, I would be more interested in a quick graphic of how the IP addresses are laid out over 'a typical' cluster of 1,800 machines.

I understand distributed computing and I understand distributed searching. But the fact of the matter is that at some point at the top of the chain, you're usually transferring very large amounts of data--no matter how tall your 'network pyramid' is. The coding itself is no simple feat but I have heard rumors that Google was building their own 10-Gigabit ethernet switches since they couldn't find any on the market. You'll notice a lot of sites are just speculating but it certainly is a nontrivial problem to network clusters of thousands of computers with more than 200,000 in the whole lot and not require some serious switch/hub/networking hardware to back it.

--
My work here is dung.
1. Re:And the Network That Connects These Clusters? by magarity · 2008-05-31 01:32 · Score: 4, Insightful
  
  a quick graphic of how the IP addresses are laid out over 'a typical' cluster of 1,800 machines
  
  I'll bet they don't mess with TCP/IP - that's way too slow and bulky. Think InfiniBand or some other switched fabric instead of hierarchical.
2. Re:And the Network That Connects These Clusters? by arktemplar · 2008-05-31 02:19 · Score: 3, Interesting
  
  Agreed, but their interconnect topology is what should be interesting, not just the hardware. After all, with simple topologies there is a limit to how efficiently it scales. I have been doing some work on parallel processing for supercomputers as my undergrad thesis, and believe me, the major thing that differs among the top 100 or so supercomputers is their interconnect topology, not just their hardware.
  
  Also, their search algo is based on eigenvalues, I think, a very, very profitable algo to parallelize. What version of parallel libraries do they use?
  
  --
  blog plug -> The Darker Side of Light
3. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-05-31 02:28 · Score: 0
  
  I bet Google doesn't bother with InfiniBand or other exotic technologies. In fact, I would not be surprised if they stuck to 100BASE-T instead of gigabit to keep power consumption and cost down. As far as I can tell, Google's search servers are mostly autonomous: they do their work locally and report the finished result back through a rather flat hierarchy, with very little if any communication between the nodes.
4. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-05-31 02:53 · Score: 0
  
  Think again... I know. I worked in a DC for 6 months.
5. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-05-31 03:02 · Score: 1
  
  And you forgot you signed an NDA when you started? :-)
6. Re:And the Network That Connects These Clusters? by chelsel · 2008-05-31 03:06 · Score: 1
  
  Maybe the NDA is expired? But, in that case why post AC?
7. Re:And the Network That Connects These Clusters? by dodobh · 2008-05-31 04:03 · Score: 2, Interesting
  
  AFAIK, Google uses Force10 switches for the networking infrastructure. Details are confidential, though. I learnt this from the Force10 sales guy who was trying to convince me to buy their hardware.
  
  --
  I can throw myself at the ground, and miss.
8. Re:And the Network That Connects These Clusters? by Nethemas+the+Great · 2008-05-31 05:57 · Score: 3, Informative
  
  Here's what they used in 1998... A Wikipedia article explains a bit of what they're doing now...
  
  --
  Two of my imaginary friends reproduced once ... with negative results.
9. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-05-31 06:07 · Score: 1, Informative
  
  Typically InfiniBand gets used for clustering and Fibre Channel for mass storage. They offer low latency 'lossless' connections. If you're putting together a data center and you talk to the vendors on how to do that, they will steer you towards IB and FC. Of course they would: the tech works and it makes them very rich (huge profit margins on the hardware).
  
  I wouldn't be surprised to learn that Google is instead using 'Ethernet everywhere', it has some good advantages:
  
  1. Ethernet is cheap and scales in speed faster than anything else. 2009 will be a big year for 10GigE price reduction.
  2. You have Ethernet for your networking, why not use it for your clustering and storage as well - reduce cables, physical links, power...
  3. Google wants to cluster between data centers. Sounds like internetworking, but regardless using IB or FC between data centers is going to be a PITA.
  4. Ethernet currently doesn't ensure 'lossless' however, with some effort, your clustering traffic can be practically lossless anyway.
  5. 10Gig Ethernet is being enhanced to ensure lossless (priority based pause frames and congestion control)- enabling FCoE, but not limited to FCoE.
10. Re:And the Network That Connects These Clusters? by milsoRgen · 2008-05-31 06:52 · Score: 1
  
  While Google uses ordinary hardware components for its servers...
  
  I thought that was an interesting quote as well, but for different reasons. Once I read about the failure rates, I thought maybe the vendor wouldn't enjoy being mentioned. But sure enough, there it was: good ol' Intel. I just wish they would have specified a bit more as to what they consider "ordinary hardware".
  
  --
  I'm sick of following my dreams. I'm just going to ask where they're goin' and hook up with 'em later.
11. Re:And the Network That Connects These Clusters? by corychristison · 2008-05-31 07:16 · Score: 1
  
  Desktop grade hardware.
  
  No 'enterprise grade' parts.
12. Re:And the Network That Connects These Clusters? by kd5ujz · 2008-05-31 07:31 · Score: 1
  
  I have some DOD-level Linksys routers I will sell you. Details about their use in government infrastructure are confidential, but I can assure you they use them. :P
  
  --
  -William
  God is everything science has yet to explain.
13. Re:And the Network That Connects These Clusters? by agristin · 2008-05-31 10:14 · Score: 2, Informative
  
  No, they use gigabit and 10G ethernet. Infiniband is the opposite of cheap commodity hardware. Infiniband is expensive per port and not commodity.
  
  Google has a two-vendor policy, and I know some of their network gear for gig-E and 10G-E is Force10. Google and Force10 are both involved in 802.3ba (40G and 100G): Force10 is on the IEEE committee and Google is one of the customers with demand. They may have a seat on the committee; I don't really know all the members.
14. Re:And the Network That Connects These Clusters? by agristin · 2008-05-31 10:27 · Score: 1
  
  I've done a couple clusters of 2200 machines per cluster (small for google). I'd bet Google does geographic IP addressing, using the RFC1918 10.0.0.0/8 network. We did. With 40 or 80 servers in a rack we did L3 bounds pretty easily for every rack or so. Since L3 switching at the edge is cheap and fast, solves scaling at L2, and L3 routing protocols have quick predictable ways to route around failure, it was easy to aggregate. If you can subnet and supernet, you too can build huge networks for clusters without too much trouble.
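
The per-rack subnetting described above can be sketched with Python's standard `ipaddress` module. This is a hypothetical layout, not anyone's actual plan: the choice of a /25 per rack (126 usable addresses, enough for 40 or 80 servers) and a /20 aggregate per row of 32 racks are assumptions of mine, as is the function name.

```python
import ipaddress

# Hypothetical plan: one /25 per rack, carved out of RFC 1918 10.0.0.0/8.
CLUSTER_BLOCK = ipaddress.ip_network("10.0.0.0/8")
RACK_PREFIX = 25        # assumed: 126 usable hosts per rack
AGGREGATE_PREFIX = 20   # assumed: one route summarising 32 racks


def rack_subnet(rack_index: int) -> ipaddress.IPv4Network:
    """Return the subnet for a rack (0-based index), computed directly."""
    if rack_index < 0:
        raise ValueError("rack index must be non-negative")
    size = 2 ** (32 - RACK_PREFIX)
    base = int(CLUSTER_BLOCK.network_address) + rack_index * size
    net = ipaddress.ip_network((base, RACK_PREFIX))
    if not net.subnet_of(CLUSTER_BLOCK):
        raise ValueError("rack index outside the cluster block")
    return net


def aggregate_for(rack_index: int) -> ipaddress.IPv4Network:
    """Return the supernet that summarises this rack and its neighbours."""
    return rack_subnet(rack_index).supernet(new_prefix=AGGREGATE_PREFIX)


for i in (0, 1, 2, 32):
    print(i, rack_subnet(i), "via", aggregate_for(i))
```

Because each rack is its own L3 subnet, upstream routers only need the aggregate routes, which is the "subnet and supernet" point made above.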
  
  It really isn't that hard to build huge networks anymore. I wouldn't say it was non-trivial, but it didn't require as much smarts and research as building some really good software. The operational end can be a pain sometimes, but there are some really nice datacenter switches available now.
15. Re:And the Network That Connects These Clusters? by cheater512 · 2008-05-31 10:31 · Score: 1
  
  They use 10 GigE.
16. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-05-31 12:23 · Score: 3, Insightful
  
  Bwaahhahhahah. Are you kidding?
  
  1) TCP/IP isn't really slow and bulky. It's one of the best protocols ever designed. With only minimal enhancements to the original protocol as designed, a modern host can achieve nearly line speed 10Gbit with pretty minimal CPU. We can push 900+Mbyte/sec from a single host. If you need more bandwidth, then do channel bonding.
  
  2) InfiniBand? That costs at least $250-500 per node plus more for switches. Google is not going to spend that kind of money for the limited benefits.
  
  I would suspect their in-house networking is actually pretty boring- standard TCP/IP with VLANs and LACP to make addressing easier and performance a bit higher.
17. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-05-31 12:23 · Score: 1, Interesting
  
  How much of Google's not caring about hardware has to do with the fact that Google doesn't have any reason to? It doesn't really matter if every single page on the planet is indexed, or a few million go missing here and there, or a few terabytes of data walk off: you can just crawl new copies and be on your merry way.
  
  I agree entirely with the gist of the argument: software-based fault tolerance and scaling is a great thing, and any meaningful scaling of applications really must be done in an application-specific context (not data tiers or hardware) with good software design.
  
  Having said that, at a very high level most businesses with datacenters can't afford to play fast and loose with their processing loads, where total availability of data is much more valuable to them than indexing some poor fool's web site halfway around the world. In many areas a partial return, or a return of wrong data on a query, is much, much worse than no response. For practical reasons, the difference in realities tends to make more expensive hardware a better fit for a specific, practically *finite* task in the real world outside of Google.
18. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-05-31 14:08 · Score: 1, Interesting
  
  How much of Google's not caring about hardware has to do with the fact that Google doesn't have any reason to? It doesn't really matter if every single page on the planet is indexed, or a few million go missing here and there, or a few terabytes of data walk off: you can just crawl new copies and be on your merry way.
  
  Not much, actually. You can't lose people's Gmail, docs, images (Picasa), or their AdWords and AdSense accounts and ads. Ad clicks must be recorded with auditable durability, and account data needs the same auditable durability for advertisers and for website publishers showing ads. Also, people get angry if you lose their custom settings in any of myriad services. Don't make the mistake of thinking that running a huge search company means that search is the only thing that needs to be done. There's a lot more that has to go on to support those systems, and some of them have pretty strict ACID requirements.
  
  If you don't believe that cheap components can still make a reliable system, read the GFS paper. GFS ends up more fault-tolerant than a bunch of RAID arrays.
  
  In the end, it all boils down to scale. If you have two servers, go ahead and buy the best ones you can. If you have 5 disks, put them in a RAID. If you have thousands however, duplication and replication on cheaper hardware is the way to go. Projects such as Hadoop mean you don't need to roll your own anymore either.
19. Re:And the Network That Connects These Clusters? by KwelDood · 2008-05-31 14:10 · Score: 1
  
  No 10-gig switch on the market? Look at Alcatel, Cisco, Extreme... or better yet do a google search!
20. Re:And the Network That Connects These Clusters? by dfj225 · 2008-05-31 16:40 · Score: 2, Interesting
  
  TCP is slow and bulky if dropped packets are a very rare thing. Confirming delivery of every packet results in a lot of wasted communication for the vast majority.
  
  My guess is that they use something else for internal communication. You can always recover from errors at the application level instead of forcing every packet to be confirmed.
  
  TCP is great for general communication over the Internet and not so great for specialized cases where performance is important, like at Google.
  
  --
  SIGFAULT
21. Re:And the Network That Connects These Clusters? by inKubus · 2008-05-31 17:21 · Score: 1
  
  Yeah, but they just have lots of load-balancing. Their index is just a huge lookup table (inverted index). Instead of actually asking the database for something like you might be thinking, they use a column-oriented database to store a lookup table of pages. So basically every search is two fast operations (find the word (search term), then go to the exact location in the DB and return the page), whereas actually building the lookup table takes forever. That's the genius of it (although it's 10-year-old technology now, and was old when they "invented" it): they knew that no one expected the data to be real-time (it's impossible to crawl the internet in real time... Well, not *impossible*......), so they chose a method that would be intensive to set up, but fast to query. Not unlike storing a lookup table of logarithms or something.
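
The "expensive to build, cheap to query" trade-off of an inverted index can be shown with a toy sketch. This is a minimal in-memory illustration in Python, not a description of Google's actual index; the function names and the sample pages are made up.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs containing it (the slow part)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every query word (one lookup per word)."""
    words = query.lower().split()
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Made-up sample data for illustration only.
pages = {
    "a.example/1": "cheap hardware fails often",
    "b.example/2": "reliable software on cheap hardware",
    "c.example/3": "mainframe hardware rarely fails",
}
index = build_index(pages)
print(sorted(search(index, "cheap hardware")))
```

Building the index touches every word of every page, while a query only does one dictionary lookup per search term plus a set intersection.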
  
  For the network, they could even use something slick like BROADCAST packets to copy a query 1000 times (in a UDP-type of packet [connectionless]) and whenever they arrive at a server, the server pulls whatever items it has that match and sends them back. Then if you just make sure that the least important results are located FARTHER from the search origin, the results come back LATER and you can automatically assume they need to go on the end of the list without having to perform comparison operations AT ALL..
  
  Anyway, 10G switches have been around for quite some time, for both Ethernet and ATM or SONET. And as another person suggested, why even use TCP/IP internally if you're writing the OS anyway? I mean, TCP/IP is a beast designed for slow, unreliable connections.
  
  --
  Cool! Amazing Toys.
22. Re:And the Network That Connects These Clusters? by Rogerborg · 2008-05-31 21:36 · Score: 1
  
  Is this satire? I honestly can't tell. I hope so.
  
  --
  If you were blocking sigs, you wouldn't have to read this.
23. Re:And the Network That Connects These Clusters? by jcaplan · 2008-05-31 23:36 · Score: 1
  
  They appear to be using TCP/IP. I was reading this paper (PDF) about GFS which says, in part,
  
  A large chunk size offers several advantages. First, it reduces clients' needs to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. The reduction is especially significant for our workloads because applications mostly read and write large files sequentially. Even for small random reads, the client can comfortably cache all the chunk location information for a multi-TB working set. Second, since on a large chunk, a client is more likely to perform many operations on a given chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Third, it reduces the size of the metadata stored on the master. This allows us to keep the metadata in memory, which in turn brings other advantages that we will discuss in Section 2.6.1.
  
  (from page 3, section 2.5)
  
  -Jon
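
To put rough numbers on the "metadata in memory" point in the quoted passage, here is a small Python sketch. The excerpt above does not give the figures, so both constants are assumptions: a 64 MB chunk size and an upper bound of 64 bytes of master metadata per chunk, which is how I recall the GFS paper describing them. The function names are mine.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

# Assumed figures, not stated in the excerpt quoted above.
CHUNK_SIZE = 64 * MIB            # assumed chunk size in bytes
METADATA_PER_CHUNK = 64          # assumed upper bound, bytes per chunk


def chunk_count(data_bytes: int) -> int:
    """Number of chunks needed to hold data_bytes (rounded up)."""
    return -(-data_bytes // CHUNK_SIZE)


def master_metadata_bytes(data_bytes: int) -> int:
    """Upper bound on master metadata for data_bytes of stored data."""
    return chunk_count(data_bytes) * METADATA_PER_CHUNK


for tib in (1, 100, 1024):
    meta_mib = master_metadata_bytes(tib * TIB) / MIB
    print(f"{tib} TiB of data -> about {meta_mib:.0f} MiB of metadata")
```

Under those assumptions each TiB of data costs the master about 1 MiB of metadata, which is why keeping it all in memory is plausible.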
24. Re:And the Network That Connects These Clusters? by Anonymous Coward · 2008-06-01 02:24 · Score: 1, Interesting
  
  More nonsense. TCP recovers incredibly quickly when there are dropped packets. The amount of data for ACKing packets on "perfect" networks is tiny relative to the volume being pushed. Go look at the documents on TCP performance. If you can get 900+Mbyte/sec on a 10Gig host, you might as well go home for the day with a job-well-done feeling.
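
The ACK-overhead claim can be checked with a little arithmetic. The Python sketch below rests on assumptions of mine: standard Ethernet framing with a 1500-byte MTU, bare 40-byte ACKs (IP plus TCP headers, no options), and delayed ACKs at one ACK per two full-size segments. Real stacks usually carry TCP options, so treat the result as an order-of-magnitude figure.

```python
ETHERNET_FRAMING = 14 + 4   # Ethernet header + frame check sequence, bytes
WIRE_GAP = 8 + 12           # preamble + inter-frame gap, bytes
MIN_FRAME = 64              # minimum Ethernet frame size, bytes
MTU = 1500                  # assumed IP packet size for full segments


def wire_bytes(ip_packet_bytes: int) -> int:
    """Bytes of link time used to send one IP packet in one frame."""
    frame = max(ip_packet_bytes + ETHERNET_FRAMING, MIN_FRAME)
    return frame + WIRE_GAP


def ack_overhead(segments_per_ack: int = 2) -> float:
    """ACK bytes on the wire relative to the data bytes they acknowledge."""
    ack = wire_bytes(40)  # 20-byte IP header + 20-byte TCP header
    data = segments_per_ack * wire_bytes(MTU)
    return ack / data


def link_utilisation(mbytes_per_sec: float, link_gbit: float) -> float:
    """Fraction of raw link rate used, with 1 Gbit/s = 125 Mbyte/s."""
    return mbytes_per_sec / (link_gbit * 1000 / 8)


print(f"ACK overhead: {ack_overhead():.1%}")
print(f"900 Mbyte/s on 10GigE: {link_utilisation(900, 10):.0%} of line rate")
```

Under these assumptions the ACKs amount to under 3% of the data volume, and they travel in the opposite direction on a full-duplex link.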
  
  TCP's costs are ones I'm happy to pay any day. Even internally, I really doubt that Google would have implemented their own in-order reliable delivery protocol. They might tweak the stack details a little bit to get a bit more performance, but I really doubt they've implemented GoogleCP instead of TCP.
25. Re:And the Network That Connects These Clusters? by dodobh · 2008-06-01 18:02 · Score: 1
  
  Given that the guy is asking me to contact Google as a reference, I suspect that he isn't lying.
  
  From http://code.google.com/soc/2008/freebsd/about.html :
  Relevance to Google : Google has many tens of thousands of FreeBSD-based devices helping to run its production networks (Juniper, Force10, NetApp, etc..), MacOS X laptops, and the occasional FreeBSD network monitoring or test server.
  
  --
  I can throw myself at the ground, and miss.
Failure tolerance vs. failure prevention by Anonymous Coward · 2008-05-31 00:14 · Score: 1, Interesting

At what point is skimping on hardware because the system is failure tolerant costlier than using more reliable hardware?
1. Re:Failure tolerance vs. failure prevention by Vectronic · 2008-05-31 00:25 · Score: 3, Insightful
  
  Interesting, but I would probably venture a guess: never.
  
  Unless of course you are talking about P2s and ISAs, and it's not a matter of "reliability", I don't think. It could easily be argued that a $200 [component] is just as reliable as a $500 [component]. I think mostly what they are doing is buying 3 of something cheaper, instead of one of something greater.
  
  Component A: cheaper, less cutting edge (generally more reliable)
  
  Component B: Has 3 times the power, 3 times the load, costs 3 times as much.
  
  If a single component A fails, there are still 2 running (depending on the component) and thus a 33% loss in performance, and a third of the total cost to replace (making it like a 6th of the costs compared to component B)
  
  If component B fails, 100% loss, complete downtime, 100% expense. (relatively)
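
The comparison above can be written out as a small Python sketch. The prices are illustrative (200 per cheap unit and 600 for the big one, so that B costs three times as much as A, as stated), and the class and method names are made up. It only models a single failure and says nothing about how often each option fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    name: str
    units: int         # number of identical components
    unit_cost: float   # illustrative cost of one component

    @property
    def total_cost(self) -> float:
        return self.units * self.unit_cost

    def capacity_lost_on_one_failure(self) -> float:
        """Fraction of total capacity lost when a single unit fails."""
        return 1 / self.units

    def replacement_share(self) -> float:
        """Cost of replacing one failed unit as a fraction of total spend."""
        return self.unit_cost / self.total_cost


three_cheap = Setup("3 x component A", units=3, unit_cost=200.0)
one_big = Setup("1 x component B", units=1, unit_cost=600.0)

for setup in (three_cheap, one_big):
    print(
        f"{setup.name}: total {setup.total_cost:.0f}, "
        f"one failure loses {setup.capacity_lost_on_one_failure():.0%} capacity, "
        f"replacement is {setup.replacement_share():.0%} of total spend"
    )
```

For the same total spend, one failure takes out a third of the cheap setup but all of the single expensive unit.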
2. Re:Failure tolerance vs. failure prevention by PerspexAvenger · 2008-05-31 00:28 · Score: 5, Insightful
  
  It's a lot easier and cheaper to make failure-tolerant software if you're looking at system functionality on a cluster/datacentre level than it is to ensure all your hardware is bulletproof.
  Hardware will fail - it's up to the intelligence of the overlaid systems to mitigate that.
3. Re:Failure tolerance vs. failure prevention by dotancohen · 2008-05-31 00:53 · Score: 4, Funny
  
  At what point is skimping on hardware because the system is failure tolerant costlier than using more reliable hardware?
  
  Google is not skimping on hardware. They are simply not trusting hardware to be reliable. Actually, they are buying twice as much hardware as they would otherwise need, according to TFA. Er, not that I read it or anything, I swear,....
  
  --
  It is dangerous to be right when the government is wrong.
4. Re:Failure tolerance vs. failure prevention by The+Second+Horseman · 2008-05-31 01:00 · Score: 5, Interesting
  
  It depends on the kind of applications you're running. Google is something of a singular case. A lot of businesses need to run a lot of small servers for dissimilar applications, not similar ones. If you're talking about business apps that don't play well together on a single server and you virtualize them, you can get a pair of 8-core servers (something like an HP Proliant DL380 G5) with an extra NIC, fibre channel HBA and 32 GB of RAM, plus local SAS drives.
  You can easily run a dozen large VMs on one of those with room to spare (assuming some of them have 2GB or 3GB of RAM allocated to them). If you limit it to ten per box, that's twenty VMs, and you can migrate servers between them or fail them over in case of a fault. Those DL380's (if you have dynamic power savings turned on) can average under 400 watts of power draw each - so 40 watts per server. In our environment, we've got 5 hosts running a ton of VMs, some of which don't have to fail over (layer 4-7 switch, also a VM), so we're getting closer to 25 or 30 watts per VM. We'd have the SAN array anyway for our primary data storage, so that wasn't much of an extra. We're using fewer data center network ports, and few fibre channel ports. We've actually been able to triple the number of "servers" we're running while actually bringing energy use down as we've retired more older servers and replaced them with VMs. And it's been a net increase in fault tolerance as well.
5. Re:Failure tolerance vs. failure prevention by cp.tar · 2008-05-31 01:05 · Score: 4, Funny
  
  Actually, they are buying twice as much hardware as they would otherwise need, according to TFA. Er, not that I read it or anything, I swear,....
  Don't worry, your secret is safe with us.
  
  Real Slashdotters not only fail to read TFAs, but they also completely miss any and all relevant information in other people's posts.
  Therefore, someone may hook on your claim that Google is not skimping on hardware and try to argue that they, in fact, do. Your admission to having read TFA will go completely unnoticed.
  
  And before you ask yourself how come I noticed it: I didn't.
  And besides, I'm new here.
  
  --
  Ignore this signature. By order.
6. Re:Failure tolerance vs. failure prevention by SpinyNorman · 2008-05-31 01:12 · Score: 5, Insightful
  
  You could say that Google is taking advantage of the fact that hardware is unreliable to reduce cost.
  
  With server farms the size of Google's, failures are going to occur daily regardless of how "fault-tolerant" your hardware is. Nothing is 100% failure free. Given that failures will occur, you need fault tolerance in your software, and if your software is fault tolerant, then why waste money on overpriced "fault-tolerant" hardware? If you can buy N cheapo servers for the price of 1 hardened one, then you'll typically have N times the CPU power available, and the software makes them both look as reliable.
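
Why N cheap servers can look as reliable as one hardened server comes down to a standard formula: if each replica is up with probability a and failures are independent, at least one of N is up with probability 1 - (1 - a)^N. The Python sketch below uses made-up availability numbers. Independence is a strong assumption: the power-unit, rack, and overheating failures in the summary take out many machines at once, which is why replicas need to be spread across those failure domains.

```python
def group_availability(single: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - single) ** replicas


# Illustrative numbers only: a cheap server assumed to be up 99% of the time.
for n in (1, 2, 3):
    print(f"{n} cheap replica(s): {group_availability(0.99, n):.6f}")
```

The software still has to detect the failure and redirect work, which is the fault-tolerance layer the comment is talking about.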
7. Re:Failure tolerance vs. failure prevention by ioshhdflwuegfh · 2008-05-31 01:20 · Score: 1
  
  You forgot to incorporate life-time of components into your calculation. Same everything but longer life time and more expensive component should become cheaper in the long-run and/or large-scale use.
8. Re:Failure tolerance vs. failure prevention by Anpheus · 2008-05-31 01:21 · Score: 2, Insightful
  
  You're also paying through the nose for every extra nine of uptime.
  
  That's not to say it's impossible, IBM, HP, any of the "big iron" companies can offer you damn near 100% uptime without major changes to your software.
  
  But be prepared to pull out the checkbook. You know, the REALLY BIG one that is only suitable for writing lots of zeroes and grand prize giveaways.
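
What each extra nine buys can be computed directly. This is a small Python sketch of the arithmetic, assuming a 365-day year; the function name is mine.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year at an availability of `nines` nines.

    For example, nines=3 means 99.9% availability.
    """
    if nines < 1:
        raise ValueError("need at least one nine")
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


for n in range(2, 6):
    minutes = downtime_seconds_per_year(n) / 60
    print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, from days per year at two nines to minutes per year at five.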
9. Re:Failure tolerance vs. failure prevention by TheRaven64 · 2008-05-31 01:21 · Score: 4, Interesting
  
  It depends on how much downtime costs you. If Google is down for five seconds, no one will notice - they will just assume that their link is slow, blame their ISP, and hit refresh. If a telecom's billing system or a bank's transactional system is down for five seconds then they are likely to lose a lot of money. The only difference between doing this kind of thing in hardware and software is the fail-over time and the cost. Google take a slower fail-over time in exchange for lower costs. For them and for 99.9% of businesses, it makes perfect sense. The remaining 0.1% are the reason IBM's mainframe division is so profitable.
  
  --
  I am TheRaven on Soylent News
10. Re:Failure tolerance vs. failure prevention by ioshhdflwuegfh · 2008-05-31 01:37 · Score: 1
  
  Actually, they are buying twice as much hardware as they would otherwise need, according to TFA. Er, not that I read it or anything, I swear,....
  
  I don't know whether you can read or not, but your post surely makes no sense: you're saying that you've read that Google buys more hardware than they would otherwise need. Other than what? Do they buy hardware that they don't need? I don't think so, that does not make sense. Okidoki?
11. Re:Failure tolerance vs. failure prevention by Znork · 2008-05-31 01:58 · Score: 4, Insightful
  
  I think mostly what they are doing, is buying 3 of something cheaper, instead of one of something greater.
  
  From what it looks like they're doing exactly what I do for myself; skip the extraneous crap and simply rack motherboards as they are.
  
  In that case we're not talking 3 of something cheaper; you could probably get up towards 5-10 of something cheaper. Then consider that best price/performance is not generally what is bought, and the difference is even wider.
  
  Of course, it's not going to happen in the average corporation, where most involved parties prefer covering their ass by buying conventional branded products. Point out to your average corporate purchaser or technical director that you could reduce CPU cycle costs to 1/25th, and that you could provide storage at 1/100th of the current per gigabyte cost and they'll whine 'but we're an _enterprise_, we can't buy consumer grade stuff or build it ourselves'.
  
  Ten years ago people brought obsolete junk from work home to play with. These days I'm considering bringing obsolete stuff from home to work because the stuff I throw out is often better than low-prioritized things at work.
12. Re:Failure tolerance vs. failure prevention by jo42 · 2008-05-31 03:22 · Score: 0, Troll
  
  they are buying twice as much hardware as they would otherwise need
  
  In other words they are wasteful and Environmentally Evil. Lazy Google. Bad Google.
13. Re:Failure tolerance vs. failure prevention by dotancohen · 2008-05-31 03:51 · Score: 1
  
  I don't know whether you can read or not, but your post surely makes no sense: you're saying that you've read that Google buys more hardware than they would otherwise need. Other than what?
  
  otherwise: Other than had they bought so-called reliable hardware. Now, my reading skills may be in doubt, but assuming that you read TFA your comprehension skills are in doubt as well.
  
  --
  It is dangerous to be right when the government is wrong.
14. Re:Failure tolerance vs. failure prevention by dotancohen · 2008-05-31 03:52 · Score: 1
  
  they are buying twice as much hardware as they would otherwise need
  
  In other words they are wasteful and Environmentally Evil. Lazy Google. Bad Google.
  
  I hear that they recycle the broken servers into Symbian machines and post on pornotube.com. Still evil?
  
  --
  It is dangerous to be right when the government is wrong.
15. Re:Failure tolerance vs. failure prevention by jacobsm · 2008-05-31 04:36 · Score: 3, Insightful
  
  First let me state that I'm a mainframe systems programmer and a true believer of this technology. IMHO Google should start looking at mainframe based virtualization instead of the server farms they currently depend on.
  
  One z10 complex with 64 CPUs and 1.5 TB of memory can support thousands of Linux instances, all communicating with each other using hypersocket technology. Hypersockets uses microcode to enable communications between environments without going to the actual network.
  
  A z10 processor complex is as close to 100% fault tolerant as possible, energy efficient, and cost effective when compared to the total cost of the alternatives.
16. Re:Failure tolerance vs. failure prevention by Anonymous Coward · 2008-05-31 05:21 · Score: 0
  
  With server farms the size of Google's, failures are going to occur daily regardless of how "fault-tolerant" your hardware is. Nothing is 100% failure free. Given that failures will occur, you need fault tolerance in your software, and if your software is fault tolerant, then why waste money on overpriced "fault-tolerant" hardware? If you can buy N cheapo servers for the price of 1 hardened one, then you'll typically have N times the CPU power available, and the software makes them both look as reliable.
  
  You're forgetting that different components fail at different rates, and the cost of redundancy varies greatly by component.
  
  The most likely things to fail in a server are the power supply and hard disks. For not a lot of money, you can buy servers with redundant power supplies and a raid array. You are now protected from the vast majority of hardware failures.
  
  The price difference between ECC memory and regular memory isn't that big. Some of my Dell servers even have hot-standby ram - if the memory starts to go bad, the dimm is disabled and a spare is brought online.
  
  I've never seen an ethernet card in a server fail, but it's very easy to set up two cards with failover or load-balancing.
  
  On the other hand, CPU failure is very hard to handle. You can buy big systems from Stratus or IBM that can handle CPU failures and do hotswap replacements, but they're very expensive. But how often do CPUs fail?
17. Re:Failure tolerance vs. failure prevention by bishiraver · 2008-05-31 06:21 · Score: 1
  
  The most likely things to fail in a server are the power supply and hard disks. For not a lot of money, you can buy servers with redundant power supplies and a raid array. You are now protected from the vast majority of hardware failures.
  When you're running on anything ginormous-scale, you don't really care about local raid all that much, especially if the data is massively replicated in the datacenter.
  
  In fact, you may not even care if an individual machine breaks down - you just unplug it because it happens so often. When you're dealing with tens of thousands of machines, that shit just happens constantly. You don't even care until it takes a rack down - at which point you take the rack out, put a new one in, and see if you can create some frankenzombies out of the bits left over to recoup costs.
18. Re:Failure tolerance vs. failure prevention by Serpentegena · 2008-05-31 07:10 · Score: 1
  
  What is software? A lot of ones and zeroes, according to Neal Stephenson. What does it take to deploy it? A brain, an input device, a compiler, a basic hardware infrastructure to run it. If well-written, it can outlast the hardware it was installed on. Example: Delrina WinFax.
  
  OTOH, hardware is *designed* to fail. If HP servers built and sold 5 years ago were infallible, why would one need to replace them? It would take away sources of revenue like warranty, parts, service.
  Warranty money is free money. Duh.
  Parts are less profitable, but unavoidably necessary and "big iron" is aware of it. They rely on a constant cycle of upgrades to existing hardware, and continue their business by replacing flawed machines they themselves built.
  
  --
  Microsoft put the "sucks" in "success".
19. Re:Failure tolerance vs. failure prevention by Anpheus · 2008-05-31 07:40 · Score: 1
  
  You realize that the computing requirements do change, do increase over time?
  
  One of the things big iron provides is a clear update cycle without sacrificing those 9s, as well. You don't have to worry about whether or not the latest batch of Dell machines is going to have bad capacitors that will incur 10% more expenses. No, you pay for all of the potential costs up front, at once, for high reliability.
  
  For a lot of big businesses this makes a lot of sense to them. It's reliable, it doesn't depend on network technicians and system administrators and properly inspected requisitions for new equipment. It doesn't require a lot of the overhead that in-house departments would have to take on. Instead, they give you a big ol' box and a number to call if you manage to break it.
20. Re:Failure tolerance vs. failure prevention by mrbooze · 2008-05-31 08:07 · Score: 1
  
  Of course, it's not going to happen in the average corporation, where most involved parties prefer covering their ass by buying conventional branded products.
  It's not *just* ass-covering, although there's definitely some of that. Average corporations also do not *remotely* employ enough IT staff to be doing the sort of constant maintenance and replacements that Google is doing, not to mention the engineers doing testing and design of the specialized architecture, etc. And IT is often one of the first groups up against the wall when it's time to shore up numbers for the fiscal year.
  
  I've worked with managers who believed very much in the commodity hardware philosophy, that they'd rather spend money on more technicians who can fix things than on support contracts with vendors. This is a laudable goal and one I wish could work reliably in many corporations, but it has one fatal flaw: Support contracts can't be laid off. As the annual corporate layoffs keep picking off more and more IT staff, and more and more IT work is transitioned to the Indian offices or wherever the next offshore hotspot will be, the manager gets stuck with machines basically built from off-brand parts and insufficient local support staff to keep up with the work of maintaining them, since what staff remains is also picking up all the other work from the laid-off employees.
  
  And ironically, it's harder to get budget approval for support contracts after the fact. Also odds are good that the budget he got assigned 6 months ago didn't include costs of those support contracts.
21. Re:Failure tolerance vs. failure prevention by Anonymous Coward · 2008-05-31 08:14 · Score: 0
  
  OTOH, hardware is *designed* to fail. If HP servers built and sold 5 years ago were infallible, why would one need to replace them? It would take away sources of revenue like warranty, parts, service.
  
  Take off your tin-foil hat and think for a moment.
  
  Why replace a 5-year old system in perfect working order? In 5 years, computers have gotten a hell of a lot faster and cheaper, while IT needs keep increasing (more data, bigger databases, more video, more complex software).
  
  Modern gear is far more efficient in terms of speed, space, heat and electricity. That matters to many people.
  
  Why replace your 5-year old desktop in perfect working order? It doesn't run the latest software. That matters to many people.
  
  Why replace your 5-year old laptop in perfect working order? A modern laptop is much faster & lighter. That matters to many people.
  
  Last year I finally disconnected some rackmount 3com 10/100 ethernet hubs (yes, hubs) and replaced them with gigabit switches that take less rackspace, use less power, and are much faster. The hubs have been working continuously for 9 years with no problems aside from some cooling fan failures.
  
  I'd be surprised if you can get $20 for these hubs - they are slow & obsolete. As an engineer, you might look at these hubs and say too much effort was placed into making them reliable since their reliability has outlived their usefulness. Incidentally, they have a lifetime warranty (aside from the fans).
22. Re:Failure tolerance vs. failure prevention by mrbooze · 2008-05-31 08:22 · Score: 1
  
  Coincidentally, I was at a ZFS talk recently, and the Sun employee giving the presentation was running some version of Solaris on her laptop, and it happened to pop up a warning about a CPU fault, but it was just a warning and the system kept working around it, albeit with some performance hit. The presenter mentioned it when it happened and claimed that Solaris was the only non-mainframe OS that would do this. A claim that I'm not informed enough to evaluate for accuracy, but certainly sounded suspicious.
  
  (This was not a full "The CPU is on fire" type of failure, obviously. It was some sort of intermittent problem with a component or the bus or something I didn't understand. Supposedly after identifying the faulty component Solaris was simply working around it by doing extra work in software.)
23. Re:Failure tolerance vs. failure prevention by Serpentegena · 2008-05-31 08:44 · Score: 1
  
  Good point about computing requirements. However, what end-users *buy* does not necessarily reflect what they *need*.
  
  Example 1: PC-refresh programs. Why should these even exist in a three-tier architecture? The apps and databases run on dedicated servers. Desktop needs at the staff level have been the same for years. What is this, fashion?
  
  Example 2: designed-for-Windows hardware. Is it really cost-effective to upgrade to 2.4G dual-core proc, 2-3G of RAM for Vista? How much cheaper would it be to build around a 3G single-core proc from 3 years ago and run XP instead? No, wait. Some businesses still run Windows 2K on upgraded hardware. I wonder why.
  
  If the IT sales channel didn't exist, and every enterprise client dealt directly with the engineering team at HP/IBM/wutev, I would agree with you wholeheartedly. But that level of honesty and openness is just bad for business. Gear capacity is routinely oversold under pretext of scalability. In reality, by the time you need that capacity, that device will be obsolete.
  As for sacrificing the nines, that's why they have testing environments.
  
  --
  Microsoft put the "sucks" in "success".
24. Re:Failure tolerance vs. failure prevention by Anpheus · 2008-05-31 08:58 · Score: 1
  
  And honestly, if Vista means more effective, more complete controls over security policies, then I think it's totally worth whatever cost it takes to get it to run. There are alternatives, of course, if they wanted to run some really hardcore SELinux configurations, but it's frankly easier to have a homogeneous user configuration and with Vista at least, more secure.
  
  Yes there are flaws, but we're discussing what pros exist. There are definite cons here, and I would be very inclined to agree with you that desktop computing power is oversold, and I'm sure big iron is more than adequate for any mid-size business that uses it. Large businesses that are able to adequately judge needed capacity and jump-start programs with full capacity can very much go to these mainframe manufacturers and say, we need this much throughput, this much whatever. Big iron is -not- for even medium sized businesses and I think IBM recognizes that.
  
  Clusters on the other hand, make perfect sense for startups. Cheap, easily replaced, don't need to be homogeneous, etc.
25. Re:Failure tolerance vs. failure prevention by cheater512 · 2008-05-31 10:40 · Score: 1
  
  Isn't a single box running thousands of virtual environments which are then running clustering software just a tad redundant?
  
  Anyway, it's far cheaper and has better bang for buck for Google to use cheap nasty hardware than your exotic stuff.
  Remember that even if they did use what you're suggesting, they'd still need thousands of them.
26. Re:Failure tolerance vs. failure prevention by SuperQ · 2008-05-31 10:43 · Score: 1
  
  So how much does that z10 cost? What's the physical footprint? 1.5T of RAM is a good target for comparison.
  
  HP DL160 G5: $6672 USD
  Low power dual quad-core, 4x 500GB disk, 32GB ram.
  
  Say Google gets a good discount for quantity, maybe 25%.. $5000 each.
  
  That seems like a simple enough commodity server these days.. A rack of 40 machines would come out to $200,000 USD, add another $50k for misc stuff and switching gear (rack and core)
  
  Each rack now has 1.25TB of RAM, 320 cores, 80TB of disk (who needs FC or iSCSI when you have your own distributed FS and DB). The rack also has a power footprint of about 20KW under full load.
  
  I also don't see why thousands of VMs is a good thing. If each VM runs a linux kernel and basic daemons, you'll need about 100MB of ram each. There's 5-10% of your RAM gone for doing real work.
  
  If a z10 with 1.5TB of RAM and 80T of disk fits in a single 19" rack, there's a small density win, but I doubt you can get a z10 with those specs for $250k USD.
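
  The rack arithmetic above can be checked with a short sketch. The per-machine figures are the ones quoted in the comment; the 8 cores per machine is inferred from "dual quad-core", the 25% discount is the commenter's guess, and the function names are made up for illustration.

```python
# Per-machine figures from the comment; the discount is a guess, not a real quote.
MACHINES_PER_RACK = 40
PRICE_PER_MACHINE_USD = 5000   # $6672 list price less roughly 25%
MISC_PER_RACK_USD = 50_000     # switching gear and other extras
RAM_GB = 32
CORES = 8                      # dual quad-core
DISKS = 4
DISK_GB = 500


def rack_totals(machines=MACHINES_PER_RACK):
    """Total cost and capacity of one rack of commodity servers."""
    return {
        "cost_usd": machines * PRICE_PER_MACHINE_USD + MISC_PER_RACK_USD,
        "ram_tb": machines * RAM_GB / 1024,
        "cores": machines * CORES,
        "disk_tb": machines * DISKS * DISK_GB / 1000,
    }


def vm_ram_overhead(num_vms, total_ram_gb, per_vm_mb=100):
    """Fraction of RAM consumed just by idle guest kernels and daemons."""
    return num_vms * per_vm_mb / 1024 / total_ram_gb


if __name__ == "__main__":
    print(rack_totals())
    # 1000 guests at 100 MB each on a 1.5 TB box
    print(f"{vm_ram_overhead(1000, 1536):.1%}")
```

  With 1000 guests on 1.5TB the overhead lands inside the 5-10% range mentioned above; more guests push it higher.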
27. Re:Failure tolerance vs. failure prevention by Anonymous Coward · 2008-05-31 11:23 · Score: 0
  
  Do a Google search and find the presentation on Google MapReduce. One of the slides is a side-by-side comparison: one expensive and reliable Sun server, compared to an entire rack full of 1U servers. The rack full of 1U servers (that's 42 servers IIRC) costs less than the Sun box, and each 1U has two CPUs and 4 GB of memory. Add it up and the rack of boring hardware is staggeringly more powerful than the Sun box, and with Google's custom software stack, it's completely reliable.
28. Re:Failure tolerance vs. failure prevention by jacobsm · 2008-05-31 11:40 · Score: 1
  
  You are correct, a z10 with all the other "stuff" costs a lot more (not even in the same ballpark) than racks of massive servers.
  
  Some of the things you do gain, however, are the ability to create logical partitions (max of 64) on a z10. Each partition can share the CPUs, with the hypervisor dispatching physical CPUs to each partition as necessary to dispatch work effectively.
  
  Each partition can run zVM, virtualization software with 40 years development behind it. Each zVM instance can support the above mentioned Linux guest machines by the hundreds or thousands. zVM can easily run at 100% cpu busy and drive all the guests without any performance issues. You can't run normal servers at 100% and expect good results.
  
  Need another guest? Run a script under zVM and boot it up. Total time to create a new guest is measured in minutes. Total cost for each new guest is close to zero.
  
  I can fit a z10 with enough DASD to make it interesting in my 12x12 home office (might be a little tight though). I couldn't fit the Google server farm in my house.
  
  Yes, just looking at the initial numbers a z10 doesn't make financial sense, but once the total cost of ownership is worked out (server administrators, power/cooling requirements, software license fees) a z10 can be very competitive.
29. Re:Failure tolerance vs. failure prevention by jacobsm · 2008-05-31 11:48 · Score: 1
  
  One of the reasons why Google has to run clustering software is to fight against the expected hardware failures. A modern mainframe with modern DASD devices simply doesn't fail in such a way that takes stuff down.
  
  In my 13 years at my current position we have had one hardware failure that took a box down. IBM was as upset as we were at the failure, diagnosed what the problem was and fixed it not only for us but for everyone else so it would never occur again.
  
  A handful of fully loaded z10's could easily support Google's entire computing requirements.
30. Re:Failure tolerance vs. failure prevention by cheater512 · 2008-05-31 12:02 · Score: 1
  
  If 'a handful' has fewer than 5 digits in its number then you're sorely mistaken.
31. Re:Failure tolerance vs. failure prevention by jacobsm · 2008-05-31 12:24 · Score: 1
  
  Well, let's do some math here:
  
  One z10, 64 lpars each running an instance of zVM. Each zVM can support a couple of thousand virtual Linux guests.
  
  Now take five of them. 64*5*2500=800000 Linux images.
  
  It boggles my mind and I've been an IT professional for almost thirty years.
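
  The multiplication above can be sketched as follows, along with how much physical RAM each guest would get. The 1.5TB of RAM per z10 is an assumption borrowed from the comparison earlier in this thread, not a figure given in this comment, and the function names are hypothetical.

```python
def total_guests(machines, lpars_per_machine, guests_per_lpar):
    """Number of Linux guests across all machines."""
    return machines * lpars_per_machine * guests_per_lpar


def ram_mb_per_guest(ram_tb_per_machine, lpars_per_machine, guests_per_lpar):
    """Physical RAM available to each guest if it is split evenly (no overcommit)."""
    ram_mb = ram_tb_per_machine * 1024 * 1024
    return ram_mb / (lpars_per_machine * guests_per_lpar)


if __name__ == "__main__":
    print(total_guests(5, 64, 2500))
    # Assumes 1.5 TB of RAM per z10, as used elsewhere in the thread.
    print(round(ram_mb_per_guest(1.5, 64, 2500), 2), "MB per guest")
```

  Under that assumption each guest gets roughly 10MB of real memory, so the image count only says how many guests can be defined, not how much work they can do.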
32. Re:Failure tolerance vs. failure prevention by SuperQ · 2008-05-31 13:00 · Score: 1
  
  Yup, and all that stuff is basically worthless because you can spawn tasks on a cluster in seconds without any need for the wasteful overhead of VMs. All you need is a good task scheduler.
  
  Even at the scale of a couple of racks (an average 12x12 colo cage at savvis or some place) you get many times the compute/storage capacity of a z10.
  
  It's interesting technology, but completely wasteful when it comes to $/work
33. Re:Failure tolerance vs. failure prevention by mtmra70 · 2008-05-31 14:03 · Score: 1
  
  You must have sold this crap to my work's IT department. Let me tell you, with the problems they have with servers the last thing they needed was having a bunch of servers on one server. One VM server goes down and takes out 30 applications instead of 1.
  
  Yea, great stuff....
34. Re:Failure tolerance vs. failure prevention by Anonymous Coward · 2008-05-31 16:10 · Score: 0
  
  Google gross revenue in Q4: 5,186,043,000
  Minutes in Q4: 131,400
  $/minute: $39467
  
  I'd wager Google's billing system has pretty expensive downtime compared to anything but a financial company.
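
  The figure above can be reproduced with a quick sketch. The revenue number is the one quoted in the comment; the minutes are taken as a quarter of a 365-day year, which is what 131,400 works out to. The downtime cost is a naive upper bound that assumes revenue stops completely during an outage, which is an assumption and not something the comment claims.

```python
Q4_REVENUE_USD = 5_186_043_000
MINUTES_PER_QUARTER = 365 * 24 * 60 / 4   # 131,400


def revenue_per_minute(revenue=Q4_REVENUE_USD, minutes=MINUTES_PER_QUARTER):
    """Average revenue earned per minute over the quarter."""
    return revenue / minutes


def naive_downtime_cost(outage_minutes):
    """Upper bound: assumes all revenue is lost for the whole outage."""
    return outage_minutes * revenue_per_minute()


if __name__ == "__main__":
    print(int(revenue_per_minute()))        # dollars per minute, truncated
    print(int(naive_downtime_cost(6 * 60))) # a 6-hour outage
```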
  
  Luckily for them, failover in software can actually be done quite fast; All you need are reliable frontends. The backends, of which there are many more for any interesting search or database problem, can be composed of cheaper hardware. The frontend then manages which backends are live, and with replication there doesn't ever need to be downtime with missing data. See TFA's example of GFS, where one machine manages thousands.
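
  A minimal sketch of the idea, assuming each piece of data is replicated on several backends and the frontend keeps a set of live ones. The class and method names are hypothetical; this is not how GFS or any Google frontend is actually written.

```python
class Frontend:
    """Routes requests to the first live backend that holds a replica."""

    def __init__(self, replicas):
        # replicas: key -> list of backend names holding a copy of that key
        self.replicas = replicas
        self.live = {b for backends in replicas.values() for b in backends}

    def mark_dead(self, backend):
        """Called when a health check or a request to the backend fails."""
        self.live.discard(backend)

    def mark_live(self, backend):
        """Called when a repaired backend rejoins the pool."""
        self.live.add(backend)

    def route(self, key):
        """Pick a live replica; fail only if every replica is down."""
        for backend in self.replicas[key]:
            if backend in self.live:
                return backend
        raise LookupError(f"no live replica for {key!r}")


if __name__ == "__main__":
    fe = Frontend({"index-shard-7": ["b1", "b2", "b3"]})
    print(fe.route("index-shard-7"))
    fe.mark_dead("b1")
    print(fe.route("index-shard-7"))
```

  Data goes missing only when all replicas of a key are dead at the same time, which is why the backends can be cheap while the frontend has to be reliable.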
35. Re:Failure tolerance vs. failure prevention by jacobsm · 2008-06-01 02:17 · Score: 0
  
  Mainframes unlike consumer or even business grade servers simply don't fail in the way you described.
36. Re:Failure tolerance vs. failure prevention by mtmra70 · 2008-06-01 06:02 · Score: 1
  
  I work at the world's largest pharmaceutical company. And I'm talking NT boxes, not 400s or *nix
37. Re:Failure tolerance vs. failure prevention by enoz · 2008-06-01 14:39 · Score: 1
  
  And besides, I'm new here. Lying about your age for karma? I've seen your 6-digit UID.
38. Re:Failure tolerance vs. failure prevention by CoughDropAddict · 2008-06-01 18:11 · Score: 1
  
  "Linux images" is not a meaningful measure of computing resources. Things like bytes of RAM, bytes of disk space, and megahertz of CPU are.
  
  The smallest Linux image can run in 8MB of RAM, so if I get 100 servers with 64GB of RAM I could theoretically fire up 800,000 Linux images on them. Does this mean my 100 servers are as powerful as your five z10s?
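
  The back-of-the-envelope number above can be sketched like this, using the 8MB image size and the 100 servers with 64GB each from the comment. The function name and the 1GB comparison size are illustrative assumptions.

```python
def images_that_fit(servers, ram_gb_per_server, image_mb):
    """How many images of a given size fit in the combined RAM of the servers."""
    total_mb = servers * ram_gb_per_server * 1024
    return total_mb // image_mb


if __name__ == "__main__":
    # Minimal 8 MB images: the count looks huge but says little about capacity.
    print(images_that_fit(100, 64, 8))
    # The same hardware with a hypothetical 1 GB per image.
    print(images_that_fit(100, 64, 1024))
```

  The exact count with 8MB images is 819,200, which the comment rounds to 800,000. Changing the image size changes the count by orders of magnitude on identical hardware, which is the point being made.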
39. Re:Failure tolerance vs. failure prevention by jacobsm · 2008-06-01 22:56 · Score: 1
  
  And do what work? The 800,000 Linux servers I mentioned would have 4-8GB of virtualized storage and be able to run Oracle or DB2 databases, webserving, or any other production work you could throw at it.
40. Re:Failure tolerance vs. failure prevention by Anonymous Coward · 2008-06-02 07:49 · Score: 0
  
  It depends on how much downtime costs you. If Google is down for five seconds, no one will notice - they will just assume that their link is slow, blame their ISP, and hit refresh. If a telecom's billing system or a bank's transactional system is down for five seconds then they are likely to lose a lot of money. The only difference between doing this kind of thing in hardware and software is the fail-over time and the cost. Google take a slower fail-over time in exchange for lower costs. For them and for 99.9% of businesses, it makes perfect sense. The remaining 0.1% are the reason IBM's mainframe division is so profitable. The difference is that GFS allows Google to never "go down", even in the event of massive hardware failure. The software simply rebalances the load, and keeps going. It might slow down, but even at that point, it's probably faster than most financial systems.
41. Re:Failure tolerance vs. failure prevention by drsmithy · 2008-06-02 10:07 · Score: 1
  
  Say Google gets a good discount for quantity, maybe 25%.. $5000 each.
  They'll get a lot more than that. Heck, we typically get 25% and we're nobody - we maybe buy 50-60 servers a year from Dell.
42. Re:Failure tolerance vs. failure prevention by Areyoukiddingme · 2008-06-03 04:40 · Score: 1
  
  I think Google's business model is the proof that your theory doesn't hold up. I'm absolutely certain they've looked at z10s and everything else on the market. I'm just as certain that they looked at some hard numbers to make the decisions they've made. The z10 obviously costs more than you think it costs vs commodity hardware, or Google would be using them.
  
  I can guess where your estimate falls down, too. Software licensing fees. z10 software costs an arm and a leg, and it's an annual fee. All of those things you talk about, like zOS and zVM and such cost a LOT of money. Google runs Linux. Software licensing fee: $0. Google wrote and runs GFS. Software licensing fee: $0. Google wrote and runs MapReduce and BigTable. Software licensing fee: $0. Google had to write that software. It didn't exist before they wrote it. Their alternative was to pay to have it developed, then pay licensing fees. Can you imagine the incredible price tag a company like IBM would quote for that kind of development and maintenance work? I can. It'd be enough to make Google's entire business infeasible. Once Google had to hire the native talent to write that kind of code, having the talent write a little more code to distribute jobs was a no-brainer. Why pay for zVM when you can roll your own?
43. Re:Failure tolerance vs. failure prevention by jacobsm · 2008-06-03 05:01 · Score: 1
  
  z10 is a higher cost item than commodity hardware. There is no question about it, but you get what you pay for. Approximately 100% uptime, massive I/O support, concurrent hardware upgrades, concurrent microcode updates. World class engineering backed by a major hardware vendor that you might have heard of.
  
  Since I never mentioned zOS I won't discuss pricing but you are correct zVM does have a cost involved to it. But once again you get what you pay for. Over 40 years of virtualization development support YOUR vital business.
  
  z/VM V5.3 is designed to offer:
  
  * Improved scalability and constraint relief
    o Support for more than 128 GB real storage
    o Up to 32 real processors in a single z/VM image
    o Enhanced memory management for Linux guest
    o Enhanced memory utilization using VMRM between z/VM and Linux guests
    o HyperPAV support for IBM System Storage DS8000
    o Enhanced FlashCopy support
  * Virtualization technology and Linux enablement
    o Support for IBM System z specialty engines (processors)
    o Enhanced VSWITCH and guest LAN usability
    o Modified Indirect Data Address Words (MIDAWs) for guests
    o Guest ASCII console support
    o Enhanced SCSI support
  * Network virtualization
    o Improved virtual network management
    o Enhanced failover support for IPv4 and IPv6 devices
    o Virtual IP Address (VIPA) support for IPv6
  * Security
    o Delivery of LDAP server and client
    o Enhanced system security with longer passwords
    o Conformance with industry standards
    o SSL server enhancements
    o Tape data protection with support for encryption
  * Systems management
    o Enhanced management functions for Linux and other virtual images
    o New function level for DirMaint
    o Enhancements to the Performance Toolkit
    o Enhanced guest configuration
  
  Linux under zVM is available for FREE, but you will pay for support, just like everywhere else. Do you think that Google doesn't have a support contract somewhere? As far as their customized software goes, it would have been easier and cheaper to write if the underlying hardware and software were as reliable as a z10 and zVM.
44. Re:Failure tolerance vs. failure prevention by Areyoukiddingme · 2008-06-03 07:34 · Score: 1
  z10 is a higher cost item than commodity hardware. There is no question about it, but you get what you pay for. Approximately 100% uptime, massive I/O support, concurrent hardware upgrades, concurrent microcode updates. World class engineering backed by a major hardware vendor that you might have heard of.
  That right there is the only reason to pay for z10s. 99.99999% uptime is worth some money, if you have to have it. It's just that so very few businesses actually require it. Google certainly doesn't. Hence their architecture choice. Other people have pointed out that if you write your software correctly, it can make unreliable hardware appear to be reliable, as long as you have enough of it. So even that reason for buying a z10 only applies if your problem can't be broken up in a fault tolerant manner.
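
  To put numbers on that, here is a rough sketch of what an uptime percentage means in downtime per year, and of how replication raises availability. It assumes machine failures are independent, which the overheating and power-unit failures described in the story show is not always true, so treat the result as optimistic. The function names are made up.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds_per_year(availability):
    """Expected downtime per year for a given availability (0..1)."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def combined_availability(per_machine, replicas):
    """Availability of a service that is up while any one replica is up.

    Assumes replicas fail independently of each other.
    """
    return 1.0 - (1.0 - per_machine) ** replicas


if __name__ == "__main__":
    # Seven nines, as quoted above, is a few seconds per year.
    print(round(downtime_seconds_per_year(0.9999999), 2), "seconds/year")
    # Machines that are each up only 99% of the time, replicated four ways.
    print(combined_availability(0.99, 4))
```

  Under the independence assumption, four replicas on 99% machines already exceed seven nines; correlated failures (a whole rack or cluster going down) are what the software has to plan for separately.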
  
  Since I never mentioned zOS I won't discuss pricing but you are correct zVM does have a cost involved to it. But once again you get what you pay for. Over 40 years of virtualization development support YOUR vital business.
  z/VM V5.3 is designed to offer:
  
  Improved scalability and constraint relief
  
  Support for more than 128 GB real storage
  Up to 32 real processors in a single z/VM image
  Enhanced memory management for Linux guest
  Enhanced memory util...
  blah blah marketing copy/paste
  
  ...Performance Toolkit
  
  o Enhanced guest configuration
  Linux under zVM is available for FREE, but you will pay for support, just like everywhere else. Do you think that Google doesn't have a support contract somewhere. As far as their customized software goes it would have been easier and cheaper to write if the underlying hardware and software was as reliable as a z10 and zVM.
  Of course they don't have a support contract. Support contracts are for companies whose core business isn't IT. Google's business is managing servers and software. They'd be idiots to contract that out. The problem with z10s is the support contract isn't optional.
  I dispute the idea that writing their software would be either easier or cheaper on a z10. The second reason to buy a z10, after the reliability, is the sheer scale of the compute space available. zVM and virtual Linux machines is a waste of its potential. To fully utilize a z10, and therefore make it worth the money, requires purpose-built software written specifically for it. That's both hard and expensive, because there aren't very many software developers who can write for the z10.
  Developers who can write for Linux are relatively common. If you're going to run Linux anyway, it's foolish to run it in emulation when the software processes you're dealing with are not only the same everywhere but heavily interlinked with one another. Linux is a commodity OS for commodity hardware. I understand why IBM decided to make it run on z10s, but I think they're shooting themselves in the foot in doing so. A z10 is NOT commodity hardware. Trying to pretend it is by making it more "accessible" with Linux only adds fuel to the fire I'm busily fanning. If my problem can be solved using a bunch of Linux machines, why would I pay to put them on incredibly expensive hardware in VMs when I could run them natively in dedicated boxes for so very much less. Especially when for just a little more effort, my software can be fault tolerant, Google-style. A cluster of Linux VMs inside a z10 begs to be a cluster of Linux machines on commodity hardware.
  IBM went about solving their developer problem the wrong way. Rather than fragment and cripple the capabilities of the z10 with VMs, they should be running their own Summer of Code program. Put a z10 on the Internet and start handing out login IDs to anybody with a project proposal. Let high school kids and college kids log in and screw around with it. Give them a mountain of documentation
Traffic Patterns for Google by Anonymous Coward · 2008-05-31 00:21 · Score: 0

I'd like to see the traffic patterns for their data centers. Our University has a daily and weekly pattern, no surprise there, but I wonder how much their traffic changes through the night.
1. Re:Traffic Patterns for Google by dotancohen · 2008-05-31 00:55 · Score: 1
  
  I'd like to see the traffic patterns for their data centers. Our University has a daily and weekly pattern, no surprise there, but I wonder how much their traffic changes through the night.
  There is no 'night' and 'day' for a worldwide internet-based organization such as google. When you have night, someone else has day. Both of you use google.
  
  --
  It is dangerous to be right when the government is wrong.
2. Re:Traffic Patterns for Google by ioshhdflwuegfh · 2008-05-31 01:06 · Score: 1
  
  I'd like to see the traffic patterns for their data centers. Our University has a daily and weekly pattern, no surprise there, but I wonder how much their traffic changes through the night.
  There is no 'night' and 'day' for a worldwide internet-based organization such as google. When you have night, someone else has day. Both of you use google.
  and the distribution of google uses is uniform across all the meridians of this world all the time. If this is not globalization then I don't know what is: no night, no day, no east, no west, no nothin, just the steady state gooogling all the time everywhere...
3. Re:Traffic Patterns for Google by tristian_was_here · 2008-05-31 01:11 · Score: 3, Funny
  
  I bet certain trends happen at night
4. Re:Traffic Patterns for Google by eebra82 · 2008-05-31 01:43 · Score: 5, Insightful
  
  There is no 'night' and 'day' for a worldwide internet-based organization such as google. When you have night, someone else has day. Both of you use google.
  Google consists of dozens of data centers spread out over the planet. Therefore, Asian Google users connect to Asian data centers and not American ones. Because of this, traffic will obviously vary greatly over a 12 hour period.
  
  And even if you think of Google as a whole, it is significantly more popular in Europe and the US than it is in Asia, so you would still have uneven traffic rates.
  
  --
  Full Tilt
5. Re:Traffic Patterns for Google by mrbooze · 2008-05-31 08:25 · Score: 1
  
  But Google has data centers distributed all over the world. The question would be does, say, the Chicago data center have a particular traffic pattern that is distinct from a data center in Shanghai or Greenland or wherever else Google might have DCs.
6. Re:Traffic Patterns for Google by dotancohen · 2008-05-31 08:56 · Score: 1
  
  But Google has data centers distributed all over the world. The question would be does, say, the Chicago data center have a particular traffic pattern that is distinct from a data center in Shanghai or Greenland or wherever else Google might have DCs.
  I'm sure that not very many people know the answer to that. But Google has been buying quite a bit of fiber, and the benefits of distribution I do not need to explain.
  
  --
  It is dangerous to be right when the government is wrong.
Hard drive failures by pacroon · 2008-05-31 00:28 · Score: 2, Interesting

When looking at it on that massive scale, you really get the idea of just how fragile a hard drive really is. I wonder how much money the new generations of data storage are going to cost for large corporations like Google. And not to mention how existing corporations will handle it, once those devices go from "super computers" to mainstream hardware.

--
It's all fun & games until someone loses the game.
1. Re:Hard drive failures by Vectronic · 2008-05-31 00:36 · Score: 1
  
  Probably close to the same, if anything probably cheaper.
  
  I would imagine that Google wouldn't adopt SSDs until they were financially viable, which probably won't be too long. They will be about the same price per GB as HDDs, and eventually cheaper, making for greater profit for as long as HDDs are being sold (200 GB HDD costs $50, 6 months later 200GB HDD costs $10, etc)
  
  Then, if SSDs are more reliable and the same price, that's also less expense.
2. Re:Hard drive failures by DerekLyons · 2008-05-31 01:01 · Score: 1
  
  When looking at it on that massive scale, you really get the idea of just how fragile a hard drive really is.
  
  Less than you might think from the summary, reading further down the article you find "The company has a small number of server configurations, some with a lot of hard drives and some with few".
3. Re:Hard drive failures by Firehed · 2008-05-31 04:28 · Score: 1
  
  Also consider the power and heat output of SSDs as compared to spinning disks. SSDs tend to have lower power requirements (which adds up very quickly when you're dealing with tens of thousands of machines) and as such tend to put out less heat (meaning the HVACs won't get reamed as hard, and should therefore also use less power). Assuming the reliability is decent, the additional premium per drive will probably pay for itself rather quickly considering the drop in price of operating the system as a result. Let's not forget that SSDs should kick the shit out of rotating platters for database performance with their limited-by-c seek times, which in itself is probably enough reason for Google to make the switch.
  
  And they'll be able to tint their homepages a light green too.
  
  Of course you can be damn sure that Google's crunched the numbers and taken appropriate measures. If not, I'm always open to new career opportunities (contact info at my homepage) :)
  
  --
  How are sites slashdotted when nobody reads TFAs?
4. Re:Hard drive failures by Thing+1 · 2008-05-31 04:30 · Score: 1
  
  Then, if SSDs are more reliable and the same price, thats also less expense.
  
  Actually, I'd say at twice the price, SSDs would be less expensive over their lifetime. (I'm not sure where the break-even point is, but Seagate warrants for 5 years, and most flash media has a 50-year average write cycle, so 10x probably isn't far off? I'll stick with 2x for my argument though.)
  
  --
  I feel fantastic, and I'm still alive.
5. Re:Hard drive failures by jimicus · 2008-05-31 05:54 · Score: 1
  
  Indeed and if you have 200 thousand servers running, they must be employing at least a couple of dozen people to run around hot datacenters all day and replace hard drives. Neither the hard drives nor the people will be cheap.
  By all accounts, they don't bother with individual machine repairs. A dead rack might get repaired or replaced, but an individual node will simply be marked as dead and left there. The rack itself will get maintenance as and when it no longer has enough functioning servers to merit keeping it going.
Overheating and rewiring? by throatmonster · 2008-05-31 00:36 · Score: 4, Interesting

The hardware failures I can understand, but needing to rewire the data center after it's been wired once, and the fact that half of them overheat? Those sound like problems that should be addressed in the engineering and installation phases of the datacenter.

--
All pass beyond reach of medicine. None pass beyond the reach of love.
1. Re:Overheating and rewiring? by William+Robinson · 2008-05-31 00:48 · Score: 4, Funny
  
  The hardware failures I can understand, but needing to rewire the data center after it's been wired once, and the fact that half of them overheat? Those sound like problems that should be addressed in the engineering and installation phases of the datacenter.
  Each machine has smoke detector installed right on top of it. The Maintenance director is standing at the gate of data center with pistol in his both hands. As soon as alarm is heard, a batch of maintenance engineers rush towards the faulty machine with keyboard, harddisc, mouse, motherboard and other components. The faulty components of machine are replaced on the rhythm of drumbeats they have been rehearsed through 1000's of times. The crew has to rewire the machine, reboot, and be back at the gate with burnt machine in less than 5 minutes or they are shot dead.
  The trouble is, because of this time limit, the maintenance engineers simply pull machine out of rack without disconnecting any wires. And that's why rewiring is needed.
  
  --
  hilarious
2. Re:Overheating and rewiring? by mdenham · 2008-05-31 00:52 · Score: 2, Interesting
  
  Yeah, the overheating part could be solved by investing in more racks, and then putting half as many units on each rack.
  
  This also allows for future throughput improvements from a single unit, and probably would cost less than the two days' downtime every overheat (racks are relatively cheap, time isn't).
3. Re:Overheating and rewiring? by Vectronic · 2008-05-31 00:57 · Score: 1
  
  The overheating definitely surprised me, especially 50%... but they weren't very explicit about what exactly "rewiring" entailed. Perhaps it's simply a preventative measure: instead of waiting for it to fail, just replace it all yearly, which I would imagine wouldn't be too costly when you are buying miles of wire wholesale, and it can all be done machine by machine and only takes "2 days". Or maybe it just means disconnecting and rewiring, like stepping the connections up (disconnect, move from Connection A to B) to insert more servers.
  
  But overheating seems to be a serious problem, and it might be time to move the facility to a more accommodating place.
4. Re:Overheating and rewiring? by Anonymous Coward · 2008-05-31 01:23 · Score: 1, Interesting
  
  So when you increase the total number of racks you will increase the amount of floor space required (thus increasing the rent on the property). This will then increase the volume of air within the room, which would then need more fans to move more cool air (which will increase electrical costs). Hiring a few guys to run around and fix problems will definitely be cheaper than spreading out the machinery to allow better air flow.
5. Re:Overheating and rewiring? by ioshhdflwuegfh · 2008-05-31 01:47 · Score: 1
  
  Yeah, the overheating part could be solved by investing in more racks, and then putting half as many units on each rack.
  And to remedy the problem with extra space needed to store all those units taken out from the racks, I suggest that they should use twice as small units.
6. Re:Overheating and rewiring? by Anonymous Coward · 2008-05-31 02:03 · Score: 1, Informative
  
  That's just plain bad engineering and technical capabilities. In fact the entire tone reeks of bad engineering (aside from hard disk failure). Broadcast engineering (television) trucks are wired at about twice the density of the typical data center and if they had these kinds of failure rates we would not have any live event television at all. The failures and attitude are shocking.
7. Re:Overheating and rewiring? by tenco · 2008-05-31 02:21 · Score: 1
  
  Wish i had modpoints. Thanks for saving my day :)
8. Re:Overheating and rewiring? by kitgerrits · 2008-05-31 02:27 · Score: 1
  
  Racks are cheap, but have you ever seen the bill for floor space?
  
  From what I read, Google uses simple desktop computers.
  These machines have been designed to sit idle 99.9% of the time and they have been designed with that in mind. If you ramp up the load on such a machine, things start to get real noisy real quick. If you keep them at such a high load for a long time, they simply break. (IBM Netvista comes to mind...)
  
  Trouble is, buying machines designed with such a load in mind costs twice as much and the failure rate of the PCs is reasonably below 50%.
  
  --
  "I was in love with a beautiful blonde once, dear. She drove me to drink. It's the one thing I am indebted to her for."
9. Re:Overheating and rewiring? by mdenham · 2008-05-31 02:48 · Score: 2, Informative
  
  Actually, no, you don't need to increase the number of fans, because the number of fans required is a function of the total amount of heat produced, not the air volume.
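
A rough sketch of why that holds, using the standard heat-balance relation for air: flow = heat / (density x specific heat x temperature rise). The air properties and the 10 K temperature rise are assumed round numbers, not figures from the article. Note that the room volume appears nowhere in the formula.

```python
AIR_DENSITY = 1.2          # kg/m^3, assumed (roughly sea level, 20 °C)
AIR_SPECIFIC_HEAT = 1005   # J/(kg*K), assumed

def airflow_m3_per_s(heat_watts: float, delta_t_kelvin: float) -> float:
    """Volume of air per second needed to carry away heat_watts
    while the air warms by delta_t_kelvin passing through the racks."""
    return heat_watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_kelvin)

# Same 10 kW of servers, packed into one rack or spread over two:
dense = airflow_m3_per_s(10_000, 10)
sparse = airflow_m3_per_s(5_000, 10) + airflow_m3_per_s(5_000, 10)
print(f"{dense:.3f} m^3/s vs {sparse:.3f} m^3/s")
```

Spreading the machines out changes where the air has to go, not how much of it is needed, which is a separate (and real) ducting problem.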
10. Re:Overheating and rewiring? by Waffle+Iron · 2008-05-31 03:55 · Score: 1
  
  The failures and attitude are shocking.
  Right. But when was the last time you were unable to pull up Google's search page? At the end of the day, that's all that matters.
  BTW, I'd bet good money that a "broadcast engineering truck" costs 25X what google pays per CPU cycle.
11. Re:Overheating and rewiring? by SuperQ · 2008-05-31 04:46 · Score: 2, Informative
  
  The problem comes from requirements changing. "Sorry, we designed this building for X load, now you're using X+10% load so we have to add additional cooling units to keep up"
  
  I had this problem at the University where I worked a while ago. We rolled in a nice new SGI Altix machine. We had enough power, but the cooling system couldn't move enough cubic feet of air into the one part of the room where the box was. As soon as you reach capacity, temps skyrocket.
12. Re:Overheating and rewiring? by Anonymous Coward · 2008-05-31 05:14 · Score: 0
  
  If you put fewer computer in each rack, the cost isn't just more racks like you say. It's a building twice as big, and a cooling unit somewhere around twice as powerful. So basically your suggestion is more cooling per computer. I imagine the economics are very different for cooling than "investing in more racks."
13. Re:Overheating and rewiring? by shmlco · 2008-05-31 06:22 · Score: 1
  
  Ditto. Sounds to me like the issue is using cheap-ass hardware...
  
  --
  Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
14. Re:Overheating and rewiring? by Anonymous Coward · 2008-05-31 13:22 · Score: 0
  
  Dude, they should have bought a Dell.
  
  NOT
15. Re:Overheating and rewiring? by inKubus · 2008-05-31 17:24 · Score: 1
  
  This is funny. But, there are rumors that they might be implementing robotic...overlords...to swap servers automatically.
  
  --
  Cool! Amazing Toys.
16. Re:Overheating and rewiring? by inKubus · 2008-05-31 17:31 · Score: 1
  
  Yeah, the gist of it is, if they were running on real hardware, not "creatively", they probably wouldn't be making money. They are really just following the rules of economics. Power is cheap. Crappy computers are cheap. How can we make this function() = Huge, fast site? And in anticipation of power being more expensive, they are moving their DCs to places with notoriously cheap power (The Dalles and Iowa). Next they will probably anticipate computers being more expensive and start using YOUR computer and power to do some of their work (google toolbar, etc..)
  
  I was originally thinking of a slavery analogy but it didn't quite work the way I wanted it to.
  
  --
  Cool! Amazing Toys.
17. Re:Overheating and rewiring? by ReiDragon · 2008-06-02 03:04 · Score: 1
  
  I for one wel....Wait, I don't do memes, never mind.
  
  --
  PouchPC 2.13ghz C2D, 8gb ram, 9800 GT, 1.5tb, Vista Business.
18. Re:Overheating and rewiring? by Anonymous Coward · 2008-06-02 07:31 · Score: 0
  
  It's amazing how much more a bunch of slashdot nerds know about very large scale data center design and maintenance than Google does. If only Google could afford to hire you guys instead of the idiots who are currently working there, they might actually have a chance of becoming profitable someday.
19. Re:Overheating and rewiring? by hoggoth · 2008-06-03 00:17 · Score: 1
  
  Thanks! I just rolled on the floor and laughed, then drank milk and forced it out my nose onto my brand new keyboard. In my parent's basement. With a real-doll named covered in hot grits.
  
  --
  - For the complete works of Shakespeare: cat /dev/random (may take some time)
gfs! sounds cool by lemur3 · 2008-05-31 00:44 · Score: 0, Redundant

I want this google file system thing for myself, Imagine, if we could even figure out how to use it, how much fun it might be to use!

or maybe not
It's the same everywhere, regardless of scale by Enleth · 2008-05-31 00:51 · Score: 3, Interesting

I've been managing a dorm network consisting of two "servers" (routing, PPPoE, some services like network printing etc.), a single industrial rack-mounted switch and dozens of consumer switches spread all over the building.

And they failed. And then they failed again. And again. Sometimes completely, but usually just a single port, or just "a bit" - it looked as if the switch was working, but every - or every n-th, or every bigger than x - packet got mangled, misdirected or whatever. Or sometimes packets appeared just out of the blue (probably some partial leftovers from the cache) and a few of them made enough sense to be received and reported. Sometimes a switch with no network cables attached to it started blinking its lights - sometimes on two ports, sometimes just on a single one.

Well, I could go on for hours, but you get the idea. What happens at Google happens everywhere, they just have some nice numbers.

Regardless, the article is quite entertaining to read for a networking geek ;)

--
This is Slashdot. Common sense is futile. You will be modded down.
1. Re:It's the same everywhere, regardless of scale by Bill,+Shooter+of+Bul · 2008-05-31 02:50 · Score: 2, Informative
  
  No it isn't. If a machine works flawlessly for ten years 90% of the time, and you only have one, odds are everything will always work. If you have ten, odds are one will die within those ten years. Things are different at large scale, and failure prediction is an important part of creating such a big cluster. But yes, even on a small scale you should always plan for failure.
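
A back-of-the-envelope sketch of that arithmetic. It assumes failures are independent and reads "90%" as each machine's chance of surviving the ten years; both assumptions are illustrative and not the poster's stated model.

```python
def p_any_failure(n_machines: int, p_survive: float) -> float:
    """Probability that at least one of n independent machines fails."""
    return 1.0 - p_survive ** n_machines

for n in (1, 10, 100, 1000):
    print(f"{n:>5} machines: P(at least one failure) = {p_any_failure(n, 0.9):.3f}")
```

With one machine the chance of seeing a failure is 10%; with ten it is already about 65%, and beyond a hundred it is a near certainty.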
  
  --
  Well.. maybe. Or Maybe not. But Definitely not sort of.
2. Re:It's the same everywhere, regardless of scale by Bender0x7D1 · 2008-05-31 03:39 · Score: 3, Funny
  
  Sounds like you have dust in your cables. I would recommend you clean the inside of your cables with compressed air so the bits don't get stuck on the lint and other stuff in there. The bits travel very fast, so even small dust particles can be a problem.
  
  --
  Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.
3. Re:It's the same everywhere, regardless of scale by kipman725 · 2008-05-31 04:02 · Score: 1
  
  I have never seen a switch fail. What are you doing to them? Mine are just consumer 5-16 port devices, but they are in constant use, quite often at maximum capacity for several hours at a time while large files are transferred over the network. I think I have had one crash needing a reboot once and had to reset another after a momentary power loss another time.
4. Re:It's the same everywhere, regardless of scale by ocbwilg · 2008-05-31 04:49 · Score: 3, Informative
  
  I have never seen a switch fail. What are you doing to them? Mine are just consumer 5-16 port devices
  
  And that's why. If you're using "smart hubs" or "dumb switches" (aka, your $99 Linksys switch), then you're probably not going to have issues. All it does is store MAC tables and forward data to the appropriate ports. You probably also don't have multiple other network switches/hubs/routers hanging off of those devices somewhere downstream, and if you do then it's very likely that you know what and where they are and can plan for them.
  
  On the other hand, trying to manage an enterprise-class switch with advanced features can be a little more complicated, especially when you start allowing anybody to plug any other kind of network devices into the switch. You can easily end up with spanning tree loops, issues with frame sizing, cross-brand autonegotiation failures, and who knows what else. And that's before you even have to start worrying about bugs in various firmware revisions or some enterprising "hax0r d00dz" who passed Comp Sci 101 trying to do things that he shouldn't be doing, and spoofing addresses to try to cover his tracks.
5. Re:It's the same everywhere, regardless of scale by jimicus · 2008-05-31 05:59 · Score: 3, Informative
  
  I have never seen a switch fail. What are you doing to them? Mine are just consumer 5-16 port devices, but they are in constant use, quite often at maximum capacity for several hours at a time while large files are transferred over the network. I think I have had one crash needing a reboot once and had to reset another after a momentary power loss another time.
  Then at least one of the following is true:
  
  1. You've been fantastically lucky.
  2. You've not been in IT terribly long.
  3. Your job doesn't involve network management and so your experience of what switches can do when they have a mind to is limited.
  
  Solid-state simple dumb switches can and do fail, as can managed ones. If you're lucky, they fail in a fairly obvious fashion (eg. they just stop pushing packets on some or all ports).
  
  If you're unlucky, they start spewing corrupt frames everywhere confusing the hell out of everything else on the network and you have to figure out exactly which switch is doing this and get rid of it.
6. Re:It's the same everywhere, regardless of scale by Enleth · 2008-05-31 06:40 · Score: 1
  
  I'm not doing anything to them. Their working conditions are. There's some 15 different *brands* of switches there including a few Chinese "no-names", chained up to 5 in a row to reach far out in the building.
  
  Guess what? Every week some moron manages to make a loop, connect a switch to itself, connect two switches with a telephone cable or do any other unspeakably ass-brained thing that makes CSI investigations look like a piece of cake compared to finding out what's wrong with this network.
  
  And don't even get me started on the fact that every wing of the building is powered from a different phase...
  
  --
  This is Slashdot. Common sense is futile. You will be modded down.
7. Re:It's the same everywhere, regardless of scale by kmac06 · 2008-05-31 17:30 · Score: 1
  
  That's a problem I actually have with optical quantum bits :)
8. Re:It's the same everywhere, regardless of scale by inKubus · 2008-05-31 17:36 · Score: 1
  
  enterprise-class switch with advanced features
  
  Aka, a "Layer 3" switch.
  
  --
  Cool! Amazing Toys.
9. Re:It's the same everywhere, regardless of scale by kipman725 · 2008-06-01 07:58 · Score: 1
  
  good job ethernet uses isolation transformers then.
Software architecture, Not hardware by howardd21 · 2008-05-31 00:54 · Score: 3, Interesting

The fact that they attribute success to the software did not surprise me; the chunk and shard (not mentioned in the article) approach has been known for some time. But the fact that the GFS architecture works with BigTable and MapReduce was interesting, and that it handles many data/content types. What this creates is not only a structure that scales in volume and size, but also a sustainable business model. As new content types are added, regardless of size or type, they can generally be indexed appropriately. I am looking forward to searching more within types like video and audio, or even medical records like X-rays or MRI results. The possibilities are staggering.

--
no comment
Hardware is cheap by Ritz_Just_Ritz · 2008-05-31 01:09 · Score: 3, Interesting

It's always going to be cheaper to use anthill labor on this type of problem. Even relatively powerful 1RU and .5RU servers are dirt cheap these days. Hell, I was able to buy a pile of .5RU machines for one of my projects this week. I can't believe how cheap things have gotten:

quad-core xeon @2.66ghz
4gb RAM
2 x 500gig barracudas (RAID1)
dual gigabit ether
CentOS 5.1
US$1100 per unit

They are all stashed behind a Foundry ServerIron to load balance the cluster. So far, it seems to scale VERY well and increasing capacity is as simple as tossing another US$1k server on the pile.

Cheers,
1. Re:Hardware is cheap by jacquesm · 2008-05-31 02:43 · Score: 1
  
  would you mind letting me in on where you bought this stuff ?
  
  --
  MP3 Search Engine
2. Re:Hardware is cheap by Anonymous Coward · 2008-05-31 03:32 · Score: 0
  
  From the guy with the red pickup in the company parking lot.
3. Re:Hardware is cheap by Gazzonyx · 2008-05-31 04:39 · Score: 1
  
  Just for a comparison, where I work we bought a SuperMicro with the same specs. about two years ago for about $3500. This was just as the new Xeons and ICHs were coming out, so that number is a bit higher than it would have been if we waited a couple of months.
  
  --
  If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
4. Re:Hardware is cheap by Anonymous Coward · 2008-05-31 11:28 · Score: 0
  
  http://www.8anet.com/
  
  Look at the .5RU machines. The model number is FA1426-BQ and you can configure them to your own spec.
Data by Anonymous Coward · 2008-05-31 01:36 · Score: 0

That's a LOT of porn!
How do they KNOW what to fix by gelfling · 2008-05-31 02:16 · Score: 1

Seems that the whole server/complex monitoring aspect was left off. With 100K servers per complex, how do they even know which ones are broken? How do they even find them on the floor and in the racks?
1. Re:How do they KNOW what to fix by Gazzonyx · 2008-05-31 04:34 · Score: 2, Informative
  
  Most rackmounts that I've seen have an 'identify' LED that you can have blink (I assume you can automate this with SNMP and management software).
  
  --
  If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
2. Re:How do they KNOW what to fix by Kymermosst · 2008-05-31 05:37 · Score: 1
  
  Well, there are two ways (that my employer uses - I'd guess google does the same):
  
  Most server systems worth their salt have fault indicators that turn on when there is a hardware failure or perhaps even a watchdog timeout. Probably they do periodic walk-throughs to look for fault lights.
  
  The second is proper asset management. A machine identified as broken has a record in an asset database that describes the location of the machine, the location being something like (data center, row, rack, RU up from the bottom).
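
A minimal sketch of what such an asset record might look like. The hostnames, the data center name and the field layout are all made up for illustration; nothing here describes a real product or Google's own tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    data_center: str
    row: int
    rack: int
    rack_unit: int  # RU counted up from the bottom of the rack

# Hypothetical inventory keyed by hostname; every value is invented.
ASSETS = {
    "node-0417": Location("dc-east", row=12, rack=3, rack_unit=22),
    "node-0418": Location("dc-east", row=12, rack=3, rack_unit=23),
}

def locate(hostname: str) -> str:
    """Turn a hostname flagged as broken into directions for a technician."""
    loc = ASSETS[hostname]
    return f"{loc.data_center} / row {loc.row} / rack {loc.rack} / RU {loc.rack_unit}"

print(locate("node-0417"))
```

The point is only that monitoring produces a hostname and the asset database turns it into a physical place to walk to.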
  
  --
  "Alcohol, Tobacco, Firearms, and Explosives" should be a convenience store, not a government agency.
3. Re:How do they KNOW what to fix by inKubus · 2008-05-31 17:43 · Score: 1
  
  Looking at the "snapshot", I see some twisted pair running out of the Mobos in their "rack". Perhaps you could wire up all the fault light outputs on a rack of mobos to a simple multiple-and logic circuit which has a big siren light connected to it with a relay. Then you know what Rack to go to, after that just find the light.
  
  --
  Cool! Amazing Toys.
4. Re:How do they KNOW what to fix by Anonymous Coward · 2008-06-01 12:21 · Score: 0
  
  The second is proper asset management. A machine identified as broken has a record in an asset database that describes the location of the machine, the location being something like (data center, row, rack, RU up from the bottom).
  
  I wrote a piece of software for my company which does almost exactly what you describe. You search for the machine by any combination of criteria: RU, Rack ID, IP, server type, application (function), manufacturer, etc. When you click on the search results, a details page displays all the information (which you are permitted to see based on network/login group designations), and the application draws a picture of the rack (using stylesheets) with all the units and where in the rack that particular unit lives. An overall rack row/number matches a floor map of the data center.
  
  It's a little clunky but not too bad for a homegrown application from an amateur programmer; it works and the company benefits from it. Stands to reason the largest data centers probably have very complex and accurate monitoring and reporting.
5. Re:How do they KNOW what to fix by Kymermosst · 2008-06-01 13:57 · Score: 1
  
  Nice. We use a commercial solution which doesn't have the drawings, but has all of the vitals for a system. In addition to basic asset management, it also includes physical connectivity descriptions that are very detailed allowing a sysadmin to see exactly how things are put together.
  
  I've never actually seen our main data center.
  
  --
  "Alcohol, Tobacco, Firearms, and Explosives" should be a convenience store, not a government agency.
6. Re:How do they KNOW what to fix by ZerdZerd · 2008-06-05 13:01 · Score: 1
  
  They probably run some algorithm to find out in which order they need to fix problems to maximize profit.
  
  --
  I'm not insane! My mother had me tested.
Just goes to prove by MrKaos · 2008-05-31 02:19 · Score: 1

Open Source as a business model scales very well, especially if your technology is so tightly integrated with your business goals. No matter how small you start out, your business is never going to be limited by an Open Source platform, because by the time you're big enough to have those problems no one, except yourself, is going to be able to help you anyway.

--
My ism, it's full of beliefs.
1. Re:Just goes to prove by MisterBlueSky · 2008-05-31 11:55 · Score: 1
  
  Which of the software tools Google uses are Open Source? They start with open source, but everything they add and change to make it scale, is kept inhouse. Google does several things to promote open source, but releasing their own software as open source is not one of them. I fail to see what the case of Google proves about open source.
2. Re:Just goes to prove by MrKaos · 2008-05-31 15:20 · Score: 1
  
  Which of the software tools Google uses are Open Source?
  After reading about google's infrastructure in the past it's not just a matter of tools but OSS Operating System collections, probably an internal google distribution, they might use Apache or even a kernel based httpd, and their software authoring tools are eclipse or ant as they seem to be a big fan of Java.
  They start with open source, but everything they add and change to make it scale, is kept inhouse.
  This is a comment about how Google uses Open Source Software, not a comment about Open Source Software. My point is that Open Source provides building blocks upon which you can build. There is nothing in the GPL that says you can't keep parts of the code you develop in house.
  Google does several things to promote open source, but releasing their own software as open source is not one of them.
  Agreed, see above.
  I fail to see what the case of Google proves about open source.
  That you can build a solid business infrastructure on it. You don't think they paid for several hundreds of thousands of third-party proprietary vendor licenses for their operating systems, do you? And if you can make a case for infrastructure savings at the operating systems level then you can make the same case at other levels. I doubt those lessons were lost on Google's accountants.
  
  --
  My ism, it's full of beliefs.
3. Re:Just goes to prove by MisterBlueSky · 2008-06-01 13:34 · Score: 1
  
  Yes, I stand corrected. I didn't understand what you were getting at in your 'Just proves' post, but I see what you meant now, and I agree with it.
We are ants. by brxndxn · 2008-05-31 03:28 · Score: 1

I look at their data farm - and its complexity.. and cannot help but wonder how Google has organized itself for their thousands of employees to properly maintain it. No one employee can know every piece of it - or is it simply so simple that every employee knows all of it?

Then.. we realize that our own lifespans and lives are as prone to failure as the servers in their datacenters. Our lifespans are short and everyone has problems.. So Google has mastered the ability to make us interchangeable.

WE ARE ANTS!

--
--- We need more Ron Paul!
1. Re:We are ants. by NevarMore · 2008-05-31 04:02 · Score: 1
  
  What makes Google's datacenters different is that they are ALL the same.
  
  It's not like a typical datacenter where cluster X is for ESX Server, Y is for the financial system, Z is Win 2k3, and Q is AIX. Every unit in a Google rack is just another piece of typical hardware running the same OS, the same software, and configured the same way. I suspect there may be some sort of 'controller node' for some number of worker machines, but even then, each controller node is just like another controller node.
  
  Each machine won't be exactly the same, but hell, it's Google. It's not like the staff doesn't have access to a good "local" cache of information on their kit.
2. Re:We are ants. by darkpixel2k · 2008-05-31 17:34 · Score: 1
  
  I suspect there may be some sort of 'controller node' for some number of worker machines, but even then, each controller node is just like another controller node.
  
  ...and suddenly I'm wondering if slashdot shouldn't 'borgify' the Google logo instead of Bill...
  
  --
  There's no place like ::1 (I've completed my transition to IPv6)
nice and all by Anonymous Coward · 2008-05-31 03:47 · Score: 0

and yet many companies manage to have 100% uptime with a very small number of reliable machines. I guess this simply proves that there are many ways to skin a cat...
More obfuscated packages by undol · 2008-05-31 04:05 · Score: 1

There are also two other software packages. The first, called Open Job Queue, handles batch jobs and dispatches them to a server in some cluster. The other, called Chubby, is like a centralized mutual-exclusion (mutex) service, and it is used by other packages like GFS, BigTable and MapReduce.
You can find a lot of information about Chubby on Google Scholar, and I heard about Open Job Queue at a conference at my university.
Can you imagine... by chrysalis · 2008-05-31 04:07 · Score: 0, Redundant

A Beowulf cluster of Google's data centers ?

--
{{.sig}}
Jeff Dean is the smartest guy I've ever met by swb · 2008-05-31 04:33 · Score: 2, Interesting

He was my friend in high school and roommate in college for a year. Smartest guy I've ever met in my life, easily smarter than any other PhDs I've known, including people I know with Harvard post-med school doctorates.
1. Re:Jeff Dean is the smartest guy I've ever met by Anonymous Coward · 2008-05-31 12:06 · Score: 0
  
  Well, you know, not everyone can get into MIT.
2. Re:Jeff Dean is the smartest guy I've ever met by thczv · 2008-06-02 01:56 · Score: 2, Funny
  
  Jeff, is that you?
3. Re:Jeff Dean is the smartest guy I've ever met by Anonymous Coward · 2008-06-02 06:24 · Score: 0
  
  The rate at which Jeff Dean produces code jumped by a factor of 40 in late 2000 when he upgraded his keyboard to USB2.0
Epic cooling fail ? by billcopc · 2008-05-31 04:44 · Score: 1

I'm not trying to be a jerk (I just play one on the internet), but this worries me:

And there's about a 50 percent chance that the cluster will overheat, taking down most of the servers in less than 5 minutes and taking 1 to 2 days to recover.

I'm not much of a cluster guy, but if the risk of overheating is so great, and the damage so vile, maybe they should beef up the A/C ? Just a thought.

--
-Billco, Fnarg.com
1. Re:Epic cooling fail ? by Wizard+Drongo · 2008-05-31 05:15 · Score: 1
  
  A/C is very expensive in terms of energy; I seem to recall that for every joule of heat energy you remove from an area you'll spend 10 doing it. Something like that, I always sucked at physics...
  
  As soon as you bump up the AC, you then need to re-design the cooling structures, re-design the airflow construct, the power feeds will need up, then added cooling for the a/c and power ducting itself....
  
  Better to just lose a cluster every 6 months, and know that it'll happen that often and plan for it, than to try and diminish it; pretty much the whole point here is to get to an acceptable fail rate and plan around it than try to reduce the fail rate itself.
  
  --
  The truth shall always be free: Boris Floricic is Tron.
2. Re:Epic cooling fail ? by Yetihehe · 2008-05-31 06:03 · Score: 1
  
  What? I have small peltier module, which has about 50% efficiency (it consumes 260W, and moves 130W of heat). I believe AC systems have higher efficiency...
  
  --
  Extreme Programming - Redundant Array of Inexpensive Developers
3. Re:Epic cooling fail ? by Anonymous Coward · 2008-05-31 07:21 · Score: 0
  
  I think the back-of-the-envelope number is 50% : every 100W a Datacenter computer generates, you'll spend 50W in A/C to get it out of the DC.
4. Re:Epic cooling fail ? by Yetihehe · 2008-05-31 07:53 · Score: 1
  
  So this is really a 200%, 1W of power moves 2W of heat.
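
For reference, the ratio being argued over in this subthread is usually quoted as a coefficient of performance (COP): heat moved divided by electrical power spent moving it. A quick sketch using only the figures already posted above; it can exceed 1 because the heat is being moved, not created.

```python
def cop(heat_moved_watts: float, electrical_input_watts: float) -> float:
    """Coefficient of performance: watts of heat moved per watt of electricity."""
    return heat_moved_watts / electrical_input_watts

print(cop(130, 260))  # the Peltier module mentioned above: 0.5
print(cop(100, 50))   # the "50W of A/C per 100W of heat" rule of thumb: 2.0
```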
  
  --
  Extreme Programming - Redundant Array of Inexpensive Developers
5. Re:Epic cooling fail ? by Anonymous Coward · 2008-05-31 08:38 · Score: 1, Informative
  
  No, the % measures the overhead of AC compared to your heat-producing costs. There is no theoretical minimum to the AC needed, so you can't report an "efficiency" number like you're trying to do.
6. Re:Epic cooling fail ? by SuperQ · 2008-05-31 11:02 · Score: 1
  
  That depends on a lot of factors. Moving heat efficiently depends on differences and densities.
  
  Good cooling designs also use evaporation to do a bulk of the work. On a large 20 story building at the University they had 4 cooling towers that mostly just pushed air through waterfalls that went over the heat exchanger coils.
7. Re:Epic cooling fail ? by Wizard+Drongo · 2008-05-31 13:54 · Score: 1
  
  True; what I was really getting at was that in a datacentre, you can't just "ramp up the aircon" easily. It would often require large-scale building renovations, if it's even possible. In most consumers' houses with air-con, the air-con is done as an afterthought, or is easily modifiable. In a datacentre, the air-con is as fundamental to the design of the building as the power conduits, foundations and water-lines. Once it's at 100%, ramping it up further is harder than just planning for the failures you know will happen.
  
  --
  The truth shall always be free: Boris Floricic is Tron.
a case for mainframes by EllynGeek · 2008-05-31 05:28 · Score: 1

This whole article is a persuasive argument for mainframes. Continually servicing cheap hardware that fails is labor-expensive, and replacing all those failed components costs real money. They're only cheap the first time you buy them. I don't see where they're saving money, though the guy does sound like he's having a ball directing armies of lackeys all through these vast, twisty datacenters.

--
we will end no whine before its time
1. Re:a case for mainframes by Anonymous Coward · 2008-05-31 17:51 · Score: 0
  
  They are buying the mobos straight off the ship from China, they probably pay Wholesale Wholesale for them. And they probably can trade the broken mobos back to their guy for a discount on the next purchase. Pft, plus Sir Ghey probably utilizes his Russian mafia connections to drive the price down even more.
What? No complaints... by glitch23 · 2008-05-31 09:41 · Score: 1

that Google isn't being green enough? Think how much power they are pulling off the grid to run their thousands of servers especially when those servers are doing something useful. Not to mention the power required to cool these servers. I bet when they turn on new data centers the lights in the nearby cities lose a few lumens.

--
this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
1. Re:What? No complaints... by MentlFlos · 2008-05-31 15:16 · Score: 1
  
  How much power do I save because I can find data faster with google? How much gas do I save by printing out directions from google maps? I would guess that the amount they use and the amount users save are surprisingly close. You should be more up-in-arms about the power consumed by "off" devices. I have a cluster at work and I turned one of the racks off. It was consuming 7 amps @ 220v across 44 machines when OFF. On and half loaded weighs in around 28 amps for that rack. boggles the mind..
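
Putting numbers on that rack, as a rough sketch: power is taken as volts times amps, which assumes a power factor of 1 (an assumption; the real draw in watts would be somewhat lower).

```python
VOLTS = 220
MACHINES = 44

def rack_watts(amps: float, volts: float = VOLTS) -> float:
    """Apparent power drawn by the rack, assuming a power factor of 1."""
    return amps * volts

off_w = rack_watts(7)    # rack switched "off"
on_w = rack_watts(28)    # rack on and half loaded

print(off_w, off_w / MACHINES)   # 1540 35.0
print(on_w, on_w / MACHINES)     # 6160 140.0
print(off_w / on_w)              # 0.25
```

So each "off" machine is still pulling roughly 35 W, a quarter of what it draws when half loaded.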
Functional programming by daemonburrito · 2008-05-31 15:57 · Score: 1

Lots of talk about DC hardware and networking here, but the part about parallelization was the really fascinating part to me. The software is the really cool bit here.

There is a meme, here and some other places, that multi-core/multi-unit processors are some great swindle by chip manufacturers. I've come to the conclusion that this is not the case; rather, we're all just dinosaur programmers stuck in the procedural/oop paradigm.

Python in its functional paradigm and MapReduce are amazingly safe and efficient ways to solve Google's class of problems with parallel hardware.
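
To make that concrete, here is a minimal single-process sketch of the map/reduce style in plain Python. It is not Google's MapReduce API; the names `map_phase`, `reduce_phase`, and `word_count` are invented for the example. The point is that the map step is a pure function of one document, so each call could in principle run on a different core or machine, and the reduce step merges partial results in any grouping.

```python
from collections import Counter
from functools import reduce

def map_phase(document):
    """Map step: count the words in one document, with no shared state."""
    return Counter(document.lower().split())

def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def word_count(documents):
    """Apply the map step to every document, then fold the partial counts."""
    partials = map(map_phase, documents)
    return reduce(reduce_phase, partials, Counter())

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs).most_common(2))
```

Because `map_phase` touches nothing outside its argument, swapping the built-in `map` for a parallel one would not change the result, which is the safety property being praised here.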

What I'm saying here is that every self-respecting geek should learn a functional language.
Wikipedia: MapReduce, Functional Programming
Also, Google's engEdu videos are freaking AWESOME.
1. Re:Functional programming by dfj225 · 2008-05-31 17:04 · Score: 1
  
  Yes, I agree that the software is more interesting than the hardware. After all, it is their software that makes their data center organization possible.
  
  Concepts from functional programming have really helped out Google, but at the same time they introduce limitations (at least when considering the MapReduce/GFS framework).
  
  I think the larger problem with parallel programming for multiple processors/cores really comes with finding a conceptual model for expressing the computation. Functional programming (in the sense of MapReduce) is one way, but can't be used to parallelize every problem.
  
  Google is able to scale their processing because they have rather singular needs. Most of their computations likely fall into the "embarrassingly parallel" category. This even shows in their file system, GFS, which is optimized for the computations done at Google and not necessarily the general case.
  
  I'm hoping that the work that Google and others have done in this area will result in more parallel frameworks. Perhaps my favorite thing about the way that Google operates is that they use knowledge of computer science and software engineering to build their own, world class, solutions instead of buying a lot of off the shelf systems.
  
  And yes, I agree that Google's engEdu videos are great. They should probably have more exposure than they currently do.
  
  Another good resource for those interested in MapReduce is the open source implementation of the same concept:
  Hadoop http://hadoop.apache.org/
  
  --
  SIGFAULT
2. Re:Functional programming by daemonburrito · 2008-05-31 18:28 · Score: 1
  
  Super interesting reply. Interesting observation about Google's M.O. of growing their own CS and software engineering solutions.
  
  Thanks for the tip about Hadoop... I think I may use it as an excuse to buy some cloud time. Also, slightly unrelated (and apologies if you are involved in the project and this is not news): according to Wikipedia, the project's largest sponsor is Yahoo, and they have hired Doug Cutting, who started it... Which works nicely with my pet conspiracy theory that the Microsoft takeover bid had more to do with software engineering than anything else.
  
  And about engEdu deserving more exposure... I could be wrong, but it seems like the old video.google.com domain has been repurposed to be a flash video repository with a higher s/n (noise being dogs on skateboards and signal being Berkeley computer science lectures).
3. Re:Functional programming by dfj225 · 2008-05-31 19:09 · Score: 1
  
  Thanks, I'm glad you found it interesting.
  
  The bit about Hadoop being supported mostly by Yahoo is news to me. I hadn't bothered to look into their funding.
  
  Hearing this has me wondering what sort of organization and software structure Yahoo uses internally. They probably manage just as much information as Google, if not more. However, I have a feeling that their software is more of a hodge-podge, rather than being built on top of a few parallel frameworks as Google's is.
  
  It does seem that video.google.com has become a higher s/n alternative to YouTube, but the only way I know of to reliably find the engEdu videos is to simply search for that term. Unless I'm missing something, I think it'd be a great idea if they had a constant presence on Google Video's front page.
  
  I just found this video about evaluating MapReduce on multicore/multiprocessor systems. Despite being over a year old, it was at the top of my search results. But it's good timing because this looks very relevant to the discussion here.
  
  http://video.google.com/videoplay?docid=5795534100478091031
  
  --
  SIGFAULT