How Many Google Machines, Really?

Posted by timothy on Sunday May 2, 2004 @05:02AM from the measured-by-weight-not-by-volume dept.

BoneThugND writes "I found this article on TNL.NET. It takes information from the S-1 Filing to reverse engineer how many machines Google has (hint: a lot more than 10,000). 'According to calculations by the IEE, in a paper about the Google cluster, a rack with 88 dual-CPU machines used to cost about $278,000. If you divide the $250 million figure from the S-1 filing by $278,000, you end up with a bit over 899 racks. Assuming that each rack holds 88 machines, you end up with 79,000 machines.'" An anonymous source claims over 100,000.
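
The estimate in the summary is one division and one multiplication. A minimal Python sketch of it, using only the figures quoted above ($250 million, $278,000 per rack, 88 machines per rack). The function name is made up for illustration, and the `discount` parameter is an assumption added here to show how sensitive the result is to the price actually paid per rack:

```python
def estimate_machines(budget_usd, rack_cost_usd, machines_per_rack, discount=0.0):
    """Return (whole racks, machines) that a hardware budget buys.

    discount is a hypothetical fractional price reduction (0.25 = 25% off);
    it is not a figure from the S-1 or the article.
    """
    effective_cost = rack_cost_usd * (1.0 - discount)
    racks = int(budget_usd // effective_cost)  # only whole racks count
    return racks, racks * machines_per_rack


# Figures from the summary: $250M budget, $278,000 per rack, 88 machines per rack.
racks, machines = estimate_machines(250_000_000, 278_000, 88)
print(racks, machines)  # 899 79112

# Same budget with a hypothetical 25% bulk discount.
print(estimate_machines(250_000_000, 278_000, 88, discount=0.25))
```

The result moves quickly with the assumed price: under the made-up 25% discount the same budget buys more than 100,000 machines.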

28 of 476 comments

What is that as a percentage ... by Alain Williams · 2004-05-02 05:04 · Score: 3, Interesting

* of servers in the world
* of servers in the USA
* of servers running Linux
IPO changes things by Have Blue · 2004-05-02 05:05 · Score: 5, Interesting

There was an article recently about how Google constantly understates various statistics about itself to mislead potential competitors. The article also said that the SEC would not allow them to do this once they became a publicly traded company.
Assumptions? by waytoomuchcoffee · 2004-05-02 05:07 · Score: 4, Interesting

According to calculations by the IEE, in a paper about the Google cluster, a rack with 88 dual-CPU machines used to cost about $278,000

Um, don't you think if you were buying 899 racks you might actually, you know, negotiate for a better price?

This isn't the only assumption in your analysis, and the problems with them will be compounded. What's the point of this, really?
Cheap hardware by Anonymous Coward · 2004-05-02 05:07 · Score: 1, Interesting

I was always under the impression that Google used a lot of "cheap" hardware. Meaning, they only used IDE and non-rackmount machines.

So, they probably don't use "racks", but if they did, they could only fit about 12-15 desktop machines (single proc) per rack. That's a whole lot fewer than the 42 1U rackmounts it takes to fill a rack.
Re:$278k ?? by Anonymous Coward · 2004-05-02 05:08 · Score: 1, Interesting

You might be able to get machines slightly cheaper than retail if you, say, buy 79,000 of them.
This is actually useful by 2MuchC0ffeeMan · 2004-05-02 05:11 · Score: 2, Interesting

because with ~80,000 machines, they can easily put a few hard drives in each and give everyone 1 GB of Gmail space... I didn't think it was possible.

where do you go to buy 80,000 hard drives?

--
Runnin' On Empty .... I'm Still Alive
88 machines per rack? hardly. by cyclop5 · 2004-05-02 05:12 · Score: 3, Interesting

In your standard 42U cabinet, you're talking a half-U per server. Umm.. not happening. Let's just say I happen to know they use 2U servers, for a total of 21 per cabinet. Custom jobs - just the "floor pan" (i.e. no sides, or top for the case), system board, power supply, and I think a single (or possibly dual) hard drive (I didn't want to be too nosy staring into someone else's colo space). Oh, and network. And rumor has it, they're putting in close to 200 cabinets in just this location alone.
1. Re:88 machines per rack? hardly. by cyclop5 · 2004-05-02 05:30 · Score: 3, Interesting
  
  From the cabinets I saw, it was definitely 2U vertical space. It was one of those things that surprised me a little - I would have assumed they'd use blade servers, or at least 1U boxes just to get the rack density. So when I had the opportunity to "sneak a peek", I tried to notice as much as I could, without poking and prodding. Unfortunately, there wasn't much to notice, other than what I mentioned previously. That, and they were all pre-installed in the cabinets before shipping out to the colo. (There were 30 or 40 cabinets in the shipping/receiving area of the colo).
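
The rack-density argument in this thread is simple integer arithmetic. A rough sketch, assuming the 42U cabinet, the 2U servers, and the roughly 200 cabinets mentioned in the comments above; the function name and the `per_slot` parameter are hypothetical:

```python
def servers_per_rack(rack_units, server_height_u, per_slot=1):
    """Servers that fit in one cabinet.

    per_slot is a hypothetical multiplier for servers sharing one slot
    (for example, two short machines mounted back to back).
    """
    return (rack_units // server_height_u) * per_slot


per_cabinet = servers_per_rack(42, 2)   # 2U servers in a 42U cabinet
print(per_cabinet)                      # 21

# "Close to 200 cabinets in just this location" at that density:
print(200 * per_cabinet)                # 4200

# Rack space each server would get if 88 really shared one 42U cabinet:
print(42 / 88)
```

At 21 per cabinet, the 79,000-machine estimate would need several thousand cabinets rather than 899, which is why the density assumption matters so much.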
Google hosting by titaniam · 2004-05-02 05:15 · Score: 4, Interesting

I wonder if google will start up a web-hosting business? I bet you can't beat their uptime guarantees. They could provide sql, cgi, etc, and build in multi-machine redundancy for your data just like they do for theirs. It'll be the google server platform, just one more step to replacing Microsoft as the evil monopoly.
1. Re:Google hosting by Angostura · 2004-05-02 06:54 · Score: 4, Interesting
  
  Actually, I would be more worried if I was Akamai. If Google went after the corporate market and offered some kind of grid-esque caching-and-execution environment, that would be something to look at. However, it would need some rather nifty scheduling and admin tools, and would add a lot of complexity, so I don't think that's too likely.
2. Re:Google hosting by Ian Bicking · 2004-05-02 09:04 · Score: 3, Interesting
  
  There's an interesting article comparing Google and Akamai which talks about that as well, since they have technical similarities, but are strategically very different -- Akamai does massive web hosting, while Google does massive web applications.
Re:What a waste by phoxix · 2004-05-02 05:18 · Score: 5, Interesting

If you've ever read a white paper of Google's, you'd realize that they even tell people why they deal with massive clusters over mainframes: lower latency.

Sunny Dubey
15 Megawatts by SuperBanana · 2004-05-02 05:21 · Score: 4, Interesting

...assuming 200W per server, which is probably low, but probably compensates for 79,000 being most likely an overestimate. However, that doesn't even begin to account for the energy used to keep the stuff cool.
Anyone know how many trees per second that would be? Conversion to clubbed-baby-seals-per-sec optional.

--
Please help metamoderate.
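
The 15 MW headline in the comment above follows from its two stated inputs, 79,000 servers at 200 W each. A minimal sketch; the `cooling_overhead` parameter is an assumption added here, since the comment notes that cooling is not included but gives no number for it:

```python
def cluster_power_mw(servers, watts_per_server, cooling_overhead=0.0):
    """Total electrical draw in megawatts.

    cooling_overhead is a hypothetical fraction of extra power for cooling
    (0.5 would mean 50% on top of the servers themselves).
    """
    return servers * watts_per_server * (1.0 + cooling_overhead) / 1_000_000


print(cluster_power_mw(79_000, 200))                        # servers only
print(cluster_power_mw(79_000, 200, cooling_overhead=0.5))  # made-up cooling figure
```

The servers-only figure comes out just under 16 MW, which the headline rounds down to 15.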
Re:Which brings up an interesting question... by gregwbrooks · 2004-05-02 05:25 · Score: 3, Interesting

Not a thing, in terms of the number of their servers, or internal data such as line-item hardware purchases.
This is how it should be, since knowing the size of Google's hardware capacity is a very, very strategic bit of information, and the kind of thing that would allow Yahoo/MSN/whoever to get a feel for how much capital would be necessary to duplicate or improve upon it.

--

"It was a summer's tale: Just a boy, his Linux, and a head full of dreams..."
inside information by sir_cello · 2004-05-02 05:32 · Score: 4, Interesting

Interesting People 2004/05:
I know for a FACT they passed 100,000 last November. One thing the Louis calculation may have missed is Google's obsession with low cost. For example read the company's technical white paper on the Google file system. It was designed so that Google could purchase the cheapest disks possible, expecting them to have a high failure rate. What happens when you factor cost obsession into his equation?
You're not factoring in Google's culture by gregwbrooks · 2004-05-02 05:33 · Score: 3, Interesting
Google is all about two things from an operational standpoint:
- Keep costs down; and
- What happens inside the company, stays inside the company.
Figuring out the number of servers they have is why we're noodling over the second point, but the first point is what probably has us all thrown off. Someone in a position to know said recently that he could state as an absolute fact that they have more than 100,000 servers -- and added that merely mentioning it probably violated multiple NDAs he had.
--

"It was a summer's tale: Just a boy, his Linux, and a head full of dreams..."
Re:$278k ?? by Gilk180 · 2004-05-02 05:43 · Score: 5, Interesting

I really doubt they are spending anywhere near this for the machines themselves. A former student, now a Google employee, made one of those recruiting/marketing visits to my university last semester. I got to speak to him at length about Google's operation. According to him (and he had pictures to back this up), all of their boxen are a motherboard, an IDE drive and a processor sitting on a shelf in the rack. No cases, no fans, no CD, etc. Plus they buy in bulk and get good prices.
Scary... DDOS? by moosesocks · 2004-05-02 06:03 · Score: 2, Interesting

Isn't it scary that according to these figures, Google's datacenter should theoretically be able to DDOS the entire Internet?

Someone mentioned that they have enough bandwidth/processing power to saturate a T1000 line. Scary...

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:Environmental impact: power to 68,000 homes by A.T. Hun · 2004-05-02 06:04 · Score: 3, Interesting

Yes, it is true: every time you hit Google, you are polluting the Earth.

Whereas Slashdot uses nothing but solar power.
But his low end numbers are Wrong... by quasi-normal · 2004-05-02 06:24 · Score: 3, Interesting

He displayed a little numerical dyslexia... it's 359 racks, not 539, for $100 million, which makes the stats a little different:
* 31,592 machines
* 63,184 CPUs
* 63,184 GB RAM
* 2,527.36 TB of disk space
I'm not sure what his logic is behind the teraflops calculation... it looks like he's taking 1 GHz == 1 GFlop, which would give about 126.4 TFlops. Aside from that error, the figures sound pretty realistic to me. But I wanna know how much bandwidth they use.
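
The corrected low-end figures can be rechecked in a few lines. This is a sketch under stated assumptions: the per-machine values (two CPUs, 1 GB of RAM per CPU, 80 GB of disk, 2 GHz clocks) are not given in the comment and are inferred here from its totals, and the throughput line treats each GHz as one GFLOP, which is what the 126.4 TFlops total implies:

```python
RACK_COST_USD = 278_000   # per-rack price quoted in the story
MACHINES_PER_RACK = 88    # per-rack machine count quoted in the story


def cluster_stats(budget_usd, cpus_per_machine=2, gb_ram_per_cpu=1,
                  gb_disk_per_machine=80, ghz_per_cpu=2.0):
    """Low-end cluster totals; all per-machine defaults are inferred assumptions."""
    racks = budget_usd // RACK_COST_USD          # whole racks only
    machines = racks * MACHINES_PER_RACK
    cpus = machines * cpus_per_machine
    return {
        "racks": racks,
        "machines": machines,
        "cpus": cpus,
        "ram_gb": cpus * gb_ram_per_cpu,
        "disk_tb": machines * gb_disk_per_machine / 1000,
        # Naive rule of thumb: 1 GHz counted as 1 GFLOP, then GFLOPs -> TFLOPs.
        "tflops": cpus * ghz_per_cpu / 1000,
    }


for name, value in cluster_stats(100_000_000).items():
    print(name, value)
```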
Re:Nobody has 88 systems in a rack by Grimster · 2004-05-02 06:26 · Score: 4, Interesting

I was in the Exodus - Toyama facility in Sunnyvale, CA back in 2001 and was talking to some of the data center techs. They were bitching because Google DOES stack 44 -half depth- servers in a rack, on EACH SIDE (aka 88 servers per rack indeed), and about how the heat that produces is absolutely fucking insane. One of them couldn't believe they don't melt down. He was complaining about how frugal Google was, not giving the systems more room to breathe.

--
--- www.f-theocean.com
Re:lego? by james b · 2004-05-02 06:33 · Score: 2, Interesting

I think the parent is probably referring to some of the pictures on google's early hardware photos page, courtesy of the wayback machine. If so, the lego never necessarily went into `production', it was just when they were messing around.
Web in memory by Sajma · 2004-05-02 07:02 · Score: 1, Interesting

Given how fast Google is, we expect that they keep all the text of all the web pages that they index in memory. If we estimate 100K machines and 4,285,199,774 web pages, that's 42,852 web pages per machine. Let's guess 1 GB RAM per machine, then that's an allocation of about 25 KB per page (quite a bit larger than the average page size, I suspect). Of course, they've probably replicated the web a few times; let's guess 3 times, so that's about 8 KB per page -- still room to spare, and it's possible that the average memory per machine is greater than 1 GB. Plus, they could compress less popular pages -- the delay of decompression in memory is probably small.

Of course, once you consider that they keep thumbnails of all the images they index, things get tight very quickly. Plus, we can't forget the actual INDEX from words to documents -- that's in memory, too. And Orkut (which is probably pretty small, come to think of it).

GMail is another story altogether. 1 GB per user for 100K users would saturate their cluster. Plus indexes for searching mail. It seems unlikely that we'll have all-memory mail accounts anytime soon.
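
The memory budget in this comment can be reproduced directly. A minimal sketch using its own guesses (100K machines, 4,285,199,774 pages, 1 GB of RAM per machine, 3 replicas, 100K mail users at 1 GB each); the function names are hypothetical, and a binary gigabyte of 1024 x 1024 KB is assumed:

```python
KB_PER_GB = 1024 * 1024  # binary gigabyte, an assumption about the units


def kb_per_page(machines, pages, gb_ram_per_machine=1, replicas=1):
    """RAM available per stored page copy, in KB."""
    copies_per_machine = pages * replicas / machines
    return gb_ram_per_machine * KB_PER_GB / copies_per_machine


def mail_fraction_of_ram(users, gb_per_user, machines, gb_ram_per_machine=1):
    """Share of total cluster RAM needed to hold every mailbox in memory."""
    return (users * gb_per_user) / (machines * gb_ram_per_machine)


PAGES = 4_285_199_774
print(round(PAGES / 100_000))                    # pages per machine
print(kb_per_page(100_000, PAGES))               # one copy of the web
print(kb_per_page(100_000, PAGES, replicas=3))   # three copies
print(mail_fraction_of_ram(100_000, 1, 100_000)) # 1.0 means all RAM is used
```

The per-page results land slightly under the comment's rounded 25 KB and slightly over its 8 KB, and the mail case uses the entire cluster's RAM, matching the "saturate" remark.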
I think they include infrastructure & air cool by melted · 2004-05-02 08:31 · Score: 2, Interesting

I think they include infrastructure and air cooling in their $250M figure. I think these things can actually cost MORE than the racks themselves, especially if the racks consist of commodity hardware, and considering the size of their data center.
Re:$278k ?? by jburroug · 2004-05-02 09:58 · Score: 4, Interesting

Your hospital can't just lose a few CAT scans and think oh well, he'll be in for another scan eventually.

You've never worked in a medical field have you? You'd think that that would be a big deal and in theory data integrity is a very high priority but in reality...

I used to work as the IT Manager for a diagnostic imaging and cancer treatment center (and still do contract work with them because my replacement is kind of a noob). While losing studies isn't exactly a "no big deal" situation, it's still far more common than patients will ever realize. The server that stores and processes all of the digital images from the scanning equipment is a single CPU home rolled P4 using some shitty onboard IDE RAID controller (doesn't even do RAID5!) running Windows 2K. The most money I could get for setting up a backup solution was the $200 an external firewire drive cost. Somehow we never managed to lose a study once it reached my network in the 9 months I worked there, but I know three or four were deleted from the cameras themselves before being sent properly, so whoops, it's gone, gotta reschedule (and bill their insurance or Medicare again!). Two weeks ago one of the drives in that 0+1 array failed, and despite my pleadings they still haven't ordered a replacement yet...

Now it's tempting to think that this place is just a special case of cheapness and sloppiness, but from talking to the diagnostic techs (the people that operate the cameras), that's not so. That clinic is a little worse than average in terms of losing patient information, but by no means the worst some of them have seen/heard of/worked at in their careers. It's worse in general at small facilities, but even large hospitals often suffer from the same unprofessionalism.

Your bank and the phone company keep much better track of your calls or your ATM transactions than most hospitals do with your CT or MRI scans...

--
"Listen: We are here on Earth to fart around. Don't let anybody tell you any different!" - Kurt Vonnegut
Re:hardcore by njcoder · 2004-05-02 13:49 · Score: 2, Interesting

"42"
Actually, that's pretty close to the number of copies of Red Hat Google actually paid for in 200.
The price was right; Google doesn't pay any significant amount of money to Red Hat. Google downloads the software for free and gets support in-house and from the Linux community. Google actually paid for only about 50 copies of Red Hat, and those purchases were more of a goodwill gesture. "I feel like I should be nice, so when I go to Fry's I pick up a copy," Brin said.
From here

--
Open Source Java DAO Generator
Redundancy by crucini · 2004-05-02 14:41 · Score: 3, Interesting
The google file system is redundant. Loss of one node does not lose data.

Some of the reasons these techniques aren't used in enterprise computing:
1. They're hard, and business programmers are not that bright. And nobody has encapsulated these technologies in an IT product.
2. The system can only respond quickly to a finite set of transactions that was known at design time. It lacks the flexibility of a standard file system or relational database.
3. By the time a business has a lot of data, it usually has enough money to store the data conventionally. Search engines are a bit different.
Since I've seen it up close a few times, I can say that the standard "enterprise way" (Oracle/Sun/EMC) delivers very poor bang for the buck. If Google wanted to, they could deliver a modified GFS with any desired level of reliability by increasing the redundancy. And even after that bloating, it would still deliver greater bang for the buck than the conventional solutions.
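
The claim that reliability can be dialed up by adding redundancy can be illustrated with a toy model. This is only a sketch: it assumes every replica fails independently with the same probability, which real clusters do not guarantee, and it is not a description of how GFS actually places or repairs replicas. The function names and the example probabilities are made up:

```python
def loss_probability(node_failure_prob, replicas):
    """Chance that every replica of a piece of data is lost at once,
    assuming independent failures (a strong simplifying assumption)."""
    return node_failure_prob ** replicas


def replicas_needed(node_failure_prob, target_loss_prob):
    """Smallest replica count whose loss probability meets the target."""
    if not 0 < node_failure_prob < 1:
        raise ValueError("node_failure_prob must be strictly between 0 and 1")
    if target_loss_prob <= 0:
        raise ValueError("target_loss_prob must be positive")
    replicas = 1
    while loss_probability(node_failure_prob, replicas) > target_loss_prob:
        replicas += 1
    return replicas


# Hypothetical numbers: 5% chance a cheap node is down, one-in-a-million target.
print(replicas_needed(0.05, 1e-6))
for r in range(1, 6):
    print(r, loss_probability(0.05, r))
```

Each added replica multiplies the loss probability by the per-node failure rate, so in this model even unreliable nodes reach a demanding target with a handful of copies. The cost is the extra storage, which is the "bloating" the comment refers to.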
1. Re:Redundancy by sql*kitten · 2004-05-02 18:59 · Score: 2, Interesting
  
  They're hard, and business programmers are not that bright. And nobody has encapsulated these technologies in an IT product.
  
  Hmm, yes. The really bright programmers are living in their parents' basement and working for IBM for free. The dumb ones are getting paid a pile of money to code up forms and reports in fancy code-generation tools, then clocking off at 5 and enjoying themselves.
  
  The system can only respond quickly to a finite set of transactions that was known at design time.
  
  Those dumb business programmers left that paradigm behind in the 80s. The tech to do it (the relational database) was developed in the 70s.
  
  Since I've seen it up close a few times, I can say that the standard "enterprise way" (Oracle/Sun/EMC) delivers very poor bang for the buck.
  
  You've "seen it up", I've "set it up", kid. Once you've been around the block a few times, you'll drop your tech-snobbery and just choose the right tool for the job.