Google's 4000 Node Linux Cluster

Clues you can lose by grinder · 2000-05-30 21:06 · Score: 3

But it is a cluster of 4000 PCs, which means if one goes down the whole system keeps working. If you have one big Sun and it goes down you have no redundancy and no backup. Reliability and uptime for websites are make or break.

Did you say that with a straight face?

Assuming you depreciate a machine over three years (and that's really stretching things in the Real World), you're replacing a machine just over every six and a half hours. Plus all the effort gets skewed toward the end of the three years. It would almost be economical to throw the door-key away and start afresh.
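
The six-and-a-half-hour figure can be checked with a few lines of arithmetic. This is only a sketch, assuming 4000 machines retired evenly over a three-year depreciation period; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years


def hours_between_replacements(machines: int, lifetime_years: float) -> float:
    """Average hours between replacements if retirements are spread evenly."""
    return lifetime_years * HOURS_PER_YEAR / machines


interval = hours_between_replacements(4000, 3)
print(f"one machine every {interval:.2f} hours")
```

In practice the retirements would bunch up rather than spread evenly, which is the skew the comment complains about.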

When you buy a Sun the damned thing just doesn't fall down unless you have a system mangler who keeps dicking around with it. And if a single Sun could not address the problem, then maybe it's time to buy some real iron, like a maxed out S/390. When you have a terabyte of data to process, you have to start paying a little more attention to things like I/O.

4000 PCs cannot be a viable economic replacement. That amount of hardware would require as highly specialised an environment as a mainframe (cooling and electricity), and certainly much more real estate. And they have really shitty I/O. If Google has money and space to piss away, well good for them, but it's hardly a wise business practice that anyone over 30 would recommend.

If you want to play with Linux, by all means invent some statistics that show that your MIPS/$ is better than the competition. Statistics can say anything you want them to. I, however, would like to know how they derived such figures. Ignorant readers of the article might otherwise be misled into pursuing foolish choices in computing platforms.

Oh, and BTW, your regex is suboptimal, the split is entirely redundant and you shouldn't use double-quoted strings in Perl if you're not interpolating anything.

Re:Clues you can lose by stevelinton · 2000-05-30 22:44 · Score: 3

I think the situation at Google is quite special. Although they have a TB of data, it is very slow changing (about once a month, so about 300KB/sec) and what they have to do with it is very (integer) CPU intensive. They remark that it distributes really well, so presumably network latency between the PCs isn't a problem, and locality of access to the data is good. Given that (see SpecCPU2000 for instance), Intel processors on cheap motherboards really are a big win for performance/purchase price.
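
The update-rate estimate above is easy to sanity-check. A rough sketch, assuming a decimal terabyte and a 30-day month (both are assumptions, not figures from Google):

```python
def average_update_rate(total_bytes: float, period_days: float) -> float:
    """Bytes per second needed to rewrite total_bytes once every period_days."""
    return total_bytes / (period_days * 24 * 60 * 60)


rate = average_update_rate(1e12, 30)  # 1 TB refreshed once per 30-day month
print(f"{rate / 1e3:.0f} KB/s")
```

Under those assumptions it comes out a little under 400 KB/s, which is the same ballpark as the figure quoted and still a trivial load.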

This leaves the management questions. Presumably most of these PCs are configured exactly identically, apart from the ethernet card numbers, and the work is controlled by some central servers (for which big Suns might well be appropriate). So, if I was setting this up, how would I handle hardware failures:

1. a PC blows up
2. the central server notices some timeout on a parcel of work or a heart-beat and takes that node out of the active list.
3. the central server (or another one specialized for the job) makes a more intensive effort to sort out the problem. If it can get in, it can probably trigger a reboot, or even a re-install, remotely.
4. If it can't get in at all then human assistance is needed. Add a task "reset node 1234" to the next hourly jobs printout for the operator
5. On the next pass through that part of the warehouse, the operator hits reset. The node tries to reboot, goes through health tests, possibly does an auto reinstall.
6. If no life then add it to the daily list for the operator with the electric handcart to pull and replace, send it in the daily shipment to the supplier.
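
The six steps above can be sketched as a small state machine. This is a toy model, not how Google does it: the class names, the 60-second timeout and the remote_fix callback (standing in for a remotely triggered reboot or reinstall) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 60.0  # seconds of silence before a node is dropped (assumed value)


@dataclass
class Node:
    name: str
    last_heartbeat: float
    state: str = "active"  # active -> suspect -> rebooting -> needs_reset -> needs_replacement


@dataclass
class Controller:
    nodes: dict = field(default_factory=dict)
    hourly_jobs: list = field(default_factory=list)  # operator printout (step 4)
    daily_jobs: list = field(default_factory=list)   # pull-and-replace list (step 6)

    def heartbeat(self, name: str, now: float) -> None:
        """A node reported in: (re)admit it to the active list."""
        self.nodes[name] = Node(name, now)

    def _needs_operator(self, node: Node) -> None:
        node.state = "needs_reset"
        self.hourly_jobs.append(f"reset node {node.name}")  # step 4

    def sweep(self, now: float, remote_fix) -> None:
        """One monitoring pass; remote_fix(name) returns True if the server got in."""
        for node in self.nodes.values():
            silent = now - node.last_heartbeat > HEARTBEAT_TIMEOUT
            if node.state == "active" and silent:
                node.state = "suspect"            # step 2: out of the active list
            elif node.state == "suspect":
                if remote_fix(node.name):         # step 3: remote reboot or reinstall
                    node.state = "rebooting"
                    node.last_heartbeat = now     # give it one timeout to come back
                else:
                    self._needs_operator(node)
            elif node.state == "rebooting" and silent:
                self._needs_operator(node)

    def operator_reset(self, name: str, came_back: bool, now: float) -> None:
        """Step 5: the operator hit reset; step 6 if there is still no life."""
        if came_back:
            self.heartbeat(name, now)
        else:
            self.nodes[name].state = "needs_replacement"
            self.daily_jobs.append(f"pull and replace node {name}")

    def active(self) -> list:
        return sorted(n.name for n in self.nodes.values() if n.state == "active")


ctl = Controller()
ctl.heartbeat("1234", now=0.0)
ctl.heartbeat("1235", now=0.0)
ctl.heartbeat("1235", now=100.0)                      # 1234 has gone quiet
ctl.sweep(now=100.0, remote_fix=lambda name: False)   # 1234 becomes suspect
ctl.sweep(now=110.0, remote_fix=lambda name: False)   # cannot get in: operator job
print(ctl.active(), ctl.hourly_jobs)
```

The point of the design is that nothing blocks on a dead node: each failure just moves one record along a queue until either software or a person with a handcart clears it.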

I don't know for sure that this is how they do it, but it's how I would do it. Failure is a nuisance when it happens every few weeks. If it happens every few hours, then you can make it routine and pain-free. In a cluster of 4000 identical machines, hardware failures are part of life.

You mention other things: power -- a bare PC (processor, mobo and hard drive) draws about 90W. So the whole cluster is about 360 kW. This is a lot of power to get in, and heat to get out, but well within the normal range of, for instance, small factories, and the people who supply kit for that should be able to cope easily. PCs will work OK in any heat and humidity that people will, so ordinary office-grade air-conditioning will be fine.
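
For anyone who wants to check the power estimate, here is the arithmetic as a sketch. The 90 W per node is the guess from the paragraph above; the conversion to cooling load assumes essentially all of that power ends up as heat and uses the standard factors of 3.412 BTU/h per watt and 12,000 BTU/h per ton of refrigeration.

```python
WATTS_PER_NODE = 90   # bare board, CPU and disk, as estimated above
NODES = 4000

total_watts = WATTS_PER_NODE * NODES
total_kw = total_watts / 1000

# Treat all electrical input as heat that the air conditioning must remove.
btu_per_hour = total_watts * 3.412
cooling_tons = btu_per_hour / 12_000

print(f"{total_kw:.0f} kW, about {cooling_tons:.0f} tons of cooling")
```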

So, in their very unusual circumstances, this probably is the right call for Google. They can routinize hardware failures to the point where they just cause a statistically predictable amount of work that must be budgeted for. The central servers that control all this, store the TB database, etc. are another story. There, the more conventional rules apply, and I would bet that those are normal server hardware -- Sun, IBM or high-end Intel servers.

Re:Clues you can lose by Animats · 2000-05-31 00:53 · Score: 3

Assuming you depreciate a machine over three years (and that's really stretching things in the Real World), you're replacing a machine just over every six and a half hours. Plus all the effort gets skewed toward the end of the three years. It would almost be economical to throw the door-key away and start afresh.

I heard the CTO of Inktomi talk on this issue. Their basic approach to cluster buying is to buy midrange PCs in units of 100. Each cluster then consists of 100 identical PCs. Clusters are replaced as a unit, never upgraded. A site may have multiple clusters of different hardware. Every few months, they do evaluations to pick the machine with the best price/performance, which is usually a machine in the middle of the pack, not a top-end machine.

Off the shelf server farms by ChrisRijk · 2000-05-30 20:16 · Score: 3

In this story at EETimes, a guy from Sun talks about the preconfigured "server farm" solutions Sun announced yesterday.

An interesting quote is this:

While it's debatable whether buying a preconfigured compute farm is cheaper than stringing together a few PCs and running Linux, Tallman said the latter scenario "would work well in university and government research centers where there is a lot of free labor, but not in a company that needs to get products out the door and can't spend time developing core competencies in compute farms."

Re:Why x86 Linux? by Sun+Tzu · 2000-05-30 20:16 · Score: 3

The Sun solution would be much more expensive because it wouldn't be only one Sun. It would require many, many, Sun 6500's or 10000's. Since their application distributes quite nicely, the price/performance of Intel boxes running Linux would be very hard to beat.

Try substituting Sun 6500's with 20 CPU's for each set of 20 Intel boxes and see what that does to the pricing. ;) (In practice, the ratio would probably be closer to 12-15 Intel boxes per Sun 6500, I would guess, as a PIII doing this kind of integer work would likely outperform a SPARC II)

--
Geeky modern art T-shirts

Very Smart by xinu · 2000-05-30 20:13 · Score: 3

I'll tell yah I'm not a fan of the PC at all being a Solaris Admin. The hardware in general sucks and is unreliable.

But in this case I think Google is on the right track. MIPS/$ ratio is definitely in the favor of the PC. And with sooo many PC's if one goes down it really wouldn't make a huge difference. If it were just a 2 or 4 node cluster then I would lean towards a RISC based architecture for reliability. But in this case the cost is just too staggering to imagine a Sun cluster for this.

Kudos to Google, my new search engine of choice! Long live Linux!

Re:Good comparison by LMacG · 2000-05-30 20:39 · Score: 3

Google does offer phrase searches, and a few other advanced features. Just click on the Search Tips link from the main page. I'm not sure I'd classify their implementation as "intuitive," but it's no worse than learning, say, REXX. You are correct though, in that full Boolean searching is not available -- as stated on the Tips page, Google does not support the logical OR operator at all.

--
Slightly disreputable, albeit gregarious

Re:Very Smart *NOT* by SuiteSisterMary · 2000-05-30 21:13 · Score: 3

What happens when you reach a bug in the hardware or have to patch the system or replace a kernel because of a hack that came about? It is costly and hellish to work on 4-6,000 PCs

Not with Linux. For patching and what not, one can easily create a single script that will do it all. Or, even better, and assuming it's a closed network, make an NFS share. On each machine, put a cron job that takes anything in that directory (RPMs generally) and applies it. You're probably on identical hardware and software, so that sort of thing works. Hell, write a daemon that monitors a port, and then start broadcasting commands, and they'll all pick up on it. Lots of ways.
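
The NFS-share-plus-cron idea could look roughly like the following. It is a sketch only: the mount point, the record file and the function names are hypothetical, and it assumes every node mounts the same share and runs the script as root. The only external command used is `rpm -U`, which upgrades a package or installs it if it is missing.

```python
import subprocess
from pathlib import Path

SHARE = Path("/mnt/patches")             # hypothetical NFS mount point
APPLIED = Path("/var/lib/applied-rpms")  # hypothetical local record of what has run


def pending_packages(share: Path, applied: Path) -> list:
    """RPMs on the share that this node has not applied yet, in name order."""
    done = set(applied.read_text().splitlines()) if applied.exists() else set()
    return sorted(p for p in share.glob("*.rpm") if p.name not in done)


def apply_all(share: Path = SHARE, applied: Path = APPLIED, run=subprocess.run) -> None:
    """Apply each pending package once and remember it locally."""
    for pkg in pending_packages(share, applied):
        result = run(["rpm", "-U", str(pkg)])
        if result.returncode != 0:
            break  # stop at the first failure so it is retried on the next run
        with applied.open("a") as log:
            log.write(pkg.name + "\n")


if __name__ == "__main__":
    apply_all()
```

A crontab entry that runs the script every so often on each node finishes the job. Keeping the record of applied packages local to each machine means a freshly reinstalled node simply catches up on its own.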

--
Vintage computer games and RPG books available. Email me if you're interested.

Good comparison by JamesSharman · 2000-05-30 19:57 · Score: 4

It's nice to see some good Linux publicity happening. Google is fast becoming the most respected search engine around; their clean and uncluttered interface is drawing people away from the more traditional search engines where it seems you have to download more portal c$&p every day. It seems poetic that Google is becoming an ambassador for Linux by showing up their bloat-laden competitors in the search engine market, while Linux does the same in the OS market.

Re:Good comparison by Gurlia · 2000-05-30 20:12 · Score: 4

Yeah, all the other popular search engines nowadays seem to be ridden with banner ads, promotions, and all kinds of useless fluff on their pages. Google is nice and simple, doesn't clutter the screen, and in general makes everything easier on the eyes. I think this is part of the attractiveness of Google -- you're not flooded with irrelevant info and pictures, but just the stuff you're looking for.

One thing I have against Google though -- I wish they had an advanced search where you can specify to search for exact phrases, etc., or perhaps even a full boolean search. I don't know how Google works, so I can't tell if these features are left out because of design issues. But, being the "hacker's search engine" and everything, it really should support more advanced searches. If they can find a way to implement this well, it may even become a deciding factor against other search engines. (I hardly know any search engine out there that can handle full boolean search, and certainly Google's speed will be a great advantage.)

--
mikre he sophia he tou Mikrosophou.

Probable source of their inspiration... by Carnage4Life · 2000-05-30 21:43 · Score: 4

I can just see it now. A manager at Google walking over to a developer's PC and seeing this sticker and saying, "Why not?"

Now all that's needed is for thinkgeek to claim responsibility for this action. :)

hey by jbarnett · 2000-05-30 20:03 · Score: 4

So this "super computer" will be used for Total World Domination? Oh, can we use it at least to take over some small third world countries? I promise to have it back by six tonight.

The Google crew must have some killer Seti@home stats.

I would like to put one of these in my basement and finally disprove the "7 steps to Kevin Bacon" theory everyone seems to buy into.

--

"`Ford, you're turning into a penguin. Stop it.'" -THHGTTG

google uses RAIP technology by aozilla · 2000-05-30 20:16 · Score: 4

redundant array of inexpensive processors

--
ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?

Re:Very Smart *NOT* by cybrthng · 2000-05-30 20:54 · Score: 5

Well, as you are all well aware, dot-coms are going through money like nothing. Sure it is *great* publicity to have 4,000 servers with another 2,000 coming online.

But damn, that takes a staff of 200 people to manage the security/connectivity/accounts/space and other duties just for the cluster.

The Power bill has to be outrageous!

The Cabling/switching/routing mess has to be totally unmanageable.

What happens when you reach a bug in the hardware or have to patch the system or replace a kernel because of a hack that came about? It is costly and hellish to work on 4-6,000 PCs

I would have thought it to be wiser to set up Sun E10000's or something like that.. having 4 32 proc E10000's in a cluster is a hell of a lot easier to manage and cheaper. Sure your upfront bill may be more, but only having to worry about 8-16 power connections (redundancy) is a lot easier than 6,000 power cords/strips/racks/floor space/cooling/maintenance.

Sure it is one hell of a beast to be proud of, but one hellova costly beast to work with.

Just my 2 cents

Google is driven by python not by perl by segmond · 2000-05-30 20:10 · Score: 5

just my own 10cents, The google guys use python over perl, hrmmm, i wonder why. :D by the way their paper is a good read. http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

--
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind

Re:Very Smart *NOT* by heimdall · 2000-05-30 21:30 · Score: 5

I would have thought it to be wiser to set up Sun E10000's or something like that.. having 4 32 proc E10000's in a cluster is a hell of a lot easier to manage and cheaper.

Last I checked (this was about a year or so ago) a fully loaded (64/64) E10K ran around $12M and the base (2psr) system was running around $800,000. Even if that's off by a factor of 3 or 4, you're still talking $3-$4M apiece... at four of them, you're looking at between $12M and $48M. On the other hand, the typical white box PC will run between $800 and $1500. That amounts to $3.68M-$6.9M for 4600 nodes. This doesn't include the network infrastructure or administration costs; however, as someone who has administered large clusters (largest was an 80 node SP/2), I can say it actually becomes easier to administer that many nodes in a cluster than it would that many servers. Keep in mind that there most certainly are groupings of nodes where they are kept identical except for IP.
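
The totals above can be reproduced in a few lines. A sketch under stated assumptions: four E10Ks (the count in the comment being replied to, and the count that yields the $12M-$48M range) at $3M to $12M apiece, against 4600 white boxes at $800 to $1500. The helper name is made up, and network and admin costs are left out just as they are in the text.

```python
def total_cost(unit_low: int, unit_high: int, count: int) -> tuple:
    """Low and high total for `count` units priced between unit_low and unit_high."""
    return unit_low * count, unit_high * count


pc_low, pc_high = total_cost(800, 1_500, 4_600)
e10k_low, e10k_high = total_cost(3_000_000, 12_000_000, 4)

print(f"PCs:  ${pc_low / 1e6:.2f}M - ${pc_high / 1e6:.2f}M")
print(f"E10K: ${e10k_low / 1e6:.0f}M - ${e10k_high / 1e6:.0f}M")
```

Even the cheapest E10K case costs more than the most expensive PC case.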

Another significant expense is the hardware support costs associated with such systems. If you have 4600 nodes, it's trivial to simply keep (MANY) spare systems floating around. Also, you can disable a node with negligible impact. Even if you're subdomaining an E10K, there are (a small few) single points of failure on the platform (regardless of what Sun's documentation says). If you're not subdomaining it, you're simply talking a 32-way SMP box (might as well just use a 6500 for that configuration). If you were to lose the backplane for whatever reason, you've lost a significant portion of your compute resources.

Slashdot Mirror
