Google's 4000 Node Linux Cluster
Check out the Red Hat press release running at LWN, or the news article at techweb about Google's 4000 Node Linux Box. Both articles are basically Red Hat commercials, but there are some interesting bits, like the fact that they have a terabyte index of 300 million Web pages, and that they might expand their cluster to 6000 nodes in the future.
You're a little late on number 6. 1+1 has been shown to be equal to 2, but the proof took 211 pages in Principia Mathematica. You can also find the proof for 2+2=4 here. 2+2=4 is obviously related to 1+1=2, with a few extra steps.
...phil
"For a list of the ways which technology has failed to improve our quality of life, press 3."
--
I doubt if they would "patch it back." In case you haven't noticed, most companies haven't given a rat's ass about impressing the OSS crowd until very recently. Something tells me they really are running FreeBSD.
You could still be right though... this only means they are running their frontend web servers on BSD. What powers their database is anyone's guess, since you'll never see that server from the outside world if they have half a brain. So, they could be running NT for their backend. This would be the totally wrong thing to do though. Usually companies use NT to serve the HTML because it has better applications for interactivity available than Unix, and a *nix for the real meaty, hardcore database queries, etc. I believe this is what eBay does but I could be mistaken.
--
I think there is a world market for maybe five personal web logs.
This presented some unique problems, tho. Using 300 nodes meant that, potentially, you could have 300 connections to EACH CLIENT. We needed to make a transparent single point of entry and use a 10.X.X.X -> legal NAT translation. Problem with that, of course, is that NAT often breaks apps like Real Audio or Napster or anything that embeds source/destination within the packet to be routed through the routing level of the requestor.
Using NAT as a front-end to a server farm returning straight HTML documents won't cause any problems.
Go away!
I forwarded a copy of the above post to Rob Malda, and he sent me a concise reply describing his view of how it's supposed to work. I think it's worthwhile to share his insights with the whole crew. With his permission, here's what he had to say:
So there you have it. Of course, that does raise the question of why we have the Overrated and Underrated moderation categories, but otherwise, I think I see his point.
--Joe--
Program Intellivision!
At this time, it appears to have been rated back down to a 3. I think what happens is that moderators scan / read through posts, selecting particular posts to be moderated up or down. When they finally get to the end of the page, they click [Moderate]. When several moderators are actively viewing a story, you end up with multiple moderations pending for the same article. So, what should've received a +1 might get +2 or more if multiple moderators agreed that it deserved +1.
The problem is that the moderators don't get to see the other moderations being performed in parallel to their own moderation. Perhaps there's a solution. Slashdot could ask for confirmation in cases of "moderator collision."
For example, consider the following sequence of events:
Currently, Slashdot will apply both moderations immediately. This results in article #42 receiving +2, when it may only deserve +1. It's neither Moderator's fault -- they've moderated past each other. Alternately, I propose that Slashdot, in this case, only apply the unique moderation immediately, and then ask for confirmation on Moderator B's moderation of #42. This is because Moderator B had no way of knowing that Moderator A moderated #42 up while he was still reading the posts. Let's assume all moderations are applied, and continue the example:
At this point, Slashdot will apply the moderation. Under my proposal, this would not change, as Moderator C did already see that #69 was moderated up before he selected it for moderation.
What I'm guessing would be necessary is an additional bit of state which says "This was the score that the post was viewed with at the time the Moderator selected it for moderation." If the article's current score is different than the score it was viewed with, ask for confirmation that the moderation be applied for that specific moderation. A series of radio buttons could be displayed for the affected articles: "Apply Moderation? [_] Yes [X] No".
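The "extra bit of state" idea can be sketched in a few lines. This is only an illustration of the proposal, not how Slashdot actually works: the names `Post`, `PendingModeration`, `score_when_viewed` and `submit` are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    score: int = 1


@dataclass
class PendingModeration:
    post_id: int
    delta: int              # +1 or -1
    score_when_viewed: int  # hypothetical extra state: score the moderator saw


def submit(posts, pending):
    """Apply moderations whose post score is unchanged since it was viewed.

    Moderations on posts whose score moved in the meantime are not applied;
    they are returned so the moderator can be asked to confirm them.
    """
    needs_confirmation = []
    for mod in pending:
        post = posts[mod.post_id]
        if post.score == mod.score_when_viewed:
            post.score += mod.delta
        else:
            needs_confirmation.append(mod)
    return needs_confirmation
```

If Moderators A and B both viewed #42 at the same score and A submits first, B's moderation of #42 comes back in the confirmation list instead of stacking on top of A's.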
Thoughts?
--Joe--
Program Intellivision!
"PCs will work OK in any heat and humidity that people will"
I pictured myself framing a house in the Texas summer heat, and repairing a barbed wire fence in a snowstorm.
-fb Everything not expressly forbidden is now mandatory.
" the guy who kicked your fucken ass "
If I had moderator points, I'd deal you down accordingly. Since I don't, I'll mention this:
I think that "fucken" is becoming a word. I'm glad it is, because it rhymes with "Turducken". I also think it would work in a subjunctive mood usage context.
-fb Everything not expressly forbidden is now mandatory.
for just a 2 or 4 node cluster, you buy a high-quality PC from VA or some other reputable shop that supports Linux well. once things start to grow, you use those for database, load balancing monitors and things like that, and you grab el cheapo clones for the gruntwork of running httpds.
I knew there had to be some heavy-duty equipment back there -- nothing else but a 4,000 node Beowulf cluster could power the awesome "Mentalplex" search engine. It's unfortunate that the search also requires the combined mental powers of 4,000 users. Which might be why I can't seem to get the Mentalplex to find anything but pr0n and mp3s. :)
"Must... Concentrate!
.....oooh, swirly..."
Although I use Google mostly, it is not the fastest engine around; that has got to be Fast (http://www.alltheweb.com)!
It is so damn fast, that it just keeps amazing me.
If you haven't tried it - you should!
Here is an example of a search for "linux":
"3810249 documents found - 0.0051 seconds search time".
I would like to put one of these in my basement and finally disprove the "7 steps to Kevin Bacon" theory everyone seems to buy into.
It's usually 6 steps to Kevin Bacon, and it's an NP-complete problem. If you do find a way to solve it in polynomial time, please share your algorithm. You'll probably get a Nobel prize.
--Shoeboy
(former microserf)
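A side note on the complexity claim: if the cast data is treated as an unweighted graph (actors as nodes, an edge between any two who shared a film), then the number of steps between two actors is an ordinary shortest-path query, which breadth-first search answers in time linear in the size of the graph. A minimal sketch, assuming a hypothetical `costars` dictionary that maps each actor to the actors they have appeared with:

```python
from collections import deque


def degrees_of_separation(costars, start, target):
    """Return the fewest co-star hops from start to target, or None.

    costars is an assumed adjacency mapping: actor -> iterable of co-stars.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, dist = queue.popleft()
        for other in costars.get(actor, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None  # no chain connects the two actors
```

Disproving the theory for everyone would mean running this from Kevin Bacon once over the whole graph, which is a memory problem more than a CPU one.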
There are ways to reduce the impact of the clustering, but it will never be better than a parallel computer.
That's complete bunk. Whether a centralized multiprocessor machine or a massively-parallel distributed cluster would be faster depends completely on the task at hand. Specifically: How parallel is the task?
If the task can be broken up into many completely self-contained pieces, then a cluster will generally win. You can buy lots of low-end hardware cheaper than you can buy even very good high-end hardware.
If the task contains contention points or data access is very random, then you're better off with a single multiprocessor machine. An example of a contention point would be the locks in a database. An example of random data access would be logins to Slashdot.
Finally, it is worth pointing out that, after a certain point, most large machines have to move to a NUMA design, at which point you start to resemble a massively parallel cluster anyway.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
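The "how parallel is the task?" question above can be made concrete with Amdahl's law, which is not mentioned in the post but is the usual back-of-the-envelope model: if a fraction p of the work parallelizes perfectly across n nodes, the best possible speedup is 1 / ((1 - p) + p / n). A rough sketch, ignoring communication and contention costs entirely (the function name is made up for illustration):

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of the work (0.0 to 1.0) that can run in parallel.
    nodes: number of processors or cluster nodes.
    Communication overhead is ignored, so real clusters do worse than this.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)
```

With a fully independent workload the bound grows with the node count, while even a small serial fraction (a shared lock, say) caps it at 1 / serial_fraction no matter how many boxes are added.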
Take a look at your user history. All your posts eventually get looked at by moderators not smoking crack and get modded down to 0 or -1. At best you are entertaining yourself for a few minutes with a single, temporarily high-modded post at a time.
Others might respect your trolling, but the only thing that matters in the end is high-karma--and you ain't got it.
BTW, don't bother responding with a "what are you talking about, I'm not a troll" response: I don't intend to read it.
--
Have Exchange users? Want to run Linux? Can't afford OpenMail?
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
Still an Open Source victory!
Alltheweb.com is running "Apache/1.3.6 (Unix) PHP/3.0.11" on FreeBSD...
When I first saw the "powered by Dell Poweredge" sticker on their page, I briefly worried that it was going to be an NT site. Nope!
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
It's all well and good that there's PR out there about this.
As someone who is building a large portal with Redhat, it'd be nice to have some kind of technical reference as to how they've built it. What are they using to handle the clustering? Are they using the Piranha stuff that comes with Redhat 6.2, or are they using hardware, or maybe something they've written themselves? Are they using sessions, and if so how are they handling them?
Are any parts of the cluster sharing processing power, or they all just individual boxes clustered to appear as one?
I think it's great that they're getting press, I'm just hoping that one of these days there will be something published on how it all went down.
No, I wasn't trying to say that Google makes a good poster boy for open source, but it is a great example of large organizations embracing the fruits of open source labor. Linux has gone through a lot of media exposure in recent months due to its current 'fashionable' status; what is actually needed to maintain Linux's spot in the media world is examples of Linux doing real-world jobs. Large companies like Google making public statements that they use a massive Linux installation to solve a problem because it's the best tool for the job are not going to hurt.
I would be interested to hear more about "The troubles that go on at Google behind the scenes are bound to become public knowledge very very soon." Without further information I'd like to think that Linux would not get a 'black eye' over any problems within Google, but you seem to know more about this than I do.
Only 2 of the modules for the entire system are implemented in Python, specifically the web crawlers and the server that feeds the crawlers URLs. The rest of the system is implemented in C or C++.
No it's not. A google is the verb form of googly (a cricket term) - an off-breaking ball with an apparent leg-break action on the part of a right-arm bowler to a right-handed batsman, or conversely for a left-arm bowler.
A googol is 1 followed by a hundred zeros, 10^100.
A googolplex is 1 followed by a googol of zeros, 10^googol.
take a triptonica to subthunk
raging.com
(seems to come up with slightly diff hits than Altavista itself, but works plenty good for me!)
Yes - in fact they claim an interesting demographic to potential advertisers:
Google advertisers will benefit from marketing to a web audience with these distinct demographics:
Male (65%), female (35%)
High education (65% have at least a BA/BS)
Professional (73%)
High income (average income is $71,000)
Highly technical (71% report high/very high computer skills)
Online experience of 4+ years (58%)
Accessing the Internet from work (48%)
Using the web for work purposes (31%)
There is much cruelty in the universe, John.
Yeah, we seem to have the tour map.
a Beowul....Oh, wait a minute, never mind ;)
John Saul Montoya (Yeah, Wosten, that Johnny Montoya, the guy who kicked your fucken ass over the KKW second-stage funding. Don't fuck with Wall Street).
-- the most controversial site on the Web
BTW, have you looked at the http://www.hotsheet.com/ portal? It's a portal, yeah, but it's really "clean" looking and has a ton of useful links. That's why they host my email. (no, I'm not affiliated)
----
Did you say that with a straight face?
Assuming you depreciate a machine over three years (and that's really stretching things in the Real World), you're replacing a machine just over every six and a half hours. Plus all the effort gets skewed down to the end of the three years. It would almost be economical to throw the door-key away and start afresh.
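For anyone checking the six-and-a-half-hours figure, here is the arithmetic as a quick sketch. It assumes 4000 nodes, 365-day years, and replacements spread evenly over the depreciation period (which, as noted, they would not be in practice); the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap years


def hours_between_replacements(nodes, depreciation_years):
    """Average hours between machine replacements if every node is
    replaced once per depreciation period, spread out evenly."""
    return depreciation_years * HOURS_PER_YEAR / nodes


# 4000 nodes over three years: 26280 hours / 4000 machines
interval = hours_between_replacements(4000, 3)
```

That works out to about 6.57 hours, or a bit under four machines a day.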
When you buy a Sun the damned thing just doesn't fall down unless you have a system mangler who keeps dicking around with it. And if a single Sun could not address the problem, then maybe it's time to buy some real iron, like a maxed out S/390. When you have a terabyte of data to process, you have to start paying a little more attention to things like I/O.
4000 PCs cannot be a viable economic replacement. That amount of hardware would require as highly specialised an environment as that of a mainframe (cooling and electricity), and certainly much more real estate. And they have really shitty I/O. If Google has money and space to piss away, well good for them, but it's hardly a wise business practice that anyone over 30 would recommend.
If you want to play with Linux, by all means invent some statistics that show that your MIPS/$ is better than the competition. Statistics can say anything you want them to. I, however, would like to know how they derived such figures. Ignorant readers of the article might otherwise be misled into pursuing foolish choices in computing platforms.
Oh, and BTW, your regex is suboptimal, the split is entirely redundant and you shouldn't use double-quoted strings in Perl if you're not interpolating anything.
An interesting quote is this:
The Sun solution would be much more expensive because it wouldn't be only one Sun. It would require many, many, Sun 6500's or 10000's. Since their application distributes quite nicely, the price/performance of Intel boxes running Linux would be very hard to beat.
;) (In practice, the ratio would probably be closer to 12-15 Intel boxes per Sun 6500, I would guess, as a PIII doing this kind of integer work would likely outperform a SPARC II)
Try substituting Sun 6500's with 20 CPU's for each set of 20 Intel boxes and see what that does to the pricing.
Geeky modern art T-shirts
But in this case I think Google is on the right track. MIPS/$ ratio is definitely in the favor of the PC. And with sooo many PC's if one goes down it really wouldn't make a huge difference. If it were just a 2 or 4 node cluster then I would lean towards a RISC based architecture for reliability. But in this case the cost is just too staggering to imagine a Sun cluster for this.
Kudos to Google, my new search engine of choice! Long live Linux!
Google does offer phrase searches, and a few other advanced features. Just click on the Search Tips link from the main page. I'm not sure I'd classify their implementation as "intuitive," but it's no worse than learning, say, REXX. You are correct though, in that full Boolean searching is not available -- as stated on the Tips page, Google does not support the logical OR operator at all.
Slightly disreputable, albeit gregarious
Vintage computer games and RPG books available. Email me if you're interested.
It's nice to see some good Linux publicity happening. Google is fast becoming the most respected search engine around; their clean and uncluttered interface is drawing people away from the more traditional search engines, where it seems you have to download more portal c$&p every day. It seems poetic that Google is becoming an ambassador for Linux by showing up their bloat-laden competitors in the search engine market, while Linux does the same in the OS market.
I can just see it now. A manager at Google walking over to a developer's PC and seeing this sticker and saying, "Why not?"
:)
Now all that's needed is for thinkgeek to claim responsibility for this action.
So this "super computer" will be used for Total World Domination? Oh, can we use it at least to take over some small third world countries? I promise to have it back by six tonight.
The Google crew must have some killer Seti@home stats.
I would like to put one of these in my basement and finally disprove the "7 steps to Kevin Bacon" theory everyone seems to buy into.
"`Ford, you're turning into a penguin. Stop it.'" -THHGTTG
redundant array of inexpensive processors
ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
But damn, that takes a staff of 200 people to manage the security/connectivity/accounts/space and other duties just for the cluster.
The Power bill has to be outrageous!
The Cabling/switching/routing mess has to be totally unmanageable
What happens when you reach a bug in the hardware or have to patch the system or replace a kernel because of a hack that came about? It is costly and hellish to work on 4-6,000 PCs.
I would have thought it to be wiser to setup Sun E10000's or something like that.. having 4 32 proc E10000's in a cluster is a hell of a lot easier to manage and cheaper. Sure your upfront bill may be more, but only having to worry about 8-16 power connections (redundancy) is a lot easier than 6,000 power cords/strips/racks/floor space/cooling/maintenance.
Sure it is one hell of a beast to be proud of, but one hellova costly beast to work with.
Just my 2 cents
just my own 10cents, The google guys use python over perl, hrmmm, i wonder why. :D by the way their paper is a good read. http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
I would have thought it to be wiser to setup Sun E10000's or something like that.. having 4 32 proc E10000's in a cluster is a hell of a lot easier to manage and cheaper.
Last I checked (this was about a year or so ago) a fully loaded (64/64) E10K ran around $12M and the base (2psr) system was running around $800,000. Even if that's off by a factor of 3 or 4, you're still talking $3-$4M apiece... at four of them, you're looking at between $12M and $48M. On the other hand, the typical white box PC will run between $800 and $1500. That amounts to $3.68M-$6.9M for 4600 nodes. This doesn't include the network infrastructure or administration costs. However, as someone who has administered large clusters (largest was an 80 node SP/2), I can say it actually becomes easier to administer that many nodes in a cluster than it would that many servers. Keep in mind that there most certainly are groupings of nodes where they are kept identical except for IP.
Another significant expense is the hardware support costs associated with such systems. If you have 4600 nodes, it's trivial to simply keep (MANY) spare systems floating around. Also, you can disable a node with negligible impact. Even if you're subdomaining an E10K, there are (a small few) single points of failure on the platform (regardless of what Sun's documentation says). If you're not subdomaining it, you're simply talking a 32-way SMP box (might as well just use a 6500 for that configuration). If you were to lose the backplane for whatever reason, you've lost a significant portion of your compute resources.
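The hardware numbers above are easy to reproduce. A quick sketch using only the prices quoted in this post (which were rough recollections to begin with) and assuming the four E10000s proposed in the parent comment; network, power and admin costs are left out, and the function name is made up:

```python
def cost_range(units, low_price, high_price):
    """Return (low, high) total hardware cost in dollars for `units` machines."""
    return units * low_price, units * high_price


# 4600 white box PCs at $800 to $1500 each
pc_low, pc_high = cost_range(4600, 800, 1_500)

# Four E10Ks at $3M (the discounted guess) to $12M (fully loaded list price)
sun_low, sun_high = cost_range(4, 3_000_000, 12_000_000)

print(f"PC cluster: ${pc_low:,} to ${pc_high:,}")
print(f"E10K setup: ${sun_low:,} to ${sun_high:,}")
```

Even the cheapest E10K guess lands well above the most expensive PC estimate, which is the whole point.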