Slashdot Mirror


Google Doubles Server Farm

Mitch Wagner writes "Here's our followup story on Google's colossal server farm. When we first wrote about Google last spring, they had 4,000 Linux servers, now they run 8,000. Last year we focused on the Linux angle, this year we thought it was more interesting to go into the hardware, giving a little detail about some of the things Google has to do to build and run a server farm that big." Impressive. I always think our 8 boxes are cool, until I see this kinda thing.

258 comments

  1. remote administration? by Anonymous Coward · · Score: 1

    Updating an RPM or some piece of code on 8000 servers must be a nightmare. I'm sure they have automated processes; does anyone know what they are? Like, how are they measuring performance, disk usage, etc.? It doesn't sound very efficient to me.

  2. Re:google modifications available by Anonymous Coward · · Score: 1
    > If you make improvements and want to give them away to the community, that's great. You can't sell those improvements thanks to the GPL


    Umm, I don't know which version of the GPL you're referring to, but every copy that I've read says no such thing.

    They could make changes and sell them all they wanted, for as much as they wanted. They just couldn't prohibit the buyer from re-selling (or giving away) it.

    And if they wrote custom software (not part of a GPL'ed project) they can distribute it under any license they wish, including a commercial one.
  3. Re:Now, repeat in unison... by Anonymous Coward · · Score: 1

    > Yeah, I bet you wish you had a Beowulf cluster of Google server farms SHOVED UP YOUR ASS!


    BTW, the canonical form would be "Imagine a beowulf cluster of %s". Please have some respect for tradition.

  4. IDE bandwith by Anonymous Coward · · Score: 1

    They have two types of rack mounts: the older ones with 1 IDE controller and two HDs, and the newer ones with 4 IDE controllers and 4 HDs. Did they go to a one-IDE-controller-per-HD setup because IDE controllers can't multitask reads/writes as well as SCSI can? Why not one SCSI controller and 4 HDs?

  5. Re:storage question by wampus · · Score: 1

    That's your BIOS. Flash yerself a new one.

  6. Re:Amazing by scoof · · Score: 1

    Often that _is_ what you want unless you use a search engine on specific subjects (such as scientific articles)

    --
    -- Andreas
  7. Re:awesome by GrenDel+Fuego · · Score: 1

    8,000 machines, you mean.
    They have single-processor and dual-processor machines, which means somewhere between 8,001 and 15,999 processors.

  8. Re:Seen it by swb · · Score: 1

    Are you telling me that they couldn't have called in an electrician and pulled new circuits to drive the surrounding cages?

  9. Re:Electric bill by swb · · Score: 1

    St. Cloud, Minnesota is a better choice: 41F mean annual temperature (26F Oct-Apr), and it's just a few miles from Xcel's (formerly NSP's) Monticello nuclear power plant.

  10. Re:And this is good? by PotatoNO · · Score: 1

    They could boot off the network and just use the local drives as database storage. I've always thought that something like the NetApp should be made for CPUs and motherboards: something like a 12RU case filled with motherboards and CPUs, all booting off a common embedded server.

    Regardless, you're going to need a huge screwdriver staff to maintain all that hardware. But the actual maintenance would be relatively brainless.

    Do those 8000 machines do the spidering, search requests and CGI? Does that mean that popular keywords get sent to the most loaded down servers? Is it possible to map it out? Just a thought.
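Whether popular keywords pile onto the busiest servers depends on how queries are routed, which the article does not describe. Under one simple scheme, hash partitioning, every query for a given keyword maps to the same index shard, so a hot keyword makes its shard hot. A minimal POSIX sh sketch of that mapping; `shard_for` is a hypothetical helper, and cksum (CRC-32) merely stands in for whatever hash a real system would use:

```sh
#!/bin/sh
# shard_for KEYWORD NSHARDS
# Hypothetical sketch: map a query keyword to one of NSHARDS index shards.
# The same keyword always lands on the same shard, which is exactly why a
# very popular keyword would concentrate load on one shard.
shard_for() {
  keyword=$1 nshards=$2
  # cksum prints "CRC BYTECOUNT"; take the CRC modulo the shard count.
  printf '%s' "$keyword" | cksum | awk -v n="$nshards" '{ print $1 % n }'
}
```

`shard_for linux 8` prints the same number from 0 to 7 on every call. The usual way around a hot shard is to keep several replicas of it and spread requests across them.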

    linkfilter

  11. Re:can you say pr0n? by Noehre · · Score: 1

    DivX is for people that don't feel like using Mpeg4v3 which is better than Divx in every respect. *cough*

  12. Re:heck of a space heater by gmeb · · Score: 1

    Actually the place is air-conditioned, so it's pretty cold there.

    --
    The angry man always thinks he can do more than he can. -- Albertano of Brescia
  13. Re:8,000 systems... by gmeb · · Score: 1

    Co-location sites have *huge* batteries which take over when an outage occurs. If the outage lasts for a long time, they have diesel generators kick in to supply the entire data-center with electricity.

    The size of the batteries, and the heat-dissipation problem, are why data-centers will only provide a certain amount of power per square foot.

    If your racks are high-density, like Google's, you'll have to buy extra floorspace just to get enough electricity, only to leave it empty.

    The cheap data-centers of course provide less power per square foot, which means there's a trade-off between buying useless floorspace and moving to a more expensive, but also more efficient, data-center.
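The trade-off is easy to put into numbers. A minimal sketch, assuming the data-center caps power per square foot and that each rack also needs a fixed footprint; `sqft_needed` and every figure fed to it below are hypothetical, not from the article:

```sh
#!/bin/sh
# sqft_needed RACKS WATTS_PER_RACK WATTS_PER_SQFT SQFT_PER_RACK
# Floor area to rent = the larger of (a) the space the racks physically occupy
# and (b) the area whose power allowance covers their total draw.
sqft_needed() {
  racks=$1 w_rack=$2 w_sqft=$3 sqft_rack=$4
  by_space=$((racks * sqft_rack))
  # Round up: a partial square foot still has to be rented.
  by_power=$(( (racks * w_rack + w_sqft - 1) / w_sqft ))
  if [ "$by_power" -gt "$by_space" ]; then
    echo "$by_power"
  else
    echo "$by_space"
  fi
}
```

With made-up figures of 10 racks at 5,000 W each, a 100 W per sq ft limit and 25 sq ft per rack, `sqft_needed 10 5000 100 25` prints 500: the racks fill 250 sq ft, and the other 250 sq ft are rented only for their power allowance. At 2,000 W per rack the same call prints 250, so space rather than power becomes the limit.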

    --
    The angry man always thinks he can do more than he can. -- Albertano of Brescia
  14. between 8,000 and 16,000 hard disks by cpeterso · · Score: 1

    The article said they have two hard disks per server, sometimes four. So that puts the order to Maxtor somewhere between 8,000 and 16,000 hard disks.

    > Many of Google's storage devices are outfitted with 80GB hard drives from Maxtor. They have a single controller per hard drive and two hard drives per PC. In some cases, the company uses PCs that are twice as big, with four controllers, four hard drives, two processors and twice the RAM of the smaller units.

  15. Google's Lindsay Felton by cpeterso · · Score: 1

    "That's not to say that the index takes up a petabyte. We have several hundred copies of the index," Google's Lindsay Felton said.

  16. Why do you think Google needs 8000 servers? by cpeterso · · Score: 1


    I wonder what kind of information Google has about the deficiencies of the Linux TCP/IP stack. With 8,000 servers they could have some input as to how the lack of multi-threading affects performance on a major site.

    Why do you think Google needs 8000 Linux servers? Linux can only run one or maybe two socket connections per server.

    1. Re:Why do you think Google needs 8000 servers? by Master+Bait · · Score: 1
      No way, Microsoft would be woefully inefficient in that environment, not even counting the license fees.

      The only inefficiency in having 8,000 separate servers is the cost of electricity. Smart that they're leaving California.


      blessings,

      --
      "Only in their dreams can men truly be free 'twas always thus, and always thus will be."
      --Tom Schulman
    2. Re:Why do you think Google needs 8000 servers? by jayhawk88 · · Score: 1

      They probably get a lot of their hardware on the cheap too. Maxtor would probably cut you a pretty nice deal if you tell them you want 4000 80Gig drives.

    3. Re:Why do you think Google needs 8000 servers? by Sinjun · · Score: 1

      I guess this brings up my real question. If you have to have 8,000 servers to run a web site with Linux, is the $ you save in OS-related expenses worth the amount of money you have to spend on hardware? I wonder how many, say, Sun servers you would need to equal 8,000 Linux servers. Not as many? Dare I say Microsoft may even be less expensive?

    4. Re:Why do you think Google needs 8000 servers? by crealf · · Score: 1
      > NT will handle this more efficiently than Linux, due to a superior asynchronous I/O model. If you doubt this, look up some of the research reports on Unix I/O limitations. Linux has a long way to go to reach the same performance levels.

      It's well known NT is better in theory. Unfortunately, in practice this comes at a cost. So point out real research reports with real figures and real comparisons.

    5. Re:Why do you think Google needs 8000 servers? by demaria · · Score: 2

      And of course, the other variants of Unix (Solaris, BSD, SCO and so forth), which have a strong reputation for being kickass stable and reliable, with good performance.

    6. Re:Why do you think Google needs 8000 servers? by Jarnis · · Score: 2

      So they're in California? That explains the power crisis in the state. 8000 servers eat up plenty of power. I wonder what kind of electricity bill they have :)

    7. Re:Why do you think Google needs 8000 servers? by ch-chuck · · Score: 4

      > Microsoft would be woefully inefficient in that environment

      ... 8000 Msft boxen is probably getting to the point where you'd need 3 shifts of MCSEs full time just to reboot the damn things, kinda like the days when they made computers with so many vacuum tubes that their failure rate caught up with them, and a machine would barely run before another tube needed replacing.

      --
      try { do() || do_not(); } catch (JediException err) { yoda(err); }
  17. Pictures by hendrickx · · Score: 1

    This sounds cool. Are there pictures of the racks? I'd love to see them. I found http://www.google.com/plex/index.html, which has some "see, we're fun people" type pics, and what could conceivably be a portion of a rack in the lower-left pic, but nothing impressive.

  18. Re:Damn, thats a lot of space by pthisis · · Score: 1
    > Even if they used the DL320 from Compaq (A 1U, 1 proc IDE server) or similar, they would still fill just a bit over 190 racks.

    RLX sells 3U machines that have 24 independent blades in them (Crusoe, 1/2 to 1 gig RAM, 2 laptop drives). That's 8 machines per 1U.

    But google uses Rackable setups, 2 machines/1U. Their cage is right across the aisle from ours in VA, and man is it pretty.
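For scale, the rack counts in this exchange follow from simple ceiling division. A minimal sketch, assuming 42U of usable space per rack and nothing else (switches, power strips) taking up slots; `racks_needed` is a hypothetical helper:

```sh
#!/bin/sh
# racks_needed SERVERS SERVERS_PER_U [USABLE_U_PER_RACK]
# Ceiling division: a partly filled rack still counts as a rack.
# Assumes 42 usable U per rack unless told otherwise.
racks_needed() {
  servers=$1 per_u=$2 rack_u=${3:-42}
  per_rack=$((per_u * rack_u))
  echo $(( (servers + per_rack - 1) / per_rack ))
}
```

`racks_needed 8000 1` prints 191 (one 1U server per slot, the DL320 case), `racks_needed 8000 2` prints 96 (two machines per 1U) and `racks_needed 8000 8` prints 24 (eight blades per 1U).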

    Sumner

    --
    rage, rage against the dying of the light
  19. Re:Doesn't this seem wrong to anyone? by asparagus · · Score: 1

    > You don't build a house starting with a large block of concrete - you use bricks.

    Without being too impolite, you do build a house starting with a large slab of concrete. You put bricks on top of that.

  20. Old news by harmonica · · Score: 1

    My fault. I should read /. in chronological order... ;-(

  21. Here's why by WyldOne · · Score: 1

    8000 Windows PCs spending 11 minutes out of every 12 fighting for 'MASTER BROWSER'

    --

    make Linux, not Microsoft. sin(beast) = -0.809016994374947424102293417182819
  22. Re:And this is good? by marxmarv · · Score: 1
    It's good for keeping your sysadmins out of their cars at 2am :-)

    Seriously, we'd need to see usage stats, uptimes, and load averages to see how they're really doing, and such statistics are guarded jealously (usually by management more than staff).

    I'd love to see what sort of innovations they've put together for system installation. Kickstart just ain't Jumpstart.

    -jhp

    --
    /. -- the Free Republic of technology.
  23. Re: Multithreaded TCP/IP stack by marxmarv · · Score: 1
    Let me tell you what you forgot.

    Some Ethernet boards handle checksumming and IP for you, and some will take control of the bus to move data into system RAM for you.

    Processes don't wait for packets on TCP sockets, they wait for data to be available. The TCP stack writes the new data (and only the new data) to the socket buffer, which indirectly signals the process that there's new data available.

    "serialize" isn't a Microsoft term. It's been in use for several years and is quite easy to coin independently.

    -jhp

    --
    /. -- the Free Republic of technology.
  24. Re:Ironic timing... by marxmarv · · Score: 1
    Rackable's 2U systems were nicely built and fairly priced, and at least one of them even came with Etherboot pre-installed. Unfortunately, the salesperson I dealt with was less than stellar: he 1) wouldn't give me the real story on why 1Us weren't available at the time (CA810 motherboards had dried up), and 2) wouldn't sell me a SCSI configuration because it "might" not work, and dissembled about it (a Google search done after the deal showed that it would work). I would rather have bought from the local beige-box builders, talked with technical people with clue, and saved about 30%. All that said, if I'm buying to appease VCs, I wouldn't hesitate to work with them again, as long as they give me exactly what I order.

    (Disclaimer: I only dealt with them once, when a big company was buying lots of stuff for our startup, and I'm just a satisfied customer of Central Computer with an unpaid endorsement.)

    -jhp

    --
    /. -- the Free Republic of technology.
  25. Maxtor is no suckem by mondamay · · Score: 1
    I've always thought of Maxtor drives as the bottom of the barrel. The kind that you get free when you buy other stuff, or for third world countries for keeping records (they're always losing them, right?).

    I'm sure that Google wouldn't just buy them because they're cheap. Doesn't make sense. They've got to pay somebody to put a new one in (ok, an intern maybe), and buy a replacement, and whatever the downtime costs them. With all that you'd think they'd buy IBMs (not a big fan of WDs).

    Anyway, Maxtor should be pleased with the free publicity.

    --
    --Last Exit To Babylon
  26. OK, how much would this cost for a microsoft OS? by weave · · Score: 1

    My god, 8,000 servers. How much is 8000 windows NT or 2000 server licenses under the Microsoft Select program for large business? Anyone know? And I assume each server would have to spend the extra $2,000 for the unlimited internet access pack to avoid paying per-use Client Access Licenses... Then there's the cost for SQL Server if they used that in any capacity....

  27. Re:Pictures! by A+non-mouse+Cow+Herd · · Score: 1

    It ain't much, but you can see one on the lower right, here
    http://www.google.com/plex/index.html
    of course, that could be any old rack full of telco stuff.

  28. Re:Kudos to Google by yesthatguy · · Score: 1

    Google doesn't want to be the next Yahoo. They just want to be the best damn search engine around. Yahoo has built itself up as a portal, incorporating search with a glut of other functionality, a lot of which is really good (hence their popularity). However, in Google's market (search engines) they're the best, as seen by Yahoo's (and other people's) licensing of Google's database, infrastructure, and searching and categorizing methods.

    --
    Yes! That guy!
  29. This is anti-Computer Science by Surtur · · Score: 1

    I believe these numbers of servers must have been approximated from a computer science point of view.

    In reality, maybe only 1/400 of these servers are being utilized (doing work), and about 100 servers should do.

    They should instead concentrate on reducing I/O contention by utilizing Fibre Channel disk arrays and using a multiprocessor machine, perhaps SGI/Cray.

    This is an administrator's nightmare - I bet there's no backup being done.

    What a waste!

  30. a petabyte?!!?! by JEDi_ERiAN · · Score: 1

    wtf!

    petabyte == 1 million gigabytes

    Can you just imagine how much _______ (insert your choice: mp3s, pr0n, divX;), etc.) you could store! Damn. *drool*

    E.


    -

    --

    -
    This Post has been brought to you by the letter "E".
    1. Re:a petabyte?!!?! by Chirs · · Score: 1

      Ah, but they don't archive the binary posts, which probably cuts the storage requirement by 90% or more.

    2. Re:a petabyte?!!?! by Macrobat · · Score: 1
      Okay, shoot me, malign my name, or moderate me down, but I couldn't resist...

      Petabyte== 1/8th of a gyro.

      --
      "Hardly used" will not fetch you a better price for your brain.
    3. Re:a petabyte?!!?! by Tech187 · · Score: 1

      Or the Linux 6.0.34 source tarball.

    4. Re:a petabyte?!!?! by levendis · · Score: 2

      What's most amazing about that is that the storage is spread across 8000 computers, instead of concentrated in a few monstrous racks. As someone working in the storage industry, I find that approach quite surprising... I would have thought it would be cheaper and far easier to manage, say, 1000 servers and a dozen massive disk arrays than to have 8000 points of failure to worry about.

      ----

      --
      ---- I made the Kessel Run in under 11 parsecs.
    5. Re:a petabyte?!!?! by segmond · · Score: 3

      4 copies of Microsoft Windows 2100.

      --
      ------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
    6. Re:a petabyte?!!?! by Tackhead · · Score: 3
      > petabyte == 1 million gigabytes
      > Can you just imagine how much _______ (insert your choice: mp3s, pr0n, divX;), etc.) you could store! Damn. *drool*

      A full USENET feed (including binaries) is about 250GB per day (yes, a sustained 23 Mbit/s or so, about half a T3), and growing at 50-60% per year.

      One petabyte works out to only four more years of future USENET, give or take 50%.

      Scary, ain't it?
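The four-year figure can be checked with a small compounding loop. A sketch under stated assumptions: 1 PB taken as 1,000,000 GB, 365-day years, growth compounding once a year, and linear interpolation inside the final year. `years_to_fill` is a hypothetical helper that hands the floating-point math to awk:

```sh
#!/bin/sh
# years_to_fill CAPACITY_GB DAILY_GB ANNUAL_GROWTH
# Assumes 365-day years, non-negative growth that compounds once a year,
# and linear interpolation within the year in which capacity is reached.
years_to_fill() {
  awk -v cap="$1" -v daily="$2" -v g="$3" 'BEGIN {
    if (daily <= 0 || g < 0) { print "unsupported"; exit }
    total = 0; years = 0; yearly = daily * 365
    while (total + yearly < cap) {
      total += yearly        # a full year of feed still fits
      yearly *= 1 + g        # next year the feed is bigger
      years++
    }
    printf "%.1f\n", years + (cap - total) / yearly
  }'
}
```

`years_to_fill 1000000 250 0.5` prints 4.6 and `years_to_fill 1000000 250 0.6` prints 4.3, in line with the four-year estimate.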

    7. Re:a petabyte?!!?! by chris_mahan · · Score: 3

      The point of failure thing is a good point. If 10% of their servers fail (800) they still have 7200 that work fine, and they can probably handle things just fine.

      If 50% of their servers fail, then they would be slow, but still work fine.

      If 90 percent of their servers failed, they would still have 800 up. It would be very slow, but might still handle the load.

      If you had 1000 servers with disk arrays and your storage failed, then ouch!

      On the other hand, they probably have half a dozen burned CDs of their implementation of Linux (depending on the HW configuration), so if a server fails, they take it offline, put another in its place, load the preconfigured OS from the CD (with all the conf and stuff already done) and bring it online.

      One tech can probably put 10 servers online a day.

      So 30 techs can probably put up 300 servers a day.

      Assuming each Linux box operates without admin intervention for 90 days, there would be about 89 boxes that need to be fixed each day (about 1%), and so 9 techs could handle it.

      They probably have more than that.

      And since the technology is not hard to understand (it's a dual Pentium PC), they don't have to call the IBM mainframe guy over. Also, they probably have a few dozen servers already configured, ready to be popped into the rack.
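The staffing estimate in this comment is two ceiling divisions. A sketch of that back-of-the-envelope model; `techs_needed` is a hypothetical helper, and the 90-day interval and 10 boxes per tech per day are the comment's guesses, not measured figures:

```sh
#!/bin/sh
# techs_needed SERVERS DAYS_BETWEEN_INTERVENTIONS BOXES_PER_TECH_PER_DAY
# Boxes needing attention per day, then techs to handle them, both rounded
# up because a fraction of a box (or of a tech's day) still has to be covered.
techs_needed() {
  servers=$1 days=$2 per_tech=$3
  per_day=$(( (servers + days - 1) / days ))
  techs=$(( (per_day + per_tech - 1) / per_tech ))
  echo "$per_day boxes/day, $techs techs"
}
```

`techs_needed 8000 90 10` prints `89 boxes/day, 9 techs` (8000/90 is about 88.9, rounded up). Halving the interval to 45 days roughly doubles the crew.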

      --

      "Piter, too, is dead."

  31. Re:Seen it by jalewis · · Score: 1

    Apparently you have never had to deal with the superior customer service at Exodus. I can only speak of my experience with the Sterling, VA location, so take this with a grain of salt.

    We just pulled 12 racks of equipment from that sorry excuse for a datacenter. We had several boxes fail because of high heat and numerous network problems because of Exodus techs that didn't know what they were doing. Getting more power, a phone line or network drop seems like an impossible task.

    I won't even go into the hassle of getting into and out of the building. It isn't a security thing; it seems all the guards are in slow motion.

    Our new location has much more cage space and my thumb is in the system so I don't have to deal with a security guard for everything.

    As much as I hate spending time in the data center, I do like being able to get in and out quickly.

    jas

  32. Re:Kudos to Google by holzp · · Score: 1

    Which is really funny when you consider how Google is really trying to become the next Yahoo. Mark this post. That's their aim.

  33. two google stories on one day? by holzp · · Score: 1

    What, did VA Linux and Google strike a deal?

  34. Re:ROI on Linux by GreyyGuy · · Score: 1

    True, but weren't the performance tests done on some high end machines? The article says they are using a lot of smaller machines. And I seem to remember that the tests were on individual boxes, not a farm like this. I was also wondering about if anyone had done something of that size with M$ products.

    Anyone know how big the M$N server farm is?

  35. No Kudos to Google by ToasterTester · · Score: 1

    Bragging about having to go from 4000 to 8000 cheap servers doesn't impress me. In fact, I see it as bad design and an SA and facilities nightmare. Not using multiprocessor boxes shows another weakness in Linux: SMP and threading. Sun boxes are expensive, but way fewer would be needed and that would save money. FreeBSD on Intel boxes would have been a better choice for its better TCP/IP stack.

    No Kudos, shame on Google.

    1. Re:No Kudos to Google by ethereal · · Score: 2
      > Sun boxes are expensive, but way fewer would be needed and that would save money.

      Did you actually read the article? Because the guy in charge of this stuff said that they were saving money by doing it this way. Considering the amount of money Google would be out if he were just lying through his teeth as part of the Linux Zealot Conspiracy (c), I really doubt that he's making that up. But if you'd like to point out all of the Google-sized sites that you're running, maybe we could talk.

      He also mentioned that using a freely-modifiable commodity OS on commodity hardware kept them free of any vendor pressure, which I imagine would be somewhat of a problem with Solaris, et al. No forced upgrades for Google!


      P.S. There is no Linux Zealot Conspiracy, of course, but you wouldn't know it by reading /. :P

      Caution: contents may be quarrelsome and meticulous!

      --

      Your right to not believe: Americans United for Separation of Church and

    2. Re:No Kudos to Google by El+Volio · · Score: 2

      Try

      # copy the patch to each host, then unpack and install it there
      for server in $serverlist; do
        scp patchNNN.tar.gz "$server":
        ssh "$server" 'gunzip patchNNN.tar.gz && tar xf patchNNN.tar && ./install-patchNNN.sh'
      done

      It's not that hard to automate such a thing. Those 8000 servers are NOT managed individually -- that gets to be a real big pain, real fast.
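Run one host at a time, a loop like that takes a long while across 8,000 machines, and an unreachable box can stall it. A minimal POSIX sh sketch of a parallel version that also records failures; the host file (one name per line), key-based ssh logins and the `install-patchNNN.sh` name are assumptions carried over from the example, and `deploy_one`/`rollout` are hypothetical helpers, not known Google tooling:

```sh
#!/bin/sh
# Sketch of a parallel patch rollout (hypothetical; not Google's tooling).
# Assumes one hostname per line in a file, key-based ssh logins, and an
# installer script named install-patchNNN.sh inside the tarball.
PATCH=${PATCH:-patchNNN.tar.gz}
PARALLEL=${PARALLEL:-50}     # hosts worked on at the same time

# Copy and install the patch on one host; exit status says whether it worked.
deploy_one() {
  scp -q -o BatchMode=yes -o ConnectTimeout=10 "$PATCH" "$1:" </dev/null &&
    ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$1" \
      "tar xzf '$PATCH' && ./install-patchNNN.sh"
}

# Run deploy_one over every host in file $1, PARALLEL at a time, in batches.
# Hosts that fail are appended to failed-hosts.txt for a retry pass.
rollout() {
  : > failed-hosts.txt
  running=0
  while read -r host; do
    [ -n "$host" ] || continue
    ( deploy_one "$host" || echo "$host" >> failed-hosts.txt ) &
    running=$((running + 1))
    if [ "$running" -ge "$PARALLEL" ]; then
      wait            # let the current batch finish before starting more
      running=0
    fi
  done < "$1"
  wait
}
```

For example, `PARALLEL=100 rollout hosts.txt`. Because `rollout` truncates `failed-hosts.txt` when it starts, move that file aside (say, to `retry.txt`) before feeding it back in with `rollout retry.txt`. Batching with `wait` is simple but leaves slots idle while the slowest host in a batch finishes; `xargs -P` keeps them busier.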

      --

      "You can never have too many elephants on your team."

    3. Re:No Kudos to Google by ToasterTester · · Score: 2

      I am speaking from experience and stand by what I say. The cost of the boxes is small compared to the cost of supporting and maintaining 8000 boxes. The SAs must have lots of fun when they have to patch 8000 boxes. I also disagree with their approach to storage: since the data is mainly read rather than written, RAID 5 would improve throughput by using more spindles. Centralized storage would have better performance, be easier to maintain and update, add fault tolerance, and be easier to scale. We haven't even brought up the networking, air-conditioning, or power aspects. I could go on, but don't feel a need.

      Bottom line: there is a point where lots of cheap systems are no longer cost-effective versus hardware designed for large-scale systems. FWIW: if I were to build a system like this on Linux, I would use Alpha-based systems, not Intel. If it had to be Intel, I would switch the OS to FreeBSD or BSDi.

  36. One hell of a colocation by I-man · · Score: 1

    And I thought the lifeminders cage at PSInet was cool. I'd love to see the physical layout of all these cabinets. Does anyone know if Google is hosted in a dedicated facility? I've assumed till now that they just had a high-end managed colo, but 8,000 servers spread across only 4 facilities? Damn.

    1. Re:One hell of a colocation by SiliconJesus · · Score: 2

      Here's the interesting part of the traceroute I ran from my workstation here at, well, work :)

      9 284.ATM7-0.XR2.DCA1.ALTER.NET (152.63.33.41) 5.685 ms 13.112 ms 4.145 ms
      10 194.ATM7-0.GW3.DCA1.ALTER.NET (146.188.161.77) 5.545 ms 7.685 ms 4.475 ms
      11 abovenet-dca1.ALTER.NET (157.130.37.254) 5.327 ms 6.011 ms 5.987 ms
      12 core5-core1-oc48.iad1.above.net (208.185.0.146) 6.132 ms 5.715 ms 6.948 ms
      13 core2-iad1-oc48.iad4.above.net (208.185.0.134) 5.818 ms 5.785 ms 6.011 ms
      14 main1colo1-core2-oc12.iad4.above.net (208.185.0.66) 7.527 ms 5.400 ms 4.853 ms
      15 64.124.113.173.available.google.com (64.124.113.173) 6.160 ms 5.705 ms 8.736 ms


      It appears to be co-lo'd at above.net. This was run against the www server.
      Secret windows code

      --
      Clinton made me a Republican. Bush made me a Libertarian. Trump is making me question reality.
  37. Re:A Real Reason They Can Get Away With That by rossjudson · · Score: 1

    You got that right. It's frickin' hard to make an app (or a cooperating network of apps) that can survive the laundry list of possible failures. I've seen deployment environments that are so far away from optimal that it's just scary. You gotta have an X-files attitude about distributed software: Trust no-one. Trust no process, no socket, no network. If it could go wrong, it will. There's another great leap in computation waiting out there for us -- peer to peer, distributed, fault-tolerant, scalable, indexed, transactional information storage. Don't say database, because it needs to be able to store any kind of data, at any time. This thing is to information what the internet is to packets. Route around the damage, converge on optimality. It's a very hard problem. Whoever solves it first, best, is gonna be extra rich.
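A small, concrete case of "trust no process, no socket, no network" is never assuming a remote call works the first time. A minimal sketch of retry with exponential backoff in POSIX sh; `retry` and the `RETRY_DELAY` variable are hypothetical names, and real systems usually also add random jitter and a cap on the delay:

```sh
#!/bin/sh
# retry MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have failed,
# doubling the pause between attempts (exponential backoff).
# RETRY_DELAY sets the first pause in seconds (default 1).
retry() {
  max=$1; shift
  attempt=1
  delay=${RETRY_DELAY:-1}
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

`retry 5 ssh -n somehost uptime` (with `somehost` a placeholder) tries up to five times, pausing 1, 2, 4 and 8 seconds between attempts before giving up.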

  38. Re:it's still not as 31337... by The_Messenger · · Score: 1
    You just made it pretty obvious that you're not a programmer, so stop being lame and pretending.

    --

    --

    --
    I like to watch.

  39. Google is hiring! by wolfpaws · · Score: 1

    Note the pantload of Google geek jobs just posted on craigslist.com last Thursday.

  40. Final point by Galvatron · · Score: 1
    I'll let this go after this, but one last comment :)

    I also never suggested that Linux is inferior to NT.

    But, you're claiming that Google has to spend $1-2 million more on coding for Linux than on coding for NT. That was my point. Not that they wouldn't have to spend $1-2 million on Linux, but that they wouldn't have to spend $1-2 million MORE than they would on NT.

    The only "intuitive" interface is the nipple. After that, it's all learned.

    --
    "The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
  41. Okay... by Galvatron · · Score: 1

    So about $1 million for 8000 copies then? The point still stands...

    The only "intuitive" interface is the nipple. After that, it's all learned.

    --
    "The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
    1. Re:Okay... by duffbeer703 · · Score: 1

      I'm sure that Google spent far in excess of $1 million to customize Linux to the level that they have.

      Qualified people who modify the innards of operating systems for a living don't come cheap. Not too many people are willing to donate their time and energy gratis to companies.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
  42. Re:Electric bill by slamb · · Score: 1

    > California ranks 48th in the Union in per capita energy consumption. Compared to the other states, we do an excellent job of conservation, reducing demand, and developing new energy saving technologies.

    ...which is clearly because Californians are all supermen. It couldn't have anything to do with California's perpetually ideal climate...never cold (very low heating costs) and not incredibly hot either (moderate air conditioning costs).

  43. Is the cache functionality legal? by joeykiller · · Score: 1

    Does anybody know whether anyone has questioned whether it's legal or not to present a cached version of someone else's content? I like this functionality, but my gut feeling is that the function surely must be breaking some copyright law.

  44. Simulatious (OT, I know, is that so wrong?) by stixman · · Score: 1

    Nice word. Much cooler than simultanious. I think I'll start using it. :0)

    --
    -
  45. OT: by stixman · · Score: 1

    We talk too much about Beowulf clusters here, but I have never seen and cannot find where the term actually comes from. Can anyone offer some insight? If it's from the tale about Grendel, Hrothgar, etc., how does it relate? Thanks in advance, Mike.

    --
    -
    1. Re:OT: by freeweed · · Score: 2
      A 3 second Google search revealed the following URL as the first hit: http://www.beowulf.org/

      You obviously haven't looked very hard for information :)

      --
      Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
  46. Show me by uberchicken · · Score: 1

    I wanna see the 8000 servers...

  47. Uhh, yeah, but the cost? by crashnbur · · Score: 1

    I asked if you knew how much it cost, not how to figure out how much it cost. All that information that you just gave me assumes that I know what all of the constituent parts cost... And... it's all moot anyway. :-)

  48. Re:8,000 systems... by The+Breeze · · Score: 1

    Exodus, where the machines are hosted, has its own backup systems for the building.

  49. Re:Electric bill by Eloquence · · Score: 1
    "That's not to say that the index takes up a petabyte. We have several hundred copies of the index," Felton said.

    The article doesn't say who Felton is. Who is it?

    --

  50. Re:Electric bill by Quixote · · Score: 1

    > Buffalo NY would have to be the ideal location for this. Cold as hell, and right next to the Niagara Hydro plant for cheap power.
    You're right about the cold, brother. Except, it is not cold as hell, but cold as a witch's behind.. ;-)
    BTW: the cost of electricity here is higher than a lot of other places. Don't ask me why; I think the juice just flows out of here.

  51. Re:Why? by dpm · · Score: 1

    Even if Google paid USD 1,000 per machine (and I'll bet they paid much less), that comes to only USD 8 million for the whole setup -- you cannot buy much big iron for that, at least not enough to run one of the busiest sites on the Web.

    Furthermore, Google had the enormous advantage of being able to scale up one machine at a time rather than dumping a whole lot of money at the start, and support and replacement parts are dirt cheap. I've seen the same approach work elsewhere.

  52. Re:A Real Reason They Can Get Away With That by green+pizza · · Score: 1

    First of all, there is no way that Google is sustaining a full load of 16 GB/sec from disk or fully streaming their 16 TB of RAM. The overhead is probably in the branching and routing of the requests and really can't be overcome without a major software overhaul... adding more hardware is cheaper than developer time, plus it adds redundancy and storage.

    As far as no single memory system being that large, you are correct. The largest *single* system I can think of is the SGI Origin 3000, which maxes out at "just" 512 MIPS R14K CPUs with 1TB RAM. Storage wouldn't be SCSI, but rather Fibre Channel... 4 XIO FC-AL cards per CPU brick (or 1 FC-AL card per CPU) would be about as dense as you'd want to go... that'd be about 51.2 GB/sec.
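The 51.2 GB/sec figure works out if each Fibre Channel card moves about 100 MB/s. A sketch of the arithmetic, assuming four CPUs per Origin 3000 CPU brick (which is what makes "4 cards per brick" equal "1 per CPU") and 100 MB/s per 1 Gbit FC-AL card; `fc_bandwidth` is a hypothetical helper:

```sh
#!/bin/sh
# fc_bandwidth CPUS CPUS_PER_BRICK CARDS_PER_BRICK MB_PER_CARD
# Aggregate FC-AL bandwidth in MB/s: bricks (rounded up) x cards x per-card rate.
fc_bandwidth() {
  cpus=$1 per_brick=$2 cards_per_brick=$3 mb_per_card=$4
  bricks=$(( (cpus + per_brick - 1) / per_brick ))
  echo $(( bricks * cards_per_brick * mb_per_card ))
}
```

`fc_bandwidth 512 4 4 100` prints 51200 (MB/s), i.e. 51.2 GB/sec from 128 bricks and 512 cards.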

  53. Re:Doesn't this seem wrong to anyone? by green+pizza · · Score: 1

    Amen. Good point.

  54. I wonder if they considered SGI Origin 3000 by green+pizza · · Score: 1

    Seeing how SGI is strapped for cash, yet makes some awesome hardware and a rock-solid OS, I bet they would have cut Google a good deal on a set of four maxed-out O3Ks (512 CPUs and 1TB RAM each).

  55. Veronica by green+pizza · · Score: 1

    So, do tell us ... what is the best search engine in the world?

    Veronica

  56. flamebait by green+pizza · · Score: 1

    why is this marked flamebait?

  57. Had to think about it s'more by green+pizza · · Score: 1

    > This is what you can tell people when they tell you that linux is a toy. The best search engine in the world is *not* a toy

    I think you pretty much said it all. Running 8000 personal computers to run a search engine site, heh. You are correct, the best search engine in the world isn't a toy. It isn't Google either.

    1. Re:Had to think about it s'more by garbuck · · Score: 1
      > You are correct, the best search engine in the world isn't a toy. It isn't Google either.

      So, do tell us ... what is the best search engine in the world?

  58. Re:Amazing by green+pizza · · Score: 1

    Their full and final product is not a toy, that's for sure. Using *EIGHT THOUSAND* personal computers to run it sure is... it's quite goofy.

  59. Re:Why? by green+pizza · · Score: 1

    Why bother to put together 8,000 Linux boxes, when one could obtain high-powered 64-bit computers to accomplish the same task?

    Redundancy is the usual answer: they could have *FOUR THOUSAND* failures and still keep on chugging. Still, I agree, I wouldn't want to be in charge of keeping 8000 personal computers happy. I would probably load balance the thing between, oh, four different 128-CPU (four racks each) SGI Origin 3800s. Maybe just 64-CPU models, or a large Sun. Keeping 4 machines going is a hell of a lot easier than 8000, plus you get just about the same amount of bandwidth. The backplane-less Origin 3000 series uses *gobs* of 3.6 GB/sec "NUMAlink" interconnects. Not exactly your dad's gigE or Myrinet.
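    The redundancy argument can be put into numbers with a simple model: if each copy of the data is unavailable with probability p and failures are independent, all r copies are down at once with probability p^r. A sketch with purely illustrative failure rates (real outages such as power or network loss are often correlated, which this model ignores):

```python
# Chance that one slice of data is unavailable when it is stored on
# `replicas` machines that each fail independently with probability p.
# ASSUMPTION: independent failures; the numbers below are illustrative only.
def unavailability(p: float, replicas: int) -> float:
    return p ** replicas


def availability(p: float, replicas: int) -> float:
    return 1 - unavailability(p, replicas)


# Four big machines, each holding a full copy, each down 0.1% of the time.
print(f"4 big boxes:      {unavailability(0.001, 4):.1e} chance of total outage")
# Cheap boxes down 1% of the time, each slice of the index kept on 3 of them.
print(f"3 cheap replicas: {unavailability(0.01, 3):.1e} chance a slice is out")
```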

  60. Argghh by green+pizza · · Score: 1

    I was thinking they were open source. Knowing this, I don't feel like even giving them the time of day. Anyone know of a good open-source Google-like engine?

    1. Re:Argghh by Macrobat · · Score: 1
      See the thread above, "google modifications available," for the response to this. If you aren't selling the product you make using GNU software (and google provides a service, not a product), you don't have to open-source it. Don't make such a knee-jerk reaction. They're still doing something cool, and showing the world that Linux is viable for business.

      --
      "Hardly used" will not fetch you a better price for your brain.
    2. Re:Argghh by ichimunki · · Score: 3

      What good would open source search engine code do? Unless you wrote it in such a way that it ran on some sort of distributed basis, only your direct competitors would have the hardware to run it. I mean, Google is in the business of providing search results. If they give away the software that does this, anyone with a server farm can build the same engine. Now if they were a not-for-profit company (you know, a charity) or a volunteer effort like DMOZ, then I could see it, but I expect the stakeholders at Google prefer black ink on their bottom line.

      Free software makes all kinds of sense when users demand it, especially when it comes to operating systems, programming languages, and "productivity" applications. But it makes zero sense for a company who has not only written the software, but has the only machine running that software, to give away the software.

      --
      I do not have a signature
  61. Re:can you say pr0n? by front · · Score: 1

    "What do you think is stored at Google? "

    I reckon it is that huge cache of pages in html.

    cheers

    front

  62. that's a lot of power by Alcoholist · · Score: 1

    A 2-hard-drive server maybe consumes around 75 watts when operating (obviously it has no monitor). Over 8000 machines, that's 600,000 watts of power. Their electrical bill must be obscene.
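    Turning that into an energy bill takes one more step: multiply by the hours in a year. A rough sketch, assuming the 75 W figure above, round-the-clock operation and a hypothetical $0.10/kWh rate; it leaves out air conditioning, which would push the real number higher:

```python
# Rough yearly energy use and cost of a server farm.
# ASSUMPTIONS: 75 W per box (from the post), 24x7 operation, a hypothetical
# electricity price of $0.10/kWh, and no cooling overhead.
SERVERS = 8000
WATTS_PER_SERVER = 75
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10  # hypothetical rate; real rates vary a lot by region


def farm_load_watts(servers: int, watts_each: float) -> float:
    return servers * watts_each


def yearly_kwh(load_watts: float) -> float:
    return load_watts * HOURS_PER_YEAR / 1000


load = farm_load_watts(SERVERS, WATTS_PER_SERVER)
kwh = yearly_kwh(load)
print(f"{load:,.0f} W -> {kwh:,.0f} kWh/yr -> ${kwh * PRICE_PER_KWH:,.0f}/yr")
```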

    --
    Bibo Ergo Sum.
  63. Re:Based on what? by duffbeer703 · · Score: 1

    Actually, I have some experience managing large numbers of computers. I am part of a team that manages over 85,000 Windows 95/NT4 and 2000 clients, as well as another 5,000 NT, Linux and Unix servers belonging to a variety of medical facilities as well as state & local gov't.

    I never said that Google should use NT, only that the cost comparisons put forth by many Slashdotters are not as one sided as you may think.

    I also never suggested that Linux is inferior to NT. But if you find it unlikely that $1-2 million in time & materials was invested in customization and bugfixing, I don't think that you have a realistic grasp of what things cost.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  64. Re:Electric bill by duffbeer703 · · Score: 1

    Until recently, New York had the highest energy costs in the nation... although California takes that prize these days

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  65. Re:Doesn't this seem wrong to anyone? by ckedge · · Score: 1
    CKW said "semi-normal". Your $5 Trabant belching smoke isn't normal. And if the total cost of ownership and usage of $5 Trabants was so shit hot, wouldn't there be more people using them?

    The use of a lot of non-committal generalities in the argument plus the "un-accounted" reference clearly indicates that it's a very coarse limited-usage theory, which you've nicely demonstrated.

  66. Re:im not really clear on.. by ckedge · · Score: 1

    How much of that "100-500 times the size" is in pages that can not be reached as they are internal or in dynamic sites? Or are pages for which there are no hyperlinks elsewhere on the net leading to them? I'd like to know Google's coverage when those things are factored out.

  67. Re:Do they give back? by t482 · · Score: 1

    I listened to a speech on technetcast. Apparently they had to rewrite the IDE interface to deal with IDE contention issues.
    Does anyone know if they contributed that to the kernel? They mentioned they might.

  68. Re:can you say pr0n? by GungaDan · · Score: 1

    Nothing that fun. They rent half their space out to the boys from Quantico, in exchange for a primo deal on real estate, security services, and inside information on which missionaries' planes will be shot down next, so they can break the news.

    --
    Eloi are stupid, throw morlocks at them!
  69. Re:google modifications available by bayduv1n · · Score: 1

    What I would find more interesting would be their distributed index algorithms. Maybe they have some ideas on how to make Gnutella more scalable?

  70. Re:Electric bill by jchristopher · · Score: 1
    California ranks 48th in the Union in per capita energy consumption. Compared to the other states, we do an excellent job of conservation, reducing demand, and developing new energy saving technologies.

    It is unfortunate that the current "energy crunch" has given others the impression that Californians are "energy hogs" that are somehow using triple the electricity of Oregonians, when that is not the case.

    If the rest of the country was as efficient as California, energy prices would not be what they are.

  71. 404 Not Found by rtnz · · Score: 1

    You would think Google would set up a pretty 404 page: http://www.google.com/asdadsasdasd

  72. Re:Kudos to Google by satanami69 · · Score: 1
    I know they didn't make any money from their affiliate program (the one where you put a search box on your home page for others to use).

    A friend I work with was able to set up a macro that searched a few hundred times an hour. It wasn't a hack or anything, just simulated keystrokes through a macro program. I figured he'd get caught at some point, which he did. To his amazement, he was still mailed a check for over $6000.00US. No joke, he really got the check.

    So I signed up, and ran the same macro, and cleared $700.00US. Unfortunately, they shut down the affiliate program before I could make any more money. It sure was nice getting a few checks ahead for my car payment though. I do think I'll switch to Red Hat now. I've been using Corel linux, since it's just me, with no network.

    --
    I really hate Dan Patrick.
  73. slashdot runs off 8 boxes? by Sabol · · Score: 1

    Wow, I would have expected more than that, but I don't know the specs of those 8 so I can't come to any conclusions.

    Anyone know about the hardware behind slashdot? (extremely curious)

    1. Re:slashdot runs off 8 boxes? by bobthemonkey13 · · Score: 1
      It's in the faq.

    2. Re:slashdot runs off 8 boxes? by krugdm · · Score: 1

      Read all about it here which is the Tech section of the Slashdot FAQ.

    3. Re:slashdot runs off 8 boxes? by Brento · · Score: 2

      Ask and ye shall receive. Click on FAQ on the left side of your screen, and you will discover the hardware behind the dot.

      --
      What's your damage, Heather?
  74. Re:google modifications available by agentZ · · Score: 1

    Aha... see now, this is the true beauty of open source software. Yes, the basic product (gnu*) is available for anybody to use. If you make improvements and want to give them away to the community, that's great. You can't sell those improvements thanks to the GPL, meaning you can't make money off of something that should be free (like speech, not beer). But if you want to use that product, especially if you have improved it over the default version, that's your own business. Long live GNU...

  75. Google Slashdotted by nnnneedles · · Score: 1

    I can't reach Google.com right now?

    Slashdotted? You must be kidding me! All those servers must be downloading PORN!!

    --
    Will code a sig generator for food
  76. Staggering and Power Drains by ackthpt · · Score: 1
    Reading the article, it says they're consolidating around D.C. That's where the *power* is, in more ways than one, right?

    It would probably require some rolling blackouts in D.C. for "W" to consider some serious energy policy other than trashing the environment, increasing CO2, and lining the pockets of his financial backers.

    Final thought, with all that excess heat (hot air competition for the House?) I wonder if they still freeze their butts off because they don't route the heated air anywhere productive. Seems they could sell heat like that to a greenhouse. :)

    --

    A feeling of having made the same mistake before: Deja Foobar
  77. Re:Crud.... by esonik · · Score: 1

    You are right. The estimate for the number of atoms in the Universe is around 10^80 (provided the Universe has a finite size), which is a factor of 10^20 less than a googol (10^100), quite a lot IMHO. This value is calculated using the volume of the visible Universe and the critical density of the Universe, both deduced from the Hubble constant (which can be determined experimentally) using the theory of General Relativity.
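    The estimate can be reproduced in a few lines. A minimal sketch, assuming H0 = 70 km/s/Mpc, a "visible Universe" approximated by a sphere of radius c/H0, a density equal to the critical density rho_c = 3 H0^2 / (8 pi G), and all of that mass in hydrogen atoms:

```python
import math

# Order-of-magnitude count of atoms in the visible Universe from H0.
# ASSUMPTIONS: H0 = 70 km/s/Mpc, radius = c/H0 (Hubble radius),
# density = critical density, and every kilogram is hydrogen.
G = 6.674e-11         # m^3 kg^-1 s^-2
C = 2.998e8           # m/s
M_HYDROGEN = 1.673e-27  # kg, roughly one proton
MPC_IN_M = 3.086e22   # metres per megaparsec


def atoms_in_universe(h0_km_s_mpc: float = 70.0) -> float:
    h0 = h0_km_s_mpc * 1e3 / MPC_IN_M          # Hubble constant in s^-1
    rho_c = 3 * h0 ** 2 / (8 * math.pi * G)     # critical density, kg/m^3
    radius = C / h0                             # Hubble radius, m
    volume = 4 / 3 * math.pi * radius ** 3      # m^3
    return rho_c * volume / M_HYDROGEN


n = atoms_in_universe()
print(f"~10^{math.log10(n):.1f} atoms; googol / N ~ 10^{100 - math.log10(n):.1f}")
```

    With these assumptions it comes out at roughly 5 x 10^79 atoms, consistent with the ~10^80 quoted above.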

  78. Re:google modifications available by esonik · · Score: 1

    They could make changes and sell them all they wanted, for as much as they wanted. They just couldn't prohibit the buyer from re-selling (or giving away) it.

    Yes, but then they have to give out the source code too! If they don't sell (or distribute) it, they can keep the source code secret! (cf. GPL version 2, Terms and Conditions, item 0: "Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program).")

    However, if you sell something that is available as free source code, nobody will actually buy it (because you have to announce that there exists free source code). That's why agentZ says you cannot sell it; you can try, but you will fail.

  79. Re:Wait, I have the Answer by John+Harrison · · Score: 1

    Actually the IBM lab at Santa Teresa (now called the Silicon Valley lab) uses the heat from the one-acre computer room to heat the rest of the lab, which is eight four story towers.

    I believe that the system actually produces less heat than it used to (the price of progress I guess) and they have had to supplement the heat it puts out with an actual heating system in the winter.

  80. Divx vs. Mpeg by vheissu · · Score: 1

    Better how? In its identicality? I suppose Microsoft's crippling of its default mpeg4v3 codec to not include .AVI file support could be considered an advantage, if you live in Redmond.

    --
    /* This post not warrantied for mission critical applications. */
  81. Mainframe? by Usquebaugh · · Score: 1

    That would be a cool sell for IBM.

  82. Re:Why? by sumengen · · Score: 1

    Yes, that is AltaVista's approach: fit all of the web inside one Alpha box. Well, as you can see, the web is growing exponentially, but AltaVista isn't. That is why AltaVista's index size has stayed at 100-200 million web pages for the last several years.

  83. Re:Interesting points by sumengen · · Score: 1

    Good point.
    I think the number of websites and CPU power both double over similar time frames.

  84. Re:Kudos to Google by Geeky+Frignit · · Score: 1

    Don't forget the fact that Yahoo probably pays a pretty penny to use them as their web search.

    --
    Tired of sitting at that karma cap? Start a flame war today! See just how low you can go!
  85. Re:it's still not as 31337... by F_Scentura · · Score: 1

    now that you mention it. !=. whatever the fuck. i'm not a programmer, never claimed to be. did a little cgi to go with graphic design classes. big fucking deal. it still doesn't affect the fact that i'm sick of beowulf jokes, does it?

  86. i know. i know. by F_Scentura · · Score: 1

    'it's irony, i was playing off the long-time slashdot "i wish i had a beowulf cluster of these" tradition'

    i'm just snippy, i guess... some people tend to take that joke to a dreadful extent. kinda makes me snap sometimes.

  87. Re:I'd bet they've already done the math by kireK · · Score: 1

    hehehe Anyone ever hear of an 8000-node NT cluster? I think not!

  88. Re:Where does Google get their money? by chris_mahan · · Score: 1

    Is it just me or has the whole dot-com bust made it sound like "VC Money" stands for "VietCong money"?

    yeah, it's just me.

    --

    "Piter, too, is dead."

  89. Re:Wow by tdye · · Score: 1

    heh. They're very fast, and they return fresher hits than Altavista, my old favorite.

  90. Re:Answer: reliability. by Ayende+Rahien · · Score: 1

    Not a problem, dude.
    You *don't* have zero redundancy, right?
    The rest of the servers take the load while you fix it.
    Not to mention that this kind of stuff doesn't go down that often.
    Worst comes to worst, they can get one of those non-stop machines that have a 100% uptime guarantee.

    --
    Two witches watched two watches.
    Which witch watched which watch?
  91. Why so many? by Ayende+Rahien · · Score: 1

    Is there a good reason for using so many servers, instead of several (very) high end boxes?
    Sure, it's cool that they use linux for such a hard task, but just think about the administration nightmares.
    It just doesn't make sense to me.
    Hell, just think about the *space* 8000 computers take.
    Replace those with a couple of E10Ks, and at least you get major savings on the rent.
    Not to mention you lose the risk of tripping over the miles of cables they have there.

    --
    Two witches watched two watches.
    Which witch watched which watch?
  92. Re:Electric bill by fors · · Score: 1

    If Californians had had their heads stuck in the sand and refused to keep their infrastructure up to demand they wouldn't have caused this problem for themselves.

    --
    "If there is nothing you are willing to die for, then you are not really alive." Myself
  93. Re:Electric bill by fors · · Score: 1

    hadn't had, sorry.

    --
    "If there is nothing you are willing to die for, then you are not really alive." Myself
  94. Re:Amazing by geggibus · · Score: 1

    It is a toy... and THE toy i want for christmas... ;)

    We are all children....

    /Geggibus ""

  95. Google rules! by sn0wdude · · Score: 1

    Let's not forget their very cool feature of saving websites. When I search on Google for something, they have this 'proxy' link beneath each hit.

    Click it and you get the page as indexed by Google. Pages (especially home pages) change very rapidly, so sometimes the links are already 404, but luckily just clicking the 'proxy/cache' link beneath still lets you view the page!

    --
    --sn0w
  96. Re:google modifications available by Macrobat · · Score: 1
    My understanding of the GPL (and of your question) is that you do not need to provide anything (source, binary, or documentation) for a product that you develop and use in-house. It only requires the open-sourcing of the code if you sell or distribute the final package.

    --
    "Hardly used" will not fetch you a better price for your brain.
  97. Humble pie by Macrobat · · Score: 1
    However, apparently, they do license their search engine software, so the question is a valid one.

    So much for me not opening my mouth without having all the facts.

    --
    "Hardly used" will not fetch you a better price for your brain.
  98. Backups? by number+one+duck · · Score: 1

    Why does this strike me as a backup nightmare? Where would you store that much information, and how would you get it back to its proper place? (Unless you had another googlebyte of storage hiding in the tech closet or something...)

    1. Re:Backups? by Tech187 · · Score: 1

      You know, you've got a point. Maybe next month when we shut down the Internet for its annual backup we should address it.

  99. But..how do they finance? by OpenSourced · · Score: 1
    I love Google, but I haven't seen a single ad. Where do they get the money from? I'm just curious. (I know it's off-topic.)

    --
    Rome taught me patience and assiduous application to detail. Virtues which temper the boldness of great, general views.
    1. Re:But..how do they finance? by OpenSourced · · Score: 1
      Aha! Thank you very much. Overall, a nicer-than-most financing scheme, IMHO.

      --
      Rome taught me patience and assiduous application to detail. Virtues which temper the boldness of great, general views.
    2. Re:But..how do they finance? by kinnunen · · Score: 3
      http://www.google.com/corporate/index.html (under business mode).

      Also, do a search for "porn". Ads.

  100. What do they use for system management? by mveloso · · Score: 1

    I don't think Red Hat has any system management stuff built in, except for an SNMP agent. What does Google use to make sure all these boxes are up, running, working, and healthy?
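    One common low-tech answer is a poller that sweeps the fleet and flags any box that stops answering. A minimal sketch, assuming a box counts as healthy if it accepts a TCP connection on some service port; the port, timeout and host names are hypothetical, and this says nothing about what Google actually runs:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


# A box is "up" if it accepts a TCP connection on `port` within `timeout`.
# Hypothetical liveness rule; real monitoring would check much more.
def is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host...
        return False


def poll(hosts: list[str], port: int, workers: int = 64) -> dict[str, bool]:
    # Check hosts in parallel so a few slow boxes don't stall the sweep.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda h: is_up(h, port), hosts)
        return dict(zip(hosts, results))
```

    Called as poll(list_of_hosts, port), it returns a host-to-bool map that a script could use to page someone or pull a box out of rotation; real tooling would also track disk usage, load and application-level responses.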

  101. BZZZZT.... Wrong by Waffle+Iron · · Score: 1
    There are countless millions of people using Google every day, each with their own energy sucking computer.

    Each one of these people saves lots of time using Google's lightning fast and accurate searches vs. other dog-slow search engines. They get their info fast then log off. Result: net energy savings.

  102. Re:Doesn't this seem wrong to anyone? by dhamsaic · · Score: 1

    touche :) however, it *is* a "slab" that's laid down... i'm not sure if "block" and "slab" can be used interchangeably... but yes, i hadn't thought of that before posting...

    --
    Every once in a while I like to masturbate a new word into my vocabulary, even if I don't know what it means.
  103. Re:Why not Windows 2000? by gupta · · Score: 1

    Is google nuts? right on my fellow MCSE !

  104. Re:Why not Windows 2000? by gupta · · Score: 1

    and, someone mentioned that google paid for about 50 copies of RH linux, which costs 50 x 50 = $2,500. but i need to give you credit because you raised a good point about not paying US tax anymore, 8-)

  105. Re:And this is good? by gupta · · Score: 1

    speaking of hotmail, a black hole. have not used it since 1997...

  106. Hmmmmmmm by glenebob · · Score: 1

    Ha, first an article about cloning, now one about server farms. Maybe they cloned their entire stock of farm servers? So now they have 4000 master servers, and 4000 slave servers. Trouble is, nobody can remember which are which.
    --
    Damn it Jim, that's my sphincter, not a jelly donut!!!

  107. Re:And this is good? by crealf · · Score: 1
    Why would 8,000 identical boxes be difficult to administer? The guys that develop the monitoring software and the install and upgrade processes are probably pretty smart cookies. But the actual maintenance of the machines could probably be handled by monkeys.

    Well they don't even need 8000 identical boxes. All they need is hard-drive access (just IDE) and network card access.

    Think about it: the instructions for handling a hardware failure in one of these machines are probably: [...]

    Or: instantly replace the machine with another random one. Put the system recovery CD in the drive. Let it reboot and try all the possible kernels/configurations until it boots successfully with IDE/network.

    At the same time, take the original defective machine apart and swap each of its parts in for the corresponding component of other spare, working machines. Throw away the part that doesn't work.

  108. Re:Crud.... by Tech187 · · Score: 1

    The 'heat' jokes used to all be about the Pentium (back in the Pentium 60 days those jokes were one of the ways for Apple customers to feel good in spite of the reaming they got every time they bought another pricey Apple box). It's nice to see AMD take the heat (pun intended) off Intel.

  109. Re:Ironic timing... by Tech187 · · Score: 1

    Thanks, VA Linux, for spreading the word on your sponsored/owned site about your competitors hardware.

    (umm, VA Stockholders- look over there! That dog looks funny! Funny dog! )

  110. awesome by univgeek · · Score: 1

    i always knew google was fast... but 8000 processors??? how many miles of cable do they use???

    --
    All bow to his Noodliness!! His Noodle Appendage has touched me!
  111. Re:im not really clear on.. by univgeek · · Score: 1

    Check out the fact that they need to see how the pages link to each other. Their PageRank system is based on this. As the number of pages in their index grows, the number of these cross-links grows > O(n)... I think it would be O(n^2), but I'm just guessing....

    --
    All bow to his Noodliness!! His Noodle Appendage has touched me!
  112. Re:Where can I get the google source? by univgeek · · Score: 1

    I think their search methods are proprietary... check out google tech for more info....

    --
    All bow to his Noodliness!! His Noodle Appendage has touched me!
  113. Re:Loadbalancing large websites by almaw · · Score: 1

    Zeus Load Balancer is an excellent product. It scales extremely well, has excellent backup and works transparently across practically any TCP/IP protocol (be it www, smtp, nntp, or whatever). I've heard nothing but good things. It can also handle distributed SSL processing. Very cool and easy-to-admin product.

  114. Q: web servers or number crunchers? by pastryp · · Score: 1

    Are all 8000 machines web servers, or do some also do the number crunching for Google's PageRank algorithm?

    If you want to know more about PageRank/web indexing, go to Monika Henzinger's (research director for Google) web page. http://www.henzinger.com/~monika
    Interesting stuff.

    8000 web servers boggles my mind. Also, they can't store an entire index on "just" 160 gigs. So do they do some extra work if a query spans multiple servers' subsets of the index, or if it gets sent to the wrong server? It'd also be interesting to know how much RAM they have in these suckers, since less RAM = more cache misses = more expensive disk accesses.
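    The usual way to handle an index too big for one box is scatter-gather: split the documents across shards, send every query to every shard, and merge the per-shard top-k lists. A toy sketch of that pattern, where the shard contents and the scoring (count of matching query terms) are assumptions for illustration, not Google's design:

```python
import heapq

# Document-partitioned index: each shard maps term -> ids of the documents
# stored on that shard. Toy data and scoring; purely illustrative.
Shard = dict[str, set[int]]


def search_shard(shard: Shard, terms: list[str], k: int) -> list[tuple[int, int]]:
    scores: dict[int, int] = {}
    for term in terms:
        for doc in shard.get(term, ()):
            scores[doc] = scores.get(doc, 0) + 1
    # Best k (score, doc_id) pairs on this shard, highest first.
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def search(shards: list[Shard], terms: list[str], k: int = 10) -> list[int]:
    partials = [search_shard(s, terms, k) for s in shards]            # scatter
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))  # gather
    return [doc for _, doc in merged]


shards = [
    {"linux": {1, 2}, "cluster": {2}},
    {"linux": {7}, "cluster": {7, 8}},
]
print(search(shards, ["linux", "cluster"], k=3))
```

    Because every document lives on exactly one shard and each shard returns its own top k, the global top k is always among the merged candidates, so the gather step loses nothing.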

    Google definitely needs tons of computing power to generate the numbers for PageRank (see volsung's post), though I don't know how much a distributed system helps. Plus, last time I heard, there weren't any algorithms that can incrementally update the PageRank info, so it had to be totally recomputed every week or so.

    btw does anyone know what altavista uses anymore? It used to be something like 10 alphas with gobs and gobs of ram and disk space.

  115. Re:Ironic timing... by jman_rackable · · Score: 1

    If anyone would like some real testimony about Rackable Systems, our products, our technical knowhow or otherwise, please contact us. We'll make our reference list available so that you can hear from actual Rackable customers. Not trying to sell anything, but rather to keep the facts straight! Thanks!

  116. Audio Interview with Google Chief Ops Dude by Anonymous Coward · · Score: 2
  117. Re:ROI on Linux by Anonymous Coward · · Score: 2

    Go look at who has the SPECweb top slots on 1, 2, 4 and 8 CPU boxes.

    What the fuck is a "multithreaded TCP/IP stack"? The IP stack runs in both process context and interrupt context, there are no threads there, and it'd be stupid to use them. Perhaps you mean "fine grained locking," but just don't know what you're talking about.

  118. Re:Electric bill by Mike+Hicks · · Score: 2

    I'm not sure how processor-intensive the Google software is, though.. Certainly, an S/390 has a lot of internal bandwidth, but I don't think it has the processing power of many PCs. If the searches are mostly just disk-intensive, it could work.. Of course, note that Google is using mostly IDE drives. Getting the same amount of SCSI storage would be, what? 3x the cost? Yeesh.

  119. Re:im not really clear on.. by volsung · · Score: 2
    Actually, the ranking system is equivalent to finding the principal eigenvector of a matrix with a billion rows and columns. Fortunately, there is a nice, iterative algorithm to do this. Each iteration performs a multiplication between a vector and a matrix, so it is at least n^2, and probably something like O(n^2 log n).

    For the curious: PageRank does not depend on your query; it is a global property of the link structure of the web. So Google does a normal keyword search and combines a keyword similarity value with the PageRank value, and sorts on this magic value.
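    The iterative algorithm mentioned above is power iteration: start from a uniform vector and repeatedly apply the link matrix until the ranks stop changing. A small sketch, assuming a damping factor of 0.85 (the value used in the original PageRank paper) and that pages with no outgoing links spread their rank evenly over all pages:

```python
# Power iteration for PageRank on a tiny link graph.
# ASSUMPTIONS: damping 0.85; dangling pages share their rank evenly;
# every page, including pure link targets, appears as a key in `links`.
def pagerank(links: dict[int, list[int]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[int, float]:
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over everyone.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each page passes an equal share of its rank along each outlink.
        for v in nodes:
            for w in links[v]:
                new[w] += damping * rank[v] / len(links[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


web = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

    Stored as adjacency lists like this, each iteration touches every link once, so its cost grows with the number of links rather than the square of the number of pages.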

  120. heck of a space heater by Wansu · · Score: 2

    I wonder how much it costs to get rid of all the heat billowing up from the farm. I imagine that place is popular in January.

    --
    Wansu, th' chinese sailor
  121. How many servers are indexing? by AxelBoldt · · Score: 2
    I'm unclear about how many of their 8000 boxes are indexing at any one time and how many are answering queries. Does anyone know?

  122. Re:A Real Reason They Can Get Away With That by rho · · Score: 2
    This is why such sites are usually powered by a moderate (typical site) to huge (Amazon, eBay) database with an enormous redundancy built in.

    Well, I'd think that eBay would split things up, as should Amazon, if they don't already.

    Sure, if the Computer section of eBay goes south the computer bidders are pissed, but it doesn't affect the Beanie Baby contingent.

    I think that the real reason that eBay/Amazon/Things 'N' Stuff aren't doing massive clustering (if, indeed they aren't) is that it takes quite a bit of planning and design to get something like that set up, and Amazon and eBay couldn't take the time. You have to be fast if you want to "build a brand"! Plus, to a greater or lesser extent, Google runs a single algorithm. Amazon runs a thousand of 'em, sometimes 4 or 5 a page.
    "Beware by whom you are called sane."

    --
    Potato chips are a by-yourself food.
  123. Re:Seen it by Dasein · · Score: 2

    Exodus also generally has a limit on how much power they'll pull into your cage because of heat-density concerns.

    I think there's a great Ask Slashdot lurking in here about how they built and manage this stuff.

    --
    You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
  124. Re:Damn, thats a lot of space by Jeffrey+Baker · · Score: 2

    I'm sure the density would be a lot better with the DL360, a 2xCPU SCSI machine in the same box as the 320. The Rackable stuff is 33% higher density than standard 1U machines, and the cabling is easier to manage.

  125. good citizens? by will · · Score: 2

    just scanning these posts, i can see that:

    * google uses redhat

    * they customise it extensively

    * they have arrived at workable solutions to problems of massive parallelism in several fields, eg load-balancing, tcp/ip optimisation, efficient segmentation of a huge database and the associated routing of queries, and presumably heat dissipation too.

    * in short, they have rolled their own into a system that even the /. beowulf fan club must admire

    * they make enough money to run 8000 pizza boxes and buy state of the art furniture by selling this combination of technologies to corporations who want to improve the efficiency of their knowledge workers.

    * they have contributed a total of, say, $3000 to redhat over the counter at Fry's.

    Now I'm not sure that counts as good oss citizenship.

    Overall i'm inclined to think that they're in credit just because google is so fscking good that it has replaced my bookmark file. I'd say that their public service, esp given the /linux branch and their flagship role, is enough to outweigh the fact that rather than returning _any_ of their code to the community they sell it privately to the worst kind of suit. I haven't even seen an educational or non-profit version (but i'd love to be corrected).

    It's hard to call, especially as i am a user of rather than contributor to linux and therefore benefit without being made use of, so i'm surprised not to see it being debated here. Just _using_ linux really doesn't deserve accolades any more. As they say in the article, it's an economic and practical decision, not an ideological one.

  126. Damn, thats a lot of space by Drakino · · Score: 2

    Even if they used the DL320 from Compaq (A 1U, 1 proc IDE server) or similar, they would still fill just a bit over 190 racks.
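    The rack count is a ceiling division. A sketch, assuming 42U racks filled entirely with 1U servers (no room left for switches or power gear), plus a second case roughly 33% denser, in line with the Rackable figure mentioned in another post:

```python
import math


# Racks needed for a fleet of servers.
# ASSUMPTION: 42U racks holding nothing but 1U servers.
def racks_needed(servers: int, servers_per_rack: int = 42) -> int:
    return math.ceil(servers / servers_per_rack)


print(racks_needed(8000))      # 191: plain 1U boxes, 42 per rack
print(racks_needed(8000, 56))  # 143: a layout ~33% denser than plain 1U
```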

    And I thought some of the SAN setups here looked impressive.

  127. Re:Seen it by binarybits · · Score: 2

    Um, 6*80=240, which is only 3% of 8000. That would seem inconsistent with the claim that Exodus was one of three coloc locations for 8000 servers.

  128. colocation ??!! by Brigadier · · Score: 2

    I've always understood that you place half your servers on the West Coast and half on the East. Should there be a net split (e.g. a construction worker who didn't call before he dug), you won't suffer the consequences. With all their servers in DC, how will they prepare for this?

  129. Re: Multithreaded TCP/IP stack by kinkie · · Score: 2

    As I said, I had forgotten something (mostly for the sake of simplicity). My point was: TCP is not simple, parallelizing it is not pointless, and not everybody does it. For instance, AFAIK FreeBSD has one of the most efficient TCP/IP stacks around, but it is not completely deserialized, and thus doesn't scale as well as it could on MP systems.

    About serializing: sure. But you can also tell that to the Java guys (in Java-ese, "serializing" means "transforming an object's internal state into a bytestream that can be transferred over the network to some peer where, given the object's class code and the serialized data, an identical instance of the object can be created").

    --
    /kinkie
  130. Re:Electric bill by the+eric+conspiracy · · Score: 2

    New York had the highest energy costs in the nation.

    New York State is not at all homogeneous. NY City and Long Island have horrendously high rates, while central and western NY are quite low.

    There is always a political tug of war regarding distribution of cheap hydro power from the St. Lawrence to the rest of the state, but you could always count on upstate being relatively cheap.

    When I moved from upstate NY to NJ my power rates tripled.

  131. Re:A Real Reason They Can Get Away With That by JohnZed · · Score: 2

    That's not necessarily the case. Even though they're using (reasonably cheap) IDE drives, they can still RAID-1 mirror them to prevent loss of data from hard drive failures. They would, however, have to suffer a half hour of downtime to replace the blown disk, but, despite what e-commerce consultants tell you ("if JimsGardenHoseEmporium doesn't get 5-nines availability, it'll lose all its customers!"), most applications could afford a 30-minute period of inaccessibility for 1% of their data at a time. The hard thing is designing a resilient app that can operate well if a portion of its storage just disappears and then reappears sometime later.
    --JRZ

  132. Re:Electric bill by Brento · · Score: 2

    I wonder which gives them the higher electric bill, the servers themselves or the air conditioning required to cool them?

    You know, you raise a funny point. When relocating our company, we looked at the cost of bandwidth and electricity, knowing that it was a cost of doing business. But when you've got 8,000 servers, electricity has to become a huge issue in picking your location. You almost want to move farther north, just to cut your air-conditioning bills.

    --
    What's your damage, Heather?
  133. Newsflash: Full Usenet archive available by harmonica · · Score: 2

    As of today, Google makes its complete set of Usenet messages available (going back to 1995, over a terabyte of data).

  134. Linux kernel patch by harmonica · · Score: 2

    In that MP3 stream linked somewhere in this thread, it is mentioned how Alan Cox helped solve a problem with the kernel. So Google gives feedback and they profit from the open development model used for Linux (in this case from a patch that Alan provided within 4 hrs ;-)).

  135. Re:A Real Reason They Can Get Away With That by Restil · · Score: 2

    No need to load anything from a "master DB"; they stated in the article that there are several hundred copies of the index. That means that if any one server goes out, there are still several hundred servers serving the same data. The point is, if an e-commerce site WERE set up like this, it would still be perfectly functional. However, that would be quite an impressive setup for an e-commerce site.
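    A minimal sketch of serving from many identical index copies, assuming simple round-robin selection that skips servers marked down. `ReplicaPool` and the server names are hypothetical; the article does not say how Google actually routes queries.

```python
import itertools


class ReplicaPool:
    """Hypothetical pool of servers that each hold a full copy of the same index."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def pick(self):
        # Round-robin, skipping anything marked down; give up after one full lap.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server not in self.down:
                return server
        raise RuntimeError("all replicas are down")


pool = ReplicaPool(["idx1", "idx2", "idx3"])
pool.mark_down("idx2")
print([pool.pick() for _ in range(4)])  # ['idx1', 'idx3', 'idx1', 'idx3']
```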

    -Restil

    --
    Play with my webcams and lights here
  136. Re:And this is good? by segmond · · Score: 2

    If you hadn't put in your disclaimer, I would have marked you as a troll. Those 8000 boxes are administered automatically via monitoring software. I don't know what they use, but there are programs to do that. Also, Google doesn't go in and maintain those boxes every day; perhaps once a month or once every two months, they pull out all the boxes that are down or giving trouble and replace them with bare boxes. All they have to do is tell the new box which index range to pull down and store; I bet everything is very automated. Anyway, for what Google is doing, you have to consider where they are coming from: they need I/O! Aren't you impressed when you search Google and get a reply in 0.01 seconds? I am! And please don't compare it with Hotmail; Google has never been down! Hotmail, on the other hand, ahem, ahem...

    --
    ------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
  137. Re:Kudos to Google by kettch · · Score: 2

    AND they managed the upgrade without interrupting service. That is one of the benefits of using many individual smallish servers instead of a few large ones: you don't get stuff like this, which was on Yahoo today: "Whoops! We cannot process that request. We are presently performing system upgrades. During this time, some areas of the site may be unreachable." Yeah, I know that Yahoo uses Google for their searches, but they don't use it for the other services on their site.
    ----------------------

    --
    Opportunities multiply as they are seized. --Sun-Tzu
  138. it's still not as 31337... by MustardMan · · Score: 2

    As my 3 Node IBM PS/2 Beowulf cluster

    1. Re:it's still not as 31337... by MustardMan · · Score: 2

      I actually didn't even design the shirts; it was a design from the "open source" T-shirt site geekshirts.sourceforge.net. I created the CafePress site mainly because my friends and I wanted one, and decided to leave it up afterwards. If you have a complaint about the origins of the quote, I'd suggest contacting the designer who submitted it to geekshirts.sourceforge.net.

    2. Re:it's still not as 31337... by MustardMan · · Score: 2

      It's irony; I was playing off the long-time Slashdot "I wish I had a Beowulf cluster of these" tradition.

      Although I agree it's not terribly witty, I found it slightly amusing.

    3. Re:it's still not as 31337... by DonkPunch · · Score: 3

      Interesting slogan on those shirts.

      http://www.elj.com/elj-quotes/elj-quotes-1999.html

      --

      Save the whales. Feed the hungry. Free the mallocs.
  139. Re:im not really clear on.. by MustardMan · · Score: 2

    Or it could be the fact that they are serving up bazillions of pages a day, each involving searches through a petabyte database. Google's code is insanely good; they just happen to be one of the most heavily accessed sites on the internet, performing a very computationally intense operation (database searches).

  140. Re:Have you ever noticed.. by MustardMan · · Score: 2

    For the cost of re-outfitting those machines with SCSI, you could probably add another 8000 servers

  141. Re:Where can I get the google source? by mahmut_kursun · · Score: 2

    There is a GPL project that can be found at http://www.aspseek.org

    It is a deep crawler that works well. I compiled the current stable version under SuSE 7.0 and got it running together with MySQL.

    ASPseek is not Google, but I would say that it imitates Google a little bit. You can give it a try. I guess you do not need 4 PCs: crawling and searching on my Celeron 333 server with 160 MB RAM and an IDE disk did not stress the machine. I don't know what happens if you have lots of pages.

    The ASPseek people say that their baby has indexed 4 million pages.

  142. Re:Crud.... by jtdubs · · Score: 2

    As it's theorized that there aren't even 10^100 atoms in the universe, or electrons for that matter (obviously), we're going to have to REALLY shrink our die sizes down to get there...

    The New Pentium XXI running on a -.0001mu core.

    Justin Dubs

  143. Re:Kudos to Google by dimator · · Score: 2

    All true, but are they really making money? I rarely see an ad there (not a banner ad, mind you, but their own form of search-related targeted ads). So are they still running off of VC money, or do the few ads I see cover the bills?

    I really like to hear that companies that do so much for so little are doing well, such as Google or Trolltech. I just worry about their actual business and the talented developers they employ...

    I guess they're doing OK if they added 4000 machines...


    --
    python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
  144. Locking into a OS by duplicate-nickname · · Score: 2
    "Also, by choosing Linux, Google avoids locking itself into a single vendor for hardware or operating system, Quandt said. "

    Umm... if they're running 8000 copies of Red Hat that they've customized to boost performance and improve security, I would say they're locked into an operating system fairly well.

    ÕÕ


    1. Re:Locking into a OS by ethereal · · Score: 5

      Totally not the case - they've made their OS what they want, and they can change it if they want to. Don't confuse the cost of rolling out changes to 8000 machines with the cost of forcing a proprietary OS vendor to make the changes you need - you can roll out 8000 machines on a rolling basis in a week, assuming a conservative 1 hour automatic install 80 at a time (1% unavailability). You may never be able to get Sun or Microsoft to make the changes you need in an OS, if it isn't in their best interest to do so. Google's only "locked in" to RH in the sense that they can only achieve sufficient flexibility with an open source OS, and it sounds like they just went with RH because it's easier to hire admins. I bet they could run on any other flavor of Linux pretty easily, and *BSD without too much pain if they had to.
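      The arithmetic behind "in a week" is easy to check. This sketch just plugs in the numbers stated above (8000 machines, 80 at a time, 1 hour per batch); `rolling_upgrade_hours` is a hypothetical helper.

```python
import math


def rolling_upgrade_hours(machines: int, batch_size: int, hours_per_batch: float) -> float:
    # Upgrade batch_size machines at a time; each batch takes hours_per_batch.
    batches = math.ceil(machines / batch_size)
    return batches * hours_per_batch


hours = rolling_upgrade_hours(8000, 80, 1.0)
print(hours, hours / 24)  # 100.0 4.166666666666667  (hours, days)
print(80 / 8000)          # 0.01, i.e. 1% of the farm offline at any moment
```

      So 100 one-hour batches take a little over four days, comfortably inside a week.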

      Moderators, the above was only insightful if you don't care to think very hard...

      Caution: contents may be quarrelsome and meticulous!

      --

      Your right to not believe: Americans United for Separation of Church and

  145. Re:Electric bill by donutello · · Score: 2

    California may do an excellent job of conservation of electricity, but where they are failing is in the production. California is a net importer of electricity.

    Anyhow, my rant was not about how much electricity is consumed but about who has to pay for it. The current crisis is because California doesn't produce as much electricity as it uses. It's ridiculous that consumers in the Pacific Northwest should pay much more for electricity than Californians do!

    The federal energy commission actually forced Northwest energy producers to ship electricity to California at prices much lower than current market value leaving the local utilities forced to buy power on the open market at much higher prices. And the whole time our rates sky-rocket while Californians pay the same amount they always used to.

    My electricity bill is, in effect, the cost of my electricity plus the cost of some Californian's electricity. I at least hope they send me a picture of the Californian I'm supporting so I can feel proud to bring light into the life of some rich soul.

    PS: I used to live in California before I moved here - I don't hate Californians - I just resent having to pay for them screwing up while they don't.

    PPS: I posted the previous comment at a score of 1. Three people moderated it up and one moderated it back down as Offtopic, but due to the wonders of the Slashdot karma system, my karma went down one point as a result.

    --
    Mmmm.. Donuts
  146. Re:Electric bill by donutello · · Score: 2

    No, Californians don't pay high energy costs. Their laws shield the utilities from passing on the costs to consumers. So guess who gets to pay for California's dumb laws? We in the Pacific Northwest do! Our power rates have gone up more than 100% (that's DOUBLE) in the past few months, and just the other day there was an article about Californians whining that a new bill would allow their power rates to grow by up to (from memory) 45% over the next 5 years!

    Washington actually produces power. And this year we're being forced to pump more water under drought conditions (which means we'll hurt our salmon runs and have water shortages in the summer) and to send that power to California at a low price while we have to buy our power on the open market.
    </rant>

    --
    Mmmm.. Donuts
  147. 8000 search servers? Northern Light has 8! by eldurbarn · · Score: 2
    Dammitall, folks... Sure, Linux is cool, but Northern Light (http://www.northernlight.com) does the meat of all its searches with just eight (8) Alpha boxes running VMS!

    Its search technology is different (its classification system is more like that of libraries) and its target audience is mostly the high-end business folks, but the actual box-to-box comparison makes me oogle: one alpha = 1000 linux boxen.

    --
    -Eldurbarn
  148. Based on what? by Galvatron · · Score: 2
    Unless you've got some information you're not talking about, you just pulled that number out of your ass. Admit it, you have no idea how much work they did on Linux, how many additional positions were needed to do that work, or how much work they would have had to do on Win NT 4.0 (Google's been around longer than Win2k, and maybe we ought to add in the upgrade costs to go from NT to 2000? Another, say, half mil you think? And by "work" of course, I mean writing custom programs, because obviously they wouldn't have had the ability to directly hack the OS, so even a simple change would probably be more difficult to do).

    Anyway, my point is, there's an awful lot stacked against NT right from the get-go. Not to say it wouldn't have been cheaper, I have very little experience running 8000 box server farms. I imagine you have equally little. So, Windows may have been cheaper, but you have to assume that they spent more than $1-2 million customizing Linux to even bring it up to the level of NT. Possible, but I'm inclined to think not.

    The only "intuitive" interface is the nipple. After that, it's all learned.

    --
    "The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
  149. Does anyone know how much all that cost? by crashnbur · · Score: 2

    Just curious. I won't be able to afford anything like 8000 Linux boxes for another twenty-five years...

  150. Re:Doesn't this seem wrong to anyone? by tylerh · · Score: 2

    Not to me

    When you can get your basic component matched to your job, using a whole heck of a lot of identical pieces often SAVES money. The trick is that those pieces must (1) be cheap to mass-produce and (2) do their job without any "supervision," like the bricks in my house. Google appears to have done exactly that with their automated/remote management customizations.

    Just think: my house uses over 8,000 identical bricks. The phone system relies on hundreds of thousands of fiber optic strands. And don't even think about CPUs: all those identical transistors!

    --
    "one treats others with courtesy not because they are gentlemen or gentlewomen, but because you are" --G. Henrichs
  151. Re:Where does Google get their money? by Yoje · · Score: 2
    The Google site features minimal advertising. So they are most likely funded with VC money. This means that they must have a plan for making money at some point. What is it and when will it kick in?

    No, they're making SOME money. Several companies use Google's massive web index for their own sites, paying Google a license fee. That alone gives them a good chunk of revenue. And of course, those "little ads" help out some, too. :-)

  152. Gotta love Google by woody_jay · · Score: 2

    Ever since about a week after Google first opened its doors (so to speak), I have used it instead of Yahoo and never looked back. Later down the road, Yahoo switched too. They have done an amazing thing for search engines. The web page is simple, and searches are fast. Now they take care of Deja's newsgroups too. End result: everything Google takes over does great. Maybe they should take over M$? Anyway, I am sure that with their consistent growth they are going to eventually turn it up to 16,000. You have to love seeing stuff like this.

    --
    Of course, that's just my opinion, I could be wrong.
  153. Where can I get the google source? by green+pizza · · Score: 2

    I assume Google's search engine software is opensource? Where can I download a copy? I would like to use it on about 4 Linux PCs to index and search my company's web-based intranet.

  154. Re:Doesn't this seem wrong to anyone? by duffbeer703 · · Score: 2

    Does anyone mention the outrageous electric bill that they pay?

    Abandon your home and electrical stuff and go do the world a favor and go live in a clapboard shack in the woods somewhere. Take your green left-wing horseshit somewhere and enjoy living without the 'evils' of modern society.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  155. Re:Why not Windows 2000? by duffbeer703 · · Score: 2

    Not really; licensing vast quantities of Microsoft OSes, especially in high-profile environments, leads to very significant discounts.

    The large government agency that I work for paid approx. $3M for about 25,000 Windows 2000 server and workstation licenses.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  156. Re:Electric bill by agentZ · · Score: 2

    But the idea is that you wouldn't need power cooling, just fans to blow outside air through the racks.

  157. I don't think we can slashdot Google by Water+Paradox · · Score: 2

    Not fair. Google can archive slashdot, but not even slashdot can slashdot Google. Google not slashdotted.

    --
    information is immaterial
  158. Re:The power drain is staggering! by samill · · Score: 2

    The company is moving out of datacenters in the San Francisco Bay and Washington D.C. areas, and consolidating in a new facility in the D.C. area. That means Google is moving from five to four datacenters--this, after adding three datacenters in the past year or so.
    Looks like they've moved to DC. I guess the "crunch" is coming from somewhere else.

  159. Re:Crud.... by V50 · · Score: 2

    If Google ever achieves a googol servers, though, we'd better hope they don't use Athlons, or the world might get toasted...


    --Volrath50

  160. Answer: reliability. by TDScott · · Score: 2

    If one server out of 8000 goes down, no problem. If one server out of 20 goes down on a site that size...
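    To put numbers on it, assuming load is spread evenly across identical servers (the 20-server site is the hypothetical from the comment, and `capacity_left` is an illustrative helper):

```python
def capacity_left(total: int, failed: int) -> float:
    # Fraction of serving capacity remaining when `failed` identical servers are down.
    return (total - failed) / total


print(capacity_left(8000, 1))  # 0.999875  (lose 0.0125% of capacity)
print(capacity_left(20, 1))    # 0.95      (lose 5% of capacity)
```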

  161. Doesn't this seem wrong to anyone? by repetty · · Score: 2

    Doesn't utilizing 8,000 computers to accomplish something suggest to anyone (besides me) that they're doing something wrong?

    Okay, I know indexing is a hardware-intensive task, as is responding to user requests, however...

    If I bragged that I had installed 8,000 D-cell flashlight batteries to meet the electrical needs of my house, wouldn't someone suggest that I need to reconsider using flashlight batteries?

    Has Google bragged about how much electricity they are consuming to run 8,000 electrical heaters? Have they boasted about how much pollution their power consumption generates?

    I think we should consider this a bit more critically.

    --Richard

    1. Re:Doesn't this seem wrong to anyone? by CKW · · Score: 2
      Doesn't utilizing 8,000 computers to accomplish something suggest to anyone (besides me) that they're doing something wrong?

      Either that or they're accomplishing something unbelievably useful with dazzling speed. Lots of useful things take a lot of effort, and Henry Ford's mass-production lines rule for a reason. (BTW: Did he ever get a patent on that?)

      If I bragged that I had installed 8,000 D-cell flashlight batteries .. wouldn't someone suggest that I need to reconsider using flashlight batteries?

      But what if it was cheaper than all the alternatives. And I mean *way* cheaper? Don't doubt for a minute that everyone and his dog wouldn't do it.

      I'm not happy with how we as a society have managed our pollution and our environmental record. Environmental damage is relatively unaccounted for in how we do things. However, for non-toxic, semi-normal things, there must be some relationship between the cost of doing something and how much pollution is created in the process. Therefore, if you figure out how to do it cheaper, you are likely generating less pollution.

    2. Re:Doesn't this seem wrong to anyone? by gupta · · Score: 2

      I think you drew the wrong analogy. It is not about 8,000 D-cell flashlight batteries meeting the electrical needs of one house; rather, it is about 8,000 lamps lighting up part of downtown Austin. We are talking about load balancing here.

    3. Re:Doesn't this seem wrong to anyone? by dhamsaic · · Score: 5
      Has Google bragged about how much electricity they are consuming to run 8,000 electrical heaters? Have they boasted about how much pollution their power consumption generates? - they haven't bragged about *anything* - an article was simply written by an outside source which gave some details of their setup. They also note that they have hundreds of copies of the index, so that the redundancy is there - if one server goes down, another hops back up. Google *is* a business, and they need to be reliable. They're out to a) provide a useful service and b) make money. It's not useful if you can't get to it.

      They're using 8,000 computers to accomplish a pretty amazing feat, and they're doing this instead of buying a pretty huge farm of larger and faster computers anyway. Sometimes more smaller parts are better - you don't have one big machine that fails, separate parts are replaceable (say 10 or 20 machines instead of a few larger servers).

      You don't build a house starting with a large block of concrete - you use bricks. Google is doing the same thing. Cut them some slack.

      --
      Every once in a while I like to masturbate a new word into my vocabulary, even if I don't know what it means.
  162. Have you ever noticed.. by amirboy2 · · Score: 2

    Have you ever noticed that accessing google's cache is slower than accessing normal google servers?

    --

    I like meat helmets.
  163. Re:Kudos to Google by CKW · · Score: 2

    and on that note, let me say that if they ever do whore themselves out, I hope they make a SHITLOAD of money..

    ..because they DESERVE IT!!

    If they figure out how to do that without slowing their site down or filling our screen with slow big ads, cool. But if not, I won't hold it against them, simply because they've given me such a spectacular level of service up until now.

    Seriously, the best three things on the net, that made me agog and say to myself "this is the way things should be", are:

    1. Google

    2. Napster

    3. Counter Strike (substitute your favorite online multiplayer game here)

  164. Re:Wait, I have the Answer by dhamsaic · · Score: 2

    Although it's a cute idea, you can't get more power out than you put in. We'd need some sort of fusion or something to sort of amplify the power output, but even so, it's highly unlikely something like this could ever work. Oh well...

    --
    Every once in a while I like to masturbate a new word into my vocabulary, even if I don't know what it means.
  165. Re:"Google downloads Red Hat for free" by _cnn_ · · Score: 2

    Nope. They get more (reputation) than they would have gotten from 8000*$50.

  166. Re:Interesting points by math+nazi · · Score: 2

    - The number of websites is increasing exponentially, so the number of computers or required CPU cycles is increasing exponentially. On the other hand, the price per CPU MHz also decreases exponentially (Moore's law???). That is the key to scalability. At least the problem is not exponential.

    That depends on the actual size of the exponents:
    Say total website size grows with time t like exp(a*t) and CPU computing power like exp(b*t); then the number of CPUs has to grow like exp(a*t)/exp(b*t) = exp((a-b)*t) to keep up.
    So if a>b (i.e. websites grow faster than computer power), you still have an exponential problem, just with a smaller exponent (a-b). Only for a<=b do you win.
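    A quick numeric check of the argument above. The growth rates a and b passed in below are made up purely for illustration, and `machines_needed` is a hypothetical helper.

```python
import math


def machines_needed(t: float, a: float, b: float, m0: float = 1.0) -> float:
    # Web size grows like exp(a*t), per-machine power like exp(b*t);
    # the machine count needed scales like their ratio, exp((a - b) * t).
    return m0 * math.exp(a * t) / math.exp(b * t)


print(machines_needed(10, 0.7, 0.5))  # roughly 7.39, i.e. e**2: a > b, still growing
print(machines_needed(10, 0.5, 0.5))  # 1.0: a == b, flat
print(machines_needed(10, 0.3, 0.5))  # below 1: a < b, you win
```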

    --
    André Kostolany: 2+2 = 5 (minus 1) for t>>0
  167. Pictures! by Anonymous Coward · · Score: 3

    I want to see pictures.

  168. Re:Kudos to Google by Anonymous Coward · · Score: 3

    and without whoring themselves

    I have to say it's so nice not having a giant animated "Punch the monkey for $20" at the top of the screen. With Google, you actually have to look for the ads to see if there are any. It would be nice if a few other major sites learned something from this. What would that lesson be? Giant flashing ads only annoy people and do not bring in new customers.

  169. Re:Why? by Bill+Currie · · Score: 3
    IMO, it's not the CPU power they're after (though it doesn't hurt), it's the I/O bandwidth. Think of it as a giant RAID array. Assuming each of the 8000 systems can pull 20MB/s off its disks, that's 160000MB/s (or 156.25GB/s) total bandwidth (ignoring overheads).
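    The back-of-the-envelope figure above, as code. It assumes one 20MB/s disk stream per server and, like the comment, ignores every overhead; `aggregate_bandwidth_mb_s` is an illustrative name.

```python
def aggregate_bandwidth_mb_s(servers: int, per_disk_mb_s: float) -> float:
    # Ideal aggregate read rate if every server streams from its own disk in parallel
    # (no network, seek or protocol overheads).
    return servers * per_disk_mb_s


total = aggregate_bandwidth_mb_s(8000, 20)
print(total, total / 1024)  # 160000 156.25  (MB/s, GB/s)
```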

    Bill - aka taniwha
    --
    Leave others their otherness. -- Aratak

  170. google modifications available by Brigadier · · Score: 3



    I'm curious whether the optimizations made by Google are readily available to the public, i.e., released under the GNU GPL.

  171. "Google downloads Red Hat for free" by cpeterso · · Score: 3


    "Google downloads Red Hat for free, taking advantage of the company's open source distribution. And Linux's open source nature allowed Google to make extensive modifications to the OS to meet its own needs, for remote management, security and to boost performance."

    I'm sure Red Hat is upset that they are missing out on the sale of 8000+ Linux licenses!! :-) Maybe they should block downloads from the *.google.com domain.

    1. Re:"Google downloads Red Hat for free" by shyster · · Score: 5
      I'm sure Red Hat is upset that they are missing out on the sale of 8000+ Linux licenses!! :-) Maybe they should block downloads from the *.google.com domain

      I imagine they only download it once, then distribute via LAN. Besides, from last year's coverage, "Google actually paid for only about 50 copies of Red Hat, and those purchases were more of a goodwill gesture. "I feel like I should be nice, so when I go to Fry's I pick up a copy," Brin said."

  172. Re:Electric bill by the+eric+conspiracy · · Score: 3

    Buffalo, NY would have to be the ideal location for this. Cold as hell, and right next to the Niagara hydro plant for cheap power.

  173. Re:Kudos to Google by Chewie · · Score: 3

    Well, Google has recently added paid links near the top of searches (but, thankfully, they've taken pains to identify them as such). Also, they make a metric buttwad of money licensing out their search engine to other sites (Yahoo!(TM) anyone?).

    --
    49 20 68 61 76 65 20 74 6F 6F 20 6D 75 63 68 20 66 72 65 65 20 74 69 6D 65 2E
  174. Re:What about hardware maintenance by crimoid · · Score: 3

    With all those machines you could just pull the dead ones out of service and leave them there until you wanted to do periodic maintenance (at which time you simply yank out the dead ones, replace them, flip on the power switch and walk away). Assuming you've got some clever auto-assimilation software you may not even need to configure the box manually.
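    A sketch of that pull-now, fix-later workflow, assuming some health check exists. `is_healthy` is a hypothetical callback and `Farm` is illustrative bookkeeping, not Google's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Farm:
    # Which machines serve traffic, and which wait for the next repair visit.
    in_service: set = field(default_factory=set)
    awaiting_repair: set = field(default_factory=set)

    def sweep(self, is_healthy):
        # Pull every machine that fails its health check out of rotation; nobody is paged.
        dead = {m for m in self.in_service if not is_healthy(m)}
        self.in_service -= dead
        self.awaiting_repair |= dead
        return dead

    def maintenance_visit(self):
        # Swap hardware on everything collected so far, then put it back in rotation.
        repaired = set(self.awaiting_repair)
        self.awaiting_repair.clear()
        self.in_service |= repaired
        return repaired
```

    Usage: run `sweep` continuously from a monitoring loop, and `maintenance_visit` whenever someone actually walks into the datacenter.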

  175. Really doubled or part of a cost cutting move? by jonathanclark · · Score: 3

    As part of the infrastructure expansion, Google is consolidating. The company is moving out of datacenters in the San Francisco Bay and Washington D.C. areas, and consolidating in a new facility in the D.C. area. That means Google is moving from five to four datacenters--this, after adding three datacenters in the past year or so.

    I wonder if they really need that many servers, or if they doubled their size in order to have a seamless transition during the move? I.e., get the new site up and running and handling load, then take down the old site? Maybe they will sell off the old computers instead of moving them. This could just be PR spin to say "we doubled our size." Just devil's-advocate conjecture, but they are probably moving to DC from SF to save money on space, so this is more of a cost-cutting thing than anything else.

    Don't get me wrong, I love Google and use it every day, but I don't see any reason they would suddenly double their capacity.

  176. MG (Managing Gigabytes) by harmonica · · Score: 3
  177. Re:im not really clear on.. by MustardMan · · Score: 3

    A bit of a correction to my own point, it's not a petabyte database, that petabyte of storage contains several hundred copies of the database. It's still a friggin LOT of data.

  178. Re:Why not Windows 2000? by eric17 · · Score: 3

    Well, $120 per license is a pretty good deal. Maybe the government should get the same deal for us citizens. For 150 million copies, the discount should be down to say, $100 a copy. That's only $15 billion, just a drop in the bucket for rich old uncle sam, and just a bit more than half of M$'s yearly revenues, so it won't hurt them either, but OMG--think of the savings!
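    Checking the figures in this exchange: the $120 per license follows from the parent comment's approximate $3M for 25,000 licenses, and the $15 billion from the hypothetical 150 million copies at $100 each.

```python
agency_total = 3_000_000    # approx. $3M, from the parent comment
agency_licenses = 25_000
per_license = agency_total / agency_licenses
print(per_license)          # 120.0

citizens = 150_000_000      # the hypothetical "150 million copies"
print(citizens * 100)       # 15000000000, i.e. $15 billion at $100 a copy
```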

  179. Compression by dopolon · · Score: 3

    They actually use some compression algorithm (gzip, I think) to compress the pages in the cache, because it would be silly to keep a complete uncompressed mirror of the cache, since it's a feature that's probably used by only 20% of users.
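    A minimal sketch of compressing cache entries with Python's standard `gzip` module. Whether Google actually uses gzip is the commenter's guess; the helper names and the page text here are invented.

```python
import gzip


def store_cached_page(html: str) -> bytes:
    # Compress a page before it goes into the cache.
    return gzip.compress(html.encode("utf-8"))


def load_cached_page(blob: bytes) -> str:
    # Decompress only when a user actually asks for the cached copy.
    return gzip.decompress(blob).decode("utf-8")


page = "<html><body>" + "Google doubles server farm. " * 200 + "</body></html>"
blob = store_cached_page(page)
print(len(page), len(blob))  # repetitive HTML shrinks dramatically
```

    The trade-off matches the comment: every stored page is smaller, and only the minority of requests that hit the cache pay the decompression cost.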

    --
    "The obvious mathematical breakthrough would be development of an easy way to factor large prime numbers." Bill Gates,
  180. Re:im not really clear on.. by turbodog42 · · Score: 3

    Well, when was the last time you searched on Google? It has a stunning amount of servers indexed. I can search for just about anything, and Google always finds more accurate hits, faster, than any other search engine. (Don't turn this into a search engine flame war, either.) They have to constantly refresh their indexes, and they have to turn around fast answers.

    Yeah, 1.3 billion pages indexed is stunning. But even more stunning is the fact that the total number of "pages" (an overly broad term, I concede) on the Internet is at least 100, if not 500, times that size. Basically, Google is behind on indexing by 2 to 3 orders of magnitude.
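    The "2 to 3 orders of magnitude" comes from taking log10 of the 100x and 500x factors. A small check using the figures above (`coverage_gap` is a hypothetical helper):

```python
import math


def coverage_gap(indexed_pages: float, factor: float):
    # Estimated total pages, and how many orders of magnitude the index is behind.
    return indexed_pages * factor, math.log10(factor)


for factor in (100, 500):
    total, gap = coverage_gap(1.3e9, factor)
    print(f"{total:.3g} pages, gap of {gap:.2f} orders of magnitude")
```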

    It's true that they constantly refresh their index. But it takes them about 2 months to do it. That ain't fast no matter how you look at it. As evidence, take a look at the date on the cached CNN.com home page.

  181. Electric bill by HerrGlock · · Score: 3

    I wonder which gives them the higher electric bill, the servers themselves or the air conditioning required to cool them?

    I'd just give up and get a handful of S/390s and do the same thing.

    DanH
    Cav Pilot's Reference Page

    --
    UNIX - Not just for Vestal Virgins anymore
  182. And this is good? by update() · · Score: 3
    Disclaimer: I don't know anything about enterprise-scale IT. If I'm saying something ridiculous, let me know!

    That said, I'm surprised by the positive slant on this story. 8000 boxes that have to be separately administered? This is cost-effective (and environmentally sound) compared to a small number of heavy-hitter Solaris, AIX or Tru64 systems? I have to say I was a lot more impressed by hearing what cdrom.com does with a single FreeBSD system than by how many Linux boxes Google has had to cobble together.

    I've got to wonder - if this were a story about 8000 W2K servers powering Hotmail, would it get the same spin?

    Unsettling MOTD at my ISP.

    1. Re:And this is good? by bellings · · Score: 5
      8000 boxes that have to be separately administered?

      Why would 8,000 identical boxes be difficult to administer? The guys who develop the monitoring software and the install and upgrade processes are probably pretty smart cookies. But the actual maintenance of the machines could probably be handled by monkeys.

      Think about it: the instructions for handling a hardware failure in one of these machines are probably:
      1. Identify bad part
      2. Replace bad part with any of the two dozen exactly identical parts we keep in the spare parts closet.
      3. Put system recovery CD in drive.
      4. Reboot.
      5. Remove the system recovery CD when it automatically ejects at the end of the recovery process.
      6. If this doesn't work, call our system engineer, at 555-1212
      The spare parts closet probably just has boxes with labels like: "This box contains 80GB Maxtor hard drives -- exact match for every hard drive in rack 5, 7, and 8." Another box might be labeled: "AMI A571 motherboards -- exact match for all motherboards in rack 1, 2, 3, 4, and 7."

      Another box in the closet is probably labeled "Empty, pre-labeled Fed-Ex shipping boxes that are exactly the right size for our rack mounted hardware. Use to ship any badly broken machines back to our system engineer. Call first!"
      --
      Slashdot is jumping the shark. I'm just driving the boat.
  183. Why? by rabtech · · Score: 3

    Why bother to put together 8,000 Linux boxes, when one could obtain high-powered 64-bit computers to accomplish the same task?

    You can always go with Tru64, W2K Datacenter, AIX, et al.

    It would be interesting to figure out how much high-powered hardware would be required to replace those 8,000 boxen and the software to run it, and see if it comes out less or more than running the 8k separate Linux boxes.
    -------
    -- russ

    "You want people to think logically? ACK! Turn in your UID, you traitor!"

    --
    Natural != (nontoxic || beneficial)
    1. Re:Why? by Chewie · · Score: 5

      Several points here: W2K DC doesn't run 64-bit, at least not until Itanium is released. Second, for something like this, there are two reasons to do a large server farm: scalability and throughput. They said that they do not have one monolithic storage system, but instead partition the database up into small segments in the servers themselves. This means that they can handle many more I/Os per second than one (or several) big iron boxes could do. Also, those big 64-bit boxes are damn expensive (both hardware and software). For the price of one of those, you can get cheap servers and cluster them together. The big iron boxes are great for large databases that can't be split up among several servers/storage systems, but if you can split the database up (as they have done), a farm of small servers will always provide better scalability and throughput than one big box. And aren't those two things the secret behind the web game?
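      A small sketch of the partitioning idea, assuming documents are hash-partitioned into shards, each shard keeps its own inverted index, and a query is scattered to every shard with the results merged. The shard function, index layout and sample documents are all hypothetical, not Google's actual design.

```python
import zlib
from collections import defaultdict


def shard_of(doc_id: str, n_shards: int) -> int:
    # Stable hash so every machine agrees which shard owns a document.
    return zlib.crc32(doc_id.encode("utf-8")) % n_shards


def build_shards(docs: dict, n_shards: int):
    # Each shard gets its own small inverted index: word -> set of doc ids.
    shards = [defaultdict(set) for _ in range(n_shards)]
    for doc_id, text in docs.items():
        index = shards[shard_of(doc_id, n_shards)]
        for word in text.lower().split():
            index[word].add(doc_id)
    return shards


def search(shards, word: str) -> set:
    # Scatter the query to every shard (in parallel, on a real farm), then gather.
    hits = set()
    for index in shards:
        hits |= index.get(word.lower(), set())
    return hits


docs = {"a": "linux servers", "b": "IDE disks", "c": "Linux farm"}
shards = build_shards(docs, 4)
print(search(shards, "linux"))  # {'a', 'c'} (set order may vary)
```

      Each shard answers from a small index on its own disk, so adding machines adds both storage and I/O operations per second, which is the throughput argument made above.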

      --
      49 20 68 61 76 65 20 74 6F 6F 20 6D 75 63 68 20 66 72 65 65 20 74 69 6D 65 2E
  184. Google architecture by SpaceLifeForm · · Score: 3

    If you really want to know how it works, read this:

    http://www-db.stanford.edu/~backrub/google.html
    Note: the document was written in 1998.
    Two snippets:
    6.3 Scalable Architecture

    Aside from the quality of search, Google is designed to scale. It must be efficient in both space and time, and constant factors are very important when dealing with the entire Web. In implementing Google, we have seen bottlenecks in CPU, memory access, memory capacity, disk seeks, disk throughput, disk capacity, and network IO. Google has evolved to overcome a number of these bottlenecks during various operations. Google's major data structures make efficient use of available storage space. Furthermore, the crawling, indexing, and sorting operations are efficient enough to be able to build an index of a substantial portion of the web -- 24 million pages, in less than one week. We expect to be able to build an index of 100 million pages in less than a month.

    9.1 Scalability of Google

    We have designed Google to be scalable in the near term to a goal of 100 million web pages. We have just received disk and machines to handle roughly that amount. All of the time consuming parts of the system are parallelized and roughly linear time. These include things like the crawlers, indexers, and sorters. We also think that most of the data structures will deal gracefully with the expansion. However, at 100 million web pages we will be very close up against all sorts of operating system limits in the common operating systems (currently we run on both Solaris and Linux). These include things like addressable memory, number of open file descriptors, network sockets and bandwidth, and many others. We believe expanding to a lot more than 100 million pages would greatly increase the complexity of our system.

    --
    You are being MICROattacked, from various angles, in a SOFT manner.
  185. Re:Further info on box specs? by shyster · · Score: 3
    Anybody out there have more nitty gritty details on the specs of the latest boxes added? I am interested in CPU speeds, gigabit ethernet, RAM. 8000 of these things! The mind boggles...

    Evidently, they shun multiprocessor boxes, use big & fast IDE drives (2 per PC, one on each IDE channel), and, according to last year's article, use 100 Mbps links within the racks, with gigabit links between the racks. Last year's article also quotes "256 megabytes of memory and 80 gigabytes of storage", though I imagine it's closer to 512MB (at least) and 180 GB per server now. It also says that they pack them in 1U on each side of a rack.

    But, here's the kicker, "Many of the systems are based on Intel Celeron processors, the same chips in cheap consumer PCs."!

  186. Why not Windows 2000? by dougel · · Score: 3

    I mean, why not... Really: Windows 2000 Server OEM is $642.60. Times 8,000 PCs, that's only $5,140,800. Now, for the peace of mind that comes with a crash-proof Windows box, why would Linux even be an alternative? The worst part about this post is that there are MCSEs reading it and saying "right on, my brainwashed friend!" =-=-=- Doug

  187. Re:Seen it by Anonymous Coward · · Score: 4

    Funny story. Google got into the Virginia facility when Globalcenter owned the datacenter. Before Google, the sales people would only sell "floor space". Google's one and a half cages, jammed full of 1U Linux boxes, pulled so much power that they rendered 6 surrounding cages unsellable. After that, the sales people began selling "amp-capped floor space" rather than just square feet.

  188. I'd bet they've already done the math by pivo · · Score: 4
    Considering that they're not necessarily Linux advocates, I'd imagine they did that calculation *before* buying all those machines.

    In any case, they'd have done it at some point along the line before the 8000th server arrived, and if they found they were making a mistake I can't see why they wouldn't have switched by now. Especially since if they thought NT would somehow be so much better they could have just removed Linux and installed NT and not have had to buy more hardware.

    Sounds like Linux is working out pretty well for them.

  189. Re: Multithreaded TCP/IP stack by kinkie · · Score: 4


    Let's recap how a single packet is to be handled (and probably I forgot something):
    You get the Ethernet interrupt, you have to DMA the frame off the board, check which protocol it belongs to (if it's not IP, drop it), checksum, check whether you have to do any reassembly, check which transport protocol it is (it might not be TCP after all), check that the packet makes sense given the connection's history (i.e. sequence numbers and various other bits here and there), identify the process waiting for the packet, copy the data to userspace, and signal the process.
    A multithreaded TCP/IP stack means that more than one packet can be in the pipeline at the same time. It makes no real difference on a uniprocessor (UP) system, but on an N-processor system it can multiply your throughput by N (at least theoretically), just as a multithreaded app can increase throughput on a multiprocessor system.
    Of course, to be feasible, as many parts of the stack as possible must be reentrant, or you'll have to do locking and thus (in MS-ese) "serialize".
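    To make the early steps of that list concrete, here is a minimal Python sketch of the per-frame checks, assuming a raw IPv4-over-Ethernet frame with no VLAN tag. It is not the Linux (or any other) kernel's code; the function names are hypothetical, and DMA, socket lookup and the copy to userspace are left out. Note that classify_frame only reads its own frame, so several calls could run on different CPUs at once; it is the shared state (connection tables, socket queues) that would need the locking mentioned above.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones'-complement sum of 16-bit words, inverted."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def classify_frame(frame: bytes) -> str:
    """Run one Ethernet frame through the early receive-path checks."""
    if len(frame) < 14 + 20:
        return "drop: runt frame"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return "drop: not IP"
    ihl = (frame[14] & 0x0F) * 4                  # IP header length in bytes
    ip_header = frame[14:14 + ihl]
    if ihl < 20 or len(ip_header) < ihl:
        return "drop: bad IP header"
    if internet_checksum(ip_header) != 0:         # a valid header sums to zero
        return "drop: bad checksum"
    (flags_frag,) = struct.unpack("!H", ip_header[6:8])
    if flags_frag & 0x2000 or flags_frag & 0x1FFF:  # MF flag or offset set
        return "queue: needs reassembly"
    if ip_header[9] != IPPROTO_TCP:
        return "hand off: not TCP"
    return "tcp: check sequence numbers, find socket, copy to userspace"
```

    Everything after the "tcp" result (sequence-number checks, finding the waiting process) depends on per-connection state, which is exactly where a multithreaded stack has to be careful.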

    --
    /kinkie
  190. Seen it by travisd · · Score: 4

    I've seen their cage out at Exodus in Virginia. Pretty cool. They have like 6 racks of servers there - each rack is 80 servers, I believe. They use systems from Rackable. Generally in a hosting facility you pay per rackspace and bandwidth -- more servers/rack means less cost/month in space.

  191. can you say pr0n? by Ender+Ryan · · Score: 4

    I thought I was really cool with my 100 gigs of storage at home filled with DivX ; ) movies and MP3s. 1 million gigabytes, that's insane.

    Ok, new poll

    What do you think is stored at Google?
    1. Huge search engine index
    2. Pr0n
    3. MP3s
    4. DivX ; ) Movies
    5. DivX ; ) Pr0n
    6. Marketing data collected with satellites and video cameras attached to flies... just like MLB
    7. Cowboyneal's transporter pattern buffer

    note: I own _MOST_ of the MP3s and DivX movies I have...

    --
    Sticking feathers up your butt does not make you a chicken - Tyler Durden
  192. ROI on Linux by GreyyGuy · · Score: 4

    Just think how much it would cost to license 8000 servers with win2k and whatever database they would use. Would Google even be able to do this on M$?

  193. The power drain is staggering! by clink · · Score: 4

    I hope these people aren't located in California. Otherwise I think we've located the source of the electricity crunch.

  194. What about hardware maintenance by Once&FutureRocketman · · Score: 4
    The scalability of many small servers is great, but I would think they would run into a wall eventually due to the effort required to maintain all those machines. I mean, even if the failure rate is very low on a per-machine, per-time basis, if you have enough machines, you're going to wind up replacing multiple hard drives, cards, mobos, etc. every day. Their system is redundant enough that this doesn't affect performance, but there is a cost associated with the manpower required to do all that maintenance.

    I just gotta wonder at what point they would get better overall efficiency by replacing all those little boxes with a couple of big iron mainframes.

    --

    "Research is what I am doing when I don't know what I am doing." -- Wernher von Braun

  195. Interesting points by sumengen · · Score: 4

    About 10 months ago I listened to a talk by a senior Google engineer. They are really good at load balancing and should be a good example for other companies. Interesting points I remember:

    - The number of websites is increasing exponentially, so the number of computers (or required CPU cycles) is increasing exponentially too. On the other hand, the price per CPU MHz is also decreasing exponentially (Moore's law?). That is the key to scalability: at least the cost problem is not exponential.
    - As mentioned in this article, back then they were running Celeron 500s with 256MB RAM and 2x 40GB hard disks. When a computer fails, it is easy to replace because the hardware is cheap.
    - Buy systems with as many parts integrated onto the mainboard as possible (NIC, etc.). This is supposedly more reliable.
    - They are not running Linux because it is cheaper. I have seen headlines saying this, including on Slashdot, but it is not true. They don't deny that it saved them a lot of money, but when they started Google that wasn't the issue. He mentioned that they could have gotten a good deal from Sun for Solaris. The reason was the openness of the source code, plus the other reasons mentioned in the article. By the way, he mentioned that TCP stack issues were also considered when the decision was made. It looks like they are confident they can fix any problems in-house.
    Google wants to write all the software it runs. They don't want to use third-party software because it introduces instability, and it is difficult to fix bugs in that case.
    - They are not running Apache; using Linux doesn't mean running Apache. They designed their own web server, which is as simple as possible and therefore fast. They don't need a complicated web server. All the computation is done in the background on the 8000 Linux servers; the web server only needs to send the query to the query server and display the results.
    - Google's job was easier than people might think. Their database is not dynamic; it only gets updated once a month. Updating means replacing the old files with new ones, which is an offline process. Compare this with an e-commerce site displaying real-time statistics and you can see that Google has an advantage that makes things easier for them.
    - Let's say spidering and crawling are done in one datacenter. You need to copy these terabytes of data over to other datacenters and then replicate them to multiple server farms in each datacenter. You have to do this fast and without any errors. You don't want to use OS file system functions.
    - They rent multi-gigabit bandwidth during off-peak hours, when there is not much traffic, for a very, very cheap price. They use this bandwidth to copy data files from the West Coast to the East Coast. We are talking about many terabytes.
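    One way to get the "without any errors" part of such a bulk copy is to compare per-chunk digests at both ends and resend only the chunks that differ. This is a minimal Python sketch under that assumption, not Google's actual replication tool; the 64 MB chunk size and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 64 * 1024 * 1024  # hypothetical chunk size: 64 MB


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Return the SHA-256 hex digest of each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_resend(source: list, replica: list) -> list:
    """Indices of chunks that are missing or corrupted on the replica."""
    return [
        i for i, digest in enumerate(source)
        if i >= len(replica) or replica[i] != digest
    ]


if __name__ == "__main__":
    original = bytes(range(256)) * 10      # 2560 bytes of sample data
    damaged = bytearray(original)
    damaged[1500] ^= 0xFF                  # flip one byte inside chunk 1
    src = chunk_digests(original, chunk_size=1000)
    dst = chunk_digests(bytes(damaged), chunk_size=1000)
    print(chunks_to_resend(src, dst))      # [1]
```

    Only the digest lists need to cross the wire to find the bad chunks, so a corrupted or truncated transfer costs a partial resend rather than a full recopy.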

  196. Crud.... by V50 · · Score: 4

    They are still NOWHERE near a Googol Servers like their name suggests... Humph...


    --Volrath50

  197. Re:Loadbalancing large websites by baptiste · · Score: 4
    however I haven't seen that many testimonies/reviews from sites that use it.

    http://slashdot.org/article.pl?sid=01/04/26/0339219

    Anandtech.com is using it.

    --

  198. Amazing by Anonymous Coward · · Score: 5

    This is what you can tell people when they tell you that linux is a toy. The best search engine in the world is *not* a toy.

  199. Re:Loadbalancing large websites by Precision · · Score: 5

    We have been using LVS on SourceForge, Linux.com and Themes.org and I have nothing but good things to say about it. I have yet to have any real problems. We have 2 firewalls with automagic failover using heartbeat. We also use keepalived to automagically remove webservers from the queue if they go down... all in all it's been a great piece of software.
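    The "remove webservers from the queue if they go down" behaviour can be illustrated with a small round-robin pool that skips members whose last health check failed. This is a hypothetical Python sketch of the idea only; it is not LVS's or keepalived's code or configuration, and the class and server names are made up.

```python
from typing import Callable, Iterable


class HealthCheckedPool:
    """Round-robin over real servers, skipping ones marked down."""

    def __init__(self, servers: Iterable[str]):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def run_checks(self, check: Callable[[str], bool]) -> None:
        """Apply a health check (e.g. an HTTP GET) to every server."""
        for server in self.servers:
            if check(server):
                self.healthy.add(server)      # back in the rotation
            else:
                self.healthy.discard(server)  # pulled from the rotation

    def pick(self) -> str:
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy real servers")


if __name__ == "__main__":
    pool = HealthCheckedPool(["web1", "web2", "web3"])
    pool.run_checks(lambda s: s != "web2")    # pretend web2 is down
    print([pool.pick() for _ in range(4)])    # ['web1', 'web3', 'web1', 'web3']
```

    A real director runs the checks periodically in the background; here run_checks is called by hand to keep the sketch short.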

    --
    - U
  200. Loadbalancing large websites by blinx_ · · Score: 5

    In recent months I've been trying to read everything I can find about load balancing large web sites, and Google sure does make an interesting example.
    My company is in the process of moving from one big server to several smaller ones, to allow for greater scalability; there is just a limit to how much CPU + memory you can put in a single box. Our future site will probably use Linux Virtual Server, which seems quite nice; however, I haven't seen that many testimonials/reviews from sites that use it. The company I work for creates online image manipulation services, and part of the process is rendering large, high-quality images. The hard part seems to be shared storage of these images (SCSI over TCP/IP seems very interesting); load balancing with static pages seems easy enough. Anyway, Google's way of using many small machines is an inspiration.

    --
    Resistance is not futile - www.gnu.org
  201. Do they give back? by leperjuice · · Score: 5
    Google's applications are unique, requiring far more extensive load-balancing, computing, and input-output bandwidth than other enterprise applications.

    The question that should be asked here is whether they are sharing the results of their work. I bet that they're probably lifting some of their techniques hot and fresh off of research papers and they may be the first to actually use them in an enterprise environment.

    Note that I personally believe that closed source is not necessarily a bad thing. But if Google has made radical changes to these enterprise-grade tools, it would be nice to see them trickle down into the mainstream distros. While we as home users would probably never need them, it would certainly put to rest some of the pro-Microsoft arguments against Linux as a server-grade OS.

    Of course, for all I know, they could be actively working with Cox et al to incorporate their findings into the kernel and related tools.

    Either way, a very impressive job done with an operating system that "is simply a fad that has been generated by the media and is destined to fall by the wayside in time."

    Note that I use Windows and Linux so I'm no bigot... (some of my best friends are Microsoft programmers!)

    --

    -- "I am disrespectful to dirt. Can you not see that I am serious!"

  202. Re:A Real Reason They Can Get Away With That by ottffssent · · Score: 5

    "And no, Linux on IBM/390 WILL NOT help them because it is just an emulation, and disk arrays of this one huge computer will get swamped by the billions of read requests (the same way they will get swamped on Starfire or the same S390 under OS390)"

    Exactly. Even at ~1 MB/s per IDE drive (lots of random reads), that's 1 MB/s * 8000 machines * 2 drives/machine (yeah, some have 4, but the article doesn't say how many) = 16 GB/sec. It would take a hell of a SCSI setup to equal that bandwidth, let alone the massive number of IOs.

    Further, even if the boxen only have 2G memory each, that's 16TB of memory, which you could put in one big server, but no single memory system is going to provide the throughput that 8000 SDRAM channels will.
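    The back-of-the-envelope numbers above, written out so the assumptions are easy to change. The per-drive rate, drive count and RAM size are the post's own guesses, not figures from the article, and decimal units are used (1 GB = 1000 MB, 1 TB = 1000 GB).

```python
MACHINES = 8000
DRIVES_PER_MACHINE = 2        # post's assumption; some boxes have 4
MB_PER_SEC_PER_DRIVE = 1.0    # post's guess for random-read throughput
GB_RAM_PER_MACHINE = 2        # post's "even if ... only 2G each"


def aggregate_disk_gb_per_sec(machines=MACHINES, drives=DRIVES_PER_MACHINE,
                              mb_per_sec=MB_PER_SEC_PER_DRIVE):
    """Total random-read bandwidth across the farm, in GB/s."""
    return machines * drives * mb_per_sec / 1000


def aggregate_ram_tb(machines=MACHINES, gb_each=GB_RAM_PER_MACHINE):
    """Total RAM across the farm, in TB."""
    return machines * gb_each / 1000


print(aggregate_disk_gb_per_sec())  # 16.0
print(aggregate_ram_tb())           # 16.0
```

    Passing drives=4 shows how the newer 4-drive boxes would double the disk figure.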

  203. Re:im not really clear on.. by Brento · · Score: 5

    what in gods name do you need 8000 linux servers for? quake? I cant figure out what google could possibly use all that power for... if they really *need* all that power, they're obviously doing something wrong with their code.

    Well, when was the last time you searched on Google? It has a stunning number of servers indexed. I can search for just about anything, and Google always finds more accurate hits, faster, than any other search engine. (Don't turn this into a search engine flame war, either.) They have to constantly refresh their indexes, and they have to turn around fast answers.

    Yahoo even uses them for their search engine. I can't imagine being able to service Yahoo's search needs with anything less than a full-fledged data center split across two cities.

    --
    What's your damage, Heather?
  204. Kudos to Google by revscat · · Score: 5

    This is only tangentially related to the story at hand, but I would just like to compliment Google on a job done extremely well. They have successfully built the fastest search engine out there, using open methodologies and without whoring themselves out like any number of other search engines. They continue to add interesting (and [gasp!] useful) features, such as searching PDF documents and their translation engine. They have really helped the Open Directory Project along, as well.

    There are successful .coms out there, but I think their business practices are so foreign to the "regular" business community that they aren't quite sure how to handle it.

    BTW: Anyone else see a philosophical relationship between Google and ArsDigita?

    1. Re:Kudos to Google by Tackhead · · Score: 5
      > All true, but are they really making money? I rarely see an ad there (not banner ads, mind you, but their own form of search-related targeted ads). So are they still running on VC money, or do the few ads I see cover the bills?

      Actually, I think they're being smart about it.

      If the typical query returns one USENET post - maybe 2-3 kilobytes of text - why would you want to (as Deja did) spend money sending 20-30 kilobytes of HTML for the associated frames and banners and other ad support?

      The user's gonna see one ad. Google's bandwidth and I/O costs are gonna explode if the HTML wrapped around each ad takes up 10 times as much space as each query's results.

      By going with text-based ads and a non-frames approach, they not only make the site more user-friendly (thereby adding value), they cut their own costs by a sizable fraction.
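      The ratio in this argument, worked through. The 2.5 KB result and 25 KB wrapper are midpoints of the ranges given above; the 1 KB size for a page wrapper with text-only ads is a hypothetical figure for illustration, not something Google has published.

```python
RESULT_KB = 2.5          # midpoint of the post's 2-3 KB Usenet post
FRAMED_WRAPPER_KB = 25   # midpoint of the post's 20-30 KB of frames/banners
TEXT_WRAPPER_KB = 1.0    # hypothetical: lean page with text-only ads


def kb_per_query(result_kb, wrapper_kb):
    """Total bytes sent for one query, in KB."""
    return result_kb + wrapper_kb


def overhead_share(result_kb, wrapper_kb):
    """Fraction of bytes sent that are page wrapper rather than results."""
    return wrapper_kb / kb_per_query(result_kb, wrapper_kb)


framed = kb_per_query(RESULT_KB, FRAMED_WRAPPER_KB)   # 27.5 KB
lean = kb_per_query(RESULT_KB, TEXT_WRAPPER_KB)       # 3.5 KB
print(f"framed: {framed} KB, {overhead_share(RESULT_KB, FRAMED_WRAPPER_KB):.0%} overhead")
print(f"lean:   {lean} KB, {overhead_share(RESULT_KB, TEXT_WRAPPER_KB):.0%} overhead")
print(f"framed pages cost {framed / lean:.1f}x the bandwidth")
```

      Under these assumptions roughly nine of every ten bytes on the framed page are wrapper, which is the I/O cost the post is talking about.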

      With lower bandwidth costs and I/O requirements, Google can make money with fewer ads, not more. That's where (IMHO) Deja went wrong - the more they needed the ad-revenue, the more they escalated the cost of serving the ads, in a vicious circle that consumed them.

      It's also where (IMHO) Google is doing it right.

  205. Petabyte? Try pedobyte! :) by Phrogz · · Score: 5
    Google indexes 1.3 billion Web pages on over a petabyte of storage--that's more than a million gigabytes. "That's not to say that the index takes up a petabyte..."

    And what takes up all that size? You know it--pr0n. The storage size says it all...it's not a petabyte they've got there, but a pedobyte. Sick google bastards. :)

  206. Seen it firsthand... by supabeast! · · Score: 5

    I have seen some of Google's stuff in Northern Virginia. Those guys really know how to do high-density racks. They have double-sided racks of 1U servers, with what I believe is 47 servers per side. The cabling alone is gorgeous. The bright red and shiny steel racks full of hundreds of flashing LEDs look like something out of a rave.

  207. Wait, I have the Answer by StoryMan · · Score: 5

    What they should do is utilize the heat escaping from that chimney of theirs to power steam turbines.

    Then use the turbines to drive generators.

    Then send the power from those generators to the western United States.

    Now -- follow me here -- this would be a self-sustaining system, no?

    Users use google to search the web and read their embarrassing usenet posts from 1995. Power is generated. That power is funneled back to the user so that his or her computer stays on, the lights stay on, and they don't have to worry about getting stuck in an elevator during a rolling blackout.

    Users are happy, nuclear opponents don't have to worry about radioactive leaks into the environment from improperly sealed cooling tanks and leaking water, and google remains up and active, chugging away ad infinitum.

    Simple.

    Tomorrow, I'll work on my plan for cold fusion. Maybe a couple of Guinness glasses filled with tapwater, a couple of batteries, and a beowulf cluster ...

    1. Re:Wait, I have the Answer by BMazurek · · Score: 5
      Now -- follow me here -- this would be a self-sustaining system, no?

      "Lisa! In this house we obey the laws of thermodynamics" -- Homer Simpson

  208. A Real Reason They Can Get Away With That by Poligraf · · Score: 5

    It is that their information and the cost of failure are not critical. If one of the Google's servers (or hard drives) dies they can just find out what pages were stored there (from the master DB) and reload them into the storage on a new PC (and I'm sure they have some PCs with identical data).
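    A toy version of that recovery step, assuming a master table that maps each index segment to the machines holding a copy. The table layout and function name are hypothetical; the article does not describe how Google actually tracks or reloads segments.

```python
def reassign_failed_machine(placement, failed, replacement):
    """Swap a dead machine for a new one in a segment -> machines table.

    placement: dict mapping segment id -> list of machine names (the
    hypothetical "master DB"). Modified in place. Returns the sorted
    segment ids that must be reloaded onto the replacement machine.
    """
    to_reload = []
    for segment, machines in placement.items():
        if failed in machines:
            machines[machines.index(failed)] = replacement
            to_reload.append(segment)
    return sorted(to_reload)


placement = {0: ["pc1", "pc2"], 1: ["pc2", "pc3"], 2: ["pc3", "pc1"]}
print(reassign_failed_machine(placement, failed="pc2", replacement="pc9"))
print(placement)
```

    Because every segment in the example still has one surviving copy, queries keep working while the replacement box is loaded, which is the redundancy the post relies on.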

    Now imagine an e-commerce site built like that. Loss of any part of user list or merchandise catalog is a major failure. This is why such sites are usually powered by a moderate (typical site) to huge (Amazon, eBay) database with an enormous redundancy built in.

    And no, Linux on IBM/390 WILL NOT help them because it is just an emulation, and disk arrays of this one huge computer will get swamped by the billions of read requests (the same way they will get swamped on Starfire or the same S390 under OS390). The entire idea of the setup is that you have a lot of independent disk channels.

    Another interesting insight is that they must have made some improvements in administering all of these machines remotely. Otherwise they would blow all their money on paying sysadmins ;-)

    --
    Tigers respect lions, elephants and hippos. Maggots respect no one. (C) S. Dovlatov
  209. missing email by Matthew+Luckie · · Score: 5
    "It doesn't look like Google got the e-mail that the dotcom boom is over"

    three possible explanations:

    1. they have a spam filter in place
    2. they have a microsoft exchange server somewhere
    3. they were too busy going through everyone else's embarrassing Usenet postings to read their own email
    my guess is the third one

  210. Multi-Threading Madness by Sinjun · · Score: 5

    I wonder what kind of information Google has about the deficiencies of the Linux TCP/IP stack. Certainly with 8,000 servers they could offer some input on how the lack of multi-threading affects performance on a major site. I know that the most recent kernels and Apache versions were supposed to have dealt with this issue, but has anyone seen such a large-scale experiment?

    1. Re:Multi-Threading Madness by epiphani · · Score: 5

      Not necessarily commenting on the multi-threading issue, but kernels 2.4.x have substantially better socket handling... there were articles floating around on Slashdot and linux.com a while back about a DALnet server breaking 38,000 simultaneous active open sockets. Linux has done wonders with its 2.4.x TCP/IP stack... until recently, nobody even considered Linux's stack worthy of an attempt at an IRC server of any reasonable size.

      --
      .
  211. google's new language features by stype · · Score: 5

    Go to google, click on preferences and change your language to "bork, bork, bork." From now on the site is completely in Swedish Chef (no joke).

    --
    -Stype
    Bus error -- driver executed.
  212. Interesting detail the article didn't go into: by vslashg · · Score: 5
    "That's not to say that the index takes up a petabyte. We have several hundred copies of the index," Felton said. "Most of the servers are serving up some fraction of the index." The index is partitioned into individual segments, and queries are routed to the appropriate server based on which segment is likely to hold the answer.
    An interesting metric that they don't go into in this article:
    • 4,718 of the servers index pr0n
    • 2,148 of the servers index warez
    • 1,634 of the servers index MP3 sites
    • 1,139 of the servers index various "ate my balls", "all your base", and other joke-of-the-month sites
    • 278 of the servers index content
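    The routing described in the quoted article (index split into segments, several hundred copies, queries sent to a server holding the relevant segment) could look something like this Python sketch. It assumes, purely for illustration, that a segment is chosen by hashing the query term with CRC-32 and that copies of a segment are used round-robin; the article does not say how Google actually partitions the index, and all names here are hypothetical.

```python
import itertools
import zlib


class SegmentRouter:
    """Route a query term to one copy of the index segment that holds it."""

    def __init__(self, copies_by_segment):
        # copies_by_segment: dict segment id (0..n-1) -> list of server names
        self.num_segments = len(copies_by_segment)
        self._rotations = {
            seg: itertools.cycle(servers)
            for seg, servers in copies_by_segment.items()
        }

    def segment_for(self, term):
        # CRC-32 is stable across processes, unlike Python's salted hash().
        return zlib.crc32(term.lower().encode("utf-8")) % self.num_segments

    def route(self, term):
        seg = self.segment_for(term)
        return seg, next(self._rotations[seg])  # spread load over the copies


copies = {seg: [f"idx{seg}-a", f"idx{seg}-b"] for seg in range(4)}
router = SegmentRouter(copies)
print(router.route("linux"))
print(router.route("Linux"))  # same segment, next copy of it
```

    Adding more copies of a segment raises query throughput for that segment without changing which segment a term maps to.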
  213. Where does Google get their money? by SirChive · · Score: 5

    Google is wonderful. But I'm left wondering where they get their financing and what their long term goals are.

    The Google site features minimal advertising. So they are most likely funded with VC money. This means that they must have a plan for making money at some point. What is it and when will it kick in?

  214. Ads are secondary... by daveym · · Score: 5

    "The Google site features minimal advertising. So they are most likely funded with VC money. This means that they must have a plan for making money at some point. What is it and when will it kick in?" Ummm...If you go to google and read about their company, you will learn that most of their income comes from licensing their awesome search engine for internal use by other companies. NOT from advertising. With everyone just now learning that advertising on the web sucks balls, this looks like a pretty shrewd move on the part of Google....

    --
    "Chill, Orrin!"---Trent Lott
  215. Ironic timing... by nrozema · · Score: 5

    I just spent all of yesterday afternoon installing a 63-node rack from Rackable. The build quality of these units is excellent... amazingly dense and efficient. According to the installers, in addition to google, their systems are also used extensively by yahoo and hotmail.

  216. They're efficient too by Magumbo · · Score: 5
    "they direct heat to a central chimney which is blown up to a high-powered fan"

    And these high powered fans then blow the blisteringly hot air along a complex series of ducts which lead to facilities which:

    a) generate electricity for the wall-o-lava-lamps
    b) are used to fill state-of-the-art, floating, hot-air furniture
    c) keep folks warm-n-toasty in the sauna
    d) make you hot and thirsty

    --