Slashdot Mirror


How does Google do it?

Doc Tagle writes "With Google reportedly on the verge of going public, more and more people want to know what makes Google tick. The Observer serves up the answers to our questions."

26 of 261 comments

  1. As a consultant by elinenbe · · Score: 5, Informative

    Having been a consultant at their data center a year or so back, I can attest that they had well over 50,000 machines. I am not sure about the 80GB drive per machine, because as I understood it they bought whatever drive was cheapest per MB at the time and would replace any dead ones with larger ones. Also, at any given time machines just die, and many of them are not replaced or repaired for months. Their cluster accounts for all this...

    --
    -eric
    1. Re:As a consultant by _Sharp'r_ · · Score: 5, Informative

      But also realize that the data center you were at isn't their only one. I know of at least 7 physical locations and there are probably more out there.

      But yeah, their racks of 4 servers/1U are pretty impressive when you see them lined up in row after row. Their data centers have to bring in extra cooling because they are so densely packed.

      --
      The party of stupid and the party of evil get together and do something both stupid and evil, then call it bipartisan.
  2. Huh? by lawrencekhoo · · Score: 1, Informative

    There are no answers in the article at all. Just the usual questions about how Google's publicized statistics don't add up.

  3. Re:Soon to be everything by richard_za · · Score: 5, Informative

    Google already has spell check, and so does Gmail; have a look at the screenshots on my blog. I believe they're looking at releasing it to the public in six months' time; have a look at this article.

  4. Re:Soon to be everything by Anonymous Coward · · Score: 3, Informative

    The only thing it's missing now (IMO) is spellcheck and an online translator, which I'm sure they're already planning. I'm also looking forward to Gmail being open to the public. After they conquer these 3 things, what's next... Google ISP? Google National Army?

    Google has had a built-in spellchecker forever, and their translate tool is right here: http://www.google.com/language_tools
  5. Re:Interesting by ShaunC · · Score: 4, Informative

    Google is definitely cracking down on duplicate content. In fact, they've recently patented the concept.

    Insert software patent debate (where Google is the default hero due to its geek factor) here...

    --
    Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
  6. How Google do that? by elpecek · · Score: 4, Informative

    For those who haven't read it, there is an article written by Brin and Page (maybe a little outdated, but still interesting): The Anatomy of a Large-Scale Hypertextual Web Search Engine

    1. Re:How Google do that? by jvsanford · · Score: 3, Informative

      There is also a paper that describes their storage infrastructure (Google File System) here

  7. Re:Soon to be everything by evilmonkey_666 · · Score: 2, Informative

    Umm, is this a joke? They do have a spellchecker built into the search engine. I use it on a daily basis.

    And their online translator is here.

    --


    - PS. This is what part of the alphabet would look like if Q and R were eliminated.
  8. first casualty ?? by Sad+Loser · · Score: 4, Informative


    Recycling without attribution is the first casualty of bad journalism.

    I thought I had read this article before, and then I realised, I had read it before...
    (although I now realise that you are not supposed to read the linked articles before posting comments - sorry)

    --
    Humorous signatures are over-rated.
    1. Re:first casualty ?? by platypussrex · · Score: 4, Informative

      Not sure why you say that. If you read all the way through Naughton's article, he says that the calculations come from Garfinkel, he mentions Technology Review, and then later directly quotes Garfinkel. Sounds like attribution to me.

  9. Re:Google is faltering by Waffle+Iron · · Score: 3, Informative
    Yeah, those hundreds of PhDs they have working there will *never* figure that out. I hear they started with a 16 bit signed integer for their primary key and only after months of hard work upgraded it to 32 bit. Time to close down shop, it's impossible to fix.

    Actually, they already have the fix implemented, and it's currently in the process of being rolled out. The upgraded system makes use of a split primary key which is comprised of a "selector" subkey and a "segment" subkey. The selector key is shifted left by four bits and then arithmetically added to the segment key. This clever scheme expands the index by a factor of 16; Google will soon be able to host over 64 billion pages!

  10. Re:Openness is the first casualty of going public? by nacturation · · Score: 4, Informative

    With Google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, and what software they use (compatibility issues).

    I agree it would be nice to know. But if those are your conditions for investing in Google, I think Google would probably tell you to keep your money. I imagine Google's quarterly reports would probably say something like:

    "Our operation depends on having the ability to increase our server and bandwidth resources as we grow our services. Business may be adversely impacted should capacity be unavailable. Our servers are also at risk for viruses, worms, and DDoS attacks which could put the operation of those servers at risk and adversely affect business." etc...

    That would give you, as an investor, the information you need to determine whether those risks are worth your money. In all likelihood you'll just have to rely on the fact that they have an army of PhDs who are smarter than you and I put together and know their shit when it comes to security, databases, clustering, etc.

    Now I could be wrong. Perhaps Google is waiting for the IPO and will then detail their server infrastructure, wow Wall Street (and geeks worldwide) with their amazing capacity, and their stock will skyrocket on the first day of trading. I'd wager that Google's stock is going to have amazing gains anyway given that it's a bit of an industry darling. Other tech companies which have been thinking of going public would be wise to time their IPO very shortly after Google's and ride the wave.

    --
    Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
  11. Tinfoil Hats by mfh · · Score: 4, Informative

    > 1) Why are their terms of service / Privacy Policy so vague?

    This is to keep it simple. Exacting legal language is the path to screwing people. Vague terms of service are good because both sides can wiggle. Has anyone been sued because of these terms of service? I'd like to see some refs to that, but I'm guessing it's just to protect the general public from a-holes who would exploit Google.

    > 2) Why does their cookie stay until the year 2038?

    Not to be funny, but someone at Google likely knows when the end of the world is coming and has set the cookie to reflect this. Seriously, who cares how long cookies stay alive for? You can block them if you like, but I think it's really just to keep Google more effective.

    > 3) Why does their Google search bar report information and auto-update without permission?

    I'm against Spyware, so I don't run it, but Google tracks searches anyway, so what's the point of getting upset about it? These technologies make Google more user-friendly. Google doesn't have loads of popups trying to get you to install the bar -- it's not right in your face. People who want it likely don't care if it auto-updates because then they have the most recent version of it.
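
    On question 2, one commonly cited guess (an assumption here, nothing in this thread confirms it) is that a 2038 expiry is simply "as far in the future as possible": a signed 32-bit Unix timestamp cannot represent anything later than January 2038. A minimal Python sketch of that limit, with an illustrative function name:

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit integer can hold.
MAX_INT32 = 2**31 - 1

# Unix time counts seconds from this instant.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def max_32bit_timestamp() -> datetime:
    """Return the latest moment a signed 32-bit Unix timestamp can represent."""
    return UNIX_EPOCH + timedelta(seconds=MAX_INT32)


print(max_32bit_timestamp().isoformat())  # 2038-01-19T03:14:07+00:00
```

    Whether this is why the cookie's expiry lands in 2038 is speculation; the sketch only shows where the 32-bit limit falls.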

    --
    The dangers of knowledge trigger emotional distress in human beings.
  12. Public paper on Google File System by MarkWatson · · Score: 4, Informative
    Here is a PDF file of the paper.


    If that link gets slashdotted, here is another link to a PDF of a PowerPoint presentation.


    Good read! This paper (with the discussion of the goodness/fastness of file appends) made me more interested in Prevalence - so much so that I am using it for my new project.

    -Mark

    1. Re:Public paper on Google File System by svr0002 · · Score: 4, Informative
      and another good one - http://www.computer.org/micro/mi2003/m2022.pdf

      Interesting that a major problem for Google is managing power and cooling!

  13. Re:Why Verbatim Clones??WAS:Interesting by reanjr · · Score: 2, Informative

    I don't know why he has numerous identical sites, but one reason is when a small company purchases several other companies that are in the exact same market. Since the companies are compatible, you merge all their operations into one. But you still want to keep brand identification with your customers so you keep two copies of the site, each branded differently.

  14. You may also find this interesting... by lunar_legacy · · Score: 5, Informative

    Another wonderful speculation about Google's infrastructure, which you can find here.

  15. Re:Google is faltering by orthogonal · · Score: 5, Informative
    Actually, they already have the fix implemented, and it's currently in the process of being rolled out. The upgraded system makes use of a split primary key which is comprised of a "selector" subkey and a "segment" subkey. The selector key is shifted left by four bits and then arithmetically added to the segment key. This clever scheme expands the index by a factor of 16; Google will soon be able to host over 64 billion pages!

    Ah, youthful mod!

    You've been (humorously) trolled. I suggest posting in this thread to remove your "+1 Informative", or getting a friend to mod it "Funny".

    What the parent is describing is not what Google will do, but what DOS did: the above scheme is how MS-DOS managed memory, except that the "selector" and "offset" were both 16-bit numbers under DOS. (Although "segment" was the more usual term for "selector".) The segment number was shifted left four places -- or put more simply but less graphically, multiplied by 16 -- and then added to the offset number, to give the whole or "flat" address:
    segment (in hex): 0001
    offset (in hex):  0002
    The segment is multiplied by 16 (shifted left 4 bits, i.e. one hex digit):
    segment: 00010
    offset:   0002
    ---------------
    total:   00012
    This allowed DOS to use 16-bit numbers to address 2^20 = 1 MB of memory, but since DOS reserved the upper 384 KB for the (remapped) BIOS and peripheral cards, programs were able to address at most 640 KB of memory; the parent's mention of "64 billion pages" is probably an allusion (increased several orders of magnitude) to this DOS limit.

    Of course, this was a kludge, pure and simple, required because DOS machines were 16-bit. Among other things, it allowed the same memory locations (all but the very top and bottom memory addresses) to be addressable by several different addresses, and discovering pointer aliasing required calculations that, by their very nature, couldn't be done wholly in the machine's (16-bit) registers.

    Consider: segment 4, offset 0 is 4 * 16 + 0 = 64,
    and segment 3, offset 16 is 3 * 16 + 16 = 64,
    and segment 2, offset 32 is 2 * 16 + 32 = 64
    and segment 1, offset 48 is 1 * 16 + 48 = 64
    and segment 0, offset 64 is 0 * 16 + 64 = 64:

    so all five segment:offset pairs are apparently different but actually point to the same memory location.
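
    The arithmetic above can be sketched in a few lines of Python. This is an illustration only: the function name and the 20-bit wrap-around mask are assumptions made for the example, not something taken from the posts in this thread.

```python
# Sketch of 16-bit segment:offset address arithmetic (names are illustrative).

ADDRESS_MASK = 0xFFFFF  # assume a 20-bit address space: 2**20 bytes = 1 MB


def linear_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits is the same as multiplying by 16.
    return ((segment << 4) + offset) & ADDRESS_MASK


# The five aliases from the comment: different pairs, same location.
pairs = [(4, 0), (3, 16), (2, 32), (1, 48), (0, 64)]
for segment, offset in pairs:
    print(f"{segment:04X}:{offset:04X} -> {linear_address(segment, offset)}")
```

    Every pair in the list maps to 64, which is exactly the aliasing problem: comparing two pointers for equality means computing the full 20-bit value first, and that does not fit in one 16-bit register.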
  16. One word. by Viceice · · Score: 3, Informative

    robots.txt

    The Google bot respects it, so if you're up to no good, it's easy to get Google to not index your page.

    Anyway, I'd like to see a version of Google that didn't respect robots.txt. You used to be able to dig up a lot of information on people on Google before a lot of sites started to use robots.txt.
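
    For anyone who hasn't seen one, here is a rough sketch of how a well-behaved crawler checks the file, using Python's standard urllib.robotparser module. The rules, the example.com URLs and the helper name are made up for illustration; this says nothing about how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that hides one directory from every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("Googlebot", "http://www.example.com/private/people.html"))  # False
print(is_allowed("Googlebot", "http://www.example.com/index.html"))           # True
```

    Note that the check is entirely voluntary on the crawler's side; the file only works against bots that choose to respect it.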

    --
    Sometimes I wish I was a plumber, then I'd know how to deal with other people's shit.
  17. Yes. by Ayanami+Rei · · Score: 2, Informative

    very simple example of 15 servers in 3U. Many vendors are also offering a "dual dual" system in 1U... that is, two dual-CPU motherboards that fit in one case.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  18. Re:Google is faltering by imroy · · Score: 2, Informative
    ...the above scheme is how MS-DOS managed memory.

    <sarcasm>Wow, I didn't know DOS managed memory at such a low level!</sarcasm>

    s/DOS/the 8086/g;

    You're really referring to the horrible segmented memory layout used by the Intel 8086 processor and its later derivatives. I did all this shit years ago in university. Almost every lesson my fellow students and I (and the lecturer as well) would end up cursing Intel for their wacky processor design. Interestingly, Intel introduced a similar scheme in (IIRC) its Xeon processors to produce (IIRC) 36-bit addresses and access more than 4 gigabytes of physical memory on a 32-bit processor.
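
    For scale, a quick Python sketch of what the extra four address bits buy. It only compares address-space sizes and does not model how the extension actually works; the function name is illustrative.

```python
GIB = 2**30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2**address_bits


print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(36) // GIB)  # 64
```

    So going from 32-bit to 36-bit physical addresses raises the ceiling from 4 GB to 64 GB, the same factor of 16 as the four-bit shift in the 8086 scheme.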

  19. Re:Google is faltering by NonSequor · · Score: 2, Informative

    The 36-bit addressing extension began with the Pentium Pro.

    --
    My only political goal is to see to it that no political party achieves its goals.
  20. Re:Why Verbatim Clones??WAS:Interesting by Anonymous Coward · · Score: 1, Informative

    or corporation #128264 has a complete web-viewable copy of the javadocs for version 1.2. lots of times i've done google searches for something code-related looking for examples/bugs/whatever and come up with a ton of hits on the same API documentation on different websites.

  21. Re:Google started to make me mad by XO · · Score: 2, Informative

    Chill out, brother.

    Try clicking in the address entry bar on Safari, and typing in "www.lycos.com", or whatever other search engine you would like to use.

    Just because the menu bar's search function pulls up google, doesn't mean you have to use it. Or did using a Mac for this long rot your brain to the point where you can only do things either the Mac way or the Extremely Difficult way?

    --
    "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
  22. Re:The Google Might Be Falling by _Sprocket_ · · Score: 3, Informative


    The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?


    1) Google has an effective advertisement system

    2) My last two employers bought Google boxes for their intranet