Slashdot Mirror


How does Google do it?

Doc Tagle writes "With Google reportedly on the verge of going public, more and more people want to know what makes Google tick. The Observer serves up the answers to our questions."

81 of 261 comments

  1. Openness is the first casualty of going public?! by Paul+Townend · · Score: 4, Insightful

    If truth is the first casualty of war, openness is the first casualty of going public

    OK - I can (perhaps) see this as being the case prior to an IPO, but that statement can't be true after it has happened...

    I mean....surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers google has, what their specifications are, and what their current commercial strategy is.....surely?!

  2. Google is faltering by Anonymous Coward · · Score: 3, Interesting

    Google has been at 4.285 billion pages for more than three months straight. The count hasn't increased in a long time... The index is maxed.

    Google has recently removed tens of thousands of "duplicate content" sites from its index - where "duplicate content" is as simple as being an affiliate site (e.g. Amazon) and having the same textual item descriptions as many other sites.

    Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.

    Google is wavering.

    Gmail is a distraction, a venture into some other space to keep people from noticing that their search product is degrading.

    May she last as long as possible...

    1. Re:Google is faltering by jabbadabbadoo · · Score: 5, Insightful

      "Google has been at 4.285 billion pages for more than three months straight. The count hasn't increased in a long time... The index is maxed."
      Hmm... are they using a 32-bit integer to keep the page count?
      2^32 = 4.294 billion, pretty close to 4.285 billion pages.
      Newbies...
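For anyone who wants to check the arithmetic behind the joke, here is a minimal sketch. It only compares the page count Google displayed at the time (the 4,285,199,774 figure quoted elsewhere in this thread) with the ceiling of an unsigned 32-bit counter; whether Google actually keeps its count in such a counter is pure speculation.

```python
# Compare the reported index size with the ceiling of an unsigned 32-bit counter.
UINT32_LIMIT = 2 ** 32            # number of distinct values in 32 bits
REPORTED_PAGES = 4_285_199_774    # page count quoted elsewhere in this thread

headroom = UINT32_LIMIT - REPORTED_PAGES
percent_used = 100 * REPORTED_PAGES / UINT32_LIMIT

print(f"limit:    {UINT32_LIMIT:,}")
print(f"reported: {REPORTED_PAGES:,}")
print(f"headroom: {headroom:,} ({percent_used:.2f}% used)")
```

The gap is under ten million IDs, which is why the coincidence caught people's eye, but a suspicious number is not evidence of an overflow.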

    2. Re:Google is faltering by Waffle+Iron · · Score: 3, Informative
      Yeah, those hundreds of PhDs they have working there will *never* figure that out. I hear they started with a 16 bit signed integer for their primary key and only after months of hard work upgraded it to 32 bit. Time to close down shop, it's impossible to fix.

      Actually, they already have the fix implemented, and it's currently in the process of being rolled out. The upgraded system makes use of a split primary key which is comprised of a "selector" subkey and a "segment" subkey. The selector key is shifted left by four bits and then arithmetically added to the segment key. This clever scheme expands the index by a factor of 16; Google will soon be able to host over 64 billion pages!

    3. Re:Google is faltering by Decameron81 · · Score: 4, Funny

      I bet you wouldn't know you need more than an unsigned 32 bit integer before you hit it.

      On a side note I would really like to know which one is page number 1.

      Diego Rey

      --
      diegoT
    4. Re:Google is faltering by zcat_NZ · · Score: 2, Funny

      "64 billion should be enough for anybody" .. ?

      --
      455fe10422ca29c4933f95052b792ab2
    5. Re:Google is faltering by orthogonal · · Score: 5, Informative
      Actually, they already have the fix implemented, and it's currently in the process of being rolled out. The upgraded system makes use of a split primary key which is comprised of a "selector" subkey and a "segment" subkey. The selector key is shifted left by four bits and then arithmetically added to the segment key. This clever scheme expands the index by a factor of 16; Google will soon be able to host over 64 billion pages!

      Ah, youthful mod!

      You've been (humorously) trolled. I suggest posting in this thread to remove your "+1 Informative", or getting a friend to mod it "Funny".

      What the parent is describing is not what Google will do, but what DOS did: the above scheme is how MS-DOS managed memory, except that the "selector" and "offset" were both 16-bit numbers under DOS. (Although "segment" was the more usual term for "selector".) The segment number was shifted left four places -- or put more simply but less graphically, multiplied by 16 -- and then added to the offset number, to give the whole or "flat" address:
      segment (in hex): 0001
      offset  (in hex): 0002
      segment is multiplied by 16 (shifted left 4 bits, or one hex digit):
      segment: 0001x
      offset:   0002
      ---------------
      total:   00012
      This allowed DOS to use 16-bit numbers to address 2^20 = 1 MB of memory, but since DOS reserved the upper 384 KB for the (remapped) BIOS and peripheral cards, programs were able to address at most 640 KB of memory; the parent's mention of "64 billion pages" is probably an allusion (increased several orders of magnitude) to this DOS limit.

      Of course, this was a kludge, pure and simple, required because DOS machines were 16-bit. Among other things, it allowed the same memory locations (all but the very top and bottom memory addresses) to be addressable by several different addresses, and discovering pointer aliasing required calculations that, by their very nature, couldn't be done wholly in the machine's (16-bit) registers.

      Consider: segment 4, offset 0 is 4 * 16 + 0 = 64,
      and segment 3, offset 16 is 3 * 16 + 16 = 64,
      and segment 2, offset 32 is 2 * 16 + 32 = 64
      and segment 1, offset 48 is 1 * 16 + 48 = 64
      and segment 0, offset 64 is 0 * 16 + 64 = 64:

      so all five segment:offset pairs are apparently different but actually point to the same memory location.
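The same arithmetic as a minimal Python sketch. The function names are made up for illustration, and wraparound at the 1 MB boundary is deliberately ignored to keep it short.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment shifted left 4 bits (times 16), plus offset.

    Wraparound at the 1 MB boundary is ignored in this sketch.
    """
    return (segment << 4) + offset


def segment_offset_pairs(address: int) -> list[tuple[int, int]]:
    """All 16-bit segment:offset pairs that resolve to the given linear address."""
    pairs = []
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


# The worked example: segment 0001, offset 0002 gives 00012 (hex).
print(hex(linear_address(0x0001, 0x0002)))

# The five pairs listed above all land on address 64.
for segment, offset in [(4, 0), (3, 16), (2, 32), (1, 48), (0, 64)]:
    print(f"{segment}:{offset} -> {linear_address(segment, offset)}")

# Low addresses have only a few aliases; most addresses have 4096 of them.
print(len(segment_offset_pairs(64)), len(segment_offset_pairs(0x12345)))
```

Comparing two far pointers for equality therefore means normalizing both to a linear address first, which needs more than 16 bits of arithmetic.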
    6. Re:Google is faltering by eet23 · · Score: 2, Insightful

      I'd rather know which one is page 0.

    7. Re:Google is faltering by imroy · · Score: 2, Informative
      ...the above scheme is how MS-DOS managed memory.

      <sarcasm>Wow, I didn't know DOS managed memory at such a low level!</sarcasm>

      s/DOS/the 8086/g;

      You're really referring to the horrible segmented memory layout used by the Intel 8086 processor and its later derivatives. I did all this shit years ago in university. Almost every lesson my fellow students and I (and the lecturer as well) would end up cursing Intel for their wacky processor design. Interestingly, Intel introduced a similar scheme in (IIRC) its Xeon processors to produce (IIRC) 36-bit addresses and access more than 4 gigabytes of physical memory on a 32-bit processor.

    8. Re:Google is faltering by NonSequor · · Score: 2, Informative

      The 36-bit addressing extension began with the Pentium Pro.
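To put the address widths mentioned in this sub-thread side by side, here is a small sketch that simply computes 2^bits bytes. The helper name is made up, and it says nothing about how the extension maps memory internally.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


for bits in (20, 32, 36):
    size = addressable_bytes(bits)
    print(f"{bits}-bit addresses: {size:,} bytes ({size / GIB:g} GiB)")
```

Four extra address bits multiply the reachable physical memory by 16, from 4 GiB to 64 GiB.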

      --
      My only political goal is to see to it that no political party achieves its goals.
  3. How does Google do it? by Talez · · Score: 5, Funny

    PigeonRank! Duhhhhhh

  4. Here by mfh · · Score: 4, Insightful

    > If truth is the first casualty of war, openness is the first casualty of going public.

    Maybe this is the reason after all, but I think it's more about Google being simple, smart and clean. They play fair (no browser interstitials, no sneaky crap, no registration necessary...etc); I would equate Google's victory thus far to a kind of no-nonsense attitude to business, always, no exceptions.

    --
    The dangers of knowledge trigger emotional distress in human beings.
    1. Re:Here by evilviper · · Score: 5, Insightful
      They play fair (no browser interstitials, no sneaky crap, no registration necessary...etc)

      And the fact that there are so many articles, from people that just can't understand why google is successful, just goes to show you how screwed we all are...

      Practically everyone in business is determined to be as evil as possible towards their customers (and employees) and assume that anybody doing anything else must be doing something wrong, no matter what all other indicators may say.

      For a great example, read The Wal-Mart Myth.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  5. They have built an amazing system using Linux... by Anonymous Coward · · Score: 2, Interesting

    Sure would be nice to see some of that amazing tech coming back into the community...

  6. Re:Google Problems by Motherfucking+Shit · · Score: 2, Funny
    If Google had chosen to go with a superior platform, they probably would have been able to go pubic already.
    Well, I suppose "Micro soft" isn't the superior platform for anyone's pubic ventures...
    --
    "BSD: Free as in speech. Linux: Free as in beer. Windows 10: Free as in herpes." --Man On Pink Corner in #52607549.
  7. Article didn't say much by krs-one · · Score: 4, Interesting

    I read the article and it didn't say much at all about how Google operated. Instead, it just said we don't know how they operate because they keep it secret. But maybe that was the point to begin with.

    -Vic

  8. Soon to be everything by WhitePanther5000 · · Score: 4, Interesting

    The only thing it's missing now (IMO) is spellcheck and an online translator, which I'm sure they're already planning. I'm also looking forward to Gmail being open to the public. After they conquer these 3 things, what's next.. Google ISP? Google National Army?

    1. Re:Soon to be everything by richard_za · · Score: 5, Informative

      Google already has spell check, and so does Gmail; have a look at the screenshots on my blog. I believe they're looking at releasing it to the public in six months' time; have a look at this article.

    2. Re:Soon to be everything by Anonymous Coward · · Score: 3, Informative

      The only thing it's missing now (IMO) is spellcheck and an online translator, which I'm sure they're already planning. I'm also looking forward to Gmail being open to the public. After they conquer these 3 things, what's next.. Google ISP? Google National Army?

      Google has had a builtin spellchecker forever and their translate tool is right here http://www.google.com/language_tools
    3. Re:Soon to be everything by evilmonkey_666 · · Score: 2, Informative

      Umm, is this a joke? They do have a spellchecker built into the search engine. I use it on a daily basis.

      And their online translator is here.

      --


      - PS. This is what part of the alphabet would look like if Q and R were eliminated.
  9. As a consultant by elinenbe · · Score: 5, Informative

    Having been a consultant at their data center a year or so back, I can attest that they had well over 50,000 machines. I am not sure about the 80GB drive per machine, because from what I understood they bought whatever drive was cheapest per MB at the time and would replace any dead ones with larger ones. Also, at any given time machines just die, and many of them are not replaced or repaired for months. Their cluster accounts for all this...

    --
    -eric
    1. Re:As a consultant by _Sharp'r_ · · Score: 5, Informative

      But also realize that the data center you were at isn't their only one. I know of at least 7 physical locations and there are probably more out there.

      But yeah, their racks of 4 servers/1U are pretty impressive when you see them lined up in row after row of racks. Their data centers have to bring in extra cooling because they are so densely packed.

      --
      The party of stupid and the party of evil get together and do something both stupid and evil, then call it bipartisan.
  10. Interesting by Motherfucking+Shit · · Score: 3, Interesting

    I lost a couple of sites from Google this month, presumably due to duplicate content; they were nearly verbatim clones of some of my other sites. The original sites are still there, the "clones" vanished from Google. As in, even if I search for those domains directly, I get nothing, where I used to get a cached copy of the sites. They've quite literally vanished from Google's database.

    Can you back up your assertions that Google's index is full? It's a rather interesting theory, and perhaps an explanation for all the tweaking they've done lately.

    --
    "BSD: Free as in speech. Linux: Free as in beer. Windows 10: Free as in herpes." --Man On Pink Corner in #52607549.
    1. Re:Interesting by ShaunC · · Score: 4, Informative

      Google is definitely cracking down on duplicate content. In fact, they've recently patented the concept.

      Insert software patent debate (where Google is the default hero due to its geek factor) here...

      --
      Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
    2. Re:Interesting by galaxy300 · · Score: 2, Insightful

      It's possible that their index is full. A more likely theory is that they don't really see the benefit of having content duplicated throughout the database.

      How many times have you run a search and seen a link at the bottom that says something like "Google removed information from this search that is redundant to information already displayed on the page" (Can't remember exactly what it says right now). Usually, there's nothing valuable in the hidden links - why index them at all?

    3. Re:Interesting by Psychic+Burrito · · Score: 4, Funny

      Google is cracking down on dupes? Oh no, Slashdot is doomed! :-)

  11. Two Thingies by BoldAC · · Score: 5, Interesting

    One -- Slashdot seems to be into content-directed ads now... as google was my ad for this story.

    Two -- If you want your pages indexed faster and more frequently, sign up and place a google adsense ad on your page. Many webmasters believe that google is having to index so many adsense pages... that it is difficult for google to add many more non-ad driven pages.

    Just sign up for adsense and run it a couple of weeks while you build your site. After google has spidered your site well, then just drop adsense.

    Good luck. I would love to hear any of your google-related tricks.

    AC

  12. Re:Openness is the first casualty of going public? by Anonymous Coward · · Score: 5, Insightful

    They will not have to disclose the number of machines, the OS, or anything related to the machines. Wall Street isn't buying their technology, they are buying their cash flow.

    If you do not believe me, buy a share of GE. Pick up the phone, call Investor Relations and ask them how many Unix computers they have and what OS and patch level they run.

  13. Re:Openness is the first casualty of going public? by nacturation · · Score: 5, Interesting

    I mean....surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers google has, what their specifications are, and what their current commercial strategy is.....surely?!

    Why would a shareholder care about server specifications? Investing is all about money. Read any quarterly report from a public company. Income statement, balance sheet, and cash flow are the primary interests on the numbers side as well as a general roadmap of where the company's heading. Warren Buffett doesn't care if each server has two 80 GB drives, or whether they have four 250 GB drives per server. The only thing that matters is that there are competent people to handle these kinds of "dirty details" that an investor doesn't give a rat's ass about.

    Take a look at the kinds of information you could expect from Google's quarterly reports.

    --
    Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
  14. the reason they keep their mouth shut by gevmage · · Score: 5, Funny
    It's quite possible the reason that they keep their mouth shut about their capabilities is to avoid having the NSA (or someone like them) come calling. After all, they basically have a distributed database of the entire net, which they index efficiently on a continuous basis. Who wants to bet that their system is better at gathering intelligence than any government agency in the world?

    On the other hand, here's the conspiracy theory version: what if Google IS the NSA? The IPO is a smokescreen to try to avert attention. The reason they can't show their true capability is that when the company goes public, only 20% of their hardware will actually go into the public company "Google", the rest of the hardware will still be hidden and a part of the NSA's system. :-)

    [For the humor impaired, I'm just joking, but it does make you wonder...]

    --
    Craig Steffen
    http://www.craigsteffen.net
  15. I've thought about this a bit by gtoomey · · Score: 3, Interesting
    The software/hardware architecture seems impressive.

    Putting on my computer scientist hat I would guess:
    - instead of backup, hold data in multiple places at once
    - use a "cascaded rsync" to trickle software changes to thousands of nodes
    - then load software via NFS at node bootup
    - use nodes just to store data; keep software in RAM for speed

    Just a few thoughts.
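A back-of-the-envelope sketch of why a "cascaded" push scales well. This is only an illustration of the guess above, not a description of what Google actually does: it assumes one seeded node and that, each round, every node that already has the update copies it to `fanout` nodes that don't. The function name and the 50,000-node figure (borrowed from the consultant's estimate earlier in the thread) are assumptions for the example.

```python
def rounds_to_reach(total_nodes: int, fanout: int = 1) -> int:
    """Rounds until every node has the update, starting from one seeded node.

    Each round, every node that already has the update pushes it to
    `fanout` nodes that do not have it yet.
    """
    have = 1
    rounds = 0
    while have < total_nodes:
        have += have * fanout
        rounds += 1
    return rounds


for fanout in (1, 4):
    print(f"fanout {fanout}: {rounds_to_reach(50_000, fanout)} rounds for 50,000 nodes")
```

The number of rounds grows with the logarithm of the cluster size, so even a modest fanout covers tens of thousands of nodes in a handful of rounds, where a single master pushing to each node in turn would need tens of thousands of transfers.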

    1. Re:I've thought about this a bit by kasperd · · Score: 2, Insightful

      instead of backup, hold data in multiple places at once
      Even better, instead of backup just crawl the pages again in the event of a lost disk. Of course some data needs to be in multiple places for performance reasons, but not all data are accessed frequently. How often do you think they will need the page with the lowest rank? (OK, I know there will probably be a lot with exactly the same rank, but you get the idea).

      load software via NFS at node bootup
      There are better protocols for this than NFS. But when you build a cluster this size, you surely want boxes that can netboot out of the box. Actually that means you will need to use DHCP and TFTP. Security of the DHCP and TFTP servers is going to be very critical.

      use nodes just to store data; keep software in RAM for speed
      I wouldn't worry about the speed. Linux is going to do fine. But since they probably netboot and download kernel and a ramdisk from a server, it is of course going to be kept in RAM. Now I wonder, does it all run off an initial ramdisk?

      --

      Do you care about the security of your wireless mouse?
  16. google instant messenger, or... by zogger · · Score: 4, Interesting

    GIMMEE would be nice. Well, nice for a while and if they didn't get weird with it. Don't know if that could happen though, nature of man and all that philosophical stuff. Goes along with the current VoIP articles. They would dominate the net then if they implemented that. I know I would pay cash to them to have a universal, works-great, any-OS VoIP and no-spam, no-commercial email service.

    So far we know they have just a cubic load of servers, the most on the planet most likely with one private company. The government probably has more, but it's a mish mash of them, not near as sleek or coordinated, AFAIK. What COULD be next with them, practical cheap 50 dollar thin clients that you could do a TON on, using distributed computing, from games to communication to running any business? With tech savvy like they got and their already established heavy hardware base and heavy commitment to R&D, they could just 'splode with an extra 25 billion in cash all of a sudden from an IPO. OR, the money could get to them and they become just another weird company that forgets its roots as "brains come first" and switches to "marketing crap comes first" like certain other unnamed megacorps do now.

    Interesting times

  17. How Google do that? by elpecek · · Score: 4, Informative

    For those who haven't read - there is an article written by Brin and Page - maybe a little outdated, but still interesting: The Anatomy of a Large-Scale Hypertextual Web Search Engine

    1. Re:How Google do that? by jvsanford · · Score: 3, Informative

      There is also a paper that describes their storage infrastructure (Google File System) here

  18. Supplemental Result by Richard5mith · · Score: 3, Interesting

    There is plenty of evidence to suggest that Google has run out of docids, hitting the 32-bit integer limit.

    The best evidence is doing a search which returns results which say "Supplemental Result" next to them. That'll be coming from a second document store I'd guess.

    1. Re:Supplemental Result by Webz · · Score: 5, Interesting

      That doesn't make any sense. A well-designed system is a transparent one, so Google would have no reason to let you know that they're running out of IDs.

      By the way, for supplemental result... By doing a quick keyword search on Google using my domain name, I'm led to believe that pages marked "Supplemental Result" are pages that look like search results. That is, they aren't filled with any real content, other than search results from other engines. Results that could "supplement" your "result" from Google.

    2. Re:Supplemental Result by Dave2+Wickham · · Score: 2, Interesting
      Supplemental results do come from a second store, yes:
      Hey, pages get added to the supplemental index using automatic algorithms. You can imagine a lot of useful criteria, including that we saw a url during the main crawl but didn't have a chance to crawl it when we first saw it.

      Think of this as icing on the cake. If there's an obscure search, we're willing to do extra work with this new experimental feature to turn up more results. The net outcome is more search results for people doing power searches.

      The above is from GoogleGuy in this thread on WebmasterWorld.

      (I think you may need to copy/paste the link, I'm not sure)
  19. Re:Openness is the first casualty of going public? by Blastercorps · · Score: 3, Insightful

    I disagree. An investor deserves to know at least general information about the goings on of a business. If I were a stock broker I would want to know that, say, FruitCompanyA uses insecticide whereas FruitCompanyB doesn't. I personally would choose FruitCompanyA, as a rise in the insect population would ruin FruitCompanyB.

    With google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, what softwares they use (compatibility issues).

    Honest reporting of operations lets an investor make an intelligent decision about their money and helps avoid boiler-room companies.

  20. Re:Openness is the first casualty of going public? by BigGerman · · Score: 4, Insightful

    Unfortunately the technology spending IS part of the cash flow. "We went dumpster-diving and picked up a dozen new machines for the indexing farm" and "we entered agreement with Dell to secure a reliable source of cheap Intel servers" would both show up on the shareholder statements but the impact would not be the same.
    Going public WILL expose a significant portion of Google technology, more so when it has to do with hardware.

  21. Re:Openness is the first casualty of going public? by Smidge204 · · Score: 4, Insightful

    The problem with that analogy is that what software they run has absolutely nothing to do with what they do to make money.

    With Google, their entire "business" - their means of generating cash flow - relies on sheer quantity of computing muscle and high performance software for their search databases. With GE, their business is making lightbulbs, dishwashers, hair dryers, electric motors and any of thousands of other products used in residential, commercial and industrial settings. How many Unix computers they have in all their offices around the world is a consequence of doing business, not their means of doing business.

    I'm sure if you asked the GE Investor Relations department something relevant about how their business operates, you might get somewhere.
    =Smidge=

  22. first casualty ?? by Sad+Loser · · Score: 4, Informative


    Recycling without attribution is the first casualty of bad journalism.

    I thought I had read this article before, and then I realised, I had read it before...
    (although I now realise that you are not supposed to read the linked articles before posting comments - sorry)

    --
    Humorous signatures are over-rated.
    1. Re:first casualty ?? by platypussrex · · Score: 4, Informative

      Not sure why you say that. If you read all the way through Naughton's article, he says that the calculations come from Garfinkel, he mentions Technology Review, and then later directly quotes Garfinkel. Sounds like attribution to me.

  23. Re:Additional questions by nacturation · · Score: 5, Funny

    Google search for the letter "a" resulted in 3,530,000,000 hits [search took 0.12 seconds].

    Neat. I wonder what doing a Google search would return for other letters:

    "c" -- 299,792,458 hits
    "e" -- 2.71828183 hits
    "h" -- 6.626068 × 10^-34 hits
    "i" -- sqrt(-1) hits
    "k" -- 1.3806503 × 10^-23 hits

    Looks like Google is definitely busted. They should fix these bugs.

    --
    Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
  24. Linux needs more patching? Does it? by MicklePickle · · Score: 2, Interesting

    much more frequent in Linux than in proprietary systems from Microsoft or Sun

    Huh? Does it!? Since when? I like these throw-away lines the media people dish out. What is their basis for this statement? Even when they see Linux obviously succeeding, they dish out a statement like this.

    I certainly don't have to patch my Linux boxes as frequently as my Windows boxes. Actually... no... wait, they're right! I only need to patch Windows once. Ctrl-Alt-Del -> Boot Debian CD.

    --
    -- main(s){printf(s="main(s){printf(s=%c%s%c,34,s,34) ;}",34,s,34);} $p='$p=%c%s%
    1. Re:Linux needs more patching? Does it? by Baumi · · Score: 2, Insightful

      Not sure if it needs more patching, but at least OSS patches come out in a timely manner after the discovery, whereas MS patches sometimes take ages to materialize. Thus, more patches don't necessarily mean more security holes - just better housekeeping.

      Baumi

  25. Re:The "searching xxx web pages" count by Anonymous Coward · · Score: 2, Interesting

    Searching for 'the' gives about 5,740,000,000 pages while they index 'only' 4,285,199,774 web pages... Anyone know why?

  26. Re:Openness is the first casualty of going public? by nacturation · · Score: 4, Informative

    With google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, what softwares they use (compatibility issues).

    I agree it would be nice to know. But if those are your conditions for investing in Google, I think Google would probably tell you to keep your money. I imagine Google's quarterly reports would probably say something like:

    "Our operation depends on having the ability to increase our server and bandwidth resources as we grow our services. Business may be adversely impacted should capacity be unavailable. Our servers are also at risk for viruses, worms, and DDoS attacks which could put the operation of those servers at risk and adversely affect business." etc...

    That would give you, as an investor, the information you need to determine whether those risks are worth your money. In all likelihood you'll just have to rely on the fact that they have an army of PhDs who are smarter than you and I put together and know their shit when it comes to security, databases, clustering, etc.

    Now I could be wrong. Perhaps Google is waiting for the IPO and will then detail their server infrastructure, wow Wall Street (and geeks worldwide) with their amazing capacity, and their stock will skyrocket on the first day of trading. I'd wager that Google's stock is going to have amazing gains anyway given that it's a bit of an industry darling. Other tech companies which have been thinking of going public would be wise to time their IPO very shortly after Google's and ride the wave.

    --
    Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
  27. Re:Openness is the first casualty of going public? by Vlad_the_Inhaler · · Score: 2, Insightful

    Do you know how many servers IBM have? Akamai? Microsoft?

    Be reasonable.

    Financial information is important, their business plan is important, it is probably important to know that they are running Linux so that SCO-type problems can be factored in. The sort of fine technical details the Observer goes into are totally irrelevant, just an incidental business expense. We know that it all works and that Google are on top of what they do. That is what matters.

    --
    Mielipiteet omiani - Opinions personal, facts suspect.
  28. Tinfoil Hats by mfh · · Score: 4, Informative

    > 1) Why are their terms of service / Privacy Policy so vague?

    This is to keep it simple. Exacting legal language is the path to screwing people. Vague terms of service are good because both sides can wiggle. Has anyone been sued because of these terms of service? I'd like to see some refs to that, but I'm guessing it's just to protect the general public from a-holes who would exploit Google.

    > 2) Why does their cookie stay until the year 2038?

    Not to be funny, but someone at Google likely knows when the end of the world is coming and has set the cookie to reflect this. Seriously, who cares how long cookies stay alive for? You can block them if you like, but I think it's really just to keep Google more effective.
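A less apocalyptic reading, offered as an assumption rather than anything the thread confirms: 2038 is about as far in the future as a signed 32-bit Unix timestamp can represent, so "expires in 2038" is a common way of saying "effectively never expires". A quick sketch of where that limit falls:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit integer can hold, read as seconds since the
# Unix epoch (1970-01-01 00:00:00 UTC).
MAX_INT32 = 2 ** 31 - 1

last_moment = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(last_moment.isoformat())
```

Any expiry date in January 2038 is therefore a strong hint that someone simply picked a value near the largest timestamp their system could store.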

    > 3) Why does their Google search bar report information and auto-update without permission?

    I'm against Spyware, so I don't run it, but Google tracks searches anyway, so what's the point of getting upset about it? These technologies make Google more user-friendly. Google doesn't have loads of popups trying to get you to install the bar -- it's not right in your face. People who want it likely don't care if it auto-updates because then they have the most recent version of it.

    --
    The dangers of knowledge trigger emotional distress in human beings.
  29. Re:How does Google do it? by theRG · · Score: 2, Funny

    My favorite Google features:

    http://labs.google.com/
    http://www.google.com/intl/xx-klingon/
    http://www.google.com/intl/xx-elmer/

  30. Public paper on Google File System by MarkWatson · · Score: 4, Informative
    Here is a PDF file of the paper.


    If that link gets slashdotted, here is another link to a PDF PowerPoint presentation.


    Good read! This paper (with the discussion of the goodness/fastness of file appends) made me more interested in Prevalence - so much so that I am using it for my new project.

    -Mark

    1. Re:Public paper on Google File System by svr0002 · · Score: 4, Informative
      and another good one - http://www.computer.org/micro/mi2003/m2022.pdf

      Interesting that a major problem for Google is managing power and cooling !

  31. The Google Might Be Falling by aluminumcube · · Score: 2, Interesting

    I think this is the wrong question investors need to be asking about Google before they IPO. Sure, it makes for some great geek gab; the fetishistic wonderment of just how many servers Google is running, how many hits they get and how exactly they manage to, well, manage that many servers. In the end though, answering those questions doesn't tell us anything about what Google is actually selling.

    The more and more I look at it, the more and more I fear Google is nothing more than a very well calculated shell game; the Enron of technology IPOs...

    Pretty much everyone who uses the internet loves Google and we do so for a combination of three compelling reasons. First off, Google offers up what is basically the best search engine on the internet. It isn't perfect, it doesn't work all the time but it is the best thing out there right now. Second, they offer this high-quality search service without all the excess bullshit that got tacked onto all of the other search engines on the market in the .com heyday. While Yahoo was busy playing in Hollywood and becoming a "Portal" and Alta Vista was going down the tubes, Google's simple, whimsical, easy to use front page didn't get gaudy by trying to make us sign up for accounts or any of the other marketing department crap. Finally, Google has a high Willy Wonka factor, sort of like Apple. We don't hear much from the company in the way of press releases or other information, but every so often, they open the doors and it turns out the PhD Oompa Loompas there developed something totally cool. Local search, Froogle, Gmail and Orkut are examples of this...

    The thing that gives me the heebie-jeebies about Google is how they make all of this look so effortless. Orkut just sort of popped out of the blue one day. Gmail appeared on April 1 with such an "effortless" air about it all that Google didn't even bother to take the press release seriously. We keep hearing these cryptic references from the company about some overwhelmingly massive amount of computing power they have and how their cabal of PhDs has it humming along with levels of efficiency that are a world beyond most everything else out there.

    All of this has made for a very pumped-up environment for an IPO, but we have yet to get an answer to the question "What is Google's business model?" I "google" words all day. I have an Orkut account that I use. I even use Google as a quick and dirty calculator. When it opens up, I will have a couple of Gmail accounts. The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?

    Sure, we can say that Google has integrated advertising within the search results, but the advertising model has always proven to be of dubious effectiveness at best. Google has an enterprise search division, but the cost of their Google Appliance is a pittance compared to the sort of money big-time enterprise software companies like Oracle and SAP are making. How can they survive on that revenue stream and pay the bandwidth bills for all of the free services they offer to the public?

    We always tend to answer these questions with an "I don't know, but Google must be doing something right." Google works very hard to keep fueling the idea that they are doing something paradigm-shifting with all of those PhDs they have on the payroll, and how many servers they have, and how they can just sort of effortlessly announce 1 GB free email accounts. We keep coming away with the impression that these guys must have something HUGE up their sleeves, and they have us salivating for the IPO so we too can be part of it.

    Very soon, Google executives are going to pile onto a Gulfstream V and do a roadshow for big-time investment houses and institutional investors, and they are going to be trying to convince these guys to buy into the Google IPO. They are going to be asked exactly what sort of business model Google is going to be pushing, and one of two things is going to happen:

    - Google will c

    1. Re:The Google Might Be Falling by laura20 · · Score: 5, Insightful

      The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?

      Um, you do realize that Google already makes a profit, don't you? I daresay the IPO will puff the value of the company up beyond the rational amount, but that's not 'Enron' -- if you are going to use buzzwords, use the right ones. Enron was a case of internal actors in the company using financial games to siphon off profits and inflate the value of the company on the books. You accusing Google of financial fraud? If you are going to use a buzzword, use 'Yahoo' or something -- a solid company that got its stock price puffed up excessively due to investor mania.

      How the hell did this get moderated up, except as 'Funny'?

    2. Re:The Google Might Be Falling by _Sprocket_ · · Score: 3, Informative


      The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?


      1) Google has an effective advertisement system

      2) My last two employers bought Google boxes for their intranet
    3. Re:The Google Might Be Falling by Lord_Dweomer · · Score: 3, Insightful
      "Sure, we can say that Google has integrated advertising within the search results, but the advertising model has always proven to be of dubious effectiveness at best."

      Correction, the ad model has proven to be of dubious effectiveness with companies that have no credibility.

      Google is perhaps the most trusted company on the net today, and with the traffic they get, I'm not surprised at all that they can support all their financial needs with ad revenue, especially with some of the big bucks that large companies dump into advertising with Google. I challenge you to show evidence showing that their advertising business model cannot support their costs, because so far you've done nothing but toss up tin-foil hat ideas without any proof to back it up, and as someone else so kindly pointed out to you, Google is ALREADY in the black.

      --
      Buy Steampunk Clothing Online!
  32. How do they do it? Two words by JoeBaldwin · · Score: 2, Funny

    Underpants gnomes.

  33. Re:Why Verbatim Clones??WAS:Interesting by reanjr · · Score: 2, Informative

    I don't know why he has numerous identical sites, but one reason is when a small company purchases several other companies that are in the exact same market. Since the companies are compatible, you merge all their operations into one. But you still want to keep brand identification with your customers so you keep two copies of the site, each branded differently.

  34. You may also find this interesting... by lunar_legacy · · Score: 5, Informative

    Another wonderful speculation about Google's infrastructure, which you can find here.

  35. Leprechauns! by penginkun · · Score: 2, Funny

    I mean, how else could they do it?

  36. Re:They have built an amazing system using Linux.. by B1ackDragon · · Score: 3, Insightful

    As far as I can tell there is no better way for that hardware to have come "back into the community."

    The service is free, and they're really good at what they do. I would say I'd be lost without google on the internet, but really this compliment goes for lots of search engines - I'm really very grateful this sort of service still exists for free (well, with ads.)

    Unless you want to talk about cures for diseases through protein folding simulations, I can't think of a better way for this hardware to be used, such that it begets a greater net benefit.

    --
    The snow doesn't give a soft white damn whom it touches. -- ee cummings
  37. Re:Openness is the first casualty of going public? by Anonymous Coward · · Score: 2, Interesting

    Akamai?

    "When I visited the company in January, the screen said that Akamai was serving 591,763 hits per second, with 14,372 CPUs online, 14,563 gigahertz of total processing power, and 650 terabytes of total storage. On April 14 [2004], the number had jumped to a peak rate of 900,000 hits per second and 43.71 billion requests delivered in a 24-hour period."

    From this article.
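
    As a rough sanity check on the figures quoted above, here is a small sketch that turns the 24-hour request total into an average rate and compares it with the quoted peak. It is only arithmetic on the numbers in the quote; the variable and function names are made up for illustration, and it assumes "hits" and "requests" count the same thing, which the quote does not actually say.

```python
# Figures taken from the quoted passage; all names here are illustrative.
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = 43.71e9    # "43.71 billion requests delivered in a 24-hour period"
peak_hits_per_sec = 900_000   # "peak rate of 900,000 hits per second"
cpus_online = 14_372          # January figures
total_gigahertz = 14_563


def average_rate(total_requests: float, seconds: int = SECONDS_PER_DAY) -> float:
    """Average requests per second over the given period."""
    return total_requests / seconds


avg = average_rate(requests_per_day)
peak_to_average = peak_hits_per_sec / avg
ghz_per_cpu = total_gigahertz / cpus_online

print(f"average rate:       {avg:,.0f} requests/s")
print(f"peak / average:     {peak_to_average:.2f}")
print(f"mean clock per CPU: {ghz_per_cpu:.2f} GHz")
```

    Under that assumption the average works out to roughly half a million requests per second, so the quoted peak is less than twice the daily average.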

  38. "serves up the answers to our questions"??? by tsadi · · Score: 2, Insightful
    The Observer, serves up the answers to our questions.

    the article never answered any of our questions - heck, i even looked for a "Page 2" link after reading the entire thing, sadly, the article ended w/o even attempting to answer its own questions.

  39. Re:Openness is the first casualty of going public? by espo812 · · Score: 2, Interesting
    Why would a shareholder care about server specifications? Investing is all about money.
    I, for one, would. Now, unfortunately I don't have enough money to start investing on Wall Street, but hopefully that will change soon. So, why would I want to know technical details for a company? Obviously, because I'm a geek. But someone has to track this kind of stuff to produce a stock report. You can't have a company saying "We bought an IBM X Server and it now balances our accounts and brokers international deals for us - so our $10,000 server produces $10 million in revenue." I'd like to know I was making a good investment, instead of one based on snake oil.

    No, they have to have people who understand technical details to be able to produce legitimate forecasts of output. I'm sure there are people who analyze how many workers and robots Ford has to estimate how many cars they can produce, right? So the equivalent is how many coders and systems Google has, no?

    Well if they don't, big brokerage houses can reply and I will consider the most lucrative offer.
    --

    espo
  40. One word. by Viceice · · Score: 3, Informative

    robots.txt

    The Google bot respects it, so if you're up to no good, it's easy to get Google to not index your page.

    Anyway, I'd like to see a version of Google that didn't respect robots.txt. You used to be able to dig up a lot of information on people on Google before a lot of sites started to use robots.txt.
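
    For anyone who hasn't looked at how this works: a well-behaved crawler reads the site's robots.txt and checks each URL against it before fetching. A minimal sketch using Python's standard `urllib.robotparser` follows. The rules, the `example.com` URLs and the `allowed` helper are made up for illustration, and this says nothing about how Google's own crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


for url in ("http://example.com/index.html",
            "http://example.com/private/diary.html"):
    verdict = "fetch" if allowed(EXAMPLE_ROBOTS_TXT, "Googlebot", url) else "skip"
    print(f"{verdict}: {url}")
```

    Note that nothing enforces this: the file is only a request, which is why a crawler that ignored it could still dig up everything.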

    --
    Sometimes I wish I was a plumber, then I'd know how to deal with other people's shit.
  41. Re:Google can't do it: phrase searches by RenaissanceGeek · · Score: 5, Insightful
    I performed the Google search for the phrase

    "To be or not to be"

    and I honestly can't see what you are going on about: of the first ten results, eight highlighted the phrase in the page synopsis, one used the phrase as a domain name, and one included the partial phrase "...Or Not To Be."

    Note the ellipsis on that last one: it alludes to a larger portion of text preceding the printed portion. And the domain name was found even though the spaces were omitted.

    Those aren't irregular results: those are highly intelligent results.

    Just because they aren't deterministic enough for you to plug them into a piece of code of your own construction (without compensating Google) doesn't mean that they don't fulfill the purpose of the web search.

    --
    What is the difference between a small revolutionary change and a large evolutionary change?
  42. just read almost everything on google-watch.org by tsadi · · Score: 2, Funny

    i say google-watch.org is as credible a site as this one: www.realultimatepower.net - go ahead, click the link - it's a hilarious site

  43. Yes. by Ayanami+Rei · · Score: 2, Informative

    very simple example of 15 servers in 3U. Many vendors are also offering a "dual dual" system in 1U... that is, two dual-CPU motherboards that fit in one case.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  44. Re:Openness is the first casualty of going public? by chipset · · Score: 3, Insightful

    The original analogy is a little off. However, if you look at eBay, do they disclose how many systems they are running? How about Amazon? Do I care?

    The real fact of the matter is, they have custom software that they run. The number of systems, speed, memory and OSs are simply a byproduct of what they really offer: a service.

    Google is no different. They offer a service. As long as they are profitable, as an investor, I couldn't care less if the systems were running on Dells, white boxes, Macs, or Commodore 64s. They have found a way to make the business run on the systems they have.

  45. Google full? Or just tweaking the algorithm? by Saeed+al-Sahaf · · Score: 2, Interesting
    Google has recently removed tens of thousands of "duplicate content" sites from its index - where "duplicate content" is as simple as being an affiliate site (e.g. Amazon) and having the same textual item descriptions as many other sites.

    Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.

    It's possible that the index is full, but I would imagine that they would have seen this coming long ago, as it "filled up", and taken measures. What's more likely behind the elimination of duplicate pages is that more and more people have been complaining about the relevance of the search results and about how site owners have been taking advantage of certain known flaws in the Google algorithm. So, they are taking steps to fix the algorithm and kill off all the fake sites.

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  46. they don't have to patch and update very often by tmalsburg · · Score: 3, Insightful
    For example, how do you implement security patches and operating-system upgrades (much more frequent in Linux than in proprietary systems from Microsoft or Sun)

    Come on, the nodes in their clusters are not desktop computers with office software on them.

    The systems running on these machines are very stripped down: they only need very few applications and a very simple kernel (not many device drivers, maybe no graphics card driver, ...).

    Furthermore, there are no local users on the machines, so many security flaws won't affect their integrity. And remote holes in the kernel do not occur very often.

    And above all, these cluster nodes are certainly shielded by some sort of firewall. Therefore they don't have to take care of network security themselves.

    All in all: I believe that you need to update such machines rather infrequently. At least not for security reasons.

    Titus

  47. IPO signals more World Poker Tour participants by mabu · · Score: 2, Interesting

    I can understand how in some cases an IPO can help generate revenue necessary to operate and break into new markets, but does this apply to Google? I really don't think so. They have market share; they have resources. Any infusion of funds to the company is more likely to give them the ability to further diversify and enter different markets, which history has shown is, more often than not, a bad business idea.

    So one has to assume the IPO is the first phase of the principals "cashing out". The press will probably signal this as a sign of the next dot com boom, and a bunch of nerds within the company will suddenly become millionaires, and subsequently quit their job and open up a Bed & Breakfast in some obscure town or join the World Poker Tour. There goes the talent.

  48. Re:Openness is the first casualty of going public? by krosk · · Score: 2, Interesting
    Not necessarily. You can easily fudge this information. You don't have to relate any specific information; in fact, Google could quite easily just say "$10,000 for capital investments" and everything would conform perfectly to GAAP (Generally Accepted Accounting Principles). Capital investments could be anything from computer servers, a new piece of land, a new building, or pens/pencils for all we would know. Google, through its financial statements, doesn't have to say exactly what they spent their cash on, just what category it fits in (Operating, Investing, and Financing).

  49. Re:Openness is the first casualty of going public? by AhBeeDoi · · Score: 3, Funny
    With google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, what softwares they use (compatibility issues).
    Not to mention source code for custom applications, maintenance schedules, software upgrade schedules, standard permissions settings, root passwords, type and model of CPU cooling fans used, average uptimes and other relevant information which all prudent investors need.
  50. Why 4.285 billion? by NotQuiteReal · · Score: 4, Interesting
    Just because the front page says "©2004 Google - Searching 4,285,199,774 web pages " doesn't mean you have to believe them. Maybe it is understated. For example, I just did a search on "the" and got:

    Results 1 - 10 of about 5,750,000,000 for the [definition]. (0.11 seconds)

    Doesn't that imply more than 4.285 billion?
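
    To put the numbers side by side, here is a quick sketch comparing the front-page count, the result estimate above, and the 2^32 ceiling that others in this thread have speculated about. The constant names are made up for illustration, and the comparison shows nothing about how Google actually stores its counts.

```python
# Illustrative comparison of the numbers quoted in this thread.
UINT32_LIMIT = 2 ** 32                 # number of values a 32-bit unsigned integer can hold
FRONT_PAGE_COUNT = 4_285_199_774       # "Searching 4,285,199,774 web pages"
THE_RESULT_ESTIMATE = 5_750_000_000    # "about 5,750,000,000" results for "the"

headroom = UINT32_LIMIT - FRONT_PAGE_COUNT
fill_ratio = FRONT_PAGE_COUNT / UINT32_LIMIT

print(f"2^32:                  {UINT32_LIMIT:,}")
print(f"front page count:      {FRONT_PAGE_COUNT:,} ({fill_ratio:.2%} of 2^32)")
print(f"headroom below 2^32:   {headroom:,}")
print(f"estimate exceeds 2^32: {THE_RESULT_ESTIMATE > UINT32_LIMIT}")
```

    The front-page figure sits just under 2^32, while the result estimate for "the" is well above it, so at least that estimate is not a 32-bit page count.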

    --
    This issue is a bit more complicated than you think.
  51. Doing half as well as Google by alien_tracking_devic · · Score: 4, Funny
    from the article:

    "Google manages to achieve this with sophisticated techniques for rippling changes through the cluster, yet achieves 100 per cent uptime. This is serious stuff, and there are a lot of IT managers out there who would give their eye-teeth to be able to do it half as well."

    Sigh...as an IT manager I can only dream of 50% uptime. Damn you, Google!

  52. This will be the beginning of the end for Google by rpsoucy · · Score: 2, Interesting

    Wall Street should be seen for what it is: a plague upon American businesses and innovation.

    You get your initial investment, which seems great, but then you sell your soul. You will be forced to "cut the fat" and "yield higher short-term profits" and all the research projects that make tech companies great will vanish.

    This has happened with almost every great American tech company. How often do we see the type of research that came out of Bell Labs today? We don't; instead we see former researchers that were once considered the "crème de la crème" of computer scientists out looking for work (most taking up teaching positions at universities).

    Along with the pressure of Wall Street, Microsoft will be releasing their direct competitor to Google soon and they will be pushing hard for industry domination.

    Wall Street is the reason that our tech jobs are going to India; Wall Street is the reason that America is slowly becoming less and less of the technological superpower that it used to be.

    IMHO, Google should stay out of Wall Street and keep doing what it has been doing.

    Then again, there are plenty of examples of companies that had a lot of hype for an IPO and are still strong and innovating today, VA Linux Systems for example, oh, I mean VA Software, and their one product that is slowly being made obsolete by Free and Open Source alternatives.

  53. Re:Google started to make me mad by XO · · Score: 2, Informative

    Chill out, brother.

    Try clicking in the address entry bar on Safari, and typing in "www.lycos.com", or whatever other search engine you would like to use.

    Just because the menu bar's search function pulls up google, doesn't mean you have to use it. Or did using a Mac for this long rot your brain to the point where you can only do things either the Mac way or the Extremely Difficult way?

    --
    "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
  54. annoying ads by 602 · · Score: 2, Funny

    It's a good article, but the page as a whole is annoying, due to several animated ads. I won't put up with that shit. I copied the text to my word processor for reading.

  55. Re:Openness is the first casualty of going public? by Lord_Dweomer · · Score: 2, Interesting
    "With Google, their entire "business" - their means of generating cash flow - relies on sheer quantity of computing muscle and high performance software for their search databases."

    Actually, their means of generating cash flow relies on how beneficial advertisers feel it is to advertise on Google.

    --
    Buy Steampunk Clothing Online!