Slashdot Mirror


Why Do Google Hit Numbers Vary?

Supa-Fly writes "I have a question about some conflicting results with the search engine Google. I did a search for "pictures of mountains" and got exactly 1 million results. My friend did the same search (from the same office) and got 1,010,000 results. A second friend did the same search as the first two of us and got 1,020,000. These numbers have not changed, and each person gets the same result every time. My question is: what is up with the discrepancies in Google's search results?" Since this question is hard to answer from the outside, Craig Silverstein of Google kindly supplies his best answer to this question, below.

Craig writes: "Thanks for the great question. We get this from time to time and hopefully I can clear up some of the confusion. The number of estimated pages listed at the top right of a Google search results page is indeed an estimate. It's a good estimate, but still an estimate.

There are many reasons why one might see a difference in the estimated number of pages returned for the same query. It's most likely the queries made by your co-workers were sent to different Google datacenters in what appears to have been a round-robin fashion. The index at any given Google datacenter can change slightly over the course of a day (each index is refreshed completely every three to four weeks). Depending on which datacenter finishes a query, the estimated number of results may vary.

Without having direct access to your environment, it is hard for me to tell for sure; however, I believe this is the case."
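The round-robin explanation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Google actually routes queries: the datacenter names (`dc-a`, `dc-b`, `dc-c`) and the helper functions are hypothetical, and the per-datacenter estimates simply reuse the three counts reported in the question.

```python
from itertools import cycle

# Hypothetical per-datacenter estimates for the same query. The three
# values are the counts reported in the question; the names are made up.
DATACENTER_ESTIMATES = {
    "dc-a": 1_000_000,
    "dc-b": 1_010_000,
    "dc-c": 1_020_000,
}


def make_round_robin(datacenters):
    """Return a function that hands out datacenters in rotating order."""
    rotation = cycle(datacenters)
    return lambda: next(rotation)


def estimated_hits(datacenter):
    """Look up the estimate that the chosen datacenter would report."""
    return DATACENTER_ESTIMATES[datacenter]


pick = make_round_robin(list(DATACENTER_ESTIMATES))

# Four consecutive queries: the fourth wraps around to the first datacenter.
results = [estimated_hits(pick()) for _ in range(4)]
print(results)  # [1000000, 1010000, 1020000, 1000000]
```

Each datacenter answers consistently on its own; the variation comes only from which one happens to serve a given query.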

8 of 362 comments

  1. uh... by caino59 · · Score: 1, Insightful

    who cares....as long as it works...chances are you don't go past the first 2 or 3 pages.....

  2. Amazing! by PeterClark · · Score: 5, Insightful

    An "Ask Slashdot" that actually went to the source for the answer first, without the usually bad/wrong/pointless pontificating that normally goes along with it. How long can such a good thing last, I wonder.
    :Peter

  3. How do you "estimate" database count results ? by Anonymous Coward · · Score: 1, Insightful


    Surely the figure should be the exact number of results and not "estimated", as either those entries exist in the database or they do not. Isn't it trivial just to display the database result count as an exact figure? How can you "estimate" a database count?

  4. Re:Distributed database? by Anonymous Coward · · Score: 1, Insightful

    Yeah but one would tend to think that the indices for the results are completely replicated everywhere - in which case you would think that the number of results should be consistent.

  5. Re:Some different results by Anonymous Coward · · Score: 1, Insightful

    If wordProximityIndex is calculated before calling removeCommonWords, then the subsequent hits=lookup(keywords[],wordProximityIndex)
    will return more hits the more ORs there are.

  6. Re:I have to wonder... by RedWizzard · · Score: 4, Insightful

    I get a list of 7 pages, and then after getting to page 5, there are only 6 pages.
    I believe that what's happening there is that as you move through the pages of results, Google realises that some of the later results are similar to some of the earlier results and omits them. You can get them back by clicking on the link at the end of the last page.

  7. Re:number oddities by Forgotten · · Score: 5, Insightful

    I nearly always use double quotes to search for phrases. It works extremely well with Google. You can also combine multiple phrases, and unquoted terms as well.

    In fact, I'm surprised no one else mentioned that searching for "pictures of mountains" (quotes included) yields 1,320 hits, which are likely to be much more useful than the other 998,690 or so. Though in this case I really would have searched for "pictures of mountains" OR "mountain pictures" (or done two searches).

    If you're not going to use the quotes, there's precious little point including the word "of" in the query.

    There are other useful tricks for the Google search field listed on the help page, but double quotes are by far the most useful overall.

    (another handy trick if you're using Mac IE is to hack the app's resource fork so the '?' address bar shortcut goes to google instead of MSN - a trick expanded on in iCab's built in URL expansion)
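The quoted-phrase query suggested in the comment above can be built programmatically. The following is a minimal sketch, assuming the query is passed in the `q` parameter of a Google search URL; the helper names `phrase_query` and `search_url` are hypothetical and exist only for this illustration.

```python
from urllib.parse import quote_plus


def phrase_query(*phrases):
    """Wrap each phrase in double quotes and join the phrases with OR."""
    return " OR ".join(f'"{phrase}"' for phrase in phrases)


def search_url(query):
    """Build a search URL with the query URL-encoded in the q parameter."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = phrase_query("pictures of mountains", "mountain pictures")
print(query)  # "pictures of mountains" OR "mountain pictures"
print(search_url(query))
```

The quotes have to survive URL encoding (as `%22`) for the phrase match to apply, which is why the query is encoded as a whole rather than term by term.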

  8. Re:Why didn't the Google guy mention this? by [magus] · · Score: 2, Insightful

    It may not be that he overlooked the possibility. If Google does any kind of load balancing (even through round-robin DNS), you can often set IP affinity so that once a client makes a connection, it will almost always be sent to the same server. IP affinity is often used in web farm environments where you maintain a small amount of reconstructable state on each server, and it's less expensive to keep having the same client visit the same server while other clients are directed to (and gain an affinity for) other servers.
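    One common way to get the IP affinity described above is to hash the client's address and use the hash to pick a server. The sketch below assumes that approach; the server names and the `server_for` helper are hypothetical, and real load balancers may use connection tables or other schemes instead.

```python
import hashlib

# Hypothetical pool of servers behind the load balancer.
SERVERS = ["server-0", "server-1", "server-2"]


def server_for(client_ip, servers=SERVERS):
    """Map a client IP to a server so the same IP always gets the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then wrap it
    # around the size of the pool.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Repeated requests from one address land on one server; a different
# address may land on another.
for ip in ["192.0.2.10", "192.0.2.10", "198.51.100.7"]:
    print(ip, "->", server_for(ip))
```

    With this kind of mapping, each client keeps seeing the same server (and so the same estimate), while different clients can still see different ones.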