Why Do Google Hit Numbers Vary?
Craig writes: "Thanks for the great question. We get this from time to time and hopefully I can clear up some of the confusion. The number of estimated pages listed to the top right of a Google search results page is, indeed, an estimate. It's a good estimate, but still an estimate.
There are many reasons why one might see a difference in the estimated number of pages returned for the same query. It's most likely the queries made by your co-workers were sent to different Google datacenters in what appears to have been a round-robin fashion. The index at any given Google datacenter can change slightly over the course of a day (each index is refreshed completely every three to four weeks). Depending on which datacenter finishes a query, the estimated number of results may vary.
Without having direct access to your environment it is hard for me to tell for sure; however, I believe this is the case."
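To make the round-robin explanation concrete, here is a minimal sketch in Python. It is a toy model, not Google's actual setup: the datacenter names and the per-datacenter estimates are made up to mirror the 1,000,000 / 1,010,000 / 1,020,000 spread from the original question.

```python
from itertools import cycle

# Hypothetical per-datacenter estimates; names and numbers are invented
# to mirror the spread reported in the original question.
DATACENTER_ESTIMATES = {
    "dc-a": 1_000_000,
    "dc-b": 1_010_000,
    "dc-c": 1_020_000,
}


def make_round_robin(estimates):
    """Return a search function that answers each query from the next datacenter in turn."""
    rotation = cycle(estimates.items())

    def search(query):
        # The query text is ignored in this toy model; only the rotation matters.
        name, estimate = next(rotation)
        return name, estimate

    return search


search = make_round_robin(DATACENTER_ESTIMATES)
results = [search("pictures of mountains") for _ in range(4)]
for name, estimate in results:
    print(f"{name}: about {estimate:,} results")
```

Three co-workers sending the same query one after another would each land on a different index in this model, and the fourth query wraps back around to the first datacenter.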
who cares....as long as it works...chances are you don't go past the first 2 or 3 pages.....
Several weeks back I happened to mention a very nice new restaurant in Toronto on one of my pages, and within days shot to the #2 position on Google when searching for several variants of this restaurant's name. I knew this by the fact that suddenly I was seeing close to a hundred hits per day from people looking for this restaurant. Note that this restaurant has such a unique name that there are only around 5 pages of links in all anyway. Then suddenly the hits entirely stopped, and a search on Google found my page was purged from the database: despite it being a unique name with few hits, it no longer even registered. A week later suddenly it was back in the #2 spot again.
No idea why this happened, but it is entertaining to see it vary.
I got 1,030,000 hits.
;oP
I'm better than you!
nah nah poo poo
An "Ask Slashdot" that actually went to the source for the answer first, without the usually bad/wrong/pointless pontificating that normally goes along with it. How long can such a good thing last, I wonder.
:Peter
What's really odd is searching for a few words with OR, and noticing that adding words actually lowers the number of results obtained.
Results 1 - 10 of about 1,010,000. Search took 0.04 seconds
The numbers seem to be consistent, I guess. Kind of cool to have a little insight into how Google works, IMHO.
this and this
## W.Finlay McWalter ## http://www.mcwalter.org ##
Always Picking Favorites :( . But what a lucky friend getting to look at a million and twenty thousand compared to his friend's measly million.
But anyways, I sure hope this doesn't seriously degrade anyone's view of google....that would be kinda sad
It's too bad Google doesn't have one of those things where you can watch everyone's search scrolling down the screen live. I bet there would be a lot of "pictures of mountains" searches right about now.
I think some engine had that (metacrawler?) back in the day, was fun to watch, and I believe they didn't censor it.
If the database is distributed, results might be coming from different servers. After a certain point (so many millis?), the results are returned. This could result in the difference.
About a month ago, someone posted this story over on K5 regarding the google dance. Good to see it's run by a marketing site, I couldn't think of anyone who might have more of an interest in rankings than those bastards. :P
No wonder I couldn't find the website I was looking for! It was in those missing 10,000 websites. If I had only gotten those and checked through them as thoroughly as I checked the other 1,010,000 then I would have certainly found it.
Humor aside, this is pretty interesting. A lot like when you vote in a poll, go back to the main /. page and the poll from last week appears. You'd think the Uber Midgets and Stealth Ninjas could get it right ;-)
Posting as directed.
like snowflakes falling
google queries melt upon
different servers
like the wild flowers
each view of the database
unique, yet alike
and...
its that time of month
google dances, results wiggle
w00t first haiku post
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Surely the figure should be the exact number of results and not "estimated", as either those entries exist in the database or they do not. Isn't it trivial just to display the database results count as an exact figure? How can you "estimate" a database count?
Oh well...Google is still beating my photo album...I searched for pictures of mountains, and only found 3. And two of those are debatable. I'd classify one as a photo of an airplane, with a mountain on the background, and the other wasn't a mountain at all...it was a hill.
Warning: Opinions known to be heavily biased.
...and got 40,000 more search results (10,010,000 to 10,050,000). "Of" isn't included in the original search anyway, so I wonder why removing it yields a different estimate.
It's simple, really... mountains are the new thing in pornography. People are snapping and posting so many pictures of naughty, erotically shaped rock formations that the number of mountain pics available worldwide on the net is rising by about 10,000 every 10 minutes.
Soon, the number of phallic granite pics worldwide will even exceed the number of Jenna Jameson facials. Quite the phenomenon, really.
Finally, proof that all Ask Slashdot questions could be more quickly answered by simply checking with Google :)
314-15-9265
--naked
Very popular slashdot journal for adul
Results have been inconsistent ever since they let those damn pigeons unionize. He's obviously covering for the union.
Hey, maybe it would be possible to modify this technology so that whenever anyone from the RIAA or MPAA did a search for MP3 or MPG, all they'd get is Whitehouse.com
"pictures of mountains" 986,000
"pictures of of mountains" 1,010,000
"pictures of of of mountains" 1,020,000
Two of these pages had a different top-ranked link.
Funny thing, all three times Google told me "of is a very common word and was not included in my search", but it made a difference!
Regardless of these results, Google is the best search engine. Period.
Is this the same reason that when I search, I get a list of 7 pages, and then after getting to page 5, there are only 6 pages? I would think that they would set a cookie saying which server they are gathering their data from for each search, though...
:(
It is kind of aggravating to be expecting 7 pages and get only 6. I always think that the mystical disappearing page contains my wanted result, though.
Remember cookies are kept in all browsers.
When a search engine finds those relating to their advertisers or 'favored sites' those are 'extended' into a higher level.
Some results may be discarded if certain advertisers don't like you to see those competing sites.
Uhhh, no. A domain can only see their OWN cookies.
bevis: Huh-huh. They said *mountains*. Huh-huh.
*smack*
butt_head: They are slashdot. They make such references to screw up the google database, thus completely validating their news stories. Inevitable reposts will bump the number even higher!
bevis: like a conspiracy. huh-huh
butt_head: conspiracies are cool!
If a man's character is to be abused there's nobody like a relative to do the business. -Thackeray, William
Uhhh, no. A domain can only see their OWN cookies.
And you believe this why?
Ever tried to turn off Google Images' "You-really-don't-want-to-see-this" filter?
I mean.. You were searching for "pictures" of "mountains"... Big breasts, that is? ;-) Nah.
It's "&safe=off", and people outside the US might want to change the language to English before trying to use it (hint).
Funny thing here in Germany is: The filter is ALWAYS ON, and in the German preferences, there's no option to turn it off. After you change your language to English (URL), though, there suddenly appears an option for disabling the filter... Try talking about censorship (there are not even clear rules about what exactly they are filtering, and there's no explanation why you can't turn it off over here; even worse: They don't even tell you that there IS a filter and that it's always active).
I asked Google about this, but never got a response.
42. Easy. What is 32 + 8 + 2?
google fight!
It's the answer to every problem.
Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold
Regardless of cookies, Google doesn't rank or modify searches to benefit advertisers, though it allows advertisers to target specific search criteria. Google isn't tweaking its results for this case, and it wouldn't explain it anyway.
Here's some radio commentary on the subject matter. I heard it the other day on Public Radio International. An interesting read and somewhat related...
Life is the leading cause of death in America.
Because he knows something about web browsers and the way the WWW works, which, apparently, you do not?
...I found that out from google ages ago...
"Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
You buy a TV ad during the super bowl (or whatever) and wait for the orders to just roll in.
Because it's true.
It's hard to be religious when certain people are never incinerated by bolts of lightning.
Another quirk I just noticed. My personal webpage was just recently indexed by Google. When searching for terms in it and getting it near the top of the search results, a cached link is seen next to it. This link works.
However, if I enter cache:URL_HERE, it says it cannot find it. This feature works for other webpages, so I know it's not my syntax.
You probably just don't have the correct Asian fonts installed for your web browser.
cpeterso
Perhaps we should Ask Jeeves.
Hmmmmmm?
I suspect that Google employs the nucular radiation-enhanced, super pigeons as actual editors (as opposed to the slashdot phenomenon). Regular columbiformes are probably relegated to mundane crawling/pecking duty.
My site, which is admittedly somewhat unique, has been listed in the top 5 for over 18 months now if the appropriate keywords are used.
Of course, it also helps that those keywords are snarklbort, giffleblag and byzgetford.
I read an article on K5 regarding this, about a week or two ago... Very interesting stuff, and although the article was regarding the timed (monthly, quarterly, etc) synching between these different google datacenters, it was an interesting read and relevant to the point none-the-less.
Go take a peek! Worth the read!
Of course, a little thought on the subject would probably lead to the conclusion that the searches must be being sent to different lookup engines since the same result going to the same DB will always return the same amount.
Unless you hit the same search engine and its index has been updated between searches.
... anyone care to tell me what they see
on this one?
This space for rent.
Well, google is the greatest search engine in my opinion, I think they are doing a great job! Anyway, I noticed a weird thing in the results (especially in google groups): when I switch to a different page (through the indexed numbers below, e.g. 1 2 3 4 Next --->), the number of these pages sometimes changes, which is a bit weird.
"What you 'seek' is what you get!"
Sounds like they simply offlined one datasource, and put up another. Can you say...backup? Then, the backup was taken down and the (presumed) original went back online.
so you may connect to any one of several servers. The servers each have different databases to pool results from and different caches to display.
A google search for my site returns our old site that has had dead dns records for nearly a month above my new site. Sometimes my new site pulls into the lead, sometimes it isn't there, and at least one cache has the announcement that the old name was lost and a domain was purchased.
You can't judge a book by the way it wears its hair.
... for "pictures of mountains", and Results 1 - 10 of about 1,320. Search took 0.04 seconds. That is a little different than 1e6
in your robots.txt make sure that googlebot has access to the /fridge & /tv.
Score: (-1, Anti-Google). But seriously, how can it be overrated if they aren't even modded up?
My soon-to-be ex-wife is trying to screw me out of back pay and my share of the company that I helped build. She changed the contact page of the company to remove my name and changed the name of the page, but thanks to google cache I could retrieve it and will be showing it to the labor board in the morning. Woooohoooo!
http://Lenny.com
Because I have set it to do so in both IE and Mozilla? Yes you can set both IE6 and Mozilla to accept only first party cookies and reject third party cookies, also both browsers restrict access to cookies to the originating domain only by default (not sure if this can even be changed).
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Actually, IN SOVIET RUSSIA, they use elgooG.
Results can also vary due to the Google Dance.
Google has 7 data centers, each with a copy of its index, and these are "usually" mapped to www.google.com. But google also has versions located at www2.google.com and www3.google.com.
During the monthly update there can be a different version of the index on each of the 3 versions. A website, www.google-dance-tool.1hut.com, provides results for a search done on all 3 of google's indexes.
To check whether the google dance is happening, the most common technique is to check the "back links" for major sites like Yahoo by typing "link:www.yahoo.com" into the search box. This will list all the sites with links to "www.yahoo.com".
The Google Dance Tool site mentioned checks google every 5 minutes to see if the dance is on. Once it is started it sends out an automated email to subscribers (like me) so I can visit the site and see what the search positions for the next month on google will be using their google dance tool search.
as you can get through those first 1 010 000 results, then i'd start to worry about why it sometimes comes up with 20 000 more.
That might have a chance of having fewer matches than an AND ... but there is a better chance that I don't know what I'm talking about :]
The hypothesis should be easily testable.
Presumably if you did an nslookup on www.google.com and each person in your experiment used the ip instead of the name, then they'd get the same estimated results, since you would be using the same host at the same datacenter?
Or is there load balancing other than DNS round robin?
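The experiment proposed above could be scripted roughly as follows. This is only a sketch: `resolve_all` and `pinned_url` are hypothetical helper names, the `/search?q=` path is an assumption about the URL layout, and a host may not answer correctly when addressed by bare IP. It also cannot detect load balancing that happens behind a single address.

```python
import socket
from urllib.parse import urlencode


def resolve_all(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to.

    `resolver` is injectable so the function can be exercised without a network.
    """
    _name, _aliases, addresses = resolver(hostname)
    return sorted(set(addresses))


def pinned_url(ip, query):
    """Build a search URL that targets one specific address instead of the hostname."""
    # Assumption: the search endpoint is /search with the terms in the `q` parameter.
    return f"http://{ip}/search?{urlencode({'q': query})}"


# Usage (needs network access, so it is left commented out):
# for ip in resolve_all("www.google.com"):
#     print(pinned_url(ip, "pictures of mountains"))
```

If everyone in the office pastes the same pinned URL and still sees different estimates, DNS round robin alone does not explain the variation.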
Google uses Dmoz as a pagerank source. Just submit your site through Google's forms, I've had stuff listed within 1-2 days.
OliverWillis.Com
An Operative with an Agenda
Everyone knows it's because of the pigeons. Every time you have an analog element, such as a pigeon, in the equation your end result will vary.
They claim 76,300,000 pages with 'computer'. Try actually getting past 1000. It just stops.
google employs a search spider called the freshbot. the freshbot spiders constantly, looking for new content, and periodically injects those results into the search engine listings at the data center. these results drop and return sometimes.
chances are your site was freshbotted, dropped, and re-cataloged. it could also be a result of the 'dance' at the end of each month when google is updating its search results. rankings fluctuate a lot at that time.
There's nothing Intelligent about Intelligent Design.
Speaking about the Zeitgeist, I imagine they'll have a graph showing the spike in searches for "pictures of mountains" and how it relates to when this article was posted... :-)
All editorial writers ever do is come down from the hill after the battle is over and shoot the wounded.
And I have the 80,000 hit for the word 'sex' yet I still get 100-200 hits per day from Google and Yahoo.
I think people tend to do more complex searches.
"We have determined that the result varies because google does not like some of you for personal reasons."
Table-ized A.I.
With the exception of my name, I run a couple of sites that are nowhere near the top results for the categories people arrive with in their referers.
And as I was explaining to a friend last week, the photo of her at my wedding attracts desperate surfers looking for a similarly named Malaysian porn star.
Yes...
There are hardware load-balancers that allow the machine to keep the same front end ip address, and dynamically pass the tcp connection to the back end server farm.
cisco used to have one called the localdirector
http://www.cisco.com/warp/public/cc/pd/cxsr/400/index.shtml
There are others...
--
Time is on my side
Slashdot is right vs. Slashdot is wrong:
slashdot sucks vs. slashdot rules:
slashdot correct vs. slashdot incorrect:
Cmdrtaco vs. cowboyneal
News for nerds vs. Stuff that Matters
I got 1,320. And then I reloaded the page four times. Every time, 1,320. Hmm...
umm, d00d, can you not add? u win by 30,000, not 20, and besides, the guy under you has 1,030,000..
tcpa SUX!!!!
I mean, the differences of 10,000 are still more results than I would be interested in looking at, let alone 1,000,000. When do we stop counting?
I encrypt all my files with Double XOR Encryption!
So Google has more than one datacenter to cache Slashdot, and Slashdot often caches itself when it dupes stories. So the cached story on Google is a dupe of a previous story, but Google's second datacenter purges this story because it is a dupe, and then it is reposted again on Slashdot when it is purged by the first datacenter and duped again and purged by the second datacenter and cached on the third and duped by slashdot....*bang*
See judge he needed killing. He was stuck. It was a mercy killing...
Slashdot, home of supporters of free software, free music, and free speech. Except for Moderators that disagree with you.
...all you needed to do was ask: plenty of "pictures of mountains" on my site. And, no, this is not a troll.
Non-Linux Penguins ?
Spell Check:
Type in candidate spellings of a word, and assume the spelling with the most search results is the right one:
'amatuer' -> 3.9e6 hits, 'amateur' -> 35e6 hits. Amateur it is.
'modelling' -> 2.6e6 hits, 'modeling' -> 5.7e6 hits. Close call, perhaps both are acceptable?
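The spell-check trick above boils down to "pick the candidate with the most hits". A minimal sketch, using only the hit counts quoted above; the `ratio` cutoff for flagging a "close call" is an arbitrary assumption, not something a search engine reports.

```python
def likeliest_spelling(hit_counts):
    """Return the candidate spelling with the most reported hits."""
    return max(hit_counts, key=hit_counts.get)


def close_call(hit_counts, ratio=3.0):
    """True when the top two candidates are within a factor of `ratio` of each other.

    The default ratio of 3.0 is an assumed threshold chosen for illustration.
    """
    top, runner_up = sorted(hit_counts.values(), reverse=True)[:2]
    if runner_up == 0:
        return False
    return top / runner_up < ratio


amateur = {"amatuer": 3.9e6, "amateur": 35e6}
modeling = {"modelling": 2.6e6, "modeling": 5.7e6}

print(likeliest_spelling(amateur), close_call(amateur))
print(likeliest_spelling(modeling), close_call(modeling))
```

With these numbers 'amateur' wins clearly, while 'modeling' versus 'modelling' gets flagged as a close call, which matches the guess that both spellings may be acceptable.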
Ego Boost:
Everyone knows about this one: see what comes up under your own name (put it in quotes if necessary)- Hopefully if you run a small website or comment with your real name frequently in a google searchable place that'll come up first. But you'll have to work hard to beat out all those genealogy sites that just list thousands of names, graveyard roll-calls and whatnot. Oh, and there's some court case from five years ago where your name is featured prominently. My namesake is shared with one of the first shaken babies to die and become a major local (wherever it happened) news story- not much of a boost after all.
Stalking:
I'd imagine this pretty similar to the previous, but with names of other people you know or used to know: your old college sweetheart died in 1892! Wait...
Trademark pre-research:
You need a product name- something fresh and original, and easily googleable? Start with a few ideas, and use a thesaurus (and don't forget cool foreign language words/roots) to refine the name until google hits are down to zero. Run words together, or otherwise potential customers will end up at sites that just randomly use those words at different points of the text- assume the customer is too dumb or lazy to use quotes.
'NodeZero' is my new badass something-or-other- wait there's 1K hits, how about 'NodeNull'? Only 8 now, that's good, but better yet try 'NodeNothing'- zero results.
After the google test see if the .com, .org, or .net site with the same name resolves, just in case.
I'm sure there's many more...
Oh, page hits?! Err, nevermind.
Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote.
my search took .05 seconds. same number, though
If I have nothing to hide, don't search me
Offensive as it may be, Slashdot does not delete posts (with extremely rare exceptions.. it's maybe happened once or twice)
slashdot!=valid HTML
"I have a question about some conflicting results with the search engine google. I did a search for "pictures of mountains" and got exactly 1 million results. My friend did the same search (from the same office)and got 1,010,000 results. A second friend did the same search as the last 2 and got 1,020,000"
I work at Google. This is a new feature that we added about four months ago, it's called the Friend Result Growth: For searches that return a huge result, the engine approximates the returned number as 1,000,000 + Friend*10,000.
-Fatty
It's simple, really... mountains are the new thing in pornography. People are snapping and posting so many pictures of naughty, erotically shaped rock formations that the number of mountain pics available worldwide on the net is rising by about 10,000 every 10 minutes.
Speaking of which, the Grand Tetons in Wyoming have an interesting French translation for their names.
Which is really annoying. All the other pages are just pages discussing my site. Autopr0n.com used to be the #1 result for a search on "autopr0n" and I got tons of hits from people doing just that.
autopr0n is like, down and stuff.
Seeing as everyone else is playing with google over this...
Query: ask slashdot ADD insightful comments -"insensitive clod" -"karma whore" -"Katz"
Hmm, 2310, not bad.
-AlPhAbEt
I only have one site.
:)
It's first hit for the nicknames "zcat" and "hunnyb" however, along with a few two-word searches.
I'm rather proud of that
455fe10422ca29c4933f95052b792ab2
Once a month ? This is slashdot you lucky bastard !
beauty is only a light switch away
Sometimes nodes go down. So, if (insert favorite search engine here) hits 1024 nodes for your search, there's a reasonable chance that one of them will time out, or be down / rebooting, or whatever. The search results would differ, and a few of the lower-ranked pages wouldn't show up, but the most important pages (the first ~1000?) will be on multiple nodes so they'll always show up. Really, if the company's choices are be down for one day a year or give only 99.9% accurate results 24/7/365, what are you going to choose, especially when you can count the people who will encounter problems with the remaining .1% on one hand? And a simple refresh will solve the problem?
And nodes WILL be down. (Insert major search engine here) cycles server reboots so that the nodes fail in a controlled fashion, instead of crashing and potentially causing problems. When there are thousands of nodes, statistically one's always down at any given time in a rolling reboot situation like this.
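A quick back-of-the-envelope check of the "one's always down" claim. The 1024-node figure comes from the comment above; the per-node downtime fraction of 0.1% is purely an assumed number for illustration, and the model assumes nodes fail independently.

```python
def prob_any_node_down(nodes, per_node_downtime):
    """Probability that at least one of `nodes` independent nodes is down at a given instant.

    `per_node_downtime` is the fraction of time a single node is unavailable.
    """
    return 1.0 - (1.0 - per_node_downtime) ** nodes


# Assumed: each node is down 0.1% of the time (reboots, timeouts, and so on).
p = prob_any_node_down(1024, 0.001)
print(f"Chance at least one of 1024 nodes is down: {p:.1%}")
```

Even with each node up 99.9% of the time, a query that fans out to 1024 nodes would hit at least one unavailable node well over half the time under these assumptions, so slightly different result counts between refreshes are plausible.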
I don't doubt that the Google guy's explanation is probably true. But there can also be more than one reason... and I DO know that (major search engine) is doing active research on the topic.
A witty [sig] proves nothing. --Voltaire
check your referrers log
are they coming to you for just that word?
'There is a Light that never goes out.'
#1 Search query for Feb 2003 is...
:)
"pictures of mountains"
Your search - snarklbort giffleblag byzgetford - did not match any documents.
No pages were found containing "snarklbort".
Suggestions:
- Make sure all words are spelled correctly.
- Try different keywords.
- Try more general keywords.
- Try fewer keywords.
Try the experiment again but after disabling cookies.
Yes google is the best, whether it dances or not !
Chris ,
Php Programmers.
It may not be that he overlooked the possibility. If google does any kind of load balancing (even through round-robin dns) you can often set IP Affinity so that once a client makes a connection, they will almost always get the same connection. IP affinity is often used in web farm environments where you maintain a small amount of reconstructable state on each server and it's less expensive to keep having the same client visit the same server while other clients could be directed to (and gain an affinity for) other servers.
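One simple way to get the IP affinity described above is to hash the client address and use the hash to pick a backend. This is a sketch of that idea only; real load balancers often keep a connection table instead, and the backend names here are hypothetical.

```python
import hashlib

# Hypothetical backend pool.
SERVERS = ["backend-1", "backend-2", "backend-3"]


def pick_server(client_ip, servers=SERVERS):
    """Map a client IP to a backend so the same client keeps landing on the same server."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it to a pool index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


for ip in ("192.0.2.10", "192.0.2.11", "192.0.2.10"):
    print(ip, "->", pick_server(ip))
```

Under a scheme like this, co-workers with different workstation addresses could be pinned to different servers, while each individual would keep seeing the same estimate on reload.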
I'm the #1 Google hit for the tube is civilization. Woohoo! I get more hits from that phrase than any other, easily. :)
http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
"If anything can go wrong, it will." - Murphy
Hey I got 105,000 hits!
:-) /b
Why is that?
[Please type your sig here.]
Completely different things!
ps: i got 1,050,000 :) woot
You are so right. Scroll-time wasting double spacing ranks way up on my scale of irritating postings too.
or there are many many horny people that have not been satisfied by the first 249 links :)
Sneak teach kids Algebra using a game
I'll forward you some emails from people who can help you with your search engine placement problems...
a) did he verify (by examination of the "referrer" tag in his logs) that people really went there from a search for "sex" (and not "sex" combined with other words)?
b) people might be tired of "google optimized" webpages and manually insert a "&start=250" parameter in the address line, to skip the commercial sites and browse to the less commercial ones.
Btw, I use the b) technique in Google-Groups to find old postings. The sort-by-date option only sorts from newest to oldest, and by modifying the page number you can directly go to the last page - effectively reversing the sort order.
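The "&start=250" trick from b) can be done programmatically instead of by hand-editing the address line. A small sketch using Python's standard `urllib.parse`; the helper name `with_start` and the example URL are illustrative assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start(url, start):
    """Return `url` with its `start` query parameter set to `start`, replacing any existing one."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "start"
    ]
    params.append(("start", str(start)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Example URL is an assumption about the search URL layout.
print(with_start("http://www.google.com/search?q=pictures+of+mountains", 250))
```

Calling it again with a larger offset replaces the old `start` value instead of appending a second one, which makes it easy to step toward the last page of results.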
Maybe it's google's way of saying:
"Use images.google.com instead, man."
-
See? The evil guys might fight dirty, but the good guys always win. I knew movies mirrored life exactly, which is why I don't bother to go outside, I just stay locked in my basement and watch movies on my computer while reading /.
This space for rent, inquire within.
It could also be affected by language settings and other preferences, could it not?
mbbac
Offtopic.... ? I think not. http://www.google.com/technology/pigeonrank.html Thanks for reading it though.
2b2b2b415448300d
Because that's the way it's set up. The only way 3rd party cookies work at all is if an ad or even a one pixel image from another domain is on the page you're looking at.
I use Mozilla and I trust that it implements the spec correctly. I'm a web developer as well, the problem I most often have with IE is not accepting valid cookies. So maybe now IE reads what type of server is sending information to it and only accepts cookies from Windows servers?
The Anti-Blog
What's the great attraction of watching cars going at it? :-D
--- I wish I could hear the soundtrack to my life. That way I'd know when to duck.
well, usually it is, but this just happened to me last night:
Ring Ring! (about seven times)
MEAN_LADY: "Hello?!"
ME: "Yeah, uh, how late are you open?"
MEAN_LADY: "Who are you calling?!"
ME: "Is this 7xx-4xx-5xxx?"
MEAN_LADY: "--YES!"
ME: "But this isn't Fantastic Sam's?"
MEAN_LADY: "--NO, oh gosh no they changed their number about two years ago."
ME: "Oh, I'm sorry."
MEAN_LADY: "Bieeee . . ." *click*
Thanks, Google. #1 link was 2 YEARS out-of-date.
hi, I like pancakes -.-- -.-- --..
Right now, several Google engineers are thinking "what the hell are all these searches for 'pictures of mountains'?"
Another Google search shortcut exists in Opera. In the browser bar (press F8 to get there in a hurry, also try ctrl-n), enter:
g "pictures of mountains" OR "mountain pictures"
to initiate a _G_oogle search for your stuff. Extremely handy. There are also about a dozen other such shortcuts (check Preferences/Search). You can even set the number of returned search items without setting a cookie :)
No one has ever been fired for blaming Microsoft.
Thanks. I now have the same kind of nasty migraine headache that I had when I first learned about recursion.
Likewise, Einstein, I ate some Chinese food for lunch yesterday, and guess what, I wasn't in China at the time.
Possible questions for /. polls:
1) How many people will repeat this line? (My wife told me the morning DJ on the radio channel she listens to already made this joke.)
2) How long before this joke is entirely lame? More quickly than the news Kevin Mitnick had his web site hacked?
Geoff
I think I see a trend here. Maybe for them it really would be easier to muzzle the entire internet than to produce p
And as I was explaining to a friend last week, the photo of her at my wedding attracts desperate surfers looking for a similarly named Malaysian porn star.
Back a few years ago, I posted a cute painting my sister drew on my site: a picture of a little girl hugging a big St. Bernard dog.
girlwithdog.jpg.
You wouldn't believe the direct traffic I got to that one picture. People are really screwed up.
If your dick was the size of mine, you'd have jealous people calling you a troll too.
I hate liberals. If you are a liberal, do not reply.
That's because Google doesn't do boolean searches. It will ignore the or (too common a word) and ends up treating it like an and search.
I sincerely hope that whoever moderated the parent "informative" will get what they deserve.
First, Google supports limited boolean operators, consisting of exclusion (using '-'), forced-inclusion of stopwords (using '+'), phrases (words in double quotes or separated by other punctuation marks) and ORing (using 'OR' or '|').
It also supports "wildcard words", which lets you approximate a "NEAR" search.
Finally, more customization is available using documented and undocumented (julian date and phonebook - residential and business) operators.
However, when their 10-word limit is combined with the absence of stemming, real wildcards and real boolean expressions, you may want to check the competition.