geekfiend writes "Today Google updated their website to indicate over eight billion pages crawled, cached and indexed. They've also added an entry to their blog explaining that they still have tons of work to do."
More pages v.s more relevant pages
by
xiando
·
· Score: 5, Insightful
Personally I find that the lack of relevant pages is the biggest problem with search engines, not the lack of pages with information. It seems I always find what I'm looking for eventually; what I need improved is the time I spend looking through spam-bomb pages before I find a page with the correct information.
These spam-pages seem to be increasing; I mean those pages with just a bunch of keywords or the output of some search system.
Re:More pages v.s more relevant pages
by
Kithraya
·
· Score: 5, Insightful
I'm especially irritated by the increasing number of highly-ranked pages that are nothing more than another search engine's results. If Google could find some way to identify and remove these from my result set, Google's usefulness to me would increase 10 times over.
Re:More pages v.s more relevant pages
by
PsychoSlashDot
·
· Score: 5, Insightful
What I've read on the Google help pages seems to indicate that they don't index punctuation or capitalization. When you search for something, your string is looked for within an existing index, and appropriate reference materials are shown. Including punctuation wouldn't result in any hits within their index, meaning no results.
Now, obviously, it is theoretically possible to do just about anything. But in this case, with the architecture they have in place, anyone ever doing what you're asking would require a full-text search through their multi-TB dataset, which I suspect is highly impractical.
My point is that, as I understand it, Google has coded a number of shortcut tricks which allow reasonable search times, and full-text string-exact searching would prevent them from using those shortcuts, resulting in search times they don't seem to think are reasonable.
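To make the index idea concrete, here is a toy sketch of an inverted index that drops punctuation and capitalization before indexing. It is only an illustration of the general technique, not Google's actual architecture; the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits.

    Punctuation and capitalization are discarded here, so they never
    reach the index (an assumption made for this toy example).
    """
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every normalized query token."""
    tokens = tokenize(query)
    if not tokens:
        # A query made only of punctuation has nothing to look up.
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical sample documents.
docs = {
    1: "Learning C++ templates",
    2: "The C programming language",
    3: "Vitamin C and health",
}
index = build_index(docs)

print(search(index, "C++"))            # the "++" is gone, so this is just "c"
print(search(index, "c programming"))
print(search(index, "!!!"))
```

In this sketch a query for "C++" cannot be told apart from a query for "c", because the punctuation was thrown away at indexing time. Matching it exactly would mean scanning the original text of every document rather than doing a dictionary lookup.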
-- "Oh no... he found the .sig setting."
Makes you wonder...
by
manmanic
·
· Score: 5, Insightful
Does this mean that I've been missing a huge amount of important information until now? I'd just assumed that Google covered the entire relevant web, but now it seems to cover the whole same amount again. My Google alerts also seem to have started producing a lot more results, which suggests that a lot of these new pages are rated quite highly. Who knows how much more quality content on the web we're just not seeing?
Re:Makes you wonder...
by
jlar
·
· Score: 5, Interesting
"Does this mean that I've been missing a huge amount of important information until now?"
Maybe the steep increase is due to all the new file formats they are indexing now. That might be useful for some people (although I sometimes find it kind of annoying that a search returns MS-Word documents).
Re:Google thieves my bandwidth
by
Anonymous Coward
·
· Score: 5, Informative
Google respects the robots.txt file. Use it.
Doubled? Wait a minute...
by
't is DjiM
·
· Score: 5, Funny
From 4 to 8 billion pages... I guess they just indexed the google cache...
-- --Use ant to make .war
Re:Google thieves my bandwidth
by
jvj24601
·
· Score: 5, Informative
Well, if you know that Google is indexing your site and "stealing" your bandwidth, then you must have looked at the server logs, right? You'd see the name of the search bot is googlebot. Search for it, and you'll find that the first relevant link explains how to prevent googlebot from accessing your site.
The logs would probably also show failed attempts to find the file /robots.txt. Similar info is gained from searching on that term as well.
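For anyone who wants to check what a given robots.txt actually tells googlebot, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the helper name `is_allowed`, and the example paths are all assumptions for illustration. It only shows how the rules are interpreted; it says nothing about how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks Googlebot to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

def is_allowed(robots_txt, user_agent, path):
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(is_allowed(ROBOTS_TXT, "Googlebot", "/index.html"))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "/index.html"))
```

With this file, the parser reports that Googlebot is disallowed everywhere, while an agent that is not named in the file is still allowed, since there is no `User-agent: *` entry.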
So, to sum up...
by
kahei
·
· Score: 5, Insightful
I am feeding this troll because there are people who really _do_ think like that and I wish I could yell at them to their faces :)
You put content in a place where it is publicly accessible. You explicitly and proactively made that content available to everyone, including 'the average surfer' and googlebots. You took no steps to make it available only to the select few of whom you approve.
Now you are all cross and bothered because average surfers / googlebots have read / copied your content, such as it is.
The solution is to drown yourself in a bucket. I have a bucket.