geekfiend writes "Today Google updated their website to indicate over eight billion pages crawled, cached and indexed. They've also added an entry to their blog explaining that they still have tons of work to do."
Re:This is news ?
by
Anonymous Coward
·
· Score: 1, Funny
They doubled the index by counting all the stuff on your hard drive indexed and sent to them by Google Desktop Search.
Re:Quality - not quantity
by
Ingolfke
·
· Score: 3, Funny
I agree, search engines are so 1990. I rely exclusively on word of mouth to find websites. If Firefox would add a button to the toolbar that said 'Cool Sites', maybe with an icon of a pair of glasses, and have the button link to a webpage with links to the latest cool sites on the net, that would certainly be the end of Google and their 8 billion pages. Pah!
Re:Google makes minor change to website - news at
by
dotmike
·
· Score: 2, Funny
At the same time, can Slashdot create a "Curmudgeon" section for those who like to gripe about the less than monumental significance of some story topics?
If I kept eating so much spam...
by
dos_dude
·
· Score: 2, Funny
Now it's going to be even harder to get my name in the top spot. Why was I cursed with the surname Smith!
-- I used to have a better sig but it broke.
Doubled? Wait a minute...
by
't+is+DjiM
·
· Score: 5, Funny
From 4 to 8 billion pages... I guess they just indexed the google cache...
-- --Use ant to make.war
Re:More pages v.s more relevant pages
by
juglugs
·
· Score: 1, Funny
What's a library?
-- This sig is in Spanish when you're not looking....
Just tried the beta of the new MSN Search
by
Mostly+a+lurker
·
· Score: 3, Funny
I received this response:
This site is temporarily unavailable, please check back soon.
Didn't get the results you expected? Help us improve.
It is not clear to me how I can help them improve. Suggest they switch their servers to Linux?
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 1, Funny
I don't want to start an old discussion again... but here is where RDF etc. can play a role. At my school they are starting to deposit articles etc. in a repository that has metadata based on the Dublin Core. Hope this will help with searching for that kind of info: papers, etc.
Anyway, I believe google also has a personalised search:
http://labs.google.com/personalized
Maybe this can help.
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 1, Funny
That's why engines showing clustered results may well end up beating Google at its own game.
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 1, Funny
It is an interesting problem, exact string matching. If you think about how it would be done, it is relatively simple for a short piece of text: just call strstr on a chunk of text. The problem is that Google likely does not index large bodies of text. Instead, Google indexes bags of terms. Each term is likely a stemmed word that no longer resembles the original word. In this way Google compresses the document, saving space, while making it faster to look up keywords in a document. The only way I think Google could provide exact string matching is to search their Google cache. The limitation with the Google cache, in case you didn't notice, is that Google does not cache every page, hence the word cache. While disk space is cheap, it is also slow to access, so even if it is feasible for Google to store all 8 billion pages on disk, it is unlikely you would want to wait that long to search for your exact match.
There are some tricks that could be used to narrow in on which documents to do exact string checking in. First they take the string you passed in and do the normal tokenization, breaking it down into parts. Then they come up with a result set. Now they can start doing exact string matching within that returned result set. The issue with that is that it is unpredictable how long that process will take, as each document is of arbitrary size. The best they could do would be to do an exact string match in the summary text and return the documents in that set first, followed by the other documents, which is very close to what they actually do.
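For what it's worth, here is a rough sketch of the narrowing trick described above, in Python. It is only an illustration under stated assumptions: the function names, the lowercase-only tokenizer (standing in for real stemming) and the fixed-length "summary" are all made up for the example, and nothing here is claimed to be how Google actually does it.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build a bag-of-terms index: each term maps to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(phrase, docs, index, summary_len=40):
    """Return candidate document ids, exact summary matches first.

    summary_len is a hypothetical cutoff: only the first summary_len
    characters of each candidate are checked for the exact string, so the
    cost per document is bounded no matter how large the document is.
    """
    terms = tokenize(phrase)
    if not terms:
        return []
    # Step 1: normal term lookup. Candidates must contain every query term.
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    # Step 2: exact substring check (the strstr step), on the summary only.
    needle = phrase.lower()
    def misses_summary(doc_id):
        return needle not in docs[doc_id][:summary_len].lower()
    # Step 3: summary hits first (False sorts before True), then the rest.
    return sorted(candidates, key=lambda d: (misses_summary(d), d))
```

Checking only the summary is the design choice that keeps the running time predictable; running the substring check over whole documents would be more accurate but takes as long as the largest candidate.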
Yes because we at /. love Google..
Google is a constant source of information and a geek's friend - if the index has doubled so has our supply of information. Information rules!
Have they updated image search yet?
8 billion pages and not a single link to my blog.
/.
Can't figure out if I should just shoot myself or maybe just open a subscription to
TC - My Photos..
No, wait, they are our internet search overlords since, like, 1999?
Mhm to anonymous coward or not to anonymous coward?
Will moderators smack my karma below zero?
They already have.
Training monkeys for world domination since 1439
In case of slashdotting use this mirror.
In Soviet America the banks rob you!
over eight billion pages crawled
You don't just go from 4 billion to 8 billion overnight.
They are probably just crawling the same 4 billion twice.
Blearf. Blearf, I say.
... my weight would probably double, too.
Of which 80% is V1AGR@ advertising,
and 19% is pr0n.
There's debate if the remaining 1% contains pirated music and movie or plans for DIY nukes.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]