Google Index Doubles


Posted by samzenpus on Wednesday November 10, 2004 @11:00PM from the even-more dept.

geekfiend writes "Today Google updated their website to indicate over eight billion pages crawled, cached and indexed. They've also added an entry to their blog explaining that they still have tons of work to do."

23 of 324 comments

Re:This is news ? by Manip · 2004-11-10 23:02 · Score: 1, Funny

Yes because we at /. love Google..

Google is a constant source of information and a geek's friend - if the index has doubled, so has our supply of information. Information rules!
Image Search by TupperTrenine · 2004-11-10 23:04 · Score: 4, Funny

Have they updated image search yet?
I'm all alone by tcdk · 2004-11-10 23:06 · Score: 4, Funny

8 billion pages and not a single link to my blog.

Can't figure out if I should just shoot myself or maybe just open a subscription to /.

--
TC - My Photos..
1. Re:I'm all alone by Zork the Almighty · 2004-11-10 23:11 · Score: 4, Funny
  
  If you shoot yourself, will your blog readers know? I mean, it's kind of like the tree-in-the-forest thing.
  
  --
  
  In Soviet America the banks rob you!
And I for one welcome... by mu22le · 2004-11-10 23:10 · Score: 2, Funny

No, wait, they've been our internet search overlords since, like, 1999?

Mhm to anonymous coward or not to anonymous coward?
Will moderators smack my karma below zero?
Re:Google Schmoogle by seanyboy · 2004-11-10 23:13 · Score: 2, Funny

They already have.

--
Training monkeys for world domination since 1439
slashdotting by Zork the Almighty · 2004-11-10 23:15 · Score: 4, Funny

In case of slashdotting use this mirror.

--

In Soviet America the banks rob you!
1. Re:slashdotting by juglugs · 2004-11-10 23:45 · Score: 3, Funny
  
  No, no, no... Use this Mirror...
  
  --
  This sig is in Spanish when you're not looking....
2. Re:slashdotting by osvejda · 2004-11-11 01:30 · Score: 1, Funny
  
  This should be modded down. The mirror is out of date.
3. Re:slashdotting by xlcus · 2004-11-11 01:39 · Score: 2, Funny
  
  or this mirror ;-)
Re:This is news ? by Anonymous Coward · 2004-11-10 23:16 · Score: 1, Funny

They doubled the index by counting all the stuff on your hard drive indexed and sent to them by Google Desktop Search.
Re:Quality - not quantity by Ingolfke · 2004-11-10 23:18 · Score: 3, Funny

I agree search engines are so 1990. I rely exclusively on word of mouth to find websites. If Firefox would add a button to the toolbar that said 'Cool Sites', maybe with an icon of a pair of glasses, and have the button link to a webpage with links to the latest cool sites on the net, that would certainly be the end of Google and their 8 billion pages. Pah!
Nonsense. by MadFarmAnimalz · 2004-11-10 23:20 · Score: 2, Funny

over eight billion pages crawled

You don't just go from 4 billion to 8 billion overnight.

They are probably just crawling the same 4 billion twice.

--
Blearf. Blearf, I say.
Re:Google makes minor change to website - news at by dotmike · 2004-11-10 23:29 · Score: 2, Funny

At the same time, can Slashdot create a "Curmudgeon" section for those who like to gripe about the less than monumental significance of some story topics?
If I kept eating so much spam... by dos_dude · 2004-11-10 23:31 · Score: 2, Funny

... my weight would probably double, too.
8 billions.... by DrYak · 2004-11-10 23:39 · Score: 1, Funny

Of which 80% is V1AGR@ advertising,
and 19% is pr0n.
There's debate over whether the remaining 1% contains pirated music and movies or plans for DIY nukes.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Grrrrr by squoozer · 2004-11-10 23:47 · Score: 4, Funny

Now it's going to be even harder to get my name in the top spot. Why was I cursed with the surname Smith!

--
I used to have a better sig but it broke.
Doubled? Wait a minute... by 't is DjiM · 2004-11-10 23:50 · Score: 5, Funny

From 4 to 8 billion pages... I guess they just indexed the Google cache...

--
--Use ant to make .war
Re:More pages v.s more relevant pages by juglugs · 2004-11-11 00:20 · Score: 1, Funny

What's a library?

--
This sig is in Spanish when you're not looking....
Just tried the beta of the new MSN Search by Mostly a lurker · 2004-11-11 00:29 · Score: 3, Funny

I received this response:
This site is temporarily unavailable, please check back soon.
Didn't get the results you expected? Help us improve.

It is not clear to me how I can help them improve. Suggest they switch their servers to Linux?
Re:More pages v.s more relevant pages by Anonymous Coward · 2004-11-11 00:38 · Score: 1, Funny

I don't want to start an old discussion again ... but here is where RDF, ... can play a role. At my school they are starting to deposit articles, ... in a repository that has metadata based on the Dublin Core. Hopefully this will help when searching for that kind of info (papers, ...)?

Anyway, I believe google also has a personalised search:

http://labs.google.com/personalized

Maybe this can help.
Re:More pages v.s more relevant pages by Anonymous Coward · 2004-11-11 00:45 · Score: 1, Funny

That's why engines showing clustered results may well end up beating Google at its own game.
Re:More pages v.s more relevant pages by Anonymous Coward · 2004-11-11 01:10 · Score: 1, Funny

Exact string matching is an interesting problem. If you think about how it would be done, it is relatively simple for a short piece of text: just call strstr on a chunk of text. The problem is that Google likely does not index large bodies of text. Instead, Google indexes bags of terms. Each term is likely a stemmed word that no longer resembles the original word. In this way, Google compresses the document, saving space while making it faster to look up keywords in a document.

The only way I think Google could provide exact string matching is to search the Google cache. The problem, or limitation, with the Google cache is that, if you didn't notice, Google does not cache every page, hence the word cache. While disk space is cheap, it is also slow to access, so even though it is possible that Google could store all 8 billion pages on disk, it is unlikely you would want to wait that long for your exact match.

There are some tricks that could be used to narrow in on which documents to do exact string checking in. First, they take the string you passed in and do the normal tokenization, breaking it down into parts. Then they come up with a result set. Now they can start doing exact string matching within that returned result set. The issue with that is that it is unpredictable how long that process will take, as each document is of arbitrary size. The best they could do would be to do an exact string match in the summary text and return the documents in that set first, followed by the other documents, which is very close to what they actually do.
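A minimal Python sketch of the two-phase lookup described above, under stated assumptions: an inverted index over stemmed terms produces a candidate set, and only those candidates are scanned for the literal phrase (Python's `in` stands in for C's `strstr`). This is an illustration, not Google's implementation. The helper names (`stem`, `tokenize`, `build_index`, `candidates`, `exact_search`, `summary_first`) are hypothetical, `stem` is a deliberately crude suffix stripper rather than a real stemmer such as Porter's, and a "summary" is assumed to be simply the first 30 characters of a page.

```python
import re
from collections import defaultdict


def stem(word):
    # Hypothetical, deliberately crude suffix stripper (NOT Porter stemming).
    # It only shows that an indexed term need not match the original word.
    for suffix in ("ing", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, split on non-alphanumerics, stem each token.
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs):
    # Inverted index: term -> set of doc ids. Each doc becomes a "bag of
    # terms", so word order (and thus exact phrasing) is thrown away.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def candidates(index, query):
    # Phase 1: docs that contain every query term, in any order.
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


def exact_search(docs, index, query):
    # Phase 2: scan only the candidates' full text for the literal phrase.
    # Cost depends on document sizes, hence the unpredictable latency.
    q = query.lower()
    return sorted(d for d in candidates(index, query) if q in docs[d].lower())


def summary_first(summaries, index, query):
    # Cheaper variant: check only short summaries. Candidates whose summary
    # contains the phrase come first, then all remaining candidates.
    q = query.lower()
    cands = sorted(candidates(index, query))
    hits = [d for d in cands if q in summaries[d].lower()]
    rest = [d for d in cands if q not in summaries[d].lower()]
    return hits + rest


# Toy data for illustration only.
docs = {
    1: "Bags of terms are what the index of Google holds.",
    2: "Crawling is done first; after that, Google indexes bags of terms.",
    3: "Google indexes bags of terms, not whole documents.",
    4: "Exact string matching needs the cached text.",
}
summaries = {d: text[:30] for d, text in docs.items()}
index = build_index(docs)

print(exact_search(docs, index, "google indexes"))   # [2, 3]
print(summary_first(summaries, index, "google indexes"))  # [3, 1, 2]
```

The summary-only ordering is cheap but approximate: doc 2 really contains the phrase yet lands behind doc 1, which does not, because the phrase falls outside doc 2's summary.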