Slashdot Mirror


Yahoo Passes Google in Total Items Searched

tonyquan writes "Yahoo announced today that its search engine passed Google's for overall capacity, with 20 billion documents and images indexed versus 11.3 billion for Google. Observers had previously pegged Yahoo's index at just 8 billion items. The growth is due to a recent expansion effort. More info can be found on the Yahoo! Search blog and at CNet."

27 of 434 comments

  1. Yahoo! playing Tortoise to Google's Hare by Ohmster · · Score: 4, Insightful

    It's interesting to see that Yahoo! may have surpassed Google on this metric. Over the past decade, Yahoo! has beaten other "hares" to date, including AOL and Microsoft's MSN. They're doing some innovative stuff, but also have some areas to catch up on. More here: http://mp.blogs.com/mp/2005/08/on_the_merits_o.html

    1. Re:Yahoo! playing Tortoise to Google's Hare by cybersaga · · Score: 4, Funny

      but also have some areas to catch up on

      Like how to park?

  2. Great by Rosco+P.+Coltrane · · Score: 5, Insightful

    Now all Yahoo has to do is create a real search engine that can actually spew out relevant results amongst those 20 billion entries...

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
  3. Great... by Lewisham · · Score: 5, Insightful

    ...now it'll be even harder to find anything on Yahoo! Google keeps and holds its users because searches *work*. When I search for something, Google has a very high chance of giving me what I want within 4 pages or so. Yahoo! isn't as good at getting me the information I want, and all these extra pages might even make the problem *worse*. Yahoo! has never said, AFAIK, how it ranks pages, but Google does it better. With this wealth of data, the ranking system is going to be under much more pressure to pick the right pages.

    1. Re:Great... by bedroll · · Score: 5, Funny
    2. Re:Great... by mph · · Score: 4, Informative

      Adding "review" usually results in storefronts that say "Be the first to review this product!".

  4. Googlebot is not very aggressive on internal links by Anonymous Coward · · Score: 5, Informative
    We recently launched a mobile search engine. The domain was registered, pages created, etc., so I'm observing it go from zero PageRank to having a PageRank and getting crawled. Yahoo's bot definitely crawls more frequently, and Googlebot doesn't seem to crawl any links unless they are linked to from external pages. I assume that as the PageRank increases, Googlebot will get more aggressive, but from what I can see in the logs, it's clear that Googlebot takes a "wait and see" approach to crawling.

    That's not a bad thing. There are a lot of useless pages out there, and having twice as many pages in the index certainly does not mean twice as many useful pages.

    I am glad to see the search engine wars are on and competitive.

  5. More important by Chairboy · · Score: 5, Insightful
    A newsflash that's more important to me is how, years ago, Google passed Yahoo's ability to display relevant results.

    Why isn't programmer efficiency measured in KLOCs? Because quantity, used as the only metric, misses what matters more: quality.

  6. Re:fantastic by ciroknight · · Score: 4, Insightful

    I always wonder about that. How many of those billions of additions to the engine are pages that generate themselves on the fly according to what is searched for?

    I *hate* those pages the most, as they usually have every word known to mankind listed in six or more languages, and just so happen to grab the one you're looking for to suck you into their million popups.

    I guess quality versus quantity will be an afterthought; we're about to see quite the cache expansion if my gut feeling is right.

    --
    "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
  7. Quantity versus quality by IamGarageGuy+2 · · Score: 5, Insightful

    I don't believe that volume of pages is really a relevant metric to be used in the case of search results. With an infinite number of pages the real metric comes down to relevance.

    --
    Stay tuned for new sig...
  8. 20 billion documents, I wonder... by baylanger · · Score: 5, Funny

    Are those 20 billion documents, the actual SPAMs I received at my yahoo mail account since 1994?

  9. 95% of which is crap by darkCanuck · · Score: 5, Interesting
    • useless blogs and geocities "websites": 12 billion
    • clipart, midi and hideous backgrounds for above websites: 6 billion
    • links to outdated or expired user sessions: 1 billion
    • real content: 1 billion, if lucky
    The only thing I ever use Yahoo for is pinging yahoo.com when my internet connection seems slow or dead. It's just been a habit since the '90s.
  10. Re:Googlebot is not very aggressive on internal li by Eric+Giguere · · Score: 4, Informative

    The Yahoo! crawler (Slurp) is definitely more aggressive than the Googlebot. It comes knocking on my door several times a day, especially the blog pages. Google is more conservative and keeps things in a sandbox, too.

  11. It's true, and easy to check... by NotQuiteReal · · Score: 4, Interesting
    I did a search for "a" on both Google and Yahoo.

    Results:

    Google: "1-10 of about 3,120,000,000 .06 sec"
    Yahoo: "1-10 of about 11,300,000,000 .08 sec"

    Top Yahoo hit: some punk band. Top Google hit: apple.com.

    Gee, who do you think will make more money with those results... ;-)

    --
    This issue is a bit more complicated than you think.
  12. Re:fantastic by fembots · · Score: 4, Informative

    While 9 billion additional pages are pretty useless to an individual, they could mean that each topic gets an additional 30 pages, or that a search for Ferrari images turns up another 25 pictures.

  13. Big Increase - Simple Explanation by ndansmith · · Score: 5, Funny

    The increase can be explained by Yahoo adding Slashdot dupes to their index.

  14. My own - albeit anecdotal - experience... by mosel-saar-ruwer · · Score: 5, Interesting

    I've spent the last few days doing some very important searching - we're thinking about launching a new product in a rather arcane field, and I wanted to be absolutely certain who the potential competition might be - hence I decided to search both Google & Yahoo!.

    Guess what? Yahoo! search beats Google search, hands down. Not even close.

    Two thoughts:

    1) While everybody was oohing and ahhing about Google's IPO, Yahoo! very quietly went about purchasing some excellent search engine/caching outfits, like Inktomi and AllTheWeb, and, owing to the great dot-com bust, paid only pennies on the dollar in acquiring some outstanding talent and intellectual property.

    2) I think Google's been reading too many of their own press releases, and has been resting on their laurels for a few years now. And it doesn't help matters that their CEO, Eric Schmidt, is the same fella who damn near drove Novell to bankruptcy.

    1. Re:My own - albeit anecdotal - experience... by coflow · · Score: 5, Insightful

      I do think this is interesting to note, but I have to ask you as a businessman: what matters more to you, the quality of the search or the number of people using the search engine? From anecdotal evidence, I can tell you that I know of maybe 3 or 4 people who use Yahoo to search, and pretty much everybody else uses Google or has the Firefox search toolbar set to Google.

      I can make a better hamburger than McDonald's can, but you're probably better off investing in them than you are in me.

    2. Re:My own - albeit anecdotal - experience... by Sancho · · Score: 4, Informative

      Multiple search engines are probably the way to go, honestly, but here's some counter-anecdotal evidence.

      Search for:
      super mario world hacks

      on each of Yahoo and Google, and check the first hit. Google takes it hands down, with an entire page devoted to SMW hacks, vs. Yahoo's page on SNES hacks.

      I routinely try other search engines, and while another one occasionally trumps Google, the big G tends to come out on top overall.

  15. Re:fantastic by b0r1s · · Score: 4, Informative

    Google's index should be growing faster in the coming months. With more and more webmasters implementing Google's sitemap helpers, a lot of unlinked/dynamic pages should start showing up very, very soon.
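    b0r1s's point is that a sitemap lets a webmaster hand the crawler a list of URLs it could not reach by following links alone. A minimal sketch of generating one, assuming the later sitemaps.org 0.9 schema (early Google Sitemaps files used a Google-specific namespace instead); the function name and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: sitemaps.org protocol, schema version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing each URL in its own <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" to "&amp;", which the protocol requires.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical dynamic pages that nothing else links to.
print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/item?id=1&view=full",
]))
```

    The resulting file would then be placed on the site and submitted to the search engine, so unlinked or dynamic pages become discoverable.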

    --
    Mooniacs for iOS and Android
  16. If anyone can do it... by brunes69 · · Score: 4, Insightful

    ... how come no one is?

    Where else can I find the likes of Y! Calendar / Mail / Address Book, all integrated, for free? Point me there and I might jump ship.

    GMail is great for email, but its address book is a POS, and there is no calendaring whatsoever. Meanwhile, over at Y!, I have a calendar that not only shows me the week's weather forecast embedded in it, but also sends me reminder notices via Y! IM for important dates.

    Not to mention the vast usefulness of other Y! services like Launch! and Y! Photos.

    Google may be leading the way as far as search, maps, and email go, but for other services, *they* are the ones playing catch-up. For example, see their "Customized" home page, which http://my.yahoo.com/ had beaten to the punch about 3 years ago.

  17. Re:fantastic by HD+Webdev · · Score: 4, Funny

    How do you figure? Do you find it harder to find restaurants in large cities?

    Only if most of those restaurants in large cities give you a menu that only lists Viagra as something you can order.

    --
    This is not a dream, not a dream...we are transmitting from the year 1-9-9-9.
  18. What on earth? by mcc · · Score: 4, Insightful

    So.. Yahoo is mature and Google is not because Google's news service reprints many and varied websites-- but not some of the "blogs" you like-- and Yahoo's news service reprints Reuters? I'm not entirely sure what's going on here but it sounds like you are misinterpreting some kind of personal poor experience with Google's sales department as an actual problem.

    Google and Yahoo news do not even offer remotely the same kind of service, nor are the services equal in importance. Yahoo News is almost closer to the core of Yahoo's service than even the search; Google News is more auxiliary from Google's perspective, and I don't think they're even getting much money off of them.

    Anyway, frankly IMO "blogs" shouldn't be on google news anyway. Period. If I wanted a blog aggregator, I'd go to a blog aggregator. Google News is a news aggregator. The difference may mostly be only in terms of what the aggregated sites choose to identify themselves as, but that's enough of a difference for me.

    As for AdSense, the categories based on which things can get classified as inappropriate for AdSense are extremely broad and if you're expecting close attention paid to border cases, I think you're expecting things of the service that the service never intended. And if the person your complaint here concerns is Michelle Malkin...? Well, from what I've read of her stuff, if you're trying to defend her against accusations of racism then some article about Nelson Mandela would be only the tiniest part of the problem.

    Don't be surprised if in a few more years of broadband development, that Yahoo is able to position itself as an alternative to many cable TV providers.

    Wait, wasn't this exact same prediction being batted around, like, five to seven years ago? And didn't it fail to work out then either? Hm, you are a blogger, aren't you.

  19. I've got Results as to why I prefer Google: by Ralph+Spoilsport · · Score: 5, Interesting
    OK: I did a brain fart search on both engines. The word? Kyzyl. It's the capital of Tuva. Tuva is an obscure little suburb of Mongolia. Yep. When you think your stupid relatives who bought a place in Indiana live in the middle of Nowhere, you're wrong. Tuva Is The Middle Of Nowhere.

    So, In Firefox tab A, I have Google and tab B is Yahoo. Both searched on Kyzyl.

    Results (please pay attention, because HTMLing this was a pain...):

    Yahoo's first 5 entries:

    * All Russia Hotels All Russian Hotels - We offer discount hotel reservation services online in Moscow, St. Petersburg, Kiev, Russia, Ukraine, CIS and Baltic. www.allrussiahotels.com

    * Tuva Travel Kyzyl city is the capital of Tuva Republic (Russia) Kyzyl city is positioned right in the center of Asia, which is proudly claimed by a local monument specifically dedicated to this fact. www.sokoltours.com

    WEB RESULTS

    1. Wikipedia: Kyzyl
    Wikipedia Free Encyclopedia's article on 'Kyzyl' en.wikipedia.org/wiki/Kyzyl

    2. Weather Underground: Kyzyl, Russia Forecast
    Find the Weather for any City, State or ZIP Code, or Airport Code or Country. Email. Password. Maps. United States. International. Information. Refinance Rates. GoTo Meeting. Kyzyl Singles. Hosting Companies. Online deals! Vitamins. Internet Mall ... Updated: 8:00 AM KRAST on August 02, 2005. Observed at Kyzyl, Russia (History) Elevation: 2064 ft / 629 m ... Coming soon: Flash Stickers. Kyzyl, 63 F / 17 C ...
    www.wunderground.com/global/stations/36096.html
    - 64k - Cached

    3. AllRefer.com - Kyzyl (CIS And Baltic Political Geography) - Encyclopedia
    AllRefer.com reference and encyclopedia resource provides complete information on Kyzyl, CIS And Baltic Political Geography. Includes related research links. ... By Alphabet : Encyclopedia A-Z - K. Kyzyl, CIS And Baltic Political Geography ... Kyzyl or Kizil[both: kizil'] Pronunciation Key, city (1989 pop ...
    reference.allrefer.com/encyclopedia/K/Kyzyl

    Now, for the first five Google Results on Kyzyl:

    Kyzyl'-administrative center of Republic of Tuva, Russia Kyzyl' Republic of Tuva,
    |Central-Chernozemny| ... Republic Capital:, Kyzyl. Capital Population:, 91000( at 01/01/94) ...
    members.tripod.com/~argun/kyzyl.htm
    - 5k - Cached - Similar pages

    Kyzyl on Encyclopedia.com
    Kyzyl or Kizil [both: kizil'], city (1989 pop. 85000), capital of Tuva Republic, S Siberian Russia, on the Yenisei River. It services motor transport and has ...
    www.encyclopedia.com/html/K/Kyzyl.asp
    - 47k - Cached - Similar pages

    Kyzyl Travel Information. Photos, Stories and Diaries about Kyzyl
    Sustainable Tourism for independent travellers (travelers) and backpackers. www.worldsurface.com/browse/location.asp?locationid=5654
    - 59k - Cached - Similar pages

    Kyzyl, Tuva, Russia current local time
    Kyzyl, Tuva, Russia - before placing a telephone call or making travel plans for a flight or hotel, get the current local time provided by ...
    www.worldtimeserver.com/current_time_in_RU-TY.aspx?city=Kyzyl
    - 17k - C

    --
    Shoes for Industry. Shoes for the Dead.
  20. Re:fantastic by xs650 · · Score: 5, Insightful

    If over 1/2 the restaurants in big cities were fake restaurants built to look like the restaurant you were looking for, yes it would be.

  21. Re:fantastic by natrius · · Score: 4, Insightful

    You, sir, win the award for worst analogy ever. Restaurants only stay in business if enough people patronize them to make them worth running. Web pages, on the other hand, are almost, if not totally, free to put up. Some things are crap, some things are gold, but I think the crap-to-gold ratio goes way up as the number of pages increases. The crap that goes up on the internet stays up; the crappy restaurants don't. Google's PageRank is supposed to filter out pages that no one else thinks are worth linking to, which can eliminate many of the problems caused by a high crap-to-gold ratio, but the grandparent's statement that adding many more web pages may harm results is a perfectly plausible assertion.
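    To make the PageRank point concrete: a page earns rank from the pages that link to it, so a page nobody links to sinks to the baseline. A minimal sketch of the textbook power-iteration form, assuming a damping factor of 0.85 and a toy link graph with made-up page names; this illustrates the published idea, not Google's actual production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small "random jump" baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "gold" is linked to by three pages, "crap" by none.
links = {
    "gold": ["hub"],
    "hub": ["gold"],
    "fan1": ["gold"],
    "fan2": ["gold"],
    "crap": [],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:5s} {score:.3f}")
```

    Here "crap" ends up with only the baseline rank, while "gold" ranks highest because three pages point at it. Adding more unlinked pages grows the index without pushing them up the results.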

  22. Let me explain by Moraelin · · Score: 4, Insightful

    The problem is the difference between raw data and useful information.

    When you look through a list of restaurants (or the list of anything in the yellow pages), you're looking at something put together based on _semantics_. Some human put that list together and made sure the _meaning_ is what you'd expect there: you can actually drive to one of those locations and order food.

    Search engines, on the other hand, just look at the words and have no bloody clue of semantics.

    If a search engine ever put together a list of restaurants, it would just be a list of everyone who ever said the word "restaurant". Including everyone who ever said "I hate Chinese restaurants" or "I took my gf to a restaurant" or "I went to see a new apartment, but it was above a restaurant" or whatever. Needless to say, driving to most of those locations would be a bloody useless exercise.

    Adding another 20 million people to that kind of indexing would just raise the noise-to-signal ratio, not actually produce anything useful.
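    The word-matching Moraelin describes can be sketched as a toy inverted index: each word maps to the documents containing it, and a query returns every document with all the query words, with no notion of what the text means. A minimal sketch, assuming simple lowercase tokenization; the document ids and the "Golden Dragon" listing are hypothetical, and this sketch does no ranking or stemming.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query word (pure keyword match)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "date": "I took my gf to a restaurant",
    "apartment": "I went to see a new apartment, but it was above a restaurant",
    "listing": "Golden Dragon restaurant, open daily for dinner",
}
index = build_index(docs)
print(sorted(search(index, "restaurant")))
```

    All three documents match "restaurant", even though only one is a place you could actually drive to; indexing more text adds more matches of the first two kinds.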

    --
    A polar bear is a cartesian bear after a coordinate transform.