Yahoo Passes Google in Total Items Searched
tonyquan writes "Yahoo announced today that its search engine passed Google's for overall capacity, with 20 billion documents and images indexed versus 11.3 billion for Google. Observers had previously pegged Yahoo's index at just 8 billion items. The growth is due to a recent expansion effort. More info can be found on the Yahoo! Search blog and at CNet."
It's interesting to see that Yahoo! may have surpassed Google on this metric. Over the past decade, Yahoo! has beaten other "hares" to date, including AOL and Microsoft's MSN. They're doing some innovative stuff, but also have some areas to catch up on. More here: http://mp.blogs.com/mp/2005/08/on_the_merits_o.html
Now all Yahoo has to do is create a real search engine that can actually spew out relevant results amongst those 20 billion entries...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
...now it'll be even harder to find anything on Yahoo! Google keeps and holds its users because searches *work*. When I search for something, Google has a very high chance of giving me what I want in 4 pages or so. Yahoo! isn't as good at getting me the information I want. The problem might even be made *worse* with all these pages. Yahoo! has never said, AFAIK, how it ranks pages, but Google does it better. With this wealth of data, the ranking system is going to be under much more scrutiny at picking the right pages.
Why isn't programmer efficiency measured in KLOCs? Because quality is more important than quantity when used as the only metric.
I always wonder about that. How many of those billions of additions to the engine are pages that retroactively generate pages according to what is searched for?
I *hate* those pages the most, as they usually have every word known to mankind listed in six or more languages, and just so happen to grab the one you're looking for just to suck you into their million popups.
I guess quality versus quantity will be an afterthought; we're about to see quite the cache expansion if my gut feeling is right.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
I don't believe that volume of pages is really a relevant metric to be used in the case of search results. With an infinite number of pages the real metric comes down to relevance.
Stay tuned for new sig...
Just go there and see for yourself: a search on the word "a" (letter "a", whatever) yields 11.5bn results. If you admit there may be nearly as many items again without "a" in them (say, all non-latin webpages, files, jpgs and such), that's pretty close to their 20bn entries.
:-)
Of course, now if you still doubt, you're welcome to count all 11.5bn results and make sure none of them are dupes
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
If Google wants to survive in the long run, they will need to stop playing favorites based on political ideology. They give, IMO, too much leeway for their AdSense and Google News people to restrict access. One blogger I know of was rejected as a "racist" because she questioned whether Nelson Mandela really should be called a hero. The irony of it is that my blog is far more politically incorrect than hers and AdSense for some reason accepted me. I wrote a letter to Google about the behavior of their AdSense policies and News development team, but they gave the customary Google response, which was "we don't care."
The thing that Google needs to wake up and realize is that they have been caught doing genuinely evil things like letting Hamas use AdSense to promote their recruitment and training centers, and Yahoo has survived enough big companies attacking them to make them a long-term threat. The real war is between Google and Yahoo, not Google and MSN, and Yahoo understands clearly how being apolitical is necessary to really become a hub for finding and accessing data online.
Don't be surprised if in a few more years of broadband development, that Yahoo is able to position itself as an alternative to many cable TV providers. Expect them to start providing premium content alone or in conjunction with Apple. If that happens, Google is actually going to be screwed because the market for that sort of media is huge and the amount of money that Yahoo will have will dwarf Google. Sooner rather than later, Google's stock price will crash down to maybe $20-$30 a share unless they really do some death-defyingly radical things every so often over the next several years that the market likes. In fact, I'd wager that if Yahoo can get deep into providing on-demand TV services, that in five years they'll be able to buy Google in cash unless Google really does become the "Microsoft of search services."
Click here or a puppy gets stomped!
Nonsense.
Search: Google's PageRank concept radically changed the way that search engines determined which results were relevant. While previous services were based on human rankings or on how many times a particular word was listed on the page, Google put out an automated system which was able to deliver more relevant results when confronted with normal sites and was, by its very design, much harder to exploit with SEO techniques. Further, Google continually tweaks the parameters of their search -- if you can go to one of Norvig's talks about the sorts of stuff they do, it's amazing.
Maps: That interface -- scrolling, markers, and all -- is done entirely in JavaScript. No plugins, no Flash, no helpers. Nobody thought that that sort of thing was even possible.
GMail: I don't use it, so I can't comment. But I do have around 1 GB of email on my primary account. When you use email for serious work, it can add up.
Google Groups: It's my group reader. I like it because it shows the discussions in thread format from the top and suppresses the quoting that can make USENET discussions turn into pages and pages of greater-than symbols.
As to your assertion that Google hasn't ushered in a new age, I disagree. Ten years ago, when someone wanted information they went to a library, an encyclopedia, or maybe a CD-ROM. Now, any time anyone wants to know anything, they go immediately to Google and chances are that the information will come up on the first page.
Lest you've forgotten, it was Napster and Winamp that popularized MP3s, not the iPod, and COBOL, not Oracle, that popularized the database. So I'd respond to you, "Stop the misinformation campaign."
I used to read Caltizzle. I was a lot cooler than you.
And you know what? The same thing can be said of Google. I know Google is akin to the iPod on Slashdot, but I've been finding it less and less useful...expired image links, "cached" pages that go nowhere (the browser just sits and spins), completely irrelevant results. I've been finding sites like www.clusty.com to be better at giving me meaningful results these days.
But the real problem, as I see it, is the whole concept of "spidering" around the web to begin with. It started with the original Webcrawler as an ad-hoc solution to a serious deficiency in the design of the World Wide Web. Ted Nelson (of Xanadu fame, which at least somewhat inspired the WWW) has railed and ranted about this from day one. Hell, even Gopher was built with some sort of indexing capability. But not the Web, and that leaves us with this ancient (in relative terms) method of trolling web servers for links. There has GOT to be a better way, but perhaps it is far too late at this point.
... how come no one is?
Where else can I find the likes of Y! Calendar / Mail / Address book, all integrated, for free? Point me there and I might jump ship.
GMail is great for email, but its address book is a POS, and there is no calendaring whatsoever. Meanwhile, over at Y!, I have a calendar that not only shows me the weather forecast for the week embedded into it, but also issues me reminder notices via Y! IM for important dates.
Not to mention the vast usefulness of other Y! services like Launch! and Y! Photos.
Google may be leading the way as far as search, maps, and email go, but for other services, *they* are the ones playing catch-up. For example, see their "Customized" home page, which http://my.yahoo.com/ beat them to about 3 years ago.
Even when you use "Yahoo" to search for something, you're still googling it. Just like xeroxing on a Canon, or putting food in the Frigidaire (even if it's a Kelvinator.)
Google has this kind of brand identity, for better or for worse. This is a status that both Napster and Tivo almost achieved, but fizzled just in time to escape the phenomenon.
-fb Everything not expressly forbidden is now mandatory.
I do think this is interesting to note, but I have to ask you as a businessman: what matters more to you, the quality of the search or the number of people using the search engine? From anecdotal evidence, I can tell you that I know of maybe 3 or 4 people who use Yahoo to search, and pretty much everybody else uses Google or has the Firefox search toolbar set to Google.
I can make a better hamburger than McDonald's can, but you're probably better off investing in them than you are in me.
However, the article said nothing about Yahoo becoming better, just Bigger. There's a world of difference between the two.
How do you know the 9BN pages google's not indexing are not worth indexing? How would google know? And if they did know those pages were no good, how would indexing them pose a risk of obscuring the better pages?
So.. Yahoo is mature and Google is not because Google's news service reprints many and varied websites-- but not some of the "blogs" you like-- and Yahoo's news service reprints Reuters? I'm not entirely sure what's going on here but it sounds like you are misinterpreting some kind of personal poor experience with Google's sales department as an actual problem.
Google and Yahoo news do not even offer remotely the same kind of service, nor are the services equal in importance. Yahoo News is almost closer to the core of Yahoo's service than even the search; Google News is more auxiliary from Google's perspective, and I don't think they're even getting much money off of them.
Anyway, frankly IMO "blogs" shouldn't be on google news anyway. Period. If I wanted a blog aggregator, I'd go to a blog aggregator. Google News is a news aggregator. The difference may mostly be only in terms of what the aggregated sites choose to identify themselves as, but that's enough of a difference for me.
As for AdSense, the categories based on which things can get classified as inappropriate for AdSense are extremely broad and if you're expecting close attention paid to border cases, I think you're expecting things of the service that the service never intended. And if the person your complaint here concerns is Michelle Malkin...? Well, from what I've read of her stuff, if you're trying to defend her against accusations of racism then some article about Nelson Mandela would be only the tiniest part of the problem.
Don't be surprised if in a few more years of broadband development, that Yahoo is able to position itself as an alternative to many cable TV providers.
Wait, wasn't this exact same prediction being batted around, like, five to seven years ago? And didn't it fail to work out then either? Hm, you are a blogger, aren't you.
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
If over 1/2 the restaurants in big cities were fake restaurants built to look like the restaurant you were looking for, yes it would be.
lolololol pwned He's right, you know. The Anonymous. And the moderation system is useless, it pushes a mental monoculture and dissenting opinions are modded as trolls or flamebait. I have both set to add +5 points instead of subtract to combat this. I have to sift through some crap posts, but if that's what it costs, so be it.
You, sir, win the award for worst analogy ever. Restaurants only stay in business if enough people patronize them to make the restaurant worth running. Web pages, on the other hand, are almost, if not totally, free to toss up. Some things are crap, some things are gold, but I think the crap to gold ratio goes way up as the number of pages increases. The crap that goes up on the internet stays up, the crappy restaurants don't. Google's PageRank is supposed to filter out things that no one else thinks are worthy of linking to, which can eliminate many of the problems caused by a high crap to gold ratio, but the grandparent's statement that adding many more web pages may harm results is a perfectly plausible assertion.
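For anyone wondering how "nobody links to it" turns into a low score, here is a rough Python sketch of the textbook PageRank power iteration. It is a toy under stated assumptions, not Google's actual code: the five-page link graph is made up, the function name and parameters are mine, and 0.85 is the damping factor from the published PageRank description. Whatever Google runs in production has long since been tweaked well beyond this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping page -> score, with scores summing to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two pages that others link to ("gold", "hub")
# and three pages that nobody links to ("blog", "crap1", "crap2").
toy_web = {
    "gold": ["hub"],
    "hub": ["gold"],
    "blog": ["gold", "hub"],
    "crap1": ["gold"],
    "crap2": ["gold"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:6s} {score:.3f}")
```

The pages with no inbound links never rise above the baseline share, however many of them exist. What this toy does not cover is crap pages that link to each other, and that is exactly where the grandparent's worry about a bigger index still applies.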
The problem is the difference between raw data and useful information.
When you look through a list of restaurants (or the list of anything in the yellow pages), you're looking at something put together based on _semantics_. Some human put that list together and made sure the _meaning_ is what you'd expect there: you can actually drive to one of those locations and order food.
Search engines, on the other hand, just look at the words and have no bloody clue of semantics.
If someone ever put together a list of restaurants, it would just be a list of all people who ever said the word "restaurant". Including everyone who ever said "I hate chinese restaurants" or "I took my gf to a restaurant" or "I went to see a new apartment, but it was above a restaurant" or whatever. Needless to say, driving to most of those locations would be a bloody useless exercise.
Adding another 20 million people to that kind of indexing would just raise the noise-to-signal ratio, not actually produce anything useful.
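To make that concrete, here is a quick Python sketch of the dumbest possible keyword index. It is only an illustration, not how Yahoo or Google actually match pages: the function name is made up, and so is the "Luigi's" listing, which stands in for the one entry a human-edited directory would keep.

```python
import re


def keyword_search(documents, word):
    """Return every document that mentions the word (or its plural).

    This is pure string matching: no notion of what the text means.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"s?\b", re.IGNORECASE)
    return [doc for doc in documents if pattern.search(doc)]


documents = [
    # Hypothetical listing: the only entry you could actually drive to.
    "Luigi's Restaurant, 12 Main St, open daily",
    "I hate chinese restaurants",
    "I took my gf to a restaurant",
    "I went to see a new apartment, but it was above a restaurant",
    "Polar bears live in the Arctic",
]

hits = keyword_search(documents, "restaurant")
for hit in hits:
    print(hit)
```

Four of the five documents come back as "restaurants", and only one is a place that serves food. Feed it more documents of the same kind and the useless matches pile up just as fast as the useful ones.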
A polar bear is a cartesian bear after a coordinate transform.
People being willing to buy and people wanting to sell are different incentives. Google used to be great for researching a topic. Now if I want to buy (say) a solar cell, it is very hard to find anything but storefronts.
The value originally afforded by the web was the fact that I could find things out about different technologies, efficiencies, lifespans, etc, prior to making the purchase. This gave me an advantage over traditional information gathering techniques.
If in the "real world" I want to buy a few hundred solar cells, I have to talk to a manufacturer's rep, where I am unlikely to find any unbiased information. The "real world" equivalent of some of these search results are trade magazines in highly specialized areas. Most of the ads are for unknown companies or companies trying to push into a new market.
For Google and Yahoo to keep ad revenue up, they are going to need to make sure people continue to get helpful information segregated from empty shells.
Dude! What the fuck is a popup?
Why is anything anything?
By a simple random test, I think the results are clear: GOOGLE IS THE BETTER SEARCH ENGINE
All you "proved" is that Google is the better search engine when looking for information on obscure villages in Mongolia. Nothing else.
Besides, if you skip the ads, the first two results returned by Yahoo were more reliable sources: Wikipedia and weather information on the location, as opposed to some site on Tripod for Google.
In the end, it's all subjective anyway.
What?
The number of 8 billion searchable pages on Google's home page wasn't touched for a long time. Usually they do an update when another engine claims to have a bigger index. Also, this number does not include images etc., Yahoo's number does. I agree that Google's sitemap helpers will dig out a lot of stuff from the hidden Web. Most probably Google's index contains way more than 8 billion pages, perhaps even more than 20 billion objects.
http://sebastianx.blogspot.com/
How about a search for mortgage on Google? Hmmm, this looks familiar. The two top results seem to be sponsored links instead of real results. Does "this [infer] that commerce puts people above the law" on Google?
Never underestimate the power of fiber.