Yahoo Passes Google in Total Items Searched
tonyquan writes "Yahoo announced today that its search engine passed Google's for overall capacity, with 20 billion documents and images indexed versus 11.3 billion for Google. Observers had previously pegged Yahoo's index at just 8 billion items. The growth is due to a recent expansion effort. More info can be found on the Yahoo! Search blog and at CNet."
My google-fu isn't bad, but I sometimes have trouble finding relevant results. I figure adding 9 billion more possible results should complicate things quite nicely.
Provided it is correct... I don't suppose there is any third party organisation that was allowed limited access to the data to confirm it?
who's counting? :-/
~~~Please pass the salt, I hate unsalted MD5s
It's interesting to see that Yahoo! may have surpassed Google on this metric. Over the past decade, Yahoo! has beaten other "hares," including AOL and Microsoft's MSN. They're doing some innovative stuff, but also have some areas to catch up on. More here: http://mp.blogs.com/mp/2005/08/on_the_merits_o.html
Now all Yahoo has to do is create a real search engine that can actually spew out relevant results amongst those 20 billion entries...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
...now it'll be even harder to find anything on Yahoo! Google keeps and holds its users because searches *work*. When I search for something, Google has a very high chance of giving me what I want in 4 pages or so. Yahoo! isn't as good at getting me the information I want. The problem might even be made *worse* with all these pages. Yahoo! has never said, AFAIK, how it ranks pages, but Google does it better. With this wealth of data, the ranking system is going to be under much more scrutiny at picking the right pages.
My Search Engine is bigger than yours
Ignorance is not a crime; neither should it be a way of life
Congress control $ = inmates run the asylum
I guess that means 9 billion more potential sites hoping to buy their way into Yahoo's directory. Because if you're unwilling to pay, you can be pretty damn sure you won't get in for free.
StupidChildren...the reason jesus is crying
Check the post right below yours.
...world keeps using Google for searching.
There's a lot of crap out there on the internet, and adding it all doesn't help, especially when your search engine sucks.
That's not a bad thing. There are a lot of useless pages out there, and having twice as many pages in the index certainly does not mean twice as many useful pages.
I am glad to see the search engine wars are on and competitive.
Why isn't programmer efficiency measured in KLOCs? Because quality is more important than quantity when used as the only metric.
Reading these comments here all I can say is you guys are so brainwashed by the Google hype machine.
First, Google is NOT an innovator. Why not? Everything they do is a slight improvement on existing services:
- Search: Sure, it's the best search around, but it is simply an improvement over existing search services. And by now Yahoo's search is comparable. Soon there will be many equivalent search engines.
- Maps: Looks pretty, but it's just an incremental improvement over existing services. Trivial for Yahoo or anyone else to catch up.
- GMail: Nothing to see here except very good marketing. Who ever uses 1 GB of email? Nobody.
A lot of Google's services actually suck if you think about it. Froogle? Google Images? Those are a joke. And thanks for breaking Google Groups to make it unusable.
If you think Google is the greatest thing since sliced bread, take a deep breath and realize that it's just a company that is very good at marketing, and making lots of money.
Google is an advertising company, they are not a technology company. They are not true innovators like, say, Apple or Oracle. Just look at the reasons I outlined above to understand why. A true innovator ushers in a new age. Like Apple with the iPod and digital music. Or Oracle with database systems. Google hasn't ushered in a new age of anything.
Stop the hype.
I don't believe that volume of pages is really a relevant metric to be used in the case of search results. With an infinite number of pages the real metric comes down to relevance.
Stay tuned for new sig...
I've found that Yahoo! Slurp is almost always the most frequent visitor to my websites.
Religion for nerds. Stuff that really matters
It's not the size of the boat...
it's the motion of the ocean.
Don't forget the sneaky introduction of the Yahoo toolbar with the installation of the Macromedia stuff... That couldn't possibly have anything to do with the increase in searches...
---
When you want to type a double-quote use &quot; instead
Generated by SlashdotRndSig via GreaseMonkey
Is it that bad seeing a hot chick again? If I see a hot chick walking down the hall, I don't say "repost."
The only reason Google has not completely taken over all Yahoo traffic is the fact that Yahoo Mail came out long before GMail. Many who have had Yahoo email accounts for years hate the thought of having to redistribute their email address to friends, family, and colleagues who have become so accustomed to their current one. I, for one, continue (although begrudgingly) to use Yahoo Mail, but do all my searches from Google.
I bet 15 billion of those 20 billion are google search result pages.
I was only searching for hacks for Windows Genuine Advantage and noticed that Yahoo had better results, since it runs on *nix :)
I'm not sure what the Google-ster runs on.
Are those 20 billion documents, the actual SPAMs I received at my yahoo mail account since 1994?
- useless blogs and geocities "websites": 12 billion
- clipart, midi and hideous backgrounds for above websites: 6 billion
- links to outdated or expired user sessions: 1 billion
- real content: 1 billion, if lucky
The only thing I ever use Yahoo for is pinging yahoo.com when my internet connection seems slow or dead. It's just been a habit since the '90s.
It's not the size of your index, it's the results you get with it.
Right?
RIGHT?!?
-Ted
-=-=- Quantum physics - the dreams stuff are made of.
While Google is busy playing games with Microsoft, creating Google Maps, giving employees 20% of their time to work on their own projects, hiring only people with master's degrees or PhDs, giving away money for open source projects, and giving away free 2GB+ email accounts,
Yahoo! has been busy working on its search technology. Soon, you'll also be introduced to the Yahoo! blog search engine.
The Yahoo! crawler (Slurp) is definitely more aggressive than the Googlebot. It comes knocking on my door several times a day, especially the blog pages. Google is more conservative and keeps things in a sandbox, too.
It's ironic that 20 billion just happens to be the VERY SAME number of links on www.yahoo.com... hmm, coincidence?
Jerry
http://www.cyvin.org/
I see robots from Yahoo and Inktomi Search (owned by Yahoo) hitting my web servers several times daily, as opposed to weekly (or even sparser) hits from Google.
Look for quick retaliation from Google. Remember what happened when Yahoo, Hotmail, etc. began expanding their email storage space? Google's storage doubled to 2GB.
I have always found Yahoo searches too "commerce heavy," leaning way too much toward "Yahoo stores" and "Yahoo sites." The meaningful info is lost in the jumble. Google results are cleaner and easier to parse. So more Yahoo results equals more frustrating, confusing results.
I'll stick with Google, thank you.
Ignorance is curable, stupid is forever.
If Google wants to survive in the long run, they will need to stop playing favorites based on political ideology. They give, IMO, too much leeway to their AdSense and Google News people to restrict access. One blogger I know of was rejected as a "racist" because she questioned whether Nelson Mandela really should be called a hero. The irony is that my blog is far more politically incorrect than hers, and AdSense for some reason accepted me. I wrote a letter to Google about the behavior of their AdSense policies and News development team, but they gave the customary Google response, which was "we don't care."
The thing that Google needs to wake up and realize is that they have been caught doing genuinely evil things like letting Hamas use AdSense to promote their recruitment and training centers, and Yahoo has survived enough big companies attacking them to make them a long-term threat. The real war is between Google and Yahoo, not Google and MSN, and Yahoo understands clearly how being apolitical is necessary to really become a hub for finding and accessing data online.
Don't be surprised if in a few more years of broadband development, that Yahoo is able to position itself as an alternative to many cable TV providers. Expect them to start providing premium content alone or in conjunction with Apple. If that happens, Google is actually going to be screwed because the market for that sort of media is huge and the amount of money that Yahoo will have will dwarf Google. Sooner rather than later, Google's stock price will crash down to maybe $20-$30 a share unless they really do some death-defyingly radical things every so often over the next several years that the market likes. In fact, I'd wager that if Yahoo can get deep into providing on-demand TV services, that in five years they'll be able to buy Google in cash unless Google really does become the "Microsoft of search services."
Click here or a puppy gets stomped!
Yahoo has grown a bigger epenis than google.
How about that..
Though of course that's if it's true, which it probably isn't - and even then, how does this affect your search?
Oh, that's right... it doesn't. If Google doesn't have it, it's obscure and unreferenced. Sounds to me like filler.
(1) Take search, submit to multiple different engines (2) Rank sites among results (3) Add Google ads and spit back out (4) Profit!
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
I read this in one of two ways:
a) Yahoo crawler is not as discriminating as google, collecting loads of garbage and mirrored sites
or
b) Google is finally falling behind the Web. In the past, every snazzy search engine eventually got overwhelmed by web growth and fell behind. Has that time arrived for Google?
On a different note I've heard a rumor that Google's total CPU count across all its server sites is fast approaching a million. If this is true, talk about barriers to entry! Anyone out there who can confirm or deny this?
Results:
Google: "1-10 of about 3,120,000,000, 0.06 sec"
Yahoo: "1-10 of about 11,300,000,000, 0.08 sec"
Top Yahoo hit: some punk band. Top Google hit: apple.com.
Gee, who do you think will make more money with those results... ;-)
This issue is a bit more complicated than you think.
Maybe the editors need to check on something, or we all ought to count 11.3 billion as the "new 42".
Lots of indexed docs is nice, but it doesn't mean much if the indexed stuff isn't meaningful.
I wonder how many of those indexed images are spacers and things, or how many of those documents are just copies/meaningless information.
But enough being a naysayer. More documents does mean that Yahoo! is more likely to have found the document I'm looking for, and that amount of indexed material will definitely help for the more obscure searches.
'Every story, if continued long enough, ends in death.' --Ernest Hemingway
The increase can be explained by Yahoo adding Slashdot dupes to their index.
I've spent the last few days doing some very important searching - we're thinking about launching a new product in a rather arcane field, and I wanted to be absolutely certain who the potential competition might be - hence I decided to search both Google & Yahoo!.
Guess what? Yahoo! search beats Google search, hands down. Not even close.
Two thoughts:
The My Yahoo! portal is (for the time being) superior to Google's simple offering.
And who can resist the pleasure of flaming your political enemies on the Yahoo! news story message boards - the veritable bathroom walls of the Internet.
What?
From Google:
Results 1 - 10 of about 3,930 for "In Soviet Russia" slashdot.org
From Yahoo!:
Results 1 - 10 of about 11,300 for "In Soviet Russia" slashdot.org - 0.38 sec.
Looks convincing to me, comrades!
I find it interesting that CNet is the second link about this, after the Google/CNet snub a while back.
Okay, an odd search but even stranger results. Searching for
inurl:"axis-cgi/mjpg"
will bring up Axis web cams. For the past few days, any attempt to view the 2nd or later pages of Google results (anything other than the first page) has resulted in a Google error:
We're sorry...
... but we can't process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected.
Maybe it is just my IP and I am searching for this too much.
;)
And all the sex-bot Yahoo IDs. I know for a fact there are 999,999 numerically different variants of one particular name, because a friend of mine is responsible for all 999,999 of them, and he made quite a bit of cash at it. Had I the time, inclination, and lack of soul, I'd do that, too. I don't think adding 12 billion sites is TOO hard, considering the capillary branching of some sites. My own website (not linked here for courtesy reasons), just from the way it's made, has over 700 end-result HTML pages, and I made it in a few days. Let's see if Yahoo is more USEFUL than Google. That's a better test than how many sites it has.
I like to place meaningful quotes in my sig, so people will know that I know what meaningful quotes are.
... how come no one is?
Where else can I find the likes of Y! Calendar / Mail / Address Book, all integrated, for free? Point me there and I might jump ship.
GMail is great for email, but its address book is a POS, and there is no calendaring whatsoever. Meanwhile, over at Y!, I have a calendar that not only shows me the weather forecast for the week embedded into it, but also issues me reminder notices via Y! IM for important dates.
Not to mention the vast usefulness of other Y! services like Launch! and Y! Photos.
Google may be leading the way as far as search, maps, and email go, but for other services, *they* are the ones playing catch-up. For example, see their "Customized" home page, which http://my.yahoo.com/ had beaten about 3 years earlier.
Even when you use "Yahoo" to search for something, you're still googling it. Just like xeroxing on a Canon, or putting food in the Frigidaire (even if it's a Kelvinator).
Google has this kind of brand identity, for better or for worse. This is a status that both Napster and TiVo almost achieved, but they fizzled just in time to escape the phenomenon.
-fb Everything not expressly forbidden is now mandatory.
But more importantly, how accessible it is.
It's great and all that they're trying to make search engines so complicated, but really, just give us users those handy-dandy quotation marks and "do NOT include" operators, and we can do a good job ourselves :P. We don't need a search engine trying to tell us what we're searching for. Hail Google!
So does this mean that all of my very valuable searches for "boobies" have been missing half of the results I so desperately desire? I feel like I have wasted my life away now that I know I've only seen half the boobies I should have, or maybe there's another reason I feel I've wasted my life, hopefully I'll figure that out right after I finish looking at these beautiful...
I've noticed that Yahoo's crawler visits my site more frequently... but Google's crawler seems to be intelligent about how often it crawls.
If I update a lot, Google crawls more. Yahoo doesn't seem to care.
So all these folks talking about yahoo being better may be off the mark. Why crawl all the time when you can only crawl when necessary?
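One way to read "crawl only when necessary" is as an adaptive revisit schedule: shorten the interval when a page has changed since the last visit, lengthen it when it hasn't. The sketch below assumes a simple halve-on-change, double-if-unchanged rule with made-up bounds (`MIN_INTERVAL_H`, `MAX_INTERVAL_H` and `PageSchedule` are hypothetical names); it illustrates the idea and is not how Google's or Yahoo!'s crawlers actually schedule visits.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds on the revisit interval, in hours (assumptions, not real crawler settings).
MIN_INTERVAL_H = 1.0
MAX_INTERVAL_H = 24.0 * 30  # roughly a month


@dataclass
class PageSchedule:
    """Per-page revisit schedule: halve the interval on change, double it if unchanged."""
    interval_h: float = 24.0          # start by revisiting once a day
    last_hash: Optional[int] = None   # fingerprint of the content seen on the last visit

    def record_visit(self, content: str) -> float:
        """Update the schedule after a fetch and return the next revisit interval in hours."""
        fingerprint = hash(content)
        if self.last_hash is not None:
            if fingerprint != self.last_hash:
                # The page changed since the last visit: come back sooner.
                self.interval_h = max(MIN_INTERVAL_H, self.interval_h / 2)
            else:
                # Nothing new: back off and spend the bandwidth elsewhere.
                self.interval_h = min(MAX_INTERVAL_H, self.interval_h * 2)
        self.last_hash = fingerprint
        return self.interval_h
```

For example, `record_visit("post 1")` returns 24.0 on the first visit, `record_visit("post 1\npost 2")` then returns 12.0 because the page changed, and repeating the same content returns 24.0 again. A frequently updated blog drifts toward the one-hour floor while a static page drifts toward the monthly cap, which is the "crawl more when I update a lot" behavior described above.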
Sign your posts, you deserve credit for that spew. The bit about Apple at the end was absolutely classic.
If I go to www.google.cn, I get redirected to http://www.google.com/intl/zh-CN/
Anyhow, a search on "len" gives me "1-10 of about 7,980,000 for len, 0.02 sec", a lot of hits...
www.lensite.com is the number one entry in both .com and .cn hits...
It's how you use it.
I'd be lmao if I see MSN in the news tomorrow with 30 billion.
Wow, after reading a bit of the discussion so far, I'm amazed that everyone on Slashdot is being so defensive of Google and hostile to Yahoo. Is Yahoo Microsoft? Has Yahoo done anything wrong or horrible?
As some people have been quick to point out, it's the quality of the search, not the quantity of results that matters. This is just another worthless Slashdot story and shouldn't mean anything to anyone.
It's quite obviously time to start using Yahoo, as it's clearly the best search engine.
Or is it?
cause that would rock.
So how come the current /. poll has been archived? Is their comment database full or something?
No wait.
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
So what if they're upping the ante and the gigabyte contents of their databases? This anonymous coward, posting from China, finds that only a fraction of the search results are reachable anyway, and probably a lot of them don't even show as search results, thanks to the mutual complicity of Yahoo and Google.
Hopefully this same level of service is not coming to a country near you anytime soon.
This post has shown up a dozen times for this topic and not one poster has provided even useful anecdotal evidence, let alone anything that can be independently verified.
when's the last time you've actually used Yahoo to *search* for something? Yeah, I thought so.
there's no place like ~
SIZE DOES MATTER!
It's because of all those yahoo users who are searching for google :-)
http://mindset.research.yahoo.com/
blog blah not relevant in my opinion.
You can lead a horse to water, but a pencil must be lead. ~ Laurel and Hardy
So Yahoo spiders you daily... and the People's Republic of Google spiders you weekly... why is that (according to some) a notch on Google's belt? The more information you have, the better. Or are many of you making the stupid assumption that Google is better and thus whatever they do is inherently better? Google routinely blocks sites that they just don't like from organic results, regardless of their relevance... which means that relevance is not their primary goal. Take a look at http://www.google.com/?q=isearch and you'll notice that isearch.com, who is constantly searched, is not even indexed.
For the most part, unless I'm looking for something technical, Google is not very useful to me at all anymore. There seem to be huge numbers of sites with "doorway" pages that show up for just about any keyword, especially keywords that have nothing at all to do with the site.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Wow! They beat them by that much in just a single day!! That has to be the fastest indexer ever!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
The size of your database matters plenty. If it's too small, no one wants to look at it, and even if they tried, they probably wouldn't find what they were looking for. Both sites use the Google engine, so they are pretty much equally skilled. Now let's say a young hot blonde girl in her mid twenties is looking for some action... movies. Do you really think she's going to come if you don't have a big meaty database? Of course not. She's blonde, but she's not retarded. They say it's the motion of the ocean, but it takes a loooooong time to sail to China in a rowboat.
Contrary to other Google or Yahoo loyalists, I think this is a good move in the search engine market. This is capitalism at its finest and computer using at its finest. We, the consumers, benefit because of innovations and improvements by the producers. Competition has caused this. Yahoo came up with that home page thing, and at least at this point, it is the best. But Google rapidly offered one, and MSN now does as well (because MS does not compete or innovate, they copy). I'm pleased to see these companies following *real* capitalism and not using scare tactics, as the bastards from Microshaft do. --Drew
Google won because it knew that the real test of a good search engine was its sorting algorithm.
MSN gets it now too.
first results off of google are yahoo, altavista, MS, cnn, amazon, lycos...yada yada, google itself not on first page
So.. Yahoo is mature and Google is not because Google's news service reprints many and varied websites-- but not some of the "blogs" you like-- and Yahoo's news service reprints Reuters? I'm not entirely sure what's going on here but it sounds like you are misinterpreting some kind of personal poor experience with Google's sales department as an actual problem.
Google and Yahoo news do not even offer remotely the same kind of service, nor are the services equal in importance. Yahoo News is almost closer to the core of Yahoo's service than even the search; Google News is more auxiliary from Google's perspective, and I don't think they're even getting much money off of them.
Anyway, frankly IMO "blogs" shouldn't be on google news anyway. Period. If I wanted a blog aggregator, I'd go to a blog aggregator. Google News is a news aggregator. The difference may mostly be only in terms of what the aggregated sites choose to identify themselves as, but that's enough of a difference for me.
As for AdSense, the categories based on which things can get classified as inappropriate for AdSense are extremely broad and if you're expecting close attention paid to border cases, I think you're expecting things of the service that the service never intended. And if the person your complaint here concerns is Michelle Malkin...? Well, from what I've read of her stuff, if you're trying to defend her against accusations of racism then some article about Nelson Mandela would be only the tiniest part of the problem.
Don't be surprised if in a few more years of broadband development, that Yahoo is able to position itself as an alternative to many cable TV providers.
Wait, wasn't this exact same prediction being batted around, like, five to seven years ago? And didn't it fail to work out then either? Hm, you are a blogger, aren't you.
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
Maybe it's a stupid question, but when Google indexes a spam website, is that site counted?
And what is Yahoo's policy on that?
Good bye
Rock and Roll
...but yeah, yahoo sucks and google still pwns j00.
Sorry, but it's true. Google has WAY more power and ingenuity than yahoo will ever dream of. That is a fact.
The end.
We have secretly replaced these Slashdot mods' sense of humor with a rusty nail. Let's see if they notice!!
I maintain a blog called haroonbarlas.com
More than a month has passed and it has yet to be indexed by Google. On the other hand, Yahoo has already indexed most of the pages.
Check it out for yourself.
Google
http://www.google.com/search?hl=en&q=site%3A%2F%2Fwww.haroonbarlas.com&btnG=Google+Search/
(0 results)
Yahoo
http://search.yahoo.com/search?p=site%3Awww.haroonbarlas.com&sm=Yahoo!+Search&fr=FP-tab-web-t&toggle=1&cop=&ei=UTF-8/
(15 results)
I am a bit disappointed with google after this.
I was relearning LaTeX, the typesetting language. I wanted to express some equations from quantum mechanics. The notation defines a "bra" as ⟨φ| and a "ket" as |ψ⟩, because together they make a "bra-ket" ⟨φ|ψ⟩ (Dirac probably had way too much free time on his hands when he came up with that). Anyway, I did a Google search for "latex bra ket". The search results were useful, but I was at first surprised by the ads that showed up.
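Since the point was typesetting these equations, here is a minimal sketch of bra-ket notation in LaTeX using only standard math-mode commands (`\langle`, `\rangle`, `\vert`). The two equations are generic textbook examples, not taken from the post, and the expectation-value line assumes a normalized state. The `braket` package also provides shorthand macros for this if you'd rather not build them by hand.

```latex
\documentclass{article}
\begin{document}
% Inner product of a bra <phi| with a ket |psi>, written out in position space
\[
  \langle \phi \vert \psi \rangle = \int \phi^*(x)\,\psi(x)\,dx
\]
% Expectation value of an operator \hat{A} in a normalized state |psi>
\[
  \langle \hat{A} \rangle = \langle \psi \vert \hat{A} \vert \psi \rangle
\]
\end{document}
```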
We don't see the world as it is, we see it as we are.
-- Anais Nin
Keep in mind that Google's crawler, and presumably Yahoo!'s as well, tries to visit the more important pages first. So even if Google is missing half of the pages in Yahoo!, it probably still covers most of the ones you'd want to find.
(Indeed, covering fewer pages might be a wise strategy, other things equal, since it leaves more crawling time available to revisit known important pages that may have changed.)
As far as I know, prioritized crawling was first described in this 1998 paper by the Google folks. Note that the paper considers the original PageRank metric, which has since evolved. http://www-db.stanford.edu/pub/papers/efficient-crawling.ps
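A toy version of prioritized crawling: keep the frontier in a priority queue ordered by how many in-links (backlinks) each URL has among the pages crawled so far, which is one of the importance estimates discussed in that line of work. The in-memory `graph` and the `prioritized_crawl` function are hypothetical stand-ins; a real crawler would fetch and parse pages, obey robots.txt, and use a much better importance estimate such as PageRank.

```python
import heapq
from collections import defaultdict


def prioritized_crawl(graph: dict[str, list[str]], seed: str, budget: int) -> list[str]:
    """Visit up to `budget` pages, always choosing the frontier URL with the
    most backlinks seen so far. `graph` maps each URL to the URLs it links to."""
    backlinks: dict[str, int] = defaultdict(int)
    visited: set[str] = set()
    order: list[str] = []
    # heapq is a min-heap, so store negated counts; ties are broken by URL string.
    frontier = [(0, seed)]
    while frontier and len(order) < budget:
        neg_count, url = heapq.heappop(frontier)
        if url in visited or -neg_count != backlinks[url]:
            continue  # stale entry: a newer entry with a higher count was pushed later
        visited.add(url)
        order.append(url)
        for target in graph.get(url, []):
            if target not in visited:
                backlinks[target] += 1
                heapq.heappush(frontier, (-backlinks[target], target))
    return order
```

On the small graph `{"home": ["a", "b", "c"], "a": ["c", "d"], "b": ["c"], "c": ["d"], "d": []}` starting from `"home"` with a budget of 5, the crawl order is home, a, c, d, b: pages c and d jump ahead of b because more crawled pages point to them, whereas plain breadth-first order would be home, a, b, c, d. With a limited budget, that reordering is what lets a smaller index still cover most of the pages people actually want.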
I would be curious to see how many searches are run on eBay a day. I know it's not the same as a total web search, but it has to be a high number. When I'm bored I can run 30 searches or more in a day. I don't think I have ever done that on Yahoo.
http://www.kubuntu.org/
So, In Firefox tab A, I have Google and tab B is Yahoo. Both searched on Kyzyl.
Results (please pay attention because htmling this was a pain...):
Yahoo's first 5 entries:
* All Russia Hotels All Russian Hotels - We offer discount hotel reservation services online in Moscow, St. Petersburg, Kiev, Russia, Ukraine, CIS and Baltic. www.allrussiahotels.com
* Tuva Travel Kyzyl city is the capital of Tuva Republic (Russia) Kyzyl city is positioned right in the center of Asia, which is proudly claimed by a local monument specifically dedicated to this fact. www.sokoltours.com
WEB RESULTS
1. Wikipedia: Kyzyl
Wikipedia Free Encyclopedia's article on 'Kyzyl' en.wikipedia.org/wiki/Kyzyl
2. Weather Underground: Kyzyl, Russia Forecast ... Updated: 8:00 AM KRAST on August 02, 2005. Observed at Kyzyl, Russia (History) Elevation: 2064 ft / 629 m ... Coming soon: Flash Stickers. Kyzyl, 63 F / 17 C ...
Find the Weather for any City, State or ZIP Code, or Airport Code or Country. Email. Password. Maps. United States. International. Information. Refinance Rates. GoTo Meeting. Kyzyl Singles. Hosting Companies. Online deals! Vitamins. Internet Mall
www.wunderground.com/global/stations/36096.html
3. AllRefer.com - Kyzyl (CIS And Baltic Political Geography) - Encyclopedia
AllRefer.com reference and encyclopedia resource provides complete information on Kyzyl, CIS And Baltic Political Geography. Includes related research links. ... By Alphabet : Encyclopedia A-Z - K. Kyzyl, CIS And Baltic Political Geography ... Kyzyl or Kizil[both: kizil'] Pronunciation Key, city (1989 pop ...
reference.allrefer.com/encyclopedia/K/Kyzyl
Now, for the first five Google Results on Kyzyl:
Kyzyl'-administrative center of Republic of Tuva, Russia Kyzyl' Republic of Tuva, ... Republic Capital:, Kyzyl. Capital Population:, 91000( at 01/01/94) ...
|Central-Chernozemny|
members.tripod.com/~argun/kyzyl.htm
Kyzyl on Encyclopedia.com ...
Kyzyl or Kizil [both: kizil'], city (1989 pop. 85000), capital of Tuva Republic, S Siberian Russia, on the Yenisei River. It services motor transport and has
www.encyclopedia.com/html/K/Kyzyl.asp
Kyzyl Travel Information. Photos, Stories and Diaries about Kyzyl
Sustainable Tourism for independent travellers (travelers) and backpackers.
www.worldsurface.com/browse/location.asp?locationid=5654
Kyzyl, Tuva, Russia current local time ...
Kyzyl, Tuva, Russia - before placing a telephone call or making travel plans for a flight or hotel, get the current local time provided by
www.worldtimeserver.com/current_time_in_RU-TY.aspx?city=Kyzyl
Shoes for Industry. Shoes for the Dead.
How about this metric: users
I don't know a SINGLE person - not one solitary soul - who uses Linux, but I know scores who use Windows every day, for everything from word processing to Internet browsing.
Is ANYONE out there using Linux for anything?
Obviously, the more people use something, the better it is. In fact I bet the ratio of Windows/Linux users is much bigger than the ratio of Google/Yahoo users. It's funny how popularity counts on Slashdot when the Slashbots' favourite is the popular one.
I think the idea is that people are more impressed by google's bot that searches as necessary as opposed to yahoo's bot which just bruteforces the web over and over. This seems to imply google's bot more efficiently uses its resources, possibly implying it actually does provide more information because it indexes smarter, not harder. That is to say, Google's spider doesn't necessarily index less; it indexes proportionally to the benefit gained from indexing more often.
The other idea is that while yahoo's spider may possibly provide more information, it definitely uses more bandwidth. Web spiders are good for end users but some webmasters don't like them as much since for a large site an overzealous spider can basically constitute a small DDOS. Now imagine this small DDOS happening multiple times a day. This may be coloring some people's statements.
Take a look at http://www.google.com/?q=isearch and you'll notice that isearch.com, who is constantly searched is not even indexed.
Looking at isearch.com, I think that I would consider this to be a point in Google's favor.
...this was on my google homepage! http://google.com/ig
Gmail also offers the 'conversation as a thread' feature you mentioned about google groups, and makes it invaluable when dealing with long strings of replies to the same topic. (Also hides previous message quotes by default).
Touched By His Noodley Appendage.
Much like realmeme's site, Trendwatcher provides real-time graphs of search results on the top 4 engines; you can see a jump in just about all of them, and you can add your own too: Internet Trendwatcher
I switched from Google to Yahoo after a friend suggested I "put my money where my mouth is" when I was saying I thought Google was vastly overrated (we were talking about the stock price).
So now I use search.yahoo.com whenever I search.
Honestly, you can hardly tell the difference between it and Google. Both work great. Once in a while I think Yahoo's results suck and I try them on Google. Only once in a great while are Google's results actually better.
http://lkml.org/lkml/2005/8/20/95
I think I speak for everyone when I say, "Oh, okay. That's nice."
What I want to see Google and other search engines do is add a small link next to every result in their search listings:
"Report this site as spam."
Of course they would only want to offer this to registered users who were signed in (so they can shut down offending accounts and keep company A from trying to destroy company B and in the process overwhelming the system), and the most frequently reported sites are groomed from the cache after review by a search engine employee.
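The reporting scheme described above (signed-in reporters only, one vote per account per site, human review before anything is dropped) can be sketched as a small data structure. Everything here is an assumption for illustration: the `SpamReports` class, the `review_threshold` value, and the rule that banning an account discards its reports are hypothetical, not any search engine's actual system.

```python
from collections import defaultdict
from typing import Optional


class SpamReports:
    """Hypothetical backend for a "Report this site as spam" link."""

    def __init__(self, review_threshold: int = 3):
        self.review_threshold = review_threshold
        # site -> set of user IDs that reported it (a set, so repeat reports don't count)
        self.reporters: dict[str, set[str]] = defaultdict(set)
        self.banned_users: set[str] = set()

    def report(self, user_id: Optional[str], site: str) -> bool:
        """Record a report; returns False for anonymous or banned users."""
        if user_id is None or user_id in self.banned_users:
            return False
        self.reporters[site].add(user_id)
        return True

    def ban(self, user_id: str) -> None:
        """Shut down an abusive account (e.g. company A targeting company B) and drop its reports."""
        self.banned_users.add(user_id)
        for users in self.reporters.values():
            users.discard(user_id)

    def review_queue(self) -> list[tuple[str, int]]:
        """Sites with at least `review_threshold` distinct reporters, most-reported first,
        for a human reviewer to check before anything is removed from the index."""
        flagged = [(site, len(users)) for site, users in self.reporters.items()
                   if len(users) >= self.review_threshold]
        return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

Counting distinct accounts rather than raw clicks, and requiring review before removal, are what keep one motivated competitor from knocking a rival out of the results on their own.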
I hate the sites that clearly exist for no other purpose than to manipulate the search engines (of course, in 15 minutes of searching I can't find one, but the next time I search for real they'll be everywhere), cheat (see the tiny links at the bottom), fake blogs that just regurgitate free articles from all over the net, and other such scum that offers absolutely nothing to any human being visiting them.
I hate them both as a surfer and a webmaster.
I wouldn't be surprised at all to find out that they made up more than 50% of the entire internet.
Lose Weight and Feel Great with Isagenix
It's the relevancy, stupid!
20 Billion Pages and still nothing good on
Exactly...
8,168,684,336 *web* pages
One thing that Gmail lacks that Yahoo has is an address book. Yahoo also has calendar and notepad functionality. Yahoo Mail is really pretty good; I say this as a Gmail user. Using JavaScript to autocomplete the From field is nifty, and I like using keyboard shortcuts for mail functions, but the lack of a proper address book has forced me to go back to importing my Gmail account into a proper email client through POP. And this will be free for how long?
harmonious design
A big reason I like Google a lot more - google.com has like 30 words total, and one image to load. Nothing extra that I don't need. Yes, it offers maps, email, and things that Yahoo does, but it doesn't clutter its search page with them. Until Yahoo can change that it won't convert me.
This year I've been getting increasingly annoyed with crappy sites occupying top spots in Google results.
I tried different search engines but couldn't get used to their GUI.
Then recently I installed Greasemonkey and the CustomizeGoogle (www.customizegoogle.com) script, which automatically inserts search links to Yahoo and other major sites on every page of Google results.
(Does this sound too confusing? What happens is that when you search for something on Google.com, CustomizeGoogle inserts a line of links to competing search engines positioned around the Sponsored Search Results area of Google result pages.
Then one can click on those links to open new tabs with search results from other search engines, all without re-entering the term(s). It's semi-automatic, that is).
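The trick described above, reusing one set of search terms across several engines so nothing has to be re-entered, can be sketched in a few lines of Python. This is not CustomizeGoogle's code; it is a minimal illustration that assumes Google reads the query from a `q` parameter and Yahoo from `p`, and the `ENGINES` table and `search_links` function are hypothetical names.

```python
from urllib.parse import quote_plus

# Hypothetical result-page templates; the parameter names (q for Google,
# p for Yahoo) are assumptions about each engine's URL scheme.
ENGINES = {
    "Google": "https://www.google.com/search?q={}",
    "Yahoo": "https://search.yahoo.com/search?p={}",
}

def search_links(terms: str) -> dict[str, str]:
    """Return one result-page URL per engine for the same search terms."""
    encoded = quote_plus(terms)  # spaces become '+', other specials are percent-encoded
    return {name: template.format(encoded) for name, template in ENGINES.items()}

for name, url in search_links("chamber string quartet").items():
    print(name, url)
```

Each link could then be opened in its own tab, which is the "semi-automatic" part.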
Now I use the Yahoo search engine much more often, as I don't have to copy-paste and/or re-enter search terms. And I still keep the Google GUI as my main search GUI. Finally, CustomizeGoogle has some other cool features: it can mask/block all Google ads and anonymize your Google cookie.
Now I rarely use MS IE for searching - I also un-installed my Google search bar from MS IE and Firefox. I still use MS IE for reading some particular sites, though.
...the entire appended conversation devolves to:
- this is a good thing because competition is a good thing. Having a single search engine dominate the market is just as bad as having a single browser dominate the market; and
- anything which competes with my Lord and God Google is evil and must be struck down! Death to the infidel Yahoo!
Yep, looks like someone's been poking the Google fanboy hive with a stick again....
Max
My god carries a hammer. Your god died nailed to a tree. Any questions?
I never looked at the visits of the bots (I suppose I need to know their ip addresses?), but I don't think you can say that in general.
For my site, google returns more than 7000 results for a search for the site name limited to the site's domain, Yahoo only 23.
There are some external links, some to the front page, some to other pages, but I have a lot of relations (and links) between the pages.
The number of Google results is a bit strange; I thought I had about 2400 pages. Maybe it's counting PHP session IDs and slightly different parameters for the same page?
Bart
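If the inflated count really does come from session IDs, one rough way to check is to normalize the URLs in your own logs before counting distinct pages. A minimal Python sketch, assuming the session parameter is PHP's default `PHPSESSID` (a site can rename it through the `session.name` setting); `normalize` is a hypothetical helper, not part of any search engine's tooling:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: PHP's default session parameter name; compared case-insensitively.
SESSION_PARAMS = {"phpsessid"}

def normalize(url: str) -> str:
    """Drop session-ID parameters, sort the remaining ones and discard any
    fragment, so URLs that differ only in those details map to one key."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    query.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))

urls = [
    "http://example.com/page.php?id=3&PHPSESSID=abc123",
    "http://example.com/page.php?PHPSESSID=def456&id=3",
    "http://example.com/page.php?id=4",
]
print(len({normalize(u) for u in urls}))  # 2
```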
I want Yahoo to consider *renting* the index so that people can create exclusive brands by building applications on top of it.
Slashdot = Sarcasm
This post reminded me of a neat little script that creates a visual comparison of Yahoo! vs. Google search results (for the first 100 results).
Sure, another 12 Billion web pages probably doesn't do anything for some guy looking for a digicam review.
:)
But that's not the point.
The real killer app of the Internet at this point is the long tail. (I'm assuming you all know about the long tail by now, but if you don't, Google it.)
Those extra 12 Billion pages help me out when I'm trying to find information about slobertygerbet flamjam fruit and none of the previously indexed 8-11 Billion pages had any info on that subject.
This is huge! Everyone is talking about Billions of pages like they are an unopened ream of paper you found behind a file cabinet. Take a moment to think about how huge a number a single billion is, let alone 12 billion or 20 billion. Then consider the subtleties that are hidden in those millions of millions of pages that are waiting to be discovered by someone who is searching for chamber string quartets who play trip-hop in the style of Sibelius.
Everything aside, 20 Billion web pages is a hell of a lot, and passing Google in this metric is a big deal.
obviously no deficiencies vs. no obvious deficiencies
The reason the number of Google searches has gone down is that the boycott of Google by ham radio operators, shortwave listeners, CB users and other radio hobbyists has started to show its effect.
While I am one of the most vocal of them, many have quietly persuaded their family and friends to use other search engines.
This is an effort to stop Google's support of Broadband over Power Lines, a miserable failure in all of its tests because it interferes with radio communications around the world.
gsm@mendelson.com Jerusalem Israel
The whole thing that differentiates Google is the quality of their results, not the quantity of pages indexed. Any crew of morons can download 20 billion web pages onto a disk array. It takes some real talent to create a tool that will give you relevant results in less than 0.1 seconds when you search that cache of data.
Yahoo is just blowing smoke. Move along... nothing to see here folks...
*Condense fact from the vapor of nuance*
For popular search terms (queries with millions of hits) index size doesn't matter much. Yahoo, Google, Ask, MSN, etc. all produce pretty similar results (that tend to favor established sites/pages). For rare terms or combinations, which contribute to the Long Tail of web search, index size is very important. Both Yahoo and Google report estimated (often inflated) hit counts for popular terms and exact numbers for rare terms, which still include dups. You need to go to the last result page to find the exact non-dup number, which can sometimes shrink the de-dup'ed hits by a factor of 10. Let's see how the new Yahoo fares against Google with a few queries I picked randomly:
Yahoo used to consistently underperform google on rare terms, it seems they indeed have caught up. But it has NOT really exceeded google in terms of useful size (Yahoo has more dups.) Still, it's a worthy engineering effort. Congrats!
I tried it with the subject of my website (classical mp3s). The Google results are quite good. Yahoo gives a first page that looks like Google would have given two years ago.
For example, some websites point to mp3.com for their content. Since mp3.com is no longer the free music hosting site it once was, these sites basically just contain dead links. Yet Yahoo still ranks them high.
Yahoo has the tendency to keep pages returning a 404 error in their index.
I don't know if they count these stale pages in their total, but I would not be surprised.
Long after having changed a website structure, Yahoo keeps coming back to old URLs that return a 404 every time. A few weeks later it just tries again.
Google is apparently more clever when handling this situation.
The only thing worse than Yahoo is the French "Voila.fr" search engine. It keeps fetching pages that have not been valid for many years, and is not intimidated by 301, 302, 404 or robot exclusion.
Pages excluded via robots.txt are no longer fetched, but when you remove the line from robots.txt after a year, it immediately starts fetching again. So they have not been removed from the index at all.
To all who call themselves geeks and kiss Google's ass left and right: get off your chairs and start working. Those Google shares sure won't go that high any longer, and you won't retire as rich as you think. There is life beyond Google, you know.
The bots are easy to spot if you log the User-Agent header, they identify themselves with unique UA strings. Also, they access your /robots.txt file before crawling.
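As a rough illustration of the User-Agent approach, here is a minimal Python sketch that parses Apache "combined" log lines, tallies hits per crawler and notes which crawlers fetched /robots.txt. The marker strings in `BOT_MARKERS` and the sample log lines are assumptions made up for the example, and User-Agent strings can be spoofed, so treat the counts as indicative only.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical User-Agent substrings to look for; real UA strings vary and can be faked.
BOT_MARKERS = ("Googlebot", "Slurp", "msnbot")

def crawler_stats(lines):
    """Return (hits per crawler, set of crawlers that requested /robots.txt)."""
    hits, fetched_robots = Counter(), set()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a combined-format line
        bot = next((b for b in BOT_MARKERS if b in m["agent"]), None)
        if bot is None:
            continue  # ordinary visitor
        hits[bot] += 1
        if m["path"] == "/robots.txt":
            fetched_robots.add(bot)
    return hits, fetched_robots

# Made-up sample lines in combined format.
log = [
    '68.142.1.1 - - [20/Aug/2005:07:26:00 -0700] "GET /robots.txt HTTP/1.0" 200 24 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '66.249.66.1 - - [20/Aug/2005:07:27:00 -0700] "GET /index.html HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
hits, fetched_robots = crawler_stats(log)
print(dict(hits), fetched_robots)
```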
All I said was that Yahoo! was more aggressive at crawling, not that it returned better results...
Wow! Talk about a misinformation campaign.
And you'll find only 2,860,000,000 entries. Same for b or s or n (not exactly the same, of course, but a lot fewer). So what's the deal with "a"? I think it just tells us something about the dominance of English pages on the web, where the word "a" appears on almost every page. I don't believe a page that contains the letter "a" only as part of another word will be counted.
I certainly don't believe those other 10 billion pages are in other languages. Images and other files don't add up to that much as well. So your way of counting doesn't work here...
I ping www.bbc.co.uk because I pay a little bit for it to be there.
I cannot agree with you more.
"The act of investigating information online by following links more or less at random. Many users of the Web find themselves spending time simply following one interesting link after another, never knowing quite where they will end up. It can be a useful research technique, and is usually quite entertaining, as well." definition from http://www.webtec.uk.com/Glossary/body_glossary.h
Nine times out of ten, the link I searched for was in the first 3 or 4 result pages; I never cared about the remaining thousand results. The only results that count are the ones that display what I need. I look for quality in a search engine; quantity comes second, speed third. Yahoo may search twice as many pages, but the result quality never convinced me to switch from Google. My 2 cents.
All those moments will be lost in time, like tears in rain. Time to die.
"Indexing" content is not the same as "Finding" content.
I'll take Google's very specific, targeted, mostly-accurate results over having to sift through pages and pages of "sorted" results from Yahoo any day. I NEVER have to go beyond the first page of SERPs on Google, whereas I'm into the 2nd, 3rd and 4th pages at Yahoo for the exact same search.
Also, as a website host, we see a LOT of abuse from Yahoo's spiders quite often. Employees of Yahoo deny this is happening, but it is.
For example, based on today's logs: there are 69 unique Yahoo crawlers (in the ^68.142.x.x range) requesting robots.txt, and 287 other unique IPs (in the same ^68.142.x.x range) requesting other content. The total number of hits and requests for content from our servers today from Yahoo's spiders is 2,351. It's only 7:26am, too!
Google's crawlers are much more well-behaved, and don't hammer the server to get to content. I've locked out lots of our URLs and resources in robots.txt, and because those 69 crawlers requesting robots.txt don't talk to the other 287 crawlers requesting content (they don't even share the same datasource), the content we don't want indexed gets indexed anyway. I may end up blocking the whole 68.142.0.0/16 range soon just to stop the abuse.
Just my 0.02c on the matter, but I'll stick with Google thanks.
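The per-range counting described in the post above can be sketched with Python's standard `ipaddress` module. The sample requests are made up for the example and `split_crawler_ips` is a hypothetical helper; the last line shows that 68.142.0.0/24 covers only 68.142.0.x, whereas the whole 68.142.x.x range is 68.142.0.0/16.

```python
import ipaddress

# 68.142.x.x written as a network: the /16 covering 68.142.0.0 through 68.142.255.255.
CRAWLER_RANGE = ipaddress.ip_network("68.142.0.0/16")

def split_crawler_ips(requests):
    """Split (client_ip, path) pairs from CRAWLER_RANGE into two sets:
    IPs that fetched /robots.txt and IPs that fetched anything else."""
    robots_ips, content_ips = set(), set()
    for ip, path in requests:
        if ipaddress.ip_address(ip) not in CRAWLER_RANGE:
            continue  # not from the range being audited
        if path == "/robots.txt":
            robots_ips.add(ip)
        else:
            content_ips.add(ip)
    return robots_ips, content_ips

# Made-up sample requests for illustration.
requests = [
    ("68.142.1.1", "/robots.txt"),
    ("68.142.200.7", "/private/a.html"),
    ("68.142.200.7", "/private/b.html"),
    ("10.0.0.5", "/robots.txt"),
]
robots_ips, content_ips = split_crawler_ips(requests)
# IPs that fetched content without ever fetching robots.txt themselves.
print(sorted(content_ips - robots_ips))  # ['68.142.200.7']
# A /24 covers only 68.142.0.0 through 68.142.0.255, so it misses most of the range.
print(ipaddress.ip_address("68.142.5.9") in ipaddress.ip_network("68.142.0.0/24"))  # False
```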
The problem is the difference between raw data and useful information.
When you look through a list of restaurants (or the list of anything in the yellow pages), you're looking at something put together based on _semantics_. Some human put that list together and made sure the _meaning_ is what you'd expect there: you can actually drive to one of those locations and order food.
Search engines, on the other hand, just look at the words and have no bloody clue of semantics.
If someone ever put together a list of restaurants, it would just be a list of all people who ever said the word "restaurant". Including everyone who ever said "I hate chinese restaurants" or "I took my gf to a restaurant" or "I went to see a new apartment, but it was above a restaurant" or whatever. Needless to say, driving to most of those locations would be a bloody useless exercise.
Adding another 20 million people to that kind of indexing would just raise the noise-to-signal ratio, not actually produce anything useful.
A polar bear is a cartesian bear after a coordinate transform.
Yahoo has reached a milestone, that of gathering the data; now it has to fine-tune its search algorithm so that it can differentiate wheat from chaff!
Chris,
Php Programmers.
Google has NEVER sent me spam.
You see? You see? Your stupid minds! Stupid! Stupid!
Take a look at my last three posts on this subject for some detailed results.
Yahoo's crawlers (the ones asking for robots.txt) do not pass that information to the other crawlers that are fetching the content. As a result, content that you ask not to be indexed is indexed, because the crawler doing the crawling has no idea that the content is listed in a robots.txt that another crawler already requested.
Your stats generally agree with mine, though I hadn't realized the robots.txt issue -- I don't actually have anything to protect from crawling on my sites, so my robots.txt is a pretty simple "allow all". I was just pointing out some guidelines for figuring out when your site's being crawled and who is doing it. The truth is, there are crawlers out there who don't identify themselves as such and who misbehave (purposely or not) so you can't absolutely be sure, but Google does seem to respect all the rules, and I respect them in turn for doing so.
"linux" @ yahoo : 425,000,000 (0.11 sec)
"linux" @ google: 163,000,000 (0.15 sec)
-
"midget porn" @ yahoo : 26,600,000 (0.19 sec)
"midget porn" @ google: 1,250,000 (0.07 sec)
-
"0day exploits" @ yahoo : 101,000 (0.17 sec)
"0day exploits" @ google: 42,500 (0.05 sec)
-
"russian brides" @ yahoo : 5,840,000 (0.10 sec)
"russian brides" @ google: 1,180,000 (0.12 sec)
-
"debian sucks" @ yahoo : 556,000 (0.34 sec)
"debian sucks" @ google: 192,000 (0.09 sec)
Yahoo! Slurp! is not respecting robots.txt standards and is crawling websites it's not supposed to. I have "Disallow *" in my robots.txt file, and Slurp! accounts for 15 MB of my total bandwidth this month.
A computer makes it possible to do, in half an hour, tasks which were completely unnecessary to do before.
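The poster quotes the rule as `Disallow *`; whether that is the literal line in the file isn't clear. The conventional way to shut out every crawler is a `User-agent: *` line followed by `Disallow: /`. The sketch below checks both variants with Python's standard `urllib.robotparser`, which skips any line without a colon. Slurp's own parser may behave differently, so this only shows how one common parser reads the file; `allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

# Conventional rules for blocking every crawler from every path.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# The same intent written without a colon; Python's parser ignores that line.
NO_COLON = "User-agent: *\nDisallow *\n"

print(allowed(BLOCK_ALL, "Slurp", "http://example.com/page.html"))  # False
print(allowed(NO_COLON, "Slurp", "http://example.com/page.html"))   # True
```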
Besides, it's hardly the middle of nowhere, as it has become famous for its traditional throat singing. One of the people who made it famous was Richard Feynman; I first learned of Tuva as I was searching for stuff on Feynman. It shouldn't be news to any fan of Feynman that he was into obscure music.
If you're looking for less well known parts of the world, you might have a look at the other 'autonomous republics' within Russia, such as Komi or Mari.
Escher was the first MC and Giger invented the HR department.
Search for Charles Manson's son:
Zezozose Zadfrack Glutz
Yahoo returns 8, with 2 of those (25%) for a site called www.health104.health-204.info. The name appears to be INSERTED on the fly into that "Sponsor". The site has NOTHING to do with the search term.
Search Google?
Get 42 hits, with NO phoney web pages listed.
Interesting? How do you explain that one? I'll stick with Google for now.
Yahoo has started throwing out advertisements that bypass Firefox's pop-up blocker. That's reason enough for me to never consider using any of their services.
Harsh? Possibly. But any company that uses technology that specifically circumvents a protection the user has in place has made it clear that it doesn't give a fuck about what the user (and potential customer) wants, so anyone dealing with them can assume that he'll be screwed over as soon as it fits their plans and bottom line.
No, thanks. I'll stay with Google, which so far has actually listened to what people demand.
Assorted stuff I do sometimes: Lemuria.org
Have you ever considered adding a type of search that simply does not list retailers? Many times when I am looking for something strictly informational (like how to do xxx, what is yyy) I find the links swamped in junk sites and retail links. It would rock if there were a mode (something far more sophisticated and accurate than adding -review -buy -price etc) that once turned on, automatically adjusted the results so any retail links went to the bottom of the stack.
Hell is being intelligent in a world full of idiots.
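One crude way to get the behavior requested above is to re-rank an existing result list so that retail-looking links sink to the bottom. This Python sketch uses a hypothetical keyword list (`RETAIL_HINTS`) as a stand-in for the far more sophisticated classifier the poster has in mind; a plain substring test will misfire on plenty of URLs.

```python
from urllib.parse import urlsplit

# Hypothetical hints that a URL belongs to a shop; a crude stand-in for a real classifier.
RETAIL_HINTS = ("shop", "store", "buy", "cart", "price")

def looks_retail(url: str) -> bool:
    """Return True if the host or path contains any retail hint."""
    parts = urlsplit(url)
    text = (parts.netloc + parts.path).lower()
    return any(hint in text for hint in RETAIL_HINTS)

def demote_retail(results: list[str]) -> list[str]:
    """Move retail-looking URLs to the bottom of the list.

    sorted() is stable and False sorts before True, so informational links
    come first and each group keeps the engine's original order.
    """
    return sorted(results, key=looks_retail)

results = [
    "http://shop.example.com/widget",
    "http://wiki.example.org/Widget",
    "http://example.com/cart/widget",
    "http://example.net/howto/widget",
]
for url in demote_retail(results):
    print(url)
```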
Rofl factor: 10. That definitely made my day.
Big deal. Yahoo still sucks. ;)
If you have trouble finding anything, simply use these search operators for Google:
site:www.example.com "search string"
filetype:pdf johnsmithresume
link:www.example.com
cache:www.example.com
intitle:foo is used when searching for a certain word in the title of a Web page. For Alaska vacations, for example, you could use intitle:"alaska vacations"
inurl:foo is used when searching for a certain word in the URL of a Web page, like alaskavacations (note: one word, unlike above):
inurl:alaskavacations
Happy trails...
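The operators in the list above can also be assembled programmatically. A minimal Python sketch; `google_query` is a hypothetical helper that only builds the query text and makes no request. It joins each operator to its value with no space and quotes multi-word title phrases, matching the examples in the list.

```python
def google_query(terms: str = "", *, site: str = "", filetype: str = "",
                 intitle: str = "", inurl: str = "") -> str:
    """Assemble a query string from a few search operators plus free terms."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        # Quote multi-word phrases so the whole phrase is looked for in the title.
        phrase = f'"{intitle}"' if " " in intitle else intitle
        parts.append(f"intitle:{phrase}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if terms:
        parts.append(terms)
    return " ".join(parts)

print(google_query('"search string"', site="www.example.com"))
# site:www.example.com "search string"
print(google_query(intitle="alaska vacations"))
# intitle:"alaska vacations"
```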
...how big your data is, it's what you do with it that counts!
Have you tried http://search.socialexperiment.co.uk/
http://www.robotii.co.uk/
Yahoo finally finished indexing google.com
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
I've even seen Slurp get stuck on some of my pages (www.math.purdue.edu) and continuously request the same page. There was a month or two when Slurp was generating a serious amount of our traffic.
Google refuses to index pages that aren't linked to by at least a gazillion other sites, submitted or not.
My site, for example, has been up and running for nearly two months, has been submitted a few times, and is actually linked to by a few pages that are indexed by Google, but it still doesn't appear *at all* in Google's index, not even far down at the bottom.
Even if you enter site:www.....com in the search bar directly, it just says it doesn't know it. At least Yahoo has got it in there, never mind high ranked or not.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
To see the difference between the two engines, I thought I'd try a vanity search ("firstname lastname") and on yahoo, the results were as follows:
1,930 when on page 1
1,920 when on page 2
484 on pages 3 to 7
To be fair, google does the same thing, only with far less wildly inflated initial results:
212 when on page 1
211 when on pages 2-3
210 when on pages 4-5
209 when on page 6
The number of 8 billion searchable pages on Google's home page wasn't touched for a long time. Usually they do an update when another engine claims to have a bigger index. Also, this number does not include images etc., whereas Yahoo's number does. I agree that Google's sitemap helpers will dig out a lot of stuff from the hidden Web. Most probably Google's index contains way more than 8 billion pages, perhaps even more than 20 billion objects.
http://sebastianx.blogspot.com/
http://bash.org/?514353
at least until Netcraft confirms it.
It's true no man is an island, but if you take a bunch of dead guys and tie 'em together, they make a good raft.
This article (the expansion, especially) actually strikes a familiar chord: a day or so ago, I was checking my Apache logs and noticed that over the past few weeks (maybe months), there have been a metric crapload of requests from Yahoo bots for pretty much everything public on the server. I've got a bunch of random crap on there, and I can guarantee that very little of it is relevant to anyone but me (and even that's iffy).
--- What
It really doesn't matter that much when you are talking about indexes in excess of 10 billion. What's more relevant is how smart your search is. For instance, if I'm searching for information about some red spots on my dog, and I get 600k results of family home pages with pictures of their dog "Spot," then a 20-billion-item index is worthless to me.
Don't get me wrong, I haven't used Yahoo search in several years, and do not know Yahoo's search capabilities today. They used to use Google's search from what I hear, but replaced it with something they rolled themselves. I'm sure if they didn't feel it was better, they wouldn't have changed.
On the other hand, I find Yahoo Mail far better than Gmail (and I use both!). Yahoo Finance is by far the best site for stock trading research (though I'm no fool; I use several others too).
Anyway, more doesn't mean better, as I'm sure most of you know. Better is a judgement each of us makes for ourselves, but sometimes it's just damn obvious to everyone (as Google has been in the past).
I think it may be time for me to put Yahoo and Google through the wringer and find out which is actually better (for my own usage, anyhow; and just remember, Yahoo has several extremely smart people too!).
It doesn't matter.
The fact that Google loads only a single image on its front page makes it the best search engine. Pretty Flash ads, banners, and tons of clutter/links still mark Yahoo as an inferior search engine.
Actually, if you want to figure out what an error message means, I've found that searching "site:experts-exchange.com errormessage" works great. So go ahead and sign up for it. If you don't find an answer there already, you get free points to post your own question and pick among the responses to give the points to, and you can keep posting followups like "I tried that and it didn't work, what should I try next?". I've found the site very helpful, and I often go there in downtime to answer questions on topics I know.
Oh, you know what it is? Their new graphic layout is crap. Where you see the "sign up to see the solution" - just keep scrolling down the page. The answer's right there. You only need to sign in to do things like get email alerts when someone posts an answer to your question.
Yahoo! Groups search is unbelievably pathetic. Could you talented people at Yahoo! do something about this, please, please, please?!?
I don't know about you, but if I don't find what I'm looking for on the first couple pages of results, I add something else to my search.
In effect, for me and millions like me, search engines only return 20-30 results. The most important thing to me is that what I'm looking for is there, and it's usually the kind of thing where the same info is on dozens of websites. So I really don't care whether I get all of those dozens of sites, I just care that one copy of the info I'm looking for is at the top of the list.
The one thing I think they could both do is make advanced searching easier for novice users. Rather than having a separate page, it might be nice to incorporate that into the page that shows the results. So you see your top ten results and realize they're all shopping sites when you wanted technical. I would type in "-buy -sale -shopping", but I've always thought there should be an alternate mechanism for the less geeky.
What is Yahoo?
- Danny
Just then, a faint voice could be heard coming from Yahoo's headquarters...
"I'm not dead yet..."
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
... and haven't looked back. I got sick of the way that, say, Google searches for particular projects put the project homepage halfway down the page, with barely-relevant mailing list postings sitting higher.
I still do comparisons every so often, and Yahoo still wins most of the time.
Heh, yes it does. That link is going into the hotbar. Exactly what I was asking for. Thanks!
Hell is being intelligent in a world full of idiots.
In other news, Microsoft and Google simultaneously attempted a hostile takeover of Yahoo.
you would know we are talking about SERVICES, not searching.
http://my.yahoo.com/ is no more about web searching, than http://slashdot.org/ is about buying cars.
I forget whether it was after I switched from AltaVista (which rapidly lost value when it stopped being an Alpha demo project) to Google, but searching for LaTeX, wrap-around, and figure yielded some rather shocking results. It took a moment to figure out how my search had been hijacked...
hawk