Slashdot Mirror


Search Beyond Google

An anonymous reader writes: "'Search Beyond Google', the cover story of the March issue of Technology Review, is one of the few current Google stories that discusses whether their technology can stay ahead of the competition in the months to come."

55 of 248 comments

  1. All good things ... by Ernest+P+Worrell · · Score: 5, Funny

    ... or bad things ... or pretty much anything, come to an end sometime. Except Microsoft of course.

    I think Google has deviated too much from searching, with their Blogger acquisition, and other stuff like that. We'll see how long they stay around.

    1. Re:All good things ... by LostCluster · · Score: 5, Insightful

      It seems like Google is starting to admit that they've hit a wall at improving their search technology, so they're starting to expand into other portal areas to anchor themselves down the same way Yahoo did when their directory-search model hit the wall.

      But Yahoo seems to be investing in several of the surviving web crawlers from the early days. Clearly, they see Google's hold on the title as the #1 search engine as something they might be able to take back.

    2. Re:All good things ... by Ernest+P+Worrell · · Score: 5, Interesting

      They haven't hit a wall. They're just giving up. There is always, always room for improvement in searching ... sure, you can have natural language queries and stuff like that. But what about getting rid of the search engine "spam", and all those fake self-referring sites? C'mon Google, that can't be that hard to get rid of. I mean, assuming you have a PhD and stuff ... right?

    3. Re:All good things ... by interiot · · Score: 4, Interesting
      Google is certainly trying to weed out the junk while keeping the good stuff. Anybody who's dealt with a spam filter knows how hard that can be, especially if you want to keep all the good guys happy.

      There have been two updates in the last couple of months, named Update Florida and Update Austin by the SEO community. As usual, various webmasters have been devoting a lot of thought and emotion to them. But as a normal user, all I can see is that Google is definitely trying, and not succeeding yet.

    4. Re:All good things ... by indigeek · · Score: 5, Informative

      Google works approximately by modding up the sites that get linked to the most. All the contributing links seem to carry equal weight. This allows scamming by forming webrings and similar circular linking schemes.
      Another approach I heard being discussed is to give links from more popular sites a higher weight, i.e. if a site has a lot of pages linking to it, the sites linked from this site must also be good. Apparently, if done right, you can do a few iterations and get to a better algo.
      Or perhaps assign a number (karma, if you will) to each site. Then divide this karma by the number of sites it links to and add this to all the linked sites. Eliminate the cycles in the graph and iterate.
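
      A minimal sketch of the karma idea described above, in Python. Everything here is illustrative: the function name, the toy link graph, and the damping factor are assumptions, not anything Google has confirmed. Instead of eliminating cycles, this sketch keeps them and relies on the damping term to let the scores settle, which is a swapped-in simplification.

```python
def propagate_karma(links, iterations=50, damping=0.85):
    """Iteratively share each site's karma among the sites it links to.

    links maps a site name to the list of sites it links to.
    damping is an assumed parameter: the fraction of karma passed along
    links, with the remainder spread evenly over all sites.
    """
    sites = set(links)
    for targets in links.values():
        sites.update(targets)
    n = len(sites)

    # Start with every site holding an equal share of karma.
    karma = {site: 1.0 / n for site in sites}

    for _ in range(iterations):
        new_karma = {site: (1.0 - damping) / n for site in sites}
        for site in sites:
            targets = links.get(site, [])
            if targets:
                # Divide this site's karma by the number of sites it links to.
                share = damping * karma[site] / len(targets)
                for target in targets:
                    new_karma[target] += share
            else:
                # A site with no outgoing links spreads its karma evenly.
                share = damping * karma[site] / n
                for other in sites:
                    new_karma[other] += share
        karma = new_karma
    return karma


# Hypothetical toy graph: a and b both link to c, and c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
toy_karma = propagate_karma(toy_links)
for site in sorted(toy_karma, key=toy_karma.get, reverse=True):
    print(site, round(toy_karma[site], 3))
```

      On the toy graph, c is linked to by both a and b and ends up with the most karma, while b, which nobody links to, ends up with the least.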

    5. Re:All good things ... by Anonymous Coward · · Score: 5, Insightful

      Perhaps they're trying to be the central information aggregator? Many of their initiatives, webpage search, news search, usenet search, store searches... have to do with sifting through more information than humans can possibly handle. The Blogger acquisition and the friendster thing could be seen as peripheral endeavours that may yield a profit, but also might yield information on how to sift for relevance. e.g. handle blog relevance by studying interpersonal relationships of sites like friendster. After all, their web search is based on relationships between webpages of a sort. In that vein, Google Answers could be interpreted as an experiment to leverage the power of people in finding relevant results.

      Or I could be reading too much into what is otherwise standard corporate behaviour. :)

    6. Re:All good things ... by NewWaveNet · · Score: 4, Funny
      Google works approximately by modding up the sites that get linked to the most.
      I can't seem to find a link to the meta-moderate page on Google ;)
  2. Search engine spam is the key... by PornMaster · · Score: 5, Insightful

    The key for google providing relevancy is certainly eliminating "search engine spam". Almost everything that comes up on the first page for most things I search for is a referral program selling either something I'm looking for information about, or selling something completely different.

    1. Re:Search engine spam is the key... by Mr.+Stinky · · Score: 5, Interesting

      Agreed. Until last week, I observed Google being bombarded by spammers of the 3rd level domain name. I believe that last week they tweaked their algorithm similarly to the November 2003 tweak by throwing out results that contained the exact keywords in the 3rd level domain name. I run a legitimate business: snowboards-for-sale.com, and these jack-ass-holes have been funneling Googlers into their Amazon affiliate site by setting up shell websites like: http://flux-bindings.foo.com/ If you compare the result set between Google and Yahoo for the same query, I'm finding that Yahoo has slightly better technology for weeding out the spam; at least right now.

      --
      Nothing is foolproof because fools are so ingenious.
    2. Re:Search engine spam is the key... by DrEldarion · · Score: 5, Interesting

      The funny thing is that Google does this on its own sometimes, and not because people are manipulating it. I recently noticed that I've been getting a lot of hits from Google searches for "S635MP". I recently posted a deal for a S635MP motherboard w/ CPU for $5. (the deal is dead now, sadly, although there's one for $10)

      Google saw that link, grabbed it, and for a while made me the #1 search result for "S635MP", even above the manufacturer. I've since been moved to #2 by another site similar to my own, and we're both still above the manufacturer.

      Now, I didn't TRY to do this. All I did was post a simple link in my forums. Google is filling itself up with spam.

    3. Re:Search engine spam is the key... by Anonymous Coward · · Score: 5, Insightful

      Google saw that link and grabbed it...

      Just like they do with all of their search results.

      Really, whether you tried to do it or not, doesn't matter. It's a fact that more people were referring to your site with links like "s635mp" than were referring to the manufacturer.

      Reacting to this is exactly what makes google, google and not Yahoo!. I mean, a search engine whose results can't be manipulated has existed a while. It's called a phone book. Yahoo! results are manipulated simply by keyword volume. Google results are manipulated by keyword volume and a proprietary heuristic based on links and pagerank.

      I'm surprised (in retrospect) that it took so many years for so-called ``google-whacking'' to emerge. I wonder how long they [google] knew it was inevitable, or at least a strong possibility (some really bright guys working there)...

    4. Re:Search engine spam is the key... by mopslik · · Score: 5, Informative

      I thought the whole concept of google was that it ranked pages higher if lots of other pages linked to it.

      And this is exactly one of the problems that is now coming to light. Spammers set up hundreds of tiny sites that do nothing but point to each other, thus inflating their PageRanks. They've saturated Google to the point that searching for information about commercial products usually returns 2/10 legitimate pages.

      At least, that's been my experience.

    5. Re:Search engine spam is the key... by justMichael · · Score: 4, Informative

      try using this

      something interesting -site:example.com

      At this point there's no way to save it as a pref, but you could always drop it in a text file to keep a big list
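
      A minimal sketch of the text-file approach suggested above, in Python. The function names and the sample domains are hypothetical; the only thing taken from the comment is the -site: operator itself.

```python
def parse_blocklist(text):
    """Return the domains listed in a blocklist file's contents.

    One domain per line; blank lines and lines starting with '#' are skipped.
    """
    domains = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            domains.append(line)
    return domains


def build_query(terms, blocked_domains):
    """Append one -site: exclusion per blocked domain to the search terms."""
    exclusions = [f"-site:{domain}" for domain in blocked_domains]
    return " ".join([terms] + exclusions)


# Hypothetical blocklist contents, as they might be kept in a text file.
blocklist_text = "# affiliate spam\nexample.com\n\nspam.example.net\n"
blocklist = parse_blocklist(blocklist_text)
print(build_query("something interesting", blocklist))
```

      The resulting string can be pasted into the search box; how many exclusions a search engine accepts in one query is not something this sketch checks.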

    6. Re:Search engine spam is the key... by willamowius · · Score: 4, Insightful

      > ..a search engine whose results can't be manipulated has existed a while.
      > It's called a phone book.

      The phone book can't be manipulated, because it doesn't try to rank entries. Try to find the right person called "Smith" in a phone book...

      When you look at the Yellow Pages, they do some sort of ranking and they do get manipulated by those with a lot of money who can take out a bigger ad, but aren't better than any other business.

  3. Google can't rest on its successes by LostCluster · · Score: 5, Insightful

    Google has had the last few years virtually unchallenged as the #1 search engine, because nobody has yet come out with anything that's better than PageRank.

    But, five years is a long time to sit on an innovation without making it better. It gives the competition time to catch up. Furthermore, since PageRank doesn't seem to have seriously changed much, it's actually slipped backwards a bit as more and more people have figured out how to "beat the system" by posting nonsense sites with links to the site they want on top. Google's clearly trying to fight this, but that's an uphill battle.

    Meanwhile, Yahoo now owns three distinct web-crawl based search engines, AltaVista, AllTheWeb, and Inktomi. They also own Overture, which began life as GoTo.com, which was the first to associate real search results with targeted ads. Put all these pieces together. Yahoo also has the original mega-directory site, which Google tries to duplicate by presenting the Open Directory Project on their site. In short, Yahoo's got all the resources to launch a brand with everything that Google has going for it... and when you look at AltaVista and AllTheWeb they feel quite a bit like Google already. Clearly, Yahoo's gearing up to issue a challenge to Google.

    It really seems like Yahoo is making sure they have all the tech in place right now. When they're sure that they're better than Google, I fully expect to see a marketing campaign claiming that and inviting people to do head-to-head searches.

    Google, as it stands now, is going to look pale in such showdowns. They've got to seriously modify PageRank so that the link spammers get downranked before Yahoo issues that challenge, or else Yahoo could reclaim the search market under its "Google-killer" product line, and then direct people back to the original Yahoo site for their other portal needs.

    1. Re:Google can't rest on its successes by DrEldarion · · Score: 4, Insightful

      The problem with Yahoo is that it tries to do far too much. When I want to search for something, I just want a little box asking me what I want to search for - not a huge page with a million links on it and a few flash ads.

    2. Re:Google can't rest on its successes by LostCluster · · Score: 4, Insightful

      The problem with Yahoo is that it tries to do far too much. When I want to search for something, I just want a little box asking me what I want to search for - not a huge page with a million links on it and a few flash ads.

      And I doubt Yahoo.com is going to change at all. However, look at the other two search portals they operate. It's quite likely that the offering Yahoo puts forward to fight Google won't be called Yahoo, but be flown under the AltaVista or AllTheWeb brand name.

      So, if you just want to search, they'll have a nice clean entry point to their network for you. If you want the full busy-screen portal, there will be another entry point for that. Nothing limits Yahoo to having only one major brand...

    3. Re:Google can't rest on its successes by Anonymous Coward · · Score: 4, Informative

      http://search.yahoo.com/

  4. Hopefully.. by HenryFjord · · Score: 5, Insightful

    Hopefully google will not go public anytime soon like they were talking about earlier. I fear that this would stifle their innovation and bring it closer to some of the other failed portals.. ie more ads in an attempt to satisfy investors.

    I think it is a good idea for other search engines to step up to the plate and challenge google. It stops them from becoming complacent and spurs innovation from a desire to be #1.

    1. Re:Hopefully.. by millahtime · · Score: 4, Interesting

      The only worry would be if Google goes public and then shortly thereafter someone develops a new way to search that's better and Google loses a bunch of market share. Then the stock would go down quick. This is now a high stakes game.

  5. I once thought Altavista ruled the universe by shoppa · · Score: 5, Insightful
    At one point I thought Altavista was the end-all and be-all of search engines. Since then it's become an also-ran (last time I tried it, it really wasn't working at all) and Google has taken its place.

    I see no reason why the cycle cannot repeat. In fact, the cycle may be much like the semiconductor memory business, which has seen boom-bust cycles every few years since the early 70's. Sometimes a name will ride out for many cycles, but usually the company (and as necessary the technology) behind the name changes radically.

  6. Vivisimo is not a search engine by morelife · · Score: 4, Interesting

    rather a document organizer. It gets some of its results from Google anyway and just reorganizes it. Search results have the flavor of


    See books about "more stupid f---ing shit" at Amazon.


    targeted organization as in targeted selling. All they want is your demographic datum.

    IOW google will crush them.

  7. Even if they don't... by AKAImBatman · · Score: 5, Insightful

    ...maintain their technological lead, goodwill toward them will give them some breathing room. I continued to use Altavista for quite a long time after Google came out. It was what I was familiar with, I liked it, and it worked. Why switch? Eventually, I realized that Google had keen "read your mind" powers and finally switched. :-)

  8. bigco by oogoody · · Score: 4, Interesting

    What will stop google is not their technology,
    but the ossification that takes over every
    large company as it grows. Changes won't be
    made because it is too big a change. Changes
    won't be made because it's not cost justified.
    Marketing concerns will override technology.
    People will get fat and happy. And unlike microsoft
    i can switch to a different search engine
    in a second. Yahoo is looking pretty good...

  9. This has been the "story" for the past two years by Anonymous Coward · · Score: 5, Insightful

    Every couple months it's "Can Google stay ahead of new competitor x?" And so far, every time, the answer has been yes. People shift from search engines quickly when they no longer work, and people are still heading to Google.

  10. It's search people by Future+Linux-Guru · · Score: 4, Insightful

    I type something in and it spits an answer back at me.

    As long as that answer is in the first page, usually the first three items listed, people simply will not care about the backend technology.

    MS and others will brag about the vastness of the numbers of matching items they can find; most people only worry about finding one or two sites.

    This is going to be a big non-event...mark my word.

    1. Re:It's search people by LostCluster · · Score: 5, Interesting

      But look how that game has changed. Google's the one now bragging that they can search "6 billion items", while the others have worked at tweaking their sort routines to be more resistant to link spam... and there's the event.

      Google's starting to be the one wishing this was a non-event.

  11. Google's speciality & ubiquity by aacool · · Score: 5, Insightful
    Google, IMHO, has excelled in what truly counts in the consumer world - branding. As everyone, including slashdotters, knows, googling is now a verb, and not just in math textbooks.

    Enough branding studies have shown that it's very very hard to knock someone off their post once they seize a certain mindshare - e.g. Coke, Windows(grin), and now Google.

    So, irrespective of the technical competence, or otherwise of Google, it is going to be around and the leader, for a long time to come. P.S. My favorite missing google feature: search for bittorrent files

    1. Re:Google's speciality & ubiquity by JusTyler · · Score: 4, Insightful

      Enough branding studies have shown that it's very very hard to knock someone off their post once they seize a certain mindshare - e.g. Coke, Windows(grin), and now Google.

      This isn't entirely true. Take the 'New Coke' disaster of the mid-80's. Pepsi actually overtook the flagship Coke at this time, until Coke Classic was released in '85.

      Google is not much different to Coke. As soon as the water starts to taste funny (and on many searches it does now) we jump to the other main brands. Unlike Coke, however, Google cannot afford to keep its flavor constant every year.. but it must at least make it taste fresh instead of spammy.

  12. Scout the talent, reap the benefits. by bad+enema · · Score: 5, Insightful

    Google has been successful due to original thinking. It needs to ride its wave of reputation now rather than later in order to snatch up some of the finest minds to stay on top of this industry that is all about originality and fresh ideas. They seem to be on the right track by providing the work environment that they do.

    But no more stuff like that Friendster wannabe site.

  13. A suggestion -- to stay competitive by ackthpt · · Score: 5, Interesting
    Keep it simple (as it is) and limit arbitrary changes.

    I'm utterly fed up with eBay with the bloodymindedness of their "enhancement" and roll-out policy. Holding a near strangle-hold on the online-auction market, they are blind to the aggravations they inflict upon users.

    Radical changes to a familiar interface shouldn't take place without dire need, unfortunately some people think it's fine to dust users. Google is all I want in a search engine and it works very well. The only reason I'd seek another search engine is if they (Google) drive me away.

    BTW, did you know there's a calculator? I found it when I did a search for 'stones to pounds'

    --

    A feeling of having made the same mistake before: Deja Foobar
  14. Article is already out of date. by michael+path · · Score: 4, Insightful

    I've always been a google fan, but this article is essentially dated on its release, given the fact that the Yahoo! switch has already occurred.

    I do hope Google can continue its innovation, and reduce much of the annoyance of bad results through blogs.

    I'm surprised more attention wasn't given to the Google IPO, and what effect that might have on the "relatively small" 1000 person company.

    -m.

  15. Still waiting by roman_mir · · Score: 4, Interesting

    for a p2p distributed transparent encrypted indexing system with voted super-nodes.

  16. I for one.... by Lxy · · Score: 5, Interesting

    welcome our new search engine overlords. No, really, I'm serious.

    Google is awesome, and is by far the best search engine out there. Google became the best by being the best. I use it because it works, and it works well.

    In order to be dethroned, a search engine needs to work BETTER than Google. I welcome any search engine that can beat Google, as it has to be DAMN good to take that title. Microsoft search flat out sucks. If I look for articles on linux, I get articles about linux alternatives (mostly M$ content). If I google for linux, I get real linux stuff. This is just an example, but it's true across the board. I have yet to see a search engine superior to Google, and I welcome any tool that can prove itself better.

    --

    There is no reasonable defense against an idiot with an agenda
    :wq
  17. Every good web developer knows... by Anonymous Coward · · Score: 5, Funny
    that content is more important than technology (or bells & whistles).

    Forget about Yahoo and Microsoft. If I was google I would keep an eye on booble. No way they can compete.

  18. Re:This has been the "story" for the past two year by Anonymous Coward · · Score: 5, Insightful

    Exactly. When Google falls behind, you'll know it because you'll be using something else. This kind of "Entity XXXXXXX may suffer setback YYYYYYY any day now" story isn't reporting at all, it's speculation and ghost stories.

  19. Mousetraps by blogboy · · Score: 4, Interesting

    That's what technology is, isn't it? The constant search for something better than what's available? And the approach of many companies (insert any NASDAQ 100 company here) is wait-and-see. See how the pioneer does it, do the same, but throw some more bells and whistles in, or just market it better.

    Google has a brilliant algorithm, thanks to their 60 PhDs. But there are plenty of other PhDs out there, some of whom I'm sure are just finishing up their newest, succeeding algorithm. It's a constant game of king of the hill.

  20. google needs "stemming" by elwinc · · Score: 4, Informative

    I'm a heavy google user, but I still miss altavista's ability to search for stems. For example, an altavista search for "slid* rul*" will get 'slide rules,' 'sliding rulers,' and plenty of other variations. Google does support whole word wildcards (try "miserable * failure") but stems are even more useful.

    --
    --- Often in error; never in doubt!
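
    A minimal sketch of the stem-wildcard matching described above, in Python. It only illustrates the "slid* rul*" behaviour on plain strings; how AltaVista actually implemented it is not stated here, and the function name and sample texts are hypothetical.

```python
import re


def stem_pattern(query):
    """Compile a query such as 'slid* rul*' into a regular expression.

    A token ending in '*' matches any word starting with that stem;
    any other token must match as a whole word. Tokens must appear
    next to each other, separated by whitespace.
    """
    parts = []
    for token in query.split():
        if token.endswith("*"):
            parts.append(r"\b" + re.escape(token[:-1]) + r"\w*")
        else:
            parts.append(r"\b" + re.escape(token) + r"\b")
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


pattern = stem_pattern("slid* rul*")
for text in ["antique slide rules", "Sliding rulers for sale", "slippery rules"]:
    print(text, "->", bool(pattern.search(text)))
```

    The first two sample texts match and the third does not, since "slippery" does not start with the stem "slid".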
  21. Google has an advantage..... by sunami · · Score: 5, Insightful

    .....in that everyone uses it, and everyone HAS used it for the past five years, or longer. People trust it, and that is something that just doesn't vanish. Plus, they HAVE done new things, such as google news.

  22. Teoma by nucal · · Score: 4, Interesting

    After just a quick bit of playing around with Teoma (mentioned in the article), it seems to be better than Google. I was surprised ...

    1. Re:Teoma by LostCluster · · Score: 4, Interesting

      Some might claim that Teoma actually has the best find-what-you-want technology right now, but is suffering from a lack of crawling resources and promotion since Ask Jeeves, Inc. hasn't been bought up by any of the major resources. They seem like a project only being held back by lack of funding...

  23. In 3 months? by oGMo · · Score: 5, Informative

    People seem to think Google is simply a place to find HTML pages. You type in your words, and poof, you get some relevant sites. Could this be replaced in 3 months? Google has a huge index, a very good search algorithm, and works for most people, but (in theory) someone might come up with a working alternative in that period. However:

    • Images is great for searching for pictures. The results are uncannily good.
    • Groups lets you search Google's huge Usenet archive (remember when they purchased this from Deja?).
    • News is my primary source for world news.
    • Froogle is great when searching for where to buy almost anything.
    • Answers lets you pay for research when the rest don't cut it.
    • Catalogs lets you search mail order catalogs for when Froogle doesn't cut it.

    And more. Babelfish translation? Caching like a billion pages? Simple design, with text ads that are actually relevant? In 3 months.

    Yeah, right.

    --

    Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

  24. Regexps, please! Anyone! by Eudial · · Score: 5, Insightful

    Someone should invent a search engine with regular expression support. *sigh* A world with regexp-enabled search engines... That would be a wonderful world to live in.

    --
    GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
  25. technology schmology by ohzero · · Score: 5, Insightful

    Google's market position when they IPO has nothing to do with their technology. It has to do with their brand. "Googling" for something is the effective equivalent of going to get a Kleenex. No one asked for a tissue. The market is going to be buying faith in the Google brand, and its loyal userbase.

    --
    -- http://www.criticalassets.com
  26. How I'd fix Google... by Flamingcheeze · · Score: 5, Insightful
    Doesn't it seem simple enough to have a feedback system of sorts? Say I search for "Dell LCD Monitor Reviews" and get nothing but vendor sites, I could check a box next to all the Googlespam that says "not helpful."

    It would be the rebuttal to Google bombing... searchers could fight back by giving the crap a thumbs-down. Of course, then you would have the bombers voting down all the legit sites. Dammit.

    --
    The Philosophy of Liberty | lewrockwell.com
  27. The issue's moot... by sammyo · · Score: 5, Interesting

    I kinda like this one, but not enough to not slashdot them. A cool pun, a funky gui, what more could you want in a nextgen search engine.

  28. Structure of Information vs Search for Knowledge by stuffduff · · Score: 4, Interesting

    Today's search engines work a lot like information sieves, or panning for gold. The idea seems to be to take a bunch of stuff and wash away the un-needed, leaving behind (we hope) what we were looking for. However the very nature of the web provides the opportunity for looking at the relationships between ideas, the synthesis of knowledge as opposed to just collections of information. While the 'tricks' from the microsoft research projects look promising; only a true 'learning machine' will be able to go beyond the information and develop a 'meta-interpretation/representation' of the raw data in order to support a 'meta-understanding' that is traversable and navigable in that we can not only connect with what we don't know, but that we can explore the unknown in terms of its relationship with what we do know.

    --
    "Can there be a Klein bottle that is an efficient and effective beer pitcher?"
  29. Google will still stay on top for a while. by bad+enema · · Score: 5, Insightful

    The majority of users who use search engines are just end users anyways and appreciate the simplicity of Google's page design. I go to Yahoo, Altavista and Lycos and there's half a million links all over the place. I go to Google, and there's a nice clean page with the text box smack right in the middle.

    Visual appeal still counts.

  30. Three keys to the search game by LostCluster · · Score: 5, Informative

    There are three very distinct elements involved in creating a powerhouse search engine:

    - A large crawl: A search engine with nothing in its database isn't going to work very well. A search engine needs as big of a crawl as possible in order to have any results at all. This takes huge resources in terms of bandwidth and computing power. Some of the early search engines met their demise when they couldn't afford to keep their crawlers growing as fast as new web content comes out.

    - The Sorter: Once the long list of results that match the keywords is pulled out of the crawl, a sort needs to be applied in order to locate the best results and present them first. Google got vaulted to the top because PageRank was better than anything anybody else had put out. However, PageRank isn't perfect, so there is still room for somebody to make something better than PageRank.

    - Promotion: A web site just sits there unused if it isn't promoted. Google never spent much on advertising and it just relied on word of mouth since it was so strong in the other two areas. And now that everyone turns to them first without even checking other engines, that has given them the strong advantage of a strong brand image. However, we've seen plenty of cases where inferior technology has been beaten out by better marketing. If somebody's tech passes Google, without marketing it nobody will know about it. Therefore, look for the challengers to be launching major ad campaigns inviting people to at least try them before they assume Google is better.

    Can anybody put it all together? We're about to find out...

  31. technology ? by Tsiangkun · · Score: 5, Insightful

    I don't really care who has the most advanced search capabilities. I use google because all the paid links appear off to the side in a different color.

    That's all I really want . . . to get my search result separate from the commercially paid for product placements.

    --Tsiangkun

  32. I thought... by BlackShirt · · Score: 5, Interesting

    ... it could be a great idea to publish unanswered questions as a weblog.

    Even google cannot answer everything. Web is limited even if you don't believe it. You post your question. Answers will come through trackback, comments, email. Googling the web after you posted the question. Or not.

    All you need is some tag to mark post as answer or question. Hot list like metafilter to aggregate.

    Is it a good idea or does it belong in the recycle bin?

    Mailing lists used to be about that. Discussing specific problems. Finding answers. Nowadays they are quite dead. Except some. Newbies, spam, whatever the reason is. The problem is that those who possess knowledge don't have enough stimulus to share it. I don't solve that problem. The answer might be micropayments or gifts via amazon.

    But do a good deed today. Answer one or two questions. In a year it might add up to quite a lot. Some day you might need an answer to something yourself.

    http://answers.google.com/answers/main
    http://ihaventfound.blogspot.com/

  33. Google has added stemming by interiot · · Score: 5, Informative

    Google recently added stemming as a search of {quit smoke} will reveal. You can read about it in their help section. Stemming can be disabled on specific words. Otherwise the update came around November 15, 2003, but is probably still in flux, so there isn't too much good info about it yet.

  34. Building the wrong mousetrap by saddino · · Score: 4, Interesting

    A lot of articles (including this one) are focused on how Google (and their would-be competitors) can improve search via algorithms like PageRank; and again and again the proposed/imagined solutions are based on server-side computation. IMHO, the real solution to improving search is client-side -- and I don't mean search toolbars -- but rather using the computational power of the client to provide a better experience than what is available inside your browser. Searching in a browser is cool, but why not build a powerful Google search client app?

    As a simple example: if you're a Mac user, Beholder is really a much more useful image search frontend than using images.google.com alone (yes, I've mentioned this before, but hey, a developer has to eat).

  35. Back in Undergrad by ducomputergeek · · Score: 4, Interesting

    Google really helped with research papers a couple of years ago, but now I find there's too much spam. So much so, that now I'm into Grad studies, I am going to Lexis-Nexis to find out information about topics. Also I have found that the Internet is certainly not what it used to be either in terms of quality of content. There used to be a lot more academic sites appearing when I searched for information on a topic. Now, especially being in a political science related field, International Affairs, doing a web search on some topics leads to dozens of ranting bloggers instead of more academic type work.

    --
    "The problem with socialism is eventually you run out of other people's money" - Thatcher.
  36. Yes, Google has some problems by Everyman · · Score: 4, Insightful

    Yes, Google has a spam problem. It has been getting worse over the last year. In April, 2003 Google stopped crawling the web once per month, and then recalculating PageRank based on that monthly crawl. Since then, there has been a question of whether PageRank can even be calculated accurately by Google.

    I speculated about a 4-byte docID overflow problem in an essay last June at Google Watch. In recent months Google started a "Supplemental Index" for some curious, unexplained reason. Their total number of pages indexed was recently updated to 4,285,199,774 -- just below the maximum for a 32-bit integer. It looks as suspicious now as it did last June.
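
    The arithmetic behind that suspicion can be checked with a short Python sketch. It assumes, as the speculation does, that each document gets an unsigned 4-byte ID; the variable and function names are illustrative, and nothing here confirms how Google actually stores docIDs.

```python
UINT32_MAX = 2**32 - 1          # largest unsigned 32-bit value: 4,294,967,295
INT32_MAX = 2**31 - 1           # largest signed 32-bit value: 2,147,483,647
REPORTED_PAGES = 4_285_199_774  # page count quoted in the comment above


def ids_remaining(used, max_id=UINT32_MAX):
    """Return how far a count is below the largest ID that fits."""
    return max_id - used


remaining = ids_remaining(REPORTED_PAGES)
print(f"Headroom below the unsigned 32-bit maximum: {remaining:,}")
print(f"Share of the unsigned range used: {REPORTED_PAGES / UINT32_MAX:.2%}")
print(f"Exceeds the signed 32-bit maximum: {REPORTED_PAGES > INT32_MAX}")
```

    The reported count sits about 9.8 million below the unsigned maximum and is already well past the signed maximum, so "just below the maximum" only makes sense for an unsigned ID.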

    Last November, Google began using an on-the-fly filter to further refine the search results for ecommerce sites. Some spam was deleted, a lot of other spam took its place, and a lot of mom and pop ecommerce sites were dropped inadvertently. Many people were unhappy.

    Further evidence that Google's old ranking system is broken is the fact that three famous Googlebombs, "french military victories," "weapons of mass destruction" and "miserable failure" are all still working. The first one is eleven months old. It used to be that such Googlebombs were suppressed at the next monthly crawl, when PageRank was recalculated. Now it seems that suppressing them is beyond Google's ability. How else can you explain why Google puts up with these widely-publicized embarrassments?

    Google's results remain unsurpassed for noncommercial sites from EDU, ORG, and GOV domains, however. Their crawling of the noncommercial sector is the most complete of any engine. The reason Google does so well here is probably because spam isn't much of a problem in this area.

    So far Yahoo doesn't appear to be making much of an effort at covering the noncommercial web. It should be added that Google has more of a spam problem simply because spammers have been focused on Google for so long. Once Yahoo gets the same attention from spammers, then we'll be able to make a fair comparison of Yahoo with Google.