Slashdot Mirror


Yahoo! Vs. Google: Algorithm Standoff

An anonymous reader writes "There's a new report out from the guys who brought us the Google keyword density analysis. As they put it, "the goal of this analysis is to compare the keyword density elements of Yahoo's new algorithm with Google's algorithm." They compared 2,000 low-traffic, non-competitive keywords in the hope of seeing the algorithms more clearly, free of any search engine tweaks aimed at high-traffic keywords. Their findings are interesting. Should you go and rebuild your site based on these findings? Maybe not. It's worth a look, though."

43 of 270 comments

  1. Search Engine Optimization Professional by Anonymous Coward · · Score: 5, Interesting

    Gee, aren't these the guys responsible for continually diluting the quality of search engine results? I'm getting really tired of sites that present one thing to search engines and something totally different to me.

    1. Re:Search Engine Optimization Professional by Anonymous Coward · · Score: 5, Informative

      As always, there is a grayscale of good and bad search engine optimization. A good web author designs a site for the users, but keeps the workings of search engines in mind, too.

      Search engines need help with frames (if anyone can still find a good reason to use them). If you use Flash based navigation, you better make sure that you have a prominent document which links to all pages as well or search engines won't index them. It's also a good idea to use descriptive titles and put what's important at the top of the page. In other words, most good search engine optimization is exactly what you would do to make a site screen-reader or text-browser friendly.

      Then there's link-bombing, show-something-different-to-Google, white-on-white text, redirections, etc.
      It's quickly reaching the point where you can't tell someone to optimize a site for inclusion in search indexes without them falling into the hands of this kind of scum. It's a little like the word "hacker": you can't use it anymore without having to explain that you're not illegally breaking into other people's computers.

    2. Re:Search Engine Optimization Professional by Bushcat · · Score: 5, Funny
      If you use Flash based navigation

      That's another set of people that need a whack with a clue stick.

    3. Re:Search Engine Optimization Professional by Anonymous Coward · · Score: 5, Informative

      I'm getting really tired of sites that present one thing to search engines and something totally different to me.

      Then complain about it. That practice is known as cloaking, and you can get sites blacklisted for it.

    4. Re:Search Engine Optimization Professional by Araneas · · Score: 5, Interesting
      It's an escalating battle. Someone hijacks a keyword that is highly relevant to your site, so you have to figure out how to overcome that and give users something that isn't porn or a crappy search portal.

      I think it's fair to say there are white hat SEOs as well as black hat hijack^H^H^H^H^H^H SEOs.

    5. Re:Search Engine Optimization Professional by dargaud · · Score: 5, Interesting
      That's what I wanted to submit to the Google programming contest, but it wasn't eligible:
      • Make a second robot that retrieves a few full web pages (with graphics) per site while claiming to be IE6 (or a normal Mozilla), thus lying about being from Google.
      • Display the page in IE6 (or Mozilla), save the entire display as a bitmap image.
      • Run the bitmap image through an OCR program to extract the real text seen by the user
      • Compare this text with what the ordinary google robot sees.
      • If the text is completely different, lower the ranking.
      This gets rid of all the blue on blue keywords, display:none keywords and others. I think it will come to that.
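A minimal sketch of the final comparison step, assuming the rendering and OCR stages have already produced the user-visible text as a plain string (those stages are not shown). The function names and the 0.5 threshold are hypothetical, and the overlap measure, Jaccard similarity of the two word sets, is a stand-in chosen for illustration rather than anything the proposal specifies.

```python
import re


def word_set(text: str) -> set:
    """Lowercased alphanumeric words appearing in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(robot_text: str, rendered_text: str, threshold: float = 0.5) -> bool:
    """True when the crawler's view and the browser-rendered (OCR) view share too few words.

    The 0.5 threshold is an arbitrary illustrative value, not a tuned parameter.
    """
    return jaccard(word_set(robot_text), word_set(rendered_text)) < threshold


# Hypothetical example: keyword stuffing for the robot, a dialer page for people.
print(looks_cloaked("cheap prozac viagra casino", "download our dialer now"))  # True
```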
      --
      Non-Linux Penguins ?
    6. Re:Search Engine Optimization Professional by Anonymous Coward · · Score: 5, Insightful

      Or even better, just use an intelligent html parser that can work out if text would be hidden and ignore it if it is.
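A rough sketch of that idea using Python's standard `html.parser`, assuming the hiding is done with inline `style` attributes only. The class and function names are hypothetical; the sketch ignores external stylesheets, CSS classes and colour tricks, and badly nested tags would throw its element stack off.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleTextParser(HTMLParser):
    """Collects text that is not inside an element hidden by an inline style."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one entry per open element: True if it or an ancestor is hidden
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        hidden = (tag in ("script", "style")
                  or "display:none" in style
                  or "visibility:hidden" in style)
        inherited = bool(self.hidden_stack) and self.hidden_stack[-1]
        self.hidden_stack.append(hidden or inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        inside_hidden = bool(self.hidden_stack) and self.hidden_stack[-1]
        if not inside_hidden and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(visible_text('<p>Real content</p><div style="display: none">cheap pills</div><p>More</p>'))
# Real content More
```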

    7. Re:Search Engine Optimization Professional by samhalliday · · Score: 4, Interesting
      that's ridiculous... OCR is not needed in this scenario. it is easy enough to write a program to find out what colour the background and foreground of text is; it probably just takes too much time to factor this into the equation. your method would take _at least_ 10 seconds to even check a simple page (assuming all the code worked, which it wouldn't, cuz it's OCR).
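Comparing foreground and background colours needs some measure of how distinguishable they are. This sketch borrows the WCAG 2.x relative-luminance contrast ratio (1.0 for identical colours, 21.0 for black on white) as that measure. The helper names and the 1.5 cutoff are hypothetical, and it assumes the effective colours have already been resolved from the page's CSS, which is the hard part in practice.

```python
def hex_to_rgb(color: str) -> tuple:
    """'#rgb' or '#rrggbb' -> (r, g, b) with 0-255 channels."""
    h = color.lstrip("#")
    if len(h) == 3:
        h = "".join(ch * 2 for ch in h)
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB colour (0.0 black .. 1.0 white)."""
    def linear(v: int) -> float:
        c = v / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(v) for v in hex_to_rgb(color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colours up to 21.0 for black on white."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def probably_hidden(fg: str, bg: str, min_ratio: float = 1.5) -> bool:
    """Flag text whose colour is nearly indistinguishable from its background.

    min_ratio = 1.5 is a hypothetical cutoff chosen for illustration.
    """
    return contrast_ratio(fg, bg) < min_ratio
```

Blue-on-blue keyword stuffing such as `probably_hidden("#0000ff", "#0000fe")` is flagged, while ordinary black-on-white text is not.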

      and this way you are giving a lower ranking to pages which use text in images. it is not good practice to have all the text embedded in images, but it is often necessary for style purposes, an example being the logo of a site (ok, alt= should handle this). hell, i even do it! it's cleaner than hoping the person on the other side can render the same fonts as me (which would be impossible cuz i filtered them through GIMP to add some effects).

      a lot of sites auto-detect robots based on the user-agent they claim to be, and either block them or launch a seek-and-destroy attack against you. to get around this, the file /robots.txt (which every large site should have) WILL be read by the google/yahoo crawler no matter what, and abided by. it plays the prominent role in what the search engines read... not the server reading the browser's user-agent tag.
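Python's standard library can evaluate robots.txt rules without any network access, which shows how a cooperating crawler decides what it may fetch. The rules below are a made-up example; note that robots.txt is a convention that well-behaved crawlers choose to honour, not something a server can enforce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# every other robot is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "http://example.com/index.html"))      # True
print(rp.can_fetch("Googlebot", "http://example.com/private/x.html"))  # False
print(rp.can_fetch("SomeOtherBot", "http://example.com/index.html"))   # False
```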

      that's without even going into the algorithms for matching the OCR'd text against the text from the source.

    8. Re:Search Engine Optimization Professional by a24061 · · Score: 5, Insightful
      Besides, in which way does Flash exclude other operating systems?


      It excludes blind users with screen readers and people who don't or can't install superfluous plug-ins. Flash is great for entertainment but it should never be required for getting information.

    9. Re:Search Engine Optimization Professional by thbb · · Score: 5, Informative

      Here is where to file a complaint at google. Fast and easy to do, don't hesitate...

    10. Re:Search Engine Optimization Professional by DrSkwid · · Score: 5, Interesting

      Besides, in which way does Flash exclude other operating systems?

      Let's see

      Mozilla on FreeBSD (that's me) :

      We are unable to locate a single Web player that best matches your platform and operating system

      Mothra on plan9 (also me)

      We are unable to locate a single Web player that best matches your platform and operating system

      The acceptable list is :

      Windows 98/ME/2000/XP - Internet Explorer/AOL/Netscape/Mozilla/Opera/CompuServe - Flash 7

      Mac OSX / OS9 - Internet Explorer/Safari/Netscape/Mozilla/Opera - Flash 7

      Other Operating Systems
      Linux x86 Flash Player 6 for Mozilla 1.1 - (Not officially supported by Macromedia.)
      Pocket PC Flash Player 6 for Pocket PC 2003 (color devices supported only)
      OS/2 Flash Player 4 for Netscape
      Sun Solaris (Sparc/Intel) Flash Player 6 for Netscape
      HP-UX Flash Player 6 for Netscape
      SGI IRIX Flash Player 4 for Netscape

      On my 500,000-page-impression web site, using Flash would have excluded the otherwise successful visitors running the following operating systems:

      CP/M
      Windows 3.xx
      WebTV
      OSF Unix
      AIX
      NetBSD

      I will admit that the actual numbers are low, but being excluded/ignored is how we non-Windows users are treated day in, day out. Seems you can't fight the pigopolists.

      --
      There are places where the networks are not touching, and there are places where they are - Boeing's Lori Gunter
  2. If Yahoo wants my vote... by PoprocksCk · · Score: 4, Interesting

    ...they'll have to get rid of all that junk on their home page. Much of the reason for my using Google is that its home page is simple, it loads quickly, and it is just so easy to _search_, which is what a search engine should be. Yahoo failed when it became a "portal" and tried to do too much by itself. If they could somehow reduce the size of Yahoo's page down to that of Google (that would mean getting rid of those ads, guys) then maybe I'd consider trying it.

    1. Re:If Yahoo wants my vote... by penultimatepost · · Score: 5, Informative

      That's why they have: http://Search.yahoo.com

    2. Re:If Yahoo wants my vote... by PoprocksCk · · Score: 5, Interesting

      Heh...

      Well that's all well and good, but how many people would know to type that in?

      Has anyone looked at altavista lately? They've certainly taken the Google route, and their home page looks a lot like Google now, as does search.yahoo.com. However, in search.yahoo.com _and_ altavista, I noticed that "sponsored results" show up before the real ones, but they appear in the list just the same. That could confuse newbies, and I prefer the approach Google has taken to advertising (shoving the ads to a separate entity on the right, and keeping them text-based).

    3. Re:If Yahoo wants my vote... by ak3ldama · · Score: 5, Informative

      I know many people who use Yahoo! as a home page and they like the many services that are offered by Yahoo! besides just the search facilities. If all they wanted was search I doubt they would use yahoo.com for their homepage.

      --
      "but money is the God of Algiers & Mahomet their prophet." - Rich. O'Bryen June 8th 1786
    4. Re:If Yahoo wants my vote... by Pedrito · · Score: 5, Funny

      "Yahoo failed when it became a "portal"..."

      It failed? If a market cap of 28 BILLION dollars is failure, what do I have to do wrong to get there?

    5. Re:If Yahoo wants my vote... by Sique · · Score: 5, Informative

      Yahoo never was a search engine in the pure sense of word. Yahoo started out as a browsable catalogue of the Web, where every entry was put into categories by hand. The automated search came later and was bought as service from external providers up until now.

      --
      .sig: Sique *sigh*
  3. And a User Friendly game to go along! by Indras · · Score: 5, Interesting

    Just grab a friend and a deck of cards, and you can play Yahoo vs. Google at home.

    --
    The speed of time is one second per second.
    1. Re:And a User Friendly game to go along! by Anonymous Coward · · Score: 5, Interesting

      Chris Langreiter has a cute toy to compare Yahoo vs. Google results.
      Touch the dots !

      It's written in REBOL

  4. I think by Bishop, Martin · · Score: 5, Insightful

    Google is way too embedded in everyone's everyday life; it will just naturally be more widely used. When was the last time you heard someone say "Yahoo it"?

    --
    Setec Astronomy
  5. Re:Yahoo? by sam1am · · Score: 4, Informative

    Yahoo! Switches Search Engines (Wednesday February 18, @09:51AM) has the info on when this happened.

  6. Re:Yahoo? by MoriarGryphon · · Score: 5, Interesting

    RTFM, Yahoo is switching to their own engine.

    Personally, I find the differences in how the two engines handle bold text to be most interesting. If only for that, I'd stick to Google.

    Most pages that have 17 occurrences of your search text in bold are only going to be porn sites (unrelated to your search) or spam sites (unrelated to your search).
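Counting how often a term appears inside bold markup is straightforward. This sketch, with hypothetical names, treats `<b>` and `<strong>` as bold and matches whole words only; neither engine has said this is how it actually weighs bold text.

```python
import re
from html.parser import HTMLParser


class BoldKeywordCounter(HTMLParser):
    """Counts whole-word occurrences of a keyword inside <b> or <strong> elements."""

    def __init__(self, keyword: str):
        super().__init__()
        self.pattern = re.compile(r"\b" + re.escape(keyword.lower()) + r"\b")
        self.bold_depth = 0  # how many bold elements we are currently nested in
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self.bold_depth:
            self.bold_depth -= 1

    def handle_data(self, data):
        if self.bold_depth:
            self.count += len(self.pattern.findall(data.lower()))


def bold_keyword_count(html: str, keyword: str) -> int:
    counter = BoldKeywordCounter(keyword)
    counter.feed(html)
    counter.close()
    return counter.count
```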

  7. Pattern Recognition by Space cowboy · · Score: 5, Interesting

    This is essentially a problem in pattern recognition, and it's a damn hard problem to solve because of the disparity between the high-volume and low-volume words.

    Information is essentially the inverse of entropy. Entropy can be calculated, and you can use Bayes probability theory to get a hold on the information content of a given word within a set of words.

    What is difficult to do, and what search engines are trying to do, is measure the mutual information inherent between the set of pages that the word appears in, and the word itself, then apply that to all the words in the searched-for phrase; this is commonly called 'context'. This is plainly impossible to do for every given phrase, for every word combination, for every page indexed. The best you can do is use a statistical approach (and Bayes is your friend again) to come up with "good" matches.
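To make those quantities concrete, here is a toy sketch over a hypothetical four-page corpus, where each page is reduced to a set of words. It computes the self-information of a word from the fraction of pages containing it, and the pointwise mutual information (PMI) of two words from how often they share a page. Real engines work with far richer statistics than these document-frequency estimates.

```python
import math


def doc_prob(pages, *words):
    """Estimated probability that a page contains all of the given words."""
    hits = sum(1 for page in pages if all(w in page for w in words))
    return hits / len(pages)


def surprisal(pages, word):
    """Self-information in bits, -log2 p(word): rarer words carry more information."""
    p = doc_prob(pages, word)
    return math.inf if p == 0 else -math.log2(p)


def pmi(pages, w1, w2):
    """Pointwise mutual information in bits: log2 of p(w1, w2) / (p(w1) * p(w2))."""
    joint = doc_prob(pages, w1, w2)
    if joint == 0:
        return -math.inf  # the two words never share a page
    return math.log2(joint / (doc_prob(pages, w1) * doc_prob(pages, w2)))


# Hypothetical corpus: each page reduced to the set of words on it.
PAGES = [
    {"jaguar", "car", "engine"},
    {"jaguar", "cat", "jungle"},
    {"car", "engine", "price"},
    {"cat", "jungle", "river"},
]
```

Here "car" and "engine" get a positive PMI (they co-occur more often than chance), while "jaguar" alone is ambiguous: its PMI with "cat" is zero, which is the context problem described above.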

    The problem with the statistical approach is the class unbiasing, since once you have wildly different statistical populations, your choice of context gets harder and harder - the "easy" standard models don't cope very well. You don't have the computational resources to do a good analysis, so you're essentially stuck between a rock and a hard place.

    This is why the Google idea of strengthening the importance of a word depending on linked pages was such a good one: it "did" the hard work by relying on the entire planet to do it for them, by creating links. Of course, what one man can do, another can undo, and Google has got progressively worse over time. It's still by far the best though, and my search engine of choice. When I look at the referrals from search sites, I get 100x as many from Google as from Yahoo (the next nearest)....
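The link idea can be sketched with the textbook PageRank power iteration from the originally published formulation (damping factor 0.85; pages with no outgoing links spread their rank evenly). This illustrates the published idea only, not Google's production ranking, and the link graphs in the examples are hypothetical.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # "random jump" share
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over every page.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank
```

With `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page c, which two pages link to, ends up ranked highest, and b, which nobody links to, lowest.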

    People think searching is easy, and it is. What's really really hard is searching *well*.

    Simon

    --
    Physicists get Hadrons!
    1. Re:Pattern Recognition by Eivind · · Score: 5, Interesting
      And what is even harder, as you sorta hint at, is searching well in a world where thousands of people do their damnedest to game the system.

      Google doesn't only have to make sense of a great big mess.

      It has to make sense of a great big mess where a significant part of the pages are made *specifically* to confuse Google, and where a part of those same pages gets tuned regularly in dedicated attempts to better confuse whichever algorithm Google uses.

      Most of the cases where Google returns poor results these days, it's obvious to a human observer that the bad results on top are *purposely* made to confuse Google. I've even seen pages that return one set of content if your user-agent is "Googlebot", and another, totally different content (dialer, etc) if your user-agent is anything else.

  8. Keyword density?! by Short Circuit · · Score: 5, Interesting

    When I search for something, I don't want to get a page that's a marketing front for what I'm trying to find, I want an informational, probably technical, page on the item I'm searching for.

    Such pages don't usually mindlessly repeat the keyword I'm searching for over and over again.
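Definitions of keyword density vary between tools; one common version is the share of all words on a page that belong to occurrences of the keyword or phrase. A minimal sketch under that assumption (the report's exact formula is not given here, and the function name is hypothetical):

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Percentage of the words in text that belong to (non-overlapping) occurrences of keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    kw = re.findall(r"[a-z0-9']+", keyword.lower())
    if not words or not kw:
        return 0.0
    n = len(kw)
    hits, i = 0, 0
    while i <= len(words) - n:
        if words[i:i + n] == kw:
            hits += 1
            i += n  # do not count overlapping matches
        else:
            i += 1
    return 100.0 * hits * n / len(words)
```

A spam snippet like "Prozac suicide. Prozac suicide? Are you looking for Prozac suicide" scores 60% for "prozac suicide", far above what an informational page would reach.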

  9. My little test.. by CoolCat · · Score: 4, Interesting

    Just typed in the name of the company I work for (8 employees). First hit on Google; on Yahoo, I gave up after 9 pages.

  10. It's All Magic... by photonX · · Score: 5, Insightful

    I'm one of those greybeards who was writing college reports in the pre-BBS days, never mind the World Wide Web. Remembering back to when I used to spend a half-day of research in the library to mine info that now magically appears on my computer screen in ten seconds, well...it's hard to throw stones. I'm just happy the damned things work at all.

    --
    Anti-gravity? That was *my* little secret! But I never patented it! Boy, was *that* dumb!
    1. Re:It's All Magic... by Araneas · · Score: 4, Insightful
      I shave but the moustache is getting a little white. ;)

      What I miss is looking in the card catalogue under the general subject and being able to pull out all sorts of related material I hadn't thought of. Same for browsing the stacks. Grab the general Dewey number and go surf the titles.

      Wetware fuzzy logic at its best.

  11. So that's what happened! by peterdaly · · Score: 5, Interesting

    I've been on vacation and away from internet and most mass media for a week. Got back on Monday and have noticed a drop in traffic to my web sites while I was gone. Didn't have a clue why. Well, now I know.

    I'll be watching this very closely. Inktomi (sp?) sucked, which is what this is based on. I think it's too early to tell right now if the results are any good. Along the same lines, it will probably take about 6 months for marketers to learn to effectively spam the results, which is something Google has historically been very good at keeping at bay.

    This will be interesting to watch over the next few months.

    -Pete

  12. Warning: You are being watched! by walter. · · Score: 5, Interesting
    Looks like someone is counting the slashdot community. One of the links in this post points to
    http://www.searchguild.com/redir/o.php?out=http://www.gorank.com/research/01072004_Google_Density_Report.php
    So someone at searchguild.com is counting every slashdot visitor who clicks on that link! The unredirected link points here.
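Stripping a click-counting redirect like this one amounts to reading its `out` query parameter. A minimal sketch with the standard `urllib.parse` module; the function name is hypothetical, and it assumes the target URL is passed in the query string, as it is in the link above.

```python
from urllib.parse import parse_qs, urlparse


def unredirect(url: str, param: str = "out") -> str:
    """Return the target of a click-counting redirect, or url unchanged if there is none."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else url


print(unredirect("http://www.searchguild.com/redir/o.php?out="
                 "http://www.gorank.com/research/01072004_Google_Density_Report.php"))
# http://www.gorank.com/research/01072004_Google_Density_Report.php
```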
  13. Missing the google point? by ItsIllak · · Score: 4, Interesting

    Isn't this missing the point of how Google works? OK, so it measures the success, but it won't tell you anything (or much) about the actual search algorithm, as Google bases the score not only on the page you link to but also on the pages that link to IT.

    Hence, it's an interesting read, and maybe you could draw your own preferences from what the weighting turns out to be in the listed cases, but it's not a very fair representation of how google works. *NB* I've no clue how Yahoo/Inktomi works, so I couldn't comment.

  14. Re:A layman's view by Quaryon · · Score: 5, Interesting

    Is anyone else getting so annoyed by pages which grab your keyword and then direct you to Amazon, no matter what the topic? Seems that every time I do a search on Google and find a site which looks interesting they're either just ripping Amazon's content or redirecting me there.

    Guys, if I wanted to go to Amazon I would just type "www.amazon.co.uk" into my browser.. If I'm searching on Google it's because I've either already looked at Amazon and didn't find what I want, or because Amazon is really not relevant..

    I've started adding "-amazon -kelkoo -dooyoo -pricewatch" and others to my Google searches recently which helps cut down the chaff a little, but doesn't seem to cut out all the Amazon ripoffs.
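The exclusion list can be assembled programmatically. This sketch (hypothetical function name) just prefixes each unwanted term with the minus operator and URL-encodes the result as a `q` parameter; it cannot catch ripoff sites that never mention the excluded words.

```python
from urllib.parse import urlencode


def build_query(terms: str, exclude: list) -> str:
    """Append a '-word' exclusion for each unwanted term."""
    return " ".join([terms] + ["-" + word for word in exclude])


q = build_query("widget review", ["amazon", "kelkoo", "dooyoo", "pricewatch"])
print(q)                    # widget review -amazon -kelkoo -dooyoo -pricewatch
print(urlencode({"q": q}))  # q=widget+review+-amazon+-kelkoo+-dooyoo+-pricewatch
```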

    Q.

  15. The Problem with Search Algorithm Monocultures by G4from128k · · Score: 5, Insightful

    While I know that various search engines use various core ideas in search, I would think that a better way to search would use multiple approaches. Some combination of link-based analysis, keyword analysis, expert analysis, cluster-analysis, etc. rather than a single "this-is-how-we-do-it-here" algorithm.

    The first big challenge in search is disambiguating what the searcher really wants without requiring a long string of inputs. A multiple-algorithmic approach would let a search engine serve up hits gathered in multiple ways (e.g., hit #1 was top ranked using method 1, hit #2 was top ranked using method 2, etc.). The search company could then see which algorithm provides the best hits for a given search (i.e., by watching which hits the searcher clicks on).
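A simple round-robin interleave captures the "hit #1 from method 1, hit #2 from method 2" idea while remembering which method contributed each result, so clicks can be credited back. Method names and URLs are hypothetical placeholders, and the interleaving schemes used in published search evaluation are more careful about fairness than this sketch.

```python
from collections import Counter
from itertools import zip_longest


def interleave(rankings: dict) -> list:
    """Merge several ranked lists round-robin, skipping URLs already shown.

    rankings maps a method name to its ranked list of URLs (dict order = turn order).
    Returns (url, method) pairs so each shown hit remembers which method supplied it.
    """
    seen = set()
    merged = []
    for row in zip_longest(*rankings.values()):
        for method, url in zip(rankings, row):
            if url is not None and url not in seen:
                seen.add(url)
                merged.append((url, method))
    return merged


def credit_clicks(merged: list, clicked: list) -> Counter:
    """Count clicks per method: a signal for which algorithm served this query best."""
    source = dict(merged)
    return Counter(source[url] for url in clicked if url in source)
```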

    The second big challenge is all the nasty spammers and SEOs (Search Engine Optimizers) who will try to use knowledge of any search algorithm to game the system and artificially raise their page rank for commercial purposes. This is probably one reason why Google cannot maintain dominance: any dominant search engine attracts the concerted efforts of SEOs, thus ruining its search quality, thus ruining its dominance.

    Yet a multi-algorithmic search engine could create a moving target that frustrates SEOs. By rotating the algorithms and even using negative weights on some algorithm results, a multi-algorithmic search company could cause high-ranked pages to plummet in rank over time. One week, a heavily keyworded site (e.g., one listing every possible keyword in metadata) might be at the top of the list; the next week it is at the bottom. This raises the cost to sites trying to game the system. (The search company might even reward or penalize sites that change structure too often, either to find the freshest sites or to penalize the efforts of SEOs.)
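The rotating-weights idea could look something like the sketch below: per-week weights, possibly negative, derived deterministically from a secret seed so they are reproducible internally but hard to predict from outside. The signal names, the weight range and the seed are all hypothetical.

```python
import random

SIGNALS = ["links", "keywords", "clusters"]  # hypothetical ranking signals


def weekly_weights(week: int, seed: str = "keep-this-secret") -> dict:
    """Per-week weights in [-0.5, 1.0], reproducible for a given seed and week.

    A negative weight actively penalises pages that score highly on that signal that week.
    """
    rng = random.Random(f"{seed}:{week}")
    return {name: rng.uniform(-0.5, 1.0) for name in SIGNALS}


def combined_score(scores: dict, week: int) -> float:
    """Weighted sum of a page's per-signal scores using that week's weights."""
    weights = weekly_weights(week)
    return sum(weights[name] * scores.get(name, 0.0) for name in SIGNALS)
```

A page tuned to max out one signal gains in weeks where that signal's weight is high and sinks when it turns negative.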

    There never can be one right way to do search.

    --
    Two wrongs don't make a right, but three lefts do.
  16. Re:Google Super Computer? by /ASCII · · Score: 5, Informative

    Your statement is not completely correct. There is nothing "fake" about a cluster-based supercomputer. In fact, all sufficiently large supercomputers are cluster-based. Many of them use special-purpose, low-latency NICs and switches and proprietary communication protocols, but the underlying principle of a Beowulf cluster is the same as that of the Earth Simulator.

    --
    Try out fish, the friendly interactive shell.
  17. SEO - SEM by peterdaly · · Score: 4, Informative

    As someone who does search engine optimization of his own sites, I believe there is an important distinction between ethical and non-ethical (spam) activities.

    Search Engine Optimization - doing everything possible to tell a search engine what your page is about while keeping it balanced for humans to read as well. Ethical. Sometimes considered spam, when really the search engine is returning poor results, usually because the page you are looking for is not easy for spiders to understand.

    Search Engine Manipulation - trying to get search engines to return your page in results when the page may not otherwise be something the engine considers relevant or high quality. Showing something different to the search engine falls under this category, is commonly referred to as cloaking, and is against many search engines' "rules" for designing pages. Not ethical, aka spam.

    -Pete

    1. Re:SEO - SEM by silentbozo · · Score: 4, Insightful

      The problem is that telling the public what your site is about is equivalent to telling search engines what your page is about. Aside from meta tags (which should really be all you need in order to communicate "additional" info to search engines), any change to your website to "optimize" for a specific type of search engine, and not for the general public, has the effect of upgrading your page ranking AT THE EXPENSE OF NON-OPTIMIZED SITES.

      Here we go into the slippery slope that leads to situations like the tragedy of the commons (where people tend to use up a resource because it isn't theirs), the hiring of lawyers (statistically, if one side hires a lawyer, they get better results, but if both sides hire lawyers they get the same settlement, only smaller because of lawyers' fees), etc. It's the prisoner's dilemma: defect (i.e., optimize) to improve my position, at the risk of everybody else defecting and earning worse returns than not defecting in the first place (i.e., everybody stops using Google because the rankings are screwed up and are no longer trustworthy).

      Put simply, the moment any site tries to game the system, even just a little bit, they ruin the usefulness of Google. As it stands, I'm getting better results with Metacrawler now than with Google - something I wouldn't have said just a year ago. Don't even get me started on websites with javascript-redirect gateway pages, or the ones that scrape search-engine/newsgroup/eBay pages for text in order to boost hit counts, and then link back to similar pages in order to get higher link relevancy, OR the ones that take over abandoned domains in order to exploit the ranking generated by pre-existing links that point to the domain name...

  18. Cocks. by WhodoVoodoo · · Score: 5, Interesting

    Actually, I find an interesting way to rate search engines is to search for the word "cocks"

    yeah, I know what you're thinking.

    You typically get a couple things from this search:

    Porn (duh)
    Chicken related things
    and the band "The Revolting Cocks"

    By looking at which ones come up first, you can infer some interesting and useful things about how an engine works. What those things are I will let you decide.
    Mostly because it's funnier.

    But seriously, folks, try it out.

  19. Compare the Algorithms manually by GoogleGuy · · Score: 4, Informative

    The challenge for Google and Yahoo is to filter out the SEO spam (Doorways, cloaking, ...)

    Check out the algorithms yourself by comparing google and yahoo search results side by side.

  20. Re:A layman's view by pledibus · · Score: 5, Interesting

    I think google's ranking system needs a major overhaul; various sleazy companies have become *much* too effective at fooling it. For example, below are the first three hits that I got by typing "prozac suicide" into google (I've deleted the URLs to protect the guilty :-). Most of the top 20 hits are similar to these.

    prozac suicide
    Prozac prozac suicide. prozac nation nude Viagra prozac hair loss Paxil
    prozac dogs Yasmin ssri prozac Propecia prozac ocd. ... prozac suicide. ...

    Prozac Suicide - Shopping and Discounts - PROZAC SUICIDE
    Prozac Suicide Prozac Suicide. Are you looking for Prozac Suicide? We've searched
    the internet for the best Prozac Suicide and we hope you enjoy what you find! ...

    Prozac Suicide
    Real Pharm - Lowest Prices & Fantastic Service - Prozac Suicide, ... Prozac
    Suicide Prozac Suicide. Prozac(R) is a selective serotonin ...

  21. The search engines just need moderation by pj2541 · · Score: 4, Insightful

    But the only choices should be "Interesting" and "Troll." If each vote added or subtracted a very small amount from the page rank, and steps were taken to prevent stuffing the ballot box, I think this would actually improve the search results for the users.
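A minimal sketch of the proposal, with hypothetical names and constants: each vote nudges a page's rank by a small step, a repeat vote from the same voter replaces rather than stacks, and the total shift is capped. Keying on voter identity only stops one account voting twice; it does nothing about sock puppets, which a real system would still need to address.

```python
class PageModeration:
    """Slashdot-style moderation of search results, limited to two labels."""

    STEP = 0.001      # hypothetical rank change per net vote
    MAX_SHIFT = 0.05  # hypothetical cap on the total change

    def __init__(self):
        self.votes = {}  # (voter, page) -> +1 or -1

    def vote(self, voter: str, page: str, label: str) -> None:
        if label not in ("Interesting", "Troll"):
            raise ValueError("label must be 'Interesting' or 'Troll'")
        # Keyed by voter and page, so voting again replaces the old vote instead of stacking.
        self.votes[(voter, page)] = 1 if label == "Interesting" else -1

    def adjusted_rank(self, page: str, base_rank: float) -> float:
        net = sum(v for (_, p), v in self.votes.items() if p == page)
        shift = max(-self.MAX_SHIFT, min(self.MAX_SHIFT, net * self.STEP))
        return base_rank + shift
```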

  22. Clarification! by Ayanami Rei · · Score: 4, Informative

    The article submitter is SPECIFICALLY trying to profile slashdot readership. Clearly the Anonymous Coward is either the article's author, or someone with a vested interest in our opinions on this topic, but someone who can't look at gorank's referral logs.

    This is VERY sneaky (akin to putting an Amazon referral link in a book review).

    Do NOT click on the link. If the submitter had actually bothered to use a logged in slashdot account, I would be more trusting.

    Copy Link location, open new browser window, paste.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  23. Ads good at filtering out crap by 0x0d0a · · Score: 4, Informative

    I've had excellent luck using Google's ads for one thing -- when I'm looking for a retailer to buy something. Not infrequently when trying to buy something, I come up with plenty of garbage and irrelevant results, but the paid advertisements are there because the people are trying to sell me what I want (and they are interested in not wasting impressions on people that *aren't* interested in their product, so they have a positive incentive to focus their ads).

  24. Yahoo uses more than keyword density by elflet · · Score: 5, Informative
    "Keyword density" is a favorite SEO trick for trying to get a page to rank more highly, along with engine-specific tricks (e.g. getting people to link to your page with the keywords you want in the link text to drive a Google ranking higher). I just ran a handful of experiments with long-established (8+ years), high-ranking pages and found a few interesting things in Yahoo:
    • Incoming link popularity appears to play a far smaller role than on Google. Pages that are "top of page 1" material in Google due to their incoming links don't even show up near the top in Yahoo.
    • Yahoo is using the meta Description tag, at least in the display (but it also looks like they're using it for ranking.)
    • They're giving extreme weight to items that show up in the Yahoo directory (which has been pay-for-inclusion for the most part the past several years.) In fact, one of my pages which has changed titles shows up in yahoo search under a 6 year old title (the one used to list it in the directory, natch.)
    • Yahoo is also giving heavy weight to keywords that show up in URLs.
    • Keyword cramming seems to move sites up on Yahoo (very annoying, especially for those of us who would rather get placed via honest content.)
    To be honest, Yahoo's new engine reminds me of circa-1996 engines. Go run the same search on Yahoo and Google and see what comes back with better relevance (Google still looks better to me.)
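Two of the signals listed above, the meta description and keywords in the URL, are easy to extract from a page. This sketch (hypothetical names, single-word keywords only) pulls them out with the standard library; how much weight Yahoo actually gives them is unknown.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class MetaDescriptionParser(HTMLParser):
    """Grabs the content of <meta name="description" ...>, if present."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = a.get("content")


def page_signals(url: str, html: str, keyword: str) -> dict:
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    parts = urlparse(url)
    # Split host and path on anything that is not a letter or digit, e.g. "blue-widgets.html".
    url_words = re.split(r"[^a-z0-9]+", (parts.netloc + parts.path).lower())
    desc_words = re.findall(r"[a-z0-9]+", (parser.description or "").lower())
    kw = keyword.lower()
    return {
        "meta_description": parser.description,
        "keyword_in_url": kw in url_words,
        "keyword_in_description": kw in desc_words,
    }
```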