Slashdot Mirror


Yahoo! Vs. Google: Algorithm Standoff

An anonymous reader writes "There's a new report out from the guys who brought us the Google keyword density analysis. As they put it, "the goal of this analysis is to compare the keyword density elements of Yahoo's new algorithm with Google's algorithm." They compared 2000 low traffic, non-competitive keywords in the hopes of seeing the algorithms more clearly, without any possible search engine tweakings related to high-traffic keywords. Their findings are interesting. Should you go and rebuild your site based on these findings? Maybe not. It's worth a look though."

44 of 270 comments

  1. Search Engine Optimization Professional by Anonymous Coward · · Score: 5, Interesting

    Gee, aren't these the guys responsible for continually diluting the quality of search engine results? I'm getting really tired of sites that present one thing to search engines and something totally different to me.

    1. Re:Search Engine Optimization Professional by Araneas · · Score: 5, Interesting
      It's an escalating battle. Someone hijacks a keyword that is highly relevant to your site so you have to figure how to overcome that and give users something that isn't porn or a crappy search portal.

      I think it's fair to say there are white hat SEOs as well as black hat hijack^H^H^H^H^H^H SEOs.

    2. Re:Search Engine Optimization Professional by dargaud · · Score: 5, Interesting
      That's what I wanted to submit to the Google programming contest, but it wasn't admittable:
      • Make a 2nd robot that retrieves a few full web pages (with graphics) per site claiming to be IE6 (or a normal Mozilla), thus lying about being from Google.
      • Display the page in IE6 (or Mozilla), save the entire display as a bitmap image.
      • Run the bitmap image through an OCR program to extract the real text seen by the user
      • Compare this text with what the ordinary google robot sees.
      • If the text is completely different, lower the ranking
      This gets rid of all the blue on blue keywords, display:none keywords and others. I think it will come to that.
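The last two steps of the list above (compare the two texts, then penalize on a mismatch) can be sketched without the OCR stage. This is only an illustration, not anything Google is known to run: the function names, the word-set (Jaccard) comparison and the 0.3 threshold are all assumptions made for the example.

```python
import re

def word_set(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard_similarity(text_a, text_b):
    """Shared words divided by all distinct words (1.0 means same vocabulary)."""
    a, b = word_set(text_a), word_set(text_b)
    if not a and not b:
        return 1.0  # two empty pages are trivially identical
    return len(a & b) / len(a | b)

def looks_cloaked(crawler_text, rendered_text, threshold=0.3):
    """Flag a page whose crawler view and user view share very little vocabulary.

    The threshold is a hypothetical value chosen for illustration.
    """
    return jaccard_similarity(crawler_text, rendered_text) < threshold
```

In use, `crawler_text` would be whatever the ordinary robot fetched and `rendered_text` whatever the browser-like fetch (or OCR pass) produced; a page that fails the check would be a candidate for a lower ranking, not proof of abuse.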
      --
      Non-Linux Penguins ?
    3. Re:Search Engine Optimization Professional by SiggyRadiation · · Score: 3, Interesting

      Run the bitmap image through an OCR program to extract the real text seen by the user

      Wouldn't it be smarter to just render both versions and compare bitmaps? No need to OCR then...

      --
      This unique sig is intended to make this user more recognisable.
    4. Re:Search Engine Optimization Professional by samhalliday · · Score: 4, Interesting
      that's ridiculous... OCR is not needed in this scenario. it is easy enough to write a program to find out what colour the background and foreground of text are; it probably just takes too much time to factor this into the equation. your method would take _at least_ 10 seconds to even check a simple page (assuming all the code worked, which it wouldn't, cuz it's OCR).

      and this way you are giving a lower ranking to pages which use text in images. it is not good practice to have all the text embedded in images, but it is often necessary for style purposes; an example being the logo of a site (ok, alt= should handle this). hell, i even do it! it's cleaner than hoping the person on the other side can render the same fonts as me (which would be impossible cuz i filtered them through GIMP to add some effects).

      a lot of sites auto detect robots based on what you are saying, and either block them or launch a seek-and-destroy attack against you. to get around this, the file /robots.txt (which every large site should have) WILL be read by the google/yahoo prowler no matter what, and abided by. it plays the prominent role in what the search engines read... not the server reading the browser tag.

      that's without even going into the algorithms of matching the read OCR text up against the text from the source.

    5. Re:Search Engine Optimization Professional by Woogiemonger · · Score: 3, Interesting

      A lot of times, text would be masked by making it a color that blends into the background graphic. A plain background color is intelligible by an HTML parser, but you would need to do at least some form of color histogram/pattern recognition on the background graphic to determine whether or not it is likely to mask keywords. Honestly, I think it's a nice idea, and it's not like every page has to be scanned. It's a way of filtering out a few relatively obvious bad apples, or at least some rather irritatingly hard to read web sites.
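For the plain-background-color case mentioned above, a parser could compare the text color with the background color directly. The sketch below uses the WCAG contrast-ratio formula as one possible measure; the `min_ratio` cutoff of 1.5 and the function names are assumptions for illustration, and a background graphic would still need pixel sampling or a histogram, which this does not attempt.

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an (r, g, b) tuple with 0-255 channels."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors: 1.0 (identical) up to 21.0 (black on white)."""
    lum_fg, lum_bg = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)

def probably_hidden(fg, bg, min_ratio=1.5):
    """Guess that text is masked when it barely contrasts with its background.

    min_ratio is a hypothetical cutoff, not a published standard for spam detection.
    """
    return contrast_ratio(fg, bg) < min_ratio
```

Blue-on-blue keywords such as `probably_hidden((0, 0, 255), (0, 0, 250))` would be flagged, while ordinary black-on-white text would not.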

    6. Re:Search Engine Optimization Professional by 0x0d0a · · Score: 2, Interesting

      I wouldn't oppose Flash navigation on a media-rich site

      Umm. By definition, if a site is loaded with Flash, it's media-rich. :-)

      I just can't really see Flash being a benefit. Folks thought that it was useful back when it was novel -- ("Look, the web page makes sounds when I click!"). We've gone through this same "novel" phase so many times on the web that it's depressing. When music came out, everyone had to put music on their personal pages, and at first it was kind of cute. Then it got really annoying. Even before that, there was GIF animation. I remember the first time that I saw GIF89a animation. I was enthralled. Here's a copy of this newish Netscape Navigator program and *stuff moves on the screen*. Surprise, a year later, with way too many sites using animated GIFs, I never wanted to see them again (and fortunately, my browser lets me disable their animation).

      Flash is the same thing. It only interests anyone because it's novel. There just aren't any good justifications for using it.

      Actually, no. I believe I've seen it used effectively once before. There was a new MP3 player of some sort out, and you could use an embedded Flash file to try out the interface and see what you liked. That was actually a useful thing.

      Aside from that, Flash on webpages is useless.

      Flash still has some merit for standalone video, as there are no other good vector animation formats (SVG is just plain not designed for animation).

    7. Re:Search Engine Optimization Professional by ichimunki · · Score: 3, Interesting

      Yes, but it seems to me that like another escalating battle there will be a simple agent-based, learning algorithm solution.

      Bayesian filters learn to recognize spam and are personalized to the user. They are at least as effective as rules-based mail filters, but very effectively halt the rules race (where the filter writer writes a rule to filter by, and the spammer figures out a way around the rule, rinse, repeat).
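As a rough picture of what such a filter does, here is a minimal naive Bayes sketch with add-one (Laplace) smoothing. It is a toy under stated assumptions: the class name, the whitespace tokenizer and the two fixed labels are invented for the example, and real mail filters add many refinements.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Toy two-class naive Bayes text filter (labels: 'spam' and 'ham')."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, label, text):
        """Record one user-labelled example."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Return P(spam | text); needs at least one example of each label."""
        if not self.doc_counts["spam"] or not self.doc_counts["ham"]:
            raise ValueError("train at least one spam and one ham example first")
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the product.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

Because the counts come from the user's own labelled examples, the filter adapts to each person instead of relying on a shared rule list, which is the property the comment above is pointing at.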

      We need something like that for web pages and web searching. It's not just about keywords with sites like Google. It's also about the other parts of their page rank scheme. But imagine that your spider software (unlike Google) was grading its results. For any bad result it could go back and score against every page involved in getting to that result. Same for good results. Next time you search it gives more emphasis to pages in good result trees. Etc.

      I mean, that's not an actual technical idea there (I can think of lots of problems with that sort of spider/agent idea that would keep it from being practical). But that's the kind of thinking we need to be doing. Is there a way to solve this problem of finding information that won't involve a central repository of keyword scores and rankings?

      --
      I do not have a signature
    8. Re:Search Engine Optimization Professional by Hentai · · Score: 2, Interesting

      Now THERE'S an interesting idea - a Google subscription service. I know I'd pay Google $20/mo to dedicate a few megs to customized Bayesian filters that learn MY particular search needs, and remember them for next time. It'd depend on their privacy policy, though.

      --
      -Hentai [in vita non pacem est]
    9. Re:Search Engine Optimization Professional by JimDabell · · Score: 2, Interesting

      Or even better, just use an intelligent html parser that can work out if text would be hidden and ignore it if it is.

      There are legitimate reasons for hiding text. For example, putting help text into a page, and only showing it when the user clicks a help button (far more friendly than popups).
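A parser of the kind quoted above might look roughly like the following sketch, built on Python's standard `html.parser`. It assumes well-formed markup and only looks at inline `style="display:none"`; stylesheets, scripts and color tricks are ignored, and the class name is invented for the example. As the reply notes, text found this way is not necessarily abusive, so the sketch only separates hidden from visible text and leaves the judgment to the caller.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class VisibleTextParser(HTMLParser):
    """Split page text into visible and hidden parts (inline display:none only)."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one bool per open element: is it hidden?
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        parent_hidden = bool(self.hidden_stack) and self.hidden_stack[-1]
        # A child of a hidden element is hidden too.
        self.hidden_stack.append(parent_hidden or "display:none" in style)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            in_hidden = bool(self.hidden_stack) and self.hidden_stack[-1]
            (self.hidden if in_hidden else self.visible).append(text)
```

A ranking system could then weight `parser.hidden` less heavily instead of discarding it, which would leave room for legitimate uses such as click-to-show help text.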

    10. Re:Search Engine Optimization Professional by DrSkwid · · Score: 5, Interesting

      Besides, in which way does Flash exclude other operating systems?

      Let's see

      Mozilla on FreeBSD (that's me) :

      We are unable to locate a single Web player that best matches your platform and operating system

      Mothra on plan9 (also me)

      We are unable to locate a single Web player that best matches your platform and operating system

      The acceptable list is :

      Windows 98/ME/2000/XP - Internet Explorer/AOL/Netscape/Mozilla/Opera/CompuServe - Flash 7

      Mac OSX / OS9 - Internet Explorer/Safari/Netscape/Mozilla/Opera - Flash 7

      Other Operating Systems
      Linux x86 Flash Player 6 for Mozilla 1.1 - (Not officially supported by Macromedia.)
      Pocket PC Flash Player 6 for Pocket PC 2003 (color devices supported only)
      OS/2 Flash Player 4 for Netscape
      Sun Solaris (Sparc/Intel) Flash Player 6 for Netscape
      HP-UX Flash Player 6 for Netscape
      SGI IRIX Flash Player 4 for Netscape

      On my 500,000 page impression web site, using Flash would have excluded the otherwise successful visitors running the following OS

      CPM
      Windows 3.xx
      WebTV
      OSF Unix
      Aix
      NetBSD

      I will admit that the actual numbers are low but being excluded/ignored is how us non Windows users are treated day in day out. Seems you can't fight the pigopolists.

      --
      There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  2. If Yahoo wants my vote... by PoprocksCk · · Score: 4, Interesting

    ...they'll have to get rid of all that junk on their home page. Much of the reason for my using Google is that its home page is simple, it loads quickly, and it is just so easy to _search_, which is what a search engine should be. Yahoo failed when it became a "portal" and tried to do too much by itself. If they could somehow reduce the size of Yahoo's page down to that of Google (that would mean getting rid of those ads, guys) then maybe I'd consider trying it.

    1. Re:If Yahoo wants my vote... by PoprocksCk · · Score: 5, Interesting

      Heh...

      Well that's all well and good, but how many people would know to type that in?

      Has anyone looked at altavista lately? They've certainly taken the Google route, and their home page looks a lot like Google now, as does search.yahoo.com. However, in search.yahoo.com _and_ altavista, I noticed that "sponsored results" show up before the real ones, but they appear in the list just the same. That could confuse newbies, and I prefer the approach Google has taken to advertising (shoving the ads to a separate entity on the right, and keeping them text-based).

    2. Re:If Yahoo wants my vote... by demonbug · · Score: 2, Interesting

      I've been using Yahoo for years as my homepage, it was quick and easy to set up nice news summaries and stock market summaries on things I was interested in. Their search feature always sucked, though, so I have used Google for that purpose. Unless Yahoo Search comes up with much better results than Google, I see no reason to change this. It isn't all that hard to type in "google.com" when I want to search for something.

  3. And a User Friendly game to go along! by Indras · · Score: 5, Interesting

    Just grab a friend and a deck of cards, and you can play Yahoo vs. Google at home.

    --
    The speed of time is one second per second.
    1. Re:And a User Friendly game to go along! by Anonymous Coward · · Score: 5, Interesting

      Chris Langreiter has a cute toy to compare Yahoo vs. Google results.
      Touch the dots !

      It's written in REBOL

  4. Google Super Computer? by YanceyAI · · Score: 3, Interesting

    Wasn't there a Slashdot article claiming that the Google servers may be the fastest super computer in the world, but they are so busy they couldn't run the benchmark? I can't find it now. If that's the case, how does Yahoo compete? By dividing the traffic? Can anyone link me?

    --
    Can I bum a sig?
  5. Re:Yahoo? by MoriarGryphon · · Score: 5, Interesting

    RTFM, Yahoo is switching to their own engine.

    Personally, I find the differences in how the two engines handle bold text to be most interesting. If only for that, I'd stick to Google.

    Most pages that have 17 occurrences of your search text in bold are only going to be Porn sites ((unrelated to your search)) or Spam sites ((unrelated to your search)).

  6. Pattern Recognition by Space+cowboy · · Score: 5, Interesting

    This is essentially a problem in pattern recognition, and it's a damn hard problem to solve because of the disparity between the high-volume and low-volume words.

    Information is essentially the inverse of entropy. Entropy can be calculated, and you can use Bayes probability theory to get a hold on the information content of a given word within a set of words.
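To make the entropy remark concrete, here is a small sketch using Shannon's definitions: the self-information of a word is -log2 of its probability, and entropy is the average self-information over the distribution. Estimating probabilities from raw word frequencies, and the function names, are simplifying assumptions for the example.

```python
import math
from collections import Counter

def word_probabilities(words):
    """Estimate each word's probability from its relative frequency."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def self_information(probability):
    """Information content in bits: rare words carry more than common ones."""
    return -math.log2(probability)

def entropy(probabilities):
    """Shannon entropy in bits: the expected self-information of the distribution."""
    return sum(p * self_information(p) for p in probabilities.values())
```

With seven occurrences of "the" and one of "hadron", the rare word scores 3 bits while the common one scores well under 1, which is the disparity between high-volume and low-volume words described above.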

    What is difficult to do, and what search engines are trying to do, is measure the mutual information inherent between the set of pages that the word appears in, and the word itself, then apply that to all the words in the searched-for phrase; this is commonly called 'context'. This is plainly impossible to do for every given phrase, for every word combination, for every page indexed. The best you can do is use a statistical approach (and Bayes is your friend again) to come up with "good" matches.

    The problem with the statistical approach is the class unbiasing, since once you have wildly different statistical populations, your choice of context gets harder and harder - the "easy" standard models don't cope very well. You don't have the computational resources to do a good analysis, so you're essentially stuck between a rock and a hard place.

    This is why the Google idea of strengthening the importance of a word depending on linked pages was such a good one - it "did" the hard work by relying on the entire planet to do it for them, by creating links. Of course, what one man can do, another can undo, and Google has got progressively worse over time. It's still by far the best though, and my search engine of choice. When I look at the queries coming in from search sites, I get 100x as many from Google as from Yahoo (the next nearest)....
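The link-based idea can be sketched as the simplified PageRank iteration from the published description of the algorithm. This is not Google's production ranking: the function name, the fixed iteration count and the even redistribution from pages without outgoing links are assumptions for the example; 0.85 is the damping factor usually quoted for the published algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to roughly 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For `{"a": ["c"], "b": ["c"], "c": ["a"]}` the page `c` ends up on top because two pages vouch for it. The same property is what link farms exploit: anyone who can create links can also create votes.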

    People think searching is easy, and it is. What's really really hard is searching *well*.

    Simon

    --
    Physicists get Hadrons!
    1. Re:Pattern Recognition by Eivind · · Score: 5, Interesting
      And what is even harder, as you sorta hint at, is searching well in a world where thousands of people do their damnedest best to game the system.

      Google doesn't only have to make sense of a great big mess.

      It has to make sense of a great big mess where a significant part of the pages are made *specifically* to confuse Google, and where a part of those same pages gets tuned regularly in dedicated attempts at confusing whichever algorithm Google uses even more.

      Most of the cases where Google returns poor results these days, it's obvious to a human observer that the bad results on top are *purposely* made to confuse Google. I've even seen pages that return one set of content if your user-agent is "Googlebot", and another, totally different content (dialer, etc) if your user-agent is anything else.

  7. Keyword density?! by Short+Circuit · · Score: 5, Interesting

    When I search for something, I don't want to get a page that's a marketing front for what I'm trying to find, I want an informational, probably technical, page on the item I'm searching for.

    Such pages don't usually mindlessly repeat the keyword I'm searching for over and over again.
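For reference, one common way to define keyword density is the number of occurrences of the keyword divided by the total word count. The report's exact method is not given here, so treat the sketch below as an assumption; it handles single-word keywords only, and the function name is invented for the example.

```python
import re

def keyword_density(text, keyword):
    """Fraction of the words in text that equal keyword (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

A marketing front that reads "Widgets! Buy widgets, cheap widgets." scores 0.6 for "widgets", whereas a technical page would typically mention the term far less often relative to its length.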

  8. My little test.. by CoolCat · · Score: 4, Interesting

    Just typed in the name of the company I work for (8 employees). First hit on Google. On Yahoo, I gave up after 9 pages.

    1. Re:My little test.. by levar · · Score: 2, Interesting

      weird. I just tried my company (5 employees). First hit on yahoo and not in the top 15 pages on google.

  9. So that's what happened! by peterdaly · · Score: 5, Interesting

    I've been on vacation and away from internet and most mass media for a week. Got back on Monday and have noticed a drop in traffic to my web sites while I was gone. Didn't have a clue why. Well, now I know.

    I'll be watching this very closely. Inktomi (sp?) sucked, which is what this is based on. I think it's too early to tell right now if the results are any good. Along the same lines, it will probably take about 6 months for marketers to learn to effectively spam the results, which is something Google has historically been very good at keeping at bay.

    This will be interesting to watch over the next few months.

    -Pete

  10. Warning: You are being watched! by walter. · · Score: 5, Interesting
    Looks like someone is counting the slashdot community. One of the links in this post points to
    http://www.searchguild.com/redir/o.php?out=http://www.gorank.com/research/01072004_Google_Density_Report.php
    So someone at searchguild.com is counting every slashdot visitor who clicks on that link! The unredirected link points here.
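Pulling the real destination out of a counting redirect like this one is straightforward with Python's standard `urllib.parse`. The sketch assumes the target sits in a query parameter (here `out`, as in the link above); the function name is invented for the example.

```python
from urllib.parse import urlparse, parse_qs

def redirect_target(url, param="out"):
    """Return the destination embedded in a redirect URL's query string, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else None

# The tracking link from the post, with the spacing artifacts removed.
tracked = ("http://www.searchguild.com/redir/o.php"
           "?out=http://www.gorank.com/research/01072004_Google_Density_Report.php")
print(redirect_target(tracked))
```

Following the extracted address directly skips the hop through the counting script, so the click is never logged by the redirector.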
  11. W3 compliance? by valentyn · · Score: 3, Interesting

    Slightly off topic: yesterday someone said that Google ranks W3-compliant pages higher than non-W3 compliant pages. I'm still confused. Could this be true?

    --
    my other sig is a 500 page novel
    1. Re:W3 compliance? by Aphrika · · Score: 3, Interesting

      In theory, it makes sense for Google to prioritise pages that adhere to W3C standards.

      Over-generalising here, it means you get a lot of professional sites rather than little Timmy's Frontpage creation, however, being a large corporation doesn't guarantee you a decently constructed site, and is no guarantee of it being W3C compliant.

      But then, Google probably sees this as a possible 80:20 rule - with the majority of W3C compliant sites probably offering something useful to index, and index well, so they get priority over a page of junk that may or may not contain useful information.

    2. Re:W3 compliance? by BReflection · · Score: 2, Interesting

      This is interesting considering Google is not even W3C compliant. I guess when you're on top you don't follow rules, you make them.

      --
      python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
  12. Missing the google point? by ItsIllak · · Score: 4, Interesting

    Isn't this missing the point of how Google works? OK, so it measures the success, but it won't tell you anything (or much) about the actual search algorithm, as Google is actually basing the score not only on the page you link to but also on the pages that link to IT.

    Hence, it's an interesting read, and maybe you could draw your own preferences from what the weighting turns out to be in the listed cases, but it's not a very fair representation of how google works. *NB* I've no clue how Yahoo/Inktomi works, so I couldn't comment.

  13. Re:A layman's view by Quaryon · · Score: 5, Interesting

    Is anyone else getting so annoyed by pages which grab your keyword and then direct you to Amazon, no matter what the topic? Seems that every time I do a search on Google and find a site which looks interesting they're either just ripping Amazon's content or redirecting me there.

    Guys, if I wanted to go to Amazon I would just type "www.amazon.co.uk" into my browser.. If I'm searching on Google it's because I've either already looked at Amazon and didn't find what I want, or because Amazon is really not relevant..

    I've started adding "-amazon -kelkoo -dooyoo -pricewatch" and others to my Google searches recently which helps cut down the chaff a little, but doesn't seem to cut out all the Amazon ripoffs.

    Q.

  14. Re:A layman's view by Anonymous Coward · · Score: 2, Interesting

    >>
    I have to admit that I used to think google was incredible just after it came out, but nowadays I'm used to wading through 10-15 pages of results before finding something relevant to what I need.

    Yep. I agree. I search for something as simple as "Philips DVD driver" for a Philips DVDRom drive and I get at least five ads selling Philips CD/DVDRom drives before I find a "SINGLE" reference to Philips themselves. Is this what Google has become? Maybe I should have put an 's' on driver.

    Codifex Maximus

  15. Cocks. by WhodoVoodoo · · Score: 5, Interesting

    Actually, I find an interesting way to rate search engines is to search for the word "cocks"

    yeah, I know what you're thinking.

    You typically get a couple things from this search:

    Porn (duh)
    Chicken related things
    and the band "The Revolting Cocks"

    By looking at which ones come up first, you can infer some interesting and useful things about how an engine works. What those things are I will let you decide.
    Mostly because it's funnier.

    But seriously, folks, try it out.

  16. Re:A layman's view by pledibus · · Score: 5, Interesting

    I think google's ranking system needs a major overhaul; various sleazy companies have become *much* too effective at fooling it. For example, below are the first three hits that I got by typing "prozac suicide" into google (I've deleted the URLs to protect the guilty :-). Most of the top 20 hits are similar to these.

    prozac suicide
    Prozac prozac suicide. prozac nation nude Viagra prozac hair loss Paxil
    prozac dogs Yasmin ssri prozac Propecia prozac ocd. ... prozac suicide. ...

    Prozac Suicide - Shopping and Discounts - PROZAC SUICIDE
    Prozac Suicide Prozac Suicide. Are you looking for Prozac Suicide? We've searched
    the internet for the best Prozac Suicide and we hope you enjoy what you find! ...

    Prozac Suicide
    Real Pharm - Lowest Prices & Fantastic Service - Prozac Suicide, ... Prozac
    Suicide Prozac Suicide. Prozac(R) is a selective serotonin ...

  17. Re:They are different by meta-monkey · · Score: 2, Interesting

    It's almost scary to google yourself, isn't it? I just did it and found a newspaper article I was quoted in from six years ago, a letter to the editor I wrote to my college newspaper and listings for various research projects I was a part of a long, long time ago. Thankfully, there's nothing incriminating there.

    Also, it was interesting to see that I seem to be the only person on the Internet with my name. A search for my name in quotes, first and last, with either the long form or short form of my first name, turned up links ONLY to me. Thankfully, I've never done anything truly embarrassing that wound up in the papers, so I guess I'm safe. How much would it suck to do something asinine 10 years ago, get a blurb in the online version of your town's newspaper, and then have it turn up every time somebody searched for your name for the rest of your life? Ouch.

    --
    We don't have a state-run media we have a media-run state.
  18. Re:A layman's view by a24061 · · Score: 2, Interesting
    I won't use Yahoo for Search. I think they are hella shady with their privacy policies

    What about Google's perpetual data retention and refusal to say what they may or may not do with the info?

  19. Re:SEO - SEM by RevDobbs · · Score: 3, Interesting
    Aside from meta-tags (which should really be all you need in order to communicate "additional" info to search engines), any change to your website to "optimize" for a specific type of search engine, and not for the general public, has the effect of upgrading your page ranking AT THE EXPENSE OF NON-OPTIMIZED SITES.

    But like the "SEO v. SEM" argument above, search engine optimization done right will also give better results to the end user.

    Think about it: if I'm looking for the specs on Widget A and the best damn website on Widget A makes me sit through a 135 second flash animation before I can get to any useful content, I'm going to miss all that valuable information because I'm not wasting my time or bandwidth loading that crap.

    Now, what if the second best Widget A site is run by people with a clue: title tags contain the important keywords ("bulk pricing", "failure modes", "Mil/Commercial/Industrial specification compliance"), easy-to-use navigation that tells me by the link text this is what I want? Well, then this is the most useful site, and should be ranked higher than the others.

    Search engines are just distillers of information; super-quick page scanners. If you make your page human-scannable and easy to use, your relevance will rise higher than other pages. By effectively telling people what your pages are about, you'll be effectively telling the search engines what your pages are about.

  20. Re:If you're "wading through 10-15 pages of result by TheLink · · Score: 2, Interesting

    Actually google has got worse.

    Now many of my web searches tend to turn up tons of mailing lists archives. If I want to search those I'd use google groups (I get about the same results for my search terms in google groups).

    I'm actually not that surprised - when I first heard they were using Page Rank some years back, I wondered how long that would keep working. It's easy to manipulate, plus it's kind of circular.

  21. Re:The Problem with Search Algorithm Monocultures by 0x0d0a · · Score: 2, Interesting

    "Evil" is meaningless.

    It's clearly in search engine spammers' benefit to do so (much like email spammers).

    It also clearly disadvantages users, since PageRank is a pretty good metric (outside of people trying to game the system) of usefulness.

    You clearly have some interest in discussing SEO. The parent has some interest in discussing thwarting SEO. I'd say that the second subject has at least as much merit (as in, it benefits a large group of people a good deal), and is certainly equally interesting.

    Now, it's true that simply eliminating SEO-using sites may not be worthwhile -- it's possible that some SEO-using sites have merit, and over-penalization is possible.

    Increasing the difficulty of SEO analysis is interesting. A couple of other interesting possibilities:

    * It might be interesting to try to specifically identify users trying to "game the system" and start feeding them slightly shuffled results. As long as the shuffling isn't too heavy, even false positives with this shouldn't be too painful.

    * It might be interesting to try to identify sites attempting to utilize SEO and penalize them. Frankly, the kind of sites that use SEO are generally the sort of thing that I *don't* want to find.

    * Not quite as nice, but it might be interesting to try to identify clouds of SEO sites. For example, Google seeds an inverse trust network by posting to an SEO site (and posing as an SEO) a particularly complex approach to SEO. A site implements it, and it is immediately a "known using SEO" site. Google tries to identify sites that are "related to" it a la PageRank and looks for sites that adopt similar measures, considering them to be SEO-ized sites with a somewhat smaller probability.

  22. My advice: work hard on content by MarkWatson · · Score: 2, Interesting
    I am fortunate to be the number 1 hit for the keywords "Java consultant" on Google and Yahoo.

    I have never played any games what so ever to get there. What I do however is try very hard to place interesting and useful content on my site (mostly 'free web books').

    I don't think that it matters so much what you do in life so long as you love doing it. I have been programming computers since the early 1960s, and I still love it!

    -Mark

  23. $25/hour? by wodelltech · · Score: 2, Interesting

    I am completely befuddled as to how/why you charge so little with so much experience and a top Google rank. What's up? Is money just not an issue?

    --
    Your monitor is staring at you.
  24. Missing Domain Name Data Points by PetoskeyGuy · · Score: 2, Interesting

    Domain Names.

    Search Engines definitely give rank to domains which contain your keyword in them. Tons of sites out there seem to have figured this out to make searches useless. There are tons of "keyword.useless-site.com" dictionary pages out there.

    I would really like to see the search engines be able to figure out that certain pages make no sense. They read like something from the old SNL subliminal man skits. Or sites that bounce you somewhere else as soon as you arrive.

  25. Look Out For Yahoo! Lawyers... by herrvinny · · Score: 2, Interesting

    According to Whois information (CAPTCHA required), yahooslurp.com is owned by a flower store site. How long until Yahoo figures this out and hammers the store into the ground?

  26. Re:Clarification! by Matthias+Wiesmann · · Score: 2, Interesting
    Or click it a zillion times and clear cookies each time.
    Why click? If you want to put silly things in their logs, simply follow this stupid link, with cookies disabled it inlines the inline page in itself a few times before inlining the google cache of slashdot.
  27. Google Versus Yahoo -- And Results by etLux · · Score: 2, Interesting

    As an operation with several dozen websites with fairly substantial traffic, we tend to look at all this from the other direction. Google consistently delivers a whopping THIRTY TIMES more traffic than Yahoo, network-wide. Guess whose "algorithm" we like better...