Yahoo! Vs. Google: Algorithm Standoff
An anonymous reader writes "There's a new report out from the guys who brought us the Google keyword density analysis. As they put it, 'the goal of this analysis is to compare the keyword density elements of Yahoo's new algorithm with Google's algorithm.' They compared 2000 low-traffic, non-competitive keywords in the hopes of seeing the algorithms more clearly, without any search engine tweaking related to high-traffic keywords. Their findings are interesting. Should you go and rebuild your site based on these findings? Maybe not. It's worth a look though."
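For anyone unsure what "keyword density" means here, a minimal sketch, assuming the simplest definition (occurrences of the keyword divided by total words on the page). The report may well measure it differently; the function name and the tokenizing regex are my own choices for illustration.

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Fraction of the words in `text` that equal `keyword` (case-insensitive).

    Assumption: single-word keywords and a naive tokenizer; this is not
    necessarily the definition used in the report.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# Hypothetical page text, 8 words, 3 of them "cheap".
page = "cheap widgets cheap widgets buy cheap widgets today"
density = keyword_density(page, "cheap")
```

A page that repeats its keyword in every other word scores high on this measure, which is exactly what keyword-stuffed pages are built to do.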
Gee, aren't these the guys responsible for continually diluting the quality of search engine results? I'm getting really tired of sites that present one thing to search engines and something totally different to me.
Just grab a friend and a deck of cards, and you can play Yahoo vs. Google at home.
Google is way too embedded in everyone's everyday life; it will just naturally be more widely used. When was the last time you heard someone say "Yahoo it"?
That's why they have: http://Search.yahoo.com
RTFM, Yahoo is switching to their own engine.
Personally, I find the differences in how the two engines handle bold text to be most interesting. If only for that, I'd stick to Google.
Most pages that have 17 occurrences of your search text in bold are only going to be porn sites (unrelated to your search) or spam sites (unrelated to your search).
This is essentially a problem in pattern recognition, and it's a damn hard problem to solve because of the disparity between the high-volume and low-volume words.
Information is essentially the inverse of entropy. Entropy can be calculated, and you can use Bayesian probability theory to get a handle on the information content of a given word within a set of words.
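To make the information-content idea concrete, here is a rough sketch using plain frequency estimates (no Bayesian prior, which is a simplification of what is described above). The toy corpus and function names are made up for illustration.

```python
import math
from collections import Counter

def word_probabilities(words):
    """Estimate P(word) by relative frequency (assumption: no smoothing)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def self_information(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

def entropy(probs):
    """Shannon entropy, in bits, of a word distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Toy corpus: a common word carries little information, a rare one more.
corpus = "the the the the search search engine ranking".split()
probs = word_probabilities(corpus)
bits_common = self_information(probs["the"])    # probability 0.5
bits_rare = self_information(probs["engine"])   # probability 0.125
corpus_entropy = entropy(probs)
```

The rare word comes out at three times as many bits as the common one, which is the high-volume versus low-volume disparity in miniature.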
What is difficult to do, and what search engines are trying to do, is measure the mutual information inherent between the set of pages that the word appears in, and the word itself, then apply that to all the words in the searched-for phrase; this is commonly called 'context'. This is plainly impossible to do for every given phrase, for every word combination, for every page indexed. The best you can do is use a statistical approach (and Bayes is your friend again) to come up with "good" matches.
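For a single word and a single yes/no property of a page, the mutual information is easy to write down; the hard part described above is doing it for every phrase and every page. A sketch using the textbook definition on a 2x2 table of page counts, where the "on topic" label is a hypothetical stand-in for whatever notion of context the engine uses:

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) between two binary events from page counts.

    n11: pages containing the word and on topic
    n10: pages containing the word, off topic
    n01: pages without the word, on topic
    n00: pages without the word, off topic
    """
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    mi = 0.0
    for joint, row_total, col_total in cells:
        if joint:  # empty cells contribute nothing
            mi += (joint / n) * math.log2(joint * n / (row_total * col_total))
    return mi

# Word spread evenly across topics: tells you nothing about the topic.
uninformative = mutual_information(25, 25, 25, 25)
# Word appears on exactly the on-topic pages: one full bit.
informative = mutual_information(50, 0, 0, 50)
```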
The problem with the statistical approach is the class unbiasing, since once you have wildly different statistical populations, your choice of context gets harder and harder - the "easy" standard models don't cope very well. You don't have the computational resources to do a good analysis, so you're essentially stuck between a rock and a hard place.
This is why the Google idea of strengthening the importance of a word depending on linked pages was such a good one: it "did" the hard work by relying on the entire planet to do it for them, by creating links. Of course, what one man can do, another can undo, and Google has got progressively worse over time. It's still by far the best though, and my search engine of choice. When I look at the referrals from search sites, I get 100x as many from Google as from Yahoo (the next nearest)...
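One textbook form of link-based ranking is a PageRank-style power iteration. The sketch below is only that, a toy version under the usual assumptions (damping factor 0.85, dangling pages spread their rank evenly); it is not a claim about what Google actually runs, and the four-page "web" is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to.

    Returns a dict of page -> score; scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on through its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical web: three pages link to "c", and "c" links back to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
```

Nobody had to tell the algorithm that "c" matters; the people who created the links did that work.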
People think searching is easy, and it is. What's really really hard is searching *well*.
Simon
When I search for something, I don't want to get a page that's a marketing front for what I'm trying to find, I want an informational, probably technical, page on the item I'm searching for.
Such pages don't usually mindlessly repeat the keyword I'm searching for over and over again.
Heh...
Well that's all well and good, but how many people would know to type that in?
Has anyone looked at AltaVista lately? They've certainly taken the Google route, and their home page looks a lot like Google now, as does search.yahoo.com. However, in search.yahoo.com _and_ AltaVista, I noticed that "sponsored results" show up before the real ones, but they appear in the list just the same. That could confuse newbies, and I prefer the approach Google has taken to advertising (shoving the ads to a separate entity on the right, and keeping them text-based).
I know many people who use Yahoo! as a home page and they like the many services that are offered by Yahoo! besides just the search facilities. If all they wanted was search I doubt they would use yahoo.com for their homepage.
"but money is the God of Algiers & Mahomet their prophet." - Rich. O'Bryen June 8th 1786
I'm one of those greybeards who was writing college reports in the pre-BBS days, never mind the World Wide Web. Remembering back to when I used to spend a half-day of research in the library to mine info that now magically appears on my computer screen in ten seconds, well...it's hard to throw stones. I'm just happy the damned things work at all.
I've been on vacation and away from internet and most mass media for a week. Got back on Monday and have noticed a drop in traffic to my web sites while I was gone. Didn't have a clue why. Well, now I know.
I'll be watching this very closely. Inktomi (sp?), which is what this is based on, sucked. I think it's too early to tell right now if the results are any good. Along the same lines, it will probably take about 6 months for marketers to learn to effectively spam the results, which is something Google has historically been very good at keeping at bay.
This will be interesting to watch over the next few months.
-Pete
"Yahoo failed when it became a "portal"..."
It failed? If a market cap of 28 BILLION dollars is failure, what do I have to do wrong to get there?
Is anyone else getting so annoyed by pages which grab your keyword and then direct you to Amazon, no matter what the topic? Seems that every time I do a search on Google and find a site which looks interesting they're either just ripping Amazon's content or redirecting me there.
Guys, if I wanted to go to Amazon I would just type "www.amazon.co.uk" into my browser. If I'm searching on Google it's because I've either already looked at Amazon and didn't find what I want, or because Amazon is really not relevant.
I've started adding "-amazon -kelkoo -dooyoo -pricewatch" and others to my Google searches recently which helps cut down the chaff a little, but doesn't seem to cut out all the Amazon ripoffs.
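If you do this a lot, the exclusions are easy to bolt on automatically. A trivial sketch (the helper name and example query are made up; the minus-prefix syntax is the one used above):

```python
def exclude_terms(query, unwanted):
    """Append a "-term" exclusion to the query for each unwanted term."""
    return " ".join([query] + [f"-{term}" for term in unwanted])

# Hypothetical search, with the exclusion list from the comment above.
blocked = ["amazon", "kelkoo", "dooyoo", "pricewatch"]
search = exclude_terms("digital camera review", blocked)
```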
Q.
Yahoo never was a search engine in the pure sense of the word. Yahoo started out as a browsable catalogue of the Web, where every entry was put into categories by hand. The automated search came later and was bought as a service from external providers up until now.
While I know that various search engines use various core ideas in search, I would think that a better way to search would use multiple approaches. Some combination of link-based analysis, keyword analysis, expert analysis, cluster-analysis, etc. rather than a single "this-is-how-we-do-it-here" algorithm.
The first big challenge in search is in disambiguating what the searcher really wants without requiring a long string of inputs. A multi-algorithmic approach would let a search engine serve up hits gathered in multiple ways (e.g., hit #1 was top ranked using method 1, hit #2 was top ranked using method 2, etc.). The search company could then see which algorithm provides the best hits for a given search (i.e., by watching which hits the searcher clicks on).
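Roughly what I have in mind, as a sketch only: merge the ranked lists round-robin, remember which method contributed each hit, and credit that method when the hit gets clicked. The method names and URLs below are hypothetical, and a real engine would need to correct for position bias, which this ignores.

```python
from collections import Counter
from itertools import zip_longest

def interleave(rankings):
    """Round-robin merge of ranked lists.

    rankings: dict mapping method name -> ordered list of URLs.
    Returns a list of (url, method) pairs; a URL is credited to the
    first method that surfaces it.
    """
    merged = []
    seen = set()
    for tier in zip_longest(*rankings.values()):
        for method, url in zip(rankings, tier):
            if url is not None and url not in seen:
                seen.add(url)
                merged.append((url, method))
    return merged

def credit_clicks(merged, clicked_urls):
    """Count clicks per method, based on which method supplied each hit."""
    source = dict(merged)
    return Counter(source[url] for url in clicked_urls if url in source)

# Hypothetical output of two ranking methods for one query.
rankings = {
    "links": ["a.example", "b.example", "c.example"],
    "keywords": ["b.example", "d.example"],
}
results = interleave(rankings)
clicks = credit_clicks(results, ["b.example", "c.example", "b.example"])
```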
The second big challenge is all the nasty spammers and SEOs (Search Engine Optimizers) who will try to use knowledge of any search algorithm to game the system and artificially raise their page rank for commercial purposes. This is probably one reason why Google cannot maintain dominance: any dominant search engine attracts the concerted efforts of SEOs, thus ruining its search quality, thus ruining its dominance.
Yet a multi-algorithmic search engine could create a moving target that frustrates SEOs. By rotating the algorithms and even using negative weights on some algorithm results, a multi-algorithmic search company could cause high-ranked pages to plummet in rank over time. One week, a heavily keyworded site (e.g., one listing every possible keyword in metadata) might be at the top of the list; the next week it is at the bottom of the list. This raises the cost to sites trying to game the system. (The search company might even reward or penalize sites that change structure too often, either to find the freshest sites or to penalize the efforts of SEOs.)
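A toy version of the rotation, to show the effect. Everything here is an assumption for illustration: the per-method scores, the weight range (which allows negative weights), and using the week number as the random seed so that the weights change weekly but stay stable within a week.

```python
import random

def weekly_weights(methods, week, low=-0.5, high=1.0):
    """Draw one weight per method; the same week always gives the same weights."""
    rng = random.Random(week)
    return {m: rng.uniform(low, high) for m in methods}

def combined_score(scores, weights):
    """Weighted sum of a page's per-method scores."""
    return sum(weights[m] * scores[m] for m in weights)

def rank_pages(page_scores, weights):
    """Order pages from best to worst combined score."""
    return sorted(
        page_scores,
        key=lambda page: combined_score(page_scores[page], weights),
        reverse=True,
    )

# Hypothetical per-method scores for a keyword-stuffed page and an honest one.
page_scores = {
    "stuffed.example": {"keywords": 0.9, "links": 0.1},
    "honest.example": {"keywords": 0.3, "links": 0.8},
}
keyword_week = rank_pages(page_scores, {"keywords": 1.0, "links": 0.0})
penalty_week = rank_pages(page_scores, {"keywords": -0.5, "links": 1.0})
this_week = rank_pages(page_scores, weekly_weights(["keywords", "links"], week=7))
```

With explicit weights, the stuffed page tops the list in a keyword-heavy week and drops to the bottom in a week where keyword score carries a negative weight.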
There never can be one right way to do search.
Your statement is not completely correct. There is nothing "fake" about a cluster-based supercomputer. In fact, all sufficiently large supercomputers are cluster-based. Many of them use special-purpose, low-latency NICs and switches, and proprietary communication protocols, but the underlying principle of a Beowulf cluster is the same as that of the Earth Simulator.
Actually, I find an interesting way to rate search engines is to search for the word "cocks".
Yeah, I know what you're thinking.
You typically get a couple of things from this search:
Porn (duh)
Chicken related things
and the band "The Revolting Cocks"
By looking at which ones come up first, you can infer some interesting and useful things about how an engine works. What those things are I will let you decide.
Mostly because it's funnier.
But seriously, folks, try it out.
I think Google's ranking system needs a major overhaul; various sleazy companies have become *much* too effective at fooling it. For example, below are the first three hits that I got by typing "prozac suicide" into Google (I've deleted the URLs to protect the guilty :-). Most of the top 20 hits are similar to these.
... prozac suicide. ...
...
... Prozac ...
prozac suicide
Prozac prozac suicide. prozac nation nude Viagra prozac hair loss Paxil
prozac dogs Yasmin ssri prozac Propecia prozac ocd.
Prozac Suicide - Shopping and Discounts - PROZAC SUICIDE
Prozac Suicide Prozac Suicide. Are you looking for Prozac Suicide? We've searched
the internet for the best Prozac Suicide and we hope you enjoy what you find!
Prozac Suicide
Real Pharm - Lowest Prices & Fantastic Service - Prozac Suicide,
Suicide Prozac Suicide. Prozac(R) is a selective serotonin
- Incoming link popularity appears to play a far smaller role than on Google. Pages that are "top of page 1" material in Google due to their incoming links don't even show up near the top on Yahoo.
- Yahoo is using the meta Description tag, at least in the display (but it also looks like they're using it for ranking.)
- They're giving extreme weight to items that show up in the Yahoo directory (which has been pay-for-inclusion for the most part for the past several years.) In fact, one of my pages which has changed titles shows up in Yahoo search under a 6-year-old title (the one used to list it in the directory, natch.)
- Yahoo is also giving heavy weight to keywords that show up in URLs.
- Keyword cramming seems to move sites up on Yahoo (very annoying, especially for those of us who would rather get placed via honest content.)
To be honest, Yahoo's new engine reminds me of circa-1996 engines. Go run the same search on Yahoo and Google and see what comes back with better relevance (Google still looks better to me.)