Yahoo! Vs. Google: Algorithm Standoff

Posted by timothy on Wednesday February 25, 2004 @12:53AM from the reverse-engineering dept.

An anonymous reader writes "There's a new report out from the guys who brought us the Google keyword density analysis. As they put it, "the goal of this analysis is to compare the keyword density elements of Yahoo's new algorithm with Google's algorithm." They compared 2000 low traffic, non-competitive keywords in the hopes of seeing the algorithms more clearly, without any possible search engine tweakings related to high-traffic keywords. Their findings are interesting. Should you go and rebuild your site based on these findings? Maybe not. It's worth a look though."

23 of 270 comments

Uhhh... by Anonymous Coward · 2004-02-25 00:58 · Score: 1, Insightful

I don't get what the results of this test were. Did Google have better handling of the density, or did Yahoo? Is bigger better or does smaller win out?
I think by Bishop, Martin · 2004-02-25 01:01 · Score: 5, Insightful

Google is way too embedded in everyone's everyday life; it will just naturally be more widely used. When was the last time you heard someone say "Yahoo it"?

--
Setec Astronomy
A layman's view by EulerX07 · 2004-02-25 01:03 · Score: 3, Insightful

Yesterday I heard a colleague curse at his computer because he was looking for something specific.

"Man, Google SUCKS now! I'll try Yahoo."

"DAMN! Yahoo sucks even more!"

I have to admit that I used to think google was incredible just after it came out, but nowadays I'm used to wading through 10-15 pages of results before finding something relevant to what I need.
It's All Magic... by photonX · 2004-02-25 01:12 · Score: 5, Insightful

I'm one of those greybeards who was writing college reports in the pre-BBS days, never mind the World Wide Web. Remembering back to when I used to spend a half-day of research in the library to mine info that now magically appears on my computer screen in ten seconds, well...it's hard to throw stones. I'm just happy the damned things work at all.

--
Anti-gravity? That was *my* little secret! But I never patented it! Boy, was *that* dumb!
1. Re:It's All Magic... by Araneas · 2004-02-25 01:44 · Score: 4, Insightful
  
  I shave but the moustache is getting a little white. ;)
  What I miss is looking in the card catalogue under the general subject and being able to pull out all sorts of related material I hadn't thought of. Same for browsing the stacks. Grab the general Dewey number and go surf the titles.
  Wetware fuzzy logic at its best.
Sale sites. by Bender Unit 22 · 2004-02-25 01:21 · Score: 2, Insightful

I have seen that sites that do nothing but sell stuff have gotten higher rankings lately. But maybe I just need to be more specific in my searches.
The Problem with Search Algorithm Monocultures by G4from128k · 2004-02-25 01:34 · Score: 5, Insightful

While I know that various search engines use various core ideas in search, I would think that a better way to search would use multiple approaches. Some combination of link-based analysis, keyword analysis, expert analysis, cluster-analysis, etc. rather than a single "this-is-how-we-do-it-here" algorithm.

The first big challenge in search is in disambiguating what the searcher really wants without requiring a long string of inputs. A multi-algorithmic approach would let a search engine serve up hits gathered in multiple ways (e.g., hit #1 was top ranked using method 1, hit #2 was top ranked using method 2, etc.). The search company could then see which algorithm provides the best hits for a given search (i.e., by watching which hits the searcher clicks on).
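
A rough sketch of that interleaving idea, in Python. Everything here is an assumption made for illustration: the method names (`links`, `keywords`), the example URLs, and the simple round-robin merge are placeholders, not how any real engine works.

```python
from collections import Counter
from itertools import zip_longest


def interleave(rankings):
    """Round-robin merge of ranked lists, remembering which method supplied each hit."""
    merged, seen = [], set()
    names = list(rankings)
    # Take the 1st hit of every method, then the 2nd of every method, and so on.
    for tier in zip_longest(*rankings.values()):
        for name, url in zip(names, tier):
            if url is not None and url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged


# Hypothetical rankings from two hypothetical methods.
rankings = {
    "links": ["a.example", "b.example", "c.example"],
    "keywords": ["b.example", "d.example", "a.example"],
}
results = interleave(rankings)

clicks = Counter()


def record_click(url):
    """Credit the method that put the clicked hit on the page."""
    for hit, method in results:
        if hit == url:
            clicks[method] += 1


record_click("d.example")
print(results)
print(clicks)
```

Over many searches, the per-method click counts would give a crude signal of which method serves a given kind of query best.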

The second big challenge is all the nasty spammers and SEOs (Search Engine Optimizers) who will try to use knowledge of any search algorithm to game the system and artificially raise their page rank for commercial purposes. This is probably one reason why Google cannot maintain dominance - any dominant search engine attracts the concerted efforts of SEOs, thus ruining its search quality, thus ruining its dominance.

Yet a multi-algorithmic search engine could create a moving target that frustrates SEOs. By rotating the algorithms and even using negative weights on some algorithm results, a multi-algorithmic search company could cause high-ranked pages to plummet in rank over time. One week, a heavily keyworded site (e.g., one listing every possible keyword in metadata) might be at the top of the list, the next week it is at the bottom of the list. This raises the cost to sites trying to game the system. (The search company might even reward or penalize sites that change structure too often to either find the freshest sites or penalize the efforts of SEO).
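
The moving-target idea can be sketched as a weighted sum whose weights rotate week by week. The scores, weights, schedule and site names below are all made up for illustration; this is a toy under those assumptions, not a real ranking function.

```python
# Hypothetical weekly weight schedule; note the negative keyword weight in week 1.
SCHEDULE = [
    {"keywords": 1.0, "links": 0.2},
    {"keywords": -0.5, "links": 1.0},
]


def weights_for_week(week):
    """Rotate through the schedule, one entry per week."""
    return SCHEDULE[week % len(SCHEDULE)]


def combined_rank(scores, weights):
    """Order pages by the weighted sum of their per-algorithm scores."""
    total = {
        page: sum(weights[alg] * score for alg, score in per_alg.items())
        for page, per_alg in scores.items()
    }
    return sorted(total, key=total.get, reverse=True)


# Hypothetical per-algorithm scores for a keyword-stuffed page and a well-linked one.
scores = {
    "stuffed.example": {"keywords": 0.9, "links": 0.1},
    "solid.example": {"keywords": 0.4, "links": 0.8},
}

for week in (0, 1):
    print(week, combined_rank(scores, weights_for_week(week)))
```

With these numbers the keyword-stuffed page leads in week 0 and drops to last in week 1, which is the "top of the list one week, bottom the next" effect described above.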

There never can be one right way to do search.

--
Two wrongs don't make a right, but three lefts do.
Re:Search Engine Optimization Professional by Anonymous Coward · 2004-02-25 01:41 · Score: 5, Insightful

Or even better, just use an intelligent html parser that can work out if text would be hidden and ignore it if it is.
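
A minimal sketch of what such a parser might look like, using Python's standard `html.parser`. It assumes well-formed markup and only catches text hidden with an inline `style` attribute; text hidden through external CSS, scripts, or tricks like white-on-white colours would slip straight past it. The class and function names are invented for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def hides(attrs):
    """True if an inline style attribute hides the element."""
    style = (dict(attrs).get("style") or "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style


class VisibleText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # one flag per open element: is it, or an ancestor, hidden?
        self.chunks = []  # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent_hidden = bool(self.stack) and self.stack[-1]
        self.stack.append(parent_hidden or hides(attrs))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        hidden = bool(self.stack) and self.stack[-1]
        if not hidden and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader would see, ignoring inline-hidden elements."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<p>Cheap widgets</p>'
        '<div style="display: none">widgets widgets <b>widgets</b></div>'
        '<p>Buy<br>now</p>')
print(visible_text(page))
```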
They are search engine spammers by Anonymous Coward · 2004-02-25 01:44 · Score: 3, Insightful

yeah trying to figure out how to get to the top of search engines by analysing keyword density so you can then construct copy text with fake entry pages or as the se.spammers call them "gateway" pages with 302 redirects via the useragent or constructing urls/with/the/keywords using ModRewrite

we know what they are up to, spamming search engines peddling shite with their referrer links

fuckers, these people are the reason 90% of search engines suck and who are rapidly poisoning google so in 5 years no-one can find shit without being taken for circlejerks and wading through shitty websites peddling porn, viagra and whatever shit is flavour of the month, if that's what the internet i see is gonna turn into then why the fuck do i bother

and we link em here at slashdot
i wouldnt give these people the time of day

A>S
Re:Pattern Recognition by fermion · 2004-02-25 01:45 · Score: 2, Insightful

The other problem with a statistical approach is the general assumption that the data is largely unbiased. The problem with search engines is that they assume that information gathered from a self-selected, unmonitored population is valid. In the pre-Google days this meant that we assumed that individual keywords were meaningful. Now we assume that links are meaningful. Neither of these is strictly true, as we have intelligent agents with the means and motivation to lie.
Statistically we should have some information gathering and analysis targeted towards assessing the validity of the information. Are the links themselves information or entropy? This is what google is working on. I think we are going to need some human processing, which is the link at the bottom of the pages asking if the results are useful.

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
more isnt always better by dpw2atox · 2004-02-25 02:03 · Score: 3, Insightful

From what I have seen in the past as well as currently, more results are not always better. One of the primary reasons I use Google as my search engine is because it has very accurate results. I would rather have a search engine display 10 results which are accurate than 100 results which are completely wrong. This article might show that Yahoo displays more results in certain areas, but I plan on using both services for searches over the next few weeks to see which one is more accurate.
Re:SEO - SEM by silentbozo · 2004-02-25 02:25 · Score: 4, Insightful

The problem is that telling the public what your site is about is equivalent to telling search engines what your page is about. Aside from meta-tags (which should really be all you need in order to communicate "additional" info to search engines), any change to your website to "optimize" for a specific type of search engine, and not for the general public, has the effect of upgrading your page ranking AT THE EXPENSE OF NON-OPTIMIZED SITES.

Here we go into the slippery slope that leads to situations like the tragedy of the commons (where people tend to use up a resource because it isn't theirs), the hiring of lawyers (statistically, if one side hires a lawyer, they get better results, but if both sides hire lawyers they get the same settlement, only smaller because of lawyers' fees), etc. It's the prisoner's dilemma - defect (i.e., optimize) to improve my position, at the risk of everybody else defecting and earning worse returns than not defecting in the first place (i.e., everybody stops using Google because the rankings are screwed up and are no longer trustworthy.)
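
To make the prisoner's dilemma framing concrete, here is a toy payoff table in Python. The payoff numbers are the usual textbook placeholders, chosen purely for illustration; nothing in them is measured from real sites or real rankings.

```python
# payoff[(my_move, other_move)] -> my payoff. All numbers are hypothetical.
# "plain" = leave the site alone, "optimize" = tune it for the search engine.
payoff = {
    ("plain", "plain"): 3,        # rankings stay trustworthy for everyone
    ("optimize", "plain"): 5,     # I jump ahead of the non-optimized site
    ("plain", "optimize"): 0,     # I get pushed down
    ("optimize", "optimize"): 1,  # rankings are junk and everybody loses
}


def best_reply(other_move):
    """The move that maximises my payoff against a fixed move by the other site."""
    return max(("plain", "optimize"), key=lambda mine: payoff[(mine, other_move)])


for other in ("plain", "optimize"):
    print(f"if the other site plays {other!r}, my best reply is {best_reply(other)!r}")

print("both optimize:", payoff[("optimize", "optimize")],
      "vs both plain:", payoff[("plain", "plain")])
```

Under these assumed payoffs, optimizing is the best reply whatever the other site does, yet both sites end up worse off than if neither had bothered.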

Put simply, the moment any site tries to game the system, even just a little bit, they ruin the usefulness of Google. As it stands, I'm getting better results with Metacrawler now than with Google - something I wouldn't have said just a year ago. Don't even get me started on websites with javascript-redirect gateway pages, or the ones that scrape search-engine/newsgroup/eBay pages for text in order to boost hit counts, and then link back to similar pages in order to get higher link relevancy, OR the ones that take over abandoned domains in order to exploit the ranking generated by pre-existing links that point to the domain name...
Re:Search Engine Optimization Professional by a24061 · 2004-02-25 02:33 · Score: 5, Insightful

Besides, in which way does Flash exclude other operating systems?

It excludes blind users with screen readers and people who don't or can't install superfluous plug-ins. Flash is great for entertainment but it should never be required for getting information.
Re:Search Engine Optimization Professional by Anonymous Coward · 2004-02-25 02:54 · Score: 1, Insightful

That's a very close relative to the halting problem if you're not going to render the site to interpret the output, and a real CPU-eater if you're going to render the site.
The search engines just need moderation by pj2541 · 2004-02-25 02:55 · Score: 4, Insightful

But the only choices should be "Interesting" and "Troll." If each vote added or subtracted a very small amount from the page rank, and steps were taken to prevent stuffing the ballot box, I think this would actually improve the search results for the users.
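
A small sketch of how such voting might work, in Python. The step size, the one-vote-per-voter rule as the only anti-stuffing measure, and all the names are assumptions for illustration; a real system would need much stronger protection against fake accounts.

```python
class PageModeration:
    STEP = 0.001  # hypothetical: how much one vote moves a page's rank

    def __init__(self):
        self.votes = {}  # (voter_id, url) -> +1 or -1

    def vote(self, voter_id, url, label):
        """Record a vote; return False if this voter already voted on this page."""
        if label not in ("Interesting", "Troll"):
            raise ValueError("the only choices are 'Interesting' and 'Troll'")
        key = (voter_id, url)
        if key in self.votes:
            return False  # crude ballot-stuffing guard: one vote per voter per page
        self.votes[key] = 1 if label == "Interesting" else -1
        return True

    def adjusted_rank(self, url, base_rank):
        """Base rank nudged by the net vote count for the page."""
        net = sum(v for (_, u), v in self.votes.items() if u == url)
        return base_rank + self.STEP * net


mod = PageModeration()
mod.vote("alice", "x.example", "Interesting")
mod.vote("alice", "x.example", "Interesting")  # ignored: duplicate vote
mod.vote("bob", "x.example", "Troll")
mod.vote("carol", "x.example", "Interesting")
print(mod.adjusted_rank("x.example", 0.5))
```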
what do the terms mean??? by Anonymous Coward · 2004-02-25 03:23 · Score: 1, Insightful

Nowhere does the article explain what repeats and density mean and how they are calculated. What kind of report is this??

The first line of the google table shows 1001 5.1 2.4% (W, R, D). An obvious guess would be that the density is the percentage of the repeats in the words (d=r/w) but that is not what it is, so what is the density calculated from?
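
For reference, the arithmetic behind that guess, as a quick Python check. It uses only the three figures quoted above (W = 1001, R = 5.1, D = 2.4%); the formula d = r/w is the guess being tested, not something taken from the report.

```python
words, repeats = 1001, 5.1   # W and R from the first row of the Google table
reported_density = 2.4       # D from the same row, in percent

# The guessed definition: density = repeats / words.
density_guess = repeats / words * 100
print(f"guess d = r/w gives {density_guess:.2f}%, report says {reported_density}%")

# How many occurrences a 2.4% density would imply in a 1001-word page.
implied_repeats = reported_density / 100 * words
print(f"2.4% of {words} words would be about {implied_repeats:.0f} occurrences")
```

So under the d = r/w reading the density would be roughly 0.5%, nowhere near the 2.4% in the table, which backs up the complaint that the report never defines its terms.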
That's right, mod me down. by Ayanami Rei · 2004-02-25 03:28 · Score: 2, Insightful

But you know I'm right. If you have similar sentiments, mod me back up. Let those fucknuts know how you feel about "Search Engine Optimization", i.e., "you'll never find an objective review about a commercial product EVER AGAIN with a search engine... HAHAHAHhAHAHHAAA"

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Re:SEO by 0x0d0a · 2004-02-25 03:57 · Score: 1, Insightful

Marketers attempt to help a company get their message across to those who would be interested in their product.

Yes. However, marketers also have little interest in the case of targeting people who are *not* interested in said products.

In the case of the Web, frequently marketer performance is measured by hits. Thus, marketers have incentive to mis-direct users to their site even if those users are not interested in a product.

Marketers are not in the business of simply informing people about a product. You absolutely do not need a whole department to manage that -- it's quite easy. Most marketing is designed to exploit human irrationality and quirks in the thought process and preferences to make humans make irrational purchasing decisions.

I don't hate SEOs any more than I hate politicians. Both are placed in a system that we designed that is exploitable. We screwed up, and the burden of fixing the system to not be exploitable lies on us -- come up with a better search engine, suggest some way to prevent people from mucking with results.

However, SEOs cause me about as much unhappiness and discomfort as email spammers do, so I'd hardly shed a tear if they all went out of business, say.

--
May we never see th
Re:Warning: You are being watched! by cubic6 · 2004-02-25 04:38 · Score: 2, Insightful

I think what the original poster was referring to is that the clicks are being collected not by the article or its authors, but by whoever submitted the link. If you went via the actual address of the article, only the article's server (gorank.com) would get that referrer. Due to the addition in the link, all visitors from Slashdot get redirected through a page on searchguild.com, which may be collecting data.

--
Karma: Contrapositive
Re:Search Engine Optimization Professional by tanguyr · 2004-02-25 05:45 · Score: 2, Insightful

Ah c'mon, you're all being just a tad harsh. Sure, gratuitous flash usage sucks, but flash can do some pretty nifty things: you can build simple web-based video conferencing or shared whiteboard apps in flash quickly and easily, and there are some nice games out there as well. Don't think of it as the heavier and even more annoying replacement for animated gifs, think of it as an alternative gui technology - "not-so-thin client".

As pcs get faster and faster and more and more people get broadband, you will see more and more flash being used - but maybe not used well. One of the problems today is that web designers still think in html, so they do something that looks like moving html at 10 - 20 times the weight and 1/100th the reaction time. /t

--
#!/usr/bin/english
Re:Search Engine Optimization Professional by dcam · 2004-02-25 09:52 · Score: 2, Insightful

Add Javascript into the mix and this gets pretty difficult. I write some pages that only display stuff as the result of an onload event for the page. For example, it is often more efficient to load up a javascript array with values and use that to populate a drop-down list, as it can avoid round trips to the server to regenerate the ddlist.

There are also pages that only display stuff when you click on something. The MSDN pages commonly do this, often for things like code samples. They display a basic page with the option to show more information under relevant sections.

I've also written a page that is generated almost entirely by javascript, basically because you can regenerate the page in response to certain actions on the client without having to make a round trip to the server each time.

How intelligent are you going to make this parser again?

--
meh
Re:Warning: You are being watched! by Anonymous Coward · 2004-02-25 11:23 · Score: 1, Insightful

Another side effect is that gorank does NOT see the Slashdot referrer, but lots of searchguild referrers.
Re:W3 compliance? by mbauser2 · 2004-02-25 16:58 · Score: 2, Insightful

Google doesn't give a rat's ass if a page complies with W3C standards. That would be a stupid way to run a search engine, because that would let junk sites boost their rank for superficial reasons while punishing relevant sites that have minor mistakes. Google is about content, first and foremost, and following standards doesn't improve content.

When it comes to web design issues, Google does not punish naive mistakes. If somebody's HTML is so weird that it must be an attempt at manipulation (like making an entire paragraph out of H1 elements), it might get penalized. Other stuff, Google doesn't care about, because any strategy that penalizes most of the web is counterproductive to their goals.

That said, Googlebot is a computer program, so it probably does a better job of parsing pages that are well-formed (in the XML sense), and otherwise "easy to parse". Following standards is a good way to achieve "easy parsability", so Google occasionally gives the "check your HTML" advice to people because it's easier than writing everything I just wrote.

--
Proud to be / Smiley-free / Since Nineteen / Ninety-Three