The Man Behind Google's Ranking Algorithm

Posted by CmdrTaco on Sunday June 3, 2007 @02:45AM from the dear-god-no-more-seo-spam-please dept.

nbauman writes "New York Times interview with Amit Singhal, who is in charge of Google's ranking algorithm. They use 200 "signals" and "classifiers," of which PageRank is only one. "Freshness" defines how many recently changed pages appear in a result. They assumed old pages were better, but when they first introduced Google Finance, the algorithm couldn't find it because it was too new. Some topics are "hot". "When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds," said Singhal. Classifiers infer information about the type of search, whether it is a product to buy, a place, company or person. One classifier identifies people who aren't famous. Another identifies brand names. A final check encourages "diversity" in the results, for example, a manufacturer's page, a blog review, and a comparison shopping site."
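The summary doesn't say how the final "diversity" check works. Here is a minimal sketch of one way such a pass could behave. It assumes every result has already been tagged with a page type (the labels are hypothetical) and uses a simple round-robin re-ranking. It is not Google's actual method.

```python
from collections import defaultdict, deque

def diversify(results):
    """Re-rank results round-robin across page types.

    results: list of (url, page_type) pairs, already sorted by relevance.
    The page_type labels are hypothetical (e.g. "manufacturer", "review",
    "shopping"). The best remaining result of each type is taken in turn,
    so the top slots cover several types before any type repeats.
    """
    buckets = defaultdict(deque)  # page_type -> results of that type, in relevance order
    type_order = []               # types in the order they first appear
    for url, page_type in results:
        if page_type not in buckets:
            type_order.append(page_type)
        buckets[page_type].append((url, page_type))

    reranked = []
    while len(reranked) < len(results):
        for page_type in type_order:
            if buckets[page_type]:
                reranked.append(buckets[page_type].popleft())
    return reranked
```

Suppose the input is two shopping pages followed by a manufacturer page and a blog review. The pass then moves the manufacturer page and the review ahead of the second shopping page.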

13 of 115 comments

Hrm, and all this time I thought it was... by Anonymous Coward · 2007-06-03 02:53 · Score: 4, Funny

Pigeon Rank?
1. Re:Hrm, and all this time I thought it was... by UltraAyla · 2007-06-03 04:06 · Score: 4, Informative
  
  parent is not offtopic - http://www.google.com/technology/pigeonrank.html
Amit Singhal ... by WrongSizeGlass · 2007-06-03 02:55 · Score: 5, Informative

... is not to be confused with Amit Singh, who also works at Google and has authored an excellent book on Mac OS X, "Mac OS X Internals".
Re:apple vs Apple by niheuvel · 2007-06-03 02:57 · Score: 5, Informative

No, but I DO see the difference between 'appleS' and 'apple', just as the text you're quoting mentions.
...only one? by dwater · 2007-06-03 02:58 · Score: 4, Funny

> They use 200 "signals" and "classifiers," of which PageRank is only one.

How many did they expect PageRank to be? In the words of someone immortal, "There can be only one."

--
Max.
Feature Request by rueger · 2007-06-03 03:13 · Score: 4, Insightful

My ongoing gripe with Google is the number of times the first page is filled with shopping sites, "review" pages, and click-through pages that exist only to grab you on the way to where you really want to go.

I would love a switch, or even a subscription, that would let me filter out these usually useless types of pages and show me pages with real content instead.

--
Three Squirrels
1. Re:Feature Request by SilentStrike · 2007-06-03 06:42 · Score: 4, Informative
  
  This probably does what you want.
  
  http://www.givemebackmygoogle.com/
  
  It just negates a whole lot of affiliate sites.
  
  This is part of the query it feeds to Google.
  
  -inurl:(kelkoo|bizrate|pixmania|dealtime|pricerunner|dooyoo|pricegrabber|pricewatch|resellerratings|ebay|shopbot|comparestoreprices|ciao|unbeatable|shopping|epinions|nextag|buy|bestwebbuys)
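Here is a small sketch of how an exclusion clause like the one quoted above can be built from a list of site names. The function name is hypothetical. The `-inurl:(a|b|...)` form just copies the shape of that query. Nothing here confirms how Google interprets it.

```python
def build_exclusion_query(terms, excluded):
    """Append a -inurl:(a|b|...) clause excluding URLs that match any name."""
    # Join the names with '|' inside one negated inurl: group,
    # in the same shape as the givemebackmygoogle.com query.
    return f"{terms} -inurl:({'|'.join(excluded)})"
```

For example, `build_exclusion_query("digital camera", ["kelkoo", "bizrate", "ebay"])` returns `"digital camera -inurl:(kelkoo|bizrate|ebay)"`.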
2. Re:Feature Request by quiddity · 2007-06-03 06:51 · Score: 4, Informative
  
  Firefox extension: http://www.customizegoogle.com/ lets you filter out URLs from the results (plus dozens of other useful things).
  
  You can filter out Wikipedia mirrors (using that extension) with the list here: http://meta.wikimedia.org/wiki/Mirror_filter
  
  --
  .
  . hmmm
Now I understand by Timesprout · 2007-06-03 03:15 · Score: 5, Funny

"Search over the last few years has moved from 'Give me what I typed' to 'Give me what I want,'" says Mr. Singhal.
So this is why all my results are links to lesbian porn regardless of what I search for.

--
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
Re:Google sucks. by WrongSizeGlass · 2007-06-03 03:17 · Score: 4, Funny

Google Search is a primitive tool used by fanboys "Googling" for pictures of Natalie Portman. Ha! Shows what you know. The only pics I search for are of a tall drink of Texas water named Patricia Vonne and of Cowboy Neal in his homemade Hulk costume. Who knew the Hulk wore a tri-corner hat & rainbow wrestling boots?
One search feature by Z00L00K · 2007-06-03 03:27 · Score: 5, Interesting

that has been lost was the "NEAR" keyword that AltaVista used earlier. I found it rather useful.
It allowed better results for queries such as "APPLE NEAR MACINTOSH" or "APPLE NEAR BEATLES".
Ho hum... Times change, and not always for the better...
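A NEAR query matches a document only when the two terms appear close together. Below is a minimal sketch of that proximity test on a single document. The function name is illustrative. The default window of 10 words is an assumption and has not been checked against AltaVista's documentation.

```python
import re

def near(text, a, b, window=10):
    """True if words a and b occur within `window` words of each other.

    The default window of 10 is an assumption for illustration.
    """
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    # Compare every pair of positions; fine for a sketch, though a real
    # engine would more likely merge sorted position lists from its index.
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)
```

For example, `near("The Apple Macintosh was launched in 1984", "APPLE", "MACINTOSH")` returns `True`. Because the comparison is case-insensitive, "APPLE NEAR BEATLES" would match in the same way whenever the two words sit close together.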

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
Google is human too by polarbeer · 2007-06-03 04:27 · Score: 5, Insightful

One interesting thing about the article was the down-to-earth lack of abstraction in the problems described, such as the teak patio Palo Alto problem. Other search engines brag about their web-filtered-by-humans approach, as opposed to the "cold" algorithmic approach of Google. But it turns out Google is pretty human too, only with higher ambitions of creating generalizations from the human observations.
How does it work by Anonymous Coward · 2007-06-03 08:11 · Score: 5, Informative

It is rather simple (I am an insider).

Google breaks pages into words. Then, for every word, it keeps a set containing all the pages (by hash ID) that contain that word. A set is a data structure with O(1) lookup.

When you search for "linux+kernel", Google just takes the intersection of the two sets.

Now a "word" is not just a word. If Google sees that many people use the combination linux+kernel, it creates a new word, linux+kernel, with a set of all the pages that contain it. So when you search for linux+kernel+ppp, it takes the intersection of the linux+kernel set and the "ppp" set.

So every time you search, you help Google create new words. This is part of the power of this search engine: a new search engine would need some time to gather that empirical data.

Of course, the sets are also split by rank. For example, for the word "ppp" there are, say, two sets: the high-rank pages that contain the word ppp, and the low-rank ones. When you search for ppp+chap, you first get the intersection of the two words' high-rank sets, etc.
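The sketch below is a toy version of the scheme described in this comment. It makes several assumptions that the comment doesn't state: a compound key such as "linux+kernel" is created once the pair has been queried `threshold` times, and it matches pages where the two words are adjacent. It also assumes each page belongs to exactly one rank tier. All function names are hypothetical. When a query must match every word, the per-word sets have to be intersected, because a union would also return pages that contain only one of the words.

```python
from collections import Counter

def build_index(pages):
    """Map each word to the set of page ids containing it (an inverted index)."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index

def promote_compounds(index, pages, query_log, threshold=3):
    """Add 'a+b' keys for word pairs queried at least `threshold` times.

    Hypothetical rule: a compound matches pages where a and b are adjacent.
    """
    pair_counts = Counter()
    for query in query_log:
        words = query.lower().split("+")
        pair_counts.update(zip(words, words[1:]))
    for (a, b), count in pair_counts.items():
        if count < threshold:
            continue
        for page_id, text in pages.items():
            tokens = text.lower().split()
            if (a, b) in zip(tokens, tokens[1:]):
                index.setdefault(a + "+" + b, set()).add(page_id)
    return index

def query_keys(index, query):
    """Split a '+' query into index keys, preferring known compounds."""
    words = query.lower().split("+")
    keys, i = [], 0
    while i < len(words):
        pair = "+".join(words[i:i + 2])
        if i + 1 < len(words) and pair in index:
            keys.append(pair)
            i += 2
        else:
            keys.append(words[i])
            i += 1
    return keys

def search(index, query):
    """Pages containing every key: a set intersection, not a union."""
    sets = [index.get(key, set()) for key in query_keys(index, query)]
    return set.intersection(*sets)

def tiered_search(tiers, query, wanted=10):
    """tiers: indexes ordered from high-rank pages to low-rank pages.

    Assumes each page lives in exactly one tier, so intersecting within a
    tier is enough; lower tiers are consulted only if too few hits were found.
    """
    found = []
    for tier in tiers:
        found.extend(sorted(search(tier, query)))  # sorted only for stable output
        if len(found) >= wanted:
            break
    return found[:wanted]
```

After `promote_compounds`, the query linux+kernel+ppp is looked up as two keys, `["linux+kernel", "ppp"]`, not three. `tiered_search` only falls through to the low-rank tier when the high-rank tier cannot fill the requested number of results.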

Now page rank has several criteria. Here are some:
a well-ranked site/domain, being linked by a well-ranked page, the document containing relevant words, the search term appearing in the title or URL, the page rank not having been lowered by a Google employee (level 1), the page rank having been increased, etc.

It is not very difficult actually.

(posting AC for a reason).