The Man Behind Google's Ranking Algorithm
nbauman writes "New York Times interview with Amit Singhal, who is in charge of Google's ranking algorithm. They use 200 "signals" and "classifiers," of which PageRank is only one. "Freshness" defines how many recently changed pages appear in a result. They assumed old pages were better, but when they first introduced Google Finance, the algorithm couldn't find it because it was too new. Some topics are "hot". "When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds," said Singhal. Classifiers infer information about the type of search, whether it is a product to buy, a place, company or person. One classifier identifies people who aren't famous. Another identifies brand names. A final check encourages "diversity" in the results, for example, a manufacturer's page, a blog review, and a comparison shopping site."
Pagerank is the source of all wisdom in google... but there is so much more... Like string searching & matching algos, file searching.. you name it.. Just the other day I was searching for books about Google's algorithms... I found zero interesting stuff.. They keep their algorithms secret and out of the public domain... (like they should..). we praise Pagerank, but if we knew what other stuff is there, we would all be members of Church of Google (http://www.thechurchofgoogle.org/) :P
God had a 7 day deadline... So he made the world in LISP
One of the most annoying things about google for me is how it interprets queries with strange characters common to almost all programming languages. A google search for "ruby <<" returns no results related to the ruby append operator. A Simple search for "<<", by itself returns ZERO results.
This could allow for a better search result when using for example "APPLE NEAR MACINTOSH" or "APPLE NEAR BEATLES"
Ho hum... Times changes and not always for the better...
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
Not sure about this:
"Google rarely allows outsiders to visit the unit, and it has been cautious about allowing Mr. Singhal to speak with the news media about the magical, mathematical brew inside the millions of black boxes that power its search engine."
I could see tens of thousands, maybe hundreds of thousands, but millions?
From the results I've been getting lately, they seem to dropping page rank in preference to how many times the words 'google adwords' appears om the page, or more precisely the code for generating them. Totally worthless pages but obviously not worthless for google's bottom line. This story obviously reflects one thing and one thing only, the growing perception in the public's eye of the deteriorating quality of google's results, hence yet another marketing fluff piece, to try to convince them, it just ain't so.
Chaos - everything, everywhere, everywhen
And the thing that I want to know is how they evaluate the results. I actually do research in this space right now, and by far the most painful thing is evaluation of results. We have a system that automates most of the work, but there's still a lot of human involvement, and this limits the input dataset size and speed with which we can iterate the improvements.