PageRank-Type Algorithm From the 1940s Discovered

← Back to Stories (view on slashdot.org)

PageRank-Type Algorithm From the 1940s Discovered

Posted by samzenpus on Wednesday February 17, 2010 @11:55AM from the oldest-google-juice dept.

KentuckyFC writes "The PageRank algorithm (pdf) behind Google's success was developed by Sergey Brin and Larry Page in 1998. It famously judges a page to be important if it is linked to by other important pages. This circular definition is the basis of an iterative mechanism for ranking pages. Now a paper tracing the history of iterative ranking algorithms describes a number of earlier examples. It discusses the famous HITS algorithm for ranking web pages as hubs and authorities developed by Jon Kleinberg a few years before PageRank. It also discusses various approaches from the 1960s and 70s for ranking individuals and journals based on the importance of those that endorse them. But the real surprise is the discovery of a PageRank-type algorithm for ranking sectors of an economy based on the importance of the sectors that supply them, a technique that was developed by the Harvard economist Wassily Leontief in 1941."

2 of 108 comments (clear)

Min score:

Reason:

Sort:

Re:Patent? by Meshach · 2010-02-17 12:08 · Score: 5, Informative

So it could be used as previous art to invalidate Google's patent?
From my read of the linked article it seems that Sergey and Larry cited the previous art in their publications. So it looks like there was no plagiarism, just building a new idea using the tools provided by an earlier idea.

--
"Maybe this world is another planet's hell"
Aldous Huxley
Re:linearity by martin-boundary · 2010-02-17 12:44 · Score: 4, Informative

What really shocked me when someone first described page rank to me was that it was linear. I felt that this just had to be wrong, because it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link. Maybe this is just the elitist snob in me,

The algorithm is not at all linear in the effect of inbound links. Two inbound links don't have the same effect, instead their effects are first weighted by the PageRank of each node of origin.
Now the distribution of PageRank among nodes is approximately power-law distributed on the web. Intuitively, this means that among all inbound links of a particular node, when that number is high, then 99% have practically no effect on the rank of that node, exactly as you probably thought in the first place. More precisely, you can expect a pareto (or similar) distribution for the influence of incoming nodes, which is not unexpected since these sorts of distributions occur a lot in social sciences.
That said, the PageRank algo is actually linear, but only in the sense of being a linear operator on weight distributions. If you normalize the weights after each iteration, the algo is actually affine (on normalized distributions) rather than linear.