Slashdot Mirror


PageRank-Type Algorithm From the 1940s Discovered

KentuckyFC writes "The PageRank algorithm (pdf) behind Google's success was developed by Sergey Brin and Larry Page in 1998. It famously judges a page to be important if it is linked to by other important pages. This circular definition is the basis of an iterative mechanism for ranking pages. Now a paper tracing the history of iterative ranking algorithms describes a number of earlier examples. It discusses the famous HITS algorithm for ranking web pages as hubs and authorities developed by Jon Kleinberg a few years before PageRank. It also discusses various approaches from the 1960s and 70s for ranking individuals and journals based on the importance of those that endorse them. But the real surprise is the discovery of a PageRank-type algorithm for ranking sectors of an economy based on the importance of the sectors that supply them, a technique that was developed by the Harvard economist Wassily Leontief in 1941."

10 of 108 comments (clear)

  1. Good advice for all developers by ls671 · · Score: 4, Insightful

    Well, this is actually pretty good advice for any developer; Don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs. Of course, as a last resort, one can design something new once he has done his homework and made sure nothing that has been done before may be re-used.

    Through my life, I have seen a amazing high level of work that has been done in vain because it yielded poor results and that something doing the same better already existed anyway.

    Don't get me wrong here, once you have made sure that nothing already existing suits your needs or can be reused, it is fine to innovate and create real new stuff. Just don't get caught trying to reinvent the wheel unless you reinvent it better ;-)

    Also, an exception to that principle could be allowed for trivial tasks that are really quick to implement and where searching for an existing solution might cost more than implementing it yourself but be really careful applying that exception rule, it is an open door that leads to trying to reinvent the wheel sometimes ;-))

    --
    Everything I write is lies, read between the lines.
    1. Re:Good advice for all developers by cyphercell · · Score: 5, Funny

      It would have been a pretty exhaustive search without google.

      --
      Under the influence of Post-Cyberpunk Gonzo Journalism
    2. Re:Good advice for all developers by pipatron · · Score: 4, Funny

      I can see that you must be young enough never to have used the search engines you list, if you suggest that you would have been able to find anything useful.

      --
      c++; /* this makes c bigger but returns the old value */
  2. Re:Patent? by Meshach · · Score: 5, Informative

    So it could be used as previous art to invalidate Google's patent?

    From my read of the linked article it seems that Sergey and Larry cited the previous art in their publications. So it looks like there was no plagiarism, just building a new idea using the tools provided by an earlier idea.

    --
    "Maybe this world is another planet's hell"
    Aldous Huxley
  3. Or there's the number ... by PPH · · Score: 5, Funny

    ... of times one Pharaoh's cartouche was chiseled into the obelisks beloning to other Pharaohs.

    --
    Have gnu, will travel.
  4. Re:linearity by Ibiwan · · Score: 5, Insightful

    From a naive, off-the-cuff armchair analysis, it seems to me that PageRank only serves as a way to provide ordering of search results. Funny thing... sorting on positive values will always yield the same ordering as a sort on those values' logs.

    --
    -- //no comment
  5. Re:linearity by martin-boundary · · Score: 4, Informative

    What really shocked me when someone first described page rank to me was that it was linear. I felt that this just had to be wrong, because it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link. Maybe this is just the elitist snob in me,

    The algorithm is not at all linear in the effect of inbound links. Two inbound links don't have the same effect, instead their effects are first weighted by the PageRank of each node of origin.

    Now the distribution of PageRank among nodes is approximately power-law distributed on the web. Intuitively, this means that among all inbound links of a particular node, when that number is high, then 99% have practically no effect on the rank of that node, exactly as you probably thought in the first place. More precisely, you can expect a pareto (or similar) distribution for the influence of incoming nodes, which is not unexpected since these sorts of distributions occur a lot in social sciences.

    That said, the PageRank algo is actually linear, but only in the sense of being a linear operator on weight distributions. If you normalize the weights after each iteration, the algo is actually affine (on normalized distributions) rather than linear.

  6. additional ranking algorithm in the 1940s paper by commodoresloat · · Score: 4, Funny

    allowed pages to be ranked and categorized according to whether it was "insightful," "interesting," "informative," "funny," "flamebait," or "troll."

  7. Re:Patent? by mirix · · Score: 4, Funny

    I'd better get a rush on a patent for "using pagerank on the internet" then. Take that google.

    --
    Sent from my PDP-11
  8. Re:linearity by Jerry+Talton · · Score: 4, Insightful

    *sigh*

    You understand neither how the parent post is using the word "linear" nor the PageRank algorithm itself. You can rewrite the eigenproblem at the heart of PageRank as the solution to a linear system, but very few people do. Moreover, this is not the correct intuition to employ to understand what's going on: there are no "massive matrix inversions" here, just a simple iterative algorithm for extracting the dominant eigenvector of a matrix.

    Furthermore, you've got it exactly backwards regarding the "connection" between PageRank and light transfer. Since the Markov process used to model a web surfer in the PageRank paper is operating on a discrete domain with an easily-calculable transition function, the stationary distribution (or ranking) can be determined exactly. In rendering, you have a continuous problem for which Markov chain Monte Carlo techniques provide one of the most efficient ways to approximate the solution...but you have to actually simulate a Markov chain to get it (see, for instance, Veach's seminal paper on Metropolis Light Transport). Computing PageRank is an "easy" problem, by comparison.