PageRank-Type Algorithm From the 1940s Discovered
KentuckyFC writes "The PageRank algorithm (pdf) behind Google's success was developed by Sergey Brin and Larry Page in 1998. It famously judges a page to be important if it is linked to by other important pages. This circular definition is the basis of an iterative mechanism for ranking pages. Now a paper tracing the history of iterative ranking algorithms describes a number of earlier examples. It discusses the famous HITS algorithm for ranking web pages as hubs and authorities developed by Jon Kleinberg a few years before PageRank. It also discusses various approaches from the 1960s and 70s for ranking individuals and journals based on the importance of those that endorse them. But the real surprise is the discovery of a PageRank-type algorithm for ranking sectors of an economy based on the importance of the sectors that supply them, a technique that was developed by the Harvard economist Wassily Leontief in 1941."
Well, this is actually pretty good advice for any developer; Don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs. Of course, as a last resort, one can design something new once he has done his homework and made sure nothing that has been done before may be re-used.
Through my life, I have seen a amazing high level of work that has been done in vain because it yielded poor results and that something doing the same better already existed anyway.
Don't get me wrong here, once you have made sure that nothing already existing suits your needs or can be reused, it is fine to innovate and create real new stuff. Just don't get caught trying to reinvent the wheel unless you reinvent it better ;-)
Also, an exception to that principle could be allowed for trivial tasks that are really quick to implement and where searching for an existing solution might cost more than implementing it yourself but be really careful applying that exception rule, it is an open door that leads to trying to reinvent the wheel sometimes ;-))
Everything I write is lies, read between the lines.
CHECKLIST:
Is Google a megacorp? Check.
Did Google threaten to move to India if Obama raises corporate taxes? Check.
Does Google spy on users and collect data? Check.
Did Google receive monetary assistance from the Taxpayer's Public Treasury? Not sure.
Okay. I'll let them slide and not try to invalidate their pagerank patent, but that likely won't stop Microsoft from making the attempt.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
In many different types of jobs, people use the counts the number of times their research papers have been referenced and quoted with different "points" depending on where it was quoted. Fx. someone working with medicine hos has his work referenced in The Lancet, counts more than a reference in local-hillbilly-news.
I belive there are sources collects this information. Sorry for being so vague, but I can't remember the specifics. (but hey, isn't that just what we do i Slashdot comments)
From a naive, off-the-cuff armchair analysis, it seems to me that PageRank only serves as a way to provide ordering of search results. Funny thing... sorting on positive values will always yield the same ordering as a sort on those values' logs.
--
Maybe this is just the elitist snob in me, but I don't feel that the latest American Idol singer is really a thousand times better than Billie Holliday, just because a thousand times more people listen to him than to her.
You are measuring the wrong thing. Google isn't measuring who is 'better,' it is trying to measure what page is more interesting to a web surfer, and pages tend to be more popular because they are more interesting to more people. Thus if you do a search for Brittany, you are more likely to find Brittany Spear's fan club than you are an academic study of why her beautiful innocence was so popular (and oh yes was it beautiful!). People who are looking for more specific academic things learn to add extra search terms to their query, like "Brittany analysis" or "Why is Brittany popular?"
The way to solve this problem better is for Google to get to know you and your preferences: if Google knows that you are mainly interested in academic sorts of things, then it can automatically return that sort of thing when you do a search for Brittany. This is convenient, but drives certain people crazy because of privacy issues.
Qxe4
Patent?
I think the way the Google maintains its search superiority has more to do with massive banks of machines and keeping their algorithms secret rather than sending lawyers after anyone. The PageRank algorithm is little more than a useful application of Markov Chains... hardly seems patentable. (Of course, RSA doesn't seem like it should have been patentable either...)
*sigh*
You understand neither how the parent post is using the word "linear" nor the PageRank algorithm itself. You can rewrite the eigenproblem at the heart of PageRank as the solution to a linear system, but very few people do. Moreover, this is not the correct intuition to employ to understand what's going on: there are no "massive matrix inversions" here, just a simple iterative algorithm for extracting the dominant eigenvector of a matrix.
Furthermore, you've got it exactly backwards regarding the "connection" between PageRank and light transfer. Since the Markov process used to model a web surfer in the PageRank paper is operating on a discrete domain with an easily-calculable transition function, the stationary distribution (or ranking) can be determined exactly. In rendering, you have a continuous problem for which Markov chain Monte Carlo techniques provide one of the most efficient ways to approximate the solution...but you have to actually simulate a Markov chain to get it (see, for instance, Veach's seminal paper on Metropolis Light Transport). Computing PageRank is an "easy" problem, by comparison.