PageRank-Type Algorithm From the 1940s Discovered
KentuckyFC writes "The PageRank algorithm (pdf) behind Google's success was developed by Sergey Brin and Larry Page in 1998. It famously judges a page to be important if it is linked to by other important pages. This circular definition is the basis of an iterative mechanism for ranking pages. Now a paper tracing the history of iterative ranking algorithms describes a number of earlier examples. It discusses the famous HITS algorithm for ranking web pages as hubs and authorities developed by Jon Kleinberg a few years before PageRank. It also discusses various approaches from the 1960s and 70s for ranking individuals and journals based on the importance of those that endorse them. But the real surprise is the discovery of a PageRank-type algorithm for ranking sectors of an economy based on the importance of the sectors that supply them, a technique that was developed by the Harvard economist Wassily Leontief in 1941."
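For readers who want to see the "circular definition" from the summary turned into an iterative mechanism, here is a minimal sketch in Python. It is not Google's implementation. The function name `pagerank`, the damping value of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing each page's score to the pages it links to.

    links maps a page name to the list of pages it links to (hypothetical data).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "c" is linked to by three pages, "d" by none.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example_links)
```

In this toy graph "c" should end up with the highest score and "d" with the lowest, and "a" benefits from being linked to by the important page "c".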
Well, this is actually pretty good advice for any developer: don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs. Of course, as a last resort, one can design something new once he has done his homework and made sure nothing that has been done before can be reused.
In my life, I have seen an amazing amount of work done in vain because it yielded poor results and something that did the same job better already existed anyway.
Don't get me wrong here, once you have made sure that nothing already existing suits your needs or can be reused, it is fine to innovate and create real new stuff. Just don't get caught trying to reinvent the wheel unless you reinvent it better ;-)
Also, an exception to that principle could be allowed for trivial tasks that are really quick to implement, where searching for an existing solution might cost more than implementing it yourself. But be really careful applying that exception: it is an open door that sometimes leads to trying to reinvent the wheel ;-))
Everything I write is lies, read between the lines.
The algorithm ran on pigeons that were the size of entire rooms, and with less processing power than today's pigeons which fit into the palm of your hand.
So it could be used as prior art to invalidate Google's patent?
Nil novi sub sole
What really shocked me when someone first described page rank to me was that it was linear. I felt that this just had to be wrong, because it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link. Maybe this is just the elitist snob in me, but I don't feel that the latest American Idol singer is really a thousand times better than Billie Holiday, just because a thousand times more people listen to him than to her. If it was me, I'd have used some kind of logarithmic scaling. I think people do usually describe page ranks in terms of their logarithms, but that's taking the log on the final outcome. I'm talking about taking logs at each step before going on to the next iteration.
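To make the "logs at each step" idea concrete, here is a rough sketch in Python that compares a linear update with a log-squashed one. This is only one possible reading of the idea, not anything Google is known to do. The function `iterate_scores`, the `squash` parameter, the rescaling so that the average score stays at 1, the damping value, and the fan-page graph are all assumptions for illustration.

```python
import math


def iterate_scores(links, squash, damping=0.85, iterations=50):
    """Iterative link ranking where incoming endorsement passes through squash().

    links maps a page to the pages it links to (hypothetical data).
    squash=identity gives the usual linear behaviour; squash=math.log1p
    applies a logarithm at every step, before the next iteration.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    inbound = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)

    score = {page: 1.0 for page in pages}  # assumption: average score is kept at 1
    for _ in range(iterations):
        raw = {}
        for page in pages:
            # Each linking page passes on its score divided by its number of links.
            endorsement = sum(score[q] / len(links[q]) for q in inbound[page])
            raw[page] = (1.0 - damping) + damping * squash(endorsement)
        # Rescale so the scores keep an average of 1 from one step to the next.
        scale = len(pages) / sum(raw.values())
        score = {page: value * scale for page, value in raw.items()}
    return score


# Hypothetical graph: 1000 fan pages link to "popular", one page links to "niche".
links = {f"fan{i}": ["popular"] for i in range(1000)}
links["lone_fan"] = ["niche"]

linear = iterate_scores(links, squash=lambda x: x)
damped = iterate_scores(links, squash=math.log1p)

linear_ratio = linear["popular"] / linear["niche"]
damped_ratio = damped["popular"] / damped["niche"]
```

With the linear update, the gap between "popular" and "niche" is in the hundreds, held below a thousand only by the baseline term. With the log applied at each step, the gap shrinks to single digits.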
To me, this has an intuitive connection to the idea that the internet used to be more interesting and quirky, and it was more about individuals expressing themselves, whereas now it's more like another form of TV.
Of course that's not to say that I want to go back to the days before page rank. God, search engine results were just horrible in those days.
From an elitist snob point of view, one good thing about page rank is that it doesn't let you just vote in a passive way, as Nielsen ratings do for TV. In order to have a vote, you have to do something active, like making a web page that links to the page you want to vote for.
Find free books.
Who gives a rats ass ...
Have gnu, will travel.
In many different types of jobs, people count the number of times their research papers have been referenced and quoted, with different "points" depending on where they were quoted. E.g., for someone working in medicine, having his work referenced in The Lancet counts for more than a reference in local-hillbilly-news.
I believe there are sources that collect this information. Sorry for being so vague, but I can't remember the specifics. (But hey, isn't that just what we do in Slashdot comments?)
Markov Chains were introduced in 1906 according to Wikipedia. That's the origin of PageRank. People have also been using these tools for ages to rank the impact of journals, etc.
allowed pages to be ranked and categorized according to whether it was "insightful," "interesting," "informative," "funny," "flamebait," or "troll."
is all page rank is.
Great minds think alike; fools seldom differ.
The most amazing computer book ever. It has Doug Engelbart's first description of “augmenting the human intellect” using computers. It describes what we know now as windows (generic) with pointing devices. It has an early linear document retrieval system using page ranks based on word co-occurrences and it has an early language translation system (Russian to English with examples of translating Soviet missile papers). What a preview of things to come.
It is worth a read just to get into the heads of some of the computing pioneers.
Another required-reading book for all aspiring CS students should be John von Neumann’s “The Computer and the Brain.” Dated, but again this is what they were thinking.
We have a lot to be humble about given the hardware and compilers they had to work with. Not to mention primitive development environments, a.k.a. the card punch.
I guess previous work had the idea right, but actually building a system which can handle millions of links and reply in no time is no small feat.
This reminds me of the discussion we had previously about the gap from research prototype transistors to having factories actually deliver them.
Back in the late 90s I created a relevance ranking system for my employer to rank the output of our legal research system. Similar to a hyperlink, legal documents have unique references. Sometimes they're created by the publisher, such as West (now Thomson) and their Federal Supplement, and other times, for unpublished documents, it's a docket number from the court. Long story short, the documents were indexed, and at run time, using a combination of hit density and the number of times a document was referred to by other documents, we had a fairly accurate relevance engine. I even took it a step further and, for the documents that referenced the found document, looked to see if the original search term was present in the linking document. If so, we assumed that it was linking to the found document for reasons related to the search rather than for some other reason, as court cases often are referenced for reasons outside of their main ruling.
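The scheme described above could be sketched roughly like this in Python. This is not the original system, whose details are not given. The functions `hit_density` and `relevance`, the two citation weights, the way density and citations are multiplied together, and the sample documents are all made up for illustration.

```python
def hit_density(text, term):
    """Fraction of the words in text that match the search term."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


def relevance(doc_id, term, documents, cited_by,
              on_topic_weight=2.0, off_topic_weight=1.0):
    """Combine hit density with the number of citing documents.

    documents maps a document id to its text.
    cited_by maps a document id to the ids of the documents that cite it.
    The weights and the multiplication are assumptions, not the original formula.
    """
    density = hit_density(documents[doc_id], term)
    citation_score = 0.0
    for citing_id in cited_by.get(doc_id, []):
        citing_words = documents[citing_id].lower().split()
        if term.lower() in citing_words:
            # The citing document also mentions the term, so the citation
            # is assumed to be related to the search.
            citation_score += on_topic_weight
        else:
            # Cited for some other reason: still counts, but for less.
            citation_score += off_topic_weight
    return density * (1.0 + citation_score)


# Hypothetical documents, written without punctuation to keep the word split simple.
documents = {
    "case_a": "negligence claim negligence standard",
    "case_b": "negligence claim dismissed early",
    "case_c": "appeal citing negligence precedent",
    "case_d": "procedural ruling on venue",
}
cited_by = {"case_a": ["case_c", "case_d"]}

score_a = relevance("case_a", "negligence", documents, cited_by)
score_b = relevance("case_b", "negligence", documents, cited_by)
```

Here "case_a" outranks "case_b" both because the term is denser in its text and because it is cited twice, with the on-topic citation from "case_c" counting double.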
Could some /. member who is an IP attorney comment on whether this might constitute prior art that could open up relevant Google patent(s) to an invalidation attack based on obviousness? Which, in my limited understanding, would go something like: "Well, a person skilled in the art as of the date of Google's patent application would have known of the Leontief work (published, knowledge therefore presumed) and it would have been obvious to implement the Leontief work on a computer."
And for extra interest, could anyone with "standing" (which could be any of us who use Google) file a petition for re-examination of the patents with the USPTO?
The mathematics of PageRank go back a century. There have been many different applications since then, including to hypertext and the web. From Wikipedia:
As such, the algorithm wasn't new, but Google was the first to build a working, large-scale search engine around it.
Latin is dead
It's called being a Movie star. Your importance is ranked by how many people really like you. And it can be gamed just like the Google one.
Why bother
hey, i thought of that. just didn't patent it. we now are own you. regards, mike (hunkering down in the frozen north)
actually, could think of better ranking systems for movie stars: ranking factor for each role (lead=1000, other starring role=100, minor speaking role = 10, extra = 1) summed for all movies. maybe even scale by gross receipts for each movie GNP-deflator normalized to 1970 dollars.
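A quick sketch of that scoring rule in Python, just to show the arithmetic. The role weights come from the comment above. The function name `star_score`, the credit format, and the sample filmography are invented for illustration, and the gross figures are assumed to be already converted to millions of 1970 dollars.

```python
# Role weights as proposed in the comment above.
ROLE_WEIGHTS = {"lead": 1000, "starring": 100, "minor": 10, "extra": 1}


def star_score(credits, scale_by_gross=False):
    """Sum the role weights over all of an actor's movies.

    credits is a list of (role, gross) pairs, where gross is assumed to be
    already deflated to millions of 1970 dollars (hypothetical data).
    """
    total = 0.0
    for role, gross in credits:
        weight = ROLE_WEIGHTS[role]
        if scale_by_gross:
            weight *= gross  # optional scaling by the movie's receipts
        total += weight
    return total


# Hypothetical filmography: one lead, one minor speaking role, one extra.
credits = [("lead", 2.0), ("minor", 5.0), ("extra", 1.0)]
plain = star_score(credits)                          # 1000 + 10 + 1
weighted = star_score(credits, scale_by_gross=True)  # 2000 + 50 + 1
```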
creativity always builds on the past -- okay, redundant using subject line in body, my bad
K, so how is Brin and Page developing PageRank reinventing the wheel, when the earlier work was an obscure economics paper published at Harvard in 1941 and only re-discovered in 2010?
Who says it was first re-discovered in 2010, and not in the 90's, by two guys from Stanford?
There are a million possible ways Brin and Page could have come across that paper. A friend who was studying economics, maybe a magazine like the Economist mentioned it, etc.
Pretty funny claiming to be the first to re-discover something, though. What if you didn't 'discover' the other mentions in the, oh, SIXTY YEARS since the paper was published?
Also, I don't know which is worse. The idea of them stumbling across the paper and using the idea (and thus having no original idea except the application of the concept to ranking of web pages, which isn't nearly as impressive) or the idea that the PageRank concept is something of an obvious idea?
Please help metamoderate.
some would like to dev
some would like to search
just no one knows all in all
and why
patents blockage
humans blockage
so no improvement overall
The summaries were intriguing but lame. Here's the real thing (preprint):
http://arxiv.org/abs/1002.2858
Author's page is here:
http://users.dimi.uniud.it/~massimo.franceschet/
Interesting stuff.
The concept of rating something based on the weighted reputation of those entities that endorse it ("endorse" being used in the general sense) has been around, well, forever. People do it all the time when they decide who to trust. TFA should have been titled, "Brin and Page Rediscover Leontief-Type Algorithm from the 1940s" with the subtitle, "Journalists Finally Put Two and Two Together."
.. while you're in the process.
--- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..
It's called Eigenvector centrality
A librarian described the new google thing to me back then as being like the science citation index only applied to the web in general instead of published papers.
Since everyone knows that Linux is superior to whatever else, why would one heretical comment start a war against you? Aren't linux-lovers the most gentle and forgiving people on earth?
What I think it is? your nick: Loving a commodore 64 puts you in the same age group as Bill Gates, too old to post on /.
- Oh and worrying about the National debt: that means you don't live in your mom's basement anymore. Definitely not someone one would like to have on /.