Compute Google's PageRank 5 Times Faster
Kimberley Burchett writes "CS researchers at Stanford University have developed three new techniques that together could speed up Google's PageRank calculations by a factor of five. An article at ScienceBlog theorizes that 'The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic.'"
RTA. PageRank values are computed in advance, and the computation takes several days. A 5x speedup means specialized rankings could be computed too.
But, didn't Google originate out of Stanford? Isn't it reasonable to think that the two are still pretty friendly?
(Don't you hate it when people speak in questions? Don't you? Huh?)
Withdrawal before climax is very ineffective and those who try this are usually called "parents."
... and furthermore
Google Search doesn't show hits exactly in the order of page rank. Relevance and other factors also affect the order. My biggest page (the one that is my Slashdot URL) is PR7, but there are words on the page for which a lower-ranked page beats me, because that page is more relevant for that word. Relevance includes how many times the word appears on the page, the HTML context in which it is used, whether the pages linking to it use the search terms in their link text, and the order and nearness of the words in a multi-word search without quotes.
The shareholder is always right.
But, didn't Google originate out of Stanford?
Yep. Originally called Backrub, curiously.
Slashdot looked deep within my soul and assigned
me a number based on the order in which I joined
I studied under the SCCM program at Stanford, and started the same year as Sepandar Kamvar. I remember him as a great guy, very smart, and an EXCEPTIONALLY good speaker and tutor (I was always pestering him for explanations of the week's lectures).
I'm glad to hear his research is getting attention, and I hope others who are interested in the theoretical aspects of data mining and web search engines will take a look at the SCCM and statistics programs at Stanford (shameless plug - others can post pointers to similar programs).
"It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
I can't remember the last time I paid Unisys for using a GIF...
When was the last time you bought a copy of GraphicConverter, Fireworks, Photoshop, Paint Shop Pro, or any other program licensed under U.S. Patent 4,558,302 and foreign counterparts? The price of each of those programs includes a royalty paid to Unisys.
Will I retire or break 10K?
Didn't read the article, did we? It's the PageRank computation that is sped up 5x, not the search itself. All the pages are ranked ahead of time in a multi-day process, so when you do your search you are searching against those pre-calculated ranks. What this technology will do is allow Google to re-rank their pages every day (instead of once every couple of days) or use the extra processing power to create more special-interest sites à la groups, images, news, etc.
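For anyone wondering what that offline ranking step looks like, here is a minimal sketch of PageRank by plain power iteration on a toy link graph. This is the textbook method, not the Stanford speed-ups and not Google's actual code; the function name `pagerank`, the damping factor of 0.85, and the three-page graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank pages by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed, commonly quoted value.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its rank on in equal shares to the pages it links to.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)

        # Stop once the ranks have essentially stopped changing.
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Every pass touches every link once, which is why this takes days on the real web graph and why needing fewer or cheaper passes matters.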
Sounds a lot like Kleinberg's HITS algorithm, circa 1997. Try Teoma for a real-world implementation.
Coincidence time: I used the same example in a presentation a couple of years ago to illustrate how subgroupings can be found for a single search term. Try it on Teoma, and see the various subtopics under "Refine". IIRC each of those is a principal eigenvector of the link matrix. Topologically speaking, each principal eigenvector corresponds to a more or less isolated subgraph, e.g. the subgraph for "San Francisco Giants" is not much connected to the nest of links for "They Might Be Giants", and we get a nice list of subtopics.
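For the curious, a rough sketch of the hub/authority iteration at the heart of HITS, in Python. It only finds the dominant hub and authority vectors for one small link graph; pulling out the further eigenvectors that would correspond to separate subtopics is not shown. The function name `hits`, the fixed iteration count, and the toy graph are assumptions for illustration, and nothing here reflects how Teoma actually implements it.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores for a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]

        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}

        # Rescale both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: two fan pages pointing at team pages.
toy_links = {"fan1": ["giants", "dodgers"], "fan2": ["giants"], "giants": []}
hub, auth = hits(toy_links)
print("top authority:", max(auth, key=auth.get))
print("top hub:", max(hub, key=hub.get))
```

Repeating those two update steps is power iteration on the link matrix, which is where the eigenvectors come in.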
(I once tried to explain this algorithm to my bosses at my former employer, which is why I have so much free time to type this right now.)
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger