Slashdot Mirror


Compute Google's PageRank 5 Times Faster

Kimberley Burchett writes "CS researchers at Stanford University have developed three new techniques that together could speed up Google's PageRank calculations by a factor of five. An article at ScienceBlog theorizes that "The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic.""

7 of 140 comments (clear)

  1. Ok... by Anonymous Coward · · Score: 3, Interesting

    Who owns the software patent for this for the next 20 years?

  2. CmdrTaco, ScienceBlog editor? by jbellis · · Score: 5, Interesting
    A 5 times speedup is still many orders of magnitude too slow to personalize terabytes of data for millions of customers. That's just ludicrous. But somehow Science Blog puts "...may make it realistic to calculate page rankings personalized for an individual's interests" in their abstract when the actual article from National Science Foundation says nothing of the sort:
    Computing PageRank, the ranking algorithm behind the Google search engine, for a billion Web pages can take several days. Google currently ranks and searches 3 billion Web pages. Each personalized or topic-sensitive ranking would require a separate multi-day computation, but the payoff would be less time spent wading through irrelevant search results. For example, searching a sports-specific Google site for "Giants" would give more importance to pages about the New York or San Francisco Giants and less importance to pages about Jack and the Beanstalk.
    ...
    The complexities of a personalized ranking would require [far] greater speed-ups to the PageRank calculations. In addition, while a faster algorithm shortens computation time, the issue of storage remains. Because the results from a single PageRank computation on a few billion Web pages require several gigabytes of storage, saving a personalized PageRank for many individuals would rapidly consume vast amounts of storage. Saving a limited number of topic-specific PageRank calculations would be more practical.
    Clearly the ScienceBlog and /. editors share more than a work ethic, or, uh, lack thereof. Next up: CmdrTaco's secret double life revealed!
  3. Assumption: by moogla · · Score: 4, Interesting

    That google hasn't already implemented something akin to quadratic extrapolation, or some orthogonal optimization technique. Google has come a long way since the published page rank papers 4 years back.

    What if they combined extrapolation and blocking factors; they would focus on computing the pagerank of pages in groups that were logically "tight", or using subcomponents of URLS, as opposed just to domain sensitivity. To be more flexible, what if it computes a VQ-type data structure (like for doing paletted images from full-color) that is populated by the most popular "domains" of the internet according to the last pagerank, and then splits up its workload based on that?

    What if they already figured that out?

    In the abstract, they mention how the work is particular important to the linear algebra community. That is what their focus should be on; google is just an application/real-world-example of that research (but it may not be relevant today).

    Or did they have access to the current page-rank algorithm?

    --
    Black holes are where the Matrix raised SIGFPE
  4. Clarification. (reply to self) by moogla · · Score: 4, Interesting

    According to the document, they reference the original 1998 paper on PageRank. I see a number of other references about improvements to the algorithm, but nothing specific to Google's own implementation. The paper mentions how the improvements help, but not if Google uses them.

    Hence it is forward for the article author or one of the paper authors to assume these techniques will speed up Google- I'm confident their engineers have been following academic work in this area and perhaps they have already discovered these same (or orthogonal) techniques.

    That is, not to say that google could not reimplement their algorithms to take in these improvements if they already have... but basing your speedup number on the 1998 algorithm and public domain mods is showy. Although it does help grab a readers attention when browsing abstracts. ^_^

    --
    Black holes are where the Matrix raised SIGFPE
  5. Re:Does speed matter? by DavidMonks · · Score: 3, Interesting

    Um. This is about speed of *calculation* of PageRank, not speed of delivering the calculated result to you.

    The articles and earlier postings explain this a little more fully. Anyone who can't take the time to read them really needs to learn some patience :)

    PageRanks are periodically calculated for the Web as a whole. The results are stored and served to users. (The periodic update is sometimes referred to as the GoogleDance.) PRs are not calculated on the fly.

    Hence, a speed increase could reduce Google's required hardware investment and/or allow them to update more often (and hence pick up more topical items) and/or allow them to calculate a spectrum of regionally or topically themed PRs

  6. I'm Not Sure I Like The Part About... by Mister+Transistor · · Score: 3, Interesting

    The bit about customized rankings based on user profiling of some type.

    Frequently when I want to refer someone to a topic of interest, I'll tell them to do a Google on (whatever) subject, and I like knowing they're seeing what I see.

    If this is implemented, I hope there's a way to turn it off or assume a "joe user" standard profile for unbiased results actually based on rank popularity (the way it is now).

    I DO like the 5x faster, but geez, the page load takes longer than the search already, who can complain?

    --
    -- You are in a maze of little, twisty passages, all different... --
  7. Cool but unimportant by Anonymous Coward · · Score: 3, Interesting

    Well, according to Moore's law (or rather observation), PageRank would become 5 times faster in a couple of years anyway.