Slashdot Mirror


Compute Google's PageRank 5 Times Faster

Kimberley Burchett writes "CS researchers at Stanford University have developed three new techniques that together could speed up Google's PageRank calculations by a factor of five. An article at ScienceBlog theorizes that "The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic.""

13 of 140 comments (clear)

  1. Ok... by Anonymous Coward · · Score: 3, Interesting

    Who owns the software patent for this for the next 20 years?

  2. Lets see... by DanThe1Man · · Score: 2, Interesting

    What is 1/14th of a second divided by five?

  3. CmdrTaco, ScienceBlog editor? by jbellis · · Score: 5, Interesting
    A 5 times speedup is still many orders of magnitude too slow to personalize terabytes of data for millions of customers. That's just ludicrous. But somehow Science Blog puts "...may make it realistic to calculate page rankings personalized for an individual's interests" in their abstract when the actual article from National Science Foundation says nothing of the sort:
    Computing PageRank, the ranking algorithm behind the Google search engine, for a billion Web pages can take several days. Google currently ranks and searches 3 billion Web pages. Each personalized or topic-sensitive ranking would require a separate multi-day computation, but the payoff would be less time spent wading through irrelevant search results. For example, searching a sports-specific Google site for "Giants" would give more importance to pages about the New York or San Francisco Giants and less importance to pages about Jack and the Beanstalk.
    ...
    The complexities of a personalized ranking would require [far] greater speed-ups to the PageRank calculations. In addition, while a faster algorithm shortens computation time, the issue of storage remains. Because the results from a single PageRank computation on a few billion Web pages require several gigabytes of storage, saving a personalized PageRank for many individuals would rapidly consume vast amounts of storage. Saving a limited number of topic-specific PageRank calculations would be more practical.
    Clearly the ScienceBlog and /. editors share more than a work ethic, or, uh, lack thereof. Next up: CmdrTaco's secret double life revealed!
  4. Assumption: by moogla · · Score: 4, Interesting

    That google hasn't already implemented something akin to quadratic extrapolation, or some orthogonal optimization technique. Google has come a long way since the published page rank papers 4 years back.

    What if they combined extrapolation and blocking factors; they would focus on computing the pagerank of pages in groups that were logically "tight", or using subcomponents of URLS, as opposed just to domain sensitivity. To be more flexible, what if it computes a VQ-type data structure (like for doing paletted images from full-color) that is populated by the most popular "domains" of the internet according to the last pagerank, and then splits up its workload based on that?

    What if they already figured that out?

    In the abstract, they mention how the work is particular important to the linear algebra community. That is what their focus should be on; google is just an application/real-world-example of that research (but it may not be relevant today).

    Or did they have access to the current page-rank algorithm?

    --
    Black holes are where the Matrix raised SIGFPE
  5. Clarification. (reply to self) by moogla · · Score: 4, Interesting

    According to the document, they reference the original 1998 paper on PageRank. I see a number of other references about improvements to the algorithm, but nothing specific to Google's own implementation. The paper mentions how the improvements help, but not if Google uses them.

    Hence it is forward for the article author or one of the paper authors to assume these techniques will speed up Google- I'm confident their engineers have been following academic work in this area and perhaps they have already discovered these same (or orthogonal) techniques.

    That is, not to say that google could not reimplement their algorithms to take in these improvements if they already have... but basing your speedup number on the 1998 algorithm and public domain mods is showy. Although it does help grab a readers attention when browsing abstracts. ^_^

    --
    Black holes are where the Matrix raised SIGFPE
  6. Why are public funds going to... by dsanfte · · Score: 1, Interesting

    Why are a public university's funds and time being used to benefit a private company? Last I checked, Google isn't a charity. Doesn't Google have its own programmers? Wouldn't these "CS Researchers'" time be better spent furthering science instead of being free labor for corporations, at the expense of taxpayers?

    --
    occultae nullus est respectus musicae - originally a Greek proverb
  7. Re:Does speed matter? by DavidMonks · · Score: 3, Interesting

    Um. This is about speed of *calculation* of PageRank, not speed of delivering the calculated result to you.

    The articles and earlier postings explain this a little more fully. Anyone who can't take the time to read them really needs to learn some patience :)

    PageRanks are periodically calculated for the Web as a whole. The results are stored and served to users. (The periodic update is sometimes referred to as the GoogleDance.) PRs are not calculated on the fly.

    Hence, a speed increase could reduce Google's required hardware investment and/or allow them to update more often (and hence pick up more topical items) and/or allow them to calculate a spectrum of regionally or topically themed PRs

  8. I'm Not Sure I Like The Part About... by Mister+Transistor · · Score: 3, Interesting

    The bit about customized rankings based on user profiling of some type.

    Frequently when I want to refer someone to a topic of interest, I'll tell them to do a Google on (whatever) subject, and I like knowing they're seeing what I see.

    If this is implemented, I hope there's a way to turn it off or assume a "joe user" standard profile for unbiased results actually based on rank popularity (the way it is now).

    I DO like the 5x faster, but geez, the page load takes longer than the search already, who can complain?

    --
    -- You are in a maze of little, twisty passages, all different... --
    1. Re:I'm Not Sure I Like The Part About... by NerveGas · · Score: 2, Interesting

      GOOGLE can complain. By making it five times faster, they can spend:

      -five times less on servers
      -five times less on power for the servers
      -five times less on data center real estate
      -five times less on cooling the data center
      -five times less on replacing dead hardware
      -much less on paying people to maintain the machines

      The list doesn't stop there, either. The costs involved with running a high-traffic web site are very significant.

      steve

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
    2. Re:I'm Not Sure I Like The Part About... by forged · · Score: 2, Interesting

      Actually Google doesn't replace dead hardware in their datacenters. It just stays there... (couldn't believe it myself when I read that).

  9. Cool but unimportant by Anonymous Coward · · Score: 3, Interesting

    Well, according to Moore's law (or rather observation), PageRank would become 5 times faster in a couple of years anyway.

  10. Public Funding? by grimani · · Score: 2, Interesting

    The research was done partially with public funding from an NSF grant, yet the commercial applications are obvious and immediate.

    So my question is, who sees the benefit of the research? The researchers? Can Google just jack the results and incorporate into their system?

    It seems to me that the current system of allocation research dollars with public and private grants is very messy and needs overhaul.

  11. More important than speed and quantity... by ktorn · · Score: 2, Interesting

    ... is quality.

    I'm surprised how Google is choosing not to implement search features that would greatly enhance advanced queries.
    How often I'd wish they allowed wildcards in their queries (where engl* would pull hits with england, english, etc).
    Field searches still require you to add keywords, so I cannot just query "site:somesite.com" to get all the currently indexed pages from somesite.com
    In this respect Altavista still produces better results, with an excelent range of fields to choose from.
    If there is anything that Google is lacking, it's defenitely that.
    Having said that, still my number one SE.