Slashdot Mirror


The Math Behind PageRank

anaesthetica writes "The American Mathematical Society is featuring an article with an in-depth explanation of the type of mathematical operations that power PageRank. Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods. And because the links constituting the web are constantly changing and updating, the relevance of pages needs to be recalculated on a continuous basis."

3 of 131 comments

  1. Nouns maybe? by Bryansix · · Score: 3, Insightful

    It seems like it would be the nouns, pronouns, etc. that Google should be paying attention to. Who cares about all the verbs, adjectives, etc. that just muddy the indexing waters?

  2. Re:Does PageRank count? by Trieuvan · · Score: 4, Insightful

    The PageRank that's reported by the toolbar is really old. Google never wants to let you know the real number, or it would be easy to spam ...

  3. Re:Bad summary by martin-boundary · · Score: 4, Insightful

    It's nowhere near like that. A web matrix is very sparse, so if you did a true 25Bx25B matrix power iteration, you'd be multiplying zero by zero a gazillion times. Optimization is about not doing things you don't need to do, and optimizing PageRank is about figuring out clever ways to not do the full multiplication. Moreover, PageRank is calculated in parallel over a computer farm. Overall, you can expect a single iteration to take on the order of an hour, and you can expect around 50-80 iterations before Google gives up and says it's converged. You can also try and reuse the previous "converged" PageRank vector to cut down on the 50-80 iterations after you've crawled new pages.

    If Google used a single computer to do all the work, and truly did 80*25B^2 operations, they'd be morons.
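
    To make the sparse-iteration point concrete, here is a toy sketch in plain Python. It is not Google's code. The function name pagerank, the damping factor of 0.85, the tolerance, and the four-page graph are all assumptions picked for illustration. Each pass walks only the links that actually exist, so the work per iteration scales with the number of links rather than with the square of the number of pages, and the optional start argument reuses an earlier rank vector as the starting point.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100, start=None):
    """Sparse power iteration on a link graph (illustrative sketch).

    links maps each page to the list of pages it links to; every link
    target must also appear as a key. Returns (ranks, iterations_used).
    """
    pages = list(links)
    n = len(pages)
    uniform = 1.0 / n
    if start is None:
        rank = {p: uniform for p in pages}
    else:
        # Warm start: reuse old scores, give unseen pages a uniform share,
        # then renormalize so the vector sums to 1 again.
        rank = {p: start.get(p, uniform) for p in pages}
        total = sum(rank.values())
        rank = {p: r / total for p, r in rank.items()}

    for iteration in range(1, max_iter + 1):
        # Rank sitting on pages with no out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Touch only the edges that exist; the zero entries of the
        # web matrix are never visited.
        for p in pages:
            out = links[p]
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            return rank, iteration
    return rank, max_iter


# Hypothetical four-page web, made up for this example.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks, cold_iters = pagerank(web)                 # start from uniform
_, warm_iters = pagerank(web, start=ranks)        # start from old result
print(cold_iters, warm_iters)
```

    On an unchanged graph the warm start needs far fewer passes than the cold one. After a real recrawl the saving depends on how much the link structure moved.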