Computing PageRank on your PC?

← Back to Stories (view on slashdot.org)

Computing PageRank on your PC?

Posted by CmdrTaco on Thursday June 12, 2003 @06:20AM from the for-the-curious-hacker dept.

An anonymous reader writes "A group of CS researchers of the University of Milan has found a way to compress web graphs at 3 bits per link, and to access them in compressed form. They provide data sets representing real snapshots of portions of the web with one hundred million nodes and 1 billion links. You just need some bandwidth to download a few hundred megabytes of data, and you can compute PageRank with your PC. All the code involved is GPL'd, and the data are public: everybody can grok PageRank now!"

18 of 186 comments (clear)

Min score:

Reason:

Sort:

PageRank is part of Google's algo by Anonymous Coward · 2003-06-12 06:27 · Score: 5, Informative

It's basically how well linked to your page is, and how well linked to the pages linking to you are, and so on. It's an advanced form of link popularity. The idea is that the more people that link to something, the more influential/important it is. Some sites have high PageRanks of 10 (like Google), while Slashdot is something like an 8. Many pages are in the 4-6 range. Every link you create is like a "vote" for another web page.
1. Re:PageRank is part of Google's algo by e2d2 · 2003-06-12 07:06 · Score: 4, Informative
  
  Google's PageRank was actually named after Larry Page, the creator of their system for ranking pages. Pun was obviously intended.
2. Re:PageRank is part of Google's algo by Anonymous Coward · 2003-06-12 07:45 · Score: 2, Informative
  
  If your site is in the Google Directory (based on DMOZ), it may have a pagerank listed next to it.
3. Re:PageRank is part of Google's algo by InfoCynic · 2003-06-12 15:08 · Score: 2, Informative
  
  PageRank is a one-dimensional recursive weighting for a web page. Intially you assume all pages were created equal. Now for each page, compute an updated PageRank based on indegree (number of pages linking to the site). You usually also introduce a weighting factor which is designed to simulate some random chance that you "jump" to the next page by just typing a URL, not following a link. After that, you typically normalize the scores (sum of squares must equal one is the preferred norm).
  
  Now you have to iterate, but on subsequent iterations, you're no longer consider with purely indegree. You care about the PageRank of pages linking to you. Pages that are "popular" and have high PageRank will boost your score. Typically you iterate until the values converge to within a given threshold. If you know linear algebra, you can also cheat and use eigenvalues, but that's not the point.
  
  There are better algorithms, like Kleinberg's, which gives each page a "hub" and "authority" score, where just linking to a page isn't enough, and you can learn more about that in a Web Algorithms course.
  
  --
  "Recta non toleranda futuaris nisi irrisus ridebis"
Re:Dumb Question: by Chris_Stankowitz · 2003-06-12 06:31 · Score: 5, Informative

Do you mods ever stop to wonder if this guy could have been asking a legit question? Its possible he doesn't know. Also possible that others don't. I know...I know..., this is /. how could he not know right. It is still very possible. I'm not saying he should have been modded up, but by modding him down someone may miss the chance to read his post and reply to it with an intelligent answer. All of that being said. I would answer his question. But now that I think about, I'm not sure what it is. I 'think' I know. But, I think he and I are in the same boat. I also thought about posting this as an AC, but I won't. Then surley someone will just think that it was the original poster posting as an AC. He may be trolling. He may not be. It won't hurt to answer the question.
Doesn't actually calculate PageRank? by Vultan · 2003-06-12 06:31 · Score: 5, Informative

As best as I can tell from the website, the API is only for storing and interacting with a large graph. Nothing there is actually involved with PageRank. You could use this API presumably to write your own PageRank code, but to say "everybody can grok PageRank now!" is misleading at best.

Moreover, IANAL, but isn't the PageRank algorithm patented by Google? Wouldn't this prevent anyone from releasing GPL code that computes PageRank?
Re:Isn't that illegal? by Anonymous Coward · 2003-06-12 06:33 · Score: 2, Informative

Yes, and it's trademarked, too. Here's a bunch more info on PageRank.
By the way... by Sanity · 2003-06-12 06:36 · Score: 2, Informative

...it isn't on a fat pipe, so please understand if its slow.
Proof of concept only by Saganaga · 2003-06-12 06:38 · Score: 5, Informative

I think this project is really just a proof of concept. As another post pointed out, to make this really useful you'd need to regularly update your local data set, which isn't very practical for most people.

Also, if the downloadable dataset only covers a small portion of the web, how can this system's utility really compare to Google's?

That said, I think computer science proof-of-concept type project are very useful and serve a valuable purpose in getting the ideas out there for others to improve upon.
Explains it all... by Spokehedz · 2003-06-12 06:41 · Score: 2, Informative

It uses Slashdot as a root, of course. ;)

Seriously, I don't know. Here's a page on how Google works though.

http://www.google.com/technology/index.html
Re:What a mess by dpbsmith · 2003-06-12 06:57 · Score: 4, Informative

Just in case this wasn't an implied rhetorical question... the term, as far as I know, was invented by Robert Heinlein in his novel _Stranger in a Strange Land,_ where it is an expression used by Martians. It literally means "to drink," but the Martians use it to mean an understanding that is both very deep and very complete.

--
"How to Do Nothing," kids activities, back in print!
Ask and ye shall receive... by Theaetetus · 2003-06-12 07:13 · Score: 4, Informative

and I say "Dammit, where are all the pretty pictures."
Here (for free)
Here too (for free)
This one too (for free)
This one also (free)
And don't forget this classic ($30 poster)
-T
Re:What a mess by mbourgon · 2003-06-12 07:20 · Score: 2, Informative

Yes. Basically, "to share water with", which on Mars meant you were more than brothers. Considering how little water was/is on Mars, it was a great honor.

--
"Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
Re:Dumb Question: by biomass · 2003-06-12 07:21 · Score: 2, Informative

Page rank, to a first order of approximation, ranks your page by "popularity". Using a voting system,it counts the number of links to your page.

To a second order of approximation, it weights the votes of the referencing links by their popularity.

To a third order of approximation, it is a Markov chain that measures the long term likelihood of you arriving at a page, if you to randomly traverse the net: taking random links out of a pages and occasionally take (1/20?) random jumps to arbitrary urls.
Re:Which sites are the Root(s)? by warkda+rrior · 2003-06-12 07:49 · Score: 5, Informative

It is a graph, not a tree, so there is no one root. Maybe you are looking for the seed site, i.e. the first site added to the webgraph they construct. You can choose any site you prefer, although something well-connected is better. It seems to me that Yahoo! would be a good starting point.

--
You need to install an RTFM interface.
Re:can anyone explain what a web graph is? by lordbrain · 2003-06-12 08:11 · Score: 5, Informative

In a graph is made up of two things, edges and vertices.

In a web graph, vertices are webpages and edges are hyperlinks.

PageRank determines how many incoming edges a vertex has. Given the nature of the web, this is a nontrivial problem because a vertex only knows its outgoing edges.

The assumption for PageRank is that the more incoming edges a vertex has, the more popular it is. So you would use this to figure out how popular a particular vertex is.

Given this you could do like Google and combine it with a search engine to prioritize the results.

--

Thank you. Thank you. Please no applause; just throw money
Re:The major thing missing from Mozilla by Anonymous Coward · 2003-06-12 08:30 · Score: 3, Informative

http://googlebar.mozdev.org
Re:The major thing missing from Mozilla by wherley · 2003-06-12 10:03 · Score: 3, Informative

Tried it...but it provides no pagerank. They say:
"We currently have no plans to implement pagerank"

Still - a cool addition to mozilla.