Computing PageRank on your PC?
An anonymous reader writes "A group of CS researchers at the University of Milan has found a way to compress web graphs to about 3 bits per link, and to access them in compressed form. They provide data sets representing real snapshots of portions of the web with 100 million nodes and 1 billion links. You just need some bandwidth to download a few hundred megabytes of data, and you can compute PageRank on your PC. All the code involved is GPL'd, and the data are public: everybody can grok PageRank now!"
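The submission doesn't say how the 3-bits-per-link figure is reached. As a rough illustration only, here is a minimal sketch of one common ingredient of web-graph compression, assuming each page's sorted list of outgoing links is stored as gaps between consecutive page IDs and each gap is written with an Elias gamma code, so that small gaps (links to nearby IDs) cost only a few bits. This is not the Milan group's actual format, and all function names here are hypothetical.

```python
from __future__ import annotations


def elias_gamma(n: int) -> str:
    """Elias gamma code of n >= 1: (len - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def read_gamma(bits: str, pos: int) -> tuple[int, int]:
    """Decode one gamma code starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos] == "0":
        zeros += 1
        pos += 1
    return int(bits[pos:pos + zeros + 1], 2), pos + zeros + 1


def zigzag(x: int) -> int:
    """Map ..., -2, -1, 0, 1, 2, ... to ..., 3, 1, 0, 2, 4, ... so signs can be coded."""
    return 2 * x if x >= 0 else -2 * x - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode_successors(node: int, successors: list[int]) -> str:
    """Gap-encode a page's outgoing links as one bit string."""
    succ = sorted(set(successors))
    parts = [elias_gamma(len(succ) + 1)]  # out-degree, +1 so that 0 is codable
    for i, s in enumerate(succ):
        if i == 0:
            # first link is relative to the page's own ID and may be negative
            parts.append(elias_gamma(zigzag(s - node) + 1))
        else:
            # later gaps are >= 1 because the list is strictly increasing
            parts.append(elias_gamma(s - succ[i - 1]))
    return "".join(parts)


def decode_successors(node: int, bits: str) -> list[int]:
    """Recover the sorted outgoing links written by encode_successors()."""
    degree, pos = read_gamma(bits, 0)
    out: list[int] = []
    for i in range(degree - 1):
        value, pos = read_gamma(bits, pos)
        out.append(node + unzigzag(value - 1) if i == 0 else out[-1] + value)
    return out


links = [13, 15, 16, 17, 18, 19, 23, 24, 203]
bits = encode_successors(15, links)
print(len(bits), "bits for", len(links), "links")
assert decode_successors(15, bits) == links
```

For this toy list it prints "40 bits for 9 links", roughly 4.4 bits per link, with most of the cost in the one long jump to 203. The real format reportedly adds further tricks, such as copying runs from a similar nearby list, to get lower.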
It's basically a measure of how well linked your page is, how well linked the pages linking to you are, and so on. It's an advanced form of link popularity. The idea is that the more people link to something, the more influential or important it is. On the 0-10 scale shown in the Google Toolbar, some sites have a PageRank of 10 (like Google), while Slashdot is something like an 8. Many pages are in the 4-6 range. Every link you create is like a "vote" for another web page.
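The recursive "and so on" can be written as one equation. This is a commonly used normalized form, given here as a sketch: $N$ is the number of pages, $B(p)$ the set of pages linking to $p$, $L(q)$ the number of outgoing links on page $q$, and $d$ a damping factor (0.85 in Brin and Page's original paper, which writes the first term as $1-d$ without dividing by $N$). The 0-10 toolbar figure is generally understood to be a coarse rescaling of this kind of value, not the raw number.

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
```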
Do you mods ever stop to wonder if this guy could have been asking a legitimate question? It's possible he doesn't know. It's also possible that others don't. I know, I know, this is /., how could he not know, right? It is still very possible. I'm not saying he should have been modded up, but by modding him down, someone may miss the chance to read his post and reply to it with an intelligent answer. All of that being said, I would answer his question, but now that I think about it, I'm not sure what it is. I 'think' I know, but I think he and I are in the same boat.
I also thought about posting this as an AC, but I won't; then surely someone would just think it was the original poster posting as an AC. He may be trolling. He may not be. It won't hurt to answer the question.
As best as I can tell from the website, the API is only for storing and interacting with a large graph. Nothing there is actually involved with PageRank. You could use this API presumably to write your own PageRank code, but to say "everybody can grok PageRank now!" is misleading at best.
Moreover, IANAL, but isn't the PageRank algorithm patented by Google? Wouldn't this prevent anyone from releasing GPL code that computes PageRank?
Yes, and it's trademarked, too. Here's a bunch more info on PageRank.
...it isn't on a fat pipe, so please understand if it's slow.
I think this project is really just a proof of concept. As another post pointed out, to make this really useful you'd need to regularly update your local data set, which isn't very practical for most people.
Also, if the downloadable dataset only covers a small portion of the web, how can this system's utility really compare to Google's?
That said, I think computer science proof-of-concept projects are very useful and serve a valuable purpose in getting the ideas out there for others to improve upon.
It uses Slashdot as a root, of course. ;)
Seriously, I don't know. Here's a page on how Google works though.
http://www.google.com/technology/index.html
Just in case this wasn't an implied rhetorical question... the term, as far as I know, was invented by Robert Heinlein in his novel _Stranger in a Strange Land,_ where it is an expression used by Martians. It literally means "to drink," but the Martians use it to mean an understanding that is both very deep and very complete.
-T
Yes. Basically, "to share water with", which on Mars meant you were more than brothers. Considering how little water was/is on Mars, it was a great honor.
PageRank, to a first order of approximation, ranks your page by "popularity". Using a voting system, it counts the number of links to your page.
To a second order of approximation, it weights the votes of the referencing links by their popularity.
To a third order of approximation, it is a Markov chain that measures the long-term likelihood of arriving at a page if you were to randomly traverse the net: following random links out of a page and occasionally (1/20 of the time?) jumping to an arbitrary URL. (Brin and Page's original paper suggests a damping factor of 0.85, i.e. a random jump about 15% of the time.)
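The three approximations above can be sketched on a toy graph. This is a minimal illustration, not Google's implementation: the graph is a hypothetical dict mapping each page to its outgoing links, the jump probability defaults to 0.15 (the commonly cited 0.85 damping factor) but is just a parameter, and pages with no outgoing links are assumed to send the surfer to a uniformly random page.

```python
from __future__ import annotations


def in_degree_votes(graph: dict[str, list[str]]) -> dict[str, int]:
    """First-order view: count the distinct pages linking to each page."""
    votes = {page: 0 for page in graph}
    for outlinks in graph.values():
        for target in set(outlinks):  # one vote per linking page
            votes[target] = votes.get(target, 0) + 1
    return votes


def pagerank(graph: dict[str, list[str]], jump_prob: float = 0.15,
             iterations: int = 200, tol: float = 1e-12) -> dict[str, float]:
    """Third-order view: power iteration on the random-surfer Markov chain.

    With probability 1 - jump_prob the surfer follows a random outlink;
    otherwise (or when the page has no outlinks) it jumps to a page chosen
    uniformly at random. The result is the long-run share of time per page.
    """
    pages = set(graph)
    for outlinks in graph.values():
        pages.update(outlinks)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iterations):
        new = {p: jump_prob / n for p in pages}  # random-jump share
        dangling = 0.0  # rank held by pages with no outlinks
        for p in pages:
            outlinks = set(graph.get(p, ()))
            if outlinks:
                # second-order idea: each vote is weighted by the voter's rank
                share = (1 - jump_prob) * rank[p] / len(outlinks)
                for q in outlinks:
                    new[q] += share
            else:
                dangling += rank[p]
        for p in pages:  # spread the dangling rank uniformly
            new[p] += (1 - jump_prob) * dangling / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(in_degree_votes(web))  # {'a': 1, 'b': 1, 'c': 3, 'd': 0}
print(sorted(pagerank(web).items()))
```

Page "c" gets the most votes and also the highest rank, while "a" (one vote, but from the top-ranked page) ends up well above "b" (one vote from a weaker page), which is exactly the second-order effect.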
It is a graph, not a tree, so there is no one root. Maybe you are looking for the seed site, i.e. the first site added to the webgraph they construct. You can choose any site you prefer, although something well-connected is better. It seems to me that Yahoo! would be a good starting point.
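To illustrate the seed idea, here is a minimal sketch, assuming a hypothetical in-memory dict of outgoing links: a breadth-first crawl starting at one seed page. Because the web is a graph, the crawl must track pages it has already seen (links can loop back), and it only ever reaches pages that are reachable by following links from the seed.

```python
from collections import deque


def reachable_from(seed: str, graph: dict) -> list:
    """Breadth-first crawl from a seed page; returns pages in discovery order."""
    seen = {seed}
    order = []
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in graph.get(page, []):
            if target not in seen:  # cycles are normal in a web graph
                seen.add(target)
                queue.append(target)
    return order


toy_web = {"seed": ["x", "y"], "x": ["y", "seed"], "y": ["z"], "z": ["x"], "w": ["seed"]}
print(reachable_from("seed", toy_web))  # ['seed', 'x', 'y', 'z']
```

Page "w" links to the seed but nothing links to "w", so this crawl never finds it; that is one reason a well-connected seed matters.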
You need to install an RTFM interface.
A graph is made up of two things: edges and vertices.
In a web graph, vertices are webpages and edges are hyperlinks.
At its core, PageRank looks at how many incoming edges a vertex has. Given the nature of the web, this is a nontrivial problem, because a vertex only knows its outgoing edges.
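The way around "a vertex only knows its outgoing edges" is to look at the whole graph at once and invert it. A minimal sketch, assuming the graph fits in memory as a hypothetical dict of outgoing links (at web scale you would do the same pass over compressed or externally sorted data):

```python
from __future__ import annotations


def incoming_links(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert outgoing-link lists into incoming-link lists in one pass."""
    incoming: dict[str, list[str]] = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            # every edge source -> target becomes an entry in target's list
            incoming.setdefault(target, []).append(source)
    return incoming


out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(incoming_links(out_links))  # {'a': ['c'], 'b': ['a'], 'c': ['a', 'b']}
```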
The assumption for PageRank is that the more incoming edges a vertex has, the more popular it is. So you would use this to figure out how popular a particular vertex is.
Given this, you could do as Google does and combine it with a search engine to prioritize the results.
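A toy version of that combination, as a sketch only: the documents and PageRank scores below are made-up values, and matching is naive whole-word lookup. A real engine would presumably blend text relevance with PageRank rather than sort by PageRank alone.

```python
def search(query: str, documents: dict, rank: dict) -> list:
    """Return pages containing every query word, highest PageRank first."""
    words = query.lower().split()
    hits = [page for page, text in documents.items()
            if all(w in text.lower().split() for w in words)]
    return sorted(hits, key=lambda page: rank.get(page, 0.0), reverse=True)


docs = {"a": "graph compression in milan",
        "b": "compute pagerank on your pc",
        "c": "pagerank over a compressed graph"}
scores = {"a": 0.37, "b": 0.20, "c": 0.39}  # hypothetical PageRank values
print(search("pagerank", docs, scores))  # ['c', 'b']
```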
Thank you. Thank you. Please no applause; just throw money
http://googlebar.mozdev.org
Tried it, but it provides no PageRank. They say:
"We currently have no plans to implement pagerank"
Still, a cool addition to Mozilla.