Computing PageRank on your PC?
An anonymous reader writes "A group of CS researchers at the University of Milan has found a way to compress web graphs to 3 bits per link, and to access them in compressed form. They provide data sets representing real snapshots of portions of the web with one hundred million nodes and one billion links. You just need some bandwidth to download a few hundred megabytes of data, and you can compute PageRank with your PC. All the code involved is GPL'd, and the data are public: everybody can grok PageRank now!"
Is a way to look at Google's PageRank. That's the only real thing the IE Google toolbar has over the Mozilla alternative.
Now if I can just think of a reason why I would need this..
Everyone is entitled to their own opinion. It's just that yours is stupid.
What's Page Rank? Does this indicate how often my page is visited?
Xesdeeni
"Finally, proof!!"
If Google tweaks one thing, causing result 97 to shift to result 98, they notice. They'd be doing this daily to check on their pages.
When these Web Graph or PageRank things are drawn up, which sites do they use as the roots?
I mean they've got to start with some site(s) and then go through each link from there.
It's basically how well linked-to your page is, and how well linked-to the pages linking to you are, and so on. It's an advanced form of link popularity. The idea is that the more people who link to something, the more influential/important it is. Some sites have high PageRanks of 10 (like Google), while Slashdot is something like an 8. Many pages are in the 4-6 range. Every link you create is like a "vote" for another web page.
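To make the "vote" idea concrete, here is a minimal sketch of the usual power-iteration way of computing PageRank on a toy graph. The function name, the four-page graph, the damping factor of 0.85 and the iteration count are all assumptions for illustration; this is not the code from the Milan project and not necessarily what Google runs. It also returns raw scores that sum to 1, not the 0-10 figures the toolbar shows.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    out_links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not out_links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its rank equally among the pages it links to.
        for page, targets in out_links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nobody links to "d", three pages link to "c".
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Note that "a" ends up well ahead of "b" even though each has exactly one incoming link: the link to "a" comes from the best-ranked page, which is the "how well linked-to the pages linking to you are" part.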
"[...] even on a PC with as little as 256 Mbytes of RAM."
Somewhere in 1980, milk shoots out of Bill Gates' nose for no apparent reason.
"I'll be there in a minute! I'm downloading the Internet!"
Slashdotters are stupid and biased.
As best as I can tell from the website, the API is only for storing and interacting with a large graph. Nothing there is actually involved with PageRank. You could use this API presumably to write your own PageRank code, but to say "everybody can grok PageRank now!" is misleading at best.
Moreover, IANAL, but isn't the PageRank algorithm patented by Google? Wouldn't this prevent anyone from releasing GPL code that computes PageRank?
I think this project is really just a proof of concept. As another post pointed out, to make this really useful you'd need to regularly update your local data set, which isn't very practical for most people.
Also, if the downloadable dataset only covers a small portion of the web, how can this system's utility really compare to Google's?
That said, I think computer science proof-of-concept projects are very useful and serve a valuable purpose in getting the ideas out there for others to improve upon.
A graph is made up of two things: edges and vertices.
In a web graph, vertices are webpages and edges are hyperlinks.
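At its simplest, PageRank measures how many incoming edges a vertex has, with each edge counting for more when it comes from a vertex that is itself well linked. Given the nature of the web, this is a nontrivial problem because a vertex only knows its outgoing edges.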
The assumption for PageRank is that the more incoming edges a vertex has, the more popular it is. So you would use this to figure out how popular a particular vertex is.
Given this, you could do as Google does and combine it with a search engine to prioritize the results.
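The "a vertex only knows its outgoing edges" point can be sketched in a few lines: to find out who links to a page, you have to walk the whole graph and turn the outgoing-edge lists around. The function name and the toy graph below are made up for illustration, and this only counts incoming edges; it does not weight them the way a full PageRank computation would.

```python
from collections import defaultdict


def invert(out_links):
    """Build incoming-edge lists from outgoing-edge lists.

    Every edge of the graph has to be visited once, which is why
    this gets expensive at web scale.
    """
    in_links = defaultdict(list)
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].append(source)
    return dict(in_links)


# Hypothetical four-page web, stored the way a crawler sees it:
# each page lists only the pages it links to.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

incoming = invert(toy)
in_degree = {page: len(incoming.get(page, [])) for page in toy}
print(in_degree)
```

On a graph with a billion links the same pass still has to touch every edge, so being able to read the graph quickly in compressed form matters.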
Thank you. Thank you. Please no applause; just throw money