Computing PageRank on your PC?

Posted by CmdrTaco on Thursday June 12, 2003 @06:20AM from the for-the-curious-hacker dept.

An anonymous reader writes "A group of CS researchers of the University of Milan has found a way to compress web graphs at 3 bits per link, and to access them in compressed form. They provide data sets representing real snapshots of portions of the web with one hundred million nodes and 1 billion links. You just need some bandwidth to download a few hundred megabytes of data, and you can compute PageRank with your PC. All the code involved is GPL'd, and the data are public: everybody can grok PageRank now!"
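
For readers wondering what "computing PageRank" involves once the link graph is on disk, here is a minimal sketch of the standard power-iteration formulation. It is not the Milan group's code and ignores their compressed representation entirely; the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [successor, ...]}.

    `links`, `damping` and `iterations` are illustrative names and
    defaults, not taken from the story or from any particular library.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Every page splits its damped rank equally among its out-links.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy four-page graph: "c" is linked from three pages, "d" from none.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
```

On the toy graph the most linked-to page ends up with the highest score and the page nobody links to with the lowest. A graph with a billion links would need the successor lists streamed from a compact on-disk format rather than held in Python dictionaries.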

The major thing missing from Mozilla by Anonymous Coward · 2003-06-12 06:22 · Score: 5, Interesting

Is a way to look at Google's pagerank. That's the only real thing the IE Google toolbar has over the Mozilla alternative.
1. Re:The major thing missing from Mozilla by JamesDotCom · 2003-06-12 14:01 · Score: 2, Interesting
  
  The problem is that the Google toolbar's checksum changes constantly. So even if you found out exactly how the Google toolbar handles PageRank, all it takes is for Google's official toolbar to change it, and it won't work anymore. The catch, however, is that if you send a wrong checksum to Google, they don't send back an error message of any sort; instead they send back a fake PageRank. So you wouldn't really know whether it was still working or not.
Dumb Question: by Xesdeeni · 2003-06-12 06:23 · Score: 5, Interesting

What's Page Rank? Does this indicate how often my page is visited?

Xesdeeni
Some webmasters/SEO's are obsessive by Anonymous Coward · 2003-06-12 06:25 · Score: 5, Interesting

If Google tweaks one thing, causing result 97 to shift to result 98, they notice. They'd be doing this daily to check on their pages.
Which sites are the Root(s)? by amembleton · 2003-06-12 06:26 · Score: 5, Interesting

When these Web Graph or Page Rank things are drawn up which sites do they use as the roots?

I mean they've got to start with some site(s) and then go through each link from there.
1. Re:Which sites are the Root(s)? by menscher · 2003-06-12 16:44 · Score: 2, Interesting
  
  Google starts their webcrawl with the Stanford University home page. (Info based on a talk given by Craig Silverstein, the director of technology at Google.)
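
The "start with some site(s) and go through each link" idea from this thread can be sketched as a breadth-first traversal from a set of seed pages. This is only an illustration over an in-memory dictionary; the function name `crawl_order`, the toy link table, and the page names are hypothetical, and a real crawler would fetch pages over the network, respect robots.txt, and so on.

```python
from collections import deque


def crawl_order(links, seeds):
    """Return pages in the order a breadth-first crawl from `seeds` reaches them.

    `links` maps a page to the pages it links to; both names are
    illustrative, not part of any real crawler's API.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(unique_seeds)
    queue = deque(unique_seeds)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Toy web: "island" links in to the rest, but nothing links to it.
toy_web = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "seed"],
    "c": [],
    "island": ["seed"],
}
reached = crawl_order(toy_web, ["seed"])
```

Pages that no reachable page links to (like "island" here) never show up, which is why the choice of roots matters for what ends up in a web graph snapshot.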
beyond PageRank... by rfischer · 2003-06-12 06:26 · Score: 3, Interesting

... I would be interested in how the links change over time. Maybe take a new snapshot every day or week, see the web evolve.
1. Re:beyond PageRank... by big_gibbon · 2003-06-12 07:23 · Score: 2, Interesting
  
  That would be amazingly cool. The only problem (and it's not really a problem) would be that generally people never, or rarely, remove links. If you limited this to links only (say) a month old or younger, you could see the paths of memes round the web . . . for example right now, you'd probably see a lot of BitTorrent hotspots, whereas a couple of years ago there'd be lots focussed on "all your base" . . .
  
  Anyone got a lot of processing power and some spare time? ;)
  
  P
Google with feedback by Sanity · 2003-06-12 06:29 · Score: 3, Interesting

Doesn't Google have a patent on PageRank?
Anyway, forgive the opportunism, but this is reasonably on-topic. Last weekend I set myself the ambitious task of improving on Google. I came up with a Google front-end which allows you to give feedback on the quality of search results, and thus refine your search. I could really use people's help to test it out - you can find it here. Feedback would really be appreciated.
Google patents? by PaulBu · 2003-06-12 06:31 · Score: 4, Interesting

All the code involved is GPL'd, and the data are public: everybody can grok PageRank now!

GPL'd? Hmm, I thought that Google did patent the PageRank algorithm (correct me if I am wrong), so re-implementing THEIR algorithm even more efficiently would be incompatible with GPL. OTOH, if it is not THEIR algorithm, it cannot be called 'PageRank'.
Oh, the evils of software patents...
Paul B.
1. Re:Google patents? by JoeBuck · 2003-06-12 06:37 · Score: 3, Interesting
  
  Google hasn't exactly patented the algorithm for all uses, and no court has determined that the code infringes the patent, and software patents aren't valid in most countries, so it's not clear whether or not there is any compatibility.
  It would seem that anyone who uses the code to build a search engine would be infringing, but even that is something that lawyers can argue about.
I wonder... by crashnbur · 2003-06-12 06:58 · Score: 4, Interesting

...how this can be used to discover the percentage of broken links on the web at any given moment in time.
Besides - not all browsers show page rank by kiddailey · 2003-06-12 07:13 · Score: 2, Interesting

And let's not forget... not all of us even get exposed to page rank regularly.

On my Mac for example, I can't see it at all. On my Wintel I can, thanks to the Google toolbar.
Could I someday use it for my PC - Re:This soun by leoaugust · 2003-06-12 07:55 · Score: 2, Interesting

I wonder if I can use pagerank algorithm for the smaller universe of my harddrive itself?
I have over 6,000 files on my PC, many of which link to each other, and I am adding more links between them as time goes by. The collection is now so big that I can't even revisit my own files and reason out the implications of the links between pages, because of the huge time it would take to spend even a minute on each saved file.
I wonder if something like PageRank will let the important files that are linked by many others on my PC rise "up" like the cream, so to say, so that I can avoid having to use keywords and categories to wade through all the clutter on my harddrive ...
Any other ideas of how to study the relationship between my 6,000+ files?
I also have quite a few articles, e.g. news items saved from the web itself. I wonder if the pagerank of google for those saved articles could also help me flush out the important "external" articles on my harddrive itself.

--
To see a world in a grain of sand, and then to step back and see the beach where the sand lies ...
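
One way to approach the "PageRank for my harddrive" question above is to first build the link graph of the local files and count in-links, which is the crude signal PageRank refines. The sketch below assumes the files are HTML pages under one directory that link to each other with relative `href`s; the names `local_link_graph` and `inlink_counts` are hypothetical, and other file types would need their own link extraction.

```python
from html.parser import HTMLParser
from pathlib import Path


class _HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_link_graph(root):
    """Map each .html file under `root` to the local .html files it links to."""
    root = Path(root).resolve()
    pages = {p.resolve() for p in root.rglob("*.html")}
    graph = {}
    for page in sorted(pages):
        parser = _HrefCollector()
        parser.feed(page.read_text(encoding="utf-8", errors="replace"))
        targets = set()
        for href in parser.hrefs:
            # Skip external URLs and mail links; strip fragments and queries.
            if "://" in href or href.startswith("mailto:"):
                continue
            path_part = href.split("#", 1)[0].split("?", 1)[0]
            if not path_part:
                continue
            target = (page.parent / path_part).resolve()
            # Keep only links to other files that actually exist under root.
            if target in pages and target != page:
                targets.add(target.relative_to(root).as_posix())
        graph[page.relative_to(root).as_posix()] = sorted(targets)
    return graph


def inlink_counts(graph):
    """Count how many distinct local pages link to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts
```

Calling `inlink_counts(local_link_graph("/path/to/notes"))` (the path is a placeholder) gives a first ranking; the same dictionary could then be fed to any PageRank implementation that accepts a node-to-successors mapping.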
Googling your harddisk by |>>? · 2003-06-12 09:52 · Score: 2, Interesting

While calculating PageRank seems like a nice idea, I'm much more interested in having a google search available over my harddisk. I recall that AltaVista in the mid-90's had a programme that created an index over your whole disk - it dealt with many filetypes including .doc, .pdf, .mbox and basically gave you an AltaVista search over all your harddisk content.
Anyone know of anything like that?

--
|>>? ..EBCDIC for Onno..
Re:why is rank/rating necessary? by baka_boy · 2003-06-12 10:28 · Score: 4, Interesting

I really shouldn't rise to this bait, but I can't resist: yes, given the choice between those networks, I would choose PBS. Just as I would take a non-profit-driven Internet, public radio over Clear Channel and its ilk, and community mesh wireless networks over 3G mobile phone service.

Google has been, so far at least, a rare exception in the world of privatized communications utilities, by consistently showing an amazing lack of intention to lock people into their service, using either exclusivity agreements of some sort or the simple expedient of proprietary technology (i.e., "increase your PageRank by 10% if you support new encrypted GoogleML tags on your site!"). Nothing is permanent, though, and as we all know, single points of failure are a no-no.

So, to bring all this back somewhere in the general neighborhood of the main story: further distributing the capability to build "mini-Googles", or specialized, community-maintained (but still fairly large-scale in terms of number of pages and links indexed) search tools is very interesting, and a useful body of technology to perpetuate.

Or, even more generally, the technology needed to do large-scale storage, analysis, and manipulation of directed graph structures is a very useful tool. Software analysis often relies heavily on large graphs showing dependencies, caller-callee relationships, variable accesses, etc., as do any number of AI subdomains like knowledge representation and planning systems.