davepeck · Slashdot Mirror


User: davepeck

davepeck's activity in the archive.

Stories: 0
Comments: 2
First seen: 1999-09-08
Last seen: 1999-09-08

Comments · 2

Re:I fail to see how this could be useful. on Web: 19 Clicks Wide · 1999-09-08 06:24 · Score: 1

I agree that by itself, the average link distance between two pages (19) isn't a very useful number. However, there is definitely useful information in the article itself:

1. We are given a real-world (probabilistic) distribution of link distances between pages (i.e. given two randomly chosen pages, what is the probability that the shortest/longest link distance between them is X?)

2. From the visualizations, we can see that the web is a graph containing a number of densely connected components which are themselves only fairly loosely connected to one another, and that this behavior is fairly scale-independent.
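To make point 1 concrete, here is a minimal sketch of how such a distribution could be measured on a small directed link graph. The `toy_web` graph and both function names are made up for illustration, and the distribution only covers ordered pairs where the target is actually reachable; nothing here is taken from the article's own method.

```python
from collections import Counter, deque

def bfs_distances(graph, start):
    """Shortest link distance from start to every page reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def distance_distribution(graph):
    """Fraction of reachable ordered page pairs at each shortest distance."""
    counts = Counter()
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                counts[d] += 1
    total = sum(counts.values())
    return {d: n / total for d, n in sorted(counts.items())}

# Hypothetical four-page web; links are one-way.
toy_web = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}

distribution = distance_distribution(toy_web)
```

On a real crawl you would sample page pairs rather than run a search from every page, since the all-pairs version is far too expensive at Web scale.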

These two tidbits could lead to noticeably better Web crawlers. You could decide to stop following links once you've gone 25 deep, for example. You could try to determine on-the-fly whether more than one of your crawler processes is working on the same densely connected component of the Web and combine their efforts (or move one of the processes over to a new, uncharted component), thus effectively searching more of the Web. Using similar statistics for the distribution of in-link and out-link counts, you could improve crawler heuristics so that pages whose number of out-links deviates significantly from the mean are given more weight for future crawling.
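A rough sketch of two of those heuristics, run against an in-memory link table instead of the live Web: a depth cutoff, and a score for how far a page's out-link count sits from the mean. The `toy_links` data, the function names, and the use of absolute deviation as the weight are all assumptions made for the example.

```python
from collections import deque
from statistics import mean

def crawl(links, seed, max_depth):
    """Breadth-first crawl that stops following links past max_depth."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # cutoff: the page is recorded but not expanded
        for target in links.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def outlink_deviation(links):
    """Score each page by how far its out-link count is from the mean."""
    avg = mean(len(targets) for targets in links.values())
    return {page: abs(len(targets) - avg) for page, targets in links.items()}

# Hypothetical link table: page -> pages it links to.
toy_links = {
    "hub": ["a", "b", "c", "d"],
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
    "d": ["deep"],
    "deep": [],
}

visited = crawl(toy_links, "a", max_depth=2)
scores = outlink_deviation(toy_links)
```

A real crawler would feed the scores into a priority queue for the frontier; the cutoff of 25 mentioned above would just be the `max_depth` argument.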

Oh well, just some random thoughts.

Re:Example program on Web: 19 Clicks Wide · 1999-09-08 05:55 · Score: 1

I can imagine some cases in which a link-path between two pages would be useful. For example, if you are researching differential geometry and combinatorial topology, you might suspect that there is some connection between them. Unfortunately, a page containing a proof of the Gauss-Bonnet theorem -- the connection you're looking for -- probably doesn't contain both of the original search terms. A link path between a diff-geo and a comb-top page might work out better.

"Trailblazing" through link-space was a prime motivator of Bush's Memex vision: finding new paths between separate "pages" of information was the same as discovering new relationships between discrete pieces of knowledge. In fact, knowledge can be thought of as connections between previously unlinked sets of facts.

In all seriousness, finding a link-path between two separate pages is a thorny issue. First, you are dealing with a directed graph, and as the posts above point out, a link-path from A to B probably won't contain the same set of pages as a link-path from B to A. Then there is the issue of _which_ link paths are useful (and I believe there are some which could potentially be useful) and which aren't; this depends largely on the weighting you've placed on the web pages in question. Finally, there is the issue that you need link-structure information sitting around for a good chunk of the Web before something like this could actually work.
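The directed-graph point is easy to see in a small sketch. Assuming the link structure is already sitting in memory as a page-to-links table (the page names below are invented), a breadth-first search gives a shortest link-path, and the path in one direction need not resemble the path back.

```python
from collections import deque

def link_path(links, start, goal):
    """Shortest chain of links from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            # Walk the parent pointers back to the start page.
            path = []
            while page is not None:
                path.append(page)
                page = parent[page]
            return path[::-1]
        for target in links.get(page, ()):
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return None

# Hypothetical one-way links between four pages.
toy_links = {
    "diff-geo": ["gauss-bonnet"],
    "gauss-bonnet": ["comb-top"],
    "comb-top": ["index"],
    "index": ["diff-geo"],
}

forward = link_path(toy_links, "diff-geo", "comb-top")
backward = link_path(toy_links, "comb-top", "diff-geo")
```

Here the forward path runs through the "gauss-bonnet" page while the backward path runs through "index" instead. This plain search treats every link equally; deciding which paths are actually useful would need the page weighting discussed above.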

But it would be neat!