Describing The Web With Physics

Posted by timothy on Sunday August 5, 2001 @02:34PM from the like-a-large-enough-lever-mated-to-a-spring dept.

Fungii writes: "There is a fascinating article over on physicsweb.com about 'The physics of the Web.' It gets a little technical, but it is a really interesting subject, and is well worth a read." And if you missed it a few months ago, the IBM study describing "the bow tie theory" (and a surprisingly disconnected Web) makes a good companion piece. One odd note is the researchers' claim that the Web contains "nearly a billion documents," when one search engine alone claims to index more than a third beyond that, but I guess new and duplicate documents will always make such figures suspect.

4 of 133 comments

1,000,000,000 urls by grammar+nazi · 2001-08-05 14:50 · Score: 4, Insightful

The story mentions "nearly 10^9 urls", so duplicate documents would be counted multiple times.
Most of their research seems to be on 'static pages'. They state that the entire internet is connected via 16 links (similar to the way that people are connected by 5-6 acquaintances). I believe that as the ratio of dynamic to static content on the internet increases, the total number of clicks that it takes to get from one site to the next will increase too. For example, I could create a website that dynamically generates pages: the first 19 pages are all contained within my site, and the 20th time that the page is generated, it contains a link to google.
The metric functions that they use are good for randomly connected maps, but they don't apply to the internet, where nodes are not randomly connected. Nodes cluster into a group depending on topic or categories. For example, one Michael Jackson site links to other Michael Jackson websites.
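The dynamic-page scenario above can be sketched as a tiny link graph. This is only an illustration under stated assumptions: the page names (`page_1` .. `page_20`, `google`) and the helper `clicks_between` are made up for the example, and the "site" is modelled as a plain chain where each generated page links only to the next one.

```python
from collections import deque


def clicks_between(links, start, target):
    """Breadth-first search: fewest clicks from start to target, or None."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            return dist[page]
        for nxt in links.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return None  # target is not reachable from start


# Hypothetical site: page_1 .. page_20, each linking only to the next page;
# only page_20 carries the outbound link.
links = {f"page_{i}": [f"page_{i + 1}"] for i in range(1, 20)}
links["page_20"] = ["google"]

print(clicks_between(links, "page_1", "google"))  # 20
```

A single chain like this adds 20 clicks to any path that has to start inside it, which is the effect the comment is worried about. Whether it moves the Web-wide average depends on how many such chains exist, which the sketch does not model.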

--

Keeping /. free of grammatical errors for ~5 years.

Read the fscking article... by friode · 2001-08-05 15:18 · Score: 5, Interesting

One odd note is the researchers' claim that the Web contains "nearly a billion documents," when one search engine alone claims to index more than a third beyond that

Look deeper, grasshopper:

...This expression predicts typically that the shortest path between two pages selected at random among the 800 million nodes (i.e. documents) that made up the Web in 1999 is around 19 assuming that such a path exists...

...the typical number of clicks between two Web pages is about 19, despite the fact that there are now over one billion pages out there...
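The quoted passages do not reproduce "this expression" itself. Assuming it is the logarithmic fit published by Albert, Jeong and Barabási (average path length = 0.35 + 2.06 log10 N, with N the number of documents), the two figures can be checked with a few lines. The function name below is made up for the example.

```python
import math


def typical_clicks(n_documents):
    """Average shortest path under the ASSUMED fit d = 0.35 + 2.06 * log10(N)."""
    return 0.35 + 2.06 * math.log10(n_documents)


for n in (800_000_000, 1_000_000_000):
    print(f"{n:>13,d} documents -> about {typical_clicks(n):.2f} clicks")
```

Under that assumption, going from 800 million to one billion documents shifts the estimate by only about 0.2 clicks, so both figures round to 19. That is why the article can quote the same number for 1999 and for "now".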

Hey, Timothy, next time try reading the article instead of skimming it.

--
There may be many reasons not to kill you, but among them is not that you'll be missed by NASA - The Long Kiss Goodnight

IBM "bow tie" paper by mgarraha · 2001-08-05 16:00 · Score: 4, Interesting

In "Graph structure in the web," Kumar et al. divide 200 million web pages into four categories of roughly equal size:

The first piece is a central core, all of whose pages can reach one another along directed hyperlinks -- this "giant strongly connected component" (SCC) is at the heart of the web. The second and third pieces are called IN and OUT. IN consists of pages that can reach the SCC, but cannot be reached from it - possibly new sites that people have not yet discovered and linked to. OUT consists of pages that are accessible from the SCC, but do not link back to it, such as corporate websites that contain only internal links. Finally, the TENDRILS contain pages that cannot reach the SCC, and cannot be reached from the SCC.
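The four pieces described in that passage can be computed on a toy graph with nothing more than forward and backward reachability. This is a rough sketch, not the method used in the paper: the function names and the six-page example are invented, the core is found by brute force (fine for a handful of nodes, far too slow for 200 million), and everything that is neither SCC, IN, nor OUT is lumped together as TENDRILS, following the simplified definition quoted above.

```python
from collections import defaultdict, deque


def reachable(adj, start_nodes):
    """All nodes reachable from start_nodes by following edges in adj."""
    seen = set(start_nodes)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges):
    """Split a directed graph into SCC / IN / OUT / TENDRILS (toy version)."""
    fwd, rev = defaultdict(set), defaultdict(set)
    nodes = set()
    for src, dst in edges:
        fwd[src].add(dst)
        rev[dst].add(src)
        nodes.update((src, dst))

    # Largest strongly connected component: nodes that can both reach
    # and be reached from a given node.
    scc, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        component = reachable(fwd, [node]) & reachable(rev, [node])
        assigned |= component
        if len(component) > len(scc):
            scc = component

    in_part = reachable(rev, scc) - scc   # can reach the core
    out_part = reachable(fwd, scc) - scc  # reachable from the core
    tendrils = nodes - scc - in_part - out_part
    return {"SCC": scc, "IN": in_part, "OUT": out_part, "TENDRILS": tendrils}


# Hypothetical six-page web.
edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),  # core pages linking in a cycle
    ("new", "a"),                        # new site linking into the core
    ("c", "corp"),                       # corporate site with no links back
    ("new", "island"),                   # page linked only from the new site
]
parts = bow_tie(edges)
for name in ("SCC", "IN", "OUT", "TENDRILS"):
    print(name, sorted(parts[name]))
```

On this toy graph the core is a, b, c; "new" lands in IN, "corp" in OUT, and "island" in TENDRILS.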

So is your home page an innie or an outie?

Re:LAIN by Erasmus+Darwin · 2001-08-05 16:21 · Score: 4, Interesting

What will happen as the net becomes more and more like a brain? Can it have a soul?
Please don't take this the wrong way, but that's honestly the sort of question I'd expect from someone who doesn't understand computers.
While I believe in the possibility of machine intelligence (along with the moral, ethical, and most importantly philosophical questions that raises), the net is more of a data transfer mechanism than a processing mechanism. Short of very deliberate projects, such as SETI@Home, you just don't have your average machine on the net doing random computation. In that sense, the net really hasn't changed much since its inception.
Further, if you did have a distributed consciousness, what would the consequences of lag, network outages, and outright crashes be? In that sense, it would be interesting to see if random/semi-random/genetic algorithms are capable of generating an intelligence capable of coping with such noise. However, I think such issues would rapidly kill off something before it became "evolved" enough to cope.
If we do get an intelligence, I think it'll be something that happens on purpose. It may be distributed (maybe as a redundant, non-real-time simulation of a brain), but I doubt it'll be a spontaneous Skynet-like entity.