Bow Tie Theory: Researchers Map The Web
Paula Wirth, Web Tinker writes "Scientists from IBM Research, Altavista and Compaq collaborated to conduct the most intensive research study of the Web. The result is the development of the "Bow Tie" Theory. One of the initial discoveries of this ongoing study shatters the number one myth about the Web ... in truth, the Web is less connected than previously thought.
You can read more about it."
Given the criteria they picked, there have to be four groups. The binary-valued criteria are "has links to it" and "has links from it". There are then four possible combinations. All four exist in practice, which is to be expected. Big deal.
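To make the two-criteria reading concrete, here is a minimal Python sketch. It assumes the grouping really is just "has inbound links" crossed with "has outbound links", which is this comment's reading and not necessarily how the study defines its groups. The function name classify_by_links and the toy pages are made up for illustration.

```python
from itertools import product


def classify_by_links(pages, links):
    """Bucket pages by two yes/no criteria: has inbound links, has outbound links.

    `pages` is an iterable of page names and `links` is a list of
    (source, target) pairs. Both are hypothetical toy inputs, not crawl data.
    """
    has_in = {target for _, target in links}
    has_out = {source for source, _ in links}
    # Two binary criteria give exactly 2 x 2 = 4 possible buckets.
    groups = {combo: set() for combo in product((False, True), repeat=2)}
    for page in pages:
        groups[(page in has_in, page in has_out)].add(page)
    return groups


toy_pages = ["a", "b", "c", "d"]
toy_links = [("a", "b"), ("b", "c")]
for (inbound, outbound), members in classify_by_links(toy_pages, toy_links).items():
    print(f"inbound={inbound!s:5} outbound={outbound!s:5} -> {sorted(members)}")
```

Keys are (has_inbound, has_outbound) pairs, so the dictionary always has four entries, whether or not any page falls into each one.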
Look at the money going into streaming media. A large segment of the business world still sees the internet as just another medium for TV or radio broadcasting. By its very nature, broadcasting is not interconnected; it's passive and linear.
Tim Berners-Lee wrote in his book, Weaving the Web that the main obstacle to the web being a true information web of shared knowledge is that content is controlled by too few. He was upset that browsers were developed which could not edit web pages like his original browser/editor.
The silver lining to this, IMHO, is the "weblog" phenomenon, including sites like Slashdot, where ordinary users can contribute their ideas, especially in html format so that they can contribute links. I really believe that some day soon the conventional media sites will be forced to give this kind of capability to their readers, or else risk losing all those eyeballs to Slash-like sites.
"What I cannot create, I do not understand."
All this tells me is that developers are selective in what they link to. Some tend to get together and link to each other. Some tend to link only to themselves. Some want to be noticed so they provide lots of links, but aren't truly interesting, so nobody links to them.
This makes complete sense. If every page had links to every other page, you would never be able to find anything. Each page would have too many links. The way the web is developing, you start looking for info within the IN group (usually a search engine or someone's index page). This leads to the SCC, which eventually points you to a leaf node in the OUT group which has the truly interesting information.
I find this structure to be efficient and elegant.
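The IN -> SCC -> OUT path described above can be sketched with two breadth-first searches from a page assumed to sit in the core: one following links forward, one following them backward. This is only a rough illustration on a made-up graph. The names bow_tie and reachable, the seed page, and the toy links are all hypothetical, and it is not a claim about how the researchers ran their crawl.

```python
from collections import defaultdict, deque


def reachable(start, adjacency):
    """Return every node reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(links, seed):
    """Split pages into SCC / IN / OUT / OTHER relative to `seed`.

    Assumes `seed` lies in the core; the result is only meaningful if it does.
    """
    forward = defaultdict(set)
    backward = defaultdict(set)
    pages = set()
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)
        pages.update((src, dst))
    downstream = reachable(seed, forward)   # pages the seed can reach
    upstream = reachable(seed, backward)    # pages that can reach the seed
    core = downstream & upstream
    return {
        "SCC": core,
        "IN": upstream - core,
        "OUT": downstream - core,
        "OTHER": pages - upstream - downstream,
    }


toy_links = [
    ("search", "hub1"), ("hub1", "hub2"), ("hub2", "hub1"),
    ("hub2", "leaf"), ("island", "nowhere"),
]
for name, members in bow_tie(toy_links, seed="hub1").items():
    print(name, sorted(members))
```

Everything that neither reaches the core nor is reached from it is lumped into OTHER here.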
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
The study does not mention the impact of secure sites or, for that matter, any site that the search engine couldn't crawl. For instance, what about PHP, ASP or other script-generated sites? Do these show up as "outs" or strongly connected or dead links? I have not seen anything in the paper addressing this. I have to imagine there are a fairly large number of scripted pages and secure websites. What about ad-click links? Do they make the web appear more connected just because ads appear on an otherwise dead-end page? There may be reasons to question the validity of any such research done using web-bots, since the nature of web content has changed rapidly over the last few years.
I can't stand developing for 3 different browsers on 4 different platforms, 12 screen resolutions, 3 color depths...
Then don't! That's one thing many "web authors" still don't get... The WWW is a text-oriented medium. It's a page of text that has links to other pages of text. Everything else is just cruft.
HTML doesn't define how a web site should look to the pixel, and this is one of its strong points. It's up to the user to decide how to view a site. If the user doesn't want images, your site should look just fine without them.
The minute you start checking to make sure your site looks the same on all browsers, you should re-think your entire site. Why do you want it to look the same on all browsers (it won't by the way...)? This usually indicates that you are focusing too much on presentation and not enough on content.
The web is broke. We're not using it properly
I agree with your second statement. The web isn't broke... people just aren't using it properly. There are so many corporate sites that look like brochures. It's sickening. My previous job was to set up a web page for a small business, and all they wanted me to do was scan each page of their brochure into GIFs, put them up on the web, and put "forward" and "backward" buttons on the bottom to navigate between pages. I said, WTF!?!? The concept of actually including text information and links to other resources was totally absurd to my boss.
These kinds of people think of the web only as a marketing tool, and thus can't take advantage of the power it has to offer.
Our analysis reveals an interesting picture (Figure 9) of the web's macroscopic structure. Most (over 90%) of the approximately 203 million nodes in our crawl form a single connected component if hyperlinks are treated as undirected edges. This connected web breaks naturally into four pieces. The first piece is a central core, all of whose pages can reach one another along directed hyperlinks -- this "giant strongly connected component" (SCC) is at the heart of the web.
In graph theory, a strongly connected component is an equivalence class of vertices under mutual reachability, i.e. a maximal group in which every vertex is reachable from every other vertex along directed edges.
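For anyone who wants to see that definition in action, below is a small Python sketch that finds strongly connected components with Kosaraju's two-pass approach: a depth-first pass on the graph, then a sweep over the reversed graph. It is a textbook technique picked for illustration, not something taken from the paper. The function name and the four-page toy graph are made up, and nothing here is tuned for a 203-million-node crawl.

```python
from collections import defaultdict


def strongly_connected_components(links):
    """Return a list of SCCs (as sets) for a directed graph given as (src, dst) pairs."""
    forward = defaultdict(list)
    backward = defaultdict(list)
    nodes = set()
    for src, dst in links:
        forward[src].append(dst)
        backward[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: iterative DFS on the forward graph, recording finishing order.
    order = []
    visited = set()
    for root in sorted(nodes):  # sorted only to make the toy output stable
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # No unvisited neighbours left: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph, latest-finished node first.
    components = []
    assigned = set()
    for root in reversed(order):
        if root in assigned:
            continue
        component = {root}
        assigned.add(root)
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


toy_links = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
for component in strongly_connected_components(toy_links):
    print(sorted(component))
```

In the toy graph a, b and c can all reach one another, so they form one component. Page d is linked to but links nowhere, so it ends up alone.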
What's interesting is that the four groups mentioned in this article are all approximately the same size, with the SCC group being only slightly larger than the others, which are:
So what they're saying is that really only about a quarter of the internet is the core that is strongly connected to the rest of it. Which is interesting in itself, because I'd have thought it was a lot higher.
I'm a web developer. I've always loved the potential of the web until recently. Now I don't like working with it. I can't stand developing for 3 different browsers on 4 different platforms, 12 screen resolutions, 3 color depths, and design templates that came from a print artist who thinks that the web is one big brochure.
The web is broke. We're not using it properly, there are too many poorly done corporate sites, contributing to insecurity, poor usability and incompatibility.
Many clients we work with are dead set against sending anyone away from their site. I don't think they realize that links are what the web is made of. This contributes to the unreachable part of the bowtie. These corporate folk are afraid that by linking away from the site, they will lose a viewer, and that user won't find their way back. They don't realize that the web is a pull technology, and that if the user was looking for certain information, the user will come back if it is the best source of such info. The back button is one of the browser's most used features.
We need more of these research projects to help us figure out what needs to be changed. The W3C is a start, but it's expensive to join and it's rare that you find a website that conforms to the standards. In fact, I've run into web developers who have never HEARD of the w3c.
The web is a new, completely different medium. It's not a CDROM, it's not a brochure, it's not TV. We can't keep treating it like these other media.