Slashdot Mirror


Bow Tie Theory: Researchers Map The Web

Paula Wirth, Web Tinker writes "Scientists from IBM Research, Altavista and Compaq collaborated to conduct the most intensive research study of the Web. The result is the development of the "Bow Tie" Theory. One of the initial discoveries of this ongoing study shatters the number one myth about the Web ... in truth, the Web is less connected than previously thought. You can read more about it "

13 of 133 comments

  1. Of COURSE there are four groups by Animats · · Score: 4

    Given the criteria they picked, there have to be four groups. The binary-valued criteria are "has links to it" and "has links from it". There are then four possible combinations. All four exist in practice, which is to be expected. Big deal.
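
    The counting argument above can be sketched in a few lines. This is only an illustration of the comment's reading (direct links in, direct links out); the study itself appears to define its groups by reachability to the core rather than by direct links. The function name and the toy page names are made up for the example.

```python
from collections import defaultdict

def classify_by_links(pages, edges):
    """Group pages by two yes/no criteria: (has links to it, has links from it).

    pages: iterable of page names (hypothetical identifiers)
    edges: iterable of (source, target) hyperlinks
    """
    has_in, has_out = set(), set()
    for src, dst in edges:
        has_out.add(src)   # src has a link from it
        has_in.add(dst)    # dst has a link to it
    groups = defaultdict(set)
    for page in pages:
        groups[(page in has_in, page in has_out)].add(page)
    return dict(groups)

# Toy graph: a -> b -> c, with d linked to nothing.
groups = classify_by_links(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
for key in sorted(groups):
    print(key, sorted(groups[key]))
```

    Two boolean criteria can never yield more than four buckets, which is the whole point of the comment.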

  2. Alternate Pasta-Based Web Theories by joshv · · Score: 3

    This was shortly followed by announcements from the W3C of the 'Angel-Hair', 'Fusilli', and 'Linguine' web theories.

    -josh

  3. This is no surprise by ballestra · · Score: 5
    Slashdotters, and especially Everything noders, are good at including relevant links in their posts, and presumably on their own pages. The problem is that most of the content being created for the web is written the same way as traditional magazine or newspaper copy. It's the old 90/10 rule: 90% of the eyeballs are viewing 10% of the available content, and that 10% is generally on commercial sites one or two clicks away from the Yahoo, Netscape, MSN, or AOL main pages.

    Look at the money going into streaming media. A large segment of the business world still sees the internet as just another medium for TV or radio broadcasting. By its very nature broadcasting is not interconnected; it's passive and linear.

    Tim Berners-Lee wrote in his book, Weaving the Web, that the main obstacle to the web being a true information web of shared knowledge is that content is controlled by too few. He was upset that browsers were developed which, unlike his original browser/editor, could not edit web pages.

    The silver lining to this, IMHO, is the "weblog" phenomenon, including sites like Slashdot, where ordinary users can contribute their ideas, especially in html format so that they can contribute links. I really believe that some day soon the conventional media sites will be forced to give this kind of capability to their readers, or else risk losing all those eyeballs to Slash-like sites.

    "What I cannot create, I do not understand."

  4. How useful is this? by Shotgun · · Score: 4

    All this tells me is that developers are selective in what they link to. Some tend to get together and link to each other. Some tend to link only to themselves. Some want to be noticed so they provide lots of links, but aren't truly interesting, so nobody links to them.

    This makes complete sense. If every page had links to every other page, you would never be able to find anything. Each page would have too many links. The way the web is developing, you start looking for info within the IN group (usually a search engine or someone's index page). This leads to the SCC, which eventually points you to a leaf node in the OUT group which has the truly interesting information.

    I find this structure to be efficient and elegant.

    --
    Aah, change is good. -- Rafiki
    Yeah, but it ain't easy. -- Simba
  5. Impact of secure sites and generated pages? by BoLean · · Score: 4

    The study does not mention the impact of secure sites or, for that matter, any site that the search engine couldn't crawl. For instance, what about PHP, ASP or other script-generated sites? Do these show up as "outs", strongly connected, or dead links? I have not seen anything in the paper addressing this. I have to imagine there are a fairly large number of scripted pages and secure websites. What about ad-click links? Do they make the web appear more connected just because ads appear on an otherwise dead-end page? There may be reasons to question the validity of any such research done using web-bots, since the nature of web content has changed rapidly over the last few years.

  6. Re:The web is broken. by Stiletto · · Score: 5

    > I can't stand developing for 3 different browsers on 4 different platforms, 12 screen resolutions, 3 color depths...

    Then don't! That's one thing many "web authors" still don't get... The WWW is a text-oriented medium. It's a page of text that has links to other pages of text. Everything else is just cruft.

    HTML doesn't define how a web site should look to the pixel, and this is one of its strong points. It's up to the user to decide how to view a site. If the user doesn't want images, your site should look just fine without them.

    The minute you start checking to make sure your site looks the same on all browsers, you should re-think your entire site. Why do you want it to look the same on all browsers (it won't by the way...)? This usually indicates that you are focusing too much on presentation and not enough on content.

    > The web is broke. We're not using it properly

    I agree with your second statement. The web isn't broke... people just aren't using it properly. There are so many corporate sites that look like brochures. It's sickening. My previous job was to set up a web page for a small business, and all they wanted me to do was scan each page of their brochure into GIFs, put them up on the web, and put "forward" and "backward" buttons on the bottom to navigate between pages. I said, WTF!?!? The concept of actually including text information and links to other resources was totally absurd to my boss.

    These kinds of people think of the web only as a marketing tool, and thus can't take advantage of the power it has to offer.

  7. Why more connected? by Hard_Code · · Score: 3

    I hope people don't use this paper to promote arbitrary linkage to other sites. I mean /why/ do things have to be more connected? When I'm on my web page I don't want or need one click access to every other part of the web. That's why there are portals and search engines. Islands I understand. But we wouldn't necessarily /want/ those two sections of 24%, origin and termination, to be arbitrarily linked more to the core. We'd just end up with the whole web being a humongous hairball of a core in which each page linked to many other pages in the core. What a mess. People put indices in one place, at the BACK of a book, for a reason.

    --

    It's 10 PM. Do you know if you're un-American?
  8. The /real truth/ about web's topology... by VSc · · Score: 3
    ...is that in fact it consists of two distinctive bodies:
    • Slashdot
    • The rest
    'The rest' can be further subdivided into 3 parts:
    • News sites Slashdot links to
    • Non-news sites which get slashdotted
    • News sites talking about Slashdot
    • The other category would be "None of the above", but in that case we don't really care to count, do we.
    And of course I have to mention that this study of mine is highly unbiased, openminded, and generally guaranteed to be 100% completely accurate.

    __________________________________________

    --

    God did not appoint us to suffer wrath but to receive salvation through our Lord Jesus Christ --1Thes5:9

  9. More information by spiralx · · Score: 4

    Our analysis reveals an interesting picture (Figure 9) of the web's macroscopic structure. Most (over 90%) of the approximately 203 million nodes in our crawl form a single connected component if hyperlinks are treated as undirected edges. This connected web breaks naturally into four pieces. The first piece is a central core, all of whose pages can reach one another along directed hyperlinks -- this "giant strongly connected component" (SCC) is at the heart of the web.

    In graph theory, a strongly connected component is a maximal set of vertices in a directed graph in which every vertex is reachable from every other, i.e. an equivalence class under mutual reachability.

    What's interesting is that the four groups mentioned in this article are all approximately the same size, with the SCC group being only slightly larger than the others, which are:

    • IN - Pages that can reach the SCC but can't be reached from it.
    • OUT - Pages that can be reached from the SCC but can't get back to it, e.g. corporate websites with only internal links.
    • TENDRILS - Pages that can neither reach the SCC nor be reached from it.

    So what they're saying is that really only about a quarter of the internet is the core that is strongly connected to the rest of it. Which is interesting in itself, because I'd have thought it was a lot higher.
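
    For anyone who wants to play with the idea, here is a rough sketch of how the pieces could be computed on a toy link graph. It is not the researchers' code. It assumes you already know one page (`seed`, a made-up parameter) that sits inside the core, and it lumps tendrils and fully disconnected pages together under a hypothetical "OTHER" label.

```python
from collections import defaultdict, deque

def reachable(start, adjacency):
    """Breadth-first search: every node reachable from start via adjacency."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bow_tie(nodes, edges, seed):
    """Split nodes into SCC / IN / OUT / OTHER relative to the seed's component."""
    forward = defaultdict(set)    # page -> pages it links to
    backward = defaultdict(set)   # page -> pages that link to it
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
    downstream = reachable(seed, forward)    # pages the seed can reach
    upstream = reachable(seed, backward)     # pages that can reach the seed
    scc = downstream & upstream              # mutually reachable with the seed
    return {
        "SCC": scc,
        "IN": upstream - scc,
        "OUT": downstream - scc,
        "OTHER": set(nodes) - upstream - downstream,
    }

# Toy graph: "a" and "b" link to each other (the core), "in1" links into the
# core, "out1" is linked from it, and "t1" hangs off "in1" like a tendril.
nodes = ["in1", "a", "b", "out1", "t1"]
edges = [("in1", "a"), ("a", "b"), ("b", "a"), ("b", "out1"), ("in1", "t1")]
parts = bow_tie(nodes, edges, seed="a")
for name in ("SCC", "IN", "OUT", "OTHER"):
    print(name, sorted(parts[name]))
```

    Two searches from a single seed are enough here because everything mutually reachable with the seed is, by definition, in the seed's strongly connected component. Finding the giant component without a known seed would need a full SCC algorithm such as Tarjan's.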

  10. Re:The web is broken. by TheTomcat · · Score: 3

    > The WWW is a text-oriented medium. It's a page of text that has links to other pages of text.

    What you've just described is gopher with links.
    I've said this on Slashdot before, and I'll say it again: The web is NOT Gopher. The web is a multi-media platform, including graphics, animation, video, sound, and any other funky stuff people want to throw up on it. The whole "The web should be text. Graphic elements are clutter." mentality makes me sick. I agree 100% that a site should NOT be DEPENDENT on graphics or other 'specialty media' to get content across. That's what good consideration for the text-based users and ALT tags are for. But a web without graphics is merely gopher tunneled over http.

    > Why do you want it to look the same on all browsers (it won't by the way...)?

    It's pretty simple: clients don't understand the web. They want all that pretty crap. They REQUIRE it to look the same wherever they see it. They expect things as low level as kerning and leading to be the same, universally.

    Like I said in my first post, we (as in everyone) need to recognize that the web is a new medium. Traditional media conventions don't apply.

  11. Sources and sinks by Seth+Finkelstein · · Score: 3
    1) How many corporate pages EVER link outside their site?

    2) Advertisers and news sites link into corporate pages

    3) Personal home pages are highly likely to link into popular sites, but not be linked-into themselves

    Applying these ideas, and others like them, leads to the "bowtie".

  12. Re:The web is broken. by coaxial · · Score: 3


    > The web is broke. We're not using it properly

    I agree with your second statement. The web isn't broke... people just aren't using it properly. There are so many corporate sites that look like brochures. It's sickening. My previous job was to set up a web page for a small business, and all they wanted me to do was scan each page of their brochure into GIFs, put them up on the web, and put "forward" and "backward" buttons on the bottom to navigate between pages. I said, WTF!?!? The concept of actually including text information and links to other resources was totally absurd to my boss.

    These kinds of people think of the web only as a marketing tool, and thus can't take advantage of the power it has to offer.


    Look at news sites. How many times do you come across articles that are taken word-for-word from the printed page? (Almost to the point that it says, "continued on page 3C".)

    The worst part is the page-turning. You know, the "next page" links at the bottom of articles. That right there is a sign that your site is broken. You're using a static and linear approach in a dynamic and nonlinear medium.

    Break the story up. Link, God damn it! If a company gets mentioned, link to it, not to one of those pathetic stock quote drivels that news sites make. If some person made a speech, don't just quote one or two sentences; link to the speech.

    I'm convinced that the web is going to suck until our children ascend to power. Look at television. In the early days of the late 40s and 50s, everything was very rigid. You basically had radio programs being done in front of a camera. Only after a generation was raised on television did you actually get programs that started to take advantage of the medium. Compare how news was done in 1950 to how it's done today. Look at educational television. Before, you had the monotone droning voice of an old man, and now you have Sesame Street. The same thing is going to happen to the web.

  13. The web is broken. by TheTomcat · · Score: 5

    I'm a web developer. I've always loved the potential of the web until recently. Now I don't like working with it. I can't stand developing for 3 different browsers on 4 different platforms, 12 screen resolutions, 3 color depths, and design templates that came from a print artist who thinks that the web is one big brochure.

    The web is broke. We're not using it properly, there are too many poorly done corporate sites, contributing to insecurity, poor usability and incompatibility.

    Many clients we work with are dead set against sending anyone away from their site. I don't think they realize that links are what the web is made of. This contributes to the unreachable part of the bowtie. These corporate folk are afraid that by linking away from the site, they will lose a viewer, and that user won't find their way back. They don't realize that the web is a pull technology, and that if the user was looking for certain information, the user will come back if it is the best source of such info. The back button is one of the browser's most used features.

    We need more of these research projects to help us figure out what needs to be changed. The W3C is a start, but it's expensive to join and it's rare that you find a website that conforms to the standards. In fact, I've run into web developers who have never HEARD of the W3C.

    The web is a new, completely different medium. It's not a CDROM, it's not a brochure, it's not TV. We can't keep treating it like these other media.