Slashdot Mirror


Modeling Linking on the Web

An Anonymous Coward writes "Amazon has a much greater share of the online bookstore market than any single store has of the offline market. How is this possible? Because the web changes how people find information. There are millions of links to Amazon on the web, which makes it more likely that people will find Amazon when surfing, or when using search engines, which typically use link popularity in ranking. This makes it harder for new businesses to compete. Researchers have discovered that across the entire web, links are distributed according to a "power law", which leads to "rich get richer" or "winner-take-all" behaviour, where a small number of sites get the vast majority of links and traffic. A new study just released by NEC shows that this behaviour varies across communities, and shows how to predict competition in different areas. For example, you can see how much tougher competition is among booksellers than among photographers."

8 of 131 comments

  1. Tyranny of majority = PageRank by Lee Bottemiller · · Score: 2, Informative

    This article is basically a fancy way of confirming the tyranny of the majority. Google's PageRank, as good as it is, both a) suffers from and b) perpetuates the tyranny of the majority (aka "the rich get richer", the "power law"). I.e., the more links, the higher the PageRank, the more relevance, the more hits, the more links...
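    The feedback loop described above can be seen in a minimal power-iteration sketch of the PageRank formula as Brin and Page published it, assuming the commonly cited damping factor of 0.85 and treating pages with no outbound links as linking to every page. This is not Google's production ranking, and the page names in the toy link graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share, then link shares on top.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical toy web: most pages link to "amazon", one links to "smallshop".
links = {
    "blog1": ["amazon"],
    "blog2": ["amazon"],
    "blog3": ["amazon", "smallshop"],
    "smallshop": ["amazon"],
    "amazon": ["blog1"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

    Because "amazon" collects inbound links from most of the toy graph, it ends up with the highest score, which in a real engine means more visibility and, eventually, more links.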

    Teoma seems to be aiming at this chink in Google's armor.

    From Teoma's page:

    Teoma uses Subject-Specific Popularity(SM). Subject-Specific Popularity ranks a site based on the number of same-subject pages that reference it, not just general popularity, to determine a site's level of authority.

    Using vectoring algorithms to find themed hives of related content, Teoma partitions the power law into manageable chunks. I.e., the rich get richer, but at least a dominant site in one field doesn't get artificially inflated relevance for queries in an unrelated field. At least in theory. (Kinda like how laws are supposed to keep a monopoly from illegally entering other markets, but I digress.)
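    Teoma's actual algorithm isn't described beyond the quote above, so the following is only a hypothetical sketch of the stated idea: count inbound links only from pages on the same subject as the query. The topic labels are assumed to be known in advance (a real engine would have to discover them, for example by clustering), and all page names are made up.

```python
from collections import Counter

def global_popularity(links):
    """Count distinct linking pages for each target, regardless of subject."""
    counts = Counter()
    for targets in links.values():
        for target in set(targets):
            counts[target] += 1
    return counts

def subject_popularity(links, topics, subject):
    """Count distinct linking pages for each target, but only from pages
    whose (assumed, pre-assigned) topic matches `subject`."""
    counts = Counter()
    for source, targets in links.items():
        if topics.get(source) != subject:
            continue
        for target in set(targets):
            counts[target] += 1
    return counts

# Hypothetical toy web: three book blogs and two photography forums.
links = {
    "bookblog1": ["amazon"],
    "bookblog2": ["amazon"],
    "bookblog3": ["amazon"],
    "photoforum1": ["photo-gallery", "amazon"],
    "photoforum2": ["photo-gallery"],
}
topics = {
    "bookblog1": "books",
    "bookblog2": "books",
    "bookblog3": "books",
    "photoforum1": "photography",
    "photoforum2": "photography",
}

print(global_popularity(links).most_common())
print(subject_popularity(links, topics, "photography").most_common())
```

    In the toy graph, "amazon" wins on raw link count, but among photography pages "photo-gallery" comes out on top.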

    This is working for Teoma: others and I are finding useful stuff on Teoma that Google missed.

    Google is already aware of this particular limitation of PageRank, as can be seen from what they suggest programmers submit to their programming contest...

    Entries in the Applications track generally deal with the semantics of the data. Some examples include:

    Detecting common templates in pages, and separating out the common structure from the individual content.
    Classifying links on a page.
    Detecting pages that are near-duplicates of one another.
    Clustering pages by topic or type.

    Even with all that, I still think that humans are the best filters (and isn't a search engine just a programmable filter?). I suspect the rise of weblogs might have something to do with the usefulness found in tapping into some weblogger's idea of what's useful/cool/interesting.

    So perhaps the best way to find good info is a cross between a human and a content-vectoring search algorithm. Maybe that's why Ask Jeeves bought Teoma.

  2. Webcomics by BitwizeGHC · · Score: 3, Informative

    This is true of Webcomics as well. Ask someone what their favorite Webcomic is, and they will almost invariably respond with one of the following: User Friendly, Penny Arcade, PvP, Sluggy, Sinfest, Megatokyo or Exploitation Now. With the exception of Penny Arcade, I have found the total combined quality (art + writing + humor) to be fair at best, and atrocious at worst (guess what the worst is; hint: think of a little dustball with feet). But these sites are linked to from all over, and they often link to each other, creating "flash crowds" from Slashdot, other comic sites, personal home pages, etc.

    There is a class of "second tier" comics which have nice little followerships: Little Gamers, Sexy Losers, Polymer City and Cool Cat Studio (really, any Keenspot comic that isn't Sinfest or EN) are among these. Everyone else, myself and my comic included, is "third tier", i.e., tumbleweeds rolling across their allotted server space.

    Then there is Pokey, which stands conspicuously on its own. HOORAY.

    --
    N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
  3. Re:power law by soboroff · · Score: 4, Informative

    The difference between a Pareto distribution and a power-law distribution is that a Pareto distribution describes the cumulative probability P[X > x] ~ x^-k (that is, the probability that an observed value is greater than x is proportional to x raised to the power -k), whereas a power law describes P[X = x] ~ x^-a. The two describe the same thing: a Pareto tail with exponent k has density exponent a = k + 1.

    And a Zipf law is a power law on ranks, rather than values.

    Lada Adamic of HP has an excellent how-to on power-law distributions you might find interesting.
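    The relationships between these views can be checked numerically. This sketch assumes a Pareto tail P[X > x] = (x / x_min)^-k, draws samples by inverse-transform sampling, and estimates three exponents: k for the tail, a = k + 1 for the density P[X = x] ~ x^-a, and b = 1/k for the rank (Zipf) plot, following the conventions in Adamic's tutorial. The seed, k = 2, and sample size are arbitrary choices.

```python
import math
import random

def sample_pareto(k, x_min, n, rng):
    """Draw n values with P[X > x] = (x / x_min) ** -k for x >= x_min (inverse-transform sampling)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / k) for _ in range(n)]

def ccdf(samples, x):
    """Empirical P[X > x]: the quantity the Pareto form describes."""
    return sum(1 for s in samples if s > x) / len(samples)

def mle_exponent(samples, x_min):
    """Maximum-likelihood estimate of the Pareto tail exponent k."""
    return len(samples) / sum(math.log(s / x_min) for s in samples)

def zipf_slope(samples, r1, r2):
    """Slope b in value ~ rank ** -b, measured between ranks r1 < r2 (1-based)."""
    ranked = sorted(samples, reverse=True)
    return -(math.log(ranked[r2 - 1]) - math.log(ranked[r1 - 1])) / (math.log(r2) - math.log(r1))

rng = random.Random(42)
k = 2.0
xs = sample_pareto(k, 1.0, 100_000, rng)

k_hat = mle_exponent(xs, 1.0)
print(f"tail exponent k ~ {k_hat:.2f}")
print(f"density exponent a = k + 1 ~ {k_hat + 1:.2f}")
print(f"P[X > 10] ~ {ccdf(xs, 10.0):.4f} (model: {10.0 ** -k:.4f})")
print(f"Zipf rank exponent b ~ {zipf_slope(xs, 100, 10_000):.2f} (model: 1/k = {1 / k:.2f})")
```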

  4. Amazon pays for those links. by clion999 · · Score: 2, Informative

    They kick back 5 to 15% to whoever provides a link that leads to a sale. That's not small beer. They make it easy for anyone to provide these links, so of course the links are all over the place.

    1. Re:Amazon pays for those links. by chuckgrosvenor · · Score: 2, Informative

      It's small change, though. In the past two weeks, I've sent Amazon 14,316 clicks and 8,952 unique visitors, and 46 items were ordered. For that I'll see about $40. (My revenue report currently displays only items shipped, not items ordered, so I have no exact numbers; the 37 items shipped come to $28.57, so I might be a little optimistic.)

      A cost-per-click banner ad that pays only $0.05 per click would have cost Amazon $715.80 for those clicks. In this case, the cost is about $0.0028 per click ($40 / 14,316 clicks). That's quite cheap advertising compared to almost anything else on the web right now.
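      The arithmetic above can be reproduced in a few lines. The click, visitor, order, and payout figures come from the comment; the $0.05 rate is the hypothetical banner price used for comparison.

```python
clicks = 14_316           # clicks sent to Amazon in two weeks (from the comment)
unique_visitors = 8_952
items_ordered = 46
estimated_payout = 40.00  # dollars, the commenter's estimate
shipped_payout = 28.57    # dollars, for the 37 items shipped so far
cpc_rate = 0.05           # hypothetical banner price per click

cpc_cost = clicks * cpc_rate
effective_cpc = estimated_payout / clicks
conversion = items_ordered / unique_visitors

print(f"cost at ${cpc_rate:.2f}/click: ${cpc_cost:,.2f}")
print(f"affiliate cost per click: ${effective_cpc:.4f} (shipped so far: ${shipped_payout / clicks:.4f})")
print(f"items ordered per unique visitor: {conversion:.2%}")
print(f"a CPC campaign would cost about {cpc_cost / estimated_payout:.0f}x more")
```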

      Factor in the fact that affiliates don't get referral fees for used items, auction items, and several other categories, and you realize that their 5% referral is coming off their highest-profit items. Any time a visitor buys a used item from Amazon, Amazon doesn't have to ship anything; it just collects a payment.

  5. A good tool to look at web linkage by SweenyTod · · Score: 3, Informative

    Is called VisIT. It produces a graphical representation of how sites link together, based around any given query. It was used quite successfully to demonstrate how Scientology had spammed Google by creating multiple domains all linking back to its main web page.

    It's a freebie download.

    --
    Alas gallinaceas de urbe bovis volo
  6. Re:Scale-Free Networks by ngibbins · · Score: 5, Informative

    This is also called a scale-free network, and the research on it by Albert-Laszlo Barabasi (currently at the University of Notre Dame) is in this week's New Scientist.

    There are quite a few papers on this topic (the behaviour of disordered networks) by Barabasi and one of his research students, Reka Albert (now probably graduated), most of which are available from his research group's website or from arXiv.

    Particular highlights:

    A-L. Barabasi and R. Albert, Emergence of scaling in random networks, Science 286, 509 (1999).

    A-L. Barabasi, R. Albert and H. Jeong, Scale-free characteristics of random networks: The topology of the World Wide Web, Physica A 281, 69-77 (2000).

    A-L. Barabasi and R. Albert, Topology of evolving networks: local events and universality, Physical Review Letters 85, 5234 (2000).

    This work is an interesting counterpoint to the 'small world' networks of Watts and Strogatz:

    D.J. Watts and S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393, 440-442 (1998).

    D.J. Watts, Small Worlds, Princeton University Press (1999).
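    The "rich get richer" mechanism in Barabasi and Albert's 1999 Science paper is growth plus preferential attachment: each new node links to existing nodes with probability proportional to their current degree. Below is a minimal undirected sketch under simplifying assumptions (a small complete seed graph, a fixed m = 2 links per new node, and an arbitrary random seed); it is an illustration, not the authors' code.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m, rng):
    """Grow an undirected graph: each new node attaches to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    # Seed graph: a complete graph on m + 1 nodes, so every node has degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `pool` once per edge endpoint, so a uniform draw
    # from the pool is a degree-proportional draw.
    pool = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for old in chosen:
            edges.append((new, old))
            pool.extend((new, old))
    return edges

rng = random.Random(1)
n_nodes, m = 10_000, 2
edges = preferential_attachment(n_nodes, m, rng)
degree = Counter(v for edge in edges for v in edge)

top = sorted(degree.values(), reverse=True)
top_share = sum(top[: n_nodes // 100]) / (2 * len(edges))
print(f"max degree: {top[0]}, median degree: {top[len(top) // 2]}")
print(f"top 1% of nodes hold {top_share:.1%} of all link endpoints")
```

    Even though every node joins with exactly two links, the oldest and best-connected nodes keep accumulating more, so the maximum degree ends up far above the median.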

  7. Not everyone has a problem with Amazon by NDPTAL85 · · Score: 2, Informative

    You have to understand that the number of autistic geeks who have a problem with Amazon's "fsking commercials and screaming colors" is just too small to be of any consequence. Most people, including myself, just don't have a problem with it for what we get in return.

    Also, what's with the anti-Amazon sentiment? What exactly is wrong with a company surviving in part on ad revenue? Does the immature desire for online companies to function in this world without advertising revenue still exist? Don't you know that Google has paid text ads on its site as well? And that it derives a lot of revenue from being the engine under AOL's and Yahoo's search?

    --
    Mac OS X and Windows XP working side by side to fight back the night.