Modeling Linking on the Web
An Anonymous Coward writes "Amazon has a much greater market share among online bookstores than the largest offline store has among offline bookstores. How is this possible? Because the web changes how people find information. There are millions of links to Amazon on the web, which makes it more likely for people to find Amazon when surfing the web, or when using search engines, which typically use link popularity in ranking. This makes it harder for new businesses to compete. Researchers have discovered that across the entire web, links are distributed according to a "power law" which leads to "rich get richer" or "winner takes all" behaviour, where a small number of sites get the vast majority of links and traffic. A new study just released by NEC shows that this behaviour varies in different communities, and shows how to predict competition in different areas. For example, you can see how much tougher competition is among booksellers compared to photographers."
This article is basically a fancy way of confirming the tyranny of the majority. Google's PageRank, as good as it is, both a) suffers from and b) perpetuates the tyranny of the majority (aka "the rich get richer", the "power law"). I.e., the more links, the higher the PageRank, the more relevance, the more hits, the more links...
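To make that feedback loop concrete, here is a rough sketch in Python. It is not PageRank and not anything from the NEC study; it only assumes a toy "preferential attachment" rule in which each new site links to one existing site, picked with probability proportional to that site's current inbound links plus one. The names simulate_links and top_share, the site count, and the seed are all made up for illustration.

```python
import random

def simulate_links(n_sites=2000, seed=0):
    """Toy rich-get-richer model: returns the inbound link count per site."""
    rng = random.Random(seed)
    links = [0]   # inbound links for each site; start with a single site
    pool = [0]    # one entry per site plus one per inbound link it has
    for new_site in range(1, n_sites):
        # A uniform draw from the pool picks a site with probability
        # proportional to (inbound links + 1).
        target = rng.choice(pool)
        links[target] += 1
        pool.append(target)
        # The newcomer joins with zero inbound links.
        links.append(0)
        pool.append(new_site)
    return links

def top_share(links, fraction=0.10):
    """Share of all inbound links held by the most-linked `fraction` of sites."""
    ranked = sorted(links, reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

links = simulate_links()
print("share of links held by top 10% of sites:", round(top_share(links), 2))
print("sites with no inbound links at all:", links.count(0))
```

Even with no difference in quality between the sites, the early and lucky ones end up holding most of the links, which is the loop described above.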
Teoma seems to be aiming at this chink in Google's armor.
From Teoma's page,...
Using vectoring algorithms to find themed hives of related content, Teoma partitions the power law into manageable chunks. I.e., the rich get richer, but at least a dominant site in one field doesn't get artificially inflated relevance when querying an unrelated field. At least in theory. (Kinda like laws are supposed to keep a monopoly from illegally entering other markets, but I digress.)
This is working for Teoma: others and I are finding useful stuff on Teoma that Google didn't.
Google is already aware of this particular limitation of PageRank, as can be seen from what they suggest programmers submit to their programming contest...
Even with all that, I still think that humans are the best filters (and isn't a search engine just a programmable filter?). I suspect the rise of weblogs might have something to do with the usefulness found in tapping into some weblogger's idea of what's useful/cool/interesting.
So perhaps the best way to find good info is a cross between a human and a content-vectoring search algorithm. Maybe that's why Ask Jeeves bought Teoma.
This is true of Webcomics as well. Ask someone what their favorite Webcomic is, and they will almost invariably respond with one of the following: User Friendly, Penny Arcade, PvP, Sluggy, Sinfest, Megatokyo or Exploitation Now. With the exception of Penny Arcade, I have found the total combined quality (art + writing + humor) to be fair at best, and atrocious at worst (guess what the worst is; hint: think of a little dustball with feet). But these sites are linked to from all over, and they often link to each other, creating "flash crowds" from Slashdot, other comic sites, personal home pages, etc.
There is a class of "second tier" comics which have nice little followerships: Little Gamers, Sexy Losers, Polymer City and Cool Cat Studio (really, any Keenspot comic that isn't Sinfest or EN) are among these. Everyone else, myself and my comic included, is "third tier", i.e., tumbleweeds rolling across their allotted server space.
Then there is Pokey, which stands conspicuously on its own. HOORAY.
N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
The difference between a Pareto distribution and a power law distribution is that a Pareto distribution describes the tail probability, P[X > x] ~ x^-k (that is, the probability that an observed value is greater than x is proportional to an inverse power of x), whereas a power law describes the probability of the value itself, P[X = x] ~ x^-(k+1).
And a Zipf law is a power law on ranks, rather than values.
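The three forms are the same curve looked at three ways, and a quick numerical check shows how the exponents relate. This is only a sketch: the exponent K = 1.5, the grid of x values, and the helper loglog_slope are arbitrary choices for illustration, and the values are computed from the exact formulas rather than from real link data.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

K = 1.5  # hypothetical Pareto exponent, chosen only for illustration
xs = [1.0 + 0.5 * i for i in range(1, 200)]

# Pareto form: tail probability P[X > x] = x^-K (for x >= 1).
tail = [x ** -K for x in xs]

# Power-law form: the density is minus the derivative of the tail,
# which is K * x^-(K+1).
density = [K * x ** -(K + 1) for x in xs]

# Zipf form: the item of rank r out of N is the value exceeded by r items,
# so r/N = x^-K, which gives x = (r/N)^(-1/K).
N = 10_000
ranks = list(range(1, N + 1))
sizes = [(r / N) ** (-1.0 / K) for r in ranks]

pareto_slope = loglog_slope(xs, tail)      # close to -K
power_slope = loglog_slope(xs, density)    # close to -(K + 1)
zipf_slope = loglog_slope(ranks, sizes)    # close to -1/K
print(pareto_slope, power_slope, zipf_slope)
```

So on a log-log plot all three are straight lines; only the slope changes, from -k (Pareto) to -(k+1) (power law) to -1/k (Zipf).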
Lada Adamic of HP has an excellent how-to on power law distributions you might find interesting.
They kick back 5 to 15% to whoever provides a link that leads to a sale. That's not small beer. They make it easy for anyone to provide these links. So of course they're all over the place.
Is called VisIT. It produces a graphical representation of how sites link together, based around any given query. It was used quite successfully to demonstrate how Scientology had spammed Google, by creating multiple domains all linking back to their main web page.
It's a freebie download and you can get it here.
Alas gallinaceas de urbe bovis volo
There are quite a few papers on this topic (behaviour of disordered networks) by Barabasi and one of his research students, Reka Albert (now probably graduated), most of which are available from his research group's website or from arXiv.
Particular highlights:
A-L. Barabasi and R. Albert, Emergence of scaling in random networks, Science 286, 509, (1999)
A-L. Barabasi, R. Albert and H. Jeong, Scale-free characteristics of random networks: The topology of the World Wide Web, Physica A 281, 69-77 (2000).
A-L. Barabasi and R. Albert, Topology of evolving networks: local events and universality, Physical Review Letters 85, 5234 (2000).
This work is an interesting counterpoint to the 'small world' networks of Watts and Strogatz:
D.J. Watts and S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393, 440-442, (1998).
D.J. Watts, Small Worlds, Princeton University Press, (1999).
You have to understand that the number of autistic geeks who have a problem with Amazon's "fsking commercials and screaming colors" is just too small to be of any consequence. Most people, including myself, just don't have a problem with it for what we get in return.
Also, what's with the anti-Amazon sentiment? What exactly is wrong with a company surviving in part due to ad revenue? Does the immature desire for online companies to try to function in this world without advertising revenue still exist? Do you not know that Google has paid ads, in text form, on their site as well? And that they derive a lot of revenue by being the engines under AOL's and Yahoo's search engines?
Mac OS X and Windows XP working side by side to fight back the night.