"Long Tail Effect" Doesn't Work As Advertised, Say Wharton Researchers
Death Metal writes "In a working paper titled, 'Is Tom Cruise Threatened? Using Netflix Prize Data to Examine the Long Tail of Electronic Commerce,' Wharton Operations and Information Management professor Serguei Netessine and doctoral student Tom F. Tan pull information from the movie rental company Netflix to explore consumer demand for smash hits and lesser-known films. Netflix made its data available as part of a $1 million prize competition to encourage the development of new ways that will improve its ability to introduce customers to lesser-known titles they might find appealing." In short, the researchers say that the Long Tail effect described by Chris Anderson is much less important in the real world than is popularly believed. Says the article: "The key difference between the opinion of [Anderson's] book and the study by Wharton researchers is how they define 'hits' and 'niches.' In the book, Anderson focuses on the definition of hits in absolute terms such as the top 10 or top 1,000 products, while Netessine and Tan argue that, to take growing product variety into account, one has to define popularity in relative terms, such as the top 1% or top 10% of products, to properly assess the presence or absence of the Long Tail."
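To see why the choice of definition can matter, here is a minimal sketch assuming a hypothetical Zipf-shaped demand curve (the item of rank r gets demand proportional to 1/r^s, with s = 1). This is not the Netflix data and not the method used in the paper; the helper names zipf_weights, head_share and compare_definitions are made up for illustration.

```python
def zipf_weights(n_items, s=1.0):
    """Hypothetical Zipf demand: the item of rank r gets demand proportional to 1 / r**s."""
    return [1.0 / r ** s for r in range(1, n_items + 1)]


def head_share(weights, k):
    """Fraction of total demand captured by the top-k items (weights sorted descending)."""
    return sum(weights[:k]) / sum(weights)


def compare_definitions(catalog_sizes, top_n=1000, top_pct=0.01, s=1.0):
    """For each catalog size, return (size, share of absolute top-N, share of relative top-p%)."""
    rows = []
    for n in catalog_sizes:
        w = zipf_weights(n, s)
        k_rel = max(1, round(n * top_pct))  # relative cutoff scales with the catalog
        rows.append((n, head_share(w, min(top_n, n)), head_share(w, k_rel)))
    return rows


for n, abs_share, rel_share in compare_definitions([10_000, 100_000, 1_000_000]):
    print(f"{n:>9} items: top 1000 = {abs_share:.1%}, top 1% = {rel_share:.1%}")
```

Under this assumption the share held by the absolute top 1,000 shrinks as the catalog grows (which looks like a fattening tail), while the share held by the relative top 1% grows, so the same demand curve can support opposite conclusions depending on how a "hit" is defined.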
The working title of the paper is misleading, since it never mentions a "threat", an "effect", or anything of the sort. In fact, all I got out of it was that they were debating something rather trivial and inconsequential: the definition of a "hit" in a statistical model, and why using "top 1000" or a similar absolute cutoff is improper based on Netflix data.
To be honest, this isn't really "news" worthy of a front-page listing.
That's precisely the point. If the distribution is shaped so that a top movie gets only 1% of the market, top movies won't earn enough to justify hiring Tom Cruise, and that's a problem for him.
OK, I am not a mathematician, but this paper makes me deeply skeptical.
If the input data is indeed heavy-tailed (the higher moments do not exist) or quasi-heavy-tailed (the higher moments exist but are extremely large), how on earth can they use variance, R^2 and other such measures? They may not even exist! And if the input is only quasi-heavy-tailed, then of course they exist, but the convergence time could be arbitrarily long!
I have had the displeasure of working with quasi-heavy-tailed data, and it is really enlightening. You watch some metric (e.g. the average) evolve as data comes in, and of course you see convergence. At least for a while. Then suddenly an extreme outlier arrives, and the average takes a huge jump! If your input is heavy-tailed enough, you can never be sure whether your measure has finally come to rest (converged) or the next jump is just around the corner!
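The effect described above is easy to reproduce. A minimal sketch, assuming Pareto-distributed input drawn with Python's random.paretovariate and a hypothetical shape parameter alpha = 1.5 (finite mean alpha/(alpha-1) = 3, infinite variance); running_means and largest_jumps are illustrative helpers, not part of any library.

```python
import random


def running_means(samples):
    """Return the running (cumulative) mean after each new sample."""
    means, total = [], 0.0
    for i, x in enumerate(samples, start=1):
        total += x
        means.append(total / i)
    return means


def largest_jumps(means, top=5):
    """Return (index, jump size) pairs with the biggest absolute change in the running mean."""
    jumps = [(i, abs(means[i] - means[i - 1])) for i in range(1, len(means))]
    return sorted(jumps, key=lambda t: t[1], reverse=True)[:top]


rng = random.Random(42)  # fixed seed so the run is repeatable
alpha = 1.5  # hypothetical shape: finite mean, infinite variance
samples = [rng.paretovariate(alpha) for _ in range(200_000)]
means = running_means(samples)

for i, jump in largest_jumps(means):
    print(f"sample {i:>7}: running mean jumped by {jump:.4f} to {means[i]:.4f}")
```

Changing the seed or pushing alpha toward 1 typically moves the biggest jumps around, sometimes late in the run; with alpha <= 1 the mean itself is infinite and the running average never settles at all.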
I hope someone more knowledgeable can clarify this; I am just an engineer.
But the shape of the market is exactly the point. In a competitive market, profit margins are very thin, and a relatively small difference may mean life or death for a company. In the entertainment industry we often see the biggest productions struggle to break even, while relatively small investments may bring huge profits.
Defining a "hit" as one of the top ten, the top 1000, or any other absolute number is stupid. It reminds me of a Soviet-era political joke in which a race between two athletes, a Russian and an American, was reported in the press as "the Russian came in second, while the American finished next to last". In electrical and electronics engineering, threshold values are often defined relative to the maximum instead, for example the points where the power falls to one half of its peak, the so-called "-3 dB" points.
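For reference, the -3 dB figure is just the half-power ratio expressed in decibels, 10·log10(0.5) ≈ -3.01. A minimal sketch of a "hit" threshold defined the same way, relative to the peak rather than by rank; relative_hits and the demand numbers are hypothetical, not taken from the paper or from Netflix.

```python
import math


def power_ratio_db(ratio):
    """Express a power ratio in decibels: 10 * log10(ratio)."""
    return 10 * math.log10(ratio)


def relative_hits(demand, fraction=0.5):
    """Hypothetical 'hit' definition: indices of items whose demand is at least
    `fraction` of the peak, by analogy with the half-power (-3 dB) threshold."""
    peak = max(demand)
    return [i for i, d in enumerate(demand) if d >= fraction * peak]


print(f"half power = {power_ratio_db(0.5):.2f} dB")  # -3.01 dB
print(relative_hits([120, 90, 61, 59, 10, 3]))       # [0, 1, 2]
```

Unlike a "top N" rule, this threshold does not change meaning as the catalog grows: adding more low sellers never turns a title into a hit or removes one.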
The crux of Anderson's argument is that, while Amazon et al. see the same demand for big-name titles as a traditional bookstore, their tail is longer and higher. By defining a cut at some point (say, items with less than 5% of the peak sales, or those outside the top 10%, whichever is appropriate), one can see that the low-volume sales represent a larger fraction of total sales because of the extreme length of the tail.
Quoting from the Wikipedia article on the topic:
In the graph shown above, Amazon's book sales or Netflix's movie rentals would be represented along the vertical axis, while the book or movie ranks are along the horizontal axis. The total volume of low popularity items exceeds the volume of high popularity items.
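The claim that the volume of low-popularity items exceeds that of high-popularity items can be checked under an assumed demand curve. A minimal sketch, assuming a hypothetical pure Zipf curve (rank r sells peak/r) and the "less than 5% of peak sales" cut mentioned above; the catalog sizes are illustrative, not Amazon or Netflix figures, and zipf_demand and tail_share are made-up helper names.

```python
def zipf_demand(n_items, s=1.0):
    """Hypothetical Zipf demand curve: rank r sells peak / r**s (peak normalised to 1)."""
    return [1.0 / r ** s for r in range(1, n_items + 1)]


def tail_share(demand, cut=0.05):
    """Share of total sales from items selling less than `cut` times the peak seller."""
    peak = max(demand)
    tail = sum(d for d in demand if d < cut * peak)
    return tail / sum(demand)


for label, n in [("short catalog", 10_000), ("long catalog", 1_000_000)]:
    print(f"{label} ({n} items): tail share = {tail_share(zipf_demand(n)):.1%}")
```

Under this assumption the tail carries more than half of total sales in both catalogs, and a larger share in the longer one. Real sales curves need not follow a pure Zipf law, so this only shows what the argument predicts if they do.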
Anderson was suggesting that, in the limit of infinitely many items to sell and negligible stocking costs, much more profit is to be derived from the large number of items that sell a few copies each than from the few items that sell many copies.
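The limit argument can be made a little more precise under an assumed demand curve. The following is a sketch, assuming (hypothetically) Zipf-shaped demand d_r = C r^{-s} for the item of rank r, and treating sales share as a stand-in for profit share, which is only reasonable if stocking costs are negligible and margins are similar across titles. S_N(k) is the share of total sales taken by the top k of N items:

```latex
\begin{aligned}
d_r &= C\,r^{-s}, \qquad
S_N(k) = \frac{\sum_{r=1}^{k} r^{-s}}{\sum_{r=1}^{N} r^{-s}} \\
s \le 1:&\quad \sum_{r=1}^{N} r^{-s} \xrightarrow[N\to\infty]{} \infty
  \;\Rightarrow\; S_N(k) \to 0
  \qquad \Big(s = 1:\ \textstyle\sum_{r=1}^{N} r^{-1} \approx \ln N + \gamma\Big) \\
s > 1:&\quad \sum_{r=1}^{N} r^{-s} \xrightarrow[N\to\infty]{} \zeta(s)
  \;\Rightarrow\; S_N(k) \to \frac{\sum_{r=1}^{k} r^{-s}}{\zeta(s)} > 0
\end{aligned}
```

In this idealized model the head's share shrinks to zero as the catalog grows only when demand falls off no faster than 1/r, and even then only logarithmically slowly for s = 1; for steeper curves the top k titles keep a fixed positive share however long the tail gets. Which regime real book or rental demand falls into is an empirical question the model itself does not answer.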
Indeed, the argument went further than that, suggesting that as people got used to having more choice, they would begin to shun the "popular" items in favour of more obscure titles, further fattening the tail. But that is even more speculative and somewhat independent of the other economic predictions.