Slashdot Mirror


"Long Tail Effect" Doesn't Work As Advertised, Say Wharton Researchers

Death Metal writes "In a working paper titled, 'Is Tom Cruise Threatened? Using Netflix Prize Data to Examine the Long Tail of Electronic Commerce,' Wharton Operations and Information Management professor Serguei Netessine and doctoral student Tom F. Tan pull information from the movie rental company Netflix to explore consumer demand for smash hits and lesser-known films. Netflix made its data available as part of a $1 million prize competition to encourage the development of new ways that will improve its ability to introduce customers to lesser-known titles they might find appealing." In short, the researchers say that the Long Tail effect described by Chris Anderson is much less important in the real world than popularly held. Says the article: "The key difference between the opinion of [Anderson's] book and the study by Wharton researchers is how they define 'hits' and 'niches.' In the book, Anderson focuses on the definition of hits in absolute terms such as the top 10 or top 1,000 products, while Netessine and Tan argue that, to take growing product variety into account, one has to define popularity in relative terms, such as the top 1% or top 10% of products, to properly assess the presence or absence of the Long Tail."

12 of 82 comments

  1. Missing the point by BadAnalogyGuy · · Score: 4, Insightful

    The long tail doesn't threaten those at the top any more than it isolates those at the bottom. It only describes the shape of the market which necessarily has only a few specific market products which are used by the majority and the rest of the products with very few customers in the "long tail". It's a market definition, not a competition definition.

    You can cut the tail off of a gecko at any point, but it doesn't mean that somehow the tail can exist without a fat end and a thin end. Since the tail is simply the appendage attached to the abdomen, wherever it is attached defines its fat end, and where it ends is the thin end. Even if you cut the tail off completely, all that you've done is stimulated the tail regrowth reflex.

    1. Re:Missing the point by JimboFBX · · Score: 5, Informative

      The working title of the paper is misleading, since there is no mention of "threat", an "effect", or anything of that sort. In fact, all I got out of it was that they were just debating something rather trivial and inconsequential - the definition of a "hit" in a statistical model and how using "top 1000" or so is improper based on Netflix data.

      To be honest, this isn't really "news" worthy of a front page listing.

    2. Re:Missing the point by Anonymous Coward · · Score: 5, Interesting

      It only describes the shape of the market

      That's precisely the point. If the shape is such that a top movie gets only 1% of the market, top movies won't make enough profit to justify hiring Tom Cruise and it's a problem for him.

    3. Re:Missing the point by mangu · · Score: 5, Insightful

      The long tail doesn't threaten those at the top any more than it isolates those at the bottom. It only describes the shape of the market

      But the shape of the market is exactly the point. In a competitive market, profit margins are very thin and a relatively small difference may mean life or death to a company. In the entertainment industry we often see the biggest productions struggle to break even, while relatively small investments may bring huge profits.

      Defining a "hit" as one of the top ten or top 1000 or any absolute number is stupid. It reminds me of a political joke in the Soviet Union, where the result of a race between two athletes, a Russian and an American, was reported in the press as "the Russian came in second while the American was next to the last". In electrical and electronics engineering threshold values are often defined as the point where the power is one half of the maximum, the so-called "-3 dB" points.

    4. Re:Missing the point by coaxial · · Score: 4, Insightful

      Defining a "hit" as one of the top ten or top 1000 or any absolute number is stupid. It reminds me of a political joke in the Soviet Union, where the result of a race between two athletes, a Russian and an American, was reported in the press as "the Russian came in second while the American was next to the last". In electrical and electronics engineering threshold values are often defined as the point where the power is one half of the maximum, the so-called "-3 dB" points.

      There are 17,770 movies in the Netflix Prize training set. The top 1000 movies make up 5.63% of the titles in the entire dataset, and that 5% accounts for 63% of all rentals. If you use your threshold of "half of the maximum," then you have the top 100 movies. More to the point, your threshold definition using decibels is predicated on the data being normally distributed, and this data conforms to a power-law distribution, most likely Zipfian.
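To make the absolute-versus-relative argument concrete, here is a minimal sketch assuming rentals follow an idealised Zipf law with exponent 1. That exponent, and the helper names `zipf_rentals`, `catalog_fraction` and `hit_shares`, are assumptions for illustration only; nothing here is fitted to the Netflix data, so it will not reproduce the 63% figure quoted above.

```python
import numpy as np

def zipf_rentals(n_titles, exponent=1.0):
    """Hypothetical rental shares: the title at rank r gets weight r**-exponent."""
    ranks = np.arange(1, n_titles + 1, dtype=float)
    weights = ranks ** -exponent
    return weights / weights.sum()  # normalise so the shares sum to 1

def catalog_fraction(top_n, n_titles):
    """Fraction of the catalog that a fixed top-N cut represents."""
    return top_n / n_titles

def hit_shares(n_titles, top_n=1000, top_fraction=0.10, exponent=1.0):
    """Rental share of the 'hits' under an absolute and a relative definition."""
    shares = zipf_rentals(n_titles, exponent)
    k_abs = min(top_n, n_titles)                         # absolute: top N titles
    k_rel = max(1, int(round(n_titles * top_fraction)))  # relative: top X% of titles
    return float(shares[:k_abs].sum()), float(shares[:k_rel].sum())

if __name__ == "__main__":
    # 1000 of 17,770 titles is about 5.63% of the catalog.
    print(f"top-1000 as a fraction of catalog: {catalog_fraction(1000, 17_770):.2%}")
    for n in (17_770, 100_000):
        absolute, relative = hit_shares(n)
        print(f"{n:>7} titles: top-1000 share={absolute:.3f}, top-10% share={relative:.3f}")
```

Under this toy model the two definitions move in opposite directions as the catalog grows: the top-1000 share shrinks while the top-10% share rises, which is roughly the disagreement between the book and the paper described in the summary.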

      Getting back to your point of defining a "hit" based on profitability, that too is a poor way of defining it. It's much easier for a very cheap film to make multiples of its budget in revenue, but still no one sees it. Number of viewers has always been the traditional way of defining a hit. Revenue is just a proxy for that.

    5. Re:Missing the point by khchung · · Score: 4, Interesting

      Defining a "hit" as one of the top ten or top 1000 or any absolute number is stupid.

      While it sounds stupid, using a top xx% is, in a way, validating the idea of long tail.

      Why? Because before Amazon, when bookstores were still only brick and mortar, there was only so much physical space to hold the top 1000 (or however many) books. Note that this number is fixed; it won't grow just because more kinds of books are published.

      So having an absolute number of top 100 or top 1000 simply corresponds to the physical constraint that most bookstores can only put so many books on the shelf.

      The advantage that Amazon has over physical bookstores is that it can hold a practically unlimited number of books. So only now, without the physical constraint, can we practically use top 10% instead. This, in fact, proved that there are many more profitable books outside the top 1000 (or however many), and that physical bookstores are missing out on many sales because of it.

      --
      Oliver.
    6. Re:Missing the point by ukyoCE · · Score: 4, Interesting

      This, in fact, proved that there are many more profitable books outside top 1000 (or however many), and that physical bookstores are missing out many sales due to it.

      YES.

      This is exactly why I've stopped using brick and mortar retailers almost entirely. They carry such a limited selection that it's often a wasted trip.

      This goes for video rental stores once they consolidated (in my area) to a chain of "new release-only" stores. I switched to Netflix and have never been back, and have converted many friends to Netflix too.

      The same goes for music stores, which in my area have never carried anything but the most popular overpriced crap. Now I buy from Amazon or direct from musicians' websites.

      Groceries are one of the few markets left worth using brick and mortar stores for. Anything else is just a showroom for cheaper online stores, at this point.

    7. Re:Missing the point by mabhatter654 · · Score: 4, Interesting

      But few retail stores stock more than 50 or 100 current titles, so I think the original idea is quite good. Movie theaters show only 12 or so films at a time, even in big cities. Opening a brick-and-mortar store or theater up to even 1% of "hits", say music/movies that actually made a popularity chart in the last 50 years, would be an impressive achievement. Even blockbuster movies like Star Wars or Indiana Jones become "unpurchasable" in a relatively short amount of time... They've already hit "bargain bin" status in most retail outlets.

      Having a surefire way to get at that back catalog would be highly important. The real key is getting business to focus on marketing in a "long tail" manner. Something like Netflix is interesting because they are a business that really pays no penalty for keeping extra DVDs in the warehouse... but how do they get people interested in WATCHING them? I find Emusic to be a similar thing in that area, but again, the hardest part is matching up MORE stuff I'm interested in rather than what publishers are currently marketing. I think Disney has the best handle on it because they republish back catalog in a big way every 10 years or so... making it "new" again to a new group of people. How do you do that for general things like "Bing Crosby" movies or "Rodgers and Hammerstein" musicals? Heck, even getting recent Anime published in English in a reasonable time is difficult, or finding material from Electronica/J-Pop scenes due to publishers only wanting to publish "top 10" material... when the majority of people don't BUY that stuff... but they all would buy a different part of the top 500 or so songs.

  2. Heavy tail distributions are dangerous beasts by Anonymous Coward · · Score: 5, Interesting

    OK, I am not a mathematician, but this paper makes me deeply skeptical.

    If the input data is indeed heavy-tailed (non-existent higher moments) or quasi-heavy-tailed (existing, but extremely large higher moments), how on earth can they use variance, R^2 and other measures? They may not even exist! And if the input is quasi-heavy-tailed, then of course they exist, but the convergence time could be arbitrarily long!

    I have had the displeasure of working with quasi-heavy-tailed data, and it is really enlightening. You watch the evolution of some metric (e.g. the average) as a function of incoming data, and of course you see convergence. At least for a while. And then suddenly an extreme outlier comes in, and the average takes a huge jump! Now if your input is heavy-tailed enough, you can never be sure whether your measure has finally come to rest (converged), or the next jump is just around the corner!
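The jumping average is easy to reproduce on synthetic data. A rough sketch, assuming a Pareto distribution with tail index 1.1 as the heavy-tailed input and an exponential distribution as the well-behaved comparison; both choices, the seed, and the helper names are made up for illustration and have nothing to do with the Netflix data or the paper's methods.

```python
import numpy as np

def running_mean(samples):
    """Mean of the first k samples, for every k from 1 to len(samples)."""
    samples = np.asarray(samples, dtype=float)
    return np.cumsum(samples) / np.arange(1, samples.size + 1)

def largest_late_jump(samples):
    """Largest single-step change in the running mean over the second half of the data."""
    means = running_mean(samples)
    half = means.size // 2
    return float(np.max(np.abs(np.diff(means[half:]))))

def compare_tails(n=100_000, alpha=1.1, seed=0):
    """Late-stage jump size for light-tailed vs heavy-tailed synthetic samples."""
    rng = np.random.default_rng(seed)
    light = rng.exponential(scale=1.0, size=n)  # all moments exist
    # Generator.pareto draws from the Lomax form; adding 1 gives a classical
    # Pareto with minimum 1. For alpha <= 2 its variance is infinite.
    heavy = 1.0 + rng.pareto(alpha, size=n)
    return largest_late_jump(light), largest_late_jump(heavy)

if __name__ == "__main__":
    light_jump, heavy_jump = compare_tails()
    print(f"largest late jump, exponential: {light_jump:.6f}")
    print(f"largest late jump, Pareto:      {heavy_jump:.6f}")
```

Even after tens of thousands of samples the Pareto running mean can still lurch when one extreme draw arrives, while the exponential one has long since settled. Changing the seed or `alpha` changes how dramatic the effect is.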

    I hope a more educated person clarifies this, I am just an engineer.

  3. Re:If I understand the Pareto distribution correct by ZombieWomble · · Score: 5, Informative
    What you posted describes the Pareto distribution, yes. However, the Pareto distribution is exactly the opposite of what the "Long Tail" model suggested by Anderson describes.

    The crux of Anderson's argument is that, while Amazon et al. have the same demand for big-name titles, their tail is longer and higher than a traditional bookstore's. By defining a cut at a certain point (say, those with less than 5% of the peak sales, those outside the top 10%, or whatever is appropriate), it can be seen that the low-volume sales represent a larger fraction of the total sales due to the extreme length of the tail.

    Quoting from the Wikipedia article on the topic:

    In the graph shown above, Amazon's book sales or Netflix's movie rentals would be represented along the vertical axis, while the book or movie ranks are along the horizontal axis. The total volume of low popularity items exceeds the volume of high popularity items.

    Anderson was suggesting that, in the limit of infinite items to sell and negligible stocking costs, much more profit is to be derived from the large number of items that sell a few copies than the few items that sell many copies.

    Indeed, it went further than that, suggesting that as people got used to having more choice, they would begin to shun the "popular" items in favour of more obscure titles, further fattening the tail. But that's even more speculative and somewhat independent of the other economic predictions.
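Whether the tail ever outweighs the head "in the limit of infinite items" depends on how quickly sales fall off with rank. A minimal sketch, assuming sales at rank r are proportional to r**-s; the exponents 1 and 2, the head size of 100, and the helper names are assumptions picked for illustration, not values taken from Anderson or from the paper.

```python
from math import fsum

def generalized_harmonic(n, s):
    """Sum of r**-s for r = 1..n: total sales weight of the top n ranks."""
    return fsum(r ** -s for r in range(1, n + 1))

def tail_share(n_titles, head_size, s):
    """Fraction of total sales that falls outside the top `head_size` titles."""
    head = generalized_harmonic(min(head_size, n_titles), s)
    total = generalized_harmonic(n_titles, s)
    return 1.0 - head / total

if __name__ == "__main__":
    for s in (1.0, 2.0):
        for n in (1_000, 100_000):
            share = tail_share(n, head_size=100, s=s)
            print(f"s={s}, catalog={n:>7}: tail share={share:.4f}")
```

With s = 1 the total keeps growing as titles are added, so the tail share creeps upward and eventually passes one half. With s = 2 the total converges, and the tail stays a small fraction however long the catalog gets. The shape of the curve, not just its length, decides whether the tail matters.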

  4. Re:This is completely moronic by Dogtanian · · Score: 4, Funny

    Of course not. You must have it. Now that you have Dewar's represented, should you also add Dickface Brand for half the price?

    You have *no* idea what you're talking about.

    Everyone knows that Dickface Brand is a bourbon, not a scotch.

    --
    "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
  5. They redefined the terms and broke the model by petes_PoV · · Score: 4, Interesting
    Since these guys aren't using the same definitions that the original book used to describe hits, niches and long-tails it's really no surprise that they get different results - they've interpreted the data in different ways!

    The thing that's always struck me about the long-tail effect is that you've got to work it to get value from it. Just having all the books or films by a particular author / actor isn't enough. You have to use that information and have the intelligent algorithms to guide your website visitors (or maybe "entice" would be a better word) to consider those alternate products. Just saying "Uuh, here's all the other stuff that guy's done" isn't enough; it needs enthusiasm and some knowledge of *why* a visitor might like a particular past work. That's where the gold lies: not in the long tail itself, but in how you utilise it.

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons