"Long Tail Effect" Doesn't Work As Advertised, Say Wharton Researchers
Death Metal writes "In a working paper titled, 'Is Tom Cruise Threatened? Using Netflix Prize Data to Examine the Long Tail of Electronic Commerce,' Wharton Operations and Information Management professor Serguei Netessine and doctoral student Tom F. Tan pull information from the movie rental company Netflix to explore consumer demand for smash hits and lesser-known films. Netflix made its data available as part of a $1 million prize competition to encourage the development of new ways that will improve its ability to introduce customers to lesser-known titles they might find appealing." In short, the researchers say that the Long Tail effect described by Chris Anderson is much less important in the real world than popularly held. Says the article: "The key difference between the opinion of [Anderson's] book and the study by Wharton researchers is how they define 'hits' and 'niches.' In the book, Anderson focuses on the definition of hits in absolute terms such as the top 10 or top 1,000 products, while Netessine and Tan argue that, to take growing product variety into account, one has to define popularity in relative terms, such as the top 1% or top 10% of products, to properly assess the presence or absence of the Long Tail."
The long tail doesn't threaten those at the top any more than it isolates those at the bottom. It only describes the shape of the market which necessarily has only a few specific market products which are used by the majority and the rest of the products with very few customers in the "long tail". It's a market definition, not a competition definition.
You can cut the tail off of a gecko at any point, but it doesn't mean that somehow the tail can exist without a fat end and a thin end. Since the tail is simply the appendage attached to the abdomen, wherever it is attached defines its fat end, and where it ends is the thin end. Even if you cut the tail off completely, all that you've done is stimulated the tail regrowth reflex.
If you add an insignificant product to the end of the tail, it obviously increases the proportion of market share of the first X% of products. That's simple math!* If your model of the "long tail" completely fails in the simple case of adding a once-purchased product, maybe your model sucks and Chris Anderson's model was more useful. * Yeah, there's a small requirement of proportional market share of the Xth percentile product vs the insignificant one, but no need to nitpick that.
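A quick numeric check of the point above, as a minimal sketch. The sales figures are made up for illustration and `top_share` is a hypothetical helper, not anything taken from the paper. Note that the "top 20%" covers more titles as the catalog grows, which is where the effect comes from.

```python
def top_share(sales, fraction):
    """Share of total sales captured by the top `fraction` of products."""
    ranked = sorted(sales, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of products counted as "hits"
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical catalog: 10 products with steadily declining sales.
catalog = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
before = top_share(catalog, 0.20)

# Add ten once-purchased products to the end of the tail.
longer = catalog + [1] * 10
after = top_share(longer, 0.20)

print(f"top 20% share before: {before:.3f}, after: {after:.3f}")
```

With these invented numbers the top 20% goes from two titles to four, so its share rises even though nothing about the demand for the original ten titles changed.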
OK, I am not a mathematician, but this paper makes me deeply skeptical.
If the input data is indeed heavy-tailed (higher moments do not exist) or quasi-heavy-tailed (higher moments exist but are extremely large), how on earth can they use variance, R^2 and other such measures? They may not even exist! And if the input is quasi-heavy-tailed, then of course they exist, but the convergence time could be arbitrarily long!
I have had the displeasure of working with quasi-heavy-tailed data, and it is really enlightening. You watch the evolution of some metric (e.g. the average) as a function of the incoming data, and of course you see convergence. At least for a while. Then all of a sudden an extreme outlier comes in, and the average takes a huge jump! If your input is heavy-tailed enough, you can never be sure whether your measure has finally come to rest (converged) or the next jump is just around the corner!
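The jumping average is easy to reproduce. A minimal sketch, assuming Pareto-distributed samples with a tail index of 1.1 (so the mean exists but the variance does not); the distribution, the index, and the helper name `running_mean` are my assumptions for illustration, not properties of the Netflix data.

```python
import random

def running_mean(samples):
    """Mean of the first n samples, for every n."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

rng = random.Random(0)          # fixed seed so the run is repeatable
alpha = 1.1                     # assumed tail index: finite mean, infinite variance
samples = [rng.paretovariate(alpha) for _ in range(100_000)]
means = running_mean(samples)

# Largest single-step change in the running mean after the first 1,000 samples.
jumps = [abs(b - a) for a, b in zip(means[1000:], means[1001:])]
print("running mean at n=1000:  ", round(means[999], 3))
print("running mean at n=100000:", round(means[-1], 3))
print("largest late jump:       ", round(max(jumps), 3))
```

Changing the seed or the sample count shows how unstable the "converged" value is for a tail this heavy.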
I hope a more educated person clarifies this, I am just an engineer.
How does this affect me?
I just want to thank the submitter/editor for providing the link to Wikipedia for those of us who don't know what's meant by the Long Tail. As it happens, I do know what the "long tail" is, but one of the more tiring aspects of Slashdot is the number of narrow articles that hit the front page that wholly lack any sort of description.
The road to tyranny has always been paved with claims of necessity.
Then the 80/20-rule is just a good rule of thumb.
If we have a simple hyperbolic distribution (which is a special case of Pareto), then adding more elements to the set and waiting for the distribution to renormalize as hyperbolic increases the relative weight of the top 20%. So if you have a big online retailer like Amazon with more titles than a conventional bookstore, then you can expect the top 20% of sellers on Amazon to generate a bigger part of Amazon's total sales than the top 20% of a bookstore's titles generate of that bookstore's total sales.
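To put rough numbers on that, here is a minimal sketch assuming demand at rank k is proportional to 1/k. The catalog sizes and the helper name `top_fraction_share` are arbitrary choices for illustration.

```python
def top_fraction_share(n_items, fraction=0.20):
    """Share of total demand held by the top `fraction` of items when
    the item at rank k has demand proportional to 1/k (assumed model)."""
    weights = [1.0 / k for k in range(1, n_items + 1)]
    top_k = round(n_items * fraction)
    return sum(weights[:top_k]) / sum(weights)

# Small store, large store, very large online catalog (hypothetical sizes).
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} titles: top 20% share = {top_fraction_share(n):.3f}")
```

Under this assumed model the share of the top 20% keeps climbing as the catalog grows, because "20%" covers ever more titles while the added titles each contribute very little.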
The crux of Anderson's argument is that, while Amazon et al. have the same demand for big-name titles, their tail is longer and higher than a traditional bookstore's, and by defining a cut at a certain point (say, those with less than 5% of the peak sales, those outside the top 10%, or whatever is appropriate) it can be seen that the low-volume sales represent a larger fraction of the total sales due to the extreme length of the tail.
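For comparison, here is what an absolute cut looks like, as a minimal sketch that again assumes demand proportional to 1/rank. The cut at the top 1,000 titles and the catalog sizes are illustrative choices only, and `share_outside_top` is a hypothetical helper.

```python
def share_outside_top(n_items, top_k=1000):
    """Share of total demand from items ranked below the top `top_k`,
    assuming demand at rank k is proportional to 1/k."""
    weights = [1.0 / k for k in range(1, n_items + 1)]
    return sum(weights[top_k:]) / sum(weights)

# The "hits" are always the same 1,000 titles; only the catalog grows.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} titles: share outside top 1,000 = {share_outside_top(n):.3f}")
```

With a fixed absolute cut the tail's share grows with catalog size under this model, which is the absolute-versus-relative distinction the article summary describes.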
Quoting from the Wikipedia article on the topic:
In the graph shown above, Amazon's book sales or Netflix's movie rentals would be represented along the vertical axis, while the book or movie ranks are along the horizontal axis. The total volume of low popularity items exceeds the volume of high popularity items.
Anderson was suggesting that, in the limit of infinite items to sell and negligible stocking costs, much more profit is to be derived from the large number of items that sell a few copies than from the few items that sell many copies.
Indeed, it went further than that, suggesting that as people got used to having more choice, they would begin to shun the "popular" items in favour of more obscure titles, further fattening the tail. But that's even more speculative and somewhat independent of the other economic predictions.
It's Zipfian.
I'm picturing the demand curve as an exponential, shifted so that it intercepts both the x and y axes. There's a lot of demand for the most popular items, and declining demand for less and less popular ones. By definition, of course, but the shape of the curve matters. No matter how far out you go, there's always somebody who'll want it (given a large enough population).
For a traditional bookstore, the x axis hits the curve pretty high. There's a substantial cost to stock each book, say $2.00/year. There's also a fairly small local demand, say 200 copies a week for a John Grisham novel. Only a few thousand titles sell fast enough to make a profit before that $2.00/year eats up the sale price minus wholesale price.
For a mail-order/online bookstore, the cost to stock each book is lower since you only need a warehouse instead of reading stacks + comfy chairs + cashiers + parking. The cost to stock each book could drop to $0.50/year. The demand is now national, so that same John Grisham novel sells 20,000 copies a week. And a title that sold once a year in a traditional store now sells twice a week. So, many more titles can beat the clock and turn a profit.
The shape of demand didn't change. In both cases it's an exponential cut off at the point of profitability. But that point is now much farther out along the x axis. So the online retailer can make money selling stuff that would never survive in a traditional store. And customers can find stuff online that they'd be lucky to ever see locally.
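The cutoff argument can be sketched in a few lines. Weekly sales and stocking costs come from the figures above; the $5 margin per copy and the decay scale of 500 ranks are numbers I made up so the example runs, and `profitable_titles` is a hypothetical helper.

```python
import math

def profitable_titles(top_weekly_sales, stocking_cost_per_year,
                      margin_per_copy=5.0, decay_scale=500.0):
    """Approximate number of titles worth stocking when weekly demand at
    rank r is top_weekly_sales * exp(-r / decay_scale).

    A title pays for itself while
        margin * 52 * demand(r) > stocking cost,
    i.e. while r < decay_scale * ln(margin * 52 * top_sales / cost).
    margin_per_copy and decay_scale are assumed values.
    """
    annual_top_profit = margin_per_copy * 52 * top_weekly_sales
    if annual_top_profit <= stocking_cost_per_year:
        return 0  # not even the best seller covers its shelf cost
    return math.floor(decay_scale * math.log(annual_top_profit / stocking_cost_per_year))

store = profitable_titles(top_weekly_sales=200, stocking_cost_per_year=2.00)
online = profitable_titles(top_weekly_sales=20_000, stocking_cost_per_year=0.50)
print("traditional store:", store, "titles")
print("online retailer:  ", online, "titles")
```

Because the cutoff depends on the logarithm of demand over cost, a pure exponential gives the online retailer a longer but not enormously longer list of profitable titles; a slower-decaying curve would widen the gap.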
The thing that's always struck me about the long-tail effect is that you've got to work it to get value from it. Just having all the books or films by a particular author / actor isn't enough. You have to use that information and have intelligent algorithms to guide your website visitors (or maybe "entice" would be a better word) to consider those alternate products. Just saying "Uuh, here's all the other stuff that guy's done" isn't enough; it needs enthusiasm and some knowledge of *why* a visitor might like a particular past work. That's where the gold lies: not in the long tail itself, but in how you utilise it.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
From my own experience, I sometimes get an obscure book because I have a particular reason to get that specific title (be it for the subject or the writer). I listen to obscure music because it sits somewhere in my playlist and the player is on when I'm doing something else.
Reading a novel takes time but I find it no problem to put a book down while I prefer to watch a movie from start to finish in one go. Books on a particular subject I read when I have the time and the interest, or else use them as reference material for when I need to know something specific. Watching a movie requires my time and attention, it's something I plan rather than just listen to some music or read a chapter because I haven't got something better to do.
When I do sit down to watch a movie, I tend to stay on the safe side and try to get the highest chance of being entertained. That may well be with an obscure movie, but more often it's with a reasonably well-known title. But then, I can't remember when I last bought or rented a movie instead of downloading, so my consumption won't show up in any shop's statistics at all.
"I'm not much interested in interoperability. I want substitutability. I want to be able to throw your software out."
I thought that I would have to re-design all of my amplifiers.
All the recent brouhaha about the "long tail" doesn't merely relate to the shape of the distribution (which has been known for a long time); it's about inferences people draw from that shape.
A lot of the inferences I have seen are unwarranted, and papers like this come to the same conclusion.
Parent has it right: What you read about is that the film cost X in production costs and raked in Y in box office receipts, where Y > X. But from what I understand, no film has ever officially made a profit: it gets eaten up by all kinds of Byzantine expenses. The music industry is slightly less corrupt -- really big bands actually get a profit -- but it's incredible to read about how little a smash hit will make for the actual musicians who created and performed it.
Of course, that has nothing to do with the discussion at hand. And as others have said, that debate boils down to traditional (limited shelf space, therefore absolute numbers) versus online (unlimited shelf space, therefore percentages) sales models.
In the book, Anderson focuses on the definition of hits in absolute terms such as the top 10 or top 1,000 products, while Netessine and Tan argue that, to take growing product variety into account, one has to define popularity in relative terms, such as the top 1% or top 10% of products, to properly assess the presence or absence of the Long Tail.
So let me see if I get this:
Anderson says, "Assume a bell curve with a sufficiently large ordinal scale X axis. Select a sufficiently narrow segment in the center. The area under the curve not included in that segment will be greater than the area under the curve included in that segment."
Netessine and Tan respond, "However: Select a segment in the center whose width is a fixed percentage of the ordinal scale of the X axis. Select a percentage which is sufficiently large. Now the area under the curve in the included segment will be greater than the area not included."
For this, Anderson gets a best selling book, and two Wharton academics get a paper out of stating the counterpoint?!?
My brother went to Wharton. He is extremely smart. I know many of his friends from school -- they too are very intelligent folks. Wharton is not a kidding around school, it is pretty hard-core.
My brother and his friends from school mostly make stunning amounts of money. Many of them work for banks in the sorts of positions which, particularly given the recent total failure of banks at their primary wealth-creating task (risk management), might lead one to wonder if they are really responsible for enough sustainable GDP growth to justify their extraordinary compensation. To correct the first sentence in this paragraph, my brother used to make stunning amounts of money. A few years ago this very conundrum led my brother to retire, because he could not live with the disproportion between his production and his compensation. Most of his friends from school are not so infected with ethics.
Seeing this article, and the startling inconsequence of these supposed shining stars of business academia, I am inclined to agree with my brother's conclusion. It also reinforces my belief that we have, over the past 40 years, skewed the distribution of wealth too heavily toward the supposed best and brightest business thinkers, and away from all other areas of production. I wonder whether we, in chanting the mantra of ensuring that the business analysts and risk managers get fully compensated and motivated regardless of how outsized their compensation may appear to us mere mortals, have pushed the system much too far in that direction at the expense of compensation and motivation for those who are not business analysts and risk managers. I also wonder whether there may be forces at work which already influence cashflow in their direction, and whether our supposed levelling of the tax code has instead removed the normalizing force that was preventing an unhealthy portion of GDP from flowing to those with abnormally high influence on the flow of GDP -- but who, recent evidence suggests, are not really so extraordinary in their contribution to it.
Stop-Prism.org: Opt Out of Surveillance
Well put.
--My 'favorite' instances are when an article is loaded up with acronyms with no explanation. It happens everywhere, not just Slashdot, that I find myself scanning an article in reverse looking for a bloody definition --and about as often as not, never finding one. Something about that just makes me steam.
Of course I can always wiki a definition myself, but the nice thing about having one linked directly from an article description is that I can be reasonably assured that the author is using the SAME definition as the one linked.
Ask ten people to define a relatively simple noun and you'll get ten totally different definitions, half of which are so incompatible that ridiculous flame wars can erupt over different readings of the exact same sentence.
Cheers!
-FL
I thought the long tail was significantly discredited a while ago. Let's check Google. Hmm, not entirely. However, the guy who wrote it keeps coming up with new ideas which get a lot of attention and even praise before cooler heads actually think about them properly. The FT took a look at Chris Anderson's book on freemium and, well, read John Gapper and his follow-up questions.
Where do you get that 50 to 100 titles data from? I can line 50 DVD cases (spine out) on my tiny 3 foot wide desk. Surely a STORE has more shelf space than THAT!
No matter how large a catalog of content that Netflix and Amazon have, the real challenge is giving their users the tools to find the content. For movies, you can search for one or more parameters like categories, actors, awards, directors and even member favorites. Apple has tried to add some assistance for music with their "Genius" service, but each movie, song or book is like a painting - each one unique, and each can be identified using many descriptions.
For travel, you have the advantage of narrowing your choices by the general area you'll be visiting. If you're traveling to New York City, you don't have to consider any hotels that are in Los Angeles. Still, New York City has lots of hotels to choose from, and when you're looking on Expedia or Travelocity, they can show you page after page of potential places. Even when you narrow your choices, you still have to look and see where the hotels are located, and see if that fits in with the rest of your trip.
We created Where's URL to address the location problem. We show you where everything is, and give you the ability to filter by category (Lodging, Food/Drink, Attractions, etc.) and sub-category (hotel, B&B, art museum, etc.). By letting you choose places by their location, we have evened the playing field for those businesses on the tail-end of the Long Tail of Travel. When you search for hotels in New York, you can easily find the ones that are across from Central Park, or near SoHo.
There's always been a long tail in the demand curve, and it always will.
What we need are useful recommendations that guide us from the head to the hidden treasures located along the tail area.
Also, I find it very disappointing that no one (Wharton, Anderson, etc.) uses the Long Tail model proposed by Kalevi Kilkki ( http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1832/1716 ). This model has a better formal definition of the Head, Mid and Tail parts of the curve, based neither on absolute ranks nor on percentages but on how to split the (log) x-axis.
[shameless plug] I did a PhD named "Music Recommendation and Discovery in the Long Tail" http://www.iua.upf.es/~ocelma/PhD/index.html
So, I also did some boring analyses about the Long Tail in the music (recommendation) domain.