iTunes Sales Not 'Collapsing' After All
john82 writes "Earlier this month we had a report from Forrester, based on a random sampling of 2,000 credit card accounts, that purported to show that iTunes sales were crashing. Now comes another survey from Reston, VA-based ComScore, which indicates the exact opposite. ComScore's report, which is based on actual iTunes sales, shows an 84% increase during the first nine months of this year compared to the same period last year. Meanwhile the author of the Forrester report, Josh Bernoff, noted in his blog yesterday that they shouldn't be pummeled just because everyone took what he wrote and ran with it."
ComScore. With a reputation like theirs, it must be true!
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Maybe this is obvious to people and maybe this isn't, but I thought I'd just clarify this in more standard lingo.
The sample size you need for the study depends on the probability that you expect the event to occur. So for example, out of 1000 random purchases, how many do you expect to be iTunes purchases? Most people buy a lot of things on their credit cards. So my guess is that only maybe 5 out of 1000 purchases would be iTunes purchases. The rest would be clothes, gas, groceries, restaurant meals, movies, gifts, etc, etc, etc.
Let's say I'm right. If the expected value is 5 out of 1000, what are the odds that I might find 6 or 4 purchases in that sample? Well, depending on the distribution, it's not going to be that unusual. Remember, the *average* number you find will probably be about 5. If you actually look at 1000 random purchases, the actual amount you find will vary.
So you might find 4 or even 3 with a pretty high probability (don't know off the top of my head what that probability is -- especially since I don't know the distribution of the data). So you have a pretty high probability of reporting something like 0.3% of purchases are iTunes purchases, when the real value is 0.5%. That's a *huge* error.
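To put a rough number on that, here is a minimal sketch that assumes each of the 1000 purchases is independently an iTunes purchase with probability 0.5% (i.e. a binomial distribution, which is an assumption on my part, as is the made-up 5-in-1000 rate). The helper names `binom_pmf` and `binom_cdf` are just for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent trials with hit rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer hits in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Assumed (fictional) numbers from the comment above: 1000 purchases, 0.5% iTunes.
n, p = 1000, 0.005

p_exactly_5 = binom_pmf(5, n, p)   # chance of hitting the expected value exactly
p_at_most_3 = binom_cdf(3, n, p)   # chance of seeing 3 or fewer

print(f"P(exactly 5) = {p_exactly_5:.3f}")
print(f"P(3 or fewer) = {p_at_most_3:.3f}")
```

Under those assumptions the chance of seeing 3 or fewer comes out to roughly one in four, so under-reporting by 40% or more would not be a rare accident.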
But as others have said, the guys that do these studies know their stats. They don't put out crap reports by accident. They are intentionally misleading. Any reputable report that is based on statistical analysis will give you the error bars (i.e. + or - 5% 9 times out of 10). If this report had done this it would have said something like 65% reduction in sales +- 10% 1 time out of 10 (i.e. they aren't confident about their interval) or 65% reduction in sales +- 150% 9 times out of 10 (i.e. the error bars are totally crazy). And then it would be obvious the study was totally bogus.
Note: All numbers I've used are fictional. I took stats 20 years ago and I *really* don't remember any of the actual numbers...
Actually, the sample size of 1,000 was probably fine, or at least it would have been if they had used a truly random sample of credit cards. However, it is evident from their results that they didn't. The failure was in trying to extrapolate results from data that wasn't statistically valid.
"Seriously though, did you *really* think that a sample size of just over 1000 purchases on credit cards obtained through a back channel source is a reliable sample size for the number of iTunes purchases?... That's just high school statistics by the way..."
I'm a college professor of statistics. I don't think you can actually quote a high school statistics book which says that sample size is too small. In general, a sample size of 1,000 gives 95% confidence that your result is within +/-3% of the actual result. This is *regardless* of population size - that's how statistics work, due to the Central Limit Theorem.
http://en.wikipedia.org/wiki/Sample_size
http://en.wikipedia.org/wiki/Central_limit_theore
Now, the first thing that pops into my head is why only credit-card purchases? And even more fundamentally, why would the same people need to buy music, after they just went on a music-buying spree? I would think the opposite. That was the thing that made me skeptical of the report yesterday in the first place.
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
1000 would be borderline statistically insignificant. If you read the post, he actually admitted that out of those, he only really used 181. Less than two hundred users out of 6 million!? And he has the nerve to blame everyone else for "misreporting" his findings? Saying you had any significant findings in a pool that ridiculously small, without any research into those customers' other possible methods for purchasing, is ridiculous.
You are focusing on the number when I should have put the emphasis on how the sample was selected. How about "did you *really* think that a sample size of just over 1000 purchases on credit cards obtained through a back channel source is a reliable sample size for the number of iTunes purchases?..."
As one professor to another I am sure you also teach sampling error and experimental design, right? Additionally, it should be noted that the actual samples used for the analysis out of the total records pulled was less than 200. What does that do to power?
Visit Jonesblog and say hello.
For example, if your sample is 1000, your 95% confidence interval is 1/sqrt(1000) = +/-3%. So if your 1000 samples showed 250 occurrences, you would know that it's 95% likely that the frequency of occurrence is between 22% and 28%. So the real frequency could be anywhere between 220 and 280 occurrences per thousand. No big deal for year to year comparison purposes. Worst case, a 50% drop in sales is still measurable: one year you could've measured low (220), and the next year high (280/2 = 140), and the change is still statistically significant (outside your confidence interval).
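The arithmetic above can be sketched in a few lines. This is only an illustration: `rule_of_thumb_margin` is the 1/sqrt(n) shortcut used in the comment, and `wald_margin` is the usual normal-approximation formula z*sqrt(p(1-p)/n), added here for comparison (it is not something the comment itself uses). Both function names are hypothetical.

```python
from math import sqrt

def rule_of_thumb_margin(n):
    """95% margin of error via the 1/sqrt(n) shortcut (worst case, p near 0.5)."""
    return 1 / sqrt(n)

def wald_margin(p_hat, n, z=1.96):
    """95% margin of error from the normal approximation for a measured rate p_hat."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

n = 1000
p_hat = 250 / n                      # 250 occurrences out of 1000

moe = rule_of_thumb_margin(n)        # about 0.03, i.e. +/-3%
low, high = p_hat - moe, p_hat + moe # about 22% to 28%

print(f"rule of thumb: +/-{moe:.1%}, interval {low:.1%} to {high:.1%}")
print(f"normal approximation at p_hat={p_hat}: +/-{wald_margin(p_hat, n):.1%}")
```

The shortcut is slightly wider than the normal-approximation figure at 25%, so it errs on the cautious side for common events.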
For rare phenomena, this runs into a problem. Say the frequency of occurrence is 0.1%. You take 1000 samples and you measure 1 occurrence. The neophyte statistics student will say "Cool, I measured 1 occurrence +/- 3%, so I have 95% confidence that the actual rate of occurrences is between 0.97 per thousand and 1.03 per thousand." Unfortunately, that's wrong.
The confidence interval is based on the percentage you measured. Your confidence interval says there's a 95% chance that the actual frequency of occurrence lies between 0% and 3.1%. There is a huge, huge difference between 1 incident in a thousand and 31 incidents in a thousand, especially if you're trying to compare between two samples. One sample (year 2005) you might get 25. Next sample (year 2006) you might get 5. These are both within your confidence interval, but if you're not careful you would erroneously conclude that you have 95% confidence that sales plummeted to just 20% that of the previous year.
Put simply, if you want to accurately measure rare phenomena, your sample size has to be large enough that your confidence interval is significantly smaller than the rate at which the phenomenon occurs. If iTunes sales account for 0.1% of all credit card sales (which I think is a very high estimate) and you want to compare year to year changes, you probably want an accuracy of at least 1/10th the 0.1%, or a margin of error of +/- 0.01%. Your sample size needs to be large enough that your confidence interval is around the 0.01% range. That is, you need a sample size of 100 million credit card transactions.
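For anyone who wants to check the 100 million figure, here is a minimal sketch that just inverts the 1/sqrt(n) shortcut used above. The second function is an addition for comparison, not part of the argument above: it inverts the normal-approximation formula with the assumed 0.1% rate plugged in. That gives a much smaller number, though the normal approximation is itself shaky for events this rare, so treat it as a rough lower bound rather than a recommendation. Function names are hypothetical.

```python
def sample_size_rule_of_thumb(margin):
    """Invert margin = 1/sqrt(n): the worst-case (p near 0.5) sample size."""
    return round(1 / margin**2)

def sample_size_normal_approx(p, margin, z=1.96):
    """Invert margin = z*sqrt(p(1-p)/n) for an assumed true rate p."""
    return round(z**2 * p * (1 - p) / margin**2)

rate = 0.001          # assumed: iTunes is 0.1% of card transactions
margin = rate / 10    # target accuracy: +/-0.01%

n_rule = sample_size_rule_of_thumb(margin)
n_normal = sample_size_normal_approx(rate, margin)

print(f"rule of thumb:        {n_rule:,} transactions")
print(f"normal approximation: {n_normal:,} transactions")
```

Either way, the required sample is orders of magnitude larger than a thousand purchases.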