Slashdot Mirror


How 136 People Became 7 Million Illegal File-Sharers

Barence writes "The British government's official figures on the level of illegal file sharing in the UK come from questionable research commissioned by the music industry. The Radio 4 show named More or Less examined the government's claim that 7m people in Britain are engaged in illegal file sharing. The 7m figure actually came from a report written about music industry losses for Forrester subsidiary Jupiter Research. The report was privately commissioned by none other than the UK's music trade body, the BPI. The 7m figure had been rounded up from an actual figure of 6.7m, gleaned from a 2008 survey of 1,176 net-connected households, 11.6% of which admitted to having used file-sharing software — in other words, only 136 people. That 11.6% was adjusted upwards to 16.3% 'to reflect the assumption that fewer people admit to file sharing than actually do it.' The 6.7m figure was then calculated based on an estimated number of internet users that disagreed with the government's own estimate. The wholly unsubstantiated 7m figure was then released as an official statistic."

26 of 313 comments

  1. Story meaning? by sopssa · · Score: 5, Interesting

    I actually had several feelings about this summary, because:

    1) Usually pro-filesharers try to make it sound like filesharing is a usual activity and try to go for most or 70-90% user share
    2) The summary tries to paint this study bad because it "downsides" the amount of filesharers
    3) The rant about examining only 1,176 people for the study, when TV viewer statistics and other studies are made in the same way.

    So could someone please explain *why* is it a questionable research. It is like every other study where you study a small number of people and make estimates based on it to reflect the whole population. Usually this number of people also gives somewhat correct results for the whole population. There's some error margin, but it's close enough.

    So what is the point of this story? That statistics researchers use only a minor subset of people to do their research instead of asking everyone? They always have.

    1. Re:Story meaning? by Lemmy+Caution · · Score: 5, Insightful

      Because statistics are hard and outrage is easy.

    2. Re:Story meaning? by bertoelcon · · Score: 4, Interesting

      I think this could be summarized under lies, damn lies, and statistics.

      --
      Anything can be found funny, from a certain point of view.
    3. Re:Story meaning? by wizardforce · · Score: 4, Informative

      So could someone please explain *why* is it a questionable research.

      1. the sample size is small.. probably too small to make the claims they did. 2. they altered the numbers on an estimate of how many people fileshare on the assumption that the number was under-reported. 3. conflict of interest... it's like the tobacco industry sponsoring studies claiming that smoking doesn't have anything to do with lung cancer... there is significant reason to believe that the study carries significant bias in favor of their conclusion and must at the least be repeated by other sources.

      So what is the point of this story? That statistics researchers use only a minor subset of people to do their research instead of asking everyone? They always have.

      No, real statistics researchers know that this study has numerous crippling flaws and should not be held as gospel by anyone. Even a first year stats student can see it. The reason this story is important is that it may influence governmental policy and it's flawed... That's dangerous.

      --
      Sigs are too short to say anything truly profound so read the above post instead.
    4. Re:Story meaning? by caffeinemessiah · · Score: 5, Insightful
      Argh, where to begin?

      The summary tries to paint this study bad because it "downsides" the amount of filesharers

      I presume by "downsides" you mean "reduces"? Well the summary says "That 11.6% was adjusted upwards to 16.3% 'to reflect the assumption that fewer people admit to file sharing than actually do it.'" So they actually UPPED the number of filesharers. This is objection #1 to "good research":
      1. You do a survey to objectively measure the support of your hypothesis
      2. The survey of a tiny sample indicates that filesharers are a pretty low percentage
      3. You "adjust" this number -- otherwise known as "fudging the data" -- to better reflect your own hypothesis.

      The same tactics in any scientific endeavor would get your papers retracted, your funding canceled, some sort of disciplinary action initiated, etc.

      The second objection, and this applies to other studies too that try to make grand claims from small samples, is that it's A SMALL SAMPLE. For your survey to be representative, your sample has to be representative. It's also difficult to choose people independently at random, and without that assumption, all your basic statistics fall apart. Perhaps they went through a list of BT subscribers and pulled names at random -- but what if downloaders are overrepresented amongst BT subscribers? What if they only polled home internet users, but then used the "total number of internet users" -- which includes corporate subscribers -- to come up with their 11mil number? There are other possible, non-numerical issues too. What if the respondents confused downloading from BitTorrent with downloading from iTunes?

      If you want many other examples of "bad science", read Ben Goldacre's blog

      --
      An old-timer with old-timey ideas.
    5. Re:Story meaning? by Trepidity · · Score: 4, Informative

      It doesn't really make sense to claim "sample size is small" for a 1,100-person sample. If the sampling was done in a random, unbiased manner, that size sample gives a margin of error of +/- 3%. If there are flaws in the sampling method, that's another thing, but the sample size alone doesn't seem problematic, unless you need accuracy better than +/- 3%.

    6. Re:Story meaning? by wizardforce · · Score: 4, Interesting

      Oh I forgot to note this... anyway, in addition to other potential flaws TFA says

      11.6% of which admitted to having used file-sharing software

      emphasis mine. They admitted to using file sharing software not pirating goods via said software... The study is effectively making the assumption that filesharing = copyright infringement. Also from TFA:

      The 6.7m figure was then calculated based on the estimated number of people with internet access in the UK. However, Jupiter research was working on the assumption that there were 40m people online in the UK in 2008, whereas the Government's own Office of National Statistics claimed there were only 33.9m people online during that year.

      Even if the study did get the sample size correct, the conclusion would still be nearly 30% wrong owing to their false assumption of the number of people with net access. Neglecting the distinction between filesharing and copyright infringement, TFA estimates that the actual number is between ~30 and ~50% lower than the study claims.
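The effect of the population base can be checked with a few lines of arithmetic. This is only a rough sketch using the figures quoted in the summary and TFA; the function name is made up for illustration, and it glosses over the fact that the survey counted households while the bases count people.

```python
# Back-of-the-envelope check using only numbers quoted in the summary and TFA.
# estimated_sharers is an illustrative helper, not anything from the study.

def estimated_sharers(share: float, online_population: float) -> float:
    """Scale a survey proportion up to a head count."""
    return share * online_population

ADMITTED = 0.116        # share of surveyed households admitting use
ADJUSTED = 0.163        # share after the study's upward adjustment
JUPITER_BASE = 40.0e6   # Jupiter's assumed UK online population (2008)
ONS_BASE = 33.9e6       # ONS figure quoted in TFA

for label, share in (("admitted", ADMITTED), ("adjusted", ADJUSTED)):
    for base_name, base in (("Jupiter", JUPITER_BASE), ("ONS", ONS_BASE)):
        millions = estimated_sharers(share, base) / 1e6
        print(f"{label:8s} share, {base_name:7s} base: {millions:.2f}m")
```

Under these assumptions, swapping the base and dropping the adjustment each move the head count by well over a million people, which is why the choice of inputs matters more than the sample size.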

      --
      Sigs are too short to say anything truly profound so read the above post instead.
    7. Re:Story meaning? by Anonymous Coward · · Score: 5, Insightful

      1. the sample size is small.. probably too small to make the claims they did.

      First statistics lesson I ever had, first thing the professor did was make an estimate based on 10 people about the whole population. He was correct, by the way. He went on to rant that anything that uses large amounts of people (by which he meant more than at most a few dozen) was not proper statistics. If you simply count everybody, it should be called "counting", you see, not statistics.

      2. they altered the numbers on an estimate of how many people fileshare on the assumption that the number was under-reported.

      And since they are right that the number turned out to be bigger in other studies, slightly. It seems a reasonable adaptation. It's easy to say it's unreasonable, of course. But they are absolutely correct that the number is most likely smaller. So how much should they adjust it ? Like I said, it seems a reasonable adjustment. Not absurdly high, not absurdly low.

      3. conflict of interest... it's like the tobacco industry sponsoring studies claiming that smoking doesn't have anything to do with lung cancer... there is significant reason to believe that the study carries significant bias in favor of their conclusion and must at the least be repeated by other sources.

      There don't exist studies that have no bias. Either research is funded by companies, or it's funded by government. Both have serious axes to grind, mostly pertaining to political ideology. If business interest groups would not fund research we'd never have even the semblance of unbiased research that we have.

      By the way, who should pay for studies ? Obviously the government has a vested interest in more legislation. The IFPI (us dept) has a vested interest in creating legal instruments to counteract filesharing. And the filesharers have a vested interest in more "privacy", and legal instruments against ISPs (for the same reason a thief wants privacy, obviously, let's please not start the "what about those who only share openbsd", we all know that's not the filesharers being talked about).

      How about we do the sane thing, and let all of them fund studies. Then read them all, and see what we believe to be true.

      Just because people are biased, by the way, does not mean the truth can be biased. We are simply limited to imperfect instruments for reading the truth. Truth is absolute, and the number of filesharers is just a single number, not 2, not 5. And yes, we'll probably need a better definition and classification than "filesharers". The effects of filesharing are negative for artists (certainly for pop artists), and especially for the "music industry". There can be little doubt about that. How much damage is done, is anyone's guess. But by criticizing their observations, AND listening to them criticize our observations, we can hope to get closer to the real truth.

    8. Re:Story meaning? by Volante3192 · · Score: 5, Insightful

      And since they are right that the number turned out to be bigger in other studies, slightly. It seems a reasonable adaptation. It's easy to say it's unreasonable, of course. But they are absolutely correct that the number is most likely smaller. So how much should they adjust it ? Like I said, it seems a reasonable adjustment. Not absurdly high, not absurdly low.

      Here's where I find a major problem. You do not fudge your data. Period. These other studies may show higher numbers, but do we have proof they weren't fudged as well?

      There are too many stories about companies performing pharmaceutical trials and then throwing the data away because it didn't present a positive light.

      If you're going to adjust numbers, you better have a damn sound reasoning for it rather than "we have a hunch people lied, so..."

    9. Re:Story meaning? by Anonymous Coward · · Score: 4, Informative

      A margin of error of +/- 3% is the maximum margin of error for a random sample of 1100 drawn from a large enough population at the 95% confidence level (actually it's +/-2.95%), i.e. this is the margin of error when the observed % is 50%. The margin of error is less when the observed % approaches 0 or 100%.

      In the case of an observed % of 11.6, the margin of error is +/-1.9%, so it is 95% likely that the population figure is between 9.8% and 13.5%.
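The two margins quoted above follow from the usual normal approximation for a sample proportion. A minimal sketch, assuming a simple random sample and a population large enough to ignore any finite-population correction (the function name is illustrative):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    p is the observed proportion, n the sample size, and z the critical
    value (1.96 corresponds to the 95% level).
    """
    return z * math.sqrt(p * (1.0 - p) / n)

worst_case = margin_of_error(0.5, 1100)     # widest possible interval
observed = margin_of_error(0.116, 1100)     # at the reported 11.6%

print(f"worst case: +/-{worst_case:.2%}")
print(f"at 11.6%:   +/-{observed:.2%}")
```

The margin shrinks as the observed proportion moves away from 50% because p * (1 - p) is largest at p = 0.5.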

    10. Re:Story meaning? by Atario · · Score: 5, Informative

      it's A SMALL SAMPLE

      No, it's not.

      http://www.raosoft.com/samplesize.html

      About 60 million people in the UK, sample size of 1,176, confidence level of 96% gives a margin of error of 2.99%. So, it's 96% likely that they got within 2.99% of the right answer (to the question of how many people admit to it).
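The calculator figure can be reproduced roughly as follows. This is a sketch of the standard worst-case formula with a finite-population correction, not the linked site's actual code; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided critical value, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)

def worst_case_margin(n: int, population: int, level: float) -> float:
    """Margin of error at p = 0.5, with finite-population correction."""
    z = z_for_confidence(level)
    standard_error = math.sqrt(0.25 / n)
    # The correction is negligible when the sample is a tiny fraction
    # of the population, as it is here.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

print(f"{worst_case_margin(1176, 60_000_000, 0.96):.2%}")
```

Note that the population size barely matters: the same sample gives almost the same margin for 60 million people as for 600 million.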

      I hate seeing this "that's too small a sample size" objection to every single study, from people who clearly don't know enough about how sample sizes work.

      --
      "A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt
    11. Re:Story meaning? by Anonymous Coward · · Score: 4, Insightful

      The study is effectively making the assumption that filesharing = copyright infringement.

      I have a very hard time believing that the vast majority of people that use any filesharing application do so exclusively for legit and non-copyright infringing purposes.
      Given the vast quantity of content, I seriously doubt that very many people go through any sort of hassle to determine what is legit and what is not, which results in virtually everyone obtaining material that is copyrighted, regardless whether they know (or care). Given that, I think it's a fair guess on their part that yes, most people that claim they are using file-sharing software do so to obtain material illegally.

      I just don't understand the stance that most people on this board seem to take regarding this issue. How can everyone be so supportive of what very obviously amounts to theft? It appears to me that somehow people think it is their "right" to obtain copyrighted material for free. I just don't buy for a second that people who claim to only use file-sharing apps for legitimate purposes only actually do so.

      If you do indeed use all file-sharing applications for 100% legit purposes, please educate me what you use these services for that makes them so very essential to cause these very emotional posts here.

    12. Re:Story meaning? by hot+soldering+iron · · Score: 4, Insightful

      If they're going to "adjust" the numbers, why did they even bother doing the research at all? Why not just come out and say,"We didn't like what the numbers said, so we threw them away and we're making a WAG with some bullshit we're pulling out of our ass." I understand that they're a research (read "marketing") company, and so are constitutionally incapable of telling the plain truth because they could burst into flames, but it would be a new experience. And fun to watch!

      --
      When you want something built, come see me. If you want correct grammar and spelling, get a F*ing liberal arts student.
    13. Re:Story meaning? by Raistlin77 · · Score: 4, Insightful

      most of the anger is directed toward the music/movie industry's response to piracy- weaken/destroy fair use, demonize all p2p [possibly restricting its use in the future out of fear] suing people as a scare tactic, excessive/un-constitutional fines, DRMed media etc...

      ...I don't see why these tactics are unreasonable...

      So, just so that you can protect your "copyrighted content" from being stolen by someone other than me, you believe that it is "reasonable" to use bogus or flawed "research" to fool the government into a) taking away my legal rights (fair use); b) criminalizing software that can be and is used for legal purposes (P2P); c) abuse our legal system (suing people as scare tactic/impose excessive/unconstitutional fines); and d) crippling your "copyrighted content" so that I cannot exercise my right of fair use after I have purchased your "copyrighted content" (DRM/refer back to a) )?

      It is even more difficult to attach a value to the legitimate uses of file-sharing networks, but if you can point me at examples of how file-sharing systems have a positive economic impact on anyone, please let me know.

      Really? So you don't see value in a content provider being able to reduce operating expenses by distributing their content via P2P? Just because you are too lazy to do a simple search using any common search engine doesn't mean such examples don't exist. And why exactly does it have to have a positive economic impact on anyone - why does it have to have any economic impact at all? There are many things that have neither a positive economic impact nor any economic impact whatsoever, should those be illegal too?

    14. Re:Story meaning? by blueg3 · · Score: 4, Insightful

      Just as easily as a random sample can accurately reflect a population as a whole, it can equally be skewed to be a completely inaccurate representation of the real world.

      If by "just as easily" you mean "with an enormously lower probability", then yes. But then, that's what a statement of margin of error says.

      Statistics isn't all that complicated, and what a statistical measure means can be both demonstrated and proven. You don't need to get all faux existential about how "it's all just a bunch of crap, man". You don't know what you're talking about.

      Also, entropy? No such thing as random? Really? Don't inject physical phenomena you clearly don't understand in a discussion about pure mathematics.

  2. What's the confidence interval? by Anonymous Coward · · Score: 4, Insightful

    Whenever you estimate a statistic like that, you should also indicate the level of uncertainty surrounding the estimate. Why are they not reporting the upper and lower bounds of the confidence interval surrounding that estimate?

  3. Meaningless admission by Trevin · · Score: 5, Insightful

    Using file-sharing software does not equate to sharing files illegally. I admit to using BitTorrent to download Fedora ISOs, and there's nothing illegal about that.

    1. Re:Meaningless admission by Blakey+Rat · · Score: 4, Funny

      I asked the British government, but unfortunately they told me you don't actually exist. Sorry.

  4. the story title is kind of lame by Trepidity · · Score: 5, Informative

    Some of the estimation steps might be sketchy, but the basic practice of estimating a population proportion from a sample of that population is not particularly questionable. That's how almost all studies of populations work, because taking censuses of all people in a country is rarely feasible. We have century-old statistical theory on how to put bounds on the sampling error, too, assuming the sample was indeed random.

    You could have a whole slew of these stories if you really objected to that basic methodology, e.g. nearly every estimate of N million people suffering from a disease or disorder is based on a sample.

    1. Re:the story title is kind of lame by Abreu · · Score: 4, Insightful

      Is it ok to change "11.6%" to "16.3%" based on a "hunch"?

      I'm not a statistician, this is an honest question

      --
      No sig for the moment.
    2. Re:the story title is kind of lame by caffeinemessiah · · Score: 5, Insightful

      Is it ok to change "11.6%" to "16.3%" based on a "hunch"? I'm not a statistician, this is an honest question

      IAAS, and the answer is no. That goes for the GP as well -- no one is contesting estimation theory, just that the fundamental assumptions are so grossly unmet in this "study" as to render it meaningless. And as someone else already commented, it's dangerous here because it's going to dictate public policy.

      If you're going to "adjust" your objective findings, based on some bizarre assumption that a certain percentage of people will lie about file sharing, then why do a survey at all if not to create mathematical/sciency-sounding smoke and mirrors?

      --
      An old-timer with old-timey ideas.
  5. mathematics by martas · · Score: 5, Funny

    maybe the authors of the study were taught math skills through unschooling?

  6. Why the BBC rocks by MosesJones · · Score: 5, Insightful

    This is yet another example as to why the BBC is the finest broadcasting and journalistic organisation on the planet (I've never worked for them, sold to them or have any other financial connection other than the license fee).

    They actually investigated something created by an industry group and found it to be bollocks and then reported it. The BBC are arguably the most "socialist" organisation in the democratic world (funded by a tax on everyone for the benefit of everyone) and yet they still question and challenge everything.

    The US seriously needs something that questions vested interests and rubbish statistics as much as the BBC. Jon Stewart and Bill Maher are just comedians and FoxNews is just comedy.

    Given a choice between the first amendment and the BBC, I'll take the BBC; it's demonstrated more freedom of speech in a week than the US media has in a decade.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  7. Re:Wait, you believed them? by mdwh2 · · Score: 5, Insightful

    Indeed, let's look at the maths - supposing each person only shares 24 mp3s. By US standards at least, that's a cost of $1.92 million. So with 7 million file sharers, that's $13.44 trillion.

    Now let's check out http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal) - wow, these 7 million people are causing damage to the UK economy equal to almost 5 times the entire GDP of the UK...
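The arithmetic above is easy to verify. In this sketch the per-song figure is simply the comment's $1.92 million divided by 24 songs, and the function name is illustrative:

```python
def total_damages(songs: int, per_song_usd: int, sharers: int) -> int:
    """Hypothetical damages if every sharer were charged per song."""
    return songs * per_song_usd * sharers

SONGS = 24
PER_SONG_USD = 1_920_000 // SONGS   # $80,000 per song, derived from the comment
SHARERS = 7_000_000                 # the disputed headline figure

per_person = SONGS * PER_SONG_USD
total = total_damages(SONGS, PER_SONG_USD, SHARERS)

print(f"per person: ${per_person:,}")
print(f"total:      ${total / 1e12:.2f} trillion")
```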

  8. Re:If it's bogus, it's probably too low. by Miamicanes · · Score: 4, Interesting

    > Work backwards from the undisputed declining sales figures of the recording industry.

    The main reason for declining sales is the fact that CD sales during the 90s were artificially boosted by people replacing records and tapes with CDs... then replacing them again when remastered CDs were released a few years later. It was a once-in-a-lifetime event for the recording industry that won't be repeated during our lifetimes.

    People re-bought CDs they already owned in analog (or optimized-for-analog CDs) because they represented an epic improvement in quality by just about any meaningful standard over the analog media they replaced. Everything that's come out since CDs has only been cheaper, shittier-sounding, or intolerably-crippled by DRM.

    Here's an idea for the music industry: ditch the DRM'ed formats, and roll out a music format on DVD media with 96KHz 32-bit stereo PCM. Make the discs gold-colored, call it something like "X-fi", and sell them for $24.95. You'll win on all counts -- genX'ers will go back into highschool mode and buy them to show off how rich they are and/or pretend they sound sufficiently better than 16-bit CDs to justify spending ~twice as much on them, and the fact that every disc will be ~4-8 gigabytes will serve as self-limiting DRM for the next decade or so. Just make sure they still have the MOST compelling consumer benefit intact (and reason why people who buy CDs still DO buy CDs): it's a flawless first-generation master to use for making all your "working" copies for everywhere else.

  9. Scoundrel Statistics by anyaristow · · Score: 5, Informative

    Even a first year stats student can see it.

    This is almost as cliche in arguments of statistics as the car analogy is on Slashdot, and it's the sign of a scoundrel. If you actually had a first year stat student's understanding of stats you'd know where the weaknesses actually are, and where all the rest of the smoke blown in this discussion goes laughably wrong.

    So let's apply some first year stats to the issue.

    First, the sample size. Whether it is numerically large enough to be useful is a matter not only of its size but also the number of positive results. IOW, a sample size of 1176 is too small if you found 3 of what you're looking for, but if you found 136 (11.6% of 1176), you have plenty of samples. The question is then only whether you had a representative sample.

    My next concern would be precision. Using data with three or four significant digits (136, 1176) to make conclusions to seven significant digits (11.56463%) is silly, but that doesn't seem to have happened here. The only number in all of this that is fishy is the 16.3% number. To get three significant digits they'd have to know the number of lying households to that precision. If they had another study that determined this number they might very well have a number to that precision, but I'm assuming they just guessed.

    That's still not a problem. If you guess, you run your confidence interval through your formulae (here it's a simple product) to put a range on your results. If it's a from-your-ass guess you might put a 100% failure estimate on your low end (i.e. there might be no lying households at all) to arrive at a conservative range. Here, it looks like they used an estimate of 40%. They should have (and might have; I didn't RTFA) run the un-adjusted 11.6% through the formulae to get a conservative low-end range.

    Anyway, the number they finally used was 7m. One significant digit. That doesn't imply the same precision as, say, 6.7m would. In fact, if their figure for the number of lying households really was accurate to one digit (i.e. 35-45%) then rounding their final result to one digit was the correct procedure. If it was just a guess they should have run the absolute low estimate (probably, zero lying households) through to get a range.
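Running the low and high guesses through the product, as described above, might look like the sketch below. The 35% and 45% bounds are the illustrative range from the comment, the 40m base is Jupiter's assumed online population from TFA, and the helper name is hypothetical.

```python
def adjusted_share(observed: float, uplift: float) -> float:
    """Apply an under-reporting uplift to an observed proportion."""
    return observed * (1.0 + uplift)

OBSERVED = 0.116          # share admitting to file-sharing software
REPORTED = 0.163          # share after the study's adjustment
ONLINE_BASE = 40.0e6      # Jupiter's assumed online population

# Uplift implied by the two published percentages (about 40%).
implied_uplift = REPORTED / OBSERVED - 1.0
print(f"implied uplift: {implied_uplift:.1%}")

# Conservative low end (nobody lied) through to the high guess.
for label, uplift in (("no under-reporting", 0.0),
                      ("35% uplift", 0.35),
                      ("45% uplift", 0.45)):
    share = adjusted_share(OBSERVED, uplift)
    millions = share * ONLINE_BASE / 1e6
    print(f"{label:18s}: {share:.1%} of users, about {millions:.1f}m")
```

Reporting the whole range rather than a single rounded figure makes the dependence on the guessed uplift visible.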

    So, with actual first year stat knowledge it's possible to actually state what might be wrong with the study, and not resort to "any first year stat student" hand-waving. It's clear that the most-cited criticism (the sample size) is the result of ignorance and groupthink, not actual knowledge of statistics.