Slashdot Mirror


How 136 People Became 7 Million Illegal File-Sharers

Barence writes "The British government's official figures on the level of illegal file sharing in the UK come from questionable research commissioned by the music industry. The Radio 4 show More or Less examined the government's claim that 7m people in Britain are engaged in illegal file sharing. The 7m figure actually came from a report on music industry losses by Forrester subsidiary Jupiter Research. The report was privately commissioned by none other than the UK's music trade body, the BPI. The 7m figure had been rounded up from an actual figure of 6.7m, derived from a 2008 survey of 1,176 net-connected households, 11.6% of which admitted to having used file-sharing software: in other words, only 136 people. That 11.6% was adjusted upwards to 16.3% 'to reflect the assumption that fewer people admit to file sharing than actually do it.' The 6.7m figure was then calculated using an estimate of the number of internet users that disagreed with the government's own estimate. The wholly unsubstantiated 7m figure was then released as an official statistic."
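
The survey arithmetic in the summary can be checked in a few lines. This is a minimal Python sketch that uses only the percentages quoted above; it does not attempt to reproduce the 6.7m figure, since the exact internet-user base Jupiter Research used is not given here.

```python
# Check the survey arithmetic quoted in the summary.
sample_households = 1176
admitted_share = 0.116   # share who admitted using file-sharing software
adjusted_share = 0.163   # share after the "under-reporting" adjustment

# 11.6% of 1,176 respondents, rounded to whole households.
admitted_count = round(sample_households * admitted_share)

# Relative size of the upward adjustment (16.3% vs 11.6%).
uplift = adjusted_share / admitted_share - 1

print(admitted_count)    # 136
print(f"{uplift:.0%}")   # 41%
```

In other words, the adjustment alone raises the estimate by roughly two-fifths before any scaling to the national population happens.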

71 of 313 comments

  1. Story meaning? by sopssa · · Score: 5, Interesting

    I had mixed feelings about this summary, because:

    1) Usually pro-filesharers try to make file sharing sound like a common activity and claim that most users, 70-90%, do it
    2) The summary tries to paint this study as bad because it "downsides" the number of filesharers
    3) The rant about surveying only 1,176 people, when TV viewing figures and other studies are produced the same way.

    So could someone please explain *why* it is questionable research? It is like every other study where you survey a small number of people and extrapolate from them to the whole population. A sample of this size usually gives reasonably accurate results for the whole population. There's some margin of error, but it's close enough.

    So what is the point of this story? That statistical researchers survey only a small subset of people instead of asking everyone? They always have.

    1. Re:Story meaning? by Lemmy+Caution · · Score: 5, Insightful

      Because statistics are hard and outrage is easy.

    2. Re:Story meaning? by bertoelcon · · Score: 4, Interesting

      I think this could be summarized under lies, damned lies, and statistics.

      --
      Anything can be found funny, from a certain point of view.
    3. Re:Story meaning? by wizardforce · · Score: 4, Informative

      So could someone please explain *why* it is questionable research?

      1. The sample size is small... probably too small to support the claims they made.
      2. They adjusted their estimate of how many people file-share upward, on the assumption that the number was under-reported.
      3. Conflict of interest... it's like the tobacco industry sponsoring studies claiming that smoking has nothing to do with lung cancer. There is good reason to believe that the study is biased in favor of its sponsor's conclusion, and at the very least it must be replicated by other sources.

      So what is the point of this story? That statistical researchers survey only a small subset of people instead of asking everyone? They always have.

      No. Real statistics researchers know that this study has numerous crippling flaws and should not be taken as gospel by anyone. Even a first-year stats student can see it. The reason this story is important is that the study may influence government policy, and it's flawed... That's dangerous.

      --
      Sigs are too short to say anything truly profound so read the above post instead.
    4. Re:Story meaning? by Loomismeister · · Score: 2, Informative

      The point isn't that they surveyed a small group of people and therefore the statistics aren't significant. If you RTFA you would see that they based the 7m number on the false figure that some 40m people were using the internet that year, when there were really only about 33.9m. They also bumped up the percentage of filesharers based on the assumption that some people lied about whether they had such programs. Really, the lesson here is to read the featured articles, because Slashdot summaries as a general rule are misleading.

    5. Re:Story meaning? by caffeinemessiah · · Score: 5, Insightful
      Argh, where to begin?

      The summary tries to paint this study as bad because it "downsides" the number of filesharers

      I presume by "downsides" you mean "reduces"? Well, the summary says "That 11.6% was adjusted upwards to 16.3% 'to reflect the assumption that fewer people admit to file sharing than actually do it.'" So they actually UPPED the number of filesharers. This is objection #1 to "good research":
      1. You do a survey to objectively measure the support of your hypothesis
      2. The survey of a tiny sample indicates that filesharers are a pretty low percentage
      3. You "adjust" this number -- otherwise known as "fudging the data" -- to better reflect your own hypothesis.

      The same tactics in any scientific endeavor would get your papers retracted, your funding canceled, some sort of disciplinary action initiated, etc.

      The second objection, and this applies to other studies too that try to make grand claims from small samples, is that it's A SMALL SAMPLE. For your survey to be representative, your sample has to be representative. It's also difficult to choose people independently at random, and without that assumption, all your basic statistics fall apart. Perhaps they went through a list of BT subscribers and pulled names at random -- but what if downloaders are overrepresented amongst BT subscribers? What if they only polled home internet users, but then used the "total number of internet users" -- which includes corporate subscribers -- to come up with their 7m number? There are other possible, non-numerical issues too. What if the respondents confused downloading from BitTorrent with downloading from iTunes?

      If you want many other examples of "bad science", read Ben Goldacre's blog

      --
      An old-timer with old-timey ideas.
    6. Re:Story meaning? by Trepidity · · Score: 4, Informative

      It doesn't really make sense to claim "sample size is small" for an 1,100-person sample. If the sampling was done in a random, unbiased manner, that size sample gives a margin of error of +/- 3%. If there are flaws in the sampling method, that's another thing, but the sample size alone doesn't seem problematic, unless you need accuracy better than +/- 3%.
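
The +/- 3% figure comes from the usual normal-approximation formula for a proportion, z * sqrt(p(1-p)/n), evaluated at the worst case p = 0.5. A minimal Python sketch, assuming a simple random sample and z = 1.96 for 95% confidence:

```python
import math

def max_margin_of_error(n: int, z: float = 1.96) -> float:
    """Worst-case (p = 0.5) margin of error for a simple random sample of size n."""
    return z * math.sqrt(0.5 * 0.5 / n)

for n in (100, 1100, 1176, 10000):
    print(n, f"+/-{max_margin_of_error(n):.2%}")
# 100 +/-9.80%
# 1100 +/-2.95%
# 1176 +/-2.86%
# 10000 +/-0.98%
```

Because the error shrinks with 1/sqrt(n), going from about 1,100 to 10,000 respondents only cuts it by roughly a factor of three.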

    7. Re:Story meaning? by wizardforce · · Score: 4, Interesting

      Oh, I forgot to note this... anyway, in addition to other potential flaws, TFA says

      11.6% of which admitted to having used file-sharing software

      Emphasis mine. They admitted to using file-sharing software, not to pirating goods via said software... The study is effectively making the assumption that filesharing = copyright infringement. Also from TFA:

      The 6.7m figure was then calculated based on the estimated number of people with internet access in the UK. However, Jupiter research was working on the assumption that there were 40m people online in the UK in 2008, whereas the Government's own Office for National Statistics claimed there were only 33.9m people online during that year.

      Even if the study got the sample size right, the conclusion would still be nearly 20% too high owing to its false assumption about the number of people with net access (40m versus 33.9m). Even without considering the distinction between filesharing and copyright infringement, TFA estimates that the actual number is between ~30% and ~50% lower than the study claims.
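
The effect of the population base is simple proportional scaling. A rough Python sketch using the two internet-user figures quoted from TFA and the two survey percentages from the summary; the products are illustrative only and are not TFA's own calculation.

```python
jupiter_base = 40.0e6  # internet users assumed by Jupiter Research (per TFA)
ons_base = 33.9e6      # Office for National Statistics figure for 2008 (per TFA)

# How much the larger base inflates any estimate built on it.
overstatement = jupiter_base / ons_base - 1
print(f"base overstated by {overstatement:.0%}")  # base overstated by 18%

# Scale the raw and the adjusted survey shares by the ONS base instead.
for share in (0.116, 0.163):
    print(f"{share:.1%} of {ons_base / 1e6:.1f}m = {share * ons_base / 1e6:.1f}m")
# 11.6% of 33.9m = 3.9m
# 16.3% of 33.9m = 5.5m
```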

      --
      Sigs are too short to say anything truly profound so read the above post instead.
    8. Re:Story meaning? by Anonymous Coward · · Score: 5, Insightful

      1. The sample size is small... probably too small to support the claims they made.

      In the first statistics lesson I ever had, the first thing the professor did was estimate something about the whole population from 10 people. He was correct, by the way. He went on to rant that anything that uses large amounts of people (by which he meant more than at most a few dozen) was not proper statistics. If you simply count everybody, it should be called "counting", you see, not statistics.

      2. They adjusted their estimate of how many people file-share upward, on the assumption that the number was under-reported.

      They were right: the number turned out to be slightly bigger in other studies, so it seems a reasonable adaptation. It's easy to say it's unreasonable, of course. But they are absolutely correct that the reported number is most likely too small. So how much should they adjust it? Like I said, it seems a reasonable adjustment. Not absurdly high, not absurdly low.

      3. Conflict of interest... it's like the tobacco industry sponsoring studies claiming that smoking has nothing to do with lung cancer. There is good reason to believe that the study is biased in favor of its sponsor's conclusion, and at the very least it must be replicated by other sources.

      There are no studies without bias. Research is funded either by companies or by government, and both have serious axes to grind, mostly pertaining to political ideology. If business interest groups did not fund research, we'd never have even the semblance of unbiased research that we have.

      By the way, who should pay for studies? Obviously the government has a vested interest in more legislation. The IFPI (US dept) has a vested interest in creating legal instruments to counteract filesharing. And the filesharers have a vested interest in more "privacy" and in legal instruments against ISPs (for the same reason a thief wants privacy, obviously; let's please not start the "what about those who only share OpenBSD" argument, we all know those aren't the filesharers being talked about).

      How about we do the sane thing, and let all of them fund studies. Then read them all, and see what we believe to be true.

      Just because people are biased, by the way, does not mean the truth can be biased. We are simply limited to imperfect instruments for reading the truth. Truth is absolute, and the number of filesharers is just a single number, not 2, not 5. And yes, we'll probably need a better definition and classification than "filesharers". The effects of filesharing are negative for artists (certainly for pop artists), and especially for the "music industry". There can be little doubt about that. How much damage is done is anyone's guess. But by criticizing their observations, AND listening to them criticize our observations, we can hope to get closer to the real truth.

    9. Re:Story meaning? by Volante3192 · · Score: 5, Insightful

      They were right: the number turned out to be slightly bigger in other studies, so it seems a reasonable adaptation. It's easy to say it's unreasonable, of course. But they are absolutely correct that the reported number is most likely too small. So how much should they adjust it? Like I said, it seems a reasonable adjustment. Not absurdly high, not absurdly low.

      Here's where I find a major problem. You do not fudge your data. Period. These other studies may show higher numbers, but do we have proof they weren't fudged as well?

      There are too many stories about companies running pharmaceutical trials and then throwing the data away because it didn't show their product in a positive light.

      If you're going to adjust numbers, you better have a damn sound reasoning for it rather than "we have a hunch people lied, so..."

    10. Re:Story meaning? by jrumney · · Score: 2

      What does an "error of 3%" mean? Does it perhaps mean there is only a 50% chance (assuming normal distribution) that the proportion of filesharers in the total population is somewhere between 8.6% and 14.6%?

    11. Re:Story meaning? by Trepidity · · Score: 3, Informative

      Basically, except that the confidence level for the interval is 95%, not 50%. I should have stated that, but 95% is the usual default.
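
What "95% confidence" means can be checked by simulation: draw many hypothetical surveys from a population whose true rate is known, build a +/- margin interval around each result, and count how often the interval contains the truth. A minimal Python sketch; the true rate of 11.6%, the sample size of 1,176 and the random seed are assumptions chosen only for illustration.

```python
import math
import random

def coverage(p_true: float, n: int, trials: int, z: float = 1.96, seed: int = 1) -> float:
    """Fraction of simulated surveys whose interval p_hat +/- margin contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated survey: n respondents, each a file-sharer with probability p_true.
        k = sum(rng.random() < p_true for _ in range(n))
        p_hat = k / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / trials

print(coverage(p_true=0.116, n=1176, trials=1000))  # typically a value close to 0.95
```

Roughly 95 intervals in 100 contain the true rate. Note that this says nothing about a biased sample: the simulation assumes every respondent is drawn at random.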

    12. Re:Story meaning? by vikstar · · Score: 2, Insightful

      Every politician should undergo a statistics examination as a prerequisite.

      --
      The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.
    13. Re:Story meaning? by Anonymous Coward · · Score: 4, Informative

      A margin of error of +/-3% is the maximum margin of error for a random sample of 1,100 drawn from a large enough population at the 95% confidence level (actually it's +/-2.95%), i.e. this is the margin of error when the observed % is 50%. The margin of error is smaller when the observed % approaches 0% or 100%.

      In the case of an observed % of 11.6, the margin of error is +/-1.9%, so it is 95% likely that the population figure is between 9.7% and 13.5%.
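
The narrower interval at 11.6% follows from plugging the observed proportion into the same formula instead of the worst case 0.5. A small Python sketch, assuming a simple random sample and z = 1.96; it also shows that the endpoints shift slightly depending on whether n is taken as 1,100 or the actual 1,176.

```python
import math

def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a sample proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

for n in (1100, 1176):
    low, high = wald_interval(0.116, n)
    print(n, f"{low:.1%} to {high:.1%}")
# 1100 9.7% to 13.5%
# 1176 9.8% to 13.4%
```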

    14. Re:Story meaning? by mysidia · · Score: 2, Funny

      In this case, 3% is >1.17 million people.

    15. Re:Story meaning? by Atario · · Score: 5, Informative

      it's A SMALL SAMPLE

      No, it's not.

      http://www.raosoft.com/samplesize.html

      About 60 million people in the UK, a sample size of 1,176 and a confidence level of 96% give a margin of error of 2.99%. So, it's 96% likely that they got within 2.99% of the right answer (to the question of how many people admit to it).
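
Online sample-size calculators generally apply the same normal-approximation formula, optionally with a finite population correction; whether the linked site does exactly this is an assumption. A Python sketch showing that for a population of 60 million the correction is negligible, and that 2.99% corresponds to 96% confidence rather than the more common 95%:

```python
import math
from statistics import NormalDist

def margin_of_error(n: int, population: int, confidence: float, p: float = 0.5) -> float:
    """Margin of error for a proportion, with the finite population correction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)          # two-sided critical value
    fpc = math.sqrt((population - n) / (population - 1))    # ~1 when population >> n
    return z * math.sqrt(p * (1 - p) / n) * fpc

print(f"{margin_of_error(1176, 60_000_000, 0.96):.2%}")  # 2.99%
print(f"{margin_of_error(1176, 60_000_000, 0.95):.2%}")  # 2.86%
```

With 60 million people the correction factor differs from 1 by only about 0.00001, which is why the sample size, not the population size, drives the margin of error.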

      I hate seeing this "that's too small a sample size" objection to every single study, from people who clearly don't know enough about how sample sizes work.

      --
      "A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt
    16. Re:Story meaning? by Anonymous Coward · · Score: 4, Insightful

      The study is effectively making the assumption that filesharing = copyright infringement.

      I have a very hard time believing that the vast majority of people who use any filesharing application do so exclusively for legit, non-copyright-infringing purposes.
      Given the vast quantity of content, I seriously doubt that very many people go through any sort of hassle to determine what is legit and what is not, which results in virtually everyone obtaining copyrighted material, regardless of whether they know (or care). Given that, I think it's a fair guess on their part that, yes, most people who claim they are using file-sharing software do so to obtain material illegally.

      I just don't understand the stance that most people on this board seem to take regarding this issue. How can everyone be so supportive of what very obviously amounts to theft? It appears to me that somehow people think it is their "right" to obtain copyrighted material for free. I just don't buy for a second that people who claim to use file-sharing apps only for legitimate purposes actually do so.

      If you do indeed use all file-sharing applications for 100% legit purposes, please educate me what you use these services for that makes them so very essential to cause these very emotional posts here.

    17. Re:Story meaning? by Runaway1956 · · Score: 2, Insightful

      Yes, it makes a difference. When the lobbyists stand in front of lawmakers, those lawmakers want to know the real size of the problem. If the industry's lobbyists have to say, "We think we are losing almost a million pounds each and every year to piracy", lawmakers are going to be mildly concerned. However, if they lie, and claim that they are losing BILLIONS of pounds, those lawmakers realize that the tax collectors are losing a huge sum of money.

      When you want action, you always exaggerate your losses and/or the government's benefit.

      I think the claim in the US is 42 billion dollars lost annually. I followed THOSE studies back once, to find where the figures came from. That number is totally unsubstantiated as well, almost entirely based on guesses, estimates, and even false assumptions. One study after another cites the previous study, and almost no one knows where that 42 billion dollar figure came from, but it's impressive, so everyone continues to quote it.

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
    18. Re:Story meaning? by vivaelamor · · Score: 2, Insightful

      Nice calculator. I think the GP's main point, though, was that there is no evidence of a properly selected sample. You would be right in saying that the sample size matters very little compared to whether the sample is biased.

    19. Re:Story meaning? by vivaelamor · · Score: 3, Insightful

      To me, the number is meaningless in itself. The fact that government agencies have been using the number is the issue. Either they knew that the number was wrong or they didn't bother checking it. Both possibilities can point to incompetence or malice and reflect very badly on the people responsible.

      You might be happy with government by made-up numbers and gut feelings, but for the rest of us this is a good example of why government gets no respect.

    20. Re:Story meaning? by wizardforce · · Score: 3, Informative

      I just don't understand the stance that most people on this board seem to take regarding this issue. How can everyone be so supportive of what very obviously amounts to theft?

      not everyone does, obviously... most reasonable slashdotters advocate for reformed copyright partly because of the unenforceable nature of longer copyright terms. many such as myself support the concept of a shorter, more reasonable copyright term that does what the constitution requires: encourage the advancement of the arts.

      If you do indeed use all file-sharing applications for 100% legit purposes, please educate me what you use these services for that makes them so very essential to cause these very emotional posts here.

      most of the anger is directed toward the music/movie industry's response to piracy- weaken/destroy fair use, demonize all p2p [possibly restricting its use in the future out of fear] suing people as a scare tactic, excessive/un-constitutional fines, DRMed media etc...

      --
      Sigs are too short to say anything truly profound so read the above post instead.
    21. Re:Story meaning? by stuartdb · · Score: 2, Interesting

      I'm currently doing a stats paper (still basic stuff). I never thought I would actually use anything from it in the real world, but meh....

      Correct me if I'm wrong, but the sample-size calculations are only correct if it is actually a true SRS (Simple Random Sample). I couldn't find a link to the actual paper in the article, but I think it is safe to assume that the sample taken was not a true SRS. It more likely resembles a self-selecting sample, and if that is the case, the calculations don't apply.
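
The point about self-selection can be illustrated by simulation: if file-sharers are more (or less) willing to answer than everyone else, the estimate is biased, and inviting more people does not remove the bias. A Python sketch; the true rate and both response rates are hypothetical numbers chosen only to show the effect.

```python
import random

def self_selected_estimate(p_true: float, respond_if_sharer: float,
                           respond_if_not: float, invited: int, seed: int = 7) -> float:
    """Share of file-sharers among people who chose to answer (hypothetical response rates)."""
    rng = random.Random(seed)
    responders = sharers = 0
    for _ in range(invited):
        is_sharer = rng.random() < p_true
        response_rate = respond_if_sharer if is_sharer else respond_if_not
        if rng.random() < response_rate:
            responders += 1
            sharers += is_sharer   # bool counts as 0 or 1
    return sharers / responders

# Hypothetical: true rate 11.6%, sharers answer 50% of the time, everyone else 30%.
for invited in (2_000, 200_000):
    print(invited, f"{self_selected_estimate(0.116, 0.5, 0.3, invited):.1%}")
```

The large run settles near 18% (0.058 / 0.3232, the value these response rates imply) rather than the true 11.6%: a bigger sample only makes the wrong answer more precise.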

      I'm not saying the data is incorrect, but without more detailed information (perhaps this information is in the actual paper) it is hard to make a conclusion. What is clear though is that the linked article is pretty much FUD.

    22. Re:Story meaning? by hot+soldering+iron · · Score: 4, Insightful

      If they're going to "adjust" the numbers, why did they even bother doing the research at all? Why not just come out and say,"We didn't like what the numbers said, so we threw them away and we're making a WAG with some bullshit we're pulling out of our ass." I understand that they're a research (read "marketing") company, and so are constitutionally incapable of telling the plain truth because they could burst into flames, but it would be a new experience. And fun to watch!

      --
      When you want something built, come see me. If you want correct grammar and spelling, get a F*ing liberal arts student.
    23. Re:Story meaning? by dlthomas · · Score: 2, Insightful

      It does not "obviously" amount to theft. It *is* illicit, and it may be immoral (see Free Rider Problem), but it is not theft. If I steal 10 M&Ms from you, you have 10 fewer M&Ms - not the case if I download your song, in which case you have less than you otherwise would have *if and only if* I would otherwise have paid for it. This clearly is not the case for, say, college students with tens of thousands of dollars "worth" of media on their hard drive.

      As for legal uses of "file-sharing" technologies, well - how about the entire world-wide web? We're sharing files...

      Specifically P2P file-sharing technologies? Linux ISOs and WoW updates, to name two common legal uses.

      Finally, I for one have an emotional reaction to assertions that technology should be restricted unless I can make you understand what it is for - and I don't even personally use any P2P software at the moment.

    24. Re:Story meaning? by Bigjeff5 · · Score: 2, Informative

      The second objection, and this applies to other studies too that try to make grand claims from small samples, is that it's A SMALL SAMPLE. For your survey to be representative, your sample has to be representative. It's also difficult to choose people independently at random, and without that assumption, all your basic statistics fall apart. Perhaps they went through a list of BT subscribers and pulled names at random -- but what if downloaders are overrepresented amongst BT subscribers?

      You don't seem to understand the way good polling and statistics work. If you already have solid data on the demographic makeup of your population, it does not take a very large sample size at all to get accurate results. A sample size of 1000+ is more than enough to come within 3% accuracy (plus or minus) for any given study provided you already have good demographic information. To be accurate with a small sample size, you do NOT want to choose your survey takers at random, at least not completely. Specifically who takes the survey is random, but where they come from, what income level they fall under, how many computers they own, etc. should not be random at all. That's how you make a small sample size representative of the population, and can therefore get accurate results.

      For example, if a census (which has a near 100% sample rate) 5 years ago told you that 75% of the population owns a computer, and 75% of computer owners use the internet, and 50% of internet users have broadband, you can get very accurate results with a sample size a fraction of a percent of the size of the total population by simply making certain that your smaller sample breakdown matches the larger survey. 100% of people surveyed should own a computer, since the survey would need to be about 33% larger to include those who don't have a computer and still get the same accuracy (accuracy would be slightly better, but almost certainly not worth the expense). 75% of those people should have internet (you could start here instead with still very high accuracy), and 50% of those people should have broadband.
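
The comment describes quota sampling: building the known demographic mix into the sample up front. A closely related technique, post-stratification, reaches the same goal after the fact by reweighting each group's result to its known population share; that is what this Python sketch does instead. The strata, counts and shares are hypothetical.

```python
def post_stratified_share(sample: dict[str, tuple[int, int]],
                          population_share: dict[str, float]) -> float:
    """Weight each stratum's file-sharing rate by that stratum's share of the population.

    sample maps stratum -> (respondents, respondents who file-share).
    """
    return sum(share * sample[stratum][1] / sample[stratum][0]
               for stratum, share in population_share.items())

# Hypothetical survey in which broadband users are over-represented (70% vs 50%).
sample = {"broadband": (700, 105), "dial-up": (300, 15)}
population_share = {"broadband": 0.5, "dial-up": 0.5}

naive = sum(s for _, s in sample.values()) / sum(r for r, _ in sample.values())
weighted = post_stratified_share(sample, population_share)
print(f"unweighted {naive:.1%}, post-stratified {weighted:.1%}")
# unweighted 12.0%, post-stratified 10.0%
```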

      A result of "10% of people share files" from a study that followed demographics and used only 1,000 people is going to be far more accurate than a survey of 10,000 people chosen completely at random. To get any kind of accuracy with pure random sampling, you would need to sample a very large percentage of the total population. This is impractical, idiotic and not very useful.

      Statistics done well are reliable, it's who's using the statistics, what they are saying about them, and what they aren't telling you about them that make statistics untrustworthy.

      It's not the statistician who is the liar, it's the lawyer, or marketer, or politician. It's their fault that 60% of statistics can be made to say whatever the hell you want them to say. That said, I don't trust any numbers given by the BPI, especially when they arbitrarily adjust them up. More than likely the number should have been adjusted up, but the 4.7-point figure seems pulled from thin air and unjustified. 2 or 3 points would be more conservative; boosting the number of filesharers by about 40% just 'cause screams of desperation.

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    25. Re:Story meaning? by Raistlin77 · · Score: 4, Insightful

      most of the anger is directed toward the music/movie industry's response to piracy- weaken/destroy fair use, demonize all p2p [possibly restricting its use in the future out of fear] suing people as a scare tactic, excessive/un-constitutional fines, DRMed media etc...

      ...I don't see why these tactics are unreasonable...

      So, just so that you can protect your "copyrighted content" from being stolen by someone other than me, you believe that it is "reasonable" to use bogus or flawed "research" to fool the government into a) taking away my legal rights (fair use); b) criminalizing software that can be and is used for legal purposes (P2P); c) abusing our legal system (suing people as a scare tactic, imposing excessive/unconstitutional fines); and d) crippling your "copyrighted content" so that I cannot exercise my right of fair use after I have purchased it (DRM; see the first point)?

      It is even more difficult to attach a value to the legitimate uses of file-sharing networks, but if you can point me at examples of how file-sharing systems have a positive economic impact on anyone, please let me know.

      Really? So you don't see value in a content provider being able to reduce operating expenses by distributing their content via P2P? Just because you are too lazy to do a simple search using any common search engine doesn't mean such examples don't exist. And why exactly does it have to have a positive economic impact on anyone - why does it have to have any economic impact at all? There are many things that have neither a positive economic impact nor any economic impact whatsoever, should those be illegal too?

    26. Re:Story meaning? by Bigjeff5 · · Score: 2, Informative

      No, you missed the point of that post.

      The point was that the sample size has almost no bearing on the accuracy of the survey provided it is truly representative of the overall population.

      If you can get a sample size of 10 that is representative of a population of 60,000,000 people, you'll have a pretty accurate survey. The reality is, that's not possible in most cases. You'll generally have more than 10 demographics of varying percentages of the total population, making 10 simply too small. 1000, however, is not too small unless you are looking for very, very small percentages of the population. I.e., if you are expecting results of less than 2%, a sample size of 1000 is too small because the margin of error is around 3% - you could easily run the survey and get no positive hits at all. For that survey, you'd probably need to bump it up to around 10,000 to drop the margin of error low enough to get reliable results.

      Since the results they got were 11.6%, and the margin of error was about 3%, you can very reliably say between 8.6% and 14.6% of people use file sharing software.

      I don't like that they added 4.7 percentage points to their figure without anything to back that up, especially since that is about 40% of their result. They basically said 30% of file sharers lie about being file sharers, without any data to back that up. They also used 40 million as their figure for people on the internet, when the government survey states something like 33.5 million.

      The numbers they should have used were 2.9 - 4.9 million people use file sharing software. That is accurate and can be backed up by statistics. It is probably more like 6 million due to people lying about using file sharing software, but that's still just a number I pulled out of my ass, and not statistically accurate.
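
The 2.9 to 4.9 million range is the 8.6% to 14.6% interval multiplied by the commenter's figure of 33.5 million internet users. A quick Python check; the +/-3% margin is taken from the comment rather than recomputed.

```python
base = 33.5e6        # commenter's recollection of the government's internet-user figure
observed = 0.116     # share admitting to using file-sharing software
margin = 0.03        # margin of error used in the comment

low, high = (observed - margin) * base, (observed + margin) * base
print(f"{low / 1e6:.1f}m to {high / 1e6:.1f}m")  # 2.9m to 4.9m
```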

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    27. Re:Story meaning? by MeNeXT · · Score: 2, Insightful

      He was correct, by the way. He went on to rant that anything that uses large amounts of people (by which he meant more than at most a few dozen) was not proper statistics.

      Ten people could in no way support any reasonable inference about a population. He was wrong. First and foremost, 10 randomly selected people would not represent the complete demographics of any nation. Rich, poor, middle class, professional, artist, unemployed, computer literate, music fan, and student do not come close to covering every segment of a nation, and I am positive you could fill in hundreds more. Some groups are more prone to download than others, and the combinations and permutations may affect the results. The sample needs to be large enough, and random, to achieve any reasonable accuracy.

      --
      DRM? No thanks, I'll just get it somewhere else...
    28. Re:Story meaning? by wizardforce · · Score: 3, Interesting

      Why would the solution to something that is not easily enforceable be to make it legal?

      because it doesn't work? why are our police resources being used to enforce extended copyright law when it is neither enforceable nor in the public interest to do so?

      With this particular issue, it simply became trivial for virtually anyone to obtain copyrighted material illegally

      hence the law is unenforceable- that is to say that it can't be enforced without far more draconian measures that violate other rights.

      Nobody is going to stop the advancement of the arts if it is made more difficult to share copyrighted content

      all it has to do is discourage the advancement of the arts relative to an alternative solution. In that case the copyright system as it is would be unconstitutional in the US.

      As someone who makes a living creating copyrighted content, I don't see why these tactics are unreasonable.

      those tactics are often illegal, rights violating and unconstitutional. suing people for 10,000 x damages is a violation of the 8th amendment. various practices by the RIAA/MPAA are illegal including but not limited to violating the DMCA, abuse of the legal system, fraud and entrapment...

      but if you can point me at examples of how file-sharing systems have a positive economic impact on anyone, please let me know.

      live CDs, distribution of software patches, advertising (ADV Films uses P2P to distribute advertising clips for its anime releases), distribution of Creative Commons licensed materials etc...

      ALL CD stores but one have been driven out of business, and virtually everyone I know has stopped purchasing CD's

      I'm sure that had nothing to do with single tracks being sold on iTunes, the poor state of CDs released today or the recession.

      --
      Sigs are too short to say anything truly profound so read the above post instead.
    29. Re:Story meaning? by blueg3 · · Score: 4, Insightful

      Just as easily as a random sample can accurately reflect a population as a whole, it can equally be skewed to be a completely inaccurate representation of the real world.

      If by "just as easily" you mean "with an enormously lower probability", then yes. But then, that's what a statement of margin of error says.

      Statistics isn't all that complicated, and what a statistical measure means can be both demonstrated and proven. You don't need to get all faux existential about how "it's all just a bunch of crap, man". You don't know what you're talking about.

      Also, entropy? No such thing as random? Really? Don't inject physical phenomena you clearly don't understand in a discussion about pure mathematics.

    30. Re:Story meaning? by koxkoxkox · · Score: 2, Insightful

      Just as easily as a random sample can accurately reflect a population as a whole, it can equally be skewed to be a completely inaccurate representation of the real world.

      Of course, but not more than 5% of the time. Please read a bit more about statistics and maybe listen to real statisticians instead of journalists before spewing so much hate. Statistics don't try to prove anything, they are just tools we can use to make decisions.
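      The "5% of the time" is the conventional 95% confidence level, and it only holds if the sample really is random. A small simulation can check it: draw many simulated surveys of 1,176 people from a population whose true file-sharing rate is fixed (0.116 here, an assumed value picked to match the survey), build a normal-approximation (Wald) 95% interval from each, and count how often the interval contains the true rate. This is a sketch under that random-sampling assumption; it says nothing about bias in who answers or how honestly.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_p=0.116, n=1176, trials=1000, seed=1):
    """Fraction of simulated random samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated survey: each respondent shares files with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


result = coverage()
print(f"interval contained the true rate in {result:.1%} of simulated surveys")
```

      The result should land close to 95%, i.e. the interval misses roughly 5% of the time, which is all the margin of error promises.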

    31. Re:Story meaning? by Merls+the+Sneaky · · Score: 2, Interesting

      11 million World of Warcraft players regularly use file sharing to receive game updates. They could easily have picked up a bunch of those players and lumped them in with the "illegal" file sharing crowd. The study is bogus because it doesn't account for those situations.
       

    32. Re:Story meaning? by 4D6963 · · Score: 2, Funny

      Excellent Summery

      Statistics are hard, and so is gramer.

      --
      You just got troll'd!
    33. Re:Story meaning? by Runaway1956 · · Score: 2, Insightful

      You seem to be working from an assumption of some sort. How about we consider the taxman coming to your place of business to conduct an audit. He finds that you owe 7 million dollars in taxes, instead of the 4 million dollars that you claimed. Can you argue that the numbers aren't off by an order of magnitude? "But, sir, my numbers are only wrong by about 75%, this isn't fraud!" Or, we can work those same numbers backward - "But, sir, my numbers are only off by about 50% - of course it's not fraud!" Good luck with that, huh?

      The numbers are fraudulent, plain and simple. As others have pointed out, anyone in the scientific field(s) would be laughed out of academia for submitting such flawed numbers and such flawed reasoning.

      "In fact, unless they only surveyed people WITH internet access,"
      BTW - TFA specifically says that everyone in the survey had internet access.

      There is no line of work on planet Earth where people are permitted to do such obviously fraudulent math. If similarly flawed mathematics were applied to a construction job by a bunch of backwoods hillbillies, they would soon be out of business.

      You simply cannot justify the numbers with any sort of logic. Attempting to do so is an exercise in fraud.

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
    34. Re:Story meaning? by blueg3 · · Score: 2, Informative

      No, it's a thermodynamic concept that has been extended to information theory. Prior to major developments in statistical mechanics, entropy was a loss of energy associated with physical transformations. This significantly predates both statistical mechanics and information theory. Stat mech formulated the modern definition of entropy, and Shannon applied it to information theory.

    35. Re:Story meaning? by JAlexoi · · Score: 2, Insightful

      The problem is that the government is taking numbers from a statistical research paper that was commissioned by a biased entity.
      We all know that statistics tend to sway towards the point of view of the paying party.
      And then we get the Wikipedia source paradox: the state takes numbers from a biased corporate study and presents them as fact, then that entity will say "According to the government..." and present that study as an official, government-sponsored study.

    36. Re:Story meaning? by FrankieBaby1986 · · Score: 2, Insightful

      Because 1,176 people is a minuscule number (0.0168%) compared to 7 million, there is room for a LARGE margin of error, in either direction. The sample size is too small for the number of people they are trying to represent.

      --
      ERROR: SIG NOT FOUND (A)bort, (R)etry, (F)ail?:
  2. Re:If it's bogus, it's probably too low. by SoupGuru · · Score: 2, Insightful

    That's what I was thinking. The summary makes it seem that estimating the number that high is outrageous. I certainly wouldn't wager any money that it's significantly higher than actual piracy.

    --
    What doesn't kill you only delays the inevitable
  3. What's the confidence interval? by Anonymous Coward · · Score: 4, Insightful

    Whenever you estimate a statistic like that, you should also indicate the level of uncertainty surrounding the estimate. Why are they not reporting the upper and lower bounds of the confidence interval surrounding that estimate?
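    For the unadjusted figure the interval is easy to compute. A minimal sketch using the Wilson score interval (one standard choice; the plain normal-approximation interval gives almost the same answer at this sample size), assuming the 1,176 households were a simple random sample. For 136 of 1,176 it comes out at roughly 9.9% to 13.5% at 95% confidence, and the adjusted 16.3% lies outside it. The interval covers sampling error only, not the upward adjustment or any non-response bias.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


low, high = wilson_interval(136, 1176)
print(f"136 of 1176 = {136 / 1176:.1%}; 95% interval {low:.1%} to {high:.1%}")
print(f"adjusted 16.3% inside the interval? {low <= 0.163 <= high}")
```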

    1. Re:What's the confidence interval? by caffeinemessiah · · Score: 3, Insightful

      Whenever you estimate a statistic like that, you should also indicate the level of uncertainty surrounding the estimate. Why are they not reporting the upper and lower bounds of the confidence interval surrounding that estimate?

      Perhaps because it's hard to come up with confidence intervals when you admit to fudging your own data by bumping the estimate up by almost five percentage points.

      --
      An old-timer with old-timey ideas.
  4. Wait, you believed them? by girlintraining · · Score: 3, Informative

    They think that a single copy of a song is worth over a hundred thousand dollars too. They claim to lose more in revenue each month than the GDP of most countries. All because of those dyyyeaaarrrn pirates. Enron looks positively boring in comparison to the accounting techniques the recording industry uses. None of this is news. About the only people that buy this crap are judges and legislators -- the rest of us are almost universally of the mindset that a bag of potato chips has more value than most of the recording industry's portfolio.

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:Wait, you believed them? by mdwh2 · · Score: 5, Insightful

      Indeed, let's look at the maths - supposing each person only shares 24 mp3s. By US standards at least, that's a cost of $1.92 million. So with 7 million file sharers, that's $13.44 trillion.

      Now let's check out http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal) - wow, these 7 million people are causing damage to the UK economy equal to almost 5 times the entire GDP of the UK...
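      The arithmetic above, spelled out. The $1.92 million for 24 songs and the 7 million sharers are the comment's own figures; this sketch only multiplies them out, which works out to $80,000 per song.

```python
award_for_24_songs = 1_920_000   # dollars, the comment's "US standards" figure
songs = 24
sharers = 7_000_000              # the claimed number of UK file sharers

per_song = award_for_24_songs / songs      # dollars per shared song
total = award_for_24_songs * sharers       # if every sharer shared 24 songs

print(f"${per_song:,.0f} per song; ${total / 1e12:.2f} trillion in total")
```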

  5. Meaningless admission by Trevin · · Score: 5, Insightful

    Using file-sharing software does not equate to sharing files illegally. I admit to using BitTorrent to download Fedora ISOs, and there's nothing illegal about that.

    1. Re:Meaningless admission by Blakey+Rat · · Score: 4, Funny

      I asked the British government, but unfortunately they told me you don't actually exist. Sorry.

    2. Re:Meaningless admission by 4D6963 · · Score: 2, Insightful

      Yeah, sure, because I'm sure there are SOOOO many people who use BitTorrent only to download free Linux ISOs and never ever download movies, series, porn, games, books, music.

      Translation: the number of people who only do that is insignificant. It would be dishonest or delusional to disagree.

      --
      You just got troll'd!
  6. the story title is kind of lame by Trepidity · · Score: 5, Informative

    Some of the estimation steps might be sketchy, but the basic practice of estimating a population proportion from a sample of that population is not particularly questionable. That's how almost all studies of populations work, because taking censuses of all people in a country is rarely feasible. We have century-old statistical theory on how to put bounds on the sampling error, too, assuming the sample was indeed random.

    You could have a whole slew of these stories if you really objected to that basic methodology, e.g. nearly every estimate of N million people suffering from a disease or disorder is based on a sample.

    1. Re:the story title is kind of lame by Abreu · · Score: 4, Insightful

      Is it ok to change "11.6%" to "16.3%" based on a "hunch"?

      I'm not a statistician, this is an honest question

      --
      No sig for the moment.
    2. Re:the story title is kind of lame by Trepidity · · Score: 3, Insightful

      If there was some previous result that only 2/3 of filesharers admit it when asked, then scaling the estimate up by half (dividing by 2/3) would be defensible. A "hunch" is not nearly as good evidence, of course.

      I was objecting mainly to the "how 136 people became 7 million" title, which to my ears reads mainly as a criticism of the sample size. But whatever the problems with this estimate, the sample size wasn't really among them.
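      Trepidity's hypothetical can be made concrete. If only a fraction r of file sharers admit it, a corrected estimate is the observed share divided by r. A sketch (the 2/3 rate is Trepidity's hypothetical, not a measured figure) that also backs out the admission rate the 11.6% to 16.3% adjustment implicitly assumes:

```python
observed = 0.116   # share of surveyed households admitting to file-sharing software use
adjusted = 0.163   # share after the "fewer people admit to it" adjustment


def corrected_share(observed_share, admit_rate):
    """If only `admit_rate` of true file sharers say yes, scale the observed share up."""
    return observed_share / admit_rate


# Trepidity's hypothetical: only 2/3 of file sharers admit it.
print(f"if 2/3 admit: {corrected_share(observed, 2 / 3):.1%}")

# The admission rate implied by the adjustment that was actually applied.
implied_admit_rate = observed / adjusted
print(f"implied admit rate: {implied_admit_rate:.0%} (scale factor {adjusted / observed:.2f})")
```

      Either way, the correction is only as good as the evidence behind the admission rate.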

    3. Re:the story title is kind of lame by caffeinemessiah · · Score: 5, Insightful

      Is it ok to change "11.6%" to "16.3%" based on a "hunch"? I'm not a statistician, this is an honest question

      IAAS, and the answer is no. That goes for the GP as well -- no one is contesting estimation theory, just that the fundamental assumptions are so grossly unmet in this "study" as to render it meaningless. And as someone else already commented, it's dangerous here because it's going to dictate public policy.

      If you're going to "adjust" your objective findings, based on some bizarre assumption that a certain percentage of people will lie about file sharing, then why do a survey at all if not to create mathematical/sciency-sounding smoke and mirrors?

      --
      An old-timer with old-timey ideas.
    4. Re:the story title is kind of lame by mdwh2 · · Score: 2, Interesting

      I would hope it's no.

      There's actually a clever way to try to account for this kind of thing: you ask them something like "Do you file-share, or is your birthday in January?" (or perhaps something even more obscure that the questioner/Government wouldn't know). The point is that people are more willing to admit to it, because nobody can tell for sure whether a given "yes" means they really do file-share or that they answered yes because of the second question.

      But when it comes to the population as a whole, you can estimate the proportion who fall into the second category, so you can factor that out and work out the true value.

      But it doesn't look like they did anything like that here.
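      The scheme described above is a form of randomized response. If p is the true file-sharing rate and q the chance a respondent was born in January, the share answering yes is 1 - (1 - p)(1 - q), which can be solved for p. A minimal sketch, assuming everyone answers truthfully, birthdays are spread evenly over the year, and birthday is unrelated to file-sharing; the 15% true rate is an arbitrary value for the simulation, not an estimate:

```python
import random


def estimate_true_rate(yes_share, q):
    """Invert P(yes) = 1 - (1 - p)(1 - q) to recover p, the file-sharing rate."""
    return 1 - (1 - yes_share) / (1 - q)


def simulate_survey(true_p, q, n, seed=0):
    """Simulate honest answers to 'Do you file-share, or is your birthday in January?'."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        file_shares = rng.random() < true_p
        born_in_january = rng.random() < q
        if file_shares or born_in_january:
            yes += 1
    return yes / n


q = 31 / 365  # assumes birthdays are spread evenly over a 365-day year
yes_share = simulate_survey(true_p=0.15, q=q, n=100_000)
estimate = estimate_true_rate(yes_share, q)
print(f"raw 'yes' share: {yes_share:.3f}; estimated file-sharing rate: {estimate:.3f}")
```

      The price is precision: the noise from the birthday question widens the error bars, so such a survey needs more respondents than a direct question would.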

    5. Re:the story title is kind of lame by DarkOx · · Score: 2, Informative

      No, but you need some basis if you are going to make such an adjustment. There are ways to determine the rate of sampling error, for instance, and then use that. In this case that might be too much effort or get you into legally murky waters, so an honest researcher would write something like this:

      In my sample of XXXX, YY responded that they sometimes used p2p software in an illegal fashion. Based on this, the number of extralegal file sharers in the total population would be ZZZZZZ. I would not expect a person who does not use p2p in an illegal way to respond to my survey in the affirmative, while it is easier to imagine someone who does responding in the negative; therefore the number may actually be greater than ZZZZZZ.

      ---
      Doing so presents the numbers as clearly as they can actually be known and states the assumptions and bias concisely.

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
  7. This calls for a quote by SilverHatHacker · · Score: 2, Insightful

    "If they facts don't fit the theory, change the facts."
    ~Albert Einstein

    --
    Funny may not give karma, but +5 Informative never made anyone snort coffee out their nose.
  8. mathematics by martas · · Score: 5, Funny

    maybe the authors of the study were taught math skills through unschooling?

  9. file sharing software=pirate??? by Anonymous Coward · · Score: 3, Insightful

    using file sharing software does not mean you pirate software or media.....

  10. So, optimistically, 2.12 million, then? by Animaether · · Score: 2, Interesting

    136 out of 1176 people in households with internet connections admitted to having used file-sharing software (source: the summary)
    18.3 million households in the UK had internet access at time of polling in 2009 (source: http://www.statistics.gov.uk/CCI/nugget.asp?ID=8 )

    136/1176 * 18.3M ~= 2.12M

    Not sure if "having used file-sharing software" means that they downloaded / distributed at least 1 item - say, a song - via said software and that they had no actual rights to do so (you know, as most people use file-sharing software to distribute Linux distros, or have simply 'used it' but didn't actually download or upload anything... *cough*)...

    But let's presume it does.

    Then let's take the low price in iTunes UK of GBP 0.79 per song, then the music industry 'lost' ('cos obviously people had no intention of buying that song that they didn't download / distribute because they were downloading a Linux distro instead *cough*) about GBP 1,671,897.96.

    Well, that's peanuts, innit.
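    The back-of-envelope figures above, reproduced as a sketch. It keeps the comment's assumptions: the survey rate applies to all 18.3 million connected households, each admitting household counts once, and each "lost" exactly one song at GBP 0.79. Note that this counts households, while the 6.7m figure counts people.

```python
respondents = 1176
admitted = 136                       # said they had used file-sharing software
uk_households_online = 18_300_000    # 2009 ONS figure cited in the comment
price_per_song_gbp = 0.79            # the comment's low iTunes UK price

# Scale the sample rate up to every connected household.
households = admitted / respondents * uk_households_online
# Assume each of those households "cost" the industry exactly one song.
lost_revenue_gbp = households * price_per_song_gbp

print(f"{households:,.0f} households; GBP {lost_revenue_gbp:,.2f}")
```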

  11. Why the BBC rocks by MosesJones · · Score: 5, Insightful

    This is yet another example as to why the BBC is the finest broadcasting and journalistic organisation on the planet (I've never worked for them, sold to them or have any other financial connection other than the license fee).

    They actually investigated something created by an industry group and found it to be bollocks and then reported it. The BBC are arguably the most "socialist" organisation in the democratic world (funded by a tax on everyone for the benefit of everyone) and yet they still question and challenge everything.

    The US seriously needs something that questions vested interests and rubbish statistics as much as the BBC. Jon Stewart and Bill Maher are just comedians and FoxNews is just comedy.

    Given a choice between the first amendment and the BBC, I'll take the BBC; it's demonstrated more freedom of speech in a week than the US media has in a decade.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
    1. Re:Why the BBC rocks by vivaelamor · · Score: 3, Insightful

      Oh come on, the BBC have reported this number many times since it was first used and you sing their praises because Radio 4 happens to do a show devoted to statistics? I wonder just how much time they will devote to debunking this statistic considering how many times they have quoted it.

      Just because the BBC is better than the US networks doesn't mean we should be proud, personally I'm appalled at how low the bar is set.

    2. Re:Why the BBC rocks by FourthAge · · Score: 2, Informative

      You can also avoid paying the licence fee if your TV can't receive over-the-air pictures, e.g. if it is disconnected from the aerial.

      There was once a "radio licence" (you can still see a reference to it in one episode of Monty Python), but this was phased out when almost nobody owned a radio but not a TV.

      In the future, I expect the TV licence will be extended to include Internet connections as well, since those can now be used to receive BBC programmes too. At that point, we will see if the BBC can continue to convince people that it is worth the money.

      --
      The tao of democracy: the government you can vote for is not the real government.
    3. Re:Why the BBC rocks by _Shad0w_ · · Score: 3, Informative

      You only need one license, you can have as many tellies as you like. Portable tellies used in caravans and the like will be covered by the license for your home as well.

      If you have two houses, you will need two licenses though, afaicr - which is why students away at Uni need to buy a license - including if they're in halls - even though their permanent residence might still be their parent's house.

      I find the BBC great value and love it dearly. I suspect people will say that's because I'm white, middle class and liberal or something.

      --

      Yeah, I had a sig once; I got bored of it.

    4. Re:Why the BBC rocks by houghi · · Score: 2, Insightful

      The BBC are arguably the most "socialist" organisation in the democratic world [...] and yet they still question and challenge everything.

      What does their political thinking have to do with whether they challenge anything or not? I would call many unions more socialist than the BBC. And they do challenge everything all the time as well (rightfully or not is another discussion).

      I think you confused "socialist" with "socially engaged" which is not the same thing.

      --
      Don't fight for your country, if your country does not fight for you.
    5. Re:Why the BBC rocks by evilviper · · Score: 2, Interesting

      This is yet another example as to why the BBC is the finest broadcasting and journalistic organisation on the planet

      "The grass is always greener" as the saying goes, so I'd naturally love to believe that. But sadly, I can't, because I have extensive experience watching, listening, and reading the BBC.

      The BBC's news reports are almost always moderately shallow fluff: VERY light on facts relative to their US counterparts, rarely researched more than summarily, and constantly passing along unconfirmed third-party information without so much as a footnote.

      The levels of journalistic integrity displayed by the BBC would become a scandal at any (real*) major US news provider, either print or the major 3 TV networks.

      The US seriously needs something that questions vested interests and rubbish statistics as much as the BBC. Jon Stewart and Bill Maher are just comedians and FoxNews is just comedy.

      None of the above are serious US news sources. Try comparing the BBC to the New York Times, or to the morning/nightly news broadcasts by the major 3 US TV networks (NBC, ABC, CBS). For more in-depth issues, try comparing the BBC to Frontline, 60 Minutes, etc., and then come back and attempt to justify your US-bashing... The BBC does a reasonable job, but they can

      No, the "local news" that's on several hours a day isn't up to par with the BBC, but the two have completely different purposes and scope, so they're hardly comparable. Nor the "early shows" which resemble talk shows vastly more than a serious attempt at news reporting.

      And no, the existence of all the crappy, non-news sources that you (or I) can (and have) point to doesn't detract from the fact that there are many extremely GOOD news sources in the US. If you want to go that route, the UK is in an even sorrier state... Even the most dedicated tabloid readers in the US would be aghast at the tabloids on the UK news stands. Never mind the heavy-handed, unbelievably biased pieces of trash which get passed off as documentaries (even on BBC TV/Radio, though not the worst of it).

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  12. Re:It's probably still accurate though. by gbjbaanb · · Score: 3, Interesting

    When you consider that the UK has roughly 30 million households (a total population of roughly 60 million people), that's a fair chunk of the population.

    Out of the total, only 18.7 million have broadband. That would make roughly 40% of them pirates, then. We should make it legal, government being there for the populace and all that.

  13. Winston Churchill by feufeu · · Score: 2, Interesting

    O "Statistics are like a drunk with a lamppost: used more for support than illumination."
    O "The only statistics you can trust are those you falsified yourself."
    Tick one.

  14. Re:If it's bogus, it's probably too low. by Miamicanes · · Score: 4, Interesting

    > Work backwards from the undisputed declining sales figures of the recording industry.

    The main reason for declining sales is the fact that CD sales during the 90s were artificially boosted by people replacing records and tapes with CDs... then replacing them again when remastered CDs were released a few years later. It was a once-in-a-lifetime event for the recording industry that won't be repeated during our lifetimes.

    People re-bought CDs they already owned in analog (or optimized-for-analog CDs) because they represented an epic improvement in quality by just about any meaningful standard over the analog media they replaced. Everything that's come out since CDs has only been cheaper, shittier-sounding, or intolerably-crippled by DRM.

    Here's an idea for the music industry: ditch the DRM'ed formats, and roll out a music format on DVD media with 96 kHz 32-bit stereo PCM. Make the discs gold-colored, call it something like "X-fi", and sell them for $24.95. You'll win on all counts: Gen-Xers will go back into high school mode and buy them to show off how rich they are and/or pretend they sound sufficiently better than 16-bit CDs to justify spending ~twice as much on them, and the fact that every disc will be ~4-8 gigabytes will serve as self-limiting DRM for the next decade or so. Just make sure they still have the MOST compelling consumer benefit intact (and the reason why people who buy CDs still DO buy CDs): it's a flawless first-generation master to use for making all your "working" copies for everywhere else.
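    For scale, the data rate of the proposed format is easy to work out. A sketch assuming uncompressed 96 kHz, 32-bit, two-channel PCM with no container overhead, and the nominal 4.7 GB and 8.5 GB capacities of single- and dual-layer DVDs: that is 768,000 bytes per second, so a 60-minute album comes to about 2.8 GB and a full disc holds roughly 100 to 185 minutes.

```python
SAMPLE_RATE_HZ = 96_000
BIT_DEPTH = 32
CHANNELS = 2
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS // 8  # 768,000 bytes/s


def album_size_gb(minutes):
    """Size in decimal gigabytes of uncompressed PCM audio, ignoring container overhead."""
    return minutes * 60 * BYTES_PER_SECOND / 1e9


def minutes_that_fit(capacity_gb):
    """Minutes of uncompressed audio that fit in `capacity_gb` decimal gigabytes."""
    return capacity_gb * 1e9 / BYTES_PER_SECOND / 60


print(f"60-minute album: {album_size_gb(60):.2f} GB")
for label, capacity in (("single-layer DVD", 4.7), ("dual-layer DVD", 8.5)):
    print(f"{label} ({capacity} GB): about {minutes_that_fit(capacity):.0f} minutes")
```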

  15. Re:the true "what the fuck" by sakdoctor · · Score: 2, Funny

    As a solipsist I'd say everyone does it.

  16. Scoundrel Statistics by anyaristow · · Score: 5, Informative

    Even a first year stats student can see it.

    This is almost as cliché in arguments about statistics as the car analogy is on Slashdot, and it's the sign of a scoundrel. If you actually had a first-year stats student's understanding of stats you'd know where the weaknesses actually are, and where all the rest of the smoke blown in this discussion goes laughably wrong.

    So let's apply some first year stats to the issue.

    First, the sample size. Whether it is numerically large enough to be useful is a matter not only of its size but also of the number of positive results. IOW, a sample size of 1176 is too small if you found 3 of what you're looking for, but if you found 136 (11.6% of 1176), you have plenty of samples. The question is then only whether you had a representative sample.
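    The point about the number of positives can be put in numbers: the standard error of a sample proportion, relative to the proportion itself, is about sqrt((1 - p)/k), where k is the number of positive answers. A sketch using the normal approximation (which is itself unreliable at only 3 positives) and assuming a simple random sample; with 3 positives the relative error is around 58%, with 136 it is around 8%.

```python
import math


def relative_standard_error(positives, n):
    """Standard error of the sample proportion, as a fraction of the proportion itself."""
    p_hat = positives / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return standard_error / p_hat


for positives in (3, 136):
    rse = relative_standard_error(positives, 1176)
    print(f"{positives:>3} of 1176: relative standard error about {rse:.0%}")
```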

    My next concern would be precision. Using data with three or four significant digits (136, 1176) to make conclusions to seven significant digits (11.56463%) is silly, but that doesn't seem to have happened here. The only number in all of this that is fishy is the 16.3% number. To get three significant digits they'd have to know the number of lying households to that precision. If they had another study that determined this number they might very well have a number to that precision, but I'm assuming they just guessed.

    That's still not a problem. If you guess, you run your confidence interval through your formulae (here it's a simple product) to put a range on your results. If it's a from-your-ass guess you might put a 100% failure estimate on your low end (i.e. there might be no lying households at all) to arrive at a conservative range. Here, it looks like they used an estimate of 40%. They should have (and might have; I didn't RTFA) run the un-adjusted 11.6% through the formulae to get a conservative low-end range.

    Anyway, the number they finally used was 7m. One significant digit. That doesn't imply the same precision as, say, 6.7m would. In fact, if their figure for the number of lying households really was accurate to one digit (i.e. 35-45%) then rounding their final result to one digit was the correct procedure. If it was just a guess they should have run the absolute low estimate (probably zero lying households) through to get a range.
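    The low-end/high-end exercise can be run on the summary's own numbers. The base population is not given in the summary, so this sketch backs it out from 6.7m / 16.3% (about 41.1m internet users, an inference rather than a published figure) and then applies the unadjusted 136/1,176 rate to the same base:

```python
headline = 6_700_000            # figure from the Jupiter report, per the summary
adjusted_rate = 0.163           # rate after the upward adjustment
unadjusted_rate = 136 / 1176    # what respondents actually said

# Hypothetical reconstruction: the base population implied by 6.7m / 16.3%.
implied_base = headline / adjusted_rate
low_end = unadjusted_rate * implied_base   # same base, no "people lie" adjustment

print(f"implied base: about {implied_base / 1e6:.1f}m people")
print(f"range: {low_end / 1e6:.1f}m to {headline / 1e6:.1f}m")
```

    So before any sampling error is considered, the adjustment alone moves the estimate by about 2 million people.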

    So, with actual first year stat knowledge it's possible to actually state what might be wrong with the study, and not resort to "any first year stat student" hand-waving. It's clear that the most-cited criticism (the sample size) is the result of ignorance and group think, not actual knowledge of statistics.

    1. Re:Scoundrel Statistics by HiThere · · Score: 2, Interesting

      Your criticisms are largely valid, but I still think the sample size was too small. After all, they couldn't know before they did the study what percentage would answer what way ... not unless the study was rigged.

      Of course, it also depends on what the purpose is. If it were for marketing, then this might be a quite acceptable procedure; in that case a large amount of error wouldn't cause significant problems for anyone. But if it's being used to lobby for laws, then the error just won't cause any problems for *them*, and the results have been adjusted into something that can be released to massage public opinion. Etc. In such a case I have a much higher bar for study requirements: either the population tested must be standardized to eliminate bias (which is impossible if you don't already know where the bias is coming from) or it needs to be a MUCH larger random sample.

      A good study in this area would first investigate the characteristics of a large population WRT standardizing their likelihood of file-sharing. This step in itself would involve many thousands of people in many different social, economic, and geographic strata. (You might want to steer clear of race or national origin. It's likely significant, but too touchy.) After you've done that, then you can standardize a random sample for study WRT characteristics associated with file-sharing. And at THAT point you might be able to establish reasonable guesses at error bars.
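      One common way to do the kind of standardizing described above is post-stratification: estimate the rate within each group, then weight each group by its known share of the population (for example from a census). A sketch with entirely hypothetical age bands, shares and counts, chosen only so the totals match the survey's 136 of 1,176; it is not what the Jupiter study did and not real data:

```python
# Entirely hypothetical strata: (population share, respondents, "yes" answers).
# Invented for illustration; the counts are chosen to total 136 of 1,176.
strata = {
    "16-24": (0.15, 300, 75),
    "25-44": (0.35, 450, 50),
    "45+": (0.50, 426, 11),
}


def unweighted_rate(strata):
    """Raw share of 'yes' answers, ignoring how the sample is composed."""
    yes = sum(count for _, _, count in strata.values())
    respondents = sum(n for _, n, _ in strata.values())
    return yes / respondents


def post_stratified_rate(strata):
    """Weight each stratum's observed rate by its known share of the population."""
    return sum(share * yes / n for share, n, yes in strata.values())


print(f"unweighted: {unweighted_rate(strata):.1%}")
print(f"post-stratified: {post_stratified_rate(strata):.1%}")
```

      With these made-up numbers the sample over-represents the group that shares most, so weighting pulls the estimate down; with a different sample it could just as easily push it up.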

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    2. Re:Scoundrel Statistics by Bigjeff5 · · Score: 2, Informative

      Survey sizes of around 1000 are pretty standard. If you run the survey and get 3 positives out of 1000, you say "Oh shit, sample size is too small", then run the same survey with 5,000 or 10,000 people to catch a larger number of the people you are targeting. If we're looking to see what percentage of people practice illegal file sharing, we need to find at least a decent number of illegal file sharers so we know our survey is accurate.

      It's not a matter of knowing what you'll get before hand or rigging the study, you have to have a jumping off point somewhere, else you'll never do the study. If you get untrustworthy results, you simply adjust your sample size and conduct the survey again.
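      The "around 1000" convention falls out of the usual sample-size formula n = z^2 p(1 - p) / E^2. A sketch assuming a simple random sample, 95% confidence (z = 1.96) and the worst case p = 0.5: a margin of error of 3 percentage points needs 1,068 respondents, and 5 points needs 385.

```python
import math


def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest simple-random-sample size whose margin of error is at most `margin`.

    Normal approximation; p = 0.5 is the worst case (largest variance) and
    z = 1.96 corresponds to 95% confidence. Ignores the finite population correction.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)


for margin in (0.05, 0.03):
    print(f"±{margin:.0%}: {required_sample_size(margin):,} respondents")
```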

      A good study in this area would first investigate the characteristics of a large population WRT standardizing their likelihood of file-sharing.

      That is completely unnecessary if all you want to know is what percentage of people practice illegal file sharing. And I'm not sure what you mean by "standardizing" their likelihood of file sharing. Huh?

      This step in itself would involve many thousands of people in many different social, economic, and geographic strata.

      The government does this thing called a census every ten years, which collects just such information from close to 100% of the population, making it extremely reliable.

      (You might want to steer clear of race or national origin. It's likely significant, but too touchy.)

      Why? If it can affect your outcome, it should be in your demographics, otherwise your study is unreliable. Why the hell is that information "touchy"? Does a white guy not realize he's white? Or did someone forget to tell the Irishwoman she's from Ireland? What the hell?

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    3. Re:Scoundrel Statistics by dman123 · · Score: 2, Funny

      My next concern would be precision. Using data with three or four significant digits (136, 1176) to make conclusions to seven significant digits (11.56463%) is silly, but that doesn't seem to have happened here. The only number in all of this that is fishy is the 16.3% number. To get three significant digits they'd have to know the number of lying households to that precision. If they had another study that determined this number they might very well have a number to that precision, but I'm assuming they just guessed.

      It wasn't that precise. The original number was 17.0% and the article poster just converted it from metric percentages so Americans wouldn't get confused.

      --

      --
      dman123 forever!
      Filtering out the -1s and 0s since 1999.
  17. Enough about sample size by anyaristow · · Score: 2, Informative

    Was 1,176 a sample large enough to represent the 40,000,000? I would assume not. You could assume so. The fact would still be that we would both be assuming.

    Assume nothing. Google is your friend.

    Google: sample size

    First result has all you need.
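    The relevant textbook result: for a simple random sample, the margin of error depends on the sample size and only very weakly on the population size, through the finite population correction. A sketch computing the 95% margin for 1,176 respondents at the survey's 11.6% rate across several population sizes, assuming random sampling (it does not address who was sampled or how honestly they answered). For any population in the millions it stays at about 1.8 percentage points.

```python
import math


def margin_of_error(n, population, p, z=1.96):
    """95% margin of error for a sample proportion, with finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


observed_rate = 136 / 1176
for population in (10_000, 1_000_000, 7_000_000, 40_000_000):
    moe = margin_of_error(1176, population, observed_rate)
    print(f"population {population:>10,}: ±{moe * 100:.2f} percentage points")
```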

  18. Figures lie... by hyades1 · · Score: 2, Interesting

    And liars figure.

    The best way to shut these slime-oids up would be to conduct a forensic audit of their royalty payments to artists. I bet not one of the companies would come out clean.

    --
    I've calculated my velocity with such exquisite precision that I have no idea where I am.