Big Talk About Small Samples

Posted by samzenpus on Monday November 17, 2014 @04:40AM from the read-all-about-it dept.

Bennett Haselton writes: My last article garnered some objections from readers saying that the sample sizes were too small to draw meaningful conclusions. (36 out of 47 survey-takers, or 77%, said that a picture of a black woman breast-feeding was inappropriate; while in a different group, 38 out of 54 survey-takers, or 70%, said that a picture of a white woman breast-feeding was inappropriate in the same context.) My conclusion was that, even on the basis of a relatively small sample, the evidence was strongly against a "huge" gap in the rates at which the surveyed population would consider the two pictures to be inappropriate. I stand by that, but it's worth presenting the math to support that conclusion, because I think the surveys are valuable tools when you understand what you can and cannot demonstrate with a small sample. (Basically, a small sample can present only weak evidence as to what the population average is, but you can confidently demonstrate what it is not.) Keep reading to see what Bennett has to say.

The smallest sample I've ever used to make an argument was when I submitted some legal briefs, each no longer than five pages, in the anti-spam cases that I'd been filing in Washington State small claims court. Since I suspected the judges were not taking the cases seriously, I filed the briefs with the third and fourth pages stuck together in the center, by a tiny thread of paper joining the back of the third page to the front of the fourth page. (If someone were to turn the pages and actually read the brief, the thread would break.) I did something similar in six different cases, and when the motions were all rejected, I went to the courthouse to look at the paper motions still in the file. In three out of six cases, the judge had rejected the motion without reading it first.

Now, the point was not to make any accurate estimation of the actual proportion, in the total population of small claims court judges, who would reject a brief in an anti-spam case without reading it. There's no basis for saying that the proportion of such judges is close to 50%. But we can still probably reject any contention that the proportion of such judges is very low. If only 10% of judges were rejecting motions without reading them, then there is only about a 1.4% chance of taking a random sample of six rejected motions and finding that in three or more cases, the judge did not read the motion. Even if 20% of judges were doing so, an event with a probability of p=0.20 would still occur in three out of six cases only about 8.2% of the time. (If an event has probability p, the exact probability of that event occurring three or more times in six trials is given by 20*(p^3)*((1-p)^3) + 15*(p^4)*((1-p)^2) + 6*(p^5)*((1-p)^1) + 1*(p^6)*((1-p)^0).) So we can say that the proportion of such judges is quite probably more than 20%. I did this repeatedly because even after I had "caught" the first judge, I wanted to head off any objection that this was just an isolated case of rare behavior.
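The formula in the parenthetical above is the upper tail of a binomial distribution, and it can be checked in a few lines. The following is a minimal Python sketch; the function names are illustrative and do not come from the article. It evaluates both the single "exactly three of six" term and the full "three or more of six" sum. For p=0.10 and p=0.20 the full sum works out to roughly 1.6% and 9.9%, while the first term alone is about 1.46% and 8.2%, so the quoted figures appear to track the first term. Either way the probabilities are small, which is what the argument relies on.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tail_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    for p in (0.10, 0.20):
        exactly_three = binom_pmf(3, 6, p)
        three_or_more = tail_at_least(3, 6, p)
        print(f"p={p:.2f}  exactly 3 of 6: {exactly_three:.4f}  "
              f"3 or more of 6: {three_or_more:.4f}")
```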

And, as always, it's important not to generalize too much about the behavior whose probability we're estimating. I don't think that 20% or more of judges, even in small claims court, are throwing out most types of cases without reading or listening to the arguments. My impression was that most judges view small claims court as a place to redress injustices, and that they see anti-spam and anti-telemarketer plaintiffs as just trying to "make money" at it, so they take those suits less seriously. I disagreed with this stance because (1) anti-spam plaintiffs usually really have been harmed and are not just "whining about one email" which they are trying to "cash in" on (I still get so much spam that it interferes somewhat with the operation of my server and with my ability to get through my daily email); and (2) the law is intended, after all, as a deterrent, with disproportionate damages in order to discourage spammers from spamming in the first place. However, the charitable reading of the results is to assume that judges are merely biased against anti-spam plaintiffs -- but at least they probably don't treat all cases as casually as they treat anti-spam suits!

Back to the issue of small samples. My previous article was prompted by an editorial about the online response that had been elicited by two different photos -- one showing a black woman breastfeeding, and a nearly identical photo showing a white woman breastfeeding. The author asserted that the photos had received vastly different responses, which she attributed to racism. I presented a survey to a sample of users recruited from Amazon's Mechanical Turk, randomly showed each survey-taker one of the two photos, and asked:

Our academic department has asked everyone to submit a "fun" photo of themselves, so that our photos can be displayed together on the department home page. One of our employees submitted a photo that has caused some internal debate about whether the photo is inappropriate. I wanted to do a poll to get the opinion of a random sample of Internet users of different backgrounds.

Do you think this is an appropriate picture to be used in a photo collection on our academic department home page?

Out of 47 respondents who saw the black woman's photo, 36 of them (77%) said it was inappropriate. Out of 54 respondents who saw the white woman's photo, 38 of them (70%) said it was inappropriate.

As before, these samples are too small to say precisely what the relevant proportions in the background populations are, but we can probably reject certain statements about the populations -- for example, that the percentage of users offended by the black woman's photo is 20 percentage points higher than the percentage of users offended by the white woman's photo. This is where the counterintuitive part comes in. Suppose that in the background population, 81% of respondents would find the black woman's photo offensive, but only 61% would be offended by the white woman's photo. What are the odds of getting 77% or less "yes that's offensive" responses from a sample of 47 users shown the black woman's photo, and getting 70% or more "yes that's offensive" responses from a sample of 54 users shown the white woman's photo? It doesn't sound unlikely at all, because the percentages are quite close to the originals -- but you can verify, either with statistical calculations or with a quickly written computer program, that the odds are only about 2.5%.
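One way to carry out the "statistical calculations" route is to sum binomial probabilities directly. The sketch below is an illustration under stated assumptions, not the author's own calculation. It reads "77% or less" as 36 or fewer of 47, and "70% or more" as 38 or more of 54, and it treats the two groups as independent so that the two tail probabilities can be multiplied. The function and parameter names are hypothetical. A different reading of the cutoffs would shift the result.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k: int, n: int, p: float) -> float:
    """Probability of k or fewer successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def joint_tail(p_first: float, p_second: float,
               n_first: int = 47, max_first: int = 36,
               n_second: int = 54, min_second: int = 38) -> float:
    """Chance that the first group shows max_first or fewer 'offended'
    answers AND the second group shows min_second or more, assuming the
    two groups respond independently (an assumption of this sketch)."""
    return (prob_at_most(max_first, n_first, p_first)
            * prob_at_least(min_second, n_second, p_second))


if __name__ == "__main__":
    # Hypothesized population rates from the text: 81% and 61%.
    print(f"joint probability: {joint_tail(0.81, 0.61):.4f}")
```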

Two main factors contribute to this counterintuitive result. First, even with a sample size of a few dozen, the frequency of an event starts to tend very closely to the frequency in the background population (if 80% of your population has some trait, and you take a sample of size 50, there's about a 95% chance that the number with that trait in your sample will be between 34 and 46). Second, to find the odds of seeing both of these deviations at the same time (deviating from an assumed 81% in the background population down to 77% in the first sample, and deviating from an assumed 61% in the background population up to 70% in the second sample), you have to multiply the probabilities of these two unlikely events. The probability of the first deviation is about 19%, the probability of the second is about 13%, and so the probability of them both occurring is about 2.5%.
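The "between 34 and 46" figure for a sample of 50 can be checked the same way, by adding up the binomial probabilities for every count in that range. This is a small sketch with an illustrative helper name (`prob_between` is not from the article), and it assumes the range is inclusive at both ends.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(low: int, high: int, n: int, p: float) -> float:
    """Probability that the success count falls in [low, high], inclusive."""
    return sum(binom_pmf(i, n, p) for i in range(low, high + 1))


if __name__ == "__main__":
    # Sample of 50 drawn from a population where 80% have the trait.
    print(f"P(34 <= X <= 46) = {prob_between(34, 46, 50, 0.80):.4f}")
```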

The reason I calculated the odds of getting 77% or less "offended" responses for the black woman's photo while also getting 70% or more "offended" responses for the white woman's photo, is that in calculating the "unlikeliness" of a statistical result, it's customary to calculate the odds of getting "this result or a more extreme one". For example, suppose you want to know if a company's hiring process is gender-balanced (assuming a 50/50 gender split in the population), and you notice that in a random sample of 100 recent hires, 61 were men. You wouldn't ask "What are the odds of there being exactly 61 men in this sample?", because the odds of getting any particular number are small. You'd ask, "What are the odds of getting this result or a more extreme one -- i.e. the odds of getting 61 or more men out of a random sample of 100, if the population were truly gender-balanced?" As this calculation tool shows, the odds are only about 1.7%.
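The hiring example makes the "this result or a more extreme one" idea concrete, and it is easy to reproduce without an online tool. The sketch below (helper names are illustrative) contrasts the chance of exactly 61 men with the chance of 61 or more, under the assumed 50/50 split. The second number should land near the 1.7% quoted above.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n trials (one-sided tail)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    exactly_61 = binom_pmf(61, 100, 0.5)
    at_least_61 = prob_at_least(61, 100, 0.5)
    print(f"exactly 61 men of 100: {exactly_61:.4f}")
    print(f"61 or more men of 100: {at_least_61:.4f}")
```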

Similarly, in the case of the two populations being measured, the author of the original editorial hypothesized that there was some significant gap between the percentages of the population that were offended by the two photos, which I arbitrarily assumed to be 20 percentage points. Under that assumption, showing the two pictures to two different groups and having them be offended at similar rates, is the unexpected, "extreme" result, and the closer the rates are to each other, the more extreme the result is. That's why I calculated "77% or less" for the first group vs. "70% or more" for the second group.

And out of the pairs of numbers that I tested which were separated by 20 percentage points, 81% and 61% were the numbers which made the given result the least unlikely. 80/60 and 79/59 give odds of about 2.5% and 2.4%; 82/62 and 83/63 give odds of 2.4% and 2.2%.
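The search over pairs separated by 20 points can be sketched as a simple sweep. This is an illustration, not the author's procedure: the range of percentages tried (70 through 95 for the first photo) and the cutoffs (36 or fewer of 47, 38 or more of 54) are assumptions of this sketch, and the names are hypothetical. Percentages are kept as integers in the loop to avoid floating-point drift in the 20-point gap.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def joint_tail(p_first: float, p_second: float,
               n_first: int = 47, max_first: int = 36,
               n_second: int = 54, min_second: int = 38) -> float:
    """P(first count <= max_first) * P(second count >= min_second),
    treating the two survey groups as independent."""
    low_first = sum(binom_pmf(i, n_first, p_first)
                    for i in range(0, max_first + 1))
    high_second = sum(binom_pmf(i, n_second, p_second)
                      for i in range(min_second, n_second + 1))
    return low_first * high_second


def sweep_pairs(gap_points: int = 20, first_range=range(70, 96)):
    """Return (first_pct, second_pct, probability) for each candidate pair."""
    rows = []
    for first_pct in first_range:
        second_pct = first_pct - gap_points
        prob = joint_tail(first_pct / 100, second_pct / 100)
        rows.append((first_pct, second_pct, prob))
    return rows


if __name__ == "__main__":
    rows = sweep_pairs()
    for first_pct, second_pct, prob in rows:
        print(f"{first_pct}% / {second_pct}%: {prob:.4f}")
    best = max(rows, key=lambda row: row[2])
    print(f"least unlikely pair: {best[0]}% / {best[1]}%")
```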

You can do the statistical calculations directly, but in case you won't believe it unless you see the results unfold with your own eyes, you can run this perl script, which iterates through a million trials of the experiment, counting the number of times that the unexpected result occurs.
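The Perl script itself is not reproduced in this text. As a stand-in, here is a rough Python sketch of the same kind of simulation. It is written under assumptions and is not a copy of the original script. Each trial draws 47 simulated respondents at the first hypothesized rate and 54 at the second, then counts how often the first group yields 36 or fewer "offended" answers while the second yields 38 or more. The function name, the seed, and the default of 20,000 trials (chosen so the example runs quickly) are all choices made for this illustration.

```python
import random


def simulate(p_first: float, p_second: float,
             trials: int = 20_000, seed: int = 0,
             n_first: int = 47, max_first: int = 36,
             n_second: int = 54, min_second: int = 38) -> float:
    """Fraction of simulated experiments in which the first group has
    max_first or fewer 'offended' answers and the second group has
    min_second or more."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    hits = 0
    for _trial in range(trials):
        first_yes = sum(rng.random() < p_first
                        for _respondent in range(n_first))
        second_yes = sum(rng.random() < p_second
                         for _respondent in range(n_second))
        if first_yes <= max_first and second_yes >= min_second:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothesized population rates from the text: 81% and 61%.
    print(f"fraction of trials with the 'unexpected' result: "
          f"{simulate(0.81, 0.61):.4f}")
```

Passing `trials=1_000_000` matches the million trials mentioned above, at the cost of a much longer run in pure Python.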

Why did I assume a 20-point gap? That was the most subjective leap that I made. Looking through the original editorial, I figured that on the basis of inflammatory statements like

"Only one woman was called 'adorable' by the media and portrayed with girlish innocence, and it wasn't the black one. It never is."

and

"The contrast in headlines is so stark, it deserves to be examined" [I assume here she meant the contrast in responses]

the author meant to imply a difference in people's attitudes that was at least that large. But the results suggest that it isn't.

For all of this effort, of course, I could have just expanded the original experiment to a sample of several hundred and mollified some people's concerns. But I wanted to argue for what you can show, even with small samples, because I would like to try (and would like others to try) similar experiments in the future, and do not think people should be discouraged if they can't afford to pay a thousand Amazon Mechanical Turk workers to take their survey. I paid my 100 respondents $0.25 each; naturally, one experiment I'd like to do soon is to figure out what's the lowest I can get away with paying them.

28 of 246 comments

I am not reading that. by Anonymous Coward · 2014-11-17 04:44 · Score: 5, Insightful

Slashdot is trying to move their user base from news for nerds and geeks to news for normals.
Seriously, I've noticed the Register getting more active as people move over there.
We, geeks, view this entire article as a bunch of shenanigans that waste our time. Please stop spitting in my face.
Give me an article about Intel's latest and greatest chipset plans or how AMD screwed the pooch or about how one can modify a blackberry to run android applications. Those things are Useful.
Infotainment designed to incite does not nor should enter my world, it makes my world more stressful and wastes my time.
1. Re:I am not reading that. by i kan reed · 2014-11-17 04:51 · Score: 5, Insightful
  
  I'm glad you think explaining, mathematically, a statistical forecasting process is for "normals".
  Whereas us "geeks" are only interested in short little blurbs about software patches, right?
  Now, I absolutely understand everyone who is concerned about a single contributor dominating the submission queue, possibly hurting the richness of available information, but your complaint seems so petty. Actual critical reasoning about previous information that was questioned is the good kind.
2. Re:I am not reading that. by Anonymous Coward · 2014-11-17 05:18 · Score: 3, Insightful
  
  Slashdot is trying to move their user base from news for nerds and geeks to news for normals.
  No normal person is going to be the least bit interested in Bennett Haselton's inane ramblings.
  I have no idea why Slashdot is posting this garbage, but attracting "normal" readers certainly is not why.
3. Re:I am not reading that. by tepples · 2014-11-17 06:01 · Score: 3, Insightful
  
  First of all, this is basic statistics.
  Some Slashdot commenters have shown that they need an article about basic statistics, more specifically what can be inferred even from a small sample. Read the first paragraph.
4. Re:I am not reading that. by khasim · 2014-11-17 06:26 · Score: 5, Insightful
  
  Some Slashdot commenters have shown that they need an article about basic statistics, more specifically what can be inferred even from a small sample.
  There are lots of people out there (and here) who do not understand basic statistics. Bennett Haselton is one of them.
  The FIRST problem is not the small sample set. It is that the small sample set is "some people on Amazon's Mechanical Turk who are willing to take a survey for $X". His sample set is flawed.
  And his home-written "survey" is also flawed.
  So his math is meaningless. Garbage-in, Garbage-out.
  In order to deal with the flaw in his sample set he'd have to have a much larger sample set. OR a properly selected sample set.
  THEN he'd need his "survey" re-written.
  And only then could he try his hand at the math. He hasn't even explained what his margin of error is or which method he used to calculate it. BECAUSE HE DOES NOT UNDERSTAND STATISTICS.
Keep reading to see what Bennett has to say. by QuietLagoon · 2014-11-17 04:46 · Score: 5, Insightful

Why? Really, why?
He already wasted ten minutes of my life with his last episode of keyboard effluent, why should I waste my time with him anymore?
Let off some steam, Bennett by Anonymous Coward · 2014-11-17 04:47 · Score: 3, Insightful

I guess it's kinda cool that you took over what used to be a major tech-news website and turned it into your personal blog.
Re:Bennett!!!!!! by Anonymous Coward · 2014-11-17 04:49 · Score: 5, Insightful

Fried post.
No kidding. Who is Bennett, and why does he get to use /. as his blog? What happened to WP:NOTBLOG?
Why doesn't Bennett get his OWN blog? by sconeu · 2014-11-17 04:49 · Score: 3, Insightful

That way the rest of us don't have to hear about his bullshit.

--
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
1. Re:Why doesn't Bennett get his OWN blog? by Aighearach · 2014-11-17 09:39 · Score: 3, Insightful
  
  John Dvorak was guilty of posting well-written articles on interesting topics... and being completely wrong on all the conclusions.
  Bennett isn't well-written and doesn't have interesting topics.
  I never thought I'd say this, but... "leeaave Johny aloooooooooone!"
tldr by phantomfive · 2014-11-17 04:52 · Score: 5, Insightful

There's a really good book that talks about brevity and how to communicate your ideas more concisely with fewer words. I suggest Bennett read it.

--
"First they came for the slanderers and i said nothing."
1. Re:tldr by Flavianoep · 2014-11-17 04:59 · Score: 3, Insightful
  
  I second you, but think he needs a grammar book, too. I could spot so many mistakes I couldn't keep reading, and English is not even my first language!
  
  --
  Linux is for people who don't mind RTFM.
Look, give us an exclusion. by gstoddart · 2014-11-17 04:54 · Score: 4, Insightful

I'm sorry, but this is getting absurd.
If Slashdot is going to be Bennett "aint I smug and pointless" Haselton's personal blog ....
Give us a STORY EXCLUSION for this clown.
I do not see value in Bennett and his shit, and I don't care.
But apparently at least samzenpus and timothy will post any of the shit this idiot writes.
Seriously, just fucking make it stop. Nobody here gives a shit about Bennett Haselton. So give us a fucking way to stop reading his crap.

--
Lost at C:>. Found at C.
STOP POSTING BENNETT DRIVEL by TheRaven64 · 2014-11-17 04:54 · Score: 5, Insightful

It might be different if Bennett were a frequent poster and would be actively engaged in discussions, but he's not. He's just some guy who once heard that brevity is the soul of wit and went off to write ten thousand words explaining what it meant.

--
I am TheRaven on Soylent News
I don't get the hate. by waspleg · 2014-11-17 04:54 · Score: 2, Insightful

I don't read his long articles, generally speaking, but he has been an advocate against censorship and I respect that much.
No one makes anyone read the articles, and without even checking, I'd guess you can configure /. not to even show them.
The Haselton hate reminds me of the Jon Katz days, which is kind of amusing ;)
1. Re:I don't get the hate. by gstoddart · 2014-11-17 05:03 · Score: 4, Insightful
  
  I'd guess you can configure /. not to even show them
  No, you can't, and that's the problem.
  I can't click something and be done with this clown. Because multiple Slashdot editors post his crap.
  Short of stringing up some editors, or a lot of really loud angry posts, we do not have any easy means to say "do not wish to see this crap".
  Which means you can guarantee every one of his posts will get this kind of response.
  If they would give us a check box to say "do not wish to see any shit from Bennett Haselton", that would be preferable. Instead we're all forced to read his opinion on everything.
  Hey Bennett, what's your opinion of getting kicked in the nuts? Have you done extensive testing to tell us it hurts?
  
  --
  Lost at C:>. Found at C.
Boycott Bennett! by sootman · 2014-11-17 05:06 · Score: 5, Insightful

Slashdot by now has OBVIOUSLY seen how much we don't like this guy. The fact that they keep posting him means they're just trolling us, or going for pageviews, or both. Or maybe Bennett has some kind of deal with the site, or has something on one of the editors. Whatever. I don't care. From now on, NO ONE post any comments on one of his stories. Not even to say how much you hate his stories. This will be my last comment on one of his stories. Hope this takes!

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Still no, sorry by Anonymous Coward · 2014-11-17 05:07 · Score: 5, Insightful

But we can still probably reject any contention that the proportion of such judges is very low. If only 10% of judges were rejecting motions without reading them, then there is only about a 1.4% chance of taking a random sample of six rejected motions and finding that in three or more cases, the judge did not read the motion.
But you DIDN'T HAVE A RANDOM SAMPLE. In particular, you had a sample from Washington State small claims court. So you can ONLY draw conclusions about Washington State small claims court. You have no idea what happens in New York, or in England. But that's only one example of how non-random your sample was. The problem is, ANY small sample is going to have non-random attributes, because it's a small sample. You can roll a dice three times and the results will appear highly non-random - no instances at all of some values - you have to roll it a hundred times to get a good distribution and the dice is random. If you start with a non-random dice - like your "sampling only from one court" or your "using Mechanical Turk" - your small sample size gives you results that are simply MEANINGLESS.
Go and study stats and stop posting this drivel on Slashdot where people might believe it.
Confidence levels by Okian Warrior · 2014-11-17 05:09 · Score: 4, Insightful

38 out of 54 survey-takers, or 70%

Bennett, try this experiment.
Make a program that flips 54 coins and notes the number of heads and the number of tails at each round. Then run this program for one million rounds.
When you're done, note the number of rounds the random generator saw 38 or more heads and frame this as a proportion; ie - "the random generator reached this level X% of the time".
Then compare your results with the random generator. If your results are unlikely to come from the random generator, then perhaps you have something.
Now, "unlikely" is an arbitrary measure with no compelling foundation (it's the wrong measure to determine the significance of a result(*)), but in scientific circles we use a "rule of thumb": results are considered significant when they are less likely than 95% of the random results.
Even at this level, we expect 1-in-20 studies to be due to random chance, but then follow-on studies should confirm or deny the findings (and 1-in-20x20 of *those* will be due to random chance as well).
If the results might lead to potentially catastrophic decisions we might use a higher level of significance; for example, 99% confidence when deciding whether a drug is safe. Physics uses an insanely high level of confidence.
Try that and get back to us - we await your next post with bated breath.
(*) The correct measure is the number of bits saved by compressing the original data by factoring out the result (glossing over some details).
1. Re:Confidence levels by Okian Warrior · 2014-11-17 05:55 · Score: 3, Insightful
  
  Dude! News for nerds indeed. Try using this command in R: 1-pbinom(38,54,.50). You will find that the probability of getting 38 or more heads in 54 trials is approximately 0.0007481294. There are plenty of things wrong with the lump of stupid in the blog post above, but at least get the math right.
  Part of explaining something is knowing your audience.
  Telling someone to type a command in R doesn't explain *why* typing that command works, or what's going on in the background.
  And yes, there's things wrong with the post, but Bennett is most definitely NOT A STATISTICIAN. You don't saturate a beginner with all the gory details - you start from the basics and work up.
  Part of explaining something is knowing your audience. Practice explaining things to people and you, too, will figure that out.
Re:Bennett!!!!!! by Anonymous Coward · 2014-11-17 05:15 · Score: 4, Insightful

But why is Bennett's garbage being approved? I understand slashvertisements, because there is at least a monetary benefit to posting them. I also understand some pseudoscience occasionally slipping by, because the editor didn't read it carefully. But this crap? It is obvious shit from beginning to end. He has nothing to say. It is just completely pointless.
Wikipedia by McGruber · 2014-11-17 05:18 · Score: 5, Insightful

Bennett Haselton (born November 20, 1978) is a frequent commenter on the website Slashdot.org, where he is widely disliked by readers.
1. Re:Wikipedia by fructose · 2014-11-17 05:28 · Score: 5, Insightful
  
  You needed a cite. I fixed it for you.
Slashdot is not your blog. Go away. by pla · 2014-11-17 05:44 · Score: 5, Insightful

Slashdot is not your blog. Go away.
Bigger sample size in comments section by luckymutt · 2014-11-17 05:57 · Score: 4, Insightful

You want to see a more meaningful sample size? Look at the number of comments in Bennett's "submissions" that are complaining about this waste of time. Compare that to the number that actually gives a shit.
It was bad enough that the first story the other day had NOTHING whatsoever to do with news for nerds, nor was it well written, nor was it well conducted.
But /. now needs to post a whiny follow-up piece???

Few people care about this Miley Cyrus' opinion on things that do matter, and fewer still care about his opinion on all the crap that doesn't matter.
Breastfeeding pictures? Burning Man parking? Burning Man Ice distribution? How come 5th Amendment?
Fuck this clown.
Op-ed by tepples · 2014-11-17 05:58 · Score: 3, Insightful

Wikipedia is not a blog, but Slashdot is not Wikipedia. Plenty of newspapers and the like have in-house opinion columnists and other writers producing exclusive original content that distinguishes each publication from other AP/Reuters aggregators.
You are wrong, again. by khasim · 2014-11-17 07:43 · Score: 4, Insightful

However, I still say it's correct that even on the basis of a small sample, you can rule out claims about the background population.
You can say that but you are wrong.
With a small, non-random sample you cannot say ANYTHING about anything.

You reach in, grab a ball at random and pull it out, and see that it's red.
Random is not the same as non-random.
A small sample size that is random is NOT THE SAME as a small sample size that is non-random.

It's trivially true that "any small sample is going to have some non-random attributes", but that doesn't mean the sample itself isn't random, ...
Again, your sample was not random.
No matter how many times you try to imply/claim that it was random, it was not random.
Re:Not even wrong. by Aighearach · 2014-11-17 09:58 · Score: 4, Insightful

As I said, I included the link to the perl script in the article, so that you don't have to take my word for it about the statistical calculations -- you can run one million trials of the experiment and verify that, under the posited hypothesis, a result similar to the one that occurred will only occur about 2.5% of the time. So the posited hypothesis is probably wrong.
Three minutes before posting this you were smacked down by a statistics prof posting as AC. I recommend you just apologize for having defended your small sample size with bad statistics, and hope people forget in a few years.