Engineers Have More Sons, Nurses More Daughters
Bifurcati writes "While it might be irrelevant for many /.ers, a recent study has shown that people in stereotypically male professions (engineering, IT, mathematics, etc) are more likely to have sons than daughters, while nurses, therapists and teachers tend to produce more girls. Based on independent survey data, engineering types produce 140 boys to every 100 girls, while nurses and the like produce 135 girls to 100 boys. The explanation is unclear, but it might have interesting long-term social implications. A more detailed summary of the journal article is available on Illuminating Science."
The question, of course, is whether this is a reasonable interpretation of an objective set of data, or whether this is pseudostatistics where you start from a conclusion and work backwards to find it in the numbers. Some questions I'd like to see addressed:
* How were the groupings into "masculine" and "feminine" professions done? Is this reasonable, and did they truly choose the most "obvious" masculine and feminine professions to include?
* Do these groupings span the dataset, or are some (possibly most) professions excluded as "neutral"?
* What is the breakdown by profession for all professions, not just the included groups?
* Most importantly, was the selection of the "masculine" and "feminine" professions determined BEFORE or AFTER the data was collected?
My concern here is that they started with childbirth data for all professions (probably a fairly small dataset). They noticed some professions skewed one way, some another. They noticed that some of the professions skewing male were "masculine" and some skewing female were "feminine" and called it a conclusion, sweeping all the other anomalies in their dataset under the rug. Hey, presto! Conclusion!
Fact: The general benchmark for "statistical significance" is 95% confidence, meaning a result that extreme would turn up less than 5% of the time if the data were purely random.
Experiment: Create 20 hypothetical correlations to test for on a completely random dataset. On average, you should find that one in twenty hits the 95% confidence mark by chance alone.
Intellectually dishonest followup: Publish your one statistically significant result with great fanfare. Bury the other 19 in a footnote, if you mention them at all.
Step 3: Profit!
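For anyone who wants to try the experiment above, here is a rough Python sketch. Everything in it is my own assumption and not taken from the study: 20 made-up "professions", 200 births each, every birth an independent 50/50 coin flip, and a plain two-sided z-test at the 95% level. The names count_false_positives and simulate are just illustrative.

```python
import math
import random


def count_false_positives(n_tests=20, n_births=200, z_crit=1.96, rng=random):
    """Count how many of n_tests purely random groups look 'significant'.

    Assumption: each birth is an independent fair coin flip, so any
    'significant' skew found here is a false positive by construction.
    """
    hits = 0
    for _ in range(n_tests):
        boys = sum(rng.random() < 0.5 for _ in range(n_births))
        # Normal approximation to the binomial: mean n/2, variance n/4.
        z = (boys - n_births * 0.5) / math.sqrt(n_births * 0.25)
        if abs(z) > z_crit:  # two-sided test at roughly the 95% level
            hits += 1
    return hits


def simulate(n_trials=500, seed=0):
    """Repeat the 20-test experiment and summarize the false positives."""
    rng = random.Random(seed)
    results = [count_false_positives(rng=rng) for _ in range(n_trials)]
    mean_hits = sum(results) / n_trials
    share_with_hit = sum(r > 0 for r in results) / n_trials
    return mean_hits, share_with_hit


if __name__ == "__main__":
    mean_hits, share_with_hit = simulate()
    print(f"average 'significant' groups per 20 tests: {mean_hits:.2f}")
    print(f"share of experiments with at least one:   {share_with_hit:.2f}")
```

The average should land somewhere around one hit per 20 tests, and the share of runs with at least one publishable "finding" should be in the neighborhood of the 1 - 0.95^20, or about 64%, that twenty independent 5% tests would give. The exact figures drift a little because the counts are discrete.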