Testing for statistical significance should be an aid to interpreting scientific results, and — when applied sensibly — to decision-making. It should not be a mindless quest for verification (see V. Amrhein et al. Nature 567, 305–307; 2019). In my experience, the correction of P values for multiple testing — a valuable tool in the fight against P hacking and in the proper interpretation of genome-wide association studies, for example — is being comparably abused through ignorance.

Too often, I find myself up against criticisms from reviewers who draw no distinction between tests carried out on evidence-weighted, mechanistically legitimate risk variables and tests applied to ad hoc collections of measurements (roughly akin to grandmothers’ dogs’ tail lengths). The distinction was spelt out more than 20 years ago (T. V. Perneger Br. Med. J. 316, 1236–1238; 1998). That nobody took any notice shows how tight a grip the lust for certainty — neatly dubbed by Amrhein and colleagues as “dichotomania” — has on a researcher’s psyche.

Nature 569, 192 (2019)
