Extraordinarily corrupt or statistically commonplace? Reproducibility crises may stem from a lack of understanding of outcome probabilities.

Souto-Maior, Caetano (2022) Extraordinarily corrupt or statistically commonplace? Reproducibility crises may stem from a lack of understanding of outcome probabilities. In: UNSPECIFIED.

Abstract

Failure to consistently reproduce experimental results, i.e. failure to reliably identify or quantify an effect (often dubbed a ‘reproducibility crisis’ when it affects a large number of studies in a given field), has become a serious concern in many communities and is widely believed to be caused by (i) lack of systematic methodological description, poor experimental practice, or outright fraud. On the other hand, it is common knowledge in scientific practice that (ii) replicate experiments, even when performed in the same lab by the same experimenter, will rarely show complete quantitative agreement. The widely believed explanation (i) and the commonplace explanation (ii) are not mutually exclusive, but they are incompatible as justifications for irreproducibility: invoking the former implies an anomaly, a crisis, while the latter is statistically expected and therefore amenable to quantification.
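To make explanation (ii) concrete, here is a minimal Python sketch (not from the paper) that simulates honest replicates of a single experiment. It assumes one fixed true effect, normally distributed measurement noise with known standard deviation, and a two-sided z-test at the 5% level; the function names and parameter values are hypothetical, chosen only for illustration.

```python
import random
import statistics

def replicate_estimates(true_effect, sigma, n, n_replicates, seed=0):
    """Simulate n_replicates independent experiments, each averaging n noisy
    observations of the SAME fixed true effect (assumption: Normal noise)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replicates):
        sample = [rng.gauss(true_effect, sigma) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    return estimates

def fraction_significant(estimates, sigma, n, z_crit=1.96):
    """Fraction of experiments whose two-sided z-statistic (sigma assumed
    known) exceeds z_crit, i.e. the empirical power of the test."""
    se = sigma / n ** 0.5
    return sum(abs(m) / se > z_crit for m in estimates) / len(estimates)

est = replicate_estimates(true_effect=0.5, sigma=1.0, n=20, n_replicates=1000)
print(min(est), max(est))                        # honest replicates still disagree
print(fraction_significant(est, sigma=1.0, n=20))
```

With these illustrative settings the replicate means spread over a wide range and only a fraction of them reach significance, even though no experiment is flawed; that fraction is simply the test's power.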

Interpreting two or more studies as conflicting often reduces to a mechanistic view in which a ground truth exists that must be observed in every properly performed experiment; a slightly less naive view (at best) is a frequentist one, in which statistical tests must confidently identify a true effect (i.e. a single parameter value) as significant almost always (i.e. an arbitrary 95% of the time). A broader view, however, may consider that the effect can only be observed as a probability distribution. Individual experiments are then expected to differ not only through sampling and their power to identify a significant effect, but also through variation at the level of the parameter value itself. In other words, it is accepted that some sources of variation, for instance in the environment or from the experimenter, cannot be controlled with infinite precision, or it is acknowledged that unknown, uncontrolled factors may introduce biases. Quantitatively, this perspective is consistent with a Bayesian hierarchical formulation, in which the effect parameters (commonly called group-level parameters) are given a hyperprior and sit above the parameters of the individual experiments.
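The two-level structure described above can be sketched as a generative simulation. This is an illustration under stated assumptions, not the paper's model: both levels are taken to be Normal, each experiment j draws its own effect theta_j from a group-level distribution with mean mu and between-experiment standard deviation tau, and its n observations are Normal around theta_j with standard deviation sigma. The values of mu, tau, sigma, and n are hypothetical.

```python
import random
import statistics

def simulate_hierarchy(mu, tau, sigma, n, n_experiments, seed=0):
    """Two-level model (assumed Normal at both levels):
    theta_j ~ Normal(mu, tau)        experiment-level effect
    y_ij    ~ Normal(theta_j, sigma) observations within experiment j
    Returns the sample mean of each experiment."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_experiments):
        theta_j = rng.gauss(mu, tau)
        obs = [rng.gauss(theta_j, sigma) for _ in range(n)]
        means.append(statistics.fmean(obs))
    return means

mu, tau, sigma, n = 0.5, 0.3, 1.0, 20
means = simulate_hierarchy(mu, tau, sigma, n, n_experiments=2000)

# Spread expected if only sampling error mattered (single-parameter view).
sampling_only_sd = sigma / n ** 0.5
# Spread implied by the two-level model: Var = tau^2 + sigma^2 / n.
hierarchical_sd = (tau ** 2 + sigma ** 2 / n) ** 0.5
print(round(statistics.stdev(means), 3), round(sampling_only_sd, 3), round(hierarchical_sd, 3))
```

Under these assumptions the empirical spread of experiment means tracks sqrt(tau² + sigma²/n) rather than sigma/sqrt(n), so an analysis that allows only sampling error would judge honest between-experiment differences as more discordant than they are.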

Put another way, the Bayesian hierarchical view allows seemingly discordant results to be reconciled by interpreting each experiment as itself a sample from a group- or system-level distribution, which in turn sets the range and probability of expected outcomes for new individual experiments. As a corollary, a large number of replicates increases confidence not only in the expected value but also in the deviation from it. Thus, “validating” an experiment does not mean getting the same number every time, but establishing the range and likelihood of the outcomes of well-performed experiments. Conversely, once an experiment has been extensively replicated, the effect distribution indicates how much each repetition deviates from expectation, and whether a given repetition is actually extreme (and potentially contains anomalies or misconduct) or is probabilistically unsurprising. This formulation has profound consequences for assessments of, and claims about, reproducibility.
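Once many replicates exist, the question "is this result extreme?" can be framed as a tail probability under a predictive distribution. The sketch below is a simplified plug-in (empirical) approximation, not the full Bayesian posterior predictive the text describes: it assumes normality, assumes all replicates share the same design and sample size, and does not propagate uncertainty in the between-experiment spread. The function name and the replicate values are hypothetical.

```python
import statistics
from statistics import NormalDist

def outcome_tail_probability(replicates, new_estimate):
    """Plug-in approximation of how surprising a new experiment's estimate is,
    given k earlier replicates of the same design.

    The sample SD of past estimates already mixes between-experiment variation
    and within-experiment sampling error, so it serves as the predictive SD;
    the factor sqrt(1 + 1/k) adds uncertainty in the estimated mean.
    Returns the two-sided probability of a result at least this far away."""
    k = len(replicates)
    if k < 2:
        raise ValueError("need at least two replicates")
    center = statistics.fmean(replicates)
    pred_sd = statistics.stdev(replicates) * (1 + 1 / k) ** 0.5
    z = abs(new_estimate - center) / pred_sd
    return 2 * (1 - NormalDist().cdf(z))

past = [0.31, 0.72, 0.55, 0.18, 0.64, 0.47, 0.90, 0.39]  # hypothetical replicate estimates
print(outcome_tail_probability(past, 0.05))  # toward the edge of past replicates
print(outcome_tail_probability(past, 2.0))   # far outside the replicated range
```

A full hierarchical treatment would instead place hyperpriors on the group-level mean and spread and average over their posterior (for example with MCMC), which widens the predictive distribution when replicates are few.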

Item Type:

Conference or Workshop Item (UNSPECIFIED)

Creators:

Souto-Maior, Caetano (ORCID: 0000-0002-0271-2576)

Keywords:

Replication, Reproducibility, Hierarchical, Bayesian

Subjects:

General Issues > Data
Specific Sciences > Mathematics > Epistemology
Specific Sciences > Biology > Developmental Biology
Specific Sciences > Biology > Evolutionary Theory
Specific Sciences > Biology > Molecular Biology/Genetics
General Issues > Experimentation
General Issues > Laws of Nature
General Issues > Models and Idealization
Specific Sciences > Probability/Statistics
General Issues > Reductionism/Holism
General Issues > Theory/Observation

Depositing User:

Dr. Caetano Souto-Maior

Date Deposited:

06 Jun 2022 16:29

Last Modified:

06 Jun 2022 16:29

Item ID:

20720

Date:

6 June 2022

URI:

https://philsci-archive.pitt.edu/id/eprint/20720
