Altman, D. G., & Bland, J. M. (1995). Absence of evidence is not evidence of absence. BMJ, 311(7003), 485. https://doi.org/10.1136/bmj.311.7003.485
Armstrong, R. A. (2014). When to use the Bonferroni correction. Ophthalmic and Physiological Optics, 34, 502-508. https://doi.org/10.1111/opo.12131
Barrett, L. F. (2015). Psychology is not in crisis. The New York Times, A23. https://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html
Bender, R., & Lange, S. (2001). Adjusting for multiple testing—when and how? Journal of Clinical Epidemiology, 54, 343-349. https://doi.org/10.1016/S0895-4356(00)00314-0
Bergkvist, L. (2020). Preregistration as a way to limit questionable research practice in advertising research. International Journal of Advertising, 39(7), 1172-1180. https://doi.org/10.1080/02650487.2020.1753441
Type I Error Rates are Not Usually Inflated 22
Berk, R. A., Western, B., & Weiss, R. E. (1995). Statistical inference for apparent populations. Sociological Methodology, 25, 421-458. https://doi.org/10.2307/271073
Birnbaum, A. (1962). On the foundations of statistical inference. Journal of the American Statistical Association, 57(298), 269-306. https://doi.org/10.1080/01621459.1962.10480660
Bolles, R. C. (1962). The difference between statistical hypotheses and scientific hypotheses. Psychological Reports, 11(3), 639-645. https://doi.org/10.2466/pr0.1962.11.3.639
Boring, E. G. (1919). Mathematical vs. scientific significance. Psychological Bulletin, 16(10), 335-338. https://doi.org/10.1037/h0074554
Brower, D. (1949). The problem of quantification in psychological science. Psychological Review, 56(6), 325-333. https://doi.org/10.1037/h0061802
Chow, S. L. (1998). Précis of statistical significance: Rationale, validity, and utility. Behavioral and Brain Sciences, 21(2), 169-194. https://doi.org/10.1017/S0140525X98001162
Cook, R. J., & Farewell, V. T. (1996). Multiplicity considerations in the design and analysis of clinical trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 159, 93-110. https://doi.org/10.2307/2983471
Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics, 29(2), 357-372. https://doi.org/10.1214/aoms/1177706618
Cox, D. R., & Mayo, D. G. (2010). Objectivity and conditionality in frequentist inference. In D. G. Mayo & A. Spanos (Eds.), Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science (pp. 276-304). Cambridge University Press.
Del Giudice, M., & Gangestad, S. W. (2021). A traveler's guide to the multiverse: Promises, pitfalls, and a framework for the evaluation of analytic decisions. Advances in Methods and Practices in Psychological Science, 4(1). https://doi.org/10.1177/2515245920954925
Dennis, B., Ponciano, J. M., Taper, M. L., & Lele, S. R. (2019). Errors in statistical inference under model misspecification: Evidence, hypothesis testing, and AIC. Frontiers in Ecology and Evolution, 7, Article 372. https://doi.org/10.3389/fevo.2019.00372
Devezer, B., & Buzbas, E. O. (2023). Rigorous exploration in a model-centric science via epistemic iteration. Journal of Applied Research in Memory and Cognition, 12(2), 189-194. https://doi.org/10.1037/mac0000121
Devezer, B., Navarro, D. J., Vandekerckhove, J., & Buzbas, E. O. (2021). The case for formal methodology in scientific reform. Royal Society Open Science, 8(3), Article 200805. https://doi.org/10.1098/rsos.200805
Feynman, R. P. (1955). The value of science. Engineering and Science, 19(3), 13-15. https://calteches.library.caltech.edu/1575/1/Science.pdf
Firestein, S. (2012). Ignorance: How it drives science. Oxford University Press.
Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture, 33, 503-515. https://doi.org/10.23637/rothamsted.8v61q
Fisher, R. A. (1930). Inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society, 26(4), 528-535. https://doi.org/10.1017/S0305004100016297
Fisher, R. A. (1956). Statistical methods and scientific inference. Oliver & Boyd.
Fisher, R. A. (1971). The design of experiments (9th ed.). Hafner Press.
Fraser, D. A. S. (2019). The p-value function and statistical inference. The American Statistician, 73(sup1), 135-147. https://doi.org/10.1080/00031305.2018.1556735
García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, Article 100120. https://doi.org/10.1016/j.metip.2023.100120
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102, Article 460. https://doi.org/10.1511/2014.111.460
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311-339). Erlbaum.
Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198-218. https://doi.org/10.1177/2515245918771329
Greenland, S. (2017). Invited commentary: The need for cognitive science in methodology. American Journal of Epidemiology, 186(6), 639-645. https://doi.org/10.1093/aje/kwx259
Greenland, S. (2021). Analysis goals, error-cost sensitivity, and analysis hacking: Essential considerations in hypothesis testing and multiple comparisons. Paediatric and Perinatal Epidemiology, 35, 8-23. https://doi.org/10.1111/ppe.12711
Greenland, S. (2023). Connecting simple and precise p-values to complex and ambiguous realities. Scandinavian Journal of Statistics, 50, 899-914. https://doi.org/10.1111/sjos.12645
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337-350. https://doi.org/10.1007/s10654-016-0149-3
Hager, W. (2013). The statistical theories of Fisher and of Neyman and Pearson: A methodological perspective. Theory & Psychology, 23, 251-270. https://doi.org/10.1177/0959354312465483
Haig, B. D. (2009). Inference to the best explanation: A neglected approach to theory appraisal in psychology. The American Journal of Psychology, 122(2), 219-234. https://doi.org/10.2307/27784393
Haig, B. D. (2018). Method matters in psychology: Essays in applied philosophy of science. Springer.
Hancock, G. R., & Klockars, A. J. (1996). The quest for α: Developments in multiple comparison procedures in the quarter century since. Review of Educational Research, 66(3), 269-306. https://doi.org/10.3102/00346543066003269
Hewes, D. E. (2003). Methods as tools. Human Communication Research, 29, 448-454. https://doi.org/10.1111/j.1468-2958.2003.tb00847.x
Hochberg, Y., & Tamhane, A. C. (1987). Multiple comparison procedures. Wiley.
Hurlbert, S. H., & Lombardi, C. M. (2012). Lopsided reasoning on lopsided tests and multiple comparisons. Australian & New Zealand Journal of Statistics, 54(1), 23-42. https://doi.org/10.1111/j.1467-842X.2012.00652.x
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196-217. https://doi.org/10.1207/s15327957pspr0203_4
Kim, K., Zakharkin, S. O., Loraine, A., & Allison, D. B. (2004). Picking the most likely candidates for further development: Novel intersection-union tests for addressing multi-component hypotheses in comparative genomics. Proceedings of the American Statistical Association, ASA Section on ENAR Spring Meeting (pp. 1396-1402). http://www.uab.edu/cngi/pdf/2004/JSM%202004%20-IUTs%20Kim%20et%20al.pdf
Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Jr., Alper, S., ... & Sowden, W. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443-490. https://doi.org/10.1177/2515245918810225
Kotzen, M. (2013). Multiple studies and evidential defeat. Noûs, 47(1), 154-180. http://www.jstor.org/stable/43828821
Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12(5), 535-540. https://doi.org/10.1038/nn.2303
Kuhn, T. S. (1977). The essential tension: Selected studies in the scientific tradition and change. University of Chicago Press.
Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88, 1242-1249. https://doi.org/10.1080/01621459.1993.10476404
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151-159. https://doi.org/10.1037/h0026141
Mackonis, A. (2013). Inference to the best explanation, coherence and other explanatory virtues. Synthese, 190(6), 975-995. https://doi.org/10.1007/s11229-011-0054-y
Matsunaga, M. (2007). Familywise error in multiple comparisons: Disentangling a knot through a critique of O'Keefe's arguments against alpha adjustment. Communication Methods and Measures, 1, 243-265. https://doi.org/10.1080/19312450701641409
Mayo, D. G. (1996). Error and the growth of experimental knowledge. University of Chicago Press.
Mayo, D. G. (2014). On the Birnbaum argument for the strong likelihood principle. Statistical Science, 29, 227-239. https://doi.org/10.1214/13-STS457
Mayo, D. G., & Morey, R. D. (2017). A poor prognosis for the diagnostic screening critique of statistical tests. OSF Preprints. https://doi.org/10.31219/osf.io/ps38b
McShane, B. B., Bradlow, E. T., Lynch, J. G., Jr., & Meyer, R. J. (2023). "Statistical significance" and statistical reporting: Moving beyond binary. Journal of Marketing.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834. https://doi.org/10.1037/0022-006X.46.4.806
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244. https://doi.org/10.2466/pr0.1990.66.1.195
Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 393-425). Erlbaum.
Merton, R. K. (1987). Three fragments from a sociologist's notebooks: Establishing the phenomenon, specified ignorance, and strategic research materials. Annual Review of Sociology, 13(1), 1-29. https://doi.org/10.1146/annurev.so.13.080187.000245
Molloy, S. F., White, I. R., Nunn, A. J., Hayes, R., Wang, D., & Harrison, T. S. (2022). Multiplicity adjustments in parallel-group multi-arm trials sharing a control group: Clear guidance is needed. Contemporary Clinical Trials, 113, Article 106656. https://doi.org/10.1016/j.cct.2021.106656
Morgan, J. F. (2007). P value fetishism and use of the Bonferroni adjustment. Evidence-Based Mental Health, 10, 34-35. https://doi.org/10.1136/ebmh.10.2.34
Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., Percie du Sert, N., ... & Ioannidis, J. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 1-9. https://doi.org/10.1038/s41562-016-0021
Neyman, J. (1950). First course in probability and statistics. Henry Holt.
Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97-131. https://doi.org/10.1007/BF00485695
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A, 175-240. https://doi.org/10.2307/2331945
Neyman, J., & Pearson, E. S. (1933). IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A, 231, 289-337. https://doi.org/10.1098/rsta.1933.0009
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115, 2600-2606. https://doi.org/10.1073/pnas.1708274114
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., ... & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719-748. https://doi.org/10.1146/annurev-psych-020821-
Nosek, B. A., & Lakens, D. (2014). Registered reports. Social Psychology, 45(3), 137-141. https://doi.org/10.1027/1864-9335/a000192
Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5), 1596-1618. https://doi.org/10.3758/s13423-019-01645-2
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716. https://doi.org/10.1126/science.aac4716
Parker, R. A., & Weir, C. J. (2020). Non-adjustment for multiple testing in multi-arm trials of distinct treatments: Rationale and justification. Clinical Trials, 17(5), 562-566. https://doi.org/10.1177/1740774520941419
Parker, R. A., & Weir, C. J. (2022). Multiple secondary outcome analyses: Precise interpretation is important. Trials, 23(1), Article 27. https://doi.org/10.1186/s13063-021-05975-2
Parker, T. H., Forstmeier, W., Koricheva, J., Fidler, F., Hadfield, J. D., Chee, Y. E., ... & Nakagawa, S. (2016). Transparency in ecology and evolution: Real problems, real solutions. Trends in Ecology & Evolution, 31(9), 711-719.
Perneger, T. V. (1998). What's wrong with Bonferroni adjustments. British Medical Journal, 316, 1236-1238. https://doi.org/10.1136/bmj.316.7139.1236
Pollard, P., & Richardson, J. T. (1987). On the probability of making Type I errors. Psychological Bulletin, 102(1), 159-163. https://doi.org/10.1037/0033-2909.102.1.159
Popper, K. R. (1962). Conjectures and refutations: The growth of scientific knowledge. Basic Books.
Popper, K. R. (2002). The logic of scientific discovery. Routledge.
Redish, D. A., Kummerfeld, E., Morris, R. L., & Love, A. C. (2018). Reproducibility failures are essential to scientific inquiry. Proceedings of the National Academy of Sciences, 115(20), 5042-5046. https://doi.org/10.1073/pnas.1806370115
Reichenbach, H. (1938). Experience and prediction: An analysis of the foundations and the structure of knowledge. University of Chicago Press. https://philarchive.org/archive/REIEAP-2
Reid, N. (1995). The roles of conditioning in inference. Statistical Science, 10(2), 138-157. https://doi.org/10.1214/ss/1177010027
Reid, N., & Cox, D. R. (2015). On some principles of statistical inference. International Statistical Review, 83, 293-308. https://doi.org/10.1111/insr.12067
Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology, 1, 43-46. https://www.jstor.org/stable/20065622
Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern epidemiology (3rd ed.). Lippincott Williams & Wilkins.
Rubin, M. (2017a). An evaluation of four solutions to the forking paths problem: Adjusted alpha, preregistration, sensitivity analyses, and abandoning the Neyman-Pearson approach. Review of General Psychology, 21(4), 321-329. https://doi.org/10.1037/gpr0000135
Rubin, M. (2017b). Do p values lose their meaning in exploratory analyses? It depends how you define the familywise error rate. Review of General Psychology, 21(3), 269-275. https://doi.org/10.1037/gpr0000123
Rubin, M. (2020a). Does preregistration improve the credibility of research findings? The Quantitative Methods for Psychology, 16(4), 376-390. https://doi.org/10.20982/tqmp.16.4.p376
Rubin, M. (2020b). "Repeated sampling from the same population?" A critique of Neyman and Pearson's responses to Fisher. European Journal for Philosophy of Science, 10, Article 42, 1-15. https://doi.org/10.1007/s13194-020-00309-6
Rubin, M. (2021a). There's no need to lower the significance threshold when conducting single tests of multiple individual hypotheses. Academia Letters, Article 610. https://doi.org/10.20935/AL610
Rubin, M. (2021b). What type of Type I error? Contrasting the Neyman-Pearson and Fisherian approaches in the context of exact and direct replications. Synthese, 198, 5809-5834. https://doi.org/10.1007/s11229-019-02433-0
Rubin, M. (2021c). When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing. Synthese, 199, 10969-11000. https://doi.org/10.1007/s11229-021-03276-4
Rubin, M. (2022). The costs of HARKing. British Journal for the Philosophy of Science, 73(2), 535-560. https://doi.org/10.1093/bjps/axz050
Rubin, M., & Donkin, C. (2022). Exploratory hypothesis tests can be more compelling than confirmatory hypothesis tests. Philosophical Psychology. https://doi.org/10.1080/09515089.2022.2113771
Savitz, D. A., & Olshan, A. F. (1995). Multiple comparisons and related issues in the interpretation of epidemiologic data. American Journal of Epidemiology, 142, 904-908. https://doi.org/10.1093/oxfordjournals.aje.a117737
Schulz, K. F., & Grimes, D. A. (2005). Multiplicity in randomised trials I: Endpoints and treatments. The Lancet, 365, 1591-1595. https://doi.org/10.1016/S0140-6736(05)66461-6
Senn, S. (2007). Statistical issues in drug development (2nd ed.). Wiley.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366. https://doi.org/10.1177/0956797611417632
Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214. https://doi.org/10.1038/s41562-020-0912-z
Sinclair, J., Taylor, P. J., & Hobbs, S. J. (2013). Alpha level adjustments for multiple dependent variable analyses and their applicability—A review. International Journal of Sports Science Engineering, 7, 17-20.
Spanos, A. (2006). Where do statistical models come from? Revisiting the problem of specification. Optimality, 49, 98-119. https://doi.org/10.1214/074921706000000419
Spanos, A. (2010). Akaike-type criteria and the reliability of inference: Model selection versus statistical model specification. Journal of Econometrics, 158(2), 204-220. https://doi.org/10.1016/j.jeconom.2010.01.011
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702-712. https://doi.org/10.1177/1745691616658637
Stefan, A. M., & Schönbrodt, F. D. (2023). Big little lies: A compendium and simulation of p-hacking strategies. Royal Society Open Science, 10(2), Article 220346. https://doi.org/10.1098/rsos.220346
Syrjänen, P. (2023). Novel prediction and the problem of low-quality accommodation. Synthese, 202, Article 182, 1-32. https://doi.org/10.1007/s11229-023-04400-2
Szollosi, A., & Donkin, C. (2021). Arrested theory development: The misguided distinction between exploratory and confirmatory research. Perspectives on Psychological Science, 16(4), 717-724. https://doi.org/10.1177/1745691620966796
Taylor, J., & Tibshirani, R. J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences, 112(25), 7629-7634. https://doi.org/10.1073/pnas.1507583112
Tukey, J. W. (1953). The problem of multiple comparisons. Princeton University.
Turkheimer, F. E., Aston, J. A., & Cunningham, V. J. (2004). On the logic of hypothesis testing in functional imaging. European Journal of Nuclear Medicine and Molecular Imaging, 31, 725-732. https://doi.org/10.1007/s00259-003-1387-7
Veazie, P. J. (2006). When to combine hypotheses and adjust for multiple tests. Health Services Research, 41(3p1), 804-818. https://doi.org/10.1111/j.1475-6773.2006.00512.x
Venn, J. (1876). The logic of chance (2nd ed.). Macmillan and Co.
Wagenmakers, E. J. (2016). [Comment]. https://www.psychologicalscience.org/observer/why-preregistration-makes-me-nervous
Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632-638. https://doi.org/10.1177/1745691612463078
Wasserman, L. (2013, March 14). Double misunderstandings about p-values. Normal Deviate. https://normaldeviate.wordpress.com/2013/03/14/double-misunderstandings-about-p-values/
Wilson, W. (1962). A note on the inconsistency inherent in the necessity to perform multiple comparisons. Psychological Bulletin, 59, 296-300. https://doi.org/10.1037/h0040447
Worrall, J. (2010). Theory confirmation and novel evidence. In D. G. Mayo & A. Spanos (Eds.), Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science (pp. 125-169). Cambridge University Press.
Peer review: This article has not undergone formal peer review.