Altman, D. G., & Bland, J. M. (1995). Absence of evidence is not evidence of absence. BMJ, 311(7003), 485.
Armstrong, R. A. (2014). When to use the Bonferroni correction. Ophthalmic and Physiological Optics, 34, 502-508.
Barrett, L. F. (2015). Psychology is not in crisis. The New York Times, A23.
Bender, R., & Lange, S. (2001). Adjusting for multiple testing—when and how? Journal of Clinical Epidemiology, 54, 343-349.
Bergkvist, L. (2020). Preregistration as a way to limit questionable research practice in advertising research. International Journal of Advertising, 39(7), 1172-1180.
Type I Error Rates are Not Usually Inflated 22
Berk, R. A., Western, B., & Weiss, R. E. (1995). Statistical inference for apparent populations. Sociological Methodology, 25, 421-458.
Birnbaum, A. (1962). On the foundations of statistical inference. Journal of the American Statistical Association, 57(298), 269-306.
Bolles, R. C. (1962). The difference between statistical hypotheses and scientific hypotheses. Psychological Reports, 11(3), 639-645.
Boring, E. G. (1919). Mathematical vs. scientific significance. Psychological Bulletin, 16(10), 335-338.
Brower, D. (1949). The problem of quantification in psychological science. Psychological Review, 56(6), 325-333.
Chow, S. L. (1998). Précis of statistical significance: Rationale, validity, and utility. Behavioral and Brain Sciences, 21(2), 169-194.
Cook, R. J., & Farewell, V. T. (1996). Multiplicity considerations in the design and analysis of clinical trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 159, 93-110.
Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics, 29(2), 357-372.
Cox, D. R., & Mayo, D. G. (2010). Objectivity and conditionality in frequentist inference. In D. G. Mayo & A. Spanos (Eds.), Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science (pp. 276-304). Cambridge University Press.
Del Giudice, M., & Gangestad, S. W. (2021). A traveler's guide to the multiverse: Promises, pitfalls, and a framework for the evaluation of analytic decisions. Advances in Methods and Practices in Psychological Science, 4(1).
Dennis, B., Ponciano, J. M., Taper, M. L., & Lele, S. R. (2019). Errors in statistical inference under model misspecification: Evidence, hypothesis testing, and AIC. Frontiers in Ecology and Evolution, 7, Article 372.
Devezer, B., & Buzbas, E. O. (2023). Rigorous exploration in a model-centric science via epistemic iteration. Journal of Applied Research in Memory and Cognition, 12(2), 189-194.
Devezer, B., Navarro, D. J., Vandekerckhove, J., & Buzbas, E. O. (2021). The case for formal methodology in scientific reform. Royal Society Open Science, 8(3), Article 200805.
Feynman, R. P. (1955). The value of science. Engineering and Science, 19(3), 13-15.
Firestein, S. (2012). Ignorance: How it drives science. Oxford University Press.
Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture, 33, 503-515.
Fisher, R. A. (1930). Inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society, 26(4), 528-535.
Fisher, R. A. (1956). Statistical methods and scientific inference. Oliver & Boyd.
Fisher, R. A. (1971). The design of experiments (9th ed.). Hafner Press.
Fraser, D. A. S. (2019). The p-value function and statistical inference. The American Statistician, 73(sup1), 135-147.
García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, Article 100120.
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102, Article 460.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311-339). Erlbaum.
Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198-218.
Greenland, S. (2017). Invited commentary: The need for cognitive science in methodology. American Journal of Epidemiology, 186(6), 639-645.
Greenland, S. (2021). Analysis goals, error-cost sensitivity, and analysis hacking: Essential considerations in hypothesis testing and multiple comparisons. Paediatric and Perinatal Epidemiology, 35, 8-23.
Greenland, S. (2023). Connecting simple and precise p-values to complex and ambiguous realities. Scandinavian Journal of Statistics, 50, 899-914.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337-350.
Hager, W. (2013). The statistical theories of Fisher and of Neyman and Pearson: A methodological perspective. Theory & Psychology, 23, 251-270.
Haig, B. D. (2009). Inference to the best explanation: A neglected approach to theory appraisal in psychology. The American Journal of Psychology, 122(2), 219-234.
Haig, B. D. (2018). Method matters in psychology: Essays in applied philosophy of science. Springer.
Hancock, G. R., & Klockars, A. J. (1996). The quest for α: Developments in multiple comparison procedures in the quarter century since. Review of Educational Research, 66(3), 269-306.
Hewes, D. E. (2003). Methods as tools. Human Communication Research, 29, 448-454.
Hochberg, Y., & Tamhane, A. C. (1987). Multiple comparison procedures. Wiley.
Hurlbert, S. H., & Lombardi, C. M. (2012). Lopsided reasoning on lopsided tests and multiple comparisons. Australian & New Zealand Journal of Statistics, 54(1), 23-42.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196-217.
Kim, K., Zakharkin, S. O., Loraine, A., & Allison, D. B. (2004). Picking the most likely candidates for further development: Novel intersection-union tests for addressing multi-component hypotheses in comparative genomics. Proceedings of the American Statistical Association, ASA Section on ENAR Spring Meeting (pp. 1396-1402).
Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams Jr, R. B., Alper, S., ... & Sowden, W. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443-490.
Kotzen, M. (2013). Multiple studies and evidential defeat. Noûs, 47(1), 154-180.
Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12(5), 535-540.
Kuhn, T. S. (1977). The essential tension: Selected studies in the scientific tradition and change. The University of Chicago Press.
Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88, 1242-1249.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151-159.
Mackonis, A. (2013). Inference to the best explanation, coherence and other explanatory virtues. Synthese, 190(6), 975-995.
Matsunaga, M. (2007). Familywise error in multiple comparisons: Disentangling a knot through a critique of O'Keefe's arguments against alpha adjustment. Communication Methods and Measures, 1, 243-265.
Mayo, D. G. (1996). Error and the growth of experimental knowledge. University of Chicago Press.
Mayo, D. G. (2014). On the Birnbaum argument for the strong likelihood principle. Statistical Science, 29, 227-239.
Mayo, D. G., & Morey, R. D. (2017). A poor prognosis for the diagnostic screening critique of statistical tests. OSF Preprints.
McShane, B. B., Bradlow, E. T., Lynch, J. G. Jr., & Meyer, R. J. (2023). “Statistical significance” and statistical reporting: Moving beyond binary. Journal of Marketing.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244.
Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 393-425). Erlbaum.
Merton, R. K. (1987). Three fragments from a sociologist's notebooks: Establishing the phenomenon, specified ignorance, and strategic research materials. Annual Review of Sociology, 13(1), 1-29.
Molloy, S. F., White, I. R., Nunn, A. J., Hayes, R., Wang, D., & Harrison, T. S. (2022). Multiplicity adjustments in parallel-group multi-arm trials sharing a control group: Clear guidance is needed. Contemporary Clinical Trials, 113, Article 106656.
Morgan, J. F. (2007). P value fetishism and use of the Bonferroni adjustment. Evidence-Based Mental Health, 10, 34-35.
Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., Percie du Sert, N., ... & Ioannidis, J. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 1-9.
Neyman, J. (1950). First course in probability and statistics. Henry Holt.
Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97-131.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A, 175-240.
Neyman, J., & Pearson, E. S. (1933). IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A, 231, 289-337.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115, 2600-2606.
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., ... & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719-748.
Nosek, B. A., & Lakens, D. (2014). Registered reports. Social Psychology, 45(3), 137-141.
Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5), 1596-1618.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716.
Parker, R. A., & Weir, C. J. (2020). Non-adjustment for multiple testing in multi-arm trials of distinct treatments: Rationale and justification. Clinical Trials, 17(5), 562-566.
Parker, R. A., & Weir, C. J. (2022). Multiple secondary outcome analyses: Precise interpretation is important. Trials, 23(1), Article 27.
Parker, T. H., Forstmeier, W., Koricheva, J., Fidler, F., Hadfield, J. D., Chee, Y. E., ... & Nakagawa, S. (2016). Transparency in ecology and evolution: Real problems, real solutions. Trends in Ecology & Evolution, 31(9), 711-719.
Perneger, T. V. (1998). What's wrong with Bonferroni adjustments. British Medical Journal, 316, 1236-1238.
Pollard, P., & Richardson, J. T. (1987). On the probability of making Type I errors. Psychological Bulletin, 102(1), 159-163.
Popper, K. R. (1962). Conjectures and refutations: The growth of scientific knowledge. Basic Books.
Popper, K. R. (2002). The logic of scientific discovery. Routledge.
Redish, D. A., Kummerfeld, E., Morris, R. L., & Love, A. C. (2018). Reproducibility failures are essential to scientific inquiry. Proceedings of the National Academy of Sciences, 115(20), 5042-5046.
Reichenbach, H. (1938). Experience and prediction: An analysis of the foundations and the structure of knowledge. University of Chicago Press.
Reid, N. (1995). The roles of conditioning in inference. Statistical Science, 10(2), 138-157.
Reid, N., & Cox, D. R. (2015). On some principles of statistical inference. International Statistical Review, 83, 293-308.
Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology, 1, 43-46.
Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern epidemiology (3rd ed.). Lippincott Williams & Wilkins.
Rubin, M. (2017a). An evaluation of four solutions to the forking paths problem: Adjusted alpha, preregistration, sensitivity analyses, and abandoning the Neyman-Pearson approach. Review of General Psychology, 21(4), 321-329.
Rubin, M. (2017b). Do p values lose their meaning in exploratory analyses? It depends how you define the familywise error rate. Review of General Psychology, 21(3), 269-275.
Rubin, M. (2020a). Does preregistration improve the credibility of research findings? The Quantitative Methods for Psychology, 16(4), 376-390.
Rubin, M. (2020b). “Repeated sampling from the same population?” A critique of Neyman and Pearson's responses to Fisher. European Journal for Philosophy of Science, 10, Article 42, 1-15.
Rubin, M. (2021a). There's no need to lower the significance threshold when conducting single tests of multiple individual hypotheses. Academia Letters, Article 610.
Rubin, M. (2021b). What type of Type I error? Contrasting the Neyman-Pearson and Fisherian approaches in the context of exact and direct replications. Synthese, 198, 5809-5834.
Rubin, M. (2021c). When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing. Synthese, 199, 10969-11000.
Rubin, M. (2022). The costs of HARKing. British Journal for the Philosophy of Science, 73(2), 535-560.
Rubin, M., & Donkin, C. (2022). Exploratory hypothesis tests can be more compelling than confirmatory hypothesis tests. Philosophical Psychology.
Savitz, D. A., & Olshan, A. F. (1995). Multiple comparisons and related issues in the interpretation of epidemiologic data. American Journal of Epidemiology, 142, 904-908.
Schulz, K. F., & Grimes, D. A. (2005). Multiplicity in randomised trials I: Endpoints and treatments. The Lancet, 365, 1591-1595.
Senn, S. (2007). Statistical issues in drug development (2nd ed.). Wiley.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366.
Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.
Sinclair, J., Taylor, P. J., & Hobbs, S. J. (2013). Alpha level adjustments for multiple dependent variable analyses and their applicability—A review. International Journal of Sports Science Engineering, 7, 17-20.
Spanos, A. (2006). Where do statistical models come from? Revisiting the problem of specification. Optimality, 49, 98-119.
Spanos, A. (2010). Akaike-type criteria and the reliability of inference: Model selection versus statistical model specification. Journal of Econometrics, 158(2), 204-220.
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702-712.
Stefan, A. M., & Schönbrodt, F. D. (2023). Big little lies: A compendium and simulation of p-hacking strategies. Royal Society Open Science, 10(2), Article 220346.
Syrjänen, P. (2023). Novel prediction and the problem of low-quality accommodation. Synthese, 202, Article 182, 1-32.
Szollosi, A., & Donkin, C. (2021). Arrested theory development: The misguided distinction between exploratory and confirmatory research. Perspectives on Psychological Science, 16(4), 717-724.
Taylor, J., & Tibshirani, R. J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences, 112(25), 7629-7634.
Tukey, J. W. (1953). The problem of multiple comparisons. Princeton University.
Turkheimer, F. E., Aston, J. A., & Cunningham, V. J. (2004). On the logic of hypothesis testing in functional imaging. European Journal of Nuclear Medicine and Molecular Imaging, 31, 725-732.
Veazie, P. J. (2006). When to combine hypotheses and adjust for multiple tests. Health Services Research, 41(3p1), 804-818. https://dx.doi.org/10.1111/j.1475-6773.2006.00512.x
Venn, J. (1876). The logic of chance (2nd ed.). Macmillan and Co.
Wagenmakers, E. J. (2016). [Comment]. preregistration-makes-me-nervous
Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632-638.
Wasserman, L. (2013, March 14). Double misunderstandings about p-values. Normal Deviate. values/
Wilson, W. (1962). A note on the inconsistency inherent in the necessity to perform multiple comparisons. Psychological Bulletin, 59, 296-300.
Worrall, J. (2010). Theory confirmation and novel evidence. In D. G. Mayo & A. Spanos (Eds.), Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science (pp. 125-169). Cambridge University Press.

Peer review: This article has not undergone formal peer review.