Young’s P-Value Plot as an Agnogenic Technique, Dan Hicks

In a recent paper in the journal Environmental Epidemiology (Hicks 2022), I examined the statistical-evidential properties of Young’s p-value plot, a method used by the biostatistician S. Stanley Young and various collaborators to critique air pollution epidemiology. I showed that the plot was fundamentally unable to support some of these criticisms; it might in principle be able to support other criticisms, but only when used much more carefully. Here, I discuss Young’s p-value plot as an agnogenic technique, a technique for manufacturing doubt about the hazards of air pollution. I first give some non-technical background on the p-value plot and summarize the findings from my study …

Image credit: Dunk via Flickr / Creative Commons

Article Citation:

Hicks, Dan. 2022. “Young’s P-Value Plot as an Agnogenic Technique.” Social Epistemology Review and Reply Collective 11 (3): 36-44.


Constructing P-Value Plots

Young and collaborators have published numerous papers using the p-value plot to criticize research on the hazards of air pollution (Young, Bang, and Oktay 2009; Young and Xia 2013; You, Lin, and Young 2018; Young, Acharjee, and Das 2019; Young and Kindzierski 2019b, 2019a, 2020a, 2020b; Young and Kindzierski 2019c; Young 2020; Kindzierski et al. 2021; Young, Kindzierski, and Randall 2021). The plot itself is very simple to construct. We start with a collection of p-values, such as those reported in the collection of studies used by a meta-analysis. We then plot the p-values in order, from smallest to greatest, and examine some features of the plot; Young and collaborators do this visually, while in my paper I did this numerically across a few thousand simulated studies.

For example, the simulated p-value plot below seems to have the “hockey stick” shape, with a more-or-less straight “handle” that curves up into a “blade” on the right-hand side. Young and collaborators frequently interpret this shape as evidence of statistical heterogeneity (different statistical effects in different subpopulations) or questionable research practices such as p-hacking (conducting many different analyses on a single dataset but only reporting the statistically significant ones). On this basis, they allege that questionable research practices are widespread in environmental epidemiology (Young and Kindzierski 2019c; Young, Kindzierski, and Randall 2021; among others), there is no basis for thinking that air pollution is hazardous (Young and Kindzierski 2020a; among others), and indeed that “regulation of PM2.5 [a major type of air pollution] should be abandoned altogether” (Young 2017).

An example of Young’s p-value plot. In my simulation, a simple, standardized study design is replicated a number of times. Each study generates an independent p-value, the p-values are ordered from smallest to largest, and then displayed in the p-value plot. For this plot, there were 30 replications of a study design with a sample of 60 observations (30 each for exposure and control) and the same small effect size (Cohen’s d = 0.3).
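The construction described above can be sketched in a few lines of Python. This is only an illustration, not the code from the study: the function name `simulate_p_values`, the seed, and the use of a two-sided z-test with known unit variance are all assumptions made here to keep the example within the standard library. The number of studies, group size, and effect size follow the caption.

```python
import random
from statistics import NormalDist, mean


def simulate_p_values(n_studies=30, n_per_group=30, effect_size=0.3, seed=1):
    """Return sorted two-sided p-values from replicated two-group studies.

    Assumption: both groups are normal with unit variance, and each study
    is analysed with a z-test (a stand-in for whatever test a real study uses).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in group means under unit variance.
    se = (2 / n_per_group) ** 0.5
    p_values = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        exposed = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (mean(exposed) - mean(control)) / se
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    # The p-value plot is just these values in rank order.
    return sorted(p_values)


p_sorted = simulate_p_values()
for rank, p in enumerate(p_sorted, start=1):
    print(rank, round(p, 3))
```

Plotting rank on the horizontal axis against the sorted p-value on the vertical axis gives the p-value plot. Every simulated study here shares one effect size and reports its single p-value, so any apparent "blade" in the output arises without heterogeneity or selective reporting.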

However, the simulated studies that produced this example plot had neither heterogeneity nor questionable research practices. Across the simulation, plots with this “hockey stick” shape are common, and so the p-value plot is incapable of providing evidence to support this kind of criticism. Some other critiques are more subtle: Young and his collaborators sometimes look at the slope of the plot, and claim that a slope of 1 shows there’s no effect (Young, Acharjee, and Das 2019; among others). My simulation indicates that the p-value plot can be used to support this kind of conclusion. But doing so requires using an uncommon statistical test, and specifying in advance how much the slope of the p-value plot is allowed to deviate from 1. As far as I can tell, Young and his collaborators rely almost entirely on visual assessments, which are too imprecise to support this kind of conclusion.
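To make the slope idea concrete, here is a rough sketch of how a slope could be computed numerically instead of judged by eye. It is not the test used in the study. The helper names, the z-test with known unit variance, the rescaling of ranks to rank / (n + 1), and the tolerance `margin` are all assumptions introduced for illustration.

```python
import random
from statistics import NormalDist, mean


def sorted_p_values(effect_size, n_studies, n_per_group, rng):
    """Sorted two-sided z-test p-values (unit variance assumed known)."""
    se = (2 / n_per_group) ** 0.5
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_studies):
        control = mean([rng.gauss(0.0, 1.0) for _ in range(n_per_group)])
        exposed = mean([rng.gauss(effect_size, 1.0) for _ in range(n_per_group)])
        z = (exposed - control) / se
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return sorted(p_values)


def plot_slope(p_sorted):
    """Least-squares slope of sorted p-values against rank / (n + 1)."""
    n = len(p_sorted)
    xs = [(i + 1) / (n + 1) for i in range(n)]
    x_bar, y_bar = mean(xs), mean(p_sorted)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, p_sorted))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2022)
null_slope = plot_slope(sorted_p_values(0.0, 1000, 30, rng))
effect_slope = plot_slope(sorted_p_values(0.3, 1000, 30, rng))

margin = 0.1  # hypothetical tolerance; it has to be fixed before looking at the data
print("null slope:", round(null_slope, 3), "within margin:", abs(null_slope - 1) <= margin)
print("effect slope:", round(effect_slope, 3), "within margin:", abs(effect_slope - 1) <= margin)
```

With no true effect the p-values are uniformly distributed, so the rescaled slope should land close to 1. Whether a given slope counts as "close enough" depends entirely on the margin chosen in advance, and a visual assessment has no such margin.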

Despite tracing citations and using tools such as Google Scholar, I was unable to find any peer-reviewed studies that validate the p-value plot as Young and his collaborators have used it. Instead, typically Young and his collaborators cite two other plot methods. Simonsohn et al.’s p-curve is a histogram of p-values below the conventional 0.05 threshold for statistical significance, designed to detect publication bias and p-hacking and validated for this use using simulation studies (Simonsohn, Nelson, and Simmons 2014). It has almost nothing to do with Young’s p-value plot. Schweder and Spjøtvoll’s p-value plot more closely resembles Young’s p-value plot—there is a 1-to-1 relationship between the two plots, though not between their slopes—and Schweder and Spjøtvoll (1982) give a mathematical (but somewhat hand-wavy) validation of their plot for their purposes. But their p-value plot is designed to answer a different question and uses a different set of assumptions from Young’s p-value plot. In short, Young’s p-value plot is quite different from the two other plots, and so citations to the latter are insufficient to validate the former.

Sometimes Young and collaborators cite a few documents of their own as though they validated the p-value plot (Young and Kindzierski 2019a; Young, Kindzierski, and Randall 2021). These documents are not peer-reviewed and just give examples of p-value plots; neither systematically examines the statistical-evidential properties of the plot. (As of February 2022 Young has sent me two unsolicited emails with Young, Kindzierski, and Randall (2021) as an attachment, expressing polite but non-specific disagreement with Hicks (2021). I have not replied to either email.)

Agnogenic Technique

Agnotology, “the study of ignorance,” is a conceptual framework developed within history, philosophy, and social studies of science that understands ignorance as something produced by human activity. The framework emphasizes that ignorance is not necessarily just a gap in our knowledge that has yet to be filled in. Rather, the structure and culture of scientific communities can “passively” lead to systematic gaps in our knowledge (Schiebinger 2008; Frickel et al. 2010; Elliott 2012). Or agents can deliberately create ignorance or uncertainty, manufacturing doubt as a “strategic ploy”; for example, numerous studies have examined the ways various industries have deliberately created ignorance to avoid regulation (Michaels 2008; Proctor 2008; Oreskes and Conway 2011; Fernández Pinto 2017; Supran and Oreskes 2017).

I suggest the term agnogenic technique to describe any technique that is effective for producing ignorance, whether it has this effect intentionally or unintentionally. Specifically, I claim that Young’s p-value plot is an agnogenic statistical technique. I emphasize that, in making this claim, I am not making any claims about whether Young or his collaborators intended to produce knowledge or ignorance. I, at least, don’t have the evidence necessary to make claims about the intentions of Young and his collaborators. (In the examples of manufactured doubt above, the authors generally do have appropriate evidence, such as memos or emails expressing these intentions.) At the same time, Young’s p-value plot is clearly effective at producing uncertainty or ignorance about the health effects of air pollution, by creating the appearance of questionable research practices in air pollution epidemiology.

Fernández Pinto (2017) identifies 5 strategies used by organizational actors—her primary case study is the tobacco industry—to create ignorance. In the remainder of this post, I relate these strategies to several features of Young’s p-value plot and the context of its use. Again, I do not make claims about any agent’s intentions. Rather, my claim is that, for Young’s p-value plot, these features are agnogenic values—features that make the technique especially effective at producing ignorance or uncertainty (compare the definition of epistemic values developed by Steel 2010).

Strategies for Creating Ignorance

First Strategy

The first strategy is to exploit ordinary, legitimate scientific uncertainty. Many of the most recent p-value plot papers refer to the replication crisis, an ongoing epistemic crisis (You, Lin, and Young 2018; Young, Acharjee, and Das 2019; Young and Kindzierski 2019a; Young and Kindzierski 2019c). While the crisis has unfolded primarily in social psychology and preclinical biomedical research, the replication crisis literature is often written much more broadly, raising concerns about “science” in general in ways that neglect the diversity of methods, standards, and practices used across (and indeed, within) scientific disciplines (see, for example, Munafò et al. 2017). Both the term “p-hacking” and the p-curve technique to detect it were developed within this literature. Young and collaborators also frequently raise concerns about the inability of peer review to catch flawed scientific research (Young 2012, 2013; Miller and Young 2017; Young and Kindzierski 2019c). (Note that these arguments were made using the p-value plot, a flawed method that was not caught in peer review; see the discussion of the fourth strategy below.) In these ways, Young and collaborators exploit the ambiguous scope of a legitimate epistemic crisis and legitimate concerns about peer review. However, unlike in social psychology, there does not appear to be any empirical evidence of replication problems in environmental public health (Hicks 2021).

Second and Third Strategies

Fernández Pinto’s second and third strategies involve supporting industry-friendly research and, specifically, recruiting distinguished industry-friendly scientists (Fernández Pinto 2017, 56–58). Young is a distinguished statistician. He coauthored an influential textbook on methods of adjusting for multiple comparisons (Westfall and Young 1993), is a fellow of both the American Statistical Association and the American Association for the Advancement of Science (“Who We Are – S. Stanley Young” n.d.), and in 2017 was appointed to the US Environmental Protection Agency’s Science Advisory Board. A reader familiar with his credentials is likely to assume that he wouldn’t use an unvalidated or flawed statistical technique. Young has also been supported by the fossil fuels industry: at least three of Young’s recent papers acknowledge funding from the American Petroleum Institute (You, Lin, and Young 2018; Young, Acharjee, and Das 2019; Young and Kindzierski 2019c). Another paper is a letter criticizing a study of occupational exposures to acrylonitrile, which is used in the manufacture of polyacrylamides; this letter acknowledges funding from SNF, a major manufacturer of polyacrylamides (Young 2020). Oddly, a letter published around the time of the American Petroleum Institute funding does not report any funding or conflicts of interest (Young and Kindzierski 2019b).

Young’s use of the p-value plot to make skeptical claims appears to predate this industry funding (Young, Bang, and Oktay 2009). Holman and Bruner (2017) and Weatherall, O’Connor, and Bruner (2018) argue that industry funding can create significant distortions in the scientific literature by supporting the work of ex ante friendly researchers, that is, researchers whose work tended to favor industry prior to any (direct) industry support. In this respect, it is not important whether the American Petroleum Institute changed the views and attitudes of Young and his collaborators. What is important is whether this industry funding enabled Young and collaborators to produce more skeptical papers or disseminate them more broadly.

Fourth Strategy

This leads directly to the fourth strategy, namely, the creation and support of venues for disseminating industry-friendly points of view, whether original research or criticisms of industry-unfavorable research (see also the fifth strategy, below). Fernández Pinto (2017) emphasizes what is often called grey literature—“non-peer-reviewed journals and pamphlets, such as the Tobacco and Health Report, a monthly newsletter”—as well as industry-funded academic symposia (Fernández Pinto 2017, 59). Many of the papers using Young’s p-value plot are grey literature: either not peer reviewed at all or in formats that typically undergo little or no peer review.

• At least two were published exclusively on the online preprint repository the arXiv, which involves no peer review (Young and Kindzierski 2019a, 2020a).

• Two other papers were published in academic journals as letters, a category that either does not go through peer review at all or is likely to receive less stringent review (Young and Kindzierski 2019b; Young 2020).

• Young, Kindzierski, and Randall (2021) is a report published by the National Association of Scholars, an advocacy organization best known for supporting various social conservative positions in education policy.

Other papers using Young’s p-value plot have been published in peer-reviewed academic journals. Three papers were published as articles in the peer-reviewed journals Regulatory Toxicology and Pharmacology (published by Elsevier) and Critical Reviews in Toxicology (published by Taylor & Francis) (You, Lin, and Young 2018; Young, Acharjee, and Das 2019; Young and Kindzierski 2019c). However, these two journals have been criticized as venues for industry-friendly research (Zou 2016). Only two relatively recent papers using Young’s p-value plot have been published as regular articles in what appear to be mainstream journals: Young, Bang, and Oktay (2009), published in Proceedings of the Royal Society B, and Young and Xia (2013), published in Statistical Analysis and Data Mining, a journal of the American Statistical Association. Altogether, Young and collaborators seem to favor either industry-friendly venues or working on the margins of peer review.

Fifth Strategy

The fifth strategy is attacking unfavorable research. In recent years, Young and collaborators have used the p-value plot almost exclusively to criticize meta-analyses that find evidence of harmful effects of air pollution—that is, research on the hazards of the products produced by the American Petroleum Institute’s member companies.

Parasitic Epistemic Mimicry

Finally, I want to develop Fernández Pinto’s broader comment that the strategies “are all practices traditionally tied to the process of knowledge production … but in this case they have been reshaped or rechanneled to fulfill the industry’s purposes” (Fernández Pinto 2017, 56). To me this suggests a sixth distinct strategy, which I suggest calling parasitic epistemic mimicry. This is the agnogenic strategy of adopting the appearance of legitimate or genuinely epistemic (knowledge-producing) scientific techniques. As discussed above, Young and collaborators frequently cite Schweder and Spjøtvoll’s plot and Simonsohn et al.’s p-curve, without acknowledging the fundamental differences between the three plots. This creates the false impression that Young and collaborators are simply applying the methods presented in these two highly-cited papers. It is highly plausible that most readers—including peer reviewers—will simply see these citations to more-or-less accepted methods and assume that the p-value plot has been validated. This is parasitic mimicry because of the way it harms genuinely epistemic techniques.

Young’s p-value plot might similarly exploit a kind of fetishization of p-values in the scientific literature. Arguably, p-values are best interpreted as a summary measure of the compatibility between some data and a model of the process that generated those data (Kass 2011; Greenland 2019). The model includes but is not exhausted by the target hypothesis. And, like other scientific models (Potochnik 2017), often some elements of the model are highly implausible or even known to be false. These implausible assumptions may even include the target hypothesis itself. (For example, often the target hypothesis is a null hypothesis that some effect is exactly equal to zero; but often this is highly implausible.) A p-value is also a statistic, in the technical sense, meaning (on the frequentist interpretation of probability) that we expect its value to vary somewhat across perfect replications of the data generating process (Cumming 2009). For all of these reasons, any inference from a p-value to a further conclusion (say, that we can reject the target hypothesis, or accept some rival hypothesis) must take into account many details about how the data were actually generated. We cannot treat a p-value as a standalone representative of a piece of scientific research. And yet Young’s p-value plot does this, not only representing a study’s results exclusively by its p-value but also purporting to divine researchers’ hidden motivations and behavior (p-hacking) through a kind of haruspicy (reading entrails).

Parasitic epistemic mimicry also enables Young et al.’s critiques of air pollution epidemiology to gain the protection of some standard norms in the review process that typically function to promote the production of knowledge. Hicks (2022) was rejected by four journals, two in science and two in philosophy of science, before being accepted at Environmental Epidemiology. Until the final revision at Environmental Epidemiology, the manuscript used “Young’s p-value plot” as the name of the plot, because Young appears to have created the plot (Westfall and Young 1993) and contributed to every paper I could find that used the plot. The introduction to the manuscript also noted the American Petroleum Institute funding and that Young served on the US Environmental Protection Agency’s Science Advisory Board during the Trump administration (US EPA 2017); based on this I suggested that Young et al.’s criticisms of air pollution research would likely be cited in legal and policy settings, and so warranted critical scrutiny.

Across both scientific and philosophical journals, multiple reviewers objected to these aspects of the manuscript, characterizing them as personal attacks on Young. For example, after quoting examples of these two aspects of the manuscript, one reviewer wrote that I was “targeting one specific scientist rather than a methodology” in a way that was “unconstructive and unfair.” None of these contextual points are ad hominem attacks, because they are relevant to evaluating the arguments that Young and collaborators make using the p-value plot and the importance of critically assessing these arguments (Resnik and Elliott 2013). But the norm against ad hominem attacks—which is typically an epistemic value, promoting the production of knowledge—protected the p-value plot from criticism, first by delaying publication and then by suppressing relevant information.

Altogether, I suggest that Young’s p-value plot is an agnogenic technique (a technique for effectively producing ignorance, here about the hazards of air pollution), though not just because it fails to have the statistical properties that would be required for it to produce evidence. The technique also enables Young and collaborators to rhetorically associate air pollution epidemiology with a legitimate epistemic crisis, thrives in an ecosystem of industry funding and industry-friendly venues, and parasitically mimics actual epistemic techniques. To extend the evolutionary metaphor, even if Young and collaborators have not intended to manufacture ignorance, the p-value plot has successfully constructed a niche in which it can thrive as a producer of ignorance.

Acknowledgments: Thanks to Heather Douglas, Ian Werkheiser, and Matt Brown for encouraging me to publish this commentary.

Author Information:

Dan Hicks is a philosopher of science at the University of California, Merced. Within disciplinary philosophy, their work focuses on questions of science, policy, and values. They also have broader interdisciplinary interests touching on public scientific controversies, environmental justice, and data science.


Cumming, Geoff, dir. 2009. Dance p 3 Mar09. watch?v=ez4DgdurRPg.

Elliott, Kevin C. 2012. “Selective Ignorance and Agricultural Research.” Science, Technology & Human Values 38 (3): 328–50.

Fernández Pinto, Manuela. 2017. “To Know or Better Not To.” Science & Technology Studies 30 (2): 53–72.

Frickel, Scott, Sahra Gibbon, Jeff Howard, Joanna Kempner, Gwen Ottinger, and David J. Hess. 2010. “Undone Science: Charting Social Movement and Civil Society Challenges to Research Agenda Setting.” Science, Technology & Human Values 35 (4): 444–73.

Greenland, Sander. 2019. “Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values.” The American Statistician 73 (March): 106–14.

Hicks, Daniel J. 2022. “The P Value Plot Does Not Provide Evidence Against Air Pollution Hazards.” Environmental Epidemiology 6 (2): e198.

Hicks, Daniel J. 2021. “Open Science, the Replication Crisis, and Environmental Public Health.” Accountability in Research, July.

Holman, Bennett, and Justin Bruner. 2017. “Experimentation by Industrial Selection.” Philosophy of Science 84 (5): 1008–19.

Kass, Robert E. 2011. “Statistical Inference: The Big Picture.” Statistical Science 26 (1): 1–9.

Kindzierski, Warren, Stanley Young, Terry Meyer, and John Dunn. 2021. “Evaluation of a Meta-Analysis of Ambient Air Quality as a Risk Factor for Asthma Exacerbation.” Journal of Respiration 1 (3): 173–96.

Michaels, David. 2008. Doubt Is Their Product: How Industry’s Assault on Science Threatens Your Health. Oxford: Oxford University Press.

Miller, Henry, and S. Stanley Young. 2017. “Viewpoint: Why so Many Scientific Studies Are Flawed and Poorly Understood.” Genetic Literacy Project, December 13.

Munafò, Marcus R., Brian A. Nosek, Dorothy V. M. Bishop, Katherine S. Button, Christopher D. Chambers, Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan Wagenmakers, Jennifer J. Ware, and John P. A. Ioannidis. 2017. “A Manifesto for Reproducible Science.” Nature Human Behaviour 1 (0021): 1–9.

Oreskes, Naomi, and Erik M. Conway. 2011. Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming. New York, NY, USA: Bloomsbury.

Potochnik, Angela. 2017. Idealization and the Aims of Science. Chicago and London: University of Chicago Press.

Proctor, Robert. 2008. “Agnotology: A Missing Term to Describe the Cultural Production of Ignorance (and Its Study).” In Agnotology: The Making and Unmaking of Ignorance, edited by Robert Proctor and Londa Schiebinger, 1–36. Stanford: Stanford University Press.

Resnik, David B., and Kevin Christopher Elliott. 2013. “Taking Financial Relationships into Account When Assessing Research.” Accountability in Research 20 (3): 184–205.

Schiebinger, Londa. 2008. “West Indian Abortifacients and the Making of Ignorance.” In Agnotology: The Making and Unmaking of Ignorance, edited by Robert Proctor and Londa Schiebinger, 149–62. Stanford: Stanford University Press.

Schweder, Tore, and Emil Spjøtvoll. 1982. “Plots of P-values to Evaluate Many Tests Simultaneously.” Biometrika 69 (3): 493–502.

Simonsohn, Uri, Leif D. Nelson, and Joseph P. Simmons. 2014. “P-Curve: A Key to the File-Drawer.” Journal of Experimental Psychology: General 143 (2): 534–47.

Steel, Daniel. 2010. “Epistemic Values and the Argument from Inductive Risk.” Philosophy of Science 77 (1): 14–34.

Supran, Geoffrey, and Naomi Oreskes. 2017. “Assessing ExxonMobil’s Climate Change Communications (1977–2014).” Environmental Research Letters 12 (8): 084019.

US EPA. 2017. “Members of the Science Advisory Board.” December 1, 2017.

Weatherall, James Owen, Cailin O’Connor, and Justin P. Bruner. 2018. “How to Beat Science and Influence People: Policy-Makers and Propaganda in Epistemic Networks.” The British Journal for the Philosophy of Science, August.

Westfall, Peter H., and S. Stanley Young. 1993. Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. John Wiley & Sons.

“Who We Are – S. Stanley Young.” n.d. Heartland Institute. Accessed January 20, 2020.

You, Cheng, Dennis K. J. Lin, and S. Stanley Young. 2018. “PM2.5 and Ozone, Indicators of Air Quality, and Acute Deaths in California, 2004–2007.” Regulatory Toxicology and Pharmacology 96 (July): 190–96.

Young, S. Stanley. 2020. “Re: Extended Mortality Follow-Up of a Cohort of 25,460 Workers Exposed to Acrylonitrile.” American Journal of Epidemiology 189 (4): 360–61.

Young, S. Stanley. 2017. “Suggestions for EPA.”

Young, S. Stanley. 2013. “S. Stanley Young: Scientific Integrity and Transparency.” Error Statistics Philosophy, March 12.

Young, S. Stanley. 2012. “Testimony of Committee on Science, Space and Technology.”

Young, S. Stanley, Mithun Kumar Acharjee, and Kumer Das. 2019. “The Reliability of an Environmental Epidemiology Meta-Analysis, a Case Study.” Regulatory Toxicology and Pharmacology 102 (March): 47–52.

Young, S. Stanley, Heejung Bang, and Kutluk Oktay. 2009. “Cereal-Induced Gender Selection? Most Likely a Multiple Testing False Positive.” Proceedings of the Royal Society B: Biological Sciences 276 (1660): 1211–12.

Young, S. Stanley, and Warren Kindzierski. 2020a. “PM2.5 and All-Cause Mortality.” October 31, 2020.

Young, S. Stanley, and Warren Kindzierski. 2020b. “Particulate Matter Exposure and Lung Cancer: A Review of Two Meta-Analysis Studies.” November 4, 2020.

Young, S. Stanley, and Warren Kindzierski. 2019a. “Combined Background Information for Meta-Analysis Evaluation.” January 15, 2019.

Young, S. Stanley, and Warren Kindzierski. 2019b. “Ambient Air Pollution and Mortality in 652 Cities.” New England Journal of Medicine 381 (21): 2072–75.

Young, S. Stanley, and Warren B. Kindzierski. 2019c. “Evaluation of a Meta-Analysis of Air Quality and Heart Attacks, a Case Study.” Critical Reviews in Toxicology, March.

Young, S. Stanley, Warren Kindzierski, and David Randall. 2021. “Shifting Sands: Unsound Science and Unsafe Regulation.” National Association of Scholars.

Young, S. Stanley, and Jessie Q. Xia. 2013. “Assessing Geographic Heterogeneity and Variable Importance in an Air Pollution Data Set.” Statistical Analysis and Data Mining 6 (4): 375–86.

Zou, Jie Jenny. 2016. “Brokers of Junk Science?” Center for Public Integrity, February 18.
