Rochefort-Maranda, Guillaume
(2023)
Sampling error: The fundamental flaw of the severity measure of evidence.
[Preprint]
Abstract
Assuming a fixed population parameter to estimate and a test statistic based on a consistent and unbiased estimator of that parameter, I demonstrate without any shadow of a doubt that a popular measure of evidence championed by Deborah Mayo and Aris Spanos (the severity measure) is erroneous because of the sampling error. In fact, I show that the greater the sampling error, the greater the error of that measure.
Why am I so confident? Why 'without any shadow of a doubt'? Because I am presenting mathematical facts:
1. Some statistical tests, like one-sided t-tests, use statistics that are based on consistent and unbiased estimators such as the sample mean (in the one-sided t-test, the test statistic is essentially a centered and standardised sample mean).
2. We can increase or decrease the level of sampling error associated with the estimates at will by decreasing or increasing the sample size of the experiment.
3. This is equivalent to decreasing or increasing the power of the associated test, at will, by decreasing or increasing the sample size of the experiment.
4. As we decrease the power of the test (decrease the sample size), the only statistics that can reach the critical region under H1 eventually do so only because of the large sampling error and not because of the underlying truth of the matter: the real (usually unknown) difference between H0 and H1.
5. In that scenario, the test statistics become so deviant that they will inevitably corrupt the severity score, because the latter is computed with the estimate that contains the large sampling error.
This is all beautifully illustrated in the paper.
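The five steps above can be sketched numerically. The following is a minimal simulation and not the illustration given in the paper. It assumes a one-sided z-test with known standard deviation in place of the t-test, testing H0: mu <= 0, and it assumes the post-rejection severity for the claim "mu > mu1" takes the form Phi((xbar - mu1) / (sigma / sqrt(n))), the form usually attributed to Mayo and Spanos. The function name `simulate` and every parameter value (true mean 0.2, sigma 1, alpha 0.05, sample sizes 10 and 400) are hypothetical choices made only for this sketch. For each sample size it keeps the significant sample means and scores the claim "mu > true_mu", which is false by construction.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal distribution


def simulate(n, true_mu=0.2, sigma=1.0, alpha=0.05, reps=20000, seed=0):
    """Return (empirical power, mean severity among significant results).

    Assumed setup (not taken from the paper): one-sided z-test of
    H0: mu <= 0 with known sigma. Severity is evaluated for the false
    claim "mu > true_mu" as Phi((xbar - true_mu) / se).
    """
    rng = random.Random(seed)
    se = sigma / n ** 0.5                        # standard error of the sample mean
    crit = STD_NORMAL.inv_cdf(1 - alpha) * se    # reject H0 when xbar > crit
    severities = []
    for _ in range(reps):
        xbar = rng.gauss(true_mu, se)            # draw the sample mean directly
        if xbar > crit:                          # keep significant results only
            severities.append(STD_NORMAL.cdf((xbar - true_mu) / se))
    power = len(severities) / reps
    return power, mean(severities)


if __name__ == "__main__":
    for n in (10, 400):
        power, sev = simulate(n)
        print(f"n={n}: power ~ {power:.2f}, "
              f"mean severity for the false claim 'mu > true_mu' ~ {sev:.2f}")
```

Under these assumptions the low-power design only rejects when the sample mean lands far above the true mean, so the significant results assign high severity to a claim that is false. In this idealised setup the expected severity among significant results works out to 1 - power/2, which moves toward 1 as the power falls.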
Under specific conditions, I show that it is in our best interest to work with the largest sample size that we can reasonably obtain in order to reduce that source of error. If the null hypothesis is false, working with the largest possible sample size will give us the best possible evidence against the null hypothesis by reducing the chance that our test statistic reaches the critical region of a test because of the sampling error. It will also improve the reliability of the severity measure of evidence, should we be inclined to use it.
Here is the catch: Mayo and Spanos (hereafter M & S) are well known for claiming that more powerful tests do not provide better evidence against the null when the test is significant. I show that this is a mistake and that their own measure of evidence cannot allow them to make such a claim, because it is less reliable when the power decreases. They simply cannot embrace the idea of improving their measure of evidence by encouraging the use of greater sample sizes and also claim that this will not improve the evidence against the null. Why bother with improving the measure of evidence then?