David Chart: Inference to the Best Explanation, Bayesianism, and Feminist Bank Tellers

David Chart

Department of History and Philosophy of Science
University of Cambridge
Free School Lane
Cambridge
CB2 3RH
UK

dc132@cam.ac.uk

Inference to the Best Explanation and Bayesianism have both been proposed as descriptions of the way that people make inferences. This paper argues that one result from cognitive psychology, the "feminist bank teller" experiment, suggests that people use Inference to the Best Explanation rather than Bayesian techniques.

Recently there has been some discussion of whether Inference to the Best Explanation is a tenable account of our inductive practices. Much of this discussion has centred on whether Inference to the Best Explanation is consistent with a Bayesian approach. (See, for example, van Fraassen 1989, Okasha 2000, Salmon (forthcoming a, b), and Lipton (forthcoming).) At present, the debate is tending towards the view that the two methods are consistent (see, for example, Okasha 2000, pp. 702-4). That is, the steps involved in an inference to the best explanation can be matched to the steps in a Bayesian inference, and vice versa.

Nevertheless, there is no question but that the methods are different. Bayesian inference involves the consideration of probabilities, and their manipulation according to the probability calculus. Inference to the Best Explanation involves considering the qualities of various competing explanations. If the two methods are wholly consistent, then the differences between them are not relevant for justificatory accounts of induction; we might as well use one as the other. On the other hand, the difference does matter to descriptive accounts. People who reason using Bayesian techniques are different from people who reason using Inference to the Best Explanation, and there are different problems in the two cases.

However, given that they are consistent, distinguishing the possibilities looks difficult. If the processes are consistent, they should produce the same results, so we will not be able to tell, experimentally, which approach is being used. Introspection is notoriously unreliable in these cases; Bayesians believe that they use Bayesian reasoning, while supporters of Inference to the Best Explanation find themselves inferring to good explanations. How can the deadlock be broken?

Two processes which produce the same result by different methods are indistinguishable as long as they work correctly. However, if the methods are different, they will be prone to different modes of failure. Consider, for example, two ways of producing a list of all the even numbers. Method A steps through the integers, and drops every other one, so that only the even numbers are given as output. Method B steps through the integers, and gives the double of each as output.

Given the sequence (8,10,12), it is impossible to say which method was used to generate it. However, if our machine produces the sequence (8,9,10), we can be confident that it is using method A. It has failed to skip '9'. Method B could not realistically produce this sequence, as when it is giving us '8', the base number is '4'. There is no plausible mechanism for producing '9' as the next output, certainly not followed by '10'. On the other hand, if we get the sequence (8,5,12), we can be confident that we are dealing with Method B, for similar reasons.
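The contrast between the two methods can be made concrete with a short sketch. The function names and the fault-injection parameters (`skip_fails_at`, `double_fails_at`) are my own illustrative assumptions; they simply model one plausible way in which each method might fail at a single step.

```python
from itertools import count, islice

def method_a(skip_fails_at=None):
    """Method A: step through the integers and drop every odd one.

    If skip_fails_at is set (a hypothetical fault), that one integer
    is let through even though it should have been dropped.
    """
    for n in count(0):
        if n % 2 == 0 or n == skip_fails_at:
            yield n

def method_b(double_fails_at=None):
    """Method B: step through the integers and output the double of each.

    If double_fails_at is set (a hypothetical fault), that one integer
    is output undoubled.
    """
    for n in count(0):
        yield n if n == double_fails_at else 2 * n

def window(gen, start, length=3):
    """Return `length` consecutive outputs beginning at position `start`."""
    return tuple(islice(gen, start, start + length))

# Working correctly, the two methods cannot be told apart.
print(window(method_a(), 4), window(method_b(), 4))

# Failing, each leaves its own signature.
print(window(method_a(skip_fails_at=9), 4))    # Method A fails to skip 9
print(window(method_b(double_fails_at=5), 4))  # Method B fails to double 5
```

On this sketch, a single-step fault in Method A yields (8, 9, 10) and a single-step fault in Method B yields (8, 5, 12), which are the two diagnostic sequences discussed above.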

If we want to distinguish between Inference to the Best Explanation and Bayesianism, then, we should look at cases in which inference appears to go wrong. Fortunately for us, there is an extensive literature on this topic, and many important papers were collected in Kahneman, Slovic, and Tversky 1982. I want to consider the case of the feminist bank teller.

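A number of experimental subjects were given the following description of a person:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.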

They were then asked to rank eight further statements about Linda in order of probability. Two were important for the experiment: (X) 'Linda is a bank teller' and (Y) 'Linda is a bank teller and is active in the feminist movement'. More than 80% of the subjects said that Y was more likely than X. A moment's (prompted) thought shows that this cannot be right. All cases of Y are also cases of X, so Y cannot be more likely than X. Even subjects with significant statistical training said that Y was more likely than X, and still did so if X and Y were the only options presented, so that their relationship should have been obvious.

How can we explain this error? It immediately rules out one way of reaching the answers. The subjects cannot be considering the class of all bank tellers, and then considering how many members it shares with the class of all Lindas, relative to the size of the class of Lindas. That method could not produce this kind of error, because it would be immediately obvious that X was at least as likely as Y, even if all bank teller Lindas were also feminist Lindas.
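A minimal sketch of the class-counting method shows why it could not produce the error. The five records below are invented purely for illustration; nothing depends on the particular numbers, only on the fact that every case of Y is counted among the cases of X.

```python
# Hypothetical mini-population of people fitting Linda's description.
# The records are invented for illustration only.
lindas = [
    {"bank_teller": True,  "feminist": True},
    {"bank_teller": True,  "feminist": False},
    {"bank_teller": False, "feminist": True},
    {"bank_teller": False, "feminist": True},
    {"bank_teller": False, "feminist": False},
]

def relative_frequency(people, predicate):
    """Proportion of `people` satisfying `predicate`."""
    return sum(1 for p in people if predicate(p)) / len(people)

# X: Linda is a bank teller.
freq_x = relative_frequency(lindas, lambda p: p["bank_teller"])
# Y: Linda is a bank teller and is active in the feminist movement.
freq_y = relative_frequency(
    lindas, lambda p: p["bank_teller"] and p["feminist"]
)

print(freq_x, freq_y)
```

However the records are filled in, anyone counted towards Y has already been counted towards X, so the frequency of Y can at most equal that of X.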

The obvious alternative suggestion is that people assess the probability by means of inference. They suppose that they have inferred from the information given about Linda to the new statement, and assess how good that inference is. Since we take good inferences to be highly probable given the evidence, and poor inferences to be improbable on the same grounds, this method should work if our inferences do. It will also be prone to errors characteristic of our mode of inference.

Suppose that we make inferences by Bayesian conditionalisation. Bayes's theorem is p(h|e) = (p(h)p(e|h))/p(e). That is, the probability of the hypothesis given the evidence is equal to the prior probability of the hypothesis multiplied by the probability of the evidence, given the hypothesis, divided by the probability of the evidence. Surprising evidence gives strong support for a theory, as does evidence that a theory entails. In this case, the evidence is the statement about Linda, and has the same prior in both cases. The hypotheses are the further statements.

p(X) should be greater than p(Y). That is an elementary proposition in probability theory. The probability of Linda's background, given that she is a bank teller, p(e|X), and the probability of her background, given that she is a bank teller and active in the feminist movement, p(e|Y), are more complex, because p(e|Y) can be greater than p(e|X). There is no contradiction in supposing that the proportion of feminist bank tellers who have Linda's background is higher than the proportion of all bank tellers who do. Suppose, for example, that all and only feminist bank tellers have Linda's background, but that only 5% of bank tellers are feminist. In that case, p(e|X) is 0.05, and p(e|Y) is 1.
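The extreme example can be worked through with exact fractions. The 5% figure and the "all and only" assumption come from the example itself; the prior p(X) = 1/50 is a hypothetical value of my own, chosen only so that the arithmetic can be carried out.

```python
from fractions import Fraction

def posterior(prior, likelihood, evidence):
    """Bayes's theorem: p(h|e) = p(h) * p(e|h) / p(e)."""
    return prior * likelihood / evidence

p_X = Fraction(1, 50)               # hypothetical prior that Linda is a bank teller
p_feminist_given_X = Fraction(1, 20)  # 5% of bank tellers are feminist (from the example)
p_Y = p_X * p_feminist_given_X      # prior for "feminist bank teller"

# All and only feminist bank tellers have Linda's background.
p_e_given_Y = Fraction(1)
p_e_given_X = p_feminist_given_X * p_e_given_Y  # = 0.05
p_e = p_Y * p_e_given_Y             # nobody else has the background

post_X = posterior(p_X, p_e_given_X, p_e)
post_Y = posterior(p_Y, p_e_given_Y, p_e)

print(p_e_given_X, p_e_given_Y)  # likelihoods: Y does much better than X
print(post_X, post_Y)            # posteriors: Y still cannot overtake X
```

Even in this most favourable case for Y, where p(e|Y) is twenty times p(e|X), the lower prior of Y exactly compensates, and the two posteriors come out equal. In less extreme cases p(X|e) is strictly greater than p(Y|e).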

If we use Bayesian conditionalisation, what sorts of errors are we likely to make? There seem to be two fundamental kinds: we might assign probabilities incorrectly, and we might make arithmetical mistakes. Consider arithmetical mistakes first. There seems to be no good reason to suppose that such errors would bias us in a particular direction; even if we only ever overestimate probabilities, there is no reason to suppose that we should only overestimate p(Y|e). Thus, arithmetical errors should lead to a far lower error rate than is seen, because in at least some of the cases of error the right answer will be produced by accident.

What about mistakes in assigning probabilities? We have good evidence that people can assign p(Y) greater than p(X), as people did so in this experiment (the posteriors from one inference are the priors for the next). We also know that people can assign p(X) greater than p(Y): people do occasionally get elementary probability right. Bayesianism gives us no reason to assume a bias in one direction rather than another. The same applies to the conditional probabilities. There is no reason why people should consistently overestimate p(e|Y) and consistently underestimate p(e|X), which is what is required for the results obtained. Indeed, the reverse might be expected. p(e|X) should be quite low, so there is more 'space' above it, and thus more room for overestimation. In the most extreme case, p(e|Y) might be one, in which case it cannot be overestimated. (If we assign probabilities greater than one or less than zero, we are not even bad Bayesians.)

Thus, Bayesianism predicts that there should be a bias in favour of the correct answer. If we are very error-prone, it may be only a small bias, but it should be there, as Bayesianism does not predict biases among the errors. Thus, the mistakes made here are not the kind that we would expect from a Bayesian method of inference.
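The claim that unbiased errors leave a bias towards the correct answer can be illustrated with a rough simulation. Everything numerical here is an assumption of mine: the "true" posteriors (0.10 for X and 0.05 for Y), the Gaussian form of the noise, its size, and the number of simulated subjects. The sketch is meant only to show the shape of the argument, not to model the actual experiment.

```python
import random

def noisy_estimate(true_value, noise_sd, rng):
    """A subject's estimate: the true value plus symmetric (unbiased) noise,
    clamped to [0, 1] so that it remains a probability."""
    estimate = rng.gauss(true_value, noise_sd)
    return min(1.0, max(0.0, estimate))

def conjunction_error_rate(p_x, p_y, noise_sd, trials, seed=0):
    """Fraction of simulated subjects who rank Y strictly above X."""
    rng = random.Random(seed)  # seeded for reproducibility
    errors = 0
    for _ in range(trials):
        est_x = noisy_estimate(p_x, noise_sd, rng)
        est_y = noisy_estimate(p_y, noise_sd, rng)
        if est_y > est_x:
            errors += 1
    return errors / trials

# Hypothetical values: p(X|e) = 0.10, p(Y|e) = 0.05, with noise as large
# as the larger of the two probabilities.
rate = conjunction_error_rate(p_x=0.10, p_y=0.05, noise_sd=0.10, trials=10_000)
print(rate)
```

Under these assumptions the simulated error rate stays below one half, however noisy the subjects are made, which is far short of the rate of more than 80% reported in the experiment.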

Now, consider the case of Inference to the Best Explanation. In this case, we assess the inference by looking at the explanatory relations between the evidence and the hypothesis. These relations must run in both directions. A theory is supported by evidence if it provides the best explanation for that evidence. On the other hand, a theory predicts that the evidence will be such that the theory can provide a good explanation. Thus, a theory does not predict things that it cannot explain. (It may be that theories can explain things that they cannot predict, if, for example, they are indeterministic.)

Linda's background, as given, provides absolutely no explanation for her becoming a bank teller. It is completely unexpected, nothing in her background seems prone to cause it, and her background and the career are not at all unified. On the other hand, it does provide a partial explanation for her becoming a feminist bank teller: it explains why she is a feminist. Those political views are expected on the basis of her background, could be caused by several elements of it, and the background and politics are quite well unified. Thus, the inference to Y looks much better than the inference to X, and so we would expect the bias that is actually seen in the results.

Note that this model predicts that subjects will rank 'Linda is active in the feminist movement' above 'Linda is a bank teller and is active in the feminist movement'. The explanation in the first case is better than in the second, because there are no unexplained danglers. Tversky and Kahneman did include 'Linda is active in the feminist movement' in their experiments, but they do not report the relevant results. This strongly suggests that it was ranked above the conjunction, because had it not been, they would have reported that as well. However, this is merely an inference, and we have seen how reliable those are.

Tversky and Kahneman give their own interpretation of the results. They suggest that we use a 'representativeness heuristic', and that a feminist bank teller seems more representative of Linda's background than merely being a bank teller. Their interpretation obviously has a close affinity to the interpretation I have provided, and on certain theories of explanation, particularly unification models, it may turn out to be the same interpretation.

In conclusion, then, this result suggests that, if the choice is between Inference to the Best Explanation and Bayesianism, we use Inference to the Best Explanation. We make mistakes which are to be expected on an explanatory model of inference, but which are surprising and hard to explain on a Bayesian model. Thus, according to Inference to the Best Explanation, we should tentatively conclude that we use explanatory, not Bayesian, inference.

Of course, that might appear to beg the question. So, let us consider it from the Bayesian perspective. Let us set the prior probabilities to be equal. (My personal inclination is to make the prior of Inference to the Best Explanation rather higher, but that would be unfair. Besides, the priors should eventually wash out.) Now, the probability of the results conditional on Bayesianism is quite low, while the probability conditional on explanatory inference is quite high. Since the evidence presumably has the same prior in both cases, this means that, at the end of the experiment, Inference to the Best Explanation has a higher posterior probability than Bayesianism does.
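The Bayesian comparison can be sketched numerically. The likelihoods 0.8 and 0.2 are hypothetical stand-ins for 'quite high' and 'quite low', and normalising over just these two hypotheses is a simplifying assumption; the ordering of the posteriors does not depend on it.

```python
def update(priors, likelihoods):
    """One step of conditionalisation over a set of rival hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # p(e), given that these are the only rivals
    return {h: joint[h] / total for h in joint}

# Hypothetical probabilities of the experimental result on each account.
likelihoods = {"IBE": 0.8, "Bayesianism": 0.2}

# Equal priors, as in the text.
equal_start = update({"IBE": 0.5, "Bayesianism": 0.5}, likelihoods)
print(equal_start)

# Priors heavily favouring Bayesianism, updated on three such results
# (assumed independent), to illustrate the priors washing out.
beliefs = {"IBE": 0.1, "Bayesianism": 0.9}
for _ in range(3):
    beliefs = update(beliefs, likelihoods)
print(beliefs)
```

With equal priors, the posterior ordering simply follows the likelihoods. With a prior of 0.9 for Bayesianism, three results of this kind are enough, on these invented figures, to put Inference to the Best Explanation ahead.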

Although this discussion suggests that we use Inference to the Best Explanation rather than Bayesianism, it is far from conclusive. First, I have only considered two possible modes of inference - we might do something else altogether. Second, these are the results of a single experiment. It may be that in this sort of case there is another confounding factor interfering with our natural Bayesianism. I suggest, then, that it would be valuable to look at further experiments on systematic irrationality, to see what our errors tell us about the way we get things right.

References

Hon, G. and Rackover, S. (eds) forthcoming Explanation: Theoretical Approaches (Kluwer)

Kahneman, D., Slovic, P., and Tversky, A. (eds) 1982 Judgment under uncertainty: Heuristics and biases (Cambridge: Cambridge University Press)

Lipton, P. forthcoming 'Is Explanation a Guide to Inference? A Reply to Wesley Salmon' in Hon and Rackover forthcoming.

Okasha, S. 2000 'Van Fraassen's Critique of Inference to the Best Explanation' Studies in History and Philosophy of Science 31A 691-710

Salmon, W. forthcoming a 'Explanation and Confirmation: A Bayesian Critique of Inference to the Best Explanation' in Hon and Rackover, forthcoming.

Salmon, W. forthcoming b 'Reflections of a Bashful Bayesian: A Reply to Peter Lipton' in Hon and Rackover, forthcoming.

Tversky, A. and Kahneman, D. 1982 'Judgments by and of representativeness' in Kahneman, Slovic, and Tversky 1982, 84-98