Bayes' theorem (also known as Bayes' rule or Bayes' law) is a result in probability theory that relates the conditional and marginal probability distributions of random variables. In some interpretations of probability, Bayes' theorem tells how to update or revise beliefs in light of new evidence (a posteriori).

The probability of an event A conditional on another event B is generally different from the probability of B conditional on A. However, there is a definite relationship between the two, and Bayes' theorem is the statement of that relationship.

As a formal theorem, Bayes' theorem is valid in all interpretations of probability. However, frequentist and Bayesian interpretations disagree about the kinds of things to which probabilities should be assigned in applications: frequentists assign probabilities to random events according to their frequencies of occurrence, or to subsets of populations as proportions of the whole; Bayesians assign probabilities to propositions that are uncertain. A consequence is that Bayesians have more frequent occasion to use Bayes' theorem. The articles on Bayesian probability and frequentist probability discuss these debates at greater length.

Statement of Bayes' theorem

Bayes' theorem relates the conditional and marginal probabilities of stochastic events A and B:

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)} = \frac{L(A \mid B)\,\Pr(A)}{\Pr(B)},$$

where L(A|B) is the likelihood of A given fixed B. Each term in Bayes' theorem has a conventional name:

- Pr(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not take into account any information about B.
- Pr(A|B) is the conditional probability of A given B. It is also called the posterior probability because it is derived from, or depends upon, the specified value of B.
- Pr(B|A) is the conditional probability of B given A.
- Pr(B) is the prior or marginal probability of B, and acts as a normalizing constant.

With this terminology, the theorem may be paraphrased as

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{normalizing constant}}.$$

In words: the posterior probability is proportional to the prior probability times the likelihood. In addition, the ratio Pr(B|A)/Pr(B) is sometimes called the standardised likelihood, so the theorem may also be paraphrased as

$$\text{posterior} = \text{standardised likelihood} \times \text{prior}.$$

Derivation from conditional probabilities

To derive the theorem, we start from the definition of conditional probability.
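As a concrete starting point, the minimal Python sketch below builds a small joint probability table for two binary events A and B (the numbers are hypothetical, chosen only for illustration), computes Pr(A | B) and Pr(B | A) directly from the definition of conditional probability, and checks that Bayes' theorem links them. Exact fractions are used so that the equality check is not blurred by floating-point rounding; the helper names are our own.

```python
from fractions import Fraction

# Hypothetical joint probabilities Pr(A = a, B = b) for two binary events.
# Illustrative values only; any non-negative numbers summing to 1 would do.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(4, 10),
}

def marginal_a(a):
    """Pr(A = a), obtained by summing the joint table over B."""
    return sum(p for (x, _), p in joint.items() if x == a)

def marginal_b(b):
    """Pr(B = b), obtained by summing the joint table over A."""
    return sum(p for (_, y), p in joint.items() if y == b)

def cond_a_given_b(a, b):
    """Definition of conditional probability: Pr(A | B) = Pr(A and B) / Pr(B)."""
    return joint[(a, b)] / marginal_b(b)

def cond_b_given_a(b, a):
    """Definition of conditional probability: Pr(B | A) = Pr(A and B) / Pr(A)."""
    return joint[(a, b)] / marginal_a(a)

def bayes(a, b):
    """Bayes' theorem: Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)."""
    return cond_b_given_a(b, a) * marginal_a(a) / marginal_b(b)

print(cond_a_given_b(True, True), bayes(True, True))  # 1/4 1/4
print(cond_b_given_a(True, True))                     # 1/3, not equal to Pr(A | B)
```

The second line of output illustrates the point made in the introduction: Pr(B | A) and Pr(A | B) differ in general, yet Bayes' theorem recovers one from the other.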
The probability of event A given event B is

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$

Likewise, the probability of event B given event A is

$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}.$$

Rearranging and combining these two equations, we find

$$\Pr(A \mid B)\,\Pr(B) = \Pr(A \cap B) = \Pr(B \mid A)\,\Pr(A).$$

Dividing both sides by Pr(B), provided that it is non-zero, we obtain Bayes' theorem:

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}.$$

Alternative forms of Bayes' theorem

Bayes' theorem is often embellished by noting that

$$\Pr(B) = \Pr(A \cap B) + \Pr(A^C \cap B) = \Pr(B \mid A)\,\Pr(A) + \Pr(B \mid A^C)\,\Pr(A^C),$$

where A^C is the complementary event of A (often called "not A"). So the theorem can be restated as

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B \mid A)\,\Pr(A) + \Pr(B \mid A^C)\,\Pr(A^C)}.$$

More generally, where {A_i} forms a partition of the event space,

$$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\,\Pr(A_i)}{\sum_j \Pr(B \mid A_j)\,\Pr(A_j)}$$

for any A_i in the partition. See also the law of total probability.

Bayes' theorem in terms of odds and likelihood ratio

Bayes' theorem can also be written neatly in terms of a likelihood ratio Λ and odds O as

$$O(A \mid B) = O(A)\,\Lambda(A \mid B),$$

where

$$O(A \mid B) = \frac{\Pr(A \mid B)}{\Pr(A^C \mid B)}$$

are the odds of A given B,

$$O(A) = \frac{\Pr(A)}{\Pr(A^C)}$$

are the odds of A by itself, and

$$\Lambda(A \mid B) = \frac{L(A \mid B)}{L(A^C \mid B)} = \frac{\Pr(B \mid A)}{\Pr(B \mid A^C)}$$

is the likelihood ratio.

Bayes' theorem for probability densities

There is also a version of Bayes' theorem for continuous distributions. It is somewhat harder to derive, since probability densities, strictly speaking, are not probabilities, so Bayes' theorem has to be established by a limit process; see Papoulis (citation below), Section 7.3, for an elementary derivation. Bayes' theorem for probability densities is formally similar to the theorem for probabilities:

$$f(x \mid y) = \frac{f(x, y)}{f(y)} = \frac{f(y \mid x)\,f(x)}{f(y)},$$

and there is an analogous statement of the law of total probability:

$$f(x \mid y) = \frac{f(y \mid x)\,f(x)}{\int_{-\infty}^{\infty} f(y \mid \xi)\,f(\xi)\,d\xi}.$$

As in the discrete case, the terms have standard names. f(x, y) is the joint distribution of X and Y, f(x|y) is the posterior distribution of X given Y = y, f(y|x) = L(x|y) is (as a function of x) the likelihood function of X given Y = y, and f(x) and f(y) are the marginal distributions of X and Y respectively, with f(x) being the prior distribution of X. Here we have indulged in a conventional abuse of notation, using f for each of these terms, although each one is really a different function; the functions are distinguished by the names of their arguments.

Extensions of Bayes' theorem

Theorems analogous to Bayes' theorem hold in problems with more than two variables.
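Such analogues can be checked numerically on a small joint distribution. The Python sketch below builds a hypothetical joint table for three binary events A, B and C from arbitrary weights, then compares Pr(A | B, C) computed straight from the definition of conditional probability with the product form that chains Bayes' theorem through Pr(B | A) and Pr(C | A, B). The weights and the helper `pr` are our own illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution Pr(A = a, B = b, C = c) over three binary events,
# built from arbitrary positive weights 1..8 and normalized so the table sums to 1.
weights = dict(zip(product((True, False), repeat=3), range(1, 9)))
total = sum(weights.values())
joint = {abc: Fraction(w, total) for abc, w in weights.items()}

def pr(a=None, b=None, c=None):
    """Probability of the events fixed by the arguments (None means summed out)."""
    return sum(
        p for (x, y, z), p in joint.items()
        if (a is None or x == a) and (b is None or y == b) and (c is None or z == c)
    )

# Directly from the definition: Pr(A | B, C) = Pr(A, B, C) / Pr(B, C).
direct = pr(True, True, True) / pr(b=True, c=True)

# Chained form: Pr(A) Pr(B | A) Pr(C | A, B) / (Pr(B) Pr(C | B)).
pr_a = pr(a=True)
pr_b_given_a = pr(True, True) / pr_a
pr_c_given_ab = pr(True, True, True) / pr(True, True)
pr_b = pr(b=True)
pr_c_given_b = pr(b=True, c=True) / pr_b
chained = pr_a * pr_b_given_a * pr_c_given_ab / (pr_b * pr_c_given_b)

print(direct, chained, direct == chained)  # 1/6 1/6 True
```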
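For example, for three events A, B and C:

$$\Pr(A \mid B, C) = \frac{\Pr(A)\,\Pr(B \mid A)\,\Pr(C \mid A, B)}{\Pr(B)\,\Pr(C \mid B)}.$$

This can be derived in several steps from Bayes' theorem and the definition of conditional probability:

$$\Pr(A \mid B, C) = \frac{\Pr(A, B, C)}{\Pr(B, C)} = \frac{\Pr(C \mid A, B)\,\Pr(A, B)}{\Pr(C \mid B)\,\Pr(B)} = \frac{\Pr(C \mid A, B)\,\Pr(B \mid A)\,\Pr(A)}{\Pr(C \mid B)\,\Pr(B)}.$$

A general strategy is to work with a decomposition of the joint probability, and to marginalize (integrate) over the variables that are not of interest. Depending on the form of the decomposition, it may be possible to prove that some integrals must be 1, and thus they fall out of the decomposition; exploiting this property can reduce the computations very substantially. A Bayesian network, for example, specifies a factorization of a joint distribution of several variables in which the conditional probability of any one variable given the remaining ones takes a particularly simple form (see Markov blanket).

Example #1: Conditional probabilities

Suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, and likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. Let H1 be the event that Fred chose bowl #1, H2 the event that he chose bowl #2, and E the event that the cookie is plain. Since the bowls are indistinguishable to Fred, Pr(H1) = Pr(H2) = 0.5, and from the contents of the bowls Pr(E|H1) = 30/40 = 0.75 and Pr(E|H2) = 20/40 = 0.5. Given all this information, we can compute the probability of Fred having selected bowl #1, given that he got a plain cookie:

$$\Pr(H_1 \mid E) = \frac{\Pr(E \mid H_1)\,\Pr(H_1)}{\Pr(E \mid H_1)\,\Pr(H_1) + \Pr(E \mid H_2)\,\Pr(H_2)} = \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.5 \times 0.5} = 0.6.$$

As we expected, it is more than half.

Tables of occurrences and relative frequencies

It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each of the independent variables. The tables below illustrate the use of this method for the cookies.

Number of cookies by bowl and type:

| Cookie type | Bowl #1 | Bowl #2 | Total |
|---|---|---|---|
| Plain | 30 | 20 | 50 |
| Chocolate chip | 10 | 20 | 30 |
| Total | 40 | 40 | 80 |

Relative frequency of cookies by bowl and type:

| Cookie type | Bowl #1 | Bowl #2 | Total |
|---|---|---|---|
| Plain | 0.375 | 0.250 | 0.625 |
| Chocolate chip | 0.125 | 0.250 | 0.375 |
| Total | 0.500 | 0.500 | 1.000 |

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, 80. Reading off the plain-cookie row, Pr(H1|E) = 0.375/0.625 = 0.6, as before.

Example #2: Drug testing

Bayes' theorem is useful in evaluating the result of a drug test. Suppose a certain drug test is 99% accurate; that is, the test will correctly identify a drug user as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. This would seem to be a relatively accurate test, but Bayes' theorem will reveal a potential flaw.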
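The cookie calculation and the 99%-accurate test just described both reduce to Bayes' theorem over a two-event partition (a hypothesis and its complement). The minimal Python sketch below computes that posterior both from the partition form and from the odds form, reproduces Pr(bowl #1 | plain) = 0.6 from the bowl contents (30 of 40 cookies plain in bowl #1, 20 of 40 in bowl #2), and then evaluates the test at a few prevalence values chosen only for illustration. The function names are our own.

```python
def posterior_two_hypotheses(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Pr(H | E) for a hypothesis H and its complement (a two-event partition).

    Pr(E) is expanded by the law of total probability:
    Pr(E) = Pr(E | H) Pr(H) + Pr(E | not H) Pr(not H).
    """
    numerator = lik_e_given_h * prior_h
    evidence = numerator + lik_e_given_not_h * (1 - prior_h)
    return numerator / evidence

def posterior_via_odds(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Same posterior via the odds form: O(H | E) = O(H) * likelihood ratio."""
    prior_odds = prior_h / (1 - prior_h)
    likelihood_ratio = lik_e_given_h / lik_e_given_not_h
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability

# Cookie example: H = "bowl #1", E = "plain cookie".
p_bowl1 = posterior_two_hypotheses(0.5, 30 / 40, 20 / 40)
print(round(p_bowl1, 3))  # 0.6

# A test that flags 99% of users and clears 99% of non-users:
# Pr(+ | user) = 0.99 and Pr(+ | non-user) = 0.01. Prevalences are illustrative.
for prevalence in (0.5, 0.1, 0.01, 0.005):
    ppv = posterior_two_hypotheses(prevalence, 0.99, 0.01)
    odds_ppv = posterior_via_odds(prevalence, 0.99, 0.01)
    print(f"prevalence {prevalence:<6} Pr(user | +) = {ppv:.3f} (odds form: {odds_ppv:.3f})")
```

With the test's accuracy held fixed, the fraction of positive results that come from actual users falls from 0.99 when half the population uses the drug to about 0.33 at a prevalence of 0.5%. The odds form makes the reason visible: the likelihood ratio is always 99, so small prior odds stay fairly small.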
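Let's assume a corporation decides to test its employees for marijuana use, and 0.5% of the employees use the drug. We want to know the probability that, given a positive drug test, an employee is actually a drug user. Let D be the event of being a drug user and N the event of being a non-user. Let + be the event of a positive drug test. We need to know the following:

- Pr(D), the probability that the employee is a drug user, regardless of any other information. This is 0.005, since 0.5% of the employees are drug users. This is the prior probability of D.
- Pr(N), the probability that the employee is not a drug user. This is 1 − Pr(D), or 0.995.
- Pr(+|D), the probability that the test is positive, given that the employee is a drug user. This is 0.99, since the test correctly identifies a drug user 99% of the time.
- Pr(+|N), the probability that the test is positive, given that the employee is not a drug user. This is 0.01, since the test produces a false positive for 1% of non-users.
- Pr(+), the probability of a positive test, regardless of other information. By the law of total probability this is 0.99 × 0.005 + 0.01 × 0.995 = 0.00495 + 0.00995 = 0.0149, or 1.49%.

Given this information, we can compute the posterior probability that an employee who tested positive is actually a drug user:

$$\Pr(D \mid +) = \frac{\Pr(+ \mid D)\,\Pr(D)}{\Pr(+ \mid D)\,\Pr(D) + \Pr(+ \mid N)\,\Pr(N)} = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} \approx 0.3322.$$

So the probability that the employee is actually a drug user is only about 33%. For a test of fixed accuracy, the rarer the condition for which we are testing, the greater the percentage of positive tests that will be false positives. This illustrates why it is important to do follow-up tests.

Example #3: Bayesian inference

Applications of Bayes' theorem often assume the philosophy underlying Bayesian probability, that uncertainty and degrees of belief can be measured as probabilities. One such example follows. For additional worked examples, including simpler ones, see the article on examples of Bayesian inference.

We describe the marginal probability distribution of a variable A as the prior probability distribution, or simply the prior. The conditional distribution of A given the "data" B is the posterior probability distribution, or just the posterior.

Suppose we wish to know about the proportion r of voters in a large population who will vote "yes" in a referendum. Let n be the number of voters in a random sample (chosen with replacement, so that we have statistical independence) and let m be the number of voters in that random sample who will vote "yes". Suppose that we observe n = 10 voters and m = 7 say they will vote yes. From Bayes' theorem we can calculate the probability density function for r using

$$f(r \mid n = 10, m = 7) = \frac{f(m = 7 \mid r, n = 10)\,f(r)}{\int_0^1 f(m = 7 \mid r, n = 10)\,f(r)\,dr}.$$

From this we see that from the prior probability density function f(r) and the likelihood function L(r) = f(m = 7|r, n = 10), we can compute the posterior probability density function f(r|n = 10, m = 7).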
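The recipe just stated can be carried out numerically: multiply prior and likelihood on a grid of r values, then divide by the numerical integral of that product. The Python sketch below assumes a uniform prior f(r) = 1 and a binomial likelihood for 7 "yes" answers among 10 sampled voters; the midpoint rule and the grid size `N_GRID` are arbitrary choices for illustration, not part of the method.

```python
from math import comb

N_GRID = 100_000                 # number of midpoint cells on [0, 1]; arbitrary choice
WIDTH = 1.0 / N_GRID
grid = [(i + 0.5) * WIDTH for i in range(N_GRID)]

def prior(r):
    """Uniform prior density on [0, 1] (an assumption of this example)."""
    return 1.0

def likelihood(r, n=10, m=7):
    """Binomial probability of m 'yes' answers among n sampled voters."""
    return comb(n, m) * r**m * (1 - r) ** (n - m)

# Unnormalized posterior: prior times likelihood at each grid point.
unnormalized = [prior(r) * likelihood(r) for r in grid]
# Normalizing factor: integral of prior * likelihood over [0, 1] (midpoint rule).
evidence = sum(unnormalized) * WIDTH
posterior = [u / evidence for u in unnormalized]

# Posterior probability that more than half the voters will vote "yes".
p_majority = sum(p for r, p in zip(grid, posterior) if r > 0.5) * WIDTH
print(f"normalizing factor ≈ {evidence:.5f}")    # close to 1/11
print(f"Pr(r > 1/2 | data) ≈ {p_majority:.3f}")  # close to 0.887
```

A grid is only a convenience here; with a uniform prior and a binomial likelihood the posterior also has a closed form, and the grid approach carries over unchanged to priors that do not.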
The prior probability density function f(r) summarizes what we know about the distribution of r in the absence of any observation. We provisionally assume in this case that the prior distribution of r is uniform over the interval [0, 1]; that is, f(r) = 1. If some additional background information is found, we should modify the prior accordingly. However, before we have any observations, all outcomes are equally likely.

Under the assumption of random sampling, choosing voters is just like choosing balls from an urn. The likelihood function L(r) = P(m = 7|r, n = 10) for such a problem is just the probability of 7 successes in 10 trials for a binomial distribution:

$$L(r) = \binom{10}{7} r^7 (1 - r)^3.$$

As with the prior, the likelihood is open to revision; more complex assumptions will yield more complex likelihood functions. Maintaining the current assumptions, we compute the normalizing factor,

$$\int_0^1 \binom{10}{7} r^7 (1 - r)^3 \, dr = \frac{1}{11},$$

and the posterior distribution for r is then

$$f(r \mid n = 10, m = 7) = \frac{\binom{10}{7} r^7 (1 - r)^3}{1/11} = 1320\, r^7 (1 - r)^3$$

for r between 0 and 1, inclusive.

One may be interested in the probability that more than half the voters will vote "yes". The prior probability that more than half the voters will vote "yes" is 1/2, by the symmetry of the uniform distribution. In comparison, the posterior probability that more than half the voters will vote "yes", i.e., the conditional probability given the outcome of the opinion poll (that seven of the 10 voters questioned will vote "yes"), is

$$1320 \int_{1/2}^{1} r^7 (1 - r)^3 \, dr \approx 0.887,$$

which is about an "89% chance".

Historical remarks

Bayes' theorem is named after the Reverend Thomas Bayes (1702–1761), who studied how to compute a distribution for the parameter of a binomial distribution (to use modern terminology). His friend Richard Price edited and presented the work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances. Pierre-Simon Laplace replicated and extended these results in an essay of 1774, apparently unaware of Bayes' work.
One of Bayes' results (Proposition 5) gives a simple description of conditional probability, and shows that it can be expressed independently of the order in which things occur:

If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has also happened, the probability I am right [i.e., the conditional probability of the first event being true given that the second has also happened] is P/b.

Note that the expression says nothing about the order in which the events occurred; it measures correlation, not causation. His preliminary results, in particular Propositions 3, 4, and 5, imply the result now called Bayes' theorem (as described above), but it does not appear that Bayes himself emphasized or focused on that result.

Bayes' main result (Proposition 9 in the essay) is the following: assuming a uniform distribution for the prior distribution of the binomial parameter p, the probability that p is between two values a and b is

$$\frac{\int_a^b \binom{n+m}{m} p^m (1 - p)^n \, dp}{\int_0^1 \binom{n+m}{m} p^m (1 - p)^n \, dp},$$

where m is the number of observed successes and n the number of observed failures.

What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter p. So one can compute a probability not only for an experimental outcome but also for the parameter which governs it, and the same algebra is used to make inferences of either kind.

Bayes states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball.
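Because the integrand in Proposition 9 is a polynomial when m and n are whole numbers, the ratio can be evaluated exactly. The Python sketch below expands (1 − p)^n with the binomial theorem and integrates term by term using exact fractions; the binomial coefficient cancels between numerator and denominator, so it is omitted. With m = 7 successes, n = 3 failures, a = 1/2 and b = 1 it reproduces the posterior probability of the voting example. The helper names are our own.

```python
from fractions import Fraction
from math import comb

def integral_of_beta_kernel(m, n, a, b):
    """Exact value of the integral of p**m * (1 - p)**n from a to b.

    (1 - p)**n is expanded as sum_k C(n, k) (-1)**k p**k, so each term
    integrates to C(n, k) (-1)**k (b**(m+k+1) - a**(m+k+1)) / (m+k+1).
    """
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for k in range(n + 1):
        power = m + k + 1
        total += comb(n, k) * (-1) ** k * (b**power - a**power) / power
    return total

def proposition_9(m, n, a, b):
    """Probability that p lies between a and b under a uniform prior,
    after m observed successes and n observed failures."""
    return integral_of_beta_kernel(m, n, a, b) / integral_of_beta_kernel(m, n, 0, 1)

result = proposition_9(7, 3, Fraction(1, 2), 1)
print(result, float(result))  # 227/256 0.88671875
```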