In Search of Bayesian Inference


In the early morning of June 1, 2009, Air France flight AF 447, carrying 228 passengers and crew, disappeared over a remote section of the Atlantic Ocean. French authorities organized an international search; after about six days, aircraft and ships started finding debris and bodies from the crash, but could not find the airplane itself. A month-long search along the intended flight path to try to pick up signals from the airplane’s underwater locator beacons turned up nothing; neither did a side-scan sonar search in August.

The French Bureau d’Enquêtes et d’Analyses pour la sécurité de l’aviation civile (in English, “Bureau of Enquiry and Analysis for Civil Aviation Safety,” or BEA for short) turned to oceanographic experts to estimate the location of the wreckage based on how the recovered bodies and debris might have drifted along the surface currents. Yet side-scan sonar searches in April and May 2010 based on the predictions of the drift model failed to find the airplane.

Finally, the BEA asked Metron, a scientific consulting firm based in Reston, VA, to generate a probability map for the airplane’s location using Bayesian inference, a statistical approach to combining prior beliefs and experiences with new evidence. Metron started by constructing a probability map based on the initial data about the flight’s disappearance, then used Bayes’ Law to incorporate the evidence provided by the failures of the various search attempts.

A search in 2011 based on this new probability map found the wreckage within a week.

“Failure to use a Bayesian approach in planning the 2010 search delayed the discovery of the wreckage by up to one year,” the Metron team wrote in the February 2014 issue of Statistical Science. The success of the Bayesian analysis, the team wrote, “provides a powerful illustration of the value of a methodical, Bayesian approach to search planning.”

While Bayesian inference has been around for over 250 years, it has swung in and out of favor over the centuries. During most of the 20th century, the approach was relegated to the statistical back burner, although even during that time it enjoyed some notable successes, such as Alan Turing’s use of Bayesian statistics to help crack the Enigma encryption machines used by Germany in World War II.

In the past two decades, Bayesian inference has undergone a renaissance. It is now used in a wide range of applications, thanks in large part to increases in computing power and mathematical techniques that have made its large computations feasible. The approach was catapulted into the public eye when Nate Silver, on his FiveThirtyEight blog, used it to predict correctly which way every state would go in the 2012 U.S. Presidential election. The issue of Statistical Science featuring the account of AF 447, a special issue dedicated to Bayesian inference, describes its use in a wide range of applications, including finding distant quasars, estimating HIV prevalence in different regions, and explaining the phenomenon that richer people tend to vote Republican while richer states tend to vote Democrat.

“There are hundreds or thousands of us [Bayesians] now, doing all sorts of things,” said Joseph B. Kadane of Carnegie Mellon University in Pittsburgh, who has used Bayesian inference, for instance, to demonstrate police bias in traffic stops along the New Jersey Turnpike.

Prior Knowledge

In its most basic form, Bayes’ Law is a simple method for updating beliefs in the light of new evidence. Suppose there is some statement A that you initially believe has a probability P(A) of being correct (what Bayesians call the “prior” probability). If a new piece of evidence, B, comes along, then the probability that A is true given that B has happened (what Bayesians call the “posterior” probability) is given by

    P(A|B) = P(B|A) P(A) / P(B)

where P(B|A) is the likelihood that B would occur if A is true, and P(B) is the likelihood that B would occur under any circumstances.

Consider an example described in Silver’s book The Signal and the Noise: A woman in her forties has a positive mammogram, and wants to know the probability she has breast cancer. Bayes’ Law says that to answer this question, we need to know three things: the probability that a woman in her forties will have breast cancer (about 1.4%); the probability that if a woman has breast cancer, the mammogram will detect it (about 75%); and the probability that any random woman in her forties will have a positive mammogram (about 11%). Putting these figures together, Bayes’ Law—named after the Reverend Thomas Bayes, whose manuscript on the subject was published posthumously in 1763—says the probability the woman has cancer, given her positive mammogram result, is just under 10%; in other words, about 9 out of 10 such mammogram results are false positives.
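The arithmetic behind that figure can be checked in a few lines. The sketch below plugs the three probabilities quoted above into Bayes’ Law; the function and variable names are illustrative choices, not anything taken from Silver’s book.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' Law: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Figures quoted in the text for a woman in her forties.
p_cancer = 0.014              # P(A): prior probability of breast cancer
p_pos_given_cancer = 0.75     # P(B|A): mammogram detects an existing cancer
p_pos = 0.11                  # P(B): any positive mammogram

p_cancer_given_pos = bayes_posterior(p_cancer, p_pos_given_cancer, p_pos)
print(f"P(cancer | positive mammogram) = {p_cancer_given_pos:.3f}")  # 0.095
```

The result, roughly 0.095, is the “just under 10%” in the text: the low prior outweighs the fairly high detection rate.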

In this simple setting, it is clear how to construct the prior, since there is plenty of data available on cancer rates. In such cases, the use of Bayes’ Law is uncontroversial, and essentially a tautology—it simply says the woman’s probability of having cancer, in light of her positive mammogram result, is given by the proportion of positive mammograms that are true positives. Things get murkier when statisticians use Bayes’ rule to try to reason about one-time events, or other situations in which there is no clear consensus about what the prior probabilities are. For example, large passenger airplanes do not crash into the ocean very often, and when they do, the circumstances vary widely. In such cases, the very notion of prior probability is inherently subjective; it represents our best belief, based on previous experiences, about what is likely to be true in this particular case. If this initial belief is way off, we are likely to get bad inferences.

In the early 20th century, a group of statisticians led by Ronald Fisher tried to eliminate subjectivity from statistical inference. Their “frequentist” approach, which views probability not as a degree of belief but as the relative frequency of events that can be repeated many times, became the dominant statistical paradigm.

Bayesians, however, point out that many aspects of frequentist inference—for example, what confidence level you use to make a decision—are subjective. “It’s not that you escape subjective judgments; it’s that you don’t explicitly acknowledge them,” said Lawrence Stone, chief scientist at Metron.

With time, more and more statisticians started arguing that completely rejecting the use of priors was throwing out the baby with the bathwater. “It’s not good to ignore information just because it’s not quantified by 1,000 tests,” Stone said.

“People realized that the rigor was a bit of a straitjacket,” said Michael Jordan, a statistician and computer scientist at the University of California, Berkeley. “You couldn’t talk about subjective probabilities and expert opinions.”

For decades, however, Bayesian analysis was too computationally intensive to carry out in many cases. The approach typically involves calculating high-dimensional integrals, whereas frequentist approaches more often involve optimization, which is easier from a computational standpoint. By the 1980s and 1990s, increases in computing power, combined with the development of Markov chain Monte Carlo methods for calculating numerical approximations to high-dimensional integrals, “liberated Bayesian inference, and made it much more prominent,” Jordan said.
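To give a flavor of how Markov chain Monte Carlo replaces an integral with sampling, here is a minimal random-walk Metropolis sketch. It is a deliberately tiny, one-dimensional toy, not the kind of high-dimensional problem Jordan describes: the coin-flip data, the uniform prior, the step size, and all names are assumptions chosen so the sampled answer can be compared with a known exact value.

```python
import math
import random


def log_posterior(theta, heads, flips):
    """Unnormalized log posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads, flips, n_samples=50_000, step=0.2, burn_in=1_000, seed=0):
    """Random-walk Metropolis sampler; returns draws taken after burn-in."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, flips)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


heads, flips = 7, 10  # hypothetical data
samples = metropolis(heads, flips)
estimate = sum(samples) / len(samples)   # Monte Carlo posterior mean
exact = (heads + 1) / (flips + 2)        # mean of the exact Beta(8, 4) posterior
print(f"MCMC estimate: {estimate:.3f}, exact: {exact:.3f}")
```

The sampler never computes the normalizing integral; it only compares posterior values at pairs of points, which is what lets the same idea scale to problems where the integral cannot be evaluated directly.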

Bayesian inference is most useful, Jordan said, in domains such as physics and biology, where experts can provide good models from which to construct the prior. It is often less suitable, he said, in big data arenas in which no clear model is known. “In the majority of big data applications that you see in technology and science, it’s hard to start with models and prior probabilities for a lot of the exploratory analyses being done.”


Bayesian analysis can sometimes be useful even in situations in which there are hundreds or thousands of parameters, said James Berger, a statistician at Duke University in Durham, NC. “One of my favorite examples is when we were doing an astronomy problem with 600 unknowns,” he said. “One parameter was quite important, and we had a lot of knowledge about it.” The team constructed a prior for that parameter based on its model, and used what Bayesians call an “objective” prior for the other 599 parameters.

Today, many statisticians use both the Bayesian and the frequentist paradigms. The two approaches are not so much opposing as complementary, said statistician Bradley Efron of Stanford University. “They’re plowing the same field, just in orthogonal directions.”

Failed Beacons

In the case of flight AF 447, the BEA had determined the wreckage must lie within 40 nautical miles of the plane’s last known position, which had been transmitted by satellite shortly before the plane disappeared. The Metron team constructed a prior probability distribution for the location of the wreckage that incorporated this information with data about the impact points of nine previous commercial airplane crashes and a drift model that reversed the paths of the recovered bodies back to the time of the crash. The drift model entailed many uncertainties, so the team subjectively decided to give it relatively low weight in the prior.

Next, the team used Bayes’ Law to update this probability map in light of the four failed searches. Most statistical techniques cannot handle data that comes in so many different flavors—surface and underwater searches with different types of equipment, information about the plane’s flight path, the drift model, and so forth—but Bayesian inference allows statisticians to easily combine many different types of measurements and data. Each measurement simply gets transformed into a likelihood function on the space of all possible locations for the airplane, representing the likelihood of obtaining that particular measurement if the airplane is in that particular spot. Bayes’ Law then uses this likelihood function to update the prior, resulting in the posterior distribution.
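The update described above can be sketched on a toy grid. In this illustration the ocean is reduced to three cells, and each unsuccessful search contributes a likelihood of “nothing found” equal to one minus the chance the search would have detected the wreck in that cell. The cell layout, the prior, and the detection probabilities are invented for the example; they are not Metron’s figures, and the real analysis used far richer models.

```python
def update_after_failed_search(prior, p_detect):
    """Posterior over cells given that a search found nothing.

    If the wreck is in cell c, the likelihood of an unsuccessful search
    is 1 - p_detect[c]; Bayes' Law multiplies the prior by that
    likelihood and renormalizes.
    """
    unnormalized = [p * (1.0 - d) for p, d in zip(prior, p_detect)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical prior over three cells.
prior = [0.5, 0.3, 0.2]

# Two hypothetical failed searches; each covers a single cell.
failed_searches = [
    [0.9, 0.0, 0.0],  # search 1: 90% chance of detection in cell 0
    [0.0, 0.8, 0.0],  # search 2: 80% chance of detection in cell 1
]

posterior = prior
for p_detect in failed_searches:
    posterior = update_after_failed_search(posterior, p_detect)

print([round(p, 3) for p in posterior])  # [0.161, 0.194, 0.645]
```

Probability drains out of the cells that were searched without success and flows to the unsearched cell, even though that cell started with the lowest prior.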

“Bayesian analysis is the engine that allows you to combine different information and different uncertainties,” Berger said.

It even allows statisticians to combine conflicting data. In the case of AF 447, the prior distribution heavily favored the region close to the airplane’s last-known position, but the one underwater search of that region—the passive acoustic search for underwater locator beacons—had not found the airplane. The Metron team incorporated the failure of the search into its Bayesian model, but also incorporated the possibility that the locator beacons had been destroyed in the crash. The search based on Metron’s posterior probability map did, in fact, turn up the airplane in the region that the passive acoustic search had covered, and a later test of one of the beacons showed it was broken.
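One way to picture how the beacon-failure possibility changes the inference is to fold it into the likelihood of a silent acoustic search: silence can mean the wreck is elsewhere, or that the beacons were not transmitting. The sketch below is an illustration under assumed numbers (three cells, a 90% chance of hearing working beacons, a 30% chance that both beacons failed), not a reconstruction of Metron’s model.

```python
def posterior_after_silent_search(prior, p_hear, p_beacons_fail):
    """Posterior over cells after an acoustic search heard nothing.

    p_hear[c] is the chance of hearing *working* beacons if the wreck is
    in cell c. Silence is certain if the beacons failed, and otherwise
    has probability 1 - p_hear[c].
    """
    likelihood = [
        p_beacons_fail + (1.0 - p_beacons_fail) * (1.0 - h) for h in p_hear
    ]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical prior that favors cell 0, near the last known position.
prior = [0.60, 0.25, 0.15]
p_hear = [0.90, 0.0, 0.0]  # the acoustic search covered only cell 0

trusting = posterior_after_silent_search(prior, p_hear, p_beacons_fail=0.0)
hedged = posterior_after_silent_search(prior, p_hear, p_beacons_fail=0.3)

print(f"cell 0, beacons assumed working: {trusting[0]:.3f}")  # 0.130
print(f"cell 0, beacons may have failed: {hedged[0]:.3f}")    # 0.357
```

Allowing for dead beacons keeps a substantial share of the probability in the already-searched cell, which is the qualitative effect the article describes: the region near the last known position stayed a strong candidate despite the silent search.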

“It appears that the likely failure of the beacons to actuate resulted in a long and difficult search,” the Metron team wrote in Statistical Science.

To construct the prior and the likelihood functions, the team had to make many subjective decisions, using the best data available to them—about, for example, the probability both beacons would fail, the probability the side-scan sonar would have missed the wreck if it was searching in the right area, and so forth. “A substantial amount of art” goes into a Bayesian analysis, said Metron’s Stone, who was also involved in Bayesian analyses that recovered the lost U.S. nuclear submarine Scorpion and the wreck of the SS Central America, a steamship that sank off the Atlantic coast in 1857. “You have to be thoughtful about how you construct the priors.”

That does not mean, however, that only highly experienced Bayesian statisticians can use these tools, Stone said. In 2007, Metron created a Bayesian system that is the basis for the U.S. Coast Guard’s Search and Rescue Optimal Planning System, now used routinely in search and rescue missions. Among them was the successful search for fisherman John Aldridge, whose recovery after 12 hours floating in the Atlantic Ocean on July 24, 2013, was chronicled at length in The New York Times Magazine in January 2014. The system is straightforward enough to be used by trained Coast Guard commanders, Stone said. “You might think that only specialists like me who have been through this a zillion times can do it, but it’s not true,” he said. “Bayesian search methods really work, and they can be used by people who aren’t Ph.D. statisticians.”

Bayesian analysis is a crucial tool when decisions have to be made under uncertainty, “particularly when you don’t have the luxury of saying, ‘I need more information before I can decide,’” Stone said. “With both Bayesian and classical statistics, if you get enough data, you will make the same decision. But when you can’t afford to wait, you should use the prior, and you will make better decisions if you do.”

Figures

Figure. An Air France A330-203 landing at Paris-Charles de Gaulle Airport. This aircraft was destroyed during flight AF 447 in 2009 while carrying 228 passengers and crew.

Figure. A summary of how the probability distribution function (a probability map) for the location of the underwater wreckage of Air France flight AF 447 was computed using Bayesian inference.