On November 18, 2020, the U.S. Federal Aviation Administration cleared the Boeing 737 MAX for flight, but the history of how Boeing got to this point remains disturbing.1 Back in September 2020, the U.S. House of Representatives released a 238-page report on the 737 MAX debacle, concluding an 18-month investigation.5 The report blamed the two crashes in October 2018 (Lion Air, in Indonesia) and January 2019 (Ethiopian Airlines, in Ethiopia) on the computerized flight-control system called Maneuvering Characteristics Augmentation System (MCAS). The 737 MAX had been Boeing's fastest-selling plane in history before government authorities worldwide grounded the fleet of nearly 400 aircraft—but only after the second crash. A technical system failure was the proximate cause of the disasters, which cost billions of dollars in losses to Boeing and the airlines, and, much more tragically, the lives of 346 passengers and crew.
Founded in 1916, Boeing remains one of the world's most renowned engineering companies. Were the 737 MAX crashes truly a failure of technology, an advanced aircraft-control system? Or was it a failure of management? Of course, at many levels, technology and management are inseparable. Nonetheless, executives, managers, and engineers at Boeing were not stumped by the complexity or unpredictability of a new technology. In a series of decisions, they put profits before safety, did not think through the consequences of their actions, or did not speak out loudly enough when they knew something was wrong. Let's look at the evidence.
We can start with Boeing's decision to deploy MCAS. The company wanted to put bigger, more fuel-efficient engines on an older aircraft, the 737NG (Next Generation). Boeing was responding to intense competition from Airbus and demand from airline customers for more fuel-efficient, single-aisle planes. But the new engines significantly changed the pitch angle and stability of the older 737. Rather than redesign the plane, Boeing chose to install MCAS, which it adapted from another aircraft. The idea was that MCAS software would enable the 737 MAX to emulate the handling characteristics of the 737NG model by pushing down the front of the plane when sensor readings indicated the nose was too high. Sounds good.
The original MCAS design had two external "angle of attack" (AOA) air sensors, one on each of the outer sides of the aircraft. However, one sensor was cheaper and simpler, and that became the final design. Boeing engineers also continually increased the power of MCAS to push down the nose of the aircraft, without changing assumptions about data and safety. In particular, the final design, with one sensor, assumed pilots could intervene if data was faulty or if anything else went wrong with MCAS. Yet, in 2015, Boeing documented that MCAS was vulnerable to sensor failure.14 The external sensor was prone to damage from birds as well as to errors in maintenance and calibration.9 A 2018 Boeing memo also revealed that pilots had only four seconds to recognize an MCAS misfire and 10 seconds to correct it.13 Indeed, the day before the Lion Air crash, a maintenance worker had replaced a malfunctioning sensor. Lion Air did not relay to the next day's pilots the seriousness of the repair or the details of a near-disaster on the prior flight, narrowly avoided with the help of a third pilot who knew about MCAS and happened to be in the cockpit.5
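The hazard of relying on one sensor can be sketched in a few lines. This is purely illustrative Python, not Boeing's code: the function name, the 15-degree threshold, and the reading values are invented for the sketch, while the 2.5-degree trim figure echoes the authority reported for the final MCAS design. With a single input, the software cannot distinguish a failed vane from a genuinely high angle of attack, and it keeps re-activating.

```python
# Illustrative sketch only -- not Boeing's code. AOA_LIMIT_DEG and the
# readings are invented; NOSE_DOWN_DEG echoes the reported 2.5-degree
# trim authority of the final MCAS design.

AOA_LIMIT_DEG = 15.0   # hypothetical angle-of-attack threshold
NOSE_DOWN_DEG = 2.5    # nose-down trim applied per activation

def mcas_command(aoa_deg: float) -> float:
    """Trim command driven by a SINGLE sensor reading.

    A vane stuck at a high value is indistinguishable from a real
    high angle of attack, so the system keeps pushing the nose down.
    """
    return NOSE_DOWN_DEG if aoa_deg > AOA_LIMIT_DEG else 0.0

# A vane stuck at 22 degrees while the aircraft is actually level:
readings = [22.0, 22.0, 22.0, 22.0]
total_trim = sum(mcas_command(r) for r in readings)
# Four cycles of unwanted nose-down trim: 4 * 2.5 = 10.0 degrees
```

A second sensor would have allowed the software to cross-check the two readings before acting; with one, there is nothing to cross-check, and the pilots become the only backstop.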
Boeing decided pilots were the "backup" for MCAS, but the company did not explain in the 737 MAX operations manual how MCAS worked and how little time pilots had to respond. Why? Boeing had another objective: It wanted to treat MCAS and the MAX overall as an incremental upgrade in the 737 series. Why was that? The incremental designation allowed airlines to avoid spending millions of dollars on pilot training in new simulators. Meanwhile, Boeing was able to sidestep detailed scrutiny of MCAS and the 737 MAX by the FAA. The FAA also could depend on Boeing engineers to test and certify minor changes to the plane.
The congressional committee had extensive access to company email and documents, and there was also detailed media coverage. These sources all describe the same decisions, along with gradual but fundamental changes in Boeing's strategy and culture.
First was Boeing's 1997 merger with McDonnell Douglas, a smaller aircraft maker with perilous finances. Usually, when a bigger company buys a smaller one, the culture of the bigger company dominates. Boeing was known for engineering excellence and safety, but McDonnell Douglas executives persuaded their Boeing owners to focus much more on costs, competition, and shareholder value (stock price). In essence, McDonnell Douglas took over Boeing, prompting one media comment that "McDonnell Douglas bought Boeing with Boeing's money."4 For example, McDonnell Douglas generally tried to upgrade older aircraft incrementally rather than build more costly new models from scratch. Boeing clearly followed this incremental strategy to create the 737 MAX.14
Second was Boeing's decision in 2001 to move its headquarters from Seattle, where the company originated and had its primary engineering, manufacturing, and testing facilities for commercial aircraft, to Chicago. This move created physical distance between the leadership of the company and the technical teams focused on the 737 series. According to Boeing executives, the move was a strategic decision to separate management from the commercial aircraft division and to signal to investors that Boeing was diversifying. In addition to the commercial aircraft business headquartered in Seattle, Boeing now had McDonnell jet fighters, Douglas commercial aircraft, Hughes helicopters, and an aerospace division, all in different locations and easy to reach from Chicago.10
Third was intensifying competition from Airbus, the European consortium founded in 1970 with backing from France, Germany, Spain, and the Netherlands. Today, Airbus is the world's largest aircraft manufacturer, ahead of Boeing because of the halt in 737 MAX production. But Airbus had briefly topped Boeing as number one in 2011, and it had a more competitive product in the same segment as the 737 MAX: the A320neo.6 Backing from several European governments for its main competitor probably put Boeing at a constant financial disadvantage as well. In addition, Airbus had a technical edge: It built the A320 series from scratch, first delivering planes in 1988. By comparison, Boeing retrofitted a much older 737 series, which first went to market in 1968.3
Fourth was a change in priorities at the CEO and board of directors levels. In 2005, James McNerney became the first Boeing chief executive who was not an engineer, and he held this position until 2015. McNerney was a Harvard MBA who had worked at McKinsey and Procter & Gamble before becoming president of GE Aircraft (which made jet engines) and then CEO of 3M. His expertise was in strategy and marketing, and he came in to improve financial performance. The 737 MAX development began in 2011, under McNerney's direction. The plane went into service in 2017 under another CEO, Dennis Muilenburg, who held the job from 2015 to 2019. Muilenburg was an engineer who had spent his entire career at Boeing. However, according to the current Boeing CEO, David Calhoun, Muilenburg carried on with McNerney's strategy and aggressively pushed sales and production of the 737 MAX.7 Boeing shareholders would later file lawsuits in June and September 2020 claiming Muilenburg misled the board of directors about the seriousness of the 737 MAX problems while the board was lax in monitoring the design, development, and safety reports.12
In this highly competitive setting, and in a market completely dominated by two firms (their combined share is approximately 99%), Boeing executives, managers, and engineers made several critical decisions. In addition to the MCAS single-sensor design, in July 2014 Boeing decided that pilots experienced on earlier 737 models could fly the 737 MAX without new training on a simulator. Boeing made the same pledge to airline customers.11 Boeing even offered to refund $1 million per plane if more training proved necessary. Yet it was clear even before the first crash that the plane could be dangerous. Surely, the known vulnerabilities of MCAS called for a clearer warning to pilots about the system and the chaos that bad sensor data could create in the cockpit. Boeing and the FAA did send out notices after the first crash, but they did not cite MCAS specifically or provide enough guidance to help the Ethiopian crew avoid the second crash.8 Nor did Boeing or the FAA ground the aircraft after the first crash, or try to upgrade existing 737 simulators to replicate the MCAS behavior. To the contrary, after the two crashes, Boeing still tried to blame the accidents on "pilot error."14
Another critical decision came in 2016, when Boeing allowed test pilots to stop flying actual 737 MAX planes and to continue testing solely in flight simulators. Not only did the simulators fail to properly mimic the behavior of MCAS, but they also did not simulate what would happen with faulty sensor data, which Boeing knew was a possibility. As a result, Boeing test pilots never actually flew a 737 MAX with a malfunctioning sensor. They never experienced what the airline pilots in the two fatal crashes experienced.9
In an early design, Boeing included an "AOA Disagree Alert," which told pilots when the two angle-of-attack sensors disagreed in their readings. The Disagree Alert would have made pilots aware of a potential data problem. Boeing also allowed a supplier to tie the alert to an optional "AOA Indicator" display. Since there was no description of MCAS in the operations manual, airlines were unaware of the indicator's importance, and most saw no need to pay extra for the option. As a result, 80% of the 737 MAX planes shipped without a functioning warning system that would have notified pilots of faulty sensor data.5
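Conceptually, the disabled safeguard was simple. A hedged sketch of what such a cross-check might look like, in illustrative Python (the function name and the 10-degree threshold are assumptions for the example, not Boeing's actual values):

```python
# Illustrative sketch of a two-sensor disagree check -- not Boeing's code.
# The threshold below is an assumption for the example.

DISAGREE_THRESHOLD_DEG = 10.0  # assumed disagreement threshold

def aoa_disagree(left_deg: float, right_deg: float) -> bool:
    """Light the AOA Disagree alert when the two vanes differ by more
    than the threshold, cueing pilots that one reading may be faulty."""
    return abs(left_deg - right_deg) > DISAGREE_THRESHOLD_DEG

# A stuck left vane versus a healthy right vane:
aoa_disagree(22.0, 4.0)   # True  -> alert the crew to bad data
aoa_disagree(5.0, 6.0)    # False -> sensors agree
```

The check costs almost nothing once two sensors exist; the failure here was not technical difficulty but the decision to make the warning an unexplained paid option.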
So what should we take away from this tragic story?
One lesson is that even the best companies can fall prey to competitive pressures as they seek to stay financially viable, grow faster, or profit by shipping products more quickly and cheaply. The venerable Toyota, often heralded as the world's best manufacturing company, went through a similar period of overly ambitious growth and sloppy testing and quality control, which cost lives and billions of dollars.2 One would think that aircraft manufacturers and automobile companies would never compromise safety for profits since they are, essentially, in the business of safe transport. This is not what happens in reality. The Boeing case also resembles the Challenger shuttle disaster in 1986. The pressure to launch led NASA managers to overrule engineers who were concerned about the safety of taking off in cold temperatures.15
Another lesson is that we need governments to protect the public as well as to protect companies from themselves, from those competitive pressures that can lead to bad decisions. Lest we assume organizations can police themselves, or that engineers are good and managers bad, note that the investigation produced email from Boeing engineers bragging they had "tricked" FAA regulators into believing no new training was necessary for the 737 MAX.14
We might also worry we have entered an era where software and hardware systems are so complex that government experts cannot independently certify technologies like Boeing put in the 737 MAX. For aircraft as well as automobiles, pharmaceuticals, food, banking, and many other products and services, governments rely mainly on companies to police themselves or to provide critical certification data. We allow "the fox to guard the henhouse," so to speak. There is no easy solution to this problem, but, at the least, government regulatory agencies need to be more diligent and hire more or better experts, and rely less heavily on what companies tell them. For their part, executives, managers, and engineers need to find a better balance between safety and cost. Faster and cheaper sounds great in the short term but can lead to disasters if the resulting products are not better or safer.
At least some people at Boeing knew there might not be enough time for pilots to react to an MCAS malfunction, yet the company decided not to inform pilots that the system was operating behind the scenes or to provide simulator training. At least some people at Boeing knew MCAS was dangerous because one sensor constituted a single point of potentially catastrophic failure. In short, the technology did not design itself or fail by itself, and that is why the 737 MAX debacle was primarily a failure of management.
1. Chokshi, N. Boeing 737 Max is cleared by F.A.A. to fly again. New York Times, (Nov. 18, 2020); https://nyti.ms/2UERbys
3. Duddu, P. Airbus vs. Boeing: A tale of two rivals. Aerospace Technology. (Jan. 31, 2020); https://bit.ly/3pIfvO0
5. House Committee on Transportation and Infrastructure. Final Committee Report: The Design, Development, and Certification of the Boeing 737 MAX. (Sept. 2020); https://bit.ly/2UDCSKp
14. The Fifth Estate. How Boeing crashed: The inside story of the 737 Max. (Jan. 19, 2020); https://bit.ly/3pIfIkg
Copyright © 2021 ACM, Inc.
At the Safety Critical Systems Symposium in York in February 2020, Dewi Daniels made the point that, in part, the Boeing 737 Max 8 crashes were due to incorrectly classified requirements.
Because the MCAS system was initially only expected to be needed in cruise flight, its limit was set at 0.6 degrees and its DO-178C criticality was set at DAL-C (Major), rather than DAL-B (Hazardous) or DAL-A (Catastrophic).
When it was realised that MCAS would also be needed in slow-speed flight and its limit was increased to 2.5 degrees, no change was made to the DAL. Engineers and assessors will spend more time examining DAL-A subsystems than DAL-C ones (or, for automotive systems, more time on ASIL-D subsystems than on ASIL-A subsystems).
I think that there is an argument that incorrect classification of a requirement played a role in the crashes that occurred.
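[For readers unfamiliar with DO-178C, the severity-to-DAL mapping behind this argument can be sketched as follows. The dictionary reflects the standard's published association of worst-case failure conditions with assurance levels; the MCAS-specific comments restate the commenter's point above, not Boeing's actual assessments.]

```python
# Illustrative sketch of DO-178C's mapping from worst-case failure
# condition to Design Assurance Level (DAL). The mapping follows the
# standard; the MCAS commentary below is interpretation, not record.

FAILURE_CONDITION_TO_DAL = {
    "Catastrophic": "DAL-A",
    "Hazardous": "DAL-B",
    "Major": "DAL-C",
    "Minor": "DAL-D",
    "No safety effect": "DAL-E",
}

def required_dal(failure_condition: str) -> str:
    """Look up the assurance level demanded by a failure condition."""
    return FAILURE_CONDITION_TO_DAL[failure_condition]

# As originally assessed (cruise only, 0.6-degree authority):
required_dal("Major")        # -> "DAL-C"
# If the expanded 2.5-degree, slow-speed role made the worst-case
# failure Hazardous or Catastrophic, DAL-B or DAL-A would have applied:
required_dal("Hazardous")    # -> "DAL-B"
```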
[The following comment/response was submitted by Michael A. Cusumano. --CACM Administrator]
Yes, I am sure there were lots of problems with how Boeing managed the MCAS requirements process and changes, especially as engineers and test pilots learned more about what the system needed to do in slow-speed flight after takeoff.