Opinion
Artificial Intelligence and Machine Learning

The Rise of the AI Co-Pilot: Lessons for Design from Aviation and Beyond

Building on cross-disciplinary insights to shape the future of human-AI interaction.

No longer confined to the boundaries of laboratories and niche applications, generative AI methods have erupted onto the global stage with relevance to almost every aspect of knowledge work and with increasing use in everyday life. From generating communications, summarizing documents and crafting literature to engineering code, translating languages, and synthesizing videos, the magnitude of their potential impact has caught even the most forward-thinking AI experts and technology visionaries off guard. It is difficult to identify aspects of our lives that will not be influenced by this technological disruption. Advances in AI will likely transform everything from routine work to highly skilled professions, including high-stakes areas such as medicine, law, and education.15

As these advancements in AI lead us all to explore new ways of harnessing these capabilities, we are also developing a new language and set of metaphors for how we talk about the technology and our interactions with it. “Bots,” “agents,” and “co-pilots” are some of the terms that are being given new weight through their place in our lives. This is not just about new linguistic distinctions: they are also shared metaphors, having cultural implications for how we see the technology and expect to interact with it.

Many have argued that we need to view these technologies as collaborative partners rather than competitors—as agents that work for us or with us,7,12 with some even calling for them to be relegated completely to the realm of smart “tools.”23 Questions about how we design AI technologies and how we configure and see our relationship to them are therefore of critical importance.

So how do we proceed? How do we best design technologies that can automate many aspects of human activity, have some degree of agency, and offer skills that can rival and even surpass human capabilities? As Elish9 has noted, there is a great deal at stake. As humans increasingly find themselves engaging with complex, automated systems, questions of responsibility in the case of incidents and accidents are more important than ever, especially as humans are ultimately accountable and bear the brunt of the blame even when the systems themselves are poorly designed.

To address these challenges, we believe it is important to leverage decades of research on human-machine interaction and collaboration, much of which comes from the fields of human-computer interaction (HCI) and human factors engineering (HFE). While HCI has wrangled for decades with designs that influence our engagement with computer technology, HFE has largely taken a separate path, exploring how human operators interact with safety-critical systems such as process control in industrial plants and aviation. Taken together, there is a growing consensus that these fields have much to tell us about designs that influence the relationship between complex systems and their users or human operators.4

As an example, the metaphor “AI co-pilot” implies a great deal about how we expect to work with AI. This metaphor suggests a technological partner that:

  • Operates under human direction and oversight.

  • Can communicate in a fluid, conversational manner allowing for natural and complex interaction.

  • Can be assigned some degree of agency and intrinsic motivation in working toward shared goals.

  • Has a broad scope for problem solving, but also has a specific, well-defined set of skills that can supplement as well as complement those of the pilot.

  • Can act as a backup for the pilot, helping to monitor a situation and taking over tasks as and when needed.

The most important point is at the top of the list: the co-pilot is not the pilot. The co-pilot is subservient to direction from the pilot. It is the pilot who makes the critical decisions and has ultimate responsibility for flying the aircraft. And it is the pilot who oversees the co-pilot, assigning and withdrawing the co-pilot’s responsibilities. While the co-pilot has a workable grounding in the task at hand, its role and contribution are assumed to be just one part of the more comprehensive and expansive understanding and awareness the pilot brings to safely flying the plane.

For these reasons, we believe the term co-pilot is a useful metaphor for describing how AI technology is intended to act in relation to the human user or operator. And whether we are envisioning AI systems that support people in one-off decisions, or in more ongoing and dynamic interactions with people, such as in semi-autonomous vehicles or alerting systems in medicine, the concept is a rich and useful one.

At the same time, it is also useful in leveraging prior efforts and insights from HFE. Lessons from aviation and industrial process control offer valuable insights about automation, including risks and best practices for working with technological partners that operate under human direction and oversight.

Four Lessons from Aviation

A good starting point is a short but influential paper written in 1983 by Lisanne Bainbridge entitled “Ironies of Automation,”3 in which she pointed out some of the crucial design considerations needed to develop systems where humans collaborate with machines capable of automating complex tasks. Bainbridge focused on applications to process control in industrial plants and to aviation flight decks. However, the parallels with human-AI collaboration are clear.

Most of Bainbridge’s original observations relate to the propensity for overreliance on automation. Her work alerts us, as we design our AI systems, to be cautious about concerns that are well known in the field of HFE, as follows.

The problem of vigilance.  As the degree of automation increases, humans increasingly take on the task of vigilance, becoming monitors of what the system is doing rather than active participants in the workflow. We know from very early vigilance studies that humans are very poor at these kinds of tasks. For example, a classic study14 found that when people were asked to monitor visual processes for rare events, even highly motivated subjects found it difficult to maintain attention for more than about half an hour. Further, research (for example, Warm et al.25) shows that jobs that have a heavy monitoring requirement, with little in the way of direct engagement in activity, lead to high levels of stress and poor health, something that partly explains why jobs such as air traffic control are so difficult.13 When autopilot was first introduced in aircraft, concerns that this would turn humans into system monitors even led some HFE specialists to propose that, in the flight deck, we should ask whether it should be computers monitoring the pilots rather than the other way round.26

Turning to today’s AI systems, we should similarly ask what AI applications mean for a world in which many of our jobs might increasingly require us to monitor or oversee what our intelligent systems are doing, with concomitant concerns about whether we will notice when we need to intervene and, more long term, whether our jobs will become more stressful and less satisfying as a result.

Partly as a result of the COVID pandemic, we are much more attuned now as a society to the importance of well-being at work not just for personal reasons, but also for the health and productivity of organizations. As Csikszentmihalyi5 noted in his studies, we are at our happiest when we are absorbed in tasks that not only allow us to exercise our talents but to stretch them. This calls for careful attention to how to design the human-machine partnership around the primacy of people, to ensure humans are engaged, that they can do what they enjoy and are good at, and that they can learn and grow as a consequence.

The takeover challenge.  Another major concern is attending to the hand-off between human and machine during the course of work. When trouble arises, humans often need to intervene, and when processes are heavily automated there are consequences for this transition when the human has been largely out of the loop. As Bainbridge highlighted,3 having to re-engage with an automatic system can be problematic because human operators have been paying attention elsewhere, meaning they have reduced situational awareness of what the requirements of the task are and what the context of the work is, undermining the human’s ability to take appropriate action.

In aviation, there have been notorious examples where pilots have not understood, paid attention to, or been properly alerted to the actions of the autopilot system. Then, when things start to spiral out of control (sometimes quite literally) the pilot is lacking in the necessary situational awareness and knowledge to step in and take the right corrective actions. A salient example is the case of China Airlines Flight 006 in 1985 where reliance on the 747’s autopilot during an engine failure introduced complexities contributing to disorientation of the pilot, leading to a sudden uncontrolled 30,000-foot plunge of the aircraft toward the ocean.

The influence of automation on human awareness and vigilance can be seen in other domains too, including our role as operators of heavy equipment in our daily lives. Studies show that new forms of automation in our cars, such as adaptive cruise control and smart lane following, can have detrimental effects. For example, drivers using adaptive cruise control can become complacent and less aware of hazards, with negative consequences for safety such as longer response times to hazards.21,27

Extrapolating to AI systems more generally highlights the possibility that too much automation may create situations in which we overrely on systems to do our work for us, leaving us with no clear understanding of the flow of work or of where and how we should intervene, and without the resources in the moment to respond, guide, or contribute effectively. Keeping the human in the loop is therefore crucial not just for these reasons, but also because disengagement from tasks, where a person can otherwise exercise their skills, can lead to deterioration in well-being, mood, and creativity.6 Further, this points to the need for better capabilities on the part of the AI system for mutual grounding, feedback, and alerting.

De-skilling through automation.  Longer term, the takeover problem is of course made worse if the flight crew’s skills have deteriorated through an ongoing lack of engagement in the flow of work. Bainbridge drew attention to the irony of not only physical skills deteriorating, but also cognitive skills waning when processes are increasingly automated. However, when automation fails, it is just these skills that are crucial when the human needs to step in and take control. The aviation industry has known this for decades,26 hence the need for simulator training where faults are routinely injected into the system to ensure that pilots not only know how to manually fly a plane, but that they also know how to diagnose a problem. Overreliance on aviation autopilots is known to have serious downsides, both in terms of pilots failing to develop an implicit understanding of how a system works and in terms of undermining the problem-solving skills needed to critically evaluate the output of the autopilot and the state of the aircraft in order to take corrective action.

Extrapolating to human-AI systems underscores the fact that we understand very little about how the offloading of different components of our workflows to these new AI systems will impact our cognitive skills for a given task or job of work. The complexity and impenetrability of today’s AI systems already compromise their intelligibility for us, but if we consider that we might be less and less “hands on” as these technologies are harnessed to do more, we might fail to build effective, implicit understandings of them. Such understandings are built through our ongoing interactions with systems, helping us to learn how our actions relate to the output of the machine. And there are many other nagging, related questions too: How deeply will we understand or remember aspects of the domain we are working within if an AI system does the work? This may not matter for some kinds of tasks, but for others, we may only ever develop a shallow understanding of the subject of our work. Further, what critical skills will wither away and which new ones will we need to develop? There is little doubt that AI will shape our cognition, which in turn has important implications for education, skills training, and jobs of the future.

Trust and automation bias.  A fourth issue concerns the predisposition for the human user or operator to critically assess or question the output of the automated system. When autopilot was first introduced into aircraft, accident analyses indicated a key contributing factor was that pilots were overly reliant and overtrusting of the autopilot system. The pilots either blindly accepted its output, or failed to act unless the autopilot system advised them or alerted them to act—a phenomenon now known as “automation bias.” This perception of the superiority and correctness of the machine is one lesson from aviation. Another lesson points to the perception of the superiority of the pilot in relation to more junior crew being a contributing factor in catastrophic accidents. Lessons were learned some decades ago22 of the risks of junior crew not speaking up or questioning the actions of the captain. Perhaps the most notorious case is that of the Tenerife airport disaster in 1977, where the reluctance of junior crew to challenge the captain of the KLM aircraft was cited as a contributing factor to the deadliest accident in aviation history. Too much trust or deference, combined with the failure to question actions in the cockpit, whether by pilot or autopilot, can lead to critical incidents.

The analogy to AI systems here should be clear. We know already that people are prone to automation bias when it comes to working with AI, tending to favor and failing to question the output of the AI system (for example, Alon-Barkat and Busuioc1). But there is much to learn about automation bias in human-AI workflows. For example, the timing of when AI output or recommendations are delivered—whether prior to human evaluation or as subsequent critiques—can impact human decision making in complex ways.8 Users are unlikely to recognize how such sequencing affects their judgments, and research is only beginning to uncover the underlying dynamics. More studies are therefore needed in order to design systems to engender the appropriate levels of trust in the output of the machine, and to support people in critically assessing the actions and recommendations of their AI partner.

What Can We Do?

Given these concerns, what can we do to ensure our future with AI systems keeps us engaged, in charge, and in a safe and trusted relationship with our digital co-pilot? Again, the literature in both HCI and HFE offers guidance, but we also must extend our approach: AI systems today are more complex, more dynamic, and being granted more agency than ever. They are capable of simulating human behavior in new ways, and of generating plausible output that may in fact be false. Their general, polymathic powers, high availability, and ease of engagement have made them pervasive in daily life. All of this implies both that we should draw as much as possible on existing design philosophy and research across fields, but also that we need to think deeply about our new priorities when we design these systems. Accordingly, here are some of the key insights that must be kept top of mind.

Keep humans in the loop.  As we have described, existing research highlights the need for human engagement (whether mental or physical) rather than complete relegation to the role of overseer or monitor. As technologists become more ambitious about the degree to which AI systems can automate human activity, humans may, at the same time, start to monitor these systems less as they trust them more. But for all the reasons cited above, users need to be kept in the loop to both be and feel empowered, to implicitly learn about the system through interaction, and for well-being. More specifically:

  • Be cautious about the amount of automation possible before humans are invited or required to intervene. For example, we know that continuous and ongoing interaction with an AI system can effectively and implicitly build users’ mental models of the system.24

  • Consider with care when it makes more sense for AI systems to monitor humans, rather than the other way round. (This is, after all, what spell and grammar checkers do today, supporting and monitoring human action and offering assistance in an unobtrusive way; a toy sketch of this pattern follows this list.)
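
To make this "computers monitoring humans" pattern concrete, here is a minimal, purely illustrative sketch: the system observes a draft and attaches passive, dismissible annotations rather than acting on its own or interrupting the flow of work. The data structures and the single toy rule below are assumptions for illustration only, not a description of how any real checker works.

```python
# Illustrative sketch only: an "AI monitors the human" pattern in the spirit of a
# spell or grammar checker. The Annotation class and the single repeated-word rule
# are toy assumptions, not a real product or API.

from dataclasses import dataclass
import re


@dataclass
class Annotation:
    start: int        # character offset where the issue begins
    end: int          # character offset where the issue ends
    message: str      # advisory text shown unobtrusively in the margin


def review_draft(draft: str) -> list[Annotation]:
    """Return passive annotations; the human decides whether to act on any of them."""
    annotations = []
    # Toy rule: flag immediately repeated words such as "the the".
    for match in re.finditer(r"\b(\w+)\s+\1\b", draft, flags=re.IGNORECASE):
        annotations.append(
            Annotation(match.start(), match.end(), f"Repeated word: '{match.group(1)}'")
        )
    return annotations


if __name__ == "__main__":
    for note in review_draft("The the pilot retains final authority."):
        print(note)
```

The key property is that the system never edits or blocks the user's work; it only offers information the user is free to act on or ignore.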

Uphold the primacy of human agency and role allocation.  It is not enough that humans are engaged—we also must design a safe and effective partnership. Studies in HCI and HFE reach a similar conclusion on this point, but changes in the capabilities, pervasiveness, and agency of AI make acting on their findings and implications more urgent. First and foremost, as in aviation, the co-pilot must cede ultimate control and final responsibility to the human user as “pilot,” with the co-pilot acting in a well-defined supporting role. Designing the role of the machine must also take into account what people are good at versus what machines do well, as well as what motivates and makes people happy. This means we must do the following:

  • Design AI features, and the overall software applications they are hosted within, to enable and celebrate the primacy of human agency. Achieving this goal can be challenging, and it can therefore be lost in favor of simpler, easier integrations of AI into applications.

  • Reinforce a clear hierarchy of control and decision making with the human in charge. Note that this does not mean that such oversight necessarily leads to better outcomes in all situations. In fact, research shows that predictive or diagnostic tasks may be more accurately achieved by the machine.10 However, such reliance on human oversight makes explicit that the pilot or human is the one held accountable, with the responsibility to ensure the best sources of information are evaluated and that the implications of output or recommended actions amid the complexities of the open world are considered.

  • Allow users control over how and when they engage in the workflow and over the degree of oversight they exercise. For example, AI automation without detailed human oversight may be preferred for tasks and scenarios where AI behavior and goals are well understood, stakes are low, and recommendations are guided via observation of user actions or by higher-level user controls.19 (A minimal sketch of such role allocation follows this list.)

  • Support users in carrying out aspects of tasks that allow for self-expression, creativity, social judgment, and navigating complex situations that extend beyond AI capabilities.
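
As one way of making the hierarchy of control concrete, the sketch below shows how such role allocation might be encoded in software: AI suggestions are applied automatically only for low-stakes actions the user has explicitly delegated, while everything else is routed to the human for review or explicit approval. The class names, stakes categories, and routing rules are illustrative assumptions, not a prescribed implementation or any particular product's behavior.

```python
# Illustrative sketch only: a routing policy that keeps the human "pilot" in charge.
# The Suggestion type, the stakes levels, and the delegation rule are hypothetical
# and would need to be defined and validated per application.

from dataclasses import dataclass
from enum import Enum


class Stakes(Enum):
    LOW = 1      # for example, formatting or boilerplate edits
    MEDIUM = 2   # for example, content changes the user will review anyway
    HIGH = 3     # for example, actions with safety, legal, or financial impact


@dataclass
class Suggestion:
    description: str
    stakes: Stakes
    user_opted_into_automation: bool  # explicit, revocable delegation by the user


def route_suggestion(s: Suggestion) -> str:
    """Decide how an AI suggestion reaches the user: the human retains final say
    except for low-stakes actions the user has explicitly delegated."""
    if s.stakes is Stakes.LOW and s.user_opted_into_automation:
        return "auto-apply"                  # delegated, easily reversible action
    if s.stakes is Stakes.HIGH:
        return "require-explicit-approval"   # nothing happens until the human confirms
    return "present-for-review"              # default: show the suggestion, human decides


if __name__ == "__main__":
    print(route_suggestion(Suggestion("fix whitespace in a draft", Stakes.LOW, True)))
    print(route_suggestion(Suggestion("send a contract to a client", Stakes.HIGH, True)))
```

The specifics matter less than the shape of the policy: the default path keeps the human in charge, and automation is an explicit, revocable delegation rather than the starting assumption.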

Build human skills and capabilities.  AI systems are now capable of taking on more of our human tasks than ever before, changing the skills we learn and maintain. However, when designed correctly, AI systems can enable us to build new skills rather than undermine our existing capabilities. For example, educators are now exploring ways in which AI can be used as a personalized coach or tutor to boost our capabilities (for example, Mollick17). Consider the following:

  • Though AI systems are developing sophisticated conversational capabilities, users nonetheless need to learn how to “speak machine” to best effect, both in terms of how to prompt these systems most effectively and how to assess their actions.

  • AI systems need to be endowed with the ability to collaborate with users to achieve mutual understanding about the intentions, goals, and capabilities of users. One approach is the development of teachable AI systems (training the co-pilot) so that they are personalized and work for us in different contexts.18

  • Consider the coaching metaphor—design systems that help us develop new skills that persist even when we are not using the AI system.11

Design more intelligible systems.  By working to boost the intelligibility of AI systems, we support better situational awareness and mental models for users so that they can intervene when necessary, critically assess the output of the machine, and address problems when they occur. Tried and tested approaches from HCI are useful here, such as:

  • Refer to well-established HCI methods and design guidelines, including principles specifically for mixed-initiative interfaces, a term coined by Horvitz12 referring to systems where people and machines work in partnership toward shared goals. More recently, the set of principles for guiding designs for human-AI interaction was further extended and studied.2

  • General HCI principles include making clear what a system is capable of, acknowledging user actions, providing effective and ongoing feedback to users, delivering well-designed explanations for system responses, and alerting users to problems or issues in intelligent ways, at the right time, and with guidance on how to intervene.

  • While “explainable AI” is a burgeoning field, it is also clear that social science has a great deal to contribute here (for example, Miller16) and that we need to consider how to design the whole user experience, not just the output of the algorithms.24

Design for appropriate levels of trust.  The sheer complexity of today’s AI models, the fact that they are probabilistic, and their tendency to produce plausible but sometimes false output call for a renewed emphasis on designing for trust. We need to design systems and train users so that they develop appropriate levels of trust in a system, and so that we can avoid well-known biases such as automation bias. Users need to be helped to develop their critical thinking skills and not fall into complacency. Intelligibility in general helps here, but more specific design considerations, mostly drawn from Horvitz12 and from Passi and Vorvoreanu,20 include:

  • Take account of users who may vary in expertise, AI literacy, and task familiarity in designing the user interface and feedback from the system.

  • Identify ways to assess automation bias from telemetry and to monitor and address it with changes to the interface and workflows. Use onboarding techniques and tutorials to make users aware that overreliance is a common phenomenon, giving them examples of correct and incorrect output. It is especially crucial to set appropriate levels of trust during early use of a system. Be transparent about a system’s strengths and limitations, as well as intended uses.

  • Endow systems with the capability to infer well-calibrated confidences in their recommendations and generations. Harness these confidences to guide, gate, and annotate output, including sharing indications of uncertainty with users. (A minimal sketch of such confidence gating follows this list.)

  • Findings on automation bias highlight the importance of investing in additional research, such as studies of the value of computing and relaying well-calibrated confidences to users. We also need a better understanding of ideal designs for human-AI workflows, including how different strategies for sequencing and presenting AI recommendations can support, distort, or hinder human judgments.
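
As one illustration of how calibrated confidence might be harnessed in practice, the sketch below gates and annotates output based on an assumed calibrated confidence score. The thresholds, message wording, and the very availability of a reliable confidence estimate are assumptions made for illustration; this is a sketch of the design idea, not a description of any existing system.

```python
# Illustrative sketch only: gating and annotating AI output using a confidence score
# that is assumed to be well calibrated. Thresholds and wording are hypothetical.

from dataclasses import dataclass


@dataclass
class ModelOutput:
    text: str
    confidence: float  # assumed calibrated, in [0, 1]


def present(output: ModelOutput,
            withhold_below: float = 0.5,
            caution_below: float = 0.8) -> str:
    """Withhold low-confidence output and annotate mid-confidence output with an
    uncertainty note, prompting users to apply their own judgment."""
    if output.confidence < withhold_below:
        # Withhold rather than present a plausible-but-uncertain answer as fact.
        return ("The system is not confident enough to answer reliably; "
                "please verify this independently.")
    if output.confidence < caution_below:
        return (f"{output.text}\n"
                f"[Note: moderate confidence ({output.confidence:.0%}); please double-check.]")
    return f"{output.text}\n[Confidence: {output.confidence:.0%}]"


if __name__ == "__main__":
    print(present(ModelOutput("The flight departs at 14:05.", 0.62)))
```

Whether such annotations actually reduce overreliance, and how users interpret numeric confidence at all, are exactly the kinds of questions that call for the further study noted above.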

Conclusion

The rapid evolution and widespread adoption of generative AI technologies have profoundly influenced various domains, from daily communication and workflows to administrative and decision support in specialized fields such as medicine and law. As AI becomes more commonplace in our lives, our perception and language surrounding it evolve. Drawing parallels with aviation, the AI co-pilot metaphor captures important aspects of the relationship between AI and humans, where the machine assists but does not dominate. The human remains in control, making critical decisions, while the AI offers support, expertise, and backup. Lessons from aviation highlight the potential pitfalls of overrelying on automation, which brings several recognized challenges, including diminished vigilance, difficulties in transferring control, de-skilling, and misplaced trust. To foster a harmonious and productive human-AI partnership, it is imperative to prioritize human agency, keep humans actively engaged, provide opportunities to enhance human skills, design more intelligible AI systems, and employ methods that cultivate an appropriate level of trust in AI output and guidance. By integrating insights from both HCI and HFE and investing more intensively in the rising subdiscipline of human-AI interaction and collaboration, we can navigate this transformative era, ensuring AI serves as a valuable co-pilot, enhancing human capabilities while respecting our agency and expertise.

    References

    • 1. Alon-Barkat, S. and Busuioc, M. Human-AI interactions in public sector decision making: “Automation bias” and “selective adherence” to algorithmic advice. J. Public Adm. Res. Theory 33, 1 (Jan. 2023), 153–169.
    • 2. Amershi, S. et al. Guidelines for human-AI interaction. In Proceedings of CHI ’19, 2019.
    • 3. Bainbridge, L. Ironies of automation. Automatica 19 (1983).
    • 4. Chignell, M., Wang, L., and Zare, A. The evolution of HCI and Human Factors: Integrating human and artificial intelligence. ACM TOCHI 30, 2 (2023).
    • 5. Csikszentmihalyi, M. Flow: The Psychology of Optimal Experience. Harper and Row, New York, NY, 1990.
    • 6. Csikszentmihalyi, M. Beyond Boredom and Anxiety: Experiencing Flow in Work and Play. 25th Anniversary Edition. Jossey Bass, San Francisco, CA, 2000.
    • 7. Daugherty, P.R. and Wilson, H.J. Human + Machine: Reimagining Work in the Age of AI. Harvard Business Press, 2018.
    • 8. Fogliato, R. et al. Who goes first? Influences of human-AI workflow on decision making in clinical imaging. In Proceedings of the ACM Conf. on Fairness, Accountability, and Transparency (ACM FAccT) (June 2022).
    • 9. Elish, M. Moral crumple zones: Cautionary tales in human-robot interaction. Engaging Science, Technology and Society 5, (2019).
    • 10. Grove, W.M. and Meehl, P.E. Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical–statistical controversy. Psychology, Public Policy, and Law 2, 2 (1996).
    • 11. Hofman, J., Goldstein, D., and Rothschild, D.  Steroids, sneakers, coach: The spectrum of human-AI relationships (Sept. 20, 2023); https://aka.ms/ssc
    • 12. Horvitz, E. Principles of mixed-initiative user interfaces. In Proceedings of CHI ’99 (1999).
    • 13. Loura, J., Yadav, A.S., and Duhan, M. Job stress in air traffic controllers: A review. IJMSSR 2, 6 (June 2013).
    • 14. Mackworth, N.H. Researches on the measurement of human performance. Medical Research Council Special Report, No. 2680. H.M.S.O, London, 1950.
    • 15. McKinsey. The State of AI in 2023: Generative AI’s Breakout Year. McKinsey, 2023.
    • 16. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267, (Feb. 2019).
    • 17. Mollick, E.R. and Mollick, L. New modes of learning enabled by AI chatbots: Three methods and assignments (Dec. 13, 2022); 10.2139/ssrn.4300783
    • 18. Morrison, C. et al.  Understanding personalized accessibility through teachable AI. ASSETS, (2023).
    • 19. Mozannar, H. et al.  When to show a suggestion? Integrating human feedback in AI-assisted programming. AAAI, (Feb. 2024).
    • 20. Passi, S. and Vorvoreanu, M. Overreliance on AI: Literature Review (2023); https://www.microsoft.com/en-us/research/uploads/prod/2022/06/Aether-Overreliance-on-AI-Review-Final-6.21.22.pdf.
    • 21. Rudin-Brown, C.M. and Parker, H. Behavioral adaptation to adaptive cruise control (ACC): Implications for preventive strategies. Transp. Res. F. Traffic Psychol. Behav. 7, 2 (Mar. 2004).
    • 22. Salas, E., Shuffler, M., and Diazgranados, D. Team dynamics at 35,000 feet. Human Factors in Aviation. E. Salas and D. Maurino, Eds. Academic Press, 2010.
    • 23. Shneiderman, B. Human-Centered AI. Oxford University Press, 2022.
    • 24. Thieme, A. et al. Interpretability as a dynamic of human-AI interaction. Interactions (Aug. 2020).
    • 25. Warm, J.S., Parasuraman, R., and Matthews, G. Vigilance requires hard mental work and is stressful. Human Factors 50, 3 (Mar. 2008).
    • 26. Wiener, E.L. and Curry, R.E. Flight-deck automation: Promises and problems. Ergonomics 23 (1980).
    • 27. Xiong, H. and Ng Boyle, L. Drivers’ adaptation to adaptive cruise control: Examination of automatic and manual braking. IEEE Transactions on Intelligent Transportation Systems 13, 3 (Sept. 2012).
