Collaboration With a Robotic Scrub Nurse

Surgeons use hand gestures and/or voice commands without interrupting the natural flow of a procedure.

Errors in the delivery of medical care are the principal cause of inpatient mortality and morbidity (98,000 deaths annually in the U.S.).16 Ineffective team communication is often at the root of these errors.7,10,16 For example, in assessing verbal and nonverbal exchanges in the operating room (OR), Lingard et al.18 found frequent communication failure, with commands delayed, incomplete, not received at all, or left unresolved. Firth-Cozens7 found 31% of all communications in the OR represent failures, with one-third of them having a negative effect on patient outcomes. And Halverson et al.10 found 36% of communication errors are related to equipment use.

Key Insights

  • Gestonurse is the first multimodal robotic scrub nurse to assist surgeons by passing and retrieving surgical instruments during simple procedures.
  • Gestonurse recognizes both hand gestures and speech commands, mapping them to existing surgical instruments in a surgical tray.
  • Gestonurse recognizes and tracks surgical instruments in use, retrieving them for the procedure, thus reducing the risk of retained instruments.

Causes of errors include team instability (such as lack of familiarity between nurses and surgeons),5 lack of resources (such as minimal staffing), and distractions. Poor communication within a surgical team can result in greater likelihood of instrument-count discrepancies among team members, possibly indicating retention of surgical instruments in a patient’s body, with sponges and towels most common.6

Adding a robot to the operating theater as an assistant to a surgical team has the potential to reduce the number of miscommunications and their negative effects in two main ways. First, in the case of communication failure, a robotic scrub nurse (such as our Gestonurse) can deliver surgical instruments to the main surgeon, who communicates with it through hand gestures and speech; timely, accurate instrument delivery can decrease cognitive load, time, and effort for surgeons. Second, the risk of retained surgical instruments is reduced through accurate, thorough, timely tracking and monitoring of the instruments used; retained instruments can puncture organs and cause internal bleeding. We have been developing Gestonurse at Purdue University for the past three years (see Figure 1).

The main use of robotics in surgery is not to replace the surgeon or surgical nurses but to work with them during surgery. In working side by side (see Figure 2), responsibility can be divided up like this: The robot passes instruments, sutures, and sponges during surgery and keeps an inventory of their use, while the surgical technician handles all remaining tasks (such as operating sterilizers, lights, suction machines, and electrosurgical units and diagnostic equipment and holding retractors and applying sponges to or suctioning the operative site). A robot controlled through hand gestures and speech commands is a natural alternative that does not affect the normal flow of surgery.

Related Work

Previous surgical robots include one used for object retrieval3 and others with haptic feedback (such as SOFIE24 and the da Vinci surgical system11 for minimally invasive procedures in endoscopic and laparoscopic surgeries).

Previous robotic scrub nurses include the voice-controlled “Penelope,”15,23 which localizes, recognizes, and returns used instruments. Another voice-controlled robot4 also uses computer-vision techniques to recognize, deliver, and retrieve surgical instruments. A problem with voice-only systems is performance degradation22 in noisy environments; the sound of drills, anesthesia machines, surgical staff side conversations, and operating equipment can garble commands and compromise patient safety. For example, a surgeon might say “50,000 units,” but the anesthetist hears “15,000 units.”1 Such errors can have dramatically adverse consequences for a patient’s well-being.

A voice-controlled robotic scrub nurse for laparoscopic surgery27 uses depth-based action recognition for instrument prediction; its 3D point-estimation method requires the surgeon wear markers of special reflective material for action recognition that could potentially compromise sterility.

Trauma Pod9 is a mobile-robotic-surgery effort sponsored by the U.S. Department of Defense intended to perform life-saving procedures on the battlefield; it responds only to voice commands, not physical gestures. Treat et al.23 developed a robot that delivers instruments to the main surgeon following verbal requests, retrieving them as soon as they are no longer required. The instruments are identified through machine-vision algorithms, with decisions made through a cognitive architecture. In 2011, Jacob et al.12,13,14 and Wachs et al.25 presented Gestonurse, the first surgical scrub nurse to understand nonverbal communication, including hand gestures.

In this article, we make two main contributions with respect to previous work: how verbal and nonverbal information can, when combined, improve the robustness of a robotic scrub nurse and how to assess the effectiveness of the interaction between surgeon and robot in an OR setting through a mock surgery—an abdominal incision and closure using a phantom simulator. Gestonurse gestures are recognized from the video/depth stream acquired by a Microsoft Xbox 360 Kinect sensor; a robotic arm then delivers the requested instrument to the surgeon. A significant advantage of gesture-based communication is it requires no special training by the surgeon. Gesturing comes naturally to surgeons since their hands are already their main tools; moreover, hand signs are the standard method for requesting surgical instruments,8,19 and gestures are not affected by ambient noise in the OR. A multimodal solution combining voice and gesture provides redundancy needed to assure proper instrument delivery.

Gestures for robotic control have been the focus of much research since the early 1980s. Early work was done with Richard A. Bolt’s Put-That-There interface2 followed by others using magnetic sensors or gloves to encode hand signs.20,21 Since then, gestures have been used in health care, military, and entertainment applications, as well as in the communication industry; see Wachs et al.26 for a review of the state of the art.

System Architecture

Figure 2 outlines the Gestonurse system architecture. The streaming depth maps captured through the Kinect sensor are processed by the gesture-recognition module while a microphone concurrently captures voice commands interpreted by the speech-recognition module. Following recognition, a command is transmitted to the robot through an application that controls a Fanuc LR Mate 200iC robotic arm across the network through a Telnet interface. Gestonurse then delivers the required surgical instrument to the surgeon and awaits the next command. We also designed an instrument-retrieval-and-disposal system. A network camera monitors a specific region of the operating area; upon recognizing a surgical instrument there, the robot picks it up and delivers it to the surgeon. Meanwhile, the surgeon’s hands are tracked so robot and surgeon do not collide, ensuring safe human-robot collaboration.

Gesture recognition. To evoke a command, a member of the surgical staff places a hand on the patient’s torso and gestures. The moment the hand is in the field of view, the gesture is captured by the Kinect sensor and segmented from the background through a depth-segmentation algorithm we developed.12 A mask of the hand is obtained by thresholding the depth map and the area of the closest blob. The contour and convex hull of the blob identify the configuration of the fingers and associate the configuration to a hand in a static pose.13 A gesture is preceded by the hand in the static pose, terminating when the hand is not in the sensor’s field of view. Hand-posture recognition uses this movement as a cue to temporally segment the gesture. The Gestonurse lexicon for surgical tasks (see Table 1) is based on standard OR gestures for requesting surgical instruments.8,19 The positions of the localized fingertips are recorded to obtain trajectories for each fingertip during the gesture. The trajectories are derived from the screen coordinates (in pixels) of the localized fingertip and the depth (in millimeters) of the fingertip with respect to the depth sensor; see the online Appendix for algorithmic details.
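The segmentation and hull steps described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the 100mm depth band, and the list-of-lists depth map are assumptions made for the example, and the closest blob is approximated by all pixels within the band of the nearest valid reading rather than by connected-component analysis.

```python
def segment_closest_blob(depth_mm, band_mm=100):
    """Binary mask of pixels within band_mm of the nearest valid depth.

    depth_mm is a list of rows of depth readings in millimeters;
    a reading of 0 is treated as "no data" (an assumption of this sketch).
    """
    valid = [d for row in depth_mm for d in row if d > 0]
    if not valid:
        return [[0] * len(row) for row in depth_mm]
    nearest = min(valid)
    return [[1 if 0 < d <= nearest + band_mm else 0 for d in row]
            for row in depth_mm]


def mask_points(mask):
    """List the (x, y) pixel coordinates set in a binary mask."""
    return [(x, y) for y, row in enumerate(mask)
            for x, m in enumerate(row) if m]


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, vertices in hull order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); the sign gives the turn direction
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]
```

In a fuller pipeline, the hull vertices and the contour between them would be the input to fingertip localization, and the per-frame fingertip positions (pixels plus depth in millimeters) would form the trajectories described above.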

The system recognizes gestures within 160ms (real time) of the time they are performed by the surgeon. The system’s overall response time is slightly more than two seconds: the 160ms needed to recognize the gesture plus the overhead the robotic arm needs to deliver the instrument (two seconds on average). In addition, the surgeon needs time to physically perform the gesture, approximately one to two seconds for experienced users and five to six seconds for novices.

Instrument localization and recognition. One of the many responsibilities of human surgical scrub techs is to remove instruments surgeons might place around the operating area during a procedure; this requires continuously monitoring the area to detect the presence of instruments. Detection means the instrument is localized, recognized, and finally removed. The camera monitoring the region is calibrated with respect to the robot so the coordinates of the instrument in the image plane can be converted to coordinates in the robot-frame view.
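The image-to-robot conversion mentioned above can be illustrated under one simplifying assumption: the instruments lie on a flat surface, so a single 3×3 planar homography maps pixel coordinates to robot coordinates. The article does not say which calibration model was used; the function name and the matrix values in the usage note are hypothetical.

```python
def pixel_to_robot(H, u, v):
    """Map image pixel (u, v) to planar robot coordinates using homography H.

    H is a 3x3 matrix (list of three rows) assumed to come from a prior
    camera-to-robot calibration; it is not derived here.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    return x / w, y / w
```

For example, a made-up calibration of 0.5mm per pixel with a flipped vertical axis, `H = [[0.5, 0, 100], [0, -0.5, 200], [0, 0, 1]]`, sends the image center (360, 240) of a 720×480 frame to robot coordinates (280, 80).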

Robot control scheme. Delivering and retrieving instruments can be a risky activity due to potential harm to a patient and to the surgical team if potential collisions with the surgeon are not avoided. Imagine the robot passes the scalpel and the surgeon moves a hand without noticing the scalpel in the way. In order to safely collaborate with a surgeon (on instrument retrieval and delivery), the robot must determine the position of the surgeon and plan a path to avoid a collision. We implemented a potential-field method to compute a safe path to that goal (fixed position for instrument delivery and variable position for instrument retrieval). Additionally, we use the skeleton-tracking ability of a Kinect sensor to track the position of the surgeon’s hands. The camera is externally calibrated with the robot’s coordinate system such that the position of the surgeon’s hands is obtained in the robot’s field of view.
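To make the potential-field idea concrete, here is a minimal 2D sketch: the goal attracts the end effector, and an obstacle (standing in for the surgeon's tracked hand) repels it within an influence radius. The gains, radius, step size, and function name are assumptions chosen for illustration; the article does not give the parameters or dimensionality of the authors' planner.

```python
import math


def potential_field_path(start, goal, obstacle, k_att=1.0, k_rep=1.0,
                         influence=2.0, step=0.1, max_steps=1000):
    """Follow the combined attractive/repulsive force from start to goal.

    Points are (x, y) tuples. Raises RuntimeError if the walk stalls in a
    local minimum or runs out of steps, a known limitation of the method.
    """
    path = [start]
    x, y = start
    for _ in range(max_steps):
        gx, gy = goal[0] - x, goal[1] - y
        if math.hypot(gx, gy) <= step:
            path.append(goal)  # close enough: snap to the goal
            return path
        # attractive force grows linearly with distance to the goal
        fx, fy = k_att * gx, k_att * gy
        # repulsive force acts only inside the obstacle's influence radius
        ox, oy = x - obstacle[0], y - obstacle[1]
        d = math.hypot(ox, oy)
        if 0.0 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * ox / d
            fy += mag * oy / d
        norm = math.hypot(fx, fy)
        if norm == 0.0:
            break  # forces cancel: stuck in a local minimum
        x += step * fx / norm
        y += step * fy / norm
        path.append((x, y))
    raise RuntimeError("no path found within max_steps")
```

With a fixed delivery position as the goal and the hand position refreshed from skeleton tracking on each control cycle, the same loop would be re-run as the obstacle moves; a real planner would also need a strategy for escaping local minima.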

Experiments. We conducted three main experiments to assess Gestonurse feasibility and robustness, evaluating gesture-recognition accuracy of its vision module, recognition of the instruments, and delivery and disposal of the surgical instruments. Finally we assessed users’ learning ability, comparing the modalities used to communicate with the robot.

Gesture recognition. We assembled a database of 1,000 gestures from 10 users we asked to perform 10 gestures per instrument class, using 10-fold cross-validation to generate the set of receiver operating characteristic curves (see Figure 3) by varying a threshold parameter representing the strength of the recognition.
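A receiver operating characteristic curve of this kind is traced by sweeping the decision threshold and recording the false-positive and true-positive rates at each setting. The sketch below shows that computation for a list of scores with binary ground-truth labels; the function name and data layout are illustrative assumptions, not the authors' evaluation code.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false_positive_rate, true_positive_rate) pair per threshold.

    scores: recognition scores, higher means more confident
    labels: 1 if the score belongs to the true class, else 0
    A score counts as a detection when it is strictly above the threshold.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for t in thresholds:
        tp = sum(1 for s, label in zip(scores, labels) if s > t and label)
        fp = sum(1 for s, label in zip(scores, labels) if s > t and not label)
        points.append((fp / negatives, tp / positives))
    return points
```

Under 10-fold cross-validation, the scores for each held-out fold would be pooled (or the curves averaged) before plotting.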

We normalized the log-likelihood scores from testing with the Viterbi algorithm such that the highest score was scaled to 1. We considered all instrument classes with normalized log-likelihood scores greater than a threshold ς to be a positive result (a detection); at ς = 0.99, average gesture-recognition accuracy is 95.96%. Table 2 includes the confusion matrix for this operating point; we calculated the values as the number of correct recognitions over the total number of instances presented.
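The thresholding rule can be sketched as follows. The article does not state the exact normalization, so this example assumes one plausible scheme: every log-likelihood is negative, and each class score is the best (largest) log-likelihood divided by that class's log-likelihood, which gives the best class a score of exactly 1 and every other class a score between 0 and 1. The function names and sample values are hypothetical.

```python
def normalize_scores(log_likelihoods):
    """Scale per-class log-likelihoods so the best class scores 1.0.

    Assumes all values are strictly negative; score = best / value.
    """
    if any(ll >= 0 for ll in log_likelihoods.values()):
        raise ValueError("this sketch assumes strictly negative log-likelihoods")
    best = max(log_likelihoods.values())
    return {name: best / ll for name, ll in log_likelihoods.items()}


def detections(log_likelihoods, threshold=0.99):
    """Classes whose normalized score is strictly greater than the threshold."""
    scores = normalize_scores(log_likelihoods)
    return sorted(name for name, s in scores.items() if s > threshold)
```

With made-up scores of -120 for scalpel, -121 for scissors, and -300 for forceps, a threshold of 0.99 reports both scalpel and scissors, while 0.995 reports only scalpel; raising the threshold trades missed detections for fewer false alarms.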

Instrument recognition. We used a database of 700 color (RGB) images of 720×480 pixels each, including 10 images of the seven standard types of surgical instrument: scalpel, scissors, retractors, hemostats, clippers, forceps, and hooks. We segmented the instruments from the background by extracting nonstationary objects from the background buffer; we created and updated the background model using a Gaussian Mixture Model. We then used a support vector machine to classify the feature vectors of the segmented instrument. We used fivefold cross-validation to find the optimal parameter γ for the radial basis function kernel; Table 3 outlines the confusion matrix at the optimal γ = 0.5. Though we assumed the instruments were clean (in a real-life scenario, they could be partially contaminated by blood or other secretions), the background model we used for instrument segmentation made detection independent of instrument color.
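For reference, the radial basis function kernel is K(x, y) = exp(-γ‖x − y‖²), so γ controls how quickly similarity falls off with distance between feature vectors. The sketch below shows the kernel and a simple fivefold index split of the kind used to compare candidate γ values; the helper names and the interleaved fold assignment are assumptions for illustration, and training the support vector machine itself is left out.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k interleaved folds."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test
```

A parameter search would train one classifier per candidate γ on each training split, score it on the matching test split, and keep the γ with the best average accuracy.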

We found Gestonurse average instrument-picking accuracy to be 100%, with dropping accuracy of 92.38%. Some instrument classes perform much better than others, possibly due to variance in instrument shape.

Instrument picking and dropping. To measure performance, we placed instruments from the seven classes on a Mayo surgical tray, then recorded the robot’s performance at picking and dropping per trial, conducting 15 trials total. We found instrument-picking accuracy of 100%, with average dropping accuracy of 92.38%; per-class accuracy is included in the online Appendix.

Modalities compared. We recruited 12 graduate and undergraduate students, including eight males and four females, all 20 to 30 years old, to test the effectiveness of modality training on a mock surgical task simulating an abdominal incision and closure (see Figure 4), a task requiring five instrument classes: scalpel, scissors, needle, retractor, and four hemostats, a total of eight instruments.

We tested Gestonurse under three conditions: speech (S), gesture (G), and combined speech and gesture (SG). Note that in SG, we used the gestures and speech (see the online Appendix) to request the surgical instruments but not simultaneously. While Gestonurse can deal with simultaneous requests from multiple modalities, simultaneous requests using different modalities is not desirable during real-life surgeries. Surgeons are allowed to use only speech, only gestures, or speech and gestures one at a time during surgery.

We assigned the 12 subjects randomly to one of three test groups depending on whether they would be using speech, gestures, or both, each participating in two experiments. Subjects in the S and G groups could use only five commands to request five instruments from the robot for the mock procedure. We asked the SG test group to use speech to request half the required instruments and gestures for the rest.

Within each group, we trained two subjects to communicate with Gestonurse before performing the procedure, then asked them to repeat each command 15 times. We then read the name of the recognized instrument to the subject through a text-to-speech program, Microsoft SAM text-to-speech. We similarly conducted training for gesture recognition, with each test subject repeating each gesture 15 times and shown a bar graph with the log-likelihood score of the gesture for each gesture class.

Each subject performed the surgical task six times, while we recorded the task-completion times, which were determined mainly by type of surgical procedure, not the speed of the computer-vision algorithm; for example, repairing an open abdominal aortic aneurysm can take up to eight hours.

Discussion

Having robotics support surgical performance promises shorter operating times, greater accuracy, and fewer risks to the patient compared with traditional, human-only surgery. Gestonurse assists the main surgeon by passing surgical instruments while freeing surgical technicians to perform other tasks. Such a system could potentially reduce miscommunication and compensate for understaffing by understanding nonverbal communication (hand gestures) and speech commands with recognition accuracy, as we measured it, over 97%. We validated the system in a mock surgery, an abdominal incision and closure. In it we computed learning rates of 73.16% and 73.09% for the test subjects with and without gesture training, indicating learning occurred at the same rate with and without gesture training, and that improvement of 75.44 seconds (12.92% less) in task completion time was due directly to the training provided to the test subjects prior to the six trials. This means the test subjects’ skill was due to understanding and participating in the surgical task, rather than from learning to use hand gestures. Gesturing is presumably intuitive enough to be used by surgical staff with (almost) no training. Our informal discussions with surgical staff at Wishard Hospital, a public hospital affiliated with the Indiana University School of Medicine in Indianapolis, found surgeons excited about the possibility of using such a robot in a surgical setting.

The multimodal system is 55.95 seconds faster (14.9% less) than a speech-only system on average. However, we also found that gesture and voice together are no faster than gesture alone, performance that could be due to having to switch between modalities (and related additional cognitive load) that affects performance time. Our future work aims to address the kind of performance (in terms of functionality, usability, and accuracy) a robotic system must deliver to be a useful, cost-effective alternative to traditional human-only practice.
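Each improvement quoted in this section pairs an absolute saving in seconds with a relative saving in percent, so each pair implies a baseline completion time. The sketch below makes that arithmetic explicit; the function name is hypothetical, and the baselines it returns are derived from the reported figures rather than stated in the article.

```python
def implied_baseline(saving_seconds, saving_percent):
    """Baseline time implied by an absolute saving and its percentage.

    If saving = baseline * percent / 100, then baseline = saving / (percent / 100).
    """
    if saving_percent <= 0:
        raise ValueError("saving_percent must be positive")
    return saving_seconds / (saving_percent / 100.0)


training_baseline = implied_baseline(75.44, 12.92)  # untrained completion time
speech_baseline = implied_baseline(55.95, 14.9)     # speech-only completion time
```

These work out to roughly 583.9 seconds and 375.5 seconds, respectively; the two baselines come from different comparisons and are not expected to match.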

Conclusion

Gestonurse is a multimodal robotic scrub nurse we developed at Purdue and the Indiana University School of Medicine to reliably pass surgical instruments to surgeons and other members of a human surgical team, yielding gesture-recognition accuracy of 95.96% on average. The related Kinect-based robotic vision system we developed recognizes and picks instruments with a recognition rate of 92.38%. We also developed an instrument-picking system with 100% accuracy and a related disposal system with 92.38% accuracy (on average), as well as a potential-field path-planning algorithm to maximize safety in human-robot collaboration, implementing it in Gestonurse.

We conducted experiments on the effects of modality training for participants in a mock surgical procedure, finding that pre-task training reduced task-completion time by 12.92% on average across both speech and gesture. We also showed the system (following user training in both speech and gesture) is 14.9% faster than speech only, on average.

Future work on Gestonurse aims to fuse speech- and gesture-recognition data in a probabilistic fashion and transition to a real surgical setting involving animals at the Veterinary School at Purdue University in West Lafayette, IN. Another goal is to add the ability to predict the next surgical instrument likely to be needed according to the type of procedure (the context) instead of on a subjective, variable chain of verbal commands. We also aim to improve specific features of the system; for example, we assumed instruments were placed by surgical staff in fixed positions in a Mayo tray, with instrument coordinates saved as trajectory points in the teach pendant’s (the robot’s remote control) memory. Future versions will use the algorithm we developed to automatically detect and pick instruments, regardless of location. This will require minimal work since the current design already uses these techniques to retrieve instruments.

Acknowledgments

This project was funded, in part, by the Indiana Clinical and Translational Sciences Institute, by Grant #TR000006 from the National Institutes of Health, National Center for Advancing Translational Sciences, Clinical and Translational Sciences Award. We also thank Dr. Steve Adams of Purdue University for his consultation and use of the Veterinary School OR, Dr. Rebecca Packer of Purdue for the surgical supplies we used in the experiment, and Hairong Jiang of Purdue for her support implementing the Gestonurse obstacle-avoidance module.

Figures

Figure 1. Gestonurse robotic assistant.

Figure 2. System overview.

Figure 3. Receiver operating characteristic curve for Hidden Markov Model-based gesture recognition.

Figure 4. Stages of mock abdominal incision: (A) incision; (B) exposure of linea alba; (C) incision enlarged with scissors; (D) linea alba incised; and (E) incision closed.

Figure. The surgeon requests scissors from Gestonurse using a hand signal recognized by the camera above the surgical bed.

Figure. Gestonurse safely hands off a surgical scissors to the surgeon requesting it.

Tables

Table 1. Gesture lexicon.

Table 2. Confusion matrix at ς = 0.99 for the gesture lexicon.

Table 3. Confusion matrix at γ = 0.5.

References

    1. Beyea, S.C. Noise: a distraction, interruption, and safety hazard. AORN Journal 86, 2 (2007), 281–285.

    2. Bolt, R.A. Put-That-There: Voice and gesture at the graphics interface. Commun. ACM 14, 3 (1980), 262–270.

    3. Borenstein, J. and Koren, Y. A mobile platform for nursing robots. IEEE Transactions on Industrial Electronics 2 (2007), 158–165.

    4. Carpintero, E., Perez, C., Morales, R., Garcia, N., Candela, A., and Azorin, J. Development of a robotic scrub nurse for the operating theatre. In Proceedings of the Third IEEE International Conference on Biomedical Robotics and Biomechatronics (Elche, Spain, Sept. 26–29, 2010), 504–509.

    5. Carthey, J., de Laval, M.R., Wright, D.J. et al. Behavioral markers of surgical excellence. Safety Science 41, 5 (2003), 409–425.

    6. Egorova, N.N., Moskowitz, A., Gelijns, A. et al. Managing the prevention of retained surgical instruments: What is the value of counting? Annals of Surgery 247, 1 (2008), 13–18.

    7. Firth-Cozens, J. Why communication fails in the operating room. Quality Safety Health Care 13, 5 (Oct. 2004), 327.

    8. Fulchiero, G.J., Vujevich, J.J., and Goldberg, L.H. Nonverbal hand signals: A tool for increasing patient comfort during dermatologic surgery. Dermatological Surgery 35, 5 (2009), 856–857.

    9. Garcia, P., Rosen, J., Kapoor, C., Noakes, M., Elbert, G., Treat, M., Ganous, T., Hanson, M., Manak, J., Hasser, C., Rohler, D., and Satava, R. Trauma pod: A semi-automated telerobotic surgical system. International Journal of Medical Robotics and Computer-Assisted Surgery 5, 2 (2009), 136–146.

    10. Halverson, A.L., Casey, J.T., Andersson, J., Anderson, K., Park, C., Rademaker, A.W., and Moorman, D. Communication failure in the operating room. Surgery 149, 3 (2010), 305–310.

    11. Intuitive Surgical. da Vinci Surgical System.

    12. Jacob, M.G., Li, Y., Akingba, G., and Wachs, J.P. Gestonurse: A robotic surgical nurse for handling surgical instruments in the operating room. Journal of Robotic Surgery 6, 1 (Mar. 2012), 53–63.

    13. Jacob, M.G., Li, Y., and Wachs, J.P. A gesture-driven robotic scrub nurse. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (Anchorage, AK, Oct. 9–12, 2011), 2039–2044.

    14. Jacob, M., Li, Y.T., and Wachs, J.P. Gestonurse: A multimodal robotic scrub nurse. In Proceedings of the Seventh ACM/IEEE International Conference on Human Robot Interaction (Boston, MA, Mar. 5–8). ACM Press, New York, 2012, 153–154.

    15. Kochan, A. Scalpel please, robot: Penelope's debut in the operating room. Industrial Robot: An International Journal 32, 6 (2005), 449–451.

    16. Kohn, L.T., Corrigan, J., and Donaldson, M.S. To Err Is Human: Building a Safer Health System, Volume 6. Joseph Henry Press, 2000.

    17. Li, Y.T., Jacob, M., Akingba, G., and Wachs, J.P. A cyber-physical management system for delivering and monitoring surgical instruments in the OR. Surgical Innovation (PubMed: 23037804).

    18. Lingard, L., Espin, S., Whyte, S. et al. Communication failures in the operating room: An observational classification of recurrent types and effects. Quality Safety Health Care 13, 5 (2004), 330–334.

    19. Phillips, N., Berry, E., and Kohn, M. Berry & Kohn's Operating Room Technique. Mosby/Elsevier, 2004.

    20. Sturman, D.J. and Zeltzer, D. A survey of glove-based input. IEEE Computer Graphics and Applications 14, 1 (1994), 30–39.

    21. Starner, T. and Pentland, A. Visual recognition of American Sign Language using Hidden Markov Models. In Proceedings of the International Workshop on Automatic Face and Gesture Recognition (Zurich, Switzerland, 1995), 189–194.

    22. Takahashi, Y., Takatani, T., Osako, K., Saruwatari, H., and Shikano, K. Blind spatial subtraction array for speech enhancement in noisy environment. IEEE Transactions on Audio, Speech, and Language Processing 17, 4 (May 2009), 650–664.

    23. Treat, M.R., Amory, S.E., Downey, P.E., and Taliaferro, D.A. Initial clinical experience with a partly autonomous robotic surgical instrument server. Surgical Endoscopy 20, 8 (2006), 1310–1314.

    24. van den Bedem, L. Realization of a Demonstrator Slave for Robotic Minimally Invasive Surgery. Ph.D. dissertation, Technische Universiteit Eindhoven, Eindhoven, the Netherlands, 2010.

    25. Wachs, J.P., Jacob, M.G., and Li, Y. Does a robotic scrub nurse improve economy of movements? In Proceedings of the Image-Guided Procedures, Robotic Interventions, and Modeling Conference, SPIE Medical Imaging (San Diego, Feb. 5–7, 2012).

    26. Wachs, J.P., Kölsch, M., Stern, H., and Edan, Y. Vision-based hand-gesture applications: Challenges and innovations. Commun. ACM 54, 2 (Feb. 2011), 60–71.

    27. Yoshimitsu, K., Miyawaki, F., Sadahiro, T., Ohnuma, K., Fukui, Y., Hashimoto, D., and Masamune, K. Development and evaluation of the second version of scrub nurse robot for endoscopic and laparoscopic surgery. In Proceedings of the Intelligent Robots and Systems Conference (Oct. 29-Nov. 2, 2007), 2288–2294.
