Voice Says It All in the Navy

The commercial potential for voice technology innovations currently being developed by the U.S. Navy is immense.

The hatch opens, and a blast of hot air assaults the corpsman as he enters the noisy machinery space. He moves carefully along the catwalk and descends several ladders, squeezing between pieces of equipment while holding his clipboard and manuals. Laboriously, he records readings of times, dials, and gauges by hand, in a process repeated dozens of times daily by thousands of personnel on hundreds of ships. Upon completing his rounds, the corpsman returns to his working space and transcribes his findings into a computer system for transmittal. Only when this collateral duty is complete can he proceed to his primary duty: attending to the health of the ship’s crew.

The inefficiency of this approach may one day be remedied. In support of Force Health Protection,1 the U.S. Navy has launched a Naval Voice Interactive Device (NVID) project that leverages existing technologies to facilitate automation of the business practices of Navy medicine. The goal of the NVID project is to create a lightweight, portable computing device that uses speech recognition to enter shipboard environmental survey data into a computer database, and to generate reports to fulfill surveillance requirements. This article examines the process used by the Navy to develop and test the NVID.

The seamless integration of voice technologies creates a human-machine interface that has been applied in many industries, including consumer electronics, automobiles, interactive toys, and industrial, medical, and home appliances [6]. Coupling computer recognition of the human voice with a natural language processing system makes speech technology for computers possible [4]. By allowing data and commands to be entered into a computer without the need for typing, this technology frees human hands for other tasks. Speech dictation by computers can also increase the rate of data entry, improve spelling accuracy, permit remote access to databases utilizing wireless technology, and ease access to computer systems by those who lack typing skills [2].

Although speech technology has existed for two decades, widespread use is a recent phenomenon. Following improvements in accuracy, speed, portability, and high-noise operation, the development of speech dictation and recognition applications by the private sector, federal agencies, and armed services has increased.

Until recently, few practical continuous speech recognizers were available. Most were difficult to build, resided on large mainframe computers, were speaker dependent, and did not operate in real time. The Voice Interactive Display (VID) developed for the U.S. Army made progress in eliminating these disadvantages [5]. VID was intended to reduce the bulk, weight, and setup times of vehicle diagnostic systems while increasing their capacity and capabilities for hands-free troubleshooting. The capabilities of VID were developed to allow communication with the supply and logistics structures within the Army’s common operating environment. Such efforts demonstrated the use of VID as a tool for providing a paperless method of documentation for diagnostic and prognostic results that will culminate in the automation of maintenance supply actions. Voice technology and existing diagnostic tools have now been integrated into a wireless configuration. The result is a hands-free interface between the operator and the Soldier’s Portable On-System Repair Tool (SPORT) that forms the basis for the NVID.

The NVID Project

To ensure the health and safety of shipboard personnel, naval health professionals—including environmental health officers, industrial hygienists, independent duty corpsmen (IDCs), and preventive medicine technicians—perform clinical activities and preventive medicine surveillance on a daily basis. These inspections include, but are not limited to, water testing, heat stress, pest control, food sanitation, and habitability surveys [1]. Chief of Naval Operations Instruction 5100.19D, the Navy Occupational Safety and Health Program Manual for Forces Afloat, provides the specific guidelines for maintaining a safe, healthy work environment aboard Navy ships. Inspections performed by medical personnel ensure these guidelines are followed.

Typically, inspectors enter data and findings by hand onto paper forms and later transcribe these notes into a word processor to create a finished report. The process of manual note taking and entering data via keyboard into a computer database is time-consuming, inefficient, and error-prone. To remedy these problems, the Naval Shipboard Information Program was developed, allowing data to be entered into portable laptop computers while a survey is conducted [3]. However, the cramped shipboard environment, the need for mobility by inspectors, and the inability to have both hands free to type during an inspection make the use of laptop computers during a walk-around survey difficult. Clearly, a hands-free, space-saving mode of data entry that would also enable examiners to access pertinent information during an inspection is desirable. The NVID project was developed to fill this need.

The NVID prototype is a compact, mobile computing device that includes voice interactive technology, stylus screen input capability, and an indoor-readable display that enables shipboard medical personnel to complete environmental survey checklists, view reference materials related to these checklists, manage tasks, and generate reports using the collected data. The system uses Microsoft Windows NT, an operating environment that satisfies the requirement of the IT-21 Standard to which Navy ships must conform. The major software components include initialization of the NVID software application, application processing, database management, speech dictation, handwriting recognition, and speech-to-text capabilities. The power source for this portable unit accommodates both DC (battery) and AC (line) power options and includes the ability to recharge or swap batteries to extend the system’s operational time.

Focus Group Feedback

A focus group of 13 hospital corpsmen and one medical officer was convened to appropriately refine the NVID prototype to meet environmental surveillance requirement specifications. These individuals completed a questionnaire detailing methods of completing environmental surveys and reporting inspection results. The questionnaire addressed the needs of end users as well as their perspectives on the military utility of NVID. The survey consisted of 117 items ranging from nominal yes/no answers to frequencies, descriptive statistics, rank ordering, and perceptual Likert scales. Conclusions were drawn from the statistical analysis and a debriefing of focus group members, and recommendations were suggested for development and implementation of NVID.

The participants possessed varying clinical experience while assigned to deployed units (ships and Fleet Marine Force). Participants included IDCs, preventive medicine specialists, lab technicians, and aviation medicine specialists. All focus group members were previously involved in inspections within the Navy’s environmental surveillance group, and as such, had abundant experience in administering, analyzing, and evaluating environmental surveys.

Respondents were first asked about the methods they used to record findings while conducting an inspection (see Table 1). Responses to this section of the questionnaire were limited. The percentage of missing data ranged from 7.1% for items such as habitability and food sanitation safety to 71.4% for mercury control and 85.7% for polychlorinated biphenyls (PCBs). The missing data can be partly explained by the fact that inspectors may not have been involved in every type of inspection listed in the questionnaire. In aggregate, the information in Table 1 indicates that the majority of inspectors relied on preprinted checklists. Only 7.1% of the users recorded their findings on a laptop computer for inspections focusing on radiation protection, workplace monitoring, food sanitation safety, and habitability.

In addition to detailing their methods of recording inspection findings, the focus group participants were asked to describe the extensiveness of their notes during surveys. The results ranged from “one to three words in a short phrase” (35.7%) to “several short phrases, up to a paragraph” (64.3%). No respondents claimed to have used “extensive notes of more than one paragraph.” Respondents were also asked how beneficial voice dictation would be while conducting an inspection. Those responding that it would be “very beneficial” (71.4%) far outweighed those responding that it would be “somewhat beneficial” (28.6%). No respondents said that voice dictation would be “not beneficial” in conducting an inspection.
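Because the focus group had 14 members, each reported percentage corresponds to a whole number of respondents. The following Python sketch recovers those head counts, assuming every member answered the item in question (the article does not state the counts directly, and the function names are illustrative only).

```python
FOCUS_GROUP_SIZE = 14  # 13 hospital corpsmen plus one medical officer


def respondents_for(percentage, group_size=FOCUS_GROUP_SIZE):
    """Return the whole number of respondents closest to a reported percentage."""
    return round(percentage / 100 * group_size)


def as_percentage(count, group_size=FOCUS_GROUP_SIZE):
    """Return a head count as a share of the group, rounded to one decimal place."""
    return round(100 * count / group_size, 1)


# Reported shares for the perceived benefit of voice dictation.
reported = {"very beneficial": 71.4, "somewhat beneficial": 28.6, "not beneficial": 0.0}

# Inferred head counts (an assumption: all 14 members answered this item).
counts = {answer: respondents_for(share) for answer, share in reported.items()}
```

Read this way, the smallest nonzero share that can appear in the results is one respondent in 14, or 7.1%, which is why that figure recurs throughout the findings.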

Participants also described the types of reference material needed during inspections. The results are shown in Table 2. Needed materials ranged from a low of 28.6% for procedure description information to a high of 78.6% for current checklist-in-progress information. When asked how often they used reference materials during inspections, no participants chose the response “never.” Other responses were “occasionally” (71.4%), “frequently” (21.4%), and “always” (7.1%).

Additionally, participants were asked to describe their methods of reporting inspection results. These included preparing the report using the Shipboard Non-tactical ADP Program (SNAP) Automated Medical System (SAMS) (14.3%), word processing other than SAMS (57.1%), and both SAMS and word processing (28.6%). No respondents reported using handwritten or other methods of reporting inspection results. When asked if most of the problems or discrepancies encountered during an inspection could be summarized using a standard list of “most frequently occurring” discrepancies, 100% of respondents answered “yes.”

When asked about device attribute preferences, “voice activation dictation” and “durability” ranked highest, while “wearable in front or back” and “earphones” tied for the lowest ranking. Participants also rated their computer proficiency. Just 14.3% rated themselves “expert,” while 42.9% chose “competent.” “Good” and “fair” were each selected by 21.4% of respondents.

Table 3 reports desired outcomes when using an NVID device. By far, care for patients was the most important outcome, with 71.4% of participants indicating strong agreement. In addition to completing the questionnaire, the respondents agreed to a debriefing session where detailed qualitative information augmented their survey responses.

Based on the responses from the questionnaire and debriefing sessions, the following environmental surveys were chosen for inclusion in NVID:

  1. Food Sanitation Safety
  2. Habitability
  3. Heat Stress
  4. Potable Water
  5. Pest Control

Respondents indicated that preprinted checklists improve the process regardless of the type of inspection. A first step was therefore to use the NVID prototype to quickly build checklists for each survey to enhance automation of these business practices. While some “free dictation” was allowed (giving inspectors the freedom to include limited comments during the inspection), predetermined checklists with a limited command-and-control vocabulary were needed so the NVID team could use smaller computer devices with slower processors. There were two reasons for this approach. First, computer devices must be small and lightweight. Second, extensive “free dictation” capability requires faster processors that do not yet exist on small, portable computing platforms.
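The distinction between a limited command-and-control vocabulary and free dictation can be illustrated with a short Python sketch. It is not the NVID software: the command phrases, action names, and the `interpret` function are hypothetical, and the input is assumed to be text already produced by a speech recognizer.

```python
# Hypothetical command vocabulary; the actual NVID command set is not given in the article.
COMMANDS = {
    "next item": "NEXT",
    "previous item": "PREVIOUS",
    "satisfactory": "MARK_SAT",
    "unsatisfactory": "MARK_UNSAT",
    "add comment": "BEGIN_COMMENT",
    "end comment": "END_COMMENT",
}


def interpret(utterances):
    """Map recognized utterances to (kind, value) events.

    Outside a comment, only phrases listed in COMMANDS are accepted; anything
    else is rejected rather than guessed. Inside a comment, text is kept as
    spoken until the "end comment" phrase is heard.
    """
    events = []
    in_comment = False
    for raw in utterances:
        phrase = " ".join(raw.lower().split())  # normalize case and spacing
        action = COMMANDS.get(phrase)
        if in_comment and action != "END_COMMENT":
            events.append(("comment", raw.strip()))
        elif action == "BEGIN_COMMENT":
            in_comment = True
        elif action == "END_COMMENT":
            in_comment = False
        elif action is not None:
            events.append(("command", action))
        else:
            events.append(("rejected", raw.strip()))
    return events
```

In command mode the recognizer only has to choose among a handful of phrases, which is the kind of workload a slower processor can handle; the open-ended comment mode is where the heavier dictation engine would be needed.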

A master tickler (a calendar that tracks the progress of surveys and the dates of their required completion) was included in the module. Navy references and instructions were also included on the system, allowing inspectors to access regulations during surveys without requiring bulky paper manuals. Compatibility of the NVID system with medical department computer equipment was ensured so that downloads and sharing of information between computing platforms could easily be achieved.
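A master tickler of this kind can be sketched in a few lines of Python. The example is only illustrative: the survey intervals, the `SurveyTask` class, and the status labels are assumptions, since the article does not describe how the NVID module stores or schedules surveys.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SurveyTask:
    name: str
    interval_days: int   # hypothetical recurrence; real intervals come from Navy instructions
    last_completed: date

    @property
    def due(self):
        """Date by which the survey must next be completed."""
        return self.last_completed + timedelta(days=self.interval_days)


def tickler(tasks, today):
    """Return (name, due date, status) rows sorted by due date, earliest first."""
    rows = []
    for task in tasks:
        if task.due < today:
            status = "OVERDUE"
        elif task.due == today:
            status = "DUE"
        else:
            status = "PENDING"
        rows.append((task.name, task.due, status))
    return sorted(rows, key=lambda row: row[1])
```

Sorting by due date puts overdue surveys at the top of the list, which matches the purpose of a tickler: showing an inspector what must be done next.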

Current Challenges

Although users reported positive responses to the prototype tested, the device exhibited the limitations of current speech-driven technologies. The processors in lightweight, wearable devices were not fast enough to process speech adequately. Yet, larger processors added unwelcome weight to the device; inspectors objected to the 3.5-pound weight of larger devices during the walk-around surveys. In addition, throat microphones used in the prototype to limit interference from background noise also limited speech recognition.

Accuracy of speech dictation and recognition also depended on the time a user committed to training the device to recognize his or her speech; rates varied from 85% to 98% depending on software training time. Optimal training time appeared to be one hour for both the Dragon NaturallySpeaking software and the NVID software. Accuracy was also affected by changes in voice quality due to environmental or physical conditions. In addition, current software interprets utterances in the context of an entire sentence, so users had to form complete utterances mentally before speaking for accurate recognition. Technological refinements should remedy these deficiencies. The current challenges of the NVID prototype system, summarized as follows, are generalizable to all industrial settings.

Shipboard operation in tight spaces. Space and resource constraints on Navy ships make it necessary to complete surveys in enclosed, tight spaces, similar to most industrial settings. A study of the ergonomics associated with the use of an NVID computer was performed. The human factors evaluated included, but were not limited to, the following parameters:

  • Safety equipment compatibility, including work clothing such as gloves, glasses, and hard hats; sound suppressors/hearing protection; and respirators.
  • Data input comparison and user acceptance (voice command vs. touchscreen).
  • User interface evaluation (ease of use), including user comfort, user adjustability, subcomponent connection procedure, and assessment of mean time to proficiency.

Operation in high-noise environments. Naval ships are industrial environments with the potential for high noise levels. To verify the effectiveness of the NVID prototype under such conditions, the difference in error rates when using the unit with and without background noise was determined. Voice system software training was first conducted using a script consisting of a repeatable set of voice commands. The following sets of tests were then performed with consistent background noise; NVID provided acceptable performance under all conditions, especially when voice-activated headset microphones were used:

  • Lab test in a normal office environment (less than 70 decibels).
  • Lab test with baseline background noise up to the expected level (90 decibels).
  • Field test aboard ship with typical background noise (75–90 decibels).
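The article does not say how the error rates were calculated. One common measure for this kind of scripted comparison is word error rate: the number of word substitutions, deletions, and insertions needed to turn the recognized text into the script, divided by the script length. A minimal Python sketch, with a hypothetical function name and made-up sample phrases in the usage note, might look like this:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference script must contain at least one word")
    # previous[j] is the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Running the same script in the office, in the lab at 90 decibels, and aboard ship, then subtracting the quiet-room rate from each noisy-room rate, would give the difference the tests were designed to measure. For example, if the script "open heat stress checklist next item" were recognized as "open heat dress checklist next", one substitution and one deletion out of six words would give a rate of one third.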

Data gathering and checklist navigation. NVID prototype system users were capable of navigating through survey checklists by using voice commands, as well as other input devices. The information collected was then automatically stored in an on-system database. To determine whether the system could successfully open each checklist and allow entry and storage of the required data, a script was developed that thoroughly tested the functionality of the hardware and software.
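The navigation-and-storage loop described here can be sketched as follows. This is an assumption-laden illustration, not the NVID design: the checklist items, the command words, the `ChecklistSession` class, and the table layout are all hypothetical, and Python's built-in `sqlite3` module stands in for whatever on-system database the prototype used.

```python
import sqlite3

# Hypothetical checklist items; the article does not list the actual survey questions.
CHECKLIST = [
    "Thermometers present in reefers",
    "Food contact surfaces clean",
    "Hand-washing station stocked",
]


class ChecklistSession:
    """Tracks the inspector's position in one checklist and stores findings."""

    def __init__(self, survey, items, connection):
        self.survey, self.items, self.db = survey, items, connection
        self.position = 0
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS findings ("
            "survey TEXT, item TEXT, result TEXT, PRIMARY KEY (survey, item))"
        )

    def handle(self, command):
        """Apply one navigation or data-entry command; return the current item."""
        if command == "next":
            self.position = min(self.position + 1, len(self.items) - 1)
        elif command == "previous":
            self.position = max(self.position - 1, 0)
        elif command in ("satisfactory", "unsatisfactory"):
            # INSERT OR REPLACE lets an inspector correct an earlier answer.
            self.db.execute(
                "INSERT OR REPLACE INTO findings VALUES (?, ?, ?)",
                (self.survey, self.items[self.position], command),
            )
            self.db.commit()
        else:
            raise ValueError(f"unknown command: {command!r}")
        return self.items[self.position]
```

A test script of the kind the article mentions would feed a fixed sequence of commands to each checklist and then query the database to confirm that every required entry was stored.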

Comment capture capability. The NVID application provides the ability to document the inspector’s notes via handwriting recognition, voice dictation, and a touchscreen keyboard. Verification of all three methods of data capture was performed using a predefined script of repeatable voice commands.

Conclusion

Despite current limitations in speech technology, the NVID prototype was successful in reducing the time needed to complete inspections, supporting local reporting requirements, and allowing corpsmen to complete surveys expeditiously. The use of the NVID prototype gave corpsmen more time to devote to their primary health care responsibilities. Attitudes of the users toward the device were favorable, with users believing the prototype saved time and improved the quality of reports.

The potential of this type of innovation for commercial industry is immense. The public medical industry could use mobile voice-driven devices for emergency room, in-transit, and on-call situations. Applications of such a device also exist in the area of diagnostic information sharing between doctors and other medical personnel dispersed over large geographic distances and in more isolated rural areas.

The potential in manufacturing and service industries is also great. Floor managers and workers can use voice technology to give and receive information on unexpected events as they occur on the shop floor. Automated analysis of multiple sets of verbal reports from workers can flag deviations from the norm. Service industries can also benefit because this type of device is useful in facilitating processes of any kind.

The future of voice technology systems is promising, with potential adoption ranging from airline pilots to health care professionals to service workers. Despite current limitations of hardware and software platforms, organizations like the U.S. Navy are advancing the boundaries and uses of this technology. Widespread adoption of these and related technologies across industry and consumer applications is certain to result in time savings, convenience, and safety.

Table 1. Methods of recording inspection findings.

Table 2. Types of reference information needed during inspections.

Table 3. Frequencies of desirable outcomes.

References

    1. Department of Health and Human Services. International Classification of Diseases, 9th revision, Clinical Modification, 3E. Government Printing Office, Washington D.C., 1989.

    2. Head, W. Breaking down the barriers with speech. In Proceedings of SpeechTEK (New York, 2000), SpeechTEK, New York, 93–100.

    3. Hermansen, L.A. and Pugh, W.M. Conceptual design of an expert system for planning afloat industrial hygiene surveys (Technical Report No. 96-5E). Naval Health Research Center, San Diego, CA, 1996.

    4. Hertz, S., Younes, R., and Hoskins, S. Space, speed, quality, and flexibility: Advantages of rule-based speech synthesis. In Proceedings of AVIOS: The Speech Technology & Applications Expo (New York, 2000), AVIOS, New York, 217–228.

    5. Hutzell, K. Voice Interactive Display (VID) (Contract Summary Report: Apr 98–May 2000). MTS Technologies, Inc., Johnstown, PA, 2000.

    6. Soule, E. Selecting the best embedded speech recognition solution. In Proceedings of SpeechTEK (New York, 2000), SpeechTEK, New York, 239–248.

Footnotes

    1. The military has experimented with voice recognition systems, including one system providing hands-free operation by fighter pilots.
