For the past 15 years, American Institutes for Research (AIR) has been providing usability services as a third-party vendor to industry and government. Our services include analyzing user needs, designing and prototyping user interfaces, usability testing, performing expert reviews, and conducting workshops. At the heart of these activities is a user-centered design process. We encourage our clients to involve a product's end users iteratively: at the beginning, in the middle, and in the final stages of product development.
Over the years the nature of our services has changed. In the mid- to late-1980s, our clients tended to be large hardware or software manufacturers who wanted formal usability tests of products late in the development cycle. Both parties viewed usability testing as analogous to other types of software quality-assurance testing: it was a method for checking usability before the product was released to its intended users or customers. We also viewed testing methods as similar to behavioral science research methods. A usability test looked like a research study in its use of individual test participants and usability labs, and test teams talked about test designs as similar to experimental research designs and about "sampling plans" of carefully chosen test participants. But even in the early days there were differences between usability tests and research studies. Participants were taught to "think out loud" as they worked, and there was an emphasis on qualitative measures such as participants' expressions of feelings or opinions. It also became clear that serious usability problems could be uncovered with only a few participants; the groups of 15 to 30 participants typical of cognitive psychology research studies were not needed. The research studies by Virzi [5, 6] later confirmed the validity of using fewer participants in usability tests.
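Why a few participants suffice is often explained with a simple probabilistic model, which we add here only as an illustration: if each participant independently detects a given problem with probability p, then n participants are expected to uncover a fraction 1 − (1 − p)^n of the problems. The sketch below assumes a single p shared by all problems (real detection rates vary from problem to problem), and the value p = 0.3 is hypothetical rather than a figure from our tests.

```python
# Cumulative problem-discovery model (illustrative sketch).
# Assumption: every participant independently detects a given problem
# with the same probability p.

def proportion_found(p: float, n: int) -> float:
    """Expected fraction of usability problems uncovered by n participants."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Chance a problem is missed by all n participants is (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.3  # hypothetical per-participant detection rate
    for n in (1, 3, 5, 10, 15, 30):
        print(f"{n:2d} participants: {proportion_found(p, n):.1%} of problems")
```

Under this assumption, five participants would be expected to find about 83% of the problems, and the curve flattens quickly after that, which is consistent with the observation that groups of 15 to 30 are not needed to find the serious problems.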
As we gained more experience with testing, we began to see the value of early testing. Most usability tests uncovered major usability problems with products. We called these problems "global" because their scope was not limited to one screen of software or one page of a manual. The problems we found were pervasive, but clients were reluctant to make major changes so late in the development cycle. Testing earlier, however, was difficult. It was a major undertaking to create interactive software that had enough functionality and was bug-free enough to test. Likewise, early testing of manuals required creating relatively high-fidelity graphics, a table of contents, and an index to test whether participants could find the information they needed.
It was the development of effective prototyping tools that made testing early in development possible. Practitioners began to create testing schemes that allowed the important parts of the user interface to be tested in prototype form. Early-stage testing of prototypes brought with it the desire for less formal testing methods. The number of participants in a test continued to shrink, and simple measures such as the percentage of tasks completed became standard practice. Furthermore, iterative testing became a reality. From about 1990 on, every book on usability methods made iterative testing part of the preferred development method [2, 4].
While smaller companies continue to discover the need for usability engineering and usability testing, many large companies have established their own usability groups. We still assist companies with their first usability tests, but our relationship with companies that have established usability groups has changed. Our initial expectation that we would no longer be conducting tests for those companies was incorrect: usability groups continue to seek out third-party vendors for special projects. For example, when an organization wants to conduct a test comparing its product against its competitors' products, it comes to a third-party vendor to design and conduct an unbiased test. In addition, the design of a product sometimes becomes controversial within an organization, and the organization wants an outside opinion of the product's usability.
The following case studies exemplify the kinds of testing we currently perform.
This project was done for a medical equipment manufacturer. The project required us to involve users throughout several phases of the project. To start, we conducted 10 contextual interviews at dialysis patients' homes to learn about their experiences and to see how they used current products. During these interviews, we observed patients using their machines and asked them questions about ease of use and training. We summarized what we learned from the interviews in a memo illustrated with photos of the participants and their products. Following the interviews, we attended two industry conferences to learn more about dialysis and competitors' machines. During the conferences, we conducted four focus groups with nurses from around the world who use these machines and who train patients. As part of the focus groups, we presented preliminary design sketches for the proposed user interface and industrial design solutions. Negative feedback on these sketches led us to set design and usability goals for a new user interface direction. We used a focus-group format instead of a usability test because our sketches were not interactive and because we wanted a wider sampling of participants than we could get in a usability test. Using a focus-group format, we were able to get feedback from about 40 nurses in one day. Specifically, we learned that:
As a result of the focus groups, we changed the design. Next, the design team rated the frequency and urgency of all product features and functions to determine which should be surface-level controls and which should be deeper software controls. We did not involve users during this phase, but we believe that doing so would have helped us complete this exercise more effectively. The marketing team told us that patients use 15% of the machine's features 85% of the time, so it was clear to us which features were critical to bring to the surface. However, we did not know what to do with the remaining 85% of the features and could have used input from users rather than relying only on the engineering team's input.
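The rating exercise can be sketched as a simple triage. The article does not say how the two ratings were combined, so this sketch assumes hypothetical 1-to-5 scales, a score equal to frequency × urgency, and a hypothetical cutoff of 12; the feature names are placeholders, not features of the actual machine.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    frequency: int  # hypothetical 1-5 rating: how often the feature is used
    urgency: int    # hypothetical 1-5 rating: how time-critical access is


def score(feature: Feature) -> int:
    # Assumed combining rule: multiply the two ratings.
    return feature.frequency * feature.urgency


def triage(features, threshold=12):
    """Split features into surface-level controls and deeper software controls.

    Features are ranked by score (highest first); those at or above the
    hypothetical threshold become surface-level controls.
    """
    surface, deeper = [], []
    for feature in sorted(features, key=score, reverse=True):
        target = surface if score(feature) >= threshold else deeper
        target.append(feature.name)
    return surface, deeper


if __name__ == "__main__":
    features = [
        Feature("Feature A", frequency=5, urgency=4),
        Feature("Feature B", frequency=2, urgency=2),
        Feature("Feature C", frequency=3, urgency=4),
    ]
    surface, deeper = triage(features)
    print("Surface-level controls:", surface)
    print("Deeper software controls:", deeper)
```

A fixed cutoff makes the easy cases obvious, but the features that fall just below it are exactly the ones where, as noted above, input from users would have been most useful.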
After another round of design, we conducted two additional focus groups with nurses and patients. The purpose of these focus groups was to help the design team select two out of five industrial design models and to select a preferred interaction style for the user interface. In future focus groups, we recommend that companies concentrate on either the user interface or the industrial design, not both. We were unable to fully resolve user interface issues because we spent a considerable amount of time discussing the industrial design models. We were lucky that all of our participants preferred the same user interface concept. Next time, we plan to separate the research.
The last phase of our project involved completing the user interface design, prototyping an interactive model using Macromedia Director, and conducting a traditional eight-person usability test in our laboratory. We prepared a detailed test plan that described participant recruiting criteria, task scenarios, and data analysis methods. We conducted the test sessions using a standard protocol, recorded task times, noted participant comments, and administered qualitative rating questionnaires to collect individual design preferences. To present our findings, we prepared a detailed report that included data analysis charts and illustrated design change recommendations.
Overall, our user-centered approach worked well in helping us create a final design concept. By combining the artistic talent of our user interface designers with user research procedures, we were able to build an artful yet usable product. We expect the product to be tested by users in the other key international markets where it will be introduced.
One of the more formal and structured types of usability testing that third-party vendors conduct is competitive testing. In a competitive test, a client wants to compare the usability of its product against its market competition. Often, the client plans to use the test results to promote the product in advertising and marketing literature.
The most important requirements of a comparison test are (a) that it be valid and (b) that its method not be biased toward any of the tested products. Freedom from bias covers not only real sources of bias but also any hint of apparent bias. For example, if the test participants were chosen from a list of one manufacturer's customers, the test findings would not be credible.
To ensure the credibility of a comparison test, we need to deal with three sources of bias: (1) participant selection bias, (2) task selection bias, and (3) test procedure bias. To avoid participant selection bias, we hire a neutral firm to recruit participants or recruit them ourselves. To avoid task selection bias, we insist on controlling the selection of tasks. In cases in which we do not have the specialized knowledge to select tasks, we hire an industry consultant to do the selection. To avoid procedural bias, we take a hands-off approach to interacting with the test participants. We are very careful about what we say during each task. We avoid praising participants when they complete a task successfully or when they make an evaluative statement about one of the products.
We recently completed a comparison test of a software client's database query tool against two of its competitors. For the test, we recruited 24 participants who came to our lab and used all three of the competing products. Twenty-four participants were more than we would use for a diagnostic usability test, but for a competitive test we needed enough statistical power to detect a difference, if there was one. Because three products can be presented in six different orders (3 × 2 × 1), we needed at least six participants to counterbalance the order of the products, so our final number of participants needed to be a multiple of six. We settled on 24 after calculating the power we would need to detect a reasonably large difference in usability between the products.
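One straightforward way to counterbalance three products is to use every one of the six possible presentation orders equally often. The sketch below does that; the product labels are placeholders, and the article does not describe the exact assignment scheme used in the study.

```python
from itertools import permutations


def counterbalanced_orders(products, n_participants):
    """Assign each participant one presentation order of the products.

    Every permutation of the products is used equally often, so
    n_participants must be a multiple of the number of permutations
    (6 for three products).
    """
    orders = list(permutations(products))
    if n_participants % len(orders) != 0:
        raise ValueError(
            f"{n_participants} participants cannot be spread evenly "
            f"over {len(orders)} orders"
        )
    # Cycle through the orders so each one appears
    # n_participants / len(orders) times.
    return [orders[i % len(orders)] for i in range(n_participants)]


if __name__ == "__main__":
    products = ["Client", "Competitor 1", "Competitor 2"]  # placeholder labels
    schedule = counterbalanced_orders(products, 24)
    for participant, order in enumerate(schedule, start=1):
        print(f"Participant {participant:2d}: " + " -> ".join(order))
```

With 24 participants, each of the six orders is used four times, so each product appears in each position (first, second, third) for eight participants. Any participant count that is not a multiple of six raises an error rather than silently unbalancing the design.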
All of the participants used query tools in their jobs, but none had used any of the three products in the test. We hired an industry consultant to help us select the tasks. The consultant did not know who funded the study, or even that it was a comparison test. We simply asked him to check our selection of tasks for bias toward any of the common query software products and to confirm that they were typical of the tasks done with query tools. We translated the tasks into a set of task scenarios that did not use terms peculiar to any of the products. Each of the database query products used the same database of test data.
The participants attempted the same eight tasks with each of the products. We set a time limit for each task to ensure that the participants attempted all of the tasks and had the same amount of time to complete each task. We did not tell the participants when they completed a task successfully. When we used encouragement to keep the participants working, we did so only between tasks or during breaks in the test. After using each product, participants rated its ease of use. At the end of each test, we asked the participants to tell us which product they preferred overall. During the sessions, we recorded task times, errors, and comments from the participants.
When the sessions were completed, we tabulated the data and tested for significant differences, using analysis of variance on mean task times and average ratings. The results showed that 20 of the 24 participants preferred our client's product and that the product had the fastest task times in seven of the eight tasks. Our client has used the results of this test in advertisements promoting its product as more usable than its competitors' products. The precautions we took in the test design made the results of the test credible.
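The article says only that analysis of variance was used. Because every participant used all three products, a natural choice is a one-way repeated-measures ANOVA, sketched below under the assumptions of complete data and no sphericity correction; converting F to a p-value needs the F distribution (for example, SciPy's `scipy.stats.f.sf`), which this stdlib-only sketch leaves out. The exact binomial check on the preference count is our own illustration, not an analysis the study reports: it asks how likely 20 or more of 24 participants would be to pick one product if each picked among three at random.

```python
from math import comb


def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA (textbook sums-of-squares form).

    data[i][j] is participant i's score (e.g., mean task time) on product j.
    Returns (F, df_products, df_error). Assumes complete data and a nonzero
    error term.
    """
    n = len(data)     # participants
    k = len(data[0])  # products
    grand = sum(sum(row) for row in data) / (n * k)
    product_means = [sum(row[j] for row in data) / n for j in range(k)]
    participant_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_products = n * sum((m - grand) ** 2 for m in product_means)
    # Removing between-participant variation is what makes this a
    # within-subjects (repeated-measures) analysis.
    ss_participants = k * sum((m - grand) ** 2 for m in participant_means)
    ss_error = ss_total - ss_products - ss_participants

    df_products = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_products / df_products) / (ss_error / df_error)
    return f, df_products, df_error


def binomial_tail(successes, trials, p):
    """Exact one-sided probability P(X >= successes), X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, x) * p**x * (1 - p) ** (trials - x)
        for x in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Chance that 20 or more of 24 participants prefer one of three
    # products if preferences were random (p = 1/3).
    print(f"P(X >= 20) = {binomial_tail(20, 24, 1 / 3):.2e}")
```

For the preference count, the tail probability is about 6.6 × 10^-7, so a 20-of-24 split is very unlikely to arise by chance alone under that random-choice assumption.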
As a third-party vendor, we have learned to be flexible within the constraints imposed by the need to conduct valid tests. We have learned to adapt our techniques to fit the philosophy of the organizations that hire us. Some organizations want only quick tests with a few participants, while others, especially those with their own usability groups, require careful test design. In both cases, companies' emphasis on usability keeps the demand for usability services growing.
©1999 ACM 0002-0782/99/0500 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.