Magic numbers are strictly hocus-pocus, so usability studies must test many more subjects than is usually assumed.
Erratum #1
In the geometric series formula on p.65 the n is an exponent:
D = 1 - (1 - p)^n
The author
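As a rough illustration of the corrected formula, the sketch below evaluates D for a given per-session detection probability p and number of sessions n, and inverts it to find the smallest n that reaches a target discovery rate. The function names and the example values of p are hypothetical, chosen only for illustration. The model also assumes every problem shares the same p, which the discussion of overdispersion in these comments suggests is optimistic, so the resulting session counts are probably best read as lower bounds.

```python
import math


def discovery_rate(p: float, n: int) -> float:
    """Expected share of problems seen at least once: D = 1 - (1 - p)^n.

    Assumes a single detection probability p shared by all problems
    (an illustrative simplification, not the article's full model).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def sessions_needed(p: float, target: float) -> int:
    """Smallest n for which discovery_rate(p, n) >= target."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= target for n, then round up.
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    # Correct for floating-point rounding in either direction.
    while discovery_rate(p, n) < target:
        n += 1
    while n > 1 and discovery_rate(p, n - 1) >= target:
        n -= 1
    return n


if __name__ == "__main__":
    # Hypothetical detection probabilities, for illustration only.
    for p in (0.31, 0.10):
        print(p, round(discovery_rate(p, 5), 3), sessions_needed(p, 0.85))
```

With these assumed values, lowering p from 0.31 to 0.10 noticeably raises the number of sessions needed to reach the same discovery rate, which is why a fixed sample size cannot be expected to fit every study.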
As you pointed out on page 70, counting usability problems to determine sample size is a futile exercise.
The reason you gave is that there is little consensus on what constitutes a usability problem, so they can't be counted.
Another reason, which you did not mention, is that a website or other system that undergoes continual redesign never becomes free of opportunities to improve usability. A typical update fixes some usability problems while introducing new ones. You can't find 80% of a large, unknown number of problems.
Your comments about sample size apply fairly well to benchmarks or other summative studies. They fall short if applied to formative studies. Early-stage studies are not conducted to minimize the number of problems that users will experience but rather to choose the most usable design among a set of alternatives.
The experimenter should plan a usability study program that's within an approved budget commensurate with business priorities, then test until funds run out or nothing important has been learned lately.
Thank you for your valuable comments. In the particular context of iterative web design, I can imagine that strict management of sample size, as I describe it, isn't very likely. And I did say so!
But, please, do not confuse my approach with others' suggestions on sample size requirements for (summative) usability measures, such as completion rates or durations. They are just about statistical precision. In contrast, I am interested primarily in effectiveness: To what extent have we learned what's wrong with a design?
You're saying that another purpose of early testing is choosing the best of several major alternatives. Let's assume you test your early designs of a website with a moderate ten users. Figure 4 (green dotted line) shows that after ten sessions you haven't even seen half of what is wrong with your design. Would you see this as a solid basis for a strategic design decision? Wouldn't you at least like to know the level of confidence with which you make that decision?
And what if you're designing a medical infusion pump?
The following letter was published in the Letters to the Editor in the August 2012 CACM (http://cacm.acm.org/magazines/2012/8/153804).
--CACM Administrator
We wish to clarify and expand on several points raised by Martin Schmettow in his article "Sample Size in Usability Studies" (Apr. 2012) regarding sample-size calculation in usability engineering, emphasizing the challenges of calculating sample size for binomial-type studies and identifying promising methodologies for future investigation.
Schmettow interpreted "overdispersion" as an indication of the variability of the parameter p; that is, when n Bernoulli trials are correlated (dependent), the variance can be shown as np(1 - p)(1 + C), where C is the correlation parameter, and when C > 0 the result is overdispersion. When the Bernoulli trials are negatively correlated, or C < 0, [...]
M'Lan, C.E., Joseph, L., and Wolfson, D.B. Bayesian sample size determination for binomial proportions. Bayesian Analysis 3, 2 (Feb. 2008), 269–296.