Beta testing is an important phase of product development through which a sample of target users (potential adopters) try a product ahead of its official release. Such testing is practically ubiquitous; in every kind of company, from medicine to software development, participants test and troubleshoot products to help improve their performance and avoid defects.
Companies should care who their beta testers are. In order to generalize beta-testing outcomes, the population of testers must be as representative of the target users as possible. If not, results of the testing could be biased and fail to capture important product flaws; that is, beta testing for different purposes demands different sets of beta testers. Such aspects of beta testing are often underestimated by software developers; their companies often use any beta tester available, without proper selection, and later without analyzing to what extent the testers were comparable to the population of (targeted) users. Though it was once easier for companies to know their beta testers well,6 this is not the case today due to the vast reach of the Internet and quickening pace of releasing updates and new versions. They should indeed pay more attention to their testers, selecting wisely, because appropriate beta testing is more efficient and economical than later potential failure of a new product.
The costs of poor beta testing were apparent from the beginning of software development. An early example, from the 1990s, is a software company that chose only one site for its testing.6 Based on the results, developers then made several changes to the product being developed. Since the beta testers represented only a specific subpopulation of intended users, the product became so customized it could not be marketed to other organizations. In another example, in 2012, the Goko company released a web portal for developing multiplayer games but used a pool of beta testers so small it did not notice a serious bug connected with site load as the portal became popular.22 Moreover, beta testing is not only about bug hunting; benefits also include enhanced product support and marketing.6
Here, we present case-study results of a comparison of beta testers and regular users for an online security product we conducted in 2015 and 2016. We analyzed the records of nearly 600,000 participants worldwide, aiming to determine whether the beta testers represented regular users well enough. Despite the fact that alpha testers are well described in the software-development literature, as far as we know, no larger field study comparing beta testers and regular users had ever been published. We thus present what we believe is the first public large-scale comparison of beta testers and regular users.
We investigated whether companies should be more selective about their beta testers or simply take an intuitive approach that says, "The more testers, the better the result." We did not aim to investigate goals or parameters and conditions of beta testing and began with three main research questions:
Do the subsamples have similar profiles with respect to hardware and operating system?;
Do the subsamples reflect similar age, gender, and education profiles? What about cultural background?; and
Do the subsamples see themselves as equally skilled regarding their use of computers and perceive their data as safe?
Here, we review published research in beta testing, describing methods and analyzing data, then map the results for the three questions, and finally discuss study limitations and contributions and actionable takeaways (see the sidebar "Actionable Takeaways" on page 44).
Testing represents 30% to 50% of the cost of software development1 and approximately 50% of development time.17 There are many testing phases, with the first usually involving alpha testers. The number of potential testers is limited by a company's size, and even for big companies, it is impossible to duplicate all possible hardware/software configurations. Whereas alpha testers are typically company employees, beta testers are the first product users outside the company, and their feedback can greatly influence product design before the product can be used by paying customers.
Tapping the universal scope of the Internet, thousands of beta testers with different devices and practices can report on a product before its official release. An additional benefit follows from being able to include test subjects in multiple countries. Since beta participants can come from many different locations, potential localization issues (such as language, currency, culture, and local standards) can be identified and included.22 Moreover, cultural context also affects a new product's perceived usability.24 Beta testers thus bring huge benefits to the development process by detecting potential hardware conflicts and performing usability checking.
While many alpha- and beta-testing studies have been published, the idea of comparing beta testers and regular users had only rarely been tackled when we began. For example, Mantyla et al.13 investigated the related question "Who tested my software?" but limited themselves to the employees of only three companies. Other studies11,14 yielded insight into the software-tester population yet were based mainly on specific subpopulations (such as people interested in testing, users of specialized forums, and LinkedIn participants) or company employees, so a selection bias could have occurred.
We compared beta testers and regular users in various aspects of software use and testing, starting with technology. Having similar devices with regard to, say, hardware and operating systems is a basic requirement for successful software beta testing. Since physical environment influences usability testing,20 the device used to test an application could also influence its usability. For example, security software running in background can decrease the perceived overall performance of the machine it is running on and thus its perceived usability. Participants with low-end hardware could encounter different usability issues compared to those with high-end hardware. Beta testers are viewed as problem solvers or early adopters19 with access to the most up-to-date computer hardware.
We also examined user demographics. Earlier research had reported that regular users' IT-related behavior is affected by gender, age, education, and cultural background. For example, Dunahee et al.8 found that a greater rate of computer use and online activity was associated with lower age, higher education, and being male.8 The differences in IT usage are also related to country of origin.16 Countries differ in terms of the state of their national information society culture, leading to varying access opportunities and creating digital disparities among nations.2,5 As a result, the populations of some nations could be more computer savvy and/or inclined to use free software, even while still in beta. For example, anecdotal evidence suggests Japanese users take up emerging technologies more slowly than users in other countries.15
Varying patterns of Internet/computer use are also associated with users' computer self-efficacy and attitudes concerning privacy. Computer self-efficacy4 reflects the extent to which users believe they are capable of working efficiently with a computer. Users with a greater confidence in their computer skills tend to use computers more,4 adopt new technology quicker,10,23 and achieve better performance in computer-related tasks.7 Regarding privacy perception, marketing research consistently shows how consumers' online behavior (such as willingness to provide personal information or intention to use online services) is affected by concern over privacy.12,21 Since beta testing usually includes sharing one's system settings, location, or even personal information with the testing company, it may discourage users with more strongly held privacy concerns or those who store more private data on their computers. However, discouraged potential testers could still be an important segment of the end-user population, with distinct expectations for the final product.
We conducted our study with ESET (https://www.eset.com), an online security software company with more than 100 million users in more than 200 countries and territories, using two samples for analyses: beta testers and regular users of a line of ESET security-software solutions for Windows.
The ESET beta program allowed anyone to download the beta version of a product and become a public beta tester. Despite the fact that users had to complete and return a questionnaire before they could beta test the product, ESET uses no special criteria when selecting its beta testers. ESET beta testers report bugs and/or suggest improvements, motivated by the opportunity to use a beta product for free, possibly sooner than regular users.
We collected our sample of beta testers (N = 87,896) from June 2015 to December 2015 and the sample of regular users (N = 536,275) from January 2016 to March 2016. We first collected anonymized system parameters for each ESET installation, including processor configuration, RAM size, operating system, country, and time spent on each installation screen. We identified countries through the GeoIP2 database (https://www.maxmind.com/en/geoip2-databases). A single data record represented a single installation of the software.
We gave a questionnaire to users at the end of the installation process, saying that filling it out was voluntary; we offered no incentives other than to say that completing it will help ESET improve its products. A total of 6,008 beta testers completed at least one questionnaire item (7.80%), along with 27,751 regular users (5.56%). The questionnaire was in English, and we collected no identification data. The questionnaire was also a source for collecting demographic data and privacy perceptions.
Data cleaning. We cleaned the data to remove tester and user information associated with ESET's internal IP space domain (0.282% of the sample), ensuring we would exclude ESET's own alpha testers. Moreover, since each data entry reflected only a single installation, duplicate entries could potentially have come from the same device. To inhibit bias, we identified cases with the same combination of hardware specification and IP address, randomly selected one, and deleted the rest, thus removing 7.429% of beta tester and regular user data.
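The deduplication step described above can be sketched roughly as follows. This is a minimal illustration rather than the pipeline actually used in the study: the record layout and the field names (`cpu`, `ram_mb`, `platform`, `ip`) are hypothetical, and the fixed random seed is an assumption added only to make the example repeatable.

```python
import random
from collections import defaultdict


def deduplicate(records, seed=0):
    """Keep one randomly chosen record per (hardware spec, IP address) group."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    groups = defaultdict(list)
    for rec in records:
        # Hypothetical key: hardware specification combined with IP address.
        key = (rec["cpu"], rec["ram_mb"], rec["platform"], rec["ip"])
        groups[key].append(rec)
    # One random survivor per group; the remaining duplicates are dropped.
    return [rng.choice(group) for group in groups.values()]


records = [
    {"id": 1, "cpu": "CPU-A", "ram_mb": 4096, "platform": "64-bit", "ip": "192.0.2.1"},
    {"id": 2, "cpu": "CPU-A", "ram_mb": 4096, "platform": "64-bit", "ip": "192.0.2.1"},
    {"id": 3, "cpu": "CPU-B", "ram_mb": 2048, "platform": "32-bit", "ip": "192.0.2.1"},
]
unique = deduplicate(records)
kept_ids = {rec["id"] for rec in unique}
print(len(unique))
```

Records 1 and 2 share both hardware and IP address, so only one of them survives, while record 3 is kept because its hardware differs.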
We presented the whole questionnaire on four screens, using the time testers and users spent on each screen to clean the data; we considered as invalid testers and users who spent less than six seconds on a screen with two items and those who spent less than seven seconds on a screen with three items, omitting their data from our analyses; N = 10,151, or 30.1%, of questionnaire respondents.
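As an illustration of the timing rule above, the following sketch flags a questionnaire as invalid when any screen was answered faster than the stated minimum. The data layout (a list of item-count and seconds pairs) and the choice to reject a respondent on a single too-fast screen are assumptions made for the example; the text does not specify these details.

```python
# Minimum plausible time per screen, keyed by the number of items on it
# (thresholds taken from the text: 6 s for two items, 7 s for three items).
MIN_SECONDS = {2: 6.0, 3: 7.0}


def is_valid_response(screens):
    """screens: list of (item_count, seconds_spent) tuples, one per screen."""
    for item_count, seconds in screens:
        threshold = MIN_SECONDS.get(item_count)
        # Assumption: a single too-fast screen invalidates the respondent.
        if threshold is not None and seconds < threshold:
            return False
    return True


careful = [(2, 9.5), (3, 12.0), (2, 6.0), (3, 7.0)]
rushed = [(2, 9.5), (3, 12.0), (2, 3.1), (3, 8.0)]
print(is_valid_response(careful), is_valid_response(rushed))
```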
The final cleaned sample for the study thus included 576,170 installations on unique devices, including 29,598 questionnaires with at least one answered item (see Table 1).
Analytical strategy. We used the χ² test (categorical data) and t-tests (interval data) to assess the differences between beta testers and regular users. Analyses on large samples typically show statistically significant results even for very small effects, so it is important to interpret effect size rather than significance alone. We thus calculated Cramér's V (φc) for categorical data and Cohen's d for interval data. For φc, a value of 0.1 is considered a small, 0.3 a medium, and 0.5 a large effect size; for d, the respective values are 0.2, 0.5, and 0.8.3,9
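To make the link between the test statistic and the effect size concrete, the sketch below computes Cramér's V from a chi-square value using the standard formula V = sqrt(χ² / (N · (min(r, c) − 1))). The input numbers are the statistics reported later in this article for CPU performance and for continents; the table shapes (two subsamples by four CPU groups, and by six continent groups as implied by five degrees of freedom) are inferred from the text rather than stated in it.

```python
import math


def cramers_v(chi2, n, rows, cols):
    """Cramér's V for an r x c contingency table, given its chi-square statistic."""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))


# Reported values: chi-square statistic and N; rows = 2 subsamples.
cpu_v = cramers_v(1187.546, 576_170, rows=2, cols=4)         # four CPU groups
continent_v = cramers_v(39_049.72, 573_538, rows=2, cols=6)  # six continent groups

print(round(cpu_v, 3), round(continent_v, 3))  # 0.045 0.261
```

Both values agree with the reported effect sizes, which shows how a very large χ² can still correspond to a small effect when N is in the hundreds of thousands.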
The fact that our questionnaire data came from only a subsample of users could suggest possible bias in our results (see the section on study limitations). For insight into the differences between the samples with and without the questionnaire, we compared users with regard to the parameters available for them all, including platform information, CPU performance, RAM, and OS version. We found the effect of the differences to be negligible (φc < 0.034). We are thus confident the questionnaire data was valid and informative, despite having been obtained from only a small subsample of users.
We first looked at the technology, including hardware platform (32 bit or 64 bit), CPU model, RAM size, and OS version.
Hardware. The platforms running ESET software differed only slightly between subsamples; 35.3% of beta testers and approximately 34.5% of regular users used 32-bit systems, with χ²(1) = 20.998, φc = -0.006, p < 0.001, and N = 576,170.
We categorized CPU performance into four groups (low-end, mid-low, mid-high, and high-end) based on the PassMark CPU Mark criterion.18 We matched CPU name against the PassMark online database. Since CPU names are not standardized, we were unable to assign the score in 3.040% of the cases; NnoCPUMark = 17,514, distributed proportionally among beta testers and regular users.
The beta testers were more represented in the low-performance category and regular users in the mid-high category. The proportions were notably similar in the mid-low and high-end categories (see Figure 1). Although statistically significant, the effect was small, with χ²(3) = 1187.546, φc = 0.045, p < 0.001, and N = 576,170.
We likewise grouped RAM size into four categories: 0GB-2GB, 2GB-4GB, 4GB-8GB, and >8GB. Regular users' proportion was greater in the 2GB-4GB category, while beta testers dominated in the lowest, or 0GB-2GB, category. The proportions in the two largest-size categories were similar, as in Figure 1. The small size of the effect suggested differences were negligible, despite being significant, with χ²(3) = 206.926, φc = 0.019, p < 0.001, and N = 576,170.
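A grouping of this kind can be expressed as a simple binning function, sketched below. How values that fall exactly on a boundary (for example, exactly 4GB) were assigned is not stated in the text; treating each upper bound as inclusive is an assumption of this example.

```python
import bisect

# Upper bounds (in GB) of the first three categories; the last one is open-ended.
UPPER_BOUNDS_GB = [2, 4, 8]
LABELS = ["0GB-2GB", "2GB-4GB", "4GB-8GB", ">8GB"]


def ram_category(ram_gb):
    """Map a RAM size in GB to its category (upper bounds assumed inclusive)."""
    return LABELS[bisect.bisect_left(UPPER_BOUNDS_GB, ram_gb)]


for size in (1, 2, 3.5, 8, 16):
    print(size, ram_category(size))
```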
Operating system. Beta testers predominated in the two most current OS versions at the time (Windows 8 and Windows 10), while regular users predominated in Windows 7, with nearly equal representation using Windows Vista and XP (see Figure 2). The size of the effect was again small at χ²(2) = 1,925.745, φc = 0.058, p < 0.001, and N = 575,979. Other Windows versions (such as Windows 98 and Windows 2000) were also marginally represented but omitted due to the extremely low counts; that is, <0.001%, NotherWinVersions = 191. Note the study targeted only users of Microsoft Windows software.
We found that Windows 10 was more often used by beta testers than by regular users, even though we collected their data sooner; regular users in the survey thus had more time to upgrade, indicating beta testers are often recruited from among early adopters.19
Specific configurations. We were also interested in specific configurations of users' devices. We combined all four technological aspects, including OS platform, CPU performance, RAM size, and OS, to help us identify 116 unique hardware+software combinations in the dataset of 43,519, or 7.556% of the total sample. The sample of regular users included 114 combinations and the sample of beta testers 102; we found two specific combinations among beta testers' devices that were not found among regular users. However, the combinations not present among beta testers were only marginally present among regular users at NonlyStandard = 52, or 0.010%, leading us to conclude that for almost every regular user in the sample, there was a beta tester in the sample with the same combination of examined parameters.
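The coverage check described above amounts to a set comparison between the configurations seen in each subsample. The following sketch shows one way to compute it; the tuple layout (platform, CPU group, RAM group, OS) and the toy data are hypothetical and stand in for the study's real records.

```python
from collections import Counter


def uncovered_configurations(beta_configs, regular_configs):
    """Return configurations seen only among regular users and their user share."""
    beta_set = set(beta_configs)
    regular_counts = Counter(regular_configs)
    uncovered = {cfg: n for cfg, n in regular_counts.items() if cfg not in beta_set}
    # Share of regular users with no beta tester on the same configuration.
    share = sum(uncovered.values()) / len(regular_configs)
    return uncovered, share


# Hypothetical configurations: (platform, CPU group, RAM group, OS).
modern = ("64-bit", "high-end", ">8GB", "Windows 10")
older = ("32-bit", "low-end", "0GB-2GB", "Windows 7")
legacy = ("32-bit", "low-end", "0GB-2GB", "Windows XP")

beta = [modern, older]
regular = [modern, older, older, legacy]

uncovered, share = uncovered_configurations(beta, regular)
print(uncovered, share)
```

In this toy data, only the `legacy` configuration lacks a matching beta tester, and it accounts for one of four regular users.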
Here, we discuss participants' cultural and demographic profiles, focusing on country of origin, gender, age, and attained education.
Country of origin. As noted, we based a participant's country on the GeoIP2 database search, a procedure that failed to assign a country to 0.4% of participants (NnoCountry = 2,408). We grouped countries by continents and compared the two subsamples (see Figure 3), observing significant differences, notably that beta testers substantially predominated in South America and Europe, while regular users were more often based in Asia, Africa, and Australia/Oceania, with χ²(5) = 39,049.72, φc = 0.261, p < 0.001, and N = 573,538; see Table 2 for detailed information regarding the study's most represented countries. Only Iran, India, Egypt, and the U.S. were represented in both subsamples.
ESET has subsequently begun to investigate these issues with respect to product localization and usability, where country differences likely play a role.
Gender and age. Figure 4 includes basic information regarding demography. In both subsamples, males represented the vast majority of study participants, though there were more females among regular users than among beta testers, with χ²(1) = 277.493, φc = 0.099, p < 0.001, and N = 28,328.
Regular users were on average older than beta testers, with Mbeta = 32.96, SD = 12.974; Mstandard = 35.74, SD = 16.327; t-test (25,938) = 11.108; p < 0.001; d = 0.195, and N = 28,940. Due to the wide range of ages among study participants (11 to 80), we categorized all ages into seven groups to analyze the differences in more informative ways, as in Figure 4. For example, there were significantly more beta testers than regular users between ages 21 and 50, while the opposite applied to other categories, with χ²(6) = 366.286, φc = 0.119, p < 0.001.
Education. Education attainment reflected a consistent pattern in both subsamples, with college being represented most and primary school least. The pattern was consistent even when we omitted the youngest users, or those who could not have yet reached higher education. Beta testers were more represented in secondary education than regular users, but the effect size was small, with χ²(6) = 237.085, φc = 0.038, p < 0.001, and N = 26,354.
Other demographic insights. We combined the demographic data of study participants to determine how well beta testers also represented various demographic segments of regular users. Combining seven categories of age, gender, and education helped us identify 42 unique combinations. Only two were present solely in the sample of regular users (and absent among beta testers), both female, ages 71 to 80, one with primary (Nstandard = 4), the other with college education (Nstandard = 109). The remaining combinations were present in both subsamples, with a fairly similar distribution. The greatest difference we found was among males, ages 31 to 40, with college education, who were represented more often among beta testers (14.172%) than among regular users (9.539%).
We assessed users' computer self-efficacy and privacy perceptions through dedicated questions in an optional questionnaire and also examined installation-related actions, such as displaying the target installation folder.
Each ESET software installation included an option for changing the installation folder. To reach the respective screen, beta testers and regular users had to click on the "change installation folder" link on one of the screens during the installation process. This action was also the only way a user could see the default installation folder, which was not otherwise displayed. Only a few participants did this, with beta testers visiting the screen more than twice as often as regular users (2.6% versus 1.1%). This difference was statistically significant, though the effect size was negligible, with χ²(1) = 1215.180, φc = 0.046, p < 0.001, and N = 576,170.
Computer self-efficacy and digital skills. We included two questions to help us assess users' digital skills:
Do you consider yourself a skilled computer user? Likert scale from 1 (not at all skilled) to 6 (extremely skilled); and
Regarding this computer, are you an IT technician? Y/N.
Participating beta testers were more often IT technicians, with χ²(1) = 285.988, φc = 0.110, p < 0.001, and N = 23,607, and judged themselves more skilled than regular users, at Mbeta = 4.46, SD = 1.313; Mstandard = 4.18, SD = 1.473; t-test (22,631) = -11.743; p < 0.001; d = 0.200; and N = 22,633.
Privacy perceptions. The last part of the questionnaire asked how private the data stored on users' computers is, how sensitive users are regarding their privacy, and whether users believe the computer is generally a safe device. We measured all items on a six-point Likert scale ranging from 1 (not at all) to 6 (extremely private/sensitive/safe).
Beta testers and regular users alike reported the same average level of private data in their computers, with Mbeta = 4.678, SD = 1.419; Mstandard = 4.690, SD = 1.560; t-test (24,323) = 0.504; p = 0.614, and N = 24,325, and both were quite similar in being privacy sensitive, with Mbeta = 4.755, SD = 1.376; Mstandard = 4.809, SD = 1.492; t-test (23,976) = 2.272; p < 0.05; d = 0.037, and N = 23,978. We found only one small difference in their evaluations of general computer safety: Beta testers considered computers slightly safer than did regular users, with Mbeta = 4.098, SD = 1.712; Mstandard = 3.902, SD = 1.819; t-test (23,832) = 6.784; p < 0.001; d = 0.111, and N = 23,834. We observed that beta testers consider themselves more skilled as IT users and the computer as a safer device than do regular users. This might suggest they were aware of security risks associated with computer use and felt capable of addressing them.
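For readers who want to relate the reported means and standard deviations to the Cohen's d values, the sketch below recomputes d for the self-rated skill and computer-safety items. The article does not state which pooled standard deviation was used, and per-group response counts for individual items are not reported, so the equal-weight pooling here (root mean square of the two SDs) is an assumption; it yields values close to, but not necessarily identical with, those reported.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with equal-weight pooling (root mean square of the two SDs)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs as reported above (beta testers first, regular users second).
skill_d = cohens_d(4.46, 1.313, 4.18, 1.473)     # self-rated computer skill
safety_d = cohens_d(4.098, 1.712, 3.902, 1.819)  # perceived computer safety

print(f"{skill_d:.2f} {safety_d:.2f}")
```

Both results fall at or below the conventional 0.2 threshold for a small effect, consistent with the interpretation given in the text.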
Some limitations beyond our control could have influenced these results. Despite our careful cleaning process, we could not be completely sure that each record corresponded to a unique participant/device. For example, the OS version was based on the Windows system variable "current version" that did not differentiate end user and server products. However, we assumed the number of servers in the study was negligible, as the installed base of ESET systems was, at the time, designed for end-user devices. We also lacked details of participants' devices, technological measures that might have shown more nuanced configuration discrepancies.
The relatively small ratio of users completing the questionnaire could also have represented other limitations. First, self-selection and non-response bias might have skewed our results. For example, most study participants reported at least some college education and could have thus been expected to be able to recognize the value of user feedback better and be more willing to complete a product-related questionnaire. However, they did not differ in terms of hardware or software from those skipping the questionnaire altogether. We had only a few options for validating participants' answers. Despite the thorough cleaning, some flawed questionnaire answers could have remained. Also, writing the questionnaire in English could have discouraged users not proficient in that language.
The datasets of participating beta testers and regular users included different numbers of participants and were collected at different times. This could have influenced the number of participants using, say, Windows 10, as the study was conducted during a free-upgrade period. Moreover, the research was based on only the English versions of the software, missing customers who prefer other languages.
Working with security-software firm ESET, we conducted a large-scale comparison between beta testers and regular users of ESET's main product. We focused on technological aspects, demographics, and self-reported computer self-efficacy in a sample of nearly 600,000 users.
The participating beta testers were early adopters of newer operating systems, and their distribution was significantly skewed toward the most current versions at the time, despite having limited time for Windows 10 migration. They also tended to be younger, more often male, and perceived themselves as more skilled with their computers and also more often IT technicians, supporting the "beta testers as geeks" stereotype. However, their hardware (platform, CPU performance, and RAM size) was similar to that of regular users, somewhat contradicting the popular image.
We found a striking difference in their countries of origin; from the top 10 most represented, only three appeared in both subsamples.
Overall, study beta testers represented regular users reasonably well, and we did not observe a regular-user segment that would be underrepresented among beta testers. ESET's approach of not filtering beta testers and "the more testers the better" followed by analyses of selected observed differences seems sufficient for developing its software products. For large international companies able to attract large numbers of beta testers, this may be the most efficient approach. However, for smaller, local, or less-well-established companies, this approach would probably not yield representative outcomes and could even shift development focus in a wrong direction.6
For more, including a video, see http://crcs.cz/papers/cacm2018
We thank Masaryk University (project MUNI/M/1052/2013) and Miroslav Bartosek for support and the anonymous reviewers and Vit Bukac for valuable feedback.
7. Downey, J.P. and Rainer Jr., R.K. Accurately determining self-efficacy for computer application domains: An empirical comparison of two methodologies. Journal of Organizational and End User Computing 21, 4 (2009), 21–40.
8. Dunahee, M., Lebo, H. et al. The World Internet Project International Report, Sixth Edition. University of Southern California Annenberg School Center for the Digital Future, Los Angeles, CA, 2016.
11. Kanij, T., Merkel, R., and Grundy, J. An empirical investigation of personality traits of software testers. In Proceedings of the IEEE/ACM Eighth International Workshop on Cooperative and Human Aspects of Software Engineering (Florence, Italy, May 18). IEEE, 2015, 1–7.
14. Merkel, R. and Kanij, T. Does the Individual Matter in Software Testing? Technical Report. Centre for Software Analysis and Testing, Swinburne University of Technology, Melbourne, Australia, May 2010;
17. Pan, J. Software testing. Dependable Embedded Systems 5 (Spring 1999); https://pdfs.semanticscholar.org/28ab/bfdcd695f6ffc18c5041f8208dcfc8810aaf.pdf
18. PassMark Software. CPU Benchmarks; https://www.cpubenchmark.net/
19. Perino, J. 6 Different Types of Betabound Testers: Which Are You? Sept. 11, 2014; http://www.betabound.com/6-types-beta-testers/
22. uTest, Inc. The Future of Beta Testing: 6 Tips for Better Beta Testing. White Paper. Southborough, MA, Sept. 2012; http://www.informationweek.com/pdf_whitepapers/approved/1376402531_uTest_Whitepaper_Beyond_Beta_Testing.pdf
24. Wallace, S. and Yu, H.-C. The effect of culture on usability: Comparing the perceptions and performance of Taiwanese and North American MP3 player users. Journal of Usability Studies 4, 3 (May 2009), 136–146.
Copyright held by authors. Publication rights licensed to ACM.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.