Who Should Test Whom?

Examining the use and abuse of personality tests in software engineering.

Posted Jan 1 2007


The construction of software engineering teams, the interaction between members, and how individual personalities influence these have been a concern from the 1960s to the present day [5]. Nevertheless, despite claims from leading figures in the field that it is fundamentally people that make the difference between software success and failure, a corpus of knowledge and good practice has failed to emerge. While there have been some attempts to investigate these issues through the application of psychometric tests, the issue of what personality analysis can or cannot offer software engineering is still open for debate [6, 9]. In this article we argue that the lack of progress in this field is due in part to the inappropriate use of psychological tests, frequently coupled with basic misunderstandings of personality theory by those who use them. To support this case we will present our analysis of papers that focus on the empirical use of personality tests in a software engineering context. Our analysis is supported by the expertise of the first author, who is both a chartered psychologist and a trained administrator qualified in the use of MBTI and 16PF psychometric tests. We conclude with a set of recommendations for test application and use for researchers, participants, and readers.

Analyzing Personality in Software Engineering Research

We surveyed papers published in the software engineering field relevant to the topic of personality testing, using digital libraries. This process generated 40 papers published between 1984 and 2004. From this pool 13 distinct papers were identified that focused on the empirical use of personality tests in a software engineering context: this subset is used to illustrate our arguments (osiris.sunderland.ac.uk/~cs0hed/CACMdata provides access to the full data set). Our analysis of these papers concentrates on examining test selection to identify whether reliable and valid instruments have been used, whether the test chosen is appropriate for the purpose, and the extent to which the personality testing process used is explicitly reported and discussed. It is our contention that as a minimum a paper must account for these issues if there is to be any confidence in the resultant data analysis. The majority of the papers surveyed (25 out of 40) focused on the Myers-Briggs Type Indicator (MBTI); we will therefore confine our discussion to this tool. The MBTI classifies personality in terms of people’s preferred ways of operating in the world. It categorizes individuals into one of 16 personality types. These types are derived from people’s expressed preferences on four functions: (E)xtroversion vs. (I)ntroversion, (S)ensing vs. i(N)tuition, (T)hinking vs. (F)eeling, and (J)udging vs. (P)erceiving [2]. Each type has a number of positive features: there is no ideal type; all are equally valued. The eight functions and their focus are summarized in the table here.
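As a small illustration of how four two-way preferences yield 16 types, the following sketch enumerates the type codes. It assumes the conventional one-letter codes, with N standing for Intuition because I is already taken by Introversion; the names `DICHOTOMIES` and `all_types` are ours and are not part of any MBTI material.

```python
from itertools import product

# The four preference pairs, each given as its two one-letter poles.
# N denotes Intuition, since I is used for Introversion.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def all_types():
    """Return every four-letter type code, taking one letter per pair."""
    return ["".join(letters) for letters in product(*DICHOTOMIES)]


types = all_types()
print(len(types))   # 16
print(types[:4])    # ['ESTJ', 'ESTP', 'ESFJ', 'ESFP']
```

The code only shows the combinatorics (2 x 2 x 2 x 2 = 16); it says nothing about how a questionnaire assigns a respondent to a type.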

What Can Personality Testing Offer Software Engineering? Dispelling the Myths

In general, the MBTI has been used within software engineering research in one of two ways: to discover the personality type(s) that most typify good software engineers, or to identify the makeup of software development teams that are likely to work well together, or exhibit tensions.

Capretz [3] studied the MBTI types of 100 software engineers and found the largest type was ISTJ. While Capretz acknowledges there is no link between type and performance, and that other factors have a bearing on career choice, he goes on to state that these findings are important for employers looking for software engineering professionals. More recently, the MBTI was used to investigate the link between personality type and a code review task with a sample of 64 students [4]. In this study those with an NT (Intuition-Thinking) preference were seen to perform the task better than other types: the largest single type was ENTP. The authors expressed surprise as their results conflicted with those of Capretz. However, these findings do not tell us a lot about the ideal or even adequate software engineer, given the fact that type is not normally distributed in the population. As Kerth et al. [9] correctly point out: personality tests cannot identify good software engineers over bad, nor can their results predict “on the job” performance; there is some evidence of the importance of other factors, such as work experience [11]. Where researchers wish to identify the personality factors related to software engineering, or those factors that typify a group of exceptional software engineers, a more appropriate approach would be to use a trait-based instrument (such as the 16PF) where comparisons to a normative sample can be made. The main barrier to this approach would be choosing, or most probably creating, a representative normative sample for comparison.

The relevance of the MBTI to identify the makeup of software development teams has been limited, in that observable behavior is not always related to the underlying type. People can, and do, choose to operate in the non-preferred mode as situations dictate. The MBTI is a tool for the development of self-awareness and, when results are shared, awareness of others. Knowledge of personality type within a team allows people to expect others to react differently from themselves and equips them to cope more constructively with those differences. As such, the MBTI can be used to improve teamwork with the hoped-for byproduct of improved productivity and quality, as long as the test is used properly.

Will the Real MBTI Please Stand Up?

The value of any psychometric instrument is directly related to the techniques used during construction; not all psychometric tests are created equal. Test publishers describe the precise methods of test development, in particular, statistical data relating to test reliability and validity. They do this because to ignore such issues would render a test worthless: a poor test will yield poor results. However, the importance of this appears to be lost on many of those who use such tests. Unfortunately, the casual reader of many of the articles discussed here would not see the significance of this point, or its likely bearing on the validity of the research, because in the majority of cases details of the specific tests used are glossed over, and in some cases, misrepresented. Even when researchers have used the real MBTI, for example [1, 11], details of the administration process are missing.

Karn and Cowling’s [8] study of the interactions of personality types during software development claims to have used the MBTI to identify the individual personality types of two teams of student software developers. In fact, the MBTI was not used: a later technical report [7] reveals that a freely available test was used (www.humanmetrics.com). While it is claimed that there are “no significant statistical differences between this test and the MBTI” [7], the argument is not convincing. On inspection of www.humanmetrics.com, no data is provided on the methods of test construction, no reliability or validity data, and no MBTI correlation data. Moreover, in our opinion, the content and style of the site itself is hardly indicative of a professional organization: no surface contact details are provided and there is no firm evidence of credentials. The site offers an interesting range of other free tests including “find your perfect partner” (perhaps the basis for a new slant on the concept of “pair programming”?). Although this site might offer some amusement, the potential effects on the subject group are not so lighthearted. A critical part of the administration process is gaining client acceptance and willingness to answer honestly. The testing environment in this case can in no way have guaranteed that the subjects will have taken the process seriously.

Miller and Yin [10] discuss the use of the MBTI in the construction of software inspection teams. They claim to use the MBTI within this study, but then comment that they use the “standard approach of online specialized questionnaires.” We were interested to discover precisely which version of the MBTI had been used within this study and through personal communication with the authors it was established that rather than using the full MBTI, they had in fact constructed their own test: no details of test construction and validity were provided.

You might ask: So what? Well, the development of a robust personality measure is a time-consuming, iterative process that can take many years, not least because personality measures present particular problems during construction. For a test to be of any value it must be both reliable and valid. Test reliability is the extent to which a test is consistent within itself, and over time. That is, the degree to which a test will give the same score or personality type for an individual on retesting. Test validity is the extent to which a test measures what it is intended to measure. Reputable tests such as the MBTI provide statistical data on these factors and details of the methods and samples used to gather this data. To ignore these factors when choosing a test will increase the possibility of acquiring misleading data.
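To make "consistent over time" concrete, here is a minimal sketch of one common way to quantify test-retest reliability: the Pearson correlation between scores from two administrations of the same test to the same people. The article does not prescribe a particular statistic, so the choice of Pearson's r is our assumption, and the scores below are hypothetical values invented for illustration.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long score lists with at least 2 pairs")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_a * var_b)


# Hypothetical scale scores for five respondents tested on two occasions.
time_1 = [12, 18, 25, 31, 40]
time_2 = [14, 17, 27, 30, 38]
print(round(pearson_r(time_1, time_2), 3))
```

A coefficient close to 1 indicates that respondents keep roughly the same relative standing on retesting. It is evidence of reliability only; validity has to be established separately.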

In addition, respondents may attempt to distort their profiles; for example, by responding to items in ways they believe will create a favorable impression. Care must be taken therefore during item development to limit the insight a respondent may have, and to ensure that one pole of the preference does not appear more appealing or acceptable than the other. Standardized tests such as the MBTI are developed in a way that will limit the effects of such response sets. However, this peace of mind comes at a price: tests such as the MBTI can be relatively expensive to purchase. Freely available tests generally do not provide data on reliability and validity. Nor do they offer an insight into the test construction process, nor comment on the possible effects of response sets and how the test design limits these effects. Taken together, the two issues of the lack of detail provided on test construction, and the absence of validity and reliability data, severely limit our ability to trust the results of such tests. A test is worthless if we cannot be sure that it measures what it is supposed to measure, and its results are consistent over time.

Test Administration and Feedback

All tests, including the MBTI, have a degree of error in their accuracy, and this error may be amplified by external factors, the most potent being the administration process. Therefore, all standardized tests provide administration procedures and it is important that these procedures are followed. Administration is not a simple process of issuing instructions and asking people to complete question booklets—it involves a degree of skill to ensure that the need for standardization is met and that clients are at ease and have a good understanding of personality theory and the underlying assumptions of the instrument in question. Test publishers are fully aware of this and consequently the purchase and use of standardized tests is restricted to qualified users who have undergone specific training.

In the case of personality assessment an important aspect of the process is providing feedback to clients. With the MBTI, feedback is an absolute necessity, as it involves type verification and the process of “best fit.” The MBTI questionnaire provides an indication or estimate of an individual’s personality type (the “reported type”). Type verification occurs during a client feedback session; in some instances type verification cannot be resolved in a single session.

The feedback process involves the test administrator explaining the history and aims of the MBTI, along with a description of the four functions; the client then self-assesses their type. This is done through a process of open discussion during which the test administrator asks questions that will facilitate reflection. It is essential that this process takes place as there may be discrepancies between an individual’s reported type and their self-assessment of their type. A number of studies have investigated discrepancies between reported and self-assessed type. Generally speaking, disagreements with reported type occur more frequently in dichotomies where the expressed clarity index is weak. For example, Walck [12] found that out of a sample of 256 people 75% agreed on all four letters and 97% agreed on three out of the four. The clarity index is a score that represents how sure the respondent is that he or she prefers one dimension over the other. It also provides an indication of those preferences where the client may not agree with their reported type. However, it does not, as Hardiman (in response to [9]) suggests, provide an indication that the respondent has a degree of preference ranging on a continuum. The process of helping a client reach “best fit” can only be done by a qualified test administrator. Untrained individuals might bias this process through inappropriate or leading questions, their own misunderstandings of the MBTI and Jungian theory, or through the influences of their own personality type. If this process is not undertaken we cannot be sure the respondent would agree with the reported type. This is an important point since the clarity index data reported in some papers, for example [8], suggests the reported type for some respondents on some dimensions is extremely weak, whereas Bradley and Hebert [1] have examples of equal clarity indices for a pair of functions, but no discussion of how the choice for one over the other was made (for example, N rather than S).
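The kind of agreement figure quoted from Walck [12] can be computed by comparing each respondent's reported type with their self-assessed type, letter by letter. The sketch below is only an illustration: the function names are ours, the sample pairs are hypothetical and not taken from any cited study, and "three out of four" is read here as "at least three", which is our interpretation.

```python
def matching_letters(reported, verified):
    """Count the positions where two four-letter type codes agree."""
    if len(reported) != 4 or len(verified) != 4:
        raise ValueError("type codes must have exactly four letters")
    return sum(r == v for r, v in zip(reported, verified))


def agreement_summary(pairs):
    """Share of respondents agreeing on all four letters, and on at least three."""
    if not pairs:
        raise ValueError("need at least one (reported, verified) pair")
    counts = [matching_letters(r, v) for r, v in pairs]
    total = len(counts)
    return {
        "all_four": sum(c == 4 for c in counts) / total,
        "at_least_three": sum(c >= 3 for c in counts) / total,
    }


# Hypothetical (reported, self-assessed) pairs, invented for illustration.
sample = [("ISTJ", "ISTJ"), ("ENTP", "ENFP"), ("INTJ", "INTJ"), ("ESFJ", "ISFP")]
print(agreement_summary(sample))  # {'all_four': 0.5, 'at_least_three': 0.75}
```

A study that publishes only reported types cannot produce this comparison at all, which is why it matters whether the types in a paper are reported or verified.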

None of the papers reviewed discussed the administration and feedback process in any detail. Therefore, even if we discount the problems of using inappropriate tests, the basic concept of the individual being involved in the process of “best fit” has been ignored, and we cannot be certain about the accuracy of the types identified in the research papers. These aspects may not have been neglected but, if they were carried out, they were not seen as worthy of discussion (despite being central to the effective and acceptable use of the MBTI).

Conclusion

Our analysis of what is required for effective use of psychometric tests leads us to the following recommendations aimed at potential participants, researchers, and readers.

We recommend to a potential participant (whether work or research related) that you ask the following of your tester:

1. What test is to be used? Press for the specific test and its version; a qualified tester will be able to be precise.

2. Is it a recognized and validated test? If it is not, either politely refuse or, if you agree to be involved, be aware of the limitations of the test and its results.

3. Are the testers fully trained and qualified to administer the process? If this question causes bemusement then the answer will be “no.”

4. How will the process be administered? A valid approach will be relatively time consuming since for each individual there will be pre-test and post-test discussions in addition to the test time.

To a researcher whose aim is to investigate the impact of personality in a software team we suggest the following:

1. Become qualified, or team up with a qualified tester, and use standardized tests: use the flowchart provided in the figure here to help identify the appropriate test, and process, for your work.

2. Ensure in publications that you are clear about the tests and the process used and the following information is provided: the test used, the administration process, who the qualified testers were, how feedback was given, whether the types in the paper are reported types or verified types (derived after feedback).

To a reader of such articles we suggest the following:

1. Look for explicit details of the types of test used, the administration process, and the qualifications of the testers.

2. Don’t assume that a paper is flawless because it has been published in a prestigious journal.

Finally, people are entitled to develop questionnaires of their own to test out their hypotheses, gather data, and report results. These approaches are not invalid or necessarily suspect and therefore we are not warning against such work. However, those who claim to be using personality testing are making claims of authenticity and validity that are often not warranted. Software engineers often complain about those who, in the course of their work, do some programming in support of their professional activities: the claim being that such individuals are not professionals and do not understand the discipline. The same can be said of those who adopt psychological approaches without the relevant qualifications and background.

Figures

Figure. Guidelines for the testing process.

Tables

Table. The MBTI functions and their focus.


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI

10.1145/1188913.1188919

January 2007 Issue

Published: January 1, 2007

Vol. 50 No. 1

Pages: 66-71
