
Protocol Analysis: A Neglected Practice

Protocol analysis is an evidence-gathering technique with an established record in experimental psychology. Here, it is applied to study how programmers solve problems.
  1. Introduction
  2. Role of Protocol Analysis
  3. Methodology
  4. Observations
  5. Conclusion
  6. References
  7. Authors
  8. Figures
  9. Tables
  10. Sidebar: Think-Aloud and Protocol Analysis

An undesirable characteristic of the computing profession is its lack of an established tradition of looking to evidence to provide support for its ideas. Practice is still apt to be based on anecdote, advocacy, and vague measures (“using our tools/methods/practices will improve your productivity by 25%”). Similarly, researchers seem reluctant to adopt relevant techniques from other domains [2, 8].

However, the past decade has seen a distinct advance in thinking about the nature of evidence in computing and about how it can be aggregated [4]. There have also been useful experiences adopting and adapting experimental techniques from other domains (such as the use of surveys [6]).

Here, we describe our experience using protocol analysis [1]. Well-established in experimental psychology, this technique is rarely employed in computing research; [2, 8] found no evidence of its use. Like surveys, protocol analysis can help extract expert knowledge and reasoning and codify experience, though in a complementary manner. (Surveys need a reasonably large sample of respondents, whereas protocol analysis is more appropriate for small samples.) Our experience may aid other researchers in gathering evidence and practitioners in assessing the quality of any evidence presented to them.

We adopted protocol analysis when studying how software designers employ available documentation for designing systems from existing components. Earlier work indicated that our subjects drew on many sources of knowledge, including component documentation [7]; we wished to gain more insight into the ways they assessed and employed that knowledge, reporting some outcomes in our earlier work [5].

Role of Protocol Analysis

As our aim was to investigate not only what documentation was accessed but why, we needed to collect information about our subjects' reasoning. We used the tools for extracting and composing components and for document browsing that we had developed for our earlier work, along with their integral logging capabilities, which record developer actions nonintrusively [7].

Using these logs as our sole source of data posed two limitations: certain pertinent actions would leave only a partial record (such as browsing a page for information that was subsequently unused) or no record at all (such as using other system elements), and our study required finer-grain data than had previously been recorded. We therefore used think-aloud to supplement the existing logging capabilities and protocol analysis to filter and process the resultant data (see the sidebar “Think-Aloud and Protocol Analysis”). Table 1 outlines an extract from one of our sessions, along with the coding applied.

Methodology

We employed Unix processes as components and a group of four subjects, each with several years’ experience using Unix but no experience writing shell scripts. They were allocated two tasks: produce a simple source code control system that would allow users to store and retrieve particular versions of source code and produce a utility to count, possibly recursively, types of files—linked, empty, and directory—in a specified path.
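Subjects built their solutions as shell scripts composed from Unix components; purely to make the second task concrete, its counting logic can be sketched as follows (the function name and structure are our own illustration, not part of the study materials):

```python
import os

def count_file_types(path, recursive=False):
    """Count linked (symbolic-link), empty, and directory entries
    under path, optionally descending into subdirectories."""
    counts = {"linked": 0, "empty": 0, "directory": 0}
    for root, dirs, files in os.walk(path):
        counts["directory"] += len(dirs)
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                counts["linked"] += 1
            elif os.path.getsize(full) == 0:
                counts["empty"] += 1
        if not recursive:
            break  # examine only the top-level directory
    return counts
```

Even this small task requires choosing among overlapping utilities (find, test, ls, wc in the shell setting), which is precisely the component-selection behavior the study set out to observe.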

We gave the subjects access to all component utilities available on the system—standard Unix-type utilities plus GNU and other derivatives—and to the associated locally installed documentation. Extra introductory documentation on the use of the editor software and the production of shell scripts was provided on paper (see the figure for a schematic of the overall process). To limit the effects of knowledge transfer, the two pairs of subjects performed the tasks in opposite orders.

We used three sources of data:

  • Logs generated by the tools;
  • Videotape of the computer display recorded during the experiments, allowing us to follow all actions with on-screen representations; the logs could then be used to index the actions and help explain actions with no clear visual manifestation; and
  • Think-aloud data (utterances), which were also captured on videotape; recording the surrounding physical environment helped capture gestures and notes made by the subjects, and videotape avoided the problem that a session's duration (more than two hours) was too long for a single audio cassette.

We performed a dry run with one of the tasks and a subject from the previous experiments to confirm the viability of our arrangements, allowing us to verify recording levels and the camera's field of view, as well as to gather initial data as a basis for protocol construction.

The use of think-aloud and protocol analysis together [1, 9] is usually predicated on a subject having completed some form of training in thinking aloud. The problem here was that our investigation focused on knowledge acquisition, so training could have affected the final outcome through the related knowledge transfer. The length of the sessions, during which the early period could in effect serve as training, allowed us to attempt this procedure without prior think-aloud training.

Constructing the protocol. Analysis of the data relies on construction of a suitable protocol so as to exclude any data that cannot be encoded precisely. It must be flexible enough to encode all possibly relevant utterances yet specific enough to remove any possibility of ambiguity.

Our protocol adapted the form described in [10], where the need was to produce data comparable with data obtained outside the immediate study (requiring a protocol not limited to its initial purpose). The use of segmentation within that protocol provided for this possibility; for our study we adopted a similar structure, which also gave us a simple way to record the temporal positions of connected (though not necessarily contiguous) thoughts and actions.

The protocol divides utterances into several categories—inquiries, assertions, derivations, and verifications—each of which may be extended to indicate either the cause or the effect of the utterance (as appropriate). Provision is made for knowledge to come from:

  • Appropriate man pages;
  • Tools;
  • Instructional material included with components;
  • Test execution of components;
  • Written material provided as part of the task; and
  • Subjects’ notes.

Use of prior knowledge also needs to be identified; though such use is often conjectural, utterances can hint at it.
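As a purely illustrative sketch (the names and fields below are our own shorthand, not the protocol's actual notation), a coded utterance can be thought of as a small record combining its category, the knowledge source involved, and its cause, effect, and outcome where identifiable:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Category(Enum):
    INQUIRY = "inquiry"
    ASSERTION = "assertion"
    DERIVATION = "derivation"
    VERIFICATION = "verification"

class Source(Enum):
    MAN_PAGE = "man page"
    TOOL = "tool"
    COMPONENT_DOCS = "instructional material included with components"
    TEST_RUN = "test execution of a component"
    TASK_SHEET = "written material provided with the task"
    NOTES = "subject's own notes"
    PRIOR = "prior knowledge (conjectural)"

@dataclass
class CodedUtterance:
    index: int                        # position within the session transcript
    category: Category
    source: Optional[Source] = None   # where the knowledge came from, if known
    cause: Optional[str] = None       # what prompted the utterance
    effect: Optional[str] = None      # what the utterance led to
    outcome: Optional[str] = None     # e.g., whether the information was used
```

Making the cause, effect, and outcome fields optional mirrors the protocol's extension mechanism: they are recorded only when the raw data supports them, keeping ambiguous material out of the coded transcript.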

The protocol allows recording of:

  • The outcome of any inquiry or verification, including how successful the attempt was at addressing the original problem; and
  • The result of any categorical coding instigator, including whether the information was used.

In order to allow for overlapping streams of thought or multiple attempts to address a situation (both often witnessed), the protocol allows the recording of a defer condition, in which an outcome is evaluated in terms of other outcomes from the original instigator. The resultant coded data thus encapsulates the knowledge-acquisition-related phases of the original data and removes extraneous information. Table 2 is a coded interpretation of the sequence in Table 1.

Alongside the coding instigators, we also employed certain secondary codings. We used an INFER coding to transform a particular utterance into a more malleable form; since doing so could introduce bias, we deemed it important to keep it independent of any other coding effort. We also found it useful to embed coders' notes into the coded transcript to record incidental information that has no bearing on the interpretation of the coded data (such as timing issues and indexing marks). Such a note can also explain why a particular piece is coded in a particular way (such as an inquiry with no apparent accompanying attempt to view information).

The visual data was coded in a similar manner to aid integration with the coded utterances. This second protocol captured the movements of and interactions with graphical elements. It therefore provided information regarding the contents of the screen (and hence the data before the subject at a given time) and the interactions influencing the display, not least of which was construction of the script.

The two sources of data were then merged to produce a single transcript describing a particular session. Owing to the size of the data set, discerning patterns and structures within it remained difficult even after coding, so a further process of visualization was needed.

Analysis. We used a form of the linkograph [3] for visualization. The coded data is represented as sequential points on a horizontal line (only the ordering, not the intervening time, is recorded). Each datum is considered in turn in an attempt to identify possible forward-looking or backward-looking links to other occurrences (points). Such links exist when a given operation implies the existence of a prior or future operation; a search implies the existence of a future “information found” occurrence, along with a prior occurrence in which a decision was made about what to search for.
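A minimal sketch of this link-building step, under the simplifying assumption (ours, not Goldschmidt's formulation [3]) that two moves are linked whenever they concern the same topic:

```python
def build_links(moves):
    """moves: ordered list of (index, topic) pairs representing coded
    occurrences on the linkograph's horizontal line.  Returns (i, j)
    pairs with i < j: from move i the link is forward-looking, from
    move j it is backward-looking."""
    links = []
    for j in range(len(moves)):
        for i in range(j):
            if moves[i][1] == moves[j][1]:
                links.append((i, j))
    return links

# A search implies a later "information found" occurrence and an
# earlier decision about what to search for; here a shared topic
# string stands in for that relationship.
session = [
    (0, "sort-options"),   # decide to look up sort's flags
    (1, "sort-options"),   # search the sort man page
    (2, "script-edit"),    # unrelated editing action
    (3, "sort-options"),   # information found and applied
]
```

Plotting these pairs beneath the horizontal line yields the familiar triangular linkograph, in which long chains of linked moves stand out from isolated, abandoned ones.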

This technique allows the length and composition of thought-action processes to be seen much more clearly. The results emphasize the thoughts that are abandoned and those that are relevant and yet have no apparent cause. Seeing the way in which individual coded items are linked also aids consideration of extended moves in the interaction with documentation by showing the span of particular investigations and giving indications of the ways in which they overlap and influence one another.

Practical benefits. We adopted protocol analysis as the only means by which we could capture and analyze certain important forms of data, including:

  • Patterns of information use that leave no trace in the log (such as a decision not to use a particular component); and
  • User intentions, covering both high-level design decisions (such as architectural form, choosing among similar components, and sub-goal identification and prioritization) and lower-level code fixes and alterations (such as user perceptions of the sources and forms of errors).

None of this is knowledge we could have reliably obtained by interviewing subjects following the experimental sessions.

Observations

Table 3 summarizes some of our experience transcribing the tapes and applying protocol analysis to the encoded data. Precise transcription is less important once a protocol is finalized; when a coder is familiar with a protocol, its application, and the type of raw data being collected, only certain salient parts must be recorded.

However, while these techniques supported our study of CBSE design, think-aloud does require subjects to behave differently, and the use of protocol analysis is highly dependent on devising an appropriate protocol.

The think-aloud technique relies on people talking for an entire session. We found that subjects were prone to periods of silence in the beginning but would resume talking when prompted. We encountered similar problems toward the end of a session, usually due to the process of knowledge acquisition being blocked or simply to the length of the session.

The results obtained may be affected by the lack of subject training. We therefore had to be mindful that results from early in each session might not be as insightful as we hoped. We assumed that later periods would suffer less from this effect, if at all, as the earlier period would function as a training task. We also assumed that subjects might begin with more general inquiries relating to the nature of the task and the use of the tools. Our results indicate that early in each session (particularly in the first), subjects were looking for such general information; we thus feel that the lack of training probably had no detrimental impact.

We saw subjects attempt to impose a degree of coherence on their verbalizations. This can be explained partly by the lack of training, though the attempts often corresponded to points in the task where the subjects had difficulty. This suggests that these sanitized utterances formed part of a feedback mechanism [1] and can be regarded as a part of the overall problem-solving strategy. Differentiating such mechanisms from simple imposition of coherence can be difficult, making it impossible to simply classify such occurrences and deal with each class as appropriate.

The structure of the tasks (knowledge acquisition with an informal time limit) emphasized the element of reuse. The protocol analysis drew out distinct evidence of this being the case, including long chains of thoughts/actions that finished successfully. The most striking example showed one subject to have previously attempted a task similar to one presented in the study. Other subjects showed apparent attempts to reuse prior experience and information, even if it was not immediately applicable. Protocol analysis also helped reveal the differences between what had been remembered or intended and what was eventually done, a distinction not always made by the subject.

The protocol is the crux of this technique, yet it is difficult to evaluate. The true test of a protocol is the information it captures, but without constructing a different protocol designed for the same purpose, little evidence is available for comparing its effectiveness. Although consistent application of a protocol may be ensured by correlating with the interpretation made by a second coder, only a second protocol (or determined adherence to a particular cognitive model) can evaluate the first.

For a study like ours that does not involve the investigation of a particular model of cognition, it is difficult to construct a protocol and predict results in order to provide a measure of effectiveness. We sought to address this by using a dry run that would provide data for the initial construction of the protocol. Assuming the dry run would provide representative data, we felt that a protocol designed to code such data would be capable of coding real data. Our findings seem to support this assumption.

Conclusion

Protocol analysis provided us with valuable insight into the ways our subjects sought to discover information within a dynamic environment and how they used, swapped among, navigated, and discounted sources of that information. We found evidence of subjects misunderstanding documentation and were able to ascertain what was misunderstood and how it affected not only their completed solution but also their approach to its construction.

Our experience demonstrates that protocol analysis can be used by computer scientists and software engineers to investigate the influence of human factors in software development. The effort required to learn this technique is relatively modest, and its application, though demanding of researchers’ time, does ensure a rigorous foundation for extracting empirical evidence.

Figures

F1 Figure 1. The experimental process.

Tables

T1 Table 1. Sample of think-aloud with encoded equivalent.

T2 Table 2. An interpretation of the sample data from Table 1.

T3 Table 3. Applying protocol analysis.

References

    1. Ericsson, K.A. and Simon, H. Protocol Analysis: Verbal Reports as Data. MIT Press, Cambridge, MA, 1993.

    2. Glass, R.L., Vessey, I., and Ramesh, V. Research in software engineering: An analysis of the literature. Inf. Softw. Technol. 44, 8 (2002), 491–506.

    3. Goldschmidt, G. Linkography: Assessing design productivity. In Proceedings of the 10th European Meeting on Cybernetics and Systems Research. World Scientific, Singapore, 1990, 291–298.

    4. Kitchenham, B.A., Dyba, T., and Jorgensen, M. Evidence-based software engineering. In Proceedings of ICSE '04 (Edinburgh, Scotland, May 23–28). IEEE Computer Society Press, Piscataway, NJ, 2004, 273–281.

    5. Owen, S., Budgen, D., and Brereton, P. Information use in CBSE design. In Proceedings of COMPSAC '03 (Dallas, TX, Nov. 3–6). IEEE Computer Society Press, Piscataway, NJ, 2003, 406–412.

    6. Pfleeger, S.L. and Kitchenham, B.A. Principles of survey research. ACM Softw. Eng. Notes 26, 6 (2001), 16–18.

    7. Pohthong, A. and Budgen, D. Reuse strategies in software development: An empirical study. Inf. Softw. Technol. 43, 9 (2001), 561–575.

    8. Ramesh V., Glass, R.L., and Vessey, I. An analysis of research in computing disciplines. Commun. ACM 47, 6 (June 2004), 89–94.

    9. van Someren, M.W., Barnard, Y.F., and Sandberg, J. The Think-Aloud Method: A Practical Guide to Modeling Cognitive Processes. Academic Press, London, 1994.

    10. von Mayrhauser, A. and Lang, S. A coding scheme to support systematic analysis of software comprehension. IEEE Trans. Softw. Eng. 25, 4 (July 1999), 526–540.
