On the Importance of Replication in HCI and Social Computing Research

Ed Chi — Ed H. Chi - Google Research Scientist

At the recent CHI2011 conference, I was asked to serve as a panelist to discuss the issue of replication of research results. As part of this RepliCHI panel, I wrote an essay arguing for how replication isn’t just replication of experiments or rebuilding of systems, but instead it is used as an important step in building up greater understanding of a domain. Many panelist, including myself, were surprised when many people showed up at the panel (>100 people?), ready to discuss this seemingly dry academic issue. Here was my essay slightly edited:

One mainstream perspective on HCI is that it is a discipline built upon applied psychological science. Psychological science here refers to the understanding of mind and behavior, while ‘Applied’ here means that it is the application of approaches of methods, findings, models, and theories from the psychology domain. One only has to look at the CHI annual proceedings to see that it is full of borrowed methods from Experimental Psychology, a particularly approach to understanding mind and behavior based on scientific experimental methods. This approach worked well for HCI, since computers can be seen as a kind of stimuli that is not only interesting, but could augment cognition and intelligence [1].

Experimental psychology is based on the idea that, if you design the experiment and control the laboratory setting well enough, you have end up with evidences to believe that the results of the experiment will generalize. These ideas around controlled experiments of course form the basis of the Scientific Method. As part of the scientific discovery process, we ask researchers to document the methodology and results, so that it can be archived and replicated by others.

But my position is that replication is not the only goal. More importantly, if there are limitations to the study, later experiments might expand on the original experiment to examine new contexts and other variables. In these ways, the idea behind the replication and reproducibility of experiments is not just to ensure validity of the results, but it is also an essential part of the scientific dialog. After all, the reason we value research publications so much is not just because they document and archive the results of the research, but also that others might literally stand on the shoulder of giants, to reproduce *and* to build on top of the results.

Take for example, the great CHI 97 Browse Off in Atlanta which aimed to put together a number of hierarchical browsers to see which is the ‘best’. At the event, the Hyperbolic Browser [2] was the clear winner. While the event was not meant to be a controlled experiment, but it was widely publicized, especially amongst information visualization researchers. Several years later, this experiment was replicated in a laboratory setting at PARC [3] with the top 2 performing systems during the event — Hyperbolic Browser and Windows Explorer. Not just once, but twice, under different task conditions!

In the first experiment, the results were at odds with the Browse Off. Not only were there no difference between the browsers in terms of performance, it appears that subject variation had more effect on the results than any other variable.

Further analyses showed that there was an interesting interaction effect between the amount of information scent available via the interface conditions and performance, with better information scent resulting in lower retrieval task times with Hyperbolic Browser.

In the second experiment, when restricted to retrieval tasks rather than including comparison tasks also, Hyperbolic Browser was faster, and users appears to learn more of the tree structure than Explorer.

What’s interesting is the interpretation of the results suggest that squeezing more information onto the screen does not improve subject perceptual and search performance. Instead, the experiment show that there is a very complex interaction between visual attention/search with density of information of the display. Under high scent conditions, information seems to ‘pop out’ in the hyperbolic browser, helping to achieve higher performance.

The above extended example show that there are a number of fundamental problems with viewing experimental results as the end result of a line of research inquiry. Instead, they are often the beginning. Further experiments often shed light on the complex interaction between the mind/behaviors of the user and the system. Replication/duplication of results and further research efforts examining other contexts and variables are not just desirable, but it is an important part of the whole scientific exercise.

References

[1] Augmenting Human Intellect: A Conceptual Framework. 1962. http://www.dougengelbart.org/pubs/augment-3906.html

[2] Lamping, J., R. Rao, and P. Pirolli, A focus + context technique based on hyperbolic geometry for visualizing large hierarchies, in CHI 95, ACM Conference on Human Factors in Computing Systems. 1995, ACM: New York

[3] Peter Pirolli, Stuart K. Card, and Mija M. Van Der Wege. 2000. The effect of information scent on searching information: visualizations of large tree structures. In Proceedings of the working conference on Advanced visual interfaces (AVI ’00). ACM, New York, NY, USA, 161-172. http://doi.acm.org/10.1145/345513.345304