Research and Advances
Artificial Intelligence and Machine Learning A game experience in every application

Tracking Contact and Free Gesture Across Large Interactive Surfaces

Built into store windows, museum exhibits, and other communal spaces, these surfaces entice even casual passersby to playfully interact with information—and each other—by knocking on the glass.

Large projection displays and video walls are increasingly seen in public spaces like shopping malls and airports worldwide. As the enabling technologies improve and decrease in price, these devices should achieve even greater popularity. At the moment, however, such displays are mainly noninteractive, merely playing uninterrupted video streams without accommodating user input. If they were responsive, entirely new types of group interaction would be possible that aren’t natural to video kiosks, their TV-size ubiquitous cousins designed mainly for individual users.

User interaction with large displays is a topic of considerable interest among researchers in computer-human interaction and ubiquitous computing [2, 4, 12], who are pursuing ways to distribute the user interface among a variety of portals, including handhelds, mobile and wearable devices, and large interactive surfaces, in responsive environments and augmented rooms. Many applications have been explored in professional niches, including electronic blackboards for presentations, large screens for teleconferencing, augmented interaction in business and office environments, large electronic bulletin boards in corporate water-cooler settings, interactive visualizations in design studios, and big-board displays for military and situation rooms. In contrast, most of the implementations I describe here are directed at public settings, for casual information browsing, interactive retail, art, and entertainment.

Because their activity tends to be publicly visible, participants often become performers. The systems are intrinsically collaborative; crowds gather to watch, participate, and suggest what to do next as users interact with the displays. The applications attain a social, gamelike quality.

Although several commercially available products identify and track objects accurately across large electronic whiteboards and tablets, interactive walls must respond to bare hands to be usable in public settings; they cannot require users to wear any kind of active or passive target. To date, several sensing and tracking approaches have been used to make large surfaces bare-hand interactive, many introduced in [8]. Most of them, including capacitive sensing, resistive sandwiches, light curtains, and active acoustics, are derived from touch-screen technology [11]; others are based on video tracking [6]. However, most either do not scale well to very large surfaces or introduce significant complexity and robustness problems, especially in unstructured public and outdoor installations.

The Responsive Environments Group at the MIT Media Lab has developed several relatively simple techniques to track activity across large surfaces [8]. All are essentially retrofits, as they do not require the addition of custom-designed material or any significant infrastructure. The first of these projects, the Gesture Wall, was an interactive music installation designed in 1996 for the Brain Opera [7], a large touring interactive media production now installed at the Haus der Musik museum in Vienna, Austria.1

An array of four pickup electrodes placed at the corners of a projection screen receives a signal capacitively coupled from the body of a participant standing atop an antenna transmitting a 50kHz electric field. The amplitude sensed at each pickup, reflecting the proximity of the body to the corresponding receive electrode, is processed to determine a mean position used by a rulebase producing interactive music and graphics. Although the system responded well to bulk gesture, its tracking accuracy, with only four sensing electrodes, varied widely with the posture of the participant, limiting its use to abstract kinetic expression of the sort exploited by the Gesture Wall installation.
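One simple way to picture how four corner amplitudes might be reduced to a mean position is an amplitude-weighted centroid of the electrode locations. The sketch below is only an illustration under that assumption; the function name, the normalized corner coordinates, and the centroid rule itself are hypothetical and are not taken from the Gesture Wall implementation.

```python
def mean_position(amplitudes, corners):
    """Amplitude-weighted centroid of the receive-electrode positions.

    Assumption: a stronger signal at a corner means the body is closer to it.
    """
    total = sum(amplitudes)
    if total <= 0:
        return None  # no coupled signal, so no position estimate
    x = sum(a * cx for a, (cx, _) in zip(amplitudes, corners)) / total
    y = sum(a * cy for a, (_, cy) in zip(amplitudes, corners)) / total
    return (x, y)


# Hypothetical screen described in normalized coordinates (0..1 on each axis).
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Equal amplitudes at all four electrodes place the estimate at the center.
print(mean_position([1.0, 1.0, 1.0, 1.0], CORNERS))
```

A centroid of this kind responds to the bulk of the body rather than to a fingertip, which is consistent with the posture sensitivity described above.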

The Group’s next system [8] employed a scanning laser rangefinder placed at a corner of a large projection screen to monitor a plane just above the projection surface. As commercial rangefinders were prohibitively expensive for this project, my colleagues and I designed our own relatively low-cost, continuous-phase-shift device that could track bare hands with roughly 1-cm accuracy out to about 4 meters at 30Hz. As the laser illumination was synchronously detected, the device was insensitive to background light and accurate enough for detailed, causal interaction. The system was used in interactive music applications and graphical database interfaces shown at the Emerging Technologies exhibits at SIGGRAPH 1998 and SIGGRAPH 2000. Despite its engaging performance, the technique requires the electromechanical scanning rangefinder to be mounted in a corner at the front of the display, potentially limiting its application, especially in outdoor settings.
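For readers unfamiliar with continuous-phase-shift ranging, the standard relation is that a round-trip phase shift φ at modulation frequency f corresponds to a distance d = cφ / (4πf), with an unambiguous range of c / (2f). The sketch below applies that relation and converts a range and scan angle into screen coordinates. The 25 MHz modulation frequency and the function names are assumptions chosen for illustration; the article does not state the rangefinder's actual parameters.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def range_from_phase(phase_rad, f_mod_hz):
    """Distance corresponding to a measured round-trip phase shift."""
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)


def unambiguous_range(f_mod_hz):
    """Largest distance measurable before the phase wraps past 2*pi."""
    return C / (2.0 * f_mod_hz)


def to_screen_xy(range_m, scan_angle_rad):
    """Polar reading (range, scan angle) to Cartesian, scanner at the origin corner."""
    return (range_m * math.cos(scan_angle_rad),
            range_m * math.sin(scan_angle_rad))


F_MOD = 25e6  # hypothetical modulation frequency, not from the article

print(round(unambiguous_range(F_MOD), 2), "m unambiguous range")
print(to_screen_xy(range_from_phase(math.pi / 2, F_MOD), math.radians(45)))
```

With this assumed frequency the unambiguous range is a little under 6 meters, which would comfortably cover the roughly 4-meter working distance mentioned above.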

Our next system was an extremely simple retrofit to a large display mounted inside a single-pane window (see Figure 1). Called the Tap Tracker, its technical roots sprang from the ball-impact tracker my colleagues and I designed for the PingPongPlus [3] interactive ping-pong table; its inspiration was my desire to make a simple system enabling taps on glass walls to annoy the nearby denizens of a virtual fish tank designed by the Media Lab’s Epistemology and Learning Group. Four piezo-ceramic contact microphones are glued to the inside corners of a large glass window. Their resultant signals are monitored by a low-cost digital signal processor (DSP) producing features analyzed by a connected PC generating content that can be projected onto a screen or video wall behind the window.

When participants knock on the glass, flexural or bending waves [1] travel from the point of impact to the microphones. Measuring the wavefront’s differential time of arrival at each transducer position, the system infers the location of the originating impact. Although lower-accuracy coordinates can be determined when knocking outside the square framed by the sensors, most of our applications concentrate the interactivity and projection well within this boundary where occlusion by the opaque, 3-cm-diameter sensors and their associated cable is not an issue.
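The idea of inferring a position from differential arrival times can be sketched with a brute-force search: for each candidate point on the glass, predict the arrival-time differences at the four sensors and keep the point that best matches the measurements. This is a deliberately simplified illustration, not the procedure used in the Tap Tracker. It assumes a single constant wave speed (the value of 1000 m/s is hypothetical), sensors at the corners of a 2-meter square, and noise-free timing.

```python
import math

# Hypothetical sensor layout: corners of a 2 m square, in meters.
SENSORS = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]


def arrival_delays(point, speed):
    """Arrival time at each sensor relative to sensor 0, in seconds."""
    times = [math.hypot(point[0] - sx, point[1] - sy) / speed
             for sx, sy in SENSORS]
    return [t - times[0] for t in times]


def locate(measured_delays, speed, step=0.01, size=2.0):
    """Grid search for the point whose predicted delays best match the measurement."""
    best, best_err = None, float("inf")
    n = int(round(size / step))
    for i in range(n + 1):
        for j in range(n + 1):
            candidate = (i * step, j * step)
            predicted = arrival_delays(candidate, speed)
            err = sum((p - m) ** 2 for p, m in zip(predicted, measured_delays))
            if err < best_err:
                best, best_err = candidate, err
    return best


SPEED = 1000.0  # hypothetical constant wave speed in m/s
true_hit = (0.5, 1.2)
estimate = locate(arrival_delays(true_hit, SPEED), SPEED)
print(estimate)
```

Only time differences are used because the moment of impact is unknown; a practical system would replace the grid search with a faster solver and would have to cope with noisy timing estimates.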

However, bending waves are highly dispersive, so the impulse waveform launched by a knock spreads out as it propagates through the glass, making it difficult to determine its rising edge (and hence a time reference) directly. In addition, there are many ways for people to hit glass (including knocks with a knuckle, fist, or metal ring), each producing widely varying waveforms with differing frequency content and hence different propagation velocities. Rather than try to constrain the type of knock required (essentially impossible in a public installation), we first classify the impact, then process the data with a heuristically guided cross-correlation and edge-detection procedure [5].
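The dispersion can be made concrete with the classical thin-plate relation, in which the bending-wave phase velocity grows with the square root of frequency: c = sqrt(ω) · (D / (ρh))^(1/4), where D = E·h³ / (12(1 − ν²)) is the plate's bending stiffness. The sketch below evaluates this for a glass pane. The material constants are typical textbook values for glass and are assumptions here, not measurements from the installations.

```python
import math


def bending_phase_velocity(freq_hz, thickness_m,
                           youngs=70e9, density=2500.0, poisson=0.22):
    """Thin-plate bending-wave phase velocity in m/s.

    The default material constants are assumed, typical values for glass.
    """
    stiffness = youngs * thickness_m ** 3 / (12.0 * (1.0 - poisson ** 2))
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(omega) * (stiffness / (density * thickness_m)) ** 0.25


# Higher-frequency components travel faster, so a sharp tap and a dull
# fist bash do not cross the pane at the same speed.
for f in (500, 2000, 8000):
    print(f, "Hz:", round(bending_phase_velocity(f, 0.005), 1), "m/s")
```

Under this model, quadrupling the frequency doubles the phase velocity, which is why a single fixed velocity cannot serve for every type of knock.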

The resultant system locates knocks across 2-meter-by-2-meter windows with a resolution (standard deviation) of 2.5 cm in 5-mm glass and up to 4 cm in 1-cm glass [9]. Although this accuracy doesn’t enable precision pointing, it is adequate for the relatively coarse level of selection needed by the applications for which the device was intended. The system produces hit coordinates within 65 ms (dominated by waveform processing in the 26MIPS DSP), granting essentially real-time performance. In addition to deriving the location of the hit, the system estimates the impact intensity and determines the type of hit (knuckle knock, fist bash, or tap from a metallic object). Because knocking tends to be a fairly expressive gesture, the content can therefore react in a more sophisticated manner than a simple touch screen, providing a response that reflects perceived affect.

A set of commercial products developed by the French company Intelligent Vibrations appears to operate on a similar principle; however, it also seems to exploit the ultrasound component in hard fingernail taps, and hence requires a scripted “finger flick” gesture. As mentioned earlier, our Tap Tracker device responds to and classifies any kind of knock, an important feature for running such systems in unattended public venues where it is impossible to totally constrain user gestures.

Figure 1 shows the system can use other sensors. Since the particular piezoelectric pickups and attached amplifiers mounted in the window’s corners don’t respond well to the very low frequencies present in a fist-bash, the superior low-frequency sensitivity of an attached electrodynamic pickup can be exploited to readily discriminate bash events [9]. Due to the poor impedance match to sounds not produced in the glass itself, the adhered pickups are quite insensitive to extraneous audio. However, exceptions can occur for loud, sharp sounds (like handclaps) produced near the pickups. In this case, a “veto” microphone can be placed in the air near the window. Signals inducing a strong response in both the glass and the veto microphone are then rejected as background and don’t produce false hits.
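The veto and bash-discrimination logic described here amounts to a few threshold comparisons, which might look roughly like the sketch below. The function name, the normalized peak amplitudes, and all three threshold values are hypothetical; the article does not give the actual decision rules.

```python
def classify_event(glass_peak, air_peak, low_freq_peak,
                   hit_threshold=0.2, veto_threshold=0.5, bash_threshold=0.6):
    """Label one event from normalized peak amplitudes (0..1).

    glass_peak:    strongest response among the piezo pickups on the glass
    air_peak:      response of the "veto" microphone in the air
    low_freq_peak: response of the low-frequency electrodynamic pickup
    All thresholds are illustrative assumptions.
    """
    if glass_peak < hit_threshold:
        return "none"       # nothing significant happened in the glass
    if air_peak >= veto_threshold:
        return "rejected"   # loud in both glass and air: treated as background
    if low_freq_peak >= bash_threshold:
        return "bash"       # strong low-frequency content suggests a fist bash
    return "knock"


print(classify_event(0.8, 0.1, 0.1))  # ordinary knock
print(classify_event(0.8, 0.9, 0.1))  # handclap near the pickups
```

The veto check comes first so that a loud airborne sound is discarded before any attempt is made to interpret it as a hit.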

An important feature of this approach is that all sensors are mounted on the inside of the glass. Nothing is attached to the outside surface, which is especially relevant for outdoor installations on single-pane windows. No significant tracking distortion was noticed when running the system on a window while outside conditions ranged from room temperature to below freezing; as four pickups “overdetermine” the position estimate, the system self-compensates for bulk changes in the wavefront propagation velocity. Depending on the glass pane’s damping characteristics, multiple hits can be registered independently within a short time (say, within 100 ms of one another), allowing the system to be used by several people at once. For closer intervals, or in cases where the window has a long ringdown response, the later hit is ignored; as the strikes approach simultaneity, the data from the four sensors becomes inconsistent, and all the hits are generally rejected.
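The multi-hit behavior can be summarized as a small gating rule. In the sketch below, a strike is accepted only if the glass has had time to ring down since the previous strike and the four sensor readings agree with one another. The `consistent` flag is a hypothetical input assumed to come from the position solver, and the 100 ms gap is the illustrative figure from the text rather than a fixed system constant.

```python
def register_hits(events, min_gap_s=0.1):
    """Return the times of accepted hits.

    events: list of (time_s, consistent) tuples sorted by time, where
    `consistent` is a hypothetical flag saying the four sensor readings
    agree on a single position.
    """
    accepted = []
    last_strike = None
    for time_s, consistent in events:
        still_ringing = (last_strike is not None
                         and time_s - last_strike < min_gap_s)
        last_strike = time_s  # any strike restarts the ringdown period
        if still_ringing or not consistent:
            continue          # later hit ignored, or inconsistent data rejected
        accepted.append(time_s)
    return accepted


events = [(0.00, True), (0.05, True), (0.30, True), (0.50, False), (0.70, True)]
print(register_hits(events))
```

In this example the strike at 0.05 s is dropped because it arrives while the glass is still ringing, and the one at 0.50 s is dropped because its sensor data is marked inconsistent.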

We used the system for simple, in-house demonstrations in 1999 [8], developing it sufficiently for formal installations by 2001. Our first applications were in the realm of interactive art. Figure 2 (left) shows a semipermanent installation running at the Ars Electronica Center in Linz, Austria. Called the Responsive Window, it is an interactive drawing program written by Ben Fry of the Media Lab’s Aesthetics and Computation Group in which the user extrudes rotating objects by knocking on a 1-cm sheet of plate glass backed by a holographic projection screen.

Figure 2 (middle) shows Telephone Story, an installation run at New York’s Kitchen Gallery, where a user selects and launches a video clip (shot by Bay Area artist J.D. Beltran) by knocking on a particular region of a projected desktop. If the user knocks on the screen while the video is running, an image relevant to the current segment of the video appears at the knock position, rotating faster with more forceful knocks. Bashes cause a group of images to appear at the fist position and fly off to the edges of the screen.

We last ran the system on a large window (2 meters by 2 meters, 0.5-cm glass) in the Emerging Technologies exhibition at SIGGRAPH 2002. The content was based on a complex visualization called Weather, a behavior-driven environment written by Marc Downie of the Media Lab’s Synthetic Characters Group that evolved in intricate ways with each knock, as shown in Figure 2 (right). A pair of very low-power 2.4GHz Doppler motion-sensing radars mounted behind the screen detected people moving in front of the glass. The radars, modified versions of those introduced in [7], have an onboard processor that extracts three features corresponding to the net amount of motion, mean speed, and average direction of motion for the objects in their field of view. Although their spatial discrimination is coarse, they are immune to changes in light conditions and the optical characteristics of cloth. Unlike video imagers, they see directly through nonconductive walls and penetrate clothing, sensing the skin directly. The radars open up a degree of noncontact interaction as people approach the wall; in this case, motion in front of the screen generated global, nonspecific behavior in the graphics (such as rolling, scrolling, and boiling effects) in accordance with the motion characteristics. Knocking created more specific and highly localized phenomena.
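To give a sense of the signals such radars work with, the standard Doppler relation for a reflected carrier is f_d = 2·v·f₀ / c, where v is the target's speed toward or away from the antenna. The sketch below applies it at 2.4 GHz. The function names are hypothetical, and the code illustrates only the underlying physics, not the feature extraction performed by the radars' onboard processor.

```python
C = 299_792_458.0   # speed of light in m/s
F_CARRIER = 2.4e9   # carrier frequency stated in the article, in Hz


def doppler_shift_hz(radial_speed, carrier_hz=F_CARRIER):
    """Doppler shift of the reflected signal for a target moving at radial_speed (m/s)."""
    return 2.0 * radial_speed * carrier_hz / C


def speed_from_doppler(shift_hz, carrier_hz=F_CARRIER):
    """Radial speed in m/s corresponding to a measured Doppler shift."""
    return shift_hz * C / (2.0 * carrier_hz)


# A person walking toward the window at about 1 m/s produces a shift of
# only a few tens of hertz, so the motion signal sits at very low frequency.
print(round(doppler_shift_hz(1.0), 2), "Hz")
```

Because the shift scales linearly with speed, the mean speed of people in front of the glass can in principle be read from the mean frequency of the returned signal.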

After gaining experience with the system in museum installations, we collaborated with American Greetings, a Media Lab sponsor company, on a retail application, installing the system on a large window in its New York card store at Rockefeller Center from December 2001 through February 2002, thus spanning the peak customer periods of Christmas and Valentine’s Day. Figure 3 shows it in operation, with random passersby interacting. A small speaker mounted outside, near the window, provided audio prompts and narration for the interaction; otherwise, all hardware (transducers, electronics, and holographic projection screen) was inside the 1-cm-thick window. The pickups were mounted at the corners of a 2-meter square, placed well away from the projection screen to avoid user distraction.

The interactive content was fairly straightforward: users could choose to watch either of two brief video clips or engage in a game of Three-Card Monte in which, after three successful rounds of knocking on the correct card image, they would be invited to enter the store and receive a free greeting card. The game was a ploy to get people into the store; indeed, American Greetings’ data indicated a significant increase in store traffic while the system was running.

Large interactive surfaces in public spaces enable interesting applications where games and practicality converge. They are intrinsically communal, encouraging people to meet and collaborate. The simple system described here locates knocks and taps across a large sheet of glass using the differential time of arrival of bending waves at four locations. It turns single-pane windows (common features in any city) into large tracking surfaces, enabling large interactive displays that let passersby explore content at venues ranging from storefronts to museum galleries. The system requires that they knock lightly on the glass, a common but still unusual gesture for digital interaction. The required hardware, a set of contact microphones and a low-end DSP (which can effectively be replaced by a commercial multichannel audio interface, as the associated processing can be done in real time on a moderately fast PC), is all inside the glass; nothing needs to be on the outside surface.

These installations have shown that once people are invited to knock (such as with audio prompts and visual suggestion and by example, observing others), they readily take to the interface, at least until their knuckles fatigue after scores of hits. By including noncontact sensing and ranging away from the plane of the display, such systems can detect people approaching and vary their resolution or adapt their content according to user proximity.

Although the related applications all involve close-up interaction with large dynamic displays, the tracking system is appropriate for other niches, including selecting objects placed behind glass partitions. This would enable, for example, interactive museum cases, where knocking near an object would bring up text, images, audio, or video bearing related information. It could also be used in vending machines, where knocking atop a desired snack seen through the glass would prompt the machine to dispense the item. Current keypad implementations are indirect and error-prone (think how many times you’ve spent your last coins on the wrong candy bar), and buttons on museum display cases often ruin the aesthetic, especially compared to a pristine and unobtrusive knock-sensitive surface.

Figure 1. Essentials of the impact tracking system.

Figure 2. Interactive art applications: (left) Responsive Window; (middle) Telephone Story; and (right) Weather.

Figure 3. Interactive window browsing at an American Greetings store in New York.

    1. Cremer, L., Heckl, M., and Ungar, R. Structure-Borne Sound, 2nd Ed. Springer-Verlag, New York, 1990.

    2. Funkhouser, T. and Li, K., Eds. Onto the wall: Large displays (special issue). IEEE Comput. Graph. Applic. 20, 4 (July/Aug. 2000).

    3. Ishii, H., Wisneski, C., Orbanes, J., Chun, B., and Paradiso, J. PingPongPlus: Design of an athletic-tangible interface for computer-supported cooperative play. In Proceedings of the Conference on Human Factors in Computing Systems (CHI'99) (Pittsburgh, PA, May 15–20). ACM Press, New York, 1999, 394–401.

    4. Johanson, B., Fox, A., and Winograd, T. The Interactive Workspaces Project: Experiences with ubiquitous computing rooms. IEEE Pervasive Comput. Mag. 1, 2 (Apr.–June 2002), 67–75.

    5. Leo, C. Contact and Free-Gesture Tracking for Large Interactive Surfaces. M. Eng. Thesis, MIT Dept. of EECS and MIT Media Lab, May 2002.

    6. Martin, D., Morrison, G., Sanoy, C., and McCharles, R. Simultaneous multiple-input touch display. In Proceedings of the UbiComp 2002 Workshop on Collaboration with Interactive Walls and Tables (Gothenburg, Sweden, Sept. 29, 2002).

    7. Paradiso, J. The Brain Opera technology: New instruments and gestural sensors for musical interaction and performance. J. New Music Res. 28, 2 (June 1999), 130–149.

    8. Paradiso, J., Hsiao, K., Strickon, J., Lifton, J., and Adler, A. Sensor systems for interactive surfaces. IBM Syst. J. 39, 3/4 (Oct. 2000), 892–914.

    9. Paradiso, J., Leo, C., Checka, N., and Hsiao, K. Passive acoustic sensing for tracking knocks atop large interactive displays. In Proceedings of the IEEE Sensors 2002 Conference (Orlando, FL, June 11–14). IEEE Computer Society Press, Piscataway, NJ, 2002, 521–527.

    10. Pearson, H. Bus shelters to talk back. Nature News Service (Sept. 16, 2002); see

    11. Quinnell, R. Touchscreen technology improves and extends its options. EDN 40, 23 (Nov. 9, 1995).

    12. Tandler, P., Magerkurth, C., Carpendale, S., Inkpen, K., and Scott, S., Eds. Proceedings of the UbiComp 2002 Workshop on Collaboration with Interactive Walls and Tables (Gothenburg, Sweden, Sept. 29, 2002); see

    1Video clips demonstrating the systems described here can be viewed at
