People surfing the internet in coffee shops may soon be waving their hands in front of their laptops (if they aren’t already), thanks to a new type of user interface making its way into the mainstream. In July, Leap Motion, of San Francisco, CA, introduced a gesture-based controller; the device plugs into a USB port and sits in front of the computer, projecting a cone of infrared light that it uses to detect hand and finger positions with a pair of CCD cameras.
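For readers curious about the geometry involved, the sketch below shows how a pair of cameras like those described can recover a fingertip’s depth from the shift in its position between the two views. The pinhole-stereo formulas are standard; the calibration numbers and pixel coordinates are made up for illustration and are not Leap Motion’s actual parameters or processing pipeline.

```python
# A minimal sketch of stereo triangulation, the general technique a two-camera
# sensor can use to turn a fingertip's pixel positions into a 3D coordinate.
# Focal length, baseline, and the example pixel values are assumptions.

def triangulate(x_left_px, x_right_px, y_px, focal_px, baseline_mm):
    """Pinhole stereo triangulation; pixel coords are relative to the image center."""
    disparity = x_left_px - x_right_px        # shift of the point between the two views
    if disparity <= 0:
        raise ValueError("invalid disparity; point must sit farther right in the left image")
    z = focal_px * baseline_mm / disparity    # depth shrinks as disparity grows
    x = x_left_px * z / focal_px              # back-project into the camera frame
    y = y_px * z / focal_px
    return x, y, z                            # millimeters in the camera frame

# A fingertip seen at x = 420 px in the left image and 380 px in the right one,
# with an assumed 40 mm baseline and 600 px focal length:
print(triangulate(420, 380, 150, focal_px=600, baseline_mm=40))
```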
Other companies are also experimenting with new interfaces. Apple, for instance, was recently granted a patent on a system for recognizing gestures made above the surface of a touchscreen. A touch sensor in the screen reads the drop-off in capacitance as a finger moves away from the surface, and uses that drop-off to measure proximity. That allows the user to add a pulling motion to the standard swipe and pinch motions of touchscreens, and makes it possible to manipulate a virtual 3D object. Apple envisions the technology being used in computer-aided design systems, according to the patent.
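The sketch below illustrates the general idea as summarized above: treat the drop-off in capacitance as a proxy for hover height, and flag a sustained rise in height as a pull. The decay model and thresholds are invented for illustration; they are not drawn from Apple’s patent.

```python
# A rough sketch of proximity sensing from capacitance drop-off. The
# exponential fall-off model and all constants here are assumptions.

import math

C_TOUCH = 100.0      # reading when the finger is on the glass (arbitrary units)
DECAY_MM = 4.0       # assumed fall-off scale with height

def hover_height_mm(capacitance):
    """Invert the assumed exponential fall-off to estimate finger height."""
    capacitance = max(min(capacitance, C_TOUCH), 1e-3)
    return -DECAY_MM * math.log(capacitance / C_TOUCH)

def is_pull(readings, min_lift_mm=5.0):
    """Detect a pull: estimated height rises steadily across successive samples."""
    heights = [hover_height_mm(c) for c in readings]
    rising = all(b >= a for a, b in zip(heights, heights[1:]))
    return rising and (heights[-1] - heights[0]) >= min_lift_mm

print(is_pull([100, 70, 45, 28, 15]))   # finger lifting away from the screen -> True
```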
That scene in the 2002 movie Minority Report, where Tom Cruise manipulates data on a series of large screens with waves of his hands, is moving from science fiction into reality, as computer scientists develop new types of user interfaces that go beyond the classic mouse and keyboard, from touch and voice to tracking gestures or detecting a user’s gaze.
John Underkoffler, chief scientist at Oblong, of Los Angeles, CA, designed the computer interface for that movie scene, and his design is making its way into the real world. In Oblong’s Boston office, Carlton Sparrell stands in front of three large display screens and makes a finger pistol with his right hand, index finger pointed straight, thumb cocked upright above it. Where he’s pointing, a cursor appears and moves as his hand moves. He hovers over an object and drops his thumb to select it, pulls it over to an adjacent screen, and lifts his thumb again, releasing it.
When Sparrell, Oblong’s vice president of product development, makes a circle with his thumb and forefinger, the objects on the screen form a circle. Swiping his hand to the left or right scrolls through a list of files. Pushing toward the screen shrinks an image while pulling back enlarges it. When he puts both arms up in an “I surrender” gesture, the screen resets to where it started.
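In code, a vocabulary like the one Sparrell demonstrates often reduces to a dispatch table once a recognizer has classified the pose. The sketch below is a toy illustration with invented gesture names and handlers; it is not g-speak’s actual event model.

```python
# A toy gesture-to-command dispatch table, assuming a recognizer has already
# labeled the hand pose. Names and handlers are illustrative only.

def select_under_cursor():    print("select object under cursor")
def arrange_in_circle():      print("arrange objects in a circle")
def scroll_files(direction):  print(f"scroll file list {direction}")
def zoom(factor):             print(f"zoom by {factor}")
def reset_display():          print("reset display to initial state")

GESTURE_COMMANDS = {
    "thumb_drop":    select_under_cursor,
    "finger_circle": arrange_in_circle,
    "swipe_left":    lambda: scroll_files("left"),
    "swipe_right":   lambda: scroll_files("right"),
    "push_forward":  lambda: zoom(0.5),   # pushing toward the screen shrinks
    "pull_back":     lambda: zoom(2.0),   # pulling back enlarges
    "both_arms_up":  reset_display,
}

def on_gesture(name):
    handler = GESTURE_COMMANDS.get(name)
    if handler:
        handler()

on_gesture("both_arms_up")   # the "I surrender" gesture resets the screen
```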
Underkoffler says gestural input could have as significant an impact on the way humans interact with computers as Apple’s popularization of the graphical user interface with the Macintosh in 1984. “There hasn’t been a really big breakthrough in user interface thinking or user interface technology in 30 years,” he says. “We’re still stuck with the Mac interface.”
Oblong’s main product is g-speak, its gesture-recognition software platform. In the setup Sparrell demonstrated, he wore black gloves with small tabs on the fingers and the backs of the hands. Cameras positioned around the room track light reflected from the tabs to give the computer x, y, and z coordinates for the hands and fingers.
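Once the system has those coordinates, placing the cursor where the user is pointing is ordinary ray/plane geometry: extend the line from knuckle to fingertip until it meets the display plane. The sketch below shows that step with assumed coordinate conventions; it is not Oblong’s code.

```python
# A small sketch of mapping a tracked pointing ray onto a display plane.
# Coordinate conventions (display in the z = screen_z plane, user at
# positive z, units in meters) are assumptions for illustration.

def cursor_on_screen(knuckle, fingertip, screen_z=0.0):
    """Intersect the knuckle->fingertip ray with the plane z = screen_z."""
    dx = fingertip[0] - knuckle[0]
    dy = fingertip[1] - knuckle[1]
    dz = fingertip[2] - knuckle[2]
    if dz >= 0:
        return None                      # not pointing toward the screen
    t = (screen_z - knuckle[2]) / dz     # parameter along the pointing ray
    return (knuckle[0] + t * dx, knuckle[1] + t * dy)

# Fingertip slightly ahead of and below the knuckle, two meters from the screen:
print(cursor_on_screen((0.0, 1.5, 2.0), (0.02, 1.49, 1.9)))
```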
However, the sensor technology can vary. Another system set up in the company’s office uses the same PrimeSense camera found in Microsoft’s Kinect to track hand motions without gloves, though Underkoffler says glove-free infrared tracking does not yet provide high-enough resolution to be reliable. Oblong also makes a system for office conference rooms that relies on a wand, with a gyroscope and ultrasonic sensors to track its position and buttons to select items on the screen. That system lets participants in a meeting download an app and control the display from the keyboards on their laptops or the touchscreens on their tablets or smartphones.
Early reviews of Leap Motion’s controller were mixed, with the main complaints being that not many apps accepted gestural input, and that different apps used different gestures to perform similar functions. David Holz, co-founder and chief technology officer of Leap Motion, says app developers and users will eventually settle on a gestural language that makes sense. The challenge when introducing a new input device, he argues, is getting away from adapting actions enabled by a previous type of input and figuring out the unique abilities of the new device. “We’re leaving it open for experimentation,” Holz says. “Some of this is us building our own intuition.”
For instance, drop-down menus exist because they are a handy way to access files via a mouse. For gesture, however, they might not be as useful as, say, making a sweeping hand motion to slide past a set of options. A lot of the actions that seem easy with a mouse and keyboard are that way just because users have been using them for so long. Holz says gesture provides a three-dimensional interaction that more closely matches how people manipulate objects in the real world than does moving a mouse across one two-dimensional plane to affect action in another two-dimensional plane. Mouse movements are an abstraction that slow down whatever the user is trying to do, he says. “Every time we sort of go one step closer to what is going on inside our minds, it lets us remove one level of abstraction that doesn’t need to be there.”
Underkoffler calls gesture “a higher-bandwidth way of getting information in and out of the head of the human.” Anything that one would want to move around and through—a structural model of a protein, a brain scan, a panoramic photograph of a city—can be handled more smoothly with gesture than with a mouse, he says. “You can see stuff, because you can fly around it, that you never could see before.”
Others agree that gestural input can take advantage of intuitive ways of dealing with the real world. “We have a lot of intuition of how to manipulate groups of objects, and a lot of that intuition is lost when you move to a graphical user interface,” says Andy Wilson, who manages the Natural Interaction Research group at Microsoft Research. For instance, someone might arrange a group of files into a pile, the same way they would pile up folders on their desk; then they would be able to grab that pile and move it somewhere else. Such a task, he says, could happen more smoothly with gesture than with a mouse.
Among the projects Wilson is working on is the Wearable Multitouch Projector. A user wears a portable projector and depth-sensing system on his shoulder that projects an image onto available surfaces, such as walls or the palm of the user’s hand. When the user touches the projected image, the sensors detect finger motion and the computer responds appropriately, creating what is in essence a virtual touchscreen.
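The core trick of such a virtual touchscreen can be summarized in a few lines: compare the live depth of a fingertip against the depth of the empty surface at the same pixel, and call it a touch when the gap closes to within a small tolerance. The sketch below uses illustrative thresholds and data structures rather than the prototype’s actual parameters.

```python
# A condensed sketch of depth-based touch detection on a projected surface.
# The tolerance and the toy 2x2 depth frames are assumptions.

SURFACE_TOLERANCE_MM = 10   # fingertip this close to the surface counts as contact

def detect_touches(depth_frame_mm, surface_depth_mm, fingertip_pixels):
    """Return the (row, col) fingertip pixels that are touching the surface.

    depth_frame_mm / surface_depth_mm: 2D lists of per-pixel depth in mm,
    the latter captured once while the surface is empty (background model).
    fingertip_pixels: candidate fingertip locations from a hand tracker.
    """
    touches = []
    for (r, c) in fingertip_pixels:
        gap = surface_depth_mm[r][c] - depth_frame_mm[r][c]
        if 0 <= gap <= SURFACE_TOLERANCE_MM:
            touches.append((r, c))
    return touches

# A fingertip at pixel (0, 1) hovering 6 mm above the surface registers as a touch:
print(detect_touches([[500, 494], [500, 500]],
                     [[500, 500], [500, 500]],
                     [(0, 1)]))
```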
There are other ways to interact with the images on a computer screen; notably, by looking at them. Tobii Technology, of Stockholm, Sweden, uses a camera in its Gaze system to track where a user’s pupils are pointing. “If you think about it, just about everything you do in front of a computer, it all starts with what you are looking at,” says Carl Korobkin, vice president of business development at Tobii.
That allows tasks to be activated by eye tracking alone. Desktop widgets that provide weather reports or stock updates can be opened with a glance. “It can tell you’re looking at it, you want some information; it provides the information,” Korobkin says.
Another way the eye tracker differs from the mouse is that it lets users open hidden files just by looking off the edge of the display in a particular direction. Pages that extend beyond the screen can automatically scroll as they are read. When the user does need to click on something, simply tapping a touch pad or saying “open” while looking at it will do.
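A minimal sketch of that “look to target, confirm with a tap or a word” pattern appears below; the widget layout and hit-testing are hypothetical and not part of Tobii’s API.

```python
# The gaze tracker continuously reports where the user is looking; a separate
# low-effort signal (a touchpad tap, the word "open") triggers the action on
# whatever sits under the gaze point. Widget names and geometry are invented.

WIDGETS = {
    "weather": (0, 0, 200, 120),      # name -> (x, y, width, height) in pixels
    "stocks":  (220, 0, 200, 120),
}

def widget_under_gaze(gaze_x, gaze_y):
    for name, (x, y, w, h) in WIDGETS.items():
        if x <= gaze_x < x + w and y <= gaze_y < y + h:
            return name
    return None

def on_confirm(gaze_x, gaze_y):
    """Called when the user taps the touch pad or says 'open'."""
    target = widget_under_gaze(gaze_x, gaze_y)
    if target is not None:
        print(f"opening {target}")

on_confirm(50, 40)    # user glancing at the weather widget -> "opening weather"
```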
Nobody is arguing that any one input technology will be the only one used, or that the mouse will disappear. “The future is going to be a lovely plurality, when you really will pick up the right tool for the task,” says Underkoffler. “I think the keyboard is not going to get replaced for a really long time, because it is really good at text.”
One advantage of having multiple input modalities (and systems designed to accept different types of input) is that people with physical or cognitive disabilities will have more options for interacting with computers, making them more accessible. Gaze tracking, for instance, could help people with limited hand mobility, and could also work better in situations where touch is difficult, such as a sterile environment like a cleanroom or operating room. Voice might be the preferred input for blind people, as well as for drivers who don’t have their hands free.
Sensor technology is already adequate for these inputs, and is likely to improve to the point where gloves will not be needed, says Underkoffler. And the processing demands aren’t overwhelming; g-speak runs on a $5 ARM chip. The challenge will be in developing ways to exploit the potential of new types of computer interaction. “We’re going to discover new techniques, new principles, new work patterns,” Underkoffler says, “that are going to be pretty radically transformative.”
Further Reading
Wearable Multitouch Projector http://research.microsoft.com/apps/video/default.aspx?id=160684
Aigner, R., Wigdor, D., Benko, H., Haller, M., Lindbauer, D., Ion, A., Zhao, S., and Koh, J.
Understanding Mid-Air Hand Gestures: A Study of Human Preferences in Usage of Gesture Types for HCI, Microsoft Research, November 2012.
Wachs, J.P., Kölsch, M., Stern, H., and Edan, Y.
Vision-Based Hand-Gesture Applications, Communications of the ACM 54, February 2011.
Pouke, M., Karhu, A., Hickey, S., and Arhippainen, L.
Gaze tracking and non-touch gesture based interaction method for mobile 3D virtual spaces, Proceedings of the 24th Australian Computer-Human Interaction Conference, Melbourne, Australia, November 2012.