Communications of the ACM

ACM News

A New Way to Recognize Objects

Google's Project Soli uses miniature radar to detect gestures.

The St Andrews Computer Human Interaction research group at the University of St Andrews is exploring Solinteraction, a way to use radar to enable a computer to identify an object and track its movement.

Credit: Google

These days, there are a lot of ways to interact with a computer using your hands. You can use a standard mouse or trackball, or wave a gyroscopic mouse in the air. More exotic devices use cameras to track the motion of your fingers, or armbands that interpret your nerve twitches.

These approaches all require purpose-built devices. But what if you could interact with a computer by manipulating any object that happens to be available?

That's the goal of the St Andrews Computer Human Interaction research group (SACHI) at the University of St Andrews in St Andrews, Scotland. Under the direction of the group's chair, Aaron Quigley, the researchers are exploring what they call Solinteraction, a way to use radar to enable a computer to identify an object and track its movement.

"I don't want to have to change the object," says Quigley. "I don't want to have to put a sensor into it or go to some sort of menu setting. I literally just want to be able to pick something up and have the system know what I'm touching."

The work is still in its early days, exploring ways to identify and track planar objects on a surface—stacked poker chips, playing cards, and so on. The researchers envision several real-world applications, such as a casino gaming table that could count cards and record bets itself.

The roots of Solinteraction

The Solinteraction project builds on Google's Advanced Technology and Projects (ATAP) Soli sensor, an integrated, low-power radar system on a custom-built chip that incorporates both the transmitting and receiving antennae. The intention in developing Soli was to enable touchless computer interactions by tracking finger motions.

In 2015, Google made Soli systems available in developer kits, and the SACHI group applied for and obtained one. The group's goal was to explore one-handed interaction techniques with smartphones or other handheld devices. "Let's say you're holding a coffee mug in one hand, and your reach with your thumb is not sufficient for you to be able to interact with it with your other hand," says Quigley. "The manufacturers would like you to put down the mug, but we don't want to do that, because sometimes your second hand is doing something quite important: holding a rail on a bus or train or something like that."

The SACHI team's first idea was to place the Soli against the user's wrist and use it to detect blood flow and skeletomuscular movements. They discovered such movement overloaded the device with signal, but at the same time they could determine the primary characteristics of the object. "We noticed that as we were placing different wrists on the device, there were variations in the signal, but each signal became very stable. Those two characteristics allowed us a space to develop Radar Cat," a Soli-based device for recognizing and classifying objects and materials in real time.

Radar Cat uses supervised machine learning to categorize the objects it scans. However, the group found that as they rotated objects, the radar signature would shift, so the pattern-matching algorithms did not work well. The Solinteraction project is an effort to extend Radar Cat principles to support tangible interactions by enabling the computer to not simply recognize static materials and objects, but also to track their movements.

How it works

Quigley points out that radar sensing is a "very noisy proposition," and that it's easy to overstate what it can actually do. "We have to be very careful to say that this is really focusing on planar objects that appear in stacks," he says. The team uses the sensor to track the order of the stacks, and how objects are added to or removed from them.

The Soli sensor incorporates two emitting elements and four receiving elements. The emitting elements are patch radars (narrowband, wide-beam, two-dimensional antennae) with the ability to change the power distribution to the two emitters. That enables beamforming, so "you can actually direct the antenna in different directions," says Quigley.

To interpret the radar signals, the team started with the random forest machine learning technique they had used with Radar Cat. (The random forest approach relies on an ensemble of decision trees whose individual predictions are combined.) However, that had limitations. "With supervised machine learning, we can put down two cards or 10 cards or three paper coasters," explains Quigley, "but what we can't do is put down an egg timer filled with sand, because the amount of sand is going to be changing. The technique to actually identify how much sand is inside of a container needs to take a different approach."
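As a rough illustration of the random forest idea mentioned above, the following toy sketch trains many one-split decision trees ("stumps") on random resamples of the data and lets them vote. It is not the SACHI team's code: the two numeric features, the "card" and "coaster" labels, and all function names are invented for illustration, and a real forest would use deeper trees and far more features.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Fit a one-split decision tree on a single feature."""
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted(set(row[feature] for row in rows))
    best = None  # (errors, threshold, left_label, right_label)
    # Try a threshold halfway between each pair of neighbouring values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # only one distinct value: always predict the majority
        return (feature, float("inf"), majority, majority)
    return (feature, best[1], best[2], best[3])

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in picks],
                                  [labels[i] for i in picks], feature))
    return forest

def predict(forest, row):
    """Return the label that receives the most votes across all stumps."""
    votes = Counter(left if row[feature] <= threshold else right
                    for feature, threshold, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical, hand-made "radar features" for two kinds of object.
rows = ([[1.0 + 0.1 * i, 2.0 + 0.1 * i] for i in range(10)]
        + [[5.0 + 0.1 * i, 8.0 + 0.1 * i] for i in range(10)])
labels = ["card"] * 10 + ["coaster"] * 10

forest = train_forest(rows, labels)
print(predict(forest, [1.5, 2.5]))
print(predict(forest, [5.5, 8.5]))
```

The sketch also hints at the limitation Quigley describes: a classifier like this can only return one of the labels it was trained on, so a continuously varying quantity such as the amount of sand in a timer calls for a different technique.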

The researchers ended up using the random forest approach "to kind of get us into the quick bucket," says Quigley. They then combined it with modified supervised machine learning. "By the end we had about 700 features, all with the intention that they should be computationally efficient, because we imagine that the sensing will eventually need to run on a watch or an embedded device or something like that," he says.

Potential applications

The SACHI team identified six "sensing modes": count, order, identity, orientation, movement, and distance. Their initial work determined that they can recognize objects of varying shapes, but of the same material (cutlery, for example), as well as even more similar objects, like different credit cards or dominoes.

The researchers propose multiple ways those six modes can be used together in potential interactions. For example, by combining counting and ordering, a small number of objects can provide a large range of input options: stacking one to three chips in different orders affords 15 different possible inputs (with three distinguishable chips, 3 single chips plus 6 ordered pairs plus 6 ordered triples). Extending those ideas, they propose several application scenarios in addition to the gaming table, including educational tools in which tokens could represent mathematical operations; verification of the subassemblies of a complex model before final construction; retail or dining scenarios in which customers can position a token to choose an item or place an order; and smart home situations in which moving a lamp could control an entire room's lighting, or opening a book to a certain page could bring up related content on the computer screen.
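The count of 15 inputs can be checked with a short enumeration. This is a minimal sketch that assumes three distinguishable chips, each used at most once per stack; the chip labels and the function name are hypothetical.

```python
from itertools import permutations

def ordered_stacks(chips, max_height=None):
    """List every ordered stack (bottom to top) of 1..max_height distinct chips."""
    if max_height is None:
        max_height = len(chips)
    stacks = []
    for height in range(1, max_height + 1):
        # permutations() yields each ordered selection of `height` chips once.
        stacks.extend(permutations(chips, height))
    return stacks

chips = ["red", "green", "blue"]  # hypothetical chip labels
stacks = ordered_stacks(chips)
print(len(stacks))  # 3 single chips + 6 ordered pairs + 6 ordered triples = 15
```

Each distinct stack could then be mapped to a separate command, which is how three physical tokens can stand in for 15 inputs.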

"There are three ways to think about it," says Quigley. "First is as a sensor in the environment. Let's say you had one of those Google Home devices, and it could know what objects were on or in proximity to it—that's one type of input. A second type of input is where it's in a phone, and you pick up your phone and gesture at it from a distance.

"But one of my visions is that it's in a watch," he continues. "Because it's on your wrist, as your hands are touching things in the environment, it can know, 'oh, you've picked up a mug' and have an interface appear. It's blending reality. You have this digital information and you have this physical world. One important characteristic to be able to stitch them together is to be able to recognize the objects that you're interacting with."

Future developments

The limited work Quigley's group has done so far has not stopped them from envisioning scenarios that could become possible with more powerful and ubiquitous sensors. Experiments testing whether radar can differentiate among liquids of differing compositions suggest that a system could eventually determine the nutritional composition of food and drink. A sensor that can "read" a person's skin could supplement existing forms of biometric recognition. And putting Soli-like sensors into clothing could enable location awareness through identification of the materials in a room.

"Using it on the go and recognizing different new objects, that's far in the future," says Quigley. "But imagine if augmented reality becomes commonplace. It means that the physical world needs to interconnect. That's my mental model: to have the entire physical world be recognizable easily, and then programmable."

Jake Widman is a San Francisco, CA-based freelance writer focusing on connected devices and other Smart Home and Smart City technologies.

