
Communications of the ACM

ICT Results

Software: Running Commentary For Smarter Surveillance?


Hermes, messenger of the Greek gods

Credit: Mystic Medusa

Cutting-edge surveillance software that automatically detects human motion, behavior and facial expressions, generates a running commentary of what's happening and re-enacts events virtually could soon be helping police and security services.

The system, developed by a team of researchers from five European countries, provides a comprehensive and innovative solution to the information overload facing police forces and public and private security services.

With millions of surveillance cameras across Europe capturing what happens on city streets and at major meeting points such as airports, malls and buildings, monitoring and analyzing these video streams has become an epic task. Technology such as automated motion detection, object tracking and behavior analysis has eased some of the burden, but a gap persists between what surveillance cameras see and how that footage can be described and interpreted in terms a human operator or computer can understand. Bridging this semantic gap is important because meaningful descriptions of events can trigger meaningful automated or human responses that could spot a crime in progress, prevent injuries or save lives.

"The semantic gap in the analysis of human behavior from digital video is huge," says Andrew Bagdanov, a senior researcher at the Computer Vision Centre (CVC) of the Universitat Autònoma de Barcelona, Spain. "Most surveillance software operates only at a very low level . . . in order to bridge the gap it is necessary to build an artificial cognitive solution that operates at a much higher level, which is able to analyze footage, describe the events taking place and reason about what is going on."

Thanks to research carried out by a multidisciplinary team working in the HERMES project, an EU-funded initiative, such a solution now exists. HERMES, named after the messenger of the gods in Greek mythology, stands for Human Expressive Representations of Motion and their Evaluation in Sequences.

The state-of-the-art HERMES system consists of a scalable, flexible platform, integrating software components that not only detect events in real time as they are filmed by surveillance cameras but also describe them semantically and react to them intelligently. It operates at three levels: tracking the movement of people and objects; monitoring the behavior of people; and, in the case of high-resolution footage taken at close quarters, detecting changes in facial expression.

Monitoring Motion, Detecting Behavior

Whereas most surveillance video tracking systems operate in a state of perpetual surprise, dumbly following a single target and struggling to reacquire it if lost, the HERMES tracking technology functions more like a human monitoring the same scene, making predictions about where a target is heading and also reacting to any other events in the scene that appear unusual.

"Say two people meet in the street and start to run. The system will detect the change in behavior and start to follow them. It could alert a human operator if the pattern of behavior seems suspicious . . . such as if it appears someone has had their bag stolen," says Bagdanov, who oversaw the project's validation activities.

Using a combination of static cameras, which provide an overall view of an area, and Pan-Tilt-Zoom cameras, so-called "eyes in the sky" that zoom and move to follow a target, the system is able to automatically track a person as they walk down a street or even across an entire city.

This smarter tracking is made possible by the HERMES researchers' approach to bridging the semantic gap. Instead of tracking objects in a scene directly (the current, low-level approach), the HERMES platform generates a running commentary in natural language text of what is going on: "A pedestrian labeled 'Actor 3' appears in the field of view," "He moves on the southeastern sidewalk," "Actor 3 stands nearby another pedestrian," and so on.

This semantic information, generated automatically in real time, is then used by the artificial cognitive system to reason about events and behaviors of interest. Human operators, in turn, receive a more accurate description of what is occurring, and can more easily and quickly retrieve specific scenes from a recording with a simple text-based search. The current version of the system can generate text in six different languages.
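To make the idea concrete, the following is a minimal Python sketch of how tracking events might be turned into a text commentary that can then be searched by keyword. It is an illustration only, not the HERMES implementation: the `TrackEvent` structure, the event kinds, the sentence templates and the `search` function are all assumptions invented for this example, loosely modeled on the sample sentences quoted above.

```python
from dataclasses import dataclass


@dataclass
class TrackEvent:
    """Hypothetical low-level event emitted by a tracker (not a HERMES type)."""
    time_s: float      # seconds since the start of the recording
    actor: str         # label assigned to the tracked person
    kind: str          # one of the keys in TEMPLATES
    detail: str = ""   # location or other actor, depending on the kind


# Assumed sentence templates, one per event kind.
TEMPLATES = {
    "appear": "A pedestrian labeled '{actor}' appears in the field of view.",
    "move": "{actor} moves on the {detail}.",
    "near": "{actor} stands near {detail}.",
}


def describe(event):
    """Render a single event as a natural language sentence."""
    return TEMPLATES[event.kind].format(actor=event.actor, detail=event.detail)


def build_commentary(events):
    """Return (timestamp, sentence) pairs ordered by time."""
    ordered = sorted(events, key=lambda e: e.time_s)
    return [(e.time_s, describe(e)) for e in ordered]


def search(commentary, query):
    """Case-insensitive substring search over the generated sentences."""
    needle = query.lower()
    return [(t, s) for t, s in commentary if needle in s.lower()]
```

Given a handful of events for "Actor 3", `build_commentary` yields a time-ordered list of sentences, and `search(commentary, "sidewalk")` returns only the entries that mention a sidewalk, along with their timestamps. A real system would need far richer language generation and reasoning than fixed templates; the sketch only shows why text descriptions make recorded scenes easy to retrieve.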

Virtual Scenes, New Angles

Generating semantic information from video in this way also enabled the HERMES researchers to develop another powerful tool as part of the system: a virtual 3D representation of the scene.

"The virtual graphical representation of the footage is generated in near real time and can be displayed alongside the actual video stream. Because it is virtual and 3D it allows operators to look at events from angles they would otherwise be unable to," Bagdanov notes.

The outdoor applications for the system, focused primarily on motion and behavior detection, were tested extensively in Barcelona earlier this year, where cameras attached to the CVC building were used to monitor events in the street outside.

"The system held up better than we expected, though when there are more than 20 people in the scene it starts to break down. This, however, is a problem that can be solved with more cameras and more computer processing power, so the system should scale well," Bagdanov says.

Indoor applications of the system were developed and tested at ETH Zurich in Switzerland and Oxford University in the United Kingdom, both project partners. There, the facial expression recognition component showed the potential for the system to detect different emotions, especially powerful ones such as fear or anger.

Though facial expression detection does have security applications, Bagdanov notes that the technology could prove useful in research on human-computer interaction, for example, to make communication between humans and robots more natural.

"The HERMES project focused principally on developing technology for security and surveillance, but our research has uses in many other fields, not least human-computer interaction, natural language processing, multimedia communications and semantic annotation and search," the project technical coordinator says.

He notes that several project partners are developing commercial applications based on the work carried out in HERMES, and that one or more spin-off companies are under consideration.

The HERMES project received funding from the ICT strand of the EU's Sixth Framework Program for research.

From ICT Results


 
