
Communications of the ACM

Research highlights

Technical Perspective: A Whitebox Solution for Blackbox-Like Behaviors



Deep neural networks (DNNs) are rapidly becoming an indispensable part of the computing toolbox. They have been particularly successful at translating the messy analog world into forms that more conventional computing techniques can process; image and speech recognition are among the most obvious examples.

The price we pay, however, is inscrutability: DNNs behave like black boxes, without clearly explainable logic for their functioning. Admittedly, most complex software systems are also nearly impossible to reason about fully, but for them we have developed, and continue to develop, methods for formally reasoning about and extensively testing critical components. Almost nothing equivalent exists for DNNs. This gap is particularly worrying precisely because DNNs let us extend computing into previously inaccessible domains. In at least one area of medical diagnostics, identifying diabetic retinopathy, DNN-based approaches already match expert human performance, but we have little experience yet to help us understand what kinds of bugs those systems may fall prey to when deployed in the real world.




DeepXplore brings a software testing perspective to DNNs and, in doing so, opens the door to an enormous amount of follow-on work in several ways. Much of the prior work on finding errors in DNNs focused on finding individual adversarial modifications of images, without explicitly seeking diversity in the computational paths the DNN takes to produce them. The metric introduced in DeepXplore, neuron coverage, is an analogue of the code coverage metric traditionally used in software testing. This metric has utility beyond the techniques used in DeepXplore; in security bug hunting, for example, coverage-guided fuzzing has proven to be a powerful and effective technique, and the neuron coverage metric and its derivatives can enable similar approaches in the DNN context.
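
To make the analogy with code coverage concrete, here is a minimal Python sketch of neuron coverage: a neuron counts as covered if its output exceeds a threshold t for at least one test input, and coverage is the fraction of all neurons that are covered. The tiny network, the per-layer min-max scaling of outputs, the threshold t = 0.5, and the helper names are illustrative assumptions, not details taken from this article.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weights, bias); weights[j] feeds neuron j


def layer_outputs(layers: Sequence[Layer], x: Vector) -> List[Vector]:
    """ReLU output of every layer of a fully connected network for input x."""
    outputs = []
    for weights, bias in layers:
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, bias)]
        outputs.append(x)
    return outputs


def scaled(v: Vector) -> Vector:
    """Min-max scale one layer's outputs to [0, 1]; a constant layer maps to 0."""
    lo, hi = min(v), max(v)
    if hi == lo:
        return [0.0] * len(v)
    return [(a - lo) / (hi - lo) for a in v]


def neuron_coverage(layers: Sequence[Layer], inputs: List[Vector],
                    t: float = 0.5) -> float:
    """Fraction of neurons whose scaled output exceeds t for at least one input."""
    covered = set()
    for x in inputs:
        for li, out in enumerate(layer_outputs(layers, x)):
            for ni, a in enumerate(scaled(out)):
                if a > t:
                    covered.add((li, ni))
    total = sum(len(bias) for _, bias in layers)
    return len(covered) / total


# Hypothetical 2-3-2 network; neuron 2 of the first layer can only be
# active when x0 + x1 < 0.
NET: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0]),
]

print(neuron_coverage(NET, [[1.0, 0.0], [0.0, 1.0]]))               # 0.6
print(neuron_coverage(NET, [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]))  # 1.0
```

As with code coverage, two inputs that look different can still exercise the same neurons; only the input with a negative sum reaches the last two uncovered neurons here.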

I often tell students who are just starting to learn about research to keep an eye out for the papers in an area that everyone else claims to have beaten: Those are the papers that stimulated other researchers. DeepXplore will be such a paper. Its specific metrics and constraints on example generation are unlikely to be the final word in DNN testing, but the work that follows will exist because researchers saw these ideas and tried to improve upon them. The core framework from DeepXplore will likely endure: Establish an effective coverage metric based on the numerical activation values of the neural network's neurons, and use a constrained search procedure to maximize coverage with respect to that metric.
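
The two-part framework described above (a coverage metric over activations plus a constrained search that maximizes it) can be sketched as a small search loop. This is a hedged illustration, not DeepXplore's procedure: DeepXplore uses gradient ascent, whereas this sketch swaps in random mutation in the style of coverage-guided fuzzing. The network, the L-infinity radius eps, the valid input range [-1, 1], and the names covered_by and constrained_search are all hypothetical.

```python
import random
from typing import List, Set, Tuple

Vector = List[float]

# Hypothetical 2-3-2 ReLU network: one (weights, bias) pair per layer,
# where weights[j] holds the incoming weights of neuron j.
NET = [
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0]),
]
TOTAL_NEURONS = sum(len(bias) for _, bias in NET)


def covered_by(x: Vector, t: float = 0.5) -> Set[Tuple[int, int]]:
    """Neurons (layer, index) whose min-max-scaled ReLU output exceeds t on x."""
    hits = set()
    for li, (weights, bias) in enumerate(NET):
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, bias)]
        lo, hi = min(x), max(x)
        for ni, a in enumerate(x):
            # A layer with constant output counts as uncovered.
            if hi > lo and (a - lo) / (hi - lo) > t:
                hits.add((li, ni))
    return hits


def constrained_search(seeds: List[Vector], eps: float = 1.5,
                       lo: float = -1.0, hi: float = 1.0,
                       steps: int = 200, rng_seed: int = 0
                       ) -> Tuple[List[Tuple[Vector, Vector]], float]:
    """Randomly perturb seeds, keeping candidates that cover new neurons.

    Each candidate stays within eps of its seed (L-infinity) and inside
    [lo, hi]; seeds are assumed to lie inside [lo, hi] already.
    """
    rng = random.Random(rng_seed)
    covered: Set[Tuple[int, int]] = set()
    for s in seeds:
        covered |= covered_by(s)
    generated: List[Tuple[Vector, Vector]] = []  # (seed, new input) pairs
    for _ in range(steps):
        seed = rng.choice(seeds)
        # Perturb each feature by at most eps, then clip to the valid range.
        # Clipping only moves a value toward the in-range seed, so the eps
        # bound still holds afterward.
        cand = [min(hi, max(lo, s + rng.uniform(-eps, eps))) for s in seed]
        new = covered_by(cand) - covered
        if new:  # keep only inputs that exercise previously unseen neurons
            covered |= new
            generated.append((seed, cand))
        if len(covered) == TOTAL_NEURONS:
            break
    return generated, len(covered) / TOTAL_NEURONS


seeds = [[1.0, 0.0], [0.0, 1.0]]
generated, coverage = constrained_search(seeds)
print(f"coverage {coverage:.2f} after adding {len(generated)} inputs")
```

Keeping only inputs that add new coverage mirrors how coverage-guided fuzzers grow their corpus. A real testing tool would also need an oracle to decide which generated inputs actually expose bugs; DeepXplore, for instance, compares the outputs of several DNNs.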


Author

David G. Andersen is a professor in the computer science department at Carnegie Mellon University, Pittsburgh, PA, USA, and is CTO of BrdgAI.


Footnotes

To view the accompanying paper, visit doi.acm.org/10.1145/3361566


Copyright held by author.
Request permission to (re)publish from the owner/author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.


 
