The past few years have seen an incredible rise in the use of smart systems based on artificial neural networks (ANNs), owing to their remarkable classification capability and decision making comparable to that of humans. Yet, as shown in Figure 1, the addition of even a small amount of noise to the input may cause these networks to produce incorrect results.13 This is an alarming limitation of ANNs, particularly for those deployed in safety-critical applications such as autonomous vehicles, aviation, and healthcare. For instance, consider a self-driving car using an ANN to perceive traffic signs, as shown in Figure 2; correct classification by the ANN in noisy real-world environments is crucial for the safety of humans and objects in the vicinity of the car.
Figure 1. Magnitudes of image input and the noise applied to it. The addition of noise causes the input previously classified correctly to be misclassified by the trained network.
Figure 2. The addition of small, imperceptible noise to traffic signs makes the trained network provide incorrect output classification.5
The standard design cycle of an ANN involves training the network on clean data, tuning the network's hyperparameters based on clean data, and ensuring acceptable accuracy of the trained network using clean data. However, noise is a ubiquitous component of the real world.
This means a trained ANN deployed in a real-world application is fed noisy data, to which it may respond in any of a wide spectrum of unexpected ways. This calls for a better understanding of the impacts of noise (that is, the possibility of such unexpected responses) on a trained ANN before it is deployed in practical applications.12
Toward this end, this article provides an overview of the state of the art and its limitations for analyzing the impacts of noise on ANNs. It also provides an effective framework for identifying the vulnerabilities of trained ANNs pertaining to noise, and consequently discusses the impacts of noise on the ANNs' performance.
The existing literature, including both empirical and formal efforts, has been directed toward observing the response of trained ANNs to different noise bounds.3,5 However, given their completeness, formal methods3,4,11,14 are of particular interest for providing reliable guarantees about the behavior of ANNs under noise.9,13,14 Among these formal methods, linear programming and satisfiability (SAT) solving prevail for ANN analysis. Linear programming-based approaches model the ANN as a set of linear constraints and employ optimization for property verification.7,9,10 SAT solving-based works, on the other hand, aim to find a satisfying assignment corresponding to erroneous behavior by the ANN, using network models often expressed in conjunctive normal form (CNF).3,14
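To make the linear-programming flavor concrete, consider a minimal sketch (not any of the cited tools; the network and all its parameter values here are hypothetical). Assuming the ReLU activation pattern stays fixed within the noise region, the network restricted to that region is affine, and asking whether an ℓ∞-bounded noise can flip the classification reduces to maximizing a linear function over a box, which has a closed-form answer (a general LP solver would handle arbitrary linear constraints):

```python
import numpy as np

def reachable_misclassification(W1, b1, W2, b2, x0, eps, true_class):
    """For a one-hidden-layer ReLU net whose activation pattern at x0
    is assumed constant inside the eps-box, the network is affine:
    y = A x + c. A misclassification is reachable iff the maximum of
    (y_wrong - y_true) over the box is >= 0; over a box, a linear
    objective is maximized in closed form at a corner."""
    # Activation pattern at the seed input (assumption: constant in the box).
    active = (W1 @ x0 + b1) > 0
    A = W2 @ (W1 * active[:, None])          # effective linear map
    c = W2 @ (b1 * active) + b2              # effective offset
    wrong = 1 - true_class                   # binary classifier
    d = A[wrong] - A[true_class]             # direction favoring the wrong class
    margin0 = d @ x0 + (c[wrong] - c[true_class])
    worst = margin0 + eps * np.abs(d).sum()  # max of d@x over the eps-box
    return bool(worst >= 0)                  # True -> counterexample exists

# Tiny hypothetical network: 2 inputs, 2 hidden ReLUs, 2 output classes.
W1 = np.array([[1.0, -1.0], [0.5, 0.5]]); b1 = np.array([0.0, 0.0])
W2 = np.array([[1.0, 0.0], [0.0, 1.0]]);  b2 = np.array([0.0, 0.0])
x0 = np.array([2.0, -1.0])                 # seed input, classified as class 0
print(reachable_misclassification(W1, b1, W2, b2, x0, eps=0.1, true_class=0))
print(reachable_misclassification(W1, b1, W2, b2, x0, eps=2.0, true_class=0))
```

For larger bounds the fixed-pattern assumption breaks down; the cited tools handle this by enumerating or relaxing activation patterns.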
However, a major limitation of these works is the consideration of robustness as the only impact noise has on trained ANNs. Additionally, even though the steady use of linear programming and SAT solving has significantly advanced the state of the art in the formal analysis of trained ANNs, formal methods like model checking remain scarcely explored for ANN analysis. This work aims to address the aforementioned limitations by leveraging model checking to verify the various impacts of noise on ANNs, stretching beyond robustness.
As hinted earlier, the impacts of noise stretch beyond the correct/incorrect response to noisy inputs. Our work aims to explore the vast variety of ANNs' responses to noise, including checking robustness, identifying noise tolerance of the network, discovering an underlying bias in the network to certain output classes, and also studying the sensitivity of individual input nodes. To achieve this, we propose the use of the formal analysis framework FANNet,11 as summarized in Figure 3, which takes a systematic approach to analyzing the various responses of ANNs to noise. It leverages the automated operation of a model checker and, depending on the type of model checker used, can provide qualitative and/or quantitative results for the analysis.
Figure 3. Overview of the formal analysis framework FANNet. The framework writes a formal model for the trained ANN, verifies the ANN's properties, and identifies the noise tolerance of the ANN.
To achieve these targets, the framework uses the architectural details and trained parameter values to define a formal ANN model. The model is validated using seed inputs from the (labeled) testing dataset; that is, the results from the formal ANN model and the actual trained network must be identical. Predefined noise bounds are provided to the framework, from which the model checker non-deterministically selects incident noise vectors for the formal analysis. Properties defined in (probabilistic) temporal logic are then verified for the validated ANN model, with seed inputs under the incidence of bounded noise. These properties include:
Robustness. Given an ANN f: X→L(X), the addition of small noise η to a correctly classified input x ∈ X with classification label L(x) should not change the output classification of x.
Training bias. Given an ANN f: X→L(X), f is said to exhibit training bias if a correctly classified input x_b ∈ X_B from output class B, under the addition of small noise η, is more likely to be misclassified by f than a correctly classified input x_a ∈ X_A from output class A.
Input node sensitivity. Given an ANN f: X→L(X), an arbitrary input node x_i of the network is said to be insensitive if the addition of small noise η to that node, for a correctly classified input x ∈ X with classification label L(x), that is, f(x\x_i, x_i+η), does not make the ANN misclassify x.
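For intuition, the flavor of all three properties can be approximated empirically on a toy classifier (a hypothetical stand-in; FANNet verifies these properties symbolically with a model checker, whereas this sketch merely samples noise vectors):

```python
import random

def f(x):
    """Toy binary classifier (hypothetical stand-in for a trained ANN):
    class 0 if 2*x0 + x1 < 1, else class 1."""
    return 0 if 2 * x[0] + x[1] < 1.0 else 1

def misclass_rate(seed, label, eps, trials=10_000, node=None):
    """Estimate P(misclassification) under uniform noise in [-eps, eps],
    applied to every input node, or only to node `node` if given."""
    rng = random.Random(0)                   # fixed seed for reproducibility
    bad = 0
    for _ in range(trials):
        noisy = list(seed)
        idxs = range(len(seed)) if node is None else [node]
        for i in idxs:
            noisy[i] += rng.uniform(-eps, eps)
        bad += f(noisy) != label
    return bad / trials

seed_a, seed_b = [0.0, 0.2], [0.5, 0.2]     # correctly classified seeds (classes 0 and 1)
# Robustness / training bias: class-wise misclassification probability.
print(misclass_rate(seed_a, f(seed_a), eps=0.3))   # class-0 seed: far from the boundary
print(misclass_rate(seed_b, f(seed_b), eps=0.3))   # class-1 seed: near the boundary
# Input node sensitivity: noise applied to node 0 only vs. node 1 only.
print(misclass_rate(seed_b, f(seed_b), eps=0.3, node=0))
print(misclass_rate(seed_b, f(seed_b), eps=0.3, node=1))
```

In this toy setup the class-1 seed is misclassified far more often than the class-0 seed (a bias-like asymmetry), and node 0, whose weight is larger, is the more sensitive input node.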
Additionally, an iterative noise reduction loop is deployed in FANNet, which identifies the maximum noise η_max in the presence of which the trained ANN does not exhibit any misclassification. This provides the noise tolerance bounds of the network.
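The noise-reduction loop can likewise be sketched as a binary search for the largest noise bound under which no misclassification is found. Here the counterexample search is randomized sampling on a hypothetical toy classifier, whereas a model checker would make the "no counterexample" answer exhaustive rather than empirical:

```python
import random

def f(x):
    # Toy binary classifier standing in for the trained ANN (hypothetical).
    return 0 if 2 * x[0] + x[1] < 1.0 else 1

def has_counterexample(seeds, eps, trials=20_000):
    """Empirical stand-in for the model checker: search for a noise vector
    within [-eps, eps] that flips the classification of any seed input."""
    rng = random.Random(0)
    for _ in range(trials):
        x, label = seeds[rng.randrange(len(seeds))]
        noisy = [v + rng.uniform(-eps, eps) for v in x]
        if f(noisy) != label:
            return True
    return False

def noise_tolerance(seeds, eps_hi=1.0, iters=20):
    """Binary search for the largest eps with no observed misclassification."""
    lo, hi = 0.0, eps_hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if has_counterexample(seeds, mid):
            hi = mid          # counterexample found: tolerance lies below mid
        else:
            lo = mid          # none found: tolerance is at least mid
    return lo

seeds = [([0.0, 0.2], 0), ([0.9, 0.5], 1)]   # correctly classified seed inputs
print(round(noise_tolerance(seeds), 3))
```

The returned bound plays the role of η_max: any noise below it left every seed input correctly classified in this search.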
To demonstrate the impact of noise on a real-world dataset, we trained a single-hidden-layer, ReLU-based, fully connected binary classifier on the Leukemia dataset2 to training and testing accuracies of 100% and 94.12%, respectively. The following results were obtained using FANNet with the Storm model checker,1 on an AMD Ryzen Threadripper 2990WX processor running the Ubuntu 18.04 LTS operating system. It must be noted that FANNet is independent of the choice of model checker. Using a qualitative model checker like nuXmv, as in our prior work,11 essentially provides a binary (that is, SAT or UNSAT) result. Employing a quantitative tool like the Storm model checker, on the other hand, provides more precise results (that is, the exact probability that the property holds for the given network).
Figure 4 shows the probability of correct classification by the ANN under the influence of noise. As expected, the probability of correct classification decreases as the noise added to the seed inputs increases, indicating the decreasing robustness of the network. The change in probability becomes apparent, however, only once the incident noise exceeds the noise tolerance of the network. Figure 4 indicates this decrease can be attributed to the low classification accuracy of Class A under noise; even considerably large noise does not trigger incorrect classification for Class B, depicting a training bias in the network. This phenomenon can also be observed visually in Figure 5, where the addition of noise (that is, the z-axis) causes misclassification of inputs from Class A (blue points on the graph) but not from Class B (red points on the graph). In addition, Figure 6 demonstrates the varying sensitivities of individual input nodes, since increasing noise reduces the correct-classification probability for certain (sensitive) input nodes more than for others.
Figure 4. The effects of increasing noise on the probability of correct classification of inputs by the trained ANN.
Figure 5. Application of noise (z-axis) to inputs from Class A and B (x- and y-axis, respectively) results in inputs from Class A being misclassified significantly more compared to Class B, indicating a bias in the trained network.
Figure 6. The effects of increasing noise on the probability of correct classification of individual input nodes: different nodes may have different sensitivities to incident noise.
As observed in the aforementioned results, noise impacts trained ANNs in numerous ways. The robustness of a network generally drops when it is fed noisy inputs. Noise tolerance, on the other hand, is an essential (constant) attribute of a trained ANN that can aid system design by providing the acceptable noise level that ensures the robustness of the ANN. Training bias is a stealthy vulnerability of the ANN, which may grow under noisy input and may lead the ANN to provide incorrect responses in real-world applications. In addition, the analysis of individual input nodes may provide useful insights into the working of a trained ANN and pave a new way toward research on the interpretability of trained ANNs.
The findings from the framework can also be used to improve the training of ANNs. The knowledge of existing training bias in the dataset could be used for dataset resampling and bias-awareness training.6 Likewise, the noise tolerance and sensitivity of input nodes may be leveraged by multi-objective training algorithms to provide robust ANNs, in addition to providing highly accurate ANNs with optimal network architecture.8
Despite being an essential component of the real-world environment, noise is often absent in the design cycle of ANNs trained for practical systems. A vast literature covers the impact of such noise on the robustness of ANNs. However, the impacts of noise beyond robustness are rarely studied. We propose the use of the model checking-based framework FANNet to fill this gap in the state of the art. Such targeted analysis to study the impacts of noise on trained ANNs will likely improve the reliability and safety of future smart systems.
Acknowledgment. This work was partially supported by Doctoral College Resilient Embedded Systems, which is run jointly by TU Wien's Faculty of Informatics and FH-Technikum Wien.
1. Dehnert, C., Junges, S., Katoen, J.-P. and Volk, M. A Storm is coming: A modern probabilistic model checker. In Proceedings of the 2017 Intern. Conf. Computer Aided Verification, Heidelberg, Germany.
2. Golub, T.R. et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 5439 (1999), 531–537.
3. Huang, X., Kwiatkowska, M., Wang, S. and Wu, M. Safety verification of deep neural networks. In Proceedings of the 2017 Intern. Conf. Computer Aided Verification, Heidelberg, Germany.
4. Katz, G. et al. The Marabou framework for verification and analysis of deep neural networks. In Proceedings of the 2019 Intern. Conf. Computer Aided Verification, New York City, NY.
5. Khalid, F., Hanif, M.A., Rehman, S., Ahmed, R. and Shafique, M. TrISec: Training data-unaware imperceptible security attacks on deep neural networks. In Proceedings of the 2019 Intern. Symp. On-Line Testing and Robust System Design.
6. Li, Y. and Vasconcelos, N. REPAIR: Removing representation bias by dataset resampling. In Proceedings of the IEEE/CVF Conf. Computer Vision and Pattern Recognition (Long Beach, CA, USA, 2019).
7. Lin, W., Yang, Z., Chen, X., Zhao, Q., Li, X., Liu, Z. and He, J. Robustness verification of classification deep neural networks via linear programming. In Proceedings of the IEEE/CVF Conf. Computer Vision and Pattern Recognition (Long Beach, CA, USA, 2019).
8. Marchisio, A., Mrazek, V., Hanif, M.A. and Shafique, M. FEECA: Design space exploration for low-latency and energy-efficient capsule network accelerators. IEEE Trans. Very Large Scale Integration Systems 29, 4 (2021), 716–729.
9. Müller, M.N., Makarchuk, G., Singh, G., Püschel, M. and Vechev, M. PRIMA: General and precise neural network certification via scalable convex hull approximations. In Proceedings of the ACM on Programming Languages 6, (Jan. 2022), Article 43, 1–33.
10. Narasimhamurthy, M., Kushner, T., Dutta, S. and Sankaranarayanan, S. Verifying conformance of neural network models. In 2019 IEEE/ACM Intern. Conf. on Computer-Aided Design (Westminster, U.K., 2019).
11. Naseer, M., Minhas, F., Khalid, F., Hanif, M.A., Hasan, O. and Shafique, M. FANNet: Formal analysis of noise tolerance, training bias and input sensitivity in neural networks. In Proceedings of the 23rd Design, Automation and Test in Europe Conf. Exhibition (Grenoble, France, 2020).
12. Shafique, M., Naseer, M., Theocharides, T., Kyrkou, C., Mutlu, O., Orosa, L. and Choi, J. Robust machine learning systems: Challenges, current trends, perspectives, and the road ahead. IEEE Design & Test 37, 2 (2020), 30–57.
13. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. 2013; arXiv:1312.6199.
14. Wu, H. et al. Parallelization techniques for verifying neural networks. In Proceedings of Formal Methods in Computer Aided Design. Haifa, Israel, 2020.
©2022 ACM 0001-0782/22/11