Communications of the ACM

India Region Special Section: Hot Topics

Toward Explainable Deep Learning

[Illustration: workers with machinery surround an oversized robotic head. Credit: NITI Aayog]

Deep learning (DL) models have enjoyed tremendous success across application domains within the broader umbrella of artificial intelligence (AI) technologies. However, their "black-box" nature, coupled with their extensive use across application sectors (including safety-critical and risk-sensitive ones such as healthcare, finance, aerospace, law enforcement, and governance), has elicited an increasing need for explainability, interpretability, and transparency of decision-making in these models [11,14,18,24]. With the recent progression of legal and policy frameworks that mandate explaining decisions made by AI-driven systems (for example, the European Union's GDPR Article 15(1)(h) and the Algorithmic Accountability Act of 2019 in the U.S.), explainability has become a cornerstone of responsible AI use and deployment. In the Indian context, NITI Aayog recently released a two-part strategy document on envisioning and operationalizing Responsible AI in India [15,16], which puts significant emphasis on the explainability and transparency of AI models. Explainability of DL models lies at the human-machine interface, and different users may expect different explanations in different contexts. A data scientist may want an explanation that helps improve the model; a regulator may want an explanation that supports the fairness of decision-making; and a customer support agent may want one that helps answer a customer query. This subjectivity necessitates a multipronged technical approach, so that a suitable method can be chosen for a specific application and user context.

Researchers across academic and industry organizations in India have explored the explainability of DL models in recent years. A specific catalyst of these efforts was the development of explainable COVID-19 risk prediction models to support decision-making during the pandemic over the last two years [10,12,17]. Noteworthy efforts from research groups in India have focused on the transparency of DL models, especially in computer vision and natural language processing. Answering the question "Which part of the input image or document did the model look at while making its prediction?" is essential to validate DL model predictions against human understanding, and thereby increase the trust of human users in those predictions. To this end, methods have been developed to provide saliency maps (the regions of an image a DL model looks at while making a prediction) through gradient-based [19] and gradient-free [6] approaches in computer vision. Similar methods to provide transparency in attention-based language models have also been proposed [13]. Looking ahead to next-generation AI systems that can reason and strategize, Indian researchers have also addressed the integration of commonsense reasoning in language models [2], as well as obtaining model explanations using logic and neurosymbolic reasoning [1,21,22]. Industry researchers in India have also led and contributed to the development of practical software toolkits for explainability and its use in AIOps [3,4].
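The difference between gradient-based and gradient-free saliency can be illustrated on a toy model. The sketch below is an illustration only, not the method of any cited paper: `predict` is a hypothetical logistic-regression stand-in for a trained DL model, and `occlusion_saliency` uses plain input occlusion as a generic gradient-free technique (Ablation-CAM, by contrast, ablates CNN feature maps rather than input features).

```python
import math


def predict(weights, x):
    """Hypothetical stand-in for a trained model: a logistic score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(weights, x))))


def gradient_saliency(weights, x):
    """Gradient-based: magnitude of d(score)/d(x_i), in closed form for this model."""
    p = predict(weights, x)
    return [abs(p * (1.0 - p) * w) for w in weights]


def occlusion_saliency(weights, x, fill=0.0):
    """Gradient-free: change in score when feature i is replaced by `fill`."""
    base = predict(weights, x)
    return [
        base - predict(weights, x[:i] + [fill] + x[i + 1:])
        for i in range(len(x))
    ]


weights = [2.0, -1.0, 0.0]  # the third feature is ignored by the toy model
x = [1.0, 1.0, 1.0]
grad_scores = gradient_saliency(weights, x)
occl_scores = occlusion_saliency(weights, x)
```

Both scores assign zero importance to the third feature, which the toy model ignores. The occlusion score also keeps a sign: removing the second feature raises the prediction, so its score is negative, whereas the gradient magnitude only ranks sensitivity.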

Our extensive efforts at IIT-Hyderabad have mirrored the need to address the explainability of DL models from multiple perspectives, to benefit different groups of users in different settings. From a post-hoc explainability perspective (methods to explain a previously trained model), one of our earliest efforts, Grad-CAM++ [19], aimed to develop a generalized gradient-based visual explanation method for convolutional neural networks (CNNs) by considering pixel-level contributions from a given CNN layer toward the predicted output. This method has been widely used around the world for applications including healthcare, bioinformatics, agriculture, and energy informatics. We have since extended this gradient-based post-hoc perspective to obtain 3D model-normalized saliency maps for face image understanding (John et al. [7]). Complementarily, ante-hoc interpretability methods seek to bake the capability to provide explanations, along with a prediction, into a model's training process itself. Such methods help make a model accountable for its decisions, whereas post-hoc methods may use two different modules to predict and to explain, which poses challenges when assigning responsibility for errors. Our recent work in ante-hoc interpretability provides a methodology to learn concepts (for example, furry skin or whiskers for a "cat" category) implicitly inside a CNN during training itself [20], thus providing the ability to explain model predictions using such learned concepts. We have also recently studied how such concept prototypes used to explain a model can be transferred from one model to another [9].
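To make the idea of a gradient-based visual explanation concrete, the sketch below implements plain Grad-CAM, the simpler predecessor that Grad-CAM++ generalizes. It is not the Grad-CAM++ method itself. Plain Grad-CAM gives each feature map a single weight (the spatial average of its gradients), whereas Grad-CAM++ weights pixel-level gradient contributions individually. The function name and the nested-list inputs are assumptions made for illustration; in practice the activations and gradients would come from a DL framework.

```python
def grad_cam(activations, gradients):
    """Plain Grad-CAM map for one convolutional layer (illustrative sketch).

    activations: K feature maps, each an H x W nested list.
    gradients:   same shape; gradients[k][i][j] is the derivative of the
                 class score with respect to activations[k][i][j].
    """
    height, width = len(activations[0]), len(activations[0][0])
    # One weight per feature map: the spatial average of its gradients.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    return [
        [
            max(0.0, sum(w * fmap[i][j] for w, fmap in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Two hand-made 2 x 2 feature maps with constant gradients per map.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
cam = grad_cam(activations, gradients)  # [[0.5, 0.0], [0.0, 1.0]]
```

The second feature map has a negative weight, so the locations where only it fires are clipped to zero by the ReLU, and the map highlights the locations supported by the first feature map.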

When explaining DL models, an important consideration is whether an input variable merely correlates with the outcome or whether it actually caused the outcome. For example, in a DL model for a healthcare application that predicts the risk of an adverse cardiac event, hypertension may correlate with high risk but may not be causal by itself. Identifying the causal variable may be critical to providing actionable explanations in such risk-sensitive applications. To provide such causal perspectives in DL model explanations, we presented a first-of-its-kind causal approach in Chattopadhyay et al. [5] to explain DL model decisions through an efficient method to compute the Average Causal Effect of an input variable on the output. This can help us understand the causal relationships a DL model has learned. Complementarily, we have also recently shown how one can integrate known causal domain priors into a DL model's attributions during training itself [8], so the learned model captures this domain knowledge in its input-output relationships.
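The Average Causal Effect (ACE) of setting input x_i to a value α can be written as E[y | do(x_i = α)] minus a baseline, where the baseline averages that interventional expectation over the values of α. The sketch below is a brute-force Monte Carlo estimate, not the efficient computation developed in Chattopadhyay et al. It assumes that input features do not causally influence one another, so forcing x_i to α leaves the other features unchanged, and it takes the baseline as a uniform average over the supplied grid of α values. All names are hypothetical.

```python
def interventional_mean(model, data, i, alpha):
    """Estimate E[y | do(x_i = alpha)] by forcing feature i to alpha in every row."""
    outputs = [model(row[:i] + [alpha] + row[i + 1:]) for row in data]
    return sum(outputs) / len(outputs)


def average_causal_effect(model, data, i, alphas):
    """ACE of feature i at each alpha: interventional mean minus its grid average."""
    means = [interventional_mean(model, data, i, a) for a in alphas]
    baseline = sum(means) / len(means)
    return [m - baseline for m in means]


def toy_model(x):
    """Hypothetical learned model whose output depends only on feature 0."""
    return 3.0 * x[0]


# Feature 1 is perfectly correlated with feature 0 in the data.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
alphas = [0.0, 1.0, 2.0]

ace_feature_0 = average_causal_effect(toy_model, data, 0, alphas)  # [-3.0, 0.0, 3.0]
ace_feature_1 = average_causal_effect(toy_model, data, 1, alphas)  # [0.0, 0.0, 0.0]
```

In this toy data the second feature correlates perfectly with the output, yet intervening on it changes nothing, so its ACE is zero at every α. This is the distinction between correlation and causation drawn in the hypertension example.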

Despite the multifarious efforts on explainable DL across India and the world in recent years, many questions remain: How does one truly evaluate DL model explanations? What explainability method should one use for a given problem? Are there theoretical guarantees? Should one always consider causal relationships, or does correlation have its place in input-output attribution? One path to resolving such challenges is to recognize explainability as inherently a human-machine interface artifact, and to develop standards for specifications of user requirements, thereby allowing measurable outcomes.

While the technical needs for explainable DL are similar to those in the rest of the world, the adoption and implementation of explainable AI systems face unique challenges in India (and similar nations). A key challenge is the inherent diversity of the country in terms of literacy levels, technology access/awareness, spoken languages, and user expectations. Furthermore, the implementation of explainable AI systems requires a multidisciplinary undertaking that brings together technologists/researchers, industry practitioners, legal experts, law enforcement personnel, and policymakers; collaborations of this scale are nascent in India at this time and need concerted initiatives. Complete transparency of decision-making can also open security loopholes for potential malefactors, which is a concern for India and hence needs to be carefully considered before large-scale implementations. On the brighter side, the presence of significant manpower and AI skill penetration in recent years [23] puts India in a unique position to potentially be a torchbearer of explainable and responsible AI use, especially for developing and emerging economies across the world.


1. Agarwal, A., Shenoy, P., and Mausam. End-to-End Neuro-Symbolic Architecture for Image-to-Image Reasoning Tasks. CoRR abs/2106.03121 (2021).

2. Aggarwal, S., Mandowara, D., Agrawal, V., Khandelwal, D., Singla, P., and Garg, D. Explanations for CommonSenseQA: New Dataset and Models. In Proceedings of Annual Meeting of the Assoc. Computational Linguistics. 2021.

3. Arya, V. et al. AI Explainability 360: An Extensible Toolkit for Understanding Data and Machine Learning Models. J. Machine Learning Research 21, 130 (2020), 1–6.

4. Arya, V., Shanmugam, K., Aggarwal, P., Wang, Q., Mohapatra, P., and Nagar, S. Evaluation of Causal Inference Techniques for AIOps. In Proceedings of 8th ACM IKDD CODS and 26th COMAD. 2021.

5. Chattopadhyay, A., Manupriya, P., Sarkar, A., and Balasubramanian, V.N. Neural Network Attributions: A Causal Perspective. In Proceedings of Intern. Conf. Machine Learning. June 2019.

6. Desai, S. and Ramaswamy, H. G. Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization. In Proceedings of the IEEE Winter Conf. Applications of Computer Vision. 2020.

7. John, T.A., Balasubramanian, V.N., and Jawahar, C.V. Canonical Saliency Maps: Decoding Deep Face Models. IEEE Trans. Biometrics, Behavior, and Identity Science 3, 4 (Oct 2021), 561–572.

8. Kancheti, S.S., Reddy, A.G., Sharma, A., and Balasubramanian, V.N. Matching Learned Causal Effects of Neural Networks with Domain Priors. In Proceedings of Intern. Conf. Machine Learning. Jul 2022.

9. Keswani, M., Ramakrishnan, S.R., Nishant, and Balasubramanian, V.N. Proto2Proto: Can You Recognize the Car, the Way I Do? In Proceedings of IEEE/CVF Conf. Computer Vision and Pattern Recognition. Jun 2022.

10. Khincha, R., Soundarya, K., Tirtharaj, D., Lovekesh, V., and Ashwin, S. Constructing and Evaluating an Explainable Model for COVID-19 Diagnosis from Chest X-rays. 2020; arXiv:2012.10787.

11. Lecue, F., Guidotti, R., Minervini, P., and Giannotti, F. On Explainable AI: From Theory to Motivation, Industrial Applications and Coding Practices (Apr. 27, 2022).

12. Malhotra, A., Surbhi, M., Puspita, M., Saheb, C., Kartik, T., Mayank, V., Richa, S., Santanu, C., Ashwin, P., and Anjali, A. Multi-task Driven Explainable Diagnosis of COVID-19 Using Chest X-ray Images. Pattern Recognition, 122 (2022).

13. Mohankumar, A.K., Nema, P., Narasimhan, S., Khapra, M.M., Srinivasan, B.V., and Ravindran, B. Towards Transparent and Explainable Attention Models. In Proceedings of Annual Meeting of the Assoc. Computational Linguistics (2020), 4206–4216.

14. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2nd ed.). 2022.

15. NITI Aayog. Responsible AI: Approach Document for India: Part 1 – Principles for Responsible AI.

16. NITI Aayog. Responsible AI: Approach Document for India: Part 2 – Operationalizing Principles for Responsible AI.

17. Pandianchery, M.S., and Vinayakumar, R. Explainable AI Framework for COVID-19 Prediction in Different Provinces of India. 2022; arXiv:2201.06997.

18. Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., and Müller, K.R. (Eds.). Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer Nature, Lecture Notes in Computer Science 11700 (2019).

19. Sarkar, A., Chattopadhyay, A., Howlader, P., and Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks. In Proceedings of IEEE Winter Conf. Applications of Computer Vision, Mar. 2018.

20. Sarkar, A., Sarkar, A., Vijayakeerthy, D., and Balasubramanian, V.N. A Framework for Learning Ante-hoc Explainable Models via Concepts. In Proceedings of IEEE/CVF Conf. Computer Vision and Pattern Recognition, June 2022.

21. Srinivasan, A., Bain, M., and Baskar, A. Learning Explanations for Biological Feedback with Delays Using an Event Calculus. Machine Learning (Aug. 2021), 1–53.

22. Srinivasan, A., Vig, L., and Bain, M. Logical Explanations for Deep Relational Machines Using Relevance Information. J. Machine Learning Research 20 (2019), 1–47.

23. Stanford University HAI. Artificial Intelligence Index Report 2021.

24. Zhang, Y., Tio, P., Leonardis, A., and Tang, K. A Survey on Neural Network Interpretability. IEEE Trans. Emerging Topics in Computational Intelligence, 2021.


Vineeth N. Balasubramanian is an associate professor at the Indian Institute of Technology, Hyderabad in Kandi, India.

©2022 ACM  0001-0782/22/11

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2022 ACM, Inc.

