Communications of the ACM

Latin America Regional Special Section: Hot Topics

Contextualized Interpretable Machine Learning for Medical Diagnosis

The evolution of artificial intelligence and related technologies has the potential to drastically increase the clinical importance of automated diagnosis tools. Putting these tools into use, however, is challenging, since the algorithm's outcome will be used to make clinical decisions, and wrong predictions can prevent the most appropriate treatment from being provided to the patient. Models should provide not only accurate predictions but also evidence that supports their outcomes, so that they can be audited and their predictions double-checked. Some models are constructed in such a way that they are difficult to interpret, hence the name black-box models. While there are methods that generate explanations for generic black-box classifiers,9 these solutions are usually not tailored to the needs of physicians and do not take any medical background into consideration. Our claim in this work is that explanations must be based on features that are meaningful to physicians. We call these contextual features.

Deep neural networks are relevant examples of black-box models. These models, trained on large real datasets, have demonstrated the ability to provide extremely accurate diagnoses.1,5 However, these large and complex models of stacked transformations usually do not allow easy interpretation of their results. Despite their potential to transform healthcare and clinical practice,3,8 significant challenges must still be addressed. For instance, neural network results are commonly brittle, either because the network learns to solve the task in unwanted ways or because even small perturbations may have a huge impact on its outcome.2

Cardiovascular diseases are the leading cause of death worldwide,7 and the electrocardiogram (ECG) is a major exam for screening them (see Figure 1). Our immediate application scenario is the Telehealth Network of Minas Gerais (TNMG), which serves more than 1,000 remote municipalities in six Brazilian states. More than 2,000 ECGs are examined and reported daily by cardiologists using a Web-based system. Our goal is to empower those physicians not only with accurate, automatically generated disease predictions, but also with explanations that ease their understanding of the model outcome.

Figure 1. ECG samples for some common diseases.

Classical methods for automated ECG analysis, such as the University of Glasgow ECG analysis program,4 employ a two-step approach: first extracting the main features of the ECG signal using traditional signal processing techniques, and then using these features as inputs to a classifier. Deep learning presents an alternative to this approach: the raw signal itself is used as the input to the classifier, which learns from examples how to extract the features, as presented in our previous work.6 In the classical two-step approach, the models are built on top of measures and features that are known to physicians, making it easier to verify and understand the algorithm's decisions and to identify sources of algorithmic mistakes. Such transparency is lost in "end-to-end" deep learning approaches.
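To make the two-step structure concrete, the following is a minimal sketch in Python. It is not the Glasgow program or any production ECG analyzer: the peak detector, the function names, the single-lead toy signal, and the heart-rate thresholds (60 and 100 beats per minute) are all illustrative assumptions. It only shows that every intermediate feature is a named quantity a physician could inspect.

```python
from typing import Dict, List


def extract_features(signal: List[float], fs: float,
                     peak_threshold: float = 0.5) -> Dict[str, float]:
    """Step 1 (illustrative): derive hand-crafted features from the raw signal.

    A toy peak detector marks local maxima above `peak_threshold` as beats,
    then computes the mean beat-to-beat (RR) interval and the heart rate.
    """
    peaks = [
        i for i in range(1, len(signal) - 1)
        if signal[i] > peak_threshold
        and signal[i] >= signal[i - 1]
        and signal[i] > signal[i + 1]
    ]
    rr_seconds = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    if rr_seconds:
        mean_rr = sum(rr_seconds) / len(rr_seconds)
        heart_rate = 60.0 / mean_rr
    else:
        mean_rr = float("nan")
        heart_rate = float("nan")
    return {
        "n_beats": float(len(peaks)),
        "mean_rr_s": mean_rr,
        "heart_rate_bpm": heart_rate,
    }


def rule_classifier(features: Dict[str, float]) -> str:
    """Step 2 (illustrative): classify using only the named features."""
    hr = features["heart_rate_bpm"]
    if hr != hr:  # NaN check: fewer than two beats were detected
        return "undetermined"
    if hr > 100.0:  # illustrative threshold
        return "tachycardia"
    if hr < 60.0:   # illustrative threshold
        return "bradycardia"
    return "normal rate"


# Toy signal sampled at 100 Hz: one unit spike every 0.5 s for 10 s.
toy_signal = [1.0 if i % 50 == 0 and i > 0 else 0.0 for i in range(1000)]
toy_features = extract_features(toy_signal, fs=100.0)
toy_label = rule_classifier(toy_features)
```

Because the classifier sees only `toy_features`, a wrong label can be traced to a specific feature value. An end-to-end network consuming `toy_signal` directly offers no such intermediate checkpoint.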

To improve accuracy and transparency in automatic ECG analysis, we propose generating explanations based on contextual features for ECG diagnosis (Figure 2). To the best of our knowledge, this is the first work that generates explanations tailored to physicians' needs for ECG black-box algorithms, including end-to-end classification models. The proposed method (Figure 3) uses a noise-insertion strategy to quantify the impact of the ECG intervals and segments on the automated classification outcome and to generate features that are meaningful to the user. These intervals and segments, and their impact on the diagnosis, are familiar to cardiologists, and their use in explanations enables a better understanding of the outcomes as well as the identification of sources of mistakes. We applied our method to generate explanations for the predictions of the deep learning model presented in Ribeiro et al.6 using data from TNMG. Finally, we assessed our approach by analyzing the generated explanations in terms of their interpretability and robustness.
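The noise-insertion idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the model is treated as a black-box function returning one probability, each contextual feature is assumed to be given as a list of sample-index spans, the noise is assumed Gaussian, and the impact is assumed to be the absolute change in the predicted probability. All names (`add_noise`, `feature_impact`, `toy_model`) are hypothetical.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Signal = List[float]
Span = Tuple[int, int]  # half-open [start, end) range of sample indices


def add_noise(signal: Signal, spans: List[Span], sigma: float,
              rng: random.Random) -> Signal:
    """Return a copy of `signal` with Gaussian noise added inside `spans` only."""
    noisy = list(signal)
    for start, end in spans:
        for i in range(start, end):
            noisy[i] += rng.gauss(0.0, sigma)
    return noisy


def feature_impact(model: Callable[[Signal], float], signal: Signal,
                   feature_spans: Dict[str, List[Span]], sigma: float = 0.1,
                   n_trials: int = 30,
                   seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """For each feature, perturb only its spans and record how far the model
    output moves from the unperturbed baseline.

    Returns {feature name: (mean impact, standard deviation of impact)}.
    """
    rng = random.Random(seed)
    baseline = model(signal)
    impact = {}
    for name, spans in feature_spans.items():
        deltas = [
            abs(model(add_noise(signal, spans, sigma, rng)) - baseline)
            for _ in range(n_trials)
        ]
        impact[name] = (statistics.mean(deltas), statistics.stdev(deltas))
    return impact


def toy_model(signal: Signal) -> float:
    """Stand-in black box: its output depends only on samples 10..19."""
    return min(1.0, sum(abs(x) for x in signal[10:20]) / 10.0)


toy_signal = [0.0] * 50
toy_signal[10:20] = [0.2] * 10
toy_spans = {"QRS complex": [(10, 20)], "T wave": [(30, 40)]}
toy_impact = feature_impact(toy_model, toy_signal, toy_spans)
```

In this toy setup the stand-in model ignores the span labeled "T wave", so perturbing it leaves the output unchanged, while perturbing the "QRS complex" span moves it. With a real classifier, the spans would come from an ECG delineation step, which is outside the scope of this sketch.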

Figure 2. Comparison between methods.

Figure 3. Methodology.

When diagnosing some diseases, cardiologists analyze the ECG (depicted in Figure 4) and apply rules to reach a diagnosis. For instance, the criteria for Left Bundle Branch Block (LBBB) are: QRS duration greater than 120 milliseconds; absence of Q wave in leads I, V5 and V6; monomorphic R wave in I, V5 and V6; and ST and T wave displacement opposite to the major deflection of the QRS complex. Our explanation consists of both a textual and a visual component, so that it speaks to cardiologists in terms and criteria familiar to them. In Figure 5, we show an explanation for six classes of diseases based on how much impact the noise has on the different features, quantifying how the different criteria affect the model predictions.
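The LBBB criteria listed above map naturally onto a rule check. The sketch below is only an illustration of that mapping and is not a clinical tool: the `EcgFindings` record and its field names are hypothetical, and the findings are assumed to have already been measured or annotated elsewhere.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EcgFindings:
    """Hypothetical record of findings already extracted from an ECG."""
    qrs_duration_ms: float
    q_wave_absent_in_i_v5_v6: bool
    monomorphic_r_in_i_v5_v6: bool
    st_t_opposite_to_qrs: bool


def lbbb_criteria(f: EcgFindings) -> Dict[str, bool]:
    """Evaluate each LBBB criterion from the text separately, so a reader
    can see which one failed."""
    return {
        "QRS duration > 120 ms": f.qrs_duration_ms > 120.0,
        "no Q wave in I, V5, V6": f.q_wave_absent_in_i_v5_v6,
        "monomorphic R wave in I, V5, V6": f.monomorphic_r_in_i_v5_v6,
        "ST/T displacement opposite to QRS": f.st_t_opposite_to_qrs,
    }


def meets_lbbb_criteria(f: EcgFindings) -> bool:
    """All four criteria must hold."""
    return all(lbbb_criteria(f).values())


wide_qrs = EcgFindings(140.0, True, True, True)
narrow_qrs = EcgFindings(100.0, True, True, True)
```

Here `meets_lbbb_criteria(wide_qrs)` holds, while `narrow_qrs` fails on the QRS duration alone. Explanations phrased in terms of these same criteria let a cardiologist compare the model's behavior against the rule they would apply themselves.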

Figure 4. ECG-based diagnosis.

Figure 5. Each explanation has a visual and a textual component. The visual component is a horizontal bar graph in which each bar represents a feature. The colored bar is the mean value of the impact of the associated feature on the classifier, and the error bar at the right end is the standard deviation. An explanation is considered significant when the mean and the standard deviation are above the threshold, shown as a vertical dotted line. The textual component is generated automatically.
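As a rough sketch of how per-feature impact statistics could be turned into the two components described in the caption, the Python below selects significant features and builds a sentence from them. It rests on several assumptions that the source does not confirm: the significance test is read as "the mean minus one standard deviation exceeds the threshold" (one possible reading of the caption), and the threshold value, the function names, and the sentence template are all hypothetical.

```python
from typing import Dict, List, Tuple

Impact = Dict[str, Tuple[float, float]]  # feature -> (mean, standard deviation)


def significant_features(impact: Impact, threshold: float) -> List[str]:
    """Return features whose impact stays above `threshold` even after
    subtracting one standard deviation (assumed reading of the caption),
    ordered from highest to lowest mean impact."""
    kept = [
        name for name, (mean, std) in impact.items()
        if mean - std > threshold
    ]
    return sorted(kept, key=lambda name: impact[name][0], reverse=True)


def explain(label: str, impact: Impact, threshold: float) -> str:
    """Build a one-sentence textual explanation (hypothetical template)."""
    names = significant_features(impact, threshold)
    if not names:
        return f"No feature had a significant impact on the prediction '{label}'."
    return f"The prediction '{label}' is mainly influenced by: " + ", ".join(names) + "."


# Made-up numbers for illustration only; they are not results from the article.
example_impact = {
    "QRS complex": (0.30, 0.05),
    "T wave": (0.08, 0.06),
    "PR interval": (0.12, 0.01),
}
example_text = explain("LBBB", example_impact, threshold=0.1)
```

The same `(mean, standard deviation)` pairs would drive the bar lengths and error bars of the visual component, so the text and the chart cannot disagree.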

In summary, improving the transparency and accountability of deep learning models is an important step toward their use in clinical practice. Incorporating such models into the TNMG pipeline may improve the quality of its service and have a positive impact on the treatment of many patients. In countries such as Brazil, where the population is spread across a large territory and access to physicians, in particular specialists, is still an issue, we believe our proposal is an example of research-intensive work that opens new opportunities for the massive and responsible adoption of initiatives with social impact.

Acknowledgment. This work is partially supported by the Brazilian agencies CNPq, CAPES and Fapemig, by the projects MASWEB, INCT-Cyber and Atmosphere, and by the Google Research Awards for Latin America program.

References

1. Bejnordi, B.E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 22 (Dec. 2017), 2199.

2. Goodfellow, I.J., Shlens, J. and Szegedy, C. Explaining and harnessing adversarial examples, Dec. 2014; arXiv:1412.6572.

3. Hinton, G. Deep learning: A technology with the potential to transform health care. JAMA 320, 11 (Sept. 2018), 1101–1102.

4. Macfarlane, P.W., Devine, B. and Clark, E. The University of Glasgow (Uni-G) ECG Analysis Program. Computers in Cardiology (2005), 451–454.

5. McKinney, S.M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 7788 (Jan. 2020), 89–94.

6. Ribeiro, A.H. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nature Commun. 11 (2020).

7. Ribeiro, A.L.P., Duncan, B.B., Brant, L.C.C., Lotufo, P.A., Mill, J.G. and Barreto, S.M. Cardiovascular health in Brazil: Trends and perspectives. Circulation 133, 4 (2016), 422–433.

8. Topol, E. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. Hachette, U.K., 2019.

9. Zaki, M.J. and Meira Jr., W. Data Mining and Machine Learning: Fundamental Concepts and Algorithms (2nd ed.). Cambridge University Press, 2020.

Authors

Wagner Meira Jr. is a professor in the Department of Computer Science at the Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Antonio L. P. Ribeiro is a professor in the Department of Medical Clinic at the Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Derick M. Oliveira is a Ph.D. student in the Department of Computer Science at the Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Antonio H. Ribeiro is an associate researcher in the Department of Computer Science at the Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

©2020 ACM  0001-0782/20/11

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from or fax (212) 869-0481.