This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Despite the promise of machine learning (ML) to inform individualized medical care, the clinical utility of ML in medicine has been limited by the minimal interpretability and
The study aimed to demonstrate a general and simple framework for generating clinically relevant and interpretable visualizations of
To obtain improved transparency of ML, simplified models and visual displays can be generated using common methods from clinical practice such as decision trees and effect plots. We illustrated the approach based on postprocessing of ML predictions, in this case random forest predictions, and applied the method to data from the Left Ventricular (LV) Structural Predictors of Sudden Cardiac Death (SCD) Registry for individualized risk prediction of SCD, a leading cause of death.
With the LV Structural Predictors of SCD Registry data, SCD risk predictions are obtained from a random forest algorithm that identifies the most important predictors, nonlinearities, and interactions among a large number of variables while naturally accounting for missing data. The
Through a clinically important example, we illustrate a general and simple approach to increase the clinical translation of ML through clinician-tailored visual displays of results from black box algorithms. We illustrate this general model-agnostic framework by applying it to SCD risk prediction. Although we illustrate the methods using SCD prediction with random forest, the methods presented are applicable more broadly to improving the clinical translation of ML, regardless of the specific ML algorithm or clinical application. As any trained predictive model can be summarized in this manner to a prespecified level of precision, we encourage the use of simplified visual displays as an adjunct to the complex predictive model. Overall, this framework can allow clinicians to peek inside the black box and develop a deeper understanding of the most important features from a model to gain trust in the predictions and confidence in applying them to clinical care.
There is growing interest in benefiting from the predictive power of machine learning (ML) to improve the outcomes of medical care at more affordable costs. Although notable for their impressive predictive ability, ML
Many approaches have been developed to explain predictions and determine ML feature importance and effect, but they have limited adoption in real-world clinical applications [
To accelerate the integration of ML into clinical care, an emphasis on the personalization of these tools for the end user is crucial. Our work is motivated by the well-known clinical challenge of measuring an individual's risk of sudden cardiac death (SCD), a leading cause of death with inherently complex pathophysiology that lends itself to novel approaches [
The Left Ventricular (LV) Structural Predictors of SCD Registry is a prospective observational registry (clinicaltrials.gov, NCT01076660), which enrolled 382 patients for the primary end point of an adjudicated appropriate implantable cardioverter defibrillator firing for ventricular tachycardia or ventricular fibrillation or SCD not aborted by the device [
Our ML approach is based on the random forest (RF) algorithm implemented in the randomForestSRC R package [
Patient characteristics in the Left Ventricular Structural Predictors of Sudden Cardiac Death Registry (N=382).
Variables  Patients without SCD^{a} event (n=307)  Patients with SCD event (n=75)  P value
Age (years), mean (SD)  57 (13)  57 (12)  .75
Male, n (%)  211 (68.7)  63 (84)
Race, n (%)
White  200 (65.1)  51 (68)
African American  99 (32)  21 (28)
Other  8 (3)  3 (4)
Body surface area (m^{2}), mean (SD)  1.98 (0.28)  2.05 (0.28)  .07
Ischemic cardiomyopathy etiology, n (%)  149 (48.5)  44 (59)  .15
Years from incident MI^{c} or cardiomyopathy diagnosis, mean (SD)  3.83 (5.18)  5.43 (5.61)
NYHA^{d} class, n (%)
I  64 (21)  20 (27)
II  137 (44.6)  31 (41)
III  106 (34.5)  24 (32)
One or more heart failure hospitalizations, n (%)  0 (0)  19 (25.3)
Hypertension, n (%)  180 (58.6)  44 (59)  >.99
Hypercholesterolemia, n (%)  180 (58.6)  45 (60)  .93
Diabetes, n (%)  85 (28)  19 (25)  .79
Nicotine use, n (%)  133 (43.3)  44 (59)
ACE^{e} inhibitor or ARB^{f}, n (%)  275 (89.6)  66 (88)  .85
Beta-blocker, n (%)  288 (93.8)  68 (91)  .48
Lipid-lowering, n (%)  199 (64.8)  56 (75)  .14
Antiarrhythmics (amiodarone), n (%)  18 (6)  8 (11)  .22
Diuretics, n (%)  173 (56.4)  54 (72)
Digoxin, n (%)  50 (16)  16 (21)  .39
Aldosterone inhibitor, n (%)  80 (26)  21 (28)  .85
Aspirin, n (%)  215 (70.0)  55 (73)  .67
Prior atrial fibrillation, n (%)  51 (17)  14 (19)  .80
Ventricular rate (bpm), mean (SD)  73 (14)  70 (14)  .06
QRS duration (ms), mean (SD)  118 (31)  122 (27)  .30
Presence of LBBB^{g}, n (%)  79 (26)  14 (19)  .26
Biventricular ICD^{h}, n (%)  90 (29)  17 (23)  .31
Sodium (mEq/L), mean (SD)  139 (3)  139 (3)  .73
Potassium (mEq/L), mean (SD)  4.26 (0.42)  4.27 (0.39)  .87
Creatinine (mg/dL), mean (SD)  1.07 (0.59)  1.09 (0.33)  .81
eGFR^{i} (mL/min/1.73 m^{2}), mean (SD)  81 (24)  80 (21)  .80
Blood urea nitrogen (mg/dL), mean (SD)  19.62 (8.72)  20.28 (8.33)  .55
Glucose (mg/dL), mean (SD)  120 (53)  113 (34)  .23
Hematocrit (%), mean (SD)  40 (4)  41 (5)
hsCRP^{j} (µg/mL), mean (SD)  6.89 (12.87)  9.10 (16.29)  .22
NT-proBNP^{k} (ng/L), mean (SD)  2704 (6736)  2519 (1902)  .82
IL-6^{l} (pg/mL), mean (SD)  3.05 (5.36)  4.32 (6.28)  .12
IL-10^{m} (pg/mL), mean (SD)  10.74 (49.67)  13.67 (59.94)  .70
TNFαRII^{n} (pg/mL), mean (SD)  3425 (1700)  3456 (1671)  .90
cTnT^{o} (ng/mL), mean (SD)  0.03 (0.08)  0.02 (0.05)  .62
cTnI^{p} (ng/mL), mean (SD)  0.10 (0.28)  0.10 (0.25)  .98
CK-MB^{q} (ng/mL), mean (SD)  3.94 (5.77)  3.87 (3.86)  .93
Myoglobin (ng/mL), mean (SD)  31.37 (30.80)  37.13 (41.53)  .31
LVEF^{r}: Non-CMR^{s} LVEF (%), mean (SD)  24.2 (7.6)  23.0 (7.4)  .19
LVEF (%), mean (SD)  27.8 (10.3)  25.1 (8.8)
LV^{t} end-diastolic volume index (mL/m^{2}), mean (SD)  122.3 (39.9)  136.2 (48.4)
LV end-systolic volume index (mL/m^{2}), mean (SD)  91.5 (39.1)  104.3 (45.2)
LV mass index (g/m^{2}), mean (SD)  75.1 (24.4)  80.3 (21.2)
LGE^{u} present, n (%)  176 (66)  56 (86)
Gray zone (g), mean (SD)  8.8 (11.6)  13.8 (12.2)
Core (g), mean (SD)  12.4 (14.9)  17.7 (15.1)
Total scar (g), mean (SD)  21.1 (25.4)  31.3 (25.6)

^{a}SCD: sudden cardiac death.
^{b}
^{c}MI: myocardial infarction.
^{d}NYHA: New York Heart Association.
^{e}ACE: angiotensin-converting enzyme.
^{f}ARB: angiotensin II receptor blocker.
^{g}LBBB: left bundle branch block.
^{h}ICD: implantable cardioverter defibrillator.
^{i}eGFR: estimated glomerular filtration rate.
^{j}hsCRP: high-sensitivity C-reactive protein.
^{k}NT-proBNP: N-terminal pro-B-type natriuretic peptide.
^{l}IL-6: interleukin-6.
^{m}IL-10: interleukin-10.
^{n}TNFαRII: tumor necrosis factor alpha R II.
^{o}cTnT: cardiac troponin T.
^{p}cTnI: cardiac troponin I.
^{q}CK-MB: creatine kinase MB.
^{r}LVEF: left ventricular ejection fraction.
^{s}CMR: cardiac magnetic resonance.
^{t}LV: left ventricular.
^{u}LGE: late gadolinium enhancement.
To communicate the results from ML models, such as our RF for SCD predictions, we develop representative interpretable summaries. As illustrated in
Steps to present machine learning (ML) predictions in an interpretable manner: The black-box algorithm is applied to input data comprising outcomes (Y) and predictors (X) to obtain black-box predictions (P) of the input outcomes. The original X variables and the black-box predictions (P) are inputs to a simple model or algorithm, for example a single tree, whose predictions (S) are sufficiently close to (P) but more easily understood and explained.
Train the ML model with the input features (
Obtain the predicted values (
Train a simple, interpretable, and clinically understood model, such as a decision tree [
Obtain the predicted values (
R^{2} equation where i=1 to n observations evaluated. S^{(i)} denotes the prediction for the i^{th} observation using the simplified model, P^{(i)} denotes the prediction for the i^{th} observation using the ML model, and P_{avg} denotes the average prediction from the ML model.
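The fidelity measure described in this caption can be sketched in a few lines. The equation itself did not survive in this text, so the sketch assumes the standard coefficient-of-determination form, R^{2} = 1 - sum_i (S^{(i)} - P^{(i)})^2 / sum_i (P^{(i)} - P_{avg})^2, which matches the symbol definitions above. The function name `surrogate_r2` and the numbers are hypothetical and are not taken from the registry.

```python
def surrogate_r2(simple_preds, blackbox_preds):
    """R^2 of the simplified model's predictions S against the ML predictions P.

    Assumes the standard form 1 - SSE/SST, where SST is measured around
    the average ML prediction P_avg.
    """
    if len(simple_preds) != len(blackbox_preds) or not blackbox_preds:
        raise ValueError("S and P must be non-empty and of equal length")
    p_avg = sum(blackbox_preds) / len(blackbox_preds)
    # Squared distance between the simplified and the ML predictions.
    sse = sum((s - p) ** 2 for s, p in zip(simple_preds, blackbox_preds))
    # Total variation of the ML predictions around their average.
    sst = sum((p - p_avg) ** 2 for p in blackbox_preds)
    if sst == 0:
        raise ValueError("ML predictions are constant; R^2 is undefined")
    return 1.0 - sse / sst


# Hypothetical predicted risks for five patients (illustrative values only).
P = [0.02, 0.03, 0.10, 0.12, 0.30]      # black-box (ML) predictions
S = [0.025, 0.025, 0.11, 0.11, 0.30]    # simplified-model predictions
fidelity = surrogate_r2(S, P)
print(f"R^2 of simplified model vs ML predictions: {fidelity:.3f}")
```

A value near 1 means the simplified model reproduces the ML predictions closely; a simplified model that only predicts P_{avg} for everyone scores 0.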
Note that the interpretative tree can be grown sufficiently large such that R^{2} is arbitrarily close to 1. If a simple tree has a small R^{2}, extra caution should be exercised to avoid overinterpreting the simplified model. In contrast, if R^{2} is high, the simplified model may be considered as an alternative to the actual ML model for obtaining future predictions in a simplified manner [
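As a rough illustration of how fidelity rises as the interpretative tree grows, the sketch below fits a small least-squares regression tree to stand-in black-box predictions at several depths. It is a minimal pure-Python sketch, not the authors' implementation: the helper names (`fit_tree`, `predict`, `r2`), the synthetic features, and the stand-in prediction function are all assumptions made for illustration.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(X, p, depth):
    """Greedy least-squares tree fitted to black-box predictions p."""
    node = {"value": sum(p) / len(p)}
    if depth == 0 or len(p) < 2:
        return node
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            cost = sse([p[i] for i in left]) + sse([p[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None or best[0] >= sse(p):
        return node  # no split reduces the squared error
    _, j, t, left, right = best
    node["feature"], node["threshold"] = j, t
    node["left"] = fit_tree([X[i] for i in left], [p[i] for i in left], depth - 1)
    node["right"] = fit_tree([X[i] for i in right], [p[i] for i in right], depth - 1)
    return node


def predict(node, row):
    """Follow the decision rules down to a leaf and return its mean."""
    while "feature" in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


def r2(S, P):
    p_avg = sum(P) / len(P)
    return 1.0 - sum((s - q) ** 2 for s, q in zip(S, P)) / sum((q - p_avg) ** 2 for q in P)


random.seed(0)
# Synthetic features and a stand-in for black-box predictions (hypothetical).
X = [[random.random(), random.random()] for _ in range(120)]
P = [0.05 + 0.25 * (x[0] > 0.5) + 0.10 * x[1] ** 2 for x in X]

fidelity = {}
for depth in (1, 2, 4):
    tree = fit_tree(X, P, depth)
    fidelity[depth] = r2([predict(tree, x) for x in X], P)
    print(f"depth {depth}: R^2 = {fidelity[depth]:.3f}")
```

Because each deeper tree keeps the splits of the shallower one and adds a split only when it reduces the squared error, the in-sample R^{2} cannot decrease with depth. In practice the analyst would stop at the smallest tree that reaches the prespecified R^{2}.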
By using a single tree as a summary of the RF predictions, we can quantify the importance of individual variables or groups of variables. A useful measure of the total effect on outcome
To present results in other ways familiar to clinicians, predictor effects can be communicated in plots where risk ratios are presented [
Using data from the LV Structural Predictors of SCD Registry, a global summary for SCD risk prediction is obtained by fitting a single decision tree to the RF predictions, using the same covariates as the RF as inputs and the RF predictions as the outcome.
Global summary tree of random forest (RF) model for sudden cardiac death (SCD) prediction: Several risk factors (namely heart failure hospitalization, several cardiac magnetic resonance imaging indices, and interleukin-6 [IL-6], a marker of inflammation) discriminate between low-, intermediate-, and high-risk patients. Decision rules in the tree are shown in bold italics. The 1-year risks of SCD are shown in the boxes at the bottom of the decision tree. The boxes are colored according to the magnitude of the percent per year risk, with white corresponding to the lowest risk subgroup and dark red corresponding to the highest risk subgroup. Percentages in parentheses at the bottom of the boxes are the proportions of the total training data that belong to each of the risk subgroups. R^{2} is 0.88 for how well this global summary tree represents the RF model. HF: heart failure; LV: left ventricular.
Visualization of predictor effects in random forest (RF) model for sudden cardiac death (SCD) prediction: Risk ratio point estimates and the 95% confidence intervals generated from 500 bootstrap replications are shown for the RF model for SCD risk prediction. The largest risk ratio is between individuals who never experienced a heart failure hospitalization and those who experienced one or more heart failure hospitalizations. The other risk ratio comparisons show the risk ratios between individuals grouped into different categories based upon inflammation or cardiac magnetic resonance (CMR) imaging variables indicating the structural and functional properties of the heart. HF: heart failure; IL-6: interleukin-6; LV: left ventricular.
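One way such bootstrap intervals could be computed is sketched below. It assumes the risk ratio is the ratio of mean predicted risks between two patient groups and uses a percentile bootstrap with 500 replications. The source does not specify either detail, so both are assumptions, and the function names and predicted risks are hypothetical.

```python
import random


def risk_ratio(preds, in_group):
    """Ratio of mean predicted risk: group members vs reference patients."""
    exposed = [p for p, g in zip(preds, in_group) if g]
    reference = [p for p, g in zip(preds, in_group) if not g]
    return (sum(exposed) / len(exposed)) / (sum(reference) / len(reference))


def bootstrap_ci(preds, in_group, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the risk ratio."""
    rng = random.Random(seed)
    n = len(preds)
    ratios = []
    while len(ratios) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        groups = [in_group[i] for i in idx]
        if all(groups) or not any(groups):
            continue  # resample is missing one of the two groups; draw again
        ratios.append(risk_ratio([preds[i] for i in idx], groups))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted 1-year risks; the last four patients stand in for
# those with one or more heart failure hospitalizations.
preds = [0.02, 0.03, 0.04, 0.03, 0.05, 0.02, 0.04, 0.03, 0.12, 0.15, 0.10, 0.18]
hf_hospitalized = [False] * 8 + [True] * 4

point = risk_ratio(preds, hf_hospitalized)
lower, upper = bootstrap_ci(preds, hf_hospitalized)
print(f"risk ratio = {point:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A full analysis might instead refit the model within each bootstrap replication, which this sketch does not attempt.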
This table summarizes the global summary tree (shown in
Split variable  HF^{a} hospitalization history  LV^{b} end-diastolic volume index  Total scar  Inflammation (IL-6^{c})  Gray zone  Tree total  ML^{d} total
Number of splits  1  2  2  1  1  7  N/A^{e} 
Deviance explained  1.26  0.255  0.100  0.034  0.020  1.67  1.89 
Percentage of deviance explained  66.6  13.5  5.2  1.7  1.1  0.88^{f}  100 
^{a}HF: heart failure.
^{b}LV: left ventricular.
^{c}IL-6: interleukin-6.
^{d}ML: machine learning.
^{e}N/A: not applicable.
^{f}This corresponds to the R^{2} value (0.88) obtained when using the equation shown in
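The arithmetic behind this table can be checked with a short calculation. The sketch uses the rounded deviance values printed in the table and assumes that each percentage is the deviance explained by a variable divided by the ML total, and that the tree's R^{2} is the tree total divided by the ML total. The variable names in the code are illustrative.

```python
# Deviance explained per split variable, as printed (rounded) in the table.
deviance_by_variable = {
    "HF hospitalization history": 1.26,
    "LV end-diastolic volume index": 0.255,
    "Total scar": 0.100,
    "Inflammation (IL-6)": 0.034,
    "Gray zone": 0.020,
}
ml_total_deviance = 1.89  # total deviance of the ML (random forest) predictions

# Tree total is the sum over the split variables.
tree_total = sum(deviance_by_variable.values())

# Share of the ML total attributable to each variable, in percent.
share = {
    name: 100 * deviance / ml_total_deviance
    for name, deviance in deviance_by_variable.items()
}

# Fidelity of the summary tree to the ML predictions.
r_squared = tree_total / ml_total_deviance

for name, pct in share.items():
    print(f"{name}: {pct:.1f}%")
print(f"tree total = {tree_total:.2f}, R^2 = {r_squared:.2f}")
```

Because the inputs are already rounded, a recomputed percentage can differ from the tabulated one in the last digit, while the tree total (1.67) and R^{2} (0.88) are reproduced.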
We demonstrate that it is possible to obtain improved transparency of ML by generating simplified models and visual displays adapted from those used commonly in clinical practice. As a specific example of this framework, we use RF extended to survival analysis with time-varying covariates for individualized SCD risk prediction. Commonly used methods for SCD risk prediction, such as Cox proportional hazards regression, do not automatically account for nonlinear and interaction effects or facilitate the application to individualized risk prediction [
Because this framework for interpretability is model-agnostic, the user may benefit from ML’s high predictive performance while also gaining insights into how predictions were generated. Despite the complexity of the original algorithm, these methods for interpretability only depend on the
To implement ML in clinical practice, it is essential to provide
Developing interpretable predictions is particularly important in the application of ML to health care because of the unique challenges related to medical ethics and regulatory or legal considerations [
Although any complex model can be simplified to a summary model, it is possible that the summary and original model predictions are highly dissimilar, as reflected in a small R^{2}. This was not the case in the motivating study, where 5 variables and 7 splits explained 88% of the variation in the RF’s predicted values. We can expect similar results in many problems because the interpretative tree is trained on the predicted values from a complex ML algorithm designed to find relatively lower-dimensional summaries than the original data. When a small interpretative tree has a poor R^{2}, it can be enlarged as needed to achieve a prespecified higher value. The user can then look for simpler summaries by grouping classes of predictors and interactions among them. Finally, the approach uses the R^{2} value as a measure of the fidelity of the simpler model predictions to the ML predictions. When this value is too small for a given tree, the user knows that a simple tree has limited interpretative value.
A closely related subfield of ML is actively addressing this topic by comparing different learning algorithms and selecting a final model [
Currently, limited interpretability remains a major barrier to successful translation of ML predictions to the clinical domain [
CMR: cardiac magnetic resonance
HF: heart failure
IL-6: interleukin-6
LV: left ventricular
ML: machine learning
RF: random forest
SCD: sudden cardiac death
This work was supported by the National Institutes of Health Grant Nos. R01HL103812, F30HL142131, and 5T32GM007309. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
SW, KW, and SZ conceived and formulated the study design. SW and SZ developed the methods and performed the data analysis. KW acquired the patient data for the study and provided input regarding the analytic approach. SW drafted the manuscript. SW, KW, and SZ contributed to critical revision of the manuscript and approved the final manuscript.
None declared.