This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
Cardiovascular disorders in general are responsible for 30% of deaths worldwide. Among them, hypertrophic cardiomyopathy (HCM) is a genetic cardiac disease that is present in about 1 of 500 young adults and can cause sudden cardiac death (SCD).
Although the current state-of-the-art methods model the risk of SCD for patients, to the best of our knowledge, no methods are available for modeling the patient's clinical status up to 10 years ahead. In this paper, we propose a novel machine learning (ML)-based tool for predicting disease progression for patients diagnosed with HCM in terms of adverse remodeling of the heart during a 10-year period.
The method consisted of 6 predictive regression models that independently predict future values of 6 clinical characteristics: left atrial size, left atrial volume, left ventricular ejection fraction, New York Heart Association functional classification, left ventricular internal diastolic diameter, and left ventricular internal systolic diameter. We supplemented each prediction with the explanation that is generated using the Shapley additive explanation method.
The final experiments showed that predictive error is lower on 5 of the 6 constructed models in comparison to experts (on average, by 0.34) or a consortium of experts (on average, by 0.22). The experiments revealed that semisupervised learning and the artificial data from virtual patients help improve predictive accuracies. The best-performing random forest model improved R² from 0.3 to 0.6.
By engaging medical experts to provide interpretation and validation of the results, we determined the models' favorable performance compared to the performance of experts for 5 of 6 targets.
Recent reviews of machine learning (ML) applications in cardiovascular medicine [
Cardiovascular disorders in general are responsible for 30% of deaths worldwide. Among them, hypertrophic cardiomyopathy (HCM) is a genetic cardiac disease that can cause sudden cardiac death (SCD), especially among young adults and athletes [
In this paper, we propose a novel ML-based tool for predicting disease progression for patients diagnosed with HCM in terms of adverse remodeling of the heart during a 10-year period. The method consists of 6 contemporaneous predictive regression models that independently predict future values of the following 6 clinical characteristics: left atrial diameter (LA_d), left atrial volume (LA_Vol), left ventricular ejection fraction (LVEF), New York Heart Association (NYHA) functional classification, left ventricular internal diameter at end diastole (LVIDd), and left ventricular internal diameter at end systole (LVIDs). Each prediction is supplemented with the explanation that is generated using the Shapley additive explanation (SHAP) method [
ML techniques are being frequently applied in medicine to improve the prediction of disease progression, extraction of medical knowledge for outcome research, therapy planning and support, and overall patient management [
In cardiology, there are several works addressing disease progression trends related to different cardiological diseases. With the increase in computational power, ML has become a tool to analyze nonlinear dependencies that are present either in relational data or in images. Juarez-Orozco et al [
Further, a hybrid approach for progression of Parkinson disease [
Several other ML approaches also model disease progression well in other medical domains, such as kidney disease progression [
To summarize, the overview indicates that ML models can be successfully applied to problems of predicting disease progression, which is also the goal of this paper. In the next subsection, we review how ML approaches are used in cardiology, specifically for HCM, which is the focus of this paper.
Most ML contributions to cardiovascular medicine focus on risk stratification of patients. One of the biggest obstacles to using data for a broader variety of ML applications is that data are usually stored in diverse repositories, which are not readily usable for cardiovascular research, due to various data quality challenges [
HCM is a severe disease for which 4 stages of progression have been identified in the medical literature [
It is important to note that patients with HCM who experience cardiac arrest are not identified by typical risk markers used in the American College of Cardiology or the statistical mathematical risk model by the European Society of Cardiology [
Novelties and contributions of this paper include:
A disease progression system that comprises models for predicting 6 contemporaneous clinical parameters relevant to HCM 10 years ahead. The system includes an implementation of the explanation methodology that provides interpretability of the predictive models.
Analysis of predictive performance if training data are extended using semisupervised learning or with artificial patient data.
Validation of predictive accuracy with medical experts by comparing ML and human accuracy and by analyzing sensibility of the computer-generated prediction explanations.
The aim of this paper is to develop a system capable of detecting slow progression of HCM based on longitudinal data.
In this work, we modeled disease progression by predicting 6 relevant patient parameters 10 years in advance. These parameters are indicators of HCM and can be used to determine the stage of HCM according to the known guidelines [
Overview of the proposed disease progression system. The system receives clinical data and disease-related events of a patient as input, uses virtual patient data and semisupervised learning for self-improvement, and returns the predictions and their explanation for 6 target variables.
The output of the system is a set of 6 contemporaneous target predictions for parameters:
LA_d
LA_Vol
LVEF
LVIDd
LVIDs
NYHA functional classification
In addition to predictions, the system also generates their explanations, revealing the factors with the largest impact on the increase or decrease in the 6 target variables throughout the 10-year period.
We trained the proposed disease progression system using supervised ML techniques. To further improve the results, we augmented the original data using unlabeled data (semisupervised learning) and virtual patients’ data. For semisupervised learning, we used patients without 10-year follow-ups: models trained on the labeled data first predicted the targets of these patients, who were then added to the training data set. We generated virtual patients’ data using various techniques for artificial data generation. In the following subsections, we describe the data set, predictive modeling with supervised models, use of semisupervised learning and virtual patient data, and generation of prediction explanations.
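The semisupervised step described above can be sketched as a simple pseudo-labeling loop. This is only an illustration under stated assumptions: the nearest-neighbor regressor is a toy stand-in for the actual models (which the paper lists as RF, gradient boosting, LR, and NNs), and the function names and numbers are hypothetical.

```python
def fit_nearest_neighbor(X, y):
    """Toy stand-in for a trained regressor (hypothetical, not the paper's model).

    Predicts the target of the closest training row.
    """
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict


def pseudo_label(X_labeled, y_labeled, X_unlabeled):
    """Train on labeled data, label the unlabeled rows, and retrain on the union."""
    model = fit_nearest_neighbor(X_labeled, y_labeled)   # 1. train on labeled patients
    y_pseudo = [model(x) for x in X_unlabeled]           # 2. predict targets of unlabeled patients
    X_aug = X_labeled + X_unlabeled                      # 3. add them to the training set
    y_aug = y_labeled + y_pseudo
    return fit_nearest_neighbor(X_aug, y_aug), X_aug, y_aug  # 4. retrain


# Hypothetical data: one input feature (current LA_d) and a 10-year LA_d target.
X_lab, y_lab = [[40.0], [50.0]], [42.0, 55.0]
X_unl = [[41.0], [49.0]]
model, X_aug, y_aug = pseudo_label(X_lab, y_lab, X_unl)
```

In practice, the pseudo-labels inherit the errors of the first model, so the benefit of this step has to be checked empirically, as the experiments in this paper do.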
The proposed approach was developed on a data set that was provided by the University of Florence as a result of its long-term clinical practice. The data set included patients who were enrolled over the past 40 years (
Relationship between the amount of labeled and unlabeled data. The bars for Yes and No values are stacked, visually revealing the ratio between labeled and unlabeled data. Note that the rightmost columns have no 10-year follow-up data, as these patients were enrolled less than 10 years ago.
Basic characteristics of patients for basic continuous parameters (N=10,318).
Continuous parameter | Mean (SD) | Missing data, n (%) |
Age (years) | 52.1 (18.6) | 4 (0.04) |
Weight (kg) | 73.4 (14.6) | 2381 (23.08) |
Height (cm) | 169 (10.3) | 2273 (22.03) |
Body mass index (BMI) | 25.6 (4.09) | 2423 (23.48) |
NYHAa | 1.69 (0.73) | 983 (9.53) |
aNYHA: New York Heart Association.
Basic characteristics of patients for basic binary parameters (N=10,318).
Binary parameter | 1-value, n (%) | 0-value, n (%) | Missing, n (%) |
Alcohol | Yes, 103 (0.99) | No, 10,215 (99) | 0 |
Drug | Yes, 18 (0.17) | No, 10,300 (99.83) | 0 |
Smoking | Yes, 3437 (33.31) | No, 6881 (66.69) | 0 |
Pregnancy | Yes, 443 (4.29) | No, 9875 (95.71) | 2515 (24.37) |
Gender | Male, 6400 (62.03) | Female, 3918 (37.97) | 0 |
Basic characteristics for groups of parameters (N=10,318)a.
Procedure | Parameters, n | Total missing values, n (%) |
ECGb | 9 | 45,839 (49.36) |
Echoc | 26 | 98,191 (36.60) |
CMRd | 10 | 81,174 (78.67) |
aThe table shows aggregated statistics for several parameters obtained from the same procedure. The percentage for each procedure is obtained as follows: [Total missing values/(Parameters × N)] × 100.
bECG: electrocardiogram.
cEcho: echocardiogram.
dCMR: cardiovascular magnetic resonance.
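As a quick check of the percentage formula in footnote a, the following sketch recomputes the three percentages from the counts listed in the table. The function name is hypothetical; the numbers are taken from the table itself.

```python
N_PATIENTS = 10_318  # N reported in the table caption


def missing_percentage(total_missing, n_parameters, n=N_PATIENTS):
    """Percentage of missing cells among n_parameters x n possible values."""
    return round(100 * total_missing / (n_parameters * n), 2)


# (procedure, number of parameters, total missing values) as listed in the table
rows = [("ECG", 9, 45_839), ("Echo", 26, 98_191), ("CMR", 10, 81_174)]
percentages = {name: missing_percentage(missing, k) for name, k, missing in rows}
print(percentages)
```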
Absolute number and percentage of missing values of target variables as class and as input (N=10,318).
Missing values | LA_da, n (%) | LVEFb, n (%) | NYHAc, n (%) | LVIDdd, n (%) | LVIDse, n (%) | LA_Volf, n (%) |
Target | 8569 (83.05) | 8481 (82.19) | 8313 (80.57) | 8607 (83.42) | 9336 (90.48) | 8631 (83.65) |
Input | 2691 (26.08) | 2399 (23.25) | 983 (9.53) | 2517 (24.39) | 5329 (51.65) | 3680 (35.67) |
aLA_d: left atrial diameter.
bLVEF: left ventricular ejection fraction.
cNYHA: New York Heart Association.
dLVIDd: left ventricular internal diameter at end diastole.
eLVIDs: left ventricular internal diameter at end systole.
fLA_Vol: left atrial volume.
First, we transformed the available data set into a form suitable for predicting a 10-year change in relevant parameters using ML. As in other real-world data sets, most clinical tests were missing for many patients, or measurements were not taken over the whole 10-year span (
Formation of training examples: Since not all clinical tests can be conducted on the same day or in the same month, we defined a training example as a set of measurements within a time frame of 1 year. Such time frame corresponds to the annual regular visit period of patients and allows enough time for relevant changes in the observed parameters to become noticeable, as the disease slowly progresses. If the patient had a certain test performed multiple times within this time frame, multiple tests were treated as separate measurements. In case a certain type of test was not performed in the 1-year time frame, the corresponding variables were recorded as missing. Constructing training examples in this way yielded a data set with 13,386 examples, with 3.9 (SD 4.8) examples per patient.
Imputation of missing data: The missing values in the data set, either because of nonperformed tests or because of erroneous input of data, were imputed by copying the closest past values (sensible because the progression of HCM is slow; used on numerical and categorical attributes), imputing values of a healthy patient (sampled from the normal distribution; used for numerical attributes), or imputing mean values where healthy values were unknown (used on numerical and categorical attributes). Since measurements were not taken at equidistant time intervals, we used linear interpolation for computing equidistant measurement approximations.
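The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout, the function names, and the choice to count 1-year windows from each patient's first recorded date are assumptions.

```python
from collections import defaultdict
from datetime import date


def build_examples(measurements, window_days=365):
    """Group (patient_id, date, variable, value) records into 1-year examples.

    Assumption: windows are counted from each patient's first recorded date.
    Repeated tests within a window are kept as separate measurements (a list).
    """
    first_visit = {}
    for pid, day, _, _ in measurements:
        first_visit[pid] = min(first_visit.get(pid, day), day)
    examples = defaultdict(lambda: defaultdict(list))
    for pid, day, variable, value in measurements:
        window = (day - first_visit[pid]).days // window_days
        examples[(pid, window)][variable].append(value)
    return examples


def forward_fill(values):
    """Replace None with the closest past value; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled


def interpolate_equidistant(times, values, step):
    """Linearly interpolate onto a grid with spacing `step`.

    `times` must be strictly increasing and contain at least 2 points.
    """
    grid, out, i = [], [], 0
    t = times[0]
    while t <= times[-1]:
        while times[i + 1] < t:  # advance to the segment that contains t
            i += 1
        t0, t1, v0, v1 = times[i], times[i + 1], values[i], values[i + 1]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        grid.append(t)
        t += step
    return grid, out


records = [
    ("p1", date(2005, 3, 1), "LVEF", 65.0),
    ("p1", date(2005, 9, 15), "LA_d", 42.0),
    ("p1", date(2005, 11, 2), "LVEF", 63.0),  # second LVEF test in the same window
    ("p1", date(2007, 4, 20), "LVEF", 60.0),  # falls into a later window
]
examples = build_examples(records)
grid, la_d = interpolate_equidistant([0, 1.5, 4], [40.0, 43.0, 48.0], step=1)
```

A variable that is absent from a window (for example, `examples[("p1", 0)].get("LA_Vol")`) is simply missing and would then be imputed by one of the strategies described above.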
We used the formed training examples as input to supervised learning algorithms. Prior to modeling, we evaluated the quality of attributes, which is important for decreasing learning complexity, avoiding overfitting, and, therefore, improving the simplicity and performance of ML methods. To facilitate learning with neural networks (NNs), we also scaled the values to the interval [0,1] and encoded nominal values using the one-hot encoding method.
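Scaling to [0,1] and one-hot encoding can be illustrated with a short sketch. The helper names and example values are hypothetical, and min-max scaling is assumed as the scaling method, since the text only states the target interval.

```python
def min_max_scale(column):
    """Scale numeric values linearly to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def one_hot(column):
    """Encode nominal values as 0/1 indicator vectors, one position per category."""
    categories = sorted(set(column))
    encoded = [[1 if x == c else 0 for c in categories] for x in column]
    return categories, encoded


heights = min_max_scale([150, 170, 190])
categories, genders = one_hot(["Female", "Male", "Female"])
```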
We used RReliefF [
Selected attributes using RReliefF.a
Variableb | LA_dc score | LVEFd score | NYHAe score | LVIDdf score | LVIDsg score | LA_Volh score | Average rank |
[group heading missing] | | | | | | | |
[name missing] | 0.198 | 0.194 | 0.166 | 0.142 | 0.166 | 0.158 | 1.000 |
[name missing] | 0.051 | 0.037 | 0.043 | 0.055 | 0.058 | 0.022 | 12.500 |
[name missing] | 0.057 | 0.064 | 0.045 | 0.075 | 0.051 | 0.029 | 9.167 |
[name missing] | 0.075 | 0.073 | 0.053 | 0.095 | 0.085 | 0.045 | 4.167 |
[group heading missing] | | | | | | | |
[name missing] | 0.063 | 0.046 | 0.052 | 0.032 | 0.069 | 0.082 | 7.500 |
[name missing] | 0.072 | 0.042 | 0.052 | 0.039 | 0.044 | 0.056 | 9.667 |
History of syncope | 0.026 | 0.036 | 0.029 | 0.022 | 0.029 | 0.048 | 20.000 |
[name missing] | 0.056 | 0.060 | 0.061 | 0.047 | 0.052 | 0.066 | 5.833 |
Family history of SCDk | 0.027 | 0.051 | 0.032 | 0.031 | 0.051 | 0.049 | 14.667 |
[group heading missing] | | | | | | | |
NYHA | 0.011 | 0.017 | 0.069 | 0.007 | 0.027 | 0.022 | 33.000 |
Presence of atrial fibrillation | 0.055 | 0.036 | 0.048 | 0.018 | 0.026 | 0.068 | 16.333 |
QRS duration | 0.035 | 0.046 | 0.029 | 0.039 | 0.026 | 0.039 | 17.167 |
[name missing] | 0.043 | 0.052 | 0.049 | 0.041 | 0.057 | 0.052 | 8.167 |
LA_d | 0.078 | 0.037 | 0.036 | 0.018 | 0.031 | 0.070 | 15.000 |
LA_Vol | 0.055 | 0.029 | 0.026 | 0.012 | 0.025 | 0.059 | 24.000 |
LVIDs | 0.017 | 0.022 | 0.027 | 0.029 | 0.043 | 0.031 | 25.167 |
LVIDd | 0.021 | 0.017 | 0.017 | 0.036 | 0.044 | 0.026 | 27.667 |
LVEF | 0.018 | 0.051 | 0.019 | 0.014 | 0.050 | 0.013 | 27.833 |
[group heading missing] | | | | | | | |
[name missing] | 0.045 | 0.041 | 0.039 | 0.051 | 0.052 | 0.059 | 9.667 |
[name missing] | 0.037 | 0.044 | 0.034 | 0.040 | 0.066 | 0.023 | 14.667 |
Negative genetics | 0.036 | 0.037 | 0.027 | 0.043 | 0.030 | 0.031 | 18.667 |
aThe table shows RReliefF feature scores and the average ranks for each target variable.
bNames of the 10 highest-ranked variables are italicized.
cLA_d: left atrial diameter.
dLVEF: left ventricular ejection fraction.
eNYHA: New York Heart Association.
fLVIDd: left ventricular internal diameter at end diastole.
gLVIDs: left ventricular internal diameter at end systole.
hLA_Vol: left atrial volume.
iBSA: body surface area.
jHCM: hypertrophic cardiomyopathy.
kSCD: sudden cardiac death.
lECG: electrocardiogram.
mEcho: echocardiogram.
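The "Average rank" column in the table above can be read as the mean of a feature's rank over the 6 targets. The following sketch shows one plausible way to compute it. It is an assumption rather than the authors' exact procedure: ties are not treated specially, the function name is hypothetical, and the example uses only 3 features and 2 targets from the table, so the resulting ranks are not comparable to the ranks over all 112 features.

```python
def average_ranks(scores):
    """scores: {target: {feature: score}} -> {feature: mean rank over targets}.

    Rank 1 is the highest score for a target. Ties are broken by input order.
    """
    totals = {}
    for per_feature in scores.values():
        ordered = sorted(per_feature, key=per_feature.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            totals[feature] = totals.get(feature, 0) + rank
    return {feature: total / len(scores) for feature, total in totals.items()}


# RReliefF scores of 3 features for 2 targets, copied from the table above.
scores = {
    "LA_d": {"History of syncope": 0.026, "QRS duration": 0.035, "Negative genetics": 0.036},
    "LVEF": {"History of syncope": 0.036, "QRS duration": 0.046, "Negative genetics": 0.037},
}
ranks = average_ranks(scores)
```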
To model the relationship between input patient data and target variables, we applied the following supervised learning algorithms:
RFs [
Gradient boosting (XGBoost) [
Linear regression (LR) is a traditional method of finding a linear dependence between attributes and the selected target variable.
NNs mimic the architecture and working of brain neurons. We used 1 input and 1 output layer and 1 or several hidden layers. In the optimization process, we optimized several learning parameters, such as the learning rate, number of hidden layers, sizes of layers, regularization, sample weights, class weights, dropout, and batch normalization.
The best hyperparameters of these algorithms were tuned using Bayesian optimization and random search implemented in
Semisupervised learning is increasingly used in medicine, especially for medical image segmentation [
To further improve the results of semisupervised learning, we used artificially generated data (ie, virtual patients). Virtual data generation can sometimes replace biomedical experiments on animals [
Supervised ML models often exhibit a black-box nature, meaning that they can model data but not provide an explanation for the contained knowledge as well as the reasoning used in predictions. This means that the models lack transparency and interpretability. To address this, explanation methods provide justification for each prediction and assess features with the highest impact [
In our work, we applied the SHAP method [
To evaluate and compare the performance of the 6 predictive models, we used stratified 10-fold cross-validation. For each of the 6 predictive problems, 4 different regression models were evaluated (LR, RF, gradient-boosted [GB] trees, and NN). The following parameters were varied in tests:
Application of semisupervised learning (denoted with S)
Addition of virtual patients' data into the learning data set (denoted with VP)
Use of all 112 features (denoted with All) or only a subset of the 21 best features (denoted with Subset)
Interpolation of data points so that measurements were equidistant (denoted with I)
In all, 28 different combinations of the parameters were used in experiments. Some combinations were omitted due to limitations (eg, VP generators cannot generate data for all 112 attributes, so VP was evaluated only with the subset of attributes) or excessive time complexity (eg, the use of virtual patients with NNs).
To compare the accuracy of the obtained models, we computed the following 4 metrics: mean absolute error (MAE), root-mean-square error (RMSE), and 2 variations of the relative root-mean-square error (RRMSEmean and RRMSEconst). The MAE measures the average absolute difference between predicted and true values over all examples in the test set. The RMSE is the square root of the mean squared error (MSE); it is expressed in the units of the target variable and thus avoids the difficulty of interpreting the squared values of the MSE. The RRMSE measures the ratio between the error of the obtained model and the error of a baseline model, so values below 1 indicate that the model outperforms the baseline. We computed 2 variations of the RRMSE with 2 different baseline models: mean predictor and constant predictor. With the RRMSEmean, we compared the performance of the obtained model to the model that returned the mean of the target variable over all patients (mean predictor), while with the RRMSEconst, we compared the obtained model to the model that assumed that the value of the target variable would remain constant/unchanged over the 10-year period (constant predictor).
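The four metrics can be sketched in a few lines. This is an illustration under assumptions: the RRMSE is computed here as the ratio of the model's RMSE to the baseline's RMSE on the same examples, the mean predictor uses the mean of the example targets, and all numbers are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rrmse(y_true, y_pred, y_baseline):
    """RMSE of the model relative to the RMSE of a baseline (below 1: model is better)."""
    return rmse(y_true, y_pred) / rmse(y_true, y_baseline)


# Hypothetical LA_d values (mm) for 4 patients.
current = [40.0, 45.0, 38.0, 41.0]    # value at the time of prediction
y_true = [44.0, 50.0, 39.0, 47.0]     # value 10 years later
y_pred = [45.0, 48.0, 40.0, 46.0]     # model prediction

mean_baseline = [sum(y_true) / len(y_true)] * len(y_true)  # mean predictor
rrmse_mean = rrmse(y_true, y_pred, mean_baseline)
rrmse_const = rrmse(y_true, y_pred, current)               # constant predictor
```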
We summarized (
To further evaluate the contribution of different data augmentation strategies, we compared the results on different patient sets: original (All features), subset of best features (Subset), virtual patients (VP), semisupervised learning (S), and the combination of the latter 2 (S + VP). The obtained results, shown for the best-performing RF model, are given in
In the following subsection, we apply the explanation methodology that helps interpret the computed predictions and their contributing feature values.
Comparison of the best-performing models for each target variable.
Target | Model and parameter | MAEa | RMSEb | RRMSEcmean | RRMSEconst |
LA_dd | RFe: Sf+VPg+Subset | 3.4 | 4.73 | 0.54 | 0.46 |
LA_Volh | RF: S+VP+Subset | 18.4 | 26.73 | 0.56 | 0.47 |
LVEFi | GBj: S+Subset | 4.92 | 6.73 | 0.67 | 0.61 |
LVIDdk | RF: S+VP+Subset | 3.53 | 5.26 | 0.68 | 0.64 |
LVIDsl | RF: S+VP+Subset | 3.42 | 4.81 | 0.66 | 0.56 |
NYHAm | RF: S+VP+Subset | 0.39 | 0.5 | 0.67 | 0.66 |
aMAE: mean absolute error.
bRMSE: root-mean-square error.
cRRMSE: relative root-mean-square error.
dLA_d: left atrial diameter.
eRF: random forest.
fS: application of semisupervised learning.
gVP: addition of virtual patients' data into the learning data set.
hLA_Vol: left atrial volume.
iLVEF: left ventricular ejection fraction.
jGB: gradient boosted.
kLVIDd: left ventricular internal diameter at end diastole.
lLVIDs: left ventricular internal diameter at end systole.
mNYHA: New York Heart Association.
Plotted results for the R² statistic for each target variable using different sets (input parameters). Note that VP, S, and S + VP are used on feature subsets. LA_d: left atrial diameter; LA_Vol: left atrial volume; LVEF: left ventricular ejection fraction; LVIDd: left ventricular internal diameter at end diastole; LVIDs: left ventricular internal diameter at end systole; NYHA: New York Heart Association; S: application of semisupervised learning; VP: addition of virtual patients' data into the learning data set.
To augment the output of prediction models, we applied the SHAP method [
An example of the explanation generated for the prediction for the target LA_d (
Example of an explanation of the prediction for the target variable LA_d. LA_d: left atrial diameter; LA_Vol: left atrial volume; LVIDs: left ventricular internal diameter at end systole.
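The additive idea behind such an explanation can be illustrated with exact Shapley values on a toy model. This sketch is not the SHAP library and not the authors' model: it enumerates all feature orderings by brute force, which is feasible only for a handful of features, and the linear model, its coefficients, and the patient values are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features that have not yet been "added" keep their baseline value.
    """
    names = list(x)
    contrib = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]          # switch this feature to the patient's value
            value = model(current)
            contrib[name] += value - previous  # marginal contribution in this ordering
            previous = value
    return {name: total / len(orders) for name, total in contrib.items()}


def toy_model(f):
    """Hypothetical predictor of the 10-year LA_d value (mm)."""
    return 0.8 * f["LA_d"] + 0.05 * f["LA_Vol"] + 0.1 * f["age"]


patient = {"LA_d": 45.0, "LA_Vol": 80.0, "age": 60.0}
reference = {"LA_d": 40.0, "LA_Vol": 60.0, "age": 50.0}
phi = shapley_values(toy_model, patient, reference)
```

The contributions sum to the difference between the prediction for the patient and the prediction for the reference values, which is the property that lets each feature be reported as pushing the predicted value up or down.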
In addition to the evaluation of prediction models with statistical measures in the 2 previous sections, we engaged medical experts to provide further interpretation and validation of the results. First, we compared the accuracy of predictive models with the accuracy of human experts, which was obtained using a survey (
We prepared a questionnaire for medical experts and distributed it to several medical universities and cardiology clinics. The questionnaire included data about complete medical cases (measurements, events, and medication data) for 10 patients, and the experts were asked to study them and complete the following 2 tasks:
Predict the magnitude of the 10-year change in the 6 studied clinical parameters (LA_d, LA_Vol, LVEF, LVIDd, LVIDs, and NYHA) and mark it on a discrete scale from –3 to 3, where –3 and 3 represented the biggest-possible decrease and increase, respectively. Possible magnitudes of change were represented using discrete intervals, as the prediction of an exact value is a difficult task that does not take place in medical practice.
Evaluate whether the statements generated from the explanation (eg, “The current value of parameter LA_d will cause a decrease in LA_d”) are true or false. For each patient, 6 such statements were generated, covering the features with the highest contribution. More specifically, the questionnaire included evaluation questions for 6 parameters that contribute to a change in LA_d, 4 for LA_Vol, 5 for LVEF, 6 for LVIDd, 7 for LVIDs, and 4 for NYHA.
The questionnaire was fully completed by 13 experts with 16 (SD 8) years of experience. In the following subsections, we present the analysis of the answers.
To compare the prediction accuracy of the experts and the ML model, we first discretized the model's predictions into intervals so that they could be compared with the discrete intervals predicted by the experts. We performed the discretization using bins of width 0
Mean prediction error of the discretized model prediction (denoted with MD)
Mean prediction error made by individual medical experts (denoted with E)
Mean prediction error of the consortium prediction (ie, the average prediction of all doctors, denoted with C)
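The three error measures listed above can be sketched as follows. All numbers are hypothetical, the error is assumed to be a mean absolute error on the discrete scale from -3 to 3, and the bin width is a placeholder parameter rather than the value used in the study.

```python
def discretize(change, bin_width):
    """Map a predicted change to the discrete scale -3..3.

    `bin_width` is a hypothetical parameter; Python's round() is used for binning.
    """
    level = round(change / bin_width)
    return max(-3, min(3, level))


def mean_abs_error(predictions, truth):
    return sum(abs(p - t) for p, t in zip(predictions, truth)) / len(truth)


truth = [1, 0, 2]                          # discretized true changes for 3 patients
expert_answers = [[2, 0, 3], [0, -1, 1]]   # answers of 2 experts for the same patients
model_changes = [4.1, 5.6, 9.8]            # continuous model predictions of the change

# MD: error of the discretized model predictions
md = mean_abs_error([discretize(c, bin_width=5.0) for c in model_changes], truth)
# E: error of individual experts, averaged over experts
e = sum(mean_abs_error(a, truth) for a in expert_answers) / len(expert_answers)
# C: error of the consortium, ie, of the per-patient average of all expert answers
consortium = [sum(column) / len(column) for column in zip(*expert_answers)]
c = mean_abs_error(consortium, truth)
```

Averaging the answers before computing the error lets individual over- and underestimates cancel out, which is why the consortium error cannot exceed the average error of the individual experts.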
We could see that the mean prediction error of the discretized model MD (
Mean absolute error (MAE) of the discretized model predictions (MD), individual experts (E), and the entire consortium (C).
Target/prediction | Model (MD), MAE (SD) | Expert (E), MAE (SD) | Consortium (C), MAE (SD) |
NYHAa | [value missing]b | 0.84 (0.69) | 0.56 (0.34) |
LA_dc | 1.70 (0.82) | 1.69 (0.97) | [value missing]b |
LA_Vold | [value missing]b | 1.25 (0.98) | 1.13 (0.63) |
LVIDde | [value missing]b | 1.09 (0.91) | 1.00 (0.77) |
LVIDsf | [value missing]b | 1.02 (0.86) | 0.88 (0.68) |
LVEFg | [value missing]b | 1.32 (0.90) | 1.28 (0.79) |
aNYHA: New York Heart Association.
bThe lowest achieved errors are italicized.
cLA_d: left atrial diameter.
dLA_Vol: left atrial volume.
eLVIDd: left ventricular internal diameter at end diastole.
fLVIDs: left ventricular internal diameter at end systole.
gLVEF: left ventricular ejection fraction.
To validate the generated model explanations, we analyzed the agreement of experts with statements generated about the features' influence in 2 steps. First, we calculated the agreement ratio for individual features that were included in the questionnaire, grouped by each of the 6 target variables. Second, we calculated the overall agreement of experts with the explanation for each of the 6 target parameters, based on the agreement data about all features that contributed to their prediction.
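The two-step agreement analysis can be sketched as follows. The vote data are hypothetical, the function name is an assumption, and the average agreement is assumed to be the mean of the per-feature agreement ratios.

```python
def agreement_summary(votes):
    """votes: {feature: [True/False per expert]}; True means the expert agreed.

    Returns the per-feature agreement ratios, the share of features that at
    least 50% of experts agreed with, and the average agreement over features.
    """
    ratios = {feature: sum(v) / len(v) for feature, v in votes.items()}
    agreed_share = sum(r >= 0.5 for r in ratios.values()) / len(ratios)
    average = sum(ratios.values()) / len(ratios)
    return ratios, agreed_share, average


# Hypothetical answers of 13 experts for 2 statements about one target variable.
votes = {
    "feature_A": [True] * 10 + [False] * 3,
    "feature_B": [True] * 2 + [False] * 11,
}
ratios, agreed_share, average = agreement_summary(votes)
```

With 13 respondents, a ratio of 2/13 rounds to 0.15 and 5/13 rounds to 0.38.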
The results (
The generated explanation might, indeed, provide incorrect information.
The generated explanation might explain novel relationships between features and target parameters that have not been observed or documented so far.
It was hard for the experts to evaluate the claims in the questionnaire about the influence of particular features, as these tasks deviate from the established medical practice and require the experts to rely on their subjective experience.
Further investigation is therefore required to establish the reasons for the imperfect agreement between the explanations and the experts. We can conclude that the results provide some evidence that the generated prediction explanations might offer a complementary view of the prediction of HCM-related parameters. Such explanations might represent a tool that experts could consult while making their decisions.
Agreement ratios between experts and prediction explanations for parameters that contribute to predicting each target variable. The last two columns provide summary statistics.
Target variable and parameters | Expert agreement | Summary: ratio of agreed features from at least 50% of experts, n | Summary: average agreement, n |
[target name missing] | | 1.00 (4/4) | 0.73 |
[4 parameters with agreement above 50%: names and values missing] | | | |
[target name missing] | | 0.67 (4/6) | 0.52 |
BSAf | 0.15 | | |
LVEFg | 0.23 | | |
[4 parameters with agreement above 50%: names and values missing] | | | |
[target name missing] | | 0.40 (2/5) | 0.49 |
QRS duration | 0.38 | | |
Syncope | 0.46 | | |
NYHA | 0.38 | | |
[2 parameters with agreement above 50%: names and values missing] | | | |
[target name missing] | | 0.75 (3/4) | 0.48 |
Age | 0.15 | | |
[3 parameters with agreement above 50%: names and values missing] | | | |
[target name missing] | | 0.43 (3/7) | 0.47 |
LA_d | 0.38 | | |
LVIDd | 0.38 | | |
Interventricular septum (IVS) | 0.38 | | |
Family history of HCMh | 0.08 | | |
[3 parameters with agreement above 50%: names and values missing] | | | |
[target name missing] | | 0.17 (1/6) | 0.36 |
Atrial fibrillation | 0.15 | | |
BSA | 0.08 | | |
IVS | 0.38 | | |
Age | 0.31 | | |
LVEF | 0.38 | | |
[1 parameter with agreement above 50%: name and value missing] | | | |
aNYHA: New York Heart Association.
bLA_d: left atrial diameter.
cNames of parameters with agreement higher than 50% are italicized.
dLA_Vol: left atrial volume.
eLVIDd: left ventricular internal diameter at end diastole.
fBSA: body surface area.
gLVEF: left ventricular ejection fraction.
hHCM: hypertrophic cardiomyopathy.
We presented a disease progression system for patients diagnosed with HCM that is based on predicting 6 target parameters (LA_d, LA_Vol, LVIDd, LVIDs, LVEF, and NYHA) for 10 years ahead using supervised ML models. The experiments revealed good ML performance for all targets, with the achieved predictive error lower than the error of the default predictors. The experiments also revealed that semisupervised learning and the artificial data from virtual patients helped achieve even higher predictive accuracy for all 6 targets. Finally, we validated our approach with human experts using a structured questionnaire and determined the models' favorable performance compared to performance of experts for 5 of 6 targets.
The design of the study carried several limitations, stemming from the fact that this work was based on real-world data that are expensive to obtain and are subject to noise. The first limitation is that the study was based on a data set from a single medical center. To further validate this study, it would be beneficial to independently evaluate the models with data sets from other centers or to extend the existing data set with more data. Including more data could also diminish a potential bias of our data set, whose population distribution, and thus the ranges of recorded parameters, might differ from those of other medical centers; we did, in fact, observe such differences in some cases. Additionally, in an ideal but, due to its cost, rather unrealistic scenario, both data modalities (echo and CMR) would be available for all patients, which would allow us to use the CMR data as an additional data source for all patients. Because such data were unavailable at the time of the study or were structured differently, we leave this for future work.
Further, to prepare the data to be used for ML and obtain stable predictions, we used several preprocessing and data augmentation steps. Since we are dealing with real medical data, this opens questions of how different data transformations influence our predictions. Hence, a sensitivity study of the results would be required, as well as determining how the patient’s record time frame and predicted risk time frame influence the achieved accuracies. An additional limitation of the performed validation was that the ML results were compared to the inputs of medical experts in the structured survey instead of their free diagnoses and evaluations. Although this was required to unify the structure of human answers to enable statistical comparisons, the form of survey might introduce its own bias.
The described limitations, along with our further research questions, suggest several directions for future study. First, we will evaluate the proposed system on an independent cardiological data set (eg, the Sarcomeric Human Cardiomyopathy Registry [SHaRe]) [
Although ML can have limitations in medicine [
Bar graphs of parameter influence for each model used.
A sample of the questionnaire for the first patient.
AI: artificial intelligence
BSA: body surface area
CMR: cardiac magnetic resonance
COPD: chronic obstructive pulmonary disease
ECG: electrocardiogram
Echo: echocardiogram
GB: gradient boosted
HCM: hypertrophic cardiomyopathy
kNN: k-nearest neighbors
LA_d: left atrial diameter
LA_Vol: left atrial volume
LR: linear regression
LVEF: left ventricular ejection fraction
LVIDd: left ventricular internal diameter at end diastole
LVIDs: left ventricular internal diameter at end systole
MAE: mean absolute error
ML: machine learning
multivariate normal distribution
NN: neural network
NYHA: New York Heart Association
RF: random forest
RMSE: root-mean-square error
RRMSE: relative root-mean-square error
SCD: sudden cardiac death
SHAP: Shapley additive explanation
SVM: support vector machine
This project received funding from the European Union’s Horizon 2020 research and innovation program (grant agreement no. 777204; www.silicofcm.eu). This paper reflects only the authors’ views. The European Commission is not responsible for any use that may be made of the information it contains.
None declared.