This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
Scoring systems developed for predicting survival after allogeneic hematopoietic cell transplantation (HCT) show suboptimal prediction power, and various factors affect posttransplantation outcomes.
A prediction model using a machine learning–based algorithm can be an alternative for concurrently applying multiple variables and can reduce potential biases. In this regard, the aim of this study is to establish and validate a machine learning–based predictive model for survival after allogeneic HCT in patients with hematologic malignancies.
Data from 1470 patients with hematologic malignancies who underwent allogeneic HCT between December 1993 and June 2020 at Asan Medical Center, Seoul, South Korea, were retrospectively analyzed. Using the gradient boosting machine algorithm, we evaluated a model predicting the 5-year posttransplantation survival through 10-fold cross-validation.
The prediction model showed good performance with a mean area under the receiver operating characteristic curve of 0.788 (SD 0.03). Furthermore, we developed a risk score predicting probabilities of posttransplantation survival in 294 randomly selected patients, and the predicted and observed risks of overall death, nonrelapse mortality, and relapse incidence agreed according to the risk score. Additionally, the calculated score demonstrated the possibility of predicting survival according to the different transplantation-related factors, with the visualization of the importance of each variable.
We developed a machine learning–based model for predicting long-term survival after allogeneic HCT in patients with hematologic malignancies. Our model provides a method for making decisions regarding patient and donor candidates or selecting transplantation-related resources, such as conditioning regimens.
Allogeneic hematopoietic cell transplantation (HCT) is a widely used and potentially curative therapeutic option for patients with hematologic malignancies. The increasing use of allogeneic HCT is attributable to multiple factors, including improved alternative donor availability, reduced-intensity conditioning regimens, advances in the prevention of transplantation-related toxicities, and improvements in general supportive care. Despite these advances, allogeneic HCT remains associated with considerably high rates of complications, treatment-related mortality, and relapse [
Recently, attempts to predict transplantation-related outcomes more accurately have been made in various clinical settings regarding early mortality [
Survival following allogeneic HCT, however, can vary depending on multiple variables, such as disease relapse and transplantation-related complications, including graft-versus-host disease (GVHD), engraftment failure, or infection, which can lead to increased nonrelapse mortality (NRM). Furthermore, these HCT complications are associated with several variables, including donor-related or recipient-related factors, the donor-recipient relationship, and conditioning, among others.
We hypothesized that the selection of variables using a machine learning–based approach and the establishment of a prediction model by applying those variables will improve the performance of the model and avoid unexpected biases. Additionally, we assumed that the established prediction algorithm will help choose better transplantation-related factors or donors to improve post-HCT outcomes. In this study, we developed a model for predicting the long-term survival of patients with hematologic malignancies after allogeneic HCT based on selected variables using a machine learning algorithm, and we validated the model’s accuracy in a validation set. Then, we implemented an algorithm to select more appropriate transplantation-related factors using the established prediction model.
Data on 1470 adult patients (≥15 years old) with hematologic malignancies who underwent allogeneic HCT between December 1993 and December 2015 at Asan Medical Center, Seoul, South Korea, were obtained to develop the machine learning–based prediction model. To predict long-term survival after allogeneic HCT, we included patients who either survived more than 5 years or died within 5 years after transplantation. As the data cutoff date was December 2020, we included only patients who underwent allogeneic HCT before December 2015 to ensure that each patient had at least 5 years of potential follow-up. Then, 229 variables, including recipient and donor characteristics, disease features, human leukocyte antigen (HLA) types, graft information, medications administered for conditioning, GVHD prophylaxis, supportive care, and other laboratory data, were collected for analysis.
The primary objective of the study was to predict the 5-year overall survival (OS) after allogeneic HCT, and the secondary objectives were to determine the NRM, cumulative incidence of relapse (CIR), and 100-day OS. All time-to-event data were calculated from the date of transplantation.
The Institutional Review Board of Asan Medical Center approved the protocols of this study (2021-1003), which was conducted according to the 2008 Declaration of Helsinki.
The patients were classified into two groups: those who survived more than 5 years and those who died within 5 years. In the learning process, the former group was labeled 0 and the latter was labeled 1; therefore, the closer the predicted value is to 1, the higher the probability of death within 5 years. The aforementioned predictive factors were classified as categorical or noncategorical variables and used to develop 5 prediction models. The performance for predicting survival after allogeneic HCT was tested using the following 5 machine learning algorithms: gradient boosting machine (GBM), random forest, deep neural network, logistic regression, and adaptive boosting (AdaBoost). Each algorithm was tested using the same randomly selected training set (1176/1470, 80% of all patients). The AUCs of the algorithms are shown in
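As an illustration of the evaluation setup, the following sketch shows a random split of 1470 patient indices into 1176 training and 294 validation indices, together with a rank-based computation of the AUC. The helper names (`split_indices`, `roc_auc`) and the seed are assumptions made for this example; the sketch does not reproduce the study's software or any of the 5 algorithms.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient indices and split them into training and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0).
    Tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


train_idx, valid_idx = split_indices(1470)
print(len(train_idx), len(valid_idx))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because every algorithm is scored on the same split with the same metric, differences in AUC reflect the algorithms rather than the partition of patients.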
We provided an explainable individualized survival prediction, using Shapley values to quantify the probability of survival for each patient when predicting OS after allogeneic HCT. A Shapley value is calculated as the average change in the model prediction according to the presence or absence of a single feature over all possible combinations of the other features [
where
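To make the definition of a Shapley value concrete, the following sketch computes exact Shapley values by enumerating every subset of the other features and weighting each marginal contribution by |S|!(n − |S| − 1)!/n!, which is the standard weighting. The feature names (`x1`, `x2`, `x3`) and the toy value function are hypothetical and carry no clinical meaning. Exhaustive enumeration is feasible only for a handful of features; it is shown here for clarity and is not the procedure used in the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(features: List[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: the weighted average marginal contribution of each
    feature over all subsets of the remaining features."""
    n = len(features)
    result: Dict[str, float] = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):  # subset sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {feature}) - value(s))
        result[feature] = total
    return result


def toy_value(subset: FrozenSet[str]) -> float:
    """Hypothetical model output when only the features in subset are known."""
    out = 0.0
    if "x1" in subset:
        out += 0.20
    if "x2" in subset:
        out += 0.10
    if "x1" in subset and "x2" in subset:
        out += 0.06  # interaction term, shared equally by x1 and x2
    return out


phi = shapley_values(["x1", "x2", "x3"], toy_value)
print(phi)
```

The values sum to the difference between the output with all features and the output with none, which is the property that allows a single prediction to be decomposed into per-feature contributions.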
Categorical variables were compared using the chi-square test or Fisher exact test, and continuous variables were compared using the Mann-Whitney
The characteristics of the patients and donors included in the study are shown in
During a median follow-up of 8 years (95% CI 7.8-8.3 years), the estimated 5-year OS of all patients was 46.2%. The 2-year NRM and CIR were 17.7% and 33.3%, respectively.
Patient and donor characteristics.
Variable | Value
Patients, N | 1470
Interval from diagnosis to HCT^a in months, median (range) | 5.7 (0-268)
Recipient sex, n (%) |
  Male | 833 (56.7)
  Female | 637 (43.3)
Donor sex, n (%) |
  Male | 977 (66.5)
  Female | 493 (33.5)
Recipient age in years, median (range) | 41 (15-75)
Donor age in years, median (range) | 34 (0-70)
Sex match (donor to recipient), n (%) |
  Male to male | 551 (37.5)
  Female to male | 280 (19)
  Male to female | 424 (28.8)
  Female to female | 213 (14.5)
Diagnosis, n (%) |
  AML^b | 783 (66.9)
  MDS^c | 188 (16.1)
  ALL^d | 306 (26.2)
  Lymphoma | 56 (4.8)
  MM^e | 13 (1.1)
  CML^f | 92 (7.9)
  MPN^g | 16 (1.4)
  MDS-MPN | 16 (1.4)
HCT-CI^h score, median (range) | 3 (0-8)
Disease risk, n (%) |
  Standard risk^i | 830 (56.5)
  High risk | 640 (43.5)
Donor type, n (%) |
  Matched sibling | 591 (40.2)
  Unrelated | 387 (26.4)
  Haploidentical familial | 491 (33.4)
  Cord blood | 1 (0.1)
Graft source, n (%) |
  Bone marrow | 472 (32.1)
  Peripheral blood | 997 (67.8)
  Cord blood | 1 (0.1)
Conditioning intensity, n (%) |
  Myeloablative | 536 (36.5)
  Reduced intensity | 934 (63.5)
Treated with antithymocyte globulin to prevent GVHD^j, n (%) | 903 (61.4)
^aHCT: hematopoietic cell transplantation.
^bAML: acute myeloid leukemia.
^cMDS: myelodysplastic syndrome.
^dALL: acute lymphoblastic leukemia.
^eMM: multiple myeloma.
^fCML: chronic myeloid leukemia.
^gMPN: myeloproliferative neoplasm.
^hHCT-CI: hematopoietic cell transplantation–specific comorbidity index.
^iThe standard-risk group is defined as follows: patients with acute leukemia in the first remission (except by salvage chemotherapy), CML in the chronic phase, drug-sensitive lymphoma/MM, or MDS with bone marrow blasts ≤5% at HCT.
^jGVHD: graft-versus-host disease.
After deciding on GBM as the prediction algorithm, the variables used for model development were selected using the recursive feature elimination (RFE) method. RFE is a widely used feature selection method that ranks each variable by its importance in predicting the target variable and helps select a specified minimum number of variables that still shows good performance in a model [
Diagnosis and disease (eg, AML first complete remission)
Disease risk*
WBC count at diagnosis
Extramedullary disease at diagnosis
Extramedullary disease at HCT
Karyotype at diagnosis
Karyotype at HCT
CMV serostatus of recipient
CMV serostatus of donor
Hepatic score of HCT-CI
Total score of HCT-CI
Conditioning regimen
Donor type
Recipient HLA type: A, B, C, DR, and DQ
Donor HLA type: A, B, C, DR, and DQ
RBC transfusion before HCT
Platelet transfusion before HCT
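The elimination loop behind RFE can be sketched as follows. This is a simplified illustration under stated assumptions: a small logistic regression fitted by gradient descent stands in for GBM, the absolute weight stands in for feature importance (which presumes features on comparable scales), and the data are synthetic. None of the names or settings come from the study.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit logistic regression by batch gradient descent; return feature weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X: Sequence[Sequence[float]], y: Sequence[int],
                                  n_keep: int) -> Tuple[List[int], List[int]]:
    """Refit on the remaining features and drop the one with the smallest
    absolute weight until n_keep remain. Returns (kept, dropped in order)."""
    remaining = list(range(len(X[0])))
    dropped: List[int] = []
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in X]
        weights = fit_logistic(subset, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(weights[k]))
        dropped.append(remaining.pop(weakest))
    return remaining, dropped


# Synthetic data: columns 0 and 1 drive the outcome; columns 2 and 3 are noise.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(300)]
y = [1 if row[0] + row[1] + rng.gauss(0.0, 0.3) > 0 else 0 for row in X]

kept, dropped = recursive_feature_elimination(X, y, n_keep=2)
print("kept:", kept, "dropped (least important first):", dropped)
```

In practice, the number of features to keep is usually chosen by tracking a validation metric such as the AUC as features are removed, rather than being fixed in advance as in this sketch.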
The performance of the prediction model using GBM and selected variables in 294 patients is depicted in
The final performance of the prediction model. Panel A shows the area under the receiver operating characteristic curve. Panel B shows the calibration plot.
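A calibration plot compares predicted risk with the observed event rate. The following sketch shows one common way to obtain the plotted points, by grouping predictions into equal-width probability bins. The function name, the binning scheme, and the example numbers are assumptions for illustration; the study's exact calibration procedure is not specified here.

```python
from typing import List, Sequence, Tuple


def calibration_table(labels: Sequence[int], probabilities: Sequence[float],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins and return, for each
    non-empty bin, (mean predicted risk, observed event rate, number of cases)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probabilities):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for _, p in members) / len(members)
        observed_rate = sum(y for y, _ in members) / len(members)
        rows.append((mean_predicted, observed_rate, len(members)))
    return rows


# Made-up labels (1 = death within 5 years) and predicted risks.
example = calibration_table([0, 0, 1, 0, 1, 1],
                            [0.10, 0.15, 0.45, 0.55, 0.90, 0.95], n_bins=2)
for mean_predicted, observed_rate, count in example:
    print(round(mean_predicted, 3), round(observed_rate, 3), count)
```

Points lying close to the diagonal (mean predicted risk equal to observed rate) indicate good calibration.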
Because we classified patients who died within 5 years as 1, the closer the predicted value of the GBM model is to 1, the higher the risk of death. The optimal threshold for determining whether the risk score is positive or negative was calculated using the Youden J statistic along with the ROC curve. The threshold for the prediction model was 0.5533; if the risk score is greater than this value, the model estimates that the patient will die within 5 years. The predicted probability associated with the risk score of each patient was tested in a randomly selected patient cohort corresponding to 20% of all patients (294/1470) to reduce the potential bias from choosing 1 of the 10 produced models.
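The threshold selection can be sketched as follows: for each candidate threshold, sensitivity and specificity are computed, and the threshold maximizing J = sensitivity + specificity − 1 is chosen. The function name and example data are hypothetical, and the convention that a score at or above the threshold counts as a positive prediction is an assumption of this sketch.

```python
from typing import Sequence, Tuple


def youden_threshold(labels: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    where a score >= threshold is treated as a positive prediction (label 1).
    If several thresholds tie, the lowest one is returned."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):  # each observed score is a candidate threshold
        true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        true_neg = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Made-up labels (1 = death within 5 years) and risk scores.
labels = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.7, 0.9]
threshold, j_statistic = youden_threshold(labels, scores)
print(threshold, j_statistic)
```

The Youden J statistic weights sensitivity and specificity equally; a different operating point may be preferable when false-positive and false-negative predictions carry different clinical costs.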
Posttransplantation outcomes of the patients in the validation set according to the prediction score: (A) overall survival, (B) relapse, and (C) nonrelapse mortality.
To assess whether the risk score can also predict NRM and relapse after HCT, we analyzed the incidence of NRM and relapse using 3 risk groups. High-risk scores were significantly associated with both higher CIR (
We assumed that the prediction score for each patient can be applied in selecting the most appropriate donor when there are multiple donor candidates. For example, the prediction score can help physicians choose between a younger HLA-haploidentical donor and an older matched sibling donor. To verify this, we calculated the scores using Shapley values, through which the importance of each variable can be visualized as a specific value. We simulated a real case of a patient with ALL in the first complete remission who had the following 2 donor candidates: a 48-year-old HLA-haploidentical familial female donor and a 43-year-old locus-mismatched unrelated male donor. A total of 2 prediction scores were calculated using data derived from each donor showing different values (
Survival differences among the patients in the validation set according to the prediction score.
Long-term survival after allogeneic HCT in patients with hematologic malignancies is affected by multiple factors but mainly depends on disease relapse and NRM. Multiple variables, including disease status, genetic risk, conditioning regimen, comorbidities, degree of HLA matching, and patient and donor ages, are associated with disease relapse, GVHD, engraftment, or treatment-related toxicities, and these outcomes are closely and mutually related to survival after transplantation. However, traditional statistical methods are poorly suited to analyses that account for the interactions between variables or for effects that differ according to the specific values of each factor, such as the relationship between the HLA alleles of the patient and donor. In this regard, prediction models based on machine learning algorithms can be an effective alternative for predicting posttransplantation outcomes and can provide guidance for selecting appropriate patients, donors, or resources [
We developed a prediction model and risk score using GBM and selected variables based on machine learning for long-term survival after allogeneic HCT. Our model demonstrated an AUC of 0.788, which showed better performance in predicting posttransplantation outcomes than previously reported machine learning–based models. Shouval et al [
To apply the prediction model in practice to patients planning for allogeneic HCT, a specific tool for comparing the expected outcomes according to multiple different factors is required. We provided a prediction score to quantify the probability of survival, which showed good concordance between the observed and estimated survival after HCT. Additionally, Shapley Additive Explanations (SHAP) visualizes the importance of each factor (
This study has several limitations. First, a relatively small number of patients was used to establish the algorithm-based prediction model. Although the model showed consistent performance with 10-fold cross-validation in the validation cohort, a larger patient cohort would be more helpful in verifying the performance of the algorithm, and further external validation using data from a greater number of patients is warranted. Second, the retrospective nature of the study may have resulted in selection and measurement biases. However, we included all patients with hematologic malignancies who underwent allogeneic HCT during a defined period to reflect real-world practice.
Here, we present a machine learning–based algorithm and prediction score for quantifying the probability of long-term survival after allogeneic HCT in patients with hematologic malignancies. The prediction score showed a moderate negative correlation with long-term survival and was also associated with NRM and relapse after transplantation. Our prediction model provides a personalized method for selecting more appropriate transplantation-related factors and patient or donor candidates for allogeneic HCT.
The AUC of each tested algorithm. cb, CatBoost; rf, random forest; fnn, feedforward neural network; log, logistic regression; ada, AdaBoost.
Recursive feature elimination showing the AUC (area under the curve) according to the number of selected features.
ALL: acute lymphoblastic leukemia
AUC: area under the ROC curve
CIR: cumulative incidence of relapse
GBM: gradient boosting machine
GVHD: graft-versus-host disease
HCT: hematopoietic cell transplantation
HLA: human leukocyte antigen
MDS: myelodysplastic syndrome
MPN: myeloproliferative neoplasm
NRM: nonrelapse mortality
OS: overall survival
RFE: recursive feature elimination
ROC: receiver operating characteristic
SHAP: Shapley Additive Explanations
This work was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute, funded by the Republic of Korea’s Ministry of Health and Welfare (Grant HR21C0198).
None declared.