This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.

Securing the representativeness of study populations is crucial in biomedical research to ensure high generalizability. In this regard, using multi-institutional data has advantages in medicine. However, physically combining data is difficult because the confidential nature of biomedical data raises privacy issues. Therefore, research using multi-institutional medical data requires a methodological approach that develops a model without sharing data between institutions.

This study aims to develop a weight-based integrated predictive model of multi-institutional data, which does not require iterative communication between institutions, to improve average predictive performance by increasing the generalizability of the model under privacy-preserving conditions without sharing patient-level data.

The weight-based integrated model generates a weight for each institutional model and builds an integrated model for multi-institutional data based on these weights. We performed 3 simulations to show the weight characteristics and to determine the number of repetitions of the weight required to obtain stable values. We also conducted an experiment using real multi-institutional data to verify the developed weight-based integrated model. We selected 10 hospitals (2845 intensive care unit [ICU] stays in total) from the electronic intensive care unit Collaborative Research Database to predict ICU mortality with 11 features. To evaluate the validity of our model against a centralized model, which was developed by combining all the data of the 10 hospitals, we used proportional overlap (ie, 0.5 or less indicates a significant difference at a level of .05, and 2 indicates that the 2 CIs overlap completely). Standard and Firth logistic regression models were applied to the simulations and the experiment, respectively.

The results of these simulations indicate that the weight of each institution is determined by 2 factors (ie, the data size of each institution and how well each institutional model fits the overall institutional data) and that 200 repetitions of weight generation are needed per institution. In the experiment, the estimated area under the receiver operating characteristic curve (AUC) and 95% CIs were 81.36% (79.37%-83.36%) and 81.95% (80.03%-83.87%) in the centralized model and weight-based integrated model, respectively. The proportional overlap of the CIs for AUC between the weight-based integrated model and the centralized model was approximately 1.70, and the proportional overlap for the 11 estimated odds ratios was over 1, except for 1 case.

In the experiment using real multi-institutional data, our model showed results similar to those of the centralized model without iterative communication between institutions. In addition, our weight-based integrated model provided a weighted average model by integrating 10 institutional models that were overfitted or underfitted relative to the centralized model. The proposed weight-based integrated model is expected to provide an efficient distributed research approach because it increases the generalizability of the model and does not require iterative communication.

Multi-institutional studies have many advantages in that they can increase the generalizability and reproducibility of clinical results. Studies based on geographically and demographically diverse populations using multi-institutional data are increasingly common and necessary to improve generalizability [

Data accumulated in multiple institutions should be shared to realize the potential of big data in medicine. Big biomedical data networks, such as the patient-centered Scalable National Network for Effectiveness Research clinical data research network [

However, the availability of such large volumes of data is associated with privacy issues. Privacy must be protected when sensitive biomedical data are being used for research purposes, and this requires implementing several safeguards [

Among the methods that use distributed computing, federated learning has recently been proposed as a promising solution. It is a distributed computing method wherein several clients collaboratively train a shared global model with the coordination of a central server [

The noniterative approach aggregates the intermediate results required for building a global model without requiring an iterative process. A typical method is meta-analysis [

In this study, we focus on developing a noniterative algorithm that can construct predictive models from different sources without sharing horizontally partitioned data, in which patient-level records are split across institutions but contain the same medical information. The proposed model, referred to as the weight-based integrated model, is a predictive model that reflects the characteristics of various populations in multiple institutions without compromising privacy. We evaluated it on 2 aspects: (1) To confirm whether it provides a weighted average model that captures the characteristics of the multi-institutional data, we evaluated its similarity to the centralized model, which was developed by combining all institutional data, and compared it with the models from individual institutions in terms of predictive power and parameter estimation. (2) To confirm whether it improves average predictive performance by building a generalizable predictive model, we compared its predictive power with that of the centralized model and with the models of each institution that were used to build the weight-based integrated model, through external validation.

The proposed weight-based integrated model involves a 4-step process (_{k}_{k}

(A) Overall process of the weight-based integrated model. (B) Step 3 of the weight-based integrated model showing the process for calculating the weight using the log loss as a criterion to measure the model performance in the logistic regression model.

Randomly split the data of party k, Z_k, into Z_k(1) and Z_k(2). Z_k(1) is used to estimate any predictive model, and Z_k(2) is used to measure the predictive performance of the model estimated from Z_k(1). The data set (Z_k(1), Z_k(2)) is generated repeatedly for each party k.

_{k}_{,}estimated using

In the

where _{i}^{T}x_{i}^{T}_{i}_{i}

To make the weight larger as the loss becomes smaller, we define

The _{k}

where _{k}_{k}_{k}

The weight calculated by the weight-based integrated model is determined by 2 factors: the data size of the party (ie, the ratio of data size to central data) and how well the model of the party fits into the data of the other parties (ie, the goodness of fit to all parties of the model from each party). In case of a party

The parameters of the model can be also estimated based on weights from the weight-based integrated model process. In step 3, the models and weights of

We performed 3 simulations. The first simulation aimed to validate the optimal number of repetitions of the weight. The second and third simulations were performed to show the features of the weight calculated using the weight-based integrated model and to compare with other weighting methods. For all simulations, the standard logistic regression model was used, and 5 features were set. Three features were sampled from binomial (1, 0.5), and 2 features were sampled from normal (0, 1). The outcome was generated from the binomial (1,

In the first simulation, to set an optimal

The second simulation was performed to confirm how the weights change when 2 factors are adjusted: the data size and the goodness of fit of the model from each party. We considered 2 scenarios. In the first scenario, we generated 3 parties (A, B, and C), each with a data size of 1000, and made party C biased by adjusting its sampling parameters. Under 5 conditions (parameter 1 to parameter 5), all 6 parameters of parties A and B were set to 1, whereas the parameters of party C were set to 1 under parameter 1, 0.5 under parameter 2, –0.5 under parameter 3, –1 under parameter 4, and –2 under parameter 5, so that the bias of party C increased from parameter 1 to parameter 5. That is, with the data size held constant, we examined how the weights changed as the goodness of fit of the biased party C to the entire data gradually deteriorated. In the second scenario, we kept one of the 3 parties biased and varied the data size of party C to examine how the weights changed with data size. The 6 parameters of parties A and B were set to 1, and all parameters of party C were set to –2.
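The data generation for one party can be sketched as follows. This is a minimal illustration, not the code used in the study (which was run in R): the function name `simulate_party`, the seeds, and the use of the logistic function to turn the linear predictor into an outcome probability are assumptions made for this example.

```python
import math
import random


def simulate_party(n, beta, seed=0):
    """Simulate one party: 3 binary features, 2 standard-normal features.

    beta = [b0, b1, ..., b5]. ASSUMPTION: the outcome probability is the
    logistic function of the linear predictor b0 + b1*x1 + ... + b5*x5.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Three features from binomial(1, 0.5), two from normal(0, 1).
        x = [1 if rng.random() < 0.5 else 0 for _ in range(3)]
        x += [rng.gauss(0.0, 1.0) for _ in range(2)]
        eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        p = 1.0 / (1.0 + math.exp(-eta))
        y = 1 if rng.random() < p else 0
        rows.append((x, y))
    return rows


# Scenario 1, most biased condition: A and B share parameters, C differs.
party_a = simulate_party(1000, [1] * 6, seed=1)
party_b = simulate_party(1000, [1] * 6, seed=2)
party_c = simulate_party(1000, [-2] * 6, seed=3)
```

Changing the parameter vector of party C from 1 toward –2 reproduces the increasing bias described above, and changing `n` for party C corresponds to the second scenario.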

In the third simulation, we compared the weight of the weight-based integrated model with other comparable weighting methods to show the unique characteristics of the weight-based integrated model. This simulation aims to confirm to what extent the predictive performance of the integrated model using each weighting method is similar to that of the centralized model. We referred to an approach [_{k}_{k}

We performed 200 simulations under the same conditions. Four parties (A, B, C, and D) were constructed to build a predictive model, and another 4 validation parties were constructed to measure predictive performance. In addition, we assumed 2 scenarios, similar to the second simulation, to show the characteristics of each weight. While adjusting the data characteristics of parties under the same data sizes, and data sizes of parties under the same data characteristics, we observed the change patterns of weights and predictive performance of each weighting method. In the first scenario, the data sizes of the 4 parties were all set to 500. The 6 parameters, [_{0}, _{1}, _{2}, _{3}, _{4}, _{5}], of parties A and B were set to [0, 2, 2, 2, 2, 2], and the data characteristics of parties C and D were adjusted under the following 3 conditions: (1) 6 parameters—[0, 2, 2, 2, 2, 2], outcome generation: binomial (1,

The data sizes of the 4 validation parties were all fixed at 500, and the data characteristics were the same as each condition of the first and the second scenarios. For example, the parameters of the 4 validation parties for condition (1) of the first scenario were set to [0, 2, 2, 2, 2, 2] in the same manner as parties A, B, C, and D. The average area under the receiver operating characteristic (ROC) curve (AUC) was measured for 4 validation parties to compare the similarity of the performance of each weighting method with that of the centralized model.
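As an illustration of this evaluation step, the sketch below computes a rank-based AUC for each validation party and averages the results. The function names and the toy scores are hypothetical; the study itself may have used a different AUC routine.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_auc(parties):
    """Average the AUC over validation parties given as (scores, labels) pairs."""
    return sum(auc(s, y) for s, y in parties) / len(parties)


# Toy example with two validation parties (values are illustrative only).
toy_parties = [
    ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),
    ([0.5, 0.5], [1, 0]),
]
mean_auc = average_auc(toy_parties)
```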

We used the electronic intensive care unit (eICU) Collaborative Research Database [

The model to be applied to the weight-based integrated model used a logistic regression model to predict mortality after ICU admission. As features, 27 variables included in the Acute Physiology, Age, and Chronic Health Evaluation (APACHE) classification system were considered. The APACHE score is a severity-of-disease classification system [

We selected 10 hospitals with a total of 2845 ICU stays, out of 208 hospitals with a total of 200,859 ICU stays, as our horizontally partitioned data set (

Selection process for hospitals and intensive care unit (ICU) stays.

When developing a predictive model, the number of events compared with the number of predictors is a key factor to determine the performance of the logistic regression model [

The logistic regression model was used for the simulation data, whereas the Firth logistic regression model was used for the real data. The loss of both logistic models was calculated following the step 3 procedure for computing weights from the log loss. The ratio of Z(1) to Z(2) was 3:1 for all simulations. For the real data, in which events were infrequent, Z(1) and Z(2) were generated at a 1:1 ratio for both dead and alive cases to build a more stable model on Z(1).
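The link between log loss and weight can be sketched as below. This is a hedged illustration only: the article states that the weight should grow as the loss shrinks, and a normalized inverse loss averaged over repetitions is one simple choice with that property. It is an assumption of this example, not necessarily the article's exact definition, and the function names and loss values are hypothetical.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Summed binary log loss (negative log-likelihood) on an evaluation split."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def inverse_loss_weights(losses_by_party):
    """Turn per-repetition losses into one normalized weight per party.

    ASSUMPTION: weight is proportional to the mean of 1/loss over repetitions,
    so that a smaller loss gives a larger weight.
    """
    raw = {
        party: sum(1.0 / loss for loss in losses) / len(losses)
        for party, losses in losses_by_party.items()
    }
    total = sum(raw.values())
    return {party: value / total for party, value in raw.items()}


# Hypothetical losses from two repetitions for three parties.
example_losses = {"A": [575.0, 580.0], "B": [616.0, 620.0], "C": [1000.0, 1010.0]}
weights = inverse_loss_weights(example_losses)
```

Because each party's model is scored on the evaluation data of all parties, a model that fits the pooled data poorly accumulates a larger loss and therefore receives a smaller weight under this scheme.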

To evaluate the weight-based integrated model, we compared the results of the weight-based integrated model and the centralized model using 10 hospitals from the eICU database, in terms of the ROC curve, AUC, and estimated OR, on the 11 features. In addition, we used the Hosmer–Lemeshow test [

The comparison of AUCs and ORs between the 2 models was evaluated based on the proportion of overlap of the 95% CIs. The proportion of overlap was defined as the ratio of overlap of two 95% CIs in the margin of error, which is the half-width of the 95% CI of the longer length. If a CI is remarkably short and is included in the other CI to be compared, then the proportion of overlap calculated based on the shorter CI is 2, which is a perfect match between the 2 CIs, regardless of the value of the longer CI. Therefore, the proportion of overlap was calculated based on the longer CI for a more conservative evaluation criterion. For the independent group
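The definition above can be written as a short calculation. The sketch below is one reading of that definition, with a hypothetical function name: the overlap of the 2 intervals is divided by the half-width of the longer interval, and the result is allowed to become negative when the intervals are separated by a gap (an assumption of this example).

```python
def proportion_overlap(ci_a, ci_b):
    """Overlap of two CIs in units of the margin of error of the longer CI."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    # Length of the shared region; negative if the intervals do not touch.
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
    # Margin of error = half-width of the longer interval (conservative choice).
    margin = max(hi_a - lo_a, hi_b - lo_b) / 2.0
    return overlap / margin


# AUC intervals reported for the centralized and weight-based integrated models.
auc_overlap = proportion_overlap((79.37, 83.36), (80.03, 83.87))
identical = proportion_overlap((0.0, 2.0), (0.0, 2.0))
```

Two identical intervals give the maximum value of 2, and the rounded AUC intervals give a value close to the reported figure of approximately 1.70.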

Based on the results of OR estimation for 11 features, we compared the results of our weight-based integrated model and conventional meta-analysis (for a fixed effect model using the inverse of the variance of the effect estimate as a weight). The meta-analysis is similar to the weight-based integrated model as the OR of a multi-institution is estimated by setting institution-specific weights and averaging the OR of each institution based on the weights, although the method of weight calculation of the meta-analysis varies from the proposed weight-based integrated model. We compared the proportional overlap of 95% CI and the relative bias of point estimates for the centralized model between the weight-based integrated model and the meta-analysis.
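For reference, the conventional fixed-effect pooling used as the comparator can be sketched as follows. The function name and the input values are hypothetical; the standard errors are assumed to be those of the log odds ratios, and 1.96 is the usual normal quantile for a 95% CI.

```python
import math


def fixed_effect_pooled_or(odds_ratios, std_errors, z=1.96):
    """Pool institution-level ORs on the log scale with inverse-variance weights.

    std_errors are the standard errors of log(OR) for each institution.
    Returns (pooled OR, lower 95% limit, upper 95% limit).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled_log = sum(w * math.log(o) for w, o in zip(weights, odds_ratios)) / total
    pooled_se = math.sqrt(1.0 / total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Illustrative inputs for one feature across three institutions.
pooled = fixed_effect_pooled_or([1.8, 2.4, 2.0], [0.30, 0.45, 0.25])
```

Because the weights depend on the variance of each feature's estimate, rerunning this pooling for a different feature generally gives the same institution a different weight.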

To perform external validation, we selected the top 5 hospitals as the external validation hospitals (ie, those with a high mortality rate and less than 90% ICU stay rate with all 27 features missing) after selecting 10 hospitals for the central data. By summarizing the AUC as a result of external validation, we confirmed whether the predictive performance on each external validation hospital in the weight-based integrated model is similar to that of the centralized model. We also evaluated whether the weight-based integrated model ultimately improves the average predictive performance when compared with a model of a single hospital through an average AUC on 5 external validations. In addition, the 3 weighting methods (ie, CS-Avg, n-Avg, and Avg) were applied to external validation and compared with the weight-based integrated model.

The simulation studies and experiments with real horizontally partitioned data were performed using R 3.6.0 (R Foundation for Statistical Computing).

In simulation 1, to propose optimal repetitions

Weights of 3 parties according to the number of repetitions for sizes of 200, 400, 600, 800, and 1000. Vertical lines represent 200 repetitions.

To confirm the characteristics of the weights calculated using the weight-based integrated model, party C, among the 3 parties, was considered as a biased party.

Change pattern of weights according to goodness of fit for central data (scenario 1 of simulation 2), and adjusted parameters for the 5 features of parties A, B, and C with size 1000.

As shown in the results of

Change pattern of weights according to the ratio of data size to central data (scenario 2 of simulation 2), adjusted data sizes of party C, and ratios of data size to centralized data for parties A, B, and C.

These two results of simulation 2 show that the weights of the weight-based integrated model consider not only the goodness of fit for the central data but also the ratio of data size to the central data.

The results for the first scenario are shown in

The results for the second scenario are summarized in

A total of 2845 ICU stays (dead: 525, alive: 2320) were drawn from 10 hospitals. Among the 2845 ICU stays, Z(1) across all hospitals comprised 1430 ICU stays, and Z(2) comprised 1415 ICU stays (refer to

The 200 log loss values for the total Z(2) (n=1415) of each hospital model and the final weights of each hospital model were calculated from 200 repetitions (

AUC, log loss, and weights for 10 models of each institution (N=2845).

Hospital number | n/N (%) | AUC^{a} (95% CI) | Log loss from 200 repetitions, median | Log loss from 200 repetitions, (min, max) | Weight |
1 | 510/2845 (17.93) | 83.81% (79.99%-87.63%) | 575.18 | (535.45, 668.13) | 0.1188 |
2 | 387/2845 (13.60) | 82.14% (76.82%-87.47%) | 577.40 | (536.59, 754.68) | 0.1181 |
3 | 268/2845 (9.42) | 86.67% (81.57%-91.78%) | 616.63 | (547.65, 755.15) | 0.1109 |
4 | 338/2845 (11.88) | 86.48% (81.43%-91.53%) | 617.14 | (552.61, 787.62) | 0.1109 |
5 | 231/2845 (8.12) | 86.29% (80.19%-92.4%) | 723.90 | (572.31, 1814) | 0.0929 |
6 | 316/2845 (11.11) | 80.93% (74.02%-87.83%) | 626.65 | (539.71, 978.16) | 0.1076 |
7 | 308/2845 (10.83) | 85.95% (78.23%-93.67%) | 665.89 | (561.92, 1071.16) | 0.1024 |
8 | 197/2845 (6.92) | 83.81% (75.88%-91.73%) | 712.29 | (569.31, 7280.35) | 0.0912 |
9 | 165/2845 (5.79) | 86.63% (79.2%-94.05%) | 758.66 | (566.39, 1774.99) | 0.0890 |
10 | 125/2845 (4.39) | 92% (86.66%-97.34%) | 1008.64 | (634.35, 13,722.49) | 0.0583 |

^{a}AUC: area under the receiver operating characteristic curve.

The Hosmer–Lemeshow goodness-of-fit test demonstrated that the weight-based integrated model and the centralized model fit the central data well, and the 10 models of each hospital fit the data of each hospital well (all

Area under the receiver operating characteristic curve (AUC), log loss from 200 repetitions, and weights. WIM: weight-based integrated model.

A total of 535 ICU stays were selected as the 5 external validation hospitals. The frequency and rate of mortality of external validation hospitals 1, 2, 3, 4, and 5 were 20/155 (12.9%), 19/67 (28.36%), 24/226 (10.62%), 11/47 (23.4%), and 8/40 (20%), respectively.

Results of AUC of external validation for the centralized model, the WIM, and 10 models of each hospital (error bar: 95% CI). Black, dark gray, and light gray indicate WIM, centralized model, and 10 models of each hospital, respectively. AUC: area under the receiver operating characteristic curve; WIM: weight-based integrated model.

Comparison of estimated OR and 95% CI on 11 features in the firth logistic regression model: (A) features with OR < 1 and (B) features with OR > 1. The numbers on the right sides of the figures are the proportional overlap of 95% CI of OR between the WIM and the centralized model. AUC: area under the receiver operating characteristic curve; BUN: blood urea nitrogen; FiO2: fraction of inspired oxygen; GCS: Glasgow Coma Scale; OR: odds ratio; PaO2: partial pressure of oxygen; pCO2: partial pressure of carbon dioxide; PR: pulse rate; WIM: weight-based integrated model.

As a result of the comparison with the meta-analysis, depending on the feature, the degree of similarity to the centralized model was slightly different between the weight-based integrated model and the meta-analysis in terms of the proportional overlap of 95% CI and relative bias (

The proposed model (the weight-based integrated model) was developed to build an integrated predictive model from horizontally partitioned data without requiring physical data sharing. The weight-based integrated model is an algorithm that does not require an iterative process and can extend the model to be applied by introducing the concept of a flexible weight of a partition model. Unlike previous methodologies of building a model of central data under privacy-preserving conditions, the proposed model has the following novelties.

First, the weight-based integrated model does not require iterative communication to construct a model that approximates the centralized model. The methods that use distributed computing require an iterative exchange of information between the institutions and the central server, which is time consuming and labor intensive in practice [

Second, the weight of the weight-based integrated model is a flexible weight derived from 2 factors, data size and the goodness of fit of each party’s model to the entire data (

Third, the weight-based integrated model is a flexible algorithm in terms of scalability of the model to be applied. As the proposed model builds each partition model independently and then integrates them based on the weight, it only needs to change the form of parameters in step 2 and the loss function in step 3, depending on the model.

We evaluated the validity of the weight-based integrated model in terms of predictive power and parameter estimation, compared with the centralized model. Experimental results using real horizontally partitioned data demonstrated that the weight-based integrated model provides a close approximation to the centralized model and improves the average predictive performance.

In terms of predictive power, the weight-based integrated model was substantially similar to the centralized model based on the results of the ROC curve and AUC. The weight-based integrated model provided a weighted average model by integrating each partition model overfitted or underfitted, compared with the centralized model (

In terms of parameter estimation, based on the results of the proportional overlap (0.5 or less indicates a significant difference at a significance level of .05; 2 indicates two CIs overlapping completely) for 95% CI of OR (

The comparison with the meta-analysis in the experiment using real data indicates that, for the OR estimates of 4 of the 11 features, the relative biases of the weight-based integrated model were slightly smaller than those of the meta-analysis. The weight-based integrated model generally produced OR estimates similar to those of the meta-analysis. However, because the 2 methods calculate weights differently, the proportional overlap of the 95% CI and the relative bias differed for some features. The weight of the meta-analysis is institution specific, but because it is based on the variance of the OR estimator, different weights are generated for the same institution depending on which feature’s OR is estimated. By contrast, the weights in our proposed weight-based integrated model are assigned to the model of each institution, so the same institution receives the same weight regardless of the feature being estimated. Although the weight of the meta-analysis is more feature specific than that of the weight-based integrated model, it does not represent a weight for the model of an institution. Therefore, it cannot be regarded as a weight suited to the purpose of building a predictive model.

When applying the weight-based integrated model, the following should be considered. To calculate the weight of each institution, the data of each institution are divided into Z(1), for building the model of each institution, and Z(2), for measuring the predictive performance of the models of all institutions. If the data size (especially the frequency of the outcome of interest) of an institution is insufficient, the model built from Z(1) will be unstable, and it will be difficult to accurately measure predictive performance on Z(2). Therefore, the data size of each institution should be large enough to be divided into Z(1) and Z(2). In addition, in the external validation, the predictive performance for each of the 5 external validation hospitals was better with a single-hospital model than with the weight-based integrated model. In other words, the weight-based integrated model may not be a good option when the goal is to improve predictive performance for one specific hospital (of the 5 hospitals). By contrast, when the goal is to improve the average predictive performance across the 5 hospitals, the weight-based integrated model can provide a robust unified model. In our experiment using real data, the weight-based integrated model showed the best average predictive performance on the 5 external validation hospitals. However, there may be cases where it does not. For example, if one of the hospitals included in the weight-based integrated model is heterogeneous relative to all external validation hospitals and its model predicts poorly in all of them, the average predictive performance of the weight-based integrated model may be poor. Because the weight-based integrated model averages the models of each hospital based on the weights, including such a heterogeneous hospital may lower overall predictive performance even though it receives a small weight. To avoid this, the hospitals included in the weight-based integrated model should be chosen so that the overall characteristics of the hospitals to which the model will be applied are evenly reflected.

The weight-based integrated model is a similar algorithm to the MCCG [

We demonstrated the characteristics of the weight of the weight-based integrated model through comparative analysis with other comparable weighting methods (CS-Avg, n-Avg, and Avg) [

In the weight-based integrated model, the weights were adjusted as the data characteristics of the parties changed under the same data size, and the weights were adjusted as the data sizes of the parties changed under the same data characteristics. By contrast, Avg always assigned a fixed weight that does not reflect the different characteristics and data sizes of each party, and n-Avg assigned a weight that reflects only the change in the data size of each party. In addition, CS-Avg did not reflect the change in data size, but rather reflected the change in data characteristics between parties. Because CS-Avg assigns a weight of 0 to a party with the lowest performance to other parties, the party with a weight of 0 was not considered in the model. Therefore, compared with other weights, the predictive performance of CS-Avg was the most different from that of the centralized model. The weight of the weight-based integrated model distinguished from other weights reflects the characteristics of each party in the central data in terms of data size and data characteristics of each party. The weight-based integrated model with these characteristics can build a model that shows similar predictive performance as the centralized model, compared with other weighting methods.
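The 2 simpler baselines can be illustrated with the hospital sizes from the real-data experiment. The sketch assumes, based on the description above, that Avg assigns equal weights and n-Avg assigns weights proportional to data size; the function names are hypothetical, and CS-Avg is omitted because its exact rule is not restated here.

```python
def avg_weights(sizes):
    """Avg (assumed): the same weight for every party, regardless of its data."""
    return [1.0 / len(sizes)] * len(sizes)


def n_avg_weights(sizes):
    """n-Avg (assumed): weight proportional to each party's data size."""
    total = sum(sizes)
    return [n / total for n in sizes]


# ICU stays per hospital in the real-data experiment (N=2845).
hospital_sizes = [510, 387, 268, 338, 231, 316, 308, 197, 165, 125]
equal = avg_weights(hospital_sizes)
by_size = n_avg_weights(hospital_sizes)
```

Under these assumptions, hospital 1 receives 0.10 from Avg and about 0.18 from n-Avg, whereas its reported weight in the weight-based integrated model is 0.1188, which reflects model fit as well as size.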

In our experiment using real data, there were few differences in the results of external validation between the weight-based integrated model and other weighting methods as the weights assigned to the 10 hospitals differed only slightly for each weighting method (

We presented the absence of an iterative process as a novelty of the weight-based integrated model. However, we did not evaluate the efficiency gained from this in a real distributed environment. In addition, this study verified the proposed method using 2 logistic regression models, and we did not confirm the validity of the weight-based integrated model with other models. As shown in the results of the estimated OR for bilirubin in

In this study, we developed a weight-based integrated model, which can build an integrated predictive model with noniterative communication between institutions. The weight-based integrated model, which uses the concept of weights for each institution, is a privacy-protecting analytic method that can reduce the burden of distributed computing and improve the average predictive performance of external validation institutions. The proposed weight-based integrated model can provide an efficient distributed research algorithm to improve the usage of multi-institutional data.

The frequency and rate of events for each of total, Z(1) and Z(2) in 10 hospitals.

Estimated OR in the centralized model, the weight-based integrated model, and 10 models of each hospital in experiments using real data.

Hosmer-Lemeshow goodness-of-fit tests to assess the calibration of the weight-based integrated model and centralized model for central data, and the 10 models of each hospital.

Average AUC for 5 external validation hospitals and AUC (95% CI) of each external validation hospital in the centralized model, the weight-based integrated model, and 10 models of each of the 10 hospitals.

Comparison results of the OR (95% CI) of 11 features between the weight-based integrated model and the meta-analysis.

Results of the simulation study for comparison with other weighting methods according to the change of data characteristics under the same data size.

Results of the simulation study for comparison with other weighting methods according to the change of data size under the same data characteristics.

Results of comparative analysis of external validation by the weighting methods using the eICU data.

APACHE: Acute Physiology, Age, and Chronic Health Evaluation

AUC: area under the receiver operating characteristic curve

eICU: electronic intensive care unit

GLORE: Grid Binary LOgistic Regression

ICU: intensive care unit

MCCG: multicenter collaboration gateway

OR: odds ratio

ROC: receiver operating characteristic

This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number HI19C1015).

None declared.