TY  - JOUR
AU  - Sumsion, Daniel
AU  - Davis, Elijah
AU  - Fernandes, Marta
AU  - Wei, Ruoqi
AU  - Milde, Rebecca
AU  - Veltink, Jet Malou
AU  - Kong, Wan-Yee
AU  - Xiong, Yiwen
AU  - Rao, Samvrit
AU  - Westover, Tara
AU  - Petersen, Lydia
AU  - Turley, Niels
AU  - Singh, Arjun
AU  - Buss, Stephanie
AU  - Mukerji, Shibani
AU  - Zafar, Sahar
AU  - Das, Sudeshna
AU  - Junior, Valdery Moura
AU  - Ghanta, Manohar
AU  - Gupta, Aditya
AU  - Kim, Jennifer
AU  - Stone, Katie
AU  - Mignot, Emmanuel
AU  - Hwang, Dennis
AU  - Trotti, Lynn Marie
AU  - Clifford, Gari D
AU  - Katwa, Umakanth
AU  - Thomas, Robert
AU  - Westover, M Brandon
AU  - Sun, Haoqi
PY  - 2025
DA  - 2025/04/10
TI  - Identification of Patients With Congestive Heart Failure From the Electronic Health Records of Two Hospitals: Retrospective Study
JO  - JMIR Med Inform
SP  - e64113
VL  - 13
KW  - electronic health record
KW  - machine learning
KW  - artificial intelligence
KW  - phenotype
KW  - congestive heart failure
KW  - medication
KW  - claims database
KW  - International Classification of Diseases
KW  - effectiveness
KW  - natural language processing
KW  - model performance
KW  - logistic regression
KW  - validity
AB  - Background: Congestive heart failure (CHF) is a common cause of hospital admissions. Medical records contain valuable information about CHF, but manual chart review is time-consuming. Claims databases (using International Classification of Diseases [ICD] codes) provide a scalable alternative but are less accurate. Automated analysis of medical records through natural language processing (NLP) enables more efficient adjudication but has not yet been validated across multiple sites. Objective: We seek to accurately classify the diagnosis of CHF based on structured and unstructured data from each patient, including medications, ICD codes, and information extracted through NLP of notes left by providers, by comparing the effectiveness of several machine learning models. Methods: We developed an NLP model to identify CHF from medical records using electronic health records (EHRs) from two hospitals (Mass General Hospital and Beth Israel Deaconess Medical Center; from 2010 to 2023), with 2800 clinical visit notes from 1821 patients. We trained and compared the performance of logistic regression, random forests, and RoBERTa models. We measured model performance using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). These models were also externally validated by training on the sample from one hospital and testing on the other, and an overall estimated error was calculated using a completely random sample from both hospitals. Results: The average age of the patients was 66.7 (SD 17.2) years; 978 (54.3%) out of 1821 patients were female. The logistic regression model achieved the best performance using a combination of ICD codes, medications, and notes, with an AUROC of 0.968 (95% CI 0.940-0.982) and an AUPRC of 0.921 (95% CI 0.835-0.969). The models that only used ICD codes or medications had lower performance. The estimated overall error rate in a random EHR sample was 1.6%. The model also showed high external validity from training on Mass General Hospital data and testing on Beth Israel Deaconess Medical Center data (AUROC 0.927, 95% CI 0.908-0.944) and vice versa (AUROC 0.968, 95% CI 0.957-0.976). Conclusions: The proposed EHR-based phenotyping model for CHF achieved excellent performance, external validity, and generalization across two institutions. The model enables multiple downstream uses, paving the way for large-scale studies of CHF treatment effectiveness, comorbidities, outcomes, and mechanisms.
SN  - 2291-9694
UR  - https://medinform.jmir.org/2025/1/e64113
UR  - https://doi.org/10.2196/64113
UR  - http://www.ncbi.nlm.nih.gov/pubmed/40208662
DO  - 10.2196/64113
ID  - info:doi/10.2196/64113
ER  -