@Article{info:doi/10.2196/68704, author="Remaki, Adam and Ung, Jacques and Pages, Pierre and Wajsburt, Perceval and Liu, Elise and Faure, Guillaume and Petit-Jean, Thomas and Tannier, Xavier and G{\'e}rardin, Christel", title="Improving Phenotyping of Patients With Immune-Mediated Inflammatory Diseases Through Automated Processing of Discharge Summaries: Multicenter Cohort Study", journal="JMIR Med Inform", year="2025", month="Apr", day="9", volume="13", pages="e68704", keywords="secondary use of clinical data for research and surveillance; clinical informatics; clinical data warehouse; electronic health record; data science; artificial intelligence; AI; natural language processing; ontologies; classifications; coding; tools; programs and algorithms; immune-mediated inflammatory diseases", abstract="Background: Valuable insights gathered by clinicians during their inquiries and documented in textual reports are often unavailable in the structured data recorded in electronic health records (EHRs). Objective: This study aimed to highlight that mining unstructured textual data with natural language processing techniques complements the available structured data and enables more comprehensive patient phenotyping. A proof-of-concept for patients diagnosed with specific autoimmune diseases is presented, in which the extraction of information on laboratory tests and drug treatments is performed. Methods: We collected EHRs available in the clinical data warehouse of the Greater Paris University Hospitals from 2012 to 2021 for patients hospitalized and diagnosed with 1 of 4 immune-mediated inflammatory diseases: systemic lupus erythematosus, systemic sclerosis, antiphospholipid syndrome, and Takayasu arteritis. Then, we built, trained, and validated natural language processing algorithms on 103 discharge summaries selected from the cohort and annotated by a clinician. 
Finally, all discharge summaries in the cohort were processed with the algorithms, and the extracted data on laboratory tests and drug treatments were compared with the structured data. Results: Named entity recognition followed by normalization yielded F1-scores of 71.1 (95{\%} CI 63.6-77.8) for the laboratory tests and 89.3 (95{\%} CI 85.9-91.6) for the drugs. Application of the algorithms to 18,604 EHRs increased the detection of antibody results and drug treatments. For instance, among patients in the systemic lupus erythematosus cohort with positive antinuclear antibodies, the rate increased from 18.34{\%} (752/4102) to 71.87{\%} (2949/4102), making the results more consistent with the literature. Conclusions: While challenges remain in standardizing laboratory tests, particularly with abbreviations, this work, based on secondary use of clinical data, demonstrates that automated processing of discharge summaries enriched the information available in structured data and facilitated more comprehensive patient profiling.", issn="2291-9694", doi="10.2196/68704", url="https://medinform.jmir.org/2025/1/e68704" }