TY  - JOUR
AU  - Mast, Nicholas H
AU  - Oeste, Clara L.
AU  - Hens, Dries
PY  - 2025
DA  - 2025/3/12
TI  - Assessing Total Hip Arthroplasty Outcomes and Generating an Orthopedic Research Outcome Database via a Natural Language Processing Pipeline: Development and Validation Study
JO  - JMIR Med Inform
SP  - e64705
VL  - 13
KW  - total hip arthroplasty
KW  - THA
KW  - direct anterior approach
KW  - electronic health records
KW  - EHR
KW  - natural language processing
KW  - NLP
KW  - complication rate
KW  - single-surgeon registry
KW  - hip arthroplasty
KW  - orthopedic
KW  - validation
KW  - surgeon
KW  - outpatient visits
KW  - hospitalizations
KW  - surgery
AB  - Background: Processing data from electronic health records (EHRs) to build research-grade databases is a lengthy and expensive process. Modern arthroplasty practice commonly uses multiple sites of care, including clinics and ambulatory care centers. However, most private data systems prevent obtaining usable insights for clinical practice. Objective: This study aims to create an automated natural language processing (NLP) pipeline for extracting clinical concepts from EHRs related to orthopedic outpatient visits, hospitalizations, and surgeries in a multicenter, single-surgeon practice. The pipeline was also used to assess therapies and complications after total hip arthroplasty (THA). Methods: EHRs of 1290 patients undergoing primary THA from January 1, 2012 to December 31, 2019 (operated and followed by the same surgeon) were processed using artificial intelligence (AI)–based models (NLP and machine learning). In addition, 3 independent medical reviewers generated a gold standard using 100 randomly selected EHRs. The algorithm processed the entire database from different EHR systems, generating an aggregated clinical data warehouse. An additional manual control arm was used for data quality control. Results: The algorithm was as accurate as human reviewers (0.95 vs 0.94; P=.01), achieving a database-wide average F1-score of 0.92 (SD 0.09; range 0.67‐0.99), validating its use as an automated data extraction tool. During the first year after direct anterior THA, 92.1% (1188/1290) of our population had a complication-free recovery. In 7.9% (102/1290) of cases where surgery or recovery was not uneventful, lateral femoral cutaneous nerve sensitivity (47/1290, 3.6%), intraoperative fractures (13/1290, 1%), and hematoma (9/1290, 0.7%) were the most common complications. Conclusions: Algorithm evaluation of this dataset accurately represented key clinical information swiftly, compared with human reviewers. This technology may provide substantial value for future surgeon practice and patient counseling. Furthermore, the low early complication rate of direct anterior THA in this surgeon’s hands was supported by the dataset, which included data from all treated patients in a multicenter practice.
SN  - 2291-9694
UR  - https://medinform.jmir.org/2025/1/e64705
UR  - https://doi.org/10.2196/64705
DO  - 10.2196/64705
ID  - info:doi/10.2196/64705
ER  - 