%0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e65371 %T Improving Systematic Review Updates With Natural Language Processing Through Abstract Component Classification and Selection: Algorithm Development and Validation %A Hasegawa,Tatsuki %A Kizaki,Hayato %A Ikegami,Keisho %A Imai,Shungo %A Yanagisawa,Yuki %A Yada,Shuntaro %A Aramaki,Eiji %A Hori,Satoko %+ Division of Drug Informatics, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan, 81 3 5400 2650, satokoh@keio.jp %K systematic review %K natural language processing %K guideline updates %K bidirectional encoder representations from transformer %K screening model %K literature %K efficiency %K updating systematic reviews %K language model %D 2025 %7 27.3.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: A challenge in updating systematic reviews is the workload in screening the articles. Many screening models using natural language processing technology have been implemented to scrutinize articles based on titles and abstracts. While these approaches show promise, traditional models typically treat abstracts as uniform text. We hypothesize that selective training on specific abstract components could enhance model performance for systematic review screening. Objective: We evaluated the efficacy of a novel screening model that selects specific components from abstracts to improve performance and developed an automatic systematic review update model using an abstract component classifier to categorize abstracts based on their components. Methods: A screening model was created based on the included and excluded articles in the existing systematic review and used as the scheme for the automatic update of the systematic review. A prior publication was selected for the systematic review, and articles included or excluded in the articles screening process were used as training data. The titles and abstracts were classified into 5 categories (Title, Introduction, Methods, Results, and Conclusion). Thirty-one component-composition datasets were created by combining 5 component datasets. We implemented 31 screening models using the component-composition datasets and compared their performances. Comparisons were conducted using 3 pretrained models: Bidirectional Encoder Representations from Transformer (BERT), BioLinkBERT, and BioM- Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Moreover, to automate the component selection of abstracts, we developed the Abstract Component Classifier Model and created component datasets using this classifier model classification. Using the component datasets classified using the Abstract Component Classifier Model, we created 10 component-composition datasets used by the top 10 screening models with the highest performance when implementing screening models using the component datasets that were classified manually. Ten screening models were implemented using these datasets, and their performances were compared with those of models developed using manually classified component-composition datasets. The primary evaluation metric was the F10-Score weighted by the recall. Results: A total of 256 included articles and 1261 excluded articles were extracted from the selected systematic review. In the screening models implemented using manually classified datasets, the performance of some surpassed that of models trained on all components (BERT: 9 models, BioLinkBERT: 6 models, and BioM-ELECTRA: 21 models). In models implemented using datasets classified by the Abstract Component Classifier Model, the performances of some models (BERT: 7 models and BioM-ELECTRA: 9 models) surpassed that of the models trained on all components. These models achieved an 88.6% reduction in manual screening workload while maintaining high recall (0.93). Conclusions: Component selection from the title and abstract can improve the performance of screening models and substantially reduce the manual screening workload in systematic review updates. Future research should focus on validating this approach across different systematic review domains. %R 10.2196/65371 %U https://medinform.jmir.org/2025/1/e65371 %U https://doi.org/10.2196/65371