Published in Vol 8, No 3 (2020): March

Clinical Text Data in Machine Learning: Systematic Review

Authors of this article:

Irena Spasic 1; Goran Nenadic 2


  1. Goto T, Hara K, Hashimoto K, Soeno S, Shirakawa T, Sonoo T, Nakamura K. Validation of chief complaints, medical history, medications, and physician diagnoses structured with an integrated emergency department information system in Japan: the Next Stage ER system. Acute Medicine & Surgery 2020;7(1)
  2. Hahn U, Oleynik M. Medical Information Extraction in the Age of Deep Learning. Yearbook of Medical Informatics 2020;29(01):208
  3. Heo T, Kim Y, Choi J, Jeong Y, Seo S, Lee J, Jeon J, Kim C. Prediction of Stroke Outcome Using Natural Language Processing-Based Machine Learning of Radiology Report of Brain MRI. Journal of Personalized Medicine 2020;10(4):286
  4. Silva J, Almeida J, Matos S. Extraction of Family History Information From Clinical Notes: Deep Learning and Heuristics Approach. JMIR Medical Informatics 2020;8(12):e22898
  5. Greco M, Caruso P, Cecconi M. Artificial Intelligence in the Intensive Care Unit. Seminars in Respiratory and Critical Care Medicine 2021;42(01):002
  6. Letourneau-Guillon L, Camirand D, Guilbert F, Forghani R. Artificial Intelligence Applications for Workflow, Process Optimization and Predictive Analytics. Neuroimaging Clinics of North America 2020;30(4):e1
  7. Shen F, Liu S, Fu S, Wang Y, Henry S, Uzuner O, Liu H. Family History Extraction From Synthetic Clinical Narratives Using Natural Language Processing: Overview and Evaluation of a Challenge Data Set and Solutions for the 2019 National NLP Clinical Challenges (n2c2)/Open Health Natural Language Processing (OHNLP) Competition. JMIR Medical Informatics 2021;9(1):e24008
  8. Spasic I, Button K. Patient Triage by Topic Modeling of Referral Letters: Feasibility Study. JMIR Medical Informatics 2020;8(11):e21252
  9. Jung D, Choi Y. Systematic Review of Machine Learning Applications in Mining: Exploration, Exploitation, and Reclamation. Minerals 2021;11(2):148
  10. Bitterman D, Miller T, Mak R, Savova G. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. International Journal of Radiation Oncology*Biology*Physics 2021;110(3):641
  11. Setchi R, Spasić I, Morgan J, Harrison C, Corken R. Artificial intelligence for patent prior art searching. World Patent Information 2021;64:102021
  12. Balcombe L, De Leo D. Digital Mental Health Challenges and the Horizon Ahead for Solutions. JMIR Mental Health 2021;8(3):e26811
  13. Dipaola F, Shiffer D, Gatti M, Menè R, Solbiati M, Furlan R. Machine Learning and Syncope Management in the ED: The Future Is Coming. Medicina 2021;57(4):351
  14. White-Dzuro C, Schultz J, Ye C, Coco J, Myers J, Shackelford C, Rosenbloom S, Fabbri D. Extracting Medical Information from Paper COVID-19 Assessment Forms. Applied Clinical Informatics 2021;12(01):170
  15. D’Amore B, Smolinski-Zhao S, Daye D, Uppot R. Role of Machine Learning and Artificial Intelligence in Interventional Oncology. Current Oncology Reports 2021;23(6)
  16. Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, Grover C, Suárez-Paniagua V, Tobin R, Whiteley W, Wu H, Alex B. A systematic review of natural language processing applied to radiology reports. BMC Medical Informatics and Decision Making 2021;21(1)
  17. Li J, Zhou Y, Jiang X, Natarajan K, Pakhomov S, Liu H, Xu H. Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. Journal of the American Medical Informatics Association 2021
  18. Percha B. Modern Clinical Text Mining: A Guide and Review. Annual Review of Biomedical Data Science 2021;4(1):165
  19. Shorten C, Khoshgoftaar T, Furht B. Text Data Augmentation for Deep Learning. Journal of Big Data 2021;8(1)
  20. Singleton J, Li C, Akpunonu P, Abner E, Kucharska-Newton A. Using natural language processing to identify opioid use disorder in electronic health record data. International Journal of Medical Informatics 2023;170:104963
  21. Murphy R, Klopotowska J, de Keizer N, Jager K, Leopold J, Dongelmans D, Abu-Hanna A, Schut M, Qamar U. Adverse drug event detection using natural language processing: A scoping review of supervised learning methods. PLOS ONE 2023;18(1):e0279842
  22. Yan M, Gustad L, Nytrø Ø. Sepsis prediction, early detection, and identification using clinical text for machine learning: a systematic review. Journal of the American Medical Informatics Association 2022;29(3):559
  23. Vijayakumar S, P. S. N. Use of Natural Language Processing in Software Requirements Prioritization – A Systematic Literature Review. International Journal of Applied Engineering and Management Letters 2021:152
  24. Chopard D, Treder M, Corcoran P, Ahmed N, Johnson C, Busse M, Spasic I. Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach. JMIR Medical Informatics 2021;9(12):e28632
  25. Olthof A, van Ooijen P, Cornelissen L. Deep Learning-Based Natural Language Processing in Radiology: The Impact of Report Complexity, Disease Prevalence, Dataset Size, and Algorithm Type on Model Performance. Journal of Medical Systems 2021;45(10)
  26. Eresen A. Diagnosis of meniscal tears through automated interpretation of medical reports via machine learning. Academic Radiology 2022;29(4):488
  27. Herrero González A. El valor de los datos y su aplicabilidad en el Sector Sanitario. Revista Española de Medicina Nuclear e Imagen Molecular 2022;41(1):39
  28. Ebbehoj A, Thunbo M, Andersen O, Glindtvad M, Hulman A, Chua Chin Heng M. Transfer learning for non-image data in clinical research: A scoping review. PLOS Digital Health 2022;1(2):e0000014
  29. Seong D, Choi Y, Shin S, Yi B. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Medical Informatics and Decision Making 2023;23(1)
  30. Huang T, Liu S, Huang J, Li J, Liu G, Zhang W, Wang X. Prediction and associated factors of hypothyroidism in systemic lupus erythematosus: a cross-sectional study based on multiple machine learning algorithms. Current Medical Research and Opinion 2022;38(2):229
  31. Hudon A, Beaudoin M, Phraxayavong K, Dellazizzo L, Potvin S, Dumais A. Implementation of a machine learning algorithm for automated thematic annotations in avatar: A linear support vector classifier approach. Health Informatics Journal 2022;28(4):146045822211424
  32. Chang E. A vector-based semantic relatedness measure using multiple relations within SNOMED CT and UMLS. Journal of Biomedical Informatics 2022;131:104118
  33. Xu X, Qin L, Ding L, Wang C, Wang M, Li Z, Li J. Identifying stroke diagnosis-related features from medical imaging reports to improve clinical decision-making support. BMC Medical Informatics and Decision Making 2022;22(1)
  34. Xiao W, Jing L, Xu Y, Zheng S, Gan Y, Wen C, Belmonte Fernández Ó. Different Data Mining Approaches Based Medical Text Data. Journal of Healthcare Engineering 2021;2021:1
  35. Chiavi D, Haag C, Chan A, Kamm C, Sieber C, Stanikić M, Rodgers S, Pot C, Kesselring J, Salmen A, Rapold I, Calabrese P, Manjaly Z, Gobbi C, Zecca C, Walther S, Stegmayer K, Hoepner R, Puhan M, von Wyl V. The Real-World Experiences of Persons With Multiple Sclerosis During the First COVID-19 Lockdown: Application of Natural Language Processing. JMIR Medical Informatics 2022;10(11):e37945
  36. Karystianis G, Adily A, Schofield P, Wand H, Lukmanjaya W, Buchan I, Nenadic G, Butler T. Surveillance of Domestic Violence Using Text Mining Outputs From Australian Police Records. Frontiers in Psychiatry 2022;12
  37. Phatak A, Savage D, Ohle R, Smith J, Mago V. Medical Text Simplification Using Reinforcement Learning (TESLEA): Deep Learning–Based Text Simplification Approach. JMIR Medical Informatics 2022;10(11):e38095
  38. Houssein E, Mohamed R, Ali A. Machine Learning Techniques for Biomedical Natural Language Processing: A Comprehensive Review. IEEE Access 2021;9:140628
  39. Seinen T, Fridgeirsson E, Ioannou S, Jeannetot D, John L, Kors J, Markus A, Pera V, Rekkas A, Williams R, Yang C, van Mulligen E, Rijnbeek P. Use of unstructured text in prognostic clinical prediction models: a systematic review. Journal of the American Medical Informatics Association 2022;29(7):1292
  40. Burgos‐Gonzalez A, Bryant V, Maciá‐Martinez M, Huerta C. A strategy for assessment and validation of major bleeding cases in a primary health care database in Spain. Pharmacoepidemiology and Drug Safety 2021;30(12):1696
  41. Kaplar A, Stošović M, Kaplar A, Brković V, Naumović R, Kovačević A. Evaluation of clinical named entity recognition methods for Serbian electronic health records. International Journal of Medical Informatics 2022;164:104805
  42. Han P, Fu S, Kolis J, Hughes R, Hallstrom B, Carvour M, Maradit-Kremers H, Sohn S, Vydiswaran V. Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation. JMIR Medical Informatics 2022;10(8):e38155
  43. Filimonov M, Chopard D, Spasić I, Wren J. Simulation and annotation of global acronyms. Bioinformatics 2022;38(11):3136
  44. de Oliveira J, da Costa C, Antunes R. Data structuring of electronic health records: a systematic review. Health and Technology 2021;11(6):1219
  45. Frei J, Kramer F. German Medical Named Entity Recognition Model and Data Set Creation Using Machine Translation and Word Alignment: Algorithm Development and Validation. JMIR Formative Research 2023;7:e39077
  46. Žunić A, Corcoran P, Spasić I. Aspect-based sentiment analysis with graph convolution over syntactic dependencies. Artificial Intelligence in Medicine 2021;119:102138
  47. Cuenca-Zaldívar J, Torrente-Regidor M, Martín-Losada L, Fernández-De-Las-Peñas C, Florencio L, Sousa P, Palacios-Ceña D. Exploring Sentiment and Care Management of Hospitalized Patients During the First Wave of the COVID-19 Pandemic Using Electronic Nursing Health Records: Descriptive Study. JMIR Medical Informatics 2022;10(5):e38308
  48. Edrees H, Song W, Syrowatka A, Simona A, Amato M, Bates D. Intelligent Telehealth in Pharmacovigilance: A Future Perspective. Drug Safety 2022;45(5):449
  49. Brazeal J, Alekseyenko A, Li H, Fugal M, Kirchoff K, Marsh C, Lewin D, Wu J, Obeid J, Wallace K. Assessing quality and agreement of structured data in automatic versus manual abstraction of the electronic health record for a clinical epidemiology study. Research Methods in Medicine & Health Sciences 2021;2(4):168
  50. McDermott M, Nestor B, Szolovits P. Clinical Artificial Intelligence. Clinics in Laboratory Medicine 2023;43(1):29
  51. Yakimovich A, Beaugnon A, Huang Y, Ozkirimli E. Labels in a haystack: Approaches beyond supervised learning in biomedical applications. Patterns 2021;2(12):100383
  52. Evans C, Dorris H, Kane M, Mervak B, Brice J, Gray B, Moore C. A Natural Language Processing and Machine Learning Approach to Identification of Incidental Radiology Findings in Trauma Patients Discharged from the Emergency Department. Annals of Emergency Medicine 2023;81(3):262
  53. Kessler R, Luedtke A. Pragmatic Precision Psychiatry—A New Direction for Optimizing Treatment Selection. JAMA Psychiatry 2021;78(12):1384
  54. Quazi S, Saha R, Singh M. Applications of Artificial Intelligence in Healthcare. Journal of Experimental Biology and Agricultural Sciences 2022;10(1):211
  55. Corcoran P, Spasić I. Self-Supervised Representation Learning for Geographical Data—A Systematic Literature Review. ISPRS International Journal of Geo-Information 2023;12(2):64
  56. Suh H, Tully J, Meineke M, Waterman R, Gabriel R. Identification of Preanesthetic History Elements by a Natural Language Processing Engine. Anesthesia & Analgesia 2022;135(6):1162
  57. Chang R, Shing J, Erves J, Du L, Koyama T, Deppen S, Rentuza A, McAfee C, Stroebel C, Cates J, Harnack L, Andrews D, Bramblett R, Hull P. Measurement of provider fidelity to immunization guidelines: a mixed-methods study on the feasibility of documenting patient refusals of the human papillomavirus vaccine. BMC Medical Informatics and Decision Making 2022;22(1)
  58. Kenei J, Opiyo E. Semantic modeling and visualization of semantic groups of clinical text documents. International Journal of Information Technology 2022;14(5):2585
  59. Lin F, Salih O, Scott N, Jameson M, Epstein R. Development and Validation of a Machine Learning Approach Leveraging Real-World Clinical Narratives as a Predictor of Survival in Advanced Cancer. JCO Clinical Cancer Informatics 2022;(6)
  60. Fu R, Kundu A, Mitsakakis N, Elton-Marshall T, Wang W, Hill S, Bondy S, Hamilton H, Selby P, Schwartz R, Chaiton M. Machine learning applications in tobacco research: a scoping review. Tobacco Control 2023;32(1):99
  61. Ge W, Alabsi H, Jain A, Ye E, Sun H, Fernandes M, Magdamo C, Tesh R, Collens S, Newhouse A, MVR Moura L, Zafar S, Hsu J, Akeju O, Robbins G, Mukerji S, Das S, Westover M. Identifying Patients With Delirium Based on Unstructured Clinical Notes: Observational Study. JMIR Formative Research 2022;6(6):e33834
  62. van Buchem M, Neve O, Kant I, Steyerberg E, Boosman H, Hensen E. Analyzing patient experiences using natural language processing: development and validation of the artificial intelligence patient reported experience measure (AI-PREM). BMC Medical Informatics and Decision Making 2022;22(1)
  63. Hudon A, Beaudoin M, Phraxayavong K, Dellazizzo L, Potvin S, Dumais A. Use of Automated Thematic Annotations for Small Data Sets in a Psychotherapeutic Context: Systematic Review of Machine Learning Algorithms. JMIR Mental Health 2021;8(10):e22651
  64. Sabharwal R, Miah S. An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain. Journal of Big Data 2022;9(1)
  65. Afshar M, Sharma B, Dligach D, Oguss M, Brown R, Chhabra N, Thompson H, Markossian T, Joyce C, Churpek M, Karnik N. Development and multimodal validation of a substance misuse algorithm for referral to treatment using artificial intelligence (SMART-AI): a retrospective deep learning study. The Lancet Digital Health 2022;4(6):e426
  66. Kiser A, Eilbeck K, Ferraro J, Skarda D, Samore M, Bucher B. Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care–Associated Infection. JMIR Medical Informatics 2022;10(8):e39057
  67. Pethani F, Dunn A. Natural language processing for clinical notes in dentistry: A systematic review. Journal of Biomedical Informatics 2023;138:104282
  68. Ahmed H, Traore I, Mamun M, Saad S. Text augmentation using a graph-based approach and clonal selection algorithm. Machine Learning with Applications 2023;11:100452
  69. Chen Y, Hao L, Zou V, Hollander Z, Ng R, Isaac K. Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system. BMC Medical Research Methodology 2022;22(1)
  70. Lacson R, Eskian M, Licaros A, Kapoor N, Khorasani R. Machine Learning Model Drift: Predicting Diagnostic Imaging Follow-Up as a Case Example. Journal of the American College of Radiology 2022;19(10):1162
  71. Binsfeld Gonçalves L, Nesic I, Obradovic M, Stieltjes B, Weikert T, Bremerich J. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame. JMIR Medical Informatics 2022;10(12):e40534
  72. Herrero González A. The value of data and its applicability in the Health Sector. Revista Española de Medicina Nuclear e Imagen Molecular (English Edition) 2022;41(1):39
  73. Wu H, Wang M, Wu J, Francis F, Chang Y, Shavick A, Dong H, Poon M, Fitzpatrick N, Levine A, Slater L, Handy A, Karwath A, Gkoutos G, Chelala C, Shah A, Stewart R, Collier N, Alex B, Whiteley W, Sudlow C, Roberts A, Dobson R. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. npj Digital Medicine 2022;5(1)
  74. Humbert-Droz M, Mukherjee P, Gevaert O. Strategies to Address the Lack of Labeled Data for Supervised Machine Learning Training With Electronic Health Records: Case Study for the Extraction of Symptoms From Clinical Notes. JMIR Medical Informatics 2022;10(3):e32903
  75. Peterson J, Plana D, Bitterman D, Johnson S, Aerts H, Kann B. Growth in eligibility criteria content and failure to accrue among National Cancer Institute (NCI)‐affiliated clinical trials. Cancer Medicine 2023;12(4):4715
  76. Lederman A, Lederman R, Verspoor K. Tasks as needs: reframing the paradigm of clinical natural language processing research for real-world decision support. Journal of the American Medical Informatics Association 2022;29(10):1810
  77. Li X, Yuan W, Peng D, Mei Q, Wang Y. When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification. BMC Medical Informatics and Decision Making 2021;21(S9)
  78. Sabharwal R, Miah S, Fosso Wamba S. Extending artificial intelligence research in the clinical domain: a theoretical perspective. Annals of Operations Research 2022
  79. Cliffe C, Seyedsalehi A, Vardavoulia K, Bittar A, Velupillai S, Shetty H, Schmidt U, Dutta R. Using natural language processing to extract self-harm and suicidality data from a clinical sample of patients with eating disorders: a retrospective cohort study. BMJ Open 2021;11(12):e053808
  80. Lee K, Lee H, Park J, Kim Y, Lee Y. ANNO: A General Annotation Tool for Bilingual Clinical Note Information Extraction. Healthcare Informatics Research 2022;28(1):89
  81. Kosowan L, Singer A, Zulkernine F, Zafari H, Nesca M, Muthumuni D. Pan-Canadian Electronic Medical Record Diagnostic and Unstructured Text Data for Capturing PTSD: Retrospective Observational Study. JMIR Medical Informatics 2022;10(12):e41312
  82. Afshar M, Sharma B, Dligach D, Oguss M, Brown R, Chhabra N, Thompson H, Markossian T, Joyce C, Churpek M, Karnik N. Substance Misuse Algorithm for Referral to Treatment Using Artificial Intelligence (SMART-AI): Multi-Modal Validation with Interpretation and Bias Assessment. SSRN Electronic Journal 2021
  83. Szekér S, Fogarassy G, Vathy-Fogarassy Á. A General Text Mining Method to Extract Echocardiography Measurement Results from Echocardiography Documents. SSRN Electronic Journal 2022
  84. Soma Mitra, Dr. Saikat Basu. Remote Sensing Based Land Cover Classification Using Machine Learning and Deep Learning: A Comprehensive Survey. International Journal of Next-Generation Computing 2023
  85. Bean D, Kraljevic Z, Shek A, Teo J, Dobson R, Tariq A. Hospital-wide natural language processing summarising the health data of 1 million patients. PLOS Digital Health 2023;2(5):e0000218
  86. Bull N, Honan B, Spratt N, Quilty S, Iadanza E. A method for rapid machine learning development for data mining with doctor-in-the-loop. PLOS ONE 2023;18(5):e0284965
  87. Toner T, Pancholi R, Miller P, Forster T, Coleman H, Overton I. Strategies and techniques for quality control and semantic enrichment with multimodal data: a case study in colorectal cancer with eHDPrep. GigaScience 2022;12

Books/Policy Documents

  1. Liu Z, Zhang J, Hou Y, Zhang X, Li G, Xiang Y. Health Information Processing.
  2. Chandru A, Seetharam K. Software Engineering Perspectives in Systems.
  3. Kocbek P, Gosak L, Musović K, Stiglic G. Artificial Intelligence in Medicine.
  4. Kumar Attar R, Komal. Artificial Intelligence for Innovative Healthcare Informatics.
  5. Nakonechnyi O, Martsenyuk V, Klos-Witkowska A, Zhehestovska D. Proceedings of Sixth International Congress on Information and Communication Technology.
  6. Campos R, Jatowt A, Jorge A. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity.