An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study

doi:10.2196/55318

Published on 08.Apr.2024 in Vol 12 (2024)

This is a member publication of University of Pittsburgh

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/55318, first published 08.Dec.2023.

Sonish Sivarajkumar¹; Mark Kelley²; Alyssa Samolyk-Mazzanti²; Shyam Visweswaran¹,³; Yanshan Wang¹,²,³

Cited by (143)

Journals

Fang Y, Ryan P, Weng C. Knowledge-guided generative artificial intelligence for automated taxonomy learning from drug labels. Journal of the American Medical Informatics Association 2024;31(9):2065
Nwachukwu B, Varady N, Allen A, Dines J, Altchek D, Williams R, Kunze K. Currently Available Large Language Models Do Not Provide Musculoskeletal Treatment Recommendations That Are Concordant With Evidence‐Based Clinical Practice Guidelines. Arthroscopy 2025;41(2):263
Shahriar S, Lund B, Mannuru N, Arshad M, Hayawi K, Bevara R, Mannuru A, Batool L. Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency. Applied Sciences 2024;14(17):7782
Zaghir J, Naguib M, Bjelogrlic M, Névéol A, Tannier X, Lovis C. Prompt Engineering Paradigms for Medical Applications: Scoping Review. Journal of Medical Internet Research 2024;26:e60501
Tong L, Zhang C, Liu R, Yang J, Sun Z. Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis. Journal of Orthopaedic Surgery and Research 2024;19(1)
Tam T, Sivarajkumar S, Kapoor S, Stolyar A, Polanska K, McCarthy K, Osterhoudt H, Wu X, Visweswaran S, Fu S, Mathur P, Cacciamani G, Sun C, Peng Y, Wang Y. A framework for human evaluation of large language models in healthcare derived from literature review. npj Digital Medicine 2024;7(1)
Ronquillo J, Ye J, Gorman D, Lemeshow A, Watt S. Practical Aspects of Using Large Language Models to Screen Abstracts for Cardiovascular Drug Development: Cross-Sectional Study. JMIR Medical Informatics 2024;12:e64143
Workman T, Ahmed A, Sheriff H, Raman V, Zhang S, Shao Y, Faselis C, Fonarow G, Zeng-Treitler Q. ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records. Progress in Cardiovascular Diseases 2024;87:44
Das M, Senapati A. Co-reference Resolution in Prompt Engineering. Procedia Computer Science 2024;244:194
Othman A, Chemnad K, Tlili A, Da T, Wang H, Huang R. Comparative analysis of GPT-4, Gemini, and Ernie as gloss sign language translators in special education. Discover Global Society 2024;2(1)
Acut D, Malabago N, Malicoban E, Galamiton N, Garcia M. “ChatGPT 4.0 Ghosted Us While Conducting Literature Search:” Modeling the Chatbot’s Generated Non-Existent References Using Regression Analysis. Internet Reference Services Quarterly 2025;29(1):27
Cardamone N, Olfson M, Schmutte T, Ungar L, Liu T, Cullen S, Williams N, Marcus S. Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study. JMIR Medical Informatics 2025;13:e65454
Tarris G, Martin L. Performance assessment of ChatGPT 4, ChatGPT 3.5, Gemini Advanced Pro 1.5 and Bard 2.0 to problem solving in pathology in French language. DIGITAL HEALTH 2025;11
Kuerbanjiang W, Peng S, Jiamaliding Y, Yi Y. Performance Evaluation of Large Language Models in Cervical Cancer Management Based on a Standardized Questionnaire: Comparative Study. Journal of Medical Internet Research 2025;27:e63626
Geevarghese R, Solomon S, Alexander E, Marinelli B, Chatterjee S, Jain P, Cadley J, Hollingsworth A, Chatterjee A, Ziv E. Utility of a Large Language Model for Extraction of Clinical Findings from Healthcare Data following Lung Ablation: A Feasibility Study. Journal of Vascular and Interventional Radiology 2025;36(4):704
Kim S, Schramm S, Adams L, Braren R, Bressem K, Keicher M, Platzek P, Paprottka K, Zimmer C, Hedderich D, Wiestler B. Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports. npj Digital Medicine 2025;8(1)
Fung M, Tang E, Wu T, Luk Y, Au I, Liu X, Lee V, Wong C, Wei Z, Cheng W, Tai I, Ho J, Wong J, Lang B, Leung K, Wong Z, Wu J, Wong C. Developing a named entity framework for thyroid cancer staging and risk level classification using large language models. npj Digital Medicine 2025;8(1)
Valadez-de la Paz N, Vazquez-Lopez J, Hernandez-Lopez A, Aviles-Viñas J, Navarro-Gonzalez J, Reyes-Acosta A, Lopez-Juarez I. Automation Applied to the Collection and Generation of Scientific Literature. Publications 2025;13(1):11
Burstein R, Mafuta E, Proctor J. Large language models for analyzing open text in global health surveys: why children are not accessing vaccine services in the Democratic Republic of the Congo. International Health 2025;17(5):843
Talay L, Lagesen L, Yip A, Vickers M, Ahuja N. ChatGPT-4o and 4o1 Preview as Dietary Support Tools in a Real-World Medicated Obesity Program: A Prospective Comparative Analysis. Healthcare 2025;13(6):647
Cao Y, Hu L, Cao X, Peng J. Can large language models facilitate the effective implementation of nursing processes in clinical settings?. BMC Nursing 2025;24(1)
Lauderdale S, Schmitt R, Wuckovich B, Dalal N, Desai H, Tomlinson S. Effectiveness of generative AI-large language models’ recognition of veteran suicide risk: a comparison with human mental health providers using a risk stratification model. Frontiers in Psychiatry 2025;16
Güvel M, Kıyak Y, Varan H, Sezenöz B, Coşkun Ö, Uluoğlu C. Generative AI vs. human expertise: a comparative analysis of case-based rational pharmacotherapy question generation. European Journal of Clinical Pharmacology 2025;81(6):875
Lauderdale S, Griffin S, Lahman K, Mbaba E, Tomlinson S. Unveiling Public Stigma for Borderline Personality Disorder: A Comparative Study of Artificial Intelligence and Mental Health Care Providers. Personality and Mental Health 2025;19(2)
Shen M, Shen Y, Liu F, Jin J. Prompts, privacy, and personalized learning: integrating AI into nursing education—a qualitative study. BMC Nursing 2025;24(1)
Sumner J, Wang Y, Tan S, Chew E, Wenjun Yip A. Perspectives and Experiences With Large Language Models in Health Care: Survey Study. Journal of Medical Internet Research 2025;27:e67383
Hickman C, Pridgen K, Hughes D, Pair L, Holland A. The Role of Artificial Intelligence in Increasing Efficiency, Reducing Errors, and Improving Patient Outcomes in Clinical Practice. Clinical Journal for Nurse Practitioners in Women's Health 2025;2(2):101
Elabd N, Rahman Z, Abu Alinnin S, Jahan S, Campos L, Baltatu O. Designing Personalized Multimodal Mnemonics With AI: A Medical Student’s Implementation Tutorial. JMIR Medical Education 2025;11:e67926
Hein D, Christie A, Holcomb M, Xie B, Jain A, Vento J, Rakheja N, Shakur A, Christley S, Cowell L, Brugarolas J, Jamieson A, Kapur P. Iterative refinement and goal articulation to optimize large language models for clinical information extraction. npj Digital Medicine 2025;8(1)
Radi M, Omar N, Kaur W. Syntactic-Guided Chain of Thought for Iterative Implicit and Explicit Target Detection in Aspect-Based Sentiment Analysis. IEEE Access 2025;13:84738
Thota D, Alt D, Cole J, Tring V. Prompting Pro Tips! Best Practices for Generating Clinical Narrative Summaries. Military Medicine 2026;191(1-2):e445
Miller K, Bedrick S, Lu Q, Wen A, Hersh W, Roberts K, Liu H. Dynamic few-shot prompting for clinical note section classification using lightweight, open-source large language models. Journal of the American Medical Informatics Association 2025;32(7):1164
Fleurence R, Wang X, Bian J, Higashi M, Ayer T, Xu H, Dawoud D, Chhatwal J. A Taxonomy of Generative Artificial Intelligence in Health Economics and Outcomes Research: An ISPOR Working Group Report. Value in Health 2025;28(11):1601
Boie S, Glastetter E, Lux M, Balzer F, von Kalle C, Lenz C, Müller U. Evaluating a Chatbot as a Companion for Patients With Breast Cancer: Collaborative Pilot Study. JMIR Cancer 2025;11:e68426
Hwang M, Lee K, Lee H. A word to the wise: Crafting impactful prompts for ChatGPT. System 2025;133:103756
Hassanein F, El Barbary A, Hussein R, Ahmed Y, El‐Guindy J, Sarhan S, Abou‐Bakr A. Diagnostic Performance of ChatGPT‐4o and DeepSeek‐3 Differential Diagnosis of Complex Oral Lesions: A Multimodal Imaging and Case Difficulty Analysis. Oral Diseases 2025;31(12):3361
Chen H, Alfred M, Cohen E. Efficient Detection of Stigmatizing Language in Electronic Health Records via In-Context Learning: Comparative Analysis and Validation Study. JMIR Medical Informatics 2025;13:e68955
Pulari S, Umadevi M, Vasudevan S. Optimizing multimodal personalized disease prediction accuracy using generated prompts and large language models. Image and Vision Computing 2025;161:105649
Bartels S, Carus J. From text to data: Open-source large language models in extracting cancer related medical attributes from German pathology reports. International Journal of Medical Informatics 2025;203:106022
Kantor J. Generative Artificial Intelligence in Dermatology. Dermatologic Clinics 2025;43(4):603
Garcia-Carmona A, Prieto M, Puertas E, Beunza J. Leveraging Large Language Models for Accurate Retrieval of Patient Information From Medical Reports: Systematic Evaluation Study. JMIR AI 2025;4:e68776
Liu J, Liu F, Wang C, Liu S. Prompt Engineering in Clinical Practice: Tutorial for Clinicians. Journal of Medical Internet Research 2025;27:e72644
Yao M, Chae A, Saraiya P, Kahn C, Witschey W, Gee J, Sagreiya H, Bastani O. Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines. Communications Medicine 2025;5(1)
Qian Y. Prompt Engineering in Education: A Systematic Review of Approaches and Educational Applications. Journal of Educational Computing Research 2025;63(7-8):1782
Bahng J. The Potential and Applications of Artificial Intelligence in the Field of Audiology. Audiology and Speech Research 2025;21(3):209
Bandeira A, Gonçalves L, Holl F, Shaibu J, Gonçalves M, Payinda R, Paudel S, Berionni A, Purnat T, Mackey T. Viewpoint on the Intersection Among Health Information, Misinformation, and Generative AI Technologies. JMIR Infodemiology 2025;5:e69474
Çakar M, Avcı A, Düzgün S, Aslan T, Hekimoğlu K. Assessment of the Accuracy of Modern Artificial Intelligence Chatbots in Responding to Endodontic Queries. Australian Endodontic Journal 2025;51(3):732
Vieira-Vieira C, Kulkarni S, Zalewski A, Löffler J, Münch J, Kreuchwig A. From data silos to insights: the PRINCE multi-agent knowledge engine for preclinical drug development. Frontiers in Artificial Intelligence 2025;8
Wang H, Bai X, Cui X, Chen G, Fan G, Wei G, Zheng Y, Wu J, Gao S. Symptom Recognition in Medical Conversations Via multi- Instance Learning and Prompt. Journal of Medical Systems 2025;49(1)
Li K, Nguyen T, Moss H. Performance of vision language models for optic disc swelling identification on fundus photographs. Frontiers in Digital Health 2025;7
Feyijimi T, Aliu J, Oke A, Aghimien D. ChatGPT’s Expanding Horizons and Transformative Impact Across Domains: A Critical Review of Capabilities, Challenges, and Future Directions. Computers 2025;14(9):366
Emilova Doneva S, de Viragh S, Hubarava H, Schandelmaier S, Briel M, Ineichen B. StudyTypeTeller—Large language models to automatically classify research study types for systematic reviews. Research Synthesis Methods 2025;16(6):1005
Raghavendran A, Musunuri B, Rajpurohit S, C. G, Shetty S, Kumari P, Shetty R, Shetty A, Bhat G. Evaluation of artificial intelligence-based patient education models for irritable bowel syndrome. Indian Journal of Gastroenterology 2025
Dilbaz O, Ozates M, Bolat B, Gunduz-Demir C, Kulac I. Systematic comparison of GPT models for the analysis of pathology reports in a low-resource language: A case study for Turkish. American Journal of Clinical Pathology 2025;164(5):721
Alter I, Chan K, Andreadis K, Rameau A. Generative Artificial Intelligence Methodology Reporting in Otolaryngology: A Scoping Review. The Laryngoscope 2026;136(3):1109
Le A, Shvekher T, Nguyen L, Krylov S. A Conversational Large‐Language‐Model Tutor that Accelerates Machine‐Learning Method Development in Routine Bioanalytical Workflows. ChemBioChem 2025;26(21)
Jin R, Zhao M, Niu C, Xia Y, Zhou H, Liu N. Evaluating the performance of ChatGPT and Claude in automated writing scoring: Insights from the Many-facet Rasch model. Education and Information Technologies 2025;30(18):25881
Haupt F, Rödig T, Liersch P. Evaluating ChatGPT-4o as an Educational Support Tool for the Emergency Management of Dental Trauma: Randomized Controlled Study Among Students. JMIR Medical Education 2025;11:e80576
Daulat S, Dholaria N, Burnet G, Patil S, Manne B, Choudhary A, Mitha R, Zeeshan Q, Hamilton D, Agarwal N. Prompt Engineering and Follow-Up Questioning Improves the Readability of Spine Surgery Questions in Large Language Models. World Neurosurgery 2025;203:124423
Ardila C, Pineda-Vélez E, Vivares-Builes A. Artificial Intelligence in Endodontic Education: A Systematic Review with Frequentist and Bayesian Meta-Analysis of Student-Based Evidence. Dentistry Journal 2025;13(11):489
Ordak M, Adamczyk J, Oskroba A, Majewski M, Nasierowski T. Evaluation of the Accuracy and Reliability of Responses Generated by Artificial Intelligence Related to Clinical Pharmacology. Journal of Clinical Medicine 2025;14(21):7563
Duque A, Araujo L, Martinez-Romo J, Esteban-Vasallo M, Domínguez-Berjón M, Malillos Perez D. An integrated approach for rare disease detection and classification in Spanish pediatric medical reports. Scientific Reports 2025;15(1)
Asif S, Hadi F, Qurrat-ul-ain, Yan Y, Wang V, Xu D. The impact of large language models on medical research and patient care: A systematic review of current trends, challenges, and future innovations. Computer Science Review 2026;59:100847
Di Maio F, Gozzi M. Degradation of Multi-Task Prompting Across Six NLP Tasks and LLM Families. Electronics 2025;14(21):4349
Vasilev Y, Vladzymyrskyy A, Omelyanskaya O, Alymova Y, Akhmedzyanova D, Shumskaya Y, Kodenko M, Blokhin I, Reshetnikov R. Development and Validation of a Questionnaire to Evaluate AI-Generated Summaries for Radiologists: ELEGANCE (Expert-Led Evaluation of Generative AI Competence and ExcelleNCE). AI 2025;6(11):287
Shawi R, Jamel L. Leveraging ChatGPT and explainable AI for enhancing clinical decision support. Scientific Reports 2025;15(1)
Albosaif W, Aljughaiman A, Alsayed A. The role of diversity in enrichment programs in shaping the career paths of gifted individuals: An analysis of influential factors and emerging trends. International Journal of ADVANCED AND APPLIED SCIENCES 2025;12(11):82
Koo S, Choi K. Unstructured Medical Data Entry System using Gaussian Probabilities and Large Language Models. The Journal of Korean Institute of Information Technology 2025;23(10):23
Duan Z, Huang X, Lu R, Xu W, Liu H, Geng Y, Takahashi N, Wu Y, Wang Q, Song Y, Xu H, Tang H, Lan F, Eils R, Tan L. Multi-center benchmarking of large language models for clinical decision support in lung cancer screening. Cell Reports Medicine 2025;6(12):102465
Li S, Fang X, Jin Y, Deng Y, Hu W, Wu B, Zhou X, Wang G, Li K, Yue Q. Improving diagnostic accuracy in preoperative glioma classification: performance of knowledge-enhanced large language models compared with radiologists. npj Precision Oncology 2025;9(1)
Moura Junior V, Hadar P, Murphy S, Moura L. AI Prompt Engineering for Neurologists and Trainees. Seminars in Neurology 2026;46(01):016
Abumelha M, AL-Ghamdi A, Fayoumi A, Ragab M. Medical Feature Extraction From Clinical Examination Notes: Development and Evaluation of a Two-Phase Large Language Model Framework. JMIR Medical Informatics 2025;13:e78432
Ding X, Wang J. The analysis of bidirectional long short-term memory network model for construction of cultural gene map and information extraction. Scientific Reports 2025;16(1)
Singh P, Shqair L, Naphade O, Sanchez K, Namiri N, Sharma S, Mohamed K, Yu A, Kaur R, Alasadi Y, Hoang T, Walsh A. Accuracy of artificial intelligence in carpal tunnel syndrome management: A comparative analysis of ChatGPT-4o and Gemini 1.5 Pro. Hand Surgery and Rehabilitation 2026;45(1):102560
Lisboa R, Braido A, de-Jesus-Soares A, Tewari N, Soares C, Paranhos L, Vieira W. Performance of five free large language models in dental trauma: a 30-day longitudinal benchmark study. Frontiers in Oral Health 2025;6
Nordmann K, Fischer F. Harnessing ChatGPT for abstract screening in health-related scoping reviews: the role of structured eligibility criteria. BMC Health Services Research 2025;26(1)
Dietrich N, McShannon D, Rzepka M, Al-Jumeily OBE D. Evaluating few-shot prompting for spectrogram-based lung sound classification using a multimodal language model. PLOS Digital Health 2026;5(1):e0001179
Chen Z, Gao Y, Gu S, Yong Z, Lei L, Wang R, Lei Q, Zeng S. Enhancing personalized anesthesia plans in cardiac surgery with AI: ChatGPT's advantages and the imperative for clinical oversight. Anesthesiology and Perioperative Science 2026;4(1)
Schmidt L, Ibing S, Borchert F, Hugo J, Marshall A, Peraza J, Cho J, Böttinger E, Renard B, Ungaro R. Automating clinical phenotyping using natural language processing. Communications Medicine 2026;6(1)
Zhu J, Li J, Zhao S, Deng Y, Miao Y, Xu J. Adapting LLMs for biomedical natural language processing: a comprehensive benchmark study on fine-tuning methods. The Journal of Supercomputing 2026;82(2)
Del Campo A, Lituiev D, Varma G, Manoharan M, Kumar Ravi S, Aman A, Kansagra A, Greshock J, Venkatakrishnan A, Batavia A. Automated abstraction of clinical parameters of multiple myeloma from real-world clinical notes using large language models. BMC Medical Informatics and Decision Making 2026;26(1)
Wang Q, Shi F, Wang M, Geng X, Zhao M. Boundary-aware and multi-angle modeling-based object tracking in polarimetric images. Knowledge-Based Systems 2026;338:115442
Liu W, Wan J, Lv N, Zhou X. Bi-term association based on fuzzy logic. Array 2026;29:100704
Ding D, Xi W, Ding Z, Gao J. Deep Reinforcement Learning-Driven Adaptive Prompting for Robust Medical LLM Evaluation. Applied Sciences 2026;16(3):1514
Geukes Foppen R, Morkūnas M, Traverso A, Dienstmann R. AI assistance in tumor multidisciplinary teams. ESMO Real World Data and Digital Oncology 2026;11:100684
Oami T, Okada Y, Nakada T. Automated systematic reviews using machine learning and large language models in clinical practice guideline development: A perspective. Hong Kong Journal of Emergency Medicine 2026;33(1)
van der Loo W, van der Valk V, van den Broek T, Atsma D, Staring M, Scherptong R. Large language models for structured cardiovascular data extraction: a foundation for scalable research and clinical applications. European Heart Journal - Digital Health 2026;7(2)
Joseph E, Vallee P, Perennec T, Wagneur N, Frenel J, Campone M, Bocquet F, Le Borgne F. Development and Assessment of a Pipeline for Extracting Structured Data From Free-Text Medical Reports Using a Large Language Model. JCO Clinical Cancer Informatics 2026;10(1)
Akkhawatthanakun K, Narupiyakul L, Wongpatikaseree K, Hnoohom N, Termritthikun C, Muneesawang P. Integrating Agentic Artificial Intelligence to Automate International Classification of Diseases, Tenth Revision, Medical Coding. Informatics 2026;13(3):39
Chaudhary J, Sharma P, Kumar S, Takalkar N, Kumar R, Haresh K, Das C, Sahoo R, Kaushal S, Kalaivani M, Khairwa H, Pandey A. Evaluating Locally Deployed Large Language Models for Ga-68 PSMA PET/CT Report Mining in Prostate Cancer. Nuclear Medicine and Molecular Imaging 2026
Low C, Wang Z, Zhang T, Zhuo Z, Zeng Z, Mazomenos E, Jin Y. SurgRAW: Multi-Agent Workflow With Chain of Thought Reasoning for Robotic Surgical Video Analysis. IEEE Robotics and Automation Letters 2026;11(4):4857
Brown C, Spillias S. Prompting large language models for quality ecological statistics. Methods in Ecology and Evolution 2026;17(4):1012
Yadav T, Tekale A, Chong J, Masum M. Understanding Tradeoffs in Clinical Text Extraction: Prompting, Retrieval-Augmented Generation, and Supervised Learning on Electronic Health Records. Algorithms 2026;19(3):215
Yang Y, Chang C, Lin Y, Cheng H, Huang S, Lin H, Lin C, Lan Y, Wang H, Chang S, Yang S, Chen W, Jiang J. Automated Flow and local LLM-Driven clinical Context Engineering: Precision colorectal cancer recurrence registry. International Journal of Medical Informatics 2026;213:106383
YENİKAYA M, ODABAŞOĞLU M. Black-Box Büyük Dil Modellerinde Çıktı Kalitesini Artırmak için Pre-Informing Yaklaşımı ve Mevcut İstem Tekniklerinin Sentezi. Üçüncü Sektör Sosyal Ekonomi Dergisi 2026;61(1):1243
Yang L, Mulford K, Girod-Hoffman M, Khela M, Khosravi A, Crossman D, Kanabar A, Saniei S, Ulrich M, Taunton M, Wyles C. Comparison of Large Language Models with Rules-Based Natural Language Processing Algorithms for Extracting Data from Operative Notes. Journal of Bone and Joint Surgery 2026
Russ P, Bedenbender S, Einloft J, Meyer H, Wenzel L, Ganser A, Hirsch M, Grgic I. Potential of large language models for rapid clinical information support: evidence from acute kidney injury knowledge testing. Scientific Reports 2026;16(1)
Kaya O, Huri G, Özbek E, Gönder N, Demir İ, Dalkır K. Large language models in sports injury care: a comparative expert evaluation of GPT-4o and GPT-5. BMC Sports Science, Medicine and Rehabilitation 2026;18(1)
Güler I, Grieb G, Kraus A, Stelling H. Artificial Intelligence in Plastic Surgery Education: A Global Multimodel Benchmark of Large Language Models on the Plastic Surgery In-Service Training Examination. Aesthetic Surgery Journal Open Forum 2026;8
Hassanein F, Ibrahim S, Tomo S, Alsahhaf A, Abou-Bakr A. Prompt engineering shapes diagnostic accuracy and explanation quality of LLM in oral lesion diagnosis: a prospective, expert-blinded benchmark study. Odontology 2026
Yeh Y, Yang H, Chiu C, Chao A, Chuang Y, Chan W. Enhancing large language model clinical support information with machine learning risk and explainability: a feasibility study. Intensive Care Medicine Experimental 2026;14(1)
Hu S, Keeley T, Halvorson R, Campbell S, Lefaivre K, Levack A, Lundy D, Meinberg E, Schweser K, Shymon S, Marmor M. Deriving the OTA/AO fracture classification from routinely collected radiology reports using a large language model. OTA International 2026;9(2)
Harada Y. Prompt-Induced Output Variability and Structured-Output Integrity in Local Open Large Language Models: A Multi-model In Silico Benchmark Using Synthetic Acute-Care Scenarios. Cureus 2026
Tian J, Lou Q, Wang X, Xu H, Mei H, Yu Y. Large Language Models in Colorectal Cancer Care and Clinical Decision Support: Systematic Review. Journal of Medical Internet Research 2026;28:e89862
Chen J, Wang F. Evaluating retrieval-augmented generation for guideline-grounded textual planning in implant dentistry: A comparative study. Journal of Dentistry 2026;172:106750
Huang R, Cecil J, Freedman M, Chattopadhyay S. Quantifying Factors that Drive Trust and Satisfaction with AI Health Chatbots: A Mixed-Methods Vignette Survey of Caregivers for Pediatric Infectious Diseases (Preprint). Journal of Medical Internet Research 2025
Fink M, Bischoff A, Atsiatorme E, Kremer A, Kroschke J, Moll M, Stein P, Riebl V, Leichenich T, Kauczor H, Schlamp K. Who labels best? Radiologists, rules, or large language models for CT reports on pulmonary embolism. European Radiology Experimental 2026;10(1)
Ong A, Merle D, Shah N, Tham Y, Wong T, Keane P. Co-intelligence: a proposal for human–artificial intelligence collaboration for large language models in medical research. The Lancet Digital Health 2026;8(6):100982
Jin C, Kim J, Im J. Can we use generative AI for tourism research? A guide to applying and validating large language models for text analytics. Tourism Management 2026;117:105468
Güler I, Grieb G, Kraus A, Moog P, Cambaz U, Yavasca E, Stelling H. Artificial Intelligence in Medical Assessment: Reliability and Performance of Multimodal Large Language Models in a High-Stakes Licensing Examination. Behavioral Sciences 2026;16(5):822
Zhou F, Saha A, Afzal M, Parrish R, Haynes R, Iorio A, Lokker C. Understanding Transformer-Based Classifications of Medical Text Using a Large Language Model for the Attribution of Feature Importance: Proof-of-Concept Algorithm Development and Validation Study. JMIR Medical Informatics 2026;14:e81644
Li J, Xie Y, Li B, He R, Meng W, Gong J, Fan Y, Li L, Li B. LLMs as emerging tools for understanding and managing bone metastasis and cancer-induced bone pain. iScience 2026;29(6):116158
Balaji A, Fox B, Seger P, Gorugantu A, Nordin A, Fabiano T, Yang G, Himidan S, Schwaitzberg S, Kim P. Improving Trauma Triage Accuracy with Large Language Models: A Comparison to Human Expert Decisions. Journal of the American College of Surgeons 2026;243(1):153
Lu Z, Cao H, Ma C, Zheng J, Ma X. Mapping the Reliability–Readability Gap in AMD Patient Education Across Six Large Language Models (Preprint). JMIR Medical Informatics 2026
Sato T, Kawano S, Yoshino K. Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs. Journal of Natural Language Processing 2026;33(2):880
Cases M, Imre A, Giles R, Puga L, Piggin M, Geissler J, Racovita M, Leto di Priolo S, Wogu L, Hyseni-Bocolli A, Morgan K, Hosszú D, Pitter J, Ágh T, Plate A, Józwiák-Hagymássy J, Bögös A, Józwiák Á. Variation in use of PROMs and geographical distribution in clinical trials of selected hematological diseases in Europe. Blood Global Hematology 2026;2(3):100107
Imaezue G, Maram K, Ajayi D, Alohali I, Butta R. Preclinical Dialogue Simulation: Evaluating Response Accessibility in Conversational Artificial Intelligence for Aphasia Therapy. Journal of Speech, Language, and Hearing Research 2026:1
Piccaro N, Brown M, Meyer C, Cowen L. CistromeMeta: a large language model powered tool for automated ChIP-seq metadata extraction. Bioinformatics 2026;42(6)
Pan H, Liu J, Liu S. Enhancing Physician Resilience to Generative AI: Multilevel Framework for Shared Authority, Verification, and Skill Preservation. Journal of Medical Internet Research 2026;28:e88058
Dong Y, Shu R, Qiu X, Huang J, Yang G. A systematic comparison of ChatGPT and DeepSeek for guideline-based question answering in obstetric anesthesia. Scientific Reports 2026;16(1)
Chen S, Maddali M, Langlotz C, Bluethgen C, Chen J, Raj R. Evaluation of Large Language Models for Structured Data Extraction From Interstitial Lung Disease Clinical Notes: Comparative Study. Journal of Medical Internet Research 2026;28:e90547
Yang H, Niu Z, Li M, Zhou H, Xiao Y, Zhou S, Zhan Z, Liu Y, Liu S, Tignanelli C, Melton G, Zhang R. Benchmarking information extraction of physical activity from electronic health record with large language models: an natural language processing pipeline and comparative evaluation. Journal of the American Medical Informatics Association 2026
Cao W, Lloyd I, Dichmann M, Halder N, Thomas D, Chen Z, Blomain E, Adejolu F, Beck K, Faherty P, Khan M, Simone N, Jain V, Storozynsky E, Dicker A, Choi W, Vinogradskiy Y. Cross-Institutional Validation of a novel LLM-Based Cardiac Event Extraction framework from Electronic Health Records. International Journal of Radiation Oncology*Biology*Physics 2026
Muhammad I, Rospocher M, Knez T, Žitnik S. Benchmarking large language models for target-based financial sentiment and stock return. International Journal of Data Science and Analytics 2026;22(1)
Verma R, Bains S, Reddy Muthani S, Arunachalam A, Mohan V, Gold J. Feasibility of Tailoring Artificial Intelligence–Assisted Ambient Scribes for Intensive Care Unit Rounds: Algorithm Development and Validation. JMIR Medical Informatics 2026;14:e85015

Books/Policy Documents

Miller S, Busby-Earle C. Proceedings of the Future Technologies Conference (FTC) 2024, Volume 4.
Akbar N, Lenzitti B, Tegolo D. AIxIA 2024 – Advances in Artificial Intelligence.
Chung Y, Tung C, Chang Y. Advances and Trends in Artificial Intelligence. Theory and Applications.
Fawareh H, Alanazi S. Artificial Intelligence for Sustainable Innovation Management and Risk Management.
Srinivasa Rao M, Anitha G. Proceedings of Fifth International Conference on Computing and Communication Networks.
Brown K, Chatzipanagiotou N, Elf A, Henriksson E. Human-Computer Interaction.

Conference Proceedings

García-Barragán Á, Calatayud A, Prieto-Santamaría L, Robles V, Menasalvas E, Rodríguez A. 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS). Step-forward structuring disease phenotypic entities with LLMs for disease understanding
Teng S, Zhang T, D'Alfonso S, Kostakos V. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing. Predicting Affective States from Screen Text Sentiment
Maceda L. 2024 International Conference on Computer and Applications (ICCA). Enhanced Sentiment Classification in Code-Mixed Texts Using Hybrid Embeddings and Synthetic Data Generation
Weerathunge T, Jayalal S, Wijayasiriwardhane K. 2025 5th International Conference on Advanced Research in Computing (ICARC). Optimizing Response Consistency of Large Language Models in Medical Education through Prompt Engineering
Arabzadeh N, Bagheri E. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. VAP3: Variation-Aware Prompt Performance Prediction
Mamud A, Kim J. 2025 IEEE/ACIS 29th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Evaluating One-Shot and Multi-Shot Prompting Strategies in a Transparent Educational Chatbot
Bakagianni J, Dalakleidi K, Stamatis K, Pavlopoulos J. 2025 IEEE 25th International Conference on Bioinformatics and Bioengineering (BIBE). DiaShift: An Explainable System for Temporal Diagnostic Shift Detection in Clinical Notes
Yuyun, Hazriani, Nurfaedah, Gusnawaty, Pammuda, Zahrani, Muis A, Khairat U, Mubarak A, Taha M, Arif S. 2025 International Conference on Computer, Control, Informatics and its Applications (IC3INA). The Anoa-L01 Benchmark: Prompt-Based Zero-Shot Evaluation for Sulawesi’s Regional Languages Detection in LLMs
Vayadande K, Shinde R, Bende S, Sathe H, Walunj S, Jha S. 2025 IEEE 5th International Conference on ICT in Business Industry & Government (ICTBIG). Specialized Large Language Models for Hindi Medical Natural Language Processing: A Clinical Entity in a Multi-Modal Framework Recognition and Semantic Understanding
Ljubisavljević D, Bačić M, Savić D, Vlajić S. 2026 30th International Conference on Information Technology (IT). Evaluation of the Models for Abstract Generation
Katranas I, Dokas I, Kafoutis G. 2026 New Trends in Civil Aviation (NTCA). Multi-Agent LLM Classification of STPA Context Tables
Garces K, Fernandez-Nieto G, Zhao L, Samaraweera S, Gasevic D, Martinez-Maldonado R, Echeverria V. Proceedings of the Thirteenth ACM Conference on Learning @ Scale. Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact

Citation

Please cite as:

Sivarajkumar S, Kelley M, Samolyk-Mazzanti A, Visweswaran S, Wang Y
An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study
JMIR Med Inform 2024;12:e55318
doi: 10.2196/55318 PMID: 38587879 PMCID: 11036183

This paper is in the following e-collections/theme issues:

Natural Language Processing; Clinical Informatics; Clinical Information and Decision Making; Tools, Programs and Algorithms; Ontologies, Classifications, and Coding; Machine Learning; Chatbots and Conversational Agents; Artificial Intelligence; Generative Language Models Including ChatGPT