Published on in Vol 12 (2024)

This is a member publication of University of Pittsburgh

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/55318, first published .
An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study

Journals

  1. Fang Y, Ryan P, Weng C. Knowledge-guided generative artificial intelligence for automated taxonomy learning from drug labels. Journal of the American Medical Informatics Association 2024;31(9):2065
  2. Nwachukwu B, Varady N, Allen A, Dines J, Altchek D, Williams R, Kunze K. Currently Available Large Language Models Do Not Provide Musculoskeletal Treatment Recommendations That Are Concordant With Evidence-Based Clinical Practice Guidelines. Arthroscopy: The Journal of Arthroscopic & Related Surgery 2025;41(2):263
  3. Shahriar S, Lund B, Mannuru N, Arshad M, Hayawi K, Bevara R, Mannuru A, Batool L. Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency. Applied Sciences 2024;14(17):7782
  4. Zaghir J, Naguib M, Bjelogrlic M, Névéol A, Tannier X, Lovis C. Prompt Engineering Paradigms for Medical Applications: Scoping Review. Journal of Medical Internet Research 2024;26:e60501
  5. Tong L, Zhang C, Liu R, Yang J, Sun Z. Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis. Journal of Orthopaedic Surgery and Research 2024;19(1)
  6. Tam T, Sivarajkumar S, Kapoor S, Stolyar A, Polanska K, McCarthy K, Osterhoudt H, Wu X, Visweswaran S, Fu S, Mathur P, Cacciamani G, Sun C, Peng Y, Wang Y. A framework for human evaluation of large language models in healthcare derived from literature review. npj Digital Medicine 2024;7(1)
  7. Ronquillo J, Ye J, Gorman D, Lemeshow A, Watt S. Practical Aspects of Using Large Language Models to Screen Abstracts for Cardiovascular Drug Development: Cross-Sectional Study. JMIR Medical Informatics 2024;12:e64143
  8. Workman T, Ahmed A, Sheriff H, Raman V, Zhang S, Shao Y, Faselis C, Fonarow G, Zeng-Treitler Q. ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records. Progress in Cardiovascular Diseases 2024;87:44
  9. Das M, Senapati A. Co-reference Resolution in Prompt Engineering. Procedia Computer Science 2024;244:194
  10. Othman A, Chemnad K, Tlili A, Da T, Wang H, Huang R. Comparative analysis of GPT-4, Gemini, and Ernie as gloss sign language translators in special education. Discover Global Society 2024;2(1)
  11. Acut D, Malabago N, Malicoban E, Galamiton N, Garcia M. “ChatGPT 4.0 Ghosted Us While Conducting Literature Search:” Modeling the Chatbot’s Generated Non-Existent References Using Regression Analysis. Internet Reference Services Quarterly 2025;29(1):27
  12. Cardamone N, Olfson M, Schmutte T, Ungar L, Liu T, Cullen S, Williams N, Marcus S. Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study. JMIR Medical Informatics 2025;13:e65454
  13. Tarris G, Martin L. Performance assessment of ChatGPT 4, ChatGPT 3.5, Gemini Advanced Pro 1.5 and Bard 2.0 to problem solving in pathology in French language. DIGITAL HEALTH 2025;11
  14. Kuerbanjiang W, Peng S, Jiamaliding Y, Yi Y. Performance Evaluation of Large Language Models in Cervical Cancer Management Based on a Standardized Questionnaire: Comparative Study. Journal of Medical Internet Research 2025;27:e63626
  15. Geevarghese R, Solomon S, Alexander E, Marinelli B, Chatterjee S, Jain P, Cadley J, Hollingsworth A, Chatterjee A, Ziv E. Utility of a Large Language Model for Extraction of Clinical Findings from Healthcare Data following Lung Ablation: A Feasibility Study. Journal of Vascular and Interventional Radiology 2025;36(4):704
  16. Kim S, Schramm S, Adams L, Braren R, Bressem K, Keicher M, Platzek P, Paprottka K, Zimmer C, Hedderich D, Wiestler B. Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports. npj Digital Medicine 2025;8(1)
  17. Fung M, Tang E, Wu T, Luk Y, Au I, Liu X, Lee V, Wong C, Wei Z, Cheng W, Tai I, Ho J, Wong J, Lang B, Leung K, Wong Z, Wu J, Wong C. Developing a named entity framework for thyroid cancer staging and risk level classification using large language models. npj Digital Medicine 2025;8(1)
  18. Valadez-de la Paz N, Vazquez-Lopez J, Hernandez-Lopez A, Aviles-Viñas J, Navarro-Gonzalez J, Reyes-Acosta A, Lopez-Juarez I. Automation Applied to the Collection and Generation of Scientific Literature. Publications 2025;13(1):11
  19. Burstein R, Mafuta E, Proctor J. Large language models for analyzing open text in global health surveys: why children are not accessing vaccine services in the Democratic Republic of the Congo. International Health 2025
  20. Talay L, Lagesen L, Yip A, Vickers M, Ahuja N. ChatGPT-4o and 4o1 Preview as Dietary Support Tools in a Real-World Medicated Obesity Program: A Prospective Comparative Analysis. Healthcare 2025;13(6):647
  21. Cao Y, Hu L, Cao X, Peng J. Can large language models facilitate the effective implementation of nursing processes in clinical settings? BMC Nursing 2025;24(1)
  22. Lauderdale S, Schmitt R, Wuckovich B, Dalal N, Desai H, Tomlinson S. Effectiveness of generative AI-large language models’ recognition of veteran suicide risk: a comparison with human mental health providers using a risk stratification model. Frontiers in Psychiatry 2025;16
  23. Güvel M, Kıyak Y, Varan H, Sezenöz B, Coşkun Ö, Uluoğlu C. Generative AI vs. human expertise: a comparative analysis of case-based rational pharmacotherapy question generation. European Journal of Clinical Pharmacology 2025;81(6):875
  24. Lauderdale S, Griffin S, Lahman K, Mbaba E, Tomlinson S. Unveiling Public Stigma for Borderline Personality Disorder: A Comparative Study of Artificial Intelligence and Mental Health Care Providers. Personality and Mental Health 2025;19(2)
  25. Shen M, Shen Y, Liu F, Jin J. Prompts, privacy, and personalized learning: integrating AI into nursing education—a qualitative study. BMC Nursing 2025;24(1)
  26. Sumner J, Wang Y, Tan S, Chew E, Wenjun Yip A. Perspectives and Experiences With Large Language Models in Health Care: Survey Study. Journal of Medical Internet Research 2025;27:e67383
  27. Hickman C, Pridgen K, Hughes D, Pair L, Holland A. The Role of Artificial Intelligence in Increasing Efficiency, Reducing Errors, and Improving Patient Outcomes in Clinical Practice. Clinical Journal for Nurse Practitioners in Women's Health 2025;2(2):101
  28. Elabd N, Rahman Z, Abu Alinnin S, Jahan S, Campos L, Baltatu O. Designing Personalized Multimodal Mnemonics With AI: A Medical Student’s Implementation Tutorial. JMIR Medical Education 2025;11:e67926
  29. Hein D, Christie A, Holcomb M, Xie B, Jain A, Vento J, Rakheja N, Shakur A, Christley S, Cowell L, Brugarolas J, Jamieson A, Kapur P. Iterative refinement and goal articulation to optimize large language models for clinical information extraction. npj Digital Medicine 2025;8(1)
  30. Radi M, Omar N, Kaur W. Syntactic-Guided Chain of Thought for Iterative Implicit and Explicit Target Detection in Aspect-Based Sentiment Analysis. IEEE Access 2025;13:84738
  31. Thota D, Alt D, Cole J, Tring V. Prompting Pro Tips! Best Practices for Generating Clinical Narrative Summaries. Military Medicine 2025
  32. Miller K, Bedrick S, Lu Q, Wen A, Hersh W, Roberts K, Liu H. Dynamic few-shot prompting for clinical note section classification using lightweight, open-source large language models. Journal of the American Medical Informatics Association 2025;32(7):1164
  33. Fleurence R, Wang X, Bian J, Higashi M, Ayer T, Xu H, Dawoud D, Chhatwal J. A Taxonomy of Generative Artificial Intelligence in Health Economics and Outcomes Research: An ISPOR Working Group Report. Value in Health 2025
  34. Boie S, Glastetter E, Lux M, Balzer F, von Kalle C, Lenz C, Müller U. Evaluating a Chatbot as a Companion for Patients With Breast Cancer: Collaborative Pilot Study. JMIR Cancer 2025;11:e68426
  35. Hwang M, Lee K, Lee H. A word to the wise: Crafting impactful prompts for ChatGPT. System 2025;133:103756
  36. Hassanein F, El Barbary A, Hussein R, Ahmed Y, El‐Guindy J, Sarhan S, Abou‐Bakr A. Diagnostic Performance of ChatGPT‐4o and DeepSeek‐3 Differential Diagnosis of Complex Oral Lesions: A Multimodal Imaging and Case Difficulty Analysis. Oral Diseases 2025
  37. Chen H, Alfred M, Cohen E. Efficient Detection of Stigmatizing Language in Electronic Health Records via In-Context Learning: Comparative Analysis and Validation Study. JMIR Medical Informatics 2025;13:e68955
  38. Pulari S, Umadevi M, Vasudevan S. Optimizing multimodal personalized disease prediction accuracy using generated prompts and large language models. Image and Vision Computing 2025;161:105649
  39. Bartels S, Carus J. From text to data: Open-source large language models in extracting cancer related medical attributes from German pathology reports. International Journal of Medical Informatics 2025;203:106022
  40. Kantor J. Generative Artificial Intelligence in Dermatology. Dermatologic Clinics 2025
  41. Garcia-Carmona A, Prieto M, Puertas E, Beunza J. Leveraging Large Language Models for Accurate Retrieval of Patient Information From Medical Reports: Systematic Evaluation Study. JMIR AI 2025;4:e68776
  42. Liu J, Wang C, Liu S. Prompt Engineering in Clinical Practice: A Tutorial for Clinicians (Preprint). Journal of Medical Internet Research 2025
  43. Yao M, Chae A, Saraiya P, Kahn C, Witschey W, Gee J, Sagreiya H, Bastani O. Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines. Communications Medicine 2025;5(1)
  44. Qian Y. Prompt Engineering in Education: A Systematic Review of Approaches and Educational Applications. Journal of Educational Computing Research 2025
  45. Bahng J. The Potential and Applications of Artificial Intelligence in the Field of Audiology. Audiology and Speech Research 2025;21(3):209
  46. Bandeira A, Gonçalves L, Holl F, Shaibu J, Gonçalves M, Payinda R, Paudel S, Berionni A, Purnat T, Mackey T. Viewpoint on the Intersection Between Health Information, Misinformation, and Generative AI Technologies (Preprint). JMIR Infodemiology 2024
  47. Çakar M, Avcı A, Düzgün S, Aslan T, Hekimoğlu K. Assessment of the Accuracy of Modern Artificial Intelligence Chatbots in Responding to Endodontic Queries. Australian Endodontic Journal 2025
  48. Vieira-Vieira C, Kulkarni S, Zalewski A, Löffler J, Münch J, Kreuchwig A. From data silos to insights: the PRINCE multi-agent knowledge engine for preclinical drug development. Frontiers in Artificial Intelligence 2025;8
  49. Wang H, Bai X, Cui X, Chen G, Fan G, Wei G, Zheng Y, Wu J, Gao S. Symptom Recognition in Medical Conversations Via Multi-Instance Learning and Prompt. Journal of Medical Systems 2025;49(1)
  50. Li K, Nguyen T, Moss H. Performance of vision language models for optic disc swelling identification on fundus photographs. Frontiers in Digital Health 2025;7

Books/Policy Documents

  1. Miller S, Busby-Earle C. Proceedings of the Future Technologies Conference (FTC) 2024, Volume 4.
  2. Akbar N, Lenzitti B, Tegolo D. AIxIA 2024 – Advances in Artificial Intelligence.
  3. Chung Y, Tung C, Chang Y. Advances and Trends in Artificial Intelligence. Theory and Applications.

Conference Proceedings

  1. García-Barragán Á, Calatayud A, Prieto-Santamaría L, Robles V, Menasalvas E, Rodríguez A. Step-forward structuring disease phenotypic entities with LLMs for disease understanding. 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS)
  2. Teng S, Zhang T, D'Alfonso S, Kostakos V. Predicting Affective States from Screen Text Sentiment. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing
  3. Maceda L. Enhanced Sentiment Classification in Code-Mixed Texts Using Hybrid Embeddings and Synthetic Data Generation. 2024 International Conference on Computer and Applications (ICCA)
  4. Weerathunge T, Jayalal S, Wijayasiriwardhane K. Optimizing Response Consistency of Large Language Models in Medical Education through Prompt Engineering. 2025 5th International Conference on Advanced Research in Computing (ICARC)
  5. Arabzadeh N, Bagheri E. VAP3: Variation-Aware Prompt Performance Prediction. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval