Iterative Large Language Model–Guided Sampling and Expert-Annotated Benchmark Corpus for Harmful Suicide Content Detection: Development and Validation Study

doi:10.2196/73725

Published on 05.Feb.2026 in Vol 14 (2026)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/73725, first published 12.Mar.2025.


Kyumin Park¹*, MS; Myung Jae Baik²*, MD; YeongJun Hwang³*, MS; Yen Shin⁴, BS; HoJae Lee⁴, BS; Ruda Lee⁵, MA; Sang Min Lee², PhD; Je Young Hannah Sun², MD; Ah Rah Lee², PhD; Si Yeun Yoon², MA; Dong-ho Lee¹, PhD; Jihyung Moon¹, MS; JinYeong Bak³, PhD; Kyunghyun Cho⁶, PhD; Jong-Woo Paik², PhD; Sungjoon Park¹, PhD

¹ SoftlyAI, Seoul, Republic of Korea

² Department of Psychiatry, Kyung Hee University College of Medicine, Seoul, Republic of Korea

³ Department of Artificial Intelligence, Sungkyunkwan University, Suwon-si, Gyeonggi-do, Republic of Korea

⁴ KAIST, Daejeon, Republic of Korea

⁵ Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States

⁶ Department of Computer Science, New York University, New York, NY, United States

*these authors contributed equally

Corresponding Author:

JinYeong Bak, PhD
Department of Artificial Intelligence
Sungkyunkwan University
Office 27306, Engineering Building 2, 2066 Seobu-ro Jangan-gu
Suwon-si, Gyeonggi-do 16419
Republic of Korea
Phone: +82 31 290 7104
Email: jy.bak@skku.edu

Citation

Please cite as:

Park K, Baik MJ, Hwang Y, Shin Y, Lee H, Lee R, Lee SM, Sun JYH, Lee AR, Yoon SY, Lee DH, Moon J, Bak J, Cho K, Paik JW, Park S
Iterative Large Language Model–Guided Sampling and Expert-Annotated Benchmark Corpus for Harmful Suicide Content Detection: Development and Validation Study
JMIR Med Inform 2026;14:e73725
doi: 10.2196/73725 PMID: 41643119 PMCID: 12875420


This paper is in the following e-collection/theme issue:

Natural Language Processing
Depression and Mood Disorders; Suicide Prevention
Infoveillance, Infodemiology, Digital Disease Surveillance, Infodemic Management
Mental Health Surveillance and Epidemiology
Misinformation and Disinformation Outbreaks and Information Prevalence Studies
Applications of AI
