TY  - JOUR
AU  - Bilotta, Isabel
AU  - Tonidandel, Scott
AU  - Liaw, Winston R
AU  - King, Eden
AU  - Carvajal, Diana N
AU  - Taylor, Ayana
AU  - Thamby, Julie
AU  - Xiang, Yang
AU  - Tao, Cui
AU  - Hansen, Michael
PY  - 2024
DA  - 2024/05/23
TI  - Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis
JO  - JMIR Med Inform
SP  - e50428
VL  - 12
KW  - bias
KW  - sociodemographic factors
KW  - health care disparities
KW  - natural language processing
KW  - sentiment analysis
KW  - diabetes
KW  - electronic health record
KW  - racial
KW  - ethnic
KW  - diversity
KW  - Hispanic
KW  - medical interaction
AB  - Background: Individuals from minoritized racial and ethnic backgrounds experience pernicious and pervasive health disparities that have emerged, in part, from clinician bias. Objective: We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this methodological approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias. Methods: In this cross-sectional study, we extracted EHR notes for patients who were aged 18 years or older; had more than 5 years of diabetes diagnosis codes; and received care between 2006 and 2014 from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics. The race and ethnicity of patients were defined as White non-Hispanic, Black non-Hispanic, or Hispanic or Latino. We hypothesized that Sentiment Analysis and Social Cognition Engine (SEANCE) components (ie, negative adjectives, positive adjectives, joy words, fear and disgust words, politics words, respect words, trust verbs, and well-being words) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to indicate the extent to which they thought variation in the use of SEANCE language domains for different racial and ethnic groups was reflective of bias in EHR notes. Results: We examined EHR notes (n=12,905) of Black non-Hispanic, White non-Hispanic, and Hispanic or Latino patients (n=1562), who were seen by 281 physicians. A total of 27 clinicians participated in the validation study. In terms of bias, participants rated negative adjectives as 8.63 (SD 2.06), fear and disgust words as 8.11 (SD 2.15), and positive adjectives as 7.93 (SD 2.46) on a scale of 1 to 10, with 10 being extremely indicative of bias. Notes for Black non-Hispanic patients contained significantly more negative adjectives (coefficient 0.07, SE 0.02) and significantly more fear and disgust words (coefficient 0.007, SE 0.002) than those for White non-Hispanic patients. The notes for Hispanic or Latino patients included significantly fewer positive adjectives (coefficient −0.02, SE 0.007), trust verbs (coefficient −0.009, SE 0.004), and joy words (coefficient −0.03, SE 0.01) than those for White non-Hispanic patients. Conclusions: This approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.
SN  - 2291-9694
UR  - https://medinform.jmir.org/2024/1/e50428
UR  - https://doi.org/10.2196/50428
DO  - 10.2196/50428
ID  - info:doi/10.2196/50428
ER  -