This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
The increasing adoption of electronic health records (EHRs) has been associated with a number of unintended negative consequences for provider efficiency and job satisfaction. To address this, the use of medical scribes to perform many of the required EHR functions has increased dramatically. Despite this rapid growth, little has been published on training or assessment tools for appraising the safety and efficacy of scribe-related EHR activities. Given the number of reports documenting performance errors in EHR interfacing and data gathering among other professional groups, scribes likely face similar challenges. This highlights the need for new assessment tools for medical scribes.
The objective of this study was to develop a virtual video-based simulation to demonstrate and quantify the variability and accuracy of scribes’ transcribed notes in the EHR.
From a pool of 8 scribes in one department, 5 female scribes, intent on pursuing careers in health care, were recruited for our simulation study; each had at least 6 months of experience both with our EHR and in the specialty of the simulated cases. We created three simulated patient-provider scenarios, each with a corresponding medical record in our simulation instance of our EHR. For each scenario, we video-recorded a standardized patient-provider encounter. Each scribe watched the simulated encounters and transcribed notes into a simulated EHR environment. Transcribed notes were evaluated for interscribe variability and compared with a gold standard for accuracy.
All scribes completed all simulated cases. There was significant interscribe variability in note structure and content. Overall, only 26% of all data elements were unique to the scribe writing them.
We created a high-fidelity, video-based EHR simulation, capable of assessing multiple performance indicators in medical scribes. In this cohort, we demonstrate significant variability both in terms of structure and accuracy in clinical documentation. This form of simulation can provide a valuable tool for future development of scribe curriculum and assessment of competency.
The electronic health record (EHR) is a vital tool in the delivery of clinical care. EHR adoption rates have grown rapidly, largely because of government programs such as the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 [
One key factor that contributes to the dissatisfaction is the paradigm of “information chaos” resulting from EHR use that can lead to impaired situational awareness and increased mental workload [
Growing concerns with EHR usability and efficiency have been mirrored by concomitant increased utilization of medical scribes. To alleviate challenges associated with EHR data entry, physicians have increasingly incorporated scribes into clinic and hospital workflows. Although studies lauding their potential benefits have appeared for nearly 30 years, the scribe workforce has recently demonstrated significant and rapid growth; there were approximately 10,000 scribes working in 2014, with a projected 20,000 scribes in the workforce by 2016 [
Scribes who use the EHR may find its complex interface and usability constraints even more challenging than physicians do because they lack clinical education and EHR-specific workflow training. In essence, this paradigm adds another layer of physician responsibility but does not eliminate the errors inherent in poor EHR use.
These issues are further magnified by the fact that scribes do not necessarily just engage in data entry activities during the clinical encounter but may also have a variable and expanded role at the discretion of the provider they are scribing for [
To ensure that standardized activities are accomplished, scribes require appropriate training that directly links their learning needs with measured outcomes. This can be accomplished through training regimens that evaluate individual competencies pertinent to accurate EHR documentation. Training should maintain Health Insurance Portability and Accountability Act (HIPAA) compliance and ensure patient safety. Given the relationship between communication errors and patient safety [
On the basis of these concerns, it is imperative that methodology exists to ensure that scribes can be effectively trained and their competency assessed for safe and effective use of EHR in the appropriate clinical settings. Simulation has been a means of evaluating complicated systems, while posing no risk to patients, and providing high-fidelity standardized subject experiences [
The study was approved by the institutional review board of the Oregon Health & Science University. All data were deidentified and stored securely.
Three Obstetrics-Gynecology (Ob-Gyn) scenarios were created by a clinical subject matter expert (Ob-Gyn attending physician) to represent standard ambulatory encounters. We created a replica of each clinical case in our simulation instance of EpicCare (Epic Systems) using techniques we have described in previous publications [
A list of all medical scribes was collected from the Scribe Program Supervisor of the OHSU medical scribing program. Medical scribes working at the OHSU Center for Women’s Health (CWH) were selected because they represented the largest proportion of all medical scribes working at OHSU. They were approached via email, text message, and phone call to arrange simulation participation times. All scribes had a minimum of 1 year of scribe experience and a minimum of 6 months of experience scribing for CWH before study participation.
In order for the simulations to accurately replicate scribes’ real-world work environment, the activity was conducted in patient exam rooms at the CWH, OHSU. For each simulated case, subjects were instructed to (1) familiarize themselves with the simulated patient chart before beginning the simulated physician-patient video, and (2) perform scribe activities in simulation just as they would during a real physician-patient interaction. Videos were displayed from a laptop computer on the exam table, and scribes used dedicated exam-room computers. The standardized narrative was read aloud to each scribe. Each simulation lasted between 6 and 18 min, and scribes performed all three cases in the same order.
Scribe- and physician-created notes were transferred from the Epic simulation environment into Pages (Apple Inc). Screenshots were taken of the Encounter, Labs, and Imaging tabs of Chart Review to determine whether the orders were pended. The gold-standard note was transferred from the Epic simulation environment into Pages in the same manner.
Scribe notes were evaluated for note length, word economy, data elements, copy-and-paste blocks, pended orders, and attestations. These structural elements were compared with each other to determine interscribe variability, and with our gold-standard note to determine accuracy and positive predictive value (PPV). PPV was defined as the ratio of a scribe’s data elements also found in the gold-standard note to all data elements included by that scribe. Data elements were defined as the individual positive and negative facts recorded by the scribe or the gold standard from each patient-physician video and the provided resources; they represent the scribe’s and the gold standard’s interpretation of what was verbalized and performed during the encounter. Data elements were tabulated by note section: subjective, objective, or assessment and plan. The presence of copy-and-pasted blocks was determined using Plagiarism Checker X (Plagiarism Checker X, LLC), a plagiarism detection software package. Word economy was defined as the number of words required to create 1 data element, that is, the number of words divided by the number of data elements. Attestations were considered present if the medical scribe included a statement at the end of their note signifying that they were a scribe working on behalf of the physician-provider.
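As an illustration of these definitions, a note can be modeled as a set of data elements and scored against the gold standard. This is a minimal sketch; the function name, element strings, and word count below are invented for illustration and are not taken from the study data.

```python
# Hypothetical sketch of the scoring described above: a note is modeled as a
# set of data-element strings; the sample elements are illustrative only.

def note_metrics(scribe_elements, gold_elements, word_count):
    """Return accuracy, PPV, and word economy for one scribe note."""
    common = scribe_elements & gold_elements          # elements matching the gold standard
    accuracy = len(common) / len(gold_elements)       # share of gold-standard elements captured
    ppv = len(common) / len(scribe_elements)          # share of scribe elements that are correct
    word_economy = word_count / len(scribe_elements)  # words needed per data element
    return accuracy, ppv, word_economy

gold = {"LMP 3 weeks ago", "denies cramping", "fundal height 20 cm"}
scribe = {"LMP 3 weeks ago", "denies cramping", "reports nausea"}
acc, ppv, economy = note_metrics(scribe, gold, word_count=24)
```

Under this model, an element in both notes raises both accuracy and PPV, a gold-standard element the scribe missed lowers accuracy (omission), and a scribe element absent from the gold standard lowers PPV (commission).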
We first wanted to determine the general structure and interscribe variability determined by data elements, note length, word economy, pended orders, attestations, and the specific structure of each note section. A total of 150, 183, and 118 unique data elements were found in case 1, case 2, and case 3, respectively (
We next sought to determine the commonality of data elements between scribes. For each scribe and each element, we determined what fraction of the total cohort of scribes documented that element in their note for an individual case. Data from all three cases were then pooled for analysis. We further subdivided the analysis into the three main sections: Subjective, Physical exam, and Assessment and plan (
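The commonality calculation described above can be sketched as follows: modeling each note as a set of data elements, each scribe's score is the average fraction of the cohort that also documented each of their elements. The helper name and example elements are hypothetical, not from the study.

```python
# Illustrative sketch of interscribe commonality; element strings are made up.
from collections import Counter

def commonality(notes):
    """For each scribe, the mean fraction of the cohort documenting each of their elements."""
    counts = Counter(e for note in notes for e in note)  # scribes documenting each element
    n = len(notes)
    return [sum(counts[e] for e in note) / (len(note) * n) for note in notes]

notes = [
    {"LMP 3 weeks ago", "denies cramping"},
    {"LMP 3 weeks ago", "reports nausea"},
    {"LMP 3 weeks ago"},
]
fractions = commonality(notes)  # one commonality score per scribe
```

A score of 1.0 means every element in that scribe's note was documented by all scribes; lower scores indicate elements unique to that scribe.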
These differences in note elements were associated with significant variability in global note structure and content. There was an almost 87-fold difference in note length between the highest and lowest in case 1, a 55-fold difference in case 2, and a 115-fold difference in case 3. Of note, variance was observed across all structural domains of the note (
Finally, we wished to compare the general structure of scribes’ notes with that of the gold-standard note. Errors of omission were demonstrated by calculating accuracy, that is, the frequency with which scribes included the data elements found in the gold-standard note. Similarly, errors of commission were demonstrated through the PPV, whereby we calculated how often scribes included information that was not present in the gold-standard note and was therefore assumed to be inaccurate. Individual scribe accuracy ranged from 50% to 76%, whereas the accuracy of the subjective, objective, and assessment and plan sections was 72%, 60%, and 56%, respectively. For individual scribes, the PPV ranged from 38% to 81%. When scribe notes were averaged, the PPV of the subjective, objective, and assessment and plan sections was 54%, 52%, and 69%, respectively (
Accuracy and Positive Predictive Value (PPV) for each simulated case by structural element. Columns show scribes 1-5 within each case; case 3 contained no physical exam (PE) elements. Count rows give data elements in common with the gold-standard note, elements omitted (present only in the gold standard), and elements committed (present only in the scribe note), such that accuracy = common / (common + omitted) and PPV = common / (common + committed). A&P: assessment and plan.

| Measure | Note section | Case 1: 1 | 2 | 3 | 4 | 5 | Case 2: 1 | 2 | 3 | 4 | 5 | Case 3: 1 | 2 | 3 | 4 | 5 |
| Elements in common with gold standard | Subjective | 16 | 34 | 31 | 24 | 10 | 23 | 33 | 16 | 15 | 12 | 4 | 10 | 9 | 8 | 8 |
| | PE | 14 | 15 | 4 | 1 | 0 | 12 | 13 | 4 | 8 | 6 | — | — | — | — | — |
| | A&P | 6 | 7 | 3 | 4 | 2 | 13 | 9 | 16 | 15 | 11 | 2 | 4 | 2 | 2 | 1 |
| Elements omitted (gold standard only) | Subjective | 6 | 11 | 13 | 10 | 3 | 7 | 12 | 15 | 9 | 8 | 0 | 1 | 9 | 6 | 1 |
| | PE | 3 | 2 | 4 | 3 | 4 | 1 | 3 | 4 | 3 | 4 | — | — | — | — | — |
| | A&P | 3 | 2 | 14 | 0 | 4 | 3 | 2 | 3 | 5 | 2 | 7 | 6 | 5 | 7 | 3 |
| Elements committed (scribe note only) | Subjective | 34 | 16 | 19 | 26 | 40 | 28 | 18 | 35 | 36 | 39 | 2 | 2 | 2 | 2 | 2 |
| | PE | 2 | 1 | 12 | 15 | 16 | 2 | 1 | 10 | 6 | 8 | — | — | — | — | — |
| | A&P | 2 | 1 | 2 | 3 | 3 | 4 | 4 | 3 | 4 | 4 | 1 | 1 | 1 | 1 | 1 |
| Accuracy | Subjective | 0.73 | 0.76 | 0.70 | 0.71 | 0.77 | 0.77 | 0.73 | 0.52 | 0.63 | 0.60 | 1.00 | 0.91 | 0.50 | 0.57 | 0.86 |
| | PE | 0.82 | 0.88 | 0.50 | 0.25 | 0.00 | 0.92 | 0.81 | 0.50 | 0.73 | 0.60 | — | — | — | — | — |
| | A&P | 0.67 | 0.78 | 0.18 | 1.00 | 0.33 | 0.81 | 0.82 | 0.84 | 0.75 | 0.85 | 0.22 | 0.40 | 0.29 | 0.22 | 0.25 |
| PPV | Subjective | 0.32 | 0.68 | 0.62 | 0.48 | 0.20 | 0.45 | 0.65 | 0.31 | 0.29 | 0.24 | 0.67 | 0.83 | 0.82 | 0.80 | 0.75 |
| | PE | 0.88 | 0.94 | 0.25 | 0.06 | 0.00 | 0.86 | 0.93 | 0.29 | 0.57 | 0.43 | — | — | — | — | — |
| | A&P | 0.75 | 0.88 | 0.60 | 0.57 | 0.40 | 0.76 | 0.69 | 0.84 | 0.79 | 0.73 | 0.67 | 0.80 | 0.67 | 0.67 | 0.50 |
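As a sanity check on the omission/commission arithmetic, accuracy and PPV can be recomputed from the element counts. The values below use one scribe's subjective-section counts for case 1 (16 elements in common with the gold standard, 6 omitted, 34 committed); mapping the tabulated count rows to these three quantities is our reading of the data, so this is a sketch under that assumption.

```python
# Worked example of recomputing accuracy and PPV from element counts; the
# counts are our interpretation of one scribe's subjective section in case 1.
common, omitted, committed = 16, 6, 34

accuracy = common / (common + omitted)   # gold-standard elements the scribe captured
ppv = common / (common + committed)      # scribe elements that match the gold standard

print(round(accuracy, 2), round(ppv, 2))
```

Under this reading, the recomputed values (0.73 and 0.32) match the reported accuracy and PPV for that scribe and section.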
Distribution of data elements. Each of the 5 scribes completed 3 separate simulation exercises. The absolute number of data elements for each section of the note was tabulated for each individual scribe. Subjective (Panel A), Physical exam (Panel B), and Assessment and plan (Panel C).
Interscribe commonality in data elements. Each of the 5 scribes completed 3 separate simulation exercises. For each section of the note, Subjective (Panel A), Physical exam (Panel B), and Assessment and plan (Panel C), the fraction of data elements for each scribe in common among the other scribes for all three cases is presented.
Distribution of word count. Each of the 5 scribes completed 3 separate simulation exercises. The absolute number of words for each section of the note was tabulated for each individual scribe. Subjective (Panel A), Physical exam (Panel B), and Assessment and plan (Panel C).
In this study, we created a novel virtual simulation to specifically assess scribe use and function. The use of a standardized video encounter carries the distinct advantage of untethering the simulation from a traditional simulation center, thereby improving the accessibility of the training activity to multiple clinical environments. This represents a more scalable alternative, given that scribes are reported to work in a variety of clinical environments and are deeply embedded in community clinics, many of which may not have access to traditional simulation. In addition, the use of a standardized video ensures consistent delivery of content, allowing for direct comparison of work product between scribes and across practices.
With the standardization of the delivery of content and inclusion of the EHR as an integral part of the simulation activity, we were able to allow direct interscribe comparisons between notes, which revealed significant variability in note structure and length. There is a lack of clarity with respect to the extent of experience medical scribes require to attain any particular level of competency. Despite the fact that all of the scribes had at least 1 year of experience both in the specialty and with the EHR, there was almost a 3-fold difference in note length. Even more interesting was the difference in actual “note” elements between scribes. This is consistent with findings from studies showing discrepancies between physicians in the content and quality of documentation in notes [
Although the simulation provides the basis to assess differences in note structure, we were also able to create a methodology to look at note content. We found evidence of errors of commission (incorrect data) and omission (missing data) by comparing the data elements found in notes written by scribes versus the notes written by an expert clinician. Notably, there was a paucity of overlap in content between the notes, with less than 40% of the documented plan items and diagnoses being common across the scribes. This is consistent with the observation that there is wide variability in the content of resident-physician-generated progress notes, where the primary author of the note (the resident) was also responsible for acquisition of the primary data and synthesizing that information into medical decision making [
It is important to note some limitations of this study. Although this study focused on note creation, which is the primary role of the scribe, it did not address other scribe-specific activities such as data entry and data gathering [
In conclusion, our study highlights the variability of scribe documentation and the need for a more standardized approach to training. This proof-of-concept study demonstrated a means of effectively evaluating scribe performance.
CWH: Center for Women’s Health
EHR: electronic health record
HIPAA: Health Insurance Portability and Accountability Act
HITECH: Health Information Technology for Economic and Clinical Health
NCATS: National Center for Advancing Translational Sciences
NIH: National Institutes of Health
Ob-Gyn: Obstetrics and Gynecology
OCTRI: Oregon Clinical and Translational Research Institute
OHSU: Oregon Health & Science University
PPV: positive predictive value
This publication was supported by AHRQ RO1 HS23793, the Donaghue Foundation, and the Oregon Clinical and Translational Research Institute (OCTRI), grant TL1TR000129 from the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (NIH).
None declared.