This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Over a tenth of preventable adverse events in health care are caused by failures in information flow. These failures are tangible in clinical handover; regardless of good verbal handover, from two-thirds to all of this information is lost after 3-5 shifts if notes are taken by hand, or not at all. Speech recognition and information extraction provide a way to fill out a handover form for clinical proofing and sign-off.
The objective of the study was to provide a recorded spoken handover, annotated verbatim transcriptions, and evaluations to support research in spoken and written natural language processing for filling out a clinical handover form. This dataset is based on synthetic patient profiles, thereby avoiding ethical and legal restrictions, while maintaining efficacy for research in speech-to-text conversion and information extraction, based on realistic clinical scenarios. We also introduce a Web app to demonstrate the system design and workflow.
We experiment with Dragon Medical 11.0 for speech recognition and CRF++ for information extraction. To compute features for information extraction, we also apply CoreNLP, MetaMap, and Ontoserver. Our evaluation uses cross-validation techniques to measure processing correctness.
The data provided were a simulation of nursing handover, as recorded using a mobile device, built from simulated patient records and handover scripts, spoken by an Australian registered nurse. Speech recognition recognized 5276 of 7277 words in our 100 test documents correctly. We considered 50 mutually exclusive categories in information extraction and achieved the F1 (ie, the harmonic mean of Precision and Recall) of 0.86 in the category for irrelevant text and the macro-averaged F1 of 0.70 over the remaining 35 nonempty categories of the form in our 101 test documents.
The significance of this study hinges on opening our data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language processing. The data are used in the CLEFeHealth 2015 evaluation laboratory for a shared task on speech recognition.
Information flow, defined as channels, contact, communication, or links to pertinent people [
Nursing handover is a form of clinical narrative [
The Australian National Health and Medical Research Council [
An audio recording of a complete nursing handover requires ethical consenting of the nursing team, patients, visitors, and all other incidental clinical staff. It is difficult to obtain a “natural” recording—that could be provided without restriction on its use—under such conditions. Audio recordings also present significant difficulties in terms of identification of patients [
Ethical deidentification of the nursing handover for open data is not realistic. The
In the case of clinical nursing notes and handover, such data do not exist in an open form. By open we mean without restriction [
Free-form text, as an entry type, is essential to release clinicians’ time from documentation for other tasks [
The development of these techniques is hindered by access to data for research, development, and evaluation [
By providing an open clinical dataset that includes verbatim transcriptions and associated audio recordings, we anticipate a greater impact from the shared computational tasks, and increased development in natural language technologies for clinical text. Consequently, the significance of this study hinges not only on opening our data and some processing software to the research and development community, but also on publishing our performance evaluation results as a benchmark for tracking performance improvements over time.
We created a synthetic dataset of 101 handover records (see
An example record that originates from our dataset.
The patient profile was developed using common user profile generation techniques [
Each imaginary profile was given a stock photo from a royalty-free gallery, name, age, admission story, in-patient time, and familiarity to the nurses giving and receiving the handover. All patients were adults, but both young and old people were included. Some patients were recently admitted to the ward, some had been there for some days already, and some were almost ready to be discharged. For some patients, the in-patient time was short and for other patients it was longer. Within the admission story, the reason for admission was always an acute condition, but some patients also had chronic diseases.
The first author created a synthetic, written, free-form text document for the sample profile and supervised a RN in creating these documents for the remaining 100 profiles.
The RN had over twelve years of experience in clinical nursing. She spoke Australian English as her second language and was originally from the Philippines. The RN’s written consent was obtained for gathering, using, and releasing the spoken and written documents she created. She performed all these creative speaking and writing tasks as a National Information and Communications Technology, Australia (NICTA) employee alone in an office environment.
The RN was guided to imagine herself working in the medical ward and delivering verbal handovers to another nurse at a nursing shift change by the patient’s bedside (see
The RN was asked to write, for each patient profile, a realistic, but fully imaginary text document (ie, TXT file) as if she was talking and using normal wordings. The document length was set to 100-300 words.
In consultation with Nursing Handover domain experts, the first and third authors developed a handover form (
The form consisted of six headings (ie,
This form structure is also consistent with the five-step nursing process model by the American Nurses Association (ANA): Assessment, Diagnosis, Outcomes/Planning, Implementation, and Evaluation [
ANA specifies that information about the first three steps should be documented under the patient’s care plan in the patient’s record so that nurses and other health care professionals caring for the patient have access to it. The Assessment step refers to a nurse collecting and analyzing patient information, including physiological data together with psychological, sociocultural, spiritual, economic, and life-style factors. The Diagnosis step refers to the nurse’s clinical judgment about the patient’s response to actual or potential health conditions or needs. The Outcomes/Planning step refers to the nurse setting, based on the two previous steps, measurable and achievable short- and long-range goals for this patient. In our form, these three steps were covered under the headings of
The Implementation step refers to the implementation of nursing care in accordance with the care plan in order to assure the continuity of care for the patient during hospitalization and in preparation for discharge. Also, this delivered care is to be documented in the patient’s record. In our form, it was covered under the headings of
The Evaluation step refers to the continuous evaluation of the patient’s status and the effectiveness of the nursing care and the respective modifications of the (written) care plan. Our form captured this step by considering the longitudinal series of handover documents in time.
Descriptive statistics of text snippets highlighted by the registered nurse in the 101 written, structured documents used as a reference standard in information extraction together with the performance of our best information extraction system. RN: registered nurse; RS: reference standard; IE: information extraction; NA: not applicable; min: minimum; max: maximum.
The first author created a model structuring of the sample patient’s written, free-form text document with respect to the mutually exclusive categories of the handover form and supervised the RN in creating these written, structured documents for the remaining 100 profiles. The RN proofed and agreed on this sample structuring. The first author installed Protégé 3.1.1 with the Knowtator 1.9 beta [
The RN was reminded that, on one hand, not all documents include information for all form categories and, on the other hand, some documents have relevant information to a given category multiple times (eg, if a given patient was referred to in a document with both a given name Michael and nickname Mike, both these occurrences were to be assigned to the category of
The first and second authors performed light proofing of these 101 structured documents in total. More precisely, they improved the consistency in including/excluding articles or titles, as well as in marking gender information in each document if it was available.
The first author supervised the RN in creating the spoken, free-form text documents by reading the 100 written free-form text documents out loud as the nurse giving the handover. She was guided to try to speak as naturally as possible, avoid sounding like reading text, and repeat the take until she was satisfied with the outcome (see
The Olympus WS-760M digital recorder and Olympus ME52W noise-canceling lapel-microphone (see
The first author edited each Windows Media Audio (WMA) audio recording to include only one handover document. This included ensuring that the beginning and end of each file did not include takes that the RN was unsatisfied with, file identifiers, or other additional content.
We used Dragon Medical 11.0 to convert the audio files to written, free-form text documents. This software was initialized with the RN’s details (age group of 22-54 years, Australian English accent) and trained to her voice by recording her reading the document of The Final Odyssey (3893 words, 29 minutes 22 seconds, 4 minutes needed) using the aforementioned recorder and microphone. This training, tailoring, or adaptation to a speaker’s voice was left minimal, since it could limit comparability with other studies and might not be feasible for every clinician in practice. To meet the software requirements, the first author converted WMA recordings from stereo to mono tracks and exported them from WMA to WAVeform (WAV) files on Audacity 2.0.3 [
We compared the Dragon vocabularies of
We applied the SCLITE scoring tool of the SR Scoring Toolkit 2.4.0 [
We chose the vocabulary resulting in the best performance in terms of the correctly recognized words (see the Results section) for a more detailed error analysis. The correct, substituted, inserted, and deleted words were defined by the aforementioned SCLITE scoring tool. As the most fundamental concept in this analysis, we measured the phonetic similarity (PS), defined as a perceptual distance between speech sounds [
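For reference, the word-level scores used in this analysis follow the standard SCLITE conventions, where N is the number of words in the reference transcript and S, D, and I are the numbers of substituted, deleted, and inserted words (shown here for completeness):

```latex
% Standard word-level speech recognition scores.
\[
  \text{Correctness} = \frac{N - S - D}{N}, \qquad
  \text{Word error rate} = \frac{S + D + I}{N}
\]
```

For example, 5276 correctly recognized words out of 7277 corresponds to a correctness of approximately 0.73.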
We implemented a simple PS measure, which combines the Double Metaphone phonetic encoding algorithm [
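As an illustration of how such a measure can be composed, the sketch below pairs Double Metaphone encodings with a normalized Levenshtein edit distance. The metaphone Python package and the normalization scheme are assumptions of this illustration rather than the exact components of the measure used in our analysis.

```python
# A minimal phonetic similarity (PS) sketch: Double Metaphone encodings
# compared with a normalized Levenshtein edit distance. The metaphone
# package and the normalization are illustrative assumptions.
from metaphone import doublemetaphone  # pip install Metaphone


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance computed with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def phonetic_similarity(word1: str, word2: str) -> float:
    """Similarity in [0, 1]; 1 means identical primary encodings."""
    code1, _ = doublemetaphone(word1)
    code2, _ = doublemetaphone(word2)
    if not code1 and not code2:
        return 1.0
    return 1.0 - edit_distance(code1, code2) / max(len(code1), len(code2), 1)


# Sound-alike confusions taken from the speech-recognized example document:
print(phonetic_similarity("MEWS", "meals"))
print(phonetic_similarity("scoring", "scarring"))
print(phonetic_similarity("Gregor", "Greco"))
```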
We used our expert-annotated dataset to train and evaluate IE systems. We formulated this learning problem as a task where each word in the text is treated as an entity with features and the goal is to assign it automatically to one or none of the categories. We chose to apply the conditional random field (CRF) [
We generated the features by processing the original records using Stanford CoreNLP (English grammar) by the
In the CRF++ template, we defined in the
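As a concrete illustration of this setup, the sketch below writes word-level training data in the CRF++ format (one word per line, tab-separated feature columns, the form category in the last column, and documents separated by blank lines) together with a unigram/bigram feature template, and then calls the CRF++ command-line tools. The feature columns and category labels shown are placeholders for illustration, not the exact feature set or label names of our system.

```python
# Sketch of preparing CRF++ input: a training file and a feature template.
# The feature columns (word, lemma, POS) and labels here are illustrative.
import subprocess

# Hypothetical rows for one document: word, lemma, POS tag, category label.
doc = [
    ("In",   "in",   "IN",  "NA"),
    ("bed",  "bed",  "NN",  "NA"),
    ("5",    "5",    "CD",  "Bed"),
    ("we",   "we",   "PRP", "NA"),
    ("have", "have", "VBP", "NA"),
]

with open("train.crfpp", "w") as f:
    for word, lemma, pos, label in doc:
        f.write(f"{word}\t{lemma}\t{pos}\t{label}\n")
    f.write("\n")  # a blank line ends the document (sequence)

# Unigram macros over a window of +/-2 positions for each feature column,
# plus a single bigram macro ("B") over the output labels.
pairs = [(offset, col) for col in range(3) for offset in range(-2, 3)]
template = "\n".join(f"U{i:02d}:%x[{offset},{col}]"
                     for i, (offset, col) in enumerate(pairs)) + "\nB\n"
with open("template", "w") as f:
    f.write(template)

# Train and apply the model (requires the CRF++ binaries on the PATH).
subprocess.run(["crf_learn", "template", "train.crfpp", "model"], check=True)
subprocess.run(["crf_test", "-m", "model", "train.crfpp"], check=True)
```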
Experimented syntactic features.
ID | Name | Definition | Example | Software | In our best IE system |
1 | Word | Word itself | “Patients” or “had” | None | Yes |
2 | Lemma | Lemma of the word | “patients” or “have” | CoreNLP | Yes |
3 | NERa | NERa tag of the word for named entities (ie, person, location, organization, other proper name) and numerical entities (ie, date, time, money, number) | “number” for “5” | CoreNLP | Yes |
4 | POSb | POSb tag of the word | “IN” (ie, preposition) for “in”, “NN” (ie, common noun, as opposed to proper noun, “NNP”) for “bed”, “CD” (ie, cardinal number) for “5” | CoreNLP | Yes |
5 | Parse tree | Parse tree of the sentence from the root to the current word | “ROOT-NP-NN” | CoreNLP | Yes |
6 | Basic dependents | Basic dependents of the word | “Cardinal number 5” that refers to the bed ID for “bed” in “In bed 5 we have...” | CoreNLP | Yes |
7 | Basic governors | Basic governors of the word | Preposition “in” and subject “we” for “have” in “In bed 5 we have...” | CoreNLP | Yes |
8 | Phrase | Phrase that contains this word | “In bed 5” for “bed” in “In bed 5 we have”... | MetaMap | Yes |
a NER = named entity recognition
b POS = part of speech
Experimented semantic features.
ID | Name | Definition | Example | Software | In our best IE system |
9 | Top 5 candidates | Top 5 candidates retrieved from UMLSa | “BP” may refer to, for example, “Bachelor of Pharmacy” | MetaMap | Yes |
10 | Top mapping | Top UMLSa mapping for the concept that is the best match with a given text snippet | “pneumonia” is a type of “respiratory tract infection” | MetaMap | Yes |
11 | Medication score | 1 if the word is a full term in ATCLb; else 0.5 if it can be found in ATCLb; 0 otherwise | 1 for “acetylsalicylic acid” | NICTA | Yes |
a UMLS = Unified Medical Language System
b ATCL = Anatomical Therapeutic Chemical List
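The medication score (feature 11 in the table above) can be read as a small lookup rule. The sketch below assumes the ATCL is available as a plain set of lowercased term strings; the toy subset of terms is for illustration only.

```python
# Sketch of the medication score (feature 11): 1.0 for a full ATCL term,
# 0.5 for a word found inside some ATCL term, and 0.0 otherwise.
ATCL_TERMS = {"acetylsalicylic acid", "morphine", "furosemide"}  # toy subset

# Words that occur inside any (possibly multi-word) ATCL term.
ATCL_WORDS = {w for term in ATCL_TERMS for w in term.split()}


def medication_score(text: str) -> float:
    t = text.lower()
    if t in ATCL_TERMS:
        return 1.0   # full term, eg, "acetylsalicylic acid" or "morphine"
    if t in ATCL_WORDS:
        return 0.5   # found in the list but not a full term on its own
    return 0.0


print(medication_score("acetylsalicylic acid"))  # 1.0
print(medication_score("acetylsalicylic"))       # 0.5
print(medication_score("pain"))                  # 0.0
```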
Experimented feature types, statistical features.
ID | Name | Definition | Example | Software | In our best IE system |
12 | Location | Location of the word on a ten-point scale from the beginning of the document to its end | “1” for the first word and “10” for the last word | NICTA | Yes |
13 | Normalized term frequency | Number of times a given term occurs in a document divided by the maximum of this term frequency over all terms in the document | | NICTA | No |
14 | Top 5 candidates’ | As 9 using SNOMED-CT-AUa | | Ontoserver | No |
15 | Top mapping’ | As 10 using SNOMED-CT-AUa | | Ontoserver | No |
16 | Top 5 candidates’’ | As 9 using AMTb | | Ontoserver | No |
17 | Top mapping’’ | As 10 using AMTb | | Ontoserver | No |
a SNOMED-CT-AU = Systematized Nomenclature of Medicine - Clinical Terms - Australian Release
b AMT = Australian Medicines Terminology
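Two of the statistical features above, the ten-point location (feature 12) and the normalized term frequency (feature 13), can be sketched as follows; the exact binning and tokenization are assumptions of this illustration.

```python
# Sketch of the location (feature 12) and normalized term frequency
# (feature 13) features; binning and tokenization are illustrative choices.
from collections import Counter


def location_feature(index: int, n_words: int) -> int:
    """Map a 0-based word index to a 1-10 scale over the document."""
    return min(10, int(index / n_words * 10) + 1)


def normalized_term_frequency(words: list[str]) -> dict[str, float]:
    """Term frequency divided by the maximum term frequency in the document."""
    counts = Counter(w.lower() for w in words)
    max_count = max(counts.values())
    return {term: count / max_count for term, count in counts.items()}


words = "In bed 5 we have Ken Harris under Dr Gregor in bed 5".split()
print(location_feature(0, len(words)))               # 1 (first word)
print(location_feature(len(words) - 1, len(words)))  # 10 (last word)
print(normalized_term_frequency(words)["bed"])       # 1.0 (a most frequent term)
```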
To evaluate the system performance, we used cross-validation (CV), training on 100 documents and leaving one out for testing (ie,
In these evaluations, we measured the Precision, Recall, and F1 (ie, the harmonic mean of Precision and Recall) as implemented in
We also used two baseline systems: (1) the random baseline assigned a class to each word randomly and (2) the majority baseline assigned the most frequent class (ie,
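The sketch below illustrates the leave-one-out evaluation loop together with these two baselines, using scikit-learn's macro-averaged Precision, Recall, and F1 purely for illustration; our actual IE system was built with CRF++, and the toy labels are placeholders.

```python
# Sketch of document-level leave-one-out evaluation with the random and
# majority baselines; scikit-learn and the toy labels are illustrative.
import random
from collections import Counter

from sklearn.metrics import precision_recall_fscore_support

# One list of word-level category labels per document (toy stand-in data).
docs = [
    ["NA", "NA", "Bed", "GivenName", "Lastname"],
    ["GivenName", "NA", "NA", "Bed"],
    # ... the remaining documents
]

gold, majority_pred, random_pred = [], [], []
for held_out in range(len(docs)):                     # leave-one-out CV
    train = [lab for i, d in enumerate(docs) if i != held_out for lab in d]
    majority_label = Counter(train).most_common(1)[0][0]
    label_set = sorted(set(train))

    for label in docs[held_out]:
        gold.append(label)
        majority_pred.append(majority_label)          # majority baseline
        random_pred.append(random.choice(label_set))  # random baseline

# Macro-averaged Precision, Recall, and F1 over the categories.
for name, pred in (("majority", majority_pred), ("random", random_pred)):
    p, r, f1, _ = precision_recall_fscore_support(
        gold, pred, average="macro", zero_division=0)
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```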
Finally, to assess the stability and robustness of our categorization form, expert annotations, and IE system, we performed an experiment where our goal was to predict only the highest-level classification to the heading categories of
This experiment tested the null hypothesis of detailed annotations not being helpful for system performance. On the one hand, if we gained evidence to support the alternative hypothesis of detailed annotations being helpful, we would need to divide the more loosely defined and verbose categories (eg,
In any case, even though it was more laborious to annotate free-form text with respect to the fifty categories of our form versus using the seven heading-level categories only, the automatically generated structured documents enabled by these more detailed annotations have many benefits. Namely, they support document reuse in computerized decision making and surveillance in health care better than the loosely classified documents.
The released dataset, called NICTA Synthetic Nursing Handover Data [
Descriptive statistics of the dataset are given in
The licensing constraints were set as follows: the license of the spoken, free-form text documents (ie, WMA and WAV files) was set as “Creative Commons - Attribution Alone - Noncommercial - No Derivative Works” [
All documents were made publicly available on the Internet. They will be used in the CLEFeHealth 2015 evaluation laboratory for a shared task on SR [
The technical pipeline (ie, recorded voice, transcription, analysis) has been validated in clinical settings and published [
Although the data we provided are a simulation of nursing handover, the written text for the handover scenario was based upon 150 live audio recordings of nursing handover in several Sydney-based hospitals [
Finally, the technical performance, including the suitability of different vocabularies for SR and the features resulting in the best IE system, was also similar [
Descriptive statistics of the 100 written, free-form text documents produced by the RN.
Descriptor | Subdescriptor | Patient type: Cardiovascular | Patient type: Neurological | Patient type: Renal | Patient type: Respiratory | All |
Documents | Number of documents | 25 | 25 | 25 | 25 | 100 |
 | Number of words | 1795 | 1545 | 1818 | 2119 | 7277 |
 | Number of unique words | 556 | 500 | 496 | 604 | 1304 |
 | Number of inside words | 1140 | 1006 | 1086 | 1305 | 4547 |
 | Number of unique inside words | 447 | 397 | 408 | 483 | 1106 |
Number of words in a document | Minimum | 19 | 26 | 29 | 31 | 19 |
 | Maximum | 162 | 106 | 149 | 209 | 209 |
 | Mean | 70 | 60 | 71 | 83 | 71 |
 | SD | 37 | 22 | 33 | 39 | 34 |
Top 10 words in documents | 1st (n)a | and (95) | and (64) | and (88) | and (100) | and (347) |
 | 2nd (n)a | he (59) | is (60) | is (72) | is (69) | is (256) |
 | 3rd (n)a | for (58) | he (54) | he (69) | on (63) | he (243) |
 | 4th (n)a | is (55) | she (38) | is (46) | he (61) | in (170) |
 | 5th (n)a | the (43) | in (35) | she (46) | with (51) | for (163) |
 | 6th (n)a | with (43) | with (34) | the (38) | in (49) | with (162) |
 | 7th (n)a | in (40) | on (33) | with (34) | for (43) | she (151) |
 | 8th (n)a | to (32) | for (31) | came (32) | she (42) | on (141) |
 | 9th (n)a | of (30) | to (29) | for (31) | the (37) | the (138) |
 | 10th (n)a | came (27) | came (24) | to (30) | to (33) | to (124) |
Top 10 inside words in documents | 1st (n)a | he (57) | he (52) | he (63) | and (51) | he (220) |
 | 2nd (n)a | for (47) | she (35) | she (39) | he (48) | she (139) |
 | 3rd (n)a | and (26) | for (25) | and (34) | she (40) | and (131) |
 | 4th (n)a | bed (25) | dr (22) | bed (24) | for (27) | for (118) |
 | 5th (n)a | she (25) | and (20) | is (24) | dr (25) | dr (88) |
 | 6th (n)a | dr (23) | old (20) | to (23) | is (20) | to (84) |
 | 7th (n)a | to (22) | bed (19) | old (21) | on (20) | bed (80) |
 | 8th (n)a | the (21) | to (19) | yrs (21) | to (20) | is (76) |
 | 9th (n)a | her (18) | yrs (17) | all (20) | room (18) | old (72) |
 | 10th (n)a | old (18) | her (16) | for (19) | of (16) | all (61) |
a The notation “word, n” specifies that the word “word” occurred “n” times.
Descriptive statistics of the 101 written documents (ie, the sample document and the 100 documents produced by the RN) used in information extraction.
Descriptor | Subdescriptor | Sample document | Patient type: Cardiovascular | Patient type: Neurological | Patient type: Renal | Patient type: Respiratory | All |
Documents | Number of documents | 1 | 25 | 25 | 25 | 25 | 101 |
 | Number of words | 167 | | | | | 8487 |
 | Number of unique lemmas | 92 | | | | | 1283 |
Number of words in a document | Minimum | 167 | 26 | 37 | 35 | 32 | 26 |
 | Maximum | 167 | 181 | 170 | 239 | 120 | 239 |
 | Mean | 167 | 80.80 | 82.24 | 98.12 | 71.96 | 84.10 |
 | SD | 0 | 38.70 | 35.24 | 43.46 | 24.06 | 38.02 |
Number of unique lemmas in documents | Minimum | 92 | 22 | 22 | 27 | 27 | 22 |
 | Maximum | 92 | 99 | 96 | 126 | 79 | 126 |
 | Mean | 92 | 53.64 | 54.48 | 63.84 | 48.60 | 55.50 |
 | SD | 0 | 19.83 | 17.44 | 21.84 | 12.80 | 19.35 |
Top 10 lemmas in documents | 1st (n)a | be (15) | be (115) | be (119) | be (126) | be (111) | |
 | 2nd (n)a | he (13) | and (95) | he (95) | and (100) | he (68) | |
 | 3rd (n)a | and (4) | he (75) | and (88) | he (79) | and (64) | |
 | 4th (n)a | to (4) | for (58) | she (63) | on (63) | she (57) | |
 | 5th (n)a | a (3) | she (44) | in (46) | she (59) | in (35) | |
 | 6th (n)a | headache (3) | the (43) | the (38) | with (51) | with (34) | |
 | 7th (n)a | it (3) | with (43) | have (36) | in (49) | on (33) | |
 | 8th (n)a | that (3) | in (40) | with (34) | for (43) | for (31) | |
 | 9th (n)a | the (3) | to (32) | come (33) | the (37) | to (29) | |
 | 10th (n)a | carotid (2) | of (30) | for (31) | to (33) | have (26) | |
Number of highlighted text snippets in a document | Minimum | | | | | | 8 |
 | Maximum | | | | | | 33 |
 | Mean | | | | | | 16.15 |
 | SD | | | | | | 5.29 |
a The notation “word, n” specifies that the word “word” occurred “n” times.
The best vocabulary for SR was
When considering the different patient types and the
In text relevant to the form, 836 unique errors were present when using the
Speech recognition performance with the vocabularies of general, medical, nursing, cardiology, neurology, and pulmonary disease illustrated as a summary over the 100 documents. The notation of the x axis details the mean and SD for each Dragon vocabulary.
TRANSCRIPTION OF SPOKEN, FREE-FORM TEXT DOCUMENT:
“On a bed three is Ken Harris, 71 years old under Dr Gregor. He came in with arrhythmia. He complained of chest pain this morning and ECG was and was reviewed by the team. He was given some anginine and morphine for the pain and he is still tachycardic and new meds have been ordered in the medchart. Still for pulse checks for one full minute. Still awaiting for echo this afternoon. His blood pressure is just normal though he is scoring MEWS of three for the tachycardia. Otherwise he still for monitoring.”
WRITTEN, FREE-FORM TEXT DOCUMENT:
“Ken harris, bed three, 71 yrs old under Dr Gregor, came in with arrhythmia. He complained of chest pain this am and ECG was done and was reviewed by the team. He was given some anginine and morphine for the pain. Still tachycardic and new meds have been ordered in the medchart. still for pulse checks for one full minute. Still awaiting echo this afternoon. His BP is just normal though he is scoring MEWS of 3 for the tachycardia. He is still for monitoring.”
WRITTEN, SPEECH-RECOGNIZED, FREE-FORM TEXT DOCUMENT USING THE NURSING VOCABULARY:
“Own now on bed 3 he is then Harry 70 is 71 years old under Dr Greco he came in with arrhythmia he complained of chest pain this morning in ECG was done and reviewed by the team he was given some and leaning in morphine for the pain in she is still tachycardic in new meds have been ordered in the bedtime is still 4 hours checks for one full minute are still waiting for echocardiogram this afternoon he is BP is just normal though he is scarring meals of 3 for the tachycardia larger otherwise he still for more new taurine.”
Our best IE system classified 6349 out of the 8487 words correctly with respect to the 36 categories present in the RS (
Most frequent category confusions related to irrelevant words (
In comparison, the majority baseline achieved a very modest overall performance (macro-averaged Precision, Recall, and F1 of 0.051, 0.091, and 0.065 over the 35 form categories and zero Precision, Recall, and F1 for
Each system feature contributed to the 36 categories differently (see
In the highest-level classification task with all but the
Automatically structured text that corresponds to our example document (
Learning curves for cross-validation settings that included training set sizes of 20, 40, 60, 80, and 100 (ie, leave-one-out) documents with mutually exclusive folds, which in combination covered all data. CV: cross validation; and LOO: leave one out.
Confusion matrix between the reference standard (rows) and our best information extraction system (columns) in the 36-class multi-class classification task. Zero columns of 10, 15, 18, 20, 24, 25, and 28 have been removed for space constraints. For clarity, diagonal elements have been emphasized, and zero elements have been left empty. The category numbering corresponds to
To demonstrate the SR and IE system design and workflow, we implemented a Web app, written in HyperText Markup Language version 5, to allow any Web browser to use it (
As an input, the app receives a form structure and an XML document, which includes all information needed to fill out this form. That is, the input has typed or speech-recognized text documents and their word-by-word classification with respect to the form categories.
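For illustration, a minimal input of this kind could be built as in the sketch below; the element and attribute names are hypothetical and do not necessarily match the schema used by the released app.

```python
# Hypothetical sketch of the app input: speech-recognized text with a
# word-by-word classification into form categories. Element and attribute
# names are illustrative only.
import xml.etree.ElementTree as ET

report = ET.Element("report", id="sample-001")
for token, category in [
    ("Ken", "GivenName"),
    ("Harris", "Lastname"),
    ("bed", "NA"),
    ("three", "Bed"),
]:
    word = ET.SubElement(report, "word", category=category)
    word.text = token

print(ET.tostring(report, encoding="unicode"))
# <report id="sample-001"><word category="GivenName">Ken</word>...
```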
The user (eg, a nurse) can choose a report to be structured from the
Extending the app to other IE tasks is straightforward by simply updating the input. However, we need to emphasize that this app performs visualization and not processing. That is, the spoken documents need to be converted to writing (by typing or SR) and classified with respect to the form structure (by manual highlighting or automated IE) in advance.
SR has not been included in the app. This is mainly because of the licensing constraints related to using a domain-specialized SR method (for a Microsoft Windows computer) that also needs to be trained to each speaker individually. The need to be able to demonstrate the app in noisy conference, technology festival, and other showcase environments also contributed to this decision.
National Information and Communications Technology, Australia (NICTA) speech to clinical text demonstration system that visualizes the example record.
Cascaded SR and IE to fill out a handover form for clinical proofing and sign-off provide a way to make clinical documentation more effective and efficient. This also improves the accessibility and availability of existing documents for clinical judgment, situational awareness, and decision making, and thereby contributes to health care quality and people’s health.
This cascading also evokes fruitful research challenges. First, conducting SR on clinical wards with noisy backgrounds and accented speakers is much more difficult than in a peaceful office. Second, SR errors multiply when cascaded with IE. Third, every system error may have severe implications for clinical decision making. However, neither shared evaluation sets nor baseline methods exist for this task.
In this paper, we have opened realistic, but synthetic, data, methods, and evaluations related to clinical handover, SR, and IE to the research community in order to stimulate research and track continuous performance improvements over time. We have also introduced a Web app to demonstrate the system design and workflow.
A real hospital setting cannot be idealized or modeled in a laboratory. Although we have attempted to capture the main components of a nursing handover scenario, there are several limitations in the data.
These limitations represent opportunities for future data gathering exercises. First, we used a single narrative voice rather than a team environment. In order to further develop any real system, collection of multiple voices communicating in a group setting is needed. Second, we did not include patient responses. In the recorded data from real nursing scenarios, patients rarely contributed to the conversation. Third, the data comprise 100 full verbatim documents. This provides low power for any statistical analysis, and hence more data are always beneficial.
A detailed performance evaluation and error analysis of the system as a whole (ie, extrinsic evaluation) and each of its components (ie, intrinsic evaluation) is a crucial step in the development of cascaded pipeline apps [
These rates of sound-alike SR errors and slightly incorrect highlighting boundaries are not likely to harm a document’s human readability. This is because the context around the highlighted text snippets is likely to assist in reading the text correctly. However, the extrinsic performance of this cascaded system remains to be formally evaluated.
Every corrected error is one less potential error in clinical decision making, and in SR, a substantial number of errors occur between words that are phonetically similar to each other. Based on our error analysis, the correction method should consider the following five characteristics: (1) PS between words or word sequences; (2) detection and correction of errors in proper names by using, for example, other parts of a given patient’s record; (3) the difference between single-word and multi-word errors; (4) proofing for spelling and grammar; and (5) clear marking of automatically corrected words and the possibility to choose a correction candidate interactively from a ranked list.
Clinical SR has resulted in 1.3-5.7 times faster turnover times in scientific studies [
Clinical SR achieves an impressive word correctness percentage of 90-99 with only 30 to 60 minutes of training on a given clinician’s speech. In other words, correcting SR errors by hand as a part of proofing is not likely to be time consuming. This recognition rate is supported by studies using the speech of twelve US-English male physicians on two medical progress notes, one assessment summary, and one discharge summary [
Similar to the good correctness of clinical SR, clinical IE has gradually improved to exceed an F1 of 0.90 in 1995-2008 [
The benefits of the combined use of SR and IE for handover documentation are twofold [
Additional illustrations and guidelines for creating showcase data.
ANA: American Nurses Association
CLEF: Conference and Labs of the Evaluation Forum
CRF: conditional random field
CV: cross validation
IE: information extraction
LOO: leave one out
NA: not applicable
NER: named entity recognition
NICTA: National Information and Communications Technology, Australia
NLP: natural language processing
POS: part of speech
PS: phonetic similarity
RN: registered nurse
RS: reference standard
SR: speech recognition
WAV: WAVeform (audio format)
WMA: Windows Media Audio
The Australian Government, through the Department of Communications, and the Australian Research Council, through the Information and Communications Technology Centre of Excellence Program, fund NICTA. NICTA is also funded and supported by the Australian Capital Territory, the New South Wales, Queensland, and Victorian Governments, the Australian National University, the University of New South Wales, the University of Melbourne, the University of Queensland, the University of Sydney, Griffith University, Queensland University of Technology, Monash University, and other university partners. We used the Protégé resource, which is supported by grant GM10331601 from the National Institute of General Medical Sciences of the United States National Institutes of Health. We express our gratitude to Maricel Angel, RN at NICTA, for helping us to create the datasets for SR and IE. We acknowledge the contribution of Andrea Lau and Jack Zhao at Small Multiples for implementing our demonstration system. The first and third authors conceptualized the study, justified its significance, defined the form categories, and developed our demonstration system. The first author designed and supervised the process of creating realistic, but synthetic datasets as well as performed all SR experiments. Together with the last author, she analyzed SR errors and feasibility of phonetic similarity for their correction. The first two authors conducted all work related to IE. Together with the third author, the first author drafted the manuscript, and after this, all authors critically commented and revised it.
None declared.