This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
As electronic medical records (EMRs) grow in size and complexity, there is increasing need for automated EMR tools that highlight the medical record items most germane to a practitioner’s task-specific needs. The development of such tools would be aided by gold standards of information relevance for a series of different clinical scenarios. We have previously proposed a process in which exemplar medical record data are extracted from actual patients’ EMRs, anonymized, and presented to clinical experts, who then score each medical record item for its relevance to a specific clinical scenario. In this paper, we present how that body of expert relevancy data can be used to create a test framework to validate new EMR search strategies.
As electronic medical records (EMRs) become more common throughout the medical community, a wider variety of structured and unstructured data are being incorporated into them. Increasing EMR content has meant that some data necessary for clinical decision making are spread among several documents and repositories. This has the potential to increase practitioner workload, predispose to medical errors, and result in unnecessary utilization of health care resources [
Task-specific EMR search algorithms could ameliorate this situation by better addressing the diverse needs of practitioners [
In this paper, we describe how the expert relevancy ratings data can be employed as a test framework to validate search strategies. We include proposed formats for transmitting data between separate steps and a preliminary algorithm for assessing the concordance between the “hits” from a search strategy and the expert relevance ratings.
There are three main subprocesses that are required to implement this vision for any given clinical scenario (
The flow of data through a process of validated medical record searches for a specific clinical context. For a defined clinical context, a set of representative patients is selected and medical record items are extracted and anonymized. These datasets are then presented to a panel of domain experts who generate a set of rating data. Meanwhile, an automated search to highlight relevant items is designed and then run against all of the anonymized medical record data to determine which items would be considered “hits.” This result set is then compared with the expert relevance ratings, and a normalized score quantifying the level of agreement between the search and the experts is generated; this score can then be used to design improvements to the search.
For a given clinical scenario, a set of matching patients can be selected. A sample of matching medical record items can then be extracted from the EMR system and anonymized. This set of medical record items for one patient is deemed a scenario, and can be expressed as an eXtensible Markup Language (XML) data file matching a RELAX NG Compact open source schema, found at the referenced link [
A set of such per-patient scenarios that exemplify a single clinical scenario is termed a <scenario_family>, as defined by the open source schema found at the referenced link [
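The exact element names are defined by the referenced RELAX NG Compact schemas. As an illustrative sketch only, a per-patient scenario and its enclosing scenario family might be assembled along these lines (all element and attribute names below are assumptions for illustration, not the published schema):

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical structure -- the real element and attribute names come
# from the RELAX NG Compact schemas at the referenced links.
family = ET.Element("scenario_family", id=str(uuid.uuid4()),
                    description="Interpreting an MRI examination of the liver")

# One per-patient scenario containing anonymized medical record items,
# each carrying its own universally unique identifier.
scenario = ET.SubElement(family, "scenario", id=str(uuid.uuid4()))
for content in ("CT abdomen report (anonymized)", "ALT 142 U/L"):
    item = ET.SubElement(scenario, "medical_record_item",
                         id=str(uuid.uuid4()))
    item.text = content

xml_text = ET.tostring(family, encoding="unicode")
```

The per-item UUIDs matter later: they are what lets rating data and search result sets refer unambiguously back to individual medical record items.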
Once a set of medical record data is available, it can be presented to a group of experts. The expert panel is made up of clinicians from the particular medical specialty tasked with the clinical scenario of interest. For example, if the method were being employed to identify medical record items pertinent to the clinical task of interpreting an MRI examination of the liver, the expert panel would be made up of abdominal radiologists knowledgeable in the clinical information germane to that task. The experts rate each item for its relevance to the particular scenario on a four-step scale, with steps labeled “Irrelevant,” “Unlikely relevant,” “Probably relevant,” and “Certainly relevant.” These rating data can be gathered into an XML file that matches the open source schema, found at the referenced link [
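Because a search strategy returns binary hit/no-hit results, the four-step ordinal scale must at some point be collapsed to relevant/not-relevant for comparison. The numeric coding and the threshold below are illustrative assumptions, not part of the published schema:

```python
# The paper's four-step ordinal relevance scale.  The numeric codes and
# the dichotomization threshold are illustrative assumptions.
RELEVANCE_SCALE = {
    "Irrelevant": 0,
    "Unlikely relevant": 1,
    "Probably relevant": 2,
    "Certainly relevant": 3,
}

def dichotomize(label: str) -> bool:
    """Collapse an ordinal rating to relevant/not-relevant so it can be
    compared with a search strategy's binary hit/no-hit result."""
    return RELEVANCE_SCALE[label] >= RELEVANCE_SCALE["Probably relevant"]
```

For example, `dichotomize("Probably relevant")` returns `True`, while `dichotomize("Unlikely relevant")` returns `False`.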
The results of running a given search strategy against the medical record items contained in a <scenario_family> can be represented using an open source schema found at the referenced link [
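Conceptually, a strategy's result set for one scenario reduces to the set of medical record item identifiers it flagged as hits. The real interchange format is the open source schema at the referenced link; the sketch below only illustrates that underlying idea:

```python
import uuid

# Illustrative only: four item UUIDs from one anonymized scenario, of
# which the hypothetical search strategy flagged the first two as hits.
item_ids = [str(uuid.uuid4()) for _ in range(4)]
search_hits = set(item_ids[:2])

# A parallel boolean vector of hit/no-hit results, ready to be compared
# against the experts' (dichotomized) relevance ratings for those items.
hit_flags = [item_id in search_hits for item_id in item_ids]
```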
A scoring metric was developed for describing the extent of agreement between results returned by a particular search strategy and the expert rating data. The strategy is based on calculation of the kappa statistic [
The overall performance of the search strategy is captured by a single metric,
A metric for the degree of concordance only for relevant included items,
These metrics can be represented according to the open source schema, available at the referenced link [
Equation to calculate a performance score for a search strategy based on the expert relevancy ratings.
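The paper's performance score builds on the kappa statistic. As a hedged sketch only (this is the standard Cohen's kappa formula, not the authors' exact equation, which is given in the figure above), agreement between a search's binary hits and dichotomized expert ratings can be computed as:

```python
def cohens_kappa(search_hits: list, expert_relevant: list) -> float:
    """Standard Cohen's kappa between two parallel boolean sequences:
    whether the search flagged each item, and whether the experts deemed
    it relevant.  Offered as an illustration of the kappa-based approach;
    the paper's actual performance score has its own equation."""
    assert len(search_hits) == len(expert_relevant)
    n = len(search_hits)
    # Observed agreement: fraction of items where search and experts agree.
    p_o = sum(s == e for s, e in zip(search_hits, expert_relevant)) / n
    # Expected chance agreement, from the two raters' marginal proportions.
    p_search = sum(search_hits) / n
    p_expert = sum(expert_relevant) / n
    p_e = p_search * p_expert + (1 - p_search) * (1 - p_expert)
    return (p_o - p_e) / (1 - p_e)
```

For example, with hits `[True, True, False, False]` against expert relevance `[True, False, False, False]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5; perfect agreement yields 1.0.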
Federal subsidies in the Health Information Technology for Economic and Clinical Health Act have essentially ensured that EMRs will become commonplace in US health care facilities [
The process described herein would allow for the use of an interactive search strategy design tool. After loading the sample medical record data and relevance ratings, the designer could modify a search strategy’s metadata conditions and regular expressions and assess the overall performance changes. The tool could also be engineered to allow the designer to drill into the result set to find exemplars of the items that result in a mismatch of relevance ratings and search results. When an optimized search strategy is found, it is essentially prevalidated.
One advantage to the process outlined above is that by basing the sample data on real patient medical records and physicians’ specific impressions of which items are useful in a particular context, a very specific, detailed model of relevance is created which simple search heuristics are unlikely to capture well. As search strategy developers add complexity to their tools, they will be able to tell whether modifications are actually resulting in better matching.
The datasets and relevance tools can be shared, and even made semipublicly available. The universally unique identifiers attached to the scenario families, scenarios, medical record items, and raters minimize the chances of duplicated data. Individual sites can add their own patient data to already specified clinical scenarios and recheck performance given their site-specific sample data. Adding new raters and incorporating their responses can reduce the effect of individual raters’ idiosyncrasies. The library of clinical scenarios can be expanded over time and shared.
The initial conception of the tool was to aid radiologists who desire relevant medical record information at the time of interpretation. However, many medical practitioners would benefit from having relevant items in the medical record highlighted for them, especially if the tool’s accuracy in including relevant items and excluding irrelevant ones is high. Additionally, these context-specific search strategies represent potentially powerful research tools, specifically related to outcomes tracking [
There are many limitations to the search strategy validation process as described. First, the process of collecting the expert relevancy ratings is only semiautomated and therefore time intensive. Collection of the data requires clinical personnel, many of whom are already stretched thin and working in an atmosphere of shrinking margins, to take time away from clinical duties to perform the relevancy rating. The long-term viability of this semiautomated process requires further study and continuous process improvements to reduce the impact on experts. Second, the process relies completely on relevancy ratings communicated using a nondichotomous, ordinal scale of values. As a result, the method of data collection and subsequent validation framework fails to capture potentially valuable qualitative feedback from expert raters. Future work could add further nuance to the validation framework by incorporating qualitative feedback, such as free text entries from expert raters. Last, since this work only proposes and lays out this process, future work will be needed to validate the method of calculating the performance score and to determine whether search strategies validated by the process are actually deemed useful by clinical providers in their daily practice. We expect that this calculated search strategy performance score will be only one component of evaluating and improving search strategies. Other important metrics of performance, as well as the subjective experience of the returned results, should also be considered when evaluating automated search strategies deployed for clinical use.
In this paper, we have outlined a process for developing and validating context-specific search strategies based on context-specific expert relevancy ratings. Since both the method for collecting the expert relevancy ratings and the framework for validating search strategies are provided as open-source tools with open formats for data interchange, any research group or commercial entity can develop software to bring data into this process and perform the proposed steps. We anticipate that the formats and process will be further refined over time as it is adapted to new tasks and clinical applications.
EMR: electronic medical record
XML: eXtensible Markup Language
None declared.