This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Inclusion criteria for observational studies frequently contain temporal entities and relations. The use of digital phenotypes to create cohorts in electronic health record–based observational studies requires rich functionality to capture these temporal entities and relations. However, such functionality is usually unavailable, or it requires complex database queries and specialized expertise to build.
The purpose of this study is to systematically assess observational studies reported in critical care literature to capture design requirements and functionalities for a graphical temporal abstraction-based digital phenotyping tool.
We iteratively extracted attributes describing patients, interventions, and clinical outcomes. We qualitatively synthesized studies, identifying all temporal and nontemporal entities and relations.
We extracted data from 28 primary studies and 362 temporal and nontemporal entities. We generated a synthesis of entities, relations, and design patterns.
We report on the observed types of clinical temporal entities and their relations as well as design requirements for a temporal abstraction-based digital phenotyping system. The results can be used to inform the development of such a system.
The increasing costs of health care [
The broad adoption of electronic health records (EHRs) [
“Good data” is a relative concept [
The value of structured data lies in its capacity to be computed without major processing; several attempts have therefore been made to overcome the lack of structured clinical data. A review by Shivade et al [
Despite recent advances in the area, cohort-building systems require significant effort to develop and test. In practice, the most commonly used strategy for dealing with limited EHR data quality is to combine simple rules with manual verification of clinical data from patient records [
Clinical researchers face many barriers when querying clinical databases to find patients who match a specific cohort definition. One problem is that querying clinical databases is a complex task requiring multiple interactions between clinical researchers and database experts. A further complexity is that inclusion criteria frequently define temporal patterns of clinical events, which require convoluted temporal database queries [
In this study, we systematically reviewed the critical care literature to characterize the temporal representation of inclusion criteria, interventions, and outcomes, used by clinical researchers when designing a clinical study. The product of this review is a set of basic temporal entities, temporal relations, and the resulting temporal phenotype design patterns. The results can be used to inform the design of temporal abstraction-based digital phenotyping systems.
We conducted a systematic literature review of published articles in the critical care domain. Using the Web of Science Journal Citation Reports, we selected the top 5 critical care journals according to their impact factor. Paired reviewers (MB, CD, JT, and JS) manually reviewed all publications and decided on inclusion or exclusion according to the criteria described in the following section. Disagreements were resolved by consensus.
We included retrospective studies conducted in intensive care unit settings which used data obtained from EHRs, clinical databases generated from EHRs, or through manual chart abstractions. We excluded studies which presented exclusively outpatient or emergency department data.
For every included study, paired reviewers (MB, CD, JT, and JS) manually identified and extracted, using a purpose-built online form, all elements characterizing the study’s inclusion criteria, the interventions or exposures being studied (or the comparison group), and the primary outcomes as defined by the original study authors, following the Patient/Population, Intervention, Comparison, Outcome (PICO) framework [
Overview of the data extraction process.
When possible, if an interval or instant was itself an abstraction of lower-level concepts, it was decomposed into its parts according to the description explicitly provided or cited by the authors. If there were no details in the paper, we used standard definitions, when available. For example, when a systemic inflammatory response syndrome [
To minimize variability in the data extraction process, all researchers completed an initial training period. Researchers classified the identified elements (inclusion criteria, interventions or exposures, and outcomes) using the framework described above. Discrepancies in concept extraction and temporal representation were resolved by group agreement. We computed descriptive statistics for the extracted concepts and temporal elements.
The abstraction process was conducted iteratively and continued until the point of saturation. We predefined saturation as being met when including additional studies did not add any new types of temporal elements.
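The saturation rule can be expressed as a simple stopping check. The sketch below is illustrative only: the `patience` parameter (how many consecutive studies must add nothing new before stopping) and the element type labels are assumptions, since the review states the rule qualitatively and does not specify such a count.

```python
from typing import List, Optional, Set


def studies_until_saturation(types_per_study: List[Set[str]],
                             patience: int = 3) -> Optional[int]:
    """Return how many studies were reviewed when saturation was declared.

    Saturation is declared once `patience` consecutive studies contribute no
    new type of temporal element. `patience` is a hypothetical parameter.
    Returns None if saturation is never reached.
    """
    seen: Set[str] = set()
    without_new = 0
    for reviewed, types in enumerate(types_per_study, start=1):
        new_types = types - seen
        seen |= types
        without_new = 0 if new_types else without_new + 1
        if without_new >= patience:
            return reviewed
    return None


# Hypothetical element types found in seven consecutively reviewed studies.
example_studies = [
    {"instant"},
    {"instant", "bounded interval"},
    {"moving window interval"},
    {"instant"},
    {"bounded interval"},
    {"instant"},
    {"instant", "bounded interval"},
]
saturated_after = studies_until_saturation(example_studies, patience=3)
```

In this toy sequence the last new type appears in the third study, so the check stops after the sixth.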
Researchers then systematically documented temporal and nontemporal relationships between the identified temporal elements. This allowed us to identify the temporal query design patterns present in the literature. Finally, we documented the functionality required for a novel temporal abstraction-based system to identify patient cohorts, interventions or exposures, and outcomes in large clinical databases.
After iteratively extracting clinical concepts, we reached the point of saturation, where no new types of temporal elements were identified, after reviewing 28 primary studies. We obtained a total of 362 clinical entities: 48.6% (n=176) were inclusion criteria, 24.3% (n=88) were interventions or exposures, and 27.1% (n=98) were outcomes. Abstracted entities were further classified into categories according to their clinical type, which are described, with examples, in
Categories, examples, and frequencies of identified clinical entities (N=362).
Classification | Example | Count, n (%) |
Therapeutic intervention | Drugs or procedures: vancomycin, orotracheal intubation | 95 (26.2) |
Laboratory/diagnostic tests | Serum creatinine, hematocrit | 75 (20.7) |
Vital signs | Body temperature, respiratory rate, central venous pressure | 41 (11.3) |
Diagnosis | Pneumonia, urinary infection | 35 (9.7) |
Patient location | Intensive care unit hospitalization, patient transfer | 26 (7.2) |
Clinical scores | APACHE IIa, Cerebral Performance Category | 25 (6.9) |
Nontemporal attribute | Sex, ethnicity | 17 (4.7) |
Death | In-hospital deaths, 30-day mortality | 15 (4.1) |
Physical examination finding | Pupil diameter, abdominal pain | 11 (3.0) |
Past medical history | History of trauma | 7 (1.9) |
Disposition | Discharge to home, institution, or other health center | 5 (1.4) |
Other | Appropriate antibiotic usage | 10 (2.8) |
aAPACHE II: Acute Physiology And Chronic Health Evaluation II.
Of the 362 abstracted entities, 328 could be classified as clinical instants or intervals. Most entities could be abstracted as instants (54.1%, 196/362). This type of abstraction represents a clinical event that has a timestamp but no duration. For example, one inclusion criterion in this category was “the presence of arterial lactate > 2.5 mmol/L.” A further 36.5% (132/362) of abstracted entities were intervals. This type of abstraction represents a clinical event whose duration, defined by a start and an end time, is greater than 0. An example of a clinical interval is “noninvasive mechanical ventilation for at least 48 hours.”
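As an illustration, the two basic entity types could be modeled as follows. This is a minimal sketch, not a prescribed schema: the class names, field names, and example timestamps are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Instant:
    """A clinical event with a timestamp but no duration."""
    concept: str
    time: datetime
    value: Optional[float] = None


@dataclass(frozen=True)
class Interval:
    """A clinical event with a start, an end, and a duration greater than 0."""
    concept: str
    start: datetime
    end: datetime

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("an interval must have a duration greater than 0")

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


# Instant example: one arterial lactate measurement checked against 2.5 mmol/L.
lactate = Instant("arterial lactate (mmol/L)", datetime(2020, 1, 1, 8, 0), 3.1)
meets_lactate_criterion = lactate.value is not None and lactate.value > 2.5

# Interval example: noninvasive mechanical ventilation for at least 48 hours.
niv = Interval("noninvasive mechanical ventilation",
               datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 3, 10, 0))
meets_niv_criterion = niv.duration >= timedelta(hours=48)
```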
Further analysis of clinical intervals showed that they can also be subdivided into 3 different categories:
Instant-based intervals: clinical intervals that are abstractions of identical instants. An example is a hypothermia interval, which is an abstraction of multiple instants of low body temperature measurements. Sometimes specific conditions have to be met to abstract this kind of interval, for example, a time interval during which a patient receives more than 100 mL/hour of intravenous fluids. On other occasions, the instants were used only as categorical variables, regardless of the quantity, for example, a patient receiving normal saline infusion.
Bounded intervals: clinical intervals that are abstractions of specific instants defining their start and end times. An example of this is a hospitalization interval, where the start is defined by an admission instant and the end is defined by a discharge instant. Additional arithmetic operations may need to be applied to these intervals, for example, a clinical interval describing a hospitalization longer than 7 days.
Moving window intervals: clinical intervals where a specific condition needs to be met during a predefined window of time. An example of this was an oliguria interval, in which the condition oliguria (urinary output < 0.5 mL/kg/hour) has to be met during a 6-hour window. This denomination is consistent with previous descriptions [
Graphic examples of the 3 types of intervals are presented in
Three observed categories of clinical temporal intervals.
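The 3 interval categories could be abstracted from timestamped data roughly as sketched below. The function names, the `max_gap` parameter, the 35.0 °C hypothermia threshold, and the assumption that a window is anchored at a reading and must be fully covered by data are all choices made for this example, not requirements stated by the reviewed studies.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Reading = Tuple[datetime, float]   # (timestamp, value) of one instant
Span = Tuple[datetime, datetime]   # (start, end) of one interval


def instant_based(readings: List[Reading], condition: Callable[[float], bool],
                  max_gap: timedelta) -> List[Span]:
    """Merge consecutive instants that satisfy `condition` into intervals.

    A run closes when a reading fails the condition or the gap since the
    previous qualifying reading exceeds `max_gap` (hypothetical parameter).
    Single-instant runs have no duration and are dropped.
    """
    spans: List[Span] = []
    start = last = None
    for time, value in sorted(readings):
        ok = condition(value)
        if ok and start is not None and time - last <= max_gap:
            last = time
            continue
        if start is not None and last > start:
            spans.append((start, last))
        start, last = (time, time) if ok else (None, None)
    if start is not None and last > start:
        spans.append((start, last))
    return spans


def bounded(start_instant: datetime, end_instant: datetime) -> Span:
    """Build an interval from the two instants that define its limits."""
    if end_instant <= start_instant:
        raise ValueError("the end instant must come after the start instant")
    return (start_instant, end_instant)


def moving_window(readings: List[Reading], condition: Callable[[float], bool],
                  window: timedelta) -> List[Span]:
    """Return each window, anchored at a reading, in which every reading meets
    the condition and the data extend to the end of the window."""
    ordered = sorted(readings)
    spans: List[Span] = []
    for start, _ in ordered:
        end = start + window
        inside = [v for t, v in ordered if start <= t <= end]
        if ordered[-1][0] >= end and all(condition(v) for v in inside):
            spans.append((start, end))
    return spans


t0 = datetime(2020, 1, 1, 0, 0)
hour = timedelta(hours=1)

# Instant-based: hourly body temperatures (degrees Celsius).
temps = [(t0 + i * hour, v) for i, v in enumerate([36.8, 34.6, 34.9, 34.8, 36.5])]
hypothermia = instant_based(temps, lambda v: v < 35.0, max_gap=2 * hour)

# Bounded: hospitalization from admission to discharge, longer than 7 days.
stay = bounded(datetime(2020, 1, 1), datetime(2020, 1, 10))
long_stay = stay[1] - stay[0] > timedelta(days=7)

# Moving window: urinary output (mL/kg/hour) below 0.5 throughout 6 hours.
urine = [(t0 + i * hour, v)
         for i, v in enumerate([0.8, 0.4, 0.3, 0.4, 0.2, 0.3, 0.4, 0.3, 0.9])]
oliguria = moving_window(urine, lambda v: v < 0.5, window=6 * hour)
```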
In a small subset of intervals, arithmetic calculations were needed to abstract them correctly. For example, calculating an interval of pulse pressure variation (PPV) within a defined range would require calculating PPV (%) = 100 × 2 × (PPmax − PPmin)/(PPmax + PPmin) at each instant before executing the abstraction. Other within-interval calculations included counting the number of instants occurring inside an abstracted interval. An example would be an outcome defined as the number of chest x-rays performed on each patient during their stay in the intensive care unit: the interval is of type bounded (admission to and discharge from the intensive care unit), and the instants of interest (chest x-rays) occurring within it need to be counted.
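Both kinds of within-interval calculation are small enough to sketch directly. The function names and the example values below are hypothetical; whether the interval limits are inclusive is also an assumption of this sketch.

```python
from datetime import datetime
from typing import List, Tuple


def ppv_percent(pp_max: float, pp_min: float) -> float:
    """PPV (%) = 100 x 2 x (PPmax - PPmin) / (PPmax + PPmin)."""
    return 100.0 * 2.0 * (pp_max - pp_min) / (pp_max + pp_min)


def count_within(instants: List[datetime],
                 interval: Tuple[datetime, datetime]) -> int:
    """Count the instants that fall inside an interval (limits inclusive)."""
    start, end = interval
    return sum(1 for t in instants if start <= t <= end)


# PPV computed at one instant from the maximal and minimal pulse pressure (mm Hg).
ppv = ppv_percent(pp_max=44.0, pp_min=36.0)

# Chest x-rays performed during a bounded intensive care unit stay.
icu_stay = (datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 5, 0, 0))
chest_xrays = [
    datetime(2019, 12, 31, 22, 0),  # before admission, not counted
    datetime(2020, 1, 1, 12, 0),
    datetime(2020, 1, 3, 9, 0),
    datetime(2020, 1, 6, 8, 0),     # after discharge, not counted
]
xrays_during_stay = count_within(chest_xrays, icu_stay)
```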
We explored the temporal relations between instants and intervals and, as expected, all of them conformed to the temporal logic described by Allen [
Examples of temporal relations.
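For reference, the 13 interval relations of Allen's temporal logic can be determined by comparing interval endpoints. The sketch below covers the interval-to-interval case only, represents times as plain numbers (for example, hours since admission), and uses relation labels chosen for this example.

```python
from typing import Tuple

Span = Tuple[float, float]  # (start, end), with start < end


def allen_relation(a: Span, b: Span) -> str:
    """Name the Allen relation that holds from interval a to interval b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # From here on, the two intervals share more than a single point.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical example: antibiotic course (hours 2 to 50) relative to an
# intensive care unit stay (hours 0 to 72).
relation = allen_relation((2, 50), (0, 72))
```

Here the antibiotic course starts after admission and ends before discharge, so the relation is "during".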
In addition, some intervals were constructed by combinations of Boolean relations between intervals and instants. For example, to adequately represent a pediatric sepsis interval as defined by the International Pediatric Sepsis Consensus Conference [
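Boolean combinations of intervals can be sketched as set operations on time. The example below is generic and does not encode any published clinical definition: the two criteria, their time spans, and the function names are hypothetical, and times are plain numbers such as hours since admission.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end)


def union(spans: List[Span]) -> List[Span]:
    """OR: merge overlapping or touching intervals."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(a: List[Span], b: List[Span]) -> List[Span]:
    """AND: keep only the time during which both conditions hold."""
    result: List[Span] = []
    for a_start, a_end in union(a):
        for b_start, b_end in union(b):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                result.append((start, end))
    return result


# Two hypothetical criteria, each already abstracted into intervals.
criterion_a = [(0.0, 10.0)]
criterion_b = [(4.0, 6.0), (8.0, 14.0)]

both = intersection(criterion_a, criterion_b)   # A AND B
either = union(criterion_a + criterion_b)       # A OR B
```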
Some of the extracted entities did not have a temporal component and were denominated nontemporal patient attributes. Examples of these are age, race, and sex.
Finally, 5.5% (20/362) of the extracted concepts could not be represented using the proposed framework. For example, the outcome “appropriate antimicrobial administration,” defined as whether the isolated bacteria were susceptible to the administered antibiotic, implies a qualitative interpretation of a laboratory examination, which is outside the scope of a temporal representation of clinical entities.
One additional functionality that was particularly salient was the need to perform nested queries. In a nested query, a query uses the output of another query as its input. Observational studies frequently explore the effect of a specific exposure; this study design involves creating 2 patient cohorts that are identical except for the exposure. When the outcome is assessed in both cohorts, a nested query is the most natural way to satisfy this requirement:
SELECT (Outcome Phenotype) FROM (Exposure Cohort)
In this case
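The nested pattern can be sketched as function composition, where the output of one selection is the input of the next. The `select` helper, the patient records, and the field names are hypothetical and stand in for whatever phenotype definitions a real system would provide.

```python
from typing import Callable, Dict, List

Patient = Dict[str, object]


def select(phenotype: Callable[[Patient], bool],
           source: List[Patient]) -> List[Patient]:
    """Return the patients in `source` that match `phenotype`."""
    return [patient for patient in source if phenotype(patient)]


def is_exposed(patient: Patient) -> bool:
    return bool(patient["exposed"])


def is_unexposed(patient: Patient) -> bool:
    return not patient["exposed"]


def has_outcome(patient: Patient) -> bool:
    return bool(patient["outcome"])


patients: List[Patient] = [
    {"id": 1, "exposed": True, "outcome": True},
    {"id": 2, "exposed": True, "outcome": False},
    {"id": 3, "exposed": False, "outcome": True},
    {"id": 4, "exposed": False, "outcome": False},
]

# Inner queries build the two cohorts; outer queries assess the outcome in each.
outcome_in_exposed = select(has_outcome, select(is_exposed, patients))
outcome_in_unexposed = select(has_outcome, select(is_unexposed, patients))
```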
The combination of temporal entities (instants and intervals), temporal relations, and nontemporal patient attributes can be used to describe the different observed patterns. A graphic description is presented in
Intervals can be temporally related to either instants or intervals. The same was observed for instants. We observed all 13 temporal relations described by Allen [
Intervals can use external variables as a condition to be met either before or after being abstracted. An example of the first case is the abstraction of an interval of reduced urinary output (<0.5 mL/kg/hour), in which each urinary output instant needs to be checked against the patient’s body weight (the external variable) before being added to the interval. An example for the second case could be
Examples of identified clinical temporal design patterns. ICU: intensive care unit.
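The first case, checking each instant against an external variable before abstraction, might look like the sketch below. It assumes that each urinary output reading is a volume collected over 1 hour and uses the 0.5 mL/kg/hour threshold from the oliguria example; the function name and the data are hypothetical.

```python
from typing import List, Tuple

OLIGURIA_THRESHOLD = 0.5  # mL/kg/hour


def reduced_output_hours(hourly_volumes_ml: List[Tuple[int, float]],
                         weight_kg: float,
                         threshold: float = OLIGURIA_THRESHOLD) -> List[int]:
    """Return the hours whose weight-normalized urinary output is below threshold.

    Body weight is the external variable: it is not part of the urinary output
    readings but is needed to decide whether each instant qualifies.
    """
    if weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return [hour for hour, volume_ml in hourly_volumes_ml
            if volume_ml / weight_kg < threshold]


# (hour, volume in mL collected during that hour)
volumes = [(0, 60.0), (1, 30.0), (2, 20.0), (3, 50.0)]

reduced_70kg = reduced_output_hours(volumes, weight_kg=70.0)
reduced_35kg = reduced_output_hours(volumes, weight_kg=35.0)
```

The same volumes qualify as reduced output at hours 1 and 2 for a 70-kg patient but at no hour for a 35-kg patient, which is why the weight has to be applied before the instants are merged into an interval.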
This study presents a systematic, literature-based assessment of design requirements to develop a temporal abstraction-based digital phenotyping tool. Such a tool would facilitate the conduct of retrospective clinical studies in critical care using routinely collected electronic clinical data by enabling a rich description of clinical phenotypes. Once validated, these temporally abstracted digital phenotypes should be able to correctly represent patient cohorts, clinical interventions or exposures, and relevant clinical outcomes. The iterative nature of this review, which was conducted until reaching information saturation, adds robustness to its findings.
The initial findings of this review are consistent with previous research describing the nature of temporal clinical entities, in the form of clinical instants and intervals [
First, this review shows that 3 subtypes of clinical intervals—instant-based, bounded, and moving window—are necessary to adequately represent digital phenotypes. Second, in addition to these interval subtypes, there is a need to perform calculations both within a clinical interval and with data external to the interval being abstracted. The third component involves the need to allow for nested queries when building digital phenotypes for observational studies. Other findings of this systematic review confirm the need to query for temporal relations and Boolean relations as described by Mo et al [
The main limitation of this review is its focus only on intensive care studies. We chose this setting given the temporal density of clinical data collected during critical care episodes. We cannot claim that these findings will be similar in other clinical domains; that statement would need to be explicitly verified in additional studies. A second limitation is the exclusion of inclusion criteria based on free text contained in clinical notes or reports. This was an explicit decision given our goal of designing a digital phenotyping system able to abstract higher-level concepts from structured data without relying on free text. We still need to demonstrate the feasibility of this approach [
APACHE II: Acute Physiology And Chronic Health Evaluation II
EHR: electronic health record
PICO: Patient/Population, Intervention, Comparison, Outcome
PPV: pulse pressure variation
This work was supported by DC’s CONICYT-FONDECYT (Chile) grant (no. 11130577).
MB contributed to data analysis, manuscript writing, and final document review. CD, JS, and JT contributed to data analysis and final document review. DC contributed to study conceptualization and design, data analysis, manuscript writing, final document review, and the decision to submit.
None declared.