Published on 16.01.19 in Vol 7, No 1 (2019): Jan-Mar

Preprints (earlier versions) of this paper were first published Jul 15, 2018.

    Original Paper

    SNOMED CT Concept Hierarchies for Computable Clinical Phenotypes From Electronic Health Record Data: Comparison of Intensional Versus Extensional Value Sets

    1University of Texas Southwestern Medical Center, Dallas, TX, United States

    2University of Wisconsin School of Medicine and Public Health, Madison, WI, United States

    Corresponding Author:

    Ling Chu, MD

    University of Texas Southwestern Medical Center

    5323 Harry Hines Boulevard

    Dallas, TX, 75390

    United States

    Phone: 1 214 648 1303



    Background: Defining clinical phenotypes from electronic health record (EHR)–derived data proves crucial for clinical decision support, population health endeavors, and translational research. EHR diagnoses now commonly draw from a finely grained clinical terminology: either native SNOMED CT or a vendor-supplied terminology mapped to SNOMED CT concepts as the standard for EHR interoperability. Accordingly, electronic clinical quality measures (eCQMs) increasingly define clinical phenotypes with SNOMED CT value sets. The work of creating and maintaining list-based value sets proves daunting, as does ensuring that their contents accurately represent the clinically intended condition.

    Objective: The goal of the research was to compare an intensional (concept hierarchy-based) versus extensional (list-based) value set approach to defining clinical phenotypes using SNOMED CT–encoded data from EHRs by evaluating value set conciseness, time to create, and completeness.

    Methods: Starting from published Centers for Medicare and Medicaid Services (CMS) high-priority eCQMs, we selected 10 clinical conditions referenced by those eCQMs. For each, the published SNOMED CT list-based (extensional) value set was downloaded from the Value Set Authority Center (VSAC). Ten corresponding SNOMED CT hierarchy-based intensional value sets for the same conditions were identified within our EHR. From each hierarchy-based intensional value set, an exactly equivalent full extensional value set was derived enumerating all included descendant SNOMED CT concepts. Comparisons were then made between (1) VSAC-downloaded list-based (extensional) value sets, (2) corresponding hierarchy-based intensional value sets for the same conditions, and (3) derived list-based (extensional) value sets exactly equivalent to the hierarchy-based intensional value sets. Value set conciseness was assessed by the number of SNOMED CT concepts needed for definition. Time to construct the value sets for local use was measured. Value set completeness was assessed by comparing contents of the downloaded extensional versus intensional value sets. Two measures of content completeness were made: for individual SNOMED CT concepts and for the mapped diagnosis clinical terms available for selection within the EHR by clinicians.

    Results: The 10 hierarchy-based intensional value sets proved far simpler and faster to construct than exactly equivalent derived extensional value set lists, requiring a median 3 versus 78 concepts to define and 5 versus 37 minutes to build. The hierarchy-based intensional value sets also proved more complete: in comparison, the 10 downloaded 2018 extensional value sets contained a median of just 35% of the intensional value sets’ SNOMED CT concepts and 65% of mapped EHR clinical terms.

    Conclusions: In the EHR era, defining conditions preferentially should employ SNOMED CT concept hierarchy-based (intensional) value sets rather than extensional lists. By doing so, clinical guideline and eCQM authors can more readily engage specialists in vetting condition subtypes to include and exclude, and streamline broad EHR implementation of condition-specific decision support promoting guideline adherence for patient benefit.

    JMIR Med Inform 2019;7(1):e11487





    Given widespread adoption of electronic health records (EHRs) certified to follow terminology standards, why does achieving interoperable clinical phenotype definitions remain challenging? Practical approaches to analytic interoperability among EHR-originated datasets would provide value both for population health analytics and clinical research [1-3]. Clinical registries define most patient subpopulations—important clinical phenotypes—by either a shared condition or a shared exposure (eg, to a type of procedure or medication) [4]. EHRs now encode patient conditions in clinical terminologies mapped to SNOMED CT, an international comprehensive clinical terminology [5-7]. By federal standard, exchanging patient conditions (problems) between EHRs via health information exchanges employs SNOMED CT concepts.

    Accordingly, clinical quality measures derived from EHR data increasingly define clinical phenotypes with SNOMED CT concept value sets, analogous to the International Classification of Diseases (ICD) code value sets traditionally defined for claims data. Initial SNOMED CT value sets primarily have taken an “extensional” form (that is, an enumerated list of terms) in keeping with the long-standing structure of ICD code value sets [8]. But SNOMED CT, being a polyhierarchical ontology, affords the powerful option of employing rule-based or “intensional” value sets leveraging the relationships within the ontology. Such intensional value sets can more concisely identify included and excluded subtypes of a clinical condition by referring to SNOMED CT’s hierarchical “is a” supertype-subtype (parent-child) relationships. Those subtype relationships can be a close match to clinicians’ thinking about clinical phenotypes and the subtypes of conditions they wish to be included or excluded. In a report on 125 such hierarchy-based value sets, we have shown they also are simple to create in an EHR and employ in an analytic data warehouse [9].

    In this study, we examined value sets defining 10 conditions referenced by 2018 Centers for Medicare and Medicaid Services (CMS) high-priority electronic clinical quality measures (eCQMs) for adults. We compare corresponding intensional versus extensional SNOMED CT value sets for their conciseness, time to construct, and completeness of SNOMED CT concept inclusion. We also compare their completeness in covering the SNOMED CT-mapped clinical terms selectable by clinicians within the EHR as patient Problem List entries and encounter diagnoses, since those selections ultimately drive clinical phenotypes for population health activities and pragmatic clinical studies employing EHR source data.

    Extensional SNOMED CT Value Sets

    Why Value Sets?

    Transactional source data from administrative and clinical information systems typically include diagnosis information encoded in either ICD or SNOMED CT. Value sets of specified ICD or SNOMED CT terms define conditions (clinical phenotypes) for use in clinical guidelines, clinical quality measures, and patient registries [4]. Two categories of conditions commonly need to be defined: (1) one or more primary, population-defining conditions and (2) comorbid conditions used for exclusions and/or risk stratification.

    Why SNOMED CT Value Sets?

    To be certified for the Meaningful Use program in the United States, EHRs must be able to transmit patient diagnosis information to another EHR using SNOMED CT–encoded concepts [10,11]. Thus, in most EHRs clinicians enter patient conditions onto their Problem List by using either SNOMED CT directly, or, more commonly, a clinician-friendly clinical terminology premapped to SNOMED CT concepts. Both methods enable preserving a higher level of clinical fidelity and relevant clinical detail than ICD does due to the enhanced clinical specificity of SNOMED CT [9].

    Why Extensional SNOMED CT Value Sets?

    Extensional value sets refer to simple lists of codes or concepts. ICD value sets traditionally have been constructed this way, in keeping with the structure of ICD [12,13]. As the need for SNOMED CT value sets arose, the same approach was continued. The innovative Value Set Authoring Tool made available in 2013 by the Value Set Authority Center (VSAC) initially supported only creation of extensional value sets [14].

    Challenges with Extensional SNOMED CT Value Sets

    Extensional value sets, as specifically enumerated lists, are brittle and prone to “break” or become stale with updates to the underlying terminology. SNOMED CT updates can include addition of new clinical concepts or refining an existing concept by creating or expanding its “descendant” concepts. Preexisting extensional value set lists cannot handle these changes automatically and may require frequent updating after new SNOMED CT version releases, followed by reimportation or copying into every EHR or other system employing the value set.

    Some extensional value sets include many items, which inhibits rapid human comprehension of exactly which subtypes of a given clinical condition are being included and excluded. Thus clinical vetting of such value sets becomes laborious. Similarly, construction of the value set and performing quality assurance are correspondingly difficult and labor-intensive. Inaccuracies in value sets can significantly affect clinical quality measure calculations [15].

    Intensional Value Sets of SNOMED CT Concept Hierarchies

    SNOMED CT intensional value sets, by contrast, are rule-based and leverage the polyhierarchy structure of SNOMED CT. That is, one can include or exclude an entire “tree” of real-world condition subtypes via a single reference to a SNOMED CT concept and all its descendants. Combining such tree references with simple Boolean logic (or with SNOMED CT Expression Constraint Language) enables efficient definition of a desired clinical phenotype [9]. For instance, osteoporosis and all of its subtypes can be defined by reference to one SNOMED CT concept (SCT ID 64859006 Osteoporosis) and all its descendants. The corresponding extensional list would require 42 SNOMED CT concepts to fully define. In turn, in our EHR 2287 diagnosis clinical terms map to this single SNOMED CT concept hierarchy; a clinician selecting any one of these for a patient’s diagnosis would automatically include them in the broad computable clinical phenotype of osteoporosis.
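
    The expansion of a hierarchy-based rule into its equivalent enumerated list can be sketched as a walk over the "is a" relationships. The following is a minimal illustration only, not the tooling used in this study: the hierarchy, the concept identifiers, and the function names `descendants_and_self` and `expand` are all hypothetical placeholders rather than real SNOMED CT content or any vendor's API.

```python
from collections import deque

# Toy "is a" polyhierarchy (parent -> children). Every identifier here is a
# hypothetical placeholder, not real SNOMED CT content.
CHILDREN = {
    "A": ["A1", "A2", "B"],
    "A1": ["A1a", "shared"],
    "A2": ["shared"],  # "shared" has two parents (polyhierarchy)
    "B": ["B1"],
}


def descendants_and_self(concept):
    """Return the concept plus all of its descendants (breadth-first walk)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in CHILDREN.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expand(include, exclude=()):
    """Derive the extensional list: included trees AND NOT excluded trees."""
    included = set()
    for concept in include:
        included |= descendants_and_self(concept)
    excluded = set()
    for concept in exclude:
        excluded |= descendants_and_self(concept)
    return included - excluded


# A two-concept rule (include "A", exclude "B") expands to an enumerated list.
value_set = expand(include=["A"], exclude=["B"])
print(sorted(value_set))
```

    In this toy case the rule names only two concepts while the derived list holds five, and a new child added under "A1" would be picked up on the next expansion without editing the rule.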

    Possible benefits of SNOMED CT intensional value sets include closely matching how clinicians think about what condition subtypes to include or exclude from a given clinical phenotype. Being able to reference the entire tree of a concept’s descendants enables far simpler, more succinct value set definitions that are easier to understand and construct. Additionally, they should be more resilient to change and less likely to omit descendants and break with future SNOMED CT concept additions. Consequently, intensional value sets have potential to be simultaneously simpler and more complete and thus more useful for population health analytics and clinical research using EHR data.

    Objective of the Study

    The objective was to evaluate, for each of 10 conditions (clinical phenotypes), the differences between an intensional (concept hierarchy-based) and an extensional (list-based) SNOMED CT value set approach in (1) conciseness, (2) time to create, and (3) completeness of both the SNOMED CT concepts included and the relevant clinical terms available for clinician selection in an EHR.


    Selection of Value Sets

    Value sets included in this study were identified starting from the CMS website for choosing Merit-Based Incentive Payment System (MIPS) quality measures [16]. MIPS measures were filtered for high-priority measures and data submission method of EHR, yielding 21 candidate measures. Four measures covering the following 4 common adult conditions were selected: hypertension, diabetes mellitus, depression, and prostate cancer.

    Next, the online VSAC “search value sets” feature was employed to find condition-defining SNOMED CT value sets for these measures [17]. Value sets were first filtered for CMS eCQM Release = “eCQM Update 2018 EP-EC and EH” and Code System = “SNOMEDCT.” Then each of the eCQMs was selected individually, displaying the related SNOMED CT value sets. Any value sets specifying a condition (diagnosis) were included, yielding an initial total of 12 SNOMED CT extensional value sets (see Multimedia Appendix 1).


    Creation of EHR vendor-neutral SNOMED CT intensional value sets and automatic derivation of extensional value sets were both done using Symedical (Clinical Architecture LLC), a clinical terminology management and mapping software tool for health care professionals. SNOMED CT intensional value sets (groupers) for EHR-based registry and clinical decision support functionality were created using the grouper management features of our EHR, Epic (Epic Systems Corporation). The clinical terminology vocabulary within University of Texas Southwestern Medical Center’s Epic EHR during this study was the proprietary IMO Problem IT terminology, version 2018 R1 (Intelligent Medical Objects Inc), mapped to the SNOMED CT International Edition July 2017 release and the SNOMED CT US Edition September 2017 release.


    Using the VSAC website’s “export value set results” feature, the list of codes for each SNOMED CT extensional value set was exported to Excel (Microsoft Corp) for subsequent comparison.

    Comparable intensional (rule-based) value set diagnosis groupers for these conditions were established in our EHR. The majority already existed, having been created for disease registries and/or clinical decision support [1]; two were newly created for this study (pain related to prostate cancer, personality disorder). Identically matching intensional value sets were then created in Symedical (in addition to Epic) and the time to create each intensional value set recorded.

    To enable meaningful direct comparison with intensional value sets, two combinations of VSAC value sets were performed prior to comparing the SNOMED CT concept lists: (1) chronic kidney disease, stage 5, (CKD-5) was combined with end-stage renal disease (ESRD) since clinically they refer to the same condition, and so only one intensional value set covered both, and (2) major depression including remission was combined with dysthymia, as together they constitute the condition of depressive disorders covered by a single intensional value set. This yielded a final set of 10 clinical conditions for comparison. The eCQMs, VSAC value set identifiers, and extensional value set contents are available in Excel format in Multimedia Appendix 1.

    The pregnancy value set (2.16.840.1.113883.3.526.3.378) downloaded from VSAC was found to include concepts focused on pregnancy itself but in general did not include concepts for complications or disorders of pregnancy. Our existing EHR-based intensional value set for pregnancy deliberately included the latter to provide a broad net for identifying any pregnant patients via EHR-entered diagnoses and problems. Accordingly, to better match the VSAC contents, we constructed a second narrow intensional value set for pregnancy based on the pregnancy conditions listed in the VSAC extensional value set by deliberately omitting SNOMED CT concepts for pregnancy-related conditions (eg, complication occurring during pregnancy, disorder of pregnancy). The VSAC extensional pregnancy value set was compared separately with both the broad and the narrow intensional pregnancy value sets.

    For each intensional value set, a corresponding extensional value set list was automatically derived using Symedical (ie, a list of all included SNOMED CT concept descendants). These derived extensional value sets were downloaded and stored for subsequent analysis. The intensional value sets and corresponding derived extensional value sets are available in Excel format in Multimedia Appendix 2.

    Measures and Outcomes

    Value Set Definition Conciseness

    Conciseness of value set definition was measured simply by the number of SNOMED CT concepts needed to fully define the set, either as a list (extensional value set) or the number of concepts in the defining rule (intensional value set). A dimensionless ratio to define was calculated in two forms:

    Ratio to define (download) = (# concepts in VSAC-downloaded extensional value set) / (# concepts in intensional value set defining rule)
    Ratio to define (derived) = (# concepts in derived extensional value set) / (# concepts in intensional value set defining rule)

    Time to Create

    The time to create each of 11 intensional value sets (including both pregnancy value set versions) as well as 3 of the extensional value sets (CKD-5 & ESRD, prostate cancer, pain related to prostate cancer) in Symedical was measured. From this a best-fit linear equation was derived: time (min) = 0.4177*(# SNOMED CT concepts) + 3.8707. This corresponds to an obligate time of just under 4 minutes to create any value set (eg, for configuring basic common settings) plus approximately 0.42 minutes (25 seconds) to add each SNOMED CT concept. The time to create the remaining extensional value sets was estimated using this equation.

    The difference in time to create an extensional versus an intensional value set was calculated as (time to create extensional value set) – (time to create intensional value set), expressed in minutes. The dimensionless ratio was calculated as (time to create extensional value set) / (time to create intensional value set).
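
    The estimation and comparison steps described above can be expressed in a few lines. This is a minimal sketch: the slope and intercept are the best-fit values reported in the text, while the function names and the example inputs (a 100-concept list and a 5-minute intensional build) are hypothetical and chosen only for illustration.

```python
# Best-fit coefficients reported in the text (minutes).
SLOPE_MIN_PER_CONCEPT = 0.4177
INTERCEPT_MIN = 3.8707


def estimated_minutes(n_concepts):
    """Estimate time to build a value set with n_concepts SNOMED CT concepts."""
    return SLOPE_MIN_PER_CONCEPT * n_concepts + INTERCEPT_MIN


def time_difference_and_ratio(extensional_min, intensional_min):
    """Return (difference in minutes, dimensionless ratio)."""
    return extensional_min - intensional_min, extensional_min / intensional_min


# Hypothetical inputs: a 100-concept extensional list versus an intensional
# value set that took 5 minutes to build.
ext_min = estimated_minutes(100)
diff_min, ratio = time_difference_and_ratio(ext_min, 5.0)
print(f"extensional: {ext_min:.1f} min, difference: {diff_min:.1f} min, ratio: {ratio:.1f}")
```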

    Completeness: SNOMED CT Concepts

    For each of 10 conditions, the lists of SNOMED CT concepts included in the VSAC-downloaded set and in the intensional-derived set were compared. The total number of concept discrepancies present in one set and not the other was assessed by summing two discrepancy types:

    • Number of concepts present in the VSAC-downloaded set but not in the intensional-derived set
    • Number of concepts present in the intensional-derived set but not in the VSAC-downloaded set

    Since virtually all of the SNOMED CT concepts in the downloaded extensional value sets were included in the corresponding intensional-derived value set, the ratio of the two was calculated as: (# concepts in intensional-derived set) / (# concepts in VSAC-downloaded set), expressed as a number greater than 1. The percentage of SNOMED CT concepts included in the downloaded extensional value set was calculated as: (# concepts in VSAC-downloaded set) / (# concepts in intensional-derived set), expressed as a percentage.
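
    These completeness measures amount to set differences and two quotients. The sketch below assumes each value set is available as a set of concept identifiers; the function name `compare_concept_sets` and the toy identifiers are hypothetical and do not come from the study data.

```python
def compare_concept_sets(downloaded, derived):
    """Compare a VSAC-downloaded list with an intensional-derived list."""
    only_downloaded = downloaded - derived  # in download, missing from derived
    only_derived = derived - downloaded     # in derived, missing from download
    return {
        "only_in_downloaded": len(only_downloaded),
        "only_in_derived": len(only_derived),
        "total_discrepancies": len(only_downloaded) + len(only_derived),
        "ratio_derived_to_downloaded": len(derived) / len(downloaded),
        "percent_downloaded_of_derived": 100 * len(downloaded) / len(derived),
    }


# Toy identifiers for illustration only.
downloaded_set = {"c1", "c2", "c3", "x1"}
derived_set = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}
summary = compare_concept_sets(downloaded_set, derived_set)
print(summary)
```

    The same function applies unchanged to the EHR clinical term comparison if the sets hold clinical term identifiers instead of SNOMED CT concepts.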

    Completeness: Electronic Health Record Clinical Term Coverage

    To evaluate the impact of condition-specific discrepancies, value sets were created in the EHR in both an intensional form (existing) and an extensional form (to exactly match the VSAC list of concepts, without including descendants). The EHR automatically creates a compiled list of IMO-sourced clinical terms mapped to the SNOMED CT value set. These IMO clinical terms comprise the diagnoses visible to clinicians for selection as Problem List entries and as diagnoses to associate with patient orders, encounters, and professional charges. The number of clinical terms compiled for each intensional and extensional value set was recorded. Comparisons were then performed on the number of clinical terms available for selection by clinicians in the EHR that would result in patient inclusion in a given clinical phenotype.

    Just as for SNOMED CT concept completeness, the ratio of the two was calculated as (# clinical terms from intensional-derived set) / (# clinical terms from VSAC-downloaded set), expressed as a number greater than or equal to 1. The percentage of clinical terms covered by the downloaded extensional value set was calculated as (# clinical terms from VSAC-downloaded set) / (# clinical terms from intensional-derived set), expressed as a percentage.


    Overall Format of Result Tables

    Tabulated comparisons by each of the 10 conditions follow. Summary calculated measures are included at the bottom of each table. In addition to overall sums and ratios, the median of the 10 condition-specific values was selected as the primary measure of central tendency. This method was chosen a priori to avoid the potential for skew if one or more conditions exhibited marked difference from the others or contained many more concepts. The minimum, maximum, and range across the 10 conditions are also reported.
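
    The rationale for preferring the median can be illustrated with a small example. The ten values below are hypothetical, not study results; they show only how a single outlying condition pulls the mean well away from the median.

```python
from statistics import mean, median

# Hypothetical condition-specific ratios; the last one is a marked outlier.
ratios = [2, 3, 3, 4, 4, 5, 5, 6, 7, 60]

summary_stats = {
    "median": median(ratios),
    "mean": mean(ratios),
    "min": min(ratios),
    "max": max(ratios),
    "range": max(ratios) - min(ratios),
}
print(summary_stats)
```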

    For pregnancy, both the narrow and broad definitions are shown in tabular form; however, only the narrower intensional value set based on the CMS extensional value set was used in all summary calculations (to avoid double-counting). Use of the narrow pregnancy definition reduces the reported differences between intensional and extensional value sets so that the summary findings and conclusions shown are conservative. Were the broad pregnancy definition selected instead, the magnitude of effects would be larger. All tables are available in Excel format in Multimedia Appendix 3.

    Value Set Conciseness

    We expected that intensional value sets should be more concise to construct by leveraging the hierarchical supertype-subtype structure of SNOMED CT. Table 1 shows that the median number of SNOMED CT concepts employed to define a condition with the VSAC value sets was 21.5 concepts versus only 3.0 for intensional value sets.

    Table 1. Clinical phenotypes with value set definition conciseness and time to create.

    For the full extensional value sets derived from the intensional rules, the median number of concepts-to-define was 78.5 concepts. The median ratio of concepts needed to fully define an equivalent extensional value set was 25 times that needed for the intensional value set rule.

    As one example, the clinical phenotype of personality disorder is specified by 26 SNOMED CT concepts in the downloaded extensional value set (Figure 1). In contrast, the corresponding intensional value set rule (inferring intent of subtypes desired from examining the VSAC downloaded list) includes just two concepts: (1) Personality disorder (SCT ID 33449004), including descendants, AND NOT (2) Organic personality disorder (SCT ID 36217008), including descendants. This rule includes all 26 SNOMED CT concepts in the VSAC extensional value set plus an additional 22 closely related concepts that reasonably belong, for 48 included concepts and a concept ratio-to-define of 48/2 or 24.0.
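
    The arithmetic for this example can be checked directly against the ratio-to-define formulas given in the Methods. The concept counts (26, 48, and 2) are those stated above; the function name `ratio_to_define` is a hypothetical helper, and the download ratio is simply what the formula yields for these counts.

```python
def ratio_to_define(n_extensional_concepts, n_rule_concepts):
    """Concepts in an extensional list per concept in the intensional rule."""
    return n_extensional_concepts / n_rule_concepts


N_DOWNLOADED = 26  # concepts in the VSAC-downloaded extensional list
N_DERIVED = 48     # concepts in the list derived from the intensional rule
N_RULE = 2         # concepts named in the intensional rule

additional_concepts = N_DERIVED - N_DOWNLOADED
ratio_download = ratio_to_define(N_DOWNLOADED, N_RULE)
ratio_derived = ratio_to_define(N_DERIVED, N_RULE)
print(additional_concepts, ratio_download, ratio_derived)
```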

    Time to Create

    Not surprisingly, more concise value sets are easier and faster to construct, perform quality assurance on, review, and update as needed. As shown in Table 1, it takes 6 to 8 times longer to construct an extensional value set completely equivalent in contents to an intensional value set (median 6.7, overall ratio 7.9). In this set, creating intensional value sets (groupers) for all 10 conditions was accomplished in just 1 hour (60 minutes) of keyboard time, while creating the equivalent extensional value sets required nearly 11 hours (650 minutes). The median creation time for these 10 conditions was 5 minutes for an intensional value set and 37 minutes for an equivalent extensional value set.

    Completeness: SNOMED CT Concepts

    SNOMED CT is updated twice yearly [18] and an intensional rule-based approach presumably should be more resilient to updates by automatically including new descendants within an existing included hierarchy, for instance. Accordingly, we examined the relative completeness of downloaded extensional versus corresponding intensional value sets.

    Figure 1. (a) SNOMED CT extensional value set list (26 items) downloaded from the Value Set Authority Center for the condition personality disorder, shown as green-colored members of a SNOMED CT hierarchy (partial diagram only). Red-colored items are not on the downloaded list. (b) Matched intensional value set combining SNOMED CT hierarchies with Boolean logic: personality disorder (disorder; 33449004), including descendants AND NOT organic personality disorder (disorder; 36217008), including descendants. (c) Implementation of intensional value set in an electronic health record (EHR) (Epic Systems). (d) Implementation of intensional value set in an EHR-agnostic terminology software program (Symedical). Also shown is part of the exactly equivalent extensional value set (containing 48 SNOMED CT concepts), automatically derived from the intensional logic.

    Table 2 compares the number of SNOMED CT concepts included in the full extensional list derived from the intensional rule versus the extensional 2018 list downloaded from VSAC. Across the 10 conditions, the full derived list included a median of 3.3 times as many SNOMED CT concepts as the corresponding downloaded list (range 1.1 to 19.4). In percentage terms, a median of only 35% of SNOMED CT concepts in the full derived extensional list were present in the corresponding downloaded extensional list (range 5% to 91%), as shown in Figure 2 (left panel).

    The vast majority of discrepancies between the two sets of extensional lists (877/889, 98.7%) were present in the intensional-derived list only and missing from the VSAC-downloaded list (Table 2); 1.3% (12/889) of concepts in the VSAC download were not in the intensional-derived list. Of these 12, 6 were kidney transplant procedural concepts rather than disorder or condition concepts and had no corresponding diagnosis clinical terms defined in the EHR’s clinical terminology. The remaining 6 were judged clinically relevant omissions from the intensional-derived list.

    Completeness: Coverage of Relevant Electronic Health Record Clinical Terms

    Pragmatic clinical trials, registries, and other research projects that rely on EHR data for clinical phenotypes need the most accurate and complete value sets possible to define primary and comorbid conditions. We thus compared the number of EHR clinical terms (sourced from IMO, overall n>800,000) selectable by clinicians that are in extensional versus intensional value set compiled lists (see Table 3 and Figure 2, right panel).

    In 9 of 10 conditions, the number of EHR clinical terms identified using the downloaded extensional value set was less than when using the corresponding intensional value set, in some cases dramatically so. In this subset of 10 conditions, a median 65% of the EHR diagnostic clinical terms selectable by clinicians in a commonly used EHR are included when using a published list-based extensional value set compared with using a corresponding concept hierarchy-based intensional value set. That is, a median of 35% of clinician-selectable diagnosis terms in the EHR for defining a clinical phenotype are missing when using a 2018 downloaded extensional value set.

    Table 2. Comparison of downloaded versus derived SNOMED CT value set contents.
    Figure 2. Left: SNOMED CT concepts included in 2018 extensional value sets as a percentage of those in intensional value sets. Right: Electronic health records clinical terms included using the 2018 extensional value sets as a percentage of those using intensional value sets. CKD-5: chronic kidney disease, stage 5; ESRD: end-stage renal disease; Ca: cancer.
    Table 3. Number of diagnosis clinical terms selectable by clinicians in the electronic health record by source of value set.


    Principal Results

    For 10 conditions referenced by the CMS 2018 high-priority clinical quality measures, we compared extensional SNOMED CT lists of codes downloaded in the fall of 2018 from the VSAC with intensional (rule-based) value sets for the same conditions. Intensional value set definitions were far more concise (median number of concepts needed for equivalent value sets 3 vs 78), faster to construct (median 5 vs 37 minutes each), and more complete. VSAC-downloaded value sets were missing a median 65% of the SNOMED CT concepts included in the intensional rule-based value sets and 35% of the mapped diagnosis clinical terms selectable by clinicians within the EHR.

    The conciseness of intensional value sets expedites construction in the EHR. This should also streamline vetting with busy clinical experts and harmonizing multiple value set specifications of the same real-world condition. Many systems will directly import large extensional value set files, mitigating the value set construction time/workload difference for those organizations. Still, someone must create the source value sets initially and periodically update them, and some customer organizations will have to enter them manually as well. For those, the large time reduction offered by intensional value set definitions remains an advantage. Because intensional value sets are rule-based and can include references to all descendants of a parent concept, they are more likely to include all relevant concepts than an enumerated list. That is, they are less likely to inadvertently omit descendant concepts and more likely to remain complete following future SNOMED CT updates.

    Clinical Phenotyping for Clinical-Translational Studies Using Electronic Health Record Data

    Pragmatic clinical trials, registries, and other clinical and translational research studies employing EHR data for computable clinical phenotypes (rather than manual abstraction) rely on value sets that define primary and comorbid conditions as accurately and completely as feasible [19,20]. Concern typically arises about missing diagnosis data not yet entered in the EHR by clinicians on the patient’s Problem List or as Encounter Diagnoses. While Problem List completeness in particular remains a subject of active inquiry and improvement efforts [21-26], this study raises a different concern: the completeness of value set definitions even when physicians and advanced practice providers have conscientiously recorded their patients’ specific diagnoses in the EHR. In this subset of 10 conditions, a median 35% of the EHR diagnostic clinical terms selectable by clinicians in a commonly used EHR are missing when using a published extensional value set compared with using a simpler rule-based intensional value set. Patients for whom clinicians select those missing EHR terms will not be included in the population selected for the clinical phenotype. Controlling for comorbid conditions in multivariable modeling will similarly be negatively impacted by missing clinical EHR terms. Defining clinical phenotypes more completely with rule-based intensional value sets leveraging SNOMED CT’s hierarchical structure advances the feasibility and reliability of pragmatic clinical studies and learning health care system cycles conducted with EHR data produced during normal clinical care [27-29].

    Analytic Interoperability for Population Health

    With the expansion of clinically integrated networks and cross-institution specialty registries to provide and measure value-based care, definition of subpopulations of patients becomes crucial for risk assessment and tailored interventions [1,30-32]. Many networks encompass a variety of EHRs. Since the designated interoperability language between EHRs for diagnoses (conditions) is SNOMED CT, employing SNOMED CT value sets enables EHR-agnostic consistent definition of subpopulations for registries, clinical decision support to promote best practices within the EHR, care gap closure, and quality measurement [9]. This provides analytic interoperability across disparate EHRs even if using clinical terminologies from different vendors. The populations that would most benefit from intervention may change over time, thus generating requests for new computable clinical phenotype definitions. The conciseness and clinical understandability of intensional value sets streamline rapid-cycle definition and vetting by specialists, as well as more facile and consistent implementation across a broad range of EHRs, population health tools, and clinical settings. These advantages make employing SNOMED CT concept rule-based intensional value sets a higher quality, better fit-for-purpose method for defining computable clinical phenotypes for population health than traditional extensional lists.

    Authoring Practice Guidelines and Electronic Clinical Quality Measures for Streamlined Implementation

    With the expansion in medical knowledge and appreciation of the complexities of achieving optimal care for subpopulations of patients with a wide variety of conditions, the number of clinical practice guidelines continues to grow [33,34]. Significant effort and expense (in terms of experts’ time) go into writing consensus guidelines and optimal practices for a condition. Achieving real-world practice change takes a long time and is often incomplete [35-37]. EHR-based clinical decision support has been shown to improve clinical process measures across multiple clinical domains [38-48].

    Yet current guidelines can be difficult to implement as point-of-care clinical decision support to help “make the right thing the easy thing to do” for busy clinicians within their daily work tool, the EHR [49-53]. Non–value-added work can include:

    • Translating prose definitions of the conditions covered by a guideline, conditions excluded, and comorbid conditions into EHR-implementable value sets that cover the clinical terms/codes present in EHRs in practice
    • Translating prose definitions of medication types and/or procedure types into EHR-implementable value sets
    • Translating prose descriptions (and some flow charts and decision trees, if constructed ambiguously) into implementable decision algorithms for clinical decision support logic [54]

    EHRs have local codes that can hamper implementation, but increasingly these are mapped to standard terminology codes to achieve interoperability with other EHRs as organizations participate in health information exchanges [27].

    To accelerate implementation, we propose that specialty guideline and eCQM writing committees include a medical informaticist (as either a consultant or a formal member of the writing group representing a clinical informatics specialty society). During initial guideline development discussion and through subsequent detailed specification, the medical informaticist could then assist specialist experts on the committee in expressing the clinical conditions relevant to the guideline or eCQM in a SNOMED CT supertype-subtype form, readily implementable in an EHR or other internet-accessible repository as a concise, easily shareable intensional rule (Figure 3).

    Doing so would avoid the considerable extra work of constructing a de novo extensional value set, vetting the full list with clinical experts, distributing it, and having multiple teams of EHR analysts and clinical informaticists around the country independently reverse-engineer the list into a supertype-subtype rule-based form to gain its benefits of conciseness, maintainability, completeness, and understandability for their local EHR implementation. In lean terms, that extra work (red arrows in Figure 3) could be considered non–value-adding waste. In contrast, coproducing concise, shareable SNOMED CT intensional value sets contemporaneously with the guideline and/or eCQM specification would expedite practical dissemination of clinical decision support to promote the new best practice in a consistent subpopulation of patients across the country, matching the guideline writing specialists’ intent.


    Changes to SNOMED CT

    Importantly, although intensional value sets retain accuracy and completeness across many updates to SNOMED CT’s contents, they are not impervious to changes [55-57]. Because intensional rules reference SNOMED CT’s hierarchical structure, newly added descendants are generally included automatically. Some value sets may need clinical vetting for updates after new SNOMED CT releases, perhaps particularly when an intensional rule includes some, but not all, of a parent concept’s children. To enable rapid re-vetting when needed, automated detection of new SNOMED CT concept additions that fall within the span of a given rule-based grouper would be useful. One question to explore further is whether a specific inclusion strategy (include these specific siblings) or an exclusion strategy (include all the children of the parent except these specific children) proves more resilient (that is, remains more complete and accurate).
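
    The automated detection suggested above could be sketched as follows. This is a minimal illustration, not an existing tool: the rule format, function names, concept identifiers, and both toy releases are assumptions. Each rule is expanded against an old and a new is-a hierarchy, and the difference is the set of concepts that may need clinical re-vetting.

```python
from collections import defaultdict

def expand(rule, edges):
    """Expand a rule: descendants-or-self of 'include' roots minus those of 'exclude' roots."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)

    def closure(roots):
        found, stack = set(), list(roots)
        while stack:
            concept = stack.pop()
            if concept not in found:
                found.add(concept)
                stack.extend(children[concept])
        return found

    return closure(rule["include"]) - closure(rule["exclude"])

def newly_in_scope(rule, old_edges, new_edges):
    """Concepts the rule picks up in the new release that it did not cover before."""
    return expand(rule, new_edges) - expand(rule, old_edges)

# Two toy releases as (child, parent) is-a edges; all identifiers are invented.
RELEASE_OLD = [
    ("child_a", "parent"),
    ("child_b", "parent"),
    ("child_c", "parent"),
    ("a1", "child_a"),
]
RELEASE_NEW = RELEASE_OLD + [
    ("a2", "child_a"),      # new descendant beneath an already included child
    ("child_d", "parent"),  # new sibling directly beneath the parent
]

# Inclusion strategy: name the wanted siblings explicitly.
inclusion_rule = {"include": {"child_a", "child_b"}, "exclude": set()}
# Exclusion strategy: take the parent's whole subtree except the unwanted child.
# (In this sketch the rule also covers the parent concept itself.)
exclusion_rule = {"include": {"parent"}, "exclude": {"child_c"}}

to_vet_inclusion = newly_in_scope(inclusion_rule, RELEASE_OLD, RELEASE_NEW)
to_vet_exclusion = newly_in_scope(exclusion_rule, RELEASE_OLD, RELEASE_NEW)
```

    Here both strategies pick up the new grandchild a2 automatically, while only the exclusion-style rule also absorbs the new sibling child_d. Which behavior is preferable depends on whether child_d clinically belongs in the phenotype, which is the open question raised above.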

    Migrations of existing SNOMED CT concepts to a different location in the hierarchy due to cleanup of SNOMED CT quality issues [58] pose a different challenge, although in many cases an intensional value set will handle the correction gracefully [9]. As clinicians and medical informaticists work with intensional value sets to define important clinical phenotypes, iterative improvements in SNOMED CT’s hierarchical arrangement will likely ensue, following the data quality aphorism “what gets used, gets better” [1].

    Scope of This Paper’s Analysis

    One limitation of this paper is that the comparable intensional value sets were developed and vetted at only one institution (University of Texas Southwestern Medical Center) and cannot be considered to represent national specialty society views. However, our experience demonstrates that it is feasible for a medical informaticist to build an initial candidate rule for defining a condition, then identify any clinical inclusion/exclusion questions for vetting with a clinician specializing in the condition [9]. For multi-institutional and/or specialty society vetting, a modified Delphi technique can be employed, as Buchanan [59] used successfully to gain working consensus across institutions. Our vision is that intensional value sets are increasingly produced as a byproduct of clinical guideline and eCQM authoring, dramatically reducing the need for individual institutions to reinvent the wheel (Figure 3).

    Figure 3. Clinicians’ thinking about type/subtype inclusion criteria matches SNOMED CT concept hierarchy implementation in electronic health records (EHRs). Deriving flat code lists (red arrows) requires reverse engineering by each EHR team to achieve the benefits of a concept hierarchy-based definition.

    This study covers only 10 conditions, which may not be representative of all conditions, and so should be considered a deep dive into one set of conditions for CMS-designated eCQMs (as described in the Methods section). We took a conservative approach in matching intensional definitions to VSAC-downloaded extensional definitions; otherwise, the differences reported between intensional and extensional value set completeness would have been even greater. Specifically, for pregnancy we did not use our existing broad pregnancy intensional value set; instead, we constructed a new, much more narrowly defined value set intended to match the scope of the VSAC-released pregnancy value set. Similarly, for pain related to prostate cancer, we lacked an existing intensional value set and constructed a new one closely mirroring the contents of the VSAC extensional value set. Both choices minimize differences between the extensional and intensional approaches. Given the high percentage of missing concepts and clinical terms in conditions with large numbers of terms (hypertension), our prespecified use of medians instead of means (averages) also reduced the magnitude of the reported difference between intensional and extensional approaches.
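
    The effect of summarizing with a median rather than a mean can be seen in a small arithmetic sketch. The percentages below are invented and are not the study's results; they simply include one condition with a much higher proportion of missing terms than the rest.

```python
from statistics import mean, median

# Invented per-condition percentages of missing terms (not study data).
pct_missing_by_condition = [10, 20, 30, 40, 90]

summary_median = median(pct_missing_by_condition)  # middle value, insensitive to the one high condition
summary_mean = mean(pct_missing_by_condition)      # pulled upward by the one high condition
```

    With these made-up values the median is 30 and the mean is 38, so reporting the median gives the smaller, more conservative figure.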


    Conclusions

    Although extensional lists of codes have long been used for ICD-based value sets, the use of extensional lists of SNOMED CT codes is suboptimal and fails to leverage the capabilities and clinical relevance of ontological relationships within SNOMED CT. Compared with SNOMED CT extensional (list) value sets, intensional (rule-based) value sets are far simpler to create, maintain, understand, and vet with specialist clinicians. For the 10 conditions studied here from the 2018 CMS high-value eCQMs for the MIPS program, intensional value sets also proved substantially more complete than their corresponding extensional list versions: a median 35% of diagnosis terms selectable in the EHR by clinicians were missing when using a downloaded extensional value set, with risk of failing to identify patients with a given clinical phenotype despite physician-entered discrete diagnoses in the EHR.

    Consequently, in the EHR era, we believe that defining conditions as computable clinical phenotypes should preferentially employ SNOMED CT concept hierarchy-based (intensional) value sets rather than extensional lists. By doing so, clinical guideline and eCQM authors can streamline broad EHR implementation of condition-specific decision support promoting guideline adherence and patient benefit.


    Acknowledgments

    Research reported in this publication was supported by the University of Texas Southwestern Health System Information Resources department and by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR001105. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

    Conflicts of Interest

    None declared.

    Multimedia Appendix 1

    Source electronic clinical quality measures and corresponding VSAC-downloaded extensional value sets in MS Excel format.

    XLSX File (Microsoft Excel File), 29KB

    Multimedia Appendix 2

    Intensional value set definitions with corresponding derived extensional value sets in MS Excel format.

    XLSX File (Microsoft Excel File), 129KB

    Multimedia Appendix 3

    Study data for tables and analyses in MS Excel format.

    XLSX File (Microsoft Excel File), 42KB


    References

    1. Kannan V, Fish JS, Mutz JM, Carrington AR, Lai K, Davis LS, et al. Rapid development of specialty population registries and quality measures from electronic health record data: an agile framework. Methods Inf Med 2017 Jun 14;56(99):e74-e83.
    2. El Fadly A, Rance B, Lucas N, Mead C, Chatellier G, Lastic P, et al. Integrating clinical research with the Healthcare Enterprise: from the RE-USE project to the EHR4CR platform. J Biomed Inform 2011 Dec;44 Suppl 1:S94-S102.
    3. Richesson RL, Smerek MM, Blake CC. A framework to support the sharing and reuse of computable phenotype definitions across health care delivery and clinical research applications. EGEMS (Wash DC) 2016;4(3):1232.
    4. Agency for Healthcare Research and Quality. Registries for Evaluating Patient Outcomes: A User's Guide. URL: https://effectivehealthcare.sites/default/files/pdf/registries-guide-3rd-edition_research.pdf [accessed 2018-12-23]
    5. Spackman KA, Campbell KE, Côté RA. SNOMED RT: a reference terminology for health care. Proc AMIA Annu Fall Symp 1997:640-644.
    6. SNOMED CT Basics. URL: [accessed 2018-12-19]
    7. SNOMED CT Concept Model. URL: [accessed 2018-01-15]
    8. Value Set Authority Center: Datasets. URL: [accessed 2018-12-18]
    9. Willett DL, Kannan V, Chu L, Buchanan JR, Velasco FT, Clark JD, et al. SNOMED CT concept hierarchies for sharing definitions of clinical conditions using electronic health record data. Appl Clin Inform 2018 Jul;9(3):667-682.
    10. Office of the National Coordinator for Health Information Technology (ONC), Department of Health and Human Services (HHS). 2015 edition health information technology (Health IT) certification criteria, 2015 edition base electronic health record (EHR), definition, and ONC health IT certification program modifications. Final rule. Fed Regist 2015 Oct 16;80(200):62601-62759.
    11. Blumenthal D. Launching HITECH. N Engl J Med 2010 Feb 4;362(5):382-385.
    12. Chute CG. Clinical classification and terminology: some history and current observations. J Am Med Inform Assoc 2000;7(3):298-303.
    13. Benson T, Grieve G. Principles of Health Interoperability: SNOMED CT, HL7, and FHIR. Third edition. London: Springer; 2016.
    14. Khatipov E, Madden M, Chiang P, Chuang P, Nguyen D, D'Souza I, et al. Creating, maintaining and publishing value sets in the VSAC. 2014 Presented at: AMIA Annu Symp Proc; 2014; Washington p. 1459.
    15. D'Amore JD, Li C, McCrary L, Niloff JM, Sittig DF, McCoy AB, et al. Using clinical data standards to measure quality: a new approach. Appl Clin Inform 2018 Apr;9(2):422-431.
    16. Merit-based Incentive Payment System (MIPS) quality measures.: Centers for Medicare and Medicaid Services; 2017. URL: [accessed 2018-12-19]
    17. Value Set Authority Center (VSAC).: National Library of Medicine URL: [accessed 2018-12-18]
    18. Release schedule and file formats: SNOMED International. URL: [accessed 2018-12-18]
    19. Richesson RL, Horvath MM, Rusincovitch SA. Clinical research informatics and electronic health record data. Yearb Med Inform 2014 Aug 15;9:215-223.
    20. Adamusiak T, Shimoyama N, Shimoyama M. Next generation phenotyping using the unified medical language system. JMIR Med Inform 2014 Mar 18;2(1):e5.
    21. Galanter WL, Hier DB, Jao C, Sarne D. Computerized physician order entry of medications and clinical decision support can improve problem list documentation compliance. Int J Med Inform 2010 May;79(5):332-338.
    22. Jao C, Hier D, Galanter W. Automating the maintenance of problem list documentation using a clinical decision support system. AMIA Annu Symp Proc 2008 Nov 06:989.
    23. Wright A, Pang J, Feblowitz JC, Maloney FL, Wilcox AR, McLoughlin KS, et al. Improving completeness of electronic problem lists through clinical decision support: a randomized, controlled trial. J Am Med Inform Assoc 2012;19(4):555-561.
    24. Wright A, McCoy AB, Hickman TT, Hilaire DS, Borbolla D, Bowes WA, et al. Problem list completeness in electronic health records: a multi-site study and assessment of success factors. Int J Med Inform 2015 Oct;84(10):784-790.
    25. McEvoy D, Gandhi TK, Turchin A, Wright A. Enhancing problem list documentation in electronic health records using two methods: the example of prior splenectomy. BMJ Qual Saf 2018 Jan;27(1):40-47.
    26. Simons SM, Cillessen FH, Hazelzet JA. Determinants of a successful problem list to support the implementation of the problem-oriented medical record according to recent literature. BMC Med Inform Decis Mak 2016 Dec 02;16:102.
    27. Mo H, Thompson WK, Rasmussen LV, Pacheco JA, Jiang G, Kiefer R, et al. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc 2015 Nov;22(6):1220-1230.
    28. Friedman CP, Wong AK, Blumenthal D. Achieving a nationwide learning health system. Sci Transl Med 2010 Nov 10;2(57).
    29. Richesson RL, Green BB, Laws R, Puro J, Kahn MG, Bauck A, et al. Pragmatic (trial) informatics: a perspective from the NIH Health Care Systems Research Collaboratory. J Am Med Inform Assoc 2017 Dec 01;24(5):996-1001.
    30. Tonner C, Schmajuk G, Yazdany J. A new era of quality measurement in rheumatology: electronic clinical quality measures and national registries. Curr Opin Rheumatol 2017 Mar;29(2):131-137.
    31. Drummond FJ, Kinnear H, Donnelly C, O'Leary E, O'Brien K, Burns RM, et al. Establishing a population-based patient-reported outcomes study (PROMs) using national cancer registries across two jurisdictions: the Prostate Cancer Treatment, your experience (PiCTure) study. BMJ Open 2015 Apr 17;5(4):e006851.
    32. Dulai PS, Singh S, Ohno-Machado L, Sandborn WJ. Population health management for inflammatory bowel disease. Gastroenterology 2018 Dec;154(1):37-45.
    33. US National Library of Medicine. 2018. Summary of professional guidelines: National Center for Biotechnology Information. URL: [accessed 2018-07-08]
    34. AHRQ. 2018. National Guideline Clearinghouse: Browse by Clinical Specialty. URL: [accessed 2018-07-08]
    35. Balas EA, Boren SA. Managing clinical knowledge for health care improvement. Yearb Med Inform 2000(1):65-70.
    36. Grant J, Green L, Mason B. Basic research and health: a reassessment of the scientific basis for the support of biomedical science. Res Eval 2003 Dec 01;12(3):217-224.
    37. Westfall JM, Mold J, Fagnan L. Practice-based research—blue highways on the NIH roadmap. JAMA 2007 Jan 24;297(4):403-406.
    38. Middleton B, Sittig DF, Wright A. Clinical decision support: a 25 year retrospective and a 25 year vision. Yearb Med Inform 2016 Aug 02;Suppl 1:S103-S116.
    39. Dunn LK, Gephart SM, Raszewski R, Sousa V, Shehorn LE, Abraham J. Integrative review of clinical decision support for registered nurses in acute care settings. J Am Med Inform Assoc 2017 Dec 01;24(2):441-450.
    40. Ali SM, Giordano R, Lakhani S, Walker DM. A review of randomized controlled trials of medical record powered clinical decision support system to improve quality of diabetes care. Int J Med Inform 2016 Mar;87:91-100.
    41. Wall HK, Wright JS. The role of clinical decision support systems in preventing cardiovascular disease. Am J Prev Med 2015 Nov;49(5):e83-e84.
    42. Tapper EB, Sengupta N, Lai M, Horowitz G. A decision support tool to reduce overtesting for ceruloplasmin and improve adherence with clinical guidelines. JAMA Intern Med 2015 Sep;175(9):1561-1562.
    43. Holstiege J, Mathes T, Pieper D. Effects of computer-aided clinical decision support systems in improving antibiotic prescribing by primary care providers: a systematic review. J Am Med Inform Assoc 2015 Jan;22(1):236-242.
    44. Anchala R, Kaptoge S, Pant H, Di Angelantonio E, Franco OH, Prabhakaran D. Evaluation of effectiveness and cost-effectiveness of a clinical decision support system in managing hypertension in resource constrained primary health care settings: results from a cluster randomized trial. J Am Heart Assoc 2015 Jan 05;4(1):e001213.
    45. Murphy EV. Clinical decision support: effectiveness in improving quality processes and clinical outcomes and factors that may influence success. Yale J Biol Med 2014 Jun;87(2):187-197.
    46. Fathima M, Peiris D, Naik-Panvelkar P, Saini B, Armour CL. Effectiveness of computerized clinical decision support systems for asthma and chronic obstructive pulmonary disease in primary care: a systematic review. BMC Pulm Med 2014 Dec 02;14:189.
    47. Bright TJ, Wong A, Dhurjati R, Bristow E, Bastian L, Coeytaux RR, et al. Effect of clinical decision-support systems: a systematic review. Ann Intern Med 2012 Jul 3;157(1):29-43.
    48. Dixon BE, Alzeer AH, Phillips EO, Marrero DG. Integration of provider, pharmacy, and patient-reported data to improve medication adherence for type 2 diabetes: a controlled before-after pilot study. JMIR Med Inform 2016 Feb 08;4(1):e4.
    49. Greenes RA, Bates DW, Kawamoto K, Middleton B, Osheroff J, Shahar Y. Clinical decision support models and frameworks: seeking to address research issues underlying implementation successes and failures. J Biomed Inform 2018 Dec;78:134-143.
    50. Bates DW, Kuperman GJ, Wang S, Gandhi T, Kittler A, Volk L, et al. Ten commandments for effective clinical decision support: making the practice of evidence-based medicine a reality. J Am Med Inform Assoc 2003;10(6):523-530.
    51. Shiffman RN, Wright A. Evidence-based clinical decision support. Yearb Med Inform 2013;8(1):120-127.
    52. Basit MA, Baldwin KL, Kannan V, Flahaven EL, Parks CJ, Ott JM, et al. Agile acceptance test-driven development of clinical decision support advisories: feasibility of using open source software. JMIR Med Inform 2018 Apr 13;6(2):e23.
    53. Lyman JA, Cohn WF, Bloomrosen M, Detmer DE. Clinical decision support: progress and opportunities. J Am Med Inform Assoc 2010 Oct;17(5):487-492.
    54. Boxwala AA, Rocha BH, Maviglia S, Kashyap V, Meltzer S, Kim J, et al. A multi-layered framework for disseminating knowledge for computer-based decision support. J Am Med Inform Assoc 2011 Dec;18 Suppl 1:i132-i139.
    55. Lee D, Cornet R, Lau F. Implications of SNOMED CT versioning. Int J Med Inform 2011;80(6):442-453.
    56. Wade G. Implementing SNOMED CT for quality reporting: avoiding pitfalls. Appl Clin Inform 2011;2(4):534-545.
    57. Wade G, Rosenbloom S. The impact of SNOMED CT revisions on a mapped interface terminology: terminology development and implementation issues. J Biomed Inform 2009;42(3):490-493.
    58. Bodenreider O, Smith B, Kumar A, Burgun A. Investigating subsumption in SNOMED CT: an exploration into large description logic-based biomedical terminologies. Artif Intell Med 2007 Mar;39(3):183-195.
    59. Buchanan J. Accelerating the benefits of the problem oriented medical record. Appl Clin Inform 2017 Dec 15;8(1):180-190.


    Abbreviations

    CKD-5: chronic kidney disease, stage 5
    CMS: Centers for Medicare and Medicaid Services
    ESRD: end-stage renal disease
    eCQM: electronic clinical quality measure
    EHR: electronic health record
    ICD: International Classification of Diseases
    IMO: Intelligent Medical Objects
    MIPS: Merit-Based Incentive Payment System
    NIH: National Institutes of Health
    VSAC: Value Set Authority Center

    Edited by G Eysenbach; submitted 15.07.18; peer-reviewed by P Greco, J Fountain, I Mircheva; comments to author 30.08.18; revised version received 23.11.18; accepted 09.12.18; published 16.01.19

    ©Ling Chu, Vaishnavi Kannan, Mujeeb A Basit, Diane J Schaeflein, Adolfo R Ortuzar, Jimmie F Glorioso, Joel R Buchanan, Duwayne L Willett. Originally published in JMIR Medical Informatics, 16.01.2019.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.