Bridging Data Models in Health Care With a Novel Intermediate Query Format for Feasibility Queries: Mixed Methods Study

doi:10.2196/58541

¹IT Center for Clinical Research, University of Lübeck, Gebäude 64, 2.OG, Raum 05, Ratzeburger Allee 160, Lübeck, Germany

²Chair for Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

³Leipzig Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany

⁴Federated Information Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁵Complex Medical Informatics, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

⁶Mannheim Institute for Intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

⁷Institute for Medical Informatics, University Clinic Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany

*these authors contributed equally

Corresponding Author:

Lorenz Rosenau, MSc

Background: To advance research with clinical data, it is essential to make access to the available data as fast and easy as possible for researchers, which is especially challenging for data from different source systems within and across institutions. Over the years, many research repositories and data standards have been created. One of these is the Fast Healthcare Interoperability Resources (FHIR) standard, used by the German Medical Informatics Initiative (MII) to harmonize and standardize data across university hospitals in Germany. One of the first steps to make these data available is to allow researchers to create feasibility queries to determine the data availability for a specific research question. Given the heterogeneity of different query languages to access different data across and even within standards such as FHIR (eg, CQL and FHIR Search), creating an intermediate query syntax for feasibility queries reduces the complexity of query translation and improves interoperability across different research repositories and query languages.

Objective: This study describes the creation and implementation of an intermediate query syntax for feasibility queries and how it integrates into the federated German health research portal (Forschungsdatenportal Gesundheit) and the MII.

Methods: We analyzed the requirements for feasibility queries and the feasibility tools that are currently available in research repositories. Based on this analysis, we developed an intermediate query syntax that can be easily translated into different research repository–specific query languages.

Results: The resulting Clinical Cohort Definition Language (CCDL) for feasibility queries combines inclusion criteria in a conjunctive normal form and exclusion criteria in a disjunctive normal form, allowing for additional filters like time or numerical restrictions. The inclusion and exclusion results are combined via an expression to specify feasibility queries. We defined a JSON schema for the CCDL, generated an ontology, and demonstrated the use and translatability of the CCDL across multiple studies and real-world use cases.

Conclusions: We developed and evaluated a structured query syntax for feasibility queries and demonstrated its use in a real-world example as part of a research platform across 39 German university hospitals.

JMIR Med Inform 2024;12:e58541

Background

In the rapidly evolving field of medical research, patient data have emerged as a critical resource. The vast amounts of data generated through clinical encounters, laboratory tests, imaging studies, and other patient interactions hold the potential to significantly advance our understanding of disease processes and treatment outcomes. Clinical Data Repositories (CDRs) are a valuable tool for storing, organizing, and retrieving this wealth of patient data. These repositories facilitate data storage in a structured and standardized manner, enabling researchers to query these data efficiently for various research purposes.

One key aspect of effectively using CDRs is the ability to perform feasibility queries. These queries allow researchers to assess the availability and adequacy of data for specific research questions before embarking on full-scale studies. Doing so can save considerable time and resources by identifying potential issues, such as insufficient sample size or a lack of necessary data elements.

Distributed Data Collections

The landscape of data repositories is not homogeneous. There are 2 primary approaches to data repository management: the classical single repository approach and the federated approach. Traditionally, these repositories have been centralized, pooling data from various sources into a single repository [Pfaff ER, Girvin AT, Gabriel DL, et al. Synergies between centralized and federated approaches to data quality: a report from the national covid cohort collaborative. J Am Med Inform Assoc. Mar 15, 2022;29(4):609-618. [CrossRef] [Medline]1]. However, this classical approach has been challenged by the emergence of federated data repositories [Pfaff ER, Girvin AT, Gabriel DL, et al. Synergies between centralized and federated approaches to data quality: a report from the national covid cohort collaborative. J Am Med Inform Assoc. Mar 15, 2022;29(4):609-618. [CrossRef] [Medline]1,Prayitno, Shyu C-R, Putra KT, et al. A systematic review of federated learning in the healthcare area: from the perspective of data properties and applications. Appl Sci (Basel). Nov 25, 2021;11(23):11191. [CrossRef]2].

The classic single repository approach involves a centralized system where all data are stored and managed in one place. This solution offers the advantage of uniformity and ease of data management. It enables efficient data quality benchmarking at scale and the generation of derivatives, harmonized variables, and units of measure for comparable and consistent analytics [Pfaff ER, Girvin AT, Gabriel DL, et al. Synergies between centralized and federated approaches to data quality: a report from the national covid cohort collaborative. J Am Med Inform Assoc. Mar 15, 2022;29(4):609-618. [CrossRef] [Medline]1]. However, it is often impractical or impossible to implement, especially when dealing with multiple institutions, each having its own schema for its clinical data repository.

On the other hand, the federated approach involves a network of repositories, each maintained by different institutions. These repositories operate independently but are interconnected for data sharing and collaboration. The data generally remain at the generating site, which offers the advantages of local curation by personnel deeply familiar with the data [Pfaff ER, Girvin AT, Gabriel DL, et al. Synergies between centralized and federated approaches to data quality: a report from the national covid cohort collaborative. J Am Med Inform Assoc. Mar 15, 2022;29(4):609-618. [CrossRef] [Medline]1] and maintains data anonymity and security [Prayitno, Shyu C-R, Putra KT, et al. A systematic review of federated learning in the healthcare area: from the perspective of data properties and applications. Appl Sci (Basel). Nov 25, 2021;11(23):11191. [CrossRef]2]. The data can then be analyzed using a federated approach or, if the correct patient consent is given, be transferred to a central data management unit for a specific analysis.

This approach respects individual institutions’ autonomy and data governance policies, making it a more feasible option for multi-institutional collaborations [Sebire NJ, Cake C, Morris AD. HDR UK supporting mobilising computable biomedical knowledge in the UK. BMJ Health Care Inform. Jul 2020;27(2):e100122. [CrossRef] [Medline]3-Semler SC, Wissing F, Heyder R. German medical informatics initiative. Methods Inf Med. May 2018;57(S 01):e50-e56. [CrossRef]9] and can enhance the scope and depth of clinical research by enabling access to a broader range of data.

Despite the potential benefits of federated data repositories, performing feasibility queries across multiple CDRs presents significant challenges [Gruendner J, Deppenwiese N, Folz M, et al. The architecture of a feasibility query portal for distributed COVID-19 Fast Healthcare Interoperability Resources (FHIR) patient data repositories: design and implementation study. JMIR Med Inform. May 25, 2022;10(5):e36709. [CrossRef] [Medline]10]. Each repository contains data originating from different source systems, leading to heterogeneity in data formats, terminologies, and quality. This heterogeneity can significantly complicate the process of data integration and harmonization, making it challenging to perform comprehensive and accurate feasibility queries [Gruendner J, Deppenwiese N, Folz M, et al. The architecture of a feasibility query portal for distributed COVID-19 Fast Healthcare Interoperability Resources (FHIR) patient data repositories: design and implementation study. JMIR Med Inform. May 25, 2022;10(5):e36709. [CrossRef] [Medline]10].

Moreover, the federated nature of the system introduces additional complexities. Data privacy regulations and institutional policies may restrict the sharing and use of certain data, further complicating the query process. Additionally, the technical infrastructure required to support secure and efficient data exchange across multiple repositories can be challenging to implement and maintain.

Data Exchange Standards for Interoperability

In a federated network, the commitment to an interoperability standard becomes pivotal to tackling these challenges. Prominent examples include, but are not limited to, Fast Healthcare Interoperability Resources (FHIR) [Benson T, Grieve G. Principles of Health Interoperability: SNOMED CT, HL7 and FHIR. Springer International Publishing; 2016. [CrossRef] ISBN: 978-3-319-30368-011], OMOP CDM [Stang PE, Ryan PB, Racoosin JA, et al. Advancing the science for active surveillance: rationale and design for the observational medical outcomes partnership. Ann Intern Med. Nov 2, 2010;153(9):600-606. [CrossRef] [Medline]12], i2b2 [Murphy SN, Weber G, Mendis M, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Inform Assoc. 2010;17(2):124-130. [CrossRef] [Medline]13], and OpenEHR [Kalra D, Beale T, Heard S. The openEHR foundation. Stud Health Technol Inform. 2005;115:153-173. [Medline]14]. These standards share the commonality of being centered around the patient’s medical history.

Agreeing on an interoperability standard only partially solves the challenge. While a health care data exchange standard facilitates the conversion of existing data into a common format at each hospital, a distributed feasibility query platform for the data is still missing.

Tools for Feasibility Queries

Besides the data integration standardization, interactive user interfaces enable researchers to design and submit feasibility queries. For this purpose, a multitude of tools for feasibility queries exist (eg, i2b2 [Murphy SN, Weber G, Mendis M, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Inform Assoc. 2010;17(2):124-130. [CrossRef] [Medline]13,Weber GM, Murphy SN, McMurry AJ, et al. The Shared Health Research Information Network (SHRINE): a prototype federated query tool for clinical data repositories. J Am Med Inform Assoc. 2009;16(5):624-630. [CrossRef] [Medline]15], TriNetX [Topaloglu U, Palchuk MB. Using a federated network of real-world data to optimize clinical trials operations. JCO Clin Cancer Inform. Dec 2018;2:1-10. [CrossRef] [Medline]16], tranSMART [Scheufele E, Aronzon D, Coopersmith R, et al. tranSMART: an open source knowledge management and high content data analytics platform. AMIA Jt Summits Transl Sci Proc. 2014;2014:96-101. [Medline]17], SampleLocator [Lablans M, Kadioglu D, Mate S, Leb I, Prokosch HU, Ückert F. Strategien zur Vernetzung von Biobanken: Klassifizierung verschiedener Ansätze zur Probensuche und Ausblick auf die Zukunft in der BBMRI-ERIC. Bundesgesundheitsbl. Mar 2016;59(3):373-378. [CrossRef]18-Schüttler C, Huth V, von Jagwitz-Biegnitz M, Lablans M, Prokosch HU, Griebel L. A federated online search tool for biospecimens (sample locator): usability study. J Med Internet Res. Aug 18, 2020;22(8):e17739. [CrossRef] [Medline]20], Observational Health Data Science and Informatics [OHDSI] ATLAS [ATLAS. GitHub. URL: https://github.com/OHDSI/Atlas/wiki/Home [Accessed 2024-01-02] 21], DZHK Feasibility Explorer [Hoffmann J, Hanß S, Kraus M, et al. The DZHK research platform: maximisation of scientific value by enabling access to health data and biological samples collected in cardiovascular clinical studies. Clin Res Cardiol. Jul 2023;112(7):923-941. [CrossRef]22]), each with its own data formats, standards, and query languages, including Structured Query Language (SQL), Clinical Quality Language (CQL), FHIR-Search, and Archetype Query Language (AQL). Consequently, querying across these different tools is difficult as there is no common query representation, and researchers must navigate these diverse tools and formats, particularly when dealing with cross-institutional data or distributed data storage within an institution.

Within the broader context of establishing a feasibility platform as part of the central German Portal for Health Data (FDPG), this research introduces a novel query syntax, serving as an intermediary between user interfaces and data repositories. This syntax is designed to be sufficiently flexible to ensure interoperability while maintaining simplicity. It focuses on the primary needs of a feasibility query, while allowing the syntax to be translated into repository-specific languages like FHIR-Search or CQL.

Our approach is grounded in the broader context of clinical research, where the reuse of eligibility criteria is common, whether in their original form or with modifications. These criteria are instrumental not just for feasibility studies but also for prescreening, data selection, extraction, and validation. Consequently, a need has emerged to decouple the representation of eligibility criteria from their implementation in specific systems. A mechanism to express complex criteria and combinations thereof in a way that is both intuitive and adaptable to varying implementation needs is required.

In this study, we describe the development and application of the query syntax within the network of the Medical Informatics Initiative (MII), which encompasses 39 German university hospitals, and specifically within the FDPG feasibility platform. We also show how the syntax achieves interoperability across different research platforms.

Requirement Analysis

In our pursuit to create an intermediate query syntax to express eligibility criteria, we performed a requirement analysis. Within it, we combine insight from feasibility queries and cohort selection, with the latter often manifesting as a query output in the form of cohort size rather than a list of discrete patient identifiers.

Our research reviewed existing tooling, namely i2b2, TriNetX, and OHDSI Atlas. We aimed to identify common functionalities and essential features across these tools. To obtain insight into the criteria’s structure and complexity, we analyzed existing eligibility criteria from ClinicalTrials.gov [Home. ClinicalTrials.gov URL: https://clinicaltrials.gov/ [Accessed 2024-03-11] 23] and incorporated the findings from Ross et al [Ross J, Tu S, Carini S, Sim I. Analysis of eligibility criteria complexity in clinical trials. Summit Transl Bioinform. Mar 1, 2010;2010:46-50. [Medline]24] and Gulden et al [Gulden C, Mate S, Prokosch HU, Kraus S. Investigating the capabilities of FHIR search for clinical trial phenotyping. In: German Medical Data Sciences: A Learning Healthcare System. IOS Press:2018. [CrossRef]25]. Moreover, we integrated insights from the usability study by Schüttler et al evaluating feasibility tools [Schüttler C, Prokosch H-U, Sedlmayr M, Sedlmayr B. Evaluation of three feasibility tools for identifying patient data and biospecimen availability: comparative usability study. JMIR Med Inform. Jul 21, 2021;9(7):e25531. [CrossRef] [Medline]26], conducted expert interviews and recursively synchronized the requirements within our project. This multifaceted analysis allowed us to infer a set of requirements crucial for developing our query syntax. These requirements were categorized into query expressiveness, interoperability, and accessibility.

Expressiveness Requirements

The query syntax should:

allow for the definition of inclusion and exclusion criteria.
be expressed in Boolean logic.
allow the expression of exclusion criteria.
support at least patient as query subject (feasibility queries can be performed on different query subjects: find all patients with specific criteria, find all encounters with specific criteria, find all specimens with specific criteria).
use unique identifiers for criteria and concepts.
support the following filters on the criterion level:
- existence of a criterion
- numeric restriction
- concept filter
- time restrictions
- attribute filters

Interoperability and Accessibility Requirements

The query syntax should:

provide an abstract (decoupling) layer between the user interface and the query execution.
have a low level of complexity and be easily translatable to different query languages.
be suitable for integration with the Health Level Seven International (HL7) FHIR standard used by the MII.
use a widely used data exchange format like JSON to ease parsing and generation.
be human readable and writable.
ideally directly support the use of standard medical terminology (LOINC [Logical Observation Identifiers Names and Codes], SNOMED-CT [Systematized Nomenclature of Medicine–Clinical Terms], ICD-10 [International Statistical Classification of Diseases, Tenth Revision], etc) to lower mapping efforts.

Evaluation

To evaluate the specification of the query syntax, we compared the final specification with our requirements and additionally demonstrated its applicability beyond the scope of FHIR by applying it to AQL.

We incorporated the solution into a large-scale real-world distributed feasibility query infrastructure, including a user interface, where it was integrated as the central intermediate query syntax. We further evaluated the applicability of the syntax to a wide range of clinical criteria and investigated its translatability, as well as how well it lends itself to creating a user interface for feasibility queries. Beyond the use in our projects based on German data sets and specifications, we also successfully applied the Clinical Cohort Definition Language (CCDL) to the international Synthea [Walonoski J, Kramer M, Nichols J, et al. Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. Mar 1, 2018;25(3):230-238. [CrossRef]27] data set.

Ethical Considerations

No ethics board decision is required as we are presenting a technical solution without working on patients’ data.

Based on the requirements of a team of experts, we created the “Clinical Cohort Definition Language,” an intermediate query syntax for feasibility queries. The exchange format for the syntax was chosen to be JSON, which is currently widely used across the software community and is familiar to software developers from user interfaces, REST application programming interfaces (APIs), and query execution backends alike.

Criterion Types and Filters

The atomic component of CCDL is the criterion, serving as the foundational building block for inclusion or exclusion criteria. Each criterion is uniquely identified using a tuple of code system and code (which we named termCode), analogous to FHIR and OMOP-CDM. To express conceptual equivalence between concepts across medical terminologies, multiple termCodes can be provided (eg, the criterion for sleep apnea may be represented by the termCodes G47.3 from ICD-10 and 73430006 from SNOMED-CT). Each termCode may have an additional “display” attribute, which serves purely as a visual representation to make the interpretation of a CCDL easier for humans. Within our CCDL, the criteria can occur as 1 of 4 different base types of criteria:

Exist criteria with no additional filters (eg, conditions or a laboratory concept with no filter, like the existence of a hemoglobin value regardless of the value)
Comparatively restricted numerical criteria (eg, hemoglobin laboratory value <12 g/dL)
Range-restricted numerical criteria (eg, hemoglobin laboratory value between 10 and 12 g/dL)
Value set restricted criteria (eg, gender=female or male)
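The 4 base types can be pictured as small JSON objects. The following Python sketch writes one example of each as a dictionary and classifies it by its value filter. The names termCodes, valueFilter, unit, and the filter types concept and quantity-comparator appear in Table 1; all other key names (comparator, value, minValue, maxValue, selectedConcepts), the quantity-range type, and the demo code system for gender are assumptions made for illustration and should be checked against the published JSON schema.

```python
# Illustrative criteria only. Key names not listed in Table 1 are assumptions.
HEMOGLOBIN = {"system": "http://loinc.org", "code": "718-7", "display": "Hemoglobin"}
G_PER_DL = {"code": "g/dL", "display": "g/dL"}

exist_criterion = {"termCodes": [HEMOGLOBIN]}

comparator_criterion = {
    "termCodes": [HEMOGLOBIN],
    "valueFilter": {"type": "quantity-comparator", "comparator": "lt",
                    "value": 12, "unit": G_PER_DL},
}

range_criterion = {
    "termCodes": [HEMOGLOBIN],
    "valueFilter": {"type": "quantity-range", "minValue": 10,
                    "maxValue": 12, "unit": G_PER_DL},
}

value_set_criterion = {
    # Hypothetical demo code system, not a real terminology.
    "termCodes": [{"system": "http://example.org/demo", "code": "gender"}],
    "valueFilter": {"type": "concept", "selectedConcepts": ["female", "male"]},
}


def base_type(criterion: dict) -> str:
    """Classify a criterion into one of the 4 base types by its value filter."""
    value_filter = criterion.get("valueFilter")
    if value_filter is None:
        return "exist"
    kind = value_filter["type"]
    if kind == "quantity-comparator":
        return "numeric-comparison"
    if kind == "quantity-range":
        return "numeric-range"
    if kind == "concept":
        return "value-set"
    raise ValueError(f"unknown value filter type: {kind}")
```

A translator to a repository-specific language could branch on this classification first and only then handle time restrictions and attribute filters.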

Additionally, each criterion can be further restricted to a date range (eg, a Condition that occurred between January 1, 2024, and February 5, 2024), and it supports additional “attribute” filters, which can be added to each of the base types of criteria. The attribute filters support the same kinds of restriction that distinguish the criterion types, ie, comparative numerical, range, and value set restriction (eg, body site=skin for a tissue specimen). Multimedia Appendix 1 shows an example of specimen criteria with an ICD-O-3 (International Classification of Diseases for Oncology, 3rd Edition) attribute indicating the location the specimen was taken from.

The Explicit Logic Layer

The logic layer of the query aligns with existing solutions (i2b2/tranSMART/TriNetX) in representing the structured query as a combination of conjunctive normal form (CNF) and disjunctive normal form (DNF). Every criterion is embedded into the logic layer in a CNF for inclusion criteria and a DNF for exclusion criteria (Figure 1). Inclusion and exclusion criteria are then logically combined via an AND NOT operator, that is, by subtracting the result of the exclusion criteria from the result of the inclusion criteria. Every feasibility query also receives a syntax version number and an additional description. The syntax version makes it possible to distinguish the current version from future versions and changes, and the description allows the query to carry additional human-readable information.

**Figure 1.** Structured query syntax top-level elements and logic layer. Certain criterion types will imply additional intrinsic logical relations. See ValueSet criteria and attribute filters and time restrictions.
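The combination of CNF, DNF, and AND NOT can be made concrete with plain set operations. The sketch below assumes that each criterion has already been resolved to the set of matching patient identifiers; the criterion names, the `matches` lookup, and the `evaluate` function are hypothetical and only illustrate the logic, not how any of the translators execute queries.

```python
def evaluate(inclusion_cnf, exclusion_dnf, matches, all_patients):
    """Combine per-criterion patient sets into a cohort.

    inclusion_cnf: list of OR-groups; the groups are ANDed together (CNF).
    exclusion_dnf: list of AND-groups; the groups are ORed together (DNF).
    matches: dict mapping a criterion name to the set of matching patient ids.
    """
    included = set(all_patients)
    for or_group in inclusion_cnf:
        # A patient satisfies an OR-group if any criterion in it matches.
        included &= set().union(*(matches[c] for c in or_group))

    excluded = set()
    for and_group in exclusion_dnf:
        # A patient satisfies an AND-group only if every criterion matches.
        group = set(all_patients)
        for criterion in and_group:
            group &= matches[criterion]
        excluded |= group

    # Inclusion AND NOT exclusion.
    return included - excluded


all_patients = {1, 2, 3, 4, 5}
matches = {
    "diabetes": {1, 2, 3},
    "hypertension": {2, 3, 4},
    "hba1c_high": {1, 2},
    "pregnancy": {2},
    "dialysis": {3, 5},
    "transplant": {3},
}
# (diabetes OR hypertension) AND hba1c_high
inclusion = [["diabetes", "hypertension"], ["hba1c_high"]]
# pregnancy OR (dialysis AND transplant)
exclusion = [["pregnancy"], ["dialysis", "transplant"]]

cohort = evaluate(inclusion, exclusion, matches, all_patients)
feasibility_count = len(cohort)
```

For a feasibility query only `feasibility_count` would be reported, not the patient identifiers themselves.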

The Implicit Logic (Criteria Expansion)

Apart from the explicit logic layer across criteria, different types of criteria and their filters further impact the execution logic as follows.

ValueSet criteria (see Figure 2D) allow the selection of multiple values (concepts). In this case, the value selections are treated as OR choices. For example, gender = (male, female) expands to: (gender=male) OR (gender=female).

**Figure 2.** Different types of criteria definitions. (A) Simple conceptual criterion. (B) Numeric criterion with quantitative comparison. (C) Numeric criterion with range restriction. (D) ValueSet criterion.

Attribute filters for each criterion are additional filters that can be set for each criterion. All individual filters on a criterion are combined using AND. For example, a specimen of type “Tissue specimen” and body site “skin” only applies to specimens with the type of Tissue and the body site skin.

The same applies to time restrictions. In this example, the time restriction “between 2020-01-01 and 2021-01-01” is likewise combined with the type and body site filters using AND.
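A minimal sketch of these implicit rules follows: the selected concepts within one filter are alternatives (OR), while all filters and the time restriction must hold together (AND). The flat record layout, the `attribute` and `selectedConcepts` keys, and the inclusive date bounds are assumptions for illustration; only attributeFilters, timeRestriction, afterDate, and beforeDate are names taken from Table 1.

```python
from datetime import date


def record_matches(record: dict, criterion: dict) -> bool:
    """Return True if a single record satisfies every filter of the criterion."""
    for flt in criterion.get("attributeFilters", []):
        # Within one filter, any selected concept may match (implicit OR).
        if record.get(flt["attribute"]) not in flt["selectedConcepts"]:
            return False  # Across filters, all must hold (implicit AND).

    restriction = criterion.get("timeRestriction")
    if restriction is not None:
        after = date.fromisoformat(restriction["afterDate"])
        before = date.fromisoformat(restriction["beforeDate"])
        # Inclusive bounds are an assumption of this sketch.
        if not (after <= record["date"] <= before):
            return False
    return True


specimen_criterion = {
    "attributeFilters": [
        {"attribute": "type", "selectedConcepts": ["tissue"]},
        {"attribute": "bodySite", "selectedConcepts": ["skin", "scalp"]},
    ],
    "timeRestriction": {"afterDate": "2020-01-01", "beforeDate": "2021-01-01"},
}

skin_2020 = {"type": "tissue", "bodySite": "skin", "date": date(2020, 6, 1)}
liver_2020 = {"type": "tissue", "bodySite": "liver", "date": date(2020, 6, 1)}
skin_2022 = {"type": "tissue", "bodySite": "skin", "date": date(2022, 3, 1)}
```

Only `skin_2020` satisfies the type, body site, and time restriction at once; the other two records each fail one of the ANDed conditions.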

Furthermore, there is an implicit OR expansion of criteria when the criterion-identifying code is a parent code of multiple child codes within a terminology hierarchy. For example, if a researcher adds the diagnosis of type 2 diabetes mellitus as a criterion (ICD-10 code=E11), it can be expanded to search all subtypes of type 2 diabetes mellitus (E11, E11.3, E11.31, E11.30, E11.1, E11.11, E11.0, E11.01, E11.7, E11.75, E11.74, E11.73, E11.72, E11.4, E11.41, E11.40, E11.8, E11.81, E11.80, E11.2, E11.21, E11.20, E11.5, E11.51, E11.50, E11.6, E11.61, E11.60, E11.9, E11.91, E11.90), which are combined using a logical OR operation.
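The hierarchy expansion can be sketched as a simple tree walk. The `CHILDREN` table below is a toy excerpt built from a few of the codes listed above, not the full ICD-10 hierarchy, and the function names are hypothetical.

```python
# Toy excerpt of a parent-to-children table; a real one would come from a
# terminology server or a generated ontology.
CHILDREN = {
    "E11": ["E11.0", "E11.2"],
    "E11.0": ["E11.01"],
    "E11.2": ["E11.20", "E11.21"],
}


def expand(code: str, children: dict) -> list:
    """Return the code followed by all of its descendants (depth first)."""
    result = [code]
    for child in children.get(code, []):
        result.extend(expand(child, children))
    return result


def has_any(patient_codes: set, criterion_code: str, children: dict) -> bool:
    """Implicit OR: the criterion matches if any expanded code is present."""
    return not patient_codes.isdisjoint(expand(criterion_code, children))


expanded = expand("E11", CHILDREN)
```

With this table, a patient coded only with E11.21 still matches a criterion defined on the parent code E11.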

Context-Dependent Criteria

In some cases, a criterion cannot be uniquely defined by its term code within a terminology, making it impossible to map a criterion for execution. One example of this is the use of ICD-10 condition codes for causes of death, specimen-specific conditions, or the general condition of a patient.

In modern terminologies like SNOMED-CT, this can be resolved using postcoordination, where a combined code, which carries the context, is created. For example, the expression 419620001|Death|:42752001|Due to|=22298006|Myocardial infarction| is in line with the SNOMED Compositional Grammar [Drenkhahn C, Ohlsen T, Wiedekopf J, Ingenerf J. WASP—A web application to support syntactically and semantically correct SNOMED CT postcoordination. Appl Sci (Basel). May 16, 2023;13(10):6114. [CrossRef]28]; however, a template to express this is not currently part of the SNOMED-CT implementation.

The syntax we developed here allows for postcoordinated codes; however, we allow for an additional “context” attribute for some use cases where postcoordination is unsuitable. The context attribute is modeled after our termCode attribute and provides an extra term code to identify the context. Figure 3 provides an example for myocardial infarction as condition and as cause of death.

**Figure 3.** Myocardial infarction in 2 contexts (condition and cause of death).
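To illustrate why the context attribute matters for execution, the sketch below uses the pair of context and termCode as a lookup key, so that the same SNOMED-CT code resolves to different targets. The context code system, the context codes, and the target labels are hypothetical placeholders; only the context and termCodes attribute names and the code 22298006 come from the text.

```python
SNOMED = "http://snomed.info/sct"
DEMO_CONTEXT = "http://example.org/context"  # hypothetical code system


def lookup_key(criterion: dict) -> tuple:
    """Build a key from context and first termCode; the termCode alone is ambiguous."""
    context = criterion["context"]
    term = criterion["termCodes"][0]
    return (context["system"], context["code"], term["system"], term["code"])


mi_term = {"system": SNOMED, "code": "22298006", "display": "Myocardial infarction"}

mi_as_condition = {
    "context": {"system": DEMO_CONTEXT, "code": "condition"},
    "termCodes": [mi_term],
}
mi_as_cause_of_death = {
    "context": {"system": DEMO_CONTEXT, "code": "cause-of-death"},
    "termCodes": [mi_term],
}

# Hypothetical mapping from (context, termCode) to where the data should be searched.
TARGETS = {
    lookup_key(mi_as_condition): "diagnosis records",
    lookup_key(mi_as_cause_of_death): "cause-of-death records",
}
```

Both criteria share the same termCode, yet they produce distinct keys and therefore distinct mapping targets.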

Data Availability

As a technical solution to define the structure of the CCDL, we chose JSON Schema and made the schema definition publicly available [Release v100 · medizininformatik-initiative/clinical-cohort-definition-language. URL: https://github.com/medizininformatik-initiative/clinical-cohort-definition-language/releases/tag/v1.0.0 [Accessed 2024-03-18] 29]. The schema serves as implementation guidance and supports validation; the git repository also contains documentation examples, test data, and the capabilities to create matching test queries.

Requirement Verification

An analysis was performed based on the structure defined in the JSON schema to evaluate the developed intermediate query syntax. The following table (Table 1) presents the detailed results of this analysis:

This syntax efficiently meets a wide range of expressiveness and interoperability requirements, demonstrating capabilities in defining complex medical queries with standard terminologies and logical operators.

Table 1. CCDL^a components and their purpose regarding the expressiveness requirements.

| Component | Key properties | Purpose and function | Requirements met |
| --- | --- | --- | --- |
| inclusionCriteria | CNF^b without negation | Conjunction of criteria with logical operators. | Expressive query formulation, Boolean logic |
| exclusionCriteria | DNF^c without negation | Allows negation of criteria for comprehensive exclusion. | Negation of criteria on a group level, Boolean logic |
| termCode | code, system, version, display | Identifies concepts using standard coding systems. | Standard medical terminology, uniqueness |
| criterion | context, termCodes, valueFilter, attributeFilters, timeRestriction | Sets criteria with defined context, using term codes and filters. | Expressiveness of simple and complex eligibility criteria |
| timeRestriction | afterDate, beforeDate | Specifies time frame for criteria fulfillment. | Time restrictions |
| unit | code, display | Standardized unit definition, adhering to UCUM^d units. | Use of standardized units |
| valueFilter | type (concept, quantity-comparator, etc) | Varied filtering types for flexible data querying. | Numeric restriction, concept restrictions |
| attributeFilters | type (concept, quantity-comparator, reference) | Mechanism for detailed filtering at the attribute level. | Detailed filtering, clinical relations |

^aCCDL: Clinical Cohort Definition Language.

^bCNF: conjunctive normal form.

^cDNF: disjunctive normal form.

^dUCUM: Unified Code for Units of Measure.

Evaluation and Use of the CCDL in Real-World Scenarios

We believe the potential of the CCDL extends beyond its application in the federated feasibility portal of the German Research Portal for Health. Nevertheless, the CCDL remains a crucial technical solution within the FDPG’s feasibility portal.

We created a user interface for the feasibility queries in the FDPG (Figure 4), which generates the CCDL, demonstrating how it lends itself well to building feasibility query user interfaces [medizininformatik-initiative/feasibility-deploy. Medizininformatik-Initiative. 2024. URL: https://github.com/medizininformatik-initiative/feasibility-deploy [Accessed 2024-06-16] 30,medizininformatik-initiative/feasibility-gui. Medizininformatik-Initiative. 2024. URL: https://github.com/medizininformatik-initiative/feasibility-gui [Accessed 2024-06-16] 31]. The CCDL especially supports this as its design follows the typical way of feasibility query creation as seen in platforms such as the FDPG, i2b2, OMOP, and TriNetX. We evaluated the usability of the user interface across multiple projects [Sedlmayr B, Sedlmayr M, Kroll B, Prokosch HU, Gruendner J, Schüttler C. Improving covid-19 research of university hospitals in Germany: formative usability evaluation of the codex feasibility portal. Appl Clin Inform. Mar 2022;13(2):400-409. [CrossRef] [Medline]32,Schüttler C, Zerlik M, Gruendner J, et al. Empowering researchers to query medical data and biospecimens by ensuring appropriate usability of a feasibility tool: evaluation study. JMIR Hum Factors. 2023;10:e43782. [CrossRef]33] and embedded it in a German-wide distributed research infrastructure [Gruendner J, Deppenwiese N, Folz M, et al. The architecture of a feasibility query portal for distributed COVID-19 Fast Healthcare Interoperability Resources (FHIR) patient data repositories: design and implementation study. JMIR Med Inform. May 25, 2022;10(5):e36709. [CrossRef] [Medline]10,Prokosch HU, Gebhardt M, Gruendner J, et al. Towards a national portal for medical research data (FDPG): vision, status, and lessons learned. Stud Health Technol Inform. May 18, 2023;302:307-311. [CrossRef] [Medline]34].
These evaluations highlighted the applicability of the CCDL to a feasibility query and that the usability issues found were not due to a lack of expressiveness of the CCDL. We further used the Synthea data set to test the CCDL against [flare/.github/integration-test at main. medizininformatik-initiative/flare. GitHub URL: https://github.com/medizininformatik-initiative/flare/tree/main/.github/integration-test [Accessed 2024-06-16] 35], demonstrated the ability of the CCDL to represent a wide range of criteria [Rosenau L, Majeed RW, Ingenerf J, et al. Generation of a Fast Healthcare Interoperability Resources (FHIR)-based ontology for federated feasibility queries in the context of COVID-19: feasibility study. JMIR Med Inform. Apr 27, 2022;10(4):e35789. [CrossRef] [Medline]36], and showed that it could be fully translated to FHIR Search [medizininformatik-initiative/flare: Feasibility Analysis Request Executor. URL: https://github.com/medizininformatik-initiative/flare [Accessed 2024-01-04] 37], CQL [medizininformatik-initiative/sq2cql. URL: https://github.com/medizininformatik-initiative/sq2cql [Accessed 2024-01-04] 38], and AQL [Rosenau L, Ingenerf J. Structured queries to AQL: querying openEHR data leveraging a FHIR-based infrastructure for federated feasibility queries. In: MEDINFO 2023 — The Future Is Accessible. IOS Press; 2024:33-37. [CrossRef]39]. At the time of writing, almost 9000 CCDLs have been created and executed across Germany.

**Figure 4.** Example of a feasibility query in the central German Portal for Health Data (FDPG) feasibility portal to find patients with a leukocyte count within a normal range, with a malignant neoplasm of the brain, available tumor tissue specimen, and a CT scan after January 1, 2020, who did not take doxorubicin.
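To give a feel for what a translation into a repository-specific language involves, the following sketch turns a single numeric criterion into a FHIR Search query string using the standard Observation search parameters code, value-quantity, and date. It is not the implementation used by the translation components cited above. The criterion keys comparator and value, the assumption that comparators are already FHIR prefixes such as lt, and the fixed Observation resource type are illustrative simplifications, and the query string is left without URL encoding for readability.

```python
UCUM_SYSTEM = "http://unitsofmeasure.org"


def to_fhir_search(criterion: dict, resource_type: str = "Observation") -> str:
    """Translate one criterion into a FHIR Search query string (sketch only)."""
    term = criterion["termCodes"][0]
    params = [("code", f'{term["system"]}|{term["code"]}')]

    value_filter = criterion.get("valueFilter")
    if value_filter and value_filter["type"] == "quantity-comparator":
        # FHIR quantity search: [prefix][number]|[system]|[code]
        quantity = (f'{value_filter["comparator"]}{value_filter["value"]}'
                    f'|{UCUM_SYSTEM}|{value_filter["unit"]["code"]}')
        params.append(("value-quantity", quantity))

    restriction = criterion.get("timeRestriction", {})
    if "afterDate" in restriction:
        params.append(("date", f'ge{restriction["afterDate"]}'))
    if "beforeDate" in restriction:
        params.append(("date", f'le{restriction["beforeDate"]}'))

    return resource_type + "?" + "&".join(f"{key}={value}" for key, value in params)


low_hemoglobin = {
    "termCodes": [{"system": "http://loinc.org", "code": "718-7"}],
    "valueFilter": {"type": "quantity-comparator", "comparator": "lt",
                    "value": 12, "unit": {"code": "g/dL"}},
    "timeRestriction": {"afterDate": "2020-01-01"},
}

query = to_fhir_search(low_hemoglobin)
```

Because FHIR Search has no general way to combine arbitrary criteria in one request, a real translator would typically issue one such query per criterion and combine the resulting patient sets according to the CNF and DNF structure.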

Principal Findings

We presented an intermediate feasibility query syntax that separates concerns between the user interface and the execution of a feasibility query on different research repositories and their specific query languages. The syntax defined here fulfills all the interoperability and accessibility requirements while supporting a broad range of expressiveness requirements we identified by analyzing existing query tools. The solution is fully compatible with established medical terminology standards, notation of parameters, and restriction semantics.

The solution we describe here is compatible with the query logic established by i2b2 and, therefore, also with tranSMART and TriNetX. This means that i2b2 and similar tools could be extended to produce our syntax with little effort.

The CCDL was further used as part of a larger infrastructure for feasibility queries in Germany and currently serves as the interface for feasibility queries within the German research portal for health, supporting feasibility queries across 39 university hospitals in Germany. In the current implementation, we successfully created translation components for FHIR Search and CQL. Current research also indicates that the CCDL can be adapted to the aggregation API of Pathling [40] and to SQL. The criteria content and the data model-dependent information that must be reintroduced during translation are obtained from an automatically generated search ontology [36].

Related Work

While the expression of eligibility criteria within a specific data model context is well established and adequately discussed in this work, research on a data-agnostic intermediate format for computable eligibility has been sparse in recent years.

Alper et al [41] closely align with our approach of representing eligibility criteria in a structured format, namely the FHIR EvidenceVariable, which currently does not directly support the representation of eligibility criteria but may be refined to do so. Presumably, a FHIR representation would provide a structure beyond the realm of the MII, which could add significant value and improve syntax interoperability. However, in the early stages, the challenges of adopting new solutions could have impeded the development presented here. Our ongoing communication with the HL7 working group, which focuses on Research Studies, gives us confidence that once a suitable FHIR Resource is established or adapted to meet the needs outlined in our publication, the established technical components could be efficiently modified to align with these changes. Parallels can be drawn to implementing structured eligibility criteria, as presented by Yuan et al [42] and Fang et al [43]. Their publications present a semiautomated approach to generate feasibility queries based on free text study protocols from ClinicalTrials.gov [23]. Their system is built around the OHDSI data model and uses its concept IDs.
After converting the free text criteria, they allow users to edit and download an intermediate representation in JSON format. Unfortunately, neither Yuan et al [42] nor Fang et al [43] give clear implementation guidelines on the format. However, recurring themes include differentiating inclusion and exclusion criteria and defining temporal constraints. To our knowledge, and in contrast to our approach, they do not allow for further restrictions beyond the value constraint on specific criteria.

Limitations

The separation of concerns that the CCDL provides also creates the need for a mapping that identifies how to translate the CCDL information model to the local information model and terminology. The mapping links a criterion as identified in the CCDL to the specific data model. For example, for FHIR Search, the mapping for a condition criterion identified by the ICD-10 code C50.0 would provide the information that the condition is found at the endpoint “/Condition” and that the search parameter for the term code is “code,” leading to the translated FHIR Search URL:

[fhir-base-url]/Condition?code=http://fhir.de/CodeSystem/bfarm/icd-10-gm|C50.0
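
The mapping lookup described above can be illustrated with a minimal sketch. It is not the format used by the actual translation components: the `TermCode` and `FhirSearchMapping` classes, the `MAPPINGS` table keyed by code system, and the function name are assumptions made for illustration. Only the endpoint, the search parameter, the code system URL, and the code are taken from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermCode:
    """A criterion's term code as it might appear in a CCDL query (hypothetical shape)."""
    system: str
    code: str


@dataclass(frozen=True)
class FhirSearchMapping:
    """Data model-dependent information the mapping has to supply (hypothetical shape)."""
    endpoint: str
    code_search_parameter: str


# Hypothetical mapping table: one entry per code system for brevity.
MAPPINGS = {
    "http://fhir.de/CodeSystem/bfarm/icd-10-gm": FhirSearchMapping(
        endpoint="Condition",
        code_search_parameter="code",
    ),
}


def to_fhir_search_url(base_url: str, term_code: TermCode) -> str:
    """Combine the data model-agnostic term code with its mapping entry."""
    mapping = MAPPINGS[term_code.system]
    # FHIR token search values have the form "system|code".
    token = f"{term_code.system}|{term_code.code}"
    return f"{base_url}/{mapping.endpoint}?{mapping.code_search_parameter}={token}"


url = to_fhir_search_url(
    "[fhir-base-url]",
    TermCode("http://fhir.de/CodeSystem/bfarm/icd-10-gm", "C50.0"),
)
print(url)
```

The sketch omits percent-encoding of the query value so that its output matches the URL shown above; a real client would normally encode it before sending the request.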

Further, additional information about the terminology is necessary to allow the selection of criteria within a terminology hierarchy, where the criterion resolves to multiple child criteria. This in turn requires the query executor and the CCDL-generating user interface to agree on the available criteria and terminology entries.
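
Resolving a selected criterion into its child criteria can be sketched as a simple tree traversal. This is a minimal sketch under stated assumptions: the `TOY_HIERARCHY` dictionary is a hypothetical two-entry excerpt rather than the real terminology, the function name is invented for illustration, and including the selected parent code in the result is an assumption about the desired semantics.

```python
from typing import Dict, List

# Hypothetical excerpt of a terminology hierarchy: parent code -> direct children.
TOY_HIERARCHY: Dict[str, List[str]] = {
    "C50": ["C50.0", "C50.1"],
}


def expand_criterion(code: str, children: Dict[str, List[str]]) -> List[str]:
    """Return the selected code followed by all of its descendants (pre-order)."""
    resolved: List[str] = []
    stack = [code]
    while stack:
        current = stack.pop()
        resolved.append(current)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(children.get(current, [])))
    return resolved


expanded = expand_criterion("C50", TOY_HIERARCHY)
print(expanded)
```

Both the user interface and the query executor would need to use the same hierarchy for such an expansion to return consistent results, which is the agreement requirement stated above.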

One common requirement currently not supported by the CCDL is temporal interdependencies between different criteria. Therefore, queries such as a specific laboratory value within a certain period relative to a diagnosis cannot currently be expressed using the CCDL.

We deliberately decided to delay the implementation of this extension as time dependencies significantly increase the complexity and performance requirements of any query execution.

The data model-agnostic nature of the CCDL is inherently valuable. However, realizing its full potential, namely its use across different health care data models, requires more than technical translation. For cross-model query capability, the existence of the concepts in all target data models must be ensured.

Future Work

The CCDL described here provides a good basis for making feasibility queries possible across various research repositories and for closing the gap between the different research repositories and their access. We have demonstrated the applicability of the CCDL to FHIR Search, CQL, and AQL; however, more repositories and other query languages, such as SQL on FHIR, OHDSI OMOP, or i2b2, might be added in the future. Further, separating the query syntax from its execution would theoretically allow one to query different internationally distributed repositories, such as FHIR, OMOP-CDM, and i2b2, simultaneously. Additionally, the CCDL is currently limited in how much it can express, and new capabilities will be added in the future. In making the CCDL more expressive, any extension must be weighed against the added complexity and overhead it introduces.

Conclusion

We presented a query syntax for medical feasibility queries, which creates an abstraction layer between the user interface and the execution query language. We showed that it is flexible enough to be translated into different query languages and can be used to express various complex feasibility queries. The applicability of the query syntax was further demonstrated by embedding it into a large research project where it is used to query data on multiple millions of patients across 39 German university hospitals. The CCDL for feasibility queries will be extended in the future to allow more features, and we are currently working on a modified version for data selection and extraction.

Acknowledgments

The project was funded by the German Federal Ministry of Education and Research (BMBF) under the FDPG-PLUS Project (grants 01ZZ2309A, 01ZZ2309C, 01ZZ2309D, 01ZZ2309E, and 01ZZ2309F).

Conflicts of Interest

None declared.

Multimedia Appendix 1

Example of specimen criteria with an ICD-O-3 (International Classification of Diseases for Oncology, 3rd Edition) attribute indicating the location the specimen was taken from.

PNG File, 62 KB

References

1. Pfaff ER, Girvin AT, Gabriel DL, et al. Synergies between centralized and federated approaches to data quality: a report from the national covid cohort collaborative. J Am Med Inform Assoc. Mar 15, 2022;29(4):609-618.
2. Prayitno, Shyu C-R, Putra KT, et al. A systematic review of federated learning in the healthcare area: from the perspective of data properties and applications. Appl Sci (Basel). Nov 25, 2021;11(23):11191.
3. Sebire NJ, Cake C, Morris AD. HDR UK supporting mobilising computable biomedical knowledge in the UK. BMJ Health Care Inform. Jul 2020;27(2):e100122.
4. Morrato EH, Lennox LA, Sendro ER, et al. Scale-up of the Accrual to Clinical Trials (ACT) network across the clinical and translational science award consortium: a mixed-methods evaluation of the first 18 months. J Clin Trans Sci. Dec 2020;4(6):515-528.
5. Litton JE. Launch of an infrastructure for health research: BBMRI-ERIC. Biopreserv Biobank. Jun 2018;16(3):233-241.
6. AKTIN and SPoCK Research Group, Bienzeisler J, Triefenbach L, et al. A federated and distributed data management infrastructure to enable public health surveillance from intensive care unit data. In: Séroussi B, Weber P, Dhombres F, Grouin C, Liebe JD, Pelayo S, et al, editors. Studies in Health Technology and Informatics. IOS Press; 2022.
7. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc. 2014;21(4):578-582.
8. Lawrence AK, Selter L, Frey U. SPHN - the Swiss personalized health network initiative. Stud Health Technol Inform. Jun 16, 2020;270:1156-1160.
9. Semler SC, Wissing F, Heyder R. German medical informatics initiative. Methods Inf Med. May 2018;57(S 01):e50-e56.
10. Gruendner J, Deppenwiese N, Folz M, et al. The architecture of a feasibility query portal for distributed COVID-19 Fast Healthcare Interoperability Resources (FHIR) patient data repositories: design and implementation study. JMIR Med Inform. May 25, 2022;10(5):e36709.
11. Benson T, Grieve G. Principles of Health Interoperability: SNOMED CT, HL7 and FHIR. Springer International Publishing; 2016. ISBN: 978-3-319-30368-0
12. Stang PE, Ryan PB, Racoosin JA, et al. Advancing the science for active surveillance: rationale and design for the observational medical outcomes partnership. Ann Intern Med. Nov 2, 2010;153(9):600-606.
13. Murphy SN, Weber G, Mendis M, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Inform Assoc. 2010;17(2):124-130.
14. Kalra D, Beale T, Heard S. The openEHR foundation. Stud Health Technol Inform. 2005;115:153-173.
15. Weber GM, Murphy SN, McMurry AJ, et al. The Shared Health Research Information Network (SHRINE): a prototype federated query tool for clinical data repositories. J Am Med Inform Assoc. 2009;16(5):624-630.
16. Topaloglu U, Palchuk MB. Using a federated network of real-world data to optimize clinical trials operations. JCO Clin Cancer Inform. Dec 2018;2:1-10.
17. Scheufele E, Aronzon D, Coopersmith R, et al. tranSMART: an open source knowledge management and high content data analytics platform. AMIA Jt Summits Transl Sci Proc. 2014;2014:96-101.
18. Lablans M, Kadioglu D, Mate S, Leb I, Prokosch HU, Ückert F. Strategien zur Vernetzung von Biobanken: Klassifizierung verschiedener Ansätze zur Probensuche und Ausblick auf die Zukunft in der BBMRI-ERIC. Bundesgesundheitsbl. Mar 2016;59(3):373-378.
19. Schüttler C, Prokosch HU, Hummel M, et al. The journey to establishing an IT-infrastructure within the German Biobank Alliance. PLoS ONE. 2021;16(9):e0257632.
20. Schüttler C, Huth V, von Jagwitz-Biegnitz M, Lablans M, Prokosch HU, Griebel L. A federated online search tool for biospecimens (sample locator): usability study. J Med Internet Res. Aug 18, 2020;22(8):e17739.
21. ATLAS. GitHub. URL: https://github.com/OHDSI/Atlas/wiki/Home [Accessed 2024-01-02]
22. Hoffmann J, Hanß S, Kraus M, et al. The DZHK research platform: maximisation of scientific value by enabling access to health data and biological samples collected in cardiovascular clinical studies. Clin Res Cardiol. Jul 2023;112(7):923-941.
23. Home. ClinicalTrials.gov. URL: https://clinicaltrials.gov/ [Accessed 2024-03-11]
24. Ross J, Tu S, Carini S, Sim I. Analysis of eligibility criteria complexity in clinical trials. Summit Transl Bioinform. Mar 1, 2010;2010:46-50.
25. Gulden C, Mate S, Prokosch HU, Kraus S. Investigating the capabilities of FHIR search for clinical trial phenotyping. In: German Medical Data Sciences: A Learning Healthcare System. IOS Press; 2018.
26. Schüttler C, Prokosch H-U, Sedlmayr M, Sedlmayr B. Evaluation of three feasibility tools for identifying patient data and biospecimen availability: comparative usability study. JMIR Med Inform. Jul 21, 2021;9(7):e25531.
27. Walonoski J, Kramer M, Nichols J, et al. Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. Mar 1, 2018;25(3):230-238.
28. Drenkhahn C, Ohlsen T, Wiedekopf J, Ingenerf J. WASP—A web application to support syntactically and semantically correct SNOMED CT postcoordination. Appl Sci (Basel). May 16, 2023;13(10):6114.
29. Release v1.0.0 · medizininformatik-initiative/clinical-cohort-definition-language. URL: https://github.com/medizininformatik-initiative/clinical-cohort-definition-language/releases/tag/v1.0.0 [Accessed 2024-03-18]
30. medizininformatik-initiative/feasibility-deploy. Medizininformatik-Initiative. 2024. URL: https://github.com/medizininformatik-initiative/feasibility-deploy [Accessed 2024-06-16]
31. medizininformatik-initiative/feasibility-gui. Medizininformatik-Initiative. 2024. URL: https://github.com/medizininformatik-initiative/feasibility-gui [Accessed 2024-06-16]
32. Sedlmayr B, Sedlmayr M, Kroll B, Prokosch HU, Gruendner J, Schüttler C. Improving covid-19 research of university hospitals in Germany: formative usability evaluation of the codex feasibility portal. Appl Clin Inform. Mar 2022;13(2):400-409.
33. Schüttler C, Zerlik M, Gruendner J, et al. Empowering researchers to query medical data and biospecimens by ensuring appropriate usability of a feasibility tool: evaluation study. JMIR Hum Factors. 2023;10:e43782.
34. Prokosch HU, Gebhardt M, Gruendner J, et al. Towards a national portal for medical research data (FDPG): vision, status, and lessons learned. Stud Health Technol Inform. May 18, 2023;302:307-311.
35. flare/.github/integration-test at main. medizininformatik-initiative/flare. GitHub. URL: https://github.com/medizininformatik-initiative/flare/tree/main/.github/integration-test [Accessed 2024-06-16]
36. Rosenau L, Majeed RW, Ingenerf J, et al. Generation of a Fast Healthcare Interoperability Resources (FHIR)-based ontology for federated feasibility queries in the context of COVID-19: feasibility study. JMIR Med Inform. Apr 27, 2022;10(4):e35789.
37. medizininformatik-initiative/flare: Feasibility Analysis Request Executor. URL: https://github.com/medizininformatik-initiative/flare [Accessed 2024-01-04]
38. medizininformatik-initiative/sq2cql. URL: https://github.com/medizininformatik-initiative/sq2cql [Accessed 2024-01-04]
39. Rosenau L, Ingenerf J. Structured queries to AQL: querying openEHR data leveraging a FHIR-based infrastructure for federated feasibility queries. In: MEDINFO 2023 — The Future Is Accessible. IOS Press; 2024:33-37.
40. Grimes J, Szul P, Metke-Jimenez A, Lawley M, Loi K. Pathling: analytics on FHIR. J Biomed Semant. Sep 8, 2022;13(1):23.
41. Alper BS, Dehnbostel J, Shahin K, Ojha N, Khanna G, Tignanelli CJ. Striking a match between FHIR-based patient data and FHIR-based eligibility criteria. Learn Health Syst. Oct 2023;7(4):e10368.
42. Yuan C, Ryan PB, Ta C, et al. Criteria2Query: a natural language interface to clinical databases for cohort definition. J Am Med Inform Assoc. Apr 1, 2019;26(4):294-305.
43. Fang Y, Idnay B, Sun Y, et al. Combining human and machine intelligence for clinical trial eligibility querying. J Am Med Inform Assoc. Jun 14, 2022;29(7):1161-1171.

Abbreviations

API: application programming interface

AQL: Archetype Query Language

CCDL: Clinical Cohort Definition Language

CDR: Clinical Data Repository

CNF: conjunctive normal form

CQL: Clinical Quality Language

DNF: disjunctive normal form

FDPG: central German Portal for Health Data

FHIR: Fast Healthcare Interoperability Resources

HL7: Health Level Seven International

ICD-10: International Statistical Classification of Diseases, Tenth Revision

LOINC: Logical Observation Identifiers Names and Codes

MII: Medical Informatics Initiative

SNOMED-CT: Systematized Nomenclature of Medicine–Clinical Terms

SQL: Structured Query Language

Edited by Christian Lovis; submitted 18.03.24; peer-reviewed by Brian Alper, P Horki; final revised version received 16.06.24; accepted 23.06.24; published 14.10.24.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
