Original Paper
Abstract
Background: Data element repositories facilitate high-quality medical data sharing by standardizing data and enhancing semantic interoperability. However, the application of repositories is confined to specific projects and institutions.
Objective: This study aims to explore potential issues and promote broader application of data element repositories within the medical field by evaluating and analyzing typical repositories.
Methods: Following the inclusion of 5 data element repositories through a literature review, a novel analysis framework consisting of 7 dimensions and 36 secondary indicators was constructed and used for evaluation and analysis.
Results: The study’s results delineate the unique characteristics of different repositories and uncover specific issues in their construction. These issues include the absence of data reuse protocols and insufficient information regarding the application scenarios and efficacy of data elements. The repositories fully comply with only 45% (9/20) of the subprinciples for Findable and Reusable in the FAIR principle, while achieving a 90% (19/20 subprinciples) compliance rate for Accessible and 67% (10/15 subprinciples) for Interoperable.
Conclusions: The recommendations proposed in this study address the issues to improve the construction and application of repositories, offering valuable insights to data managers, computer experts, and other pertinent stakeholders.
doi:10.2196/60293
Keywords
Introduction
Background
The sharing of medical data can enhance the efficiency of medical research, bolster transparency within the field of medicine, and respond to the stringent demands for research reproducibility and data openness [ ]. Nonetheless, medical data present challenges due to their high semantic complexity and heterogeneity, and they lack standards and uniform specifications at the level of fields and value domains. For instance, the numeric value "18" could represent the age at which a patient started smoking in one study, while in another, it might signify the total number of years a person has been smoking. Such semantic ambiguity makes the data challenging for other researchers to comprehend and use, and it impedes the integration, comparison, and joint analysis of different data sets [ ], thereby obstructing data sharing.

Metadata, essentially data about data, offers a solution to such issues. Metadata can describe data, providing researchers with a comprehensive overview to aid understanding and application; it also supports more precise retrieval and traceability. When data are accurately associated with metadata (such as "18" being linked to an individual's total years of smoking), their semantics become much more straightforward. Metadata has already found applications in various fields, including molecular biology [ , ] and clinical medicine [ , ]. Guidelines for data management and sharing, such as the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, also provide specifications for metadata to ensure that data are Findable, Accessible, Interoperable, and Reusable [ ]. However, researchers often find that creating and annotating metadata are time-consuming and error-prone [ ], which makes metadata quality hard to ensure and increases metadata heterogeneity across studies. Hence, using standardized metadata for data collection to achieve semantic consistency from the inception of the data life cycle is essential to maximize semantic interoperability across multiple data sources.

Data elements (DEs) are vital components of metadata, representing indivisible data units within a given context. The underlying framework of DEs can furnish rich metadata information, including unique identifiers, definitions, and value domains, among other attributes. A DE repository is a platform, structured in accordance with a standardized framework, dedicated to the construction, storage, administration, and dissemination of DEs. Within such a repository, DEs adhere to rigorous standardization, with their conceptual aspects, value ranges, and related attributes systematically linked to controlled vocabularies and other terminological systems. A DE repository facilitates the unified management and maintenance of internal metadata, ensuring semantic consistency and reducing the cost of redundant, project-specific metadata design. By fostering the reuse and sharing of standardized DEs, barriers to data integration are diminished, propelling applications such as cross-institutional and cross-study meta-analyses of medical data [ ]. This, in turn, unlocks the value of medical data.

Currently, the prevailing international standards for DEs and repository construction are set by the ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) 11179 standard, which establishes a conceptual model for DEs and their repositories while also regulating activities such as DE registration and management. Many DE repositories in the medical field have been constructed based on the ISO/IEC 11179 standard. However, broader application of DE repositories has not yet been achieved; they are often limited to specific projects or internal use within particular institutions [ ]. As the central platform for storing, managing, and sharing DEs and metadata, the completeness of a repository's construction directly influences the practical usage of DEs. Current research tends to focus on the specific technical aspects and standards for constructing DE repositories, and there is a discernible deficiency in evaluating and analyzing typical repositories in the medical domain.

Literature Review
Medical DE
Data elements, defined and standardized by the ISO/IEC 11179 standard, constitute the smallest units for collecting, processing, and disseminating data [ ]. The definition of DEs should ideally encompass 3 aspects (research questions, data acquisition, and data storage) to best reflect the life cycle of a repository [ ]. DEs play a pivotal role in standardizing clinical data collection, enhancing data quality, facilitating secondary analysis and applications [ , ], and serving as a base for artificial intelligence (AI)-based systems [ ].

Currently, the development of DEs primarily relies on multidomain expert consensus and collaboration, often achieved through iterative Delphi methods for the discussion, identification, and refinement of relevant DEs [ ]. This approach ensures the professionalism of DEs within specific domains but demands considerable time and personnel. The National Institute of Neurological Disorders and Stroke (NINDS) categorizes the development of common data elements (CDEs) into 4 phases: discovery, internal validation, external validation, and distribution [ ]. Numerous domains or projects have undergone multiple iterations of DE development, such as the Stroke V2.0 CDEs [ ]. More granular domain-specific DEs have been developed or reached consensus, spanning therapeutic methods [ ], examinations [ ], and others. With the continuous expansion of DEs, Kim et al [ ] proposed a comprehensive representation of real-world clinical semantics by defining semantic relationships and constraints between DEs.

The application and evaluation of DEs have garnered considerable attention. For instance, Fitzgerald et al [ ] analyzed seizure burden in childhood epilepsies using clinical data collected from CDE-based forms within the electronic medical record. Evaluation studies encompass DE quality [ ] and the effectiveness of data collection: Chen et al [ ] assessed the data collection effectiveness of DEs in real-world scenarios, while Ryan et al [ ] separately evaluated data capture rates for DEs in in-person and virtual visit scenarios.

Recently, several studies have sought to apply AI technologies throughout the life cycle of DEs. Natural language processing can assist in extracting specific DEs from clinical documents [ - ]. Renner et al [ ] explored the use of artificial neural networks to semiautomatically map DE models to the BRIDG model, thereby reducing the burden of manual mapping by experts. In addition, DEs play a role in collecting high-quality data to aid in training machine learning algorithms, further expanding their applications in the health information domain [ ]. Littlefield et al [ ], based on data collected through DEs, compared the performance of major machine learning algorithms with traditional statistical methods.

DE Repository
DE repositories serve as platforms for storing and managing DEs, facilitating standardization and promoting the integration and sharing of medical data through both top-down and bottom-up approaches [ ]. The bottom-up approach relies on users creating and maintaining their own DEs. Hegselmann et al [ ] expanded upon this model by extracting real-world DEs from medical documents and standardizing them, thereby promoting the reusability of DEs. The DE repository can standardize metadata across various studies and institutions, facilitating data integration; Mallya et al [ ] harmonized variables across 4 research endeavors through effective use of a DE repository.

Another crucial function of the DE repository is to ensure internal semantic consistency, thereby enhancing the semantic interoperability of DEs. One perspective suggests that the maintenance and updating of terms should be separated from the repository's operational tasks [ ]. Schladetzky et al [ ] developed the Mettertron system to enhance the linkage between the DE repository and the terminology system, simplifying terminology maintenance services. Meanwhile, mapping the repository model to the Web Ontology Language (OWL) ontology model can expand its semantic applications; Yuan and Li [ ] constructed a semantic relation metamodel for the repository and defined mapping rules to the ontology model.

Recent research has also addressed data quality assessment based on the DE repository. For instance, Juárez et al [ ] attempted to validate local data repositories against the central DE repository of a network, thereby providing a comparative method for assessing data quality across different sites. Kapsner et al [ ] centralized the maintenance of data quality checks by associating data quality assessment tools with DE definitions in the DE repository.

Related Works
Current research lacks a comprehensive evaluation and analysis of multiple typical medical DE repositories. Ulrich et al [ ] referenced information about specific metadata repositories when evaluating the application of the metadata exchange language QL4MDR. Hegselmann et al [ ] also provided a brief overview of repository practices based on the ISO/IEC 11179 standard in their study on Pragmatic MDR. Nonetheless, both studies stopped short of providing a detailed evaluation or analysis and did not propose an analytical framework or standard.

Sasse et al [ ] conducted a literature review on semantic annotation services for biomedical metadata, identifying 10 supporting tools and comparing them in detail against 7 criteria. While their comparative dimensions are unidimensional and more aligned with tools than with repositories, the variables in their semantic services provide a reference for the semantic dimensions of the analytical framework constructed in this study.

Stoehr et al [ ] assessed the portal usability of the CoMetaR repository. They divided the web page into different modules and used the Think Aloud method along with a usability scale, conducting a combined quantitative and qualitative evaluation. Their module-based usability assessment informs the usability evaluation dimensions of this study, although their focus is on optimizing the web page's interaction rather than comparing it with the web pages of other repositories. Reichenpfader et al [ ] similarly assessed the usability of the Portal of Medical Data Models (MDM-Portal) by analyzing users' experience with the web page through various tests. The dimensions they analyzed also inform the usability evaluation in this study.

Objectives
The primary objective of this study is to explore potential issues and promote the broader application of DE repositories within the medical field by evaluating and analyzing typical repositories. Furthermore, we also endeavor to address the gap in the existing literature concerning the lack of evaluation of DE repositories, offering an overview of the typical DE repository construction in the medical field.
Methods
The method used in this study for screening medical DE repositories involves three distinct steps: (1) literature review, (2) repository curation, and (3) repository identification ( ).

Literature Review
This study conducted literature searches on PubMed, Web of Science, and Scopus. The searches were performed using a combination of keywords such as “metadata,” “data element,” and “DE,” combined with “repository,” “registry,” “platform,” and “portal.” The language was restricted to English, and the research area was focused on life sciences or biomedicine. Up to April 2023, a total of 4119 papers were retrieved.
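A sketch of how the two keyword groups above combine into a Boolean search string (the exact query syntax for each database is not given in the paper, so this rendering is illustrative):

```python
# Illustrative only: compose the Boolean search string from the two keyword
# groups described in the text; actual PubMed/Web of Science/Scopus syntax
# may differ in quoting and field tags.
concept_terms = ["metadata", "data element", "DE"]
platform_terms = ["repository", "registry", "platform", "portal"]

def boolean_query(group_a, group_b):
    """Join each group with OR, then combine the two groups with AND."""
    a = " OR ".join(f'"{t}"' for t in group_a)
    b = " OR ".join(f'"{t}"' for t in group_b)
    return f"({a}) AND ({b})"

query = boolean_query(concept_terms, platform_terms)
```

The resulting string pairs every concept term with every platform term without enumerating all combinations by hand.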
Repository Curation
The retrieved literature was imported into EndNote, and an advanced search was conducted explicitly targeting titles or abstracts containing terms such as "metadata repository," "metadata registry," and "data element repository." After this secondary screening, a total of 192 papers were obtained. After reviewing the titles and abstracts of these papers, 98 papers related to DE repositories were identified and subsequently read in full. In the end, 11 DE repositories (shown in ) within the medical field were gathered. The information and data related to DE repositories were primarily collected from three sources: (1) the portals of various repositories, (2) relevant literature, and (3) project archives up to April 2023.

| Data element repositories | Country |
| Samply.MDR [ ] | Germany |
| MDM.Portal [ ] | Germany |
| CoMetaR [ ] | Germany |
| CentraXX MDR [ ] | Germany |
| CancerGrid (2005-2010) [ ] | United Kingdom |
| METEOR (METeOR) [ ] | Australia |
| Aristotle Metadata Registry [ ] | Australia |
| caDSR [ ] | United States |
| USUIK [ ] | United States |
| NIH CDE Repository [ ] | United States |
| CEDAR [ ] | United States |
Repository Identification
To facilitate a more effective comparison, we established inclusion and exclusion criteria for screening the 11 repositories. The specific inclusion and exclusion criteria and the process are as follows:
- C1: DE repositories should be open-access public platforms that are noncommercial or managed by nonprofit organizations (such as universities or research institutions).
- C2: The repository’s metadata or DE resources should comprise more than 20,000 records.
- C3: We required the repository to have a well-established, independent portal to support access.
Five DE repositories were ultimately included (
): Cancer Data Standards Registry and Repository (caDSR) [ ], NIH (National Institutes of Health) CDE Repository, MDM-Portal [ ], Metadata Online Registry (METEOR), and Center for Expanded Data Annotation and Retrieval (CEDAR) [ ].

| Repositories | Country | First release year | Hosted by |
| caDSRa | United States | 2003 | National Cancer Institute |
| NIHb CDEc Repository | United States | 2015 | National Library of Medicine |
| METEORd | Australia | 2022 | Australian Institute of Health and Welfare |
| MDMe | Germany | 2012 | Heidelberg University Hospital |
| CEDARf | United States | 2014 | Stanford University |
acaDSR: Cancer Data Standards Registry and Repository.
bNIH: National Institutes of Health.
cCDE: Common Data Element.
dMETEOR: Metadata Online Registry.
eMDM: Medical Data Model.
fCEDAR: Center for Expanded Data Annotation and Retrieval.
Analysis Framework
We aimed to comprehensively analyze the repositories, encompassing multiple dimensions, including technology, management, and services. To achieve this, we developed a comprehensive analysis framework consisting of 7 dimensions and 36 secondary indicators (
).

The 7 dimensions include the following:

- Data resources: providing an overview of the repository's data resources, including data volume, data types, data sources, coverage, and domains.
- Resource organization: focusing on how metadata or DE resources are effectively organized and managed throughout their life cycle, including underlying frameworks, traceability, and version control.
- Quality control: analyzing how the platform ensures the quality of stored data.
- Semantic annotation: assessing how the repository achieves internal semantic consistency to enhance semantic interoperability.
- Service support: examining the services offered to users by the repository, including basic services, such as retrieval and download, and advanced features such as analysis tools.
- Usability: evaluating the platform’s openness, accessibility, and intelligibility, including the availability of support documents and training materials.
- Practice of FAIR principles: finally, analyzing the repository’s adherence to the FAIR principles as a supplementary assessment.
Data resources and services dimensions are primarily determined by repository and portal characteristics, while resource organization, quality control, and semantics leverage insights from relevant literature and the ISO/IEC 11179 standard. Practice of FAIR adheres to the FAIR principle and its 15 subprinciples. Furthermore, 4 experts in data management, data warehousing construction, and data standardization participated in consultations to refine the analysis framework. Their input informed the division, naming, and selection of secondary indicators for the dimensions. The analysis framework was further refined based on expert suggestions primarily through (1) revising the name of Semantic Annotations and Service Support dimensions; (2) dividing the Usability dimension into 3 distinct modules: openness, accessibility, and intelligibility; and (3) adding more granular secondary indicators, such as source link and historical versions encoding, to enhance the depth of analysis (
). For a detailed description of the indicators included in this analytical framework, see .

Results
Data Resource
A comparison of the data resources of the 5 repositories was conducted (
). The data resources of all 5 repositories are comprehensive, but each has its own emphasis on specific subdomains. For example, caDSR focuses on cancer-related DEs and METEOR emphasizes health and welfare, while the others encompass DEs from various biomedical research domains. The types of resources in the repositories include elements and forms. However, the names for DEs are not yet standardized across the repositories and may consist of terms such as CDE, DE, and Field, among others. Resources in caDSR, METEOR, and the NIH CDE Repository are sourced from government and institutional research projects and are released through a top-down approach. In contrast, the other platforms rely on contributions from individual users, following a bottom-up data source model. The latter category tends to have a larger volume of resources; MDM, for instance, catalogs the most extensive collection of DEs, totaling up to 500,000 elements and more than 20,000 forms.

| Repositories | Area | Type | Total amount | Submitter |
caDSRa | Cancer research, etc | DEsb | 71,743 DEs | NIHc research institutes and programs |
NIH CDEd Repository | Biomedical field | DEs and formse | 20,970 DEs; 1704 forms | NIH research institutes and programs |
METEORf | Health and welfare | DEs | 21,180 DEs | Australian health department or research institution |
MDMg | Clinical trials, special diseases, etc | DEs and forms | 500,000 DEs; 24,810 forms | Individual user or project submissions |
CEDARh | Biomedical field | DEs and forms | 120,829 DEs; 2000 forms | Individual user or project submissions |
acaDSR: Cancer Data Standards Registry and Repository.
bDEs: Data Elements.
cNIH: National Institutes of Health.
dCDE: Common Data Elements.
eForms: forms composed of data elements (eg, case report forms, questionnaires).
fMETEOR: Metadata Online Registry.
gMDM: Medical Data Model.
hCEDAR: Center for Expanded Data Annotation and Retrieval.
Resource Organization
Regarding repository frameworks, all repositories except for MDM are constructed based on the ISO/IEC 11179 standard (
). The DEs in these repositories are built upon the conceptual model of DEs and value domains outlined in the ISO/IEC 11179 standard. METEOR has extended this framework by introducing a top-level category called "data set specification" (DSS), which is used to group specific DEs. For example, the "Diabetes (Clinical) DSS" in METEOR contains DEs related to standardized data collection for patients with diabetes. In MDM, which uses a custom DE framework, the attributes of DEs are relatively concise, typically including only the DE description, data type, concept, and value domain information.

All repositories have assigned unique identifiers to their resources, although the granularity of assignment varies. In MDM, the smallest unit assigned an internal identifier is a form; unique identifiers are not provided for individual DEs. The other repositories assign unique identifiers at the level of individual DEs. Furthermore, the encoding of unique identifiers is standardized only within caDSR and METEOR; some repositories have inconsistent encoding methods for resources at the same hierarchical level or directly reference the source identifiers.
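The ISO/IEC 11179-style conceptual model described above (a DE combining an object class, a property, and a value domain) can be sketched as a data structure; the field names and identifier below are simplified illustrations, not the standard's normative attribute set:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValueDomain:
    data_type: str
    permissible_values: Optional[list] = None  # None for non-enumerated domains

@dataclass
class DataElement:
    identifier: str        # unique within the repository, eg, a public ID
    definition: str
    object_class: str      # the thing being described, eg, "Person"
    property: str          # the characteristic being measured
    value_domain: ValueDomain

# Hypothetical DE based on the smoking example used earlier in the paper.
smoking_years = DataElement(
    identifier="DE-0000001",  # placeholder encoding, not a real repository ID
    definition="Total number of years a person has been smoking",
    object_class="Person",
    property="Smoking duration",
    value_domain=ValueDomain(data_type="integer"),
)
```

Separating the object class, property, and value domain is what lets repositories reuse the same attributes across many DEs.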
Regarding external provenance, most platforms can provide basic provenance information for DEs, such as the data submitter or the source institution. Among them, MDM provides the most detailed information, including the owner or institution of the DE, source links, and partial contact information. Regarding internal referencing and provenance, METEOR demonstrates the most comprehensive practice, supporting provenance down to the granularity of value domains, object classes, and properties. DEs in METEOR are listed with links to the attributes they reference and the elements from which they are derived; corresponding attributes, such as value domains and object classes, also provide links to all DEs that reference them. This allows for bidirectional provenance between elements and attributes.
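METEOR's bidirectional provenance between elements and attributes amounts to maintaining a reverse index from each attribute to the DEs that reference it. A minimal sketch, with hypothetical identifiers:

```python
from collections import defaultdict

# Hypothetical DE records, each referencing the attributes it is built from.
elements = {
    "DE-1": {"object_class": "Person", "value_domain": "VD-age"},
    "DE-2": {"object_class": "Person", "value_domain": "VD-years"},
    "DE-3": {"object_class": "Episode", "value_domain": "VD-years"},
}

def reverse_index(elements):
    """Build attribute -> referencing-DE links; together with the forward
    references already stored on each DE, this gives bidirectional provenance."""
    index = defaultdict(set)
    for de_id, attrs in elements.items():
        for attr_value in attrs.values():
            index[attr_value].add(de_id)
    return index

index = reverse_index(elements)
```

Querying `index["Person"]` then lists every DE whose object class is "Person", mirroring the glossary-item lookup METEOR exposes.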
| Repositories | Framework | Naming specification | Classification scheme | Identifier name | Identifier encoding | Submitter/source information | Source link | Source identifier | Internal citation link | Historical versions accessible | Version encoding |
| caDSRa | ISO/IEC 11179 | Yes | Yes | Public ID | 7 digits | Yes | No | No | No | No | Yes |
| NIHb CDEc Repository | ISO/IEC 11179 | No | Yes | Identifiers | N/Ad | Yes | No | Yes | Yes | Yes | No |
| METEORe | ISO/IEC 11179 | Yes | Yes | Identifiers | 6 digits | Yes | Yes | No | Yes | Yes | No |
| MDMf | N/A | No | Yes | Public ID | N/A | Yes | Yes | No | No | Yes | Yes |
| CEDARg | ISO/IEC 11179 | No | No | Identifiers | N/A | N/A | No | Yes | No | No | No |
acaDSR: Cancer Data Standards Registry and Repository.
bNIH: National Institutes of Health.
cCDE: Common Data Elements.
dN/A: not applicable.
eMETEOR: Metadata Online Registry.
fMDM: Medical Data Model.
gCEDAR: Center for Expanded Data Annotation and Retrieval.
The version number formats for DEs in most repositories lack uniformity, and in some cases no version numbers are provided. In addition, some repositories do not allow access to historical versions of DEs. MDM has better version control practices in place: historical versions of DEs are accessible and come with a standardized version number format that includes the editing date and information about the editor (eg, "4/6/22-Smith"). This allows users to navigate and browse historical versions using the version number as a reference.
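The MDM version label quoted above bundles an editing date and an editor. Assuming US month/day/year ordering (the ordering is not stated in the paper), it can be split as follows:

```python
from datetime import datetime

def parse_mdm_version(version: str):
    """Split an MDM-style version label like '4/6/22-Smith' into the editing
    date and the editor name. The m/d/yy ordering is an assumption inferred
    from the single example given in the text."""
    date_part, editor = version.split("-", 1)
    edited = datetime.strptime(date_part, "%m/%d/%y").date()
    return edited, editor

edited, editor = parse_mdm_version("4/6/22-Smith")
```

Because the label carries both pieces of information, sorting historical versions chronologically reduces to sorting on the parsed date.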
Quality Control
DE quality control is primarily achieved through the audit process during registration. Currently, the audit process relies mainly on manual review, and as shown in , all 3 top-down repositories have established governance committees to conduct quality control audits. The audit process includes reviewing the basic attributes of elements (such as concepts and value domains), mapping or references between elements and controlled vocabularies, and the domain-specific expertise of elements. This audit process helps ensure the quality and authority of the published DEs, ensuring that their structural attributes are correct and appropriately specialized within their respective domains. However, it can be resource-intensive and time-consuming, requiring the involvement of experts. The bottom-up repository MDM, on the other hand, cannot implement this process in the same way; instead, it relies on repository administrators to conduct quality control audits. While this method can ensure only the basic structural integrity of DEs, its higher review efficiency makes it more suitable for bottom-up repositories handling large volumes of DE submissions.

A complete and well-defined registration workflow is a crucial part of DE quality control. MDM and CEDAR do not provide a complete registration workflow, while the other repositories offer information on the registration workflow for DEs within the platform and assign identifiers for different registration statuses. METEOR and caDSR have more comprehensive registration statuses, with a finer-grained classification. In addition, only the NIH CDE Repository provides quality identifiers for DEs, and it includes only a single-level identifier (NIH-Endorsed). Conversely, MDM relies on users to rate DEs, and the other repositories do not appear to provide detailed quality scoring or rating information.
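The structural part of such an audit (checking basic attributes and controlled-vocabulary mappings) can be sketched as a simple checklist; the required attribute names are assumptions, and the domain-expert portion of the review cannot be automated this way:

```python
# Hypothetical minimal attribute set a submitted DE record must carry.
REQUIRED_ATTRIBUTES = ("definition", "concept", "value_domain")

def structural_audit(de: dict) -> list:
    """Return a list of problems found in a submitted DE record.
    This mirrors only the structural checks described in the text
    (basic attributes plus a controlled-vocabulary mapping)."""
    problems = [f"missing {a}" for a in REQUIRED_ATTRIBUTES if not de.get(a)]
    if not de.get("vocabulary_mapping"):
        problems.append("no mapping to a controlled vocabulary")
    return problems

# An incomplete submission: value domain and vocabulary mapping absent.
submission = {"definition": "Years smoked", "concept": "Smoking duration"}
issues = structural_audit(submission)
```

A bottom-up repository could run such checks automatically at submission time, leaving only domain review to administrators.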
| Repositories | Review method | Auditors | Quality mark | Registration workflow | Status identifier | Status type | Quality control records/documents |
caDSRa | Manual review | Committee experts | No | Yes | Full life cycle | 10 | No | |
NIHb CDEc Repository | Manual review | Committee experts | NIH-Endorsed CDE | Yes | Full life cycle | 2 | No | |
METEORd | Manual review | Committee experts | No | Yes | Full life cycle | 9 | Yes |
MDMe | Manual review | Portal administrator | No | N/Af | No | N/A | Partially provided | |
CEDARg | N/A | N/A | No | N/A | No | N/A | No |
acaDSR: Cancer Data Standards Registry and Repository.
bNIH: National Institutes of Health.
cCDE: Common Data Element.
dMETEOR: Metadata Online Registry.
eMDM: Medical Data Model.
fN/A: not applicable.
gCEDAR: Center for Expanded Data Annotation and Retrieval.
Semantic Annotation
The repositories achieve semantic annotation by standardizing the mapping of DEs to terminology systems, ensuring internal semantic consistency (
). The primary terminology systems used by these repositories include the Unified Medical Language System (UMLS), Logical Observation Identifiers Names and Codes (LOINC), and the Systematized Nomenclature of Medicine—Clinical Terms (SNOMED CT), with others, such as the National Cancer Institute Thesaurus (NCIT) and the National Center for Biomedical Ontology (NCBO), also being used. METEOR has developed its own internal glossary and achieves semantic annotation through metadata items called "Glossary Items" (GIs). GIs share the same DE framework as other elements but store the definition of a term; other DEs achieve semantic annotation by referencing the GI associated with a specific term. Creating and referencing internal glossaries effectively harnesses the advantages of the ISO/IEC 11179 DE framework. GIs essentially facilitate clustering according to the DE framework, including object class, property, value domain, and more: DEs belonging to the same object class can be associated with the terminology item by referencing it. For instance, by querying the GI item "person," one can observe all DEs that reference this term as their object class. This clustering enhances the interrelatedness of DEs at the conceptual level. However, the shortcomings of internal glossaries are also evident. If DEs need to be used across institutions, the terminology must be remapped, or semantic inconsistencies may persist. Regarding semantic interoperability, internal glossaries are less effective than referencing internationally recognized terminology repositories.

| Repositories | Annotation source | Mapping vocabulary | Granularity | Annotation method | Annotation content |
caDSRa | Controlled vocabulary | NCITb | DEc concept and permissible value | Manual mapping | Terms and links |
NIHd CDEe Repository | Controlled vocabulary | NCIT, UMLSf, etc | DE concept and permissible value | Manual mapping | Terms and coding |
METEORg | Self-built vocabulary | Self-built vocabulary | DE concept | Manual mapping | Terms and links |
MDMh | Controlled vocabulary | UMLS, LOINC,i and SNOMED CTj | DE concept and description | Automatic mapping | Terms and coding |
CEDARk | Controlled vocabulary | NCBOl | DE concept and permissible value | Manual mapping | Terms and links |
acaDSR: Cancer Data Standards Registry and Repository.
bNCIT: National Cancer Institute Thesaurus.
cDE: Data Element.
dNIH: National Institutes of Health.
eCDE: Common Data Element.
fUMLS: Unified Medical Language System.
gMETEOR: Metadata Online Registry.
hMDM: Medical Data Model.
iLOINC: Logical Observation Identifiers Names and Codes.
jSNOMED CT: Systematized Nomenclature of Medicine—Clinical Terms.
kCEDAR: Center for Expanded Data Annotation and Retrieval.
lNCBO: National Center for Biomedical Ontology.
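The annotation content compared above (terms plus codes or links against a mapping vocabulary) can be represented as a simple record; the codes below are placeholders, not real UMLS or SNOMED CT identifiers:

```python
# Hypothetical annotation linking one DE concept to controlled-vocabulary
# entries. The codes are placeholders, not real terminology identifiers.
annotation = {
    "de_concept": "Smoking duration",
    "mappings": [
        {"vocabulary": "UMLS", "code": "CUI-PLACEHOLDER", "term": "Duration of smoking"},
        {"vocabulary": "SNOMED CT", "code": "SCTID-PLACEHOLDER", "term": "Smoking duration"},
    ],
}

def vocabularies_used(annotation: dict) -> set:
    """List the terminology systems a DE concept is annotated against."""
    return {m["vocabulary"] for m in annotation["mappings"]}
```

Two DEs from different repositories that share a mapping to the same external code can be recognized as semantically equivalent, which is exactly what an internal-only glossary cannot guarantee across institutions.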
Service Support
A robust retrieval system can enhance the discoverability of data resources within repositories. Each of the 5 repositories possesses unique search capabilities; for instance, caDSR and the NIH CDE Repository allow users to search by the names of NIH-affiliated institutions ( ). METEOR and MDM allow users to construct search queries using Boolean operators and keywords. Furthermore, these platforms differ in their secondary filtering criteria, with caDSR and METEOR supporting additional filters such as submitting organization, registration status, and registering organization, among others.

The repositories offer personalized services to users, including personal favorites in the NIH CDE Repository and METEOR, which enable users to collect elements of interest and to record and browse their own created or edited metadata. In CEDAR, DEs are organized in a folder structure, facilitating the categorization and management of metadata.
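The Boolean keyword search offered by METEOR and MDM can be sketched as a small left-to-right evaluator; the real platforms' query syntax is richer (MDM also supports wildcards), so the query shape here is an assumption:

```python
def matches(name: str, query: list) -> bool:
    """Evaluate a query given as [(operator, keyword), ...], where the first
    operator is ignored, 'AND' narrows, and 'OR' widens. Left-to-right
    evaluation only -- a sketch, not any repository's actual search engine."""
    result = None
    for op, keyword in query:
        hit = keyword.lower() in name.lower()
        if result is None:
            result = hit
        elif op == "AND":
            result = result and hit
        else:  # "OR"
            result = result or hit
    return bool(result)

names = ["Smoking duration", "Smoking start age", "Alcohol intake"]
found = [n for n in names if matches(n, [("", "smoking"), ("AND", "duration")])]
```

The same evaluator handles widening queries such as `[("", "smoking"), ("OR", "alcohol")]`, which would match all three names except "Smoking start age"'s non-matching counterpart.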
With regard to DE download and export services, all repositories except CEDAR support multiple export formats. MDM supports export in 18 formats, including comma-separated values (CSV) and the Operational Data Model (ODM), but it is limited to exporting data by form and allows only 50 downloads per week. METEOR provides Word and PDF export formats with lower levels of structure, which can impact interoperability. caDSR and the NIH CDE Repository allow DEs and forms to be exported in various structured document formats, such as Excel, XML, and JSON, providing a relatively comprehensive download service. CEDAR, on the other hand, offers only JSON source code for elements without direct download capabilities; although it has a REST API, this may not be convenient for nonbatch exports.
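The structured export formats mentioned above can be illustrated with Python's standard csv and json modules; the field set is a hypothetical minimal DE record, not any repository's actual export schema:

```python
import csv
import io
import json

# Hypothetical minimal DE records to export.
des = [
    {"identifier": "DE-1", "definition": "Years smoked", "data_type": "integer"},
    {"identifier": "DE-2", "definition": "Smoking status", "data_type": "code"},
]

def export_json(des: list) -> str:
    """Serialize DE records as pretty-printed JSON."""
    return json.dumps(des, indent=2)

def export_csv(des: list) -> str:
    """Serialize DE records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(des[0].keys()))
    writer.writeheader()
    writer.writerows(des)
    return buf.getvalue()
```

Structured outputs like these are machine-readable end to end, which is why the text notes that METEOR's Word and PDF exports, being less structured, can impair interoperability.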
All 4 platforms except CEDAR provide web-based metadata comparison tools, but they differ in the dimensions they support for comparison. MDM and METEOR can perform horizontal comparisons for all information of 2 DEs, while caDSR supports comparisons for multiple DEs. NIH CDE Repository offers vertical comparisons, allowing users to compare DEs with their historical versions. In addition to the comparison tools, MDM also provides a rich set of auxiliary tools, including ODMedit (for creating ODM format DEs and forms) [
], CDEGenerator (for visualizing concept frequencies in forms) [ ], OpenEDC (for web-based data collection using forms), and more. MDM offers a greater number of tools and more functionality than the other repositories.

Repositories | Features of retrieval | Secondary screening of results | Register account | Account service | Sharing agreement | Download service | Download granularity | Export format | Comparison tool | Other tools
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
caDSR^a | Abbreviation of the institute’s name, identifier, etc | Registration status, submitter, etc | N/A^b | N/A | N/A | Unlimited downloads, batch download | DE and form | EXCEL, XML, and JSON | Compare 2 or more DEs^c | Form creation
NIH^d CDE^e Repository | Abbreviation of the institute name, identifier, etc | Data type, submitter, etc | UTS^f account | Personal favorites, browsing history, etc | N/A | Unlimited downloads, batch download | DE and form | EXCEL, XML, JSON | Compare different versions of DE | Not supported
METEOR | Keywords, identifier, Boolean operators, etc | Registration organization, data type, etc | Internal account | Personal favorites and settings, browsing history, etc | N/A | Unlimited downloads, batch download | DE | Word, PDF | Compare 2 DEs | DE creation
MDM^g | Keywords, Boolean operators, wildcard characters, etc | Keywords, research field | Internal account | Personal favorites, browsing history, etc | Four versions of CC 4.0 licenses | 50 forms per week | Form | 18 formats, including CSV, EXCEL, and SQL | Compare 2 or more DEs | Web-based data capture, visualization, visual analysis tools, etc
CEDAR^h | Keywords, terminology, etc | Data type, version, etc | Internal account | API^i keys, personal folder | N/A | Not supported | Not supported | JSON code | Not supported | DE and form creation
^a caDSR: Cancer Data Standards Registry and Repository.
^b N/A: not applicable.
^c DEs: data elements.
^d NIH: National Institutes of Health.
^e CDE: Common Data Element.
^f UTS: UMLS Terminology Services.
^g MDM: Medical Data Model.
^h CEDAR: Center for Expanded Data Annotation and Retrieval.
^i API: application programming interface.
Usability
We analyze the usability of the repositories from 3 perspectives: openness, accessibility, and intelligibility. Openness concerns the extent to which a repository’s resources and services are available for browsing and use. Among the 5 repositories in the study, restrictions typically take the form of requiring user accounts. Regarding data resources, caDSR, METEOR, and MDM provide unrestricted browsing access to both forms and DEs. However, the NIH CDE Repository restricts viewing of some semantic annotation content. Regarding services, MDM and CEDAR restrict auxiliary tools to logged-in users, including web-based creation and submission of DEs, among other features. In contrast, the DE creation and registration tools in the other 3 top-down repositories are not open to regular users. CEDAR requires registration for access to all services and resources, but it provides source code and technical documentation on GitHub. In summary, caDSR and METEOR exhibit the highest level of openness regarding resources and tools (
).

Accessibility considers the types of accessible resources, the methods of access, and the extent to which resources are accessible. There are primarily 2 ways to access repository resources: web downloads and application programming interface (API) interfaces. CEDAR does not provide web downloads and offers only JSON source code and an API interface. MDM requires user login for downloading forms and performing batch downloads, with a limit of 50 forms per week. In contrast, caDSR and the NIH CDE Repository allow free downloads and batch exports of DEs without the need for login, making them relatively more accessible in terms of resource availability.
Intelligibility focuses on the availability of supplementary information provided by the repositories and the complexity of constructing DEs. First, all 5 repositories offer user guide documents on their portals, which introduce basic information and operations. In addition, CEDAR and caDSR maintain Archive and Wiki web pages that provide further information and support. The repositories also attend to teaching concepts related to DEs: since not all users have a computer-related background, all platforms except MDM provide introductions or tutorials on metadata, DEs, and the ISO/IEC 11179 standard.
In addition, most repositories lack descriptions and visual representations of their data resources’ coverage areas and quantities. On its portal page, MDM provides visualizations of its DEs categorized by proportion, which can help users understand the resources within the repository. Comparing the complexity of DEs across the 5 platforms, MDM benefits from its self-built framework, resulting in simpler and more concise DEs with better comprehensibility. In contrast, other platforms build their DEs on the ISO/IEC 11179 standard and often expand or subdivide the framework, increasing the amount of information and complexity, which can affect comprehensibility.
Repositories | Open access | Restriction | Create and submit | Open source | Auxiliary tool | Access method | Access limitation | Batch download | Quantity limitation | User guide | DE tutorial | DE complexity
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
caDSR^a | DEs^b | No | No | No | Yes | Download and API^c | No | Yes | No | Document | Yes | High
NIH^d CDE^e Repository | DEs and forms | Partial DEs | No | No | Yes | Download and API | No | Yes | No | Documents | Yes | High
METEOR^f | DEs | No | No | No | Yes | Download | No | Requires log-in | No | Document | Yes | Medium
MDM^g | DEs and forms | No | Yes | No | Partially requires log-in | Download | No | Requires log-in | 50 forms per week | Video | No | Low
CEDAR^h | DEs and forms | All resources | Yes | Yes | Requires log-in | JSON code and API | Requires log-in | No | No | Video and document | Yes | Medium
^a caDSR: Cancer Data Standards Registry and Repository.
^b DEs: data elements.
^c API: application programming interface.
^d NIH: National Institutes of Health.
^e CDE: Common Data Element.
^f METEOR: Metadata Online Registry.
^g MDM: Medical Data Model.
^h CEDAR: Center for Expanded Data Annotation and Retrieval.
Practice of FAIR Principles
Finally, this study supplemented the analysis by evaluating the extent to which the 5 repositories comply with the FAIR principles. The level of compliance was categorized into 4 groups: complies completely, complies partly, fails to comply, and unclear. The detailed content of each principle in FAIR can be found in
.In
, a horizontal tally was conducted, with each of the 4 subprinciples of FAIR considered separately. The proportions of the different levels of compliance with the subprinciples were calculated individually. For instance, the Findable subprinciple comprises 4 principles (F1-F4), yielding 20 cells across the 5 repositories. The proportions of “complies completely,” “complies partly,” “fails to comply,” and “unclear” were then calculated for these 20 cells. The same process was applied to the remaining 3 subprinciples. Based on this step, A was generated, depicting the overall adherence of repositories to each subprinciple. B, calculated using the same method on a column basis, illustrates each repository’s implementation of the FAIR principles.

In comparison, among the 5 repositories, the NIH CDE Repository demonstrates the highest level of compliance with the FAIR principles, while CEDAR falls behind the other 4 platforms. Across the 4 subprinciples of FAIR, Accessibility is relatively well practiced, while Findability and Reusability have lower percentages of full compliance, indicating subpar adherence to these principles.
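The tallying described above can be sketched in a few lines. The grid below is purely illustrative — the counts are placeholders, not the study’s actual ratings — though a 9/20 share of full compliance does correspond to the 45% reported for Findable.

```python
from collections import Counter

# The 4 compliance levels used in the study's rating scheme.
LEVELS = ["complies completely", "complies partly", "fails to comply", "unclear"]

def tally(cells):
    """Return the proportion of each compliance level in a list of rated cells."""
    counts = Counter(cells)
    total = len(cells)
    return {level: counts.get(level, 0) / total for level in LEVELS}

# Hypothetical ratings for one FAIR principle (4 subprinciples, F1-F4,
# across 5 repositories = 20 cells). Placeholder values for illustration.
findable_cells = (
    ["complies completely"] * 9
    + ["complies partly"] * 6
    + ["fails to comply"] * 3
    + ["unclear"] * 2
)

proportions = tally(findable_cells)
print(proportions["complies completely"])  # 0.45, i.e., 9/20
```

The same function applied column-wise (one repository’s cells across all principles) yields the per-repository view described above.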
In the “Findable” principle, F4 (“(Meta)data are registered or indexed in a searchable resource”) is crucial for ensuring the discoverability of data resources on the web. Most of the 5 repositories analyzed in this study did not fully practice this aspect, impacting the discoverability of their web-based resources. Only MDM is registered in international registries and indexes for research data repositories, such as re3data and FAIRsharing.
In terms of the “Specific (meta)data are referred to by their identifier” subprinciple of interoperability, 3 repositories did not fully comply. This is mainly due to a lack of rich cross-referencing between DEs. METEOR had the best compliance with this subprinciple, as it provides comprehensive reference information for DEs on their detail pages.
Regarding the practice of “Reusability,” the repositories’ issues primarily concern data usage licenses and source information. Most repositories lack clear data usage licenses, which hinders data sharing. In addition, source information is often limited, with most repositories providing only submitter and time stamp information. Details about how the data were created and whether they have been previously published are typically not provided, impacting reusability.
Discussion
Principal Findings
The results of the analysis provide an overview of the 5 DE repositories. The 2 approaches to repository construction, top-down and bottom-up, bring about differences and distinct characteristics regarding resources, semantics, and quality control. The community-driven, bottom-up approach, in which users submit resources, as seen in MDM and CEDAR, results in a richer pool of resources. This implies that repositories of this type need to implement more automation across various activities, including automated verification and terminology mapping. The top-down approach, by contrast, relies on collaboration among experts from various domains: expert committees are involved in designing, creating, and reviewing DEs in all 3 repositories following this approach. DEs produced this way have higher quality and authority, with finer granularity in semantic annotations. However, consideration should still be given to their applicability outside the specific institution or research context. For example, DEs provided by repositories such as caDSR and the NIH CDE Repository may be tailored to particular NIH-affiliated institutions and research scenarios. Conversely, community-driven DEs have a broader source base, potentially better reflecting real-world research situations, and their cross-study applicability might be more extensive.
Balancing the complexity and usability of DEs and repository metamodels is crucial. The data model structures built upon the ISO/IEC 11179 standard can be complex, and clinical researchers may not easily understand their underlying frameworks. It is essential to strike a balance so that the repository remains user-friendly and accessible to its intended audience. Simplifying the framework, however, can weaken the organization of the repository: it may reduce the available information, negatively affecting activities such as DE deduplication and relationship building, and hindering the development of advanced applications such as intelligent recommendations. While the self-built model of MDM is simple and user-friendly, it can organize resources only at the level of forms, lacking granularity down to individual DEs. In contrast, repositories such as caDSR, built on the ISO/IEC 11179 standard, require more investment in learning and usage, but they offer more comprehensive and detailed management and organization capabilities.
Standardize Data Sharing
Promoting data sharing does not mean unrestricted sharing; DE sharing also requires clear agreements and statements. Among the 5 repositories in this study, only MDM provides 4 versions of the CC 4.0 license as options for form resources, which offers clarity in licensing for these resources. The other 4 repositories have not provided such information on their platforms, and it is also somewhat unclear whether their affiliated institutions’ data policies apply to the resources within these repositories. Overall, these repositories appear to place less focus on data sharing and reuse.
In the rapidly evolving landscape of open science, many mature examples of data-sharing strategies can serve as valuable references [
, ]. DEs are a form of data, and designing their sharing strategies can benefit from looking at the practices of other data-sharing platforms. We recommend that repositories clearly define protocols for sharing and reusing DEs in their portals. Furthermore, they should offer granularity down to the level of individual DEs, allowing resource submitters to choose specific sharing agreements. This approach can prevent unrestricted sharing and ensure greater control over DE access and usage.

The Interconnected Ecosystem of Repositories
While DE repositories facilitate the integration of DEs across institutions and projects, the gaps between DE repositories should not become new barriers to integration. In this study, the 5 repositories analyzed do not support direct sharing and exchange of resources among each other. Instead, resources must be exported and then recreated in the target repository. However, the exported formats may not be highly structured, and there is no support for importing these files for quick creation in another repository.
Despite most repositories being built on the ISO/IEC 11179 standard, there is still a lack of interoperability and data exchange between them. These limitations suggest the need to establish a comprehensive, interconnected ecosystem of DE repositories. Top-down and bottom-up approaches can complement each other toward this goal, avoiding redundant construction and facilitating domain-specific developments. This can ultimately lead to more efficient and collaborative medical research efforts.
To build the interconnected ecosystem of repositories, our recommendations are as follows:
- Choose standardized repository frameworks (such as the ISO/IEC 11179 standard) and terminology systems (eg, UMLS) to avoid the need for secondary mapping of underlying frameworks or semantics.
- Enhance the export of DEs to provide more structured documents, such as CSV and JSON.
- Develop DE creation features that offer rapid import services, supporting content creation from structured documents.
- Consider developing a unified interface, like QL4MDR [ ].
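To illustrate the second and third recommendations, the sketch below round-trips a minimal DE record through a structured JSON document. The field names follow the ISO/IEC 11179 notion of a data element (object class, property, value domain) but are our own illustrative choices, not any repository’s actual export schema.

```python
import json

# Illustrative ISO/IEC 11179-style data element record; the field names are
# assumptions for this sketch, not an actual repository export format. The
# example element echoes the "years of smoking" case from the introduction.
data_element = {
    "identifier": "DE-0001",
    "name": "Smoking duration",
    "definition": "Total number of years a person has smoked.",
    "object_class": "Person",
    "property": "Smoking duration",
    "value_domain": {"datatype": "integer", "unit": "year", "min": 0},
}

def export_de(de):
    """Serialize a data element to a structured JSON document (export)."""
    return json.dumps(de, indent=2)

def import_de(doc):
    """Recreate a data element from a structured JSON document (rapid import)."""
    return json.loads(doc)

# A structured format makes export from one repository and re-creation in
# another lossless, unlike Word or PDF exports.
round_tripped = import_de(export_de(data_element))
assert round_tripped == data_element
```

A shared schema of this kind, combined with a common terminology binding, would remove the need for secondary mapping when a DE moves between repositories.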
Enrich Information About DEs
A significant portion of DEs in DE repositories remains at the level of basic framework information: they provide fundamental semantic information but lack application-oriented details, such as applicable scenarios, background details, and application outcomes. In this situation, DEs are isolated fragments scattered throughout the repository, providing users minimal application support. Users are left uncertain whether a DE adheres to a particular standard or belongs to a specific data set, making it challenging to select accurate DEs and organize them into the required format. The repository also falls short in delivering advanced services such as intelligent recommendations.
Therefore, this study suggests that DE repositories should enrich the application information of DEs to support their practical use. We categorize application information into two aspects: (1) Application scenarios and background details: specifying the scenarios for which DEs are applicable, whether generic or specialized, and the standards or data sets from which they originate. Such contextual information can assist the repository in better associating and organizing relevant DEs. (2) Performance-related information: this can include statistics on the number of applications of a DE and user ratings, feedback, and other relevant details.
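As a sketch of what such enrichment could look like, the record below attaches both categories of application information to a DE; all field names and values are hypothetical.

```python
# Hypothetical enrichment of a DE record with the two categories of
# application information discussed above; all field names are illustrative.
enriched_de = {
    "identifier": "DE-0001",
    "name": "Smoking duration",
    # (1) Application scenarios and background details
    "application_context": {
        "scope": "generic",  # generic vs specialized
        "source_standard": "example standard",  # standard/data set of origin
        "applicable_scenarios": ["cohort studies", "registries"],
    },
    # (2) Performance-related information
    "usage": {
        "application_count": 0,  # number of studies reusing this DE
        "ratings": [],           # user ratings collected by the repository
        "feedback": [],          # free-text user feedback
    },
}

def record_use(de, rating=None):
    """Update a DE's performance statistics when it is reused in a study."""
    de["usage"]["application_count"] += 1
    if rating is not None:
        de["usage"]["ratings"].append(rating)

record_use(enriched_de, rating=4)
```

Statistics accumulated this way could feed the intelligent recommendation services mentioned above, for example by ranking candidate DEs by reuse count and average rating.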
Furthermore, we recommend that the repositories consider using ontology resources to provide standardized terminology. Mature ontology repositories and tool kits, such as NCBO BioPortal [
] and the Ontology Lookup Service [ ], offer a wealth of ontology resources and support downloading and localizing various ontology resources or invoking them through APIs. By using methods such as exact matching and semantic similarity calculation, DEs can be mapped to ontology terms, thereby standardizing DE concepts, value domains, and so on. This can provide specific term annotations for DEs and further enrich the available information.

Focus on Sensitive Data Protection
The existing repositories contain DEs that collect sensitive information such as ID numbers, addresses, and phone numbers. However, these elements lack specific classification or identification to indicate that they are used to collect sensitive data and may need to be deidentified or deleted. While the repositories do not contain original research data, this remains a crucial issue for subsequent DE usage. We propose that repositories should align with the Health Insurance Portability and Accountability Act [
] or other relevant regulations and map the repository’s DEs to protected personal health information. The repository should create classifications and identifiers for privacy-related DEs. This will serve as a reminder to users about the sensitivity of such data and promote standardized usage practices.

Addressing the balance between FAIR data-sharing principles and privacy protection, we emphasize that FAIR promotes secure, compliant, and interoperable data sharing, not unrestricted dissemination. It advocates for data classification and the application of tailored sharing environments. Privacy data can be deidentified or directly removed during the aggregation phase. In subsequent sharing and reuse processes, while adhering to FAIR principles, we should establish a secure usage environment and sharing guidelines for the data. This includes data classification and grading, implementing differential sharing protocols, and using privacy-enhancing technologies such as privacy computing and federated learning to control data accessibility. This approach ensures effective data sharing and reuse under the FAIR principles while upholding privacy protection.
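A first step toward such classification could be simple rule-based flagging of privacy-related DEs. The keyword list below is a small illustrative subset loosely modeled on HIPAA-style identifier categories, not an authoritative or complete mapping.

```python
# Illustrative subset of sensitive-information categories; the keywords and
# category names are assumptions for this sketch, not a full HIPAA mapping.
SENSITIVE_KEYWORDS = {
    "id number": "identifier",
    "address": "geographic data",
    "phone": "contact information",
    "email": "contact information",
}

def classify_sensitive(de_name):
    """Return the matched sensitivity category for a DE name, or None."""
    name = de_name.lower()
    for keyword, category in SENSITIVE_KEYWORDS.items():
        if keyword in name:
            return category
    return None

# DEs flagged this way could carry a repository-level identifier reminding
# users that collected values may need deidentification or deletion.
assert classify_sensitive("Patient phone number") == "contact information"
assert classify_sensitive("Smoking duration") is None
```

In practice, such keyword rules would only be a starting point; the terminology-mapping methods discussed above could refine the classification.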
Implications
Theoretical Implications
In contrast to existing research that mainly concentrates on specific technical aspects of DE repository construction, this study compares 5 typical DE repositories within the medical field and systematically evaluates and analyzes them. Furthermore, this study introduces a novel analysis framework consisting of 7 dimensions and 36 secondary indicators, based on the ISO/IEC 11179 standard and integrated with the FAIR principles. While this study focuses on the analysis of 5 DE repositories, we are confident that the proposed framework holds broad applicability to a wide range of repositories in the medical field. First, the 5 repositories included in this study are well representative, and their functions largely cover those of smaller repositories such as samply.MDR and CoMetaR. Therefore, the dimensions and indicators constructed with reference to these repositories can better cover general DE repositories and offer more detailed content to be mined. Second, the ISO/IEC 11179 standard is an internationally used standard for the construction of DE repositories, and the FAIR principles are a widely recognized data management and sharing guideline. Therefore, the dimensions and indicators constructed based on these 2 documents also have good applicability. Finally, in constructing the analysis framework, we invited experts in data management and standardization to discuss and refine it. Simultaneously, the ISO/IEC 11179 standard provides specific definitions for the concept model of DEs and standardizes related management activities. Integrating these 2 components in the analytical framework serves as the foundation for potential future research endeavors, allowing for further refinement of relevant standards and theories related to DE repositories.
Practical Implications
The practical significance of this study lies in its potential to drive the construction of DE repositories, facilitating a more robust implementation of the FAIR principles during the construction and management processes. This, in turn, contributes to a more substantial role in the data-driven advancement of medicine. For DE repository administrators, this study’s findings assist them in understanding the repository’s strengths and limitations, offering the necessary information for further improvements to the repository.
In addition, the integrated information on DE repositories from this research may hold practical implications for individuals involved in medical informatics research. For clinical research data managers, this information can support a better understanding of DE repositories. They can use this knowledge to make informed choices of suitable repositories and to reuse DEs, reducing redundant design work in the clinical research process. For computer experts developing medical information systems, this research encompasses resource organization and management information from multiple repositories, along with the service designs offered by their web apps. This can serve as a reference for the top-level structure of DE repositories within their respective institutions.
Conclusions and Limitations
Medical DEs enhance data quality, foster data reuse, and maximize the value of data in the era of health big data. They also form the foundational basis for AI-based medical systems. This study, using a constructed multidimensional analytical framework, evaluates and analyzes the current state of construction of typical medical DE repositories. It summarizes the characteristics of different repositories and provides recommendations based on identified issues. This study’s findings can promote the broader application of DE repositories, ensuring that DEs and repositories better serve clinical and medical research needs. Furthermore, this research can have applications in medical knowledge organization and semantic representation, thus contributing to the development of AI technologies in medicine.
This study also has some limitations and areas for future improvement. First, the study included a limited set of repositories, focusing solely on comprehensive, noncommercial DE repositories in the English language. Smaller or domain-specific repositories may have been overlooked. Furthermore, the data came primarily from repository websites and the literature, with little attention given to other sources such as social media accounts. This approach might have missed some of the latest updates or changes. Therefore, future research will consider expanding the scope to include more repositories for analysis, relaxing constraints related to quantity and language. In addition, efforts will be made to enhance the generality of the analysis framework and to develop a practical model for DE repositories.
Acknowledgments
This work was supported by the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences Program (grant 2021-I2M-1-057).
Conflicts of Interest
None declared.
Complete overview of the analysis framework.
DOCX File, 24 KB

Details of the FAIR principle.
DOCX File, 16 KB

References
- Schofield PN, Bubela T, Weaver T, Portilla L, Brown SD, Hancock JM, et al. CASIMIR Rome Meeting participants. Post-publication sharing of data and tools. Nature. Sep 10, 2009;461(7261):171-173. [FREE Full text] [CrossRef] [Medline]
- Dugas M, Hegselmann S, Riepenhausen S, Neuhaus P, Greulich L, Meidt A, et al. Compatible data models at design stage of medical information systems: leveraging related data elements from the MDM portal. Stud Health Technol Inform. Aug 21, 2019;264:113-117. [CrossRef] [Medline]
- Courtot M, Cherubin L, Faulconbridge A, Vaughan D, Green M, Richardson D, et al. BioSamples database: an updated sample metadata hub. Nucleic Acids Res. Jan 08, 2019;47(D1):D1172-D1178. [FREE Full text] [CrossRef] [Medline]
- Vempati U, Chung C, Mader C, Koleti A, Datar N, Vidović D, et al. Metadata standard and data exchange specifications to describe, model, and integrate complex and diverse high-throughput screening data from the Library of Integrated Network-based Cellular Signatures (LINCS). J Biomol Screen. Jun 2014;19(5):803-816. [FREE Full text] [CrossRef] [Medline]
- Pacheco AGC, Krohling RA. An attention-based mechanism to combine images and metadata in deep learning models applied to skin cancer classification. IEEE J Biomed Health Inform. Sep 2021;25(9):3554-3563. [CrossRef] [Medline]
- Olar A, Biricz A, Bedőházi Z, Sulyok B, Pollner P, Csabai I. Automated prediction of COVID-19 severity upon admission by chest X-ray images and clinical metadata aiming at accuracy and explainability. Sci Rep. Mar 14, 2023;13(1):4226. [FREE Full text] [CrossRef] [Medline]
- Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. Mar 15, 2016;3:160018. [FREE Full text] [CrossRef] [Medline]
- Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, et al. Data sharing by scientists: practices and perceptions. PLoS One. 2011;6(6):e21101. [FREE Full text] [CrossRef] [Medline]
- Stellmach C, Hopff SM, Jaenisch T, Nunes de Miranda SM, Rinaldi E, NAPKON‚ LEOSS‚ ORCHESTRA‚ReCoDID Working Groups. Creation of standardized common data elements for diagnostic tests in infectious disease studies: semantic and syntactic mapping. J Med Internet Res. Jun 10, 2024;26:e50049. [FREE Full text] [CrossRef] [Medline]
- Kush RD, Warzel D, Kush MA, Sherman A, Navarro EA, Fitzmartin R, et al. FAIR data sharing: The roles of common data elements and harmonization. J Biomed Inform. Jul 2020;107:103421. [FREE Full text] [CrossRef] [Medline]
- Pahuja G. Comparative study of metadata standards and metadata repositories. 2011. Presented at: 2ND International Conference on Methods and Models in Science and Technology; 2011 November 19-20; Jaipur, India. [CrossRef]
- Stausberg J, Harkener S, Burgmer M, Engel C, Finger R, Heinz C, et al. Metadata definition in registries: what is a data element? Stud Health Technol Inform. May 25, 2022;294:174-178. [CrossRef] [Medline]
- Berenspöhler S, Minnerup J, Dugas M, Varghese J. Common data elements for meaningful stroke documentation in routine care and clinical research: retrospective data analysis. JMIR Med Inform. Oct 12, 2021;9(10):e27396. [FREE Full text] [CrossRef] [Medline]
- Sheehan J, Hirschfeld S, Foster E, Ghitza U, Goetz K, Karpinski J, et al. Improving the value of clinical research through the use of Common Data Elements. Clin Trials. Dec 2016;13(6):671-676. [FREE Full text] [CrossRef] [Medline]
- Zare S, Meidani Z, Ouhadian M, Akbari H, Zand F, Fakharian E, et al. Identification of data elements for blood gas analysis dataset: a base for developing registries and artificial intelligence-based systems. BMC Health Serv Res. Mar 08, 2022;22(1):317. [FREE Full text] [CrossRef] [Medline]
- Hirji SA, Salenger R, Boyle EM, Williams J, Reddy VS, Grant MC, et al. Expert consensus of data elements for collection for enhanced recovery after cardiac surgery. World J Surg. Apr 2021;45(4):917-925. [CrossRef] [Medline]
- Grinnon ST, Miller K, Marler JR, Lu Y, Stout A, Odenkirchen J, et al. National Institute of Neurological Disorders and Stroke Common Data Element Project - approach and methods. Clin Trials. Jun 2012;9(3):322-329. [FREE Full text] [CrossRef] [Medline]
- Gay K, Collie D, Sheikh M, Saver J, Warach S, Wright C, et al. National Institute of Neurological Disorders and Stroke Common Data Elements: Stroke Version 2.0 Recommendations. Stroke. Mar 2021;52(Suppl_1):52. [CrossRef]
- Vemulapalli S, Simonato M, Ben Yehuda O, Wu C, Feldman T, Popma JJ, et al. Minimum core data elements for transcatheter mitral therapies: scientific statement by PASSION CV, HVC, and TVTR. JACC Cardiovasc Interv. Jun 26, 2023;16(12):1437-1447. [FREE Full text] [CrossRef] [Medline]
- Boesch RP, de Alarcon A, Piccione J, Prager J, Rosen R, Sidell DR, et al. Aerodigestive Research Collaborative. Consensus on triple endoscopy data elements preparatory to development of an aerodigestive registry. Laryngoscope. Nov 2022;132(11):2251-2258. [CrossRef] [Medline]
- Kim HH, Park YR, Lee S, Kim JH. Composite CDE: modeling composite relationships between common data elements for representing complex clinical data. BMC Med Inform Decis Mak. Jul 03, 2020;20(1):147. [FREE Full text] [CrossRef] [Medline]
- Fitzgerald MP, Kaufman MC, Massey SL, Fridinger S, Prelack M, Ellis C, CHOP Pediatric Epilepsy Program Collaborative, et al. Assessing seizure burden in pediatric epilepsy using an electronic medical record-based tool through a common data element approach. Epilepsia. Jul 2021;62(7):1617-1628. [FREE Full text] [CrossRef] [Medline]
- Vest JR, Adler-Milstein J, Gottlieb LM, Bian J, Campion TR, Cohen GR, et al. Assessment of structured data elements for social risk factors. Am J Manag Care. Jan 01, 2022;28(1):e14-e23. [FREE Full text] [CrossRef] [Medline]
- Chen EK, Edelen MO, McMullen T, Ahluwalia SC, Dalton SE, Paddock S, et al. Developing standardized patient assessment data elements for Medicare post-acute care assessments. J Am Geriatr Soc. Apr 2022;70(4):981-990. [CrossRef] [Medline]
- Ryan ME, Warmin A, Binstadt BA, Correll CK, Hause E, Hobday P, Pediatric Rheumatology Care, et al. Outcomes Improvement Network. Capturing critical data elements in Juvenile Idiopathic Arthritis: initiatives to improve data capture. Pediatr Rheumatol Online J. Sep 29, 2022;20(1):83. [FREE Full text] [CrossRef] [Medline]
- Wyles CC, Fu S, Odum SL, Rowe T, Habet NA, Berry DJ, et al. External validation of natural language processing algorithms to extract common data elements in THA operative notes. J Arthroplasty. Oct 2023;38(10):2081-2084. [CrossRef] [Medline]
- Fu S, Wyles CC, Osmon DR, Carvour ML, Sagheb E, Ramazanian T, et al. Automated detection of periprosthetic joint infections and data elements using natural language processing. J Arthroplasty. Feb 2021;36(2):688-692. [FREE Full text] [CrossRef] [Medline]
- Han P, Fu S, Kolis J, Hughes R, Hallstrom BR, Carvour M, et al. Multicenter validation of natural language processing algorithms for the detection of common data elements in operative notes for total hip arthroplasty: algorithm development and validation. JMIR Med Inform. Aug 31, 2022;10(8):e38155. [FREE Full text] [CrossRef] [Medline]
- Renner R, Li S, Huang Y, van der Zijp-Tan AC, Tan S, Li D, et al. Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner. BMC Med Inform Decis Mak. Dec 23, 2019;19(Suppl 7):276. [FREE Full text] [CrossRef] [Medline]
- Rajamohan AG, Patel V, Sheikh-Bahaei N, Liu CJ, Go JL, Kim PE, et al. Common data elements in head and neck radiology reporting. Neuroimaging Clin N Am. Aug 2020;30(3):379-391. [CrossRef] [Medline]
- Littlefield A, Cooke J, Bagge C, Glenn C, Kleiman E, Jacobucci R, et al. Machine learning to classify suicidal thoughts and behaviors: implementation within the common data elements used by the military suicide research consortium. Clinical Psychological Science. Mar 15, 2021;9(3):467-481. [CrossRef]
- Stausberg J, Löbe M, Verplancke P, Drepper J, Herre H, Löffler M. Foundations of a metadata repository for databases of registers and trials. Stud Health Technol Inform. 2009;150:409-413. [CrossRef] [Medline]
- Hegselmann S, Storck M, Gessner S, Neuhaus P, Varghese J, Bruland P, et al. Pragmatic MDR: a metadata repository with bottom-up standardization of medical metadata through reuse. BMC Med Inform Decis Mak. May 17, 2021;21(1):160. [FREE Full text] [CrossRef] [Medline]
- Mallya P, Stevens LM, Zhao J, Hong C, Henao R, Economou-Zavlanos N, et al. Facilitating harmonization of variables in Framingham, MESA, ARIC, and REGARDS studies through a metadata repository. Circ Cardiovasc Qual Outcomes. Nov 2023;16(11):e009938. [CrossRef] [Medline]
- Wiedekopf J, Ulrich H, Drenkhahn C, Kock-Schoppenhauer A, Ingenerf J. TermiCron - Bridging the Gap Between FHIR Terminology Servers and Metadata Repositories. Stud Health Technol Inform. Jun 06, 2022;290:71-75. [CrossRef] [Medline]
- Schladetzky J, Kock-Schoppenhauer A, Drenkhahn C, Ingenerf J, Wiedekopf J. Mettertron - bridging metadata repositories and terminology servers. Stud Health Technol Inform. Sep 12, 2023;307:243-248. [CrossRef] [Medline]
- Yuan J, Li H. Research on standardization of semantic relation and ontology representation based on MDR. 2022. Presented at: 2022 IEEE 8th International Conference on Computer and Communications (ICCC); 2022 December 09-12:1490-1494; Chengdu, China. [CrossRef]
- Juárez D, Schmidt EE, Stahl-Toyota S, Ückert F, Lablans M. A generic method and implementation to evaluate and improve data quality in distributed research networks. Methods Inf Med. Sep 2019;58(2-03):86-93. [FREE Full text] [CrossRef] [Medline]
- Kapsner LA, Mang JM, Mate S, Seuchter SA, Vengadeswaran A, Bathelt F, et al. Linking a consortium-wide data quality assessment tool with the MIRACUM Metadata Repository. Appl Clin Inform. Aug 2021;12(4):826-835. [FREE Full text] [CrossRef] [Medline]
- Ulrich H, Kern J, Tas D, Kock-Schoppenhauer AK, Ückert F, Ingenerf J, et al. QLMDR: a GraphQL query language for ISO 11179-based metadata repositories. BMC Med Inform Decis Mak. Mar 18, 2019;19(1):45. [FREE Full text] [CrossRef] [Medline]
- Sasse J, Darms J, Fluck J. Semantic metadata annotation services in the biomedical domain—a literature review. Applied Sciences. Jan 13, 2022;12(2):796. [CrossRef]
- Stöhr MR, Günther A, Majeed RW. The Collaborative Metadata Repository (CoMetaR) web app: quantitative and qualitative usability evaluation. JMIR Med Inform. Nov 29, 2021;9(11):e30308. [FREE Full text] [CrossRef] [Medline]
- Reichenpfader D, Glauser R, Dugas M, Denecke K. Assessing and improving the usability of the medical data models portal. Stud Health Technol Inform. Jun 23, 2020;271:199-206. [CrossRef] [Medline]
- Kadioglu D, Breil B, Knell C, Lablans M, Mate S, Schlue D, et al. Samply.MDR—a metadata repository and its application in various research networks. Stud Health Technol Inform. 2018;253:50-54. [FREE Full text] [CrossRef] [Medline]
- MDM.Portal. URL: https://medical-data-models.org/ [accessed 2023-04-07]
- CoMetaR. URL: https://data.dzl.de/cometar/web/ [accessed 2023-04-11]
- CentraXX MDR. URL: https://www.toolpool-gesundheitsforschung.de/produkte/centraxx [accessed 2023-04-12]
- CancerGrid (2005-2010). URL: https://www.cs.ox.ac.uk/projects/cancergrid/ [accessed 2023-04-12]
- Metadata Online Registry. URL: https://meteor.aihw.gov.au/content/181414 [accessed 2023-04-16]
- Aristotle Metadata Registry. URL: https://www.aristotlemetadata.com/ [accessed 2023-04-21]
- caDSR. URL: https://cadsr.cancer.gov/onedata/Home.jsp [accessed 2023-04-22]
- United States Health Information Knowledgebase (USHIK). URL: https://www.ahrq.gov/data/ushik.html [accessed 2023-04-27]
- NIH Common Data Elements (CDE) Repository. URL: https://cde.nlm.nih.gov/ [accessed 2023-04-12]
- CEDAR. URL: https://metadatacenter.org/ [accessed 2023-04-10]
- Nadkarni PM, Brandt CA. The Common Data Elements for cancer research: remarks on functions and structure. Methods Inf Med. 2006;45(6):594-601. [FREE Full text] [CrossRef] [Medline]
- O'Connor MJ, Warzel DB, Martínez-Romero M, Hardi J, Willrett D, Egyedi AL, et al. Unleashing the value of common data elements through the CEDAR workbench. AMIA Annu Symp Proc. 2019;2019:681-690. [FREE Full text] [Medline]
- Dugas M, Meidt A, Neuhaus P, Storck M, Varghese J. ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository. BMC Med Res Methodol. Jun 01, 2016;16:65. [FREE Full text] [CrossRef] [Medline]
- Varghese J, Fujarski M, Hegselmann S, Neuhaus P, Dugas M. CDEGenerator: an online platform to learn from existing data models to build model registries. Clin Epidemiol. 2018;10:961-970. [FREE Full text] [CrossRef] [Medline]
- Waithira N, Mutinda B, Cheah PY. Data management and sharing policy: the first step towards promoting data sharing. BMC Med. Apr 17, 2019;17(1):80. [FREE Full text] [CrossRef] [Medline]
- Paltoo DN, Rodriguez LL, Feolo M, Gillanders E, Ramos EM, Rutter JL, et al. National Institutes of Health Genomic Data Sharing Governance Committees. Data use under the NIH GWAS data sharing policy and future directions. Nat Genet. Sep 2014;46(9):934-938. [FREE Full text] [CrossRef] [Medline]
- Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Griffith N, et al. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. Jul 2009;37(Web Server issue):W170-W173. [FREE Full text] [CrossRef] [Medline]
- Côté RG, Jones P, Apweiler R, Hermjakob H. The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. Feb 28, 2006;7:97. [FREE Full text] [CrossRef] [Medline]
- Atchinson B, Fox DM. The politics of the Health Insurance Portability and Accountability Act. Health Aff (Millwood). 1997;16(3):146-150. [CrossRef] [Medline]
Abbreviations
AI: artificial intelligence
API: application programming interface
caDSR: Cancer Data Standards Registry and Repository
CDE: Common Data Element
CEDAR: Center for Expanded Data Annotation and Retrieval
CSV: Comma-Separated Values
DE: data element
DSS: data set specification
FAIR: Findable, Accessible, Interoperable, and Reusable
GI: Glossary Item
ISO/IEC: International Organization for Standardization/International Electrotechnical Commission
LOINC: Logical Observation Identifiers Names and Codes
MDM-Portal: Portal of Medical Data Models
METEOR: Metadata Online Registry
NCBO: National Center for Biomedical Ontology
NCIT: National Cancer Institute Thesaurus
NIH: National Institutes of Health
NINDS: National Institute of Neurological Disorders and Stroke
ODM: Operational Data Model
OWL: Web Ontology Language
SNOMED CT: Systematized Nomenclature of Medicine—Clinical Terms
UMLS: Unified Medical Language System
Edited by C Lovis; submitted 11.05.24; peer-reviewed by C Gaudet-Blavignac, AJ Ponsero; comments to author 16.06.24; revised version received 07.07.24; accepted 21.07.24; published 30.09.24.
Copyright©Zhengyong Hu, Anran Wang, Yifan Duan, Jiayin Zhou, Wanfei Hu, Sizhu Wu. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 30.09.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.