Evaluating and Enhancing the Fitness-for-Purpose of Electronic Health Record Data: Qualitative Study on Current Practices and Pathway to an Automated Approach Within the Medical Informatics for Research and Care in University Medicine Consortium

Background: Leveraging electronic health record (EHR) data for clinical or research purposes depends heavily on data fitness. However, there is a lack of standardized frameworks for evaluating the suitability of EHR data, leading to inconsistent quality in data use projects (DUPs). This research focuses on the Medical Informatics for Research and Care in University Medicine (MIRACUM) Data Integration Centers (DICs) and examines empirical practices for assessing and automating the fitness-for-purpose of clinical data in German DIC settings.

Objective: The study aims (1) to capture and discuss how MIRACUM DICs evaluate and enhance the fitness-for-purpose of observational health care data and to examine the alignment with existing recommendations, and (2) to identify the requirements for designing and implementing a computer-assisted solution to evaluate EHR data fitness within MIRACUM DICs.

Methods: A qualitative approach was followed using an open-ended survey across the DICs of 10 German university hospitals affiliated with MIRACUM. Data were analyzed using thematic analysis following an inductive qualitative method.

Results: All 10 MIRACUM DICs participated, with 17 participants revealing various approaches to assessing data fitness, including the 4-eyes principle and data consistency checks such as cross-system data value comparison. Common practices included a DUP-related feedback loop on data fitness and the use of self-designed dashboards for monitoring. Most experts had a computer science background and a master’s degree, suggesting strong technological proficiency but potentially lacking clinical or statistical expertise. Nine key requirements for a computer-assisted solution were identified, including flexibility, understandability, extendibility, and practicability. Participants used heterogeneous data repositories for evaluating data quality criteria and practical strategies to communicate with research and clinical teams.
Conclusions: The study identifies gaps between current practices in MIRACUM DICs and existing recommendations, offering insights into the complexities of assessing and reporting clinical data fitness. Additionally, a tripartite modular framework for fitness-for-purpose assessment was introduced to streamline the forthcoming implementation. It provides valuable input for developing and integrating an automated solution across multiple locations, which may range from statistical comparisons to advanced machine learning algorithms for operationalizing frameworks such as the 3×3 data quality assessment framework. These findings provide foundational evidence for future design and implementation studies to enhance data quality assessments for specific DUPs in observational health care settings.

Department- and unit-specific clinical questions, e.g., prediction of departmental sepsis and associations with specific treatment procedures/ICD diagnoses. Another example: patient case-based analysis of multiple clinical complications associated with specific clinical and demographic characteristics. Most queries are made through the cDWH and the i2b2 repository.

10
Mainly retrospective data analysis in pulmonology. Analyses are performed via DataSHIELD; therefore, there is no direct query in the data repositories (indirect i2b2).

Question 3: How are data use project-specific data quality (DQ) requirements collected from the perspective of data requesters at their DIC?

Sites-ID Feedbacks

1
During the data request, we advise that the requested data should be described as fine-grained and exactly as possible. If the provided data do not match the request, a "postprocessing" process is initiated.

2
In general, the heads of the projects contact the transfer office/UAC office and clarify which data can be extracted, which variables are useful for a scientific evaluation, and what should be considered (specific conventions/documentation).

3
I am not sure exactly how the question is meant. In any case, the requested data are usually discussed at least once with the requester, and quality-reducing aspects are worked out together, e.g., free-text information or the documentation practice in the respective data-providing institution (usually the requester comes from the same institution and knows it very well).

4
Is not collected.

5
In interactive discussion with the researchers. The environment is at the moment still too heterogeneous for a standardized approach.

6
In personal conversation during consultation. Formless communication to the transfer office of the DIC.

Question 5: What measures are taken at your location to communicate with data-requesting sites about the quality of data provided for the specific purpose of the data use project, so that data requesters have opportunities to estimate the fitness of the data for the intended project?
Sites-ID Feedbacks

1
Creation of a transfer office. The transfer office communicates with the data-requesting offices. After data provision, the transfer office inquires about the satisfaction/suitability of the data with the data-requesting office. After checking, the data-requesting site consults with the transfer office. If there are deficiencies in the quality of the data, the transfer office forwards this to the architects of the data, who contact the data-requesting office directly in order to work out solutions together.

2
Conduct a feasibility study. Communicate mid-term results.

3
This is done in direct dialog with the requester (see also 3).

4
The DIC advises the data requesters individually. So far, there are only a few projects in which the DIC was not scientifically represented.

5
Overview dashboard in the self-developed data integration portal.

6
Not relevant yet.

7
Scope of the core data set vs. expectations in the context of a consultation. Feasibility queries. Comparison of the data set with known data from the hospital, together with the requesters. Provision of a data dashboard for requesters' own queries.

8
n/a

9
Delivery of the data with involvement of the data requesters. First, a feasibility request determines to what extent the number of patients suitable for the planned project is available in sufficient amount. Then the data are delivered by the data request administrator, who goes through the data to be delivered together with the data requester. In case of change requests or incorrect quality in the data, the data selection queries are adjusted, validated, and documented again via the 4-eyes principle. This results in the feedback cycle: data requester => data request administrator => internal data scientists => data request administrator => requester. Only in case of a complete match (from the data requester's perspective) does the final data delivery take place.

10
Plausibility check of the provided data together with researchers (physicians) before using the data for the analysis.
Use of the uniform data dictionary (metadata).
Verification of the data format or type and of the number of variables via DataSHIELD before the analyses. If inconsistencies are detected in the research data, they are cross-checked with the source system to identify problems.

Question 6: What would be their expectations/requirements for a fitness-for-use cross-site DQ framework that they could adopt in the future to measure DQ related to their data use projects?
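The pre-analysis verification that site 10 describes above (checking variable names, counts, and types against a uniform data dictionary before analysis) could be sketched roughly as follows. This is a minimal, hypothetical illustration rather than MIRACUM tooling: the dictionary entries, variable names, and record layout are assumptions, and a real DataSHIELD setup would run such checks through its R interface instead of a standalone script.

```python
# Hypothetical data dictionary: variable name -> expected Python type.
# Entries are illustrative assumptions, not the MIRACUM core data set.
DATA_DICTIONARY = {
    "patient_id": str,
    "admission_date": str,  # ISO 8601 date kept as text in this sketch
    "icd_code": str,
    "age": int,
}

def check_against_dictionary(records):
    """Return a list of problems: missing/extra variables and type mismatches."""
    problems = []
    for i, rec in enumerate(records):
        missing = DATA_DICTIONARY.keys() - rec.keys()
        extra = rec.keys() - DATA_DICTIONARY.keys()
        if missing:
            problems.append(f"record {i}: missing variables {sorted(missing)}")
        if extra:
            problems.append(f"record {i}: unexpected variables {sorted(extra)}")
        for var, expected_type in DATA_DICTIONARY.items():
            if var in rec and not isinstance(rec[var], expected_type):
                problems.append(
                    f"record {i}: {var} is {type(rec[var]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems

# Example delivery: the second record carries age as text, a typical
# format inconsistency that would be cross-checked with the source system.
records = [
    {"patient_id": "P1", "admission_date": "2023-01-04", "icd_code": "J18.9", "age": 67},
    {"patient_id": "P2", "admission_date": "2023-02-11", "icd_code": "J44.1", "age": "71"},
]
print(check_against_dictionary(records))
```

In this sketch, a nonempty problem list would trigger the cross-check with the source system before any analysis proceeds.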
Sites-ID Feedbacks

1
Implementation of a dashboard.

2
Flexible organization of the DQ system. Locally assessed DQ compared to other sites. Integration of project-specific data plausibility. Understandability for the clinician and the data scientist/statistician. Fitness-for-use dashboard.

3
Generic enough that it can be used in every DIC and for every request. It should be pragmatic and easy to understand, so that it can always be used as a basic tool and its benefits are seen equally by all parties (data provider, data supplier, data requester). In the short term, it should be limited to the essentials in order to be usable and to gain experience. In the long term, it may even be possible to modularize it and thus use it only in parts.

4
Graphical representation over time (gaps, leaps in values).

5
Integration of the already used resource-specific tracking of the data path, based on a unified system of FHIR business identifiers, into the DQ system.

6
These cannot yet be definitively determined.

7
Complete noninteractive integration of the DQ process as an operation within the data pipelines, for complete monitoring of the mapping of source and target systems, with automatic machine-readable report generation (no PDF). Automated comparison of previous reports (in the context of performed developments or updates). Provision of a uniform template for documenting DQ, and possibly also data requester feedback, in the context of project-related data deliveries across the DICs. Mapping and automation of DQ checks based on the specific data quality metrics:
- Data completeness: are there enough patients at the DIC site to carry out the planned projects?
- Data plausibility: formulation and automation of generally transferable plausibility checks (e.g., no readmission after a death) that could affect the outcomes of most data requests.
- Data conformity: uniform mapping and verification of the conformity of ICD, OPS, and LOINC codes, and adequate reporting in the systematics.
Structured provenance documentation:
- Where did the data come from?
- What processing steps were performed on the data up to the time of data delivery?
- Are there changes to the data that may represent a potential impact on the planned data use project?
FHIR as a single target repository for the data requests (also needs to be coordinated across the DICs). Inclusion of i2b2 and OMOP as additional repositories, depending on whether a specific repository is preferred/specified by the data request.

10
Uniform FHIR profiles across MIRACUM partners. Standardization of LOINC mapping. Uniform measurement units.

Survey questions, with collected feedbacks sorted by participating sites

(We expect significantly more requests in the following quarter, as our data catalog is still in the publication mechanism. The 20 therefore refer to requests that have already been generated even though the data catalog has not yet been published.)

Qualifying research questions (doctoral dissertations, etc.), quality assessment, and proof of qualification; so far, mostly the mirror system of ORBIS serves as the data repository. The mentioned i2b2/OMOP/FHIR repositories mostly play a role only for MI-I-/MIRACUM-specific queries.
2
Internal research queries, quality assurance, and reporting are mainly performed using the DWH. The i2b2/OMOP/FHIR repositories are mainly used for MI-I/MIRACUM-specific requests.

3
For project-specific validation, comparison of hit ratios from different systems, created by an independent person: e.g., separate i2b2 SQL queries compared to FHIR/staging area/DWH queries, etc. Before data delivery/provision, mutual control (DIC-internal as well as with clinicians) and official release of the results by the head of the transfer office.
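The independent cross-system comparison of hit ratios that site 3 describes (separate queries against i2b2 and the DWH/FHIR staging area, compared before release) can be sketched as follows. This is a minimal sketch under stated assumptions: the two repositories are mocked as in-memory SQLite databases, and the table name, columns, and cohort criterion are illustrative, not actual i2b2 or DWH schemas.

```python
# Sketch of a cross-system hit-count validation: the same cohort criterion
# is counted in two independently queried repositories, and a mismatch is
# flagged for investigation before data delivery. Schemas are illustrative.
import sqlite3

def count_hits(conn, icd_prefix):
    """Count distinct patients with a diagnosis code starting with icd_prefix."""
    cur = conn.execute(
        "SELECT COUNT(DISTINCT patient_id) FROM diagnoses WHERE icd_code LIKE ?",
        (icd_prefix + "%",),
    )
    return cur.fetchone()[0]

def make_repo(rows):
    """Build an in-memory mock repository with a single diagnoses table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diagnoses (patient_id TEXT, icd_code TEXT)")
    conn.executemany("INSERT INTO diagnoses VALUES (?, ?)", rows)
    return conn

# Mocked contents; in practice these would be independent queries written
# by different persons against i2b2 and the DWH/staging area.
i2b2 = make_repo([("P1", "E11.9"), ("P2", "E11.5"), ("P3", "I10")])
dwh = make_repo([("P1", "E11.9"), ("P2", "E11.5"), ("P3", "I10"), ("P4", "E11.9")])

hits_a, hits_b = count_hits(i2b2, "E11"), count_hits(dwh, "E11")
verdict = "match" if hits_a == hits_b else "mismatch: investigate before release"
print(hits_a, hits_b, verdict)
```

A mismatch such as the one produced here (one patient present in only one system) is exactly the kind of discrepancy that, per site 3, is worked out under mutual control before the head of the transfer office releases the results.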