This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
There is wide recognition that the lack of health data interoperability has significant impacts. Traditionally, health data standards have been complex, and test-driven methods have played an important role in achieving interoperability. The Health Level Seven International (HL7) standard Fast Healthcare Interoperability Resources (FHIR) may be a technical solution that aligns with policy, but systems need to be validated and tested.
Our objective was to explore whether the regular use of validation and testing tools improves server compliance with the HL7 FHIR specification.
We used two independent validation and testing tools, Crucible and Touchstone, and analyzed the usage and result data to determine their impact on server compliance with the HL7 FHIR specification.
The use of validation and testing tools such as Crucible and Touchstone is strongly correlated with increased compliance, and “practice makes perfect.” Frequent and thorough testing has clear implications for health data interoperability. Additional data analysis reveals trends over time with respect to vendors, use cases, and FHIR versions.
Validation and testing tools can aid in the transition to an interoperable health care infrastructure. Developers who use testing and validation tools tend to produce more compliant FHIR implementations. When it comes to health data interoperability, “practice makes perfect.”
Despite the relatively rapid nationwide adoption of electronic health records (EHRs), the industry’s ability to successfully exchange computable health data has not kept pace. A recent study found that less than 35% of providers report data exchange with other providers within the same organization or affiliated hospitals. The exchange of data across organizations is even more limited, with less than 14% of providers reporting they exchange data with providers in other organizations or unaffiliated hospitals [
As indicated by the JASON report [
To date, the health care community has produced and tolerated data standards that are complex, difficult to understand, and technically challenging to implement and test consistently. Although well-meaning, such standardization efforts have advanced interoperability only so far while stifling innovation through high custom development and maintenance costs. In one related domain, clinical quality measurement, MITRE has previously demonstrated that a test-driven approach can successfully establish a framework for interoperability using national health care standards [
The Health Level Seven International (HL7) Fast Healthcare Interoperability Resources (FHIR) standard [
Although the Office of the National Coordinator for Health Information Technology’s (ONC) 2015 Edition Health IT Certification Criteria includes an API certification criterion—45 United States Code of Federal Regulations 170.315(g)(8) and (g)(9)—that is well suited for FHIR implementation, no government regulations require health IT developers to conform to any published version of the FHIR standard [
In 2015, the ONC published the document, Connecting Health and Care for the Nation: A Shared Nationwide Interoperability Roadmap (the Roadmap) [
The 21st Century Cures Act (the Cures Act), Public Law 114-255 [
Importantly, and relevant to this paper, the Cures Act includes two provisions within the conditions of certification related to APIs. First, it charges ONC to require that health IT developers publish APIs that can enable health information to be accessed, exchanged, and used “without special effort.” Second, it charges ONC with requiring that health IT developers successfully test the real-world use of their certified technology for interoperability in the type of setting in which the technology is marketed. Taken together, these two statutory requirements signal a growing need for the industry to coalesce and invest in API-testing capacity.
To meet the requirements expressed in the Cures Act, health IT developers need substantive tools to validate and test system conformance to the FHIR specification. Furthermore, consistent implementation of FHIR will help enable an open, innovation-friendly ecosystem that can make data exchange more efficient and reduce interface costs. The Crucible and Touchstone projects are both production-ready testing platforms for the FHIR specification that industry can leverage immediately to support its FHIR-based testing needs [
Other available testing tools include Sprinkler, an open-source project developed by Firely, that tested FHIR servers with a web-based application [
The objective of this research was to examine whether or not the use of validation and test tools, specifically Crucible and Touchstone, had any impact on vendor compliance with the FHIR specification and, by extension, interoperability.
Two independent projects—MITRE’s Crucible project and AEGIS.net’s Touchstone project—provide the capability to rigorously test servers against the FHIR specification. Such testing assures health IT developers and app developers that the standards have been consistently implemented and deployed. This kind of testing is essential to enable interoperable health IT solutions that can be used to deliver safer and more efficient health care.
Crucible is a set of open-source testing tools for HL7 International FHIR developed by MITRE through an internally funded research program. It is provided as a free and public service to the FHIR development community to promote correct FHIR implementations. Its capabilities include the testing of servers for conformance to the FHIR standard, scoring patient records for completeness, and generating synthetic patient data suitable for testing [
The Crucible tool has been used by FHIR developers in the health care information technology industry since 2015. Developers can test their FHIR implementations through the Crucible website [
There are three ways that Crucible can be used to test server compliance:
Server compliance tests may be manually run through the public instance of Crucible.
Server compliance tests of known servers are automated to run every 3 days through the public instance of Crucible.
Server compliance tests may be run on private instances of Crucible behind a private firewall.
This paper examines test results from manual and automated tests run through the public instance. Only manually run tests are considered as an indicator of system usage. Tests run on private instances of Crucible are not included, as that data is not available to the researchers.
Touchstone is an open-access platform that draws on nearly 20 years of automated lab-based testing initiatives, most recently the cloud-based Test-as-a-Service Developers Integration Lab developed by AEGIS.net through internal research and development. Leveraging the experience and lessons learned from supporting ONC in onboarding participant organizations during the early stages of the Nationwide Health Information Network, and later from hosting the Sequoia Project's formal testing program for eHealth Exchange, AEGIS.net has advanced this test platform to address FHIR [
Touchstone has successfully been used by developers and quality assurance experts in health care information technology since 2015. Users can privately test their FHIR implementations by navigating to the Touchstone Project site [
In order to test for conformance and interoperability, Touchstone combines the following features in an open-access platform:
Testing both client applications and server implementations, while supporting peer-to-peer, multi-actor scenarios (eg, care coordination and workflow) in a unified testing approach.
Testing is based entirely on the FHIR TestScript resource, which allows future test case development to be crowdsourced.
Multi-version FHIR support, which facilitates testing backwards compatibility and future-proofing systems and products to ensure a continuously interoperable ecosystem.
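Because Touchstone test cases are expressed as FHIR TestScript resources, each test case is itself a FHIR resource that can be shared and reviewed. The following is a minimal sketch, not a Touchstone artifact: a heavily simplified TestScript-shaped Python dictionary with one test that reads a Patient and asserts an HTTP 200 ("okay") response. The URL, names, and the `/example` patient id are hypothetical, the operation-type code system is omitted for brevity, and the structure has not been validated against any particular FHIR version; the TestScript definition for the release under test is authoritative.

```python
import json


def build_patient_read_testscript() -> dict:
    """Return a simplified, hypothetical TestScript-shaped dict (not schema-validated)."""
    return {
        "resourceType": "TestScript",
        "url": "http://example.org/fhir/TestScript/patient-read-sketch",  # hypothetical URL
        "name": "PatientReadSketch",  # hypothetical name
        "status": "draft",
        "test": [
            {
                "name": "ReadPatient",
                "action": [
                    # Step 1: read a Patient by a hypothetical id ("example").
                    {"operation": {"type": {"code": "read"},
                                   "resource": "Patient",
                                   "params": "/example"}},
                    # Step 2: assert that the server answered with HTTP 200 ("okay").
                    {"assert": {"response": "okay"}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_patient_read_testscript(), indent=2))
```

Keeping the test as plain data, rather than code, is what makes crowdsourced test development practical: contributors edit resources, and any TestScript-aware engine can execute them.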
To gauge FHIR implementation conformance, this paper examines test results from manual and API-automated tests run through the public cloud instance of Touchstone. Only vendor-initiated tests against the cloud version of Touchstone are considered as an indicator of system usage. Tests run on private instances of Touchstone are not included as that data is not available to the researchers.
The FHIR specification was originally proposed as a new health care data and exchange standard in August 2011. The first official release, as a Draft Standard for Trial Use (DSTU), was published on September 30, 2014. Subsequent official releases of the FHIR specification have followed a 1.5- to 2-year balloting cycle. Because the specification evolved rapidly over a few years, publicly available testing tools were not feasible until the specification initially stabilized with the release of the DSTU. Accordingly, the Crucible and Touchstone platforms became available only in 2015, which is also when collection of the test execution data used in this statistical analysis began. For this study, data from Crucible ranged from December 1, 2015, to May 31, 2017, and data from Touchstone ranged from September 27, 2015, to September 3, 2017.
Data was collected for this study through usage of the Crucible and Touchstone projects. During the study period, software developers executed tests using both projects, either independently or as part of a FHIR Connectathon. Both projects automatically collected usage data on the tests they executed, including the following: which FHIR server was under test; the version of FHIR being tested; which tests were executed and how those tests map to the FHIR specification; the result of each test (eg, pass, fail, skip); the step-by-step interactions between the testing system and the target FHIR server (eg, every HTTP request and every HTTP response, including headers and body); and detailed introspection and checks of those results.
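The per-execution data listed above can be pictured as a simple record. The Python sketch below is a hypothetical schema for illustration only; this paper does not describe either project's actual storage format, and every class and field name here is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIP = "skip"


@dataclass
class HttpExchange:
    """One request/response pair between the testing system and the server under test."""
    method: str
    url: str
    request_headers: dict
    request_body: str
    status: int
    response_headers: dict
    response_body: str


@dataclass
class TestExecution:
    """Hypothetical record of a single test run against one FHIR server."""
    server_url: str
    fhir_version: str
    test_id: str
    spec_references: List[str]  # hypothetical: parts of the FHIR spec the test maps to
    outcome: Outcome
    exchanges: List[HttpExchange] = field(default_factory=list)

    def passed(self) -> bool:
        return self.outcome is Outcome.PASS
```

Storing the full HTTP exchanges alongside the outcome is what allows a failed test to be traced back to the exact request and response that caused it.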
We wanted to know whether there was a relationship between testing and compliance. Therefore, we explored whether a statistically significant correlation could be found between how frequently vendors executed tests and their conformance with the FHIR specification. For this regression, servers were grouped by vendor, because many vendors tested FHIR implementations using multiple servers. The number of manual tests executed was used as a measure of an organization’s usage level. The number of distinct test suites supported (ie, suites whose tests were successfully passed) across all of a vendor's servers was used to measure vendor performance. This metric is a good approximation of the number of features a vendor has implemented successfully and completely.
The number of tests executed was log-transformed to reflect diminishing marginal returns, because the most complex test suites tend to be implemented last and require more implementation hours and testing. Regressing log tests executed against the number of supported suites gives a statistically significant (
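The vendor-level grouping can be sketched as follows, assuming each run is tagged with a vendor, a server, whether it was started manually, and the set of suites it passed. Usage is the count of manual runs; performance is the number of distinct suites passed on any of the vendor's servers. The input tuple layout and the `vendor_metrics` name are hypothetical, and counting suites from all runs (manual or automated) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (vendor, server_url, manually_started, suites_passed): hypothetical input layout
Run = Tuple[str, str, bool, Set[str]]


def vendor_metrics(runs: Iterable[Run]) -> Dict[str, Tuple[int, int]]:
    """Return {vendor: (manual_runs, distinct_suites_passed_across_all_servers)}."""
    manual_runs: Dict[str, int] = defaultdict(int)
    suites: Dict[str, Set[str]] = defaultdict(set)
    for vendor, _server, manual, passed in runs:
        if manual:
            manual_runs[vendor] += 1  # only manual runs count toward usage
        suites[vendor] |= passed      # union over every server the vendor used
    return {v: (manual_runs[v], len(suites[v])) for v in suites}
```

Taking the union across servers means a vendor is credited with a suite as soon as any of its (possibly temporary) servers passes it, which matches grouping servers by vendor.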
A similar analysis for Touchstone shows a statistically significant (
These simple regressions—plotted in
Predicting suites passed by tests executed.
Predicting suites passed by tests executed (log scale).
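The regression described above can be sketched as ordinary least squares of suites passed on the logarithm of tests executed. The snippet below uses only the standard library; the natural log is an assumption (the base only rescales the slope), and the sample numbers are made-up illustrative values, not study data.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(tests_executed: Sequence[int],
                   suites_passed: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for suites_passed = a + b * ln(tests_executed).

    Returns (intercept a, slope b, Pearson r between ln(tests) and suites).
    """
    if len(tests_executed) != len(suites_passed):
        raise ValueError("inputs must have the same length")
    x = [math.log(t) for t in tests_executed]  # log transform models diminishing returns
    y = [float(s) for s in suites_passed]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("tests_executed must not all be equal")
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return a, b, r


# Made-up illustrative values (NOT study data), one entry per vendor.
tests = [3, 8, 15, 40, 120]
suites = [2, 5, 6, 11, 14]
intercept, slope, r = fit_log_linear(tests, suites)
print(f"suites ~ {intercept:.2f} + {slope:.2f} * ln(tests), r = {r:.2f}")
```

A significance test on the slope would additionally use the t distribution with n - 2 degrees of freedom; `scipy.stats.linregress`, for example, reports that p value directly.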
The results of our data analysis indicate that as the frequency of testing or number of tests increases, the performance of a server against those tests also tends to increase. This should not be surprising: software developers address issues and fix defects in order to pass the tests, provided that repeated testing reveals those issues and defects. Assuming the tests accurately and adequately cover the depth and breadth of the FHIR specification, FHIR servers developed and tested against them in a test-driven manner should adhere more closely to the FHIR specification. If compatibility with FHIR equates to health data interoperability, then fair and neutral testing appears critical to achieving that goal. Of course, health data interoperability is far more complex than FHIR alone; other factors include, but are not limited to, clinical terminologies, security and trust frameworks, clinical workflow compatibility, and financial incentives. But correctly implementing software that adheres to the FHIR specification is a good first step toward exchanging data.
As shown in
Through regular automated testing of known FHIR servers, Crucible can track a server's progress over time. Because vendors often use temporary server URLs for testing, we track an individual vendor's weekly progress by aggregating the results of all known servers for that vendor and using the best result to chart their progress implementing FHIR. Crucible’s tracking of one anonymized vendor’s Standard for Trial Use version 3 (STU3) servers is shown below in
Similarly, looking specifically at the most active anonymized users of Touchstone, namely Vendor A (188 uses; started testing with Touchstone in February 2017), Vendor B (378 uses), Vendor C (321 uses), and Vendor D (207 uses), there is evidence of both high use of Touchstone and improvement in their FHIR implementations. These vendors used Touchstone consistently during the study period, and their results progressively improved, from passing fewer than 20 tests initially to passing over 1000 tests. Touchstone’s test-driven development (TDD) capabilities allowed these developers to implement their FHIR servers faster by finding errors and confirming the correctness of their implementations, including when managing version upgrades.
Notably, by leveraging TDD, testing on a daily basis, and integrating continuous testing into its development lifecycle, Vendor A accomplished in 6 weeks what many organizations accomplish in 12-24 weeks.
Test runs per week.
Vendor A STU3 (Standard for Trial Use version 3) servers.
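The weekly view that Crucible uses for a single vendor (aggregate all of the vendor's known servers, keep the best result) can be sketched by bucketing each automated run by ISO week and keeping the maximum number of passing tests. This paper does not describe Crucible's internal implementation; the tuple layout and the `weekly_best` name are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

# (vendor, server_url, run_date, tests_passed): hypothetical input layout
Result = Tuple[str, str, date, int]


def weekly_best(results: Iterable[Result], vendor: str) -> Dict[Tuple[int, int], int]:
    """Map (ISO year, ISO week) -> best tests_passed across all of the vendor's servers."""
    best: Dict[Tuple[int, int], int] = {}
    for v, _server, run_date, passed in results:
        if v != vendor:
            continue
        iso = run_date.isocalendar()
        week = (iso[0], iso[1])  # ISO year and ISO week number
        best[week] = max(best.get(week, 0), passed)
    return dict(sorted(best.items()))
```

Taking the maximum rather than the latest result keeps a vendor's curve from dropping when a fresh temporary server is stood up mid-week.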
During the study period, there were 3253 identified user-initiated test executions on Crucible, of which 1970 included only a single test suite. Four of the top 10 test suites executed were Argonaut suites. The other commonly executed suites were the most general tests (reading, searching, history retrieval, and formatting) and the transaction and batch test. The FHIR Patient resource test was the most-used resource test, since Patient is one of the most important and central resources in the FHIR specification.
Within Touchstone, 529,847 tests were run during the study period. A total of 99,848 (18.8%) of the tests executed specifically tested the FHIR Patient resource, while 55,163 (10.4%) tested terminology functionality. Touchstone also includes tests for HL7 Connectathon tracks, which accounted for 125,720 (23.7%) of the tests run, although many of these tests were most likely run outside of Connectathons.
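The percentages for Touchstone follow directly from the reported counts. A quick check, using only the figures stated in the preceding paragraph:

```python
TOTAL_RUNS = 529_847  # Touchstone tests run during the study period (from the text)

COUNTS = {
    "Patient resource": 99_848,
    "Terminology": 55_163,
    "HL7 Connectathon tracks": 125_720,
}


def share_pct(count: int, total: int = TOTAL_RUNS) -> float:
    """Percentage of all runs, rounded to one decimal place as in the text."""
    return round(100 * count / total, 1)


for label, n in COUNTS.items():
    print(f"{label}: {share_pct(n)}%")
# Patient resource: 18.8%
# Terminology: 10.4%
# HL7 Connectathon tracks: 23.7%
```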
The top tests executed on Crucible and Touchstone are listed in
FHIR is an evolving standard that has seen three major releases in the last 4 years and a dozen minor releases in the same time frame [
Crucible supports testing the last two major versions of FHIR, while Touchstone supports testing all point releases since FHIR 1.0.
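Because usage is reported by FHIR version, results must be bucketed by release. A minimal sketch, assuming the version string comes from the `fhirVersion` element of a server's CapabilityStatement (Conformance in DSTU2): published versions 1.0.x correspond to DSTU2 and 3.0.x to STU3, and anything else, such as interim ballot builds, is left unlabeled. The helper name and the two-entry mapping are illustrative assumptions, not part of either tool.

```python
from typing import Optional

# Release labels keyed by (major, minor) of the published version string.
RELEASES = {(1, 0): "DSTU2", (3, 0): "STU3"}


def release_label(fhir_version: str) -> Optional[str]:
    """Map a fhirVersion string such as '3.0.1' to a release name, or None if unknown."""
    parts = fhir_version.strip().split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        return None
    return RELEASES.get((int(parts[0]), int(parts[1])))
```

Keying on major and minor only means every point release (for example, 1.0.1 and 1.0.2) falls under the same release label, while still letting a tool that supports all point releases record the exact string separately.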
Beyond providing the tools themselves, the Crucible and Touchstone teams have maintained considerable involvement with the FHIR development community by attending Connectathons sponsored by HL7, assisting the Argonaut group by providing tailor-made tests for their use cases, and in the case of Crucible, reaching out to the open-source community for involvement in the development of the software.
The Crucible development team attended every HL7-sponsored FHIR Connectathon from Connectathon 8 in January 2015 through Connectathon 17 in January 2018. The AEGIS.net team has attended every HL7-sponsored FHIR Connectathon since Connectathon 4 in September 2013 and introduced Touchstone at Connectathon 10 in October 2015. The Crucible and Touchstone teams both develop and support a suite of tests for each Connectathon, specific to that event’s tracks. The Touchstone team regularly runs a “Developers Introduction to FHIR” session in parallel with each Connectathon, introducing FHIR and TDD.
Top tests executed by users.
Rank | Crucible test ID | Crucible executions, n | Touchstone test ID | Touchstone executions, n |
1 | Argonaut Sprint 1 | 858 | Patient Resource Test | 132,328 |
2 | Read Test | 664 | ValueSet Resource Test | 59,162 |
3 | Argonaut Sprint 3 | 549 | Practitioner Resource Test | 26,282 |
4 | Argonaut Sprint 4 | 539 | Organization Resource Test | 22,124 |
5 | History001 | 476 | Location Resource Test | 12,243 |
6 | Search001 | 475 | Observation Resource Test | 11,592 |
7 | Format001 | 460 | Device Resource Test | 10,698 |
8 | Argonaut Sprint 5 | 451 | AllergyIntolerance Resource Test | 10,621 |
9 | Transaction and Batch Test | 447 | Appointment Resource Test | 10,232 |
10 | Patient Resource Test | 445 | Condition Resource Test | 9588 |
Touchstone usage by Fast Healthcare Interoperability Resources (FHIR) version.
Touchstone Fast Healthcare Interoperability Resources (FHIR) by version over time. DSTU2: Draft Standard for Trial Use version 2; STU3: Standard for Trial Use version 3.
Anonymized performance of Argonaut members at the completion of each Argonaut Sprint.
Vendor | Sprint 1 | Sprint 2 | Sprint 3 | Resprint 1 | Resprint 2 | Resprint 3 | Sprint 4 | Sprint 5 | Sprint 6 | Sprint 7 | Argonaut Connectathon tests |
E | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
F | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
G | Pass | Fail | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
I | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
M | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
N | Pass | Fail | Pass | Fail | Fail | Fail | Pass | Pass | Pass | Pass | Pass |
O | Pass | Fail | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
P | Pass | Fail | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
Q | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
R | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
S | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail |
T | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
U | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
V | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail |
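The Argonaut results can also be summarized per vendor. The sketch below simply counts "Pass" cells across the 11 result columns of each row, using the rows exactly as tabulated; the `pass_counts` helper is an illustrative assumption, not part of either tool.

```python
# Rows copied from the Argonaut table: vendor, then 11 Pass/Fail results.
ARGONAUT_ROWS = """\
E | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
F | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
G | Pass | Fail | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
I | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
M | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
N | Pass | Fail | Pass | Fail | Fail | Fail | Pass | Pass | Pass | Pass | Pass |
O | Pass | Fail | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
P | Pass | Fail | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
Q | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
R | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
S | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail |
T | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
U | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
V | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail | Fail |
"""


def pass_counts(table: str) -> dict:
    """Return {vendor: number of 'Pass' cells} for pipe-delimited rows."""
    counts = {}
    for line in table.strip().splitlines():
        # Drop the trailing pipe, split on the rest, and trim each cell.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        vendor, results = cells[0], cells[1:]
        counts[vendor] = sum(r == "Pass" for r in results)
    return counts


for vendor, n in pass_counts(ARGONAUT_ROWS).items():
    print(f"Vendor {vendor}: {n}/11 passed")
```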
The Argonaut Project is a private sector initiative with the mission of advancing industry adoption of modern open interoperability standards. Its stated purpose is to “develop a first-generation FHIR-based API and Core Data Services specification to enable expanded information sharing for electronic health records and other health information technology using existing Internet standards and architectural patterns and styles” [
In keeping with Touchstone’s and Crucible’s missions to advance adoption of the FHIR API, both teams collaborated with Argonaut vendors to develop a series of test suites to help them test their FHIR implementations. Crucible’s test suite results show that almost all Argonaut members initially failed these test suites. However, as shown in
Electronic health records contain both structured and unstructured data. FHIR supports both of these data types: structured data using Resources and unstructured data using Binary and DocumentReference [
Crucible and Touchstone have proven to be valuable tools for the FHIR developer community. These tools can aid the transition to an interoperable health care infrastructure by providing open reference implementations for FHIR testing and by supporting future Cures Act requirements. Our research shows that developers who use testing and validation tools tend to produce more compliant FHIR implementations. The test data collected by MITRE and AEGIS.net during the study period shows that when it comes to health data interoperability, “practice makes perfect.” This gives us hope that a future with ubiquitous health care information interoperability is possible.
API: application programming interface
DSTU: Draft Standard for Trial Use
EHR: electronic health record
FHIR: Fast Healthcare Interoperability Resources
HL7: Health Level Seven International
ONC: Office of the National Coordinator for Health Information Technology
PCP: primary care physician
STU3: Standard for Trial Use version 3
TDD: test-driven development
MITRE research reported in this publication was supported by The MITRE Innovation Program (Approved for Public Release; Distribution Unlimited, Case Numbers 16-0597 and 17-3214-6) and the ONC, US Department of Health and Human Services, Washington, DC, USA.
JW, RS, and CD are employed by The MITRE Corporation, which funded the development of Crucible software. MH and RE are employed by AEGIS.net, which owns and develops the Touchstone software.
Crucible development was led by JW and RS. Touchstone development was led by RE and MH. Data analysis was completed by CD, MH, and JW. Policy input and review were provided by SP. All authors contributed to the writing and final approval of this manuscript.