This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Infectious intestinal disease (IID) has considerable health impact; there are 2 billion cases worldwide resulting in 1 million deaths and 78.7 million disability-adjusted life years lost. Reported IID incidence rates vary and this is partly because terms such as “diarrheal disease” and “acute infectious gastroenteritis” are used interchangeably. Ontologies provide a method of transparently comparing case definitions and disease incidence rates.
This study sought to show how differences in case definition in part account for variation in incidence estimates for IID and how an ontological approach provides greater transparency to IID case finding.
We compared three IID case definitions: (1) Royal College of General Practitioners Research and Surveillance Centre (RCGP RSC) definition based on mapping to the Ninth International Classification of Disease (ICD-9), (2) newer ICD-10 definition, and (3) ontological case definition. We calculated incidence rates and examined the contribution of four supporting concepts related to IID: symptoms, investigations, process of care (eg, notification to public health authorities), and therapies. We created a formal ontology using ontology Web language.
The ontological approach identified 5712 more cases of IID than the ICD-10 definition and 4482 more than the RCGP RSC definition from an initial cohort of 1,120,490. Weekly incidence using the ontological definition was 17.93/100,000 (95% CI 15.63-20.41), whereas for the ICD-10 definition the rate was 8.13/100,000 (95% CI 6.70-9.87), and for the RSC definition the rate was 10.24/100,000 (95% CI 8.55-12.12). Codes from the four supporting concepts were generally consistent across our three IID case definitions: 37.38% (3905/10,448) (95% CI 36.16-38.5) for the ontological definition, 38.33% (2287/5966) (95% CI 36.79-39.93) for the RSC definition, and 40.82% (1933/4736) (95% CI 39.03-42.66) for the ICD-10 definition. The proportion of laboratory results associated with a positive test result was 19.68% (546/2775).
The standard RCGP RSC definition of IID, and its mapping to ICD-10, underestimates disease incidence. The ontological approach identified a larger proportion of new IID cases; the ontology divides contributory elements and enables transparency and comparison of rates. Results illustrate how improved diagnostic coding of IID combined with an ontological approach to case definition would provide a clearer picture of IID in the community, better inform GPs and public health services about circulating disease, and empower them to respond. We need to improve the Pathology Bounded Code List (PBCL) currently used by laboratories to electronically report results. Given advances in stool microbiology testing with a move to nonculture, PCR-based methods, the way microbiology results are reported and coded via PBCL needs to be reviewed and modernized.
The burden of infectious intestinal disease (IID) is considerable. The World Health Organization (WHO) estimated that foodborne disease from 22 pathogens accounted for 22 diseases resulted in 2 billion cases, over 1 million deaths, and 78.7 million disability-adjusted life years in 2010 [
Reported incidence rates for IID vary between 0.5% and 20% annually in the developed world [
Published variations may also be caused by imprecise or interchangeable use of the terms such as “diarrheal disease,” “acute infectious gastroenteritis” and “IID” and differing methods for describing cases, underscoring the importance of transparency when defining the disease [
Ontologies provide a method of systematically and transparently defining concepts and their relationships. They are used to clarify case finding and more accurately calculate disease incidence based on disease definitions that balance sensitivity and specificity [
UK general practice is highly computerized. Electronic registration–based systems ensure accurate denominators, and data from general practice provide opportunities for health research [
We aimed to test new technologies that provide general practitioners near real-time test results for a wide range of pathogens associated with IID [
We reviewed common IID case definitions published in the literature and standard coding systems used to record IID diagnoses in primary care settings and chose three IID definitions (
RSC definition
Based on WHO’s International Classification of Diseases, ICD-8/9 versions, infectious intestinal diseases chapter
Used for RCGP RSC weekly returns report
Includes all codes falling into the infectious intestinal disease group of infectious and parasitic diseases within the concept hierarchies of the 5-byte Version 2 read Code system (A00-A09 codes)
ICD-10 definition
Based on WHO’s International Classification of Diseases, ICD-10 version, infectious intestinal diseases chapter, all of which fall within A00-A09 chapter. More limited than the RSC definition
Subset of ICD-8/9 and RSC definition due to exclusion of codes such as Helicobacter, nonintestinal Salmonella infections, Astrovirus, Calicivirus, and redundant codes
Ontological definition
Based on IID case definition used during the Second Study of Infectious Intestinal Disease in the Community (IID2 Study)
Includes all codes within the restricted ICD-10 definition, plus additional diagnostic codes that directly or partially map to IID2 case definition even though they fall outside of A00-A09 infectious intestinal disease group. Investigation and process of care codes that directly map to the case definition are also included
The codes do not all necessarily fall into the A00-A09 infectious intestinal disease hierarchy used by the RSC and ICD-10 to define IID. This definition was based on the established case definition and was developed using an ontological approach designed to include all definite and possible IID cases recorded by clinicians in the RSC network
In the United Kingdom, the RCGP RSC case definition used for calculating weekly incidence of IID for the RSC’s weekly communicable and respiratory disease report is the established “gold standard” for surveillance [
For the ontological case definition of IID, we selected the more restrictive, well-documented case definition used during the Second Study of Infectious Intestinal Disease in the Community (IID2 Study), an extensively published, longitudinal study of IID incidence carried out in UK primary care [
We used a three-level approach previously developed by the University of Surrey to establish an ontology based on IID case definition [
The design of the ontology followed the structure used in problem-orientated records (POMR) and their associated coding system. This has its roots in the work of Lawrence Weed who created the idea of separating subjective (history) from objective (findings) and analysis (often diagnosis or problem) from plan (prescription or treatment). This is known internationally as Weed’s SOAP [
We applied the ontology to the Read Code list by searching for codes indicative of IID diagnosis and mapped each into one of the following three classes [
Codes that refer to chronic conditions or non-intestinal conditions were defined as not mapping to IID and were excluded (eg,
This study used primary care data recorded during a 52-week period spanning July 2014-July 2015 from the RCGP RSC, a sentinel network representative of the English population [
We calculated case numbers and incidence rates for the three IID definitions. When clinicians record a diagnosis, they assign episode type, which differentiates incident (first, new) cases from prevalent (follow-up, ongoing) cases. Records with “first” or “new” episode types were counted when counting cases and calculating incidence rates using diagnostic codes. When cases were found using directly mapping investigation and process care codes, all episode types were included because it is not standard clinical practice to code these events as “first” or “new.” Patients with excessive IID diagnostic records (>4 per year) were excluded from case counts as they likely had chronic gastrointestinal conditions, although this represented fewer than 10 people over the one-year study period.
We further investigated differences between case definitions and the validity of using an ontological case definition by searching patients who had been already counted as a case for codes relating to four supporting concepts: (1) symptoms (diarrhea, vomiting, and fever), (2) pathology investigations (stool sample sent to laboratory, to test for specific pathogens), (3) process of care (notification of dysentery or food poisoning), and (4) therapies (loperamide or oral rehydration therapy). We used a 2-week sliding window due to IID’s acute nature: all events for supporting concepts recorded with any episode type had to occur 2 weeks before or after the patient’s diagnosis event to be included. In addition, multiple events coded for any one factor within the 2-week window (eg, three investigation codes in one week) were counted as one event. Complete code lists for supporting factors are presented in
The “Integrate” study received a favorable ethical opinion from the NHS NRES Committee North West-Greater Manchester East (Ref: 15/NW/0233). Patient-level data were automatically extracted and pseudonymized at the point of extraction. Data were stored at the University of Surrey Clinical Informatics and Health Outcomes Research Group data and analysis hub such that patients could not be identified from records used during the study.
We identified an initial cohort (N=1,120,490) used to count cases and calculate incidence rates from the RCGP RSC population among all registered patients with at least one recorded event during a 52-week period spanning ISO 2014-W30 to ISO 2015-W29.
The results of the ontology can be found online (http://webprotege.stanford.edu) under the title “IID infectious intestinal disease ontology.”
Use of the ICD-10 case definition identified 4736 cases of IID within the cohort, compared with 5966 cases found with the RSC definition (
Application of the ICD-10 definition when selecting Read codes resulted in a more limited code list (90 codes in ICD-9 reduced to 70 codes in ICD-10). This reduction is due to the removal of codes for
A key difference between ICD-10 and RSC weekly report code lists was the inclusion of
Using the ontological approach, we identified 5712 more cases than the ICD-10 definition and 4482 more cases than the RSC definition within the same cohort (
Counts of additional ontological cases by code type (number of additional ontological cases not included in other case definitions=5712, data for period ISO 2014-W30 to ISO 2015-W29).
Code type | Code | Count of cases | Additional ontological cases (percentage) |
Gastroenteritis, toxic gastroenteritis | J43-1 J43..11 | 4399 | 77.0 |
Diarrhea and vomiting | 19G% | 582 | 10.2 |
Clostridium difficile infection | A3Ay2% | 145 | 2.5 |
Direct pathology investigation | Multiple; see |
546 | 9.6 |
Direct process of care | 65V1%, 65V2% | 29 | 0.5 |
IID incidence and case counts (Data for period ISO 2014-W30 to ISO 2015-W29, weekly denominator N=1,120,490).
Definition | Count of cases | Annual person-time rates (per 1000 person-time units) |
Standard RSC | 5966 | 5.32 (95% CI 5.19-5.46) |
ICD-10 | 4736 | 4.23 (95% CI 4.11-4.35) |
Ontological | 10,448 | 9.32 (95% CI 9.15-9.50) |
Mean weekly incidence rates and case counts (Data for period ISO 2014-W30 to ISO 2015-W29, weekly denominator N=1,120,490).
Definition | Mean weekly count of cases | Incidence rate (per 100,000/week) |
Standard RSC | 114.73 | 10.24 (95% CI 8.55-12.12) |
ICD-10 | 91.08 | 8.13 (95% CI 6.70-9.87) |
Ontological | 200.92 | 17.93 (95% CI 15.63-20.41) |
Using the ontological definition for case finding resulted in an annual percentage incidence rate of 0.93% (10,448/1,120,490) compared with 0.42% (4736/1,120,490) under the ICD-10 definition and 0.53% (5966/1,120,490) under the RSC definition. Annual person-time rate per 1000 person-time units for the standard RSC definition was 5.32 (95% CI 5.19-5.46), for the ICD-10 definition was 4.23 (95% CI 4.11-4.35), and for the ontological definition was 9.32 (95% CI 9.15-9.50;
Mean weekly incidence rate was 10.24 per 100,000 (95% CI 8.55-12.12) for the RSC definition, 8.13 per 100,000 (95% CI 6.70-9.87) for the ICD-10 definition, and 17.93 per 100,000 (95% CI 15.63-20.41) for the ontological definition (
Event counts of four supporting concepts within the 2-week period preceding or following case finding were consistent across IID definitions (
Consistency of results supports the use of the ontological definition, as supporting concept codes are specific to acute IID. For the three definitions, majority of cases (61.67% [3679/5966], 59.18% [2803/4736], and 62.62% [6543/10,448]) had no supporting concepts recorded within the 2-week sliding window. In addition, proportion of laboratory results associated with positive test results (ie, directly mapping to IID case definition) was 19.7% (546/2775).
Counts of supporting factors for RSC defined cases (N=5966).
Code category | Number of events coded | Percentage of RSC cases |
Symptoms | 295 | 4.94 |
Investigations | 708 | 11.87 |
Therapies | 705 | 11.82 |
Process of care | 12 | 0.20 |
Symptoms and investigations | 207 | 3.47 |
Symptoms and therapies | 70 | 1.17 |
Symptoms and process of care | 2 | 0.03 |
Investigations and therapies | 147 | 2.46 |
Investigations and process of care | 32 | 0.54 |
Therapies and process of care | 1 | 0.02 |
Symptoms, investigations, and therapies | 83 | 1.39 |
Symptoms, investigations, and process of care | 14 | 0.23 |
Symptoms, therapies, and process of care | 0 | 0.00 |
Investigations, therapies, and process of care | 8 | 0.13 |
All supporting concepts | 3 | 0.05 |
Number of cases with any of the above | 2287 | 38.33 |
Number of cases with none of the above | 3679 | 61.67 |
Counts of supporting factors for ICD-10 defined cases (N=4736).
Code category | Number of events coded | Percentage of ICD-10 cases |
Symptoms | 235 | 4.96 |
Investigations | 588 | 12.42 |
Therapies | 587 | 12.39 |
Process of care | 8 | 0.17 |
Symptoms and investigations | 195 | 4.12 |
Symptoms and therapies | 58 | 1.22 |
Symptoms and process of care | 2 | 0.04 |
Investigations and therapies | 129 | 2.72 |
Investigations and process of care | 30 | 0.63 |
Therapies and process of care | 0 | 0.00 |
Symptoms, investigations, and therapies | 76 | 1.60 |
Symptoms, investigations, and process of care | 14 | 0.30 |
Symptoms, therapies, and process of care | 0 | 0.00 |
Investigations, therapies, and process of care | 8 | 0.17 |
All supporting concepts | 3 | 0.06 |
Number of cases with any of the above | 1933 | 40.82 |
Number of cases with none of the above | 2803 | 59.18 |
Counts of supporting factors for cases defined ontologically (N=10,448).
Code category | Number of events coded | Percentage of ontological cases |
Symptoms | 632 | 6.05 |
Investigations | 1050 | 10.05 |
Therapies | 1337 | 12.80 |
Process of care | 10 | 0.10 |
Symptoms and investigations | 269 | 2.57 |
Symptoms and therapies | 188 | 1.80 |
Symptoms and process of care | 2 | 0.02 |
Investigations and therapies | 238 | 2.28 |
Investigations and process of care | 34 | 0.33 |
Therapies and process of care | 2 | 0.02 |
Symptoms, investigations, and therapies | 112 | 1.07 |
Symptoms, investigations, and process of care | 16 | 0.15 |
Symptoms, therapies, and process of care | 0 | 0.00 |
Investigations, therapies, and process of care | 12 | 0.11 |
All supporting concepts | 3 | 0.03 |
Number of cases with any of the above | 3905 | 37.38 |
Number of cases with none of the above | 6543 | 62.62 |
Total number of cases identified using three differing definitions of IID (RSC, ICD-10 and ontological). Cohort includes all registered patients in the RCGP RSC primary care database with at least one recorded event during a 52-week period spanning ISO 2014-W30 to ISO 2015-W29 (initial cohort, N=1120490).
Count of events found using codes for supporting factors (symptom, investigation, process of care, and/or therapy) for cases identified using the standard RCGP RSC IID definition (ISO 2014-W30 to ISO 2015-W29). For
Count of events found using codes for supporting factors (symptom, investigation, process of care, and/or therapy) for cases identified using the ICD-10 IID definition (ISO 2014-W30 to ISO 2015-W29).
Count of events found using codes for supporting factors (symptom, investigation, process of care, and/or therapy) for cases identified using the ontological IID definition (ISO 2014-W30 to ISO 2015-W29).
An ontological approach to IID case finding changed IID incidence rate, increasing case detection. The ontological approach is also more transparent and independent of coding systems.
The ontological approach may address elements of IID underestimation due to low rates of case finding using electronic data alone [
GPs appear more likely to enter symptom codes, which from the ontological perspective are less helpful as they overlap with other conditions rather than being specific to IID, unless the symptoms are supported by another code indicating pathogen detection [
An ontological approach provides insights into what types of data are available for case ascertainment. Although this approach offers benefits, and has limitations, our recommendation is to start by making the laboratory results recorded much more specific.
The mechanism for transferring results from stool sampling to GPs needs to be updated. Currently UK laboratories electronically report stool sample results to GPs using the Pathology Bounded Code List (PBCL), a subset of Read codes. However, there is no standardized algorithm for reporting results, and the PBCL code list for Microscopy, Culture & Sensitivity (MC&S) results has not kept pace with developments in pathology services. For example, typical laboratory protocol is to report one generic stool sampling code per test request, regardless of the range of pathogens being screened or detected, or of the sensitivity or specificity of the testing method. When a GP receives electronic results of a stool sample, the electronic report only contains generic MC&S Read codes, indicating that a stool sample was analyzed. This is followed by a “free-text” message (ie, not coded) indicating any detected pathogens. If pathogens are detected, the clinician must then code this information manually into the computerized medical record (CMR) system. This means that, inevitably, laboratory findings are under-coded. Furthermore, for some pathogens there is only one PBCL code specifically for test requests, not for recording results. Many IID pathogens have no designated PBCL code at all, and where appropriate pathogen codes are available, they are often not used. Given likely advances in stool microbiology testing in the future, with a move away from MC&S to nonculture, PCR-based methods, the way microbiology results are reported and coded via PBCL needs to be reviewed and modernized. There might be scope to draw lessons from biochemistry and hematology where, with the exception of glucose provenance and use of nonnumeric keys [
The principal limitation of this study is the lack of a gold standard; we do not know the “true” incidence of IID. There has been no back-to-case records review to validate this approach, though the authors have gone back to records to demonstrate the reliability of case finding from clinical records in other domains, for example, chronic kidney disease [
In addition, ontologies are developed as an iterative process; therefore, we recommend testing by running data extracts to improve sensitivity and specificity. Our ontology is online and may be superseded by better laboratory coding, advances in near-patient testing, or other unforeseen advances. For example, there was no attempt to include social media data in this exercise. Techniques are emerging to do this and should be considered as part of future investigations and for inclusion in the subjective elements of the ontology [
Bias of many types can affect the quality of data recording in CMR systems. This can be around financial incentives to adopt CMR systems which then may not get used [
Finally, use of a new ontological approach to measuring disease incidence might result in further discrepancies between different surveillance systems that monitor the IID incidence. Harmonization of coding systems across different systems and countries is important from an epidemiological perspective to ensure that estimates of disease burden are comparable.
Our study indicates that use of the standard definition of IID to identify cases in primary care results in the underestimation of disease incidence. To capture a larger proportion of new IID cases in primary care, an ontological approach should be adopted to expand the case definition to include those patients with codes falling outside more restrictive standard definitions, as well as improving the PBCL coding list used by laboratories returning pathology results. Given the high burden of IID in the community, identifying what specific organisms are circulating within a community would help GPs and public health services. For GPs this would reinforce the importance of stressing simple and important control measures, such as hand washing, and trigger the implementation-specific interventions for specific infections. Local and regional public health services would more accurately know the disease burden and be able to intervene; nationally and internationally more accurate data would enable better policy evaluation and development around hygiene and food chain management.
Using these approaches will provide a better picture for clinicians, epidemiologists, and public health officials of the burden of IID in the community and the impact of seasonal infectious disease outbreaks.
Formal ontology developed based on infectious intestinal disease case definition.
Diagnostic and supporting factor codes and code mapping classes selected based on infectious intestinal disease ontology.
computerized medical record
general practitioner
International Classification of Diseases
infectious intestinal disease
Microscopy, Culture & Sensitivity
Pathology Bounded Code List
Royal College of General Practitioners
Research and Surveillance Centre
World Health Organisation
The authors wish to thank Filipa Ferreira (program manager) and other members of the Clinical Informatics and Health Outcomes Research Group at the University of Surrey and Sam O’Sullivan and Barbara Arrowsmith for their contribution; RCGP RSC practices and their patients for allowing us to extract and use their health data for surveillance and research; Apollo Medical Systems [
This work was supported by Wellcome Trust and Department of Health which provided funding for “Integrate” through Health Innovation Challenge Fund: Theme 5 Infections Response Systems (grant number HICF-T5-354) awarded to SJO’B (Principal Investigator). This work was also partly funded by a 10-funder consortium led by MRC for Health e-Research Centre, Farr Institute (MRC Grant: MR/K006665/1).
SdeL has a pending application for feasibility study funding with Takeda to look at the potential to identify household transmission from routine data and to assess gastroenteritis economic impact; SOB will advise this project. All other authors report no conflicts.