This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Electronic health record (EHR)–based disease registries have aided health care professionals and researchers in increasing their understanding of chronic illnesses, including identifying patients with (or at risk of developing) conditions and tracking treatment progress and recovery. Despite excessive alcohol use being a major contributor to the global burden of disease and disability, no registries of alcohol problems exist. EHR-based data in Kaiser Permanente Northern California (KPNC), an integrated health system that conducts systematic alcohol screening, provides specialty addiction medicine treatment internally, and has over 4 million members who are highly representative of the US population with access to care, provide a unique opportunity to develop such a registry.
Our objectives were to describe the development and implementation of a protocol for assembling the KPNC Adult Alcohol Registry, which may be useful to other researchers and health systems, and to characterize the registry cohort descriptively, including underlying health conditions.
Inclusion criteria were adult members with unhealthy alcohol use (using National Institute on Alcohol Abuse and Alcoholism guidelines), an alcohol use disorder (AUD) diagnosis, or an alcohol-related health problem between June 1, 2013, and May 31, 2019. We extracted patients’ longitudinal, multidimensional EHR data from 1 year before their date of eligibility through May 31, 2019, and conducted descriptive analyses.
We identified 723,604 adult patients who met the registry inclusion criteria at any time during the study period: 631,780 with unhealthy alcohol use, 143,690 with an AUD diagnosis, and 18,985 with an alcohol-related health problem. We identified 65,064 patients who met two or more criteria. Of the 4,973,195 adult patients with at least one encounter with the health system during the study period, the prevalence of unhealthy alcohol use was 13% (631,780/4,973,195), the prevalence of AUD diagnoses was 3% (143,690/4,973,195), and the prevalence of alcohol-related health problems was 0.4% (18,985/4,973,195). The registry cohort was 60% male (n=432,847) and 41% non-White (n=295,998) and had a median age of 41 years (IQR=27). About 48% (n=346,408) had a chronic medical condition, 18% (n=130,031) had a mental health condition, and 4% (n=30,429) had a drug use disorder diagnosis.
We demonstrated that EHR-based data collected during clinical care within an integrated health system could be leveraged to develop a registry of patients with alcohol problems that is flexible and can be easily updated. The registry’s comprehensive patient-level data over multiyear periods provides a strong foundation for robust research addressing critical public health questions related to the full course and spectrum of alcohol problems, including recovery, which would complement other methods used in alcohol research (eg, population-based surveys, clinical trials).
Electronic health records (EHRs) provide a platform to study many diseases and health-related issues longitudinally in diverse populations, including identification of patients with (or at risk of developing) conditions and tracking treatment progress and recovery. The development of EHR-based disease registries has aided health care professionals and researchers in increasing their understanding of chronic illnesses and how to manage them [
Despite excessive alcohol use being a significant contributor to the global burden of disease and disability [
The goal of the overall project was to assemble a registry of patients with alcohol problems by leveraging comprehensive EHR-based data within Kaiser Permanente Northern California (KPNC). The registry was developed for research specifically, but with the potential for future clinical or administrative applications such as quality improvement. KPNC is an integrated health care system that provides primary and specialty care internally (including addiction medicine and psychiatry). It has a mature, fully developed Epic EHR system (Epic Systems, Verona, WI), Kaiser Permanente (KP) HealthConnect, that stores data collected throughout the full course of patient care since 2005. Additionally, KPNC has conducted over 12 million alcohol screenings among 4 million adult members since June 2013 as part of a systematic alcohol screening, brief intervention, and referral to treatment initiative in primary care [
KPNC serves 4.3 million members, comprising about one-third of the population in Northern California. The membership is diverse and highly representative of the US population with access to care [
In June 2013, KPNC implemented Alcohol as a Vital Sign, a systematic alcohol screening, brief intervention, and referral to treatment initiative, in adult primary care [
Registry data are leveraged from two existing data sources: Clarity (GridApp Systems, Inc) and the Virtual Data Warehouse (VDW). Clarity is the back-end database of EHR data collected in KP HealthConnect, which we used to extract alcohol screening data. The VDW is a distributed data model developed by the Health Care Systems Research Network (HCSRN) to maintain single extract, transform, and load processes that efficiently create relational tables useful for research [
In collaboration with NIAAA, we defined the objective of the registry and target population and formed research aims to frame the registry’s scope. The purpose of the registry is to study the full course of alcohol problems with the flexibility to address many research questions, such as the escalation of unhealthy drinking to development of AUDs and alcohol-related health problems, and the ability to be updated with new data. The target population for the registry is adult patients diagnosed with an alcohol problem and those at risk of developing one.
We developed a protocol for building the registry (available upon request), following recommendations from the US Agency for Healthcare Research and Quality [
We included adult patients (age ≥18 years) with unhealthy alcohol use, an active AUD diagnosis, or an alcohol-related health problem, from any department or encounter setting within the health system. The initial registry cohort includes patients who met these criteria between June 1, 2013 (when Alcohol as a Vital Sign was implemented), and May 31, 2019. The patient’s index date was the first date on which the patient met eligibility criteria during the study period.
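As an illustration of the index date rule, the following Python sketch takes the dates on which a patient met any inclusion criterion and returns the first one that falls within the study period, together with the start of the 1-year look-back window used for data extraction. The function names and the list-of-dates input are assumptions made for this example; the registry itself was built with SAS programs that are not reproduced here.

```python
from datetime import date

STUDY_START = date(2013, 6, 1)   # Alcohol as a Vital Sign implemented
STUDY_END = date(2019, 5, 31)    # end of the initial registry period


def index_date(qualifying_dates):
    """Return the first date within the study period on which any
    inclusion criterion was met, or None if there is no such date."""
    in_window = [d for d in qualifying_dates if STUDY_START <= d <= STUDY_END]
    return min(in_window) if in_window else None


def lookback_start(index):
    """Return the date 1 year before the index date (hypothetical helper)."""
    try:
        return index.replace(year=index.year - 1)
    except ValueError:
        # Index date is February 29; the previous year has no such day.
        return index.replace(year=index.year - 1, day=28)


# Hypothetical patient: one qualifying event before the study period, two within it.
example_dates = [date(2012, 11, 3), date(2016, 7, 1), date(2014, 2, 10)]
patient_index = index_date(example_dates)
patient_lookback = lookback_start(patient_index)
```

In this example the 2012 event is ignored because it precedes the study period, so the index date is February 10, 2014, and extraction would begin on February 10, 2013.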
Unhealthy alcohol use was identified using systematic alcohol screening data collected as part of Alcohol as a Vital Sign. Using NIAAA recommended drinking guidelines [
ICD-9 and ICD-10 codes given at any encounter at KPNC or through a claim were used to identify patients with a diagnosis of an active AUD (excluding remission codes) or an alcohol-related health problem (
International Classification of Diseases (ICD) codes for identification of active alcohol use disorders and alcohol-related health problems as part of inclusion criteria for the Kaiser Permanente Northern California Adult Alcohol Registry.
| Disorder | ICD^a version | Code | Description |
|---|---|---|---|
| Alcohol use disorder | ICD-9 | 291^b | Alcohol-induced mental disorders (eg, alcohol withdrawal delirium) |
| Alcohol use disorder | ICD-9 | 303^b, except 303.03 and 303.93^c | Alcohol dependence syndrome |
| Alcohol use disorder | ICD-9 | 305.0^b, except 305.03^c | Nondependent alcohol abuse |
| Alcohol use disorder | ICD-10 | F10.9^b | Alcohol use, unspecified (includes alcohol-induced mental disorders) |
| Alcohol use disorder | ICD-10 | F10.2^b, except F10.21^d | Alcohol dependence |
| Alcohol use disorder | ICD-10 | F10.1^b, except F10.11^d | Alcohol abuse |
| Alcohol-related health problem | ICD-9 | 357.5 | Alcoholic polyneuropathy |
| Alcohol-related health problem | ICD-9 | 425.5 | Alcoholic cardiomyopathy |
| Alcohol-related health problem | ICD-9 | 535.3^b | Alcoholic gastritis |
| Alcohol-related health problem | ICD-9 | 571.0-571.3 | Alcoholic liver disease |
| Alcohol-related health problem | ICD-10 | G31.2 | Degeneration of nervous system due to alcohol |
| Alcohol-related health problem | ICD-10 | G62.1 | Alcoholic polyneuropathy |
| Alcohol-related health problem | ICD-10 | G72.1 | Alcoholic myopathy |
| Alcohol-related health problem | ICD-10 | I42.6 | Alcoholic cardiomyopathy |
| Alcohol-related health problem | ICD-10 | K29.2^b | Alcoholic gastritis |
| Alcohol-related health problem | ICD-10 | K70^b | Alcoholic liver disease |
| Alcohol-related health problem | ICD-10 | K86.0 | Alcohol-induced chronic pancreatitis |

^a ICD: International Classification of Diseases.
^b Any (or no) additional digits.
^c 303.03, 303.93, and 305.03 are ICD-9 remission codes.
^d F10.21 and F10.11 are ICD-10 remission codes.
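The code list above can be expressed as a small lookup. The Python sketch below is one possible encoding, not the registry's SAS implementation: codes marked as allowing any (or no) additional digits are treated as prefixes, the remaining codes as exact matches, and the listed remission codes are excluded first. The function name, the returned labels, and the assumption that codes arrive as dotted strings are all choices made for this example.

```python
# Remission codes are excluded before any prefix matching.
REMISSION_CODES = {"303.03", "303.93", "305.03", "F10.21", "F10.11"}

# Codes that may carry any (or no) additional digits are matched as prefixes.
AUD_PREFIXES = ("291", "303", "305.0", "F10.9", "F10.2", "F10.1")
PROBLEM_PREFIXES = ("535.3", "K29.2", "K70")

# Codes listed without the "additional digits" note are matched exactly.
PROBLEM_EXACT = {
    "357.5", "425.5", "571.0", "571.1", "571.2", "571.3",
    "G31.2", "G62.1", "G72.1", "I42.6", "K86.0",
}


def classify_icd(code):
    """Return 'active_aud', 'alcohol_related_problem', or None for one code."""
    normalized = code.strip().upper()
    if normalized in REMISSION_CODES:
        return None
    if normalized.startswith(AUD_PREFIXES):
        return "active_aud"
    if normalized in PROBLEM_EXACT or normalized.startswith(PROBLEM_PREFIXES):
        return "alcohol_related_problem"
    return None


# Hypothetical encounter codes for illustration.
examples = ["303.90", "303.93", "F10.20", "F10.21", "K70.30", "571.2", "E11.9"]
labels = {code: classify_icd(code) for code in examples}
```

Because ICD-9 codes in this list are numeric and ICD-10 codes begin with a letter, the sketch does not need a separate version flag; real encounter data may still require one.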
Like the VDW [
Entity-relationship diagram representing the data structure of core files in the Kaiser Permanente Northern California Adult Alcohol Registry. Primary key variables are unique identifiers that can be used with foreign key variables to link data across files. PK: primary key; FK: foreign key.
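To make the role of primary and foreign keys concrete, the following Python sketch builds two small linked tables in an in-memory SQLite database and joins them on the shared identifier. The table names, column names, and rows are hypothetical and do not reflect the actual registry files, which are described in the data dictionary rather than here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,          -- PK: one row per patient
    index_date TEXT NOT NULL
);
CREATE TABLE screening (
    screening_id   INTEGER PRIMARY KEY,      -- PK: one row per screening
    patient_id     INTEGER NOT NULL REFERENCES patient(patient_id),  -- FK
    screening_date TEXT NOT NULL
);
""")

# Hypothetical rows for illustration only.
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [(1, "2014-02-10"), (2, "2016-07-01")])
conn.executemany("INSERT INTO screening VALUES (?, ?, ?)",
                 [(10, 1, "2014-02-10"), (11, 1, "2015-03-05"), (12, 2, "2016-07-01")])

# Link the files through the key and count screenings per patient.
rows = conn.execute("""
    SELECT p.patient_id, p.index_date, COUNT(s.screening_id)
    FROM patient AS p
    LEFT JOIN screening AS s ON s.patient_id = p.patient_id
    GROUP BY p.patient_id
    ORDER BY p.patient_id
""").fetchall()
```

The foreign key constraint means that a screening row cannot reference a patient who is absent from the patient table, which is the property that makes linkage across files reliable.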
Implementation of the protocol took about 10 months of programmer time at 50% effort. We built the registry with programs written in SAS software, version 9.4 of the SAS System for Unix (SAS Institute). The programs were designed to minimize user interaction (eg, by using macros and macro variables) so that they could be rerun to refresh the registry data. We minimized data cleaning to allow future studies to make their own decisions regarding the use of the data. We created a data dictionary to describe the files and variables that make up the registry. We also developed queries for quality control, such as identifying missing data and characterizing data storage requirements. Last, we created reporting tools to display trends of the registry data over time.
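A quality control query for missing data could take a form like the Python sketch below, which counts empty or absent values per variable. It is only an illustration of the idea: the registry's own queries were written in SAS, and the record layout and variable names used here are hypothetical.

```python
def missing_counts(records, variables):
    """Count, for each variable, the records in which it is absent, None, or ''."""
    counts = {name: 0 for name in variables}
    for record in records:
        for name in variables:
            value = record.get(name)
            if value is None or value == "":
                counts[name] += 1
    return counts


# Hypothetical records for illustration only.
records = [
    {"patient_id": 1, "index_date": "2014-02-10", "education": "High school graduate"},
    {"patient_id": 2, "index_date": "2016-07-01", "education": ""},
    {"patient_id": 3, "index_date": None},
]
report = missing_counts(records, ["patient_id", "index_date", "education"])
```

Running the same check after each refresh would show whether a renamed or deprecated source variable has silently emptied a registry field.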
Because the EHR is a constantly changing data environment, refreshing the registry with new data requires programs and documentation to be updated. For example, source variables and tables may be renamed or become deprecated during upgrades of data systems. The amount of time required to refresh the registry depends on the quantity and types of changes needed (eg, adding more ICD codes versus editing SAS programs), but may take anywhere from an hour to several days. Ongoing feedback on the registry from research staff who use it for their projects is also critical to ensuring the registry’s validity and usefulness.
We calculated the prevalence of alcohol problems among all adult KPNC patients who had at least one encounter with the health system between June 1, 2013, and May 31, 2019. We conducted descriptive analyses to describe demographic, clinical (eg, medical and mental health conditions), and insurance characteristics of the registry cohort. We included only key variables in the current analysis to compare the registry cohort to those in other published studies. All characteristics, such as age, were based on the patient’s index date. We estimated patients’ household income and education using US Census data that has been geocoded to patients’ closest residential addresses in the year prior to and including the month of their index date. If the index date was before January 1, 2017, we used the 2010 US Census data; otherwise, we used 2017 data, since census block boundaries can change over time [
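The rule for choosing the census data year can be stated compactly. The Python sketch below assumes only what is described above (2010 data for index dates before January 1, 2017, and 2017 data otherwise); the function name is hypothetical.

```python
from datetime import date

CENSUS_CUTOFF = date(2017, 1, 1)


def census_year_for(index):
    """Return the census data year used for a given index date."""
    return 2010 if index < CENSUS_CUTOFF else 2017


# Index dates on either side of the cutoff.
year_before = census_year_for(date(2016, 12, 31))
year_on_cutoff = census_year_for(date(2017, 1, 1))
```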
We identified 723,604 adult patients eligible for inclusion in the registry between June 1, 2013, and May 31, 2019: 631,780 with unhealthy alcohol use, 143,690 with an AUD diagnosis, and 18,985 with an alcohol-related health problem at any time during the study period. These counts are not mutually exclusive, as 65,064 patients met two or more eligibility criteria. Of 4,973,195 adult KPNC patients with at least one encounter with the health system during the study period, the prevalence of unhealthy alcohol use was 13% (631,780/4,973,195), the prevalence of AUD diagnoses was 3% (143,690/4,973,195), and the prevalence of alcohol-related health problems was 0.4% (18,985/4,973,195).
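The arithmetic behind these figures can be checked with the short Python sketch below, which uses only the counts reported above. The last two quantities are not reported in the text; they follow from the reported counts by inclusion-exclusion if those counts are taken at face value, and they are shown as a consistency check rather than as a registry result.

```python
DENOMINATOR = 4_973_195      # adults with at least one encounter
REGISTRY_TOTAL = 723_604     # unique patients in the registry
MET_TWO_OR_MORE = 65_064     # patients meeting two or more criteria

criterion_counts = {
    "unhealthy_alcohol_use": 631_780,
    "aud_diagnosis": 143_690,
    "alcohol_related_health_problem": 18_985,
}

# Prevalence of each criterion as a percentage of all adult patients.
prevalence_pct = {
    name: 100 * count / DENOMINATOR for name, count in criterion_counts.items()
}

# Each patient meeting k criteria is counted k times in the sum of the three
# counts, so the excess over the unique total equals the sum of (k - 1).
excess = sum(criterion_counts.values()) - REGISTRY_TOTAL

# Patients meeting exactly two criteria add 1 to the excess and those meeting
# all three add 2, so the implied split of the 65,064 is:
implied_all_three = excess - MET_TWO_OR_MORE
implied_exactly_two = MET_TWO_OR_MORE - implied_all_three
```

The unrounded prevalences are about 12.7%, 2.9%, and 0.38%, which round to the reported 13%, 3%, and 0.4%.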
The registry cohort was about 60% (n=432,847) male and 40% (n=290,755) female, and there were 2 patients with other/unknown sex. In regard to gender, 0.1% (n=688) of the cohort were gender minorities (transgender, nonbinary, or other gender). The median age was 41 years (IQR=27;
Characteristics of patients meeting eligibility criteria for the Kaiser Permanente Northern California Adult Alcohol Registry between 6/1/2013 and 5/31/2019 (N=723,604).
| Characteristic | Category | Value |
|---|---|---|
| Sex, n (%)^a | Male | 432,847 (59.8) |
| Sex, n (%)^a | Female | 290,755 (40.2) |
| Sex, n (%)^a | Other/Unknown | 2 (<0.1) |
| Gender, n (%)^a | Male | 432,614 (59.8) |
| Gender, n (%)^a | Female | 290,302 (40.1) |
| Gender, n (%)^a | Transgender male | 217 (<0.1) |
| Gender, n (%)^a | Transgender female | 241 (<0.1) |
| Gender, n (%)^a | Non-binary | 229 (<0.1) |
| Gender, n (%)^a | Other/Unknown | 1 (<0.1) |
| Age in years, median (IQR) | | 41.0 (27.0) |
| Age group in years, n (%)^a | 18-34 | 279,276 (38.6) |
| Age group in years, n (%)^a | 35-49 | 187,072 (25.9) |
| Age group in years, n (%)^a | 50-64 | 156,250 (21.6) |
| Age group in years, n (%)^a | ≥65 | 101,006 (14.0) |
| Race/ethnicity, n (%)^a | White | 427,606 (59.1) |
| Race/ethnicity, n (%)^a | Asian/Native Hawaiian/Pacific Islander | 76,197 (10.5) |
| Race/ethnicity, n (%)^a | Black | 50,601 (7.0) |
| Race/ethnicity, n (%)^a | Latino/Hispanic | 138,925 (19.2) |
| Race/ethnicity, n (%)^a | Native American | 7,015 (1.0) |
| Race/ethnicity, n (%)^a | Other/Unknown | 23,260 (3.2) |
| Median household income (US $)^b, n (%)^a | 0-19,999 | 5,694 (0.8) |
| Median household income (US $)^b, n (%)^a | 20,000-34,999 | 38,534 (5.3) |
| Median household income (US $)^b, n (%)^a | 35,000-69,999 | 264,638 (36.6) |
| Median household income (US $)^b, n (%)^a | ≥70,000 | 409,004 (56.5) |
| Median household income (US $)^b, n (%)^a | Unknown | 5,734 (0.8) |
| Education^c, n (%)^a | Less than high school | 32,446 (4.5) |
| Education^c, n (%)^a | High school graduate | 171,132 (23.6) |
| Education^c, n (%)^a | Some college or higher | 517,624 (71.5) |
| Education^c, n (%)^a | Unknown | 2,402 (0.3) |
| | Never or former | 552,618 (76.4) |
| | Current | 115,557 (16.0) |
| | Unknown | 55,429 (7.7) |
| | 0 | 614,422 (84.9) |
| | 1 | 64,420 (8.9) |
| | ≥2 | 44,762 (6.2) |
| Insurance type, n (%)^a | None | 30,033 (4.2) |
| Insurance type, n (%)^a | Medicaid | 19,834 (2.7) |
| Insurance type, n (%)^a | Medicare | 105,393 (14.6) |
| Insurance type, n (%)^a | Commercial | 561,620 (77.6) |
| Insurance type, n (%)^a | Other | 6,724 (0.9) |
| Enrolled via California Affordable Care Act exchange, n (%)^a | | 44,110 (6.1) |
| Months of follow-up data in the registry, median (IQR) | | 21.0 (39.0) |
| Number of alcohol screenings, minimum-maximum | | 0-15 |

^a Percentages may not add up to 100% due to rounding error.
^b Median household income from geocoded census blocks to patients’ residential addresses was used as a proxy of individual-level data.
^c The proportion of individuals within a census block with a level of education was used to estimate each patient’s education level.
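Diagnoses^a of patients in the Kaiser Permanente Northern California Adult Alcohol Registry (N=723,604).

| Category | Condition | Subtype | Value, n (%) |
|---|---|---|---|
| Chronic medical conditions | Any chronic medical condition | | 346,408 (47.9) |
| Chronic medical conditions | Arthritis and other rheumatic conditions | | 70,371 (9.7) |
| Chronic medical conditions | Asthma | | 65,073 (9.0) |
| Chronic medical conditions | Atherosclerosis | | 12,751 (1.8) |
| Chronic medical conditions | Atrial fibrillation | | 49,141 (6.8) |
| Chronic medical conditions | Cerebrovascular disease | | 14,920 (2.1) |
| Chronic medical conditions | Chronic kidney disease | | 23,253 (3.2) |
| Chronic medical conditions | Chronic liver disease | | 21,363 (3.0) |
| Chronic medical conditions | Chronic obstructive pulmonary disease | | 21,953 (3.0) |
| Chronic medical conditions | Chronic pain | | 41,089 (5.7) |
| Chronic medical conditions | Coronary disease | | 20,644 (2.9) |
| Chronic medical conditions | Dementia | | 2,143 (0.3) |
| Chronic medical conditions | Diabetes | | 45,988 (6.4) |
| Chronic medical conditions | Epilepsy | | 5,050 (0.7) |
| Chronic medical conditions | Gastroesophageal reflux | | 71,159 (9.8) |
| Chronic medical conditions | Heart failure | | 8,342 (1.2) |
| Chronic medical conditions | HIV | | 2,424 (0.3) |
| Chronic medical conditions | Hyperlipidemia | | 134,705 (18.6) |
| Chronic medical conditions | Hypertension | | 152,928 (21.1) |
| Chronic medical conditions | Migraine | | 23,600 (3.3) |
| Chronic medical conditions | Osteoarthritis | | 66,800 (9.2) |
| Chronic medical conditions | Osteoporosis and osteopenia | | 18,626 (2.6) |
| Chronic medical conditions | Parkinson’s disease | | 713 (0.1) |
| Chronic medical conditions | Peptic ulcer | | 3,074 (0.4) |
| Chronic medical conditions | Rheumatoid arthritis | | 3,179 (0.4) |
| Mental health conditions | Any mental health condition | | 130,031 (18.0) |
| Mental health conditions | Anxiety disorders | | 76,444 (10.6) |
| Mental health conditions | Anxiety disorders | Obsessive-compulsive disorder | 1,700 (0.2) |
| Mental health conditions | Anxiety disorders | Panic disorder | 7,823 (1.1) |
| Mental health conditions | Anxiety disorders | Posttraumatic stress disorder | 5,312 (0.7) |
| Mental health conditions | Eating disorders | | 924 (0.1) |
| Mental health conditions | Eating disorders | Anorexia nervosa | 276 (<0.1) |
| Mental health conditions | Eating disorders | Bulimia nervosa | 699 (0.1) |
| Mental health conditions | Mood disorders | | 82,059 (11.3) |
| Mental health conditions | Mood disorders | Bipolar disorder | 9,162 (1.3) |
| Mental health conditions | Mood disorders | Depression | 75,445 (10.4) |
| Mental health conditions | Mood disorders | Other mood disorder | 842 (0.1) |
| Mental health conditions | Pervasive developmental disorder | | 221 (<0.1) |
| Mental health conditions | Psychoses | | 6,016 (0.8) |
| Mental health conditions | Psychoses | Schizoaffective disorder | 1,427 (0.2) |
| Mental health conditions | Psychoses | Schizophrenia | 1,534 (0.2) |
| Mental health conditions | Psychoses | Other psychoses | 4,555 (0.6) |
| Mental health conditions | Trauma- and stressor-related disorders | | 12,158 (1.7) |
| Substance use disorders | Nicotine use disorder | | 86,540 (12.0) |
| Substance use disorders | Drug use disorder | | 30,429 (4.2) |
| Substance use disorders | Drug use disorder | Cannabis | 15,175 (2.1) |
| Substance use disorders | Drug use disorder | Cocaine | 4,980 (0.7) |
| Substance use disorders | Drug use disorder | Opioid | 5,934 (0.8) |
| Substance use disorders | Drug use disorder | Other drugs | 10,418 (1.4) |
| Substance use disorders | Drug use disorder | Stimulants | 7,293 (1.0) |

^a Diagnoses were identified using ICD codes given at encounters in the year before the patient’s eligibility date for the registry (ie, index date).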
In an integrated health system, we identified a large, population-based cohort of adult patients with unhealthy alcohol use, an AUD, or an alcohol-related health problem that had about 2 years of follow-up time. The KPNC Adult Alcohol Registry can evaluate the full course of alcohol problems, longitudinally and comprehensively, including early identification, initiation and engagement in treatment (including psychiatry, addiction medicine, and pharmacotherapy), and long-term outcomes (eg, drinking, physical and mental well-being), which are critical to understanding recovery. The prevalence of unhealthy alcohol use was 13%, which falls within the range reported by prior studies of the general US population (6%-28%) [
Similar to other studies using population-based survey data that indicated a higher prevalence of unhealthy drinking and AUDs in younger males [
This EHR-based registry provides a strong foundation for robust research examining the development of alcohol problems and recovery from them. In contrast to national population-based surveys such as the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) [
Additionally, our registry data are longitudinal, spanning over 6 years as of May 31, 2019, and the registry can be continually refreshed with new data extracted from the EHR, including adding new cases and more time points for existing cases. Some current alcohol research studies utilize longitudinal data (eg, NESARC, Project MATCH), but many are repeated cross-sectional studies with different samples (eg, National Health and Nutrition Examination Survey [
We included gender minorities in our registry, given recent research demonstrating a high prevalence of unhealthy drinking in this population [
EHR-based registries enable observational studies of “real-world” settings (eg, comparative effectiveness research), an alternative to randomized controlled trials, which may not be feasible; however, using secondary data for research has limitations, including the omission of essential variables and potential for bias (eg, selection bias, information bias, confounding). For example, clinicians in addiction medicine and psychiatry assess AUD symptoms based on the DSM-5, but detailed data are not entered in the EHR. Instead, clinicians record ICD codes to indicate AUD diagnoses, which we use in the registry. Although ICD codes for AUDs are not assigned lightly in other departments, they are assigned there, and it is unclear which guidelines clinicians follow when doing so. Therefore, a future validation study of alcohol screening results and the use of these ICD codes is warranted. We also do not have direct measures of individual socioeconomic status (eg, income, education), which are important factors associated with unhealthy alcohol use [
Sex and gender variables in the EHR can change and are not collected longitudinally, so their values in the registry reflect what was present at the time of the data extraction rather than historical values, for example, at the time of alcohol screening. We are also not certain which variables are used to determine the appropriate screening questions and risk thresholds (especially for gender minorities), which a future study could evaluate. Therefore, some alcohol screening results may have been misclassified in the registry, affecting eligibility; however, we expect this to have a minimal impact on future studies.
While we included only core data elements that were necessary to address our research aims, the registry could be extended to include other types of data, including provider information, family members of patients with alcohol problems, and medications prescribed off-label to treat AUD (eg, gabapentin [
We demonstrate that EHR-based data collected during routine clinical care within an integrated health care system can be leveraged to develop a registry of patients with alcohol problems that is flexible and can be easily refreshed and extended. The registry can be used to address critical public health questions related to the full spectrum and course of alcohol problems, which will complement other methods used in alcohol research. Future analyses will aim to provide insight on how to strengthen efforts in the prevention of alcohol-related disability and mortality and improve patient-centered health care delivery. We hope that other researchers and health systems interested in assembling a similar registry can take advantage of the time we invested in developing this protocol.
Detailed descriptions of data elements in the Kaiser Permanente Northern California Adult Alcohol Registry.
Additional diagnoses tracked among patients in the Kaiser Permanente Northern California Adult Alcohol Registry.
AUD: alcohol use disorder
EHR: electronic health record
HCSRN: Health Care Systems Research Network
ICD: International Classification of Diseases
ICD-9: International Classification of Diseases, 9th Revision, Clinical Modification
ICD-10: International Classification of Diseases, 10th Revision, Clinical Modification
KP: Kaiser Permanente
KPNC: Kaiser Permanente Northern California
NESARC: National Epidemiologic Survey on Alcohol and Related Conditions
NIAAA: National Institute on Alcohol Abuse and Alcoholism
VDW: Virtual Data Warehouse
This project was funded by contracts (#HHSN275201800625P and #75N94019P00907) and a grant (R01AA025902) from the National Institute on Alcohol Abuse and Alcoholism (NIAAA). We gratefully acknowledge Dr Raye Litten and Dr Daniel Falk at the NIAAA for their expertise in alcohol research, and Yun Lu at the Kaiser Permanente Northern California Division of Research for assistance in extracting data used in this project.
None declared.