This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
Extraction of line-of-therapy (LOT) information from electronic health record and claims data is essential for determining longitudinal changes in systemic anticancer therapy in real-world clinical settings.
The aim of this retrospective cohort analysis is to validate and refine our previously described open-source LOT algorithm by comparing the output of the algorithm with results obtained through blinded manual chart review.
We used structured electronic health record data and clinical documents to identify 500 adult patients treated for metastatic non–small cell lung cancer with systemic anticancer therapy from 2011 to mid-2018; we randomly assigned patients to training (n=350) and test (n=150) cohorts, stratified to preserve the overall ratio of simple to complex cases (254:246). Simple cases were patients who received one LOT and no maintenance therapy; complex cases were patients who received more than one LOT and/or maintenance therapy. Algorithmic changes were made using the training cohort data, after which the refined algorithm was evaluated against the test cohort.
For simple cases, 16 instances of discordance between the LOT algorithm and chart review prerefinement were reduced to 8 instances postrefinement; in the test cohort, there was no discordance between algorithm and chart review. For complex cases, algorithm refinement reduced the discordance from 68 to 62 instances, with 37 instances in the test cohort. The percentage agreement between LOT algorithm output and chart review for patients who received one LOT was 89% prerefinement, 93% postrefinement, and 93% for the test cohort, whereas the likelihood of precise matching between algorithm output and chart review decreased with an increasing number of unique regimens. Several areas of discordance that arose from differing definitions of LOTs and maintenance therapy could not be objectively resolved because of a lack of precise definitions in the medical literature.
Our findings identify common sources of discordance between the LOT algorithm and clinician documentation, which makes targeted refinement of the algorithm possible.
Lung cancer is the most common cause of cancer-related deaths worldwide [
Cancer therapy is commonly classified into lines of therapy (LOTs), each comprising one or more cycles of a single agent or a combination systemic anticancer therapy (SACT) [
The objective of this study is to validate and refine our previously described open-source LOT algorithm [
After receiving approval from the Indiana University Institutional Review Board, we conducted a retrospective cohort analysis using structured EHR data and clinical documents from the Indiana Network for Patient Care, one of the largest and oldest health information exchanges in the United States [
To validate the LOT algorithm, we identified adult patients treated for metastatic NSCLC with SACT and excluded patients who had received any SACT commonly used for small cell lung cancer, as described in
Next, we extracted structured data commonly found in claims data and required by the LOT algorithm, including patient identifiers, SACT medications, and associated dates. For SACT medications, we filtered the SACT drug list to those used to treat metastatic NSCLC. The index date of the first-line (L1) treatment in this study corresponded to the date of initial SACT on or after recorded evidence of metastatic disease. For chart review purposes, we extracted all available clinical notes after the metastatic diagnosis date. In preparation for manual chart review, these clinical notes were loaded into nDepth, the Regenstrief natural language processing platform. This platform provides an efficient means of reviewing documents and capturing related information on a per-patient basis.
We then created a CSV file with patient identifier, administration start date, administration end date (for oral drugs), and generic drug name as the column fields; this is the minimum input required by the LOT algorithm. Finally, we divided patients into a training cohort of 70% (350/500) of patients and a test cohort of 30% (150/500) of patients, using stratified sampling to keep the ratio of simple to complex cases the same in both cohorts. All algorithmic changes were made using the training cohort data, after which the final version of the algorithm was evaluated against the test cohort.
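The stratified 70:30 split can be sketched in Python as follows. This is a minimal illustration, not the study's code: the function name `stratified_split`, the seed, and the synthetic patient identifiers are hypothetical, and only the cohort sizes come from the study. Rounding each stratum separately yields 178 simple and 172 complex cases in training and 76 and 74 in test, matching the reported cohort composition.

```python
import random


def stratified_split(is_complex, train_fraction=0.7, seed=2018):
    """Split patients into training and test cohorts, preserving the
    simple:complex ratio by sampling each stratum separately.

    is_complex maps a patient identifier to True (complex) or False (simple).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for stratum in (False, True):
        # Sort before shuffling so the result does not depend on dict order.
        members = sorted(pid for pid, flag in is_complex.items() if flag == stratum)
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test


# Usage with the study's counts: 254 simple and 246 complex cases.
cohort = {f"P{i:03d}": i >= 254 for i in range(500)}
train_ids, test_ids = stratified_split(cohort)
n_complex_train = sum(cohort[pid] for pid in train_ids)
print(len(train_ids), len(test_ids), n_complex_train)  # 350 150 172
```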
Investigators at Merck Sharp & Dohme Corp have internally developed automated business rules to identify LOT numbers, treatment regimens, and maintenance treatment for patients with cancer [
The LOT algorithm can be roughly understood by breaking it down into the five basic modules depicted in
Schematic depicting the five basic modules of a line-of-therapy (LOT) algorithm. L1: first-line therapy. Reprinted with permission from Meng et al [
Within these modules, several parameters are available in the code that allow the adjustment and introduction of special cases and exceptions for the rules. Common adjustments relate to the detection of maintenance therapy, checking for drug switches early in a LOT, and adding exceptions to line advancement for gaps in therapy or when certain drug classes are added or substituted in a treatment regimen.
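To make the interaction of these parameters concrete, the Python sketch below applies four rules in simplified form: an L1 start on or after the index date, a 28-day regimen window, new-drug line advancement with cisplatin↔carboplatin and paclitaxel↔albumin-bound paclitaxel treated as substitutions, and a gap-in-therapy rule with optional exempt drugs. It is a hypothetical illustration, not the published open-source algorithm: `Administration`, `assign_lines`, and `is_substitute` are invented names, maintenance detection is omitted, and the real implementation handles many more special cases.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical substitution pairs: swapping one member for the other
# does not start a new line.
SUBSTITUTIONS = (
    frozenset({"cisplatin", "carboplatin"}),
    frozenset({"paclitaxel", "albumin-bound paclitaxel"}),
)


@dataclass(frozen=True)
class Administration:
    """One row of the minimal input: patient, start date, generic drug name."""
    patient_id: str
    start: date
    drug: str


def is_substitute(drug, regimen):
    """True if drug replaces a partner drug already in the regimen."""
    return any(drug in pair and bool(regimen & (pair - {drug})) for pair in SUBSTITUTIONS)


def assign_lines(admins, index_date, regimen_window=28, gap_window=120,
                 gap_exempt=frozenset()):
    """Group one patient's administrations into lines of therapy.

    Returns a list of (line_number, sorted regimen drugs).
    """
    admins = sorted((a for a in admins if a.start >= index_date),
                    key=lambda a: a.start)
    lines = []       # one set of drugs per line
    regimen = set()  # drugs in the current line's regimen
    line_start = last_date = None
    for a in admins:
        if line_start is None:
            new_line = True  # first administration on/after index date opens L1
        elif (a.start - last_date).days > gap_window and not (regimen & gap_exempt):
            new_line = True  # a long gap in therapy starts a new line
        elif (a.start - line_start).days <= regimen_window:
            regimen.add(a.drug)  # still inside the initial regimen window
            new_line = False
        else:
            # A drug outside the regimen advances the line unless it is an
            # allowed substitution for a drug already in the regimen.
            new_line = a.drug not in regimen and not is_substitute(a.drug, regimen)
        if new_line:
            line_start, regimen = a.start, {a.drug}
            lines.append(regimen)
        last_date = a.start
    return [(i + 1, sorted(r)) for i, r in enumerate(lines)]


# Usage: a toy patient (dates and drugs are illustrative only).
d = date
patient = [
    Administration("P1", d(2015, 1, 5), "carboplatin"),
    Administration("P1", d(2015, 1, 5), "pemetrexed"),
    Administration("P1", d(2015, 2, 16), "cisplatin"),  # allowed substitution
    Administration("P1", d(2015, 3, 2), "pemetrexed"),
    Administration("P1", d(2015, 8, 1), "pemetrexed"),  # 152-day gap
    Administration("P1", d(2015, 9, 1), "docetaxel"),   # new drug
]
print(assign_lines(patient, d(2015, 1, 1), gap_window=120))
# [(1, ['carboplatin', 'pemetrexed']), (2, ['pemetrexed']), (3, ['docetaxel'])]
print(assign_lines(patient, d(2015, 1, 1), gap_window=180))
# [(1, ['carboplatin', 'pemetrexed']), (2, ['docetaxel'])]
```

Widening `gap_window` from 120 to 180 days keeps the second pemetrexed course in L1, so the toy patient has two lines instead of three.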
Blinded to the results generated by the LOT algorithm, a physician (PRD) used the nDepth chart review functionality to review clinical notes for patients with metastatic NSCLC. The reviewer also had access to a spreadsheet listing the individual SACT medication names and administration dates for each patient. Most detailed descriptions of SACT LOTs and maintenance therapy came from outpatient oncology notes. The reviewer extracted the following clinical information for each patient: (1) the sequence of SACT LOTs and (2) maintenance therapy. He then recorded this information in a spreadsheet with the same format as the LOT algorithm output to facilitate automated comparison.
For the initial (prerefinement) validation, we customized the NSCLC LOT algorithm parameters using the previously published criteria [
After completion of the blinded, automated initial comparison between algorithm output and chart review for the patients in the training cohort (n=350), we identified issues accounting for any discordance between algorithm output and chart review. To evaluate the areas of discordance, we separated the cases into simple and complex categories. For each issue, we then refined the LOT algorithm using close review of the initial comparison results, iterative rerunning of the refined LOT algorithm against the original chart review results, discussion with internal experts, and targeted medical literature review. Researchers from Merck Sharp & Dohme, Indiana University, and Regenstrief reviewed the deidentified raw SACT data and arbitrated the differences between algorithm output and chart review through a series of meetings.
Descriptive statistics were calculated for demographic and LOT characteristics, including means, SDs, and ranges for continuous variables and counts and percentages for categorical variables. One-way analysis of variance (ANOVA) and the Fisher exact test, as appropriate, were used to compare demographic characteristics and LOT counts between the training and test cohorts.
Intraclass correlation coefficients (ICCs) and the corresponding 95% CIs for the number of LOTs for each case based on the LOT algorithm and chart review were calculated. Percentage agreement and 95% CIs were calculated to compare the results from the LOT algorithm with the chart review. Agreement was defined as an exact match between the LOT algorithm output and physician chart review in terms of LOT number, regimen name, and maintenance therapy classification. Each LOT comprised the treatment as well as any subsequent maintenance therapy regimen.
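The interval method for percentage agreement is not named here, but the reported intervals appear consistent with a normal-approximation (Wald) interval truncated at 0% and 100%. For example, 88.6% agreement among the 193 single-LOT training patients before refinement corresponds to 171 matches and an interval of 84.1% to 93.1%. The Python sketch below (the function name `percent_agreement` is hypothetical) shows that calculation; a Wilson or exact interval would give somewhat different bounds for small groups.

```python
import math


def percent_agreement(matches, total, z=1.96):
    """Percentage agreement with a Wald 95% CI, truncated to 0-100%."""
    p = matches / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    lower = max(0.0, p - half_width)
    upper = min(1.0, p + half_width)
    return round(100 * p, 1), round(100 * lower, 1), round(100 * upper, 1)


# 171 of 193 single-LOT training patients matched prerefinement.
print(percent_agreement(171, 193))  # (88.6, 84.1, 93.1)
# Small groups hit the truncation: 1 of 3 test patients with four LOTs.
print(percent_agreement(1, 3))      # (33.3, 0.0, 86.7)
```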
We identified 11,223 patients with at least one diagnosis code for lung cancer during the study period. Of these, 1461 patients had metastatic lung cancer as defined by diagnosis codes, metastatic criteria, and receipt of SACT 14 days before or any time after the index date. Of these 1461 patients, 897 patients also had NSCLC mentioned in unstructured patient notes.
To construct our final sample, the first iteration of the LOT algorithm was run on the 897 eligible patients. All patients who, according to the algorithm output, received more than one LOT and/or maintenance therapy were automatically selected for chart review. Chart review identified 246 patients as complex cases, and 254 patients who had received only a single LOT and never received maintenance therapy (simple cases) were then randomly chosen to complete the sample of 500 patients. The 500 cases were then split into training and test cohorts with the same ratio of simple to complex cases (
No significant differences in patient characteristics were found between the training and test cohorts (
Selection of 500 patients whose deidentified charts were included in the study. LOT: line of therapy; NSCLC: non–small cell lung cancer.
Patient demographic characteristics.
Demographics | All patients (N=500) | Training cohort (n=350) | Test cohort (n=150) | P value
Female, n (%) | 220 (44.0) | 153 (43.7) | 67 (44.7) | .85a
Age (years) | | | | .16b
Mean (SD) | 64.3 (10.7) | 64.8 (10.7) | 63.3 (10.5) |
Range | 25-91 | 34-91 | 25-90 |
Race,c n (%) | | | | .95a
White | 442 (89.1) | 308 (88.8) | 134 (89.9) |
Black | 50 (10.1) | 36 (10.4) | 14 (9.4) |
Asian | 4 (0.8) | 3 (0.9) | 1 (0.7) |
Hispanic, Latino, or other ethnicity, n (%) | 2 (0.4) | 1 (0.3) | 1 (0.7) | .49a
aFisher exact test comparing training and test cohorts.
bLinear model analysis of variance comparing training and test cohorts.
cNo information on race was available for 4 patients.
The distributions of LOT counts were similar between the training and test cohorts, and simple and complex cases each represented approximately half of the cases (
A total of 55.1% (193/350) of patients in the training cohort received one LOT, 29.4% (103/350) received two LOTs, and 10.3% (36/350) received three LOTs. Most patients had three or fewer LOTs during their treatment history, and 14.6% (51/350) of patients received maintenance therapy (
Blinded manual chart review findings for lines of therapy.
Group | All patients (N=500) | Training cohort (n=350) | Test cohort (n=150) | P value
Case complexity,a n (%) | | | | N/Ab
Simple cases | 254 (50.8) | 178 (50.9) | 76 (50.7) |
Complex cases | 246 (49.2) | 172 (49.1) | 74 (49.3) |
Number of LOTs,c n (%) | | | | .09d
1 | 280 (56.0) | 193 (55.1) | 87 (58.0) |
2 | 144 (28.8) | 103 (29.4) | 41 (27.3) |
3 | 51 (10.2) | 36 (10.3) | 15 (10.0) |
4 | 20 (4.0) | 17 (4.9) | 3 (2.0) |
5 | 3 (0.6) | 0 (0.0) | 3 (2.0) |
6 | 2 (0.4) | 1 (0.3) | 1 (0.7) |
Maintenance therapy, n (%) | 74 (14.8) | 51 (14.6) | 23 (15.3) | .89d
aBlinded manual chart review was used to identify simple cases as patients who received one line of therapy (LOT) and no maintenance therapy, and complex cases as patients who received more than one LOT and/or maintenance therapy.
bN/A: not applicable.
cLOT: line of therapy.
dFisher exact test comparing training and test cohorts.
The ICCs on the number of LOTs between the LOT algorithm and chart review in the training cohort were 0.81 overall and 0.71 in the complex cases. The prerefinement agreement between the LOT algorithm output and chart review was 91% for the simple cases overall and 61% for the complex cases in the training cohort (
Intraclass correlation coefficients (ICCs) on number of lines of therapy (LOTs) and percentage agreement of non–small cell lung cancer LOT algorithm output with manual chart review.a
Group | Training cohort (n=350), prerefinement | Training cohort (n=350), postrefinement | Test cohort (n=150)
Overall,b ICCc (95% CI) | 0.81 (0.77-0.84) | 0.87 (0.84-0.89) | 0.90 (0.86-0.92)
Complex cases, ICC (95% CI) | 0.71 (0.63-0.78) | 0.75 (0.68-0.81) | 0.82 (0.73-0.88)
Percentage agreement by number of LOTs,d % (95% CI) | | |
1 (train n=193, test n=87) | 88.6 (84.1-93.1) | 93.3 (89.7-96.8) | 93.1 (87.8-98.4)
2 (train n=103, test n=41) | 68.0 (58.9-77.0) | 72.8 (64.2-81.4) | 56.1 (40.9-71.3)
3 (train n=36, test n=15) | 58.3 (42.2-74.4) | 58.3 (42.2-74.4) | 53.3 (28.1-78.6)
4 (train n=17, test n=3) | 23.5 (3.4-43.7) | 23.5 (3.4-43.7) | 33.3 (0.0-86.7)
5 (train n=0, test n=3) | —e | — | —
6 (train n=1, test n=1) | — | — | —
Simple cases, overall | 91.0 (86.8-95.2) | 95.5 (92.5-98.5) | 100
Complex cases, overall | 60.5 (53.2-67.8) | 64.0 (56.8-71.1) | 50.0 (38.6-61.4)
aSimple or complex designation and the mutually exclusive LOT-count groups were based on the total number of lines of therapy according to the chart review.
bThe overall intraclass correlation coefficients included data for simple cases; simple cases were not evaluated separately because of low variability.
cICC: intraclass correlation coefficient.
dLOT: line of therapy.
ePatient numbers for five and six LOTs were too few for analysis.
For the simple cases, we found that the majority of discordances reflected the LOT not being advanced by chart review but being advanced in the algorithm output because of the 120-day gap-in-therapy rule (
Reasons for discordance between the non–small cell lung cancer line-of-therapy algorithm and blinded chart review: numbers of cases.a
Reason for discordance | Training cohort (n=350), prerefinement: n (%) | N | Training cohort (n=350), postrefinement: n (%) | N | Test cohort (n=150): n (%) | N
Simple cases, total | 16 (9.0) | 178 | 8 (4.5) | 178 | 0 (0) | 76
Gap-in-therapy window length | 9 (56) | 16 | 3 (38) | 8 | |
28-day line regimen window | 3 (19) | 16 | 3 (38) | 8 | |
Line name disagreement | 3 (19) | 16 | 1 (13) | 8 | |
Other | 1 (6) | 16 | 1 (13) | 8 | |
Complex cases, total | 68 (39.5) | 172 | 62 (36.0) | 172 | 37 (50.0) | 74
Dropped drugs | 22 (32) | 68 | 24 (39) | 62 | 17 (46) | 37
Maintenance therapy classification | 14 (21) | 68 | 13 (21) | 62 | 8 (22) | 37
28-day line regimen window | 12 (18) | 68 | 12 (19) | 62 | 6 (16) | 37
Gap-in-therapy window length | 9 (13) | 68 | 4 (6) | 62 | 2 (5) | 37
Other | 11 (16) | 68 | 9 (15) | 62 | 4 (11) | 37
aPercentages may not add up to 100 because of rounding.
For the complex cases, the most common source of discordance resulted from dropped drugs, specifically cases when chart review advanced the LOT after a drug in combination therapy was dropped, but the algorithm did not (
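The optional dropped-drug flag listed among the algorithm parameters (included in the code but not used in this study) can be pictured with a Python sketch that operates on per-cycle drug sets. The per-cycle representation, the function names `first_drop` and `split_on_drop`, and the docetaxel plus ramucirumab example are hypothetical. A practical version would also need to exclude drops that qualify as continuation maintenance, which otherwise look identical.

```python
def first_drop(cycles):
    """Index of the first cycle in which part of the initial combination is
    missing while at least one of its drugs continues; None if no drop."""
    if not cycles or len(cycles[0]) < 2:
        return None  # a single-agent regimen has no partner drug to drop
    initial = set(cycles[0])
    for i, drugs in enumerate(cycles[1:], start=1):
        drugs = set(drugs)
        if drugs and drugs < initial:  # strict, non-empty subset of the combination
            return i
    return None


def split_on_drop(cycles, advance_on_drop):
    """Keep the cycles as one line, or start a new line at the first drop."""
    i = first_drop(cycles)
    if not advance_on_drop or i is None:
        return [cycles]
    return [cycles[:i], cycles[i:]]


# Usage: docetaxel plus ramucirumab, with ramucirumab stopped after 3 cycles.
cycles = [{"docetaxel", "ramucirumab"}] * 3 + [{"docetaxel"}] * 2
print(first_drop(cycles))                                 # 3
print(len(split_on_drop(cycles, advance_on_drop=False)))  # 1 (algorithm default)
print(len(split_on_drop(cycles, advance_on_drop=True)))   # 2 (closer to chart review)
```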
The second most common source of discordance was differing identification of maintenance therapy, such as chart review identifying maintenance therapy after L1 regimens when the algorithm output did not. Chart review often labeled maintenance therapy beyond L1 and/or with drugs outside the National Comprehensive Cancer Network (NCCN) guidelines, whereas the algorithm identified maintenance therapy only in the L1 setting, using a drug list defined by NCCN guidelines [
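The algorithm's side of this comparison (maintenance recognized only after L1 and only with listed drugs) can be approximated in Python. This is a hypothetical sketch: `classify_maintenance` and the division of a line into induction and subsequent drug sets are assumptions, while the drug lists are those in the study's parameter table. Any maintenance that a chart reviewer recorded in L2 or later, or with an unlisted drug, would receive `None` here and count as discordant.

```python
# Drug lists from the study's parameter table (postrefinement adds gemcitabine).
CONTINUATION_PRE = frozenset({"bevacizumab", "pemetrexed", "atezolizumab"})
CONTINUATION_POST = CONTINUATION_PRE | {"gemcitabine"}
SWITCH = frozenset({"pemetrexed", "docetaxel"})


def classify_maintenance(line_number, induction, subsequent,
                         continuation=CONTINUATION_POST, switch=SWITCH):
    """Label the drugs given after an induction regimen ends.

    Returns "continuation", "switch", or None (no maintenance).
    """
    induction, subsequent = set(induction), set(subsequent)
    if line_number != 1 or not subsequent:
        return None  # maintenance is only recognized after first-line therapy
    if subsequent < induction and subsequent <= continuation:
        return "continuation"  # some induction drugs stop; listed ones continue
    if not (subsequent & induction) and subsequent <= switch:
        return "switch"  # a listed drug not in the induction regimen starts
    return None


print(classify_maintenance(1, {"carboplatin", "pemetrexed", "bevacizumab"},
                           {"pemetrexed", "bevacizumab"}))          # continuation
print(classify_maintenance(1, {"carboplatin", "paclitaxel"}, {"docetaxel"}))  # switch
print(classify_maintenance(1, {"carboplatin", "gemcitabine"}, {"gemcitabine"},
                           continuation=CONTINUATION_PRE))          # None
print(classify_maintenance(2, {"docetaxel", "ramucirumab"}, {"docetaxel"}))   # None
```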
Discordances related to the 120-day gap-in-therapy window and the 28-day regimen window were also relatively common among the complex cases (
After reviewing the discordances between chart review findings and LOT algorithm output for the training cohort, we used descriptive statistics and plots to determine how to adjust the discordant parameters and improve concordance where possible. For example, plotting the gap between successive prescriptions showed the need to increase the gap-in-therapy window from 120 to 180 days, and we exempted several protein kinase inhibitors from the gap-in-therapy line-advancement rule (these
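A gap analysis of this kind can be sketched in Python: collect the number of days between successive administration dates for each patient, then check what share of gaps each candidate window would treat as a break in therapy. The function names and toy records are hypothetical, and the study inspected plots of the gap distribution rather than the two summary fractions shown here.

```python
from collections import defaultdict
from datetime import date


def successive_gaps(records):
    """Days between successive administration dates within each patient.

    records: iterable of (patient_id, administration_date) pairs.
    """
    by_patient = defaultdict(list)
    for patient_id, day in records:
        by_patient[patient_id].append(day)
    gaps = []
    for days in by_patient.values():
        days = sorted(set(days))  # same-day administrations count once
        gaps.extend((later - earlier).days for earlier, later in zip(days, days[1:]))
    return gaps


def share_exceeding(gaps, window_days):
    """Fraction of gaps long enough to advance the line under this window."""
    return sum(gap > window_days for gap in gaps) / len(gaps)


records = [
    ("P1", date(2015, 1, 5)), ("P1", date(2015, 1, 26)), ("P1", date(2015, 6, 15)),
    ("P2", date(2016, 3, 1)), ("P2", date(2016, 9, 30)), ("P2", date(2016, 10, 21)),
]
gaps = successive_gaps(records)
print(gaps)                        # [21, 140, 213, 21]
print(share_exceeding(gaps, 120))  # 0.5
print(share_exceeding(gaps, 180))  # 0.25
```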
Line-of-therapy algorithm parameters for metastatic non–small cell lung cancer: prerefinement and postrefinement.
Basic modules | Parameters, prerefinement | Parameters, postrefinement
L1a first drug | On or after index dateb | On or after index dateb
Line regimen window | ≤28 days after first drug | ≤28 days after first drug
New drug line advancement | First instance | First instance
Exceptions (allowed substitutions) | Cisplatin ↔ carboplatin or paclitaxel ↔ albumin-bound paclitaxel substitution | Cisplatin ↔ carboplatin or paclitaxel ↔ albumin-bound paclitaxel substitution
Gap in therapy window | >120 days | >180 days
Exceptions (allowed gaps) | None | Erlotinib, afatinib, brigatinib, crizotinib, ceritinib, alectinib, gefitinib, osimertinib
Continuation maintenance | Bevacizumab, pemetrexed, atezolizumab | Bevacizumab, pemetrexed, atezolizumab, gemcitabine
Switch maintenance | Pemetrexed, docetaxel | Pemetrexed, docetaxel
Combination dropped drugs to advance LOTc | N/Ad | Optional flag (not implemented)e
Drug switch during initial regimen window | N/A | Optional flag (not implemented)
aL1: first line of therapy.
bIndex date defined as date of recorded metastatic non–small cell lung cancer diagnosis.
cLOT: line of therapy.
dN/A: not applicable.
eOption included in LOT to handle these cases but not used in this study.
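Keeping both parameter sets as data makes the refinement easy to audit. The Python sketch below encodes the table above; the class `LotParameters`, its field names, and `TKI_GAP_EXEMPT` are hypothetical and do not correspond to the open-source code. The `changed` list confirms that refinement touched three parameters.

```python
from dataclasses import dataclass, fields, replace

# Protein kinase inhibitors exempted from the gap rule postrefinement.
TKI_GAP_EXEMPT = frozenset({
    "erlotinib", "afatinib", "brigatinib", "crizotinib",
    "ceritinib", "alectinib", "gefitinib", "osimertinib",
})


@dataclass(frozen=True)
class LotParameters:
    """Hypothetical container for the NSCLC LOT algorithm parameters."""
    regimen_window_days: int = 28   # drugs within 28 days join the regimen
    gap_window_days: int = 120      # a longer gap advances the line
    gap_exempt_drugs: frozenset = frozenset()
    substitutions: tuple = (
        frozenset({"cisplatin", "carboplatin"}),
        frozenset({"paclitaxel", "albumin-bound paclitaxel"}),
    )
    continuation_maintenance: frozenset = frozenset(
        {"bevacizumab", "pemetrexed", "atezolizumab"})
    switch_maintenance: frozenset = frozenset({"pemetrexed", "docetaxel"})
    advance_on_dropped_drug: bool = False  # optional flag, not used in this study
    check_early_drug_switch: bool = False  # optional flag, not used in this study


PRE = LotParameters()
POST = replace(
    PRE,
    gap_window_days=180,
    gap_exempt_drugs=TKI_GAP_EXEMPT,
    continuation_maintenance=PRE.continuation_maintenance | {"gemcitabine"},
)

changed = [f.name for f in fields(LotParameters)
           if getattr(PRE, f.name) != getattr(POST, f.name)]
print(changed)  # ['gap_window_days', 'gap_exempt_drugs', 'continuation_maintenance']
```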
Postrefinement agreement increased from 91% to 96% for the simple cases overall and from 61% to 64% for the complex cases, although improvements were limited to receipt of one or two LOTs (
Results of applying the line-of-therapy algorithm to the training cohort postrefinement. NSCLC: non–small cell lung cancer.
After the LOT algorithm was refined, the total number of discordant results was halved for the simple cases in the training cohort, with the greatest decrease in discordance resulting from the increase from 120 to 180 days in the gap-in-therapy window (
The LOT algorithm was then run for the test cohort. For the simple cases, the agreement between the chart review results and algorithm output was 100% (
For patients who received one LOT, agreement was high and improved slightly with algorithm refinement (89% prerefinement, 93% postrefinement, 93% test cohort;
We found an overall good alignment between our automated method of LOT classification and blinded manual chart review. As expected, the likelihood of precise matching between LOT algorithm output and chart review regarding LOT and maintenance therapy identification decreased with an increasing number of unique SACT regimens. This finding is consistent with the simple compounding of errors, that is, the chance of at least one error being found in multiple LOTs is greater than finding an error in a single LOT. On a per-LOT basis, the error would presumably remain fairly constant.
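The compounding argument can be made explicit under a simplifying assumption that the study did not test: if every line is matched independently with the same probability p, a patient with k lines matches exactly with probability p^k. In the Python sketch below, the function name is hypothetical and the per-line value of 0.9 is illustrative only, not an estimate from this study.

```python
def patient_level_agreement(per_line_agreement, n_lines):
    """Chance that all n_lines lines match, if each matches independently
    with the same probability per_line_agreement."""
    return per_line_agreement ** n_lines


for k in range(1, 5):
    print(k, round(patient_level_agreement(0.9, k), 3))
# 1 0.9
# 2 0.81
# 3 0.729
# 4 0.656
```

Even with a constant per-line error, patient-level exact agreement falls from 90% for one line to about 66% for four lines.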
For the purposes of our comparisons, we used manual chart review as the gold standard. We improved the concordance between the LOT algorithm and chart review by increasing the gap-in-therapy window from 120 to 180 days. Concordance was also improved by adding drug class exceptions for protein kinase inhibitors to the gap-in-therapy rule and by adding gemcitabine as a continuation maintenance candidate. Our study contributes to the literature by identifying common sources of discordance between an LOT algorithm and clinician documentation, which makes targeted algorithm refinement possible.
Our study is one of the first to validate and refine an open-source LOT algorithm using manual chart review [
In the case of (1) whether to advance the line if a drug in a combination regimen is dropped, particular drugs may be dropped because of adverse events. Whether the remaining drugs should be considered the original or a new SACT regimen (LOT) is a subjective matter and may not be explicitly recorded by the prescribing physician. In the case of issues (2) and (3), NCCN guidelines specify that maintenance therapy is prescribed in the first-line setting and that a prescribed set of drugs is eligible for switch maintenance therapy for NSCLC [
These small apparent inconsistencies may reflect a lack of precise definitions in LOT classification rules, or perhaps more likely, that physicians are instead appropriately focused on dynamically selecting optimal SACT regimens for their patients rather than precisely categorizing LOT and maintenance therapy. In addition, as shown in
Our algorithm is adaptable for use with other cancers and other cancer stages because of its modular design [
This study has some limitations. First, we did not consider the length of oral drug administration. Oral drugs are often prescribed with a preset supply, and because the last dose administration is not typically recorded, extrapolation would be needed to determine the length of administration. In this study, our agreement metrics accounted for only the LOT number and regimen, and not the LOT duration; therefore, we considered only the first dose of oral drugs. We note also that we purposely oversampled for complex cases; therefore, the metrics reflect a distribution of patients that was not representative of the overall distribution. For example, only 28% of our selected study population versus 60% of eligible patients in the database received just one LOT without maintenance therapy, our definition of a simple case. Therefore, it is possible that single LOT metrics are under-represented. Finally, it could have been helpful to have more than one physician conducting the manual chart reviews, with an additional independent physician to resolve any discrepancies or disagreements.
Further research is needed on other data sets to determine if the results and conclusions are generalizable. In addition, considerations such as detecting drug cycles and accounting for drug-specific nuances may increase the robustness of the algorithm. Further research on the appropriate metrics and benchmarks may be needed to address issues such as error compounding.
This study validates an EHR- and claims-based algorithm against manual chart review. We have refined the algorithm, highlighted areas of discordance, and noted how errors compound over later lines, allowing a deeper understanding of how the LOT algorithm may be used. We envision extending this work to other disease indications and areas. In addition, common benchmark data sets and metrics, together with increased accessibility, will contribute substantially toward the development and adoption of this tool. Finally, a database of specific business rules concerning individual drugs and other nuanced behaviors will increase the robustness of the algorithm.
Identification of patients with metastatic non–small cell lung cancer.
ALK: anaplastic lymphoma kinase
ANOVA: analysis of variance
EGFR: epidermal growth factor receptor
EHR: electronic health record
L1, L2, L3, L4: first-, second-, third-, and fourth-line therapy
LOT: line of therapy
NCCN: National Comprehensive Cancer Network
NSCLC: non–small cell lung cancer
SACT: systemic anticancer therapy
The authors thank Elizabeth V Hillyer, DVM, for editorial assistance. This work was supported by Merck Sharp & Dohme Corporation, a subsidiary of Merck & Company, Inc, Kenilworth, NJ, United States.
WM and WO are employees of Merck Sharp & Dohme Corp, a subsidiary of Merck & Co, Inc, Kenilworth, NJ, USA, and stockholders of Merck & Co, Inc, Kenilworth, NJ, USA. KMM, KAL, and PRD have no conflicts of interest to declare. ARR and AG are employees of Regenstrief Institute, which was paid by Merck Sharp & Dohme Corp to conduct this study.