Published in Vol 10, No 4 (2022): April

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/35475.
Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non–Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study


Original Paper

1College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China

2Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China

*these authors contributed equally

Corresponding Author:

Xudong Lu, PhD

College of Biomedical Engineering and Instrumental Science

Zhejiang University

38 Zheda Road

Hangzhou, 310027

China

Phone: 86 139 5711 8891

Email: lvxd@zju.edu.cn


Background: Lymph node metastasis (LNM) is critical for treatment decision making of patients with resectable non–small cell lung cancer, but it is difficult to precisely diagnose preoperatively. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.

Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms.

Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (the maximum short axis diameter of lymph node >10 mm was regarded as a metastatic node) and clinician’s evaluation. Since the NLP model may extract features with mistakes, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and gold standard features to explore the influence of NLP-driven automatic extraction.

Results: Experimental results show that the random forest models achieved the best performances, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction and an AUC of 0.768 and an AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and clinician’s evaluation. The concordance correlation between the random forest models using NLP-extracted features and gold standard features was 0.950 and improved to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features.

Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician’s evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.

JMIR Med Inform 2022;10(4):e35475

doi:10.2196/35475




Lung cancer remains the leading cause of cancer death worldwide, representing approximately 1 in 5 (18.0%) cancer deaths [1]. Non–small cell lung cancer (NSCLC) accounts for about 84% of lung cancer cases, and its 5-year relative survival rate is only 25.0% [2], making it one of the biggest threats to human health.

Staging of NSCLC is a process to determine the extent of the cancer and is critical to prognosis evaluation and treatment decision making [3,4]. The TNM stage classification [5] is the most widely used staging method in clinical practice; it describes the anatomic extent of a tumor from 3 aspects (ie, T for extent of the primary tumor, N for involvement of lymph nodes, M for distant metastases). For patients with resectable NSCLC, preoperatively confirmed N2 (a type of N stage) lymph node metastasis (LNM) indicates that neoadjuvant therapy should be given before surgery to achieve the best clinical practice [3]. Currently, various advanced noninvasive diagnostic modalities are available for N staging, such as chest computed tomography (CT) and positron emission tomography–computed tomography (PET-CT). In clinical practice, clinicians commonly use a size criterion (ie, the maximum short axis diameter of lymph node >10 mm on CT scan) to discriminate LNM from benign nodes, which yields 55% sensitivity [6]. Another criterion is the maximum standardized uptake value (SUVmax) of lymph node >2.5 on PET-CT scan, which has an 81% sensitivity [7]. Invasive methods such as mediastinoscopy and endobronchial ultrasound-guided transbronchial needle aspiration have better diagnostic abilities than noninvasive methods. However, these methods are mainly used for lymph nodes with indications and are not suitable for patients with severe comorbidities, so they are not routinely used in clinical practice [8]. One study analyzed data from 9 clinical trials and found that nearly 38% of patients were misclassified in comparison with their pathological N staging [9]. Therefore, new reliable LNM prediction methods are required to alleviate this clinical dilemma.

For precise staging, researchers have explored using statistical analysis or machine learning methods to learn nontrivial relationships between comprehensive patient features and LNM status [8,10-16]. Recently, with the rapid development of hospital information systems, a large volume of electronic medical records (EMRs) has become available, and these records contain almost all clinical features about patients. However, some important features, such as the size of the tumor and lymph node, tumor density, and pleural indentation, are recorded in free-text narratives, which hinders their direct use. Manual extraction is time-consuming and error-prone, so one big challenge is how to extract this information effectively to support subsequent tasks like LNM prediction [17]. A review by Garg et al [18] found that studies in which users were automatically prompted to use the system achieved better performance than those in which users were required to actively initiate the system. This finding implicitly indicates that duplicative data entry may explain why predictive models are not widely adopted in the clinic despite their potential to improve diagnostic accuracy. Furthermore, with the prevalence of machine learning models, more features are required for analysis, making the clinical application of the models more difficult [19-21].

Natural language processing (NLP) offers the opportunity to automatically extract information to support the application of predictive models [17,22]. Many studies used rule-based, machine learning, or deep learning methods to extract the cancer-related information from free-text EMR data [22-29], but only a few included further elaboration on how to exploit the extracted information. Chen et al [30] extracted information from various clinical notes including CT reports and operative notes to calculate the Cancer of the Liver Italian Program score. Martinez et al [31] extracted information from pathology reports to calculate the TNM and Australian clinicopathological stage of colorectal cancer. Castro et al [32] developed an NLP system for automated breast imaging reporting and data system (BI-RADS) categories extraction from breast radiology reports. Bozkurt et al [33,34] developed an information extraction pipeline to extract information from mammography reports to predict the malignancy of breast cancer. Sui et al [35] constructed an NLP-based feature generalizing to extract features from free-text EMR data and provided the stage of lung cancer using a Bayesian reasoning network. Yuan et al [36] used NLP tools to extract multiple features from EMRs to estimate survival for patients with lung cancer. Although many studies have explored how to extract the cancer-related information from various types of free-text narratives and some also exploit the extracted information for cancer risk evaluation, diagnosis, and pathological staging, few studies exploit the extracted information from radiological reports for preoperative LNM prediction, especially for NSCLC.

In this study, we aim to use EMR data to develop LNM prediction models for NSCLC patients. We first developed a multiturn question answering NLP model to extract the features from CT reports and then combined these features with other clinical characteristics to develop the predictive models. Since the NLP model may produce imperfect extraction results, we also conducted experiments to compare the predicted probabilities between models using NLP-extracted features and gold standard features.


Patients

We retrospectively analyzed EMR data of 794 patients who underwent surgical resection for NSCLC with systematic mediastinal lymphadenectomy at the Department of Thoracic Surgery II of Peking University Cancer Hospital from 2010 to 2018. All patients underwent contrast-enhanced chest CT within 2 months before surgical resection. We excluded patients who had received preoperative chemotherapy or radiotherapy. The collected EMR data include demographic information, medical history, CT reports, preoperative serum tumor markers, and pathology reports, which can be analyzed to develop the prediction model. For each patient, we also collected the clinical staging that clinicians evaluated before surgery as the baseline to compare with the LNM prediction models.

Ethics Approval

This study was approved by the Ethics Committee of Peking University Cancer Hospital (2019KT59).

Clinical and Pathological LNM Evaluation

In this study, all included patients underwent systematic mediastinal lymphadenectomy during surgical resection. The lymph node tissues were examined by pathologists, and the metastasis results were recorded in the postoperative pathology reports. We reviewed the pathology reports to determine the LNM status and label the pathological N (pN) stage (pN0/pN1/pN2) for each patient based on the 8th edition TNM stage classification [5] as the gold standard. We also used the size criterion (ie, the maximum short axis diameter of lymph node >10 mm on CT scan as positive) to label the clinical N (cN) stages (cN0/cN1/cN2) based on the CT-reported lymph node size. Moreover, we collected the cN stages, which were determined preoperatively by a thoracic surgeon using all available patient data including the information used in this study. The thoracic surgeon has 10 years of experience in lung cancer surgery. The cN stages determined by the size criterion and the thoracic surgeon were regarded as the baselines.
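The size-criterion baseline can be illustrated with a short sketch. It assumes that the largest CT-reported short axis diameter is available per region and that hilar nodes map to N1 and mediastinal nodes to N2; that mapping and the function name `size_criterion_cn_stage` are illustrative assumptions rather than details given in the text.

```python
from typing import Optional

SIZE_THRESHOLD_MM = 10.0  # short axis cut-off stated in the text (>10 mm is positive)


def size_criterion_cn_stage(hilar_short_axis_mm: Optional[float],
                            mediastinal_short_axis_mm: Optional[float]) -> str:
    """Label cN0/cN1/cN2 from the largest reported short axis per region.

    Assumption: hilar nodes correspond to N1 and mediastinal nodes to N2.
    None means the report described no node in that region.
    """
    def is_positive(diameter_mm: Optional[float]) -> bool:
        return diameter_mm is not None and diameter_mm > SIZE_THRESHOLD_MM

    if is_positive(mediastinal_short_axis_mm):
        return "cN2"
    if is_positive(hilar_short_axis_mm):
        return "cN1"
    return "cN0"


print(size_criterion_cn_stage(12.0, 8.0))   # hilar node above threshold only
print(size_criterion_cn_stage(None, 10.0))  # exactly 10 mm is not positive
```

Because the criterion is a strict inequality, a node measuring exactly 10 mm is treated as negative in this sketch.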

NLP Feature Extraction

As one of the most important preoperative examinations, CT reports record valuable information about the tumors and lymph nodes, which is of paramount importance for staging. However, the free-text nature of CT reports makes it difficult to understand and analyze them using computer programs. In our previous work [27], we developed an information extraction system composed of named entity recognition, relation classification, and postprocessing modules to extract valuable information in a pipeline manner. However, in this pipeline, the subsequent tasks would be influenced by the outputs of former tasks, which may affect the performance of the whole system. Therefore, to alleviate this problem, we applied a multiturn question answering (MTQA) [37] approach to extract information from CT reports in this study. Using the MTQA strategy, we can encode the relation into the question query and jointly model entity and relation in a natural question answering way.

Specifically, we first defined 10 questions related to the primary tumor and lymph nodes. All questions are listed in Table 1. Note that there are 2 types of questions (ie, head entity questions and tail entity question templates). In the model training stage, we inserted the annotated head entities into the slots in the tail entity question templates as the tail entity questions. We then used 2 special tokens (ie, CLS and SEP) to concatenate the questions and sentences in the reports as the inputs and annotated entities as the answers to conduct the bidirectional encoder representations from transformers (BERT) model training. In the model test stage, we first concatenated the head entity questions and sentences in the reports as the inputs and applied the trained MTQA model to extract the head entities (ie, tumor and lymph node). If there were any head entities recognized, we inserted the extracted head entities into the slots in the tail entity question templates as the tail entity questions and combined them with sentences in the reports as the inputs to drive the tail entity extraction. A case of the MTQA application is shown in Figure 1. Finally, the extracted head and tail entities are organized as triples, and a rule-based postprocessing algorithm proposed in the previous work [27] is used to process the triples to obtain the standardized NLP-extracted features. Furthermore, the NLP-extracted features were manually reviewed and corrected by a clinician based on the report contents as the gold standard features. In this study, we used BERT [38], an advanced pretrained language representation model, to tag the answer for each question.

Table 1. Questions and entity types for natural language processing–extracted features.

| Question (Chinese) | Question (English) | Answer notation | Entity type |
|---|---|---|---|
| Head entity question | | | |
| 原发肿物的相关描述是什么? | What is the description about the primary tumor? | Head1 | Tumor |
| 淋巴结的相关描述是什么? | What is the description about the lymph nodes? | Head2 | Lymph node |
| Tail entity question template | | | |
| Head1 位于什么地方? | Where is Head1 located? | Tail1 | Location |
| Head1 的大小是多少? | What is the size of Head1? | Tail2 | Size |
| Head1 的形状是什么? | What is the shape of Head1? | Tail3 | Shape |
| Head1 的密度是什么? | What is the density of Head1? | Tail4 | Density |
| 与Head1 相关的胸膜侵犯的描述是什么? | What is the description about the pleura invasion related to Head1? | Tail5 | Pleura |
| 与Head1 相关的血管侵犯的描述是什么? | What is the description about the vessel invasion related to Head1? | Tail6 | Vessel |
| Head2 位于什么地方? | Where is Head2 located? | Tail7 | Location |
| Head2 的大小是多少? | What is the size of Head2? | Tail8 | Size |

Figure 1. A case of multiturn question answering application. BERT: bidirectional encoder representations from transformers.
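The two-stage MTQA inference flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: the English questions from Table 1 stand in for the Chinese ones actually used, `toy_answer` is a hypothetical lookup that replaces the trained BERT tagger, and the example sentence is invented.

```python
from typing import Callable, Dict, List, Tuple

HEAD_QUESTIONS: Dict[str, str] = {
    "Tumor": "What is the description about the primary tumor?",
    "Lymph node": "What is the description about the lymph nodes?",
}
TAIL_TEMPLATES: Dict[str, Dict[str, str]] = {
    "Tumor": {
        "Location": "Where is {head} located?",
        "Size": "What is the size of {head}?",
        "Shape": "What is the shape of {head}?",
        "Density": "What is the density of {head}?",
        "Pleura": "What is the description about the pleura invasion related to {head}?",
        "Vessel": "What is the description about the vessel invasion related to {head}?",
    },
    "Lymph node": {
        "Location": "Where is {head} located?",
        "Size": "What is the size of {head}?",
    },
}

Answerer = Callable[[str], List[str]]  # model input -> extracted answer spans


def build_input(question: str, sentence: str) -> str:
    """Concatenate question and sentence with the CLS and SEP special tokens."""
    return f"[CLS] {question} [SEP] {sentence} [SEP]"


def extract_triples(sentence: str, answer: Answerer) -> List[Tuple[str, str, str]]:
    """Turn 1 finds head entities; turn 2 fills the tail templates with each head."""
    triples = []
    for head_type, head_question in HEAD_QUESTIONS.items():
        for head in answer(build_input(head_question, sentence)):
            for tail_type, template in TAIL_TEMPLATES[head_type].items():
                tail_question = template.format(head=head)
                for tail in answer(build_input(tail_question, sentence)):
                    triples.append((head, tail_type, tail))
    return triples


def toy_answer(model_input: str) -> List[str]:
    """Hypothetical stand-in for the BERT tagger: a fixed question lookup."""
    question = model_input.split(" [SEP] ")[0].replace("[CLS] ", "")
    lookup = {
        "What is the description about the primary tumor?": ["nodule"],
        "Where is nodule located?": ["right upper lobe"],
        "What is the size of nodule?": ["2.1 x 1.8 cm"],
    }
    return lookup.get(question, [])


triples = extract_triples("A nodule in the right upper lobe, 2.1 x 1.8 cm.", toy_answer)
print(triples)
```

Tail questions are only asked when a head entity is found, so a sentence with no lymph node description produces no lymph node triples.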

LNM Prediction

Six machine learning algorithms were applied to develop the LNM prediction models: logistic regression (LR) [39], L2-logistic regression (L2-LR) [40], random forest (RF) [41], LightGBM (LGBM) [42], support vector machine (SVM) [43], and artificial neural network (ANN) [44]. LR is the conventional classification method, and L2-LR is LR with L2 regularization of the parameters. RF and LGBM are both ensemble methods, but they combine weak decision trees in different ways. SVM is a classical algorithm that constructs hyperplanes in a high- or infinite-dimensional space to classify samples. ANN is a supervised learning algorithm that can learn nonlinear functions between features and targets. LR and L2-LR have good interpretability because the predicted results can be calculated by a simple linear function and a sigmoid transformation. RF and LGBM are also interpretable in that they provide feature importances.
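The interpretability of LR noted above comes from the fact that a prediction is a weighted sum passed through a sigmoid. The sketch below shows that calculation; the feature names, weights, and intercept are hypothetical values chosen for illustration and are not the fitted coefficients of the models in this study.

```python
import math
from typing import Dict


def lr_probability(features: Dict[str, float],
                   weights: Dict[str, float],
                   intercept: float) -> float:
    """Predicted probability = sigmoid(intercept + sum of weight * feature value)."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already scaled feature values and weights (illustration only).
weights = {"cea": 1.2, "solid_nodule": 0.8, "age": -0.3}
patient = {"cea": 0.5, "solid_nodule": 1.0, "age": 0.4}
print(round(lr_probability(patient, weights, intercept=-2.0), 3))
```

Each term `weight * value` can be read directly as that feature's contribution to the log-odds, which is why signed weights are informative for LR and L2-LR.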

Experimental Setup

In this study, we used the Whole Word Masking version of BERT [45] pretrained on the Chinese Wikipedia corpus as the tagging model in the MTQA. An additional 359 annotated CT reports from our previous work were used to develop and evaluate the MTQA model. We randomly split 70% of CT reports as the training set, 10% as the validation set, and 20% as the test set. A total of 100 of these reports were each annotated by 2 biomedical informatics engineers to calculate the interannotator agreement score using the kappa score. Pipeline methods with bidirectional long short-term memory (BiLSTM) and BERT were selected as the baseline. To obtain the NLP-extracted features for LNM prediction, the MTQA model developed on the 359 reports was used to process the 794 CT reports of included patients. Subsequently, the NLP-extracted features were manually reviewed and corrected by a clinician as the gold standard features.

Univariate analysis was performed using the Mann-Whitney U test for continuous features and Pearson chi-square test for categorical features. P<.05 was considered statistically significant. To obtain robust experimental results, a 10-fold cross-validation strategy was first performed on the total data set. The 10-fold cross-validation randomly split the data set into 10 subsets. Each subset was considered as the independent test set and the remaining 9 subsets were considered as the training set. During each fold, a 5-fold cross-validation was applied on the training set to find the optimal hyperparameters for the machine learning algorithms by a grid search. When the optimal hyperparameters were selected, we retrained the prediction model on the training set and tested it on the test set to obtain the final predictive performance. Using this strategy, we can ensure that the test set is always invisible during the model training and hyperparameter tuning and obtain the predicted probability for each case. The hyperparameter spaces are as follows:

  • LR: tol ∈ {1e–3, 1e–4, 1e–5}, max_iter ∈ {500, 1000}
  • L2-LR: C ∈ {10, 1, 0.1}, tol ∈ {1e–3, 1e–4, 1e–5}, max_iter ∈ {500, 1000}
  • RF: n_estimators ∈ {50, 100, 200}, max_depth ∈ {2, 3}, min_samples_leaf ∈ {1, 2}
  • LGBM: n_estimators ∈ {50, 100, 200}, max_depth ∈ {2, 3}, num_leaves ∈ {20, 31, 50}, min_child_samples ∈ {1, 2, 3}, reg_alpha ∈ {2, 3}
  • SVM: C ∈ {10, 1, 0.1, 0.01}, kernel ∈ {‘linear,’ ‘rbf,’ ‘poly’}, tol ∈ {1e–3, 1e–4, 1e–5}
  • ANN: hidden_layer_sizes ∈ {5, 10, 30}, learning_rate ∈ {1e–2, 1e–3, 1e–4}, alpha ∈ {1e–3, 1e–4, 1e–5}
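The nested cross-validation described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the synthetic data from `make_classification`, the stratified splits, the `roc_auc` selection metric, and the fixed random seeds are all assumptions, and the RF grid is trimmed to a single `n_estimators` value so the example runs quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the patient feature matrix (about 13% positives).
X, y = make_classification(n_samples=300, n_features=12,
                           weights=[0.87, 0.13], random_state=0)

# Trimmed RF grid; the text lists n_estimators in {50, 100, 200}.
param_grid = {"n_estimators": [50], "max_depth": [2, 3], "min_samples_leaf": [1, 2]}

outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
oof_prob = np.zeros(len(y))  # out-of-fold probability for every case
fold_auc, fold_ap = [], []

for train_idx, test_idx in outer_cv.split(X, y):
    # Inner 5-fold grid search sees the training fold only.
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, scoring="roc_auc", cv=inner_cv)
    # refit=True (the default) retrains the best setting on the whole training fold.
    search.fit(X[train_idx], y[train_idx])
    prob = search.predict_proba(X[test_idx])[:, 1]
    oof_prob[test_idx] = prob
    fold_auc.append(roc_auc_score(y[test_idx], prob))
    fold_ap.append(average_precision_score(y[test_idx], prob))

print(f"AUC {np.mean(fold_auc):.3f} (SD {np.std(fold_auc):.3f})")
print(f"AP  {np.mean(fold_ap):.3f} (SD {np.std(fold_ap):.3f})")
```

Keeping the grid search inside the outer loop is what guarantees that the held-out fold never influences hyperparameter selection, and `oof_prob` ends up holding one predicted probability per case for later curve plotting.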

We applied the receiver operating characteristic (ROC) curve to evaluate the diagnostic performances of the machine learning models. Besides the ROC curve, we also used the precision-recall (PR) curve to test the models because the ROC curve pays attention to sensitivity and specificity but ignores precision. The mean area under the receiver operating characteristic curve (AUC) and average precision (AP) values with standard deviations were calculated based on the 10-fold cross-validation results. We also drew the ROC curves and PR curves to compare with the size criterion (maximum short axis diameter of lymph node >10 mm on CT) and the clinician’s evaluation. All LNM prediction models were developed using the Scikit-learn 0.24.1 and LightGBM 3.2.0 Python packages. All statistical analyses were conducted using the SciPy 1.6.2 Python package.
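Summarizing the 10 fold-level scores into a mean, SD, and 95% CI can be sketched as below. The text does not state how the intervals were computed, so the Student t interval, the use of the sample SD, and the fold scores themselves are assumptions made for illustration.

```python
import math
import statistics
from typing import List, Tuple

T_CRIT_DF9 = 2.262  # two-sided 95% Student t critical value for 9 degrees of freedom


def summarize_folds(scores: List[float]) -> Tuple[float, float, Tuple[float, float]]:
    """Return mean, sample SD, and a t-based 95% CI for exactly 10 fold scores."""
    if len(scores) != 10:
        raise ValueError("T_CRIT_DF9 is only valid for 10 folds")
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 in the denominator)
    half_width = T_CRIT_DF9 * sd / math.sqrt(len(scores))
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical per-fold AUC values, not results from the study.
fold_auc = [0.74, 0.81, 0.79, 0.83, 0.76, 0.80, 0.72, 0.85, 0.78, 0.82]
mean, sd, (low, high) = summarize_folds(fold_auc)
print(f"mean={mean:.3f} SD={sd:.3f} 95% CI={low:.3f}-{high:.3f}")
```

With only 10 folds, a t-based interval is noticeably wider than a normal-approximation interval, which matters when judging whether the intervals of two models overlap.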


Patient Characteristics

Table 2 shows the characteristics of all 794 patients. Univariate analysis was performed for all collected features. Of the 794 patients, 105 (13.2%) had pN2 LNM. Sex, age, drinking history, family history, and disease history were not significantly associated with pN2. pN2 occurred more frequently in smokers (P=.04). The long and short axis diameters of the tumor in pN2 patients were significantly larger than those in pN0 and pN1 patients (both P<.001). Patients with solid nodules were more likely to have pN2 (P<.001). Other morphological characteristics of the tumor, such as lobulation and pleural indentation, were more likely to occur in pN2 patients (P=.006 and P=.003, respectively), but spiculation and vessel invasion showed no significant differences between pN2 and other patients. Using 10 mm as the size criterion, the maximum long and short axis diameters of the hilar and mediastinal lymph nodes showed significant differences between the 2 groups (P=.008, P<.001, P<.001, and P<.001, respectively). Among the 6 serum tumor biomarkers, carcinoembryonic antigen (CEA), carbohydrate antigen 12-5 (CA125), and neuron-specific enolase (NSE) showed significant differences between the 2 groups (P<.001, P<.001, and P=.048, respectively).
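As a worked example of the categorical univariate test, the sketch below applies a Pearson chi-square test to the smoking history counts from Table 2. It assumes the Yates continuity correction, which is the default for 2×2 tables in SciPy's `chi2_contingency`; whether the authors kept that default is not stated. The test is written in plain Python so the arithmetic is visible.

```python
import math
from typing import List, Tuple


def chi2_yates_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square with Yates correction for a 2x2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates correction: shrink each deviation by 0.5, but not below zero.
            deviation = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += deviation ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: pN2, pN0 or pN1. Columns: smoking history yes, no (counts from Table 2).
smoking = [[55, 50], [282, 407]]
chi2, p_value = chi2_yates_2x2(smoking)
print(f"chi2={chi2:.3f}, P={p_value:.3f}")
```

Under this assumption the P value is roughly .035, which is consistent with the P=.04 reported for smoking history after rounding.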

Table 2. Patient characteristics.

| Characteristic | Total (n=794) | LNM^a status: pN2^b (n=105) | LNM^a status: pN0^c or pN1^d (n=689) | P value |
|---|---|---|---|---|
| Age (years), mean (SD) | 60.92 (51.48 to 70.36) | 60.87 (51.87 to 69.86) | 60.93 (51.42 to 70.44) | .45 |
| Sex, n (%)^e | | | | .06 |
| Male | 397 | 62 | 335 | |
| Female | 397 | 43 | 354 | |
| Smoking history, n (%) | | | | .04 |
| Yes | 337 | 55 | 282 | |
| No | 457 | 50 | 407 | |
| Drinking history, n (%) | | | | .94 |
| Yes | 183 | 25 | 158 | |
| No | 611 | 80 | 531 | |
| Family history, n (%) | | | | .32 |
| Yes | 137 | 14 | 123 | |
| No | 657 | 91 | 566 | |
| Hypertension, n (%) | | | | .18 |
| Yes | 232 | 37 | 195 | |
| No | 562 | 68 | 494 | |
| Diabetes, n (%) | | | | .25 |
| Yes | 84 | 15 | 69 | |
| No | 710 | 90 | 620 | |
| Pulmonary tuberculosis, n (%) | | | | .33 |
| Yes | 33 | 2 | 31 | |
| No | 761 | 103 | 658 | |
| Cardiovascular disease, n (%) | | | | .06 |
| Yes | 36 | 9 | 27 | |
| No | 758 | 96 | 662 | |
| Cerebrovascular disease, n (%) | | | | .35 |
| Yes | 29 | 6 | 23 | |
| No | 765 | 99 | 666 | |
| Tumor location^f, n (%) | | | | .22 |
| RUL^g | 249 | 27 | 222 | |
| RML^h | 59 | 4 | 55 | |
| RLL^i | 150 | 18 | 132 | |
| LUL^j | 185 | 31 | 154 | |
| LLL^k | 126 | 21 | 105 | |
| Other | 25 | 4 | 21 | |
| TLA^f,l, median (IQR) | 2.61 (1.20 to 4.01) | 3.02 (1.64 to 4.39) | 2.55 (1.15 to 3.94) | <.001 |
| TSA^f,m, median (IQR) | 2.03 (0.88 to 3.18) | 2.38 (1.27 to 3.48) | 1.98 (0.83 to 3.13) | <.001 |
| Spiculation^f, n (%) | | | | .08 |
| Yes | 255 | 42 | 213 | |
| No | 539 | 63 | 476 | |
| Lobulation^f, n (%) | | | | <.001 |
| Yes | 211 | 48 | 163 | |
| No | 583 | 57 | 526 | |
| Tumor density^f, n (%) | | | | <.001 |
| pGGO^n | 124 | 0 | 124 | |
| mGGO^o | 96 | 3 | 93 | |
| Solid nodule | 574 | 102 | 472 | |
| Vessel invasion^f, n (%) | | | | .87 |
| Yes | 52 | 6 | 46 | |
| No | 742 | 99 | 643 | |
| Pleural indentation^f, n (%) | | | | .001 |
| Yes | 406 | 70 | 336 | |
| No | 388 | 35 | 353 | |
| HLNLA^f,p, n (%) | | | | .008 |
| >10 mm | 148 | 30 | 118 | |
| ≤10 mm | 646 | 75 | 571 | |
| HLNSA^f,q, n (%) | | | | <.001 |
| >10 mm | 66 | 19 | 47 | |
| ≤10 mm | 728 | 86 | 642 | |
| MLNLA^f,r, n (%) | | | | <.001 |
| >10 mm | 191 | 50 | 141 | |
| ≤10 mm | 603 | 55 | 548 | |
| MLNSA^f,s, n (%) | | | | <.001 |
| >10 mm | 72 | 27 | 45 | |
| ≤10 mm | 722 | 78 | 644 | |
| CEA^t, median (IQR) | 5.31 (–6.66 to 17.27) | 12.66 (–8.44 to 33.76) | 4.18 (–5.17 to 13.54) | <.001 |
| CA199^u, median (IQR) | 14.41 (–3.24 to 32.06) | 15.80 (–5.08 to 36.68) | 14.20 (–2.90 to 31.29) | .47 |
| CA125^v, median (IQR) | 14.46 (0.03 to 28.90) | 19.88 (–5.56 to 45.32) | 13.64 (1.96 to 25.32) | <.001 |
| NSE^w, median (IQR) | 15.81 (8.85 to 22.78) | 16.26 (10.19 to 22.33) | 15.75 (8.66 to 22.83) | .048 |
| Cyfra211^x, median (IQR) | 3.20 (–0.23 to 6.62) | 3.55 (–0.64 to 7.75) | 3.14 (–0.15 to 6.43) | .06 |
| SCCAg^y, median (IQR) | 0.96 (–0.16 to 2.08) | 1.18 (–0.62 to 2.99) | 0.93 (–0.04 to 1.90) | .14 |

^a LNM: lymph node metastasis.

^b pN2: pathological N stage 2.

^c pN0: pathological N stage 0.

^d pN1: pathological N stage 1.

^e Not applicable.

^f Features recorded in computed tomography reports.

^g RUL: right upper lobe.

^h RML: right middle lobe.

^i RLL: right lower lobe.

^j LUL: left upper lobe.

^k LLL: left lower lobe.

^l TLA: tumor long axis.

^m TSA: tumor short axis.

^n pGGO: pure ground glass opacity.

^o mGGO: mixed ground glass opacity.

^p HLNLA: hilar lymph node long axis.

^q HLNSA: hilar lymph node short axis.

^r MLNLA: mediastinal lymph node long axis.

^s MLNSA: mediastinal lymph node short axis.

^t CEA: carcinoembryonic antigen.

^u CA199: carbohydrate antigen 19-9.

^v CA125: carbohydrate antigen 12-5.

^w NSE: neuron-specific enolase.

^x Cyfra211: cytokeratin 19-fragments.

^y SCCAg: squamous cell carcinoma antigen.

Performance of pN2 LNM Prediction Models

Because preoperatively confirmed N2 disease indicates that neoadjuvant therapy should be given before surgery, we first developed machine learning models to predict pN2 LNM. We regarded the pN2 patients as positive and the pN0 and pN1 patients as negative to train the predictive models. To obtain reliable models, we used the gold standard features instead of NLP-extracted features in this section. Table 3 shows the performances of all models. The RF model achieved the highest averaged AUC value (0.792) and the LGBM model achieved the highest averaged AP value (0.457), although the 95% CIs of all models overlap with each other. The LR obtained a competitive performance in comparison with ANN and SVM. The L2-LR did not obtain improvements in AUC value and AP value compared with the LR. To compare with the size criterion and clinician’s evaluation, we used the probabilities predicted during the 10-fold cross-validation to draw the ROC and PR curves. Figure 2 shows the ROC curves and PR curves of the pN2 prediction models and the results of the size criterion and clinician’s evaluation. In Figure 2, all the ROC curves and PR curves lie above the points of the size criterion and clinician’s evaluation, which indicates that the developed pN2 prediction models not only have better discriminative ability than the diagnostic size criterion used in clinical practice but also may exceed the clinician in pN2 LNM evaluation.

Table 3. Performances of pN2 lymph node metastasis prediction models.

| Model | AUC^a mean | AUC SD | AUC 95% CI | AP^b mean | AP SD | AP 95% CI |
|---|---|---|---|---|---|---|
| LR^c | 0.778 | 0.041 | 0.747-0.809 | 0.442 | 0.075 | 0.385-0.499 |
| L2-LR^d | 0.768 | 0.038 | 0.739-0.796 | 0.413 | 0.072 | 0.359-0.467 |
| ANN^e | 0.769 | 0.051 | 0.730-0.808 | 0.434 | 0.095 | 0.363-0.506 |
| SVM^f | 0.771 | 0.071 | 0.718-0.825 | 0.453 | 0.084 | 0.389-0.516 |
| RF^g | 0.792 | 0.042 | 0.760-0.825 | 0.456 | 0.075 | 0.399-0.512 |
| LGBM^h | 0.787 | 0.044 | 0.755-0.820 | 0.457 | 0.101 | 0.381-0.534 |

^a AUC: area under the receiver operating characteristic curve.

^b AP: average precision.

^c LR: logistic regression.

^d L2-LR: L2-logistic regression.

^e ANN: artificial neural network.

^f SVM: support vector machine.

^g RF: random forest.

^h LGBM: LightGBM.

Figure 2. The receiver operating characteristic curve (A) and precision-recall curves (B) of pN2 prediction models.
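A binary rule such as the size criterion produces a single point in ROC and PR space rather than a curve. The sketch below computes such a point from a confusion matrix. It uses the mediastinal lymph node short axis counts from Table 2 as a stand-in for the pN2 size criterion; this is an assumption, and the baseline point actually plotted in the figure may have been derived differently.

```python
from typing import Dict


def roc_pr_point(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Convert a 2x2 confusion matrix into ROC (FPR, TPR) and PR (recall, precision) coordinates."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "fpr": 1.0 - specificity,
        "tpr": sensitivity,
        "recall": sensitivity,
        "precision": precision,
    }


# Table 2, MLNSA row: >10 mm in 27 of 105 pN2 patients and 45 of 689 other patients.
point = roc_pr_point(tp=27, fn=78, fp=45, tn=644)
print({name: round(value, 3) for name, value in point.items()})
```

A model curve passing above this point offers higher sensitivity at the same false-positive rate, or higher precision at the same recall, than the binary rule.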

Performance of pN1&N2 LNM Prediction Models

Besides predicting pN2 LNM, we also developed machine learning models to predict the pN1&N2 LNM by regarding patients with pN1 or pN2 LNM as positive. The model training and evaluation processes are the same as pN2 LNM prediction. Table 4 shows the performances of the machine learning models for pN1&N2 LNM prediction. LGBM obtained the highest averaged AUC value with 0.771. The RF model achieved a comparable performance in comparison with LGBM. As in pN2 prediction, LGBM and RF obtained better predictive performances than other models. Figure 3 shows the ROC curves and PR curves of pN1&N2 LNM prediction models. The curves of the machine learning models are also all above the points of the size criterion and clinician’s evaluation.

Table 4. Performances of pN1&N2 lymph node metastasis prediction models.

| Model | AUC^a mean | AUC SD | AUC 95% CI | AP^b mean | AP SD | AP 95% CI |
|---|---|---|---|---|---|---|
| LR^c | 0.740 | 0.035 | 0.714-0.766 | 0.467 | 0.058 | 0.423-0.510 |
| L2-LR^d | 0.736 | 0.044 | 0.704-0.769 | 0.465 | 0.058 | 0.422-0.509 |
| ANN^e | 0.734 | 0.047 | 0.698-0.770 | 0.479 | 0.087 | 0.413-0.545 |
| SVM^f | 0.735 | 0.023 | 0.717-0.752 | 0.474 | 0.047 | 0.439-0.509 |
| RF^g | 0.768 | 0.030 | 0.745-0.791 | 0.524 | 0.044 | 0.491-0.557 |
| LGBM^h | 0.771 | 0.026 | 0.752-0.791 | 0.524 | 0.057 | 0.481-0.567 |

^a AUC: area under the receiver operating characteristic curve.

^b AP: average precision.

^c LR: logistic regression.

^d L2-LR: L2-logistic regression.

^e ANN: artificial neural network.

^f SVM: support vector machine.

^g RF: random forest.

^h LGBM: LightGBM.

Figure 3. The receiver operating characteristic curve (A) and precision-recall curves (B) of pN1&N2 prediction models.

Feature Importance

Among all machine learning models, the LR, L2-LR, RF, and LGBM can provide the feature importance. Table 5 shows the top 10 important features of LR, L2-LR, RF, and LGBM for pN2 LNM prediction. The features were ranked by averaging the weights of the models developed from 10-fold cross-validation. Note that the LR and L2-LR models provide weights with signs, so we used the absolute values to rank the features. Because the weight magnitudes from different models vary greatly, we used the averaged rankings of features, rather than the averaged weights, to find the most important features among the 4 types of models. CEA is ranked as the most important feature for increasing the risk of pN2 LNM in the L2-LR, RF, and LGBM models and in the averaged ranking; in the LR model it ranks second, after pGGO. Features recorded in CT reports account for 4 to 6 of the top 10 important features in each ranking, indicating that these features are of great importance for pN2 LNM prediction.
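The rank-averaging step can be sketched as below: each model's features are ranked by absolute weight, and the ranks, not the weights, are averaged across models. The three features and their weights are hypothetical values for illustration, and the function names are not taken from the study.

```python
from typing import Dict, List


def rank_features(weights: Dict[str, float]) -> Dict[str, int]:
    """Rank features by absolute weight; rank 1 is the largest magnitude."""
    ordered = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def average_rank_order(model_weights: Dict[str, Dict[str, float]]) -> List[str]:
    """Order features by their mean rank across models (scale-free aggregation)."""
    per_model_ranks = [rank_features(w) for w in model_weights.values()]
    features = list(per_model_ranks[0])
    mean_rank = {
        name: sum(ranks[name] for ranks in per_model_ranks) / len(per_model_ranks)
        for name in features
    }
    return sorted(features, key=lambda name: mean_rank[name])


# Hypothetical weights on very different scales, as with LR versus tree models.
model_weights = {
    "LR": {"CEA": 6.0, "pGGO": -10.4, "Age": -1.9},
    "RF": {"CEA": 0.23, "pGGO": 0.05, "Age": 0.03},
    "LGBM": {"CEA": 46.0, "pGGO": 5.0, "Age": 23.3},
}
print(average_rank_order(model_weights))
```

Averaging ranks avoids letting a model with large raw weights, such as split counts from a boosted tree model, dominate the combined ordering.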

Table 5. Top 10 important features for pN2 lymph node metastasis prediction.

| Rank | LR^a feature | LR weight | L2-LR^b feature | L2-LR weight | RF^c feature | RF weight | LGBM^d feature | LGBM weight | All |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pGGO^e,f | –10.383 | CEA^g | 3.530 | CEA | 0.229 | CEA | 46.0 | CEA |
| 2 | CEA | 6.010 | CA125^h | 3.067 | CA125 | 0.094 | Age | 23.3 | Solid nodule^f |
| 3 | CA125 | 4.728 | pGGO^f | –1.799 | Solid nodule^f | 0.094 | Solid nodule^f | 18.8 | CA125 |
| 4 | Solid nodule^f | 3.683 | Solid nodule^f | 1.773 | MLNSA^f,i | 0.073 | TLA^f,j | 17.6 | Age |
| 5 | TLA^f | –2.701 | Age | –1.315 | MLNLA^f,k | 0.072 | TSA^f,l | 15.1 | MLNLA^f |
| 6 | Age | –1.908 | SCCAg^m | 0.944 | TLA^f | 0.054 | CA125 | 13.3 | TLA^f |
| 7 | SCCAg | 1.763 | MLNLA^f | 0.896 | TSA^f | 0.048 | Cyfra211^n | 12.9 | pGGO^f |
| 8 | mGGO^f,o | 1.759 | Pleural indentation^f | 0.836 | Cyfra211 | 0.038 | NSE^p | 12.7 | SCCAg |
| 9 | RML^f,q | –1.729 | Cardiovascular disease | 0.807 | SCCAg | 0.037 | MLNLA^f | 11.6 | Lobulation^f |
| 10 | TSA^f | 1.601 | Lobulation^f | 0.725 | Lobulation^f | 0.036 | SCCAg | 9.0 | TSA^f |

^a LR: logistic regression.

^b L2-LR: L2-logistic regression.

^c RF: random forest.

^d LGBM: LightGBM.

^e pGGO: pure ground glass opacity.

^f Features recorded in computed tomography reports.

^g CEA: carcinoembryonic antigen.

^h CA125: carbohydrate antigen 12-5.

^i MLNSA: mediastinal lymph node short axis.

^j TLA: tumor long axis.

^k MLNLA: mediastinal lymph node long axis.

^l TSA: tumor short axis.

^m SCCAg: squamous cell carcinoma antigen.

^n Cyfra211: cytokeratin 19-fragments.

^o mGGO: mixed ground glass opacity.

^p NSE: neuron-specific enolase.

^q RML: right middle lobe.

NLP-Extracted Features Versus Gold Standard Features

In this study, we applied the MTQA model to extract important features from CT reports to support the development of LNM prediction models. In this section, we first conduct experiments to explore the effectiveness of the MTQA model on feature extraction and then analyze the influence of imperfect extraction results on LNM prediction.

We used an additional 359 annotated CT reports to develop the MTQA model. The interannotator agreement score was 0.937 based on the 100 reports annotated by 2 annotators. Table 6 shows the performances of the MTQA model and the pipeline models on the test set. We can notice that the BERT-MTQA model achieved significant improvement compared with the pipeline models.

Table 7 illustrates the performance of the BERT-MTQA model on the 794 CT reports of included patients. We can notice that the accuracy values of all extracted features are higher than 0.90. The F1 scores are higher than 0.90 except for lobulation, tumor density, vessel invasion, and hilar lymph node long axis. For the NLP-extracted features ranked in the top 10 important features, the mediastinal lymph node long axis (MLNLA), tumor long axis (TLA), and tumor short axis (TSA) obtained good accuracy values and F1 scores, but the F1 scores of tumor density and lobulation are not higher than 0.90.
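The gap between precision and recall explains why some F1 scores fall below 0.90 even when accuracy is high. As a small worked example, the F1 score is the harmonic mean of precision and recall, and the sketch below recomputes two Table 7 entries from their reported precision and recall values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Table 7.
print(round(f1_score(0.993, 0.716), 3))  # lobulation
print(round(f1_score(1.000, 0.811), 3))  # hilar lymph node long axis
```

In both cases precision is close to 1 while recall is much lower, so the extraction model rarely reports a wrong value for these features but misses a fraction of the true mentions.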

In this study, the MTQA model generates imperfect extractions, which may influence the subsequent application. To analyze the influence on the pN2 LNM prediction, we calculated the Pearson correlation between the predicted probabilities of models using NLP-extracted features and gold standard features. Moreover, we also replaced the NLP-extracted feature with the gold standard feature one by one according to their importance in Table 5 to explore the changes in the consistency. Figure 4 shows the concordance correlations of the pN2 LNM prediction models. The RF model obtained a high concordance correlation with 0.950 when using all NLP-extracted features in comparison with using gold standard features, and the correlation increased to 0.984 when replacing top 5 important NLP-extracted features. The correlation values of the LR, L2-LR, LGBM, and SVM models were more influenced by using the NLP-extracted features. With the replacement of gold standard features, the correlation values gradually increased and exceeded 0.950. The ANN model did not achieve a good concordance correlation even when the top 5 important NLP-extracted features were replaced.
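The text refers to both a Pearson correlation and a concordance correlation. The two differ: Pearson correlation measures linear association only, whereas Lin's concordance correlation coefficient also penalizes shifts in location and scale between the two sets of probabilities. The sketch below implements both in plain Python on hypothetical predicted probabilities; which definition the authors used is not specified beyond the wording above.

```python
import math
from typing import List, Tuple


def _moments(x: List[float], y: List[float]) -> Tuple[float, float, float, float, float]:
    """Means, population variances, and population covariance of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return mean_x, mean_y, var_x, var_y, cov


def pearson_correlation(x: List[float], y: List[float]) -> float:
    _, _, var_x, var_y, cov = _moments(x, y)
    return cov / math.sqrt(var_x * var_y)


def concordance_correlation(x: List[float], y: List[float]) -> float:
    """Lin's concordance correlation coefficient."""
    mean_x, mean_y, var_x, var_y, cov = _moments(x, y)
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical probabilities from a model fed gold standard vs NLP-extracted features.
gold = [0.10, 0.20, 0.30, 0.40]
nlp = [0.20, 0.30, 0.40, 0.50]  # same ordering, shifted upward by 0.10
print(round(pearson_correlation(gold, nlp), 3))
print(round(concordance_correlation(gold, nlp), 3))
```

In this toy case the Pearson correlation is essentially 1 while the concordance correlation is clearly lower, because a systematic shift in predicted probability counts as disagreement only under the concordance definition.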

Table 6. Performance of the multiturn question answering model and baseline models.

| Feature | BiLSTM^a-pipeline P^d | BiLSTM-pipeline R^e | BiLSTM-pipeline F^f | BERT^b-pipeline P | BERT-pipeline R | BERT-pipeline F | BERT-MTQA^c P | BERT-MTQA R | BERT-MTQA F |
|---|---|---|---|---|---|---|---|---|---|
| Tumor density | 0.882 | 0.625 | 0.732 | 0.889 | 0.667 | 0.762 | 0.938 | 0.938 | 0.938 |
| MLNLA^g | 1.000 | 0.640 | 0.780 | 1.000 | 0.720 | 0.837 | 1.000 | 0.960 | 0.980 |
| TLA^h | 0.967 | 0.892 | 0.928 | 0.984 | 0.938 | 0.961 | 0.984 | 0.954 | 0.969 |
| Lobulation | 0.889 | 0.533 | 0.667 | 0.909 | 0.667 | 0.769 | 1.000 | 0.867 | 0.929 |
| TSA^i | 0.967 | 0.892 | 0.928 | 0.984 | 0.938 | 0.961 | 0.984 | 0.954 | 0.969 |
| MLNSA^j | 1.000 | 0.750 | 0.857 | 1.000 | 0.750 | 0.857 | 1.000 | 0.938 | 0.968 |
| Pleural indentation | 0.931 | 0.818 | 0.871 | 0.964 | 0.818 | 0.885 | 1.000 | 0.848 | 0.918 |
| Tumor location | 0.984 | 0.897 | 0.938 | 0.968 | 0.897 | 0.931 | 0.985 | 0.985 | 0.985 |
| Spiculation | 1.000 | 0.727 | 0.842 | 1.000 | 0.773 | 0.872 | 1.000 | 1.000 | 1.000 |
| Vessel invasion | 1.000 | 0.111 | 0.200 | 1.000 | 0.222 | 0.364 | 1.000 | 0.556 | 0.714 |
| HLNLA^k | 1.000 | 0.778 | 0.875 | 1.000 | 0.833 | 0.909 | 1.000 | 1.000 | 1.000 |
| HLNSA^l | 1.000 | 0.750 | 0.857 | 1.000 | 0.750 | 0.857 | 1.000 | 1.000 | 1.000 |
| Average | 0.968 | 0.701 | 0.790 | 0.975 | 0.748 | 0.830 | 0.991 | 0.917 | 0.948 |

^a BiLSTM: bidirectional long short-term memory.

^b BERT: bidirectional encoder representations from transformers.

^c MTQA: multiturn question answering.

^d P: precision.

^e R: recall.

^f F: F1 score.

^g MLNLA: mediastinal lymph node long axis.

^h TLA: tumor long axis.

^i TSA: tumor short axis.

^j MLNSA: mediastinal lymph node short axis.

^k HLNLA: hilar lymph node long axis.

^l HLNSA: hilar lymph node short axis.

Table 7. Performance of the multiturn question answering model for feature extraction.
| Feature | Accuracy | Precision | Recall | F1 score |
| --- | --- | --- | --- | --- |
| Tumor density | 0.940 | 0.875 | 0.915 | 0.893 |
| MLNLA | 0.965 | 0.927 | 0.927 | 0.927 |
| TLA | 0.974 | 0.974 | 0.974 | 0.974 |
| Lobulation | 0.923 | 0.993 | 0.716 | 0.832 |
| TSA | 0.972 | 0.972 | 0.972 | 0.972 |
| MLNSA | 0.986 | 0.918 | 0.931 | 0.924 |
| Pleural indentation | 0.917 | 0.903 | 0.938 | 0.920 |
| Tumor location | 0.994 | 0.990 | 0.990 | 0.990 |
| Spiculation | 0.979 | 0.988 | 0.945 | 0.966 |
| Vessel invasion | 0.982 | 0.932 | 0.788 | 0.854 |
| HLNLA | 0.965 | 1.000 | 0.811 | 0.896 |
| HLNSA | 0.986 | 0.982 | 0.848 | 0.911 |

MLNLA: mediastinal lymph node long axis.

TLA: tumor long axis.

TSA: tumor short axis.

MLNSA: mediastinal lymph node short axis.

HLNLA: hilar lymph node long axis.

HLNSA: hilar lymph node short axis.

Figure 4. Concordance correlation values between pN2 prediction models using complete and partial gold standard features. LR: logistic regression; L2-LR: L2-logistic regression; RF: random forest; LGBM: LightGBM; SVM: support vector machine; ANN: artificial neural network; NLP: natural language processing; pGGO: pure ground glass opacity; MLNLA: mediastinal lymph node long axis; TLA: tumor long axis; TSA: tumor short axis.

Principal Findings

In this study, we explored the feasibility of using EMRs to develop machine learning models to predict LNM for patients with NSCLC. Important features of the primary tumor and lymph nodes were extracted from CT reports using an NLP technique to support model development. To the best of our knowledge, this is the first study to use NLP to extract features for building preoperative LNM prediction models for patients with NSCLC. Experimental results indicate that the RF model achieved the best performance, with an AUC of 0.792 and an AP of 0.456 for pN2 LNM prediction. All machine learning models outperformed the size criterion and clinician’s evaluation.
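
For readers less familiar with the two metrics reported above, the following minimal sketch shows one common way to compute them from binary labels and predicted probabilities: AUC as the probability that a random positive case is ranked above a random negative case, and AP as the mean precision at the rank of each positive case. The labels and scores are hypothetical, and the study's own evaluation code may differ (for example, in how ties are handled).

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive case (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive case")
    return total / hits


# Hypothetical example: 1 = pN2 LNM present, 0 = absent (not study data).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print(f"AUC = {roc_auc(labels, scores):.3f}")
print(f"AP  = {average_precision(labels, scores):.3f}")
```

Because AP depends on the proportion of positive cases, it is usually read alongside AUC when the outcome is relatively uncommon.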

Among all models, the LR, L2-LR, RF, and LGBM models provide feature importance values that show the connections between patient features and LNM status. CEA, tumor density, CA125, MLNLA, TLA, lobulation, and TSA were ranked among the top 10 important features by the machine learning models, which is consistent with the results of the univariate analysis. Squamous cell carcinoma antigen (SCCAg) was also identified as a top 10 important feature by the models, although the univariate analysis did not show significance. However, SCCAg has been shown to be associated with LNM in esophageal squamous cell carcinoma [46], anal squamous cell carcinoma [47], oral-cavity squamous cell carcinoma [48], and cervical squamous cell carcinoma [49]. It is also a poor prognostic factor in lung squamous cell carcinoma, for which upgrading the patient stage has been recommended [50,51]. Surprisingly, TLA was identified as an important feature with a negative weight by the LR model, which means that the longer the TLA, the lower the predicted risk of pN2 LNM. This result is contrary to that of the univariate analysis and may be caused by multicollinearity or interactions between the features [52]. In the L2-LR model, TLA was not ranked among the top 10 important features, suggesting that L2 regularization can reduce the influence of multicollinearity and improve the interpretability of the model [53]. In addition, other features, such as right middle lobe and cardiovascular disease, also suffered from interpretability problems, which may be hard to accept in clinical practice. Therefore, more robust interpretable machine learning algorithms are needed to make accurate predictions while giving more reasonable explanations.
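
The multicollinearity explanation can be illustrated with a variance inflation factor (VIF), a standard diagnostic that this study does not report. The sketch below uses the special case of two predictors, where VIF equals 1 / (1 - r²) and r is their Pearson correlation. The long-axis and short-axis values are hypothetical measurements in millimeters, chosen only to mimic the strong correlation one would expect between TLA and TSA.

```python
from math import inf
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired, non-constant sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r_squared = pearson_r(x, y) ** 2
    return inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)


# Hypothetical tumor long and short axes in mm (not study data).
tla_mm = [12, 18, 25, 31, 40, 22]
tsa_mm = [10, 15, 20, 27, 33, 19]

print(f"r   = {pearson_r(tla_mm, tsa_mm):.3f}")
print(f"VIF = {vif_two_predictors(tla_mm, tsa_mm):.1f}")
```

A VIF well above the commonly used threshold of 10 indicates that the two coefficients cannot be estimated independently, so an unregularized model may assign one of them an unstable or sign-flipped weight.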

In this study, we extracted features from CT reports and used them to develop LNM prediction models. The concordance correlations between the predicted probabilities of models using NLP-extracted features, partially NLP-extracted features, and gold standard features indicate that the automatically developed models can obtain predictive results similar to those of models using gold standard features. This finding suggests that it is possible to build models using a large amount of unstructured data and to update them automatically. More importantly, automatic extraction can also reduce the burden of manual feature extraction and thereby improve the usability of the prediction models in clinical practice.
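
The "partially NLP-extracted" setting can be sketched as a simple substitution step: for a given patient, the k most important features are taken from the gold standard annotation and the remaining features are left as extracted by the NLP model. The feature names, their order, and the values below are hypothetical placeholders rather than the ranking in Table 5.

```python
def replace_top_features(nlp_row, gold_row, ranked_features, k):
    """Return a copy of nlp_row whose k most important features come from gold_row."""
    mixed = dict(nlp_row)  # copy so the NLP-extracted record is left untouched
    for name in ranked_features[:k]:
        mixed[name] = gold_row[name]
    return mixed


# Hypothetical importance order and one hypothetical patient record.
ranked_features = ["tumor_density", "MLNLA", "TLA"]
nlp_row = {"tumor_density": "solid", "MLNLA": 0.0, "TLA": 31.0}
gold_row = {"tumor_density": "part-solid", "MLNLA": 12.0, "TLA": 31.0}

for k in range(len(ranked_features) + 1):
    mixed = replace_top_features(nlp_row, gold_row, ranked_features, k)
    remaining_errors = sum(mixed[f] != gold_row[f] for f in ranked_features)
    print(f"top {k} replaced: {remaining_errors} feature(s) still differ from gold")
```

Feeding each mixed record to a trained model and comparing its predicted probabilities with those from the fully gold standard record yields one point per k on a curve like those in Figure 4.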

Limitations

Although the experimental results show that machine learning models using CT reports, demographic information, medical history, and biomarker data can achieve better performance than the size criterion and clinician’s evaluation on the collected data, external validation is still needed to further establish the effectiveness and generalizability of the NLP and LNM prediction models. The writing styles of CT reports from different medical centers may vary greatly, which poses a substantial challenge to an NLP model developed using CT reports from a single medical center. Transfer learning is a suitable strategy for addressing this problem: the model can be fine-tuned to adapt to CT reports from other centers. Overall, multicenter data are necessary to develop more robust and generalizable NLP and LNM prediction models.

Furthermore, many studies have shown that CT images contain deep features or radiomics features related to LNM [54-60]. Clinicians cannot recognize these features with the naked eye, so they may provide extra information about the metastasis status. In the future, we will extract these image features and combine them with the features in this study to develop more robust and accurate multimodal LNM prediction models.

Conclusions

In this study, we used NLP and machine learning methods to develop LNM prediction models for patients with NSCLC using EMRs. The RF model achieved the best performance, with an AUC of 0.792 and an AP of 0.456 for pN2 prediction and an AUC of 0.768 and an AP of 0.524 for pN1&N2 prediction. All machine learning models outperformed the size criterion and clinician’s evaluation. Furthermore, the experimental results indicate that the NLP model can effectively extract features from CT reports to support the automatic development and updating of the LNM prediction model, which may facilitate the application of such models in clinical practice.

Acknowledgments

The publication of this paper was funded by grant 2018YFC0910700 from the National Key Research and Development Program of China.

Authors' Contributions

DH, SL, XL, and NW conceptualized the study. SL acquired the clinical data. DH and HZ designed and implemented the algorithms and conducted the experiments. DH, HZ, and SL analyzed the experimental results. DH wrote the manuscript with revision assistance from SL, XL, and NW. All authors have read and approved the manuscript.

Conflicts of Interest

None declared.

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021 Feb 04:1.
  2. Cancer facts and figures 2021. American Cancer Society. URL: https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2021.html [accessed 2021-07-14]
  3. Ettinger D, Wood D, Aisner D, Akerley W, Bauman J, Chirieac L, et al. Non-Small Cell Lung Cancer, Version 5.2017, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2017 Apr;15(4):504-535.
  4. Hu D, Li S, Huang Z, Wu N, Lu X. Predicting postoperative non-small cell lung cancer prognosis via long short-term relational regularization. Artif Intell Med 2020 Jul;107:101921.
  5. Detterbeck FC, Boffa DJ, Kim AW, Tanoue LT. The Eighth Edition Lung Cancer Stage Classification. Chest 2017 Jan;151(1):193-203.
  6. Silvestri GA, Gonzalez AV, Jantz MA, Margolis ML, Gould MK, Tanoue LT, et al. Methods for staging non-small cell lung cancer: diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013 May;143(5 Suppl):e211S-e250S.
  7. Schmidt-Hansen M, Baldwin DR, Zamora J. FDG-PET/CT imaging for mediastinal staging in patients with potentially resectable non-small cell lung cancer. JAMA 2015 Apr 14;313(14):1465-1466.
  8. Zhang C, Song Q, Zhang L, Wu X. Development of a nomogram for preoperative prediction of lymph node metastasis in non-small cell lung cancer: a SEER-based study. J Thorac Dis 2020 Jul;12(7):3651-3662.
  9. Navani N, Fisher DJ, Tierney JF, Stephens RJ, Burdett S, NSCLC Meta-analysis Collaborative Group. The accuracy of clinical staging of stage I-IIIa non-small cell lung cancer: an analysis based on individual participant data. Chest 2019 Mar;155(3):502-509.
  10. Lv X, Wu Z, Cao J, Hu Y, Liu K, Dai X, et al. A nomogram for predicting the risk of lymph node metastasis in T1-2 non-small-cell lung cancer based on PET/CT and clinical characteristics. Transl Lung Cancer Res 2021 Jan;10(1):430-438.
  11. Chen K, Yang F, Jiang G, Li J, Wang J. Development and validation of a clinical prediction model for N2 lymph node metastasis in non-small cell lung cancer. Ann Thorac Surg 2013 Nov;96(5):1761-1768.
  12. Miao H, Shaolei L, Nan L, Yumei L, Shanyuan Z, Fangliang L, et al. Occult mediastinal lymph node metastasis in FDG-PET/CT node-negative lung adenocarcinoma patients: risk factors and histopathological study. Thorac Cancer 2019 Jun;10(6):1453-1460.
  13. Verdial FC, Madtes DK, Hwang B, Mulligan MS, Odem-Davis K, Waworuntu R, et al. Prediction model for nodal disease among patients with non-small cell lung cancer. Ann Thorac Surg 2019 Jun;107(6):1600-1606.
  14. Shafazand S, Gould MK. A clinical prediction rule to estimate the probability of mediastinal metastasis in patients with non-small cell lung cancer. J Thorac Oncol 2006 Nov;1(9):953-959.
  15. Farjah F, Lou F, Sima C, Rusch VW, Rizk NP. A prediction model for pathologic N2 disease in lung cancer patients with a negative mediastinum by positron emission tomography. J Thorac Oncol 2013 Sep;8(9):1170-1180.
  16. Song C, Kimura D, Sakai T, Tsushima T, Fukuda I. Novel approach for predicting occult lymph node metastasis in peripheral clinical stage I lung adenocarcinoma. J Thorac Dis 2019 Apr;11(4):1410-1420.
  17. Yim W, Yetisgen M, Harris WP, Kwan SW. Natural language processing in oncology: a review. JAMA Oncol 2016 Jun 01;2(6):797-804.
  18. Garg AX, Adhikari NKJ, McDonald H, Rosas-Arellano MP, Devereaux PJ, Beyene J, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA 2005 Mar 9;293(10):1223-1238.
  19. Monteiro M, Fonseca AC, Freitas AT, Pinho E Melo T, Francisco AP, Ferro JM, et al. Using machine learning to improve the prediction of functional outcome in ischemic stroke patients. IEEE/ACM Trans Comput Biol Bioinform 2018;15(6):1953-1959.
  20. Desai RJ, Wang SV, Vaduganathan M, Evers T, Schneeweiss S. Comparison of machine learning methods with traditional models for use of administrative claims with electronic medical records to predict heart failure outcomes. JAMA Netw Open 2020 Jan 03;3(1):e1918962.
  21. Ali F, El-Sappagh S, Islam S, Kwak D, Ali A, Imran M. A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Inf Fusion 2020;63:208-222.
  22. Datta S, Bernstam EV, Roberts K. A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. J Biomed Inform 2019 Dec;100:103301.
  23. Si Y, Roberts K. A frame-based NLP system for cancer-related information extraction. AMIA Annu Symp Proc 2018;2018:1524-1533.
  24. Yim W, Denman T, Kwan SW, Yetisgen M. Tumor information extraction in radiology reports for hepatocellular carcinoma patients. AMIA Jt Summits Transl Sci Proc 2016;2016:455-464.
  25. Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, et al. DeepPhe: a natural language processing system for extracting cancer phenotypes from clinical records. Cancer Res 2017 Nov 01;77(21):e115-e118.
  26. Hassanpour S, Langlotz CP. Information extraction from multi-institutional radiology reports. Artif Intell Med 2016 Jan;66:29-39.
  27. Hu D, Zhang H, Li S, Wang Y, Wu N, Lu X. Automatic extraction of lung cancer staging information from computed tomography reports: deep learning approach. JMIR Med Inform 2021 Jul 21;9(7):e27955.
  28. Zheng C, Huang BZ, Agazaryan AA, Creekmur B, Osuj TA, Gould MK. Natural language processing to identify pulmonary nodules and extract nodule characteristics from radiology reports. Chest 2021 Nov;160(5):1902-1914.
  29. Sugimoto K, Takeda T, Oh J, Wada S, Konishi S, Yamahata A, et al. Extracting clinical terms from radiology reports with deep learning. J Biomed Inform 2021 Apr;116:103729.
  30. Chen L, Song L, Shao Y, Li D, Ding K. Using natural language processing to extract clinically useful information from Chinese electronic medical records. Int J Med Inform 2019 Apr;124:6-12.
  31. Martinez D, Pitson G, MacKinlay A, Cavedon L. Cross-hospital portability of information extraction of cancer staging information. Artif Intell Med 2014 Sep;62(1):11-21.
  32. Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, et al. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform 2017 Dec;69:177-187.
  33. Bozkurt S, Lipson JA, Senol U, Rubin DL. Automatic abstraction of imaging observations with their characteristics from mammography reports. J Am Med Inform Assoc 2015 Apr;22(e1):e81-e92.
  34. Bozkurt S, Gimenez F, Burnside ES, Gulkesen KH, Rubin DL. Using automatically extracted information from mammography reports for decision-support. J Biomed Inform 2016 Aug;62:224-231.
  35. Sui X, Liu T, Huang Q, Hou Y, Wang Y, Kang G, et al. P2.09-29 Automatic lung cancer staging from medical reports using natural language processing. J Thor Oncol 2018 Oct;13(10):S772.
  36. Yuan Q, Cai T, Hong C, Du M, Johnson BE, Lanuti M, et al. Performance of a machine learning algorithm using electronic health record data to identify and estimate survival in a longitudinal cohort of patients with lung cancer. JAMA Netw Open 2021 Jul 01;4(7):e2114723.
  37. Li X, Yin F, Sun Z, Li X, Yuan A, Chai D, et al. Entity-relation extraction as multi-turn question answering. 2019 Presented at: Proc 57th Annu Meet Assoc Comput Linguist; 2019; Florence p. 1340-1350.
  38. Devlin J, Chang M, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. Preprint posted online Oct 10, 2018.
  39. Hosmer D, Lemeshow S, Sturdivant R. Applied Logistic Regression. 3rd ed. Hoboken: John Wiley & Sons; 2013.
  40. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 1970 Feb;12(1):55-67.
  41. Breiman L. Random forests. Mach Learn 2001;45(1):5-32.
  42. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. 2017 Presented at: 31st Conf Neural Inf Process Syst (NIPS 2017); 2017; Long Beach. URL: https://papers.nips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf
  43. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995 Sep;20(3):273-297.
  44. Jain A, Mao J, Mohiuddin K. Artificial neural networks: a tutorial. Computer (Long Beach Calif) 1996;29(3):31-44.
  45. Cui Y, Che W, Liu T, Qin B, Yang Z. Pre-training with whole word masking for Chinese BERT. IEEE/ACM Trans Audio Speech Lang Process 2021;29:3504-3514.
  46. Shimada H, Nabeya Y, Okazumi S, Matsubara H, Shiratori T, Gunji Y, et al. Prediction of survival with squamous cell carcinoma antigen in patients with resectable esophageal squamous cell carcinoma. Surgery 2003 May;133(5):486-494.
  47. Williams M, Swampillai A, Osborne M, Mawdsley S, Hughes R, Harrison M, Mount Vernon Colorectal Cancer Network. Squamous cell carcinoma antigen: a potentially useful prognostic marker in squamous cell carcinoma of the anal canal and margin. Cancer 2013 Jul 01;119(13):2391-2398.
  48. Lin W, Chen I, Wei F, Huang J, Kang C, Hsieh L, et al. Clinical significance of preoperative squamous cell carcinoma antigen in oral-cavity squamous cell carcinoma. Laryngoscope 2011 May;121(5):971-977.
  49. Xu D, Wang D, Wang S, Tian Y, Long Z, Ren X. Correlation between squamous cell carcinoma antigen level and the clinicopathological features of early-stage cervical squamous cell carcinoma and the predictive value of squamous cell carcinoma antigen combined with computed tomography scan for lymph node metastasis. Int J Gynecol Cancer 2017 Nov;27(9):1935-1942.
  50. Kinoshita T, Ohtsuka T, Yotsukura M, Asakura K, Goto T, Kamiyama I, et al. Prognostic impact of preoperative tumor marker levels and lymphovascular invasion in pathological stage I adenocarcinoma and squamous cell carcinoma of the lung. J Thorac Oncol 2015 Apr;10(4):619-628.
  51. Kinoshita T, Ohtsuka T, Hato T, Goto T, Kamiyama I, Tajima A, et al. Prognostic factors based on clinicopathological data among the patients with resected peripheral squamous cell carcinomas of the lung. J Thorac Oncol 2014 Dec;9(12):1779-1787.
  52. Tolles J, Meurer WJ. Logistic regression: relating patient characteristics to outcomes. JAMA 2016 Aug 02;316(5):533-534.
  53. Marquardt DW, Snee RD. Ridge regression in practice. Am Statistician 1975 Feb;29(1):3-20.
  54. Gu Y, She Y, Xie D, Dai C, Ren Y, Fan Z, et al. A texture analysis-based prediction model for lymph node metastasis in stage Ia lung adenocarcinoma. Ann Thorac Surg 2018 Jul;106(1):214-220.
  55. Hosny A, Parmar C, Quackenbush J, Schwartz LH. Artificial intelligence in radiology. Nat Rev Cancer 2018 Dec;18(8):500-510.
  56. Cong M, Feng H, Ren J, Xu Q, Cong L, Hou Z, et al. Development of a predictive radiomics model for lymph node metastases in pre-surgical CT-based stage IA non-small cell lung cancer. Lung Cancer 2020 Jan;139:73-79.
  57. Zhao X, Wang X, Xia W, Li Q, Zhou L, Li Q, et al. A cross-modal 3D deep learning for accurate lymph node metastasis prediction in clinical stage T1 lung adenocarcinoma. Lung Cancer 2020 Jul;145:10-17.
  58. Wang X, Nan W, Yan S, Li Q, Guo N, Guo Z. MA05.11 radiomics analysis using SVM predicts mediastinal lymph nodes status of squamous cell lung cancer by pre-treatment chest CT scan. J Thor Oncol 2018 Oct;13(10):S374.
  59. He L, Huang Y, Yan L, Zheng J, Liang C, Liu Z. Radiomics-based predictive risk score: a scoring system for preoperatively predicting risk of lymph node metastasis in patients with resectable non-small cell lung cancer. Chin J Cancer Res 2019 Aug;31(4):641-652.
  60. Yoo J, Cheon M, Park YJ, Hyun SH, Zo JI, Um S, et al. Machine learning-based diagnostic method of pre-therapeutic 18F-FDG PET/CT for evaluating mediastinal lymph nodes in non-small cell lung cancer. Eur Radiol 2021 Jun;31(6):4184-4194.


Abbreviations

ANN: artificial neural network
AP: average precision
AUC: area under the receiver operating characteristic curve
BERT: bidirectional encoder representations from transformers
BiLSTM: bidirectional long short-term memory
BI-RADS: breast imaging-reporting and data system
CA125: carbohydrate antigen 12-5
CEA: carcinoembryonic antigen
cN: clinical N stage
EMR: electronic medical record
LGBM: LightGBM
LNM: lymph node metastasis
LR: logistic regression
L2-LR: L2-logistic regression
MLNLA: mediastinal lymph node long axis
MTQA: multiturn question answering
NLP: natural language processing
NSCLC: non–small cell lung cancer
NSE: neuron-specific enolase
PET-CT: positron emission tomography–computed tomography
pN: pathological N stage
PR: precision-recall curve
RF: random forest
ROC: receiver operating characteristic curve
SCCAg: squamous cell carcinoma antigen
SUVmax: maximum standardized uptake value
SVM: support vector machine
TLA: tumor long axis
TSA: tumor short axis


Edited by C Lovis; submitted 22.12.21; peer-reviewed by YH Kim, V Rajan; comments to author 27.03.22; revised version received 31.03.22; accepted 11.04.22; published 25.04.22

Copyright

©Danqing Hu, Shaolei Li, Huanyao Zhang, Nan Wu, Xudong Lu. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 25.04.2022.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.