<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "journalpublishing.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" article-type="research-article"><front><journal-meta><journal-id journal-id-type="nlm-ta">JMIR Med Inform</journal-id><journal-id journal-id-type="publisher-id">medinform</journal-id><journal-id journal-id-type="index">7</journal-id><journal-title>JMIR Medical Informatics</journal-title><abbrev-journal-title>JMIR Med Inform</abbrev-journal-title><issn pub-type="epub">2291-9694</issn><publisher><publisher-name>JMIR Publications</publisher-name><publisher-loc>Toronto, Canada</publisher-loc></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">v12i1e52896</article-id><article-id pub-id-type="doi">10.2196/52896</article-id><article-categories><subj-group subj-group-type="heading"><subject>Original Paper</subject></subj-group></article-categories><title-group><article-title>Unsupervised Feature Selection to Identify Important ICD-10 and ATC Codes for Machine Learning on a Cohort of Patients With Coronary Heart Disease: Retrospective Study</article-title></title-group><contrib-group><contrib contrib-type="author"><name name-style="western"><surname>Ghasemi</surname><given-names>Peyman</given-names></name><degrees>MSc</degrees><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref></contrib><contrib contrib-type="author" corresp="yes"><name name-style="western"><surname>Lee</surname><given-names>Joon</given-names></name><degrees>PhD</degrees><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff3">3</xref><xref ref-type="aff" rid="aff4">4</xref><xref ref-type="aff" rid="aff5">5</xref></contrib></contrib-group><aff id="aff1"><institution>Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary</institution>, 
<addr-line>Calgary</addr-line><addr-line>AB</addr-line>, <country>Canada</country></aff><aff id="aff2"><institution>Department of Biomedical Engineering, University of Calgary</institution>, <addr-line>Calgary</addr-line><addr-line>AB</addr-line>, <country>Canada</country></aff><aff id="aff3"><institution>Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary</institution>, <addr-line>Calgary</addr-line><addr-line>AB</addr-line>, <country>Canada</country></aff><aff id="aff4"><institution>Department of Community Health Sciences, Cumming School of Medicine, University of Calgary</institution>, <addr-line>Calgary</addr-line><addr-line>AB</addr-line>, <country>Canada</country></aff><aff id="aff5"><institution>Department of Preventive Medicine, School of Medicine, Kyung Hee University</institution>, <addr-line>Seoul</addr-line>, <country>Republic of Korea</country></aff><contrib-group><contrib contrib-type="editor"><name name-style="western"><surname>Lovis</surname><given-names>Christian</given-names></name></contrib></contrib-group><contrib-group><contrib contrib-type="reviewer"><name name-style="western"><surname>El-Hafeez</surname><given-names>Tarek Abd</given-names></name></contrib><contrib contrib-type="reviewer"><name name-style="western"><surname>Wang</surname><given-names>Tongnian</given-names></name></contrib></contrib-group><author-notes><corresp>Correspondence to Joon Lee, PhD, Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, 3280 Hospital Drive NW, Calgary, T2N 4Z6, AB, Canada, 1 403 220 2968; <email>joon.lee@ucalgary.ca</email></corresp></author-notes><pub-date pub-type="collection"><year>2024</year></pub-date><pub-date pub-type="epub"><day>26</day><month>7</month><year>2024</year></pub-date><volume>12</volume><elocation-id>e52896</elocation-id><history><date date-type="received"><day>19</day><month>09</month><year>2023</year></date><date 
date-type="rev-recd"><day>06</day><month>06</month><year>2024</year></date><date date-type="accepted"><day>08</day><month>06</month><year>2024</year></date></history><copyright-statement>&#x00A9;Peyman Ghasemi, Joon Lee. Originally published in JMIR Medical Informatics (<ext-link ext-link-type="uri" xlink:href="https://medinform.jmir.org">https://medinform.jmir.org</ext-link>), 26.7.2024. </copyright-statement><copyright-year>2024</copyright-year><license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on <ext-link ext-link-type="uri" xlink:href="https://medinform.jmir.org/">https://medinform.jmir.org/</ext-link>, as well as this copyright and license information must be included.</p></license><self-uri xlink:type="simple" xlink:href="https://medinform.jmir.org/2024/1/e52896"/><abstract><sec><title>Background</title><p>The application of machine learning in health care often necessitates the use of hierarchical codes such as the International Classification of Diseases (ICD) and Anatomical Therapeutic Chemical (ATC) systems. These codes classify diseases and medications, respectively, thereby forming extensive data dimensions. Unsupervised feature selection tackles the &#x201C;curse of dimensionality&#x201D; and helps to improve the accuracy and performance of supervised learning models by reducing the number of irrelevant or redundant features and avoiding overfitting. 
Unsupervised feature selection techniques, including filter, wrapper, and embedded methods, aim to select the features that carry the most intrinsic information. However, they face challenges due to the sheer volume of ICD and ATC codes and the hierarchical structures of these systems.</p></sec><sec><title>Objective</title><p>The objective of this study was to compare several unsupervised feature selection methods for ICD and ATC code databases of patients with coronary artery disease in different aspects of performance and complexity and to select the best set of features representing these patients.</p></sec><sec sec-type="methods"><title>Methods</title><p>We compared several unsupervised feature selection methods for 2 ICD and 1 ATC code databases of 51,506 patients with coronary artery disease in Alberta, Canada. Specifically, we used the Laplacian score, unsupervised feature selection for multicluster data, autoencoder-inspired unsupervised feature selection, principal feature analysis, and concrete autoencoders with and without ICD or ATC tree weight adjustment to select the 100 best features from over 9000 ICD and 2000 ATC codes. We assessed the selected features based on their ability to reconstruct the initial feature space and predict 90-day mortality following discharge. We also compared the complexity of the selected features by mean code level in the ICD or ATC tree and the interpretability of the features in the mortality prediction task using Shapley analysis.</p></sec><sec sec-type="results"><title>Results</title><p>In feature space reconstruction and mortality prediction, the concrete autoencoder&#x2013;based methods outperformed other techniques. Particularly, a weight-adjusted concrete autoencoder variant demonstrated improved reconstruction accuracy and significant predictive performance enhancement, confirmed by DeLong and McNemar tests (<italic>P</italic>&#x003C;.05). 
Concrete autoencoders preferred more general codes, and they consistently reconstructed all features accurately. Additionally, features selected by weight-adjusted concrete autoencoders yielded higher Shapley values in mortality prediction than most alternatives.</p></sec><sec sec-type="conclusions"><title>Conclusions</title><p>This study scrutinized 5 feature selection methods in ICD and ATC code data sets in an unsupervised context. Our findings underscore the superiority of the concrete autoencoder method in selecting salient features that represent the entire data set, offering a potential asset for subsequent machine learning research. We also present a novel weight adjustment approach for the concrete autoencoders specifically tailored for ICD and ATC code data sets to enhance the generalizability and interpretability of the selected features.</p></sec></abstract><kwd-group><kwd>unsupervised feature selection</kwd><kwd>ICD-10</kwd><kwd>International Classification of Diseases</kwd><kwd>ATC</kwd><kwd>Anatomical Therapeutic Chemical</kwd><kwd>concrete autoencoder</kwd><kwd>Laplacian score</kwd><kwd>unsupervised feature selection for multicluster data</kwd><kwd>autoencoder-inspired unsupervised feature selection</kwd><kwd>principal feature analysis</kwd><kwd>machine learning</kwd><kwd>artificial intelligence</kwd><kwd>case study</kwd><kwd>coronary artery disease</kwd><kwd>artery disease</kwd><kwd>patient cohort</kwd><kwd>artery</kwd><kwd>mortality prediction</kwd><kwd>mortality</kwd><kwd>data set</kwd><kwd>interpretability</kwd><kwd>International Classification of Diseases, Tenth Revision</kwd></kwd-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Machine learning is increasingly being used in health care to analyze patient data and provide insights on improving health outcomes and the quality of care [<xref ref-type="bibr" rid="ref1">1</xref>]. 
With the rise of electronic health data (EHD) and the large volume of data recorded per patient in hospitals, there are substantial opportunities to train machine learning models for a variety of applications, such as the prediction or diagnosis of diseases, outcome prediction, and treatment planning [<xref ref-type="bibr" rid="ref1">1</xref>,<xref ref-type="bibr" rid="ref2">2</xref>]. EHD are a valuable source of information on a patient, containing details on their demographics, hospital visits, medical diagnoses, physiological measurements, and treatments received [<xref ref-type="bibr" rid="ref3">3</xref>]. However, despite the opportunities offered by these large data sets, there are challenges in terms of data quality, privacy, and the complexity of medical conditions [<xref ref-type="bibr" rid="ref1">1</xref>]. In terms of machine learning, EHD can include many irrelevant and redundant features, whose direct use can lead to the &#x201C;curse of dimensionality&#x201D; because the high dimensionality of the data makes it more difficult to extract meaningful patterns and relationships [<xref ref-type="bibr" rid="ref4">4</xref>]. Therefore, it is important to apply appropriate techniques for dimensionality reduction and feature engineering to address this challenge and improve the effectiveness of predictive models built from EHD.</p><p>Feature selection is one of the critical aspects of machine learning. It involves selecting a subset of relevant features that are the most useful for predicting a target variable. In the case of medical data, these features could include patient demographics, medical history, laboratory test results, and diagnosis codes [<xref ref-type="bibr" rid="ref3">3</xref>]. Feature selection is essential because it can help improve the accuracy and performance of machine learning models by reducing the number of irrelevant or redundant features and avoiding overfitting [<xref ref-type="bibr" rid="ref4">4</xref>]. 
Unsupervised feature selection is used when no target variable is available to guide the selection of features. Unlike supervised feature selection, which chooses features that better predict a certain target variable, unsupervised feature selection methods rely on the intrinsic structure of the data to identify the most important features. This property helps the selected features remain unbiased and perform well when no labeled data are available. It can also reduce the risk of overfitting to a certain target variable and ensure robustness to new target variables [<xref ref-type="bibr" rid="ref5">5</xref>]. This is an important advantage in health care, where collecting labeled data is usually difficult and the same data are often used to predict multiple target variables.</p><p>Generally, there are 3 main categories of feature selection methods: filter, wrapper, and embedded methods. Filter methods use statistical tests such as variance to rank individual features within a data set and select the features that maximize the desired criteria. However, they usually lack the ability to consider the interactions between features [<xref ref-type="bibr" rid="ref6">6</xref>]. Wrapper methods, on the other hand, select features that optimize an objective function for a clustering algorithm. Therefore, these methods are generally specific to particular clustering algorithms and may not be suitable for use with other algorithms. Wrapper methods can detect potential relationships between features, but this often results in increased computational complexity [<xref ref-type="bibr" rid="ref5">5</xref>]. Embedded methods also take into account feature relationships but generally do so more efficiently by incorporating feature selection into the learning phase of another algorithm. 
Lasso regularization is one of the well-known embedded methods that can be applied to a variety of machine learning models [<xref ref-type="bibr" rid="ref6">6</xref>].</p><p>The <italic>International Classification of Diseases, Tenth Revision</italic> (<italic>ICD-10</italic>) is a method of classifying diseases that was created by the World Health Organization and is used internationally [<xref ref-type="bibr" rid="ref7">7</xref>]. It categorizes diseases based on their underlying cause, characteristics, symptoms, and location in the body and uses codes to represent each disease. The <italic>ICD-10</italic> system organizes thousands of codes in a hierarchical structure that includes chapters, sections, categories, and expansion codes. Within this structure, section codes and their corresponding chapter codes can be thought of as child-parent relationships, with each <italic>ICD-10</italic> code serving as a node in the classification system. The same relationship applies to categories and sections, as well as expansion codes and categories. The high number of codes in this system is one of the major challenges of using them in machine learning applications [<xref ref-type="bibr" rid="ref8">8</xref>]. It is worth noting that Canada has added or changed some codes in the lower levels according to their health care system requirements (<italic>ICD-10, Canada</italic> [<italic>ICD-10-CA</italic>]) [<xref ref-type="bibr" rid="ref9">9</xref>].</p><p>Similar to International Classification of Diseases (ICD) codes, the Anatomical Therapeutic Chemical (ATC) classification system, developed by the World Health Organization Collaborating Centre for Drug Statistics Methodology, is an international tool for the active and systematic categorization of active pharmaceutical ingredients [<xref ref-type="bibr" rid="ref10">10</xref>]. 
ATC codes are also structured hierarchically and are assigned based on the organ or system they impact, as well as their therapeutic, pharmacological, and chemical properties. This hierarchical system comprises 5 distinct levels, with the lower levels providing detailed information about the pharmacological subgroup and chemical substance, and the highest level representing the anatomical main group. As in the <italic>ICD-10</italic>, the ATC&#x2019;s hierarchy introduces child-parent relationships at each level.</p><p>In this research, we used 3 administrative databases comprising <italic>ICD-10</italic> and ATC codes pertaining to patients with coronary artery disease (CAD). These databases, relevant to acute care, ambulatory care, and pharmacy facilities, were used to select the most insightful codes characterizing this cohort.</p></sec><sec id="s2" sec-type="methods"><title>Methods</title><sec id="s2-1"><title>Data Set and Preprocessing</title><p>The Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease (APPROACH) registry [<xref ref-type="bibr" rid="ref11">11</xref>] is one of the most comprehensive data repositories of CAD management in the world, matching unique disease phenotypes with rich clinical information and relevant outcomes for patients in Alberta, Canada, who have undergone diagnostic cardiac catheterization or revascularization procedures. Our cohort&#x2019;s patients were selected from the APPROACH registry. These patients underwent diagnostic angiography between January 2009 and March 2019 at 1 of the following 3 hospitals in Alberta: Foothills Medical Centre, University of Alberta Hospital, and Royal Alexandra Hospital. 
We excluded patients with ST elevation myocardial infarction from the study to focus on nonemergency CAD.</p><p>Discharge Abstract Database (DAD), National Ambulatory Care Reporting System (NACRS), and Pharmaceutical Information Network (PIN) data for the abovementioned patients were extracted from Alberta provincial health records. The DAD contains summary hospitalization information from all acute care facilities in Alberta. The NACRS includes all visits to ambulatory care facilities (ie, emergency department, urgent care, and day surgery visits) in the province as well as some nonabstracted data from other specialty clinics. The PIN is based on a system that collects all prescription medicine dispensations from pharmacies all over Alberta.</p><p>In the DAD and NACRS, for each patient, we aggregated all <italic>ICD-10-CA</italic> codes of hospital admissions or physician visits every 3 months following the first admission date (all codes in that period are treated as 1 record&#x2019;s codes). This helps ensure that chronic diseases are captured more comprehensively in fewer records and reduces the effect of noisy records. We applied a similar procedure to the ATC codes in the PIN data set and aggregated the codes every 6 months, since most medication prescription refills did not extend beyond 6 months. We one-hot encoded the <italic>ICD-10-CA</italic> and ATC codes and their parent nodes for each record. For example, if the <italic>ICD-10-CA</italic> code &#x201C;I251&#x201D; was present, &#x201C;I25,&#x201D; &#x201C;I20-I25,&#x201D; and &#x201C;Chapter IX&#x201D; were also encoded in the one-hot table. Similarly, if the ATC code &#x201C;C07AB02&#x201D; was present, &#x201C;C07AB,&#x201D; &#x201C;C07A,&#x201D; &#x201C;C07,&#x201D; and &#x201C;C&#x201D; were also encoded. 
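</p><p>As an illustration, this parent-expansion and one-hot encoding step can be sketched in Python as follows; the parent lookup table here is a small hypothetical stand-in for the real <italic>ICD-10-CA</italic> tree, and only the expansion logic mirrors our procedure:</p><preformat>
```python
# Sketch of the hierarchical one-hot expansion described above.
# ICD_PARENT is a hypothetical fragment of the ICD-10-CA tree
# (expansion code -> category -> section -> chapter).
ICD_PARENT = {
    "I251": "I25",
    "I25": "I20-I25",
    "I20-I25": "Chapter IX",
}

def expand_with_ancestors(code, parent_map):
    """Return a code together with all of its ancestors in the tree."""
    codes = [code]
    while codes[-1] in parent_map:
        codes.append(parent_map[codes[-1]])
    return codes

def one_hot(record_codes, vocabulary, parent_map):
    """One-hot encode a record, switching on each code and its ancestors."""
    active = set()
    for code in record_codes:
        active.update(expand_with_ancestors(code, parent_map))
    return [1 if code in active else 0 for code in vocabulary]

vocab = ["Chapter IX", "I20-I25", "I25", "I251", "I50"]
print(one_hot(["I251"], vocab, ICD_PARENT))  # [1, 1, 1, 1, 0]
```
</preformat><p>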
We show the number of all unique <italic>ICD-10-CA</italic> or ATC codes in the data set with <italic>N<sub>All</sub></italic>.</p><p>To validate the performance of the selected features in a real clinical problem, we pulled the mortality data of the patients enrolled in the cohort from the Vital Statistics Database and matched them with the aggregated records to determine 90-day mortality following the end of the last procedure.</p></sec><sec id="s2-2"><title>Feature Selection</title><p>The following unsupervised algorithms were used for feature selection:</p><list list-type="bullet"><list-item><p>Concrete autoencoder (CAE) [<xref ref-type="bibr" rid="ref6">6</xref>]: In this method, continuous relaxation of discrete random (concrete) variables [<xref ref-type="bibr" rid="ref12">12</xref>] and the Gumbel-Softmax reparameterization trick are used to construct a special layer in the neural network to transform discrete random variables into continuous ones, which allows for efficient computation and optimization using gradient-based methods. The reparameterization trick allows the use of a softmax function in this layer, which is differentiable, unlike the argmax function. This characteristic is useful for designing an autoencoder, in which features are selected in the concrete layer (as the encoder) through the softmax operation and a common neural network (as the decoder) is used to reconstruct the main feature space out of the selected features. During the training, a temperature parameter can be gradually decreased, allowing the concrete selector layer to try different features in the initial epochs and behave more similar to an argmax function in the last epochs to keep the best features. After training, we can use an argmax function on the weights to find the features passed to the neurons of the encoder layer. 
One of the major problems of this method is that it may converge to a solution where some duplicate features are selected in some neurons (ie, fewer than the desired number of features are selected).</p></list-item><list-item><p>Autoencoder-inspired unsupervised feature selection (AEFS) [<xref ref-type="bibr" rid="ref13">13</xref>]: This method combines autoencoder and group lasso tasks by applying an L<sub>1,2</sub> regularization on the weights of the autoencoder. The autoencoder in this method tries to map the inputs to a latent space and then reconstruct the inputs from that space. The L<sub>1,2</sub> regularization drives the weights toward 0 so that a smaller number of features is selected. The neural network structure of the autoencoder enables the model to incorporate both linear and nonlinear behavior of the data in the results. After training this neural network, the features with higher weight values in the first layer can be selected as the most informative features. The authors reported promising results for this algorithm in computer vision tasks.</p></list-item><list-item><p>Principal feature analysis (PFA) [<xref ref-type="bibr" rid="ref14">14</xref>]: This method selects features based on principal component analysis (PCA). The most important features are selected by applying a <italic>k</italic>-means clustering algorithm to the components of PCA and finding the features dominating each cluster (ie, closest to the mean of the cluster). This algorithm is primarily designed for computer vision.</p></list-item><list-item><p>Unsupervised feature selection for multicluster data (MCFS) [<xref ref-type="bibr" rid="ref15">15</xref>]: This approach prioritizes the preservation of the multicluster structure of data. The algorithm involves constructing a nearest neighbor graph of the features and solving a sparse eigen-problem to find the top eigenvectors corresponding to the smallest eigenvalues. 
Then, an L<sub>1</sub>-regularized least squares problem is optimized to find the linear weights between the features and the eigenvectors. This allows us to define the MCFS score as the maximum weight of each feature across different clusters and select the highest scores as the best features.</p></list-item><list-item><p>Laplacian score (LS) [<xref ref-type="bibr" rid="ref16">16</xref>]: The LS algorithm uses the nearest neighbor graph to capture the local structure of the data in the affinity matrix. For each feature, its adjusted variation is calculated by removing the feature&#x2019;s mean, normalized by a degree matrix, which itself is derived from the sum of similarities in the affinity matrix. The Laplacian matrix, essential for this calculation, is formed by subtracting the affinity matrix from the degree matrix. The significance of each feature is then assessed by the LS, which is the ratio of the feature&#x2019;s ability to preserve local information (captured by its adjusted variation&#x2019;s alignment with the Laplacian matrix) to its overall variance (measured by its alignment with the degree matrix). The lower the LS, the more relevant the feature for representing the intrinsic geometry of the data set.</p></list-item></list><p>We applied the LS, AEFS, PFA, MCFS, and CAE algorithms to a 67% training data set (split based on patients) of one-hot encoded features to select the best 100 features (<italic>N<sub>Best</sub></italic>=100) with the following specifications (we chose <italic>N<sub>Best</sub></italic> based on preliminary experimentations).</p><p>For the AEFS method, we used a single hidden-layer autoencoder and optimized the loss function as described in Han et al [<xref ref-type="bibr" rid="ref13">13</xref>], with &#x03B1;=0.001 as the trade-off parameter of the reconstruction loss and the regularization term and &#x03B2;=0.1 as the penalty parameter for the weight decay regularization. 
The choice of these parameters was based on preliminary experimentations on a small set of data and exploring &#x03B1; and &#x03B2; of {0.001, 0.1, 1, 1000}.</p><p>For the PFA method, we used incremental PCA instead of the normal PCA in the original paper [<xref ref-type="bibr" rid="ref14">14</xref>], with a batch size of 2<italic>N<sub>All</sub></italic> due to the high computational cost. We decomposed the data to <inline-formula><mml:math id="ieqn1"><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula> components and then applied <italic>k</italic>-means clustering to find <italic>N<sub>Best</sub></italic> clusters. We also tried { <inline-formula><mml:math id="ieqn2"><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula> } as the number of components of the PCA in the preliminary experiments.</p><p>To use the LS and MCFS methods for feature selection, we used the Euclidean distances between features to construct a nearest neighbor graph G based on the 5 nearest neighbors. For the LS method, we set the weights of the connected nodes of G to 1, assuming a large <italic>t</italic> in the LS formulation. 
Then, we computed the LS for each feature and selected the features with the lowest scores. Due to the high computational resources required for the LS and MCFS methods, we did not explore different parameters and used the same settings suggested by the implementation codes of these algorithms.</p><p>As the structure of the loss function allows us to prioritize some target variables, the CAE method was applied in 2 different ways&#x2014;with and without adjusting weights for features. The reason for adjusting the weights is that since there are many correlated features in the <italic>ICD-10-CA</italic> and ATC code data sets, the model may choose one of them randomly [<xref ref-type="bibr" rid="ref3">3</xref>]. Therefore, we applied the function in <xref ref-type="disp-formula" rid="E1">equation 1</xref> as the class weights of the features to the loss function of the model:</p><disp-formula id="E1"><label>(1)</label><mml:math id="eqn1"><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mfenced separators="|"><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mfrac></mml:math></disp-formula><p>where <italic>W<sub>F</sub></italic> is the weight for feature <italic>F</italic> and <italic>d</italic>(<italic>F</italic>) is the depth of feature <italic>F</italic> as a node of the <italic>ICD-10-CA</italic> or ATC tree. This weight adjustment forces the model to give more importance to the features near the top of the tree, favoring selections that generalize better in clinical settings. 
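</p><p>For instance, the weighting in equation 1 can be sketched as follows; the depth values below are illustrative (with chapter-level codes assumed to have depth 0), whereas the actual depths come from the <italic>ICD-10-CA</italic> or ATC tree:</p><preformat>
```python
# Sketch of the class weights in equation 1: W_F = 1 / (1 + d(F)).
# The depth convention (chapter = 0) is an assumption for illustration.
def feature_weight(depth):
    """Shallower (more general) codes receive larger loss weights."""
    return 1.0 / (1.0 + depth)

depths = {"Chapter IX": 0, "I20-I25": 1, "I25": 2, "I251": 3}
weights = {code: feature_weight(d) for code, d in depths.items()}
# A chapter-level code weighs 4 times as much as a depth-3 expansion code:
print(weights["Chapter IX"] / weights["I251"])  # 4.0
```
</preformat><p>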
In the rest of the paper, this variant of the CAE model will be referred to as the CAE with weight adjustment (CAEWW) and the regular CAE model will be referred to as the CAE with no weight adjustment (CAENW).</p><p>We defined <italic>N<sub>Best</sub></italic> neurons in the concrete selector layer and used a learning rate of 0.001, a batch size of 64, and 1000 epochs. We also controlled the learning of the concrete selector layer by the temperature parameter that started from 20 and decreased to 0.01 exponentially (this annealing strategy was suggested by Abid et al [<xref ref-type="bibr" rid="ref6">6</xref>] for better convergence). The decoder of the CAE was a feed-forward neural network with 2 hidden layers with 64 neurons and used a sigmoid activation function for the output layer and a leaky rectified linear unit activation function for the other layers. The learning rate, number of neurons, and the layers were determined based on preliminary experiments for the fastest convergence of the autoencoder.</p></sec><sec id="s2-3"><title>Evaluation of Selected Features: Reconstruction of Initial Feature Space</title><p>To evaluate the effectiveness of the selected features, we trained a simple feed-forward neural network model using the chosen features to reconstruct the original feature space for each data set separately. The neural network consisted of 2 hidden layers, each with 64 neurons, and used leaky rectified linear unit activation functions, with a 10% dropout rate, in the hidden layers and a sigmoid activation function in the output layer. We trained the model using the same training set used in the feature selection step and evaluated its performance on the remaining 33% test set using binary cross entropy. We also calculated the accuracy of each feature selection method to determine which method produced the most accurate results. 
One of the challenges in comparing models with a large number of targets is that the accuracy values are inflated, because most of the targets are heavily imbalanced (ie, most of them were 0s) and the models were able to predict them easily. To circumvent this issue, we used a 2-tailed <italic>t</italic> test to compare the accuracy values of the classes with the accuracy of a baseline model that simply outputs the mode of the training data for each class regardless of the input.</p></sec><sec id="s2-4"><title>Evaluation of Selected Features: Prediction of 90-Day Mortality</title><p>To demonstrate the utility of using unsupervised feature selection methods in a supervised setting, we conducted a case study where we used the selected features from each method to predict 90-day mortality following the end of the last procedure for each data set separately. Since our data sets were highly imbalanced, with only ~6%, ~2%, and ~1% of the aggregated records in the DAD, NACRS, and PIN data sets, respectively, leading to 90-day mortality, we upsampled the minority class using random sampling to balance the training sets. We then trained extreme gradient boosting (XGBoost) models using the training sets with 5-fold cross-validation to tune the hyperparameters for each model. We used the best models to predict the binary outcome variables on the test sets and measured their performance. XGBoost was selected for its efficiency with sparse data, which was crucial for our data sets. XGBoost&#x2019;s regularization features help prevent overfitting [<xref ref-type="bibr" rid="ref17">17</xref>]. Additionally, its ability to provide interpretable models through tree-based Shapley values aligns with our objective to not only predict mortality but also understand the contributing factors [<xref ref-type="bibr" rid="ref18">18</xref>]. 
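</p><p>The random upsampling step can be sketched in pure Python as follows; the toy records and the ~6% positive rate are illustrative, and the subsequent XGBoost training is not reproduced here:</p><preformat>
```python
import random

def upsample_minority(rows, seed=0):
    """rows: list of (features, label) pairs with binary labels.
    Randomly resample the minority class until both classes match."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row in rows:
        by_class[row[1]].append(row)
    minority, majority = sorted(by_class.values(), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Toy training set with a ~6% positive rate, as in the DAD records:
rows = [([0], 0)] * 94 + [([1], 1)] * 6
balanced = upsample_minority(rows)
print(len(balanced), sum(label for _, label in balanced))  # 188 94
```
</preformat><p>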
XGBoost&#x2019;s scalability across multiple processors and its speed (for both training [<xref ref-type="bibr" rid="ref17">17</xref>] and Shapley analysis [<xref ref-type="bibr" rid="ref18">18</xref>]) are also beneficial for processing large volumes of data and for complex model tuning. After training the mortality prediction models for each method and data set, we calculated tree-based Shapley values corresponding to the features. This allowed us to rank the features by importance and explain their roles in predicting mortality.</p><p>The implementation code for the methods discussed is available in our GitHub repository [<xref ref-type="bibr" rid="ref19">19</xref>].</p></sec><sec id="s2-5"><title>Ethical Considerations</title><p>This study received ethics approval from the Conjoint Health Research Ethics Board at the University of Calgary (REB20-1879). Informed consent was waived due to the retrospective nature of the data and the large number of patients involved, making it impractical to seek consent from each patient. All data were deidentified. No compensation was provided to the participants, as the study did not involve direct participant interaction.</p></sec></sec><sec id="s3" sec-type="results"><title>Results</title><sec id="s3-1"><title>Data Set Description</title><p><xref ref-type="table" rid="table1">Table 1</xref> summarizes the characteristics of the patients in the cohort at the time of their initial catheterization. The total numbers of patients with at least 1 record in the respective data sets, as well as the time ranges for each data set, are provided in <xref ref-type="table" rid="table2">Table 2</xref>. 
The aggregation procedure described in the <italic>Methods</italic> section reduced the number of records to the values listed in the &#x201C;Aggregated Records&#x201D; row, and the table also includes the total number of codes (unique <italic>ICD-10-CA</italic> or ATC codes and their parent codes) in a data set, along with the average number of codes per record. <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref> illustrates the percentages of the 20 most common <italic>ICD-10-CA</italic> and ATC codes within each processed data set. Within the data set, there were 9942 cases corresponding to a 90-day mortality, resulting in a 20% mortality rate in the cohort. The final aggregated data for each data set were split into 67% for the training sets and 33% for the test sets at the patient level.</p><table-wrap id="t1" position="float"><label>Table 1.</label><caption><p>Key characteristics of the patients with CAD<sup><xref ref-type="table-fn" rid="table1fn1">a</xref></sup> enrolled in the cohort.</p></caption><table id="table1" frame="hsides" rules="groups"><thead><tr><td align="left" valign="bottom" colspan="2">Variable</td><td align="left" valign="bottom">Overall (N=51,506)</td></tr></thead><tbody><tr><td align="left" valign="top" colspan="2">Total population, n (%)</td><td align="left" valign="top">51,506 (100)</td></tr><tr><td align="left" valign="top" colspan="3"><bold>Sex, n (%)</bold></td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Female</td><td align="left" valign="top">12,875 (25)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Male</td><td align="left" valign="top">38,631 (75)</td></tr><tr><td align="left" valign="top" colspan="2">Age (years), mean (SD)</td><td align="left" valign="top">66.09 (11.41)</td></tr><tr><td align="left" valign="top" colspan="2"><bold>BMI (kg/m</bold><sup><bold>2</bold></sup><bold>), mean (SD)</bold></td><td align="left" valign="top">29.51 
(7.45)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Missing data, n (%)</td><td align="left" valign="top">8449 (16.4)</td></tr><tr><td align="left" valign="top" colspan="3"><bold>CAD type, n (%)</bold></td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Non-ST elevation myocardial infarction</td><td align="left" valign="top">24,119 (46.83)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Unstable angina</td><td align="left" valign="top">10,671 (20.72)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Stable angina</td><td align="left" valign="top">9832 (19.09)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Missing data</td><td align="left" valign="top">6884 (13.37)</td></tr><tr><td align="left" valign="top" colspan="3"><bold>Canadian Cardiovascular Society angina grade, n (%)</bold></td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">II (slight limit)</td><td align="left" valign="top">4688 (9.1)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">IVb</td><td align="left" valign="top">7513 (14.59)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">IVa (hospitalized with acute coronary syndrome)</td><td align="left" valign="top">21,117 (41)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">III (marked limit)</td><td align="left" valign="top">2581 (5.01)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">IVc</td><td align="left" valign="top">1627 (3.16)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">I (strenuous)</td><td align="left" valign="top">1309 (2.54)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Atypical</td><td align="left" valign="top">698 (1.36)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Other or missing data</td><td align="left" 
valign="top">11,973 (23.25)</td></tr><tr><td align="left" valign="top" colspan="3"><bold>Diabetes, n (%)</bold></td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">No diabetes</td><td align="left" valign="top">37,544 (72.89)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Type II</td><td align="left" valign="top">12,067 (23.43)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Type I</td><td align="left" valign="top">806 (1.56)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Other</td><td align="left" valign="top">1089 (2.11)</td></tr><tr><td align="left" valign="top" colspan="2">Dyslipidemia, n (%)</td><td align="left" valign="top">32,967 (64.01)</td></tr><tr><td align="left" valign="top" colspan="2">Heart failure, n (%)</td><td align="left" valign="top">3689 (7.16)</td></tr><tr><td align="left" valign="top" colspan="2">Atrial fibrillation or flutter, n (%)</td><td align="left" valign="top">1220 (2.37)</td></tr><tr><td align="left" valign="top" colspan="2">Hypertension, n (%)</td><td align="left" valign="top">32,264 (62.64)</td></tr><tr><td align="left" valign="top" colspan="2">Angina, n (%)</td><td align="left" valign="top">2559 (4.97)</td></tr><tr><td align="left" valign="top" colspan="2">Family history of CAD, n (%)</td><td align="left" valign="top">15,209 (29.53)</td></tr><tr><td align="left" valign="top" colspan="3"><bold>Smoking, n (%)</bold></td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Never</td><td align="left" valign="top">25,822 (50.13)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Current</td><td align="left" valign="top">11,196 (21.74)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Past</td><td align="left" valign="top">14,488 (28.13)</td></tr><tr><td align="left" valign="top" colspan="2">Chronic lung disease, n (%)</td><td align="left" valign="top">5318 
(10.33)</td></tr><tr><td align="left" valign="top" colspan="2">Cerebrovascular disease, n (%)</td><td align="left" valign="top">2040 (3.96)</td></tr><tr><td align="left" valign="top" colspan="2">Psychiatric history, n (%)</td><td align="left" valign="top">1097 (2.13)</td></tr><tr><td align="left" valign="top" colspan="2">Venous insufficiency, n (%)</td><td align="left" valign="top">476 (0.92)</td></tr><tr><td align="left" valign="top" colspan="2">Alcohol consumption, n (%)</td><td align="left" valign="top">599 (1.16)</td></tr><tr><td align="left" valign="top" colspan="3"><bold>Extent of CAD, n (%)</bold></td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">3 VDs<sup><xref ref-type="table-fn" rid="table1fn2">b</xref></sup></td><td align="left" valign="top">247 (0.48)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">3 VDs (one &#x003E;75%)</td><td align="left" valign="top">7765 (15.08)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">3 VDs (&#x003E;75% proximal LAD<sup><xref ref-type="table-fn" rid="table1fn3">c</xref></sup>)</td><td align="left" valign="top">5704 (11.07)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">3 VDs (proximal LAD)</td><td align="left" valign="top">3318 (6.44)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">2 VDs</td><td align="left" valign="top">5392 (10.47)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">2 VDs (&#x003E;75% LAD)</td><td align="left" valign="top">569 (1.1)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">2 VDs (both &#x003E;75%)</td><td align="left" valign="top">5215 (10.13)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">2 VDs (&#x003E;75% proximal LAD)</td><td align="left" valign="top">2819 (5.47)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">1 VD (&#x003E;75% proximal LAD)</td><td align="left" 
valign="top">2299 (4.46)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">1 VD (&#x003E;75%)</td><td align="left" valign="top">8504 (16.51)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">1 VD (50%&#x2010;75%)</td><td align="left" valign="top">4032 (7.83)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Severe left main disease</td><td align="left" valign="top">3058 (5.94)</td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Left main disease</td><td align="left" valign="top">2584 (5.02)</td></tr></tbody></table><table-wrap-foot><fn id="table1fn1"><p><sup>a</sup>CAD: coronary artery disease.</p></fn><fn id="table1fn2"><p><sup>b</sup>VD: vessel disease.</p></fn><fn id="table1fn3"><p><sup>c</sup>LAD: left anterior descending.</p></fn></table-wrap-foot></table-wrap><table-wrap id="t2" position="float"><label>Table 2.</label><caption><p>Summary statistics of the DAD<sup><xref ref-type="table-fn" rid="table2fn1">a</xref></sup>, NACRS<sup><xref ref-type="table-fn" rid="table2fn2">b</xref></sup>, and PIN<sup><xref ref-type="table-fn" rid="table2fn3">c</xref></sup> data sets.</p></caption><table id="table2" frame="hsides" rules="groups"><thead><tr><td align="left" valign="bottom">Summary statistics</td><td align="left" valign="bottom" colspan="3">Data set</td></tr><tr><td align="left" valign="bottom"/><td align="left" valign="bottom">DAD</td><td align="left" valign="bottom">NACRS</td><td align="left" valign="bottom">PIN</td></tr></thead><tbody><tr><td align="left" valign="top">Patients with at least 1 record, n</td><td align="left" valign="top">49,075</td><td align="left" valign="top">50,628</td><td align="left" valign="top">49,052</td></tr><tr><td align="left" valign="top">Records, n</td><td align="left" valign="top">273,910</td><td align="left" valign="top">3,974,403</td><td align="left" valign="top">28,807,136</td></tr><tr><td align="left" valign="top">Aggregated records, 
n</td><td align="left" valign="top">166,083</td><td align="left" valign="top">173,507</td><td align="left" valign="top">997,997</td></tr><tr><td align="left" valign="top">Unique <italic>ICD-10-CA<sup><xref ref-type="table-fn" rid="table2fn4">d</xref></sup></italic> or ATC<sup><xref ref-type="table-fn" rid="table2fn5">e</xref></sup> codes and their parent codes, n</td><td align="left" valign="top">9651</td><td align="left" valign="top">7803</td><td align="left" valign="top">2315</td></tr><tr><td align="left" valign="top">Codes per aggregated record, mean (SD)</td><td align="left" valign="top">24.90 (16.55)</td><td align="left" valign="top">15.27 (12.55)</td><td align="left" valign="top">33.31 (18.95)</td></tr><tr><td align="left" valign="top">Time range</td><td align="left" valign="top">2004&#x2010;2022</td><td align="left" valign="top">2010&#x2010;2022</td><td align="left" valign="top">2004&#x2010;2022</td></tr></tbody></table><table-wrap-foot><fn id="table2fn1"><p><sup>a</sup>DAD: Discharge Abstract Database.</p></fn><fn id="table2fn2"><p><sup>b</sup>NACRS: National Ambulatory Care Reporting System.</p></fn><fn id="table2fn3"><p><sup>c</sup>PIN: Pharmaceutical Information Network.</p></fn><fn id="table2fn4"><p><sup>d</sup><italic>ICD-10-CA</italic>: <italic>International Classification of Diseases, Tenth Revision, Canada</italic>.</p></fn><fn id="table2fn5"><p><sup>e</sup>ATC: Anatomical Therapeutic Chemical.</p></fn></table-wrap-foot></table-wrap></sec><sec id="s3-2"><title>Performances of the Feature Selection Methods</title><p><xref ref-type="table" rid="table3">Table 3</xref> shows the accuracies and binary cross entropies of the models based on the selected features from each method. 
<xref ref-type="table" rid="table4">Table 4</xref> shows the accuracy, <italic>F</italic><sub>1</sub>-score, and area under the receiver operating characteristic curve (AUC-ROC) metrics of the XGBoost models to predict 90-day mortality.</p><table-wrap id="t3" position="float"><label>Table 3.</label><caption><p>Average accuracy and binary cross entropy (BCE) loss of different sets of selected features in reconstructing the original feature space in a neural network structure.</p></caption><table id="table3" frame="hsides" rules="groups"><thead><tr><td align="left" valign="bottom">Feature selection method</td><td align="left" valign="bottom" colspan="2">DAD<sup><xref ref-type="table-fn" rid="table3fn1">a</xref></sup></td><td align="left" valign="bottom" colspan="2">NACRS<sup><xref ref-type="table-fn" rid="table3fn2">b</xref></sup></td><td align="left" valign="bottom" colspan="2">PIN<sup><xref ref-type="table-fn" rid="table3fn3">c</xref></sup></td></tr><tr><td align="left" valign="bottom"/><td align="left" valign="bottom">Accuracy, mean (95% CI)</td><td align="left" valign="bottom">BCE, mean (95% CI)</td><td align="left" valign="bottom">Accuracy, mean (95% CI)</td><td align="left" valign="bottom">BCE, mean (95% CI)</td><td align="left" valign="bottom">Accuracy, mean (95% CI)</td><td align="left" valign="bottom">BCE, mean (95% CI)</td></tr></thead><tbody><tr><td align="left" valign="top">CAEWW<sup><xref ref-type="table-fn" rid="table3fn4">d</xref></sup></td><td align="left" valign="top">0.9992<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9992-0.9993)</td><td align="left" valign="top">0.0121<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0121-0.0121)</td><td align="left" valign="top">0.9994<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9994-0.9995)</td><td align="left" valign="top">0.0091<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0091-0.0091)</td><td align="left" valign="top">0.9972<sup><xref 
ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9969-0.9975)</td><td align="left" valign="top">0.0432<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0432-0.0432)</td></tr><tr><td align="left" valign="top">CAENW<sup><xref ref-type="table-fn" rid="table3fn6">f</xref></sup></td><td align="left" valign="top">0.9992<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9991-0.9993)</td><td align="left" valign="top">0.0121<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0121-0.0121)</td><td align="left" valign="top">0.9994<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9993-0.9994)</td><td align="left" valign="top">0.0094<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0094-0.0094)</td><td align="left" valign="top">0.9972<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9969-0.9974)</td><td align="left" valign="top">0.0438<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0438-0.0438)</td></tr><tr><td align="left" valign="top">AEFS<sup><xref ref-type="table-fn" rid="table3fn7">g</xref></sup></td><td align="left" valign="top">0.9976 (0.9972-0.9980)</td><td align="left" valign="top">0.0370 (0.0370-0.0370)</td><td align="left" valign="top">0.9982 (0.9979-0.9985)</td><td align="left" valign="top">0.0274 (0.0274-0.0274)</td><td align="left" valign="top">0.9884 (0.9867-0.9901)</td><td align="left" valign="top">0.1794 (0.1794-0.1794)</td></tr><tr><td align="left" valign="top">MCFS<sup><xref ref-type="table-fn" rid="table3fn8">h</xref></sup></td><td align="left" valign="top">0.9991<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9990-0.9991)</td><td align="left" valign="top">0.0145<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0145-0.0145)</td><td align="left" valign="top">0.9992<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9992-0.9993)</td><td align="left" valign="top">0.0117<sup><xref ref-type="table-fn" 
rid="table3fn5">e</xref></sup> (0.0117-0.0117)</td><td align="left" valign="top">0.9956<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9951-0.9962)</td><td align="left" valign="top">0.0677<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0677-0.0677)</td></tr><tr><td align="left" valign="top">PFA<sup><xref ref-type="table-fn" rid="table3fn9">i</xref></sup></td><td align="left" valign="top">0.9975 (0.9971-0.9979)</td><td align="left" valign="top">0.0382 (0.0382-0.0382)</td><td align="left" valign="top">0.9981 (0.9978-0.9985)</td><td align="left" valign="top">0.0286 (0.0286-0.0286)</td><td align="left" valign="top">0.9871 (0.9852-0.9891)</td><td align="left" valign="top">0.1982 (0.1982-0.1982)</td></tr><tr><td align="left" valign="top">LS<sup><xref ref-type="table-fn" rid="table3fn10">j</xref></sup></td><td align="left" valign="top">0.9989<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9988-0.9990)</td><td align="left" valign="top">0.0165<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0165-0.0165)</td><td align="left" valign="top">0.9991<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9990-0.9992)</td><td align="left" valign="top">0.0136<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0136-0.0136)</td><td align="left" valign="top">0.9945<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.9938-0.9952)</td><td align="left" valign="top">0.0850<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (0.0850-0.0850)</td></tr><tr><td align="left" valign="top"><italic>Mode of the training set (baseline model</italic>)</td><td align="left" valign="top">0.9975 (0.9971-0.9979)</td><td align="left" valign="top">0.0384 (0.0322-0.0447)</td><td align="left" valign="top">0.9981 (0.9978-0.9984)</td><td align="left" valign="top">0.0294 (0.0245-0.0342)</td><td align="left" valign="top">0.9870 (0.9850-0.9889)</td><td align="left" valign="top">0.2012 
(0.1712-0.2312)</td></tr></tbody></table><table-wrap-foot><fn id="table3fn1"><p><sup>a</sup>DAD: Discharge Abstract Database.</p></fn><fn id="table3fn2"><p><sup>b</sup>NACRS: National Ambulatory Care Reporting System.</p></fn><fn id="table3fn3"><p><sup>c</sup>PIN: Pharmaceutical Information Network.</p></fn><fn id="table3fn4"><p><sup>d</sup>CAEWW: concrete autoencoder with weight adjustment.</p></fn><fn id="table3fn5"><p><sup>e</sup>Significantly different from the baseline model that outputs the mode of each class (<italic>P</italic>&#x003C;.05). The <italic>P</italic> values are presented in Table S1 of <xref ref-type="supplementary-material" rid="app2">Multimedia Appendix 2</xref>.</p></fn><fn id="table3fn6"><p><sup>f</sup>CAENW: concrete autoencoder with no weight adjustment.</p></fn><fn id="table3fn7"><p><sup>g</sup>AEFS: autoencoder-inspired unsupervised feature selection.</p></fn><fn id="table3fn8"><p><sup>h</sup>MCFS: unsupervised feature selection for multicluster data.</p></fn><fn id="table3fn9"><p><sup>i</sup>PFA: principal feature analysis.</p></fn><fn id="table3fn10"><p><sup>j</sup>LS: Laplacian score.</p></fn></table-wrap-foot></table-wrap><table-wrap id="t4" position="float"><label>Table 4.</label><caption><p>Performance of the extreme gradient boosting (XGBoost) model in predicting 90-day mortality using different sets of selected features.</p></caption><table id="table4" frame="hsides" rules="groups"><thead><tr><td align="left" valign="bottom">Feature selection method</td><td align="left" valign="bottom" colspan="3">DAD<sup><xref ref-type="table-fn" rid="table4fn1">a</xref></sup></td><td align="left" valign="bottom" colspan="3">NACRS<sup><xref ref-type="table-fn" rid="table4fn2">b</xref></sup></td><td align="left" valign="bottom" colspan="3">PIN<sup><xref ref-type="table-fn" rid="table4fn3">c</xref></sup></td></tr><tr><td align="left" valign="top"/><td align="left" valign="top">Accuracy</td><td align="left" 
valign="top"><italic>F</italic><sub>1</sub>-score</td><td align="left" valign="top">AUC-ROC<sup><xref ref-type="table-fn" rid="table4fn4">d</xref></sup></td><td align="left" valign="top">Accuracy</td><td align="left" valign="top"><italic>F</italic><sub>1</sub>-score</td><td align="left" valign="top">AUC-ROC</td><td align="left" valign="top">Accuracy</td><td align="left" valign="top"><italic>F</italic><sub>1</sub>-score</td><td align="left" valign="top">AUC-ROC</td></tr></thead><tbody><tr><td align="left" valign="top">CAEWW<sup><xref ref-type="table-fn" rid="table4fn5">e</xref></sup></td><td align="left" valign="top">0.86</td><td align="left" valign="top">0.37</td><td align="left" valign="top">0.87</td><td align="left" valign="top">0.85</td><td align="left" valign="top">0.15</td><td align="left" valign="top">0.75</td><td align="left" valign="top">0.84</td><td align="left" valign="top">0.1</td><td align="left" valign="top">0.82</td></tr><tr><td align="left" valign="top">CAENW<sup><xref ref-type="table-fn" rid="table4fn6">f</xref></sup></td><td align="left" valign="top">0.86</td><td align="left" valign="top">0.36</td><td align="left" valign="top">0.87<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.86</td><td align="left" valign="top">0.15</td><td align="left" valign="top">0.75</td><td align="left" valign="top">0.83</td><td align="left" valign="top">0.1</td><td align="left" valign="top">0.82</td></tr><tr><td align="left" valign="top">AEFS<sup><xref ref-type="table-fn" rid="table4fn8">h</xref></sup></td><td align="left" valign="top">0.88</td><td align="left" valign="top">0.21</td><td align="left" valign="top">0.61<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.9</td><td align="left" valign="top">0.09</td><td align="left" valign="top">0.56<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.85</td><td align="left" 
valign="top">0.08</td><td align="left" valign="top">0.69<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td></tr><tr><td align="left" valign="top">MCFS<sup><xref ref-type="table-fn" rid="table4fn9">i</xref></sup></td><td align="left" valign="top">0.86</td><td align="left" valign="top">0.37</td><td align="left" valign="top">0.89<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.84</td><td align="left" valign="top">0.13</td><td align="left" valign="top">0.74</td><td align="left" valign="top">0.81</td><td align="left" valign="top">0.09</td><td align="left" valign="top">0.84<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td></tr><tr><td align="left" valign="top">PFA<sup><xref ref-type="table-fn" rid="table4fn10">j</xref></sup></td><td align="left" valign="top">0.92</td><td align="left" valign="top">0.05</td><td align="left" valign="top">0.5<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.97</td><td align="left" valign="top">0.02</td><td align="left" valign="top">0.5<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.93</td><td align="left" valign="top">0.05</td><td align="left" valign="top">0.54<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td></tr><tr><td align="left" valign="top">LS<sup><xref ref-type="table-fn" rid="table4fn11">k</xref></sup></td><td align="left" valign="top">0.77</td><td align="left" valign="top">0.26</td><td align="left" valign="top">0.81<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.8</td><td align="left" valign="top">0.11</td><td align="left" valign="top">0.73<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup></td><td align="left" valign="top">0.76</td><td align="left" valign="top">0.08</td><td align="left" valign="top">0.82</td></tr></tbody></table><table-wrap-foot><fn 
id="table4fn1"><p><sup>a</sup>DAD: Discharge Abstract Database.</p></fn><fn id="table4fn2"><p><sup>b</sup>NACRS: National Ambulatory Care Reporting System.</p></fn><fn id="table4fn3"><p><sup>c</sup>PIN: Pharmaceutical Information Network.</p></fn><fn id="table4fn4"><p><sup>d</sup>AUC-ROC: area under the receiver operating characteristic curve.</p></fn><fn id="table4fn5"><p><sup>e</sup>CAEWW: concrete autoencoder with weight adjustment.</p></fn><fn id="table4fn6"><p><sup>f</sup>CAENW: concrete autoencoder with no weight adjustment.</p></fn><fn id="table4fn7"><p><sup>g</sup>Significantly different from the AUC-ROC of the model trained on CAEWW features in their corresponding data set (<italic>P</italic>&#x003C;.05) using the DeLong test [<xref ref-type="bibr" rid="ref20">20</xref>]. The <italic>P</italic> values are presented in Table S2 in <xref ref-type="supplementary-material" rid="app2">Multimedia Appendix 2</xref>.</p></fn><fn id="table4fn8"><p><sup>h</sup>AEFS: autoencoder-inspired unsupervised feature selection.</p></fn><fn id="table4fn9"><p><sup>i</sup>MCFS: unsupervised feature selection for multicluster data.</p></fn><fn id="table4fn10"><p><sup>j</sup>PFA: principal feature analysis.</p></fn><fn id="table4fn11"><p><sup>k</sup>LS: Laplacian score.</p></fn></table-wrap-foot></table-wrap><p>Both tables indicate that the CAE methods generally selected superior features compared to the other algorithms. Adjusting the weights within the CAE improved the performance of feature space reconstruction slightly. In terms of predicting 90-day mortality, the CAE methods again performed better than the other methods, as evidenced by the AUC-ROC. This superior performance was statistically significant in most instances (<italic>P</italic>&#x003C;.05), according to the DeLong test for AUC-ROC. 
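The DeLong comparison of two correlated AUC-ROCs on the same test set can be sketched as follows (a compact NumPy illustration; the function names are ours, not the study's implementation):

```python
import math
import numpy as np

def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for the difference between two correlated
    AUC-ROCs estimated from the same test set."""
    y_true = np.asarray(y_true)
    pos = np.flatnonzero(y_true == 1)
    neg = np.flatnonzero(y_true == 0)

    def structural_components(s):
        # Midrank kernel: 1 if positive scored above negative, 0.5 on ties.
        x, y = s[pos][:, None], s[neg][None, :]
        psi = (x > y) + 0.5 * (x == y)
        return psi.mean(axis=1), psi.mean(axis=0), psi.mean()

    v10_a, v01_a, auc_a = structural_components(np.asarray(scores_a))
    v10_b, v01_b, auc_b = structural_components(np.asarray(scores_b))
    s10 = np.cov(v10_a, v10_b)   # covariance over positives
    s01 = np.cov(v01_a, v01_b)   # covariance over negatives
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / len(pos)
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / len(neg))
    z = (auc_a - auc_b) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return auc_a, auc_b, p
```

Because the two models are scored on the same patients, their AUC estimates are correlated; the covariance terms above account for that, which a naive unpaired comparison would miss.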
Furthermore, the McNemar test revealed a significant difference between the overall performance of the mortality prediction models trained on the features of the CAEWW method and those based on the other methods (<italic>P</italic>&#x003C;.05). The <italic>P</italic> values of the McNemar and DeLong tests can be found in Table S2 of <xref ref-type="supplementary-material" rid="app2">Multimedia Appendix 2</xref>.</p><p><xref ref-type="fig" rid="figure1">Figure 1</xref> shows the log-scale histograms of the original feature space reconstruction accuracy in each <italic>ICD-10-CA</italic> or ATC code for different feature selection methods. It shows that CAEWW and CAENW were the best methods in terms of reconstructing the majority of the features with high accuracy. The other methods, despite having high average accuracy, performed poorly in reconstructing some of the features.</p><fig position="float" id="figure1"><label>Figure 1.</label><caption><p>Log-scale histograms of the initial feature space reconstruction accuracy in each International Classification of Diseases (ICD) code for different feature selection methods and different data sets: (A) concrete autoencoder with weight adjustment (CAEWW), (B) concrete autoencoder with no weight adjustment (CAENW), (C) autoencoder-inspired unsupervised feature selection (AEFS), (D) unsupervised feature selection for multicluster data (MCFS), (E) principal feature analysis (PFA), and (F) Laplacian score (LS). 
DAD: Discharge Abstract Database; NACRS: National Ambulatory Care Reporting System; PIN: Pharmaceutical Information Network.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="medinform_v12i1e52896_fig01.png"/></fig></sec><sec id="s3-3"><title>Characteristics of the Selected Features</title><p>We also calculated the average depths of the codes selected by each method and compared them (using 2-tailed <italic>t</italic> tests) against the CAEWW method, which is intended to select more general codes (ie, smaller depths). The CAEWW method selected codes with average depths of 1.38, 1.42, and 1.99 for the DAD, NACRS, and PIN data sets, respectively. Although CAEWW&#x2019;s code depths were significantly lower than the depths of the codes selected by the other methods (all <italic>P</italic>&#x003C;.05), there was no significant difference between the CAEWW and CAENW methods (with average depths of 1.48, 1.45, and 2.03; all <italic>P</italic>&#x003E;.05). The <italic>P</italic> values can be found in Table S3 of <xref ref-type="supplementary-material" rid="app2">Multimedia Appendix 2</xref>. <xref ref-type="fig" rid="figure2">Figure 2</xref> illustrates the difference in average code depth among the different methods.</p><p>We used the average of the mean absolute Shapley values from each mortality prediction model as an index of the importance of the features selected by each method. To compare the CAEWW method with the other methods across the 3 data sets, we conducted 2-tailed <italic>t</italic> tests. The CAEWW method did not differ significantly from the CAENW method in any data set (all <italic>P</italic>&#x003E;.05). However, the CAEWW method did yield significantly higher mean absolute Shapley values than the AEFS and PFA methods (all <italic>P</italic>&#x003C;.001). 
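The importance index described above can be sketched as follows (a minimal NumPy illustration with a placeholder Shapley matrix; in the study, the values came from tree-based Shapley analysis of the XGBoost models):

```python
import numpy as np

def feature_importance_index(shap_values: np.ndarray) -> np.ndarray:
    """Mean absolute Shapley value per feature.

    shap_values has shape (n_samples, n_features): one Shapley value per
    sample per feature. Averaging the absolute values over samples gives a
    per-feature importance score; averaging those scores in turn summarizes
    a whole feature set, as done when comparing selection methods."""
    return np.abs(shap_values).mean(axis=0)

# Toy placeholder: 3 samples, 2 features; feature 0 contributes more.
toy = np.array([[0.5, 0.1], [-0.7, 0.0], [0.6, -0.2]])
importance = feature_importance_index(toy)   # per-feature scores
overall = importance.mean()                  # feature-set summary
```

Taking absolute values before averaging matters: Shapley values are signed contributions, so positive and negative effects would otherwise cancel and understate a feature's influence.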
The AEFS and PFA methods had the lowest Shapley values compared to all the other methods, indicating that they selected lower-quality features for this task. The <italic>P</italic> values are available in Table S4 in <xref ref-type="supplementary-material" rid="app2">Multimedia Appendix 2</xref>. <xref ref-type="fig" rid="figure3">Figure 3</xref> illustrates the aforementioned differences in mean absolute Shapley values. <xref ref-type="fig" rid="figure4">Figure 4</xref> shows the Shapley plots of the 20 most important features selected by the CAEWW method across different data sets. The corresponding Shapley plots for the other methods can be found in <xref ref-type="supplementary-material" rid="app3">Multimedia Appendix 3</xref> (Figures S1-S5). Additionally, <xref ref-type="supplementary-material" rid="app4">Multimedia Appendix 4</xref> (Tables S1-S18) includes all chosen features, detailed descriptions, and the average absolute Shapley values across all data sets and methods.</p><fig position="float" id="figure2"><label>Figure 2.</label><caption><p>Average depths of the selected codes by each method in the <italic>ICD-10-CA</italic> or ATC tree. Methods with average depths significantly (<italic>P</italic>&#x003C;.05) larger than the CAEWW method in their corresponding data set are marked with asterisks (*). 
AEFS: autoencoder-inspired unsupervised feature selection; ATC: Anatomical Therapeutic Chemical; CAENW: concrete autoencoder with no weight adjustment; CAEWW: concrete autoencoder with weight adjustment; DAD: Discharge Abstract Database; <italic>ICD-10-CA</italic>: <italic>International Classification of Diseases, Tenth Revision, Canada</italic>; LS: Laplacian score; MCFS: unsupervised feature selection for multicluster data; NACRS: National Ambulatory Care Reporting System; PFA: principal feature analysis; PIN: Pharmaceutical Information Network.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="medinform_v12i1e52896_fig02.png"/></fig><fig position="float" id="figure3"><label>Figure 3.</label><caption><p>Average of mean absolute Shapley values of features in each mortality prediction model. Methods with average values significantly (<italic>P</italic>&#x003C;.05) smaller than the CAEWW method in their corresponding data set are marked with asterisks (*). AEFS: autoencoder-inspired unsupervised feature selection; CAENW: concrete autoencoder with no weight adjustment; CAEWW: concrete autoencoder with weight adjustment; DAD: Discharge Abstract Database; LS: Laplacian score; MCFS: unsupervised feature selection for multicluster data; NACRS: National Ambulatory Care Reporting System; PFA: principal feature analysis; PIN: Pharmaceutical Information Network.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="medinform_v12i1e52896_fig03.png"/></fig><fig position="float" id="figure4"><label>Figure 4.</label><caption><p>SHAP values of the features selected by the CAEWW method across different data sets (20 most important features): (A) Discharge Abstract Database (DAD), (B) National Ambulatory Care Reporting System (NACRS), and (C) Pharmaceutical Information Network (PIN). 
CAEWW: concrete autoencoder with weight adjustment; SHAP: Shapley additive explanations.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="medinform_v12i1e52896_fig04.png"/></fig></sec></sec><sec id="s4" sec-type="discussion"><title>Discussion</title><sec id="s4-1"><title>Principal Findings</title><p>The high dimensionality of the ICD and ATC code databases necessitates the use of dimensionality reduction techniques to feed the data into machine learning models. Due to interpretability concerns in the health domain, selecting from original features, rather than transforming them into new features, is an essential step in reducing dimensionality. In this study, we demonstrated that the CAE methods performed the best in selecting the most informative ICD and ATC codes in an unsupervised setting. Using a clinical outcome as a case study, we also demonstrated that ICD and ATC code features selected by the CAE methods were able to predict the outcome variable with better accuracy than the other methods in the study, even though they were derived from an unsupervised setting in the absence of the target variable. This indicates that the selected features can be considered unbiased toward a specific target variable and explain the phenomenon appropriately. We also showed that the AEFS and PFA methods did not select high-quality features in our data set and were unsuitable for both tasks: reconstructing the feature space and predicting 90-day mortality. The LS and MCFS methods, however, performed better on both tasks, though still slightly below the CAE methods. 
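For readers unfamiliar with how a CAE selects discrete features differentiably, the following minimal sketch illustrates the concrete (Gumbel-softmax) relaxation that underlies its selector layer. This is a plain-Python illustration, not the authors' implementation (which is available in their public repository); the logits and temperatures are hypothetical.

```python
import math
import random

def sample_concrete(log_alpha, temperature, rng=random.random):
    """One relaxed one-hot sample from the Concrete (Gumbel-softmax)
    distribution over input features, given per-feature logits log_alpha."""
    # Gumbel(0, 1) noise: g = -log(-log u), u ~ Uniform(0, 1)
    gumbel = [-math.log(-math.log(rng())) for _ in log_alpha]
    scores = [(a + g) / temperature for a, g in zip(log_alpha, gumbel)]
    m = max(scores)  # numerically stabilized softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
logits = [0.1, 2.0, -1.0, 0.5]                     # learnable selector parameters
hot = sample_concrete(logits, temperature=10.0)    # early training: soft mixture
cold = sample_concrete(logits, temperature=0.001)  # after annealing: nearly one-hot
print(max(hot), max(cold))
```

At high temperature the sample spreads its mass across many features, keeping the selection differentiable; as the temperature anneals toward 0, the softmax sharpens until each selector neuron effectively commits to a single input code.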
Furthermore, the features selected by the CAE methods (especially CAEWW) were generally higher-level codes (ie, lower depth in the hierarchical structure), which supports more generalizable solutions.</p><p>It is worth mentioning that our methodology code is publicly shared, allowing other researchers to use the desired methods for selecting the most informative features within cohorts with large ICD, ATC, or other hierarchical-coded health databases [<xref ref-type="bibr" rid="ref19">19</xref>].</p></sec><sec id="s4-2"><title>Computational Cost</title><p>The MCFS, PFA, and LS methods involved several specialized matrix operations that made them computationally expensive. Given our large-scale, high-dimensional data set, it was not feasible to run these algorithms on a personal computer, and we had to optimize the operations for an advanced computing cluster with 40 Intel Xeon Gold 6342 2.80 GHz CPUs and 2048 GB RAM. The MCFS and LS feature selection experiments shared some operations and together took over 2 days to complete. The PFA method required less than a day for its entire set of feature selection experiments. The AEFS and CAE methods, however, had the advantage of using GPUs and optimized deep learning libraries for training the neural networks and were faster: each took less than 4 hours on an Nvidia A100 80 GB GPU.</p></sec><sec id="s4-3"><title>Selected Features</title><p>The Shapley analysis of the 20 most important features selected in each data set using the CAEWW method for predicting mortality revealed the multidimensional capabilities of this method in identifying relevant information. In the DAD and NACRS data sets, it selected disease codes relevant to mortality among patients with CAD. 
In both data sets, diseases related to cardiovascular conditions, hypertensive and circulatory disorders, metabolic disorders, renal failure [<xref ref-type="bibr" rid="ref21">21</xref>], and cancer [<xref ref-type="bibr" rid="ref22">22</xref>] were selected, all of which are important factors in the outcomes of patients with CAD. Furthermore, the DAD-based features included accidents, arthropathies, and hospitalization-specific conditions, whereas the NACRS-based features included falls [<xref ref-type="bibr" rid="ref23">23</xref>]; digestive disorders [<xref ref-type="bibr" rid="ref24">24</xref>]; and codes related to rehabilitation, management, or complexity of disease. In the PIN data set, direct interventions for CAD and related risk factors were mainly selected, including high-ceiling diuretics, statins, ACE inhibitors, angiotensin II receptor blockers (plain and combinations), direct factor Xa inhibitors, vasodilators, antianemic preparations, antithrombotic agents, and other lipid-modifying agents, addressing heart failure, cholesterol management, blood pressure control, anticoagulation, anemia, and blood flow. 
Drugs related to diseases or conditions that commonly accompany CAD were also selected: gastrointestinal issues (pantoprazole and general drugs for acid-related disorders [<xref ref-type="bibr" rid="ref25">25</xref>,<xref ref-type="bibr" rid="ref26">26</xref>] and drugs for constipation [<xref ref-type="bibr" rid="ref27">27</xref>]), pain management (opioids, other analgesics, and antipyretics) [<xref ref-type="bibr" rid="ref28">28</xref>], inflammatory conditions and immune responses (anti-inflammatory and antirheumatic products and corticosteroids for systemic use) [<xref ref-type="bibr" rid="ref29">29</xref>,<xref ref-type="bibr" rid="ref30">30</xref>], mental and behavioral health (antipsychotics) [<xref ref-type="bibr" rid="ref31">31</xref>], respiratory conditions (adrenergic inhalants [<xref ref-type="bibr" rid="ref32">32</xref>]), and urological issues [<xref ref-type="bibr" rid="ref33">33</xref>].</p><p>Previous studies typically selected codes as machine learning model features based on expert opinions, the presence of high-level codes (eg, categories or chapters), or a combination of both [<xref ref-type="bibr" rid="ref34">34</xref>,<xref ref-type="bibr" rid="ref35">35</xref>]. To the best of our knowledge, only 1 study [<xref ref-type="bibr" rid="ref3">3</xref>] attempted to offer a sophisticated feature selection method, using tree-lasso regularization for ICD code data sets, but it operated in a supervised setting that required an outcome variable. Our study provides a general tool for health researchers to select the most informative ICD and ATC codes without biasing the study toward a specific outcome variable. 
We also introduced a unique target weight adjustment function to the CAE model to guide it toward higher-level codes in the ICD hierarchy compared with the unadjusted model.</p></sec><sec id="s4-4"><title>Limitations and Future Work</title><p>One of the limitations of this study was the inability of the CAE method to select an exact number of desired features. Since the neurons in the concrete selector layer work independently, duplicate features can be selected. Therefore, the number of final selected features can be fewer than the desired number. Although this indicates that the decoder model is still capable of reconstructing the initial feature space with a smaller number of features, some researchers may prefer an exact number of features for their models. One previous study [<xref ref-type="bibr" rid="ref36">36</xref>] used a special regularization term during training to prevent the model from selecting duplicate features. This method can be investigated for the ICD and ATC codes in the future.</p><p>The aggregation of codes should be viewed as a trade-off in this study. We needed to select a reasonable aggregation period that covers both long-term and short-term diseases. A shorter period could skew the results by including multiple correlated records from the same patient. Conversely, longer periods could weigh short-term diseases equally with long-term ones, and the codes of the patients with fewer records (eg, recent patients in the cohort) would have a lower chance of selection.</p><p>Another limitation was that we only used 3 data sets of a specific disease cohort to choose the features. Therefore, the selected features in this study may not generalize to other patient cohorts or diseases. Furthermore, we selected the 100 best features, but other data sets or patient cohorts may require a different number of features. 
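The duplicate-selection limitation noted above can be made concrete with a toy simulation (illustrative sizes only, not the study's trained model): each selector neuron independently keeps the arg max of its own logits, so two neurons can converge to the same input feature and the final unique set falls short of the requested size.

```python
import random

random.seed(1)
n_features, n_selectors = 300, 100  # hypothetical sizes, not the study's

# Each selector neuron converges to the arg max of its own logits over all
# input features; because the neurons act independently, collisions can occur.
selected = []
for _ in range(n_selectors):
    logits = [random.gauss(0.0, 1.0) for _ in range(n_features)]
    selected.append(max(range(n_features), key=logits.__getitem__))

unique = set(selected)
print(f"requested {n_selectors} features, obtained {len(unique)} unique")
```

In the study this meant the effective number of selected codes could fall below the target of 100; a duplicate-penalizing regularization term, as in the cited work, is one way to close that gap.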
Future studies may investigate the impact of the number of features on the results. Moreover, our hyperparameter analysis was conducted within a constrained scope due to limited computational resources. Future studies could further explore the impact of a broader range of hyperparameter values. We anticipate that CAEs hold potential for this area due to their flexible neural network structure and optimized algorithms. A similar limitation also applies to the mortality prediction case study, where we only trained XGBoost and did not explore other model types.</p></sec><sec id="s4-5"><title>Conclusions</title><p>In this study, we investigated 5 different methods for selecting the best features in ICD and ATC code data sets in an unsupervised setting. We demonstrated that the CAE method can select features that better represent the whole data set and can be useful in further machine learning studies. We also introduced a weight adjustment for the CAE method on ICD and ATC code data sets that can improve the generalizability and interpretability of the models, given that it prioritizes selecting high-level definitions of diseases.</p><p>The CAEWW method outperformed all other methods in reconstructing the initial feature space across all data sets. We validated the selected features through a supervised learning task, predicting 90-day mortality after discharge using 3 distinct data sets. Features selected via the CAEWW method demonstrated significantly improved performance on this task, as evidenced by the DeLong and McNemar tests. 
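The McNemar comparison of paired classifier outputs can be sketched in a few lines. This is an illustration with toy labels only; the study applied the test to the real prediction outputs, and the function name and data here are hypothetical.

```python
def mcnemar_statistic(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar statistic for two classifiers evaluated
    on the same test set; only the discordant pairs enter the statistic."""
    b = sum(a == y != p for y, a, p in zip(y_true, pred_a, pred_b))  # A right, B wrong
    c = sum(a != y == p for y, a, p in zip(y_true, pred_a, pred_b))  # A wrong, B right
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)  # compared against chi-squared, 1 df

# Toy example with hypothetical labels (not the study's data)
y       = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
model_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
model_b = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
stat = mcnemar_statistic(y, model_a, model_b)
print(stat)  # discordant counts b=4, c=0 -> (|4-0|-1)**2 / 4 = 2.25
```

Because both models are scored on the same patients, the test conditions on the discordant pairs only, which makes it well suited to comparing feature sets on a shared mortality test set.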
Given the advantages of the CAE method, we recommend its use in the feature selection phase of EHD analysis with ICD or ATC codes.</p></sec></sec></body><back><ack><p>This study was supported by the Libin Cardiovascular Institute PhD Graduate Scholarship, the Alberta Innovates Graduate Scholarship, and a Project Grant from the Canadian Institutes of Health Research (PJT 178027).</p></ack><fn-group><fn fn-type="con"><p>PG led the conceptualization of the study, designed the research methodology, executed the study, performed the data analysis, and drafted the original manuscript. JL provided the data set essential for the research, contributed to study supervision, and critically reviewed and revised the manuscript. Both authors read and approved the final manuscript.</p></fn><fn fn-type="conflict"><p>JL is the chief technology officer, a co-founder, and a major shareholder of Symbiotic AI, Inc. The authors have no further conflicts of interest to declare.</p></fn></fn-group><glossary><title>Abbreviations</title><def-list><def-item><term id="abb1">AEFS</term><def><p>autoencoder-inspired unsupervised feature selection</p></def></def-item><def-item><term id="abb2">APPROACH</term><def><p>Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease</p></def></def-item><def-item><term id="abb3">ATC</term><def><p>Anatomical Therapeutic Chemical</p></def></def-item><def-item><term id="abb4">AUC-ROC</term><def><p>area under the receiver operating characteristic curve</p></def></def-item><def-item><term id="abb5">CAD</term><def><p>coronary artery disease</p></def></def-item><def-item><term id="abb6">CAE</term><def><p>concrete autoencoder</p></def></def-item><def-item><term id="abb7">CAENW</term><def><p>concrete autoencoder with no weight adjustment</p></def></def-item><def-item><term id="abb8">CAEWW</term><def><p>concrete autoencoder with weight adjustment</p></def></def-item><def-item><term id="abb9">DAD</term><def><p>Discharge Abstract 
Database</p></def></def-item><def-item><term id="abb10">EHD</term><def><p>electronic health data</p></def></def-item><def-item><term id="abb11">ICD</term><def><p>International Classification of Diseases</p></def></def-item><def-item><term id="abb12">ICD-10</term><def><p>International Classification of Diseases, Tenth Revision</p></def></def-item><def-item><term id="abb13">ICD-10-CA</term><def><p>International Classification of Diseases, Tenth Revision, Canada</p></def></def-item><def-item><term id="abb14">LS</term><def><p>Laplacian score</p></def></def-item><def-item><term id="abb15">MCFS</term><def><p>unsupervised feature selection for multicluster data</p></def></def-item><def-item><term id="abb16">NACRS</term><def><p>National Ambulatory Care Reporting System</p></def></def-item><def-item><term id="abb17">PCA</term><def><p>principal component analysis</p></def></def-item><def-item><term id="abb18">PFA</term><def><p>principal feature analysis</p></def></def-item><def-item><term id="abb19">PIN</term><def><p>Pharmaceutical Information Network</p></def></def-item><def-item><term id="abb20">XGBoost</term><def><p>extreme gradient boosting</p></def></def-item></def-list></glossary><ref-list><title>References</title><ref id="ref1"><label>1</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Jensen</surname><given-names>PB</given-names> </name><name name-style="western"><surname>Jensen</surname><given-names>LJ</given-names> </name><name name-style="western"><surname>Brunak</surname><given-names>S</given-names> </name></person-group><article-title>Mining electronic health records: towards better research applications and clinical care</article-title><source>Nat Rev Genet</source><year>2012</year><month>05</month><day>2</day><volume>13</volume><issue>6</issue><fpage>395</fpage><lpage>405</lpage><pub-id pub-id-type="doi">10.1038/nrg3208</pub-id><pub-id 
pub-id-type="medline">22549152</pub-id></nlm-citation></ref><ref id="ref2"><label>2</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Yu</surname><given-names>C</given-names> </name><name name-style="western"><surname>Liu</surname><given-names>J</given-names> </name><name name-style="western"><surname>Nemati</surname><given-names>S</given-names> </name><name name-style="western"><surname>Yin</surname><given-names>G</given-names> </name></person-group><article-title>Reinforcement learning in healthcare: a survey</article-title><source>ACM Comput Surv</source><year>2023</year><month>11</month><day>23</day><volume>55</volume><issue>1</issue><fpage>1</fpage><lpage>36</lpage><pub-id pub-id-type="doi">10.1145/3477600</pub-id></nlm-citation></ref><ref id="ref3"><label>3</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Kamkar</surname><given-names>I</given-names> </name><name name-style="western"><surname>Gupta</surname><given-names>SK</given-names> </name><name name-style="western"><surname>Phung</surname><given-names>D</given-names> </name><name name-style="western"><surname>Venkatesh</surname><given-names>S</given-names> </name></person-group><article-title>Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso</article-title><source>J Biomed Inform</source><year>2015</year><month>02</month><volume>53</volume><fpage>277</fpage><lpage>290</lpage><pub-id pub-id-type="doi">10.1016/j.jbi.2014.11.013</pub-id><pub-id pub-id-type="medline">25500636</pub-id></nlm-citation></ref><ref id="ref4"><label>4</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Berisha</surname><given-names>V</given-names> </name><name name-style="western"><surname>Krantsevich</surname><given-names>C</given-names> </name><name 
name-style="western"><surname>Hahn</surname><given-names>PR</given-names> </name><etal/></person-group><article-title>Digital medicine and the curse of dimensionality</article-title><source>NPJ Digit Med</source><year>2021</year><month>10</month><day>28</day><volume>4</volume><issue>1</issue><fpage>153</fpage><pub-id pub-id-type="doi">10.1038/s41746-021-00521-5</pub-id><pub-id pub-id-type="medline">34711924</pub-id></nlm-citation></ref><ref id="ref5"><label>5</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Solorio-Fern&#x00E1;ndez</surname><given-names>S</given-names> </name><name name-style="western"><surname>Carrasco-Ochoa</surname><given-names>JA</given-names> </name><name name-style="western"><surname>Mart&#x00ED;nez-Trinidad</surname><given-names>JF</given-names> </name></person-group><article-title>A review of unsupervised feature selection methods</article-title><source>Artif Intell Rev</source><year>2020</year><month>02</month><volume>53</volume><issue>2</issue><fpage>907</fpage><lpage>948</lpage><pub-id pub-id-type="doi">10.1007/s10462-019-09682-y</pub-id></nlm-citation></ref><ref id="ref6"><label>6</label><nlm-citation citation-type="preprint"><person-group person-group-type="author"><name name-style="western"><surname>Abid</surname><given-names>A</given-names> </name><name name-style="western"><surname>Balin</surname><given-names>MF</given-names> </name><name name-style="western"><surname>Zou</surname><given-names>J</given-names> </name></person-group><article-title>Concrete autoencoders for differentiable feature selection and reconstruction</article-title><source>arXiv</source><comment>Preprint posted online on  Jan 27, 2019</comment><pub-id pub-id-type="doi">10.48550/arXiv.1901.09346</pub-id></nlm-citation></ref><ref id="ref7"><label>7</label><nlm-citation citation-type="book"><person-group person-group-type="author"><collab>World Health 
Organization</collab></person-group><source>International Statistical Classification of Diseases and Related Health Problems: Alphabetical Index</source><year>2004</year><publisher-name>World Health Organization</publisher-name><pub-id pub-id-type="other">978-92-4-154654-6</pub-id></nlm-citation></ref><ref id="ref8"><label>8</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Yan</surname><given-names>C</given-names> </name><name name-style="western"><surname>Fu</surname><given-names>X</given-names> </name><name name-style="western"><surname>Liu</surname><given-names>X</given-names> </name><etal/></person-group><article-title>A survey of automated international classification of diseases coding: development, challenges, and applications</article-title><source>Intell Med</source><year>2022</year><month>08</month><volume>2</volume><issue>3</issue><fpage>161</fpage><lpage>173</lpage><pub-id pub-id-type="doi">10.1016/j.imed.2022.03.003</pub-id></nlm-citation></ref><ref id="ref9"><label>9</label><nlm-citation citation-type="book"><person-group person-group-type="author"><collab>World Health Organization</collab><collab>Canadian Institute for Health Information</collab></person-group><source>International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Canada (ICD-10-CA): Tabular List</source><year>2015</year><publisher-name>Canadian Institute for Health Information</publisher-name><pub-id pub-id-type="other">1-55392-804-0</pub-id></nlm-citation></ref><ref id="ref10"><label>10</label><nlm-citation citation-type="web"><article-title>Structure and principles</article-title><source>WHO Collaborating Centre for Drug Statistics Methodology</source><access-date>2023-07-30</access-date><comment><ext-link ext-link-type="uri" 
xlink:href="https://www.whocc.no/atc/structure_and_principles/">https://www.whocc.no/atc/structure_and_principles/</ext-link></comment></nlm-citation></ref><ref id="ref11"><label>11</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ghali</surname><given-names>WA</given-names> </name><name name-style="western"><surname>Knudtson</surname><given-names>ML</given-names> </name></person-group><article-title>Overview of the Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease. On behalf of the APPROACH investigators</article-title><source>Can J Cardiol</source><year>2000</year><month>10</month><volume>16</volume><issue>10</issue><fpage>1225</fpage><lpage>1230</lpage><pub-id pub-id-type="medline">11064296</pub-id></nlm-citation></ref><ref id="ref12"><label>12</label><nlm-citation citation-type="preprint"><person-group person-group-type="author"><name name-style="western"><surname>Maddison</surname><given-names>CJ</given-names> </name><name name-style="western"><surname>Mnih</surname><given-names>A</given-names> </name><name name-style="western"><surname>Teh</surname><given-names>YW</given-names> </name></person-group><article-title>The concrete distribution: a continuous relaxation of discrete random variables</article-title><source>arXiv</source><comment>Preprint posted online on  Nov 2, 2017</comment><pub-id pub-id-type="doi">10.48550/arXiv.1611.00712</pub-id></nlm-citation></ref><ref id="ref13"><label>13</label><nlm-citation citation-type="confproc"><person-group person-group-type="author"><name name-style="western"><surname>Han</surname><given-names>K</given-names> </name><name name-style="western"><surname>Wang</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>C</given-names> </name><name name-style="western"><surname>Li</surname><given-names>C</given-names> </name><name 
name-style="western"><surname>Xu</surname><given-names>C</given-names> </name></person-group><article-title>Autoencoder inspired unsupervised feature selection</article-title><conf-name>2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name><conf-date>Apr 15 to 20, 2018:</conf-date><conf-loc>Calgary, AB</conf-loc><fpage>2941</fpage><lpage>2945</lpage><pub-id pub-id-type="doi">10.1109/ICASSP.2018.8462261</pub-id></nlm-citation></ref><ref id="ref14"><label>14</label><nlm-citation citation-type="book"><person-group person-group-type="author"><name name-style="western"><surname>Lu</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Cohen</surname><given-names>I</given-names> </name><name name-style="western"><surname>Zhou</surname><given-names>XS</given-names> </name><name name-style="western"><surname>Tian</surname><given-names>Q</given-names> </name></person-group><article-title>Feature selection using principal feature analysis</article-title><source>MM &#x2019;07: Proceedings of the 15th ACM International Conference on Multimedia</source><year>2007</year><publisher-name>Association for Computing Machinery</publisher-name><fpage>301</fpage><lpage>304</lpage><pub-id pub-id-type="doi">10.1145/1291233.1291297</pub-id></nlm-citation></ref><ref id="ref15"><label>15</label><nlm-citation citation-type="book"><person-group person-group-type="author"><name name-style="western"><surname>Cai</surname><given-names>D</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>C</given-names> </name><name name-style="western"><surname>He</surname><given-names>X</given-names> </name></person-group><article-title>Unsupervised feature selection for multi-cluster data</article-title><source>KDD &#x2019;10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source><year>2010</year><publisher-name>Association for Computing 
Machinery</publisher-name><fpage>333</fpage><lpage>342</lpage><pub-id pub-id-type="doi">10.1145/1835804.1835848</pub-id></nlm-citation></ref><ref id="ref16"><label>16</label><nlm-citation citation-type="book"><person-group person-group-type="author"><name name-style="western"><surname>He</surname><given-names>X</given-names> </name><name name-style="western"><surname>Cai</surname><given-names>D</given-names> </name><name name-style="western"><surname>Niyogi</surname><given-names>P</given-names> </name></person-group><person-group person-group-type="editor"><name name-style="western"><surname>Weiss</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Sch&#x00F6;lkopf</surname><given-names>B</given-names> </name><name name-style="western"><surname>Platt</surname><given-names>J</given-names> </name></person-group><article-title>Laplacian score for feature selection</article-title><source>Advances in Neural Information Processing Systems 18 (NIPS 2005)</source><year>2005</year><access-date>2024-07-15</access-date><publisher-name>MIT Press</publisher-name><comment><ext-link ext-link-type="uri" xlink:href="https://papers.nips.cc/paper_files/paper/2005/hash/b5b03f06271f8917685d14cea7c6c50a-Abstract.html">https://papers.nips.cc/paper_files/paper/2005/hash/b5b03f06271f8917685d14cea7c6c50a-Abstract.html</ext-link></comment></nlm-citation></ref><ref id="ref17"><label>17</label><nlm-citation citation-type="book"><person-group person-group-type="author"><name name-style="western"><surname>Chen</surname><given-names>T</given-names> </name><name name-style="western"><surname>Guestrin</surname><given-names>C</given-names> </name></person-group><article-title>XGBoost: a scalable tree boosting system</article-title><source>KDD &#x2019;16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source><year>2016</year><publisher-name>Association for Computing 
Machinery</publisher-name><fpage>785</fpage><lpage>794</lpage><pub-id pub-id-type="doi">10.1145/2939672.2939785</pub-id></nlm-citation></ref><ref id="ref18"><label>18</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lundberg</surname><given-names>SM</given-names> </name><name name-style="western"><surname>Erion</surname><given-names>G</given-names> </name><name name-style="western"><surname>Chen</surname><given-names>H</given-names> </name><etal/></person-group><article-title>From local explanations to global understanding with explainable AI for trees</article-title><source>Nat Mach Intell</source><year>2020</year><month>01</month><volume>2</volume><issue>1</issue><fpage>56</fpage><lpage>67</lpage><pub-id pub-id-type="doi">10.1038/s42256-019-0138-9</pub-id><pub-id pub-id-type="medline">32607472</pub-id></nlm-citation></ref><ref id="ref19"><label>19</label><nlm-citation citation-type="web"><person-group person-group-type="author"><name name-style="western"><surname>Ghasemi</surname><given-names>P</given-names> </name></person-group><article-title>Unsupervised feature selection to identify important ICD-10 and ATC codes for machine learning</article-title><access-date>2023-09-16</access-date><publisher-name>GitHub</publisher-name><comment><ext-link ext-link-type="uri" xlink:href="https://github.com/data-intelligence-for-health-lab/ICD10-Unsupervised-Feature-Selection">https://github.com/data-intelligence-for-health-lab/ICD10-Unsupervised-Feature-Selection</ext-link></comment></nlm-citation></ref><ref id="ref20"><label>20</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sun</surname><given-names>X</given-names> </name><name name-style="western"><surname>Xu</surname><given-names>W</given-names> </name></person-group><article-title>Fast implementation of Delong&#x2019;s algorithm for comparing the areas under correlated 
receiver operating characteristic curves</article-title><source>IEEE Signal Process Lett</source><year>2014</year><month>11</month><volume>21</volume><issue>11</issue><fpage>1389</fpage><lpage>1393</lpage><pub-id pub-id-type="doi">10.1109/LSP.2014.2337313</pub-id></nlm-citation></ref><ref id="ref21"><label>21</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Hajar</surname><given-names>R</given-names> </name></person-group><article-title>Risk factors for coronary artery disease: historical perspectives</article-title><source>Heart Views</source><year>2017</year><volume>18</volume><issue>3</issue><fpage>109</fpage><lpage>114</lpage><pub-id pub-id-type="doi">10.4103/HEARTVIEWS.HEARTVIEWS_106_17</pub-id><pub-id pub-id-type="medline">29184622</pub-id></nlm-citation></ref><ref id="ref22"><label>22</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Mamas</surname><given-names>MA</given-names> </name><name name-style="western"><surname>Brown</surname><given-names>SA</given-names> </name><name name-style="western"><surname>Sun</surname><given-names>LY</given-names> </name></person-group><article-title>Coronary artery disease in patients with cancer: it&#x2019;s always the small pieces that make the bigger picture</article-title><source>Mayo Clin Proc</source><year>2020</year><month>09</month><volume>95</volume><issue>9</issue><fpage>1819</fpage><lpage>1821</lpage><pub-id pub-id-type="doi">10.1016/j.mayocp.2020.07.006</pub-id><pub-id pub-id-type="medline">32861320</pub-id></nlm-citation></ref><ref id="ref23"><label>23</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Denfeld</surname><given-names>QE</given-names> </name><name name-style="western"><surname>Turrise</surname><given-names>S</given-names> </name><name 
name-style="western"><surname>MacLaughlin</surname><given-names>EJ</given-names> </name><etal/></person-group><article-title>Preventing and managing falls in adults with cardiovascular disease: a scientific statement from the American Heart Association</article-title><source>Circ Cardiovasc Qual Outcomes</source><year>2022</year><month>06</month><volume>15</volume><issue>6</issue><fpage>e000108</fpage><pub-id pub-id-type="doi">10.1161/HCQ.0000000000000108</pub-id><pub-id pub-id-type="medline">35587567</pub-id></nlm-citation></ref><ref id="ref24"><label>24</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Gesualdo</surname><given-names>M</given-names> </name><name name-style="western"><surname>Scicchitano</surname><given-names>P</given-names> </name><name name-style="western"><surname>Carbonara</surname><given-names>S</given-names> </name><etal/></person-group><article-title>The association between cardiac and gastrointestinal disorders: causal or casual link?</article-title><source>J Cardiovasc Med (Hagerstown)</source><year>2016</year><month>05</month><volume>17</volume><issue>5</issue><fpage>330</fpage><lpage>338</lpage><pub-id pub-id-type="doi">10.2459/JCM.0000000000000351</pub-id><pub-id pub-id-type="medline">26702598</pub-id></nlm-citation></ref><ref id="ref25"><label>25</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ariel</surname><given-names>H</given-names> </name><name name-style="western"><surname>Cooke</surname><given-names>JP</given-names> </name></person-group><article-title>Cardiovascular risk of proton pump inhibitors</article-title><source>Methodist Debakey Cardiovasc J</source><year>2019</year><volume>15</volume><issue>3</issue><fpage>214</fpage><lpage>219</lpage><pub-id pub-id-type="doi">10.14797/mdcj-15-3-214</pub-id><pub-id pub-id-type="medline">31687101</pub-id></nlm-citation></ref><ref 
id="ref26"><label>26</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sherwood</surname><given-names>MW</given-names> </name><name name-style="western"><surname>Melloni</surname><given-names>C</given-names> </name><name name-style="western"><surname>Jones</surname><given-names>WS</given-names> </name><name name-style="western"><surname>Washam</surname><given-names>JB</given-names> </name><name name-style="western"><surname>Hasselblad</surname><given-names>V</given-names> </name><name name-style="western"><surname>Dolor</surname><given-names>RJ</given-names> </name></person-group><article-title>Individual proton pump inhibitors and outcomes in patients with coronary artery disease on dual antiplatelet therapy: a systematic review</article-title><source>J Am Heart Assoc</source><year>2015</year><month>10</month><day>29</day><volume>4</volume><issue>11</issue><fpage>e002245</fpage><pub-id pub-id-type="doi">10.1161/JAHA.115.002245</pub-id><pub-id pub-id-type="medline">26514161</pub-id></nlm-citation></ref><ref id="ref27"><label>27</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ishiyama</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Hoshide</surname><given-names>S</given-names> </name><name name-style="western"><surname>Mizuno</surname><given-names>H</given-names> </name><name name-style="western"><surname>Kario</surname><given-names>K</given-names> </name></person-group><article-title>Constipation-induced pressor effects as triggers for cardiovascular events</article-title><source>J Clin Hypertens (Greenwich)</source><year>2019</year><month>03</month><volume>21</volume><issue>3</issue><fpage>421</fpage><lpage>425</lpage><pub-id pub-id-type="doi">10.1111/jch.13489</pub-id><pub-id pub-id-type="medline">30761728</pub-id></nlm-citation></ref><ref id="ref28"><label>28</label><nlm-citation 
citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Majeed</surname><given-names>MH</given-names> </name><name name-style="western"><surname>Ali</surname><given-names>AA</given-names> </name><name name-style="western"><surname>Khalil</surname><given-names>HA</given-names> </name><name name-style="western"><surname>Bacon</surname><given-names>D</given-names> </name><name name-style="western"><surname>Imran</surname><given-names>HM</given-names> </name></person-group><article-title>A review of the pharmacological management of chronic pain in patients with heart failure</article-title><source>Innov Clin Neurosci</source><year>2019</year><month>11</month><day>1</day><volume>16</volume><issue>11-12</issue><fpage>25</fpage><lpage>27</lpage><pub-id pub-id-type="medline">32082939</pub-id></nlm-citation></ref><ref id="ref29"><label>29</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Baoqi</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Dan</surname><given-names>M</given-names> </name><name name-style="western"><surname>Xingxing</surname><given-names>Z</given-names> </name><etal/></person-group><article-title>Effect of anti-rheumatic drugs on cardiovascular disease events in rheumatoid arthritis</article-title><source>Front Cardiovasc Med</source><year>2021</year><month>02</month><day>3</day><volume>8</volume><fpage>812631</fpage><pub-id pub-id-type="doi">10.3389/fcvm.2021.812631</pub-id><pub-id pub-id-type="medline">35187113</pub-id></nlm-citation></ref><ref id="ref30"><label>30</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sholter</surname><given-names>DE</given-names> </name><name name-style="western"><surname>Armstrong</surname><given-names>PW</given-names> </name></person-group><article-title>Adverse effects of corticosteroids on the 
cardiovascular system</article-title><source>Can J Cardiol</source><year>2000</year><month>04</month><volume>16</volume><issue>4</issue><fpage>505</fpage><lpage>511</lpage><pub-id pub-id-type="medline">10787466</pub-id></nlm-citation></ref><ref id="ref31"><label>31</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Shulman</surname><given-names>M</given-names> </name><name name-style="western"><surname>Miller</surname><given-names>A</given-names> </name><name name-style="western"><surname>Misher</surname><given-names>J</given-names> </name><name name-style="western"><surname>Tentler</surname><given-names>A</given-names> </name></person-group><article-title>Managing cardiovascular disease risk in patients treated with antipsychotics: a multidisciplinary approach</article-title><source>J Multidiscip Healthc</source><year>2014</year><month>10</month><day>31</day><volume>7</volume><fpage>489</fpage><lpage>501</lpage><pub-id pub-id-type="doi">10.2147/JMDH.S49817</pub-id><pub-id pub-id-type="medline">25382979</pub-id></nlm-citation></ref><ref id="ref32"><label>32</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Cazzola</surname><given-names>M</given-names> </name><name name-style="western"><surname>Matera</surname><given-names>MG</given-names> </name><name name-style="western"><surname>Donner</surname><given-names>CF</given-names> </name></person-group><article-title>Inhaled beta2-adrenoceptor agonists: cardiovascular safety in patients with obstructive lung disease</article-title><source>Drugs</source><year>2005</year><volume>65</volume><issue>12</issue><fpage>1595</fpage><lpage>1610</lpage><pub-id pub-id-type="doi">10.2165/00003495-200565120-00001</pub-id><pub-id pub-id-type="medline">16060696</pub-id></nlm-citation></ref><ref id="ref33"><label>33</label><nlm-citation citation-type="journal"><person-group 
person-group-type="author"><name name-style="western"><surname>Son</surname><given-names>YJ</given-names> </name><name name-style="western"><surname>Kwon</surname><given-names>BE</given-names> </name></person-group><article-title>Overactive bladder is a distress symptom in heart failure</article-title><source>Int Neurourol J</source><year>2018</year><month>06</month><volume>22</volume><issue>2</issue><fpage>77</fpage><lpage>82</lpage><pub-id pub-id-type="doi">10.5213/inj.1836120.060</pub-id><pub-id pub-id-type="medline">29991228</pub-id></nlm-citation></ref><ref id="ref34"><label>34</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Jamian</surname><given-names>L</given-names> </name><name name-style="western"><surname>Wheless</surname><given-names>L</given-names> </name><name name-style="western"><surname>Crofford</surname><given-names>LJ</given-names> </name><name name-style="western"><surname>Barnado</surname><given-names>A</given-names> </name></person-group><article-title>Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record</article-title><source>Arthritis Res Ther</source><year>2019</year><month>12</month><day>30</day><volume>21</volume><issue>1</issue><fpage>305</fpage><pub-id pub-id-type="doi">10.1186/s13075-019-2092-7</pub-id><pub-id pub-id-type="medline">31888720</pub-id></nlm-citation></ref><ref id="ref35"><label>35</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lucini</surname><given-names>FR</given-names> </name><name name-style="western"><surname>Stelfox</surname><given-names>HT</given-names> </name><name name-style="western"><surname>Lee</surname><given-names>J</given-names> </name></person-group><article-title>Deep learning-based recurrent delirium prediction in critically ill patients</article-title><source>Crit Care 
Med</source><year>2023</year><month>04</month><day>1</day><volume>51</volume><issue>4</issue><fpage>492</fpage><lpage>502</lpage><pub-id pub-id-type="doi">10.1097/CCM.0000000000005789</pub-id><pub-id pub-id-type="medline">36790184</pub-id></nlm-citation></ref><ref id="ref36"><label>36</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Strypsteen</surname><given-names>T</given-names> </name><name name-style="western"><surname>Bertrand</surname><given-names>A</given-names> </name></person-group><article-title>End-to-end learnable EEG channel selection for deep neural networks with Gumbel-softmax</article-title><source>J Neural Eng</source><year>2021</year><month>07</month><day>20</day><volume>18</volume><issue>4</issue><pub-id pub-id-type="doi">10.1088/1741-2552/ac115d</pub-id><pub-id pub-id-type="medline">34225257</pub-id></nlm-citation></ref></ref-list><app-group><supplementary-material id="app1"><label>Multimedia Appendix 1</label><p>The percentages of the 20 most common <italic>ICD-10-CA</italic> and ATC codes present in the processed data sets. 
ATC: Anatomical Therapeutic Chemical; <italic>ICD-10-CA</italic>: <italic>International Classification of Diseases, Tenth Revision, Canada</italic>.</p><media xlink:href="medinform_v12i1e52896_app1.png" xlink:title="PNG File, 70 KB"/></supplementary-material><supplementary-material id="app2"><label>Multimedia Appendix 2</label><p><italic>P</italic> values associated with <xref ref-type="table" rid="table3">Tables 3</xref> and <xref ref-type="table" rid="table4">4</xref>, and characteristics of the selected features.</p><media xlink:href="medinform_v12i1e52896_app2.docx" xlink:title="DOCX File, 21 KB"/></supplementary-material><supplementary-material id="app3"><label>Multimedia Appendix 3</label><p>Shapley value plots of the features selected by our methods across the different data sets (20 most important features).</p><media xlink:href="medinform_v12i1e52896_app3.docx" xlink:title="DOCX File, 925 KB"/></supplementary-material><supplementary-material id="app4"><label>Multimedia Appendix 4</label><p>Tables of the features selected by our methods across the different data sets with each feature&#x2019;s description, rank, and average absolute Shapley score.</p><media xlink:href="medinform_v12i1e52896_app4.docx" xlink:title="DOCX File, 187 KB"/></supplementary-material></app-group></back></article>