Published in Vol 13 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/74170.
Hypertension Medication Recommendation via Synergistic and Selective Modeling of Heterogeneous Medical Entities: Development and Evaluation Study of a New Model


Authors of this article:

Ke Zhang1; Zhichang Zhang1; Yali Liang1; Wei Wang1; Xia Wang2

1College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou, China

2Gansu Provincial Hospital, Lanzhou, China

Corresponding Author:

Zhichang Zhang, PhD


Background: Electronic health records (EHRs) contain comprehensive information regarding diagnoses, clinical procedures, and prescribed medications. This makes them a valuable resource for developing automated hypertension medication recommendation systems. Within this field, existing research has used machine learning approaches, leveraging demographic characteristics and basic clinical indicators, or deep learning techniques, which extract patterns from EHR data, to predict optimal medications or improve the accuracy of recommendations for common antihypertensive medication categories. However, these methodologies have significant limitations. They rarely adequately characterize the synergistic relationships among heterogeneous medical entities, such as the interplay between comorbid conditions, laboratory results, and specific antihypertensive agents. Furthermore, given the chronic and fluctuating nature of hypertension, effective medication recommendations require dynamic adaptation to disease progression over time. However, current approaches either lack rigorous temporal modeling of EHR data or fail to effectively integrate temporal dynamics with interentity relationships, resulting in the generation of recommendations that are not clinically appropriate due to the neglect of these critical factors.

Objective: This study aims to overcome the challenges in existing methods and introduce a novel model for hypertension medication recommendation that leverages the synergy and selectivity of heterogeneous medical entities.

Methods: First, we used patient EHR data to construct both heterogeneous and homogeneous graphs. The interentity synergies were captured using a multihead graph attention mechanism to enhance entity-level representations. Next, a bidirectional temporal selection mechanism calculated selective coefficients between current and historical visit records and aggregated them to form refined visit-level representations. Finally, medication recommendation probabilities were determined based on these comprehensive patient representations.

Results: Experimental evaluations on the real-world datasets Medical Information Mart for Intensive Care (MIMIC)-III v1.4 and MIMIC-IV v2.2 demonstrated that the proposed model achieved Jaccard similarity coefficients of 58.01% and 55.82%, respectively; areas under the curve of precision-recall of 83.56% and 80.69%, respectively; and F1-scores of 68.95% and 64.83%, respectively, outperforming the baseline models.

Conclusions: The findings indicate the superior efficacy of the introduced model in medication recommendation, highlighting its potential to enhance clinical decision-making in the management of hypertension. The code for the model has been released on GitHub.

JMIR Med Inform 2025;13:e74170

doi:10.2196/74170

Keywords



Background

Hypertension represents a prevalent chronic condition and serves as a significant contributor to cardiovascular mortality, making timely pharmacological intervention for blood pressure management crucial [1]. With the increasing trend of an aging population, the growing number of patients with hypertension has placed a significant burden on health care systems [2]. Consequently, automated medication recommendation systems for hypertension have been developed.

Prior Work and Limitations

Early hypertension medication recommendation methods were primarily rule-based. For example, Wu and Xie [3] developed a hypertension ontology and reasoning rules to recommend appropriate antihypertensive medications to patients. However, these methods relied only on predefined rules and a limited set of case data, neglecting other critical patient information, which resulted in recommendations that lacked flexibility and personalization. In recent years, numerous neural network models for recommending hypertension medications based on electronic health records (EHRs) have been proposed [4,5]. These models have shown improved outcomes and effectively address many limitations inherent in earlier algorithms.

Nevertheless, the complexity of EHR data continues to present significant challenges for medication recommendation tasks, particularly in 2 critical areas: insufficient synergy among heterogeneous medical entities and neglect of temporal dynamics in the patient’s condition.

Insufficient Synergy Among Heterogeneous Medical Entities

EHR data contains heterogeneous but interrelated medical entities—such as diagnoses, procedures, and medications—that jointly influence treatment outcomes. For example, a diagnosis indicates a patient’s health status, which subsequently informs procedure and medication decisions. Effective modeling of such cross-entity relationships is essential for generating accurate and personalized treatment recommendations.

Existing approaches like Multilevel Medical Embedding (MiME) [6] and graph convolutional transformer (GCT) [7] attempt to model medical concepts and their causal relations using homogeneous graph structures. However, these structures fail to capture the inherent heterogeneity across EHR entities. The Heterogeneous Information Network for Medical Diagnosis (HeteroMed) [8] introduces semantic associations via metapaths, but it does not model dynamic interentity interactions. CausalMed [9] explores causal inferences between treatment elements but overlooks the reinforcement between medications and procedures. Graph transformers of bidirectional encoder representations from transformers on EHRs (GT-BEHRT) [10] integrates graph transformers with temporal modeling yet does not incorporate contextual synergy among entities.

Further, current models inadequately balance efficacy and safety. For instance, Graph-Augmented Memory Networks (GAMENet) [11] uses a drug-drug interaction (DDI) graph to reduce adverse effects but does not consider diagnostic and procedural inputs. Graph-augmented bidirectional encoder representations from transformers (G-BERT) [12] extends to multientity modeling yet omits DDI-aware safety metrics. Competitive neural network (CompNet) [13] faces issues with computational efficiency and model stability on large-scale datasets. Recent studies, such as those by Yang et al [14] (using medication molecular structure graphs) and Li et al [15] (using contrastive learning), enhance representation power but still lack a unified mechanism for modeling both “multientity synergy” and “safety constraints.”

Neglect of Temporal Dynamics in the Patient’s Condition

As a chronic disease, hypertension evolves over time. In clinical practice, treatment strategies are often adjusted based on both current and historical visit data. Hence, capturing the temporal dynamics of the patient’s condition is essential for effective medication recommendation.

Several studies have addressed sequential modeling. Yang et al [16] used dual medical sequences to represent medication history but did not account for intersequence relationships. Liu et al [17] developed 3 long short-term memory (LSTM) variants to model correlations across medical sequences but failed to consider the immediate influence of current clinical status. Although models like the Reverse Time Attention model (RETAIN) [18] use attention mechanisms to prioritize relevant historical visits and An et al [19] introduced hierarchical temporal modeling, these methods primarily focus on general temporal behavior without consideration for hypertension-specific patterns like long-term management or phased treatment adjustment.

Advanced designs have attempted to improve modeling capabilities. Le et al [20] used a memory-enhanced neural network to represent long-term dependencies, while Yang et al [21] adopted a residual mechanism to capture patient status transitions. Wu et al [22] introduced a transformer with a “copy-generate” mechanism to decide whether to reuse previous prescriptions. However, none of these methods fully addresses the task-specific temporal dynamics required for hypertension medication recommendations.

Objective and Contributions

To address the aforementioned limitations, this study proposed CSRec, a novel hypertension medication recommendation framework that integrates the synergistic interactions of heterogeneous medical entities with selective modeling of temporal progression. CSRec is designed to capture cross-entity synergies through both heterogeneous and homogeneous graph construction based on EHR data, model time-aware patient representations using a temporal selectivity module that weighs current versus historical visits, and enhance safety by incorporating DDI information in the recommendation process.

The primary contributions of this paper are summarized in the following paragraphs.

We propose a novel medication recommendation model, named CSRec, specifically designed for hypertension treatment. CSRec effectively integrates the synergistic interactions and selective characteristics among diverse medical entities. By constructing a heterogeneous medical entity graph derived from EHRs, our model utilizes a graph attention mechanism to generate enhanced collaborative embeddings among medical entities. Additionally, a temporal selection mechanism was incorporated to simulate hypertension progression, thereby producing a comprehensive patient representation to facilitate accurate medication recommendations.

To better capture synergistic relationships between medical entities, we innovatively modified the traditional graph attention network (GAT) to focus more on neighboring node information, thus obtaining a more aggregated representation of the principal nodes.

Extensive experiments on the publicly available datasets Medical Information Mart for Intensive Care (MIMIC)-III [23] and MIMIC-IV [24] were conducted to validate the superiority and effectiveness of our proposed method.


Overview

In this section, we present a comprehensive description of the CSRec model’s structure. As depicted in Figure 1, our model comprises 3 core modules for end-to-end hypertension medication recommendations: (1) The Heterogeneous Collaborative Module, with heterogeneous and homogeneous graph networks, learns entity-level representations by aggregating collaboration patterns between medical entities, which are then passed to the Temporal Selectivity Module; (2) the Temporal Selectivity Module, using a bidirectional selection mechanism, processes these entity-level representations to calculate relevance coefficients between current and previous visits, generating visit-level entity representations that are transmitted to the Interaction Prediction Module; and (3) the Interaction Prediction Module concatenates these visit-level entity representations to enrich entity information, forming a patient representation. This is converted into medication recommendation probabilities, with medications exceeding a threshold output as results.

Figure 1. The framework of CSRec. Diag: diagnosis; GAT: graph attention network; GRU: gated recurrent unit; Med: medication; MLP: multilayer perceptron; Proc: procedures.

Problem Formulation

Electronic Health Records

EHRs encompass a variety of medical visit information collected from patients. For a specific patient, EHRs can be structured into a sequence comprising multiple clinical visit records, represented as $V = \{V_1, V_2, \ldots, V_T\}$, where $V_t$ represents the $t$-th visit and $T$ indicates the total number of visits for that patient. Specifically, each clinical visit $V_t$ can be represented as $V_t = \{V_d^t, V_p^t, V_m^t\}$, in which $V_d^t \in \{0,1\}^{|D|}$, $V_p^t \in \{0,1\}^{|P|}$, and $V_m^t \in \{0,1\}^{|M|}$ represent multihot encoded vectors corresponding to diagnoses, procedures, and medications, respectively. Here, the notation $|\cdot|$ indicates the total number of distinct categories within each respective medical entity type.
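
As an illustration of the multihot encoding described above, the following minimal sketch encodes one visit. The function name `multi_hot` and the toy vocabularies are hypothetical and are not taken from the study's code or from MIMIC.

```python
def multi_hot(codes, vocabulary):
    """Return a 0/1 vector with a 1 at the index of every code present."""
    index = {code: i for i, code in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for code in codes:
        vector[index[code]] = 1
    return vector


# Hypothetical toy vocabularies, for illustration only.
diagnosis_vocab = ["4019", "3051", "340"]
medication_vocab = ["C02A", "C03A", "C07A", "C09A"]

# One visit: a diagnosis vector of length |D| and a medication vector of length |M|.
visit = {
    "diagnoses": multi_hot(["4019", "340"], diagnosis_vocab),
    "medications": multi_hot(["C07A"], medication_vocab),
}
```

Each vector has one position per distinct code, so its length equals the vocabulary size of the corresponding entity type.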

Heterogeneous Medical Entities

In this paper, distinct medications were modeled as medication entities. Each medication recorded in the EHR corresponds to a unique medication entity, identified by a specific identifier denoted as $m_1, m_2, \ldots$, and each entity is independently embedded within the model using embeddings of identical dimensionality. Likewise, diagnoses and procedures were categorized into diagnostic entities and procedural entities, respectively. Collectively, these 3 categories of entities were termed heterogeneous medical entities.

Hypertension Medication Recommendation

Based on the patient’s current diagnostic information $V_d^t$, procedural data $V_p^t$, historical visit sequence $V = \{V_1, V_2, \ldots, V_{T-1}\}$, and heterogeneous medical entity graph $G$, the model recommends appropriate antihypertensive medications $\hat{y}^t \in \{0,1\}^{|M|}$ to the patient.

Heterogeneous Collaborative Module

For a specific patient, we first constructed a medication homogeneous graph, which was explicitly designed to characterize relationships between medications and provide a foundation for subsequent cross-entity modeling. This graph comprised 2 components, both of which take medications as the sole node type.

One component is the medication collaboration graph $G_{mm}^{t-1} = (M^{t-1}, A_{mm})$, which serves to capture the patterns of combined medication use in clinical practice. Specifically, $M^{t-1}$ denotes the collection of all prescribed medications from the patient’s previous visits, and $A_{mm}$ signifies the adjacency matrix representing collaborative interactions among medications. Each entry within this adjacency matrix indicates the initial collaboration weight between the respective medication nodes. The detailed procedure for generating this matrix is illustrated in Figure 2. Initially, $A_{mm}$ is set as a zero matrix. If medication $i$ and medication $j$ co-occur during a specific visit, then $A_{mm}[i,j] = 1$; if medications $i$ and $j$ co-occur across multiple visits, the corresponding value in $A_{mm}$ is incremented, with higher values indicating stronger collaboration between the medications.

Figure 2. The medication-medication synergy matrix construction process.
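
The counting procedure for the medication collaboration matrix can be sketched as follows. This is an illustrative reading of the description, not the released implementation: the function name `build_cooccurrence` is hypothetical, medications are assumed to be integer indices, and the matrix is assumed to be symmetric.

```python
from itertools import combinations


def build_cooccurrence(visits, num_medications):
    """Count, for every medication pair, the number of visits in which both appear."""
    matrix = [[0] * num_medications for _ in range(num_medications)]
    for medications in visits:
        # Each unordered pair within one visit adds 1 to the collaboration weight.
        for i, j in combinations(sorted(set(medications)), 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # assumption: collaboration is undirected
    return matrix


# Toy history: three visits over three medications (indices 0, 1, 2).
history = [[0, 1], [0, 1, 2], [2]]
A_mm = build_cooccurrence(history, num_medications=3)
```

In this toy history, medications 0 and 1 co-occur in two visits, so their entry is 2, whereas each pair involving medication 2 co-occurs only once.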

Each medication node $m \in M^{t-1}$ in the graph corresponds to its initial embedding vector, expressed as:

$e_m = V_m^{t-1} E_m$ (1)

Specifically, $E_m$ refers to the embedding matrix associated with medication entities. The selector vector $V_m^{t-1}$ extracts the relevant embedding from $E_m$ for the medication at the previous time step $t-1$.

Moreover, some medications may have harmful interactions (DDI) and should be avoided when used together. To address this, we integrated the medication collaboration graph $G_{mm}$ with the medication safety graph $G_{ddi}$, enhancing the comprehensive representation of the medication nodes. Notably, the methodology used to construct the medication safety graph $G_{ddi}$ closely mirrors that of $G_{mm}$. Specifically, we used $A_{ddi}$ as the adjacency matrix, where $A_{ddi}[i,j] = 1$ signifies that the $i$-th and $j$-th medications exhibit a paired harmful medication interaction.

$G_m = G_{mm} - \lambda G_{ddi}$ (2)

Subsequently, we adopted a GAT-based graph neural approach to obtain embeddings for medication nodes within the medication graph Gm.

First, we utilized the attention mechanism to calculate the attention coefficients between a node and its neighbors, followed by normalization:

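$\alpha_{ij}^{(m)} = \dfrac{\exp\left(g_{ij}^{m}\right)}{\sum_{k \in N_i^{(m)}} \exp\left(g_{ik}^{m}\right)}$ (3)

Here, $g_{ij}^{m} = \mathrm{LeakyReLU}\left(\beta_m^{\top}\left[h_i^{m} \,\Vert\, h_j^{m}\right]\right)$ denotes the output of the LeakyReLU activation function applied to the linear transformation of the concatenated feature vectors $h_i^{m}$ and $h_j^{m}$, parameterized by the learnable weight vector $\beta_m$. The notation $N_i^{(m)}$ represents the neighboring node set of node $m_i$ in the graph $G_m$, while $\Vert$ signifies the vector concatenation operation.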

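Second, we used the calculated attention coefficients, written $\gamma_{ij}^{m}$ in the aggregation step, to perform weighted aggregation of neighbor nodes, thereby obtaining the representation of node $m_i$:

$e_i^{m} = \sigma\left(\sum_{j \in N_i^{m}} \gamma_{ij}^{m} h_j^{m}\right)$ (4)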

To overcome the limitations posed by single-view attention, we further introduced a multihead attention mechanism, using a linear layer to map node representations into multiple subspaces, then aggregate representations under each subspace. Formally, this operation is expressed as:

$e_i^{m} = \Big\Vert_{h=1}^{H} \sigma\left(\sum_{j \in N_i^{m,h}} \gamma_{ij}^{m,h} h_j^{m,h}\right)$ (5)

In this equation, $H$ denotes the total number of attention heads, the superscript $h$ indicates the current attention head index, and $\Vert$ represents concatenating outputs from different heads. Through this strategy, distinct node-specific information is captured across various dimensions and integrated from multiple subspaces, thereby significantly improving the precision and robustness of the learned node representations.
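
The attention steps in equations (3)-(5) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: $\sigma$ is taken to be the logistic sigmoid, each head receives node features that are already projected into its subspace, and all function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_coefficients(i, neighbors, h, beta):
    """Softmax-normalized scores of node i over its neighbors (equation 3)."""
    scores = []
    for j in neighbors:
        concat = h[i] + h[j]  # list concatenation stands in for [h_i || h_j]
        scores.append(leaky_relu(sum(b * x for b, x in zip(beta, concat))))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(i, neighbors, h, beta):
    """Weighted sum of neighbor features followed by a sigmoid (equation 4)."""
    weights = attention_coefficients(i, neighbors, h, beta)
    out = []
    for d in range(len(h[i])):
        s = sum(w * h[j][d] for w, j in zip(weights, neighbors))
        out.append(1.0 / (1.0 + math.exp(-s)))
    return out


def multi_head(i, neighbors, heads):
    """Concatenate the per-head outputs (equation 5); each head is (h, beta)."""
    out = []
    for h, beta in heads:
        out.extend(aggregate(i, neighbors, h, beta))
    return out


# Toy graph: node 0 with neighbors 1 and 2, two-dimensional features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
weights = attention_coefficients(0, [1, 2], features, [0.0] * 4)
single = aggregate(0, [1, 2], features, [0.0] * 4)
combined = multi_head(0, [1, 2], [(features, [0.0] * 4), (features, [0.1] * 4)])
```

Note that only neighbor features enter the weighted sum here, which matches the emphasis on neighboring-node information in the text. With an all-zero weight vector the coefficients are uniform, and two heads over two-dimensional features give a four-dimensional concatenated output.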

During the training process, each medication entity is updated according to the aforementioned steps, resulting in an aggregated representation of the medication set $V_m = \{v_m^1, v_m^2, \ldots, v_m^{t-1}\}$.

Similarly, to capture the unique clinical relationships of hypertension, such as the interactions between comorbidities, long-term monitoring data, and combination therapy dynamics, we defined 3 heterogeneous complete graphs: diagnosis-procedure graph $G_{dp}^{t} = (D^{t}, P^{t}, A_{dp})$, medication-diagnosis graph $G_{md}^{t} = (M^{t-1}, D^{t}, A_{md})$, and medication-procedure graph $G_{mp}^{t} = (M^{t-1}, P^{t}, A_{mp})$. This established an association model that is crucial for accurate hypertension medication recommendations but has low correlation with other medication categories. Following a similar learning approach to the medication graph $G_m$, we first initialized the diagnosis and procedure nodes in the 3 heterogeneous graphs.

$e_d = V_d^{t} E_d, \quad e_p = V_p^{t} E_p$ (6)

Using the approach outlined in equations (3)-(5), we then learned from the diagnosis-procedure graph to obtain aggregated sets of diagnosis nodes $V_d = \{v_d^1, v_d^2, \ldots, v_d^{t}\}$ and procedure nodes $V_p = \{v_p^1, v_p^2, \ldots, v_p^{t}\}$.

Finally, based on the aggregated medication nodes from the medication graph $G_m$ and the learned diagnosis and procedure nodes from the diagnosis-procedure graph $G_{dp}$, we learned from the medication-diagnosis graph $G_{md}$ and the medication-procedure graph $G_{mp}$, updating to obtain entity-level representations of the medication, diagnosis, and procedure sets:

$V_D = \{V_d^1, V_d^2, \ldots, V_d^{t}\}$
$V_P = \{V_p^1, V_p^2, \ldots, V_p^{t}\}$
$V_M = \{V_m^1, V_m^2, \ldots, V_m^{t-1}\}$ (7)

It is noteworthy that this study used a multihead GAT mechanism to learn and update nodes within the graph. Although traditional GAT models consider intrinsic features and aggregate neighboring features, our proposed heterogeneous graph-based method places greater emphasis on enhancing the embeddings of medications, diagnoses, and procedures by explicitly modeling interactions among diverse medical entities. Consequently, our model focuses more intensively on neighboring nodes to prevent excessive node information merging. To this end, we modified the standard GAT approach to explicitly focus on information derived from neighboring nodes within the heterogeneous graph learning process.

Temporal Selectivity Module

To effectively model the temporal evolution of patient health conditions, this paper introduces a bidirectional temporal selection mechanism using gated recurrent units (GRUs), with 3 key innovations that distinguish it from conventional applications.

First, we adopted a bidirectional temporal architecture to capture multiscale temporal dependencies. Specifically, we first used $\mathrm{GRU}^{\alpha}$ to learn the diagnostic sequence, generating forward diagnostic selection coefficients that emphasize the impact of historical visits on current states (eg, past hypertensive crisis records influencing present medication titration):

$g_1, g_2, \ldots, g_t = \mathrm{GRU}_d^{\alpha}\left(V_d^1, V_d^2, \ldots, V_d^{t}\right)$ (8)
$\alpha_j = \tanh\left(W_{\alpha} g_j + b_{\alpha}\right), \quad j = 1, \ldots, t$ (9)

Concurrently, a backward $\mathrm{GRU}^{\beta}$ was used in the reverse temporal order to learn the diagnostic sequence, generating backward selection coefficients at different time steps that highlight recent critical changes. For example, recent fluctuations in blood pressure require immediate therapeutic adjustment. This bidirectional design enhances computational stability while overcoming the limitations of unidirectional GRUs or traditional recurrent neural networks (RNNs), which often overlook either long-term or short-term temporal cues:

$h_t, h_{t-1}, \ldots, h_1 = \mathrm{GRU}_d^{\beta}\left(V_d^{t}, V_d^{t-1}, \ldots, V_d^{1}\right)$ (10)
$\beta_j = \mathrm{tanhshrink}\left(W_{\beta} h_j + b_{\beta}\right), \quad j = t, \ldots, 1$ (11)

Second, we proposed an adaptive selection coefficient integration strategy. Rather than relying on static aggregation, we used the generated bidirectional diagnostic selection coefficients to weight each visit in the sequence. This allowed us to capture key visit information and entity interactions, thereby obtaining a diagnostic representation that integrates historical context with current needs:

$d^{t} = \sum_{j=1}^{t} \alpha_j \beta_j V_d^{j}$ (12)

Third, leveraging GRU’s inherent advantages, our design achieved computational efficiency without sacrificing performance. Compared with traditional RNNs, GRU effectively mitigates gradient vanishing issues; relative to LSTMs, its simplified gating mechanism reduces parameter complexity by avoiding redundant memory cells, resulting in faster training efficiency, a critical advantage for handling large-scale longitudinal EHR data in medication recommendation tasks. After a series of similar processing steps, we obtained the patient’s final procedural and medication representations.

$p^{t} = \sum_{j=1}^{t} \alpha_j \beta_j V_p^{j}, \quad m^{t-1} = \sum_{j=1}^{t-1} \alpha_j \beta_j V_m^{j}$ (13)
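
The selection-and-aggregation step in equations (9), (11), and (12) can be sketched as follows. The sketch takes the forward and backward GRU hidden states as given inputs instead of implementing the GRUs, and it assumes that each selection coefficient is a scalar per visit; the text does not state the coefficient shapes, and all names here are hypothetical.

```python
import math


def tanhshrink(x):
    """Tanhshrink activation: x - tanh(x)."""
    return x - math.tanh(x)


def selection_coefficients(states, weight, bias, activation):
    """Map each hidden state to a scalar coefficient: activation(w . state + b)."""
    return [
        activation(sum(w * s for w, s in zip(weight, state)) + bias)
        for state in states
    ]


def select_and_aggregate(alphas, betas, visit_vectors):
    """Sum over visits of alpha_j * beta_j * V_j, as in equation (12)."""
    out = [0.0] * len(visit_vectors[0])
    for a, b, v in zip(alphas, betas, visit_vectors):
        for k, value in enumerate(v):
            out[k] += a * b * value
    return out


# Toy inputs standing in for GRU outputs, both listed in visit order j = 1..t.
forward_states = [[0.2, -0.1], [0.5, 0.3]]
backward_states = [[1.0, 0.4], [2.0, -0.5]]
visits = [[1.0, 0.0], [0.0, 1.0]]

alphas = selection_coefficients(forward_states, [1.0, 1.0], 0.0, math.tanh)
betas = selection_coefficients(backward_states, [1.0, 1.0], 0.0, tanhshrink)
d_t = select_and_aggregate(alphas, betas, visits)
```

Because the backward GRU emits its states in reverse order, they need to be realigned to visit order before the two coefficient lists are multiplied visit by visit.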

Interaction Prediction Module

Based upon the outputs generated by the aforementioned modules, for the patient’s t-th visit, we concatenated the medical entity sequence to make medication recommendations:

$\hat{y}^{t} = \sigma\left(\left[d^{t}; p^{t}; m^{t-1}\right]\right)$ (14)

In our approach, the medication recommendation task is formulated as a multilabel classification problem [25,26]. To address the complexities and potential imbalances in medical datasets, we used a comprehensive strategy during model training. We enhanced the model’s generalization capabilities and mitigated overfitting through the use of regularization techniques, which limit the complexity of the parameters learned. Additionally, an early-stopping mechanism was implemented to curtail training based on validation set performance. To optimize parameters, we used the Adam optimization algorithm [27], which minimizes the binary cross-entropy loss function to promote efficient and stable model convergence.

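$L = -\sum_{t=1}^{T} \sum_{i=1}^{|M|} \left[ y_i^{t} \log \hat{y}_i^{t} + \left(1 - y_i^{t}\right) \log\left(1 - \hat{y}_i^{t}\right) \right]$ (15)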
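
For concreteness, the binary cross-entropy in equation (15), summed over visits and medications, can be computed as in the minimal sketch below. The function name `bce_loss` and the clipping constant `eps` are illustrative choices, not details from the paper.

```python
import math


def bce_loss(targets, predictions, eps=1e-12):
    """Binary cross-entropy summed over all visits and all medications."""
    total = 0.0
    for y_visit, p_visit in zip(targets, predictions):  # loop over visits t
        for y, p in zip(y_visit, p_visit):  # loop over medications i
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# One visit, two medications, both predicted with probability 0.5.
loss = bce_loss([[1, 0]], [[0.5, 0.5]])
```

With both probabilities at 0.5, each medication contributes $\log 2$, so the loss equals $2 \log 2$.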

Ethical Considerations

This study made use of the standardized, publicly available MIMIC-III and MIMIC-IV datasets from the Massachusetts Institute of Technology [23,24] and was therefore deemed exempt from ethical approval requirements. Prior to their release, these datasets underwent comprehensive ethical review and privacy protection processes conducted by the data provider (Massachusetts Institute of Technology). These processes included deidentifying all personally identifiable information in patients’ EHRs, such as names, hospital admission numbers, and dates of birth. Additionally, the datasets’ usage license explicitly covers secondary analysis scenarios for academic research, eliminating the need for users to obtain additional ethical approval independently. Since this study did not involve independent collection of human subject data and solely relied on the aforementioned publicly available and compliant existing datasets for secondary analysis, we do not possess separate ethical approval documents to provide. The ethical approval statement is published on the official website of the data provider [23,24], and we can be contacted to obtain the original copy of the datasets’ usage license.


Dataset Description

Our experiments were conducted using the MIMIC-III v1.4 and MIMIC-IV v2.2 datasets provided by the Massachusetts Institute of Technology. The datasets comprise medical records collected from patients admitted to intensive care units, such as diagnoses, procedures, and medications. The diagnostic and procedural information uses the International Classification of Diseases, Ninth Revision (ICD-9) coding system. To study medication recommendations for patients with hypertension, this paper extracted relevant data from the aforementioned datasets based on the ICD-9 hypertension codes under the guidance of clinical experts. Meanwhile, referring to previous research norms [4,21,22], patients who completed at least two visits were included, and medications with a frequency <2000 occurrences were excluded.
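
The two inclusion rules above (at least two visits per patient and exclusion of rarely prescribed medications) can be sketched as follows. The record layout, the function name `filter_cohort`, and the order in which the two filters are applied are assumptions made for illustration; the paper does not specify them.

```python
from collections import Counter


def filter_cohort(records, min_visits=2, min_med_count=2000):
    """Keep patients with enough visits, then drop infrequent medications.

    records maps a patient id to a list of visits; each visit is a dict
    with a "medications" list (hypothetical layout).
    """
    kept = {pid: visits for pid, visits in records.items() if len(visits) >= min_visits}
    counts = Counter(
        med for visits in kept.values() for visit in visits for med in visit["medications"]
    )
    frequent = {med for med, count in counts.items() if count >= min_med_count}
    return {
        pid: [
            dict(visit, medications=[m for m in visit["medications"] if m in frequent])
            for visit in visits
        ]
        for pid, visits in kept.items()
    }


# Toy data with a lowered frequency threshold so that the effect is visible.
toy_records = {
    "p1": [{"medications": ["A", "B"]}, {"medications": ["A"]}],
    "p2": [{"medications": ["A"]}],
}
toy_cohort = filter_cohort(toy_records, min_visits=2, min_med_count=2)
```

In the toy data, patient p2 is dropped for having a single visit, and medication B is removed because it appears only once among the remaining visits.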

During data analysis, we observed that the number of frequently used antihypertensive medications was relatively limited and the recommendations based on overly broad Anatomical Therapeutic Chemical (ATC) categories often lacked the specificity needed for precise clinical decision-making. Given the systematic medication classification framework of the ATC classification system, this study overcame the limitation of previous studies that only focused on the ATC04 level for recommendation. It not only predicted the medication categories at the ATC04 level (such as C02A, anti-adrenergic medications with central effects) but also further refined to the specific medication types at the ATC05 level (such as C02AA, reserpine-like medications).

After data extraction, we carried out meticulous preprocessing. We normalized features to standardize the scale of different variables, which prevented certain features from dominating others during model training. Moreover, we selectively chose variables most relevant to hypertension medications. Finally, the preprocessed dataset was partitioned into training, validation, and testing subsets according to a ratio of 2/3:1/6:1/6.

Detailed information about the dataset used in the experiments and examples from patients are shown in Table 1 and Table 2.

Table 1. Detailed information for the experimental datasets, by Anatomical Therapeutic Chemical (ATC)–level encoding.

| Item | MIMIC-III^a, ATC04 | MIMIC-III^a, ATC05 | MIMIC-IV, ATC04 | MIMIC-IV, ATC05 |
| --- | --- | --- | --- | --- |
| Patients, n | 3115 | 3109 | 19,609 | 19,609 |
| Visits, n | 7308 | 7263 | 55,239 | 55,236 |
| Diagnoses, n | 1966 | 1965 | 2000 | 2000 |
| Procedures, n | 1145 | 1145 | 5488 | 5488 |
| Medications, n | 14 | 18 | 14 | 18 |
| Number of visits, mean | 2.3460 | 2.3361 | 2.8169 | 2.8169 |
| Number of diagnoses, mean | 10.8927 | 10.9262 | 9.3666 | 9.3666 |
| Number of procedures, mean | 4.0463 | 4.0609 | 2.5492 | 2.5492 |
| Number of medications, mean | 1.5508 | 1.4400 | 1.0804 | 1.1352 |

aMIMIC: Medical Information Mart for Intensive Care.

Table 2. Samples from electronic health records.

| Sub_ID | Hadm_ID | Diagnoses (Anatomical Therapeutic Chemical code) | Procedures | Medications |
| --- | --- | --- | --- | --- |
| 10001217 | 24597018 | 3240, 3484, 3485, 5180, 340, 04109, 3051, 4019, V168, V161 | 139, 331, 3897 | HydrALAzine, LeVETiracetam, Vancomycin, Bisacodyl, Meropenem, … |
| 10001217 | 27703517 | 3240, 3485, 340, 04102, 04184, 4019, 3051 | 139 | HydrALAzine, Vancomycin, Meropenem, Bisacodyl, Lidocaine, … |

Evaluation Metrics

To validate the effectiveness of CSRec, the evaluation metrics described in the following sections were used.

Jaccard Similarity Coefficient

A higher Jaccard coefficient reflects greater overlap between the predicted medication set and the actual medication set.

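$\mathrm{Jaccard} = \dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\left|y^{t} \cap \hat{y}^{t}\right|}{\left|y^{t} \cup \hat{y}^{t}\right|}$ (16)
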
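
A minimal sketch of the visit-averaged Jaccard coefficient in equation (16) follows. Medication sets are represented as Python sets, the function name is hypothetical, and scoring a visit with two empty sets as 0 is an assumption.

```python
def jaccard(true_sets, predicted_sets):
    """Average over visits of |intersection| / |union|."""
    scores = []
    for y, y_hat in zip(true_sets, predicted_sets):
        union = y | y_hat
        scores.append(len(y & y_hat) / len(union) if union else 0.0)
    return sum(scores) / len(scores)


# Two shared medications out of four distinct ones gives 0.5.
example_jaccard = jaccard([{1, 2, 3}], [{2, 3, 4}])
```
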
Area Under the Curve of Precision-Recall

A high area under the curve of precision-recall (PRAUC) indicates that the model recommends appropriate medications while keeping a low error rate.

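$\Delta \mathrm{Recall}(i)^{t} = \mathrm{Recall}(i)^{t} - \mathrm{Recall}(i-1)^{t}$ (17)
$\mathrm{PRAUC} = \dfrac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{|M|} \mathrm{Precision}(i)^{t} \, \Delta \mathrm{Recall}(i)^{t}$ (18)
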
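
The step-wise sum in equations (17) and (18) can be sketched as below. The sketch assumes that the index $i$ runs over medications ranked by predicted probability in descending order, which is a common reading but is not stated explicitly in the text; the function names are hypothetical.

```python
def pr_auc_single(y_true, y_score):
    """Sum of precision(i) * (recall(i) - recall(i-1)) over the ranked list."""
    order = sorted(range(len(y_score)), key=lambda k: y_score[k], reverse=True)
    positives = sum(y_true)
    if positives == 0:
        return 0.0  # assumption: a visit without true medications scores 0
    hits, area, previous_recall = 0, 0.0, 0.0
    for rank, k in enumerate(order, start=1):
        hits += y_true[k]
        precision = hits / rank
        recall = hits / positives
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area


def pr_auc(targets, scores):
    """Average the per-visit area over all visits."""
    return sum(pr_auc_single(y, s) for y, s in zip(targets, scores)) / len(targets)


# The second-ranked medication is a false positive, so the area is below 1.
example_prauc = pr_auc([[1, 0, 1]], [[0.9, 0.8, 0.7]])
```
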
F1-Score

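A higher F1-score indicates a better balance between precision and recall, that is, between avoiding incorrect recommendations (false positives) and avoiding missed medications (false negatives).

$\mathrm{Precision}^{t} = \dfrac{\left|y^{t} \cap \hat{y}^{t}\right|}{\left|\hat{y}^{t}\right|}, \quad \mathrm{Recall}^{t} = \dfrac{\left|y^{t} \cap \hat{y}^{t}\right|}{\left|y^{t}\right|}$ (19)
$\mathrm{F1} = \dfrac{1}{T} \sum_{t=1}^{T} \dfrac{2 \times \mathrm{Precision}^{t} \times \mathrm{Recall}^{t}}{\mathrm{Precision}^{t} + \mathrm{Recall}^{t}}$ (20)
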
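
Equations (19) and (20) can be sketched in the same set-based style. The function name is hypothetical, and returning 0 when a denominator is empty is an assumption.

```python
def f1_score(true_sets, predicted_sets):
    """Average over visits of the harmonic mean of precision and recall."""
    scores = []
    for y, y_hat in zip(true_sets, predicted_sets):
        overlap = len(y & y_hat)
        precision = overlap / len(y_hat) if y_hat else 0.0
        recall = overlap / len(y) if y else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Precision and recall are both 2/3 here, so the F1-score is also 2/3.
example_f1 = f1_score([{1, 2, 3}], [{2, 3, 4}])
```
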
DDI Rate

A lower DDI rate ensures that the recommended combination of medications is safer in clinical practice.

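$\mathrm{DDI} = \dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\sum_{i=1}^{|\hat{y}^{t}|} \sum_{j=i+1}^{|\hat{y}^{t}|} \mathbb{1}\left\{A_{ddi}\left[\hat{y}_i^{t}, \hat{y}_j^{t}\right]\right\}}{\sum_{i=1}^{|\hat{y}^{t}|} \sum_{j=i+1}^{|\hat{y}^{t}|} 1}$ (21)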
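
The DDI rate in equation (21) is the share of recommended medication pairs that are flagged as interacting, averaged over visits. A minimal sketch follows; the function name is hypothetical, medications are assumed to be integer indices into a symmetric 0/1 DDI adjacency matrix, and visits with fewer than 2 recommended medications are assumed to contribute a rate of 0.

```python
from itertools import combinations


def ddi_rate(predicted_sets, ddi_adjacency):
    """Average over visits of (interacting pairs) / (all recommended pairs)."""
    rates = []
    for medications in predicted_sets:
        pairs = list(combinations(sorted(medications), 2))
        if not pairs:
            rates.append(0.0)  # assumption: no pairs means no interactions
            continue
        harmful = sum(1 for i, j in pairs if ddi_adjacency[i][j] == 1)
        rates.append(harmful / len(pairs))
    return sum(rates) / len(rates)


# Medications 0 and 1 interact; medication 2 interacts with nothing.
toy_ddi = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
example_ddi = ddi_rate([{0, 1, 2}], toy_ddi)
```

In the toy example, one of the three recommended pairs is flagged, giving a rate of 1/3.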

Comparative Models

The following baseline models were selected for comparison with the proposed CSRec:

  • Logistic regression (LR) uses L2 regularization.
  • Ensemble of classifier chains (ECC) [28] enhances predictions by connecting multiple classifiers, typically applied to multilabel classification scenarios for performance optimization.
  • RETAIN [18] implements a bidirectional temporal attention architecture designed for sequential clinical prediction tasks such as treatment recommendation.
  • Learn to Prescribe (LEAP) [29] uses RNNs to extract meaningful representations during current medical visits and generate medication sequences.
  • Dual memory neural computer (DMNC) [20] incorporates dual-memory neural components to model asynchronous therapeutic pattern interactions.
  • GAMENet [11] adopts a graph-based memory architecture that combines medication interaction knowledge with queries over longitudinal EHR data for medication retrieval.
  • MICRON [21] analyzes EHR temporal dynamics to adaptively optimize medication combinations as symptoms evolve.
  • SafeDrug [16] evaluates molecular-level medication-patient compatibility to suggest safer therapeutic regimens.
  • COGNet [22] implements a copy-or-predict strategy that integrates historically effective prescriptions into current recommendations.
  • MoleRec [14] predicts medication mechanisms and interactions via molecular-patient relationships for personalized, safer recommendations.
  • CausalMed [9] identifies medical entity causal relationships via causal discovery, accounting for dynamic health condition differences to generate causally linked recommendations.

Performance Comparison

In this section, we conducted an extensive comparative analysis between CSRec and the previously described baseline medication recommendation models to evaluate its effectiveness. Experimental results on the MIMIC-III v1.4 and MIMIC-IV v2.2 datasets are shown in Table 3 (“↑” indicates a preference for larger values, “↓” indicates a preference for smaller values).

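Table 3. The comparison of experimental results on datasets of Anatomical Therapeutic Chemical (ATC)04 and ATC05 codes.

| Model | Jaccard similarity coefficient↑^a, ATC04 | Jaccard similarity coefficient↑^a, ATC05 | PRAUC↑^b, ATC04 | PRAUC↑^b, ATC05 | F1-score↑, ATC04 | F1-score↑, ATC05 | DDI↓^c,d, ATC04 | DDI↓^c,d, ATC05 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MIMIC-III^e | | | | | | | | |
| LR^f | 0.5203 | 0.4919 | 0.8117 | 0.7726 | 0.6347 | 0.5972 | 0.2951 | 0.3569 |
| ECC^g | 0.5438 | 0.5056 | 0.8278 | 0.7846 | 0.6593 | 0.6127 | 0.3012 | 0.3979 |
| RETAIN^h | 0.5540 | 0.5210 | 0.8143 | 0.7874 | 0.6783 | 0.6449 | 0.3921 | 0.5068 |
| LEAP^i | 0.5228 | 0.5082 | 0.6346 | 0.6162 | 0.6405 | 0.6217 | 0.2627 | 0.3648 |
| DMNC^j | 0.5342 | 0.5097 | 0.8225 | 0.7696 | 0.6463 | 0.6277 | 0.2365 | 0.2615 |
| GAMENet^k | 0.5497 | 0.5169 | 0.8239 | 0.7861 | 0.6710 | 0.6308 | 0.2366 | 0.4199 |
| MICRON | 0.5352 | 0.5146 | 0.8285 | 0.7884 | 0.6423 | 0.6145 | 0.2357 | 0.2658 |
| SafeDrug | 0.5474 | 0.5063 | 0.8252 | 0.7764 | 0.6484 | 0.6146 | 0.2329^l | 0.1975^l |
| COGNet | 0.5325 | 0.5075 | 0.7972 | 0.7762 | 0.6448 | 0.6167 | 0.2902 | 0.3587 |
| MoleRec | 0.5501 | 0.5189 | 0.8261 | 0.7825 | 0.6716 | 0.6299 | 0.2709 | 0.3491 |
| CausalMed | 0.5626 | 0.5340 | 0.8226 | 0.7877 | 0.6741 | 0.6394 | 0.2340 | 0.2679 |
| CSRec | 0.5801^l | 0.5534^l | 0.8356^l | 0.8123^l | 0.6895^l | 0.6602^l | 0.2351 | 0.2663 |
| MIMIC-IV | | | | | | | | |
| LR | 0.4248 | 0.3518 | 0.7426 | 0.6873 | 0.5125 | 0.4315 | 0.2821 | 0.2734 |
| ECC | 0.4389 | 0.3779 | 0.7498 | 0.6923 | 0.5305 | 0.4624 | 0.3064 | 0.3047 |
| RETAIN | 0.5349 | 0.5045 | 0.7964 | 0.7641 | 0.6356 | 0.6008 | 0.3431 | 0.3197 |
| LEAP | 0.4186 | 0.3884 | 0.5047 | 0.4683 | 0.5326 | 0.4932 | 0.0104^l | 0.0506^l |
| DMNC | 0.5139 | 0.4787 | 0.7678 | 0.7277 | 0.5392 | 0.5094 | 0.2801 | 0.2792 |
| GAMENet | 0.5151 | 0.4738 | 0.7781 | 0.7326 | 0.6069 | 0.5662 | 0.2884 | 0.3050 |
| MICRON | 0.4297 | 0.3716 | 0.7543 | 0.7056 | 0.5151 | 0.4431 | 0.2866 | 0.2557 |
| SafeDrug | 0.5173 | 0.4767 | 0.7770 | 0.7313 | 0.6109 | 0.5648 | 0.3084 | 0.3255 |
| COGNet | 0.5087 | 0.4568 | 0.7448 | 0.7009 | 0.5948 | 0.5502 | 0.3261 | 0.3835 |
| MoleRec | 0.5275 | 0.4838 | 0.7624 | 0.7267 | 0.6262 | 0.5801 | 0.2733 | 0.2993 |
| CausalMed | 0.5440 | 0.5225 | 0.7812 | 0.7509 | 0.6412 | 0.6099 | 0.2677 | 0.2855 |
| CSRec | 0.5582^l | 0.5307^l | 0.8069^l | 0.7655^l | 0.6483^l | 0.6195^l | 0.2760 | 0.2931 |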

a↑ indicates a preference for larger values.

bPRAUC: area under the curve of precision-recall.

cDDI: drug-drug interaction.

d↓ indicates a preference for smaller values.

eMIMIC: Medical Information Mart for Intensive Care.

fLR: logistic regression.

gECC: ensemble of classifier chains.

hRETAIN: Reverse Time Attention model.

iLEAP: Learn to Prescribe.

jDMNC: dual memory neural computer.

kGAMENet: Graph-Augmented Memory Networks.

lOptimal data.

After detailed analysis, we observed several key findings, as described in the following paragraphs.

CSRec achieved the highest Jaccard, PRAUC, and F1-score values on both datasets for both ATC04 and ATC05 codes (Table 3), demonstrating its effectiveness in hypertension medication recommendation.

LR and ECC performed poorly, primarily because these methods only focus on the patient’s current clinical condition, neglecting the influence of historical medical data on present treatment decisions. In contrast, the model presented in this paper, along with RETAIN, GAMENet, and other models integrating longitudinal medical histories, performed better, highlighting the importance of capturing historical medical information in hypertension medication recommendations.

In comparison with longitudinal models such as RETAIN, GAMENet, and COGNet, CSRec maintained superior performance, with a particularly notable enhancement in Jaccard scores. This superiority can be attributed to 2 core architectural advancements. The Temporal Selectivity Module enables accurate simulation of patient disease progression by assigning enhanced weights to clinically critical historical information, thereby ensuring that recommendations adhere to consistent clinical reasoning frameworks. Meanwhile, the Heterogeneous Collaborative Module facilitates effective capture of synergistic relationships among heterogeneous medical entities (diagnoses, procedures, medications), enabling the identification of medication combinations that are clinically coherent and reflective of real-world coprescription patterns. Collectively, these innovations augment the degree of overlap between recommended and actually prescribed medications, as empirically validated by the elevated Jaccard index.
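To make the link between set overlap and the reported scores concrete, the following is a minimal sketch of how the Jaccard similarity coefficient and the F1-score can be computed for a single visit. The function names and the medication sets are hypothetical and are not taken from the CSRec code; we assume the common practice of computing each score per visit and then averaging across visits.

```python
def jaccard(recommended: set, prescribed: set) -> float:
    """Intersection over union of two medication sets (0.0 if both are empty)."""
    union = recommended | prescribed
    if not union:
        return 0.0
    return len(recommended & prescribed) / len(union)


def f1_score(recommended: set, prescribed: set) -> float:
    """Harmonic mean of precision and recall for one visit."""
    hits = len(recommended & prescribed)
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(prescribed)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ATC04 codes for one visit (illustrative only, not study data).
prescribed = {"C07A", "C08C", "C09A"}
recommended = {"C07A", "C08C", "C03A"}

print(jaccard(recommended, prescribed))   # 2 shared codes out of 4 distinct codes
print(f1_score(recommended, prescribed))  # precision and recall are both 2/3
```

In this toy visit, 2 of the 3 recommended medications match the prescription, so any mechanism that increases the overlap raises both scores.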

Although CSRec does not achieve the lowest DDI rate on either dataset, its DDI rate remains competitive with those of the baseline models. On MIMIC-IV, LEAP obtained the lowest DDI rates, mainly because it considers only the patient's present health status and therefore recommends fewer medications.
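The relationship between the number of recommended medications and the DDI rate can be illustrated with a short sketch. The exact DDI formula is not restated in this section, so we assume the pairwise definition commonly used in medication recommendation studies: the share of recommended medication pairs that appear in a known interaction list. The function name, medication codes, and interaction list below are hypothetical.

```python
from itertools import combinations


def ddi_rate(recommended: list, interacting_pairs: set) -> float:
    """Fraction of recommended medication pairs that are known to interact."""
    pairs = list(combinations(sorted(set(recommended)), 2))
    if not pairs:  # zero or one medication: no pair can interact
        return 0.0
    flagged = sum(1 for pair in pairs if frozenset(pair) in interacting_pairs)
    return flagged / len(pairs)


# Hypothetical interaction list (placeholder codes, not real DDI data).
known_ddi = {frozenset({"M1", "M2"}), frozenset({"M2", "M4"})}

short_list = ["M1"]
long_list = ["M1", "M2", "M3", "M4"]

print(ddi_rate(short_list, known_ddi))  # no pairs to check
print(ddi_rate(long_list, known_ddi))   # 2 of 6 pairs are flagged
```

Under this assumed definition, a model that recommends very few medications per visit has few or no pairs to evaluate, so a low DDI rate can coexist with low Jaccard and F1-score values.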

Ablation Study

To systematically assess the contribution and validity of each module within CSRec, ablation experiments were conducted on MIMIC-IV by comparing CSRec with its variants:

  • WO_S removes the selection module based on the cyclic mechanism.
  • WO_C removes the collaborative module based on the heterogeneous graph.
  • GAT_GCN replaces the multihead GAT in the collaborative module with a graph convolutional network (GCN).

The results displayed in Figure 3 indicate that the performance of CSRec declined when any module was removed or replaced, suggesting that each component contributes to the model. The observed drops in the evaluation metrics (Jaccard, PRAUC, F1-score, and DDI) can be attributed to the functional role of each module. The collaborative module models the correlations between different medical events within each visit; its removal (WO_C) led to incomplete capture of event associations and reduced recommendation accuracy, as medical entity representations lost contextual relevance. The selective module globally models the patient’s historical medical data; in its absence (WO_S), the model could no longer emphasize critical time points in disease progression, and the metrics declined as the temporal continuity of hypertension development was disrupted. When the multihead GAT was replaced with a GCN (GAT_GCN), the loss of the attention mechanism resulted in insufficient differentiation of important medical entity relationships, reducing recommendation precision compared with the original model, because the GCN’s convolution operations do not dynamically weight heterogeneous graph information.
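The difference between the attention-based and convolution-based aggregation discussed above can be sketched in a few lines. This is a simplified illustration, not the CSRec implementation: a real GAT scores neighbors with learned projections and a real GCN uses degree-normalized weights, whereas here we assume a plain dot-product score and an unweighted mean. All vectors and function names are hypothetical.

```python
import math


def mean_aggregate(neighbors):
    """GCN-style step (simplified): every neighbor contributes equally."""
    dim = len(neighbors[0])
    return [sum(vec[i] for vec in neighbors) / len(neighbors) for i in range(dim)]


def attention_aggregate(target, neighbors):
    """GAT-style step (simplified): softmax over dot-product scores."""
    scores = [sum(t * n for t, n in zip(target, vec)) for vec in neighbors]
    top = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    dim = len(target)
    pooled = [sum(w * vec[i] for w, vec in zip(weights, neighbors)) for i in range(dim)]
    return pooled, weights


# Made-up 2-dimensional embeddings: one medication node and three neighbors.
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

mean_vec = mean_aggregate(neighbors)
att_vec, att_weights = attention_aggregate(target, neighbors)
print(mean_vec)     # each neighbor weighted 1/3
print(att_weights)  # the neighbor most similar to the target gets the largest weight
print(att_vec)
```

In this toy case, the mean treats the single relevant neighbor the same as the two unrelated ones, while the attention weights shift the pooled vector toward the relevant neighbor.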

Figure 3. The ablation study of different components. DDI: drug-drug interaction; GAT_GCN: variant replacing the multihead graph attention network in the collaborative module with a graph convolutional network; Jaccard: Jaccard similarity coefficient; PRAUC: area under the curve of precision-recall; WO_C: variant removing the collaborative module based on the heterogeneous graph; WO_S: variant removing the selection module based on the cyclic mechanism.

Additionally, this study further explored the impact of different types of collaborative medical entity information (diagnosis-medication, procedure-medication, diagnosis-procedure, and medication-medication) on hypertension medication recommendations. The variants were designed as follows:

  • WO_DM removes diagnosis-medication interaction information.
  • WO_PM removes procedure-medication interaction information.
  • WO_DP removes diagnosis-procedure interaction information.
  • WO_MM removes medication co-occurrence and interaction information.

As illustrated in Figure 4, CSRec’s performance decreased when any single type of collaborative medical entity information was removed (WO_DM, WO_PM, or WO_DP), confirming the strong correlation between such information and the medication recommendation task. The size of the decline corresponded to the clinical relevance of each information type. Removing diagnosis-medication interactions (WO_DM) reduced the metrics because diagnostic information directly guides first-line hypertension medication selection. The decline for WO_PM stemmed from the loss of procedure-related medication adjustments (eg, postinterventional anticoagulation needs). For WO_DP, the decline reflected disrupted diagnosis-procedure logical chains that inform therapeutic medication choices.

Notably, removing medication co-occurrence and interaction information (WO_MM) caused a marked deterioration in the DDI rate of the recommended medications. This substantial change is explained by the frequent use of combination therapy in hypertension management. Without medication interaction data, the model cannot avoid contraindicated medication pairs, while the loss of co-occurrence patterns reduces the accuracy of synergistic medication recommendations. This underscores the essential role of medication co-occurrence and DDI information in ensuring the efficacy and safety of hypertension treatment.
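The four entity-level variants can be thought of as the same heterogeneous graph with one relation type withheld. The sketch below shows this idea under stated assumptions: the edge dictionary, node identifiers, and the build_variant helper are hypothetical and only mirror the variant names defined above, not the data structures of the CSRec code.

```python
# Hypothetical edge lists keyed by relation type (placeholder IDs, not study data).
EDGES = {
    "diagnosis-medication": [("d1", "m1"), ("d2", "m2")],
    "procedure-medication": [("p1", "m1")],
    "diagnosis-procedure": [("d1", "p1")],
    "medication-medication": [("m1", "m2")],
}

# Variant names follow the ablation list; each one drops a single relation type.
VARIANTS = {
    "WO_DM": "diagnosis-medication",
    "WO_PM": "procedure-medication",
    "WO_DP": "diagnosis-procedure",
    "WO_MM": "medication-medication",
}


def build_variant(edges: dict, variant: str) -> dict:
    """Return a copy of the graph without the relation type the variant removes."""
    dropped = VARIANTS[variant]
    return {rel: list(pairs) for rel, pairs in edges.items() if rel != dropped}


for name in VARIANTS:
    kept = build_variant(EDGES, name)
    print(name, sorted(kept))
```

Keeping the full graph untouched and deriving each variant from it ensures that every ablation run differs from the complete model in exactly one relation type.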

Figure 4. The ablation study of different medical entities. DDI: drug-drug interaction; Jaccard: Jaccard similarity coefficient; PRAUC: area under the curve of precision-recall; WO_DM: variant removing diagnosis-medication interaction information; WO_DP: variant removing diagnosis-procedure interaction information; WO_MM: variant removing medication co-occurrence and interaction information; WO_PM: variant removing procedure-medication interaction information.

Parameter Sensitivity

To investigate how the number of attention heads in the multihead graph attention mechanism of the heterogeneous collaborative module affected model efficacy, multiple experiments were conducted on the MIMIC-IV dataset. The results for different head counts are presented in Figure 5.

A setting of 3 attention heads (head=3) achieved optimal performance and was therefore adopted throughout the remaining experiments in this study. Moreover, varying the number of attention heads led to only minimal performance differences, highlighting the stability of the proposed model.

Furthermore, to investigate the impact of the number of included hypertension medications on model performance, we conducted a sensitivity analysis by varying the minimum occurrence threshold for medication inclusion from 500 to 3000. The number of medications ranged from 26 (≥500) to 9 (≥3000).

Figure 5. The effect of different numbers of heads in the multihead graph attention network (GAT), as measured using (A) the Jaccard similarity coefficient, (B) area under the curve of precision-recall (PRAUC), (C) F1-score, and (D) drug-drug interaction (DDI) rate.

As shown in Table 4, performance across all 4 metrics (Jaccard, PRAUC, F1-score, and DDI) remained stable, with only marginal fluctuations. The best performance was observed at the threshold of 2000, the setting used in this study. This finding supports our design choice and indicates that the framework remains robust under different medication inclusion criteria, alleviating concerns regarding limited sample size. In addition, a lower threshold introduces lower-frequency medications that may lack sufficient clinical representation, potentially compromising the statistical reliability of training. Thus, the current configuration strikes a reasonable trade-off between medication variety and prediction stability.

Table 4. Sensitivity analysis of the medication frequency threshold.

| Minimum medication occurrence | Medications, n | Jaccard similarity coefficient↑ [a] | PRAUC↑ [b] | F1-score↑ | DDI↓ [c,d] |
| --- | --- | --- | --- | --- | --- |
| ≥500 | 26 | 0.5531 | 0.7983 | 0.6428 | 0.2847 |
| ≥1000 | 20 | 0.5554 | 0.8027 | 0.6456 | 0.2802 |
| ≥1500 | 17 | 0.5576 | 0.8049 | 0.6471 | 0.2781 |
| ≥2000 | 14 | 0.5582 [e] | 0.8069 [e] | 0.6483 [e] | 0.2760 [e] |
| ≥2500 | 12 | 0.5568 | 0.8051 | 0.6475 | 0.2766 |
| ≥3000 | 9 | 0.5560 | 0.8050 | 0.6452 | 0.2764 |

[a] ↑ indicates a preference for larger values.

[b] PRAUC: area under the curve of precision-recall.

[c] DDI: drug-drug interaction.

[d] ↓ indicates a preference for smaller values.

[e] Optimal data.
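The thresholding step behind Table 4 can be sketched as a simple frequency filter. This is an illustration under assumptions rather than the authors' preprocessing code: we assume that occurrences are counted once per visit, and the records, codes, and the select_medications helper are hypothetical, with thresholds scaled down to fit the toy data.

```python
from collections import Counter


def select_medications(visits, min_occurrence):
    """Keep medication codes that appear in at least `min_occurrence` visits."""
    counts = Counter(code for visit in visits for code in set(visit))
    return {code for code, n in counts.items() if n >= min_occurrence}


# Toy prescription records (placeholder codes; thresholds are scaled down).
visits = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A", "C"],
    ["A", "D"],
]

for threshold in (1, 2, 3):
    print(threshold, sorted(select_medications(visits, threshold)))
```

As in Table 4, raising the threshold shrinks the medication vocabulary, which trades label variety for more training examples per retained medication.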

Case Study

To visualize and validate our core innovations, which included heterogeneous collaborative interaction modeling and the temporal selectivity mechanism, we randomly selected a patient from the test set of the MIMIC-IV dataset for a case study. Taking diagnosis and medication entities as examples, the entire learning process is shown in Figure 6.

Figure 6. Case study. DC: diagnosis code; DS: diagnosis selection coefficient; MC: medication code; MS: medication selection coefficient; Rec: recommended medication; True: prescribed medication.

First, the collaborative process (a) showcases the model’s collaborative interaction mechanism. At the visit level, the model dynamically enhanced the integration of diagnostic information into medication recommendation reasoning by assigning context-aware interaction weights to the interplay between diagnostic codes (DC column) and medication codes (MC column). For the first visit of the illustrative patient, the DC column encompasses comorbidity-related codes, including chronic obstructive pulmonary disease (ICD-9: 496), hypothyroidism (ICD-9: 2449), atrial fibrillation (ICD-9: 42731), and a documented history of cardiovascular disease (ICD-9: V1254), while the MC column includes hypertension-targeted agents such as selective β1-receptor blockers (ATC-04: C07A) and dihydropyridine calcium channel blockers (ATC-04: C08C). This is consistent with our aim of capturing disease-specific collaborative patterns: hypertension-related diagnostic codes received consistently higher selection coefficients in each DC column, prioritizing the disease context that is critical for hypertension treatment.

Furthermore, to demonstrate our temporal selectivity mechanism, the case study highlights how the model captures longitudinal EHR dependencies. Given hypertension’s chronic nature, recommended medications depend on both current (third visit) diagnoses (DC column) and historical data. As shown in step (b), diagnostic codes from previous visits received higher selection coefficients, while step (c) reveals that historical medication codes were similarly prioritized. This reflects our model’s ability to dynamically weight temporal information, addressing the evolving nature of hypertension and representing an advantage over static models.
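The role of the selection coefficients in steps (b) and (c) can be illustrated with a small sketch in which each visit receives a normalized weight and the visit representations are pooled accordingly. This is not the CSRec selection module itself, whose internal mechanism is described elsewhere in the paper; we assume a plain softmax over relevance scores, and all vectors, scores, and names are hypothetical.

```python
import math


def select_and_pool(visit_vectors, relevance_scores):
    """Turn raw relevance scores into selection coefficients and pool the visits."""
    top = max(relevance_scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - top) for s in relevance_scores]
    total = sum(exp_scores)
    coefficients = [e / total for e in exp_scores]
    dim = len(visit_vectors[0])
    pooled = [
        sum(c * vec[i] for c, vec in zip(coefficients, visit_vectors))
        for i in range(dim)
    ]
    return coefficients, pooled


# Three visits with made-up 2-dimensional representations and scores.
visit_vectors = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
relevance_scores = [2.0, 1.0, 0.5]  # hypothetical; a trained model would learn these

coefficients, pooled = select_and_pool(visit_vectors, relevance_scores)
print([round(c, 3) for c in coefficients])
print([round(x, 3) for x in pooled])
```

Here an earlier visit receives the largest coefficient, so the pooled patient representation leans toward that visit, which mirrors how historical codes can outweigh current ones in the case study.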


This paper presents the CSRec model for hypertension medication recommendation, which is based on the collaboration and selection of heterogeneous medical entities. By effectively capturing relationships among diverse medical entities and combining the temporal evolution characteristics of clinical entities, the model provides more precise and effective medication guidance for hypertension treatment. Future research will focus on 3 main areas: (1) in-depth exploration of the multilabel imbalance problem existing in the current method, (2) further investigation of the model’s performance in addressing the cold-start problem, and (3) exploration of recommendations regarding medication dosages and medication types.

Acknowledgments

This research was supported by the Industrial Support Program of Colleges and Universities in Gansu Province (number 2025CYZC-010); National Natural Science Foundation of China (number 62163033); Talent Innovation and Entrepreneurship Project of Lanzhou, China (number 2021-RC-49); Natural Science Foundation of Gansu Province, China (numbers 22JR5RA145, 21JR7RA781, and 21JR7RA116); and Major Research Project Incubation Program of Northwest Normal University, China (number NWNU-LKZD2021-06). The funding agencies did not participate in the study design, data gathering and analysis, decision-making regarding publication, or manuscript preparation.

Data Availability

The code for the model is available on GitHub [30].

Conflicts of Interest

None declared.

  1. Hypertension Branch of Chinese Geriatrics Society, Beijing Hypertension Association, National Clinical Research Center of the Geriatric Diseases, Hua Q, Fan L, Wang ZW, Li J. 2023 guideline for the management of hypertension in the elderly population in China. J Geriatr Cardiol. Jun 28, 2024;21(6):589-630.
  2. The Writing Committee of the Report on Cardiovascular Health and Diseases in China. Interpretation of report on cardiovascular health and diseases in China 2022. Chin J Cardiovasc Med. 2023;28(4):297-312.
  3. Wu H, Xie HW. Research on hypertension diagnosis and treatment system based on ontology and CBR. Comput Appl Softw. 2013;30(12):155-159.
  4. Li X, Liang SP, Hou YL, Ma TF. StratMed: relevance stratification between biomedical entities for sparsity on medication recommendation. Knowl Based Syst. Jan 2024;284:111239.
  5. Zhang HJ, Yang X, Bai L, Liang JY. Enhancing drug recommendations via heterogeneous graph representation learning in EHR networks. IEEE Trans Knowl Data Eng. 2024;36(7):3024-3035.
  6. Choi E, Xiao C, Stewart WF, Sun J. MiME: multilevel medical embedding of electronic health records for predictive healthcare. Presented at: 32nd International Conference on Neural Information Processing Systems; Dec 3-8, 2018. URL: https://dl.acm.org/doi/10.5555/3327345.3327366 [Accessed 2025-09-25]
  7. Choi E, Xu Z, Li Y, et al. Learning the graphical structure of electronic health records with graph convolutional transformer. AAAI. 2020;34(1):606-613.
  8. Hosseini A, Chen T, Wu W, et al. HeteroMed: heterogeneous information network for medical diagnosis. Presented at: 27th ACM International Conference on Information and Knowledge Management; Oct 22-26, 2018.
  9. Li X, Liang S, Lei Y, et al. CausalMed: causality-based personalized medication recommendation centered on patient health state. Presented at: CIKM ’24: The 33rd ACM International Conference on Information and Knowledge Management; Oct 21-25, 2024.
  10. Poulain R, Beheshti R. Graph transformers on EHRs: better representation improves downstream performance. Presented at: The Twelfth International Conference on Learning Representations; May 7-11, 2024. URL: https://openreview.net/forum?id=pe0Vdv7rsL [Accessed 2025-10-12]
  11. Shang JY, Xiao C, Ma TF, Li HY, Sun JM. GAMENet: graph augmented memory networks for recommending medication combination. Proc AAAI Conf Artif Intell. 2019;33(1):1126-1133.
  12. Shang JY, Ma TF, Xiao C, Sun JM. Pre-training of graph augmented transformers for medication recommendation. Presented at: Twenty-Eighth International Joint Conference on Artificial Intelligence; Aug 10-16, 2019.
  13. Liang X, Yang JY, Lu GM, Zhang D. CompNet: competitive neural network for palmprint recognition using learnable gabor kernel. IEEE Signal Process Lett. 2021;28:1739-1743.
  14. Yang N, Zeng K, Wu Q, Yan J. MoleRec: combinatorial drug recommendation with substructure-aware molecular representation learning. Presented at: WWW ’23: The ACM Web Conference 2023; Apr 30 to May 4, 2023.
  15. Li XW, Zhang YJ, Li XB, Wei H, Lu MY. DGCL: distance-wise and graph contrastive learning for medication recommendation. J Biomed Inform. Mar 2023;139:104301.
  16. Yang C, Xiao C, Ma F, Glass L, Sun J. SafeDrug: dual molecular graph encoders for recommending effective and safe drug combinations. Presented at: Thirtieth International Joint Conference on Artificial Intelligence; Aug 19-27, 2021.
  17. Liu SC, Wang XL, Xiang Y, Xu H, Wang H, Tang BZ. Multi-channel fusion LSTM for medical event prediction using EHRs. J Biomed Inform. Mar 2022;127:104011.
  18. Choi E, Bahadori MT, Kulas JA, Schuetz A, Stewart WF, Sun JM. RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. Presented at: 30th International Conference on Neural Information Processing Systems (NIPS’16); Dec 5-10, 2016. URL: https://dl.acm.org/doi/10.5555/3157382.3157490 [Accessed 2025-09-25]
  19. An Y, Zhang L, You M, Tian XQ, Jin B, Wei XP. MeSIN: multilevel selective and interactive network for medication recommendation. Knowl Based Syst. Dec 2021;233:107534.
  20. Le H, Tran T, Venkatesh S. Dual memory neural computer for asynchronous two-view sequential learning. Presented at: KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Aug 19-23, 2018.
  21. Yang CQ, Xiao C, Glass L, Sun JM. Change matters: medication change prediction with recurrent residual networks. Presented at: Thirtieth International Joint Conference on Artificial Intelligence; Aug 19-27, 2021.
  22. Wu R, Qiu ZP, Jiang JC, Qi GL, Wu X. Conditional generation net for medication recommendation. Presented at: WWW ’22: The ACM Web Conference 2022; Apr 25-29, 2022.
  23. Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible electronic health record dataset. Sci Data. 2016;3:160035.
  24. Johnson AEW, Bulgarelli L, Shen L, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. Jan 3, 2023;10(1):1.
  25. Zhao D, Shi YL, Cheng L, Li H, Zhang LG, Guo HM. Time interval uncertainty-aware and text-enhanced based disease prediction. J Biomed Inform. Mar 2023;139:104239.
  26. Nguyen MV, Nguyen DT, Trinh QH, Le B. ALGNet: attention light graph memory network for medical recommendation system. Presented at: SOICT 2023: The 12th International Symposium on Information and Communication Technology; Dec 7-8, 2023.
  27. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv. Preprint posted online on Dec 22, 2014.
  28. Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multi-label classification. Mach Learn. 2011;85:333-359.
  29. Zhang YT, Chen R, Tang J, Stewart W, Sun JM. LEAP: learning to prescribe effective and safe treatment combinations for multimorbidity. Presented at: KDD ’17: The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Aug 13-17, 2017.
  30. Zk0814/csrec. GitHub. URL: https://github.com/zk0814/CSRec [Accessed 2025-09-18]

Edited by Ling Luo, Qiao Jin; submitted 19.Mar.2025; peer-reviewed by Hongbin Lu, Qingqing Zhu, Shunpan Liang; final revised version received 27.Aug.2025; accepted 28.Aug.2025; published 25.Nov.2025.

Copyright

© Ke Zhang, Zhichang Zhang, Yali Liang, Wei Wang, Xia Wang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 25.Nov.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.