Abstract
Background: Drug–target interaction (DTI) prediction is crucial in drug repositioning, as it can significantly reduce research and development costs and shorten the development cycle. Most existing deep learning–based approaches employ graph neural networks for DTI prediction. However, these approaches still face limitations in capturing complex biochemical features, integrating multilevel information, and providing interpretable model insights.
Objective: This study proposes a heterogeneous network model based on multiview path aggregation, aiming to predict interactions between drugs and targets.
Methods: This study employed a molecular attention transformer to extract 3D conformation features from the chemical structures of drugs and utilized Prot-T5, a protein-specific large language model, to deeply explore biophysically and functionally relevant features from protein sequences. By integrating drugs, proteins, diseases, and side effects from multisource heterogeneous data, we constructed a heterogeneous graph model to systematically characterize multidimensional associations between biological entities. On this foundation, a meta-path aggregation mechanism was proposed, which dynamically integrates information from both feature views and biological network relationship views. This mechanism effectively learned potential interaction patterns between biological entities and provided a more comprehensive representation of the complex relationships in the heterogeneous graph. It enhanced the model’s ability to capture sophisticated, context-dependent relationships in biological networks. Furthermore, we integrated multiscale features of drugs and proteins within the heterogeneous network, significantly improving the prediction accuracy of DTIs and enhancing the model’s interpretability and generalization ability.
Results: In the DTI prediction task, the proposed model achieves an AUPR (area under the precision-recall curve) of 0.901 and an AUROC (area under the receiver operating characteristic curve) of 0.966, representing improvements of 1.7% and 0.8%, respectively, over the baseline methods. Furthermore, a case study on the KCNH2 target demonstrates that the proposed model successfully predicts 38 out of 53 candidate drugs as having interactions, which further validates its reliability and practicality in real-world scenarios.
Conclusions: The proposed model shows marked superiority over baseline methods, highlighting the importance of integrating heterogeneous information with biological knowledge in DTI prediction.
doi:10.2196/74974
Introduction
Background
Drugs play a crucial role in treating diseases by interacting with multiple targets and modulating their functions. Accurately predicting drug–target interactions (DTIs) is essential for understanding drug mechanisms, discovering new targets, and facilitating drug repositioning. Traditional methods rely heavily on large amounts of labeled data and often suffer from label noise in negative samples. In recent years, positive-unlabeled learning has been effectively applied to alleviate this issue. Meanwhile, self-supervised learning strategies integrated with heterogeneous biomedical networks have improved the accuracy of DTI prediction by leveraging multimodal information. With the rapid advancement of large language models (LLMs), their applications in the biomedical domain have expanded significantly. Owing to their powerful sequence representation and semantic understanding capabilities, LLMs are driving DTI prediction methods toward a higher level of performance.
Currently, DTI prediction methods are primarily categorized into three types: ligand similarity–based methods, network-based methods, and structure-based methods. Ligand similarity–based methods predict DTIs by comparing the structural similarity of drugs. These methods are computationally efficient but often overlook complex biochemical properties and molecular characteristics, which may lead to inaccurate predictions. Network-based methods rely on large amounts of high-quality interaction data and are computationally intensive. They typically lack structural information about drugs and targets, resulting in poor performance on sparse networks. Structure-based methods usually depend on the three-dimensional structures of target proteins and drugs and therefore exhibit limited efficacy for proteins with unknown structures.
To address these issues, we propose a heterogeneous network (HN) model based on multiview path aggregation for DTI (MVPA-DTI) prediction. MVPA-DTI enhances prediction capability through a multiview feature extraction and fusion mechanism. First, the molecular attention Transformer model is used to extract the three-dimensional spatial structure information of drugs, while the protein-specific LLM Prot-T5 is leveraged to deeply explore the biophysical properties of protein sequences, forming two feature views based on structure and sequence. Subsequently, an HN is constructed by incorporating multisource heterogeneous information from proteins, drugs, diseases, and side effects. The feature views are integrated into the message-passing framework of the HN, thereby forming a comprehensive biological network relationship view. By introducing a meta-path aggregation mechanism, the model dynamically synthesizes feature information from multiple perspectives, capturing underlying drug–target associations at multiple hierarchical levels through both the feature views and the biological network relationship view. During the message-passing process, MVPA-DTI optimizes the weight distribution by incorporating both network topology and biological prior knowledge. By extracting node-level feature embeddings from heterogeneous data, the model is capable of accurately predicting new DTIs.
Experimental results show that MVPA-DTI outperforms existing advanced methods across multiple evaluation metrics. Additionally, MVPA-DTI was used for candidate drug screening against KCNH2, a voltage-gated inward-rectifying potassium channel target related to cardiovascular diseases. Among 53 candidate drugs, 38 are predicted to interact with this target (10 of which are already used in clinical treatment). This finding not only validates the effectiveness of MVPA-DTI in predicting DTIs but also demonstrates its application potential and practical value in real drug development. The main contributions of this paper are as follows:
- We employ a molecular attention Transformer network that extracts three-dimensional structural information from molecular graphs through a physics-informed attention mechanism, establishing a structural feature perspective of the drug.
- The sequence-level features are extracted from protein sequences using Prot-T5, a protein-specific LLM. The model maps sequence features to functional relevance, establishing a sequence feature perspective of proteins.
- We integrate the drug structural view and protein sequence view into a multi-entity HN to construct a biological network relationship view. The meta-path information aggregation mechanism captures higher-order interaction patterns among different types of nodes.
- We propose a multiview path aggregation–enhanced HN model that integrates protein and drug biological knowledge to accurately identify critical drug–target relationships. Benchmark tests show that the model exhibits a significant advantage in prediction performance.
Related Work
DTI prediction plays a significant role in drug discovery and drug repositioning. With the rapid advancement of deep learning technologies, an increasing number of deep learning–based DTI prediction methods have emerged. These methods, particularly those combining big data with deep learning, are gradually overcoming the limitations of traditional approaches. Based on their prediction strategies, DTI prediction methods are generally categorized into three types: ligand similarity–based methods, network-based methods, and structure-based methods. In recent years, prediction methods based on LLMs have also brought remarkable breakthroughs to the DTI field, opening new pathways for its development.
Ligand Similarity–Based Methods
In DTI prediction, ligand similarity–based methods estimate potential interactions by comparing the structural or chemical similarity of drug molecules. These methods typically rely on the SMILES (simplified molecular input line entry system) representation or molecular fingerprints of drugs to calculate drug–drug similarities and infer potential target interactions from them. For instance, Thafar et al proposed the DTiGEMS model, which integrates drug–drug similarities and employs a similarity selection and fusion algorithm to enhance the accuracy of DTI predictions. Additionally, Shim et al introduced a similarity-based convolutional neural network method. This method calculates the outer product of the similarity matrices of drugs and targets and employs a two-dimensional convolutional neural network to capture potential relationships between them. These methods have improved DTI prediction accuracy to some extent, but they often overlook the dynamic interactions and complex spatial structures between molecules. Moreover, ligand similarity–based methods operate under the assumption that similar drug molecules share similar targets. However, this assumption does not always hold, especially when dealing with molecules that exhibit significant structural differences.
Network-Based Methods
Network-based methods typically rely on large amounts of known DTI data and utilize graph algorithms for modeling. In these methods, drugs and targets are represented as nodes in a graph, while the interactions between them are represented as edges. Additionally, drug–drug similarity networks, protein–protein similarity networks, and known DTI networks are often integrated into a HN. Examples of such approaches include DTI networks, multisimilarity collaborative matrix factorization, and HN models. On this basis, researchers used graph structures to explore potential interaction patterns and promoted the development of tasks such as drug repositioning.
In recent years, with the development of graph representation learning, network-based methods have achieved deep fusion modeling of structural and semantic information. For example, Zhao et al proposed a regulation-aware graph learning method that enhances drug representation by integrating gene regulation information, thereby improving drug repositioning performance. Graph neural network (GNN) methods mostly rely on the direct adjacency structure of nodes and therefore easily ignore the rich high-order semantic information and cross-modal associations in biological networks. To address this, some studies have proposed information fusion methods that combine low-order and high-order graph structures. For example, Zhao et al jointly modeled direct neighbor relationships and high-order network path features to enhance the discriminability of drug and target representations. To model the heterogeneous relationships among multiple entities such as drugs, proteins, and diseases more comprehensively, Zhao et al also constructed a heterogeneous information network using semantic paths and attention mechanisms to integrate and discriminate multimodal information, thereby improving the accuracy and generalization ability of DTI prediction. iGRLDTI introduced edge weight regulation mechanisms and regularization terms into the GNN, further improving the model's ability to learn potential interactions in heterogeneous biological networks.
Structure-Based Methods
Traditional structure-based methods typically use molecular docking techniques, combined with molecular dynamics simulations, to predict the binding patterns between drugs and targets. With the development of deep learning methods, an increasing number of studies have integrated structural information with neural networks. For example, DeepDTA and DeepDrug3D have enhanced the modeling capability of the drug–target binding process by incorporating three-dimensional structural information of proteins and drugs, further improving prediction accuracy. Although deep learning models can handle structural information, they generally require large amounts of labeled data for effective training, and obtaining high-quality drug–target binding data remains a challenge.
The application of language models is particularly crucial in the absence of molecular structural information. Drawing on techniques from the field of natural language processing (NLP), pretrained language models (such as BERT and MolBERT) are used to deeply characterize drug and protein sequence data. By automatically learning latent features and semantic information within sequences, these models enhance the accuracy of DTI predictions. A key advantage of LLMs is their ability to process large-scale, unlabeled data and improve model generalization through transfer learning. For instance, MolBERT and ChemBERTa achieve high-quality molecular representations by pretraining on massive drug molecular datasets. Transformer-based protein language models, such as ProtBERT, TAPE, and ProtT5, are pretrained on large protein sequence databases. This enables them to capture hierarchical and context-rich sequence representations, enhancing the understanding of protein characteristics. These models can efficiently and accurately characterize proteins using only sequence data, without relying on three-dimensional structural information.
Methods
Framework
MVPA-DTI combined multiple types of heterogeneous data. It extracted structural and sequential features from drug molecules and protein sequences and optimized the weight distribution for information propagation in a heterogeneous graph neural network. As illustrated in the accompanying figure, the process consisted of four steps: (a) an HN was constructed, incorporating drugs, proteins, diseases, and drug side effects; (b) a molecular attention Transformer network was used to extract 3D structural features from the SMILES representations of drugs, which were then assigned to drug nodes; (c) the Prot-T5 model was employed to extract key biophysical features from protein sequences, which were subsequently assigned to target nodes in the graph; and (d) a Heterogeneous Graph Attention Network (HAN) was used to analyze network topology, unify node embeddings, determine node importance using meta-paths, and integrate node features through semantic attention. Ultimately, MVPA-DTI predicted potential DTIs by optimizing the drug–protein reconstruction loss and the cross-entropy loss.
Feature View
Feature Extraction of Compound Structures
The MVPA-DTI model utilized SMILES sequences as input for drug molecules to learn effective representations of their structural features. Since SMILES sequences shared structural similarities with natural language, contextual information can be effectively leveraged to analyze molecular features. However, traditional methods often struggled to capture long-range atomic relationships when processing molecules. To overcome this limitation, this study employed an improved Transformer architecture.
First, we employed the RDKit toolkit to convert SMILES sequences into molecular graph structures. Since the lengths of SMILES sequences vary across molecules, a maximum length of 100 was selected to construct reasonable molecular representations. This setting covered at least 90% of the compounds in the dataset. Sequences exceeding this length were truncated, while shorter sequences were padded with zeros to maintain consistent input formatting.
For feature extraction, we employed the Molecule Attention Transformer (MAT) for encoding. Its core idea is to replace the self-attention layer of the traditional Transformer with an enhanced molecular self-attention mechanism. By integrating adjacency information from the molecular graph and interatomic distance information, MAT allowed the attention distribution to capture internal molecular relationships more precisely and thus to represent drug structures more comprehensively. The calculation formula for molecular multihead self-attention is as follows:

$$\mathcal{A}^{(i)} = \left( \lambda_a \, \mathrm{softmax}\!\left( \frac{Q_i K_i^{\top}}{\sqrt{d_k}} \right) + \lambda_d \, g(D) + \lambda_g \, A \right) V_i \tag{1}$$

where $A$ represented the adjacency matrix of the molecular graph, $D$ denoted the distances between atoms, $g(\cdot)$ is a function applied to the distance matrix, and $d_k$ is the key dimension. The query, key, and value matrices of head $i$ are defined as $Q_i = XW_i^{Q}$, $K_i = XW_i^{K}$, and $V_i = XW_i^{V}$, respectively, where $X$ denotes the input atom features and $W$ represented learnable parameters. In the attention calculation, $\lambda_d$, $\lambda_a$, and $\lambda_g$ corresponded to the weights of different attention components, with specific meanings as follows: $\lambda_d$ measured the importance of interatomic distances in the attention mechanism; $\lambda_a$ adjusted the influence of self-attention on the overall attention mechanism; and $\lambda_g$ controlled the role of the adjacency matrix in the attention computation.
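The way the three attention components are mixed can be illustrated with a minimal single-head sketch in plain Python. It is not the authors' implementation: the function names, the toy inputs, the illustrative weight values, and the choice of a row-wise softmax over negative distances for the distance term are all assumptions made for this example.

```python
import math

def softmax_rows(m):
    """Row-wise softmax of a list-of-lists matrix."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def molecule_attention(q, k, v, dist, adj, lam_a=0.5, lam_d=0.25, lam_g=0.25):
    """Mix scaled dot-product attention with distance and adjacency terms.

    The weights lam_a, lam_d, lam_g are illustrative values, not tuned ones.
    """
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    self_attn = softmax_rows(scores)
    # Assumed distance function: closer atoms receive larger weights.
    dist_attn = softmax_rows([[-x for x in row] for row in dist])
    n = len(q)
    mixed = [[lam_a * self_attn[i][j] + lam_d * dist_attn[i][j] + lam_g * adj[i][j]
              for j in range(n)] for i in range(n)]
    return matmul(mixed, v), mixed

# Toy molecule with three atoms in a chain (0-1-2) and 2-dimensional features.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dist = [[0.0, 1.5, 2.4], [1.5, 0.0, 1.5], [2.4, 1.5, 0.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
out, weights = molecule_attention(x, x, x, dist, adj)
print(len(out), len(out[0]))  # 3 2
```

Setting `lam_a=1.0` and the other two weights to zero recovers ordinary scaled dot-product attention, which makes the contribution of the distance and adjacency terms easy to inspect in isolation.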
Feature Extraction of Protein Structures
The MVPA-DTI model employed Prot-T5 to extract features from protein sequences. By treating sequences as natural language, it used Transformer-based self-supervised learning to capture biological insights. During the data preprocessing stage, protein sequences are treated as text data composed of 20 standard amino acids (along with a few unknown amino acids, such as X). To adapt to the input format of Natural Language Processing tasks, each protein sequence is treated as a token sequence of individual amino acids, with spaces inserted between each amino acid to ensure the Transformer can correctly parse the sequence information. Additionally, to enhance the model’s generalization capability, all uncommon or unresolved amino acid symbols (eg, B, O, U, Z) are mapped to the universal token X, ensuring data consistency.
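The preprocessing described above can be sketched in a few lines of Python. The function name `preprocess_protein` and the example sequence are hypothetical; the sketch only covers the residue mapping and space insertion, not tokenization or the Prot-T5 model itself.

```python
import re

def preprocess_protein(seq: str) -> str:
    """Map uncommon residues (B, O, U, Z) to X and space-separate the amino acids."""
    seq = seq.strip().upper()
    seq = re.sub(r"[BOUZ]", "X", seq)
    return " ".join(seq)

print(preprocess_protein("MKTUZAB"))  # M K T X X A X
```

Separating residues with spaces lets a word-level tokenizer treat each amino acid as one token, which is why this step precedes the encoder.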
Prot-T5 is based on the Text-to-Text Transfer Transformer (T5) architecture, which employs an encoder–decoder structure. For protein feature extraction, however, only the encoder part of T5 is utilized to map protein sequences into a high-dimensional feature space. Prot-T5 adopted the span masking language modeling strategy: during training, continuous segments of amino acids in the sequence are randomly masked, and the model is tasked with reconstructing the masked portions from the context. This enabled the model to learn long-range dependencies and semantic information within protein sequences. As shown in the accompanying figure, given a protein sequence $S = (s_1, s_2, \ldots, s_n)$, it is first embedded into a high-dimensional space and then fed into the Transformer to compute the hidden representations:

$$H = \mathrm{Transformer}(\mathrm{Embed}(S)) \tag{2}$$

Finally, the global representation of the entire sequence is obtained through average pooling, as calculated by the following formula:

$$x_p = \frac{1}{n} \sum_{i=1}^{n} H_i \tag{3}$$

where $H = (H_1, \ldots, H_n)$ represented the hidden states output by the Transformer and $x_p$ served as the global feature of the protein.
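The average pooling step can be illustrated with a short sketch. The function name and the padding mask are assumptions: the text only states that average pooling is used, and excluding padded positions is a common practice added here for illustration.

```python
def mean_pool(hidden, mask):
    """Average the hidden vectors of real (mask == 1) positions only."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, mask):
        if keep:
            count += 1
            for d in range(dim):
                total[d] += vec[d]
    if count == 0:
        raise ValueError("sequence has no unmasked residues")
    return [t / count for t in total]

# Three positions, the last one being padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(hidden, mask))  # [2.0, 3.0]
```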
Relational View of Biological Networks
Heterogeneous Networks
The HN is described as an undirected graph $G = (V, E)$, where $V$ represented the set of nodes and $E$ denoted the set of edge types, with each $e \in E$ representing a specific type of edge.
As shown in the accompanying figure, this paper constructed a comprehensive heterogeneous information network, which included drug–drug interactions, drug–protein interactions, drug–disease associations, drug–side effect associations, protein–protein interactions, and protein–disease associations. In the network, entities such as drugs, targets, diseases, and side effects are represented as nodes, while the relationships and interactions between nodes are represented as edges. Each node belonged to only one type of entity, and all edges are undirected and have nonnegative weights.
The message-passing process in the HN is divided into multiple stages. First, nodes transmitted their information to first-order neighbors. This information is then propagated to higher-order neighbors through network edges. Message passing is therefore not limited to directly connected nodes but also includes information transmitted through multihop paths. This enabled the model to integrate features across a broader neighborhood, thereby better capturing long-range dependencies and multilevel network structures.
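A multihop path can be made concrete with a small sketch that finds, for each drug, the drugs reachable through a shared protein (a two-hop drug–protein–drug path). The function name and the toy drug–protein adjacency matrix are hypothetical and serve only to illustrate the idea.

```python
def metapath_neighbors(adj_dp):
    """For each drug, list the other drugs that share at least one protein with it."""
    targets = [{p for p, v in enumerate(row) if v} for row in adj_dp]
    neighbors = {}
    for i, targets_i in enumerate(targets):
        neighbors[i] = [j for j, targets_j in enumerate(targets)
                        if j != i and targets_i & targets_j]
    return neighbors

adj_dp = [
    [1, 0, 0],  # drug 0 binds protein 0
    [1, 1, 0],  # drug 1 binds proteins 0 and 1
    [0, 0, 1],  # drug 2 binds protein 2
]
print(metapath_neighbors(adj_dp))  # {0: [1], 1: [0], 2: []}
```

Drug 2 has no two-hop neighbors because it shares no protein with the other drugs, which is the kind of sparsity that longer paths through diseases or side effects can help bridge.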
Metapath-Based Entity Information Aggregation
In heterogeneous graph representation learning, our goal is to learn effective feature representations for each node in the HN. The challenge of this task lies not only in integrating information from different types of nodes and edges in the heterogeneous graph but also in considering the heterogeneous features and content of each node. To address this issue, we employed HAN as the topological feature extraction method for the heterogeneous information network. HAN modeled the structural relationships in the HN through a hierarchical attention mechanism. Specifically, node-level attention learned meta-path-based neighborhood weights and aggregated them to generate semantically specific node embeddings, while semantic-level attention assigned weights across different meta-paths to obtain the optimal task-specific representation, as illustrated in the accompanying figure.
In terms of meta-path selection, we selected two biologically meaningful paths: D-P-D (Drug–Protein–Drug) and P-D-P-D (Protein–Disease–Protein–Drug). D-P-D is used to capture the semantic relationship between drugs connected by common targets, which helped the model identify drug pairs with similar functions or related potential mechanisms of action, while P-D-P-D further explored the multihop indirect connections between proteins formed by disease associations, thereby reflecting a more complex biological network structure. We selected these two meta-paths for their semantic interpretability and biological relevance, which help the model capture the potential interactions between drugs and targets more effectively.
After obtaining the node embeddings, we proceeded with the message-passing process. Assuming the initial node embeddings are defined as $H^{(0)} = \{h_1^{(0)}, \ldots, h_N^{(0)}\}$, where $h_i^{(0)} \in \mathbb{R}^{d}$ represented the mapping of node $i$ in the $d$-dimensional space, the information aggregation process for node $i$ can be expressed as:

$$h_i^{(l+1)} = \sigma\!\left( \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{(l)} W^{(l)} h_j^{(l)} \right), \quad l = 0, \ldots, L-1 \tag{4}$$

$$\alpha_{ij}^{(l)} = \frac{\exp\!\left( \phi\!\left( \mathbf{a}^{\top} \left[ W^{(l)} h_i^{(l)} \,\Vert\, W^{(l)} h_j^{(l)} \right] \right) \right)}{\sum_{k \in \mathcal{N}_i} \exp\!\left( \phi\!\left( \mathbf{a}^{\top} \left[ W^{(l)} h_i^{(l)} \,\Vert\, W^{(l)} h_k^{(l)} \right] \right) \right)} \tag{5}$$

where $\sigma$ represents a nonlinear activation function, $L$ denotes the number of attention layers, $\mathcal{N}_i$ represents the set of neighboring nodes of node $i$, $W^{(l)}$ is the shared weight parameter, $\alpha_{ij}^{(l)}$ is the edge weight computed by the attention mechanism, $\mathbf{a}$ is a learnable attention vector, $\Vert$ denotes concatenation, and $\phi$ is an adaptive activation function.
Based on the graph structure representation, we further integrated the structural information of each node to construct the final drug and protein representations. For a drug node, the representation integrates the meta-path structure features, the original embedding representation, and the molecular structure features, which is expressed as:
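$$\tilde{h}_d = W_d \left[ z_d \,\Vert\, e_d \,\Vert\, x_d \right] \tag{6}$$

Similarly, the protein node representation combined its meta-path structure features, original embeddings, and sequence representations and is expressed as:

$$\tilde{h}_p = W_p \left[ z_p \,\Vert\, e_p \,\Vert\, x_p \right] \tag{7}$$

Here, $z$, $e$, and $x$ denote the meta-path structure features, the original embedding, and the molecular structure (drug) or sequence (protein) features, respectively, and $\Vert$ denotes concatenation. $W_d$ and $W_p$ are learnable linear transformation parameters, which map the concatenated features to a common low-dimensional semantic space and implicitly model the importance weights of each feature subspace during training to improve the discriminative ability of the final representation.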
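The two operations described in this subsection, attention-weighted neighbor aggregation and fusion of several feature views through a linear map, can be sketched as follows. This is a simplified illustration under stated assumptions: raw attention scores are passed in directly rather than computed from learned parameters, and the function names, toy vectors, and weight matrix are hypothetical.

```python
import math

def attention_aggregate(neighbor_feats, scores):
    """Softmax the raw attention scores, then take the weighted sum of neighbor features."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(neighbor_feats[0])
    agg = [sum(a * f[d] for a, f in zip(alphas, neighbor_feats)) for d in range(dim)]
    return agg, alphas

def fuse(views, weight):
    """Concatenate feature views and project them with a linear map (out_dim x in_dim)."""
    concat = [x for view in views for x in view]
    return [sum(w * x for w, x in zip(row, concat)) for row in weight]

# Two neighbors with equal scores contribute equally to the aggregated embedding.
z, alphas = attention_aggregate([[1.0, 0.0], [0.0, 1.0]], scores=[0.0, 0.0])
# Fuse the aggregated embedding with two further 1-dimensional toy views.
fused = fuse([z, [1.0], [2.0]], weight=[[1.0, 1.0, 1.0, 1.0]])
print(alphas, fused)  # [0.5, 0.5] [4.0]
```

Because the projection acts on the concatenated vector, each block of columns in `weight` effectively scales one view, which is how a single linear layer can learn the relative importance of the views.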
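After completing the representations of drugs and proteins, we used the inner product to calculate the interaction probability between them, as shown in the accompanying figure. For a drug node $d_i$ and a protein node $p_j$, their interaction probability is calculated as follows:

$$\hat{y}_{ij} = \operatorname{sigmoid}(s_{ij}), \quad s_{ij} = \tilde{h}_{d_i}^{\top} \tilde{h}_{p_j} \tag{8}$$

where $\operatorname{sigmoid}(x) = 1 / (1 + e^{-x})$ is the sigmoid function and $s_{ij}$ represents the interaction score between node $d_i$ and node $p_j$.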
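As a minimal sketch of this scoring step, the inner product of two embedding vectors can be passed through a sigmoid. The function name and the toy embeddings are hypothetical; the branch on the sign of the score is only a guard against floating-point overflow.

```python
import math

def interaction_probability(h_drug, h_protein):
    """Sigmoid of the inner product between a drug and a protein embedding."""
    score = sum(a * b for a, b in zip(h_drug, h_protein))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # avoids overflow for large negative scores
    return e / (1.0 + e)

print(interaction_probability([1.0, -1.0], [1.0, 1.0]))  # 0.5
```

Orthogonal embeddings give a score of zero and hence a probability of 0.5, so the model has to push interacting pairs toward aligned embeddings and noninteracting pairs toward opposed ones.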
To optimize the model, we employed multiple loss terms for joint training. Among them, the cross-entropy loss is used to supervise the interaction prediction task, defined as follows:
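$$\mathcal{L}_{ce} = -\sum_{(i,j) \in \mathcal{R}} \left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log (1 - \hat{y}_{ij}) \right] \tag{9}$$

where $\mathcal{R}$ is the set of relationships, $y_{ij}$ is the ground truth label of sample $(i, j)$ (1 for positive samples and 0 for negative samples), and $\hat{y}_{ij}$ is the predicted probability.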
In addition, to enhance the model’s structural perception of node features, we introduced reconstruction losses for drugs and proteins, which are defined as follows:
$$\mathcal{L}_{d} = \left\lVert A_d - \hat{A}_d \right\rVert_F^2 \tag{10}$$

$$\mathcal{L}_{p} = \left\lVert A_p - \hat{A}_p \right\rVert_F^2 \tag{11}$$

Here, $A_d$ and $A_p$ are the adjacency matrices of drug and protein nodes in the original HN, respectively, $\hat{A}$ represents the adjacency matrix reconstructed by the model from the node embeddings, and $\lVert \cdot \rVert_F$ represents the Frobenius norm. The final optimization target is the weighted sum of the three losses:

$$\mathcal{L} = \mathcal{L}_{ce} + \gamma_d \mathcal{L}_{d} + \gamma_p \mathcal{L}_{p} \tag{12}$$

where $\mathcal{L}_{d}$ and $\mathcal{L}_{p}$ represent the reconstruction losses for drugs and proteins, respectively, while $\gamma_d$ and $\gamma_p$ are weighting coefficients used to balance the contributions of the different loss terms. Through joint optimization, we can improve the accuracy of DTI prediction while ensuring that the generated feature representations retain robust structural information and discriminative capability.
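The joint objective can be sketched in plain Python as a cross-entropy term plus two weighted reconstruction terms. Several details are assumptions made for illustration: the cross-entropy is summed rather than averaged, the reconstruction terms use the squared Frobenius norm, and the weighting coefficients `w_d` and `w_p` carry placeholder values rather than the ones used in the study.

```python
import math

def bce(y_true, y_prob, eps=1e-12):
    """Summed binary cross-entropy with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def frobenius_sq(a, a_hat):
    """Squared Frobenius norm of the difference between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, a_hat) for x, y in zip(ra, rb))

def joint_loss(y_true, y_prob, a_d, a_d_hat, a_p, a_p_hat, w_d=0.1, w_p=0.1):
    """Cross-entropy plus weighted drug and protein reconstruction losses."""
    return (bce(y_true, y_prob)
            + w_d * frobenius_sq(a_d, a_d_hat)
            + w_p * frobenius_sq(a_p, a_p_hat))

a = [[1, 0], [0, 1]]
a_hat = [[1, 0], [0, 0]]  # one entry reconstructed incorrectly
loss = joint_loss([1], [0.5], a, a_hat, a, a)
print(round(loss, 4))  # 0.7931
```

In the example the protein network is reconstructed perfectly, so only the cross-entropy term (log 2) and the single drug reconstruction error (weighted by 0.1) contribute.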
Results
Dataset
This study conducted DTI prediction experiments on the Luo dataset, which was assembled by Luo et al. To better capture biological characteristics, we incorporated the SMILES sequences of drug molecules and the sequence data of proteins into the dataset. The following two tables present the statistics of the nodes and edges in the dataset. The dataset comprises six independent drug- and protein-related networks, with all edge weights being binary values.
Node type | Count |
Drug | 708 |
Protein | 1512 |
Disease | 5603 |
Side effect | 4192 |
Edge type | Count | Source |
Drug–protein interaction | 1923 | DrugBank version 3.0 |
Drug–drug interaction | 10,036 | DrugBank version 3.0 |
Protein–protein interaction | 7363 | HPRD Release 9 |
Drug–disease association | 199,214 | Comparative Toxicogenomics Database |
Protein–disease association | 1,596,745 | Comparative Toxicogenomics Database |
Drug–side-effect association | 80,164 | SIDER Release 2 |
Experimental Settings
Following the study by Luo et al, 90% of the positive and negative samples from the dataset were used to construct the HN and train MVPA-DTI, while the remaining 10% were reserved for testing. We adopted a data preprocessing procedure consistent with NeoDTI, including the negative sampling strategy, the drug–target similarity calculation method, and the similarity threshold used to eliminate redundancy. The model's performance was then evaluated through 10-fold cross-validation, using AUROC and AUPR as metrics.
MVPA-DTI consists of three modules: Prot-T5, MAT, and HAN, with the parameter settings detailed in the table below. During training, we employed the Adam optimizer to update the network weights.
Parameter | Value |
Number of steps | 3000 |
Batch size | 128 |
Learning rate | 0.001 |
Dropout rate | 0.1 |
HAN input size | 1024 |
MAT multi-head attention number | 16 |
MAT stack number | 8 |
aHAN: Heterogeneous Graph Attention Network.
bMAT: Molecule Attention Transformer.
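The two evaluation metrics named in the experimental settings can be sketched in plain Python. The AUROC is computed as the fraction of positive–negative pairs ranked correctly (ties count as half), and the AUPR is approximated by average precision, which is one common estimator and an assumption here; the function names and toy scores are illustrative, and both classes are assumed to be present.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))                        # 0.75
print(round(average_precision(labels, scores), 4))  # 0.8333
```

With heavily imbalanced data, the precision-based metric is typically the more sensitive of the two, which is consistent with reporting both.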
Comparison With Prior Work
In this study, to better simulate real-world scenarios, we randomly sampled negative samples while retaining all positive samples, setting the ratio of positive to negative samples at 1:10. This ratio aims to reflect the real-world imbalance in which negative samples far outnumber positive ones. Subsequently, we compared the performance of MVPA-DTI with several benchmark models to evaluate its effectiveness in DTI prediction tasks. The benchmark models are briefly described as follows:
- MSCMF: MSCMF predicts DTIs by utilizing multiple similarity matrices of drugs and targets through collaborative matrix factorization.
- HNM: HNM integrates disease, drug, and target information via an HN, aiming to enhance the efficiency and accuracy of drug repositioning.
- DTINet: DTINet predicts interactions between new drugs and targets by integrating heterogeneous data and learning low-dimensional feature representations.
- NeoDTI: NeoDTI automatically generates feature representations by integrating diverse information from HNs.
- EEG-DTI: EEG-DTI is an end-to-end learning framework based on heterogeneous graph convolutional networks, capable of effectively learning features from multiple biological entities.
- SHGCL-DTI: SHGCL-DTI combines semi-supervised learning with graph contrastive learning, enhancing the model's adaptability.
The performance of MVPA-DTI was compared with various models, and the experimental results are shown in the following table. MVPA-DTI achieves the best performance among the compared DTI models. It achieves an AUPR of 0.901, which is 1.7% higher than the second-best method, SHGCL-DTI (0.884), and an AUROC of 0.966, representing a 0.8% improvement over SHGCL-DTI (0.958). MSCMF employs matrix transformation to optimize prediction results through network inference and the topological structure of data. However, compared with recent deep learning methods, MSCMF does not fully exploit the latent information in data matrix embeddings or the features of adjacent nodes, which limits its predictive performance. HNM does not adopt mainstream heterogeneous data embedding methods for feature representation and information integration, resulting in insufficient generalization ability and lower prediction accuracy. In contrast, DTINet, NeoDTI, and EEG-DTI further extract hidden features by combining matrix transformation with neural networks, enabling more accurate modeling of node relationships and improving prediction performance. SHGCL-DTI employs a graph contrastive learning strategy to capture the structural information of heterogeneous graphs by enhancing the similarity of positive sample pairs and reducing the similarity of negative sample pairs. However, SHGCL-DTI fails to fully utilize the rich semantic information and complex interaction patterns in heterogeneous graphs, presenting certain limitations when processing biological data.
Model | AUPR | AUROC |
HNM | 0.579 | 0.834 |
MSCMF | 0.603 | 0.856 |
DTINet | 0.818 | 0.916 |
NeoDTI | 0.855 | 0.943 |
EEG-DTI | 0.847 | 0.952 |
SHGCL-DTI | 0.884 | 0.958 |
MVPA-DTI (ours) | 0.901 | 0.966 |
a AUPR: area under the precision–recall curve.
bAUROC: area under the receiver operating characteristic curve.
cHNM: heterogeneous network model.
dMSCMF: multiple similarities collaborative matrix factorization.
eDTINet: drug–target interaction prediction network.
fNeoDTI: neural integration of neighbor information for DTI prediction.
gEEG-DTI: end-to-end graph for drug–target interaction prediction.
hSHGCL-DTI: semi-supervised heterogeneous graph contrastive learning for drug–target interaction prediction.
iMVPA-DTI: multi-view path aggregation for drug–target interaction.
Existing methods fail to fully exploit critical biochemical information in drug molecular structures and protein sequences, potentially leading to information loss during the node embedding process. In contrast, MVPA-DTI integrates a graph attention network that incorporates all types of adjacent nodes, enhancing the extraction of composite structural information. It employs HAN to model the potential relationships between different types of nodes. In addition, the method dynamically fuses features extracted from sequences and graphs to update the embedding representations of drug and protein nodes. Through this process, it progressively strengthens the weight of informative nodes and optimizes the feature representation of different node types, thereby improving predictive accuracy.
Robustness Experiment
To further validate the stability of MVPA-DTI and to account for potentially redundant information in the dataset, additional experiments were conducted to evaluate the model’s predictive performance. First, we removed some samples from the dataset, including DTIs with similar drugs or targets and DTIs involving drugs with similar side effects. The details of the removed data are provided in the table below.
Redundant data | Drug number | Target number | DTI number |
DTI of similar drugs or targets | 146 | 78 | 955 |
DTI of drugs with similar side effects | 17 | 0 | 51 |
aDTI: drug–target interactions.
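The redundancy-removal step can be pictured as a filter over the known DTI pairs. The sketch below is a minimal illustration and not the authors’ procedure: the similarity scores, the 0.6 threshold, the greedy ordering, and the function name `remove_redundant_dtis` are all assumptions introduced here.

```python
def remove_redundant_dtis(dtis, drug_sim, target_sim, threshold=0.6):
    """Greedily drop DTIs that have a near-duplicate among those already kept.

    dtis: list of (drug, target) pairs.
    drug_sim / target_sim: dicts mapping frozenset({a, b}) -> similarity in [0, 1].
    threshold: hypothetical cutoff above which two entities count as similar.
    """
    def similar(sim, a, b):
        return a != b and sim.get(frozenset((a, b)), 0.0) > threshold

    kept = []
    for drug, target in dtis:
        redundant = any(
            # Same target, and the drug resembles an already kept drug.
            (t == target and similar(drug_sim, drug, d))
            # Same drug, and the target resembles an already kept target.
            or (d == drug and similar(target_sim, target, t))
            for d, t in kept
        )
        if not redundant:
            kept.append((drug, target))
    return kept


# Toy data (hypothetical identifiers and similarity values).
dtis = [("D1", "T1"), ("D2", "T1"), ("D3", "T2"), ("D1", "T3")]
drug_sim = {frozenset(("D1", "D2")): 0.9}
target_sim = {frozenset(("T1", "T3")): 0.3}
filtered = remove_redundant_dtis(dtis, drug_sim, target_sim)
print(filtered)
```

In this toy example only the pair ("D2", "T1") is dropped, because D2 is highly similar to D1 and both act on T1; the low similarity between T1 and T3 leaves ("D1", "T3") in place.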
After removing similar drugs and targets, as shown in the table below, MVPA-DTI achieves a 15.9% improvement over the second-best method, NeoDTI. The same table also shows that after removing drugs with similar side effects, MVPA-DTI outperforms the second-best method by 3.1%.
Model | AUPR (DTIs with similar drugs and targets were removed) | AUPR (DTIs with drugs with similar side effects were removed) | AUPR (trained on nonunique dataset and tested on unique dataset) |
HNM | 0.547 | 0.581 | 0.233 |
MSCMF | 0.265 | 0.593 | 0.206 |
DTINet | 0.611 | 0.803 | 0.389 |
NeoDTI | 0.694 | 0.848 | 0.432 |
EEG-DTI | 0.686 | 0.846 | 0.431 |
SHGCL-DTI | 0.617 | 0.729 | 0.443 |
MVPA-DTI | 0.849 | 0.873 | 0.460 |
aAUPR: area under the precision–recall curve.
bDTI: drug–target interaction.
cHNM: heterogeneous network model.
dMSCMF: multiple similarities collaborative matrix factorization.
eDTINet: drug–target interaction prediction network.
fNeoDTI: neural integration of neighbor information for DTI prediction.
gEEG-DTI: end-to-end graph for drug–target interaction prediction.
hSHGCL-DTI: semi-supervised heterogeneous graph contrastive learning for drug–target interaction prediction.
iMVPA-DTI: multi-view path aggregation for drug–target interaction.
The experimental results indicate that although the model’s performance declined after a substantial portion of specific DTI data was removed, MVPA-DTI still maintains the best AUPR, demonstrating its robustness. Additionally, this study treats drug–protein interactions as a specific case for experimentation. To objectively evaluate the predictive capability of MVPA-DTI, special drug–target relationships and conventional drug–target relationships were processed separately. Specifically, the model was first trained on a dataset in which the relationships between drugs and proteins were nonunique and then tested on a dataset with unique interactions. As shown in the last column of the table above, MVPA-DTI outperforms the second-best method, SHGCL-DTI, by 1.7% in AUPR. This result suggests that MVPA-DTI has stronger generalization ability in predicting DTIs.
Ablation Experiment
To verify the contribution of each module in MVPA-DTI, we conducted ablation experiments. The experimental results are presented in the table below, where ProSF represents the protein sequence feature extraction module and DruSF denotes the drug compound structure feature extraction module.
Method | AUROC | AUPR | F1-score | MCC |
w/o ProSF | 0.963 | 0.886 | 0.839 | 0.828 |
w/o DruSF | 0.964 | 0.891 | 0.840 | 0.827 |
w/o ProSF and DruSF | 0.957 | 0.875 | 0.813 | 0.804 |
protBert-MVPA-DTI | 0.964 | 0.890 | 0.831 | 0.822 |
MVPA-DTI | 0.966 | 0.901 | 0.848 | 0.839 |
aAUROC: Area Under the Receiver Operating Characteristic curve.
bAUPR: Area Under the Precision-Recall Curve.
cMCC: Matthews Correlation Coefficient.
dw/o: the corresponding module was removed.
eProSF: protein sequence feature extraction module.
fDruSF: drug compound structure feature extraction module.
gprotBert-MVPA-DTI replaces the ProSF module with ProtBert processing.
hMVPA-DTI: multi-view path aggregation for drug-target interaction.
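The contribution of each module can be read off the ablation table by subtracting the score of an ablated variant from that of the full model. The short Python sketch below does this for AUROC and AUPR; the dictionary and function names are illustrative only, and gains are expressed in percentage points.

```python
# Values copied from the ablation table: variant -> (AUROC, AUPR).
ABLATION = {
    "w/o ProSF": (0.963, 0.886),
    "w/o DruSF": (0.964, 0.891),
    "w/o ProSF and DruSF": (0.957, 0.875),
    "MVPA-DTI": (0.966, 0.901),
}


def contribution(variant, full="MVPA-DTI"):
    """Gain (percentage points) of the full model over an ablated variant."""
    full_auroc, full_aupr = ABLATION[full]
    auroc, aupr = ABLATION[variant]
    return (
        round((full_auroc - auroc) * 100, 1),
        round((full_aupr - aupr) * 100, 1),
    )


gains = {v: contribution(v) for v in ABLATION if v != "MVPA-DTI"}
for variant, (d_auroc, d_aupr) in gains.items():
    print(f"{variant}: AUROC +{d_auroc}, AUPR +{d_aupr}")
```

Removing both modules costs more than removing either one alone, which is consistent with the two feature extractors contributing complementary information.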
The Effectiveness of ProSF
As shown in the ablation table above, the ProSF module improves model performance by 0.3% in AUROC and 1.5% in AUPR. ProSF effectively extracts deep semantic information from proteins, capturing features related to key physicochemical properties such as secondary structure and solubility, thereby enhancing the model’s predictive capability.
The Effectiveness of DruSF
As shown in the same table, the DruSF module improves AUROC and AUPR by 0.2% and 1.0%, respectively. This improvement is attributed to DruSF’s ability to thoroughly explore the structural features of drugs, thereby assigning higher weights to drugs during the message-passing process and further optimizing the prediction outcomes.
The experimental results demonstrate that MVPA-DTI is effective in extracting the biological structural features of drugs and proteins. By deeply exploring the chemical structures, physical properties, and biological functions of drugs and proteins, the model can capture the complex interaction relationships between them, thereby enhancing DTI prediction performance. Further analysis reveals that protein structural information plays a more critical role than drug structural information in the prediction process, since removing ProSF reduces AUPR more than removing DruSF. This may be attributed to the protein-specific LLM Prot-T5, which more effectively captures evolutionary conservation and functional relevance in protein sequences, thereby providing more discriminative feature representations for DTI prediction.
Discussion
Case Study
To evaluate the practical applicability of MVPA-DTI, we applied it to 53 candidate drugs for the voltage-gated inwardly rectifying potassium channel KCNH2 (hERG). The KCNH2 channel, a critical protein in cardiac electrophysiology, plays a central role in regulating ventricular repolarization. Dysfunction or dysregulation of this channel delays ventricular repolarization, manifesting as QT interval prolongation on electrocardiograms. This phenomenon has been extensively documented to correlate with the pathogenesis of various cardiovascular diseases [ ]. The model predicted that 38 of the 53 candidate drugs potentially interact with KCNH2. Among these, 10 drugs have been experimentally validated in published studies to interact with the KCNH2 channel. Although the remaining candidates have not yet been experimentally confirmed, their established associations with cardiovascular pathologies suggest potential relevance to cardiovascular disease management. Studies have shown that procainamide interacts with the KCNH2 channel, primarily by inhibiting its function. This inhibition prolongs the duration of potassium ion efflux, leading to QT interval prolongation [ ]. Nicotine, as a blocker of the KCNH2 channel, significantly affects the electrophysiological properties of the heart. This interaction may have important clinical implications for assessing the impact of nicotine on cardiac health [ ]. Ranolazine is considered to have potential therapeutic effects, as it improves cardiac electrophysiological abnormalities caused by genetic variations by modulating the function of the KCNH2 channel [ ].
In this study, AutoDock was used to perform molecular docking simulations of the interaction between procainamide and the KCNH2 channel protein. As shown in the accompanying docking figures, procainamide can stably bind to the central cavity region of the KCNH2 channel protein and is embedded in a hydrophobic pocket surrounded by the S6 helix and the lamellar structure. The binding site involves multiple key amino acid residues, including THR 768 and TYR 827, among others. These residues have been widely reported in the literature as core sites for regulating drug binding and the gating behavior of KCNH2 channels, and they are visually annotated in the docking figures. Binding mode analysis showed that hydrophobic interactions and potential hydrogen bonds formed between procainamide and these residues, which may interfere with the open state of the channel and affect its function. Binding energy analysis showed that procainamide interacted strongly with KCNH2, with the binding energy of some conformations as low as -9 kcal/mol, indicating high binding stability and potential biological activity.
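To give a sense of scale for a binding energy of about -9 kcal/mol, the standard thermodynamic relation ΔG = RT ln(Kd) can be used to convert it into an approximate dissociation constant. The sketch below is a back-of-the-envelope illustration only: it assumes the docking score can be read as a binding free energy at 298.15 K with a 1 M standard state, which is an approximation, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def estimated_kd(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by a binding free energy.

    Uses delta_G = R * T * ln(Kd) with a 1 M standard state.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


kd = estimated_kd(-9.0)
print(f"Estimated Kd: {kd:.2e} M")
```

Under these assumptions, -9 kcal/mol corresponds to a submicromolar dissociation constant. Docking scores are coarse estimates, so this figure indicates an order of magnitude rather than a measured affinity.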

Conclusions
This study proposes MVPA-DTI, a novel DTI prediction method that extracts key biological features from protein and drug sequences and reconstructs them into a heterogeneous graph, enabling the model to capture the most critical biological information during each iteration and thereby optimize the weight assignment for drugs and targets. Experimental results demonstrate that MVPA-DTI outperforms existing methods across multiple benchmark tests. Although MVPA-DTI effectively captures DTIs, the biological mechanisms underlying these interactions are complex and involve multiple factors that MVPA-DTI does not yet fully consider. Future improvements should focus on in-depth exploration of the biological details of DTIs to enhance prediction accuracy and applicability. To better model the complex relationships between drugs and targets, future work could also introduce new graph neural network structures such as FCGCN [ ] into the drug–target graph modeling process to more effectively integrate molecular structure, pharmacological properties, and network topology information.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 62076045), the 111 Project (Grant No. D23006), the National Foreign Expert Project of China (Grant No. D20240244), and the Interdisciplinary Project of Dalian University (Grant Nos. DLUXK-2023-YB-003 and DLUXK-2023-YB-009).
Conflicts of Interest
None declared.
References
- Chu Y, Shan X, Chen T, et al. DTI-MLCD: predicting drug-target interactions using multi-label learning with community detection method. Brief Bioinform. May 20, 2021;22(3):bbaa205.
- Lan W, Wang J, Li M, et al. Predicting drug–target interaction using positive-unlabeled learning. Neurocomputing. Sep 2016;206:50-57.
- Liu Z, Chen Q, Lan W, Lu H, Zhang S. SSLDTI: A novel method for drug-target interaction prediction based on self-supervised learning. Artif Intell Med. Mar 2024;149:102778.
- Lan W, Tang Z, Liu M, et al. The large language models on biomedical data analysis: a survey. IEEE J Biomed Health Inform. Jun 2025;29(6):4486-4497.
- Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst. Dec 2024;12(1):30.
- Thafar M, Raies AB, Albaradei S, Essack M, Bajic VB. Comparison study of computational prediction tools for drug-target binding affinities. Front Chem. 2019;7:782.
- Mathai N, Kirchmair J. Similarity-based methods and machine learning approaches for target prediction in early drug discovery: performance and scope. Int J Mol Sci. May 19, 2020;21(10):3585.
- Lian M, Wang X, Du W. Integrated multi-similarity fusion and heterogeneous graph inference for drug-target interaction prediction. Neurocomputing. Aug 2022;500:1-12.
- Luo Y, Zhao X, Zhou J, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. Sep 18, 2017;8(1):573.
- Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics. Jun 15, 2016;32(12):i18-i27.
- Zhang YF, Wang X, Kaushik AC, et al. SPVec: a Word2vec-inspired feature representation method for drug-target interaction prediction. Front Chem. 2019;7(1-11):895.
- Öztürk H, Ozkirimli E, Özgür A. A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction. BMC Bioinformatics. Mar 18, 2016;17:128.
- Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS. A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst. Jan 2021;32(1):4-24.
- Zhang Z, Cui P, Zhu W. Deep learning on graphs: a survey. IEEE Trans Knowl Data Eng. 2020;34(1):249-270.
- Zhang Z, Chen L, Zhong F, et al. Graph neural network approaches for drug-target interactions. Curr Opin Struct Biol. Apr 2022;73:102327.
- Singh S, Malik BK, Sharma DK. Molecular drug targets and structure based drug design: a holistic approach. Bioinformation. Dec 23, 2006;1(8):314-320.
- Thafar MA, Olayan RS, Ashoor H, et al. DTiGEMS+: drug-target interaction prediction using graph embedding, graph mining, and similarity-based techniques. J Cheminform. Jun 29, 2020;12(1):44.
- Shim J, Hong ZY, Sohn I, Hwang C. Prediction of drug-target binding affinity using similarity-based convolutional neural network. Sci Rep. Feb 24, 2021;11(1):4416.
- Zhang P, Yuan J, Li L, et al. Key substructure learning with chemical intuition for material property prediction. 2024. Presented at: Proceedings of the International Conference on Database Systems for Advanced Applications:87-103; Pohang, Korea.
- Zhang P, Che C, Jin B, Yuan J, Li R, Zhu Y. NCH-DDA: Neighborhood contrastive learning heterogeneous network for drug–disease association prediction. Expert Syst Appl. Mar 2024;238:121855.
- Zhao BW, Su XR, Yang Y, et al. Regulation-aware graph learning for drug repositioning over heterogeneous biological network. Inf Sci (Ny). Jan 2025;686:121360.
- Zhao BW, Wang L, Hu PW, et al. Fusing higher and lower-order biological information for drug repositioning via graph representation learning. IEEE Trans Emerg Topics Comput. 2023;12(1):163-176.
- Su X, Hu P, Yi H, You Z, Hu L. Predicting drug-target interactions over heterogeneous information network. IEEE J Biomed Health Inform. 2022;27(1):562-572.
- Zhao BW, Su XR, Hu PW, Huang YA, You ZH, Hu L. iGRLDTI: an improved graph representation learning method for predicting drug-target interactions over heterogeneous biological information network. Bioinformatics. Aug 1, 2023;39(8):btad451.
- Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug-target binding affinity prediction. Bioinformatics. Sep 1, 2018;34(17):i821-i829.
- Pu L, Govindaraj RG, Lemoine JM, Wu HC, Brylinski M. DeepDrug3D: classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput Biol. Feb 2019;15(2):e1006718.
- Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. 2019. Presented at: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers); Jun 2, 2019:4171-4186; Minneapolis, Minnesota.
- Li J, Jiang X. Mol-BERT: an effective molecular representation with BERT for molecular property prediction. Wireless Commun Mobile Comput. Jan 2021;2021(1):7181815. URL: https://onlinelibrary.wiley.com/toc/6302/2021/1
- Chithrananda S, Grand G, Ramsundar B. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. arXiv. Preprint posted online on 2020.
- Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics. Apr 12, 2022;38(8):2102-2110.
- Rao R, Bhattacharya N, Thomas N, et al. Evaluating protein transfer learning with TAPE. Adv Neural Inf Process Syst. Dec 2019;32:9689-9701.
- Elnaggar A, Heinzinger M, Dallago C, et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell. Oct 2022;44(10):7112-7127.
- Maziarka Ł, Danel T, Mucha S, et al. Molecule attention transformer. arXiv. Preprint posted online on 2020.
- Toropov AA, Toropova AP, Mukhamedzhanoval DV, Gutman I. Simplified molecular input line entry system (SMILES) as an alternative for constructing quantitative structure-property relationships (QSPR). Preprint posted online on 2005. URL: http://nopr.niscair.res.in/bitstream/123456789/18068/1/IJCA%2044A%288%29%201545-1552.pdf [Accessed 2025-09-15]
- Wang X, Ji H, Shi C, et al. Heterogeneous graph attention network. 2019. Presented at: WWW ’19; May 13, 2019; San Francisco CA USA. URL: https://dl.acm.org/doi/proceedings/10.1145/3308558
- Bento AP, Hersey A, Félix E, et al. An open source chemical structure curation pipeline using RDKit. J Cheminform. Sep 1, 2020;12(1):51.
- Vaswani A. Attention is all you need. Presented at: Advances in neural information processing systems; Dec 4-9, 2017; Long Beach, California, USA.
- Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 2020;21(1):5485-5551.
- Ma N, Zhang X, Liu M, Sun J. Activate or not: learning customized activation. Presented at: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Jun 20-25, 2021; Nashville, TN, USA.
- Knox C, Law V, Jewison T, et al. DrugBank 3.0: a comprehensive resource for “omics” research on drugs. Nucleic Acids Res. Jan 2011;39(Database issue):D1035-D1041.
- Wang Z, Zhang Q, Shuang-Wei H, et al. Multi-level protein structure pre-training via prompt learning. Presented at: Proceedings of the Eleventh International Conference on Learning Representations; 2023; Kigali, Rwanda. URL: https://openreview.net/forum?id=XGagtiJ8XC [Accessed 2025-09-15]
- Davis AP, Murphy CG, Johnson R, et al. The comparative toxicogenomics database: update 2013. Nucleic Acids Res. Jan 2013;41(Database issue):D1104-D1114.
- Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010;6(1):343.
- Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. 2013. Presented at: KDD ’13; Aug 11, 2013; Chicago Illinois USA. URL: https://dl.acm.org/doi/proceedings/10.1145/2487575
- Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. Oct 15, 2014;30(20):2923-2930.
- Wan F, Hong L, Xiao A, Jiang T, Zeng J. NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions. Bioinformatics. Jan 1, 2019;35(1):104-111.
- Peng J, Wang Y, Guan J, et al. An end-to-end heterogeneous graph representation learning-based framework for drug-target interaction prediction. Brief Bioinform. Sep 2, 2021;22(5):bbaa430.
- Yao K, Wang X, Li W, et al. Semi-supervised heterogeneous graph contrastive learning for drug-target interaction prediction. Comput Biol Med. Sep 2023;163:107199.
- Itoh T, Tanaka T, Nagai R, et al. Genomic organization and mutational analysis of HERG, a gene responsible for familial long QT syndrome. Hum Genet. Apr 1998;102(4):435-439.
- Chiu PJS, Marcoe KF, Bounds SE, et al. Validation of a [3H]astemizole binding assay in HEK293 cells expressing HERG K+ channels. J Pharmacol Sci. Jul 2004;95(3):311-319.
- Yang BF, Xu DH, Xu CQ, et al. Inactivation gating determines drug potency: a common mechanism for drug blockade of HERG channels. Acta Pharmacol Sin. May 2004;25(5):554-560.
- Smith JL, Reloj AR, Nataraj PS, et al. Pharmacological correction of long QT-linked mutations in KCNH2 (hERG) increases the trafficking of Kv11.1 channels stored in the transitional endoplasmic reticulum. Am J Physiol Cell Physiol. Nov 1, 2013;305(9):C919-C930.
- Yang Y, Li G, Li D, Zhang J, Hu P, Hu L. Integrating fuzzy clustering and graph convolution network to accurately identify clusters from attributed graph. IEEE Trans Netw Sci Eng. 2024;12(2):1112-1125.
Abbreviations
DruSF: drug compound structure feature extraction module
DTI: drug-target interaction
HAN: heterogeneous graph attention network
HN: heterogeneous network
LLM: large language model
MAT: molecule attention transformer
ProSF: protein sequence feature extraction module
Edited by Qiao Jin; submitted 27.Mar.2025; peer-reviewed by Dongjiang Niu, Lun Hu, Wei Lan, Zhizheng Wang; final revised version received 11.Jul.2025; accepted 25.Jul.2025; published 02.Oct.2025.
Copyright© Haixue Zhao, Kui Yao, Yunjiong Liu, Chao Che, Lin Tang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 2.Oct.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.