This paper is available on arXiv under CC 4.0 license.
Authors:
(1) Md Masud Rana, Department of Mathematics, University of Kentucky;
(2) Duc Duy Nguyen, Department of Mathematics, University of Kentucky (ducnguyen@uky.edu).
4 Conclusion
The study of protein-protein interactions (PPIs) and the prediction of mutation-induced binding free energy changes are of great importance in understanding the molecular basis of biological processes.
The application of geometric graph theory and atom-level graph coloring techniques provides a powerful framework for analyzing biomolecules and capturing their intricate relationships.
By utilizing the concept of geometric subgraphs and constructing multi-scale weighted colored geometric subgraphs (MWCGS), we can effectively represent the structural and functional properties of PPIs.
The site-specific MWCGS features allow us to extract meaningful patterns and characteristics, shedding light on the effects of mutations and the underlying molecular interactions.
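To make the idea of a weighted colored geometric subgraph concrete, the following is a minimal sketch rather than the GGL-PPI implementation. It assumes that atoms are "colored" by element type, that edges between two atom groups are weighted by a generalized exponential kernel of the interatomic distance, and that one feature is the sum of edge weights for each type pair at each kernel scale. The function names, the scales `etas`, the exponent `kappa`, and the `cutoff` are hypothetical choices made for illustration.

```python
import math
from itertools import product


def exponential_kernel(distance, eta, kappa=2.0):
    """Generalized exponential weight exp(-(d / eta)^kappa); kappa is an assumed value."""
    return math.exp(-((distance / eta) ** kappa))


def subgraph_features(atoms_a, atoms_b, etas=(2.0, 5.0), cutoff=12.0):
    """Sum kernel-weighted edges for every (type_a, type_b) pair at each scale.

    atoms_a, atoms_b: lists of (atom_type, (x, y, z)) tuples, for example atoms
    near a mutation site and atoms of the binding partner.
    Returns a dict keyed by (type_a, type_b, eta).
    """
    features = {}
    for (type_a, pos_a), (type_b, pos_b) in product(atoms_a, atoms_b):
        d = math.dist(pos_a, pos_b)
        if d > cutoff:
            continue  # ignore pairs beyond the (assumed) cutoff distance
        for eta in etas:
            key = (type_a, type_b, eta)
            features[key] = features.get(key, 0.0) + exponential_kernel(d, eta)
    return features


# Toy coordinates (in angstroms) invented for illustration only.
site = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0))]
partner = [("O", (0.0, 3.0, 0.0)), ("C", (0.0, 0.0, 4.0))]
feats = subgraph_features(site, partner)
```

Using several values of `eta` is what makes the description multi-scale: a small scale emphasizes close contacts, while a larger one also captures longer-range interactions. The resulting dictionary can be flattened into a fixed-length vector and passed to a regressor such as gradient-boosting trees.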
In this work, we developed a mutation-induced binding free energy change predictor, called GGL-PPI, by incorporating site-specific MWCGS features for PPIs and gradient-boosting trees. Our method demonstrates superior performance compared to existing methods.
The model was validated on datasets from three sources: AB-Bind (S645), SKEMPI 1.0 (S1131), and SKEMPI 2.0 (S4169 and S8338), showcasing its robustness and effectiveness. Furthermore, GGL-PPI was evaluated on a blind test set, the Ssym dataset.
To prevent data leakage between the test and training sets, the model was trained on Q3488, a homology-reduced, balanced training set. This approach supports the reliability and fairness of the evaluation process.
GGL-PPI delivers the least biased predictions of binding free energy changes for both direct and reverse mutations and outperforms the other methods compared, particularly on reverse mutations.
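Prediction bias between direct and reverse mutations is often quantified through thermodynamic antisymmetry: the free energy change of a reverse mutation should be the negative of the direct one. The sketch below computes two commonly reported quantities under that assumption: the Pearson correlation between direct and reverse predictions (ideally -1) and the mean bias, half the average of their sum (ideally 0). The function names and the sample values are hypothetical, and this is not taken from the GGL-PPI code base.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def symmetry_metrics(pred_direct, pred_reverse):
    """Return (correlation, bias) for paired direct and reverse predictions.

    An unbiased predictor satisfies pred_reverse == -pred_direct, which gives
    a correlation of -1 and a bias of 0.
    """
    correlation = pearson(pred_direct, pred_reverse)
    pair_sums = [d + r for d, r in zip(pred_direct, pred_reverse)]
    bias = sum(pair_sums) / (2 * len(pair_sums))
    return correlation, bias


# Invented predictions (kcal/mol) for four mutation pairs, for illustration only.
direct = [1.2, -0.5, 2.0, 0.3]
reverse = [-1.0, 0.6, -1.8, -0.1]
r_dir_rev, delta = symmetry_metrics(direct, reverse)
```

A predictor trained mostly on destabilizing mutations tends to produce a positive bias, which is why a balanced training set containing both directions is typically used.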
Overall, the results highlight the potential of the GGL-PPI approach for accurately predicting mutation-induced binding free energy changes in protein-protein interactions. Such predictions provide valuable insight into the underlying molecular mechanisms and can facilitate drug design and discovery efforts.
5 Data and Software Availability
The source code is available on GitHub: https://github.com/NguyenLabUKY/GGL-Mutation.
6 Competing interests
No competing interest is declared.
7 Acknowledgments
This work is supported in part by funds from the National Science Foundation (NSF grants #2053284, #2151802, and #2245903) and the University of Kentucky Startup Fund.