Abstract

The complexity and speed of evolution in viruses with RNA genomes make predictive identification of variants with epidemic or pandemic potential challenging. In recent years, machine learning has become an increasingly capable technology for addressing this challenge, as advances in methods and computational power have dramatically improved model performance and led to widespread adoption across industries and disciplines. Early applications of machine learning to virus research have since expanded, providing new tools for handling large-scale datasets and reshaping existing workflows for phenotype prediction, phylogenetic analysis, drug discovery and more. This review explores how machine learning has been applied to, and has affected, the study of viruses. It then addresses the strengths and limitations of these techniques and highlights the next steps needed for the technology to reach its full potential in this challenging and ever-relevant research area.

Funding
This study was supported by:
  • Medical Research Council (Award MR/W006677/1)
    • Principal Award Recipient: Sebastian Bowyer

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/jgv/10.1099/jgv.0.002067
2025-01-13
2025-01-14
