Progress and Challenge in Computational Identification of Influenza Virus Reassortment

Xiao Ding; Luyao Qin; Jing Meng; Yousong Peng; Aiping Wu; Taijiao Jiang

doi:10.1007/s12250-021-00392-w

December 2021

Xiao Ding, Luyao Qin, Jing Meng, Yousong Peng, Aiping Wu and Taijiao Jiang. Progress and Challenge in Computational Identification of Influenza Virus Reassortment[J]. Virologica Sinica, 2021, 36(6): 1273-1283. doi: 10.1007/s12250-021-00392-w


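Abstract (Chinese version)

Genomic reassortment is one of the important evolutionary mechanisms of influenza viruses: it is the process by which intact gene segments of different influenza viruses infecting the same host are recombined. Studies have shown that previous global influenza pandemics were caused by the reassortment of influenza viruses. In addition, reassortment can enable avian influenza viruses to cross the host barrier and infect humans, seriously affecting human health and socioeconomic development. Because biological experiments for identifying influenza virus reassortment are restricted by safety and ethical concerns, researchers have tried to identify reassortment computationally. In recent years, a large number of bioinformatics-based reassortment identification algorithms and reassortment-related databases have been developed. In this paper, we first systematically review the principles, advantages and disadvantages of the different types of reassortment identification algorithms, and then introduce the databases that have been developed for influenza virus reassortment. Finally, we discuss in detail the bottlenecks and challenges that bioinformatics currently faces in the study of influenza virus reassortment. We hope that our work can provide guidance and new research perspectives for researchers working on influenza virus reassortment.

Keywords: Influenza virus, Reassortment identification, Bioinformatics, Reassortment database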

Progress and Challenge in Computational Identification of Influenza Virus Reassortment

Xiao Ding ^1,4
Luyao Qin ^1,4
Jing Meng ^1,4
Yousong Peng ^2
Aiping Wu ^1,4
Taijiao Jiang ^1,3,4

1. Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
2. College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
3. Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
4. Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China

Corresponding author: Taijiao Jiang, taijiao@ibms.pumc.edu.cn
ORCID: 0000-0002-6071-0122
Received Date: 03 September 2020
Accepted Date: 29 March 2021
Published Date: 26 May 2021

Abstract

Genomic reassortment is an important evolutionary mechanism for influenza viruses. In this process, novel viruses acquire new characteristics through the exchange of intact gene segments among multiple influenza virus genomes, which may cause influenza endemics and epidemics within or even across hosts. Because experimental studies of influenza virus reassortment are limited by safety and ethical concerns, and because influenza virus genomic data have grown explosively, numerous computational studies of reassortment have been carried out. Many computational methods and bioinformatics databases have been developed to facilitate the identification of influenza virus reassortments. In this review, we summarize the progress and challenges of bioinformatics research on influenza virus reassortment, with the aim of guiding researchers in investigating reassortment events and providing insight for the development of related computational identification tools.
Keywords: Influenza virus, Reassortment, Bioinformatics, Identification, Database

References
1. Ahasan MS, Subramaniam K, Sayler KA, Loeb JC, Popov VL, Lednicky JA, Wisely SM, Campos Krauer JM, Waltzek TB (2019) Molecular characterization of a novel reassortment Mammalian orthoreovirus type 2 isolated from a Florida white-tailed deer fawn. Virus Res 270: 197642
  doi: 10.1016/j.virusres.2019.197642
2. Arenas M, Posada D (2010) The effect of recombination on the reconstruction of ancestral sequences. Genetics 184: 1133-1139
  doi: 10.1534/genetics.109.113423
3. Bi Y, Chen Q, Wang Q, Chen J, Jin T, Wong G, Quan C, Liu J, Wu J, Yin R, Zhao L, Li M, Ding Z, Zou R, Xu W, Li H, Wang H, Tian K, Fu G, Huang Y, Shestopalov A, Li S, Xu B, Yu H, Luo T, Lu L, Xu X, Luo Y, Liu Y, Shi W, Liu D, Gao GF (2016) Genesis, evolution and prevalence of H5N6 avian influenza viruses in China. Cell Host Microbe 20: 810-821
  doi: 10.1016/j.chom.2016.10.022
4. Blitvich BJ, Saiyasombat R, Dorman KS, Garcia-Rejon JE, Farfan-Ale JA, Loroño-Pino MA (2012) Sequence and phylogenetic data indicate that an orthobunyavirus recently detected in the Yucatan Peninsula of Mexico is a novel reassortant of Potosi and Cache Valley viruses. Arch Virol 157: 1199-1204
  doi: 10.1007/s00705-012-1279-x
5. Boni MF, de Jong MD, van Doorn HR, Holmes EC (2010) Guidelines for identifying homologous recombination events in influenza A virus. PLoS ONE 5: e10434
  doi: 10.1371/journal.pone.0010434
6. Butler D (2011) Fears grow over lab-bred flu. Nature 480: 421-422
  doi: 10.1038/480421a
7. Chan JM, Carlsson G, Rabadan R (2013) Topology of viral evolution. Proc Natl Acad Sci USA 110: 18566-18571
  doi: 10.1073/pnas.1313480110
8. de Silva UC, Tanaka H, Nakamura S, Goto N, Yasunaga T (2012) A comprehensive analysis of reassortment in influenza A virus. Biol Open 1: 385-390
  doi: 10.1242/bio.2012281
9. Ding X, Yuan X, Mao L, Wu A, Jiang T (2020) FluReassort: a database for the study of genomic reassortments among influenza viruses. Brief Bioinform 21: 2126-2132
  doi: 10.1093/bib/bbz128
10. Dong C, Ying L, Yuan D (2011) Detecting transmission and reassortment events for influenza A viruses with genotype profile method. Virol J 8: 395
  doi: 10.1186/1743-422X-8-395
11. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17: 368-376
  doi: 10.1007/BF01734359
12. Fouchier RA (2015) Studies on influenza virus transmission between ferrets: the public health risks revisited. mBio 6: e02560-14
13. Gao R, Cao B, Hu Y, Feng Z, Wang D, Hu W, Chen J, Jie Z, Qiu H, Xu K, Xu X, Lu H, Zhu W, Gao Z, Xiang N, Shen Y, He Z, Gu Y, Zhang Z, Yang Y, Zhao X, Zhou L, Li X, Zou S, Zhang Y, Li X, Yang L, Guo J, Dong J, Li Q, Dong L, Zhu Y, Bai T, Wang S, Hao P, Yang W, Zhang Y, Han J, Yu H, Li D, Gao GF, Wu G, Wang Y, Yuan Z, Shu Y (2013) Human infection with a novel avian-origin influenza A (H7N9) virus. N Engl J Med 368: 1888-1897
  doi: 10.1056/NEJMoa1304459
14. Garten RJ, Davis CT, Russell CA, Shu B, Lindstrom S, Balish A, Sessions WM, Xu X, Skepner E, Deyde V, Okomo-Adhiambo M, Gubareva L, Barnes J, Smith CB, Emery SL, Hillman MJ, Rivailler P, Smagala J, de Graaf M, Burke DF, Fouchier RA, Pappas C, Alpuche-Aranda CM, Lopez-Gatell H, Olivera H, Lopez I, Myers CA, Faix D, Blair PJ, Yu C, Keene KM, Dotson PD Jr, Boxrud D, Sambol AR, Abid SH, St George K, Bannerman T, Moore AL, Stringer DJ, Blevins P, Demmler-Harrison GJ, Ginsberg M, Kriner P, Waterman S, Smole S, Guevara HF, Belongia EA, Clark PA, Beatrice ST, Donis R, Katz J, Finelli L, Bridges CB, Shaw M, Jernigan DB, Uyeki TM, Smith DJ, Klimov AI, Cox NJ (2009) Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science 325: 197-201
  doi: 10.1126/science.1176225
15. Goloboff PA, Wilkinson M (2018) On defining a unique phylogenetic tree with homoplastic characters. Mol Phylogenet Evol 122: 95-101
  doi: 10.1016/j.ympev.2018.01.020
16. Graybeal A (1998) Is it better to add taxa or characters to a difficult phylogenetic problem? Syst Biol 47: 9-17
  doi: 10.1080/106351598260996
17. Karasin AI, Carman S, Olsen CW (2006) Identification of human H1N2 and human-swine reassortant H1N2 and H1N1 influenza A viruses among pigs in Ontario, Canada (2003 to 2005). J Clin Microbiol 44: 1123-1126
  doi: 10.1128/JCM.44.3.1123-1126.2006
18. Karasin AI, Landgraf J, Swenson S, Erickson G, Goyal S, Woodruff M, Scherba G, Anderson G, Olsen CW (2002) Genetic characterization of H1N2 influenza A viruses isolated from pigs throughout the United States. J Clin Microbiol 40: 1073-1079
  doi: 10.1128/JCM.40.3.1073-1079.2002
19. Karasin AI, Schutten MM, Cooper LA, Smith CB, Subbarao K, Anderson GA, Carman S, Olsen CW (2000) Genetic characterization of H3N2 influenza viruses isolated from pigs in North America, 1977-1999: evidence for wholly human and reassortant virus genotypes. Virus Res 68: 71-85
  doi: 10.1016/S0168-1702(00)00154-4
20. Kawaoka Y, Krauss S, Webster RG (1989) Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol 63: 4603-4608
  doi: 10.1128/jvi.63.11.4603-4608.1989
21. Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO (2006) Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 6: 29
  doi: 10.1186/1471-2148-6-29
22. Khiabanian H, Trifonov V, Rabadan R (2009) Reassortment patterns in Swine influenza viruses. PLoS ONE 4: e7366
  doi: 10.1371/journal.pone.0007366
23. Kilbourne ED (2006) Influenza pandemics of the 20th century. Emerg Infect Dis 12: 9-14
  doi: 10.3201/eid1201.051254
24. Kingsford C, Nagarajan N, Salzberg SL (2009) 2009 Swine-origin influenza A (H1N1) resembles previous influenza isolates. PLoS ONE 4: e6402
  doi: 10.1371/journal.pone.0006402
25. Lam TT, Wang J, Shen Y, Zhou B, Duan L, Cheung CL, Ma C, Lycett SJ, Leung CY, Chen X, Li L, Hong W, Chai Y, Zhou L, Liang H, Ou Z, Liu Y, Farooqui A, Kelvin DJ, Poon LL, Smith DK, Pybus OG, Leung GM, Shu Y, Webster RG, Webby RJ, Peiris JS, Rambaut A, Zhu H, Guan Y (2013) The genesis and source of the H7N9 influenza viruses causing human infections in China. Nature 502: 241-244
  doi: 10.1038/nature12515
26. Levin S, Holmes EC, Ghedin E, Miller N, Taylor J, Bao Y, St George K, Grenfell BT, Salzberg SL, Fraser CM, Lipman DJ, Taubenberger JK (2005) Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol 3: e300
  doi: 10.1371/journal.pbio.0030300
27. Li YW, Yu L, Zhang YP (2007) "Long-branch Attraction" artifact in phylogenetic reconstruction. Yi Chuan 29: 659-667
  doi: 10.1360/yc-007-0659
28. Lole KS, Bollinger RC, Paranjape RS, Gadkari D, Kulkarni SS, Novak NG, Ingersoll R, Sheppard HW, Ray SC (1999) Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J Virol 73: 152-160
  doi: 10.1128/JVI.73.1.152-160.1999
29. Lu G, Rowley T, Garten R, Donis RO (2007) FluGenome: a web tool for genotyping influenza A virus. Nucleic Acids Res 35: W275-279
  doi: 10.1093/nar/gkm365
30. Lun AT, Wong JW, Downard KM (2012) FluShuffle and FluResort: new algorithms to identify reassorted strains of the influenza virus by mass spectrometry. BMC Bioinformatics 13: 208
  doi: 10.1186/1471-2105-13-208
31. Martin D, Rybicki E (2000) RDP: detection of recombination amongst aligned sequences. Bioinformatics 16: 562-563
  doi: 10.1093/bioinformatics/16.6.562
32. McGuire G, Wright F, Prentice MJ (1997) A graphical method for detecting recombination in phylogenetic data sets. Mol Biol Evol 14: 1125-1131
  doi: 10.1093/oxfordjournals.molbev.a025722
33. Mena I, Nelson MI, Quezada-Monroy F, Dutta J, Cortes-Fernández R, Lara-Puente JH, Castro-Peralta F, Cunha LF, Trovão NS, Lozano-Dubernard B, Rambaut A, van Bakel H, García-Sastre A (2016) Origins of the 2009 H1N1 influenza pandemic in swine in Mexico. Elife 5: e16777
  doi: 10.7554/eLife.16777
34. Nagarajan N, Kingsford C (2008) Uncovering genomic reassortments among influenza strains by enumerating maximal bicliques. Paper presented at the 2008 IEEE international conference on bioinformatics and biomedicine. https://doi.org/10.1109/BIBM.2008.78
35. Nagarajan N, Kingsford C (2011) GiRaF: robust, computational identification of influenza reassortments via graph mining. Nucleic Acids Res 39: e34-e34
  doi: 10.1093/nar/gkq1232
36. Nakajima K, Nobusawa E, Nagy A, Nakajima S (2005) Accumulation of amino acid substitutions promotes irreversible structural changes in the hemagglutinin of human influenza AH3 virus during evolution. J Virol 79: 6472-6477
  doi: 10.1128/JVI.79.10.6472-6477.2005
37. Olsen CW, Karasin AI, Carman S, Li Y, Bastien N, Ojkic D, Alves D, Charbonneau G, Henning BM, Low DE, Burton L, Broukhanski G (2006) Triple reassortant H3N2 influenza A viruses, Canada, 2005. Emerg Infect Dis 12: 1132-1135
  doi: 10.3201/eid1207.060268
38. Prosperi MC, Ciccozzi M, Fanti I, Saladini F, Pecorari M, Borghi V, Di Giambenedetto S, Bruzzone B, Capetti A, Vivarelli A, Rusconi S, Re MC, Gismondo MR, Sighinolfi L, Gray RR, Salemi M, Zazzi M, De Luca A (2011) A novel methodology for large-scale phylogeny partition. Nat Commun 2: 321
  doi: 10.1038/ncomms1325
39. Rabadan R, Levine AJ, Krasnitz M (2008) Non-random reassortment in human influenza A viruses. Influenza Other Respir Viruses 2: 9-22
  doi: 10.1111/j.1750-2659.2007.00030.x
40. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406-425
  doi: 10.1093/oxfordjournals.molbev.a040454
41. Salzberg SL, Kingsford C, Cattoli G, Spiro DJ, Janies DA, Aly MM, Brown IH, Couacy-Hymann E, De Mia GM, Dung do H, Guercio A, Joannis T, Maken Ali AS, Osmani A, Padalino I, Saad MD, Savic V, Sengamalay NA, Yingst S, Zaborsky J, Zorman-Rojs O, Ghedin E, Capua I (2007) Genome analysis linking recent European and African influenza (H5N1) viruses. Emerg Infect Dis 13: 713-718
  doi: 10.3201/eid1305.070013
42. Sawyer S (1989) Statistical tests for detecting gene conversion. Mol Biol Evol 6: 526-538
43. Schäfer JR, Kawaoka Y, Bean WJ, Süss J, Senne D, Webster RG (1993) Origin of the pandemic 1957 H2 influenza A virus and the persistence of its possible progenitors in the avian reservoir. Virology 194: 781-788
  doi: 10.1006/viro.1993.1319
44. Smith GJ, Donis RO (2015) Nomenclature updates resulting from the evolution of avian influenza A(H5) virus clades 2.1.3.2a, 2.2.1, and 2.3.4 during 2013-2014. Influenza Other Respir Viruses 9: 271-276
  doi: 10.1111/irv.12324
45. Smith GJ, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, Peiris JS, Guan Y, Rambaut A (2009a) Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459: 1122-1125
  doi: 10.1038/nature08182
46. Smith GJD, Bahl J, Vijaykrishna D, Zhang J, Poon LLM, Chen H, Webster RG, Peiris JSM, Guan Y (2009b) Dating the emergence of pandemic influenza viruses. Proc Natl Acad Sci 106: 11709-11712
  doi: 10.1073/pnas.0904991106
47. Smith JM (1992) Analyzing the mosaic structure of genes. J Mol Evol 34: 126-129
  doi: 10.1007/BF00182389
48. Sourdis J, Nei M (1988) Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree. Mol Biol Evol 5: 298-311
49. Su S, Fu X, Li G, Kerlin F, Veit M (2017) Novel Influenza D virus: epidemiology, pathology, evolution and biological characteristics. Virulence 8: 1580-1591
  doi: 10.1080/21505594.2017.1365216
50. Suzuki Y (2010) A phylogenetic approach to detecting reassortments in viruses with segmented genomes. Gene 464: 11-16
  doi: 10.1016/j.gene.2010.05.002
51. Svinti V, Cotton JA, McInerney JO (2013) New approaches for unravelling reassortment pathways. BMC Evol Biol 13: 1
  doi: 10.1186/1471-2148-13-1
52. Takezaki N, Rzhetsky A, Nei M (1995) Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol 12: 823-833
53. van Ravenzwaaij D, Cassey P, Brown SD (2018) A simple introduction to Markov Chain Monte-Carlo sampling. Psychon Bull Rev 25: 143-154
  doi: 10.3758/s13423-016-1015-8
54. Vijaykrishna D, Poon LL, Zhu HC, Ma SK, Li OT, Cheung CL, Smith GJ, Peiris JS, Guan Y (2010) Reassortment of pandemic H1N1/2009 influenza A virus in swine. Science 328: 1529
  doi: 10.1126/science.1189132
55. Villa M, Lassig M (2017) Fitness cost of reassortment in human influenza. PLoS Pathog 13: e1006685
  doi: 10.1371/journal.ppat.1006685
56. Virk RK, Jayakumar J, Mendenhall IH, Moorthy M, Lam P, Linster M, Lim J, Lin C, Oon LLE, Lee HK, Koay ESC, Vijaykrishna D, Smith GJD, Su YCF (2020) Divergent evolutionary trajectories of influenza B viruses underlie their contemporaneous epidemic activity. Proc Natl Acad Sci USA 117: 619-628
  doi: 10.1073/pnas.1916585116
57. Wan XF, Wu X, Lin G, Holton SB, Desmone RA, Shyu CR, Guan Y, Emch ME (2007a) Computational identification of reassortments in avian influenza viruses. Avian Dis 51: 434-439
  doi: 10.1637/7625-042706R1.1
58. Wan XF, Chen G, Luo F, Emch M, Donis R (2007b) A quantitative genotype algorithm reflecting H5N1 Avian influenza niches. Bioinformatics 23: 2368-2375
  doi: 10.1093/bioinformatics/btm354
59. Wan XF, Ozden M, Lin G (2008) Ubiquitous reassortments in influenza A viruses. J Bioinform Comput Biol 6: 981-999
  doi: 10.1142/S0219720008003813
60. WHO/OIE/FAO H5N1 Evolution Working Group (2008) Toward a unified nomenclature system for highly pathogenic avian influenza virus (H5N1). Emerg Infect Dis 14: e1
61. WHO/OIE/FAO H5N1 Evolution Working Group (2009) Continuing progress towards a unified nomenclature for the highly pathogenic H5N1 avian influenza viruses: divergence of clade 2.2 viruses. Influenza Other Respir Viruses 3: 59-62
  doi: 10.1111/j.1750-2659.2009.00078.x
62. WHO/OIE/FAO H5N1 Evolution Working Group (2012) Continued evolution of highly pathogenic avian influenza A (H5N1): updated nomenclature. Influenza Other Respir Viruses 6: 1-5
  doi: 10.1111/j.1750-2659.2011.00298.x
63. Wu A, Su C, Wang D, Peng Y, Liu M, Hua S, Li T, Gao GF, Tang H, Chen J, Liu X, Shu Y, Peng D, Jiang T (2013) Sequential reassortments underlie diverse influenza H7N9 genotypes in China. Cell Host Microbe 14: 446-452
  doi: 10.1016/j.chom.2013.09.001
64. Xing G, Gu J, Yan L, Lei J, Lai A, Su S, Zhou J (2016) Human infections by avian influenza virus H5N6: Increasing risk by dynamic reassortment? Infect Genet Evol 42: 46-48
  doi: 10.1016/j.meegid.2016.04.009
65. Yin R, Zhou X, Rashid S, Kwoh CK (2020) HopPER: an adaptive model for probability estimation of influenza reassortment through host prediction. BMC Med Genomics 13: 9
  doi: 10.1186/s12920-019-0656-7
66. Yurovsky A, Moret BME (2011) FluReF, an automated flu virus reassortment finder based on phylogenetic trees. BMC Genomics 12: S3
  doi: 10.1186/1471-2164-12-S2-S3
Introduction

Influenza virus is a negative-sense, single-stranded RNA virus that belongs to the family Orthomyxoviridae. According to the ICTV (https://talk.ictvonline.org/taxonomy/), there are four types of influenza viruses: A, B, C and D. Humans can be infected only by influenza A, B and C viruses, while influenza D virus primarily affects cattle (Su et al. 2017). Influenza A and B viruses are responsible for the worldwide pandemics and the seasonal epidemics, which can cause tremendous loss of human life and serious economic damage. The World Health Organization (WHO) estimates that seasonal influenza epidemics cause approximately 3 to 5 million cases of severe illness each year, and that about 290,000 to 650,000 people die of the respiratory illness worldwide (https://www.who.int/en/news-room/fact-sheets/detail/influenza-(seasonal)).

Progress in Computational Identification of Influenza Virus Reassortments

Challenge and Future Prospects in Computational Identification of Influenza Virus Reassortments

Optimal Processing of Big Data

Both influenza virus genomic data and influenza virus-related data have grown explosively in the current big data era. One challenge for researchers is therefore how to process these data so as to improve the reliability of reassortment identification. Reasonable data preprocessing brings several advantages to computational reassortment identification. First, removing redundant information decreases the volume of the data, which reduces computation time and hardware requirements. For example, as indicated above, most phylogenetic tree-based methods cannot handle an enormous number of viruses. To address this, a set of rational criteria for eliminating redundant strains that share a close phylogenetic relationship is urgently needed. One approach is to filter the data by combining epidemiologic information with the pairwise sequence distance between strains. Second, non-redundant data decrease the noise in subsequent analyses of the identified reassortment events. For example, a single reassortment event may generate multiple reassortants that are assigned different names or IDs in a dataset. Without preprocessing, this single event might be counted as several different events, which would then distort subsequent analyses such as the construction of reassortment networks. Lastly, the currently available influenza virus-related data are generated by high-throughput sequencing technologies, which have systematic sequencing errors. Eliminating these errors will improve the accuracy of reassortment event identification.
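The redundancy filter described above, which combines epidemiologic information with pairwise sequence distance, can be sketched as follows. This is only a minimal illustration under stated assumptions: the record layout (`host`, `location`, `year`, `segments`), the function names and the 1% distance cutoff are hypothetical choices made for the example, and all strains are assumed to carry the same set of aligned segments. It is not the procedure of any reviewed tool.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a) if a else 0.0


def remove_redundant(strains, max_dist=0.01):
    """Greedy filter over a list of strain records (hypothetical schema).

    A strain is dropped when an already-kept strain has the same host,
    location and year AND lies within max_dist on every gene segment.
    Assumes every record carries the same segment names.
    """
    kept = []
    for s in strains:
        meta = (s["host"], s["location"], s["year"])
        redundant = any(
            meta == (k["host"], k["location"], k["year"])
            and all(
                p_distance(seq, k["segments"][seg]) <= max_dist
                for seg, seq in s["segments"].items()
            )
            for k in kept
        )
        if not redundant:
            kept.append(s)
    return kept


# Toy records with made-up names and sequences, for illustration only.
toy = [
    {"name": "A1", "host": "avian", "location": "X", "year": 2013,
     "segments": {"HA": "ACGTACGTAC", "NA": "TTGACCTGAA"}},
    {"name": "A2", "host": "avian", "location": "X", "year": 2013,
     "segments": {"HA": "ACGTACGTAC", "NA": "TTGACCTGAA"}},
    {"name": "A3", "host": "avian", "location": "X", "year": 2013,
     "segments": {"HA": "ACGTACGTAC", "NA": "TTGACCTGGG"}},
]
print([s["name"] for s in remove_redundant(toy)])
```

Because the filter is greedy, the result depends on input order; sorting the records by collection date or sequence quality beforehand would be one way to make the retained representative predictable.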

The integration of various types of influenza virus data is crucial for improving the reliability of the inferred reassortment events. As reviewed above, Yin et al. (2020) made an effort in this direction by identifying reassortments with features generated from seven physicochemical properties of amino acids. Although this algorithm has several limitations, it offers insight into the data-driven computational identification of influenza virus reassortments. Previous studies have also indicated that protein structure can be influenced by the evolution of viruses (Nakajima et al. 2005). We therefore infer that combining protein structure information with the evolutionary profiles of influenza viruses could identify reassortment events more sensitively and accurately. For example, similar to the work of Villa and Lassig (2017), one approach is to evaluate the mutations arising along evolutionary pathways according to the function of different regions of viral proteins, and then to use this evaluation to identify reassortants from the phylogenetic trees. In summary, reasonably processed data underpin the accurate identification of reassortment events.

The Practicality and Validity of the Reassortment Identification Methods

As shown in Table 1, most of the algorithms are no longer updated or even available for download, mainly because they have few users. In our opinion, a reassortment identification algorithm should be developed for researchers from different fields. Algorithms with low practicality and validity are of limited use to researchers without a computing background. For example, the GiRaF algorithm not only needs to be compiled from source code but also requires the installation of a series of dependent libraries. Although the software developed by Dong et al. (2011) has a user-friendly interface, each identification run requires manual operation, which greatly reduces the efficiency of the software. We therefore suggest that future reassortment identification algorithms be packaged as an easy-to-use pipeline with a user-friendly interface.

The other key difficulty in improving the identification methods is defining rational thresholds for evaluating the heterogeneity of multiple gene segments between strains. Such thresholds are hard to set because of the complexity and diversity of the analyzed datasets. For instance, the sequence distance cutoff used to recognize heterogeneity among intra-subtype influenza viruses should differ from that used for inter-subtype influenza viruses. Automatic, self-adjusting estimation of cutoffs from the analyzed data is therefore worth pursuing in the development of computational identification methods. As reviewed above, most reassortment identification methods are recursive, so the data structure changes at each step. These algorithms therefore need the cutoff to be re-optimized as they proceed.
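One simple way to estimate a cutoff from the data rather than fixing it in advance is to look for the widest gap in the sorted pairwise distances and to do so separately for each comparison class. The sketch below illustrates that idea only; the largest-gap heuristic, the function names and the `"intra"`/`"inter"` labels are assumptions made for the example and are not taken from any of the reviewed methods.

```python
def largest_gap_cutoff(distances):
    """Midpoint of the widest gap between consecutive sorted distances."""
    d = sorted(distances)
    if len(d) < 2:
        raise ValueError("need at least two distances")
    # Index of the left edge of the widest gap.
    i = max(range(len(d) - 1), key=lambda j: d[j + 1] - d[j])
    return (d[i] + d[i + 1]) / 2


def cutoffs_by_group(labelled_distances):
    """Estimate one cutoff per label from (label, distance) pairs.

    Labels such as "intra" and "inter" (subtype) are hypothetical.
    """
    grouped = {}
    for label, dist in labelled_distances:
        grouped.setdefault(label, []).append(dist)
    return {label: largest_gap_cutoff(v) for label, v in grouped.items()}


# Made-up distances for illustration only.
pairs = [("intra", 0.01), ("intra", 0.02), ("intra", 0.03),
         ("intra", 0.12), ("intra", 0.13),
         ("inter", 0.10), ("inter", 0.12), ("inter", 0.40), ("inter", 0.45)]
print(cutoffs_by_group(pairs))
```

In a recursive method, a function of this kind could be called again on the distances that remain after each step, so that the cutoff follows the changing data rather than staying fixed.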

The Benchmarking Dataset

Currently, there is no benchmarking dataset for evaluating the performance of reassortment identification algorithms. The most appropriate solution would be a gold-standard dataset that provides the reassortants, the corresponding parental strains and the reassorted genomic segments, all confirmed by laboratory experiments. However, this work is impeded by safety and ethical issues. A computational alternative is to infer precise reassortment events with complete reassortment information from reliable data using appropriate methods. For example, the reasonable and effective classification standard for the HA segment of H5N1 viruses can be used, which was proposed by the World Health Organization (WHO), the World Organization for Animal Health (OIE) and the Food and Agriculture Organization (FAO) (WHO 2008, 2009, 2012; Smith and Donis 2015). In addition, several previously identified reassortment events are credible and include complete information, and have been used as benchmarks in studies (Lam et al. 2013; Wu et al. 2013). In short, a benchmarking dataset is urgently needed for the development of computational reassortment identification.
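If such a benchmark existed, an algorithm's output could be scored against it with standard precision, recall and F1. The following sketch assumes, purely for illustration, that each reassortment event is encoded as a pair of a reassortant name and the set of reassorted segments; the encoding, the function name and the strain names are hypothetical.

```python
def evaluate(predicted, benchmark):
    """Score predicted reassortment events against a benchmark set.

    Each event is a hashable tuple, here (reassortant, frozenset of segments).
    An event counts as correct only if strain and segments both match exactly.
    """
    predicted, benchmark = set(predicted), set(benchmark)
    tp = len(predicted & benchmark)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(benchmark) if benchmark else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up events for illustration only.
benchmark = {
    ("strain_1", frozenset({"HA"})),
    ("strain_2", frozenset({"PB1", "NA"})),
    ("strain_3", frozenset({"NP"})),
}
predicted = {
    ("strain_1", frozenset({"HA"})),
    ("strain_2", frozenset({"PB1"})),  # right strain, incomplete segment set
    ("strain_4", frozenset({"M"})),
}
print(evaluate(predicted, benchmark))
```

Exact matching is a strict design choice. A benchmark that also records parental strains could extend the tuple, and a more lenient score could give partial credit when only the segment set differs.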

Conclusion

In this work, we comprehensively summarized the computational identification of influenza virus reassortment, including the identification methods and the related database tools. We also discussed the challenges and future prospects of the field. The reassortment identification methods fall into two categories according to whether they depend on phylogenetic trees. The phylogenetic tree-based methods recognize reassortment events by investigating structural incongruences among the phylogenetic trees of the eight gene segments. Among these methods, manual comparison of tree topologies coupled with epidemiologic information is the most commonly used. Because manual identification is empirical and subjective, automatic reassortment inference methods based on phylogenetic trees were developed, drawing on graph theory, statistics and other techniques. These approaches can identify reassortment events accurately and sensitively because they use the evolutionary history of the related influenza viruses, but their feasibility is severely limited by the reliability, time and computational cost of phylogenetic tree construction. Several efforts were therefore made to recognize reassortants without phylogenetic trees. The phylogenetic tree-independent methods detect reassortment events primarily from significant differences in strain distances among multiple gene segments, and can be applied to large numbers of virus strains with low computational complexity. These distances are based on both the genomic sequences and the physicochemical properties of amino acids. However, the quality of the nucleotide and amino acid sequences can greatly influence identification performance.
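The core idea behind the tree-independent methods, that two strains which are very close in one segment but distant in another are reassortment candidates, can be illustrated with a deliberately simplified sketch. The fixed `near` and `far` thresholds below stand in for the statistical tests that the published methods use, and the function names, thresholds and toy sequences are all assumptions made for the example.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a) if a else 0.0


def candidate_pairs(strains, near=0.02, far=0.10):
    """Flag strain pairs that are close in some segments and distant in others.

    strains maps a strain name to {segment name: aligned sequence}.
    Returns (name1, name2, close_segments, distant_segments) tuples.
    """
    hits = []
    for n1, n2 in combinations(sorted(strains), 2):
        shared = strains[n1].keys() & strains[n2].keys()
        dists = {g: hamming_fraction(strains[n1][g], strains[n2][g]) for g in shared}
        close = sorted(g for g, d in dists.items() if d <= near)
        distant = sorted(g for g, d in dists.items() if d >= far)
        if close and distant:
            hits.append((n1, n2, close, distant))
    return hits


# Made-up strains for illustration only.
toy = {
    "X": {"HA": "AAAAAAAAAA", "NA": "CCCCCCCCCC"},
    "Y": {"HA": "AAAAAAAAAA", "NA": "CCCCCGGGGG"},
    "Z": {"HA": "AAAAAAAAAA", "NA": "CCCCCCCCCC"},
}
print(candidate_pairs(toy))
```

A flagged pair only indicates incongruent distances between segments. Deciding which strain is the reassortant and which lineage donated the segment needs additional evidence, such as sampling dates or a wider reference set.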
Regrettably, the performance of the reviewed reassortment identification algorithms could not be compared, because most of the algorithms are unavailable and a benchmark dataset is lacking. We therefore only summarized our practical experience with the four software packages we were able to obtain (Table 1). Based on the principles of the algorithms, we also offer some suggestions on application scenarios for both bioinformatics and virology researchers. First, the software developed by Dong et al. is the only one with a user-friendly interface, which makes it more suitable for researchers with little bioinformatics background (Dong et al. 2011). In general, the phylogenetic tree-independent methods identify reassortments from large-scale data more efficiently than the phylogenetic tree-based methods, for which tree construction is quite time-consuming. However, for a small number of genomic sequences (usually about 100 sequences), the phylogenetic tree-based methods infer reassortments more accurately. In addition, the methods proposed by Nagarajan and Kingsford (2008, 2011), Wan et al. (2007a, 2007b, 2008), Rabadan et al. (2008) and de Silva et al. (2012) can identify reassortment events from partial genomic segment sequences of influenza viruses. The amino acid sequences of genomic segments can be processed by either FluResort or HopPER, and HopPER is more suitable for inferring reassortments within the same host.

In addition, with the increasing number of studies on identifying influenza virus reassortments, two databases, FluGenome and FluReassort, were developed to provide valuable information related to influenza virus reassortments. In summary, no universal and valid computational method for reassortment identification currently exists. The most appropriate scheme should be designed according to the characteristics of the analyzed data, such as the number and diversity of strains. We hope this review can serve as a guide to reasonably identifying reassortments in diverse influenza virus datasets.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (31801101 to X.D., 31671371, 32070678 to T.J.); the CAMS Initiative for Innovative Medicine (CAMS-I2M, 2016-I2M-1-005, 2020-I2M-2-003 to T.J.).

Compliance with Ethical Standards

Conflict of interest: The authors declare that no competing interests exist.

Animal and Human Rights Statement

This article does not contain any studies with human or animal subjects performed by any of the authors.


Table 1

Classification	Method	Principle	Accessibility	Compiling environment	Usage experience	Required data	Limitation	References
Phylogenetic tree-based	FluReF	Bottom-up search	Source code	Written in C++ for Linux	The test dataset containing 1050 strains took 10 s in total	Complete genome of influenza virus	High computational complexity	Yurovsky and Moret (2011)
	Villa et al.	Core mutations	Source code	Written in both C++ and Python for Linux	The mock test dataset containing 7477 strains with 290 bp simulated genomic sequences took more than 5 days in total	Complete HA and NA sequences	Limited to reassortment identification for the HA and NA segments	Villa and Lassig (2017)
	FluResort	Identities of predicted proteins	Website no longer accessible	Written in ANSI/ISO standard C++ for Windows and Linux	Not available	Viral protein sequences and mass spectral data of these proteins	Limited to the HA, NA, NP and M1 proteins; high-resolution mass spectral data are required	Lun et al. (2012)
	Nagarajan et al.	Enumerating maximal bicliques	Not supported	Not supported	Not available	Genomic segments of influenza virus	High computational complexity	Nagarajan and Kingsford (2008)
	GiRaF	Graph theory	Source code	Written in C++ for Linux, Mac or Windows	The test dataset containing 35 strains took about 5 s in total	Complete genome of influenza virus	High computational complexity	Nagarajan and Kingsford (2011)
	Suzuki et al.	Topologies of quartet trees	Not supported	Not supported	Not available	Complete genome of influenza virus	High computational complexity	Suzuki (2010)
	Dong et al.	Genotype profile	IVEE soft	Written in both C++ and Python for Windows	Each complete genome took about 3 seconds	Complete genome and the genotype information	Limited when inferring intra-subtype reassortments within the same host	Dong et al. (2011)
Phylogenetic tree-independent	Wan et al.	Network module; MST	Not supported	Not supported	Not available	Genomic segments of influenza virus	Not suitable for short sequences	Wan et al. (2007a, 2007b, 2008)
	Rabadan et al.	Hamming distance	Not supported	Not supported	Not available	Genomic segments of influenza virus	The assumption of equal mutation rates among segments may not always hold	Rabadan et al. (2008)
	de Silva et al.	Genetic distance	Not supported	Written as Ruby scripts using BioRuby on a Debian Linux server	Not available	Genomic segments of influenza virus	Performance is significantly influenced by sampling bias	de Silva et al. (2012)
	HopPER	Host tropism	Not supported	Not supported	Not available	Full-length amino acid sequences of all genomic segments	Difficult to identify reassortments between different hosts	Yin et al. (2020)
