-
Crimean-Congo haemorrhagic fever (CCHF) is a viral haemorrhagic fever that was first described in Crimea in 1944 and subsequently associated with similar outbreaks that occurred in the Congo. Although CCHF has high mortality in humans, the disease primarily occurs in animals. CCHF is endemic in many countries in Africa, Europe and Asia [5, 6, 15, 21, 33, 37], with outbreaks recently reported in Sudan [3, 4], Kosovo [6, 33], China [12, 28, 36], Russia [19, 21, 37] and India [29]. CCHF is caused by the Crimean-Congo haemorrhagic fever virus (CCHFV), a segmented negative-strand RNA virus that is a member of the genus Nairovirus of the family Bunyaviridae [16, 23]. Bunyavirus genomes consist of three segments, small (S), medium (M) and large (L), which encode the viral nucleocapsid protein (N), the glycoprotein precursor (GPC) and the polymerase protein (P), respectively [2].
Multiple reports have investigated the epidemiology and phylogeny of the virus, but these have generally concentrated on the phylogenetic relationships amongst sequences from a single segment, studied all three segments with only a limited number of sequences, or focused on a specific region. In this work we collected all publicly available full-length S segments for CCHFV and estimated phylogenetic trees based on the nucleotide alignment. Our trees predicted several major clades that were consistent with previous findings and that supported geographical subdivision. We then generated amino acid alignments, selected multiple entries from AAIndex, the Amino Acid Index database [17, 26], and used these to identify mutations that represented major changes in the physical properties of the consensus amino acid at each site. Most of the clades showed few major amino acid replacements, with the exception of the Asia 2 clade, which showed large numbers of changes associated with charge, volume, solvation energy and hydrophobicity.
-
All Nairovirus sequences were downloaded from GenBank. Sixty-two full-length S segment sequences were selected and aligned using ClustalX v2.0 [32], and gaps were removed to give a final alignment of 1 461 nt. Trees were estimated with the MEGA5 software package [30] using the Neighbor-Joining method with the Maximum Composite Likelihood model (Tamura-Nei distance matrix) and uniform rates among sites. Sequence accession numbers and background information are listed in Table 1.
Table 1. Background information of sequences used in phylogenetic analysis and mutation analysis.
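The gap-removal step can be illustrated with a minimal Python sketch. It assumes that "gaps were removed" means deleting every alignment column in which at least one sequence has a gap (complete deletion); the text does not state this explicitly. The function name `strip_gap_columns` and the toy sequences are hypothetical and are not taken from the study.

```python
def strip_gap_columns(alignment, gap_chars="-."):
    """Return a copy of the alignment without any column that contains a gap.

    `alignment` maps a sequence name to its aligned sequence string.
    Assumption: complete deletion (a column is dropped if any sequence
    has a gap character in it).
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    seqs = list(alignment.values())
    # Indices of the columns in which no sequence carries a gap character.
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical data, for illustration only).
toy = {"seq1": "ATG-CA", "seq2": "ATGGCA", "seq3": "A-GGCA"}
stripped = strip_gap_columns(toy)
print(stripped)
```

In the toy alignment the second and fourth columns each contain a gap, so only four columns survive in every sequence.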
To identify sites containing mutations that reflected amino acid changes with significantly different properties, eight entries were selected from the Amino Acid Index (AAI) database [17] that reflected changes in charge, volume, pKa and hydrophobicity. The selected entries were FAUJ880112 (negative charge) [11], FAUJ880113 (positive charge) [11], FAUJ880114 (pK-a value) [11], GOLD730102 (residue volume) [14], TSAJ990101 (packing density) [34], KRIW790103 (side chain volume) [18], EISD840101 (consensus normalized hydrophobicity scale) [8] and ROSM880105 (hydropathies of amino acid side chains) [27]. The alignment was translated to amino acids and analyzed with each AAI entry in turn: each sequence was compared to the consensus for the entire set of CCHFV sequences. For the charge entries, any mutation that changed the charge of a residue between neutral, positive and negative was considered significant. For the other entries, the change in a parameter brought about by a mutation at a site was considered significant if
where △abI is the change in amino acid index I when amino acid a mutates to amino acid b.
The alignment was analyzed using a custom Java program, which is available from the authors on request.
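The consensus comparison for the charge entries can be sketched as follows. This is an illustration in Python, not the authors' Java program. It assumes a simplified formal-charge table (D and E negative, K and R positive, all other residues including histidine neutral) in place of the AAI charge entries, and a majority-vote consensus per column. The names `CHARGE`, `consensus` and `charge_changes`, and the toy sequences, are hypothetical.

```python
from collections import Counter

# Simplified side-chain charges (assumption: His is treated as neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def consensus(seqs):
    """Majority residue per column; ties go to the residue seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def charge_changes(name_to_seq):
    """List (sequence name, mutation label, charge delta) for every residue
    that differs from the consensus and changes the formal charge."""
    cons = consensus(list(name_to_seq.values()))
    changes = []
    for name, seq in name_to_seq.items():
        for pos, (c, r) in enumerate(zip(cons, seq), start=1):
            delta = CHARGE.get(r, 0) - CHARGE.get(c, 0)
            if r != c and delta != 0:
                changes.append((name, f"{c}{pos}{r}", delta))
    return changes


# Toy protein alignment (hypothetical data, for illustration only).
toy = {"a": "MDKN", "b": "MDKN", "c": "MNKD", "d": "MDKN"}
found = charge_changes(toy)
print(found)
```

Here the consensus is MDKN, and sequence "c" carries one substitution that removes a negative charge and one that introduces a negative charge. The sketch covers only the charge case.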
-
The predicted tree for the S segment is shown in Fig. 1. The tree exhibits clear geographic subdivision, identifying seven clades that were named Asia 1, Asia 2, Europe 1, Europe 2, Africa 1, Africa 2 and Africa 3, consistent with results from previous studies [2, 4, 5, 13, 15, 16, 22, 28, 29].
-
We next used entries from the Amino Acid Index database [17] to analyze the alignment and investigate whether specific mutations were more probable, or whether there were regions where mutations were more likely to occur. We investigated mutations that produced changes in charge, hydrophobicity and volume and mapped these mutations to the Asia 1, Asia 2, Europe 1, Europe 2, Europe 3, Africa 1, Africa 2 and Africa 3 clades identified in the previous section. The mutations are listed in Tables 2, 3 and 4. The results for Residue Volume and Packing Density were identical, so only Residue Volume is shown.
Table 2. Amino acid mutations leading to charge change in the alignment as classified by clades defined in Fig. 1.
Table 3. Amino acid mutations leading to significant changes in pKa and hydropathy values in the alignment as classified by clades defined in Fig. 1. The shaded mutations correspond to mutations that occurred outside the main European (Europe 1) and African (Africa 3) clades.
Table 4. Amino acid mutations leading to significant changes in residue volume in the alignment as classified by clades defined in Fig. 1. The shaded mutations correspond to mutations that occurred outside the main European (Europe 1) and African (Africa 3) clades.
The most notable result is that in every category the Asia 2 clade appears to contain many more mutations than any of the other clades. Although this clade contains twice as many sequences as the other clades, this alone does not appear to account for many of the observed differences. For negative charge mutations the counts were Africa 3: 9 sequences / 3 mutations, Asia 1: 8 sequences / 1 mutation, Asia 2: 20 sequences / 19 mutations and Europe 1: 10 sequences / 1 mutation. Similarly, for the pKa index (Africa 3: 9 seqs / 2 muts, Asia 1: 8 seqs / 2 muts, Asia 2: 20 seqs / 22 muts, Europe 1: 10 seqs / 11 muts); hydrophobicity index (Africa 3: 9 seqs / 3 muts, Asia 1: 8 seqs / 1 mut, Asia 2: 20 seqs / 32 muts, Europe 1: 10 seqs / 4 muts); volume index (Africa 3: 9 seqs / 4 muts, Asia 1: 8 seqs / 2 muts, Asia 2: 20 seqs / 13 muts, Europe 1: 10 seqs / 4 muts). Even when the Europe 1 and Europe 2 clades and the Africa 1, Africa 2 and Africa 3 clades were consolidated into single European and African clades respectively, they still contained fewer mutations despite their greater genetic diversity (these additional mutations are highlighted in grey in Tables 2, 3 and 4).
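One simple way to compare clades of different sizes is to divide each mutation count by the number of sequences in the clade. The Python sketch below does this for the counts quoted above. The normalization is an illustrative assumption rather than an analysis described by the authors, and the names `COUNTS` and `mutations_per_sequence` are hypothetical.

```python
# (sequences, mutations) per clade, copied from the counts quoted in the text.
COUNTS = {
    "negative charge": {"Africa 3": (9, 3), "Asia 1": (8, 1),
                        "Asia 2": (20, 19), "Europe 1": (10, 1)},
    "pKa": {"Africa 3": (9, 2), "Asia 1": (8, 2),
            "Asia 2": (20, 22), "Europe 1": (10, 11)},
    "hydrophobicity": {"Africa 3": (9, 3), "Asia 1": (8, 1),
                       "Asia 2": (20, 32), "Europe 1": (10, 4)},
    "volume": {"Africa 3": (9, 4), "Asia 1": (8, 2),
               "Asia 2": (20, 13), "Europe 1": (10, 4)},
}


def mutations_per_sequence(counts):
    """Convert (sequences, mutations) pairs into mutations per sequence."""
    return {
        index: {clade: muts / seqs for clade, (seqs, muts) in clades.items()}
        for index, clades in counts.items()
    }


rates = mutations_per_sequence(COUNTS)
for index, clades in rates.items():
    summary = ", ".join(f"{clade}: {rate:.2f}" for clade, rate in clades.items())
    print(f"{index}: {summary}")
```

With these counts, Asia 2 has the highest per-sequence rate for negative charge (0.95), hydrophobicity (1.60) and volume (0.65). For the pKa index, Asia 2 and Europe 1 both come out at 1.10 mutations per sequence.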
There is no solved structure for the N protein, so it is difficult to determine the significance of these changes, particularly for parameters such as changes in side chain volume. To identify mutations of possible interest, we next mapped all these changes on to a graphical representation of the alignment. These are shown in Fig. 2 (charge change) and Fig. 3 (pKa, hydropathy and volume changes). Again, the greater number of mutations in the Asia 2 clade is clear, but additional features are also apparent. First, there is a pair of charge mutations that appear together in several of the Asia 2 sequences (D127N and N266D). Since the first mutation produces a charge change of +1 and the second produces a change of -1, these two sites may be compensatory. Second, the Africa 1 sequences contain two adjacent mutations, K262N (charge change -1) and E263G (charge change +1). A similar pair of mutations is also present in the Europe 2 sequence, K262N (charge change -1) and D263G (charge change +1), suggesting that these sites also play an important functional or structural role. The schematic for pKa, hydropathy and volume changes is more difficult to interpret because these types of changes can occur without producing the same impact as charge changes, but it is clear that the Asia 2 clade once again contains more mutations than the other clades. Another interesting feature occurs around AA263, where the European and African clades contain a number of mutations that produce significant changes in all three indices.
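The charge bookkeeping behind the proposed compensatory pairs can be checked with a short Python sketch. It assumes a simplified formal-charge table (D and E negative, K and R positive, everything else neutral) rather than the AAI entries used in the study. The helper names `parse_mutation` and `charge_delta` are hypothetical.

```python
# Simplified side-chain charges (assumption: all other residues are neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def parse_mutation(label):
    """Split a label such as 'D127N' into (from_residue, site, to_residue)."""
    return label[0], int(label[1:-1]), label[-1]


def charge_delta(label):
    """Charge of the new residue minus charge of the consensus residue."""
    src, _site, dst = parse_mutation(label)
    return CHARGE.get(dst, 0) - CHARGE.get(src, 0)


# Mutation pairs named in the text.
PAIRS = {
    "Asia 2": ("D127N", "N266D"),
    "Africa 1": ("K262N", "E263G"),
    "Europe 2": ("K262N", "D263G"),
}

for clade, muts in PAIRS.items():
    deltas = [charge_delta(m) for m in muts]
    print(clade, dict(zip(muts, deltas)), "net:", sum(deltas))
```

Under this charge table each pair sums to a net charge change of zero, which is the pattern the text describes as compensatory.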
Figure 2. Graphical representation of positive (green) and negative (red) changes in charge from the consensus sequence. A disproportionate number of the mutations that produce a negative change in charge occur in the Asia 2 clade, with multiple sequences containing the change at two sites (D127N and N266D), suggesting that these may represent compensatory mutations.
Figure 3. Graphical representation of mutations that produced significant changes in pKa (orange), hydropathy (blue) and side chain volume (red boxes) from the consensus sequence. Sites which contain mutations that change both pKa and hydropathy are shown as hatched yellow/blue, with a red border if there was also a significant change in volume. Consistent with Figure 2, a disproportionate number of mutations occur in the Asia 2 clade, with multiple changes also mapped to the two sites 127 and 266.
Finally, we mapped all the identified changes on to the predicted tree. These are shown in Fig. 4 (charge change) and Fig. 5 (pKa, hydropathy and volume changes). The plots identify site mutations that occur in multiple sequences (vertical lines) and sites that contain mutations with significant changes in multiple indices (horizontal lines). Sites which occur in multiple sequences and have changes in multiple indices are marked with both horizontal and vertical lines. In Fig. 4, only the Asia 2 clade contains multiple sequences with shared mutations. The compensatory positive/negative mutations in the Senegal sequences (Africa 1, DQ211639 and DQ211640) are also apparent. In Fig. 5, the Uganda (DQ076413) and Congo (DQ211650) sequences in the Africa 2 clade have two mutations at two different sites that modify all three indices. Other mutations in this Africa 2 clade also modify the same sites in the Africa 1 sequences (Senegal DQ211639 and Senegal DQ211640). The box in the bottom right of the figure that spans the Africa 1, Africa 2 and Europe 2 clades delimits the cluster of mutations that are apparent in Fig. 3 around site AA262, which are shared across the clades and which modify three indices.
Figure 4. Positive and negative charge mutations in Figure 2 mapped on to the predicted tree in Figure 1. Mutations that occur at the same site in multiple sequences are connected by a vertical line. Sequences that have a mutation that changes both indices, i.e. a mutation that changes the site from positive to negative (or vice versa), are connected by a horizontal line (e.g. sequence GQ337053 contains three mutations at three different sites that produce positive to negative changes in charge). Only the Asia 2 clade contains multiple sequences with shared mutations. The Senegal sequences (Africa 1, DQ211639 and DQ211640) share compensatory positive/negative mutations.
Figure 5. Significant pKa (yellow), hydropathy (blue) and volume (red) mutations mapped on to the predicted tree in Figure 1. Mutations that occur at the same site in multiple sequences are connected by a vertical line. Sequences that have a mutation that changes more than one index at the same site are connected by a horizontal line. For example, the Uganda (DQ076413) and Congo (DQ211650) sequences in the Africa 2 clade have two mutations at two different sites (AA 181 and AA 311) that modify all three indices. Other mutations in this Africa 2 clade also modify the same sites in the Africa 1 sequences (Senegal DQ211639 and Senegal DQ211640). The box in the bottom right of the figure that spans the Africa 1, Africa 2 and Europe 2 clades delimits a set of mutations and sites that are shared across the clades and which modify three indices.
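The two kinds of connection drawn in Figs. 4 and 5 amount to two groupings of the same mutation records: by site (vertical lines) and by sequence and site (horizontal lines). The Python sketch below illustrates this using only the Africa 2 example from the Figure 5 caption. The record layout and the names `shared_sites` and `multi_index_hits` are hypothetical and are not taken from the authors' program.

```python
from collections import defaultdict

# (accession, site, index) records built from the Figure 5 example:
# DQ076413 and DQ211650 each change all three indices at AA 181 and AA 311.
RECORDS = [
    (acc, site, index)
    for acc in ("DQ076413", "DQ211650")
    for site in (181, 311)
    for index in ("pKa", "hydropathy", "volume")
]


def shared_sites(records):
    """Sites mutated in more than one sequence (vertical lines)."""
    seqs_by_site = defaultdict(set)
    for acc, site, _index in records:
        seqs_by_site[site].add(acc)
    return {site: sorted(accs)
            for site, accs in seqs_by_site.items() if len(accs) > 1}


def multi_index_hits(records):
    """(sequence, site) pairs changing more than one index (horizontal lines)."""
    indices = defaultdict(set)
    for acc, site, index in records:
        indices[(acc, site)].add(index)
    return {key: sorted(found)
            for key, found in indices.items() if len(found) > 1}


print(shared_sites(RECORDS))
print(multi_index_hits(RECORDS))
```

For these records both sites are shared by the two sequences, and each of the four sequence and site combinations changes all three indices, so they would be marked with both a vertical and a horizontal line.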
Phylogenetic Analysis
Mutation Analysis
-
The S segment of viruses in the Bunyaviridae family encodes the nucleocapsid (N) protein. This protein encapsidates the viral RNA to form ribonucleoprotein (RNP) complexes. The N protein is also involved in a range of interactions with other molecules [20], including viral RNA [24, 25], the viral polymerase, other viral proteins [31] and host proteins [31], and it also forms multimers with itself [1]. Identifying key sites or domains can therefore provide insight into the specific role of the N protein in these various functions.
In this study we have used a bioinformatics approach to analyze an alignment by (ⅰ) estimating the phylogenetic relationship between the sequences, (ⅱ) identifying amino acid changes in each sequence that produce significant modifications to the physical properties of the protein and (ⅲ) analyzing these changes with respect to the tree and alignment to investigate whether regions exist where specific mutations are accompanied by compensatory changes elsewhere in the sequence. This can be used to gain information regarding sites that share some functional or structural role in the protein.
The most surprising finding in this study is that for all three categories (i.e. charge, hydropathy and volume) the Asia 2 clade had substantially more changes than any other clade in the tree. Although the Asia 2 clade contained twice as many samples as any other clade [20], this alone cannot explain the observed differences; even when the much more genetically diverse European and African clades were merged into single European and African clades, they still contained fewer mutations.
From this analysis we have identified several sites of interest (AA127, AA181, AA262, AA263 and AA311) that could usefully be investigated experimentally to see their effect on virus fitness. Mutational analysis studies of the Bunyamwera orthobunyavirus N gene [7, 35] have identified several key mutations that had detrimental effects on viral replication and fitness. We attempted to relate these findings to our results, but the Bunyaviridae genera are too diverse for a direct comparison to be made here. However, multiple reports have identified a role for positively charged amino acids in RNA binding and oligomerization [9, 10, 38], which provides further support for our findings.
Our analysis of CCHFV is our first attempt to perform a mutational analysis of a gene by integrating an alignment, a tree and the AAIndex amino acid database. Given that our method produces reasonable predictions, we next plan to analyze a gene for which a solved structure and experimental mutational studies are available.