-
Hepatitis B virus (HBV) infection, a major cause of acute and chronic viral hepatitis, remains a major worldwide public health problem. Approximately 2 billion people have been infected by the virus and about 350 million or more are chronic carriers (WHO Fact sheet no. 204), of whom more than 200 million are in China (45). Clinically, HBV infection can persist for the whole life of the patient (45), often leading to severe liver diseases, including liver cirrhosis and hepatocellular carcinoma (HCC), one of the most common forms of human cancer.
HBV belongs to the Hepadnaviridae family and contains a small (ca. 3.2-kb), partially double-stranded, relaxed-circular DNA (RC DNA) genome (Fig. 1A). This small enveloped DNA virus replicates through reverse transcription of an RNA intermediate, the pregenomic RNA (pgRNA) (37, 40); this unique process is accomplished by its reverse transcriptase (RT, P protein). Thus, the P protein plays a central role in viral replication. Although fundamental knowledge of the molecular biology of this highly liver-specific virus has increased profoundly in the past two decades, relatively little is known about the detailed mechanism of the P protein. In this review, we summarize what is currently understood about the structure and underlying molecular mechanism of the hepadnavirus reverse transcriptase. A thorough understanding of this process is of both theoretical and practical significance for fundamental HBV studies and drug discovery.
Figure 1. HBV genome organization and the predicted structure of the P protein. A: HBV genome organization (with permission from Beck J et al, 2007 World J Gastroenterol). The thick black lines represent the RC DNA of the HBV genome, with the P protein covalently linked to the 5′ end of the (-)-DNA. DR1 and DR2 are the direct repeats. The open reading frames are indicated by arrows with different colors. The outer blue lines represent the pregenomic RNA (pgRNA), with ε and poly-A at the 5′ and 3′ ends, respectively. B: Domains of the HBV and DHBV P proteins. Numbers represent aa positions; Y indicates the priming Tyr residue. The red box represents the conserved motif box A. The YMDD motif is also labeled. C: 3D spatial distribution of amino acid conservation in the P protein (modified from Van Hemert FJ et al, 2008 Virology). Individual residues in space-filling models of P are decorated with the appropriate color codes as a measure of local conservation. The YMDD motif in the conserved center of enzymatic activity is indicated by circles and arrows.
-
All hepadnaviruses encode a multi-functional P protein (15), the open reading frame (ORF) of which covers nearly 80% of the hepadnaviral genome (5) (Fig. 1A). The P protein consists of four domains: a terminal protein (TP) domain, a polymerase/reverse transcriptase (RT) domain, a C-terminal RNase H (RH) domain, and a spacer domain between the RT domain and the TP domain (Fig. 1B).
The TP domain, located at the N terminus of the P protein, is conserved among all hepadnaviruses but shares no homology with any other sequenced viral RT. A unique feature of the hepadnavirus RT is the presence of a tyrosine residue (26, 32, 46) to which the first nucleotide (nt) of the (-)-DNA is covalently attached during the replication initiation process (see below). This mechanism is distinctly different from the process generally adopted by other DNA-containing viruses and retroviruses. The TP domain is therefore considered to be an indispensable component (30) of hepadnaviral genome replication; in contrast, the spacer domain is considered to be relatively less important. Genetic studies have revealed that many nucleotides in the spacer domain can be deleted without affecting the replication ability of the P protein (35). However, it is generally speculated that the spacer domain makes the TP domain more flexible, facilitating the spatial proximity of the TP domain to a viral RNA element, which is of great importance and is discussed further below.
The RT and RH domains provide the DNA polymerase activity and the RNA hydrolysis (RNase H) activity, respectively. Bioinformatics and genetic analyses have shown that the RT and RH domains of the hepadnavirus RT are homologous to the corresponding domains of other known retroviral RTs (15). The functional information currently available on the RT domain is based mainly on drug resistance data and on homology modeling with other RTs of known structure.
No direct structural information on any hepadnaviral P protein is currently available (5, 30). The principal reason is that it is extremely difficult to isolate sufficient amounts of pure and soluble P protein, owing to low expression, instability, and insolubility (18) both in baculovirus and in prokaryotic expression systems, which makes crystallography-based structure determination infeasible. These problems have impeded progress in P protein-associated studies. They have been partly overcome by removing nearly the entire RH domain, which contains large numbers of hydrophobic amino acids (30), and by adding solubility-mediating fusion partners such as GrpE, NusA, and GST (14, 15, 38, 39) during the construction of the P protein expression vectors. Our laboratory is presently preparing the P protein from an E. coli expression system and attempting to solubilize the resulting inclusion bodies by adding appropriate reagents in order to increase the amount of soluble human HBV P protein.
Recently, the first theoretical 3D model of the HBV RT was proposed by Van Hemert et al., based upon a protein modeling approach (41). They found that the mosaic patterns of amino acid conservation and variability could be understood in terms of their role in local protein function. Conserved amino acid residues of the RT are typically clustered at the catalytic core marked by the YMDD (tyrosine-methionine-aspartate-aspartate) motif (Fig. 1C). However, the proposed HBV RT structure still requires experimental confirmation.
-
The replication cycle of the hepadnaviral genome can be roughly divided into the following steps: (Ⅰ) conversion of the RC DNA into a covalently closed circular (CCC) DNA; (Ⅱ) synthesis of viral RNAs (including the pgRNA) by the host RNA polymerase Ⅱ; (Ⅲ) packaging of the pgRNA and P protein into progeny nucleocapsids, within which the pgRNA is reverse transcribed by the P protein into progeny RC DNA (10, 17; for reviews, see references 5 and 30). It is this final step that is the focus of this review.
The reverse transcription of the pgRNA is initiated by protein priming at an RNA hairpin located near the 5′ end of the pgRNA, called ε (Fig. 2A). First, the short ε RNA is specifically recognized by the P protein to form a P-ε ribonucleoprotein (RNP) complex. Second, the P protein catalyzes the generation of a 3- to 4-nucleotide primer by using the bulge structure on ε as template, and this de novo synthesized primer is covalently attached to the tyrosine residue of the P protein (Fig. 2A) (25, 31, 32, 42). Furthermore, the formation of the RNP complex also triggers nucleocapsid assembly, through which both the P protein and pgRNA are packaged (1). Thus, the RNP is the determinant not only for protein priming, but also for viral encapsidation, which demonstrates that the P-ε interaction is absolutely essential for the virus life cycle. Additionally, the initiation process cannot be completed without the help of cellular chaperones such as Hsp90, Hsc70 (the constitutive form of Hsp70) and Hsp40, together with ATP (15, 18-20, 39). Recently, it was found that several rate-limiting steps are involved in the priming-mediated initiation process. In particular, a structural rearrangement of the RT from its stable state (P) to a metastable state (P*) seems to be indispensable for the formation of the RNP complex (Fig. 2A). Stahl et al. (38) studied the conformational changes in the TP domain of the DHBV P protein by using an in vitro reconstitution system combined with limited proteolysis, site-specific antibodies, and specific P mutants. Their data indicate that the ATP-consuming action of Hsc70 plus Hsp40 transiently exposes the C-proximal sites, including site R183, of the TP domain.
Figure 2. Molecular mechanism of the viral P protein in reverse transcription. A: Replication initiation by the P protein. The hairpin located at the 5′ end of the pgRNA is the ε RNA. It is first specifically recognized by the P protein with the help of cellular chaperones. Then, using the ε bulge structure as template, the P protein catalyzes the generation of a 3- to 4-nt primer that is covalently linked to the TP domain. Binding of the P protein to the ε RNA also triggers the nucleocapsid assembly process. B: Completion of the (-)-DNA. The direct-repeat elements DR1, DR2, and DR1* are shown as boxes. The newly synthesized DNA primer is translocated to the 3′-proximal DR1* region. The elongation of the (-)-DNA is subsequently completed by the reverse transcriptase activity of the RT domain. Meanwhile, the pgRNA is gradually degraded by the RH domain, except for its capped 5′ terminal region, which includes the 5′ DR1.
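The templated priming step described above can be illustrated with a minimal Python sketch. It only models base complementarity: an RNA template given 5′ to 3′ is read in the 3′ to 5′ direction and copied into a short DNA primer. The function name `synthesize_primer`, the `max_len` parameter, and the example template are illustrative assumptions; the template `UUCA` is borrowed from the DR1* motif mentioned in this review and is not presented here as the exact bulge sequence.

```python
# Watson-Crick pairing of an RNA template base with the incoming dNTP.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def synthesize_primer(bulge_template: str, max_len: int = 4) -> str:
    """Return the DNA primer (5'->3') copied from an RNA template (5'->3').

    The polymerase reads the template 3'->5', so the template is reversed
    before each base is complemented. At most `max_len` nt are synthesized,
    mirroring the 3- to 4-nt primer described in the text.
    """
    if any(base not in RNA_TO_DNA for base in bulge_template):
        raise ValueError("template must contain only A, C, G, U")
    read_order = reversed(bulge_template)  # template is read 3'->5'
    primer = "".join(RNA_TO_DNA[base] for base in read_order)
    return primer[:max_len]


# Hypothetical templating stretch (assumption, see lead-in).
primer = synthesize_primer("UUCA")
print(primer)
```

The sketch deliberately ignores the covalent attachment to the TP tyrosine and the chaperone-dependent activation of the P protein; it shows only why the primer sequence is dictated by the template.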
Following initiation, the synthesis of the (-)-DNA begins. Since the nascent short DNA primer is copied from the 5′ ε, it must first be translocated to the DR1* region located at the 3′ end of the pgRNA (first template switch) (26, 31, 43). The primer, still linked to the TP domain, pairs with the 5′-UUCA motif in the DR1* region (Fig. 2B). The RT domain then catalyzes the elongation of the (-)-DNA by using its reverse transcriptase activity. At the same time, the pgRNA is degraded by the RNase H activity of the RH domain, with the exception of its capped 5′ terminal region, which includes the 5′ DR1 region (5, 29) (Fig. 2B). This short region (about 11 to 16 nt in HBV) (13) is essential for the subsequent synthesis of the (+)-DNA. Once synthesis of the (-)-DNA is complete, synthesis of the (+)-DNA begins immediately. For initiation of (+)-DNA synthesis, the remaining RNA primer has to be translocated to the 3′-proximal DR2 region (second template switch) (12), from where it is extended to the 5′ end of the (-)-DNA. Further elongation requires a third template switch, in which the growing end of the (+)-DNA switches to the 3′ terminus of the (-)-DNA. Finally, further extension proceeds in the nucleocapsid and creates a set of (+)-DNA strands of various lengths until the dNTPs inside the capsid are depleted. Detailed information can be found in the relevant literature (5, 30).
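The first template switch can be sketched in the same spirit: given a short DNA primer, the RNA motif that can pair with it is its reverse complement, and the switch targets the copy of that motif nearest the 3′ end. The following Python sketch is a toy illustration only. The RNA string `toy_pgrna` is invented for this example and is not an HBV sequence, and the helper names are hypothetical; only the `UUCA` motif itself is taken from the text.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def acceptor_motif(primer: str) -> str:
    """RNA sequence (5'->3') that base-pairs with a DNA primer (5'->3')."""
    return "".join(DNA_TO_RNA[base] for base in reversed(primer))


def find_acceptor_sites(pgrna: str, primer: str) -> list[int]:
    """0-based start positions on the RNA where the primer could anneal."""
    motif = acceptor_motif(primer)
    sites = []
    start = pgrna.find(motif)
    while start != -1:
        sites.append(start)
        start = pgrna.find(motif, start + 1)
    return sites


# Toy RNA (assumption, not a real HBV sequence): two copies of UUCA stand in
# for the 5' copy near epsilon and the 3'-proximal copy in DR1*.
toy_pgrna = "GGUUCACCAAGGCUGAGCAUGGUUCACCUC"
sites = find_acceptor_sites(toy_pgrna, "TGAA")
three_prime_site = max(sites)  # the candidate nearest the 3' end
print(sites, three_prime_site)
```

Sequence complementarity alone yields more than one candidate site in this toy example, which is why the choice of the 3′-proximal site is made explicitly here; the sketch does not attempt to model how the virus achieves that specificity.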
-
As mentioned above, the P-ε interaction is highly specific and absolutely essential for virus propagation. Detailed biochemical studies of the P-ε interaction have been made possible by the development of cell-free systems over the past twenty years. Using DHBV P protein translated in vitro in rabbit reticulocyte lysate (RL) plus the cognate DHBV ε (Dε), Wang et al. reconstituted the specific binding and priming process outside an intact cell (42). This system has since been widely used to investigate the key residues of the P protein as well as the sequence and structure determinants within the ε RNA that are essential for P binding and/or priming. The protein priming process employed by hepadnaviruses requires a precise initiation of DNA synthesis that is guided by the information stored in the sequences and/or structures within ε. Thus, many biochemical studies have focused on identifying these determinants. Some data obtained from studies of DHBV and avian HBV systems have indicated that the internal bulge and the apical loop appear to be essential for replication initiation (2, 3, 14, 21, 36). Another study has shown that an open, rather than a base-paired, upper stem seems to be beneficial for strong P protein binding (21). This is consistent with very recent NMR data, which showed that the upper stem of Dε is the least stable part of the entire structure (11). It also explains why the rearrangement of the Dε conformation during P binding and priming requires very little energy input. In contrast, using human HBV P protein, Hu et al. determined that approximately 150 amino acid residues from the TP domain and 230 residues from the RT domain are necessary and sufficient for ε binding (14).
Other detailed studies can be summarized as follows: (Ⅰ) Cellular chaperones involved in activation of the P protein. Initial cell-free studies performed by Hu et al. revealed a strict dependence of the RT on cellular chaperones for formation of the P-ε complex (18, 20). However, the associated chaperones were not clearly defined until the development of pure in vitro reconstitution systems consisting of purified recombinant P protein and isolated chaperones. Hu et al. (19) were the first to report that the in vitro reconstitution of DHBV RT activity is dependent on the cellular chaperone proteins Hsp90, Hsp70, Hsp40, Hop and p23. Soon after this, Wang et al. (44) showed that a truncated RT lacking the entire RH domain and part of the RT domain was still able to interact with ε, and that its protein priming was independent of Hsp90. Other studies have shown that Hsp70 and Hsp40 plus ATP are necessary and sufficient to activate the P protein, although to a much lesser degree than the five-protein system (4, 39). (Ⅱ) Selection of new antiviral agents that are expected to significantly inhibit the functions of the P protein. The interaction between the viral ε RNA and the P protein represents a highly attractive novel target for intervention in HBV replication. Li et al. (26) recently reported that the endogenous small molecule iron protoporphyrin Ⅸ (hemin) and several related porphyrin compounds potently inhibited the RT-ε interaction by targeting the unique TP domain. Additionally, other strategies such as aptamers have also generated great interest in drug research because of their numerous merits compared with traditional compounds (9). Lately, Hu et al. (21) developed an in vitro SELEX (Systematic Evolution of Ligands by Exponential Enrichment) system to select Dε variants that bind strongly to the P protein, and the in vivo results demonstrate that the selected priming-deficient aptamer S2 could potentially be used as a decoy for antiviral therapy.
In conclusion, the data on the DHBV P-ε interaction have been greatly enriched by the development of genetic, biochemical and biophysical approaches (5, 16, 30). However, knowledge of the corresponding P-ε interaction of human HBV is still far from complete. The most important reason is that the HBV P protein, whether translated in vitro in rabbit RL or expressed in bacteria, does not show any DNA synthesis activity under the same conditions used for DHBV reconstitution (14, 15). One recent breakthrough, however, is that Hu et al. have established an in vitro system in which the binding of the HBV ε RNA to the HBV P protein is dependent on pre-existing chaperones such as Hsp90 (14). Although it lacks the subsequent priming reaction, this system provides a platform on which the sequence and structure determinants of the P-ε interaction in the human virus system can be studied. It is presently unclear why the initial RNP complex does not progress to the subsequent priming state. However, the biophysical demonstration of extremely stable base-pairing in the upper stem of HBV ε, rather than a relatively loose structure, strongly suggests that the subsequent structural rearrangement of the upper stem does not occur (11). A possible cause might lie in the absence, under in vitro conditions, of key factor(s) that act on the RNA rearrangement, e.g. a helicase-type factor that might help to melt the stable upper stem for subsequent interaction with the P protein.
-
It is clear that generation of the new RC DNA requires cooperation of the two pivotal domains, RT and RH. Here, we consider these two domains in greater detail. As described above, P proteins share a number of conserved motifs with other retroviral RTs in both the RT and the RH domains. Box A (Fig. 1B), located in the RT domain, is one such conserved motif. This box contains an invariant bulky residue that is crucial for dNTP versus NTP discrimination in RTs. The equivalent residue in the DHBV P protein would be phenylalanine 451 (F451) (6). Studies on polymerase complexes have shown that this residue is part of the dNTP-binding pocket, and that its side chain acts as a steric gate, preventing the larger ribose moieties of NTPs, with their 2′-OH, from positioning correctly in the pocket (22, 23). Based on this finding, Beck et al. constructed four F451 mutants in which phenylalanine was substituted with glycine (F451G), alanine (F451A), valine (F451V) or aspartate (F451D) (6). They then analyzed the abilities of these mutants to utilize dNTPs and NTPs in in vitro priming. They reported that the priming efficiencies with dNTPs decreased with decreasing side chain size, whereas GTP utilization increased, although the wild-type enzyme was inactive with GTP. In addition, all mutant proteins were competent for RNA encapsidation. Their study clearly demonstrates that the dNTP-binding pocket of the P protein has an architecture similar to that of other RTs, as the function of the discriminatory residue depends on its specific spatial disposition. Recently, Kim et al. also substituted a phenylalanine residue (F436) at the putative dNTP-binding cleft outside box A with a smaller amino acid (Gly or Val) (24), which caused the mutant P protein to lose its ability to discriminate between dNTPs and NTPs for incorporation. Furthermore, some groups have characterized several residues of the P protein, e.g. L528 (33), P628 (28) and P306 (27), that seem to be important for regulating viral replication efficiency.
Studies from the early 1990s have shown that the hepadnavirus RH domain plays a role in optimizing priming (8) and elongation (7) of the (-)-DNA. Potenza et al. assembled a synthetic HBV RH gene coding sequence from 12 oligonucleotides and expressed it in Escherichia coli (34). Based on a structural model of the enzyme, it was shown that H715, R744 and K745 might be involved in substrate recognition.
-
The HBV RT plays a multitude of fundamental roles in the viral life cycle (26). It carries out two essential enzymatic activities during the conversion of pgRNA to progeny DNA, i.e. DNA polymerase activity and RNase H activity. Moreover, its most unusual feature is that the RT itself serves as a primer, as it harbors the priming tyrosine residue within its unique TP domain.
Since the novel retrovirus-like replication mode of hepadnaviruses was discovered, numerous aspects of the replication mechanism have been unveiled by using transfected cell models and in vitro biochemical systems (5). However, the progress of RT-related studies is still relatively slow. The principal reason is a general lack of structural information on the viral RT. The problem is further compounded by the absence of suitable in vitro and infection systems, which are prerequisites for exploring the mechanism of the P protein.
Effectively blocking key positions of the P-ε interaction will inhibit viral replication at the levels of both pgRNA packaging and pgRNA reverse transcription. Hence, this interaction represents a highly attractive novel target for intervention, offering a range of strategies that are distinct from current antiviral strategies. These include: (Ⅰ) Interfering with HBV replication by selection of RNA "decoys". At present, our laboratory is working towards the construction of a variant pool of human HBV ε, with the aim of developing an in vitro SELEX to select priming-deficient RNAs with higher binding affinity for the P protein than wild-type ε. As decoys, the selected RNAs are likely to compete directly with wild-type ε, but because they do not support priming, they would interfere with HBV replication. (Ⅱ) Obstructing key RT domains for the P-ε interaction by developing site-specific antibodies and other small-molecule inhibitors (such as porphyrins). (Ⅲ) Blocking the binding/activation capacities of the chaperones by developing specific antagonists.
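The select-and-amplify logic behind a SELEX experiment can be illustrated with a small simulation. The Python sketch below is a toy model under loud assumptions: the `affinity` score (number of positions matching an arbitrary `target` string) is a hypothetical stand-in for measured binding to the P protein, the target sequence is invented and is not an ε sequence, and all parameter values are illustrative. It does not model priming deficiency, which in practice would need a separate functional screen.

```python
import random

ALPHABET = "ACGU"


def affinity(seq: str, target: str) -> int:
    """Hypothetical binding score: number of positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Copy a sequence, re-drawing each base at random with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base for base in seq
    )


def selex(target: str, pool_size: int = 200, rounds: int = 8,
          keep_fraction: float = 0.1, rate: float = 0.02, seed: int = 0):
    """Run select-amplify cycles; return the final pool and mean score per round."""
    rng = random.Random(seed)
    pool = ["".join(rng.choice(ALPHABET) for _ in target)
            for _ in range(pool_size)]
    history = []
    for _ in range(rounds):
        # Selection: keep the tightest binders under the toy score.
        pool.sort(key=lambda s: affinity(s, target), reverse=True)
        survivors = pool[: max(1, int(pool_size * keep_fraction))]
        # Amplification: refill the pool with error-prone copies of survivors.
        pool = [mutate(rng.choice(survivors), rate, rng)
                for _ in range(pool_size)]
        history.append(sum(affinity(s, target) for s in pool) / pool_size)
    return pool, history


# Arbitrary 18-nt target used only to define the toy score (assumption).
TARGET = "GCUGUGCCUUGGGUGGCU"
final_pool, mean_scores = selex(TARGET)
best = max(final_pool, key=lambda s: affinity(s, TARGET))
print(best, mean_scores[0], mean_scores[-1])
```

Repeated rounds enrich the pool for high-scoring sequences because each round discards most of the pool before amplification; the stringency (`keep_fraction`) and the error rate (`rate`) are the two knobs that trade enrichment speed against diversity in this sketch.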
Ultimately, it is obvious that the multi-functional RT plays a vital role in propagation of the progeny virus. Thus, further studies on the HBV RT will undoubtedly provide better opportunities for a comprehensive understanding of the virus, as well as aiding the development of novel high-performance therapeutic strategies that may lead to life-long suppression or even elimination of HBV replication.