Classification of viruses | Viral genome structure and gene expression

Viruses are submicroscopic parasites, visible only with an electron microscope, that replicate only inside the living cells of a host. Classifying viruses is an important tool for studying them in depth.

Although viruses resemble living organisms in some respects, they are not considered "living beings", because they cannot reproduce outside a viable host cell. Viruses can reproduce (replicate) only by commandeering the host cell's replication machinery and making it produce the virus's structural components instead. Thus, a virus cannot function or reproduce outside a host cell.

Classification of viruses

Under the Baltimore classification, viruses can be grouped as follows:

  1. Group I – ds DNA viruses
  2. Group II – ss DNA viruses
  3. Group III – ds RNA viruses
  4. Group IV – ss (positive sense) RNA viruses
  5. Group V – ss (negative sense) RNA viruses
  6. Group VI – ss (positive sense) RNA viruses that reverse transcribe into DNA
  7. Group VII – ds DNA viruses with a reverse transcribing stage
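The grouping above is essentially a lookup on a few properties of the genome. As a minimal sketch, the Python below encodes it as a dictionary; the property names and the `baltimore_group` function are illustrative inventions, not part of any standard tool or database.

```python
from typing import Optional

# Keys: (nucleic acid, strandedness, sense, uses reverse transcription).
# Sense only defines the group for single-stranded RNA genomes; elsewhere it is None.
BALTIMORE_GROUPS = {
    ("DNA", "ds", None, False): "I",
    ("DNA", "ss", None, False): "II",
    ("RNA", "ds", None, False): "III",
    ("RNA", "ss", "+", False): "IV",
    ("RNA", "ss", "-", False): "V",
    ("RNA", "ss", "+", True): "VI",
    ("DNA", "ds", None, True): "VII",
}


def baltimore_group(nucleic_acid: str, strands: str,
                    sense: Optional[str] = None,
                    reverse_transcribes: bool = False) -> str:
    """Return the Baltimore group (I-VII) for a simple genome description."""
    if not (nucleic_acid == "RNA" and strands == "ss"):
        sense = None  # sense does not define the group for these genomes
    key = (nucleic_acid, strands, sense, reverse_transcribes)
    if key not in BALTIMORE_GROUPS:
        raise ValueError(f"no Baltimore group matches {key}")
    return BALTIMORE_GROUPS[key]


print(baltimore_group("DNA", "ds"))             # I   (adenovirus-like genome)
print(baltimore_group("RNA", "ss", "-"))        # V   (paramyxovirus-like genome)
print(baltimore_group("RNA", "ss", "+", True))  # VI  (retrovirus-like genome)
```

Keeping the scheme as data rather than nested `if` statements leaves one line per group, mirroring the list.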

This article concentrates on the following aspects of each viral group:

  • Characteristics and general structure of the genome
  • Structure and classification of several important viral species of the group
  • Viral gene expression

The viral life cycle

Most viruses are species specific: each infects only a narrow range of animals, plants, bacteria or fungi. A viral infection usually begins when a virus enters a host either:

  • Through a physical breach (Ex: a cut in the skin)
  • By direct inoculation (Ex: a mosquito bite)
  • By direct infection of an exposed surface (Ex: inhalation into the trachea)

A virus can gain access to a susceptible cell only after it has entered a viable host. The life cycle of a virus consists of the following steps:

  1. Viral entry
  2. Viral replication
  3. Viral shedding
  4. Viral latency

Viral entry

Before entering a potential host cell, a virus must first attach to that particular cell. Here a virus faces a major hurdle: the thermodynamics of diffusion. Because neutrally charged objects do not naturally clump together, viruses overcome this by binding specifically to the host cell surface, which brings the virus and the cell into close proximity; this step is known as attachment or adsorption. Attachment proteins on the viral envelope (or capsid) bind complementary receptor proteins on the cell membrane of the susceptible host cell.

This attachment holds the two membranes in close proximity, allowing further interactions between membrane proteins to occur. It is also the first requirement that must be satisfied before a cell can become infected; a cell that satisfies it is susceptible. Viruses that show this behavior include many enveloped viruses, such as HIV and herpes simplex virus, and the same principle extends to viruses that lack an envelope.

After the attachment proteins of the viral capsid or envelope bind to the complementary receptor proteins on the host cell membrane, the virus must find a way across the phospholipid bilayer of the cytoplasmic membrane into the host cytoplasm. Viruses overcome this challenge via:

1. Membrane fusion or the Hemi-fusion state

2. Endocytosis

3. Viral penetration (genetic injection)

Which of these methods a virus uses depends on the viral species. Viral entry and infection can be visualized in real time using green fluorescent protein (GFP). Once a virus has entered a cell, replication is not immediate and may take anywhere from seconds to hours.

Entry via membrane fusion

This is only possible for enveloped viruses. After the attachment proteins in the viral envelope bind the complementary receptor proteins on the host cell membrane, secondary receptors may be present that initiate puncturing of, or fusion with, the cell membrane. Following attachment, the viral envelope fuses with the host cell membrane, releasing the nucleocapsid (the virus without its envelope) into the cytoplasm.

Ex: HIV, Herpes simplex virus

Entry via Endocytosis

This method of entry is used by many viruses that lack an envelope, and also by some enveloped viruses (hepatitis C virus, listed below, is enveloped). The virus tricks the cell by posing as a harmless nutrient particle: the cell, which naturally takes in resources by binding them to surface receptors and drawing them inside, engulfs the virus. Once inside, the virus must break out of the vesicle in which it was taken up in order to reach the cytoplasm.

Ex: Poliovirus, Hepatitis C virus, Foot and Mouth disease virus

Entry via Genetic injection

This method is used by viruses that require only their genome, and no other viral structures such as enzymes, to cause an infection (Ex: most positive-sense, single-stranded RNA viruses). The virus attaches to the host cell by binding to receptors on the cell membrane and injects only its genome into the host cytoplasm.

Ex: Bacteriophages

Viral replication

Viral replication is the formation of new virus particles inside the host cell during infection. It can be summarized in two steps:

1. Replication of the viral genome

2. Packaging of the virus

After these two major steps, the resulting new viruses can go on to infect new hosts. This replication, in other words reproduction, ensures the survival of the viral species. Replication varies greatly between viruses and depends on the genes they carry. Some viruses, especially DNA viruses, replicate within the nucleus, whereas others replicate in the cytoplasm.

Class I: Double stranded DNA viruses

Most (but not all) viruses of this type must enter the nucleus of the host cell in order to replicate. Some ds DNA viruses depend on host cell polymerases to replicate their genomes, whereas others encode their own replication factors. Genome replication of many ds DNA viruses depends heavily on the cell cycle, as they require a cellular state that is permissive for DNA replication. The virus may force the cell to divide, which can lead to transformation of the host cell and, ultimately, cancer.

Genome replication of these viruses uses DNA-dependent DNA polymerases. This class can be further divided into two groups according to where in the host cell genome replication takes place.

1. Replication is exclusively nuclear

Ex: Families Adenoviridae, Polyomaviridae, Herpesviridae and Papillomaviridae. Replication of the genomes of these viruses depends to a relatively large extent on cellular factors.

2. Replication is exclusively cytoplasmic

Ex: Family Poxviridae

These viruses encode all the factors needed to transcribe and replicate their genomes and are therefore largely independent of the cellular machinery, apart from their need for host ribosomes.

Family Adenoviridae

Members of the family Adenoviridae are medium-sized (90-100 nm), non-enveloped viruses with icosahedral nucleocapsids and double-stranded DNA genomes. They have a broad range of vertebrate hosts. In humans, 57 different serotypes have been identified, causing illnesses that range from mild respiratory infections in young children to life-threatening multi-organ infections in people with compromised immune systems.


Adenoviruses are among the largest non-enveloped viruses. Because of their size, they are taken up into cells by endocytosis. The capsid also carries a fiber (spike) projecting from each penton base, which enables the virus to attach to various receptors on the host cell membrane.

An adenovirus has a linear, non-segmented, double-stranded DNA genome of about 30-38 kbp. In theory, this allows the virus to carry 30-40 genes. Although this genome is larger than those of some other Class I families, such as the polyomaviruses and papillomaviruses, adenoviruses are simple viruses that depend heavily on the host cell for survival and replication.
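The estimate of 30-40 genes is simple arithmetic if one assumes that an average gene occupies roughly 1 kbp. That figure is a rule-of-thumb assumption made here for illustration, not a value from the text; real adenovirus genes vary in length and many transcripts are spliced.

```python
# Assumed average genome length per gene (illustrative rule of thumb only).
BP_PER_GENE = 1_000


def estimated_gene_count(genome_bp: int, bp_per_gene: int = BP_PER_GENE) -> int:
    """Number of average-sized genes that fit in a genome of genome_bp base pairs."""
    return genome_bp // bp_per_gene


# Adenovirus genome size range quoted in the text: 30-38 kbp.
for size_bp in (30_000, 38_000):
    print(f"{size_bp:,} bp -> about {estimated_gene_count(size_bp)} genes")
```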

These viruses have a 55 kDa terminal protein attached to each 5’ end of the linear ds DNA; it is used as a primer in replication and ensures that the terminal genes are fully replicated.

Viral gene expression

Gene expression of adenoviruses is divided into two phases, separated by the onset of viral DNA replication.

  • Early phase
  • Late phase

In both phases, a primary transcript is alternatively spliced to generate monocistronic mRNAs that are compatible with the host cell’s ribosomes, allowing their products to be translated. The early genes mainly encode non-structural, regulatory proteins. These proteins:

  1. Alter the expression of host cell proteins that are necessary for DNA replication.
  2. Activate other virus genes (Ex: the virally encoded DNA polymerase).
  3. Prevent premature death of the infected host cell caused by host immune defenses (Ex: by blocking interferon activity, apoptosis, etc.).

Once the early genes have produced enough viral proteins, replication machinery and replication substrates, the virus genome can be replicated. A terminal protein covalently bonded to the 5’ end of the DNA acts as the primer for replication, and the viral DNA polymerase then uses a strand displacement mechanism to replicate the genome.

In humans, adenoviruses cause respiratory illnesses such as the common cold, croup and bronchitis, as well as conjunctivitis (pink eye).

Ex: Human mastadenovirus C

Family Poxviridae

Family Poxviridae contains viruses that can infect both vertebrates and invertebrates. The family was long left unassigned to any order; it is now placed in the order Chitovirales.


Poxviruses are among the largest of all viruses. They have a linear, non-segmented, double-stranded DNA genome of approximately 205 kbp. The two complementary strands of the DNA are covalently joined at each end by hairpin loops.

Poxvirus particles (virions) may be covered by an additional envelope (extracellular enveloped virion, EEV).

Viral gene expression

Genome replication and gene expression of poxviruses are almost independent of cellular mechanisms, except that host cell ribosomes are required for protein synthesis. The poxvirus genome encodes numerous enzymes, including those involved in genome replication and transcription.

Many of these enzymes are packaged in the virus particle (a virion contains about 100 proteins, many of them enzymes), so that after infection, replication and transcription can take place in the host cell cytoplasm, without entry into the nucleus and almost entirely under the control of the virus.

Synthesis of poxvirus mRNA (transcription) begins before the genome is uncoated. Transcription is initiated by virion-associated proteins and catalyzed by a virion-associated DNA-dependent RNA polymerase. This allows replication and expression of the viral genome to be entirely cytoplasmic, whereas in other double-stranded DNA viruses these processes occur within the host nucleus, because those viruses depend on the host’s DNA-dependent RNA polymerases for transcription. Poxvirus genes can be divided into three sets according to when they are transcribed: early genes, intermediate genes and late genes. The functions of their products (proteins) can be summarized as follows:

  • Early gene proteins: complete the early transcription process, initiate replication of the genome and allow transcription of the intermediate genes.
  • Intermediate gene proteins: allow transcription of the late genes.
  • Late gene proteins: structural proteins.
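The early, intermediate and late classes form an ordered cascade: each class is switched on by the products of the one before it. A toy Python model of that dependency chain is sketched below; the dictionary layout and the `transcription_order` function are hypothetical, and real poxvirus gene regulation involves many more factors.

```python
# Each gene class maps to the class whose products it needs (None = no prerequisite).
CASCADE = {
    "early": None,            # transcribed by enzymes already packaged in the virion
    "intermediate": "early",  # needs early gene products
    "late": "intermediate",   # needs intermediate gene products
}


def transcription_order(cascade: dict) -> list:
    """Return gene classes in the order their prerequisites allow them to be transcribed."""
    order, available = [], set()
    pending = dict(cascade)
    while pending:
        # Classes whose prerequisite products are already present in the cell.
        ready = [g for g, req in pending.items() if req is None or req in available]
        if not ready:
            raise ValueError("cascade has an unsatisfiable dependency")
        for gene_class in ready:
            order.append(gene_class)
            available.add(gene_class)
            del pending[gene_class]
    return order


print(transcription_order(CASCADE))  # ['early', 'intermediate', 'late']
```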

Family Polyomaviridae and Papillomaviridae


Polyomaviruses and papillomaviruses have circular, double-stranded DNA genomes of approximately 5 kbp and 8 kbp, respectively. The polyomavirus genome has evolved to pack maximum information (six genes) into minimum space (about 5 kbp), which is achieved by using both strands and overlapping genes.

Class II: Single stranded DNA viruses

This group includes many of the viruses found in seawater, freshwater, sediments, terrestrial and extreme environments, metazoan-associated habitats and marine microbial mats. Many of these "environmental viruses" belong to the family Microviridae. However, the vast majority of them have yet to be studied and assigned to genera and higher taxa. Families in this group are assigned on the basis of:

  • Nature of the genome (circular or linear)
  • Host range

Family Parvoviridae

The family Parvoviridae contains some of the smallest known viruses and some of the most environmentally resistant. Its members infect vertebrates and arthropods. They are non-enveloped and have icosahedral nucleocapsids. Interestingly, parvoviruses are among the few single-stranded DNA viruses known to infect humans. The family Parvoviridae has been divided into two subfamilies, Parvovirinae (vertebrate viruses) and Densovirinae (invertebrate viruses). The subfamily Parvovirinae contains the genus Dependovirus, whose members are replication defective: they can replicate only when the host is co-infected with a helper virus. Parvoviruses that do not require helper viruses are known as autonomous parvoviruses.


As mentioned earlier, dependoviruses require a helper virus for genome replication. Most often the helper is an adenovirus, but other DNA viruses, such as herpesviruses, can also act as helpers. Under some circumstances, dependoviruses may occasionally replicate in the absence of a helper virus.

Dependoviruses are valuable gene vectors. They are used to introduce new genes into cell cultures for the mass production of important proteins, and are being investigated as vectors for introducing genes into patients’ cells to treat various genetic diseases and cancers. Importantly, dependoviruses are not known to cause any disease.


Parvoviruses have linear, single-stranded DNA genomes of 4-6 kb.

At the ends of the DNA molecule there are a number of short complementary sequences that can base pair to form secondary structures. Some parvovirus genomes have sequences at their ends called inverted terminal repeats (ITRs), in which the sequence at one end is complementary to, and in the opposite orientation to, the sequence at the other end. Because the sequences are complementary, the two ends form identical secondary structures. Other parvoviruses have a unique sequence, and therefore a unique secondary structure, at each end of the DNA.
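"Complementary and in the opposite orientation" is what molecular biologists call the reverse complement. The sketch below checks whether a genome's two ends form ITRs of a given length; the function names and the toy sequences are hypothetical, and real parvovirus termini are far longer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, so the result reads 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def has_itr(genome: str, length: int) -> bool:
    """True if the last `length` bases are the reverse complement of the first `length`."""
    if length <= 0 or 2 * length > len(genome):
        raise ValueError("length must be positive and fit twice in the genome")
    return genome[-length:] == reverse_complement(genome[:length])


left_end = "ATGCCG"  # hypothetical terminal sequence
with_itr = left_end + "TTTTAAAA" + reverse_complement(left_end)

print(has_itr(with_itr, 6))                # True: the ends are inverted repeats
print(has_itr("ATGCCGTTTTAAAAGGGCCC", 6))  # False: each end is unique
```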

During replication, parvoviruses with ITRs generate and package equal numbers of (+) and (-) DNA strands, whereas most viruses with unique terminal sequences do not. The proportion of virions containing (+) DNA and (-) DNA therefore varies between viruses.

In a (-) DNA, the genes for non-structural proteins are towards the 3’ end and the structural protein genes are towards the 5’ end.

Viral gene expression

The small genome of a parvovirus can encode only a few proteins, so the virus depends on its host cell (or another virus) to provide important proteins. Some of these proteins, such as a DNA polymerase and other proteins required for DNA replication, are available only in the S phase, when DNA synthesis takes place. This restricts parvovirus replication to the S phase, unlike larger DNA viruses such as herpes simplex virus, which can replicate in any phase of the host cell cycle because they encode their own replication factors.

Replication of the viral genome occurs in the host cell nucleus, where the single-stranded viral DNA is converted to double-stranded DNA by a host cell DNA polymerase. The ends of the genome are double stranded as a result of base pairing, and the 3’-OH group at the 3’ end acts as the primer to which the enzyme binds.

The viral genes are transcribed by the cellular RNA polymerase II, and cellular transcription factors play a major role in the expression of the viral genome.

The primary transcripts undergo various splicing events to produce two size classes of mRNA. The larger mRNA encodes the non-structural proteins and the smaller mRNA encodes the structural proteins. The non-structural proteins are phosphorylated and play a role in controlling gene expression and DNA replication.

After the single-stranded DNA genome has been converted to double-stranded DNA, the DNA is replicated by a mechanism called “rolling hairpin replication”.

Procapsids are assembled from structural proteins, and each is filled with a copy of the virus genome, either a (+) DNA or a (-) DNA as appropriate. One of the non-structural proteins acts as a helicase, unwinding the double-stranded DNA so that a single strand can enter the procapsid.

[Figure: Genome organization of parvoviruses]

RNA viruses

An RNA virus is a virus that uses RNA as its genetic material. The genome can be double-stranded, single-stranded (+) sense or single-stranded (-) sense. Notable human diseases caused by RNA viruses include Ebola hemorrhagic fever, SARS, hepatitis C and influenza.

RNA viruses generally have higher mutation rates than DNA viruses, because viral RNA polymerases lack the proofreading ability of DNA polymerases. This is one reason why producing vaccines against RNA viruses is difficult. On the other hand, most mutations are not favorable for the virus: some regions of an RNA virus genome are essential to the replication cycle, so mutations in them cannot be tolerated. For instance, the region of the hepatitis C virus genome that encodes the core protein is highly conserved because it contains an RNA structure involved in an internal ribosome entry site (IRES).

The error rate of RNA-dependent RNA polymerases is around 1 in 10,000 nucleotides copied. Therefore, to limit the number of mutations per genome copy, most RNA viruses are thought to keep their genomes within approximately 10,000 nucleotides (about 10 kb), although some, such as the coronaviruses described below, are considerably larger.
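To see why genome length matters, the sketch below multiplies the error rate of about 1 in 10,000 by genome length and, assuming errors occur independently at each position (a simplifying assumption), computes the chance that a copy is error-free. At 10,000 nt this gives about one expected mutation per copy and roughly a 37% chance of an error-free copy; at 30,000 nt it gives about three mutations per copy.

```python
# Error rate of RNA-dependent RNA polymerase: ~1 error per 10,000 nt copied.
ERROR_RATE = 1e-4


def expected_mutations(genome_length: int, error_rate: float = ERROR_RATE) -> float:
    """Average number of errors introduced when copying one genome."""
    return genome_length * error_rate


def p_error_free(genome_length: int, error_rate: float = ERROR_RATE) -> float:
    """Chance that a copy has no errors, assuming independent errors per nucleotide."""
    return (1 - error_rate) ** genome_length


for length in (7_500, 10_000, 30_000):
    print(f"{length:>6,} nt: {expected_mutations(length):.2f} expected mutations, "
          f"P(error-free copy) = {p_error_free(length):.2f}")
```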

According to the Baltimore classification, RNA viruses that do not use reverse transcription fall into Classes III, IV and V.

Class III: Double stranded RNA viruses

Double-stranded RNA viruses are a diverse group that varies widely in:

  • Host range (Human, Animals, plants, Bacteria)
  • Genome segment number (one to twelve)
  • Virion organization

There are several families in this class; among them, the family Reoviridae is the most diverse.

Family Reoviridae

Icosahedral viruses with double-stranded RNA genomes that were isolated from the respiratory and enteric tracts of humans and many animals, but with which no disease could be associated ("orphan" viruses), became known as reoviruses (respiratory enteric orphan viruses). Many similar viruses have since been found in animals, fungi and plants, and many of them are associated with various diseases. Nevertheless, the original name has been preserved and incorporated into the names of several genera within the family.

Interestingly, most plant-infecting reoviruses spread between plants through insect vectors. These viruses actively replicate in both the plant and the insect, generally causing disease in the plant but little or no harm to the infected insect.

Our main focus will be the rotaviruses, which have been the subject of intensive study because they are among the most important agents of gastroenteritis in humans and animals.

Genome structure

Members of the family Reoviridae have double-stranded RNA genomes divided into 10, 11 or 12 segments. Each segment is transcribed into an independent mRNA by the virion transcriptase, and most of these mRNAs are monocistronic.

Viral gene expression

The ds RNA genome is never completely uncoated, which prevents the host cell from detecting the ds RNA and activating its antiviral state. The viral RNA-dependent RNA polymerase transcribes each ds RNA segment into an individual mRNA. Only the (-) strand of each segment is used as the template, giving (+) sense mRNAs that are capped inside the core. These mRNAs are then translocated from the core into the cytoplasm, where they are translated; this is known as extrusion of the mRNA. (The reovirus core contains all the enzymes needed for transcription and capping, and additional proteins are produced by leaky scanning and protein processing.) Replication and transcription of reoviruses, as of most RNA viruses, take place entirely in the cytoplasm.
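Because only the (-) strand of each segment is copied, each segment yields one (+) sense mRNA with the same sequence as that segment's (+) strand. The sketch below models this with short hypothetical RNA strings written 5' to 3'; it ignores capping and the core particle entirely, and the function names are illustrative.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna: str) -> str:
    """Return the complementary RNA strand, written 5' to 3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def transcribe_segments(segments: dict) -> dict:
    """Map each segment name to the (+) sense mRNA copied from its (-) strand only."""
    return {name: reverse_complement_rna(minus)
            for name, (_plus, minus) in segments.items()}


# Hypothetical toy segments; real reovirus segments are far longer.
plus_strands = {"seg1": "GGCUUUAAAA", "seg2": "GGCAAAUGCC", "seg3": "GGCUAUCGAA"}
genome = {name: (plus, reverse_complement_rna(plus))
          for name, plus in plus_strands.items()}

mrnas = transcribe_segments(genome)
print(len(genome), "segments ->", len(mrnas), "mRNAs")
print(mrnas == plus_strands)  # each mRNA matches its segment's (+) strand
```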

Single stranded RNA viruses

Single stranded RNA viruses can further be classified according to their sense or polarity of the RNA strand as positive sense, negative sense or ambisense.

A positive-sense viral RNA is similar to mRNA and can be translated directly by the host cell’s ribosomes. A negative-sense viral RNA is complementary to mRNA and must therefore be converted into a positive strand by an RNA-dependent RNA polymerase before translation. For this reason, purified RNA of a positive-sense RNA virus can be infectious on its own, although less so than the whole virus particle. In contrast, purified RNA of a negative-sense RNA virus is not infectious, because it must first be transcribed into positive-sense RNA before it can be translated. An ambisense RNA virus resembles the negative-sense RNA viruses, except that it also translates genes from the positive strand.
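The difference between the two senses can be made concrete with a toy example. Below, a hypothetical 12-nucleotide positive-sense "genome" is read directly by a minimal translation function, while its negative-sense counterpart (the reverse complement) yields nothing until it is copied back into the positive sense, which is the job of the viral RNA-dependent RNA polymerase. The codon table holds only the few standard-code codons this example needs.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
# Only the codons needed for this toy example (standard genetic code).
CODONS = {"AUG": "M", "GCU": "A", "UGG": "W", "UAA": "*", "UAG": "*", "UGA": "*"}


def reverse_complement_rna(rna: str) -> str:
    """Return the complementary RNA strand, written 5' to 3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def translate(rna: str) -> str:
    """Read codons from the first AUG to the first stop codon; '' if there is no AUG."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODONS[rna[i:i + 3]]  # KeyError for codons outside this toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


positive_genome = "AUGGCUUGGUAA"                           # hypothetical (+) sense genome
negative_genome = reverse_complement_rna(positive_genome)  # its (-) sense counterpart

print(translate(positive_genome))                          # MAW: readable directly
print(repr(translate(negative_genome)))                    # '': no start codon on this strand
print(translate(reverse_complement_rna(negative_genome)))  # MAW once copied to (+) sense
```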

Single-stranded RNA genomes vary in size from about 8 kb in picornaviruses to about 30 kb in coronaviruses. The upper limit on the size of single-stranded RNA genomes is thought to reflect the fragility of RNA, that is, the tendency of long strands to break.

Class IV: Single stranded (+) sense RNA viruses

(+) sense single-stranded RNA virus genomes typically have an untranslated region (UTR) at the 5’ end, which does not encode any protein, and a shorter UTR at the 3’ end. These regions are functionally important in virus replication and are therefore conserved despite the pressure to reduce genome size. Both ends of (+) stranded eukaryotic virus genomes are often modified: the 5’ end by a small, covalently bonded protein or a methylated nucleotide cap, and the 3’ end by polyadenylation. These signals allow the viral RNA to be recognized by the host cell and to function as mRNA.

These viruses can be subdivided into two groups according to their gene expression strategy:

  • Viruses with sub-genomic mRNA and poly protein strategy.
  • Viruses with Polycistronic mRNA strategy.

Viruses with sub genomic RNA and Poly-protein strategy

Because the viral RNA is (+) sense, it is infectious and acts as both the viral genome and mRNA. The first two-thirds of the genome (from the 5’ end) are translated into a polyprotein, which is cleaved into the non-structural proteins necessary for RNA synthesis (replication and transcription).

Replication takes place in cytoplasmic viral factories on the surface of endosomes. A complementary (-) strand is synthesized on the single-stranded (+) sense genome, giving a double-stranded RNA intermediate. This intermediate is used to produce viral mRNAs and new single-stranded (+) sense RNA genomes. Expression of the shorter sub-genomic RNA (sgRNA) gives rise to the structural proteins.

[Figure: Alphavirus genomic RNA]
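The division of labor described above can be pictured as slicing the genome: the 5' two-thirds are translated into the non-structural polyprotein, while the remaining 3' third is also made as the sub-genomic RNA. The sketch below uses the text's approximate 2/3 proportion and a placeholder sequence for a hypothetical 11.7 kb genome; in the real virus the boundary is set by an internal promoter, not a fixed fraction.

```python
def alphavirus_regions(genome: str, ns_fraction: float = 2 / 3) -> tuple:
    """Split a (+) sense genome into its 5' non-structural region and the
    3' region that is also transcribed as the sub-genomic RNA (sgRNA)."""
    boundary = round(len(genome) * ns_fraction)
    nonstructural = genome[:boundary]  # translated from genomic RNA -> non-structural polyprotein
    subgenomic = genome[boundary:]     # shares the genome's 3' end -> structural proteins
    return nonstructural, subgenomic


toy_genome = "N" * 11_700  # placeholder sequence of a hypothetical 11.7 kb genome
ns, sg = alphavirus_regions(toy_genome)
print(len(ns), len(sg))
print(toy_genome.endswith(sg))  # the sgRNA is 3' co-terminal with the genome
```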

Viruses with Polycistronic mRNA

As with all the other viruses in this class, the genomic RNA acts as the mRNA itself. It is translated directly into a polyprotein, which is subsequently cleaved to produce the mature proteins.

Order Picornavirales

Picornavirales is an order of viruses with vertebrate, insect and plant hosts. There are five families in this order, all of which share the features listed below:

  • Conserved RNA-dependent RNA polymerase.
  • A protein attached to the 5’ end of the genome.
  • No overlapping reading frames.
  • All RNAs are translated into a polyprotein before processing.

Family Picornaviridae


These are non-enveloped viruses with icosahedral capsids.


Picornaviruses have a linear, non-segmented, positive-sense, single-stranded RNA genome of about 7.2-8.5 kb.

The 5’ end of the genome has a longer untranslated region (UTR) of about 600-1200 nucleotides (nt) in length which is important in;

  • Translation
  • Virulence
  • Encapsidation (possibly)

There is also a shorter UTR, about 50-100 nt long, at the 3’ end, which is important for negative-strand synthesis during replication.

The rest of the genome encodes a single polyprotein of 2100-2400 amino acids. Both ends of the genome are modified: the 5’ end by a covalently bonded small basic protein, VPg (23 amino acids), and the 3’ end by polyadenylation.

Genome structure and genome organization are classification criteria for these viruses.

Viral gene expression

Picornaviruses use the polycistronic mRNA strategy of gene expression. As mentioned above, the whole genome is translated into a single polyprotein, which then undergoes autocatalytic cleavage (self-cleavage) by virus-encoded proteases to produce the structural and non-structural proteins. The process is illustrated below.

[Figure: Picornavirus polyprotein processing]
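The primary cleavages of a picornavirus polyprotein separate the P1 region (structural, capsid proteins) from the P2 and P3 regions (non-structural proteins). The sketch below models only that slicing step with hypothetical toy sequences; in the real virus the cuts are made by virus-encoded proteases at specific sites, and each region is cleaved further into mature proteins. The `cleave` helper is illustrative.

```python
def cleave(polyprotein: str, sites: list) -> list:
    """Cut a polyprotein at 0-based positions; each cut falls just before that index."""
    products, start = [], 0
    for site in sorted(sites):
        if not start < site < len(polyprotein):
            raise ValueError(f"invalid cleavage site {site}")
        products.append(polyprotein[start:site])
        start = site
    products.append(polyprotein[start:])
    return products


# Hypothetical toy regions; real P1, P2 and P3 regions are hundreds of residues long.
regions = {
    "P1 (structural)": "MKLVAGT",
    "P2 (non-structural)": "DPLTNFE",
    "P3 (non-structural)": "QGHRWYC",
}
polyprotein = "".join(regions.values())

# Cleavage sites fall at the boundaries between consecutive regions.
sites, position = [], 0
for sequence in list(regions.values())[:-1]:
    position += len(sequence)
    sites.append(position)

for name, product in zip(regions, cleave(polyprotein, sites)):
    print(f"{name}: {product}")
```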

Order Nidovirales

Family Coronaviridae

Members of the Coronaviridae infect a wide range of mammals and birds worldwide. Although these infections are usually mild, some members cause much more severe infections in humans, such as severe acute respiratory syndrome (SARS) and COVID-19. They can also cause enteric infections in very young infants and, on rare occasions, neurological syndromes.


These are enveloped viruses which are spherical in shape.


These viruses have linear, non-segmented, positive-sense RNA genomes of approximately 27-30 kb, which are among the largest genomes of all known RNA viruses. The genome carries a methylated nucleotide cap at the 5’ terminus and is polyadenylated at the 3’ terminus.

Viral gene expression

These viruses use the sub-genomic mRNA with polyprotein strategy in their gene expression.

Starting from the 5’ end, a segment of about 20 kb of the genome is translated first, producing polyproteins that include the RNA-dependent RNA polymerase.

The polymerase then synthesizes a full-length (-) sense RNA strand, which is used as the template for synthesizing the viral sub-genomic mRNAs as a ‘nested set’ of transcripts, all with:

  • Identical 5’ non-translated leader sequences
  • 3’ polyadenylation

All sub-genomic mRNAs are functionally monocistronic: usually only the gene nearest the 5’ end of each is translated. The sub-genomic mRNAs encode proteins that are important in the viral life cycle, including:

  • E1 – trans-membrane glycoprotein
  • E2 – peplomer glycoprotein
  • N – nucleo-protein
  • HE (E3) – hemagglutinin-esterase glycoprotein

By detaching from and re-annealing to the template, the RNA polymerase complex makes copies of different genes. Producing a nested set of transcripts in this way is a characteristic feature of the order Nidovirales.
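A nested set is easiest to see by listing the transcripts. In the toy model below, each sub-genomic mRNA carries the common 5' leader, one gene plus every gene downstream of it, and a poly(A) tail, so all transcripts share the same 3' end. The gene order (HE, E2, E1, N, reading 5' to 3') is a simplification that omits the smaller genes, and `nested_set` is a hypothetical helper.

```python
def nested_set(leader: str, genes: list) -> list:
    """Return one sub-genomic mRNA per gene: the shared leader, that gene and every
    gene downstream of it (so all share the genome's 3' end), then a poly(A) tail."""
    return [[leader, *genes[i:], "poly(A)"] for i in range(len(genes))]


# Simplified 3' region of a coronavirus genome, 5' to 3' (smaller genes omitted).
mrnas = nested_set("leader", ["HE", "E2", "E1", "N"])
for mrna in mrnas:
    # Typically only the 5'-most gene of each mRNA is translated.
    print(" - ".join(mrna), "| translated:", mrna[1])
```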

Class V: Single stranded (-) sense RNA viruses

Viruses with negative-sense RNA genomes are generally more complex than viruses with positive-sense RNA genomes. Negative sense means that the single-stranded RNA genome is complementary in sequence to the cell’s mRNAs. To be translated into proteins, it must therefore first be copied (transcribed) into a positive-sense RNA strand. Because neither eukaryotic nor prokaryotic cells have any mechanism or enzymes for transcribing RNA from an RNA template, each negative-strand RNA virus must carry its own RNA-dependent RNA polymerase in the virion. Otherwise, the RNA genome of the virus would be biologically meaningless once inside a host cell.

Possibly because of the difficulties of genome replication, these viruses tend to have larger genomes encoding more genetic information. Purified genomes of these viruses are not infectious and remain effectively inert, because they cannot be expressed without their replicase enzyme, the RNA-dependent RNA polymerase. Some of these viruses are ambisense: part of the genome is negative sense and another part is positive sense.

Order Mononegavirales

Family Paramyxoviridae

Paramyxoviridae is an important family of viruses that includes several important pathogenic species. The family was traditionally divided into two subfamilies, Paramyxovirinae and Pneumovirinae; the latter has since been raised to a separate family, Pneumoviridae.


These are enveloped viruses and virions can be either spherical, filamentous or pleomorphic.


These viruses have a linear, non-segmented, single-stranded, negative-sense RNA genome of about 15-16 kb.

Typically, the genome contains 6-10 genes and extracistronic (non-coding) regions, including:

  1. A 3’ leader sequence about 50 nt long, which acts as a transcriptional promoter.
  2. A 5’ trailer sequence about 50-161 nt long.
  3. Intergenic regions between the genes.

Each gene carries transcription start and stop signals at its beginning and end, which are transcribed as part of the gene. Each gene junction therefore consists of the polyadenylation signal at the end of one gene, which acts as the stop signal for transcription, a short intergenic sequence, and the transcription start signal at the beginning of the next gene.

Viral gene expression

The viral RNA-dependent RNA polymerase (RdRp) initiates transcription by binding to the 3’ leader sequence of the genomic (-) RNA. The RdRp complex first transcribes a short leader (+) sense RNA with a 5’ triphosphate, then stops and restarts transcription at the next transcription initiation signal. All RNAs initiated at these signals are capped. As mentioned earlier, at the end of each viral gene there is a transcription stop signal, at which the RdRp complex adds a poly(A) tail by stuttering on a U stretch before releasing the mRNA.

Rather than leaving the genome template, the RdRp complex can then scan to the next transcription initiation signal and resume transcription with the next gene.
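The stop-start mechanism can be pictured as a loop over the genes in the order the polymerase meets them. The toy model below emits the leader RNA first, then one capped, polyadenylated mRNA per gene; the string format, the poly(A) length and the `transcribe_genome` function are illustrative assumptions. The example uses the simplified gene order N, P, M, F, HN, L found in many paramyxoviruses.

```python
def transcribe_genome(genes: list, poly_a_length: int = 5) -> list:
    """Toy model of sequential stop-start transcription by the RdRp complex.

    `genes` lists gene names in the order the polymerase meets them after
    binding the 3' leader of the (-) genome.
    """
    products = ["leader RNA (5' triphosphate)"]
    for gene in genes:
        # Gene-start signal: begin a new mRNA, which is capped.
        # Gene-end signal: stutter on the U stretch to add poly(A), then release.
        products.append(f"cap-{gene}-{'A' * poly_a_length}")
        # The complex then scans across the intergenic sequence to the next gene-start.
    return products


# Simplified paramyxovirus-style gene order (3' to 5' on the genome).
for rna in transcribe_genome(["N", "P", "M", "F", "HN", "L"]):
    print(rna)
```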

Family Rhabdoviridae


Rhabdoviruses have linear, non-segmented, single-stranded, negative-sense RNA genomes of approximately 11 kb.

There is a leader sequence of about 50 nt at the 3’ end and an untranslated region (UTR) of about 60 nt at the 5’ terminus of the viral RNA. The genetic arrangement is similar to that of the Paramyxoviridae: these viruses also have a conserved polyadenylation signal at the end of each gene and short intergenic regions between the genes.

Viral gene expression

Gene expression is similar to that of the Paramyxoviridae.

Class VI: Reverse transcribing RNA viruses

Family Retroviridae

These are spherical, enveloped viruses that are approximately 90nm in diameter.


These viruses have linear, diploid (two copies per virion), single-stranded, positive-sense RNA genomes with a 5’ cap and a 3’ polyadenylated tail. Repeated sequences at the 5’ and 3’ ends give rise to the long terminal repeats (LTRs) of the proviral DNA. There is also a primer binding site (PBS) near the 5’ end and a polypurine tract (PPT) near the 3’ end.

Viral gene expression

Transcription and translation of the integrated viral genome depend entirely on the host cell. First, however, the reverse transcriptase uses the positive-sense genomic RNA as a template for reverse transcription, converting the viral RNA into proviral DNA. The proviral DNA is transported into the host nucleus and integrated into the host cell genome by the enzyme integrase.

Once integration is complete, the proviral DNA is under the control of the host cell and is transcribed exactly like the host’s other cellular genes. The enzymes used by the virus, namely:

  • Reverse transcriptase and
  • Integrase

are not normally available in the host cell, so for a successful infection the virus must “bring” its own enzymes within the virion.

After integration, transcription of the proviral DNA can begin. It is carried out by RNA polymerase II, a host cell (non-viral) enzyme.

Transcription of the proviral DNA produces two types of mRNA.


Reverse transcription

The reverse transcriptase enzyme, which the virus uses to reverse transcribe its genomic positive-sense RNA into proviral DNA, has three sequential biochemical activities:

  • RNA-dependent DNA polymerase activity
  • Ribonuclease activity (ribonuclease H)
  • DNA-dependent DNA polymerase activity

The virus uses these activities to convert its RNA genome into a double-stranded complementary DNA (cDNA), which can then be integrated into the host genome, establishing long-term infections that can be very difficult to eradicate.
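
As a rough illustration of how these three activities fit together, the Python sketch below treats each one as a simple string operation on a toy sequence. The function names and the sequence are hypothetical; the sketch ignores priming, strand transfers and copying errors, and only shows how an RNA template ends up as two complementary DNA strands.

```python
"""Toy model of the three reverse transcriptase activities (hypothetical names)."""

# Base-pairing tables: RNA templates contain U, DNA templates contain T.
PAIR_WITH_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
PAIR_WITH_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def rna_dependent_dna_polymerase(rna: str) -> str:
    """Activity 1: copy an RNA template into complementary DNA.

    The new strand is antiparallel, so reading the template backwards and
    pairing each base gives the new DNA strand written 5'->3'.
    """
    return "".join(PAIR_WITH_RNA[base] for base in reversed(rna))


def ribonuclease_h(hybrid: tuple[str, str]) -> str:
    """Activity 2: degrade the RNA strand of an (RNA, DNA) hybrid; keep the DNA."""
    _rna, dna = hybrid
    return dna


def dna_dependent_dna_polymerase(dna: str) -> str:
    """Activity 3: copy a DNA template into its complementary DNA strand."""
    return "".join(PAIR_WITH_DNA[base] for base in reversed(dna))


def reverse_transcribe(rna: str) -> tuple[str, str]:
    """Apply the three activities in order; return (plus, minus) DNA strands."""
    minus = rna_dependent_dna_polymerase(rna)   # RNA:DNA hybrid forms
    minus = ribonuclease_h((rna, minus))        # RNA template is removed
    plus = dna_dependent_dna_polymerase(minus)  # second DNA strand is made
    return plus, minus


if __name__ == "__main__":
    plus, minus = reverse_transcribe("AUGGCUUAG")  # toy RNA, not a real gene
    print(plus, minus)  # ATGGCTTAG CTAAGCCAT
```

For example, `reverse_transcribe("AUGGCUUAG")` returns the plus strand `ATGGCTTAG`, which is the original sequence with U replaced by T, together with its complementary minus strand `CTAAGCCAT`.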

Reverse transcription is extremely error-prone, because reverse transcriptase has no proofreading activity, and many mutations arise during this step. Such mutations can lead to drug resistance.


The process of reverse transcription can be summarized as follows:

  1. A specific cellular tRNA acts as a primer and hybridizes to the PBS region of the viral RNA.
  2. Reverse transcriptase extends the primer, synthesizing DNA complementary to the U5 and R regions at the 5’ end of the viral RNA.
  3. The RNase H domain of reverse transcriptase then degrades the RNA in this RNA:DNA hybrid, removing the U5 and R regions from the 5’ end of the viral RNA.
  4. The newly synthesized DNA, still attached to the tRNA primer, then “jumps” to the 3’ end of the viral genome, where it hybridizes to the complementary R region of the RNA.
  5. The first strand of complementary DNA is then extended, and most of the viral RNA is degraded by RNase H.
  6. Once the first strand is complete, synthesis of the second strand is primed by a short remnant of the viral RNA (the PPT) that is resistant to RNase H.
  7. A second “jump” follows, in which the PBS sequence of the second strand hybridizes with the complementary PBS sequence of the first strand.
  8. Both strands are extended further to give a double-stranded proviral DNA, which can be integrated into the host genome by the enzyme integrase.
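
The eight steps above can be traced with a toy, region-level sketch in Python. Each genome region is a short made-up sequence (R, U5, PBS, a placeholder BODY for the coding region, PPT, and U3 for the unique region just inside the 3’ R); none of these are real viral sequences, and the function and variable names are hypothetical. The sketch only follows which regions get copied at each step; it does not model enzyme kinetics, tRNA structure or copying errors.

```python
"""Toy, region-level sketch of retroviral reverse transcription (steps 1-8).

The regions are short made-up sequences, not real viral sequences. RNA is
written with T instead of U so that one complement table covers RNA and DNA.
"""

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def rc(seq: str) -> str:
    """Reverse complement: the antiparallel partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def require(condition: bool, message: str) -> None:
    """Stop if two strands that should base-pair do not."""
    if not condition:
        raise ValueError(message)


# Hypothetical toy regions of the genomic RNA, each written 5'->3'.
R, U5, PBS = "GCGCAA", "TTAGCC", "TGGCGCCC"
BODY, PPT, U3 = "ATGCCCTAA", "AAAAGAAGG", "ACGTTG"
GENOME = R + U5 + PBS + BODY + PPT + U3 + R


def reverse_transcribe(genome: str) -> tuple[str, str]:
    """Return the (plus, minus) strands of the finished double-stranded DNA."""
    # Steps 1-2: a tRNA paired with the PBS primes DNA synthesis, which copies
    # U5 and R up to the 5' end of the RNA.
    trna_tail = rc(PBS)                        # 3' end of the tRNA primer
    pbs_start = len(R) + len(U5)
    minus = rc(genome[:pbs_start])             # DNA copy of R-U5

    # Step 3: RNase H degrades the 5' R-U5 RNA that has just been copied.
    rna = genome[pbs_start:]                   # PBS ... PPT U3 R remains

    # Step 4: first jump; the DNA copy of R pairs with the R at the 3' end.
    require(minus.endswith(rc(R)) and rna.endswith(R), "first jump failed")

    # Step 5: the first (minus) strand is extended across the rest of the RNA.
    # (In the cell RNase H then degrades the RNA except the PPT; here only the
    # PPT's position is used below.)
    minus = minus + rc(rna[: -len(R)])

    # Step 6: the PPT primes the second (plus) strand, which copies U3-R-U5
    # and then the tRNA tail still attached to the minus strand's 5' end.
    template = trna_tail + minus
    plus = rc(template[: template.index(rc(PPT))])   # U3 R U5 PBS

    # Step 7: the tRNA is removed and the two PBS copies pair (second jump).
    require(plus.endswith(PBS) and minus.endswith(rc(PBS)), "second jump failed")

    # Step 8: each strand is extended using the other as its template.
    minus_extended = minus + rc(plus[: -len(PBS)])
    plus_extended = plus + rc(minus[: -len(PBS)])
    return plus_extended, minus_extended


if __name__ == "__main__":
    plus, minus = reverse_transcribe(GENOME)
    ltr = U3 + R + U5
    print(plus.startswith(ltr) and plus.endswith(ltr))  # True
```

Because every region is a distinct toy sequence, the returned plus strand can be read region by region: U3-R-U5, PBS, BODY, PPT, U3-R-U5. The starting RNA only had R-U5 at its 5’ end and U3-R at its 3’ end, so the two “jumps” are what leave a complete long terminal repeat at each end of the DNA.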

Members of the Retroviridae have several special features that set them apart from other virus families. A few are listed below:

  • These are the only positive-sense RNA viruses whose genome does not serve as an mRNA after entering the host cell.
  • These are the only diploid viruses
  • Because the provirus is integrated into the host genome, these viruses can cause lifelong, incurable infections. If the virus integrates into germ-line cells, the provirus can be passed on to the next generation.
  • These are the only viruses whose genome requires a specific cellular RNA (tRNA) for replication
  • They are the only viruses whose genome is produced by cellular transcriptional machinery (without participation of any virally encoded polymerases)

Class VII: Reverse transcribing DNA viruses

Family Hepadnaviridae

These are spherical, enveloped viruses with small genomes.


These viruses contain partially double-stranded (gapped) DNA genomes consisting of a negative strand of 3.0-3.3 kb and a positive strand of 1.7-2.8 kb; these sizes vary between different hepadnaviruses. These viruses also carry an RNA-dependent DNA polymerase (reverse transcriptase) within their capsids.

Viral gene expression

  • After infection, and before gene expression begins, the gapped genome is repaired by the host cell DNA polymerase enzyme.
  • After the genome has been repaired, transcription takes place, producing four major transcripts: S, C, P and X.
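
The repair step in the first bullet can be pictured as a polymerase extending the incomplete positive strand around the circular genome, using the complete negative strand as its template. The Python sketch below is a simplified model under stated assumptions: the sequences are toy examples, the negative strand is written 3’→5’ so that position i pairs with position i of the positive strand, and the function name `fill_gap` is hypothetical. It covers gap filling only, not any other processing of the genome ends.

```python
"""Toy sketch: filling the gap in a partially double-stranded circular DNA genome."""

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fill_gap(negative_3to5: str, positive_fragment: str, start: int) -> str:
    """Extend an incomplete positive strand around a circular template.

    negative_3to5: complete negative strand, written 3'->5' so that index i
        pairs with index i of the positive strand
    positive_fragment: the existing (incomplete) positive strand, 5'->3'
    start: template index paired with the fragment's 5' end
    Returns the full-length positive strand, starting at its original 5' end.
    """
    n = len(negative_3to5)
    if len(positive_fragment) > n:
        raise ValueError("fragment is longer than the circular template")

    # Check that the existing fragment really pairs with the template.
    for offset, base in enumerate(positive_fragment):
        pos = (start + offset) % n
        if COMPLEMENT[negative_3to5[pos]] != base:
            raise ValueError(f"mismatch at template position {pos}")

    # Extend the fragment's 3' end base by base, wrapping around the circle,
    # until the positive strand is as long as the negative strand.
    bases = list(positive_fragment)
    pos = (start + len(positive_fragment)) % n
    while len(bases) < n:
        bases.append(COMPLEMENT[negative_3to5[pos]])
        pos = (pos + 1) % n
    return "".join(bases)


if __name__ == "__main__":
    negative = "TACGGATCCA"  # toy 10-nt negative strand, written 3'->5'
    print(fill_gap(negative, "CCTA", 3))  # CCTAGGTATG
```

With the toy 10-nt template and a 4-nt fragment whose 5’ end pairs with position 3, the polymerase fills in the remaining six bases and returns `CCTAGGTATG`; a fragment that does not pair with the template raises `ValueError`.
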

Family Caulimoviridae

The genome structure and replication of Cauliflower Mosaic Virus (CaMV), the prototype member of the genus Caulimovirus, are similar to those of hepadnaviruses, although there are some differences.

The CaMV genome consists of a gapped, circular, double-stranded DNA molecule of about 8 kbp; one strand contains a single gap and the complementary strand contains two gaps.




Article By,

Pasindu Chamikara – Microbiologist
