Chapter 8: Transcription




Genes are considered expressed when information contained within deoxyribonucleic acid (DNA) affects the cell’s properties and activities. Ribonucleic acid (RNA) mediates gene expression. Generally, every gene contains two classes of information: one specifies the primary structure of the final product, and the other is critical for regulated expression of the gene’s products. Both the timing and the amount of RNA produced are regulated during transcription. Messenger RNA (mRNA) encodes the amino acid sequence of proteins, and both ribosomal and transfer RNAs participate directly in protein synthesis. Recently discovered microRNAs form yet another level of regulation, affecting the stability and translation of mRNA.

Gene expression begins with transcription, the DNA-directed synthesis of RNA. For an mRNA to be produced, the gene sequence on DNA must be identified, along with the information that specifies the exact start site. Genes are split into exons and introns, and the entire region is initially transcribed. This primary RNA transcript is processed before it exits the nucleus: it is modified through RNA splicing, 5′ end capping, and the addition of a poly(A) tail, after which the mature mRNA enters the cytoplasm.


Four distinct types of RNA are known: ribosomal (rRNA), transfer (tRNA), messenger (mRNA), and micro (miRNA). Each has its own distinctive structure and function.

Ribosomal RNA

rRNA accounts for approximately 80% of total RNA in the cell and associates with proteins to form ribosomes. Eukaryotes have several different rRNA molecules: 5S, 5.8S, 18S, and 28S. Ribosomes are important during protein synthesis because they contain peptidyl transferase activity, a reaction catalyzed by rRNA acting as a ribozyme (Figure 8.1).

FIGURE 8.1. Different types of eukaryotic RNA.

Transfer RNA

tRNA is the smallest of the three RNAs that take part directly in protein synthesis (rRNA, tRNA, and mRNA). It functions in protein synthesis by carrying the appropriate amino acid and by providing, through its anticodon, the mechanism by which nucleotide information is translated into amino acid information.

Messenger RNA

mRNA carries genetic information from DNA to the cytosol for translation. About 5% of the total RNA within a cell is mRNA. It is the most heterogeneous RNA in terms of size and carries the specific information necessary for the synthesis of different proteins.


miRNAs, like the other RNA molecules, are encoded by genes. They are single-stranded RNA molecules about 21 to 23 nucleotides in length. These newly discovered molecules are transcribed but not translated. They regulate gene expression by binding to mRNA and down-regulating its expression.


The minimal linear sequence of genomic nucleic acids that encodes a protein or a structural RNA is termed a gene (Figure 8.2). Gene sequences are written from 5′ (5 prime) to 3′. Eukaryotic genes are composed of coding exons, noncoding introns, and noncoding consensus sequences. The number, size, location, and sequence of introns (and exons) differ from gene to gene. Noncoding regions on the 5′ side of the first exon are referred to as upstream sequences, and those at the 3′ end are called downstream sequences.

FIGURE 8.2. Structure of a typical eukaryotic gene.

Consensus sequences

Consensus sequences serve as recognition markers and are conserved. They define a potential DNA recognition site and are usually bound by proteins (transcription factors) and other regulatory proteins that recognize a particular sequence.

1. Promoters: Promoters are DNA sequences that select or determine the start site of RNA synthesis. The promoter consensus sequence, called the TATA box, typically reads “TATA” (or variations of T and A) and is often located 15 to 30 base pairs (bp) upstream from the transcription start site (Figure 8.3). Additional sequences that may be required for promoter function include the CAAT box and the GC box. In eukaryotes, proteins known as transcription or basal factors bind to the TATA box and facilitate the binding of RNA polymerase II.

FIGURE 8.3. Promoter elements found upstream to the coding sequences in a gene.

2. Splice acceptor and donor sequences: Splice donor and acceptor sequences are consensus sequences found at the 5′ and 3′ ends of introns, respectively. Introns nearly always begin with guanine and uracil (GU) nucleotides and end with adenine and guanine (AG) nucleotides, which are preceded by a pyrimidine-rich tract (Figure 8.4). These consensus sequences are essential for splicing introns out of the primary transcript.

FIGURE 8.4. Splice acceptor and donor sequences.
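The GU...AG rule for introns can be expressed as a simple sequence check. The following Python sketch is illustrative only: the function name, the tract length of 10 nucleotides, and the 70% pyrimidine threshold are assumptions chosen for the example, not values given in this chapter.

```python
PYRIMIDINES = {"C", "U"}


def looks_like_intron(intron_rna, tract_length=10, min_pyrimidine_fraction=0.7):
    """Return True if an RNA intron fits the GU...AG consensus.

    tract_length and min_pyrimidine_fraction are illustrative assumptions;
    real splice-site recognition is far more tolerant and context dependent.
    """
    intron = intron_rna.upper()
    # Need room for the donor GU, the pyrimidine tract, and the acceptor AG.
    if len(intron) < tract_length + 4:
        return False
    # Donor site: intron begins with GU. Acceptor site: intron ends with AG.
    if not (intron.startswith("GU") and intron.endswith("AG")):
        return False
    # The pyrimidine-rich tract sits immediately before the terminal AG.
    tract = intron[-(tract_length + 2):-2]
    pyrimidine_count = sum(base in PYRIMIDINES for base in tract)
    return pyrimidine_count / tract_length >= min_pyrimidine_fraction


good_intron = "GUAAGU" + "ACGA" + "UCUUUCCUUC" + "AG"
bad_donor = "GAAAGU" + "ACGA" + "UCUUUCCUUC" + "AG"
print(looks_like_intron(good_intron))  # True
print(looks_like_intron(bad_donor))    # False
```

A check like this only flags candidate introns; it does not model the proteins and small RNAs that actually recognize these sites.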


Synthesis of RNA from DNA occurs in the nucleus and is catalyzed by an RNA polymerase. RNA differs significantly from DNA in that it is single stranded and contains uracil (U) instead of the thymine (T) found in DNA. Protein-encoding genes produce mRNA as an intermediate that carries their information to the cytosol for protein synthesis. Regulatory mRNA sequences are important for stability and translational efficiency. These sequences lie at the 5′ and the 3′ ends of the mRNA, are called untranslated regions (UTRs), and are not part of the final protein product.

RNA polymerases

There are several distinct RNA polymerases in eukaryotic cells. RNA polymerase I synthesizes rRNAs involved in facilitating protein synthesis by the ribosome. RNA polymerase II is responsible for the synthesis of mRNA and miRNAs. RNA polymerase III catalyzes the synthesis of tRNAs.

Several proteins bind to the gene to be transcribed

The reaction catalyzed by RNA polymerase II requires the formation of a large complex of proteins over the start site of the gene. This preinitiation complex is important for accurately positioning the RNA polymerase II on DNA for initiation. This complex consists of general transcription factors and accessory factors.

Regulatory regions

An mRNA-producing eukaryotic gene can be divided into its coding and regulatory regions as defined by the transcription start site. The coding region contains the DNA sequence that is transcribed into mRNA, which is ultimately translated into a protein. The regulatory region consists of two classes of sequences (Figure 8.5). One class is responsible for ensuring basal expression and the other for regulated expression.

FIGURE 8.5. Two types of regulatory sequences.

1. Basal promoters: Basal promoter sequences generally have two components. The proximal component, generally the TATA box, directs RNA polymerase II to the correct start site. The distal component, typically the CAAT and GC boxes, specifies the frequency of initiation.

The best studied of the distal sequences is the CAAT box, but several other sequences may be used in various genes. The GC and CAAT boxes are named for the DNA sequences involved. Mutations in these regions reduce the frequency of transcriptional starts 10- to 20-fold. These boxes bind specific proteins, and the frequency of transcription initiation is a consequence of these protein-DNA interactions, whereas the protein-DNA interaction at the TATA box ensures fidelity of initiation.

2. Enhancers and response elements: Enhancers and response elements regulate gene expression. This class consists of sequences that enhance or repress expression and of others that mediate the response to various signals including hormones, chemicals, etc. Depending upon whether they increase or decrease the initiation rate of transcription, they are called enhancers or repressors and have been found both upstream and downstream from the transcription start site. In contrast to proximal and upstream promoter sequences, enhancers and repressors can exert their effects even when located hundreds or even thousands of bases away from the transcription units located on the same chromosome. They also function in an orientation-independent fashion. These regions are bound by proteins (specific transcription factors) that regulate gene expression and are discussed in Chapter 10.

Basal transcription complex formation

Basal transcription requires, in addition to RNA polymerase II, a number of transcription factors called A, B, D, E, F, and H, some of which are composed of several different subunits (Figure 8.6). These general transcription factors are conventionally abbreviated as TFIIA, TFIIB, etc. (transcription factor, class II gene). TFIID, which consists of the TATA-binding protein (TBP) plus 8 to 10 TBP-associated factors, binds to the TATA box and is the only one of these factors capable of binding to specific sequences of DNA. Binding of TBP to the TATA box in the minor groove causes a bend in the DNA helix. This bending is thought to facilitate the interaction of TBP-associated proteins with other components of the transcription initiation complex and, possibly, with other factors bound to the upstream sequences.

FIGURE 8.6. Formation of the transcription complex requires several proteins in addition to RNA Polymerase II.
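Because TFIID recognizes the TATA box by its sequence, and the TATA box was described earlier in this chapter as lying 15 to 30 bp upstream of the transcription start site, a candidate TATA box can be located with a simple window search. The Python sketch below is a hypothetical illustration: the function name and parameters are assumptions, and it matches the literal string "TATA" only, whereas real promoters tolerate variations of T and A.

```python
def find_tata_box(dna_5_to_3, start_site, nearest=15, farthest=30):
    """Return the index of a literal TATA motif upstream of start_site, or None.

    dna_5_to_3 is the coding-strand sequence written 5'->3'.
    start_site is the 0-based index of the transcription start site.
    The motif's first base must lie between `farthest` and `nearest` bp upstream.
    """
    dna = dna_5_to_3.upper()
    if start_site < nearest:
        return None  # not enough upstream sequence to hold the window
    window_start = max(0, start_site - farthest)
    # str.find needs the whole motif inside the slice, so extend the end by 4.
    window_end = start_site - nearest + 4
    index = dna.find("TATA", window_start, window_end)
    return index if index != -1 else None


upstream = "GC" * 10 + "TATAAA" + "G" * 19   # TATA motif begins at index 20
gene = upstream + "ATGC"                      # start site at index 45
print(find_tata_box(gene, start_site=45))     # 20, i.e. 25 bp upstream
```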

A single-stranded RNA is produced from a double-stranded DNA

Eukaryotic RNA polymerase is a DNA-dependent RNA polymerase: it uses information from DNA to synthesize a complementary sequence. Only one strand of the gene, referred to as the template strand, is used as a template for transcription. RNA polymerase reads the template 3′ to 5′ and produces a complementary single-stranded RNA molecule in the 5′ to 3′ direction (see Figure 8.6).
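The complementarity between template strand and RNA product can be sketched in a few lines of Python. This is a minimal illustration, not a model of the enzyme: the function name and the example sequence are made up for this sketch.

```python
# Base-pairing rules for copying DNA into RNA: U replaces T in the product.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (written 5'->3') copied from a DNA template written 3'->5'.

    Reading the template 3'->5' and pairing each base in turn yields the
    RNA in its conventional 5'->3' orientation.
    """
    return "".join(RNA_COMPLEMENT[base] for base in template_3_to_5.upper())


template = "TACGGATTC"        # hypothetical template strand, written 3'->5'
print(transcribe(template))   # AUGCCUAAG
```

Because gene sequences are conventionally written 5′ to 3′, a template strand given in that orientation would have to be reversed (for example with `template[::-1]`) before being passed to this function.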

Bacterial DNA–directed RNA synthesis is inhibited by the antibiotic rifampin

Rifampin specifically inhibits bacterial RNA synthesis by interfering with the bacterial RNA polymerase. The inhibited enzyme remains bound to the promoter, thereby blocking the initiation by uninhibited enzyme. Rifampin is especially useful in the treatment of tuberculosis. This drug along with isoniazid (an antimetabolite) has greatly reduced morbidity due to tuberculosis.

Retroviruses, such as human immunodeficiency virus (HIV), have an RNA genome

Retroviruses, such as HIV and human T cell lymphotropic virus, contain reverse transcriptase, an enzyme that copies the RNA genome of the virus into a cDNA. “Reverse” signifies that the biological information flows from RNA to DNA, opposite the usual direction of transfer. Reverse transcriptase mediates the RNA template–dependent formation of double-stranded DNA from a single-stranded RNA by an intricate process. The transcribed DNA is integrated into the host cellular genome and is replicated with the host cellular machinery.

AZT and ddI inhibit reverse transcriptase of HIV

Many useful antiviral drugs act as antimetabolites because they are structurally similar to pyrimidine or purine bases. Drugs such as zidovudine (AZT) and dideoxyinosine (ddI) undergo phosphorylation by host cellular kinases to form nucleotide analogues, which are incorporated into the viral nucleic acids, resulting in chain termination. Selective toxicity results because viral enzymes are more sensitive to inhibition by these antimetabolites than mammalian polymerases.


Gene transcription produces an RNA that is larger than the mRNA found in the cytoplasm for translation. This larger RNA, called the primary transcript or heterogeneous nuclear RNA (hnRNA), contains segments of transcribed introns. The intron segments are removed and the exons are joined at specific sites, called donor and acceptor sequences, to form the mature mRNA by a mechanism of RNA processing (Figure 8.7).

FIGURE 8.7. mRNA is transcribed and processed in the nucleus.

Addition of a 5′ cap

Almost immediately after the initiation of RNA synthesis, the 5′ end of RNA is capped by a methyl guanosine residue, which protects it from degradation (by 5′ exonucleases) during elongation of the RNA chain. The cap also helps the transcript bind to the ribosome during protein synthesis.

Addition of a poly(A) tail

The primary transcripts contain a highly conserved AAUAAA consensus sequence, known as a polyadenylation signal, near their 3′ end. The polyadenylation site is recognized by a specific endonuclease that cleaves the RNA approximately 20 nucleotides downstream. Transcription may proceed for several hundred nucleotides beyond the polyadenylation site, but the 3′ end of the transcript is discarded. The newly created 3′ terminus, however, serves as a primer for enzymatic addition by poly(A) polymerase of up to 250 adenine nucleotides (Figure 8.8).

FIGURE 8.8. RNA processing reactions.
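The cleavage and polyadenylation steps can be sketched as a string operation. In this Python illustration the function name, the fixed 20-nucleotide offset, and the default tail length of 200 are assumptions made for the example (the text gives "approximately 20" and "up to 250"); the sketch also assumes that the AAUAAA occurrence nearest the 3′ end is the one used.

```python
POLY_A_SIGNAL = "AAUAAA"


def polyadenylate(primary_transcript, cleavage_offset=20, tail_length=200):
    """Cleave downstream of the poly(A) signal and append a poly(A) tail.

    cleavage_offset: nucleotides between the end of the signal and the cut
    (illustrative; the real distance is only approximately 20).
    tail_length: number of adenines added (illustrative; up to about 250).
    """
    rna = primary_transcript.upper()
    # Assumption: use the signal occurrence closest to the 3' end.
    signal_start = rna.rfind(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA polyadenylation signal found")
    cleavage_site = signal_start + len(POLY_A_SIGNAL) + cleavage_offset
    # Everything 3' of the cleavage site is discarded; the new end is extended.
    return rna[:cleavage_site] + "A" * tail_length


transcript = "GGCAUG" + "AAUAAA" + "C" * 20 + "GUGUGUGU"
processed = polyadenylate(transcript, tail_length=5)
print(processed)  # the trailing GUGUGUGU is gone and AAAAA is appended
```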

Intron removal

Splice sites are present within the gene and delineate the introns. Splice site sequences, which indicate the beginning (GU) and ending (AG) of each intron, are found within the primary RNA transcript. Introns are removed and exons are spliced (joined) together to form the mature mRNA (Figure 8.9). A special structure called the spliceosome converts the primary transcript into mRNA. Spliceosomes comprise the primary transcript, five small nuclear RNAs (U1, U2, U4, U5, and U6), and more than 50 proteins. The small nuclear RNAs and their associated proteins, collectively called snRNPs (pronounced “snurps”), facilitate splicing by positioning the RNA for the necessary reactions and by helping to form the structures and intermediates needed for removal of the intron. The mature RNA molecule then leaves the nucleus, passing into the cytoplasm through the pores in the nuclear membrane.

FIGURE 8.9. mRNA splicing.
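The net effect of splicing, removing introns and joining the flanking exons, can be sketched as follows. This Python example is a simplification under stated assumptions: intron positions are supplied by the caller as hypothetical (start, end) index pairs rather than discovered by a spliceosome, and the only check performed is the GU...AG boundary rule.

```python
def splice(primary_transcript, introns):
    """Remove introns and join the exons of a primary transcript.

    introns: list of (start, end) half-open index pairs, sorted and
    non-overlapping. Each intron must begin with GU and end with AG.
    """
    rna = primary_transcript.upper()
    exons = []
    position = 0
    for start, end in introns:
        intron = rna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(rna[position:start])  # exon preceding this intron
        position = end                     # resume after the intron
    exons.append(rna[position:])           # final exon
    return "".join(exons)


exon1, intron1 = "AUGGCC", "GUAAGUUUCUCCAG"
exon2, intron2 = "UUUGGA", "GUACGUCCUUUCAG"
exon3 = "UAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

first = (len(exon1), len(exon1) + len(intron1))
second = (first[1] + len(exon2), first[1] + len(exon2) + len(intron2))
print(splice(pre_mrna, [first, second]))  # AUGGCCUUUGGAUAA
```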

Mutations in splicing signals cause human disease

Thalassemias are hereditary anemias that comprise the single most common genetic disorder in the world. The mutations that cause the thalassemias affect the synthesis of either the α or the β chains of globin, causing a decreased production of hemoglobin and, consequently, an anemia. Point mutations can occur within the TATA box, or mutations can occur in the splice junction sequences at the intron-exon boundaries.

Some of the splicing abnormalities alter the sequence GT (GU in the RNA transcript) at the beginning of an intron or the AG at the end. Because these sequences are absolutely required for normal splicing, such mutations lead to the loss of β-globin production. Other mutations affect the consensus region of the donor or acceptor site; these reduce the ability of the RNA to splice correctly and result in decreased but detectable amounts of β-globin.

Chapter Summary

  • RNA polymerase II transcribes protein coding genes.
  • Transcription requires several factors to bind to the regulatory region of the gene.
  • Proximal and distal promoters and other regulatory sequences control gene expression.
  • RNA is transcribed in the nucleus and undergoes processing before entering the cytoplasm.
  • RNA processing reactions include addition of a 5′ methyl guanosine cap, a poly(A) tail, and splicing of introns out of the heterogeneous nuclear transcript.