When a gene is activated, the gene is transcribed, producing an RNA intermediate. Structural genes are genes that are transcribed to produce messenger RNA (mRNA) molecules. The mRNA molecule is then translated to make a protein product. Nonstructural genes are also transcribed to produce RNA molecules; however, the RNA molecule is not translated and instead functions directly in the cell. These functional RNA molecules are called noncoding RNAs (ncRNAs). Noncoding RNAs include transfer RNA molecules (tRNAs), ribosomal RNA molecules (rRNAs), and the Xist and Tsix RNA molecules discussed in an earlier section.
- What is the difference between a structural and a nonstructural gene?
A. Transcription in Bacteria
Expression of Structural Genes
What factors determine whether a structural gene is expressed, in other words, whether the gene is activated to make an mRNA molecule? Expression requires interactions between transcription factor proteins and specific DNA sequences near the gene.
The DNA sequences that regulate the expression of a particular gene include (see figure 9.1):
- Regulatory DNA sequences. Regulatory DNA sequences influence how often transcription starts. In most cases these sequences are located adjacent to a gene, but some regulatory DNA sequences lie far from the gene they regulate.
- The promoter sequence. The promoter consists of DNA sequences that determine where transcription starts. The promoter is typically adjacent to the controlled gene.
- The terminator sequence. The terminator consists of DNA sequences that signal termination of transcription by causing the RNA polymerase to dissociate from the DNA.
Transcription produces an RNA molecule that is complementary to the template or antisense strand of DNA. The other DNA strand, the one that forms hydrogen bonds with the template DNA strand, is called the coding or sense DNA strand. The coding DNA strand is identical in sequence to the RNA transcript, except that the RNA molecule contains uracil (U) instead of thymine (T).
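The relationship between the two DNA strands and the RNA transcript can be checked with a short sketch. The Python below is illustrative only: the function names and the example sequence are assumptions, not taken from this chapter. It builds the template strand from a coding strand, transcribes the template, and confirms that the RNA matches the coding strand with U in place of T.

```python
# Illustrative sketch; names and the example sequence are hypothetical.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template DNA base -> RNA base


def template_from_coding(coding_5to3: str) -> str:
    """Return the template strand, written 3' to 5', for a coding strand written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3)


def transcribe(template_3to5: str) -> str:
    """Return the RNA (5' to 3') complementary to a template strand written 3' to 5'."""
    return "".join(RNA_PAIRING[base] for base in template_3to5)


coding = "ATGGCATTC"                     # coding (sense) strand, 5' to 3'
template = template_from_coding(coding)  # template (antisense) strand, 3' to 5'
rna = transcribe(template)               # RNA transcript, 5' to 3'

print(template)  # TACCGTAAG
print(rna)       # AUGGCAUUC
```

The transcript is built from the template strand, yet it reads the same as the coding strand except for U replacing T, which is why gene sequences are usually reported as the coding strand.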
- What are the functions of the three DNA sequences that regulate transcription?
- What is the difference between the template and the coding DNA strands?
Transcription in the bacterium E. coli has the following three stages (see figure 9.2):
- Initiation. During the initiation stage of transcription in bacteria, a transcription factor protein called sigma (σ) factor guides the RNA polymerase to the promoter.
- Elongation. During elongation, the RNA polymerase bound to the promoter separates the two DNA strands, forming an open complex. RNA polymerase then reads the template DNA strand while synthesizing a complementary mRNA transcript.
- Termination. The termination stage involves the release of the RNA polymerase and the mRNA transcript from the DNA.
- What is happening during the three stages of transcription?
- What is the function of sigma factor?
Promoter Structure in Bacteria
The bacterial promoter is located upstream of the gene to be transcribed and serves as a docking site for the sigma (σ) factor protein and later RNA polymerase. Important DNA sequence elements within the promoter are numbered in relation to the +1 site, the first nucleotide in the template DNA strand that is transcribed (see figure 9.3). Important DNA sequence elements within the bacterial promoter include the following:
- -35 sequence. The -35 sequence (5’-TTGACA-3’ in the coding DNA strand) allows high transcription rates because it serves as part of the binding site for the sigma (σ) factor protein. The -35 sequence is located approximately 35 base pairs (bp) upstream of the transcription start site (+1 site).
- -10 sequence (Pribnow box). The -10 sequence (5’-TATAAT-3’ in the coding DNA strand) is essential for transcription in prokaryotes because it serves as the second part of the sigma factor binding site. Moreover, the -10 sequence is an AT-rich sequence, promoting the separation of the two DNA strands, a requirement for transcription. The -10 sequence is located approximately 10 bp upstream of the transcription start site (+1 site).
- +1 site. The +1 site is the transcription start site. The nitrogenous base at the +1 site in the coding DNA strand is usually adenine (A). Since the mRNA and the coding DNA strand have the same sequence, the first nitrogenous base in the mRNA is also adenine (A).
Both the -35 and -10 sequences (5’-TTGACA-3’ and 5’-TATAAT-3’, respectively) are consensus sequences, meaning that they are the “average” sequences found when the DNA sequences of many E. coli promoters are compared. Some bacterial promoters are strong promoters, whereas others are weak promoters. The difference between strong and weak promoters largely depends on how closely the promoter DNA sequence in question matches the -35 and -10 consensus sequences. Strong promoters initiate transcription frequently, while weak promoters initiate transcription less frequently.
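The idea of matching a promoter against the consensus sequences can be sketched in a few lines of Python. Counting matching positions is an assumption made here for illustration; it is a rough proxy, not a measure of actual transcription rates. The function names and the second example promoter are hypothetical.

```python
# Illustrative sketch; the scoring rule and example sequences are assumptions.
CONSENSUS_35 = "TTGACA"  # -35 consensus, coding strand, 5' to 3'
CONSENSUS_10 = "TATAAT"  # -10 consensus (Pribnow box), coding strand, 5' to 3'


def matches(sequence: str, consensus: str) -> int:
    """Count positions where a promoter element agrees with its consensus."""
    if len(sequence) != len(consensus):
        raise ValueError("element and consensus must have the same length")
    return sum(1 for a, b in zip(sequence, consensus) if a == b)


def consensus_score(minus35: str, minus10: str) -> int:
    """Total matching positions across both elements (maximum 12)."""
    return matches(minus35, CONSENSUS_35) + matches(minus10, CONSENSUS_10)


print(consensus_score("TTGACA", "TATAAT"))  # 12
print(consensus_score("TTTACA", "TATGTT"))  # 9
```

Under this simple rule, a promoter scoring 12 matches the consensus perfectly and would be expected to behave as a strong promoter, while lower scores suggest weaker promoters.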
- What is the function of the -35 sequence?
- What are the two functions of the -10 sequence?
- What is the function of the +1 site?
- What is the difference between a strong and a weak promoter?
Bacterial RNA Polymerase
In E. coli, the RNA polymerase core enzyme is composed of five protein subunits (α1, α2, β, β’, and ω) (see figure 9.4). The two α subunits and the ω subunit function to assemble the enzyme and bind to the DNA sequence to be transcribed. The RNA molecule is synthesized between the β and β’ subunits.
The RNA polymerase core enzyme (α1, α2, β, β’, and ω subunits) associates with a sigma (σ) factor protein to form the RNA polymerase holoenzyme. E. coli makes at least eight different types of sigma factor proteins, depending on the environmental conditions encountered by the cell. For example, the main sigma factor in E. coli is called the housekeeping sigma factor or σ70. σ70 functions to guide the RNA polymerase core enzyme to the promoters of genes that are required for the viability of the E. coli cell in a normal environment. In addition to σ70, there are specialized sigma factor proteins that guide the RNA polymerase core enzyme to survival genes when an E. coli cell encounters stressful environments. These specialized sigma factors include a nitrogen starvation sigma factor (σ54), a general starvation sigma factor (σ38), and a heat shock sigma factor (σ32). The σ factor proteins are examples of transcription factor proteins.
- Why does E. coli make several different types of sigma factor proteins?
- What is the difference between the RNA polymerase core enzyme and the RNA polymerase holoenzyme?
Transcription Initiation in Bacteria
Transcription initiation in bacteria (E. coli) occurs as follows:
- The RNA polymerase holoenzyme recognizes the promoter via sigma (σ) factor binding to the -35 and -10 DNA sequences. At this stage, the RNA polymerase holoenzyme:DNA complex is called a closed complex because the two DNA strands are still hydrogen bonded together.
- The AT hydrogen bonds within the -10 sequence are broken, forming an open complex. The RNA polymerase core enzyme functions as the helicase that separates the two DNA strands at the -10 sequence.
- A short RNA molecule is synthesized beginning at the +1 site; however, the RNA polymerase core enzyme is still tethered to σ factor. Sigma (σ) factor is still bound to the -10 and -35 DNA sequences.
- The sigma (σ) factor protein is released, freeing the RNA polymerase core enzyme.
- Once the sigma (σ) factor protein is released, transcription transitions to the elongation phase as the RNA polymerase core enzyme incorporates additional nucleotides at the 3’ end of the RNA transcript.
- Describe the initiation phase of transcription in bacteria.
Elongation in Bacteria
The elongation phase of transcription in bacteria involves RNA synthesis by the RNA polymerase core enzyme (see figure 9.5). The E. coli RNA polymerase core enzyme has the following features:
- The RNA polymerase core enzyme does not require a primer for RNA synthesis (in other words, no 3’-OH group is required to initiate RNA synthesis).
- The RNA polymerase core enzyme has helicase activity, separating the two DNA strands during transcription elongation.
- The RNA polymerase core enzyme reads the template DNA strand in the 3’ to 5’ direction.
- The RNA polymerase core enzyme synthesizes RNA in the 5’ to 3’ direction.
- The RNA polymerase core enzyme catalyzes the formation of a covalent bond between the 3’-OH of the growing RNA strand and the 5’ phosphate group on the incoming nucleoside triphosphate (NTP). The NTPs used by the RNA polymerase core enzyme are ATP, UTP, CTP, and GTP. The NTP molecules are cleaved during transcription, releasing pyrophosphate (PPi) during the RNA synthesis reaction.
- RNA synthesis follows the AT/GC rule except that uracil is found in RNA (in other words, transcription follows the AU/GC rule).
- The RNA polymerase core enzyme does not have proofreading activity (no 3’ to 5’ exonuclease activity). As a result, the mRNA molecules made during transcription sometimes contain mistakes.
- The RNA polymerase core enzyme reforms hydrogen bonds within the two DNA strands after the open complex has passed by. As the RNA polymerase core enzyme rewinds the DNA double helix, the RNA transcript trails behind the core enzyme as a single-stranded molecule.
- What are the similarities between the RNA polymerase core enzyme and the DNA polymerases discussed previously?
- What are the differences between the RNA polymerase core enzyme and the DNA polymerases discussed previously?
- Which protein functions as the helicase for transcription?
- Which molecules provide the energy for transcription?
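Several of the elongation features listed in this section (the template is read 3’ to 5’, the RNA grows 5’ to 3’, NTPs are consumed, and pyrophosphate is released) can be traced step by step in a small simulation. This is a minimal sketch with hypothetical names and an invented template sequence. It counts one pyrophosphate per bond formed between nucleotides, so the first nucleotide, which has nothing upstream to bond to, does not contribute one.

```python
# Illustrative sketch; function and variable names are hypothetical.
from collections import Counter

# Template DNA base -> nucleoside triphosphate that pairs with it (AU/GC rule).
INCOMING_NTP = {"A": "UTP", "T": "ATP", "G": "CTP", "C": "GTP"}


def elongate(template_3to5: str):
    """Read the template 3' to 5' and grow the RNA 5' to 3', one nucleotide per step."""
    rna = ""
    ntps_used = Counter()
    for base in template_3to5:  # polymerase moves 3' -> 5' along the template
        ntp = INCOMING_NTP[base]
        ntps_used[ntp] += 1
        rna += ntp[0]           # new nucleotide is added at the 3' end
    # Each bond joining two nucleotides releases one pyrophosphate (PPi).
    bonds_formed = max(len(rna) - 1, 0)
    return rna, ntps_used, bonds_formed


rna, ntps_used, ppi_released = elongate("TACCGTAAG")
print(rna)              # AUGGCAUUC
print(dict(ntps_used))  # {'ATP': 2, 'UTP': 3, 'GTP': 2, 'CTP': 2}
print(ppi_released)     # 8
```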
Transcription of Multiple Genes
Not all genes use the same DNA strand as the template strand. In figure 9.6, genes A and B use the bottom DNA strand as the template strand for RNA synthesis, because the promoter is located to the left of the gene. Alternatively, gene C uses the top DNA strand as the template strand, as the promoter is located to the right of the gene. Genes A and B are transcribed left to right, while gene C is transcribed right to left.
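The effect of promoter position on strand choice can be sketched as follows. The code assumes the common drawing convention in which the top strand is written 5’ to 3’ from left to right; that convention, the function name, and the example sequence are assumptions for illustration and are not taken from figure 9.6.

```python
# Illustrative sketch; names and the example sequence are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcript(top_5to3: str, direction: str) -> str:
    """Return the RNA (5' to 3') for a region whose top strand is written 5' to 3'.

    direction="right": promoter on the left, bottom strand is the template.
    direction="left":  promoter on the right, top strand is the template.
    """
    if direction == "right":
        coding = top_5to3  # top strand is the coding strand
    elif direction == "left":
        # Bottom strand is the coding strand; read it 5' to 3' (right to left).
        coding = "".join(COMPLEMENT[b] for b in reversed(top_5to3))
    else:
        raise ValueError("direction must be 'right' or 'left'")
    return coding.replace("T", "U")


top = "ATGGCATTC"
print(transcript(top, "right"))  # AUGGCAUUC
print(transcript(top, "left"))   # GAAUGCCAU
```

The same stretch of DNA therefore yields two different transcripts depending on which side the promoter sits, matching the contrast between genes A and B and gene C in the text.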
Rho (ρ)-Dependent Termination
While the RNA polymerase core enzyme is synthesizing an mRNA molecule, an RNA-DNA double helix molecule is formed within the enzyme. Transcriptional termination involves weakening the hydrogen bonds within this RNA-DNA double helix, resulting in dissociation of the RNA (and the RNA polymerase core enzyme) from the DNA.
Transcriptional termination can occur in two different ways in the bacterium E. coli:
- Rho (ρ)-dependent termination.
- Rho (ρ)-independent termination.
The rho (ρ)-dependent mechanism of termination requires binding between the rho (ρ) protein, a helicase that breaks the hydrogen bonds within an RNA-DNA double helix, and an RNA sequence near the 3’ end of the mRNA transcript called the rho utilization site (rut) (see figure 9.7). The ρ-dependent mechanism of transcription termination also requires the formation of a secondary structure within the RNA transcript called a stem-loop or hairpin loop. The stem-loop forms once RNA polymerase has read the terminator DNA sequence and produced guanine (G) and cytosine (C) bases in the mRNA. The stem-loop, composed of hydrogen bonds between these G and C nucleotides within the same mRNA molecule, slows the RNA polymerase core enzyme during transcription. The ρ protein then catches up with the RNA polymerase, separates the RNA from the template DNA strand within the RNA polymerase, and releases the RNA transcript and the RNA polymerase core enzyme from the DNA. Transcription is terminated.
- What three components are involved in ρ-dependent termination?
- What are the functions of each of these components in ρ-dependent termination?
Rho (ρ)-Independent Termination
The rho (ρ)-independent termination mechanism does not require ρ protein or the rut RNA sequence (see figure 9.7). In rho (ρ)-independent termination of transcription, a stem-loop structure is formed in the newly synthesized RNA that slows the RNA polymerase core enzyme. This pausing of the RNA polymerase is aided by another protein called NusA. While the RNA polymerase slows down, a uracil-rich region is synthesized in the RNA because the RNA polymerase core enzyme is copying an adenine-rich region in the template DNA strand. Recall that each uracil base in the mRNA forms two hydrogen bonds with an adenine base in the template DNA strand. This weak base pairing between U and A bases tends to dissociate spontaneously, releasing the mRNA and RNA polymerase, terminating transcription.
The mechanism that is used for transcription termination depends on the gene. About 50% of E. coli genes use the ρ-dependent mechanism; the other 50% use the ρ-independent mechanism.
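The two sequence features of ρ-independent termination described above, a stem-loop followed by a uracil-rich region, can be searched for with a simple scan. The sketch below is a deliberate simplification: the fixed stem and loop lengths, the uracil threshold, the function names, and the example RNA are all assumptions chosen for illustration, and real terminators vary in these dimensions.

```python
# Illustrative sketch; thresholds, names, and the example RNA are assumptions.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would base pair with `rna`, written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(rna))


def find_terminator(rna: str, stem: int = 6, loop: int = 4, tail: int = 6, min_u: int = 5):
    """Return the start index of a stem-loop that is followed by a U-rich tail, else None."""
    span = 2 * stem + loop
    for i in range(len(rna) - span - tail + 1):
        left = rna[i:i + stem]
        right = rna[i + stem + loop:i + span]
        if right != reverse_complement(left):
            continue  # the two arms cannot base pair into a stem
        if rna[i + span:i + span + tail].count("U") >= min_u:
            return i
    return None


example = "AAGCCGCCGAAAGGCGGCUUUUUUGA"
print(find_terminator(example))  # 2
```

In the example, the arms GCCGCC and GGCGGC can pair to form a GC-rich stem around the GAAA loop, and the six uracils that follow correspond to the weakly paired U-A region that lets the transcript dissociate.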
- What three components are involved in ρ-independent termination?
- What are the functions of each of these components in ρ-independent termination?
B. Transcription in Eukaryotes
Transcription is one of the most important biological processes in a eukaryotic cell, as the activation of a gene allows eukaryotic cells to adapt to environmental changes (e.g., the sudden appearance of a hormone in the blood). Moreover, many eukaryotic organisms are multicellular, so genes need to be transcribed at the right time during development and in the correct cell type. For example, genes involved in building the central nervous system should be transcribed when the nervous system is forming during development. Genes that encode proteins involved in muscle contraction should be transcribed in muscle cells and not transcribed in other cell types, such as white blood cells. These phenotypic differences are due to differences in transcription, as all cell types (neurons, muscle cells, white blood cells) in the body contain an identical collection of genes.
DNA Sequences Control Eukaryotic Transcription
The transcription of eukaryotic genes is controlled by several types of DNA sequence elements, including the following (see figure 9.8):
- Core promoter. The core promoter determines where the RNA polymerase will bind to the DNA and begin transcription. The core promoter contains two important DNA sequence elements:
- TATA box (-25 sequence). The TATA box is a 5’-TATAAAA-3’ sequence in the coding DNA strand that is located 25-35 base pairs upstream of the transcriptional start site. The TATA box serves as the binding site for the general transcription factor protein TFIID (see below). The TATA box is also rich in AT base pairs, promoting DNA strand separation.
- Transcription start site (+1 site). The +1 site is the first nitrogenous base in the template DNA strand that is transcribed into an RNA nucleotide.
For a eukaryotic gene to be transcribed, the TATA box and the +1 site must be present. However, if these two sequences are the only DNA sequences present upstream of a gene, the gene can only be transcribed at a low, yet constant rate, the so-called basal level of transcription.
- Regulatory DNA sequences. Regulatory DNA sequences function to either help transcribe the gene above the basal level or transcribe a gene below the basal level. Regulatory DNA sequences serve as the binding sites for regulatory transcription factor proteins that influence the ability of the RNA polymerase to recognize the core promoter. Regulatory DNA sequences include:
- Enhancer DNA sequences. Enhancer sequences stimulate the transcription of the controlled gene above the basal level. Enhancer sequences are the binding sites for activator proteins.
- Silencer DNA sequences. Silencer sequences prevent or down-regulate transcription of the controlled gene below the basal level. Silencer DNA sequences are the binding sites for repressor proteins.
The DNA sequences that influence transcription of an adjacent gene are called cis-acting DNA elements. Cis-acting DNA elements include the core promoter, enhancer, and silencer sequences. The transcription factor proteins that bind to these cis-acting DNA elements are called trans-acting factor proteins. Trans-acting factor proteins, also called transcription factor proteins, include activator proteins, repressor proteins, and the general transcription factor (GTF) proteins (see below).
- What are the names of the two sequence features within the core promoter?
- What are the two functions of the TATA box?
- What are the names and functions of the two regulatory DNA sequences that influence the transcription of eukaryotic genes?
- What are the names of the proteins that bind to these two regulatory DNA sequences?
RNA Polymerases in Eukaryotes
In eukaryotes, there are three types of RNA polymerases that handle transcription:
- RNA polymerase I. RNA polymerase I transcribes most of the eukaryotic ribosomal RNA (rRNA) genes to make rRNA molecules. We will learn in Part 11 that rRNA molecules are noncoding RNA molecules that play a critical role in the translation process.
- RNA polymerase II. RNA polymerase II transcribes eukaryotic structural genes. Recall that structural genes produce mRNA molecules upon transcription. In this section, we will focus our attention on RNA polymerase II.
- RNA polymerase III. RNA polymerase III transcribes all eukaryotic transfer RNA (tRNA) genes. We will learn in Part 11 that tRNA molecules are noncoding RNA molecules that play a critical role in translation, functioning to deliver amino acids to the ribosome.
- What types of genes do the three eukaryotic RNA polymerases transcribe?
Initiation in Eukaryotes
Both basal (constant, low level) transcription and regulated (above or below the basal level) transcription of structural genes in eukaryotes require the following proteins (see figure 9.9):
- RNA polymerase II.
- General transcription factor (GTF) proteins. The GTF proteins function like the bacterial sigma (σ) factor protein; the GTFs deliver RNA polymerase II to the core promoter. There are six major GTF proteins in eukaryotes:
- TFIID. The TFIID protein binds to the core promoter by recognizing the TATA box DNA sequence. TFIID is actually a multi-subunit protein “machine” composed of at least ten protein subunits. One of these protein subunits is the TATA-binding protein (TBP) that binds directly to the TATA box.
- TFIIA. The TFIIA protein helps TFIID bind to the TATA box DNA sequence.
- TFIIB. The TFIIB protein binds to TFIID and recruits the RNA polymerase II/TFIIF complex to the core promoter.
- TFIIF. The TFIIF protein is always associated with RNA polymerase II. When the TFIIF protein binds to TFIIB, RNA polymerase II is located at the +1 site in the DNA.
- TFIIH. The TFIIH protein is another multi-subunit protein complex. One subunit of TFIIH is a DNA helicase that forms the open complex by breaking the hydrogen bonds at the TATA box sequence in the DNA. Another subunit is a kinase, phosphorylating RNA polymerase II to activate transcription. TFIIH uses the chemical energy within ATP to activate RNA polymerase II.
- TFIIE. TFIIE assists TFIIH to separate the two DNA strands to activate transcription.
The association of RNA polymerase II with the six GTF proteins listed above forms a preinitiation complex. The preinitiation complex is also called the basal transcription apparatus.
- Which of the GTFs binds to the core promoter?
- Which of the GTFs acts as a bridge to connect the GTF bound to the core promoter to the GTF bound to RNA polymerase II?
- Which of the GTFs is both a kinase and a helicase?
- Which GTF activates RNA polymerase II?
General and Regulatory Transcription Factors
Transcription factor proteins influence the ability of RNA polymerase II to bind to a eukaryotic core promoter. A huge number of eukaryotic genes encode transcription factor proteins. For example, it is estimated that as many as 1000 human genes encode proteins that regulate transcription! There are two categories of transcription factor proteins:
- General transcription factor (GTF) proteins. The GTFs include the TFIID, TFIIA, TFIIB, TFIIF, TFIIE, and TFIIH proteins described above. The GTFs function to recruit RNA polymerase II to the core promoter and activate RNA polymerase II to begin transcription. The GTFs are required for all transcription events. If these transcription factors are the only ones involved, the gene is transcribed at a low, yet constant level, the so-called basal level. GTFs are also required for transcription rates above and below the basal level.
- Regulatory transcription factor proteins. Regulatory transcription factor proteins function to either increase the rate of transcription above the basal level or decrease the rate of transcription below the basal level (see figure 9.10). Activator proteins are regulatory transcription factors that bind to enhancer DNA sequences and increase the level of transcription above the basal level. Repressor proteins are regulatory transcription factors that bind to silencer DNA sequences and decrease transcription below the basal level. Many regulatory transcription factors are only expressed in certain tissues or at certain times during development, thus playing a critical role in tissue-specific or time-specific gene transcription.
The DNA binding sites (core promoter, enhancer, and silencer sequences) for these transcription factor proteins tend to be near the genes they control. As a result, the DNA sequences are called cis-acting DNA elements. However, these cis-acting DNA elements do not need to be immediately adjacent to the core promoter. Some enhancers and silencers can be within the gene that they control or can be hundreds of thousands of base pairs away. The transcription factor proteins (GTFs, activators, and repressors) that bind to the cis-acting elements are called trans-acting factor proteins.
Since transcriptional control requires input from a myriad of both DNA sequences and proteins, some factor in the cell needs to interpret the various activation and repression signals to provide an overall signal to RNA polymerase II. A large multi-subunit protein complex called mediator regulates the interaction between RNA polymerase II and the activator and repressor proteins. Mediator thus serves as a link between transcription factors that bind to enhancer and silencer DNA sequences and RNA polymerase II, thereby determining the overall rate of transcription.
- What are three examples of cis-acting DNA elements?
- What are three examples of trans-acting factor proteins?
- What is the function of the mediator complex?
Transcription Elongation in Eukaryotes
The elongation step in eukaryotic transcription is virtually identical to the transcription elongation step in prokaryotes. RNA polymerase II in eukaryotes has the same functional capabilities as the RNA polymerase core enzyme in E. coli.
- What are the names of the two proteins that act as a DNA helicase in eukaryotic transcription?
Transcription Termination in Eukaryotes
Transcriptional termination in eukaryotes occurs during the process of 3’ end polyadenylation, a modification to the 3’ ends of eukaryotic mRNAs. We will cover 3’ end polyadenylation in more detail in Part 10. In short, a protein called cleavage and polyadenylation specificity factor (CPSF) binds to a polyadenylation signal sequence (5’-AAUAAA-3’) in the mRNA. CPSF then acts as an endonuclease, cleaving the mRNA approximately 20 nucleotides downstream (towards the 3’ end of the mRNA) from the polyadenylation signal sequence. Cleavage of the mRNA by CPSF releases the mRNA from RNA polymerase II.
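The cleavage step can be sketched as a search for the polyadenylation signal followed by a cut a fixed distance downstream. The text gives the distance only as approximately 20 nucleotides, so the exact offset used here, the function name, and the example RNA are assumptions for illustration.

```python
# Illustrative sketch; names, the fixed 20-nucleotide offset, and the example RNA are assumptions.
POLYA_SIGNAL = "AAUAAA"


def cleave_at_polya(rna: str, offset: int = 20):
    """Split an RNA `offset` nucleotides downstream of the first AAUAAA signal.

    Returns (released_mrna, remaining_rna), or None if there is no signal
    or the RNA ends before the cut site.
    """
    start = rna.find(POLYA_SIGNAL)
    if start == -1:
        return None
    cut = start + len(POLYA_SIGNAL) + offset
    if cut > len(rna):
        return None
    return rna[:cut], rna[cut:]


example = "GGCAUC" + POLYA_SIGNAL + "CUGA" * 5 + "GUGUGU"
released, remaining = cleave_at_polya(example)
print(released)   # GGCAUCAAUAAACUGACUGACUGACUGACUGA
print(remaining)  # GUGUGU
```

The first piece corresponds to the mRNA released from RNA polymerase II; the second piece is the RNA still attached to the polymerase after cleavage.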
After CPSF releases the mRNA from RNA polymerase II, there are two potential ways that RNA polymerase II can be released from the DNA, thereby terminating transcription:
- Torpedo model. The torpedo model involves a 5’ to 3’ exonuclease called XRN2 degrading the remaining RNA linked to RNA polymerase II and dissociating RNA polymerase II from the DNA (see figure 9.11a).
- Allosteric model. When RNA polymerase II transcribes the portion of the gene that produces the polyadenylation signal sequence, the RNA polymerase is destabilized and is released from the DNA (see figure 9.11b).
- What is the difference between the torpedo and the allosteric models of transcription termination?
Fill in the blank:
- When structural genes are expressed, they produce ___________________ RNA molecules; when nonstructural genes are expressed, they produce _________________ RNA molecules.
- _________________ is an example of a GTF protein that has both helicase and kinase activity.
- The _____________ protein binds to the -10 and -35 sequences.
- The RNA polymerase holoenzyme consists of the _____________________________ protein subunits and the _________________________ factor protein.
- The -25 sequence is the binding site for the _________________ protein.
- The ____________ protein binds to the rut sequence found in 50% of bacterial mRNA molecules.
- RNA polymerase _____ is responsible for transcribing eukaryotic structural genes.
- Phosphorylation of _________________________ helps to activate transcription in eukaryotes.
- A(n) __________________ protein binds to an enhancer sequence in the DNA to activate transcription above the basal level, while a(n) ______________ protein binds to a silencer sequence in the DNA to decrease transcription below the basal level.
- The protein ____________ causes the RNA polymerase core enzyme to pause at the stem loop in the rho (ρ)-independent mechanism.
- DNA replication requires the use of DNA helicase to unwind double-stranded DNA, while transcription in bacteria uses the _________________________ to unwind double-stranded DNA.