1 - Chromosome Structure
The first reading assignment this semester examines the general features and organization of the chromosomes found within viruses, prokaryotic cells (bacteria), and eukaryotic cells (fungi, protozoa, algae, plants, and animals).
The genetic material (genome) of viruses can be composed of either RNA or DNA. A single virus type never has both DNA and RNA in the same virus particle.
The genomes of viruses can be in several forms:
- Double-stranded DNA (dsDNA). A dsDNA genome contains two individual DNA strands held together by hydrogen bonds. Even though you will not be required to know examples of viruses with dsDNA genomes, several human pathogens have dsDNA genomes, including the smallpox virus and the herpes viruses.
- Single-stranded DNA (ssDNA). The genomes of many viruses are composed of a single DNA strand. Parvovirus, which infects dogs and cats, has a single-stranded DNA genome.
- Double-stranded RNA (dsRNA). A dsRNA genome contains two individual RNA strands held together by hydrogen bonds. Rotavirus, which causes severe diarrhea in humans, has a double-stranded RNA genome.
- Single-stranded RNA (ssRNA). The genomes of many viruses are composed of a single RNA strand. Many disease-causing viruses, such as poliovirus, influenza virus, SARS-CoV-2 (causes COVID-19), and the human immunodeficiency virus (HIV), contain single-stranded RNA genomes.
The genomes of viruses can be circular or linear. One way to determine if a viral genome is circular or linear is to isolate the viral genome and treat it with nucleases, enzymes that digest (cut) DNA or RNA. There are two types of nucleases. Exonucleases can digest nucleic acids into nucleotides only if there is a free end; endonucleases can cut DNA or RNA in the middle of a nucleic acid molecule. As a result, circular genomes, which have no free ends, are sensitive only to endonucleases, while linear genomes are sensitive to both exonucleases and endonucleases.
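The nuclease test described above amounts to a small decision rule. The following is a minimal sketch of that rule only; the function name and its boolean inputs are made up for illustration and do not model a real laboratory protocol.

```python
def infer_genome_shape(digested_by_exonuclease: bool,
                       digested_by_endonuclease: bool) -> str:
    """Infer genome shape from nuclease sensitivity (illustrative only)."""
    if not digested_by_endonuclease:
        # Both linear and circular genomes should be cut by an endonuclease,
        # so a resistant sample tells us nothing about shape.
        return "inconclusive"
    # An exonuclease needs a free end, which only a linear molecule has.
    return "linear" if digested_by_exonuclease else "circular"


print(infer_genome_shape(True, True))    # linear
print(infer_genome_shape(False, True))   # circular
```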
The virus genome can be contained within one continuous nucleic acid molecule, or the viral genome can be divided into segments. The genome of the influenza virus, for example, contains eight linear ssRNA segments.
When the genome of a virus is located within a virus particle, the genome is inert, meaning that the genome is not copied and viral genes are not transcribed. A virus genome is copied and viral genes are transcribed only during the infection of a host cell.
Viral genomes can range from a few thousand base pairs to 250,000 base pairs in length. For comparison, the genome of the bacterium E. coli is 4 million base pairs in length, while the haploid human genome is 3 billion base pairs in length.
- What are the four major criteria used for classifying viral genomes?
- How can a scientist determine if a viral genome is linear or circular?
The genome of a bacterium is typically composed of a single chromosome. Bacteria are prokaryotic, and since prokaryotes do not contain nuclei, the bacterial chromosome is not bounded by a nuclear membrane. Instead the bacterial chromosome is found in a region of the bacterial cytoplasm called the nucleoid (see Figure 1.1).
A bacterial chromosome has the following features:
- The bacterial chromosome is usually a single circular double-stranded DNA molecule.
- The bacterial chromosome is usually a few million base pairs in length.
- The bacterial chromosome contains a few thousand structural genes. These structural genes are transcribed and translated to make protein products.
- The bacterial chromosome has a single origin of replication. The origin of replication serves as the binding site for proteins involved in initiating DNA replication. The origin of replication in the bacterium E. coli is called oriC.
- The bacterial chromosome includes intergenic sequences. Intergenic sequences are located between structural genes and are not transcribed. Intergenic sequences can serve as the binding sites for proteins that function to activate or deactivate structural genes.
- The bacterial chromosome contains repetitive sequences. Repetitive sequences are repeats of a particular base pair sequence, are found within the intergenic sequences, and are involved in compacting the chromosome to fit into the nucleoid region of the bacterial cell.
The genome of a eukaryotic cell is subdivided into multiple chromosomes. Each eukaryotic chromosome is a single linear double-stranded DNA molecule that is approximately 10–100 million base pairs (bp) in length (see Figure 1.2).
A eukaryotic chromosome has several important features:
- Origins of replication. Eukaryotic chromosomes contain many origins of replication, spaced at approximately 100,000 base pair (bp) intervals along the chromosome. In the yeast Saccharomyces cerevisiae, each origin is called an ARS element.
- Centromeres. Each eukaryotic chromosome has a single centromere. Centromeres play a critical role in chromosome separation into daughter cells during mitosis and meiosis. A protein structure called the kinetochore covers the centromere DNA sequence. The kinetochore functions to link the centromere DNA to the microtubule spindle of the dividing cell, ensuring proper chromosome movement during mitosis and meiosis.
- Telomeres. Telomeres are the ends of eukaryotic chromosomes. Telomeres function to prevent chromosomes from sticking together (i.e., prevent translocations). Telomeres also protect the ends of chromosomes from exonucleases and prevent chromosome shortening during DNA replication.
- Structural genes. Several hundred to thousands of structural genes are found within a typical eukaryotic chromosome. Recall that structural genes encode protein products. Eukaryotic structural genes are composed of two types of DNA sequences: exons and introns. The exon sequences encode the amino acids within the protein product, while the intron sequences do not code for the protein product.
- Intergenic sequences. Intergenic sequences are located between structural genes and are not typically transcribed. Intergenic sequences include DNA sequences that serve as the binding sites for the proteins that function to activate or deactivate genes.
- Repetitive sequences. Repetitive DNA sequences are multiple repeats of the same or similar DNA sequence and comprise approximately 60% of the human genome. Most of the repetitive DNA sequences do not encode protein products. Examples of these repetitive DNA sequences will be discussed in more detail below.
- Heterochromatin. Heterochromatin refers to regions along a chromosome that contain highly condensed DNA. These heterochromatin regions either lack genes altogether or contain genes that are not actively transcribed. The centromere and telomere regions of chromosomes are composed of heterochromatin.
- Euchromatin. Euchromatin refers to the loosely condensed regions along the chromosome. Many structural genes are located within euchromatin.
- How are prokaryotic and eukaryotic chromosomes similar?
- How are prokaryotic and eukaryotic chromosomes different?
- What is meant by the term structural gene?
- What is the difference between an exon and an intron?
- What are the functions of centromeres and telomeres?
- What is the difference between heterochromatin and euchromatin?
Repetitive Sequences in Eukaryotes
Some DNA sequences found within eukaryotic chromosomes are unique sequences. These unique sequences are found in a single copy per genome. Keep in mind, however, that most eukaryotes are diploid, having two copies of each chromosome (forming a homologous chromosome pair). As a result, eukaryotes typically have two copies of each unique gene: one on each chromosome within a homologous chromosome pair. Most structural genes are examples of unique DNA sequences.
Eukaryotic genomes also contain repetitive DNA sequences. These repetitive DNA sequences include moderately repetitive sequences and highly repetitive sequences. Moderately repetitive sequences are present in a few hundred to a few thousand copies per genome. Highly repetitive sequences are present in tens of thousands to millions of copies in a single genome.
DNA Reassociation Experiments
Before scientists were able to determine the base pair sequence of a DNA molecule, DNA reassociation experiments were done to determine the overall composition of the genome, focusing on repetitive DNA sequences. In a typical DNA reassociation experiment, entire chromosomes are isolated and mechanically sheared into fragments. The chromosome fragments are then denatured into single strands by increasing the temperature of the reaction. The reaction mixture is then cooled. As the reaction cools, complementary single-stranded DNA molecules find each other and form hydrogen bonds to re-create double-stranded DNA molecules; different DNA fragments do so at different rates (see Figure 1.3). Think of it this way: a single-stranded DNA molecule moves around until it meets its complement, then reattaches base-for-base to form a double-stranded molecule. For highly or moderately repetitive DNA sequences, there are many single strands in the reaction with the same sequence to choose from. As a result, highly and moderately repetitive sequences will find each other more rapidly than unique DNA sequences. The DNA reassociation experiment measures the amount of time it takes for single-stranded DNA to form double strands. DNA reassociation experiments showed that there are three populations of DNA: highly repetitive DNA reassociated most rapidly, followed by moderately repetitive DNA, and unique DNA sequences reassociated most slowly.
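The ordering of the three populations can be illustrated numerically. The text does not give a rate equation, so the sketch below assumes the ideal second-order model often used for reassociation, C/C0 = 1 / (1 + k·C0·t), where C/C0 is the fraction of DNA still single-stranded at time t. The rate constants are invented values chosen only to show the trend; they are not measurements.

```python
def fraction_single_stranded(c0: float, k: float, t: float) -> float:
    """Fraction of DNA still single-stranded under ideal second-order kinetics."""
    return 1.0 / (1.0 + k * c0 * t)


def half_reassociation_time(c0: float, k: float) -> float:
    """Time at which half of the strands have re-paired (k * c0 * t == 1)."""
    return 1.0 / (k * c0)


# Hypothetical rate constants: more copies of a sequence means a strand
# meets a complement sooner, modeled here as a larger k.
rate_constants = {
    "highly repetitive": 1e3,
    "moderately repetitive": 1.0,
    "unique": 1e-3,
}

starting_concentration = 1.0  # arbitrary units, assumed for illustration
for dna_class, k in rate_constants.items():
    t_half = half_reassociation_time(starting_concentration, k)
    print(f"{dna_class}: half reassociated at t = {t_half:g}")
```

With these made-up constants, the highly repetitive class reaches the halfway point first and the unique class last, matching the order reported in the experiments.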
- Describe how highly repetitive, moderately repetitive, and unique DNA sequences behave in a DNA reassociation experiment.
Moderately Repetitive Sequences
Moderately repetitive DNA sequences include some genes that produce products. For example, the genes that produce the ribosomal RNA (rRNA) components of ribosomes (see Part 11) and the genes that make histone proteins (see Part 2) are considered moderately repetitive sequences.
Moderately repetitive sequences also include DNA sequences of unknown function. A good example of this type of moderately repetitive sequence is the variable number tandem repeat (VNTR) sequences. VNTRs are typically 15 to 100 base pairs long, are often located between genes, and are present in multiple copies repeated along the length of the chromosome. The number of VNTR repeat copies is unique to each individual. As a result, this variation in VNTR repeats is the basis of an important forensics technique called DNA fingerprinting (see Figure 1.4).
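To see why repeat number can tell individuals apart, the sketch below counts back-to-back copies of a repeat unit in two short sequences. The 15 base pair unit, the flanking bases, and the two "individuals" are all invented for illustration; an actual DNA fingerprint compares several VNTR locations rather than one.

```python
def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence) - len(unit) + 1):
        run = 0
        position = start
        # Keep stepping one unit at a time while the next copy is present.
        while sequence.startswith(unit, position):
            run += 1
            position += len(unit)
        best = max(best, run)
    return best


repeat_unit = "ACGTTGCAAGGCTTA"            # hypothetical 15 bp VNTR unit
person_a = "TT" + repeat_unit * 4 + "GG"   # made-up sequence with 4 copies
person_b = "TT" + repeat_unit * 7 + "GG"   # made-up sequence with 7 copies

print(longest_tandem_run(person_a, repeat_unit))  # 4
print(longest_tandem_run(person_b, repeat_unit))  # 7
```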
The telomere repeat sequences (see Figure 1.6) are also considered to be moderately repetitive sequences.
- What are four examples of moderately repetitive DNA sequences?
- Why are VNTRs well suited for forensics?
Highly Repetitive Sequences
The centromere region (CEN region) of the chromosome contains highly repetitive DNA. In humans, the CEN region is approximately 1 million (10^6) base pairs long, consisting of a tandem repeat of many copies of a particular 170 base pair sequence. Tandem repeats are copies of the same DNA sequence repeated many times in a row along the chromosome.
The Alu family of sequences in humans is another good example of a highly repetitive sequence. An individual Alu sequence within the human genome is only 300 base pairs long; however, there are so many copies of this Alu sequence that approximately 10% of the human genome is thought to be composed strictly of Alu sequences (see Figure 1.5). Some of these Alu sequences are particularly interesting because they have the ability to move from one location in the genome to another. DNA sequences that can move in this manner are called transposable elements.
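The figures in this paragraph imply a rough copy number for Alu. The arithmetic below is a back-of-the-envelope estimate using the 3 billion base pair haploid genome size given earlier in this reading, and it assumes every copy is a full 300 base pairs long.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # haploid human genome size from the text
ALU_LENGTH_BP = 300                # length of one Alu sequence from the text
ALU_PERCENT_OF_GENOME = 10         # approximate share of the genome from the text

# Integer arithmetic keeps the estimate exact for these round numbers.
alu_total_bp = HAPLOID_GENOME_BP * ALU_PERCENT_OF_GENOME // 100
alu_copies = alu_total_bp // ALU_LENGTH_BP

print(f"{alu_copies:,} copies")  # 1,000,000 copies
```

An estimate of about one million copies is consistent with the definition of a highly repetitive sequence given earlier (tens of thousands to millions of copies per genome).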
Finally, the heterochromatin regions of a chromosome often contain highly repetitive DNA sequences.
- What are three examples of highly repetitive DNA sequences?
- What makes transposable elements unique?
The telomeres of eukaryotic chromosomes have the following features:
- Telomeres contain tandem repeat DNA sequences. The tandem repeat DNA sequences within telomeres are 6–8 base pairs (bp) long and often contain multiple G and T nucleotides. For example, the telomere repeat sequence in humans is 5’-TTAGGG-3’. Depending on the particular eukaryotic species, each telomere may contain several hundred to several thousand tandem repeats of a similar 6–8 nucleotide-long sequence (see Figure 1.6).
- Telomeres contain 3’ single-stranded overhangs. The 3’ overhang is a single-stranded DNA sequence containing multiple copies of the telomere repeat. The overhang is typically 12–16 nucleotides in length.
- Telomere overhangs can form loops. The telomere DNA sequence has the potential to turn back on itself to form a t-loop (see Figure 1.7). Within the t-loop, the 3’ overhang of the telomere invades another portion of the same chromosome by forming unusual hydrogen bonds involving multiple G bases (G quartet). The t-loop is thought to be the actual structure that protects the eukaryotic chromosome from exonucleases.
- At what locations on a chromosome are you more likely to find repetitive DNA sequences?
- What is the function of a t-loop?
Eukaryotic chromosomes can be distinguished from each other by the location of the centromere (see Figure 1.8), the size of the chromosome, and the banding patterns produced along the chromosome after staining with certain dyes. The centromere divides the chromosome into two arms; the shorter of the two chromosome arms is designated p, while the longer arm is designated q.
In terms of centromere location, chromosomes are classified in the following ways:
- Metacentric. The centromere of a metacentric chromosome is located near the center of the chromosome.
- Submetacentric. The centromere of a submetacentric chromosome is located slightly off center.
- Acrocentric. The centromere of an acrocentric chromosome is located significantly off center. In humans, there are five pairs of acrocentric chromosomes: namely, 13, 14, 15, 21, and 22. These five pairs of chromosomes contain short p arms having multiple copies of the same ribosomal RNA (rRNA) genes. Having many copies of the ribosomal RNA genes ensures that the cell is able to produce enough ribosome components for translation (see Part 11). The number of rRNA gene copies varies among individuals, but the average is 100 copies per genome.
- Telocentric. The centromere of a telocentric chromosome is located at the end of the chromosome. Humans do not contain telocentric chromosomes.
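The four categories above can be expressed as a simple rule on arm lengths. The text describes the categories only qualitatively, so the numeric cutoffs in this sketch are arbitrary values assumed for illustration; they are not an official standard.

```python
def classify_centromere(arm_one_bp: int, arm_two_bp: int) -> str:
    """Classify a chromosome by centromere position (illustrative cutoffs)."""
    short_arm = min(arm_one_bp, arm_two_bp)   # p arm
    long_arm = max(arm_one_bp, arm_two_bp)    # q arm
    total = short_arm + long_arm
    if total <= 0:
        raise ValueError("chromosome length must be positive")

    # 0.0 means the centromere is at the very end; 0.5 means dead center.
    position = short_arm / total

    if short_arm == 0:
        return "telocentric"
    if position >= 0.45:      # assumed cutoff for "near the center"
        return "metacentric"
    if position >= 0.25:      # assumed cutoff for "slightly off center"
        return "submetacentric"
    return "acrocentric"      # "significantly off center"


print(classify_centromere(50, 50))   # metacentric
print(classify_centromere(10, 90))   # acrocentric
```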
- What is meant by the terms metacentric, submetacentric, acrocentric, and telocentric?
- What genes are found on the short arms of the five acrocentric chromosomes in humans?
Human Karyotype and Staining
A karyotype is an image of all of the chromosomes within a dividing cell, in which the homologous chromosomes (recall that one chromosome in a homologous pair is inherited from mom; the other chromosome is inherited from dad) are arranged in pairs (see Figure 1.9). The p arms are arranged above the centromere and the q arms are arranged below the centromere. Human autosomal (non-sex) chromosomes are numbered from the longest to the shortest chromosome, 1 to 22. The sex chromosomes are labeled X and Y.
Some chromosomes are very similar in size and in centromere location. These chromosomes are difficult to distinguish from each other under the microscope unless they are stained with certain dyes to produce banding patterns that are unique to each chromosome. A common staining procedure involves a dye called Giemsa, which produces a unique pattern of light and dark bands on each chromosome (G banding). Dark bands on the chromosomes represent areas of the DNA that are tightly compacted (heterochromatin); light bands represent areas of the DNA that are not as tightly compacted (euchromatin).
- What are three ways that scientists can distinguish one chromosome from another?
A set of rules has been established to number the human chromosomes based on the size, centromere location, and banding pattern. Additionally, the dark and light bands that result from chromosome staining are numbered within a chromosome. This numbering assists in determining where chromosome mutations occur and helps to delineate the exact location of the chromosome abnormality. For example, band 22q12 refers to chromosome 22, the long arm (q), region 1 (closest to the centromere), band 2. If a deletion removes a portion of chromosome 22, the exact location of that deletion could be identified based on this numbering system.
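The band label in the example can be taken apart mechanically. The sketch below handles only the simple four-part form used in the 22q12 example (chromosome, arm, one region digit, one band digit); longer labels with sub-bands are not handled, and the function name is made up for illustration.

```python
import re

# Chromosome (1-22, X, or Y), arm (p or q), one region digit, one band digit.
BAND_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)$")


def parse_band(label: str) -> dict:
    """Split a label such as '22q12' into its named parts."""
    match = BAND_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized band label: {label!r}")
    chromosome, arm, region, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short (p)" if arm == "p" else "long (q)",
        "region": int(region),   # counted outward from the centromere
        "band": int(band),
    }


print(parse_band("22q12"))
# {'chromosome': '22', 'arm': 'long (q)', 'region': 1, 'band': 2}
```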
Fill in the blanks:
- A(n) ___________________ is a type of enzyme that can digest from the ends of linear nucleic acid molecules.
- A(n) _________________ is a type of enzyme that can cut linear or circular nucleic acid molecules.
- Bacterial chromosomes are found in a region of the cytoplasm called the ______________.
- One distinction between prokaryotic and eukaryotic cells is that bacteria have _____ origin of replication while eukaryotes have ___________.
- Eukaryotes contain highly condensed DNA that lacks genes. These regions, called _________________, are not generally transcribed and appear as __________ bands on a Giemsa-stained chromosome.
- ________________ are highly repetitive DNA sequences that compose up to 10% of the human genome.
- The ends of a linear chromosome are called __________________ and the middle where the spindle proteins attach is called the ________________.
- Two types of genes that are moderately repetitive include __________ and __________ genes.
- The shorter piece of a chromosome is called the _____ arm while the longer piece is called the _____ arm.
- _____________ genes are found on the p arms of chromosomes 13, 14, 15, 21, and 22.