Organization of the Eukaryotic Genome and Gene Expression
They are in you and me; they created us, body and mind; and their preservation is the ultimate rationale for our existence… they go by the name of genes, and we are their survival machines.
—Richard Dawkins (English biologist, 1941–)
In: The Selfish Gene (1976)
Human genetic information, collectively known as the genome, exists as deoxyribonucleic acid (DNA) within every nucleated somatic cell of the body. The DNA in each cell contains all the instructions necessary to direct the growth and development of cells into an organism and to maintain cellular function throughout the individual’s lifespan. Replication (copying) of DNA, during development as well as during growth and repair, must ensure that these instructions are faithfully passed from one cellular generation to the next. This requirement ensures the preservation of both the individual organism and the species. Human DNA and human cells cannot exist without each other; their relationship is essentially symbiotic. Cells provide the framework and machinery to ensure that the genetic instructions within the DNA are reproduced and followed with fidelity.
Within DNA are nucleotide bases that are arranged in specific sequences to form genes. Genes exist to code for proteins that carry out functions of the cells and, therefore, of the organism. Yet, while this basic genetic blueprint is identical within all somatic cells of an individual, proteins within cells differ according to the cell type. While liver cells, for example, require an array of protein enzymes to carry out metabolic functions, bone cells require more protein support structures.
Through transcription, the instructions in DNA are copied into messenger RNA (mRNA), and through translation this nucleic acid language is then converted into protein. By regulating which regions of DNA are transcribed, each cell type dictates the proteins it produces. After synthesis, proteins are trafficked (moved) to their functional locations within or outside their cell of origin. A careful balance between protein synthesis and degradation is needed for cell function and survival and, therefore, for the continued existence of the DNA that directs the proteins’ formation.
Every nucleated eukaryotic somatic cell contains essentially the same blueprint—a set of genetic information collectively known as the genome. The tremendous potential for understanding the genetic basis for human development and disease led to the successful worldwide effort to sequence the entire human genome. We still need to understand how the human genome is organized and how to decipher the meaning of much of the human deoxyribonucleic acid (DNA) sequence.
Some parts of the genome contain instructions used daily by the cell. But other genetic instructions are useful to a cell only when it is stressed. Still other genetic instructions are never used by the cell. Because of the vast amount of genetic information, it is critical for a cell to retrieve this information in a timely manner. Knowledge of how genetic information is stored and retrieved is essential for an understanding of the functioning of the eukaryotic genome.
We will first examine the physical organization of the genome and then proceed to the biochemical processes required to maintain and manage the genome. Just as the page you are reading contains letters that are arranged in discrete information units known as words and words are combined into sentences, paragraphs, chapters, and so on, DNA contains nucleotides arranged into genes, chromosomes, and so on (Figure 6.1). This chapter will describe both the physical and the informational organization of the genome.
FIGURE 6.1. Data storage analogy.
The human genome is contained within two distinct compartments: the nucleus and the mitochondria. The bulk of the genome, about 20,000 to 25,000 genes, resides in a set of linear chromosomes within the cell nucleus and includes genetic material of both maternal and paternal origin. In contrast, mitochondrial DNA contains 37 genes that are essential for normal mitochondrial function and is exclusively of maternal origin. This chapter addresses the organization of nuclear DNA. Eukaryotic nuclear DNA is associated with a variety of proteins, which together form a complex structure, chromatin, that allows for numerous configurations of the DNA molecule and for types of control unique to the eukaryotic organism.
DNA building blocks
DNA contains the structural blueprint for all genetic instructions. The genetic code contained within DNA is written in four “letters,” or bases. Two of the bases, adenine (A) and guanine (G), are purines, heterocyclic compounds with a fused double-ring structure; the other two, cytosine (C) and thymine (T), are pyrimidines, which have a single six-membered ring. The famous double-helix structure of DNA is built on its phosphate-deoxyribose backbone (Figure 6.2). The backbone consists of five-carbon sugar (pentose) molecules, each bound to a base (A, G, C, or T); a sugar together with its base is called a nucleoside. The pentose molecules are asymmetrically joined to phosphate groups by phosphodiester bonds, linking the 3′ carbon of one sugar to the 5′ carbon of the next and giving each strand a direction. Hydrogen bonds between complementary nucleotides (G:C or A:T), where a nucleotide is a nucleoside linked to one or more phosphate groups, stabilize the double-helix structure.
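The pairing rules above are strict enough to compute with. The short Python sketch below, offered only as an illustration, derives the complementary strand of a sequence and counts the hydrogen bonds holding a perfectly paired duplex together. It assumes standard Watson-Crick pairing (two hydrogen bonds per A:T pair, three per G:C pair) and writes the partner strand 5′ to 3′, which, because the two strands are antiparallel, means reversing it as well as complementing it. The function names are hypothetical.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by each base in a standard Watson-Crick pair.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand: str) -> str:
    """Return the partner strand written 5'->3' (the reverse complement),
    because the two strands of the double helix run antiparallel."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds in a perfectly paired duplex built on `strand`."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    seq = "ATGCGC"
    print(complementary_strand(seq))  # GCGCAT
    print(hydrogen_bonds(seq))        # 2 + 2 + 3 + 3 + 3 + 3 = 16
```

Because each G:C pair contributes one more hydrogen bond than an A:T pair, stretches rich in G and C are held together more tightly, which the count makes visible.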
FIGURE 6.2. Nuclear structure of eukaryotic DNA.
Chromatin consists of very long double-stranded DNA molecules; a nearly equal mass of small basic proteins termed histones; smaller amounts of nonhistone proteins; and a small quantity of ribonucleic acid (RNA). Histones are a group of closely related basic proteins rich in arginine and lysine, which together make up about one fourth of their amino acid residues. These positively charged amino acids allow histones to bind tightly to the negatively charged sugar-phosphate backbone of DNA. Functionally, histones provide for the compaction of chromatin.
Just how much DNA is there?
The human haploid genome contains approximately 3 × 10⁹ base pairs packaged into 23 chromosomes. The total uncoiled DNA within a single human cell would stretch to more than a meter, and individual uncoiled chromosomes would measure 1.7 to 8.5 cm in length.
The nucleus of a human cell is typically about 6 µm in diameter, yet it holds DNA that, at its most condensed, is only about 1/50,000 of its extended linear length. At least four levels of packaging are required for the DNA of each chromosome to fit into the roughly 1.4 µm chromosome seen at metaphase (the stage of mitosis in the cell cycle at which DNA is most condensed).
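The lengths and the compaction factor quoted in the two paragraphs above can be checked with simple arithmetic. The sketch below assumes a rise of about 0.34 nm per base pair for B-form DNA (a standard value that the text does not state) and a haploid genome of about 3 × 10⁹ bp; the 8.5 cm chromosome length and the 50,000-fold compaction come from the text. The function names are illustrative only.

```python
# Assumption (not stated in the text): B-form DNA rises ~0.34 nm per base pair.
NM_PER_BP = 0.34
COMPACTION = 50_000  # metaphase compaction factor quoted in the text


def extended_length_m(base_pairs: float) -> float:
    """Contour length, in metres, of fully extended B-form DNA."""
    return base_pairs * NM_PER_BP * 1e-9


def condensed_length_um(extended_m: float, compaction: float = COMPACTION) -> float:
    """Length, in micrometres, after shortening by the given compaction factor."""
    return extended_m / compaction * 1e6


if __name__ == "__main__":
    haploid_bp = 3e9
    print(f"haploid DNA: {extended_length_m(haploid_bp):.2f} m")
    print(f"diploid DNA: {extended_length_m(2 * haploid_bp):.2f} m")
    # Longest chromosome, 8.5 cm when uncoiled (figure from the text).
    print(f"8.5 cm compacted 50,000-fold: {condensed_length_um(0.085):.1f} um")
```

Running it prints about 1.02 m for one haploid set, 2.04 m for a diploid cell (consistent with “more than a meter”), and 1.7 µm for the longest chromosome after 50,000-fold compaction, the same order of magnitude as the 1.4 µm metaphase chromosome.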
Nucleosomes are the fundamental unit of organization upon which the higher-order packing of chromatin is built. Each nucleosome core consists of a complex of eight histone proteins (two molecules each of histones H2A, H2B, H3, and H4) with double-stranded DNA wound around it. About 146 base pairs (bp) of DNA are associated with each nucleosome core particle, and adjacent nucleosomes are separated by a 50 to 70 bp span of linker DNA bound by the linker histone H1 (Figure 6.3).
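The numbers in this paragraph define a repeating unit of core DNA plus linker DNA, which gives a rough sense of how many nucleosomes a genome needs. The sketch below is a simplification: it assumes every base pair of an approximately 3 × 10⁹ bp haploid genome sits in a perfectly regular repeat, whereas real nucleosome spacing varies and some regions are nucleosome-free. The 146 bp core and 50 to 70 bp linker values are taken from the text; the function names are hypothetical.

```python
CORE_BP = 146            # DNA associated with each nucleosome core (from the text)
LINKER_BP = (50, 70)     # range of linker DNA lengths (from the text)
HISTONES_PER_CORE = 8    # two each of H2A, H2B, H3 and H4


def repeat_length(linker_bp: int) -> int:
    """Base pairs per nucleosome repeat: core DNA plus one linker."""
    return CORE_BP + linker_bp


def nucleosome_count(genome_bp: float, linker_bp: int) -> int:
    """Approximate nucleosomes needed, assuming perfectly regular spacing."""
    return round(genome_bp / repeat_length(linker_bp))


if __name__ == "__main__":
    for linker in LINKER_BP:
        n = nucleosome_count(3e9, linker)
        print(f"linker {linker} bp: repeat {repeat_length(linker)} bp, "
              f"{n:,} nucleosomes, {n * HISTONES_PER_CORE:,} core histones")
```

With repeat lengths of 196 to 216 bp, the estimate comes to roughly 14 to 15 million nucleosomes per haploid genome, each built from eight core histone molecules.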
FIGURE 6.3. Structure of a nucleosome.
In addition to their role in packaging DNA, nucleosomes also regulate gene expression, or activity, by determining whether the DNA sequences can be accessed by transcription factors, allowing the factors to regulate expression of a nearby gene (Chapter 10). Nucleosomes are in turn successively packed into higher order structures by coiling and looping (Figure 6.4).
FIGURE 6.4. Higher order structures formed during progressive compaction of chromatin.
Each core histone has a structured domain and an unstructured amino-terminal “tail” of 25 to 40 amino acid residues. Enzymatic modification of the amino-terminal tails (e.g., by acetylation, methylation, or phosphorylation) modifies the histones’ net electric charge and shape. These modifications are physiologically reversible and are thought to prepare the chromatin for DNA replication and transcription.
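How a tail modification changes net charge can be illustrated with a deliberately crude model. The sketch below counts lysine (K) and arginine (R) as +1 and aspartate (D) and glutamate (E) as −1, ignoring histidine, the chain termini, and real pKa values, and treats an acetylated lysine as neutral. The sequence used is assumed to be the first 20 residues of the human histone H3 amino-terminal tail and should be checked against a sequence database before being relied on; the names are hypothetical.

```python
from typing import Iterable

# Crude charge model (assumption): K and R count +1, D and E count -1;
# histidine, the chain termini and partial charges are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# Assumed sequence: residues 1-20 of the human histone H3 N-terminal tail.
H3_TAIL = "ARTKQTARKSTGGKAPRKQL"


def net_charge(seq: str, acetylated: Iterable[int] = ()) -> int:
    """Net charge of a peptide; `acetylated` lists 1-based positions of
    lysines carrying an acetyl group, which neutralizes their charge."""
    acetyl = set(acetylated)
    for pos in acetyl:
        if not 1 <= pos <= len(seq) or seq[pos - 1] != "K":
            raise ValueError(f"position {pos} is not a lysine")
    return sum(
        0 if (aa == "K" and pos in acetyl) else CHARGE.get(aa, 0)
        for pos, aa in enumerate(seq, start=1)
    )


if __name__ == "__main__":
    print(net_charge(H3_TAIL))                   # 7
    print(net_charge(H3_TAIL, [9, 14]))          # 5
    print(net_charge(H3_TAIL, [4, 9, 14, 18]))   # 3
```

Each acetylated lysine removes one unit of positive charge, so the tail binds the negatively charged DNA backbone less strongly; this is the electrostatic basis for the weakened DNA-histone interaction described in the list below.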
1. Acetylation and deacetylation of lysine residues: These processes are important in making DNA more or less accessible to transcription factors (proteins that regulate gene expression by binding directly to DNA). Acetylation neutralizes the positive charge of a lysine residue, which weakens DNA-histone interactions and makes the DNA more accessible to factors needed for transcription (Figure 6.5). Therefore, histone acetylation (catalyzed by histone acetyltransferases, or HATs) is generally associated with transcriptional activation, whereas histone deacetylation (catalyzed by histone deacetylases, or HDACs) is associated with gene silencing. The interplay of acetylase and deacetylase activities defines the transcriptional activity of a given chromatin region.
FIGURE 6.5. Histone (de)acetylation controls chromatin compaction and decompaction.
2. Euchromatin and heterochromatin: These terms describe the compaction of DNA in the chromosome and are used to further classify chromatin. Densely packed or compacted regions of chromatin are termed heterochromatin and, for the most part, are genetically inactive (Figure 6.6). Transcription is inhibited in heterochromatin because the DNA is packaged so tightly that it is inaccessible to the proteins responsible for RNA transcription.
FIGURE 6.6. Euchromatin and heterochromatin.
3. Transcriptionally active nucleus: Less densely compacted chromatin regions in a transcriptionally active nucleus are called euchromatin (Figure 6.6); these regions are typically being transcribed, preparing for transcription, or have just completed it. For a gene to be transcribed, its sequence must become accessible to RNA polymerases and to the regulatory proteins that influence the rate at which the gene is transcribed. Euchromatin consists of uncoiled chromatin structures that give RNA polymerases and regulatory proteins access to DNA. During cell division, the chromatin becomes highly compact and coiled, condensing into the familiar structure of the mitotic chromosome.
Each chromosome is a noncovalent complex of one very long, linear DNA duplex and its associated histone proteins. Chromosome structure varies with the cell cycle, from the loose, threadlike appearance in G1 phase to the tightly compacted state observed during M phase (see Chapter 20). Chromosomes require three sequence elements for their propagation and maintenance as individual units; two are described here, and the third, the origin of replication, is described below. Telomeres are tandem repeats of the six-nucleotide (hexameric) sequence TTAGGG, written (TTAGGG)n, found at the ends of chromosomes, where they protect the chromosome from degradation (Figure 6.7). Sequence elements known as centromeres serve as “handles” that allow the mitotic spindle to attach to the chromosome during cell division.
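Because a telomere is simply the TTAGGG unit repeated many times, its length in a sequence can be measured by counting uninterrupted copies at the chromosome end. The sketch below assumes the input is the G-rich telomeric strand written 5′ to 3′, so the repeats sit at its 3′ end, and it counts only exact, uninterrupted copies; real telomeric sequence also contains variant repeats that this toy function would stop at. The function name is hypothetical.

```python
TELOMERE_UNIT = "TTAGGG"


def terminal_repeats(strand: str, unit: str = TELOMERE_UNIT) -> int:
    """Count exact, uninterrupted copies of `unit` at the 3' end of `strand`
    (written 5'->3'), where the G-rich telomeric strand ends."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    strand = strand.upper()
    count, end = 0, len(strand)
    while end >= len(unit) and strand[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


if __name__ == "__main__":
    chrom_end = "GATTACA" + TELOMERE_UNIT * 4
    n = terminal_repeats(chrom_end)
    print(n, n * len(TELOMERE_UNIT))  # 4 24
```

Multiplying the repeat count by six gives the telomere length in base pairs, the quantity that shrinks when telomeres are progressively eroded.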
As the cell progresses through the mitotic (M) phase of the cell cycle, the nuclear envelope breaks down, a protein complex called the kinetochore assembles on each centromere and attaches the chromosome to microtubules of the mitotic spindle, and the chromosomes segregate to opposite poles of the cell to form the daughter cells. The centromere also serves as a boundary separating the two arms of the chromosome: the short arm, p (from the French petit), and the long arm, q (because “q” follows “p” in the alphabet). The position of the centromere, and therefore the relative lengths of the two arms, differs among chromosomes. We discuss the mechanism of the cell cycle further in Chapter 20.
FIGURE 6.7. Chromosome structure.
For chromosomal DNA to be replicated, specific nucleotide sequences must act as origins of replication. Each chromosome contains multiple origins of replication dispersed along its length. At each origin, sequence-specific double-stranded DNA-binding proteins associate with a series of direct-repeat DNA sequences.
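A rough calculation shows why a single origin per chromosome would not be enough. The sketch below converts the 8.5 cm length of the longest chromosome (from the text) into base pairs, assuming 0.34 nm per base pair, and then estimates replication time for an idealized case in which evenly spaced origins all fire at once and each launches two forks moving outward at a constant speed. The fork speed of about 2,000 bp per minute is an assumed order-of-magnitude figure, not a value from the text, and the origin counts are arbitrary illustrations.

```python
NM_PER_BP = 0.34          # assumed rise per base pair (B-form DNA)
FORK_BP_PER_MIN = 2_000   # assumed fork speed; order of magnitude only


def chromosome_bp(extended_cm: float) -> float:
    """Convert an uncoiled chromosome length in cm to base pairs."""
    return extended_cm * 1e-2 / (NM_PER_BP * 1e-9)


def replication_minutes(length_bp: float, n_origins: int,
                        fork_speed: float = FORK_BP_PER_MIN) -> float:
    """Idealized replication time: evenly spaced origins fire together and
    each origin launches two forks moving outward at constant speed."""
    if n_origins < 1:
        raise ValueError("need at least one origin")
    return length_bp / (2 * n_origins * fork_speed)


if __name__ == "__main__":
    bp = chromosome_bp(8.5)  # longest chromosome from the text, ~2.5e8 bp
    for origins in (1, 100, 2_500):
        minutes = replication_minutes(bp, origins)
        print(f"{origins:>5} origins: {minutes / 60:,.1f} h")
```

Under these assumptions a single origin would need over 1,000 hours (more than 40 days) to copy the chromosome, whereas a few thousand dispersed origins bring the time down to well under an hour, which is consistent with each chromosome carrying many origins.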
Metaphase chromosomes can be visualized microscopically, allowing geneticists to detect and identify chromosomal abnormalities. Karyotype analysis is an important diagnostic tool in the prenatal diagnosis of chromosomal abnormalities such as Down syndrome (trisomy 21), in the staging of tumor progression (tumors often have an abnormal number of chromosomes), in the investigation of infertility, and even in the sex verification of athletes competing in women’s sports events.
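Once each chromosome in a metaphase spread has been identified, detecting a numerical abnormality such as trisomy 21 reduces to counting copies. The sketch below models only that final counting step, not the microscopy or banding analysis: it assumes the karyotype is supplied as a list of chromosome labels and reports any autosome present in a number of copies other than two. Sex chromosomes are deliberately ignored, and the function name is hypothetical.

```python
from collections import Counter

AUTOSOMES = [str(n) for n in range(1, 23)]


def autosomal_aneuploidies(karyotype: list[str]) -> dict[str, int]:
    """Map each autosome whose copy number is not two to its copy number.
    `karyotype` holds one label per identified chromosome, e.g. "21" or "X"."""
    counts = Counter(karyotype)
    return {c: counts[c] for c in AUTOSOMES if counts[c] != 2}


if __name__ == "__main__":
    normal_male = [c for c in AUTOSOMES for _ in range(2)] + ["X", "Y"]
    print(len(normal_male), autosomal_aneuploidies(normal_male))  # 46 {}
    trisomy_21 = normal_male + ["21"]
    print(len(trisomy_21), autosomal_aneuploidies(trisomy_21))    # 47 {'21': 3}
```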