Lecture Notes: 6 April
© R. Paselk 2006
Eukaryotic RNA Polymerases
[See: Maniatis, Tom and Robin Reed (4 April 2002) An extensive network of coupling among gene expression machines. Nature 416 pp499-506 for new picture]
Unlike prokaryotes, eukaryotes have a variety of RNA polymerases: a mitochondrial polymerase (and a chloroplast polymerase in plants), and three nuclear polymerases. We will focus on the three nuclear RNA polymerases:
- RNA Polymerase I: This enzyme is localized in the nucleolus and is responsible for synthesizing the rRNA precursor.
- RNA Polymerase II: This enzyme is in the nucleoplasm, synthesizing the mRNA precursors.
- RNA Polymerase III: This enzyme is also in the nucleoplasm, but specializes in synthesizing tRNA, the 5S rRNA and other small RNA precursors.
There is much variety and complexity in the make-up of the three polymerases. All are large enzymes with up to 14 different subunits. Polymerase II, which is also known as RNA Polymerase B, has gathered the greatest attention, as one would expect. A comparison of these enzymes based on Polymerase II from yeast follows. The subunits of polymerase II are named RPB1-10 (for RNA Polymerase B 1-10).
- RPB1 (220 kD): This largest subunit has homologous subunits with similar sequences in polymerases I & III, as well as the E. coli β' subunit. It has an unusual structural feature not found in prokaryotes, a long C-terminal 'tail' (the CTD = C-Terminal Domain) with 27 repeats of the sequence PTSPSYS (pro-thr-ser-pro-ser-tyr-ser). Note that this sequence is quite hydrophilic, and has many potential sites for phosphorylation (5 of the 7 residues have -OH groups).
- RPB2 (150 kD): The next largest subunit again has homologous subunits with similar sequences in polymerases I & III, and this time the E. coli β subunit. As in the case of E. coli, this subunit binds an NTP. Both RPB1 and RPB2 participate in the catalytic site of the polymerase.
- RPB3 (45 kD): The next largest subunit is homologous with the E. coli α subunit. Two copies are present in the polymerase and are necessary for core assembly, as is the case in the bacteria. It is unique to Polymerase II in eukaryotes.
- RPB4 (32 kD): This subunit is the last to have a bacterial homolog, in this case sharing significant sequence similarity with the σ factor of E. coli, and thus thought to be involved with promoter recognition. It readily dissociates from the polymerase. Like RPB3, it is unique to Polymerase II in eukaryotes.
- RPB5 (27 kD), RPB6 (23 kD), RPB8 (14 kD), & RPB10 (10 kD) are all shared by the three eukaryotic polymerases.
- RPB7 (17 kD) is unique to Polymerase II, and readily dissociates.
- RPB9 (13 kD).
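The remark above about phosphorylation sites in the CTD of the largest subunit can be checked with a few lines of Python. This is only a sketch: it uses the heptad exactly as written in these notes and the repeat count quoted for yeast, and the variable names are made up for the illustration.

```python
# Count hydroxyl-bearing residues (ser, thr, tyr) in the CTD heptad
# repeat exactly as it is written in these notes.
HEPTAD = "PTSPSYS"      # one-letter codes: pro-thr-ser-pro-ser-tyr-ser
REPEATS = 27            # repeat count quoted in the notes for yeast
HYDROXYL = set("STY")   # residues whose side chains carry an -OH group

oh_per_repeat = sum(1 for residue in HEPTAD if residue in HYDROXYL)
total_sites = oh_per_repeat * REPEATS

print(f"{oh_per_repeat}/{len(HEPTAD)} residues per repeat carry -OH")
print(f"{total_sites} potential phosphorylation sites in {REPEATS} repeats")
```

The count is an upper bound on possible sites, not a statement about which residues are actually phosphorylated.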
Promoters and Enhancers
The eukaryotic polymerases differ in their promoter strategies.
RNA Polymerase I
There is only one type of rRNA gene in a given species of eukaryote, though there may be hundreds or even thousands of copies of that gene. As a result there is only one promoter in each species for polymerase I, though the promoters are quite species specific.
The rRNA promoter for yeast has a sequence from -31 to +6 (the core promoter element) with additional upstream elements at -187 and -107. A short sequence is probably required for polymerase binding, with the rest required for transcription factors (nested control regions).
The product of RNA polymerase I is a transcript of about 7500 nucleotides (approx. 45S) which has, in order (5' → 3'), the 18S, 5.8S, and 28S rRNAs separated by spacers.
RNA Polymerase II
RNA polymerase II promoters are more diverse, as would be expected given the vast number of genes it transcribes.
- Constitutive (housekeeping) genes expressed in all tissues have one or more copies of the GC box (GGGCGG) or its complement, upstream from the start. It seems to be analogous to the eubacterial promoter.
- Genes specific to tissues often lack the GC box, but have instead an AT rich conserved region 25-30 bp upstream, the TATA box (or Goldberg-Hogness box). The TATA box resembles the Pribnow box of eubacteria, but is at -27 instead of -10 and is not required for transcription. Instead it seems to select the start site.
- Additional promoter sequences for structural genes occur between -50 and -110. These promoters appear to be DNA-binding sites for RNA polymerase and proteins involved in initiation.
- The CCAAT box is located between -70 and -80.
- For globin genes the CACCC box is upstream from CCAAT.
RNA Polymerase II also has enhancers: sequences of variable position and orientation relative to the transcribed sequences, which must be associated with promoters to function. Enhancers:
- are needed to get full activity of the promoter
- seem to be entry points or transcription factor binding sites, either of which enhances RNA polymerase binding.
- So far only associated with tissue specific genes
- Seem to mediate selective gene expression in eukaryotes
RNA Polymerase III
Promoters can be totally within transcribed sequences.
- These internal sequences are binding sites for transcription factors that stimulate upstream binding of polymerase III.
- Promoter sequences can also be upstream of the start.
The Genetic Code
Major considerations in understanding the coding required to
translate the four base nucleic acid alphabet to the 20 amino
acid alphabet include:
- How many bases are used to determine each amino acid? At least
20 codon "words" are obviously needed. From a simple consideration:
- A one base codon could code for a maximum of 4^1 = 4 amino acids, clearly not sufficient.
- A two base codon could code for a maximum of 4^2 = 16 amino acids, still not enough.
- A three base codon could code for a maximum of 4^3 = 64 amino acids, which is more than adequate. Thus a three base
codon is needed, but will be highly degenerate if all codons are used.
- Is the code punctuated? That is, are there signals between
codons indicating the beginnings and ends of codons? (For example,
one combination of bases could be set aside as a "period"
to set off the codons that are read.)
- Is the code overlapping? We could imagine a triplet
code where every possible triplet is read, such that ABCDABCD
might be read as: ABC, BCD, CDA, DAB, etc. instead of ABC, DAB, etc.
In fact the code has proven to be a non-overlapping, non-punctuated,
triplet code in which gene sequences are co-linear with peptide
sequences, and where 5' → 3' corresponds
to NH2 → COO-.
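The counting argument and the overlapping versus non-overlapping readings discussed above can be sketched in Python. The function names are invented for this illustration, and the message ABCDABCD is the example string used in these notes.

```python
def codon_capacity(n_bases: int, alphabet_size: int = 4) -> int:
    """Number of distinct codon 'words' of length n_bases."""
    return alphabet_size ** n_bases


def read_triplets(seq: str, overlapping: bool) -> list[str]:
    """Split seq into triplets, stepping 1 base (overlapping) or 3 bases."""
    step = 1 if overlapping else 3
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


# How many words does each codon length give, and is that enough for 20?
for n in (1, 2, 3):
    print(n, codon_capacity(n), codon_capacity(n) >= 20)

message = "ABCDABCD"
print(read_triplets(message, overlapping=True))
print(read_triplets(message, overlapping=False))
```

In the non-overlapping reading the two leftover letters at the end simply do not form a complete codon.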
The code was originally elucidated in cell-free systems containing
the complete protein synthetic system (ribosomes, GTP, aminoacyl
tRNAs, etc.) except for a messenger RNA. If polyU is then introduced
to the system, poly-phe is produced, so one codon for phe =
UUU. Each of the other three homopolymers can be used similarly. Next,
alternating copolymers (e.g. UCUCUCUCUCUC) can be used, in which case
two different amino acids are coded, and so on. Finally, investigators
were able to synthesize and work with triplets to get the entire code.
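The polyU and alternating UC experiments can be mimicked with a short Python sketch. The three-entry codon dictionary holds only the standard-code assignments needed here, and the `translate` helper and its `frame` parameter are made up for the illustration. Since a cell-free system has no defined start, the sketch reads the alternating polymer in all three frames.

```python
# Only the three codons needed for this illustration; the assignments
# are those of the standard genetic code.
CODONS = {"UUU": "phe", "UCU": "ser", "CUC": "leu"}


def translate(mrna: str, frame: int = 0) -> list[str]:
    """Read non-overlapping triplets starting at offset `frame`."""
    starts = range(frame, len(mrna) - 2, 3)
    return [CODONS[mrna[i:i + 3]] for i in starts]


poly_u = "U" * 12
poly_uc = "UC" * 6   # UCUCUCUCUCUC

print(translate(poly_u))
for frame in range(3):
    print(frame, translate(poly_uc, frame))
```

Whatever the frame, the alternating message contains only the codons UCU and CUC in alternation, so the product alternates between two amino acids.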
The "Standard" genetic code
is given in Table 27-7, p 1069 of your text. This is the code used by
all known organisms, the only exceptions being some deviations
in the mitochondrial tRNAs, and, it is now known, in the ciliated protozoa.
- Of the 64 possible codons, 61 code for amino acids. The remaining
three are "stop" codons (UAA = ochre, UAG = amber,
and UGA = opal). The names are derived from the discoverer of
UAG, Bernstein, which is German for amber. The other two are puns on amber.
- The code is very conservative; many mutations will
have no effect, particularly in the third base.
- The second base determines the character of the amino acid.
- U in the second position gives a hydrophobic amino acid.
- C in the second position gives a neutral hydrophilic amino acid or proline.
- G in the second position gives a basic or neutral hydrophilic amino acid.
- A in the second position gives a hydrophilic amino acid.
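The properties listed above (61 sense codons, three stops, degeneracy, and the role of the second base) can be explored with a small Python sketch. The 64-letter string is a compact way of writing the standard code with one-letter amino acid symbols. The variable names are made up here, and the table in your text remains the reference.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code, one-letter amino acids, '*' = stop. Codons are ordered
# UUU, UUC, UUA, UUG, UCU, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stops = sorted(c for c, aa in CODE.items() if aa == "*")
sense = [c for c, aa in CODE.items() if aa != "*"]
degeneracy = Counter(aa for aa in CODE.values() if aa != "*")

print(len(sense), "sense codons; stops:", stops)
print("codons per amino acid:", dict(degeneracy))

# Group the amino acids by the second codon base.
for second in BASES:
    residues = sorted({aa for c, aa in CODE.items() if c[1] == second})
    print(second, "in second position:", "".join(residues))
```

With U in the second position the sketch lists only phe, leu, ile, met and val, which matches the hydrophobic group noted above.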
tRNA functions as an adaptor to correlate the four base nucleic
acid language to the 20 amino acid language. One end of the folded
molecule binds to the three-base codon on the messenger RNA while
the other end is bound to an amino acid residue. We discussed its structure earlier.
Aminoacyl tRNA Synthetases
Amino acid residues are covalently linked to tRNA in an "activated"
form in a two reaction process:
- Activation of the amino acid residue: amino acid + ATP ⇌ aminoacyl-AMP + PPi
- Formation of the aminoacyl-tRNA: aminoacyl-AMP + tRNA ⇌ aminoacyl-tRNA + AMP
Note that the first reaction should have a free energy of
about zero since we are breaking and forming acid anhydride bonds,
and thus the reaction is driven by the subsequent hydrolysis of
pyrophosphate (PPi).
The second reaction is then driven to completion because the
acid anhydride bond of the "activated" amino acid is broken
and replaced with the relatively low energy ester bond.
Remarkably, aminoacyl tRNA synthetases do not appear
to be closely related to one another (they have different sequences, and different folds!). Apparently they are so ancient
that they arose independently. They exhibit a variety of quaternary
structural patterns (α, α2, α4, and α2β2), with
between 334 and 1000 amino acid residues. As another indicator
of the great age of these proteins, the aminoacyl tRNA synthetases
for the same amino acids are similar in evolutionarily diverse
organisms, but the aminoacyl tRNA synthetases for different amino
acids in the same organisms are generally dissimilar.
In the case of tyrosyl-tRNA synthetase the catalysis appears
to operate strictly via transition state and proximity/orientation
catalysis - there is no classical chemical catalysis (acid/base,
covalent, etc.) apparent.
Most aminoacyl tRNA synthetase-tRNA contact sites are on the
inner face of the 'L', but otherwise show no regularity. Some
seem to recognize only the acceptor region, others the anticodon,
etc. (see figures 27-22 in your text).
Finally, aminoacyl tRNA synthetases exhibit remarkable specificity
by the use of editing in addition to substrate binding. For example,
for isoleucyl tRNA synthetase:
- expect a differentiation between ilu and val of about 100
fold due to binding,
- in actuality see about a 50,000 fold discrimination in favor
of ilu.
- Explanation is twofold for the general case:
- For amino acids larger than ilu, the size will prevent them
from entering the active site, so high discrimination.
- For amino acids like valine which are smaller, they can enter
the active site and be attached to the tRNA, however a second,
hydrolytic site is present on the enzyme which will allow the
valyl residue to enter, but not isoleucyl, thus valyl-tRNA is
broken down, but isoleucyl-tRNA is released intact.
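As a rough check on the numbers above, the following Python sketch estimates what the editing site must contribute. It assumes, purely for illustration, that the binding and editing steps act independently, so that their discrimination factors multiply, and that the temperature is 37 °C. The conversion to a free energy difference uses ΔΔG = RT ln(factor). All names are hypothetical.

```python
import math

R = 8.314     # gas constant, J/(mol K)
T = 310.15    # assumed temperature: 37 degrees C expressed in kelvin

binding_discrimination = 100       # ilu vs val from binding alone (notes)
overall_discrimination = 50_000    # observed overall value (notes)

# Assumption: binding and editing act independently, so their
# discrimination factors multiply.
editing_discrimination = overall_discrimination / binding_discrimination


def ddg_kj_per_mol(factor: float) -> float:
    """Free energy difference equivalent to a discrimination factor."""
    return R * T * math.log(factor) / 1000.0


print(f"editing must contribute about {editing_discrimination:.0f} fold")
print(f"binding: {ddg_kj_per_mol(binding_discrimination):.1f} kJ/mol")
print(f"overall: {ddg_kj_per_mol(overall_discrimination):.1f} kJ/mol")
```

Under these assumptions the hydrolytic site would need to supply roughly a further 500 fold discrimination beyond what binding alone provides.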
Wobble and Code Degeneracy
Even though there are isoaccepting tRNAs (different tRNAs specific
for the same amino acid), it turns out that many tRNAs bind to
a number of different codons specifying the same amino acid!
This observation is explained by the "wobble hypothesis"
of Francis Crick. According to this model,
- the first two codon-anticodon pairings follow standard Watson-Crick
rules (i.e. A-U and G-C),
- however, the third base pair can exhibit a certain amount
of play or wobble, enabling a variety of specific non-Watson-Crick pairings.
The various pairing possibilities allowed by wobble are shown
in the table below (Table 27-4 in text):
(third anticodon/codon positions)
Anticodon base | Codon base
C | G
A | U
U | A or G
G | U or C
I | U, C, or A
(I = inosine)
- No cytosolic tRNAs are known which participate in non-wobble
pairings.
- No cytosolic tRNAs are known with A in the 3rd anticodon
position. Apparently the U-A pair is not permitted.
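The wobble rules can be turned into a small Python sketch that lists the codons a given anticodon should read. Both anticodon and codon are written 5' → 3' here, so the wobble base is the first letter of the anticodon as typed (it pairs with the third codon base). The dictionary follows Crick's wobble rules, the function name is made up, and the two anticodons are chosen only as examples.

```python
# Watson-Crick partners for the first two codon positions (RNA).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases allowed for each anticodon wobble base
# (Crick's wobble rules; I = inosine).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "UC", "I": "UCA"}


def codons_read(anticodon: str) -> list[str]:
    """Codons (5'->3') recognized by an anticodon written 5'->3'."""
    wobble_base, middle, last = anticodon
    first = WATSON_CRICK[last]      # pairs antiparallel with codon base 1
    second = WATSON_CRICK[middle]   # pairs with codon base 2
    return [first + second + third for third in WOBBLE[wobble_base]]


print(codons_read("GAA"))   # an anticodon with G in the wobble position
print(codons_read("IGC"))   # an anticodon with inosine in the wobble position
```

In the standard code the codons returned for GAA both specify phe, and those returned for IGC all specify ala, which is how one tRNA can serve several synonymous codons.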
The wobble hypothesis requires at least 31 tRNAs to translate
all 61 coding triplets, plus one for the special initiation tRNA. Most
cells have >32. All isoaccepting tRNAs in a cell are charged by the same
aminoacyl tRNA synthetase.
Note that the most frequently used codons (those specifying
the most frequently used amino acids) are complementary to the
most abundant tRNA species.
- Some mitochondrial tRNAs have more permissive wobble pairings.
Some also have unusual structures.
- Selenocysteine is specified by UGA, apparently with the aid
of context (local sequence) effects.
- Read about nonsense suppression (substitution of an amino
acid for a stop), which is due to tRNA mutation.
- Generally mutated tRNAs are minor members of isoaccepting
sets, and protein stop factors compete, so most proteins are
terminated normally.
Last modified 9 April 2009