From Wikipedia,
the free encyclopedia.
The genetic code is a
set of rules that
maps
DNA sequences to
proteins in the living
cell, and is employed in the
process of
protein synthesis. Nearly all
living things use the same genetic
code, called the standard
genetic code, although a few
organisms use minor variations of
the standard code.
Genome expression
The genetic information carried
by an organism - its
genome - is inscribed in one
or more
DNA molecules. Each functional
portion of a DNA molecule is
referred to as a
gene. Each gene is
transcribed into a short
template molecule of the related
polymer
RNA, which is better suited
for protein synthesis. This in
turn is
translated, by machinery
consisting of
ribosomes, a set of
transfer RNAs and associated
enzymes, into an
amino acid chain (polypeptide),
which will then be folded into a
protein.
The gene sequence inscribed in
DNA, and in RNA, is composed of
tri-nucleotide units called
codons, each coding for a
single amino acid. Each nucleotide
sub-unit consists of a
phosphate,
deoxyribose sugar and one of
the 4 nitrogenous
nucleotide bases grouped into
2 categories,
purine and
pyrimidine. The
purine bases
adenine (A) and
guanine (G) are larger
and consist of two aromatic rings.
The
pyrimidine bases
cytosine (C) and
thymine (T) are smaller
and only consist of one aromatic
ring. In RNA however, thymine (T)
is substituted by
uracil (U) and the
deoxyribose is substituted by
ribose.
Overall, there are 4³
= 64 different codon combinations.
For example, the RNA sequence
UUUAAACCC contains the codons UUU,
AAA and CCC, each of which
specifies one amino acid. So, this
RNA sequence represents a protein
sequence, three amino acids long.
(DNA
is also a sequence of nucleotide
bases, but there
thymine takes the place of
uracil.)
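The codon arithmetic above can be sketched in a few lines of Python. This is only an illustration: the names BASES, ALL_CODONS and split_codons are hypothetical and do not come from the article.

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases (hypothetical constant name)

# Every possible triplet over four bases: 4**3 = 64 codons.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def split_codons(rna):
    """Split an RNA string into consecutive triplets, ignoring a trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


print(len(ALL_CODONS))            # 64
print(split_codons("UUUAAACCC"))  # ['UUU', 'AAA', 'CCC']
```

Reading always starts at the first base here; the effect of starting elsewhere is the subject of the reading-frame section.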
The standard genetic code is
shown in the following tables.
Table 1 shows what amino acid
each of the 64 codons specifies.
Table 2 shows what codons
specify each of the 20 standard
amino acids involved in
translation. These are called
forward and reverse codon tables,
respectively. For example, the
codon AAU represents the amino
acid
asparagine (Asn), and
cysteine (Cys) is represented
by UGU and by UGC.
Table 1: RNA Codon table
1. The codon AUG both
codes for methionine and serves as
an initiation site: the first AUG
in an
mRNA's coding region is where
translation into protein begins.
2. This is a start codon
for
prokaryotes only.
Table 2: Reverse codon table
This table shows the 20
amino acids used in proteins,
and the codons that code for
each amino acid.
| Ala | A | GCU, GCC, GCA, GCG | Leu | L | UUA, UUG, CUU, CUC, CUA, CUG |
| Arg | R | CGU, CGC, CGA, CGG, AGA, AGG | Lys | K | AAA, AAG |
| Asn | N | AAU, AAC | Met | M | AUG |
| Asp | D | GAU, GAC | Phe | F | UUU, UUC |
| Cys | C | UGU, UGC | Pro | P | CCU, CCC, CCA, CCG |
| Gln | Q | CAA, CAG | Ser | S | UCU, UCC, UCA, UCG, AGU, AGC |
| Glu | E | GAA, GAG | Thr | T | ACU, ACC, ACA, ACG |
| Gly | G | GGU, GGC, GGA, GGG | Trp | W | UGG |
| His | H | CAU, CAC | Tyr | Y | UAU, UAC |
| Ile | I | AUU, AUC, AUA | Val | V | GUU, GUC, GUA, GUG |
| Start | | AUG, GUG | Stop | | UAG, UGA, UAA |
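As a rough sketch, the reverse table can be inverted programmatically to recover the forward (codon to amino acid) mapping. The dictionary below copies the entries of Table 2; the names REVERSE_TABLE, FORWARD_TABLE and translate are hypothetical, and the "Start" row is left out on the assumption that AUG and GUG keep their amino-acid meanings (Met and Val) in the forward table.

```python
# Reverse codon table, transcribed from Table 2 (the "Start" row is omitted).
REVERSE_TABLE = {
    "Ala": ["GCU", "GCC", "GCA", "GCG"],
    "Arg": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Asn": ["AAU", "AAC"],
    "Asp": ["GAU", "GAC"],
    "Cys": ["UGU", "UGC"],
    "Gln": ["CAA", "CAG"],
    "Glu": ["GAA", "GAG"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
    "His": ["CAU", "CAC"],
    "Ile": ["AUU", "AUC", "AUA"],
    "Leu": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "Lys": ["AAA", "AAG"],
    "Met": ["AUG"],
    "Phe": ["UUU", "UUC"],
    "Pro": ["CCU", "CCC", "CCA", "CCG"],
    "Ser": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Thr": ["ACU", "ACC", "ACA", "ACG"],
    "Trp": ["UGG"],
    "Tyr": ["UAU", "UAC"],
    "Val": ["GUU", "GUC", "GUA", "GUG"],
    "Stop": ["UAG", "UGA", "UAA"],
}

# Invert it: each codon maps to exactly one meaning.
FORWARD_TABLE = {
    codon: amino_acid
    for amino_acid, codons in REVERSE_TABLE.items()
    for codon in codons
}


def translate(rna):
    """Translate an RNA string from its first base, halting at the first stop codon."""
    peptide = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = FORWARD_TABLE[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return "-".join(peptide)


print(FORWARD_TABLE["AAU"])    # Asn
print(translate("UUUAAACCC"))  # Phe-Lys-Pro
```

The inversion yields 64 entries, one per codon, which is a quick consistency check on the reverse table itself.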
Marshall W. Nirenberg and
Heinrich J. Matthaei at the
National Institutes of Health
performed the experiments which
first elucidated the
correspondence between the codons
and the amino acids for which they
code.
Har Gobind Khorana expanded on
Nirenberg's work and found the
codes for the amino acids that
Nirenberg's methods could not.
Khorana and Nirenberg won a share
of the 1968
Nobel Prize in Physiology or
Medicine for this work.
Technical details
Stop Codons
In classical genetics, the stop
codons were given names: UAG was
amber, UGA was opal,
and UAA was ochre. These
names were originally the names of
the specific genes in which
mutation of each of these stop
codons was first detected.
Translation starts with a chain
initiation codon (start codon).
But unlike stop codons, these are
not sufficient to begin the
process; nearby initiation
sequences are also required to
induce transcription into
mRNA and binding by
ribosomes. The most notable
start codon is AUG, which also
codes for methionine. CUG and UUG,
and in
prokaryotes GUG and AUU, also
work.
Degeneracy of the genetic code
Many codons are degenerate
or redundant, meaning that
two or more codons may code for
the same amino acid. Degenerate
codons typically differ in their
third positions; e.g. both GAA and
GAG code for the amino acid
glutamic acid. A codon is said
to be four-fold degenerate
if any nucleotide at its third
position specifies the same amino
acid; it is said to be two-fold
degenerate if only two of four
possible nucleotides at its third
position specify the same amino
acid. In two-fold degenerate
codons, the equivalent third
position nucleotides are always
either two
purines (A/G) or two
pyrimidines (C/U). The
degeneracy of the genetic code is
what accounts for the existence of
silent mutations.
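A minimal sketch of how third-position degeneracy could be computed is given below. It assumes the standard code written as a compact 64-letter string in UCAG order, using one-letter amino acid symbols and "*" for stop; that string is not taken from the article but is meant to encode the same assignments as Table 2. All names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Assumed compact encoding of the standard code: codons in UCAG order, "*" = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_degeneracy(codon):
    """Count the third-position bases (the codon's own included) that keep the same meaning."""
    return sum(CODE[codon[:2] + base] == CODE[codon] for base in BASES)


# How many codons fall into each degeneracy class.
DEGENERACY_COUNTS = Counter(third_position_degeneracy(c) for c in CODE)

print(third_position_degeneracy("GAA"))  # 2 (GAA and GAG both give Glu)
print(third_position_degeneracy("GCU"))  # 4 (any GCx gives Ala)
```

By this count, isoleucine's codons come out as three-fold degenerate, a case the two-fold/four-fold description does not cover.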
Degeneracy follows from the number
of codes needed: 20 amino acids
plus a stop and a start signal (at
least 22 codes by this count).
Because there are four different
bases, triplet codons are the
shortest that can supply at least
22 different codes. If there were
two bases per codon, only 16 codes
would be available (4² = 16),
which is too few. Three bases per
codon give 4³ = 64 possible
codons. Since 64 exceeds the
number required, several codons
must specify the same amino acid.
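The counting argument can be checked with a short calculation. The function name min_codon_length is hypothetical, and the figure of 22 required codes is simply the one used in the paragraph above.

```python
def min_codon_length(n_bases, n_codes_needed):
    """Smallest codon length whose number of combinations covers the codes needed."""
    length = 1
    while n_bases ** length < n_codes_needed:
        length += 1
    return length


print(4 ** 2)                   # 16, too few for 22 codes
print(4 ** 3)                   # 64, enough
print(min_codon_length(4, 22))  # 3
```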
These properties of the genetic
code make it more fault-tolerant
for
point mutations. For example,
four-fold degenerate codons can
tolerate any point mutation at the
third position; two-fold
degenerate codons can tolerate one
out of the three possible point
mutations at the third position.
Since transition mutations (purine
to purine or pyrimidine to
pyrimidine mutations) are more
likely than transversion (purine
to pyrimidine or vice-versa)
mutations, the equivalence of
purines or that of pyrimidines at
two-fold degenerate sites adds a
further fault-tolerance.
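The fault-tolerance claim can be explored numerically. The sketch below counts, over all 64 codons, how many third-position point mutations leave the meaning unchanged, separately for transitions and transversions. It assumes the same compact UCAG-ordered encoding of the standard code as an illustration (one-letter symbols, "*" for stop), counts a stop-to-stop change as silent, and uses hypothetical names throughout.

```python
from itertools import product

BASES = "UCAG"
# Assumed compact encoding of the standard code: codons in UCAG order, "*" = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def is_transition(old, new):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (old != new)."""
    return (old in PURINES) == (new in PURINES)


def third_position_silent_counts():
    """Return {kind: [silent, total]} for all third-position point mutations."""
    counts = {"transition": [0, 0], "transversion": [0, 0]}
    for codon, meaning in CODE.items():
        for base in BASES:
            if base == codon[2]:
                continue  # not a mutation
            kind = "transition" if is_transition(codon[2], base) else "transversion"
            counts[kind][1] += 1
            if CODE[codon[:2] + base] == meaning:
                counts[kind][0] += 1
    return counts


for kind, (silent, total) in third_position_silent_counts().items():
    print(kind, silent, "of", total, "silent")
```

Under these assumptions, a much larger share of third-position transitions than transversions turns out to be silent, which is the pattern the paragraph describes.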
A practical consequence of
redundancy is that some errors in
a coding sequence cause only
a silent mutation, or a
substitution that does not change
the amino acid's
hydrophilic/hydrophobic
property; e.g. codons of the form
XUX (where X = any nucleotide)
tend to code for hydrophobic amino
acids. Even so, a single
point mutation causes the
modified haemoglobin molecule in
sickle-cell disease. The
hydrophilic
glutamate (Glu) is substituted
by the hydrophobic
valine (Val), which reduces the
solubility of β-globin. This
causes haemoglobin to form linear
polymers linked by the hydrophobic
interaction between the valine
groups, causing sickle-cell
deformation of erythrocytes.
Sickle cell disease is generally
not caused by a
de novo mutation. Rather
it is selected for in
malarial regions (in a similar
way to
thalassemia) as
heterozygous people have some
resistance to the malarial
Plasmodium parasite (heterozygote
advantage).
In general, these properties
are widely interpreted to form
part of the reason for the origin
of the standard genetic code [see
below].
Several codons can specify the
same amino acid because the
first base of the
anticodon can pair with more than
one base at the third codon
position; the base pair
formed is called a
wobble base pair. Wobble pairing
involves the modified base
inosine and the U-G base pair.
Only two amino acids are
specified by a single codon; one
of these is the amino-acid
methionine, specified by the
codon AUG, which also specifies
the start of translation; the
other is
tryptophan, specified by the
codon UGG.
Phase or reading frame of a
sequence
Note that a "codon" is entirely
defined by your starting position.
For example, the string GGGAAACCC,
if read from the first position,
contains the codons GGG, AAA and
CCC. If read from the second
position, it contains the codons
GGA and AAC (partial codons being
ignored). If read starting from
the third position, GAA and ACC.
Every DNA sequence can thus be
read in three reading frames,
each of which will produce a
radically different amino acid
sequence (in our example,
Gly-Lys-Pro, Gly-Asn, and Glu-Thr,
respectively). The actual frame a
protein sequence is translated in
is defined by a start codon,
usually the first occurrence of
AUG in the RNA sequence. Mutations
that disrupt the reading frame
(i.e. insertions or deletions of
one or two nucleotide bases)
severely impair the function of a
protein and are thus exceedingly
rare in
in vivo protein-coding
sequences, since they often lead
to death before an organism is
viable.
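The three reading frames of the example string can be reproduced with a short sketch. It assumes the standard code written as a compact 64-letter string in UCAG order (one-letter amino acid symbols as in Table 2, "*" for stop); that encoding and the names used are illustrative and not part of the article.

```python
from itertools import product

BASES = "UCAG"
# Assumed compact encoding of the standard code: codons in UCAG order, "*" = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(rna, frame):
    """Translate rna starting at offset frame (0, 1 or 2), ignoring partial codons."""
    seq = rna[frame:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODE[seq[i:i + 3]] for i in range(0, usable, 3))


for frame in range(3):
    print(frame, translate_frame("GGGAAACCC", frame))
# 0 GKP  (Gly-Lys-Pro)
# 1 GN   (Gly-Asn)
# 2 ET   (Glu-Thr)
```

This sketch does not search for a start codon; it only shows how the choice of offset changes the codons that are read.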
Origin of the genetic code
Numerous variations of the
standard genetic code are found in
mitochondria, which are
energy-producing
organelles.
Ciliate
protozoa also have some
variation in the genetic code: UAG
and often UAA code for glutamine
(a variant also found in some
green algae), or UGA codes for
cysteine. Another variant is found
in some species of the
yeast
Candida, where CUG codes for
serine. In some species of
bacteria and
archaea, standard stop codons are
reassigned to non-standard
amino acids: UGA can code
for
selenocysteine and UAG can
code for
pyrrolysine. There may be
other non-standard interpretations
that are not known.
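One way to picture such variants is as a small set of overrides applied to the standard table. The sketch below models two of the reassignments mentioned above (UAA and UAG to glutamine in ciliates, CUG to serine in Candida). The compact UCAG-ordered encoding of the standard code and all names are assumptions made for illustration, and the variants are simplified to the codons named in the text.

```python
from itertools import product

BASES = "UCAG"
# Assumed compact encoding of the standard code: codons in UCAG order, "*" = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def make_variant(overrides):
    """Return a copy of the standard table with some codons reassigned."""
    return {**STANDARD, **overrides}


# Simplified variants built only from the reassignments named in the text.
CILIATE = make_variant({"UAA": "Q", "UAG": "Q"})
CANDIDA = make_variant({"CUG": "S"})


def translate(rna, table):
    """Translate from the first base with the given table, halting at a stop codon."""
    peptide = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        symbol = table[rna[i:i + 3]]
        if symbol == "*":
            break
        peptide.append(symbol)
    return "".join(peptide)


message = "CUGUAAGGG"
print(translate(message, STANDARD))  # L
print(translate(message, CILIATE))   # LQG
print(translate(message, CANDIDA))   # S
```

The same message yields three different products, which shows why the applicable code must be known before a sequence is translated.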
Despite these variations, the
genetic codes used by all known
forms of life on Earth are very
similar. Since there are many
possible genetic codes that are
thought to have similar utility to
the one used by Earth life, the
theory of
evolution suggests that the
genetic code was established very
early in the history of life.
One can ask the question: is
the genetic code completely
random, just one set of codon-amino
acid correspondences that happened
to establish itself and be "frozen
in" early in evolution, although
functionally any other of
the near-infinite set of possible
translation tables would have
done just as well? Even a
cursory look at the table shows
patterns that suggest that this is
not the case.
There are three themes running
through the many theories that
seek to explain the evolution of
the genetic code (and hence the
origin of these patterns) [1].
One is illustrated by recent
aptamer experiments which show
that some amino acids have a
selective chemical affinity for
the base triplets that code for
them. [2]
This suggests that the current,
complex translation mechanism
involving
tRNA and associated enzymes
may be a later development, and
that originally, protein sequences
were directly templated on base
sequences. Another is that the
standard genetic code that we see
today grew from a simpler, earlier
code through a process of
"biosynthetic expansion". Here the
idea is that primordial life
'invented' new amino acids (e.g.
as by-products of metabolism) and
later back-incorporated some of
these into the machinery of
genetic coding. Although much
circumstantial evidence has been
found to indicate that originally
the number of different amino
acids used may have been
considerably smaller than today [3],
precise and detailed hypotheses
about exactly which amino acids
entered the code in exactly what
order have proved far more
controversial [4][5].
A third is that
natural selection organized
the codon assignments of the
genetic code to minimize the
effects of genetic errors (mutations) [6].
Other Resources
There are several books
available online that go into
great detail on this topic. They
are available through the
NCBI Bookshelf, maintained by
the
United States National Institutes
of Health. In particular the
following books would be useful to
consult:
- Griffiths, Anthony J.F.;
Miller, Jeffrey H.; Suzuki,
David T.; Lewontin, Richard C.;
Gelbart, William M. (1999).
Introduction to Genetic
Analysis (7th ed.). New
York: W. H. Freeman & Co.
ISBN 0-7167-3771-X
- Alberts, Bruce; Johnson,
Alexander; Lewis, Julian; Raff,
Martin; Roberts, Keith; Walter,
Peter. (2002).
Molecular Biology of the Cell
(4th ed.). New York: Garland
Publishing.
ISBN 0815332181
- Lodish, Harvey; Berk,
Arnold; Zipursky, S. Lawrence;
Matsudaira, Paul; Baltimore,
David; Darnell, James E. (1999).
Molecular Cell Biology
(4th ed.). New York: W. H.
Freeman & Co.
ISBN 0-7167-3706-X
There is also a themed wiki
devoted to the topic of how the
genetic code evolved, and its
effects on the subsequent
evolution of the genome:
References
- Note
1: e.g. see Knight,
R.D.; Freeland S. J. and
Landweber, L.F. (1999)
The 3 Faces of the Genetic Code.
Trends in the Biochemical
Sciences 24(6),
241-247.
- Note
2: Knight, R.D. and
Landweber, L.F. (1998).
Rhyme or reason: RNA-arginine
interactions and the genetic
code. Chemistry & Biology
5(9), R215-R220.
- Note
3: Brooks, Dawn J.;
Fresco, Jacques R.; Lesk, Arthur
M.; and Singh, Mona. (2002).
Evolution of Amino Acid
Frequencies in Proteins Over
Deep Time: Inferred Order of
Introduction of Amino Acids into
the Genetic Code.
Molecular Biology and Evolution
19, 1645-1655.
- Note
4: Amirnovin R.
(1997)
An analysis of the metabolic
theory of the origin of the
genetic code. Journal of
Molecular Evolution 44(5),
473-6.
- Note
5: Ronneberg T.A.;
Landweber L.F. and Freeland S.J.
(2000)
Testing a biosynthetic theory of
the genetic code: Fact or
artifact? Proceedings of
the National Academy of
Sciences, USA 97(25),
13690-13695.
- Note
6: e.g. see review by
Freeland S.J.; Wu T. and
Keulmann N. (2003)
The Case for an Error Minimizing
Genetic Code. Orig Life
Evol Biosph. 33(4-5),
457-77.
See also