From Wikipedia,
the free encyclopedia.
In
molecular biology and
bioinformatics, a consensus
sequence is a way of
representing the results of a
multiple
sequence alignment, where
related sequences are compared to
each other, and similar functional
sequence motifs are found. The
consensus sequence shows which
residues are conserved (are always
the same), and which residues are
variable.
Developing software for
pattern recognition is a major
topic in
genetics,
molecular biology, and
bioinformatics. Specific
sequence motifs can function
as
regulatory sequences
controlling biosynthesis, or as
signal sequences that direct a
molecule to a specific site within
the cell or regulate its
maturation. Since the regulatory
function of these sequences is
important, they are thought to be
conserved across long periods of
evolution. In some cases,
evolutionary relatedness can be
estimated by the amount of
conservation of these sites.
The conserved sequence motifs
are called consensus sequences
and they show which residues are
conserved and which residues are
variable. Consider the following
example
DNA sequence:
- A[CT]N{A}
In this notation, A means that
an A is always found at that
position; [CT] stands for either C
or T; N stands for any base; and
{A} means any base except A.
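The notation maps closely onto regular-expression character classes, so it can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names `pattern_to_regex` and `find_sites` and the test sequence are hypothetical, and the alphabet is assumed to be the four unambiguous DNA bases.

```python
import re

DNA_BASES = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes besides N


def pattern_to_regex(pattern: str) -> str:
    """Translate A[CT]N{A}-style notation into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            # [CT] -> any one of the listed bases
            j = pattern.index("]", i)
            parts.append("[" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "{":
            # {A} -> any base except the listed ones
            j = pattern.index("}", i)
            excluded = pattern[i + 1:j]
            allowed = "".join(b for b in DNA_BASES if b not in excluded)
            parts.append("[" + allowed + "]")
            i = j + 1
        elif ch == "N":
            # N -> any base
            parts.append("[" + DNA_BASES + "]")
            i += 1
        else:
            # a plain letter must match exactly
            parts.append(ch)
            i += 1
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return (position, matched text) for every match, including overlapping ones."""
    # A lookahead with a capture group lets overlapping sites be reported.
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(pattern_to_regex("A[CT]N{A}"))
    print(find_sites("A[CT]N{A}", "GACGTATAAACTC"))
```

Positions are zero-based. The sketch reports whether a site matches, but like the notation itself it carries no information about how often each allowed base occurs.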
In this example, the notation
[CT] does not give any indication
of the relative frequency of C or
T occurring at that position. An
alternative method of representing
a consensus sequence uses a
sequence logo. This is a
graphical representation of the
consensus sequence, in which the
size of a symbol is related to the
frequency with which a given nucleotide
(or amino acid) occurs at a
certain position. In sequence
logos, the more conserved a
residue is, the larger its symbol is
drawn; the less frequent it is, the
smaller the symbol.
Sequence logos can be generated
using the
Gestalt Workbench, a
publicly available visualization
tool written by Gustavo Glusman at
the
Institute for Systems Biology.
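The quantities behind a logo can be sketched without any drawing code. The example below computes per-position base frequencies from a set of aligned sites and, as many logo tools are understood to do, scales them by the information content of the column in bits. It is an illustrative sketch only: the function names and the four example sites are made up, no small-sample correction is applied, and it is not a description of how the Gestalt Workbench works internally.

```python
import math
from collections import Counter

DNA_BASES = "ACGT"  # assumption: DNA sites without gaps or ambiguity codes


def column_frequencies(aligned_sites):
    """Return one {base: frequency} dict per alignment column."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    frequencies = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        frequencies.append({b: counts[b] / len(column) for b in DNA_BASES})
    return frequencies


def information_content(freq):
    """Bits of information in a column: log2(4) minus its Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return math.log2(len(DNA_BASES)) - entropy


def symbol_heights(freq):
    """Height of each base symbol: its frequency times the column information."""
    bits = information_content(freq)
    return {base: p * bits for base, p in freq.items()}


if __name__ == "__main__":
    sites = ["ACGT", "ACGA", "ATGC", "ACGG"]  # invented example alignment
    for position, freq in enumerate(column_frequencies(sites)):
        print(position, round(information_content(freq), 3), symbol_heights(freq))
```

In this toy alignment the first column is fully conserved and carries the maximum of 2 bits, while the last column, where all four bases are equally frequent, carries none and would be drawn with negligible height.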
A consensus sequence may be a
short sequence of
nucleotides which is found
several times in the
genome and is thought to play
the same role in its different
locations.
For example, many
transcription factors
recognise particular consensus
sequences in the promoters of the
genes they regulate. In the same
way, restriction enzymes usually
have palindromic consensus
sequences, which generally correspond
to the site where they cut the
DNA. Finally,
splice sites (sequences
immediately surrounding the
exon-intron
boundaries) can also be considered
as consensus sequences.
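In this context, "palindromic" means that a site reads the same on both strands, that is, it equals its own reverse complement. A minimal check can be sketched as follows; the function names are hypothetical, and GAATTC (the EcoRI recognition site) is used as the example.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site is identical to its reverse complement."""
    return site == reverse_complement(site)


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))   # EcoRI recognition site
    print(is_palindromic("GATTACA"))  # arbitrary non-palindromic sequence
```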
Thus, a consensus sequence
defines a putative
DNA recognition site. It is
obtained by aligning all known
examples of a given recognition
site and is defined as the idealized
sequence that represents the
predominant base at each position.
Actual examples should not
differ from the consensus by more
than a few substitutions.
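The procedure described here, taking the predominant base in each column of an alignment and then counting how far each example departs from the result, can be sketched as below. The function names and the five example sites are invented for illustration, and ties between equally frequent bases are resolved arbitrarily (by order of first appearance), which a real tool would need to handle more carefully.

```python
from collections import Counter


def consensus(aligned_sites):
    """Return the most frequent base at each position of the alignment."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_sites)
    )


def substitutions(site: str, reference: str) -> int:
    """Count positions at which a site differs from the reference."""
    return sum(a != b for a, b in zip(site, reference))


if __name__ == "__main__":
    # Invented example sites, already aligned and of equal length.
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT"]
    cons = consensus(sites)
    print(cons)
    print([substitutions(site, cons) for site in sites])
```

With these example sites, every site differs from the consensus at no more than one position.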
See also