Protein structure prediction
is one of the most significant
tasks tackled in
computational structural biology
and
theoretical chemistry. It has
the aim of determining the
three-dimensional structure of
proteins from their
amino acid sequences. In more
formal terms, this is the
prediction of protein
tertiary structure from
primary structure. Given the
usefulness of known protein
structures in such valuable tasks
as
rational drug design, this is
a highly active field of research.
Every two years, the
performance of current methods is
assessed in the
CASP experiment.
Overview
The practical role of protein
structure prediction is now more
important than ever. Massive
amounts of protein sequence data
may be derived from modern
large-scale
DNA sequencing efforts of, for
example, the
Human Genome Project. The
output of experimentally
determined protein structures,
typically by time-consuming and
relatively expensive
X-ray crystallography or
NMR spectroscopy, is lagging
far behind the output of protein
sequences.
A number of factors exist that
make protein structure prediction
a very difficult task, including:
- The number of possible
structures that proteins may
possess is extremely large, as
highlighted by the
Levinthal paradox.
- The physical basis of
protein structural stability is
not fully understood.
- The primary sequence may not
fully specify the tertiary
structure. For example, proteins
known as
chaperones have the
ability to induce proteins to
fold in specific ways.
- Direct simulation of
protein folding via methods
such as
molecular dynamics is not
generally tractable for both
practical and theoretical
reasons. However, the
distributed computing project,
Folding@home, is tackling
such simulation difficulties.
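The scale of the search space behind the Levinthal paradox can be
illustrated with a back-of-the-envelope calculation. The sketch below is
only an illustration: the three conformations per residue, the
100-residue chain and the sampling rate of 10^13 conformations per
second are assumed values chosen for the example, not figures taken from
this article.

```python
# Rough Levinthal-style estimate. All numeric inputs are illustrative assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them all)."""
    conformations = states_per_residue ** n_residues
    years = conformations / samples_per_second / SECONDS_PER_YEAR
    return conformations, years


conformations, years = levinthal_estimate(100)
print(f"{conformations:.2e} conformations, about {years:.2e} years to enumerate")
```

Under these assumptions the chain has on the order of 10^47
conformations, and exhaustive enumeration would take on the order of
10^27 years, which is why exhaustive search is not a viable prediction
strategy.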
Despite the above hindrances,
much progress is being made by the
many research groups that are
interested in the task. Prediction
of structures for small proteins
is now a perfectly realistic goal.
A wide range of approaches are
routinely applied for such
predictions. These approaches may
be classified into two broad
classes:
de novo modelling and
comparative modelling.
For a fictional account of the
kind of progress that might be
realized in molecular biology if
the protein folding problem is
solved, see the novel
Ninth Day of Creation,
published in
2000. In the book, a genuine
protein fold predictive algorithm
is supposedly realized and put into
practice in an application known
as BLINDFOLD.
De novo protein modelling
De novo (or ab initio) protein
modelling methods seek to build
three-dimensional protein models
"from scratch". There are many
possible procedures that either
attempt to mimic
protein folding or apply some
stochastic method to search
possible solutions (i.e.
global optimization of a
suitable energy function). These
procedures tend to require vast
computational resources, and have
thus only been carried out for
tiny proteins. Predicting protein
structure de novo for larger
proteins will require better
algorithms and larger
computational resources, such as
those afforded by either powerful
supercomputers (such as
Blue Gene) or distributed
computing (see
Human Proteome Folding Project).
Although these computational
barriers are vast, the potential
benefits of structural genomics
(by predicted or experimental
methods) make de novo structure
prediction an active research
field.
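The idea of a stochastic search over an energy function can be sketched
on a deliberately simplified stand-in for a real protein. The example
below uses a two-dimensional hydrophobic-polar (HP) lattice model with
simulated annealing. This toy model is swapped in for illustration and
is not a method described in this article; the energy function (minus
one per non-bonded H-H contact), the move set, the cooling schedule and
all function names are assumptions made for the sketch.

```python
import math
import random

# Toy 2D HP lattice model: residues are "H" (hydrophobic) or "P" (polar).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def coords(directions):
    """Turn lattice steps into (x, y) positions; None if the chain overlaps itself."""
    pos = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        x, y = pos[-1]
        pos.append((x + dx, y + dy))
    return pos if len(set(pos)) == len(pos) else None


def energy(sequence, pos):
    """-1 for each pair of H residues that are lattice neighbours but not chain neighbours."""
    index = {p: i for i, p in enumerate(pos)}
    total = 0
    for i, (x, y) in enumerate(pos):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # right and up only, so each pair is counted once
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                total -= 1
    return total


def anneal(sequence, steps=20000, t_start=2.0, t_end=0.1, seed=0):
    """Simulated annealing over chain directions; returns (best energy, best directions)."""
    rng = random.Random(seed)
    dirs = ["R"] * (len(sequence) - 1)  # start from a fully extended chain
    e = energy(sequence, coords(dirs))
    best = (e, list(dirs))
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        trial = list(dirs)
        trial[rng.randrange(len(trial))] = rng.choice("UDLR")
        trial_pos = coords(trial)
        if trial_pos is None:
            continue  # reject self-overlapping conformations
        trial_e = energy(sequence, trial_pos)
        # Metropolis criterion: always accept improvements, sometimes accept worse states.
        if trial_e <= e or rng.random() < math.exp((e - trial_e) / t):
            dirs, e = trial, trial_e
            if e < best[0]:
                best = (e, list(dirs))
    return best


best_energy, best_dirs = anneal("HPHPPHHPHPPHPHHPPHPH")
print(best_energy, "".join(best_dirs))
```

Even in this toy setting the number of self-avoiding conformations grows
exponentially with chain length, which gives a sense of why realistic
all-atom searches demand far greater computational resources.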
Comparative protein modelling
Comparative protein modelling
uses previously solved structures
as starting points, or templates.
This is effective because it
appears that although the number
of actual proteins is vast, there
is a limited set of
tertiary
structural motifs to which
most proteins belong. It has been
suggested that there are only
around 2000 distinct protein folds
in nature, though there are many
millions of different proteins.
These methods may also be split
into two groups:
- Homology modelling is
based on the reasonable
assumption that two
homologous proteins will
share very similar structures.
Given the amino acid sequence of
an unknown structure and the
solved structure of a homologous
protein, each amino acid in the
solved structure is
mutated, computationally,
into the corresponding amino
acid from the unknown structure.
- Protein threading
scans the amino acid sequence of
an unknown structure against a
database of solved structures.
In each case, a scoring function
is used to assess the
compatibility of the sequence to
the structure, thus yielding
possible three-dimensional
models.
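The scanning-and-scoring step of threading can be sketched as follows.
This is a minimal illustration under strong assumptions: each template
is reduced to a hypothetical string of per-position environments ("B"
for buried, "E" for exposed), the scoring function simply rewards
hydrophobic residues in buried positions and polar residues in exposed
ones, and the placement is ungapped. The template names, the
hydrophobic residue set and the scoring values are invented for the
example; practical threading methods use much richer scoring functions
and allow gaps.

```python
# Toy threading: score a sequence against templates described only by burial pattern.
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set, for illustration only


def position_score(residue, environment):
    """+1 if the residue type suits the environment ('B' buried, 'E' exposed), else -1."""
    fits = (residue in HYDROPHOBIC) == (environment == "B")
    return 1 if fits else -1


def thread(sequence, template_env):
    """Best ungapped placement on one template as (score, offset), or None if it does not fit."""
    best = None
    for offset in range(len(template_env) - len(sequence) + 1):
        score = sum(
            position_score(res, template_env[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def rank_templates(sequence, templates):
    """Score the sequence against every template; return (score, name, offset) best first."""
    results = []
    for name, env in templates.items():
        hit = thread(sequence, env)
        if hit is not None:  # skip templates shorter than the sequence
            results.append((hit[0], name, hit[1]))
    return sorted(results, reverse=True)


# Hypothetical template library.
templates = {"fold_A": "EEBBBEEBBE", "fold_B": "BEBEBEBEBE"}
print(rank_templates("KLVIDE", templates))
```

The highest-scoring template and offset suggest which solved structure,
and which region of it, the query sequence is most compatible with under
the chosen scoring function.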