From Wikipedia,
the free encyclopedia.
The Protein Data Bank (PDB)
is a repository for 3-D structural
data of
proteins and
nucleic acids. These data,
typically obtained by
X-ray crystallography or
NMR spectroscopy, are submitted
by
biologists and
biochemists from around the
world, are released into the
public domain, and can be
accessed free of charge. The database is
the central repository for
biological structural data.
History
Founded in
1971 by
Brookhaven National Laboratory,
the Protein Data Bank was
transferred in
1998 to the Research
Collaboratory for Structural
Bioinformatics (RCSB), which is
composed of
Rutgers University, the
University of Wisconsin, Madison,
NIST and the
San Diego Supercomputer Center.
Funding comes from the
National Science Foundation,
Department of Energy,
National Library of Medicine
and the
National Institute of General
Medical Sciences. The
European Bioinformatics Institute
in the
UK and the
Institute for Protein Research
in
Japan also collect, process
and submit data files.
The PDB is a key resource in
structural biology and is
critical to more recent work in
structural genomics.
Many derived databases and
projects have been developed to
integrate and classify the PDB
in terms of
protein structure,
protein function and
protein evolution.
Growth
When the PDB was founded it
contained just 7 protein
structures. Since then the number
of structures has grown
approximately exponentially, and
the growth shows no sign of
slowing.
The growth rate of the PDB has
been the subject of fairly
extensive analysis.
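As a rough illustration of what exponential growth means here, the
sketch below computes the average growth rate implied by two figures
quoted in this article: 7 structures at the 1971 founding and 27,428
released entries as of 1 October 2004. It assumes a pure exponential
N(t) = N0 * exp(r * t) between those two endpoints and an elapsed time
rounded to 33 years; it is not a fit to the real yearly holdings, and
the function name is made up for this example.

```python
import math


def implied_growth(n_start, n_end, years):
    """Return (rate per year, doubling time in years).

    Assumes N(t) = n_start * exp(rate * t) holds exactly between the
    two endpoints, which is only an approximation of the real curve.
    """
    rate = math.log(n_end / n_start) / years
    return rate, math.log(2) / rate


# Figures quoted in this article: 7 structures at founding (1971) and
# 27,428 released entries as of 1 October 2004.
# Assumption: elapsed time rounded to 2004 - 1971 = 33 years.
rate, doubling = implied_growth(7, 27428, 2004 - 1971)
print(f"average rate: {rate:.3f} per year")
print(f"doubling time: {doubling:.2f} years")
```

Under these assumptions the average rate comes out at about 0.25 per
year, that is, a doubling time of a little under three years.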
Contents
As of
1 October
2004, the database contained
27,428 released atomic coordinate
entries (or "structures") and took
in about 2,000-3,000 new ones per
year. Data are stored in the
mmCIF format specifically
developed for the purpose.
Note that the database stores
information about the exact
location of all
atoms in a large biomolecule;
if one is only interested in
sequence data, i.e. the list
of
amino acids making up a
particular
protein or the list of
nucleotides making up a
particular
nucleic acid, the much larger
databases from
Swiss-Prot and the
International Nucleotide Sequence
Database Collaboration should
be used.
Statistics
As of
22 February
2005, the "PDB Holdings List"
at
RCSB reported the following
statistics:
Through the years the PDB has
undergone many changes and
revisions. Its original format was
dictated by the width of computer
punch cards.
This legacy has caused many
problems, and consequently the PDB
has three distinct 'clean-up'
projects: the MMDB, the MSD and
the Data Uniformity Project.
Each of these grant-funded
projects has attempted to achieve
the same goal via different
routes. The Data Uniformity
Project is hosted by the RCSB (the
current home of the PDB). Each
uses the original PDB data to
derive a new format: the MMDB uses
ASN.1 (and an XML conversion of
this format); the MSD uses a
relational database; and the Data
Uniformity Project uses mmCIF (and
another XML conversion of this
format).
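The punch-card origin of the legacy format means that each record is a
fixed-width line in which a field is identified by its column
position, not by a delimiter. The sketch below illustrates this for
the first 54 columns of an ATOM record, using the column ranges of the
published legacy format as understood here. The helper names
format_atom and parse_atom and the sample values are made up for
illustration and do not come from a real entry; atom-name alignment is
simplified.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal fixed-width ATOM line (columns 1-54 only)."""
    return (
        f"{'ATOM':<6}"      # cols 1-6   record name
        f"{serial:>5}"      # cols 7-11  atom serial number
        " "                 # col 12     unused
        f"{name:<4}"        # cols 13-16 atom name (alignment simplified)
        " "                 # col 17     alternate location indicator
        f"{res_name:>3}"    # cols 18-20 residue name
        " "                 # col 21     unused
        f"{chain:1}"        # col 22     chain identifier
        f"{res_seq:>4}"     # cols 23-26 residue sequence number
        "    "              # cols 27-30 insertion code and unused columns
        f"{x:8.3f}{y:8.3f}{z:8.3f}"  # cols 31-54 x, y, z coordinates
    )


def parse_atom(line):
    """Read the same fields back by column position (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Illustrative values only, not taken from a real PDB entry.
line = format_atom(1, "CA", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom(line)
print(repr(line))
print(atom)
```

Because the fields are positional, a value that outgrows its columns
(for example an atom serial number above 99,999) cannot be represented
without changing the layout, which is one kind of problem the newer
formats avoid.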
Some regard this diversity as a
good thing; others argue that,
without a universal repository of
information (i.e., a common
dictionary), it is hard to be sure
that different resources are
describing the same thing.
Each structure published in the
PDB receives a four-character
alphanumeric identifier, its PDB
ID. This should not be used as an
identifier for biomolecules, since
several structures of the same
molecule (in different
environments or conformations) are
often contained in the PDB under
different PDB IDs.
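The two points above can be sketched in code: a check that a string
has the shape of a PDB ID, and a grouping step showing why an ID names
a structure and not a molecule. The check implements only the rule
stated here (exactly four letters or digits); any stricter convention
is deliberately left out. The function names, the IDs and the molecule
labels are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: an ID is exactly four ASCII letters or digits,
# as described in the text above.
_PDB_ID = re.compile(r"[0-9A-Za-z]{4}")


def is_pdb_id(text):
    """Return True if text has the shape of a PDB ID."""
    return _PDB_ID.fullmatch(text) is not None


def ids_by_molecule(entries):
    """Group (pdb_id, molecule) pairs by molecule.

    One molecule may map to several IDs, so the ID alone does not
    identify the molecule.
    """
    grouped = defaultdict(list)
    for pdb_id, molecule in entries:
        if not is_pdb_id(pdb_id):
            raise ValueError(f"not a PDB ID: {pdb_id!r}")
        grouped[molecule].append(pdb_id.upper())
    return dict(grouped)


# Hypothetical entries: the IDs and molecule labels are placeholders.
entries = [
    ("1abc", "molecule A"),
    ("2abc", "molecule A"),
    ("3xyz", "molecule B"),
]
print(ids_by_molecule(entries))
```

Upper-casing the IDs is a normalisation choice made for this sketch so
that the same entry written in different cases groups together.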
When a biologist submits
structure data for a protein or
nucleic acid, PDB staff review
and annotate the data, which are
then automatically checked for
plausibility. The
source code for this
validation software has been
released for free. The main
database accepts only experimentally
derived structures, and not
theoretically predicted ones (see
protein structure prediction).
Various funding agencies and
scientific journals now require
scientists to submit their
structure data to the PDB.
Viewing the data
The structural data can be used
to visualize the
biomolecules with appropriate
software, such as
RasMol,
Chime or a
VRML plugin for a
web browser. The PDB website
also contains resources for
education, structural genomics,
and related software.