"The price of greatness is responsibility." Sir Winston Churchill


Search the IBPA



Top Menu

Menu Sidebar

IBPA Issues
About IBPA
IBPA Constitution
FAQ-s
IBPA Events
Individual Membership
Institutional Membership
IBPA Forums / Groups
Cooperation with IBPA
Links

Publications
IBPA Careers Newsletter
Past Issues
Industry Publications
Promote Yourself within the Industry
Submit Your Article

Career Center: Employers
Job Posting
Free Resume Database
Volunteers Database

Career Center: Job Seekers
Now Hiring
Submit Resume
Career Training
Nurses Careers in Biopharm
Scholarship Programs
Internship Programs
Resume Editing & Interview Coaching
Volunteer for the Industry
Download IBPA Career Info Brochure

Industry Directories and Listings
Pharmaceutical Companies
Contract Research Organizations
Professional Associations
Recruiters and Staffing Agencies
Clinical Research Centers
Consulting Companies
Education & Training Institutions
Jobs and Resume Searching Directories
Research and Development Companies
Industry Service Providers
List Your Company

Investor's Center
Offers
Calls

Contact IBPA
USAChapter
Canadian Chapter
European Chapter
Asian Chapter

Start Your Career in Biotech with IBPA Scholarship Programs
Untitled Document



Subscribe to our "Careers in the Biopharmaceutical Industry" newsletter:

Name*:

Email*:

City:

Country:

Phone:

To unsubscribe, click here

 

 

Sequence alignment

From Wikipedia, the free encyclopedia.

 

Sequence alignment is an arrangement of two or more sequences, highlighting their similarity. The sequences are padded with gaps (usually denoted by dashes) so that wherever possible, columns contain identical or similar characters from the sequences involved:

 tcctctgcctctgccatcat---caaccccaaagt

 |||| ||| ||||| |||||   ||||||||||||

 tcctgtgcatctgcaatcatgggcaaccccaaagt

It is usually used to study the evolution of the sequences from a common ancestor, especially biological sequences such as protein sequences or DNA sequences. Mismatches in the alignment correspond to mutations, and gaps correspond to insertions or deletions. Sequence alignment can also be used to study things like the evolution of languages and the similarity between texts.

The term sequence alignment may also refer to the process of constructing such alignment or finding significant alignments in a database of potentially unrelated sequences.

Contents

[hide]

[edit]

 

Pairwise alignment

Pairwise sequence alignment methods are concerned with finding the best-matching piecewise (local) or global alignments of protein (amino acid) or DNA (nucleic acid) sequences.

Typically, the purpose of this is to find homologues (relatives) of a gene or gene-product in a database of known examples. This information is useful for answering a variety of biological questions. The most important application of pairwise alignment is identification of sequences of unknown structure or function. Another important use is the study of molecular evolution.

[edit]

 

Global alignment

A global alignment between two sequences is an alignment in which all the characters in both sequences participate in the alignment. Global alignments are useful mostly for finding closely-related sequences. As these sequences are also easily identified by local alignment methods global alignment is now somewhat deprecated as a technique. Further, there are several complications to molecular evolution (such as domain shuffling - see below) which prevent these methods from being useful.

[edit]

 

Local alignment

Local alignment methods find related regions within sequences - in other words they can consist of a subset of the characters within each sequence. For example, positions 20-40 of sequence A might be aligned with positions 50-70 of sequence B.

This is a more flexible technique than global alignment and has the advantage that related regions which appear in a different order in the two proteins (which is known as domain shuffling) can be identified as being related. This is not possible with global alignment methods.

[edit]

 

Significance of alignment

The typical assumption in the use of alignment is the mechanism of molecular evolution. DNA carries over genetic material from generation to generation, by virtue of its semi-conservative duplication mechanism. Changes in the material are introduced by occasional errors and mutations in the duplication, and by viruses and other mechanisms which sometimes move sub-sequences within the chromosome and between individuals. Consequently, an alignment between sequences indicates that the sequences evolved from a common ancestor which contained the matching subsequences. In the case of genetic sequences, it implies that their carriers evolved from a common ancestor.

Using assumptions about the probabilities of these change events, we can estimate the time when sequences diverged from a common ancestor or the time required for changing one sequence into another. However, there is disagreement about the value and nature of these probabilities for biological evolution. One school of thought assumes a simple, constant rate of change (an application of Occam's Razor) while the other school assumes short evolutionary periods when changes were extremely high.

Many genetic changes are evolutionarily non-significant (that is, neutral, neither positive or negative) in that they do not affect the selective advantage (the fitness) of the carrier. This is the basis of Kimura's Neutral theory. This is partly supported by examples such as the Globin protein sequence which shows normal functionality in-spite of several changes at various bases but loses functionality only when changes occur at critical sequence locations.

The actual biological meaning of any alignment can never be absolutely guaranteed. However, statistical methods can be used to assess the likelihood of finding an alignment between two regions (or sequences) by chance, given the size of the database and its composition.

Two important related issues for sequence alignment are:

  1. How should we choose the best alignment between two sequences (or regions of sequences)?
  2. How do we rank the alignments between a query and the many sequences in the database, according to their significance in the domain (such as biological significance)?

In studying evolution, the first question can be addressed by developing a model of how likely are certain changes between characters in the sequence. There are many ways to do this, none of which is generally superior. The models are derived empirically using related sequences, and are expressed as substitution matrices. These matrices are used by the algorithms named below to evaluate alternative alignments between two sequences. The actual biological quality of the alignments then depends upon the evolutionary model used to generate the score.

The second question is purely statistical. Research has determined a few hard theoretical rules and many approximations. It is now generally accepted that the scores of alignments between random sequences follow the extreme value distribution. Pairwise alignment programs such as BLAST use simulation to estimate the parameters of this distribution given a particular query, database, substitution matrix and certain other parameters. Alignments can then be given a statistical significance value, allowing inference of possible relationships between sequences.

[edit]

 

Structural alignment

One potential way to validate sequence alignments is to use structural alignments where possible. Because protein structure is more conserved through evolution than is sequence (as shown by Cyrus Chothia and Arthur Lesk in 1986), structural alignments can be more reliable over long evolutionary distances, when the sequences have diverged so much that simple sequence comparison cannot detect their similarity. However, although structural alignments are useful, they are completely artifactual.

[edit]

 

Multiple alignment

Multiple alignment is an extension of pairwise alignment to incorporate more than two sequences into an alignment. Multiple alignment methods try to align all of the sequences in a specified set. Alignments help in the identification of common regions between the sequences. There are several approaches to creating multiple sequence alignments, one of the most popular being the progressive alignment strategy used by the Clustal family of programs. Clustal is used in cladistics to build phylogenetic trees, and to build sequence profiles which are used by PSI-BLAST and Hidden Markov model- (HMM-) methods to search sequence databases for more distant relatives.

Multiple sequence alignment is computationally difficult and is classified as an NP-Hard problem.

[edit]

 

Algorithms

[edit]

 

Needleman-Wunsch

Pairwise global alignment only.

[edit]

 

Smith-Waterman

Pairwise, local or global alignment. It works by generating all six possible translations of each nucleotide sequence, then uses those translations for protein search.

[edit]

 

Framesearch

This is an extension of Smith-Waterman, for pairwise alignment between a protein sequence and a nucleotide sequence.

Framesearch dynamically considers every possible single-nucleotide insertion or deletion (indel) to generate the translation that best matches the protein sequence. This is unlike six-frame-translated BLAST or Smith-Waterman, which generate all six possible translations of each nucleotide sequence and use them for protein search. This allows dramatically better alignments when the nucleotide sequence contains many indels, as is often the case with single-read expressed sequence tag sequences and very early versions of genomic sequences.

Unfortunately Framesearch is extremely CPU-intensive. Its practical use typically requires a large Computer cluster or specialized computer hardware optimized for running dynamic programming algorithms on a large scale. Several companies sell such systems, which use either custom integrated circuits or Field-programmable gate array technology.

[edit]

 

Software

[edit]

 

SSearch

Implements the standard Smith-Waterman algorithm. It is considerably slower than the more modern BLAST and FASTA methods. However, Smith-Waterman remains the gold standard for protein-protein or nucleotide-nucleotide pairwise alignment; its speed can be improved by using specialized hardware or a computer cluster.

[edit]

 

BLAST (Basic Local Alignment Search Tool)

Main article: BLAST

This method uses a pre-computed hash table to serve as an index for short sequences. Given a query sequence, the sub-sequences are looked up in the index to reduce the amount of time and searching involved. Several parameters need to be provided to make this method faster or more accurate. Once patterns that match the search sequence are found, more accurate and intensive algorithms may be applied.

BLAST uses a pairwise local search and uses a number of methods to increase the speed of the original Smith-Waterman algorithm.

[edit]

 

FASTA

Main article: FASTA

Pairwise local search. Superseded by BLAST.

[edit]

 

Clustal

Main article: Clustal

Progressive multiple alignment method. Comes in several varieties (ClustalW, ClustalX etc.)

[edit]


External links




Learn More About the Biopharmaceutical Industry and Clinical Research:


Category:

Logo sidebar
  • Analytical Chemistry
  • Bioinformatics
  • Biology
  • Biochemistry
  • Biotechnology
  • Biotechnology Companies
  • Cell Imaging
  • Chemistry
  • Chemists
  • Crystallography
  • Ecology
  • Environmentalism
  • Genetic Engineering
  • Genetically Modified Organisms
  • Genetics
  • Health
  • Health Care
  • Health Sciences
  • Medical Specialities
  • Medicine
  • Molecular Genetics
  • Pharmaceutical Industry
  • Pharmacy
  • Pharmacology

  • Powered by Wikipedia, the free encyclopedia. Articles were developed by IBPA volunteers.

    Logo sidebar

    A

    B

    C

    D

    E

    F

    G

    I

    K

    L

    M

    N

    P

    Q

    R

    S

    T


    Logo sidebar


    IBPA Sponsors and Active Supporters

    http://www.payoneer.com/
    Access Clinical Trials

    Access Clinical Trials
    Access Clinical Trials


    Allied Research International
    Allied Research International

    Altaspera Global Services Inc.
    Altaspera Global Services

    Financial Planning and Personal Insurance
    For Canadian Pharmaceutical Industry Executives


    Biorole Scientific Solutions
    Biorole Scientific Solutions

    CEREPROTEC INC. Development of Novel Neuroprotective Drugs
    CEREPROTEC INC. Development of Novel Neuroprotective Drugs

    Recruitment Advertising Agencies
    Recruitment Advertising Agencies

    Cellular Technology Ltd.
    Cellular Technology Ltd.

    Clinical Trial Network
    Free Database of Clinical Investigators

    ClinQua Clinical Trials Inc.
    ClinQua Clinical Trials Inc.

    Coronis Clinical Research Organization
    Coronis Clinical Research Organization

    CPIC Latin America
    CPIC Latin America

    Espoir Bridge Recruiters
    Espoir Bridge Recruiters

    Genentech
    Genentech

    ILS SA
    Independent Research and Laboratory Solutions

    Inova Health Research
    Inova Health Research, Inc.

    Kriger Research Group International
    Kriger Research Group International

    LCCT
    LCCT

    Metrics Research
    Complete Research Solutions on a Single Platform

    Pharmalef Developments
    Pharmalef Developments

    PrimeHealth Clinical Research Organization
    PrimeHealth Clinical Research Organization

    Research & Development RA SA
    Research & Development RA SA

    Scios Inc.
    Scios Inc. - Manufacturer of Health Care Products

    Scios Inc.
    Southeast Regional Research Group LLC.

    UniMR
    UniMR Clinical Research

    YM BioSciences
    YM BioSciences

    Become IBPA Sponsor
    Post Your Logo Here

    ©2004 International Biopharmaceutical Association Inc., all rights reserved
    Privacy Policy - Terms of Use

    Google