Introduction
LatFit calculates a low-deviation on-lattice model of a given full-atom
protein structure in Protein Data Bank (PDB) format. It uses a greedy approach
that optimizes either distance or coordinate RMSD while successively fitting the
structure's monomers onto the lattice. It supports backbone-only and
sidechain-including models in various lattices.
Besides the final deviations and the coordinate data of the resulting lattice
protein model, an absolute move string representation is generated.
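An absolute move string encodes each bond of the lattice model as one letter naming its direction on the lattice. The following minimal sketch illustrates the idea for the cubic lattice. The letter assignment (F/B along x, L/R along y, U/D along z) and the function name `to_absolute_moves` are assumptions made for illustration; the alphabet LatFit actually emits may differ.

```python
# Hypothetical letter assignment for the six cubic-lattice directions;
# the alphabet actually used by LatFit may differ.
CUBIC_MOVES = {
    (1, 0, 0): "F", (-1, 0, 0): "B",
    (0, 1, 0): "L", (0, -1, 0): "R",
    (0, 0, 1): "U", (0, 0, -1): "D",
}

def to_absolute_moves(points):
    """Encode consecutive lattice points as one letter per bond."""
    moves = []
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0, z1 - z0)
        if step not in CUBIC_MOVES:
            raise ValueError(f"not a cubic lattice bond: {step}")
        moves.append(CUBIC_MOVES[step])
    return "".join(moves)

model = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)]
print(to_absolute_moves(model))  # prints "FLUB"
```

A chain of n monomers yields n-1 letters, and the string is independent of where the first monomer is placed.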
LatFit follows a chain elongation procedure similar to that described by Park and Levitt (JMB,1995).
Residues are placed on the lattice sequentially, starting from the amino terminus. For each residue placement, the current best lattice fits are extended by one position.
Each extension is evaluated via RMSD, and the best fits are extended further until the full chain length is reached.
The fitted coordinates are user-defined.
By default, LatFit takes the C_alpha atom coordinates for the backbone and the centre of mass of the non-hydrogen side chain atoms as the side chain position.
A depiction of the workflow is given below
When using LatFit, please cite:
- Martin Mann, Rhodri Saunders, Cameron Smith, Rolf Backofen, Charlotte M. Deane
Producing high-accuracy lattice models from protein atomic co-ordinates including side chains
Advances in Bioinformatics, Article ID 148045, 6, 2012
- Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, and Rolf Backofen
CPSP-web-tools: a server for 3D lattice protein studies
Bioinformatics, 25 (5), 676-677, 2009
Results are computed with LatFit version 1.9.0
Overview
The following parameters are used to control the execution of LatFit
Furthermore, additional information is available
PDB data
ID of PDB to download and fit
The PDB identifier of the protein of interest from the Protein
Data Bank. The structure is downloaded automatically for the fitting.
The parameter constraints are: Has to be a PDB ID of the format DXXX, where D is a digit [1-9] and each X is either a digit or a letter. The ID has to be known within the PDB database.
Defaults to ()
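The DXXX format constraint can be checked before submitting a job. The following is a small sketch, not the check performed by the server; the function name `is_valid_pdb_id` is hypothetical, and accepting lowercase letters is an assumption. It only tests the shape of the identifier, not whether the entry exists in the PDB.

```python
import re

# One digit 1-9 followed by exactly three letters or digits (format DXXX).
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")

def is_valid_pdb_id(text):
    """Return True if text has the DXXX shape; existence in the PDB is not checked."""
    return PDB_ID_PATTERN.fullmatch(text) is not None

for candidate in ["1UBQ", "0ABC", "1UB", "2ci2"]:
    print(candidate, is_valid_pdb_id(candidate))
```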
Atom to Fit
Gives the PDB atom identifier whose coordinates are extracted and used to
derive the lattice protein model. The special string "CoM" denotes that
the centroid of the amino acid's sidechain atoms is calculated and fitted.
Default values are:
"CA" (C_alpha-atom) for backbone-only models
"CoM" (sidechain's centroid) for models including sidechains
NOTE: In case models including sidechains are fitted, the given atom string denotes
the position of the sidechain monomers to fit. The backbone monomers are fitted onto
the C_alpha atom positions.
The parameter constraints are: String length has to be in range (1,3). Has to be an identifier using letters or numbers. Must not be 'CA' if a model with side chains is to be fitted.
Defaults to (CA)
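The "CoM" position can be illustrated with a short sketch. It assumes an unweighted mean over the non-hydrogen side chain atoms of one residue and treats the atoms named N, CA, C, O and OXT as backbone; whether LatFit weights atoms by mass, and how it handles residues without side chain heavy atoms (such as glycine), is not stated here, so returning `None` in that case is a placeholder choice. The function name and the input layout are hypothetical.

```python
def sidechain_centroid(atoms):
    """Unweighted mean position of the non-hydrogen side chain atoms.

    atoms: list of (atom_name, element, (x, y, z)) for ONE residue.
    Returns None when the residue has no side chain heavy atom.
    """
    backbone = {"N", "CA", "C", "O", "OXT"}  # assumed backbone atom names
    coords = [xyz for name, element, xyz in atoms
              if name not in backbone and element != "H"]
    if not coords:
        return None
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

# Toy serine-like residue with made-up coordinates.
residue = [
    ("N", "N", (0.0, 0.0, 0.0)), ("CA", "C", (1.0, 0.0, 0.0)),
    ("C", "C", (2.0, 0.0, 0.0)), ("O", "O", (2.0, 1.0, 0.0)),
    ("CB", "C", (1.0, 1.0, 0.0)), ("OG", "O", (1.0, 3.0, 2.0)),
    ("HG", "H", (9.0, 9.0, 9.0)),
]
print(sidechain_centroid(residue))  # prints (1.0, 2.0, 1.0)
```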
Alternative Atom Identifier
If the "Atom to Fit" identifier is not found within the PDB file, this
alternative identifier is used instead to extract the atom positions to fit.
The parameter constraints are: String length has to be in range (0,3). Has to be an identifier using letters or numbers.
Defaults to ()
Chain identifier
Specifies which protein chain within the PDB file is to be handled.
The default is chain "A". If no chain identifier is given within the
PDB file please use "_" instead of a white space character.
The parameter constraints are: Has to be a single letter or number.
Defaults to (A)
Model identifier
Some PDB files contain several models of the same protein.
This parameter specifies which model to fit.
The parameter constraints are: Has to be a single letter or number.
Defaults to (1)
Lattice Model
Lattice Protein Type
Defines what type of lattice protein model is to be fitted.
This can be a backbone-only or a sidechain-including model.
For backbone-only models, each amino acid is represented by a single monomer.
This is usually done to represent the backbone (C_alpha) trace of a protein chain.
Models including sidechains represent each amino acid with two monomers,
typically one representing the C_alpha backbone position and one representing
the centroid of the amino acid's sidechain group.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (Backbone Only)
Lattice Type
The lattice type to use for optimal structure calculation.
Currently, LatFit supports the following lattices:
- SQR : 2D-square lattice
- CUB : 3D-cubic lattice
- FCC : 3D face centered cubic (FCC or 110) lattice
- 210 : 3D chess knight's (210) lattice
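The names "110" and "210" refer to the base vector from which all neighbor vectors of a lattice point are derived by permuting coordinates and flipping signs. The sketch below enumerates these neighbor vectors in unit coordinates; the helper name `neighbor_vectors` and the `LATTICES` dictionary are illustrative and not part of LatFit.

```python
from itertools import permutations, product

def neighbor_vectors(base):
    """All distinct coordinate permutations and sign changes of base."""
    vectors = set()
    for perm in permutations(base):
        for signs in product((1, -1), repeat=len(base)):
            vectors.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vectors)

LATTICES = {
    "SQR": neighbor_vectors((1, 0)),     # 2D square
    "CUB": neighbor_vectors((1, 0, 0)),  # 3D cubic
    "FCC": neighbor_vectors((1, 1, 0)),  # face centered cubic (110)
    "210": neighbor_vectors((2, 1, 0)),  # chess knight's moves
}
for name, vectors in LATTICES.items():
    print(name, len(vectors))
# prints: SQR 4, CUB 6, FCC 12, 210 24 (one pair per line)
```

The larger neighborhoods of the FCC and 210 lattices offer more directions per step, which gives a fit more freedom to follow the original chain.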
CA-CA bond length
Since lattice proteins have a fixed distance between all connected monomers, the length of
these connections has to be specified. The user can specify the C_alpha-C_alpha distance,
which is nearly constant within proteins and thereby very well suited to scale the lattice
protein according to the provided protein's coordinate data. The default distance is set
to 3.8 Angstroms, the average distance between successive C_alpha atoms in proteins.
This distance is also close to the mean distance between the C_alpha atom and the centroid
of an amino acid's sidechain (about 3.6 Angstroms).
The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 1 and must be smaller than or equal to 10.
Defaults to (3.8)
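To see how the bond length scales a lattice, consider that a unit neighbor vector such as (1,1,0) has length sqrt(2), so the lattice has to be stretched by 3.8/sqrt(2) for each bond to measure 3.8 Angstroms. The sketch below computes this factor; the function name `lattice_scale` is hypothetical, and the scaling LatFit applies internally is assumed to be equivalent.

```python
import math

def lattice_scale(base_vector, bond_length=3.8):
    """Factor that stretches a unit lattice neighbor vector to bond_length."""
    return bond_length / math.sqrt(sum(c * c for c in base_vector))

for name, base in [("CUB", (1, 0, 0)), ("FCC", (1, 1, 0)), ("210", (2, 1, 0))]:
    print(name, round(lattice_scale(base), 3))
# prints: CUB 3.8, FCC 2.687, 210 1.699 (one pair per line)
```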
Method
Optimization Mode
LatFit offers two heuristics to guide the greedy optimization method for the
creation of the lattice protein models. It either searches for a model that minimizes
the distance RMSD (dRMSD) between the original and the produced lattice protein or
optimizes the coordinate RMSD (cRMSD).
Technically, the two optimization strategies differ a lot, since the cRMSD depends heavily on
the superposition (relative positioning) of the two structures. Thus, one has to
find (a) the best set of lattice points to represent the protein and (b) the best rotation
of the lattice relative to the orientation of the original protein in 3D space.
In contrast, the dRMSD calculation is independent of the relative orientation of the two proteins,
since it compares only distances internal to each structure. Therefore, 'only' the best set of lattice
points to represent the protein on the lattice has to be identified, independently of the
orientation of the lattice dimensions in 3D space.
The final structure of the dRMSD-based fitting procedure might be mirrored
compared to the original structure, since dRMSD does not account for reflection.
To obtain the lattice fit in the right orientation, we generate the mirror image and return
the variant that minimizes the cRMSD when superimposed using the algorithm by Kabsch.
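The mirror ambiguity of dRMSD can be demonstrated with a few lines of code. The sketch below computes the dRMSD between a small chain and its mirror image, which is zero, and uses the sign of a triple product to show that the two chains nevertheless differ in handedness. The handedness test is a simplified stand-in chosen for illustration; it is not the Kabsch-based cRMSD comparison described above, and both function names are hypothetical.

```python
import math
from itertools import combinations

def drmsd(a, b):
    """Distance RMSD over all pairs i<j of two equally long point lists."""
    pairs = list(combinations(range(len(a)), 2))
    total = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
                for i, j in pairs)
    return math.sqrt(total / len(pairs))

def handedness(p):
    """Sign of the signed volume spanned by the first four points."""
    u, v, w = ([p[k][i] - p[0][i] for i in range(3)] for k in (1, 2, 3))
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return (det > 0) - (det < 0)

original = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
mirrored = [(x, y, -z) for x, y, z in original]  # reflect at the xy-plane

print(drmsd(original, mirrored))                   # prints 0.0
print(handedness(original), handedness(mirrored))  # prints 1 -1
```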
Max to keep per iteration
The RMSD-optimizing fitting procedures of LatFit build the lattice model
sequentially starting from the amino terminus of the original protein.
A greedy chain-growth procedure is used, i.e. only the best lattice models
are considered for elongation to derive the next longer fit. The "Max. to keep
per Iteration" parameter determines how many of the best structures are considered
for the next iteration.
This parameter therefore directly influences the runtime of the program.
Generally, for dRMSD optimization a high value (about 100-1000) is useful,
while for cRMSD optimization a lower number (10-100) should be used since the correct
lattice rotation has to be determined as well.
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 10 and must be smaller than or equal to 1000.
Defaults to (100)
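The role of this parameter can be sketched with a minimal chain-growth search. The code below is an illustration under simplifying assumptions, not LatFit's implementation: it handles only backbone-only models on the cubic lattice, scores candidates by dRMSD, and enforces self-avoidance. The names `greedy_fit` and `keep` are hypothetical; `keep` plays the role of "Max. to keep per Iteration".

```python
import math

CUB = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def drmsd(model, target, scale):
    """dRMSD between the scaled lattice chain and the matching target prefix."""
    n = len(model)
    if n < 2:
        return 0.0
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += (scale * math.dist(model[i], model[j])
                      - math.dist(target[i], target[j])) ** 2
            count += 1
    return math.sqrt(total / count)

def greedy_fit(target, keep=100, bond_length=3.8):
    """Grow self-avoiding cubic-lattice chains, keeping the `keep` best per step."""
    beam = [[(0, 0, 0)]]
    for _ in range(1, len(target)):
        candidates = []
        for chain in beam:
            x, y, z = chain[-1]
            for dx, dy, dz in CUB:
                nxt = (x + dx, y + dy, z + dz)
                if nxt not in chain:  # self-avoidance
                    candidates.append(chain + [nxt])
        if not candidates:
            raise RuntimeError("all kept chains ran into a dead end")
        candidates.sort(key=lambda c: drmsd(c, target, bond_length))
        beam = candidates[:keep]  # only the best fits are elongated further
    best = beam[0]
    return best, drmsd(best, target, bond_length)

# Toy target that lies exactly on a cubic lattice with 3.8 Angstrom bonds.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
          (3.8, 3.8, 3.8), (0.0, 3.8, 3.8)]
best, score = greedy_fit(target, keep=10)
print(len(best), round(score, 6))  # prints 5 0.0
```

Each iteration evaluates up to `keep` times the number of lattice neighbors candidates, which is why the parameter scales the runtime directly.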
Output Description
The output is a PDB file that encodes both the original coordinates
to be fitted (extracted from the input PDB file) and the coordinates
of the fitted lattice protein model. The two data sets are stored as
different chains (P=original, L=model) in the PDB file.
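The two chains can be separated again with a few lines of code. The sketch below reads the coordinates by chain ID using the fixed column positions of the PDB coordinate record (chain ID in column 22, x/y/z in columns 31-54). Whether LatFit writes ATOM or HETATM records is not stated here, so both are accepted; the sample lines are synthetic, and the helper names are hypothetical.

```python
def split_chains(pdb_text):
    """Group ATOM/HETATM coordinates by chain ID using PDB fixed columns."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            chain_id = line[21]  # column 22
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            chains.setdefault(chain_id, []).append(xyz)
    return chains

def atom_line(serial, chain, x, y, z):
    """Build a synthetic, column-aligned ATOM record for the example."""
    return "ATOM  {:>5}  CA  ALA {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, chain, serial, x, y, z)

sample = "\n".join([
    atom_line(1, "P", 1.25, 0.0, -3.5),  # original coordinates
    atom_line(1, "L", 0.0, 0.0, 0.0),    # lattice model
    atom_line(2, "L", 3.8, 0.0, 0.0),
])
chains = split_chains(sample)
print(sorted(chains))  # prints ['L', 'P']
print(chains["L"])     # prints [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
```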
Input Examples
Backbone-only model in FCC lattice
Exemplifies the fitting of a backbone-only lattice protein within
the FCC lattice.
The example's result can be directly accessed here
Cubic lattice model with side chains
Exemplifies the fitting of a lattice protein including sidechain
representation within the cubic lattice.
The example's result can be directly accessed here
Frequently Asked Questions
If your question is not listed, please send it to us!
What is an RMSD and how are dRMSD and cRMSD calculated?
The root mean square deviation (RMSD) is often used to compare protein
structures. Two types of RMSD measures are used:
- cRMSD (coordinate RMSD) : measures the average displacement of
each monomer of one structure from the corresponding monomer in the
second structure. Thus, the measure depends on the superposition
of the two structures to yield reasonable results.
- dRMSD (distance RMSD) : measures the average deviation of the
structure internal distances compared to the corresponding distances
within the second structure. Therefore, this measure is independent
of the relative positioning of the two structures.
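Written out, the two measures take the following commonly used form. The notation is an assumption introduced here: X and Y are the two structures with n monomers each, x_i and y_i are the positions of corresponding monomers, and the cRMSD is evaluated after the structures have been superimposed. The exact normalization used by LatFit may differ.

```latex
\mathrm{cRMSD}(X, Y) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert x_i - y_i \rVert^{2}}
\qquad
\mathrm{dRMSD}(X, Y) = \sqrt{\frac{2}{n(n-1)} \sum_{i<j}
  \left( \lVert x_i - x_j \rVert - \lVert y_i - y_j \rVert \right)^{2}}
```

The dRMSD sum runs over all n(n-1)/2 monomer pairs, so no superposition is needed, whereas the cRMSD compares positions directly and therefore changes with every rotation or translation of one structure.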