Introduction
LatFit calculates a low-deviation on-lattice model of a given full-atom protein structure in Protein Data Bank (PDB) format. It uses a greedy approach that optimizes either the distance RMSD or the coordinate RMSD while successively fitting the structure's monomers onto the lattice. It supports backbone-only and sidechain-including models on various lattices. Besides the final deviations and the coordinate data of the resulting lattice protein model, an absolute move string representation is generated.
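To illustrate what an absolute move string is, the following minimal sketch encodes a walk on the cubic lattice as one letter per bond. The letter alphabet and the function name are hypothetical and chosen for illustration only; the encoding LatFit actually emits may differ.

```python
# Hypothetical one-letter alphabet for the six unit moves of the cubic lattice.
MOVE_LETTERS = {
    (1, 0, 0): "F", (-1, 0, 0): "B",
    (0, 1, 0): "L", (0, -1, 0): "R",
    (0, 0, 1): "U", (0, 0, -1): "D",
}

def absolute_move_string(points):
    """Encode consecutive lattice points as one letter per bond."""
    letters = []
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0, z1 - z0)
        letters.append(MOVE_LETTERS[step])  # KeyError if the step is no lattice move
    return "".join(letters)

walk = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)]
print(absolute_move_string(walk))  # prints "FLUB"
```

Each letter names a direction in the fixed lattice frame, so the string describes the structure independently of where the first monomer is placed.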
Different Parameters
- PDB ID
- Atom to Fit
- Chain Identifier
- Model Number
- Lattice Protein Type
- Lattice Form
- CA-CA bond length
- Optimization Mode
- Max. to keep per Iteration
- Rotation Steps / Interval
- Refinement Rotation
PDB ID
The identifier of the Protein Data Bank (PDB) entry whose full-atom structure is to be fitted.
Atom to Fit
Gives the PDB atom identifier whose coordinates are extracted and from which
the lattice protein model is derived. The special string "CoM" denotes
that the centroid of each amino acid's sidechain atoms is calculated
and fitted.
Default values are:
- "CA" (C_alpha-atom) for backbone-only models
- "CoM" (sidechain's centroid) for models including sidechains
NOTE: In case models including sidechains are fitted, the given atom string denotes the position of the sidechain monomers to fit. The backbone monomers are fitted onto the C_alpha atom positions.
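The "CoM" option can be pictured with the minimal sketch below. It assumes an unweighted mean over all atoms that are not part of the backbone; the set of backbone atom names, the function name, and the handling of residues without sidechain atoms are assumptions for illustration, not LatFit's documented behavior.

```python
# Atom names treated as backbone here (an assumption for this sketch).
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}

def sidechain_centroid(atoms):
    """Return the unweighted mean position of all non-backbone atoms.

    `atoms` maps PDB atom names of one residue to (x, y, z) tuples.
    Returns None if the residue has no sidechain atoms.
    """
    side = [xyz for name, xyz in atoms.items() if name not in BACKBONE_ATOMS]
    if not side:
        return None
    n = len(side)
    return tuple(sum(xyz[i] for xyz in side) / n for i in range(3))

# Made-up coordinates for a serine-like residue.
serine = {
    "N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0),
    "C": (2.0, 1.4, 0.0), "O": (3.2, 1.6, 0.0),
    "CB": (2.0, -1.0, 1.0), "OG": (3.0, -2.0, 1.0),
}
print(sidechain_centroid(serine))  # prints (2.5, -1.5, 1.0)
```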
Chain Identifier
Specifies which protein chain within the PDB file is to be handled. The
default is chain "A". If the PDB file gives no chain identifier,
please use "_" instead of a white space character.
Model Number
Some PDB files contain several models of the same protein. This parameter
specifies which model to fit.
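How the atom, chain, and model parameters select coordinates can be sketched as follows. The sketch reads the fixed-column ATOM and MODEL records of the PDB format; the function names are hypothetical, `atom_line` exists only to build demo input, and alternate locations and HETATM records are ignored for brevity.

```python
def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Build a minimal fixed-column ATOM record (demo input only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def extract_coordinates(lines, atom="CA", chain="A", model=1):
    """Collect (x, y, z) of the requested atom for one chain of one model."""
    coords = []
    current_model = 1  # files without MODEL records hold a single model
    for line in lines:
        record = line[0:6].strip()
        if record == "MODEL":
            current_model = int(line[10:14])       # columns 11-14
        elif record == "ATOM" and current_model == model:
            chain_id = line[21] if line[21] != " " else "_"  # blank chain as "_"
            if line[12:16].strip() == atom and chain_id == chain:
                coords.append((float(line[30:38]),
                               float(line[38:46]),
                               float(line[46:54])))
    return coords

demo = [
    f"MODEL     {1:4d}",
    atom_line(1, " CA", "ALA", "A", 1, 1.0, 2.0, 3.0),
    atom_line(2, " CA", "GLY", "B", 1, 9.0, 9.0, 9.0),
    "ENDMDL",
    f"MODEL     {2:4d}",
    atom_line(1, " CA", "ALA", "A", 1, 1.5, 2.5, 3.5),
    "ENDMDL",
]
print(extract_coordinates(demo))            # chain A of model 1
print(extract_coordinates(demo, model=2))   # chain A of model 2
```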
Lattice Protein Type
Defines what type of lattice protein model is to be fitted.
This can be a backbone-only or a sidechain-including model.
For backbone-only models, each amino acid is represented by a single
monomer. This is usually done to represent the backbone (C_alpha) trace of
a protein chain.
Models including sidechains represent each amino acid with two monomers,
typically one representing the C_alpha backbone position and one to represent the
centroid of the sidechain group of each amino acid.
Lattice Form
The lattice model to use for the fitting. Currently, LatFit supports the
following lattices:
- SQR : 2D square lattice
- CUB : 3D cubic (100) lattice
- FCC : 3D face-centered cubic (FCC or 110) lattice
- 210 : 3D chess knight's (210) lattice
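The lattices differ in their sets of neighboring vectors. The minimal sketch below generates these sets from one base vector per lattice by permuting coordinates and flipping signs; the helper name and the dictionary layout are illustrative assumptions.

```python
from itertools import permutations, product

def signed_permutations(base):
    """All distinct vectors obtained by permuting and sign-flipping `base`."""
    vectors = set()
    for perm in permutations(base):
        for signs in product((1, -1), repeat=3):
            vectors.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vectors)

NEIGHBORS = {
    # The square lattice is the cubic lattice restricted to the XY plane.
    "SQR": [v for v in signed_permutations((1, 0, 0)) if v[2] == 0],
    "CUB": signed_permutations((1, 0, 0)),
    "FCC": signed_permutations((1, 1, 0)),
    "210": signed_permutations((2, 1, 0)),
}

for name, vectors in NEIGHBORS.items():
    print(name, len(vectors))
```

The printed counts (4, 6, 12, and 24) are the coordination numbers of the four lattices.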
CA-CA bond length
Since lattice proteins have a fixed distance
between all connected monomers, the length of these connections has to be
specified. The user can specify the C_alpha-C_alpha distance, which is
nearly constant within proteins and therefore well suited to scale the
lattice protein according to the provided protein's coordinate data.
The default distance is set to 3.8 Angstroms, the average distance
between successive C_alpha atoms in proteins. This distance is also close
to the mean distance between the C_alpha atom and the centroid of an
amino acid's sidechain (about 3.6 Angstroms).
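The scaling can be sketched as follows, assuming each lattice is described by integer neighboring vectors of a common length (for example (1,1,0) for FCC). The names are hypothetical; the sketch only shows how a bond length in Angstroms translates into a scaling factor for the integer lattice.

```python
import math

# One representative integer neighboring vector per lattice.
BASE_VECTOR = {"SQR": (1, 0, 0), "CUB": (1, 0, 0), "FCC": (1, 1, 0), "210": (2, 1, 0)}

def lattice_scale(lattice, bond_length=3.8):
    """Factor that stretches integer lattice vectors to `bond_length` Angstroms."""
    base_length = math.sqrt(sum(c * c for c in BASE_VECTOR[lattice]))
    return bond_length / base_length

def to_angstrom(point, lattice, bond_length=3.8):
    """Convert an integer lattice point into Angstrom coordinates."""
    s = lattice_scale(lattice, bond_length)
    return tuple(s * c for c in point)

for name in BASE_VECTOR:
    print(name, round(lattice_scale(name), 3))
```

With this convention, a single FCC move such as (1,1,0) has a length of 3.8 Angstroms after scaling, just like a single cubic move.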
Optimization Mode
LatFit offers two heuristics to guide the greedy optimization method for
the creation of the lattice protein models. It either searches for a
model that minimizes the distance RMSD (dRMSD)
between the original and the produced lattice protein or minimizes the
coordinate RMSD (cRMSD).
The two optimization strategies differ considerably in technical terms, since
the cRMSD depends heavily on the superposition (relative positioning) of the
two structures. Thus, one has to find
(a) the best set of lattice points to represent the protein and
(b) the best rotation of the lattice relative to the orientation of the
original protein in 3D-space.
In contrast, the dRMSD calculation is independent of the relative orientation
of the two proteins, since it compares structure-internal distances only.
Therefore, 'only' the best set of lattice points to represent the protein
on the lattice has to be identified, independently of the orientation
of the lattice dimensions in 3D-space.
The final structure of the dRMSD-based fitting procedure might be mirrored
compared to the original structure, since dRMSD does not account for
reflection. To return the lattice fit in the right orientation, we also
generate the mirrored structure and return whichever of the two yields the
lower cRMSD when superpositioned using the algorithm by Kabsch.
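The difference between the two measures, and the mirror ambiguity of dRMSD, can be illustrated with the minimal sketch below. It uses the common definitions (root mean square over all pair distance differences for dRMSD, over per-point deviations for cRMSD); the exact normalization used by LatFit is not stated here and is an assumption. The Kabsch superposition is omitted, so `crmsd` compares the coordinates as given.

```python
import math
from itertools import combinations

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def drmsd(a, b):
    """Distance RMSD over all intra-structure pair distances (needs >= 2 points)."""
    pairs = list(combinations(range(len(a)), 2))
    total = sum((distance(a[i], a[j]) - distance(b[i], b[j])) ** 2
                for i, j in pairs)
    return math.sqrt(total / len(pairs))

def crmsd(a, b):
    """Coordinate RMSD of two structures taken as already superpositioned."""
    total = sum(sum((x - y) ** 2 for x, y in zip(p, q)) for p, q in zip(a, b))
    return math.sqrt(total / len(a))

original = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
mirrored = [(x, y, -z) for x, y, z in original]  # reflection at the XY plane

print(drmsd(original, mirrored))  # prints 0.0: all pair distances are identical
print(crmsd(original, mirrored))  # prints 1.0: the last point differs by 2 in z
```

Because the dRMSD of a structure and its mirror image is zero, a dRMSD-optimal fit cannot tell the two apart, which is why a coordinate-based comparison is needed to pick the orientation.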
Max. to keep per Iteration
The RMSD-optimizing fitting procedures of LatFit build the
lattice model sequentially, starting from the amino terminus of the original
protein. A greedy chain-growth procedure is used, i.e. only the best lattice
models are considered for elongation to derive the next longer fit. The
"Max. to keep per Iteration" parameter determines how many of the best
structures are considered for the next iteration.
This parameter therefore directly influences the runtime of the program.
Generally, for dRMSD optimization a high value (about 100-1000) is useful, while
for cRMSD optimization a lower number (10-100) should be used, since the
correct lattice rotation has to be determined as well.
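The chain-growth idea can be sketched as a small beam search. The sketch below is not LatFit's implementation: it assumes the cubic lattice, a dRMSD-style score accumulated monomer by monomer, and hypothetical function names. The `keep` argument plays the role of "Max. to keep per Iteration".

```python
import math

CUB = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def added_error(target, model, bond):
    """Squared distance deviations between the newest monomer and all earlier ones."""
    k = len(model) - 1
    return sum((distance(target[i], target[k]) - bond * distance(model[i], model[k])) ** 2
               for i in range(k))

def greedy_fit(target, keep=10, bond=3.8):
    """Grow lattice chains monomer by monomer, keeping the `keep` best per step.

    Returns (dRMSD, lattice points); `target` needs at least two points.
    """
    beam = [(0.0, [(0, 0, 0)])]  # (accumulated squared error, lattice points)
    for _ in range(1, len(target)):
        candidates = []
        for error, chain in beam:
            x, y, z = chain[-1]
            for dx, dy, dz in CUB:
                nxt = (x + dx, y + dy, z + dz)
                if nxt in chain:  # keep the walk self-avoiding
                    continue
                grown = chain + [nxt]
                candidates.append((error + added_error(target, grown, bond), grown))
        candidates.sort(key=lambda c: c[0])
        beam = candidates[:keep]  # greedy step: drop all but the best fits
    error, best = beam[0]
    pairs = len(target) * (len(target) - 1) / 2
    return math.sqrt(error / pairs), best

# A target that already lies on a cubic lattice with bond length 3.8.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 3.8)]
rmsd, model = greedy_fit(target, keep=10)
print(round(rmsd, 6), model)
```

Each iteration evaluates at most `keep` times the coordination number of candidates, which is why this parameter scales the runtime directly.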
Rotation Steps / Interval
The cRMSD-optimizing fitting procedure allows for a
fast, additive coordinate RMSD update along the chain extension, but depends on the
relative orientation of the protein within the lattice. Thus, we follow
Miao et al. (JMB, 2004) to find the best fit.
In general, a user-defined
number of rotation intervals R is tried for each of the XYZ
rotation axes. For each rotation, we transform the original protein coordinates
to get the rotated current target structure. By applying the cRMSD-based
fitting procedure we get the best fit for the current rotation. In this way,
we successively evaluate the best fit for all tried rotations. To optimize results, a
further rotational refinement step can be applied around the best
resulting model.
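The rotation screen described above can be sketched as follows. The angle range of a quarter turn per axis, the rotation order, and all function names are assumptions made for this illustration; `fit` stands for any scoring procedure, such as a cRMSD-based fit, that returns a value to minimize.

```python
import math
from itertools import product

def rotation_matrix(ax, ay, az):
    """Rotation about X by `ax`, then Y by `ay`, then Z by `az` (radians)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def rotate(points, matrix):
    """Apply a 3x3 rotation matrix to every point."""
    return [tuple(sum(row[k] * p[k] for k in range(3)) for row in matrix)
            for p in points]

def rotation_screen(steps, max_angle=math.pi / 2):
    """All angle triples for `steps` evenly spaced angles per axis in [0, max_angle)."""
    angles = [i * max_angle / steps for i in range(steps)]
    return list(product(angles, repeat=3))

def best_rotation(points, fit, steps):
    """Score every trial rotation with `fit` and return the best (score, angles)."""
    best = None
    for angles in rotation_screen(steps):
        score = fit(rotate(points, rotation_matrix(*angles)))
        if best is None or score < best[0]:
            best = (score, angles)
    return best

print(len(rotation_screen(4)))  # prints 64: the fit runs once per angle triple
```

The number of fitting runs grows with the third power of the number of rotation intervals, since every combination of the three axis angles is evaluated.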
The runtime of LatFit scales with the lattice coordination number
(i.e. the number of neighboring vectors),
the max. number of structures to keep per iteration,
and, most importantly, the number of rotation intervals R tried.
Refinement Rotation
As explained for the rotation steps, determining
the correct lattice rotation is essential for the fitting quality when
applying a cRMSD-optimizing fitting procedure. Thus, once the
best rotation according to the given rotation steps has been determined, one
can apply another refinement rotation in order to find an even better
lattice rotation close to the rotation angles determined by the first
rotation screen.
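A refinement screen of this kind can be sketched as a finer angle grid centered on the best angles of the first screen. The window of one coarse step on either side and the function name are assumptions for illustration; LatFit's actual refinement interval is not specified here.

```python
from itertools import product

def refinement_angles(best, coarse_step, steps):
    """Angle triples spanning one coarse step on either side of `best`.

    `best` is the (ax, ay, az) triple found by the first screen, `coarse_step`
    the angle spacing of that screen, and `steps` the number of refinement
    intervals per axis.
    """
    fine = 2 * coarse_step / steps
    per_axis = [[b - coarse_step + i * fine for i in range(steps + 1)]
                for b in best]
    return list(product(*per_axis))

trial = refinement_angles((0.0, 0.0, 0.0), coarse_step=0.5, steps=2)
print(len(trial))  # prints 27: three angles per axis
```

Each refined triple is then scored like a rotation of the first screen, and the overall best fit is kept.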