Introduction

LatFit calculates a low-deviation on-lattice model of a given full-atom protein structure in Protein Data Bank (PDB) format. It uses a greedy approach that optimizes either the distance RMSD or the coordinate RMSD while successively fitting the structure's monomers onto the lattice. It supports backbone-only and sidechain-including models in various lattices. Besides the final deviations and the coordinate data of the resulting lattice protein model, an absolute move string representation is generated.
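An absolute move string can be pictured as one letter per step between consecutive lattice points. The following is a minimal sketch for the 3D-cubic lattice only. The letter assignment (F, B, L, R, U, D) and the function name are assumptions made for illustration and may differ from what LatFit actually emits.

```python
# Hypothetical letter assignment for the 3D-cubic lattice;
# LatFit's own move alphabet may differ.
MOVE_LETTERS = {
    (1, 0, 0): "F", (-1, 0, 0): "B",
    (0, 1, 0): "L", (0, -1, 0): "R",
    (0, 0, 1): "U", (0, 0, -1): "D",
}

def absolute_move_string(points):
    """Encode consecutive integer lattice points as one letter per step."""
    moves = []
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        moves.append(MOVE_LETTERS[(x1 - x0, y1 - y0, z1 - z0)])
    return "".join(moves)
```

Because every letter names a direction in the fixed lattice frame, the string is "absolute": it does not depend on the direction of the previous step.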

LatFit follows a chain elongation procedure similar to that described by Park and Levitt (JMB, 1995). Residues are placed on the lattice sequentially, starting from the amino terminus. For each residue placement, the current best lattice fits are iteratively extended: each extension is evaluated via RMSD, and the best fits are extended further until the full chain length is reached. The coordinates to be fitted are user-defined. By default, LatFit takes the C_alpha atom coordinates for the backbone and the centre of mass of the non-hydrogen side chain atoms for the side chain.

A depiction of the workflow is given below:
[Figure: LatFit workflow]

When using LatFit, please cite:

Martin Mann, Rhodri Saunders, Cameron Smith, Rolf Backofen, Charlotte M. Deane
Producing high-accuracy lattice models from protein atomic co-ordinates including side chains
Advances in Bioinformatics, Article ID 148045, 6, 2012
Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, and Rolf Backofen
CPSP-web-tools: a server for 3D lattice protein studies
Bioinformatics, 25 (5), 676-677, 2009

Results are computed with LatFit version 1.9.0

Overview

The following parameters are used to control the execution of LatFit

PDB data
Lattice Model
Method
- Optimization Mode
- Max to keep per iteration

Furthermore, additional information is available

Output Description
Input Examples
- Backbone-only model in FCC lattice
- Cubic lattice model with side chains
Frequently Asked Questions

PDB data

ID of PDB to download and fit

The PDB identifier of the protein of interest from the Protein Data Bank. The structure is downloaded automatically for the fitting.

The parameter constraints are: Has to be a PDB ID of the format DXXX, where D is a digit [1-9] and each X is either a digit or a letter. The ID has to exist in the PDB database.
Defaults to ()
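The format constraint on the PDB ID can be checked before submitting a job. The following is a minimal sketch; the function name is hypothetical, and the check covers the DXXX format only, not whether the ID actually exists in the PDB.

```python
import re

# One digit 1-9 followed by exactly three letters or digits (the DXXX format).
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")

def is_valid_pdb_id(pdb_id: str) -> bool:
    """Return True if pdb_id has the four-character DXXX format."""
    return PDB_ID_PATTERN.fullmatch(pdb_id) is not None
```

For example, "1CRN" passes this check, while "0ABC" (leading zero) and "1CR" (too short) do not.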

Atom to Fit

Gives the PDB atom identifier whose coordinates are extracted and used to derive the lattice protein model. The special string "CoM" denotes that the centroid of each amino acid's sidechain atoms is calculated and fitted.

Default values are: "CA" (C_alpha atom) for backbone-only models and "CoM" (sidechain centroid) for models including sidechains.

NOTE: In case models including sidechains are fitted, the given atom string denotes the position of the sidechain monomers to fit. The backbone monomers are fitted onto the C_alpha atom positions.

The parameter constraints are: String length has to be in range (1,3). Has to be an identifier using letters or numbers. Must not be 'CA' if a model with side chains is to be fitted.
Defaults to (CA)
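What the "CoM" setting computes can be illustrated with a short sketch that averages the non-hydrogen sidechain atom positions per residue. It assumes standard fixed-column PDB ATOM records, treats N, CA, C, O and OXT as backbone, uses an unweighted mean, and ignores alternate locations and multiple models. The function name is hypothetical, and LatFit's own handling may differ in these details.

```python
from collections import defaultdict

# Atoms treated as backbone and therefore excluded (an assumption of this sketch).
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}

def sidechain_centroids(pdb_lines, chain_id="A"):
    """Map residue number -> centroid (x, y, z) of its non-hydrogen sidechain atoms."""
    coords = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[21:22] != chain_id:
            continue
        name = line[12:16].strip()             # atom name, columns 13-16
        element = line[76:78].strip().upper()  # element symbol, columns 77-78
        # Fall back to the atom name when the element column is empty.
        is_hydrogen = element == "H" or (
            not element and name.lstrip("0123456789").startswith("H"))
        if name in BACKBONE_ATOMS or is_hydrogen:
            continue
        res_seq = int(line[22:26])             # residue number, columns 23-26
        coords[res_seq].append(
            (float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return {res: tuple(sum(axis) / len(atoms) for axis in zip(*atoms))
            for res, atoms in coords.items()}
```

Residues without any heavy sidechain atom (glycine, for instance) are simply absent from the returned mapping in this sketch.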

Alternative Atom Identifier

In case the atom identifier is not found within the PDB file, this identifier is used to extract the atom positions to fit.

The parameter constraints are: String length has to be in range (0,3). Has to be an identifier using letters or numbers.
Defaults to ()

Chain identifier

Specifies which protein chain within the PDB file is to be handled. The default is chain "A". If no chain identifier is given within the PDB file please use "_" instead of a white space character.

The parameter constraints are: Has to be a single letter or number.
Defaults to (A)

Model identifier

Some PDB files contain several models of the same protein. This parameter specifies which model to fit.

The parameter constraints are: Has to be a single letter or number.
Defaults to (1)

Lattice Model

Lattice Protein Type

Defines what type of lattice protein model is to be fitted. This can be a backbone-only or a sidechain-including model. In backbone-only models, each amino acid is represented by a single monomer, usually to represent the backbone (C_alpha) trace of the protein chain. Models including sidechains represent each amino acid with two monomers, typically one for the C_alpha backbone position and one for the centroid of the amino acid's sidechain group.

The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (Backbone Only)

Lattice Type

The lattice type to use for optimal structure calculation. Currently, LatFit supports the following lattices:

SQR : 2D-square lattice
CUB : 3D-cubic lattice
FCC : 3D face centered cubic (FCC or 110) lattice
210 : 3D chess knight's (210) lattice
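The four lattices differ in the set of vectors that connect a lattice point to its neighbors. The following sketch generates these sets from one base vector per lattice by taking all coordinate permutations and sign changes. The base vectors follow the usual textbook definitions (100 for cubic, 110 for FCC, 210 for the knight's lattice); the function name is hypothetical.

```python
from itertools import permutations, product

# One representative neighbor vector per lattice (standard definitions).
BASE_VECTORS = {"SQR": (1, 0), "CUB": (1, 0, 0), "FCC": (1, 1, 0), "210": (2, 1, 0)}

def neighbor_vectors(lattice):
    """Return all neighbor vectors of the named lattice, sorted."""
    vectors = set()
    for perm in set(permutations(BASE_VECTORS[lattice])):
        for signs in product((1, -1), repeat=len(perm)):
            vectors.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vectors)
```

This yields 4 neighbors for SQR, 6 for CUB, 12 for FCC and 24 for the 210 lattice. A larger neighborhood gives the fit more freedom at each step, at the cost of more candidates per iteration.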

CA-CA bond length

Since lattice proteins have a fixed distance between all connected monomers, the length of these connections has to be specified. The user can specify the C_alpha-C_alpha distance, which is nearly constant within proteins and is therefore well suited to scale the lattice protein to the provided protein's coordinate data. The default distance is 3.8 Angstroms, the average distance between successive C_alpha atoms in proteins. This distance is also close to the mean distance between the C_alpha atom and the centroid of an amino acid's sidechain (about 3.6 Angstroms).

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 1 and must be smaller than or equal to 10.
Defaults to (3.8)
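The role of the bond length can be sketched as a uniform rescaling of the integer lattice vectors so that every connection has the requested length. This is an illustrative assumption about how the scaling works, not LatFit's actual code; the function name is hypothetical.

```python
import math

def scale_to_bond_length(vector, bond_length=3.8):
    """Scale an integer lattice vector so that its length equals bond_length (Angstroms)."""
    norm = math.sqrt(sum(c * c for c in vector))
    return tuple(c * bond_length / norm for c in vector)
```

An FCC neighbor vector such as (1, 1, 0) has length sqrt(2) in lattice units, so each of its non-zero components becomes 3.8 / sqrt(2), roughly 2.69 Angstroms, after scaling.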

Method

Optimization Mode

LatFit offers two heuristics to guide the greedy optimization method for the creation of the lattice protein models. It either searches for a model that minimizes the distance RMSD (dRMSD) between the original and the produced lattice protein or optimizes the coordinate RMSD (cRMSD).
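For reference, the two measures are commonly defined as follows. The notation is an assumption of this note: p_i are the n original positions, l_i the lattice positions, and \hat{l}_i the lattice positions after optimal superposition onto the original structure.

```latex
\mathrm{dRMSD} = \sqrt{\frac{2}{n(n-1)} \sum_{i<j}
    \left( \lVert p_i - p_j \rVert - \lVert l_i - l_j \rVert \right)^2}
\qquad
\mathrm{cRMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n}
    \lVert p_i - \hat{l}_i \rVert^2}
```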

The two optimization strategies differ considerably in technical terms, since the cRMSD depends heavily on the superposition (relative positioning) of the two structures. Thus, one has to find (a) the best set of lattice points to represent the protein and (b) the best rotation of the lattice relative to the orientation of the original protein in 3D space.

In contrast, the dRMSD calculation is independent of the relative orientation of the two proteins, since it compares only the distances within each structure. Therefore, 'only' the best set of lattice points to represent the protein on the lattice has to be identified, independently of the orientation of the lattice dimensions in 3D space.

The final structure of the dRMSD-based fitting procedure might be mirrored compared to the original structure, since dRMSD does not account for reflection. To obtain the lattice fit in the correct orientation, the mirror image is generated as well, and the variant with the lower cRMSD after superposition using the Kabsch algorithm is returned.
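The reflection problem can be demonstrated with a small example: a structure and its mirror image have a dRMSD of zero although their handedness differs. The sketch below uses the sign of a signed volume as a simple stand-in for the handedness check; it does not implement the Kabsch superposition. The function names are hypothetical.

```python
import math
from itertools import combinations

def drmsd(a, b):
    """Distance RMSD over all pairwise distances within each structure."""
    pairs = list(combinations(range(len(a)), 2))
    total = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
                for i, j in pairs)
    return math.sqrt(total / len(pairs))

def handedness(p0, p1, p2, p3):
    """Sign (+1, 0, -1) of the signed volume spanned by four points."""
    u, v, w = (tuple(q[k] - p0[k] for k in range(3)) for q in (p1, p2, p3))
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return (det > 0) - (det < 0)

structure = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
mirrored = [(x, y, -z) for x, y, z in structure]  # reflect through the xy-plane
```

Here drmsd(structure, mirrored) is zero, while handedness(*structure) and handedness(*mirrored) have opposite signs. No rotation can map one onto the other, which is why a cRMSD comparison after superposition can tell the two variants apart.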

Max to keep per iteration

The RMSD-optimizing fitting procedures of LatFit build the lattice model sequentially, starting from the amino terminus of the original protein. A greedy chain-growth procedure is used, i.e. only the best lattice models are considered for elongation to derive the next longer fit. The "Max to keep per iteration" parameter determines how many of the best structures are considered for the next iteration.

This parameter therefore directly influences the runtime of the program. Generally, a high value (about 100-1000) is useful for dRMSD optimization, while a lower value (10-100) should be used for cRMSD optimization, since the correct lattice rotation has to be determined as well.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 10 and must be smaller than or equal to 1000.
Defaults to (100)
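The chain-growth idea and the effect of this parameter can be sketched as a small beam search. The sketch is restricted to the 3D-cubic lattice and to dRMSD scoring, recomputes every score from scratch, and does not handle dead ends of the self-avoiding walk. It illustrates the principle and is not LatFit's implementation; the names fit_chain and keep are hypothetical.

```python
import math
from itertools import combinations

CUBIC_MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def drmsd(a, b):
    """Distance RMSD over all pairwise distances (0.0 for fewer than two points)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    total = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
                for i, j in pairs)
    return math.sqrt(total / len(pairs))

def fit_chain(target, keep=100, bond_length=3.8):
    """Grow a self-avoiding cubic-lattice chain, keeping the `keep` best fits per step."""
    def score(chain):
        # Scale integer lattice points to Angstroms before comparing distances.
        scaled = [tuple(c * bond_length for c in p) for p in chain]
        return drmsd(scaled, target[:len(chain)])

    beam = [[(0, 0, 0)]]
    for _ in range(len(target) - 1):
        candidates = []
        for chain in beam:
            x, y, z = chain[-1]
            for dx, dy, dz in CUBIC_MOVES:
                nxt = (x + dx, y + dy, z + dz)
                if nxt not in chain:  # keep the walk self-avoiding
                    candidates.append(chain + [nxt])
        beam = sorted(candidates, key=score)[:keep]
    return beam[0]
```

Each iteration scores up to keep times 6 candidates in this sketch, so the runtime grows roughly linearly with the parameter. A larger value lowers the risk that a partial fit which looks poor early on, but would lead to the best full-length model, is discarded.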

Output Description

The output is a PDB file which encodes both the original coordinates to be fitted (which were extracted from the input PDB file) as well as the fitted lattice protein model's coordinates. Both data sets are represented by different chains (P=original, L=model) in the PDB file.
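The two coordinate sets can be separated again by their chain identifiers. The following is a minimal sketch, assuming the output uses standard fixed-column ATOM or HETATM records (chain ID in column 22, coordinates in columns 31-54); the function name is hypothetical.

```python
def split_chains(pdb_lines):
    """Collect (x, y, z) coordinates per chain ID from ATOM/HETATM records."""
    chains = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            chains.setdefault(line[21], []).append(xyz)
    return chains
```

After reading the result file, split_chains(lines)["P"] would hold the original positions and split_chains(lines)["L"] the fitted lattice positions, in file order.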

Input Examples

Backbone-only model in FCC lattice

Exemplifies the fitting of a backbone-only lattice protein within the FCC lattice.

The example's result can be directly accessed here

Cubic lattice model with side chains

Exemplifies the fitting of a lattice protein including sidechain representation within the cubic lattice.