CPSP-Tools Server
Frequently Asked Questions

If your question is not listed, please send it to us!

Troubleshooting

? How long are computed results available and stored?

All jobs computed on the Freiburg Tools web server are stored for 30 days and are automatically removed afterwards. To preserve your job results, you can download the job zip-file that is offered for each job on the corresponding result page. This file contains all input information, call details, and generated output files.

? Which publications should I cite when using the server?

Each tool offered by this web server comes with a specific list of publications. Please cite them when using the web server.

? Why do I get a warning saying that my browser has incompatibility issues with the web server?

The Freiburg Tools web server is known to have serious compatibility issues with Internet Explorer. If you are using this browser to access it, we highly recommend switching to Mozilla Firefox so that you don't experience lack of functionality.
The web server was also tested under Google Chrome, Opera, Safari and Konqueror and they are all known to work fine.

If you are not using Internet Explorer and you still get the warning message, then browser mimicry could be the reason. Please let us know and we will correct the problem as soon as possible.

? Whom should I contact if I have further trouble or encounter problems not listed here?

Please contact us as soon as you encounter any problems or difficulties. If you have problems with a specific tool, please send as much detail as possible. If your problems are related to a certain job, please provide the job ID. Thanks for your help and feedback!

? I have created a cool new RNA Bioinformatics tool, is it possible to integrate it into the web server?

The Freiburg Tools web server is a very flexible and generic platform to integrate new tools. So please contact us and we will happily discuss the possibilities of an integration of YOUR TOOL into our web server.

General questions

? What are lattice proteins?

Lattice proteins represent a class of protein models that restrict the conformation/structure space of the represented proteins. This is done in two ways. First, all model atoms are confined to nodes of a discrete lattice. This discretization of the structure space enables a full enumeration of all structures a modeled protein can adopt. Second, lattice protein models represent each amino acid with one or a few model atoms (monomers) instead of its full set of atoms (C_alpha, H, etc.). This further restricts and simplifies the structure space.

[Figures: backbone model; side chain model]

Two common forms of lattice proteins are backbone and side chain models. The first represents each amino acid by one monomer only. Thus a self-avoiding consecutive chain of monomers in the lattice yields a valid structure. In side chain models each amino acid is represented by two monomers: one for the backbone atoms and one for the atoms that form the side chain. This yields a more realistic model at the cost of increased computational complexity and a larger structure space. For an example see the picture below. It shows a side chain lattice protein (thick balls and sticks) and the modeled full atom amino acids (thin lines). A backbone model would consist of the blue balls and sticks only. The monomers are placed in an FCC-lattice.

[Figure: side chain lattice protein]

? What is the HP-model?

The HP-model was invented by Kit F. Lau and Ken A. Dill to model hydrophobic forces, which are known to be a driving force in a protein's folding process. First defined on the 2D-square lattice, it is applicable to and used in various lattices and even in off-lattice models. In its simplest form it is a backbone model (i.e. one monomer per amino acid), but side chain models are possible as well. The model distinguishes only two groups of amino acids: (H)ydrophobic and (P)olar ones. To determine the energy of a protein structure, only hydrophobic contacts are considered: the number of H-H-monomer interactions is counted, excluding pairs that are consecutive along the chain. Two monomers interact if they occupy neighboring positions in the lattice, and each such interaction contributes an energy of -1.

For a 2D example including energy calculation see the following link.
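The energy rule of the HP-model can also be sketched in a few lines of Python. This is a minimal illustration for the 2D-square lattice and not code taken from the CPSP-tools; the function name `hp_energy` and the input format (a list of lattice coordinates in chain order) are assumptions made for this example.

```python
# Neighboring vectors of the 2D-square lattice.
SQUARE_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(sequence, coords, neighbors=SQUARE_NEIGHBORS):
    """Return the HP-model energy of a structure.

    sequence: string over 'H' and 'P'
    coords:   one lattice position per monomer, in chain order
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and structure differ in length")
    if len(set(coords)) != len(coords):
        raise ValueError("structure is not self-avoiding")
    energy = 0
    for i in range(len(coords)):
        # j starts at i + 2 to skip the consecutive pair (i, i + 1).
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                diff = tuple(a - b for a, b in zip(coords[j], coords[i]))
                if diff in neighbors:
                    energy -= 1  # one H-H contact
    return energy


# Hypothetical example: HPPH folded into a U-shape, so that the two
# H-monomers at the chain ends occupy neighboring positions.
u_shape = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", u_shape))  # -1
```

Passing a different list of neighboring vectors makes the same function usable for other lattices.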

? What does 'unrestricted' lattice model mean?

Often lattice protein studies are restricted to compact structures only. Such structures completely fill a cuboid in the lattice, which leads to a further restriction of the structure space. Unfortunately, while such restrictions reduce the computational complexity, they also bias the studies.

CPSP-tools do not use this restriction and utilize the full unrestricted structure space of the protein in the lattice model. In short we use the term unrestricted lattice model to distinguish these studies from e.g. the cuboid confined ones.

? How are the lattice neighboring vectors defined?

The simplest lattice, the 2D square lattice, is defined by 4 orthogonal neighboring positions/vectors. Despite its crude protein structure representation, it is widely used in HP lattice protein studies.

The 3D cubic lattice with 6 neighboring vectors is also widely used. Like the 2D square lattice, it suffers from the parity problem.

The 3D face centered cubic (FCC) lattice is defined by 12 neighboring vectors. The FCC lattice was shown to allow for the best protein structure approximations in a lattice.

[Figures: 2D-Square-Lattice, 3D-Cubic-Lattice, 3D-FCC-Lattice]


For the list of neighboring vectors see the absolute move description.

? What are 'absolute move' strings?

Absolute move strings are a compressed string representation of structures in lattice protein models. Instead of the exact coordinates of the monomers, they describe the lattice-specific neighboring vectors between successive monomers. For this, each possible vector is encoded by a unique move symbol. The number of vectors depends on the lattice. Encodings used by the CPSP-tools:

2D-square             3D-cubic              3D-FCC
Vector      Move      Vector      Move      Vector      Move
(+1,0,0)    F         (+1,0,0)    F         (+1,+1,0)   FR
(-1,0,0)    B         (-1,0,0)    B         (+1,-1,0)   FL
(0,+1,0)    R         (0,+1,0)    R         (-1,+1,0)   BR
(0,-1,0)    L         (0,-1,0)    L         (-1,-1,0)   BL
                      (0,0,+1)    U         (+1,0,+1)   FU
                      (0,0,-1)    D         (+1,0,-1)   FD
                                            (-1,0,+1)   BU
                                            (-1,0,-1)   BD
                                            (0,+1,+1)   RU
                                            (0,+1,-1)   RD
                                            (0,-1,+1)   LU
                                            (0,-1,-1)   LD

Note that we use a two-letter encoding in the face-centered-cubic (FCC) lattice. This allows for an intuitively readable notation, since every neighboring vector in FCC is a combination of two standard 3D-cubic directions. To get a unique encoding, the two letters are given in the order of the X-, Y-, Z-changes.
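To illustrate the encoding, the following Python sketch converts an absolute move string into lattice coordinates. It is not part of the CPSP-tools: the function name `moves_to_coords`, the lattice keys, the start position (0,0,0), and the assumption that FCC moves are concatenated without separators are all choices made for this example.

```python
# Single-letter moves of the 3D-cubic encoding (2D-square uses F, B, R, L).
UNIT = {
    "F": (1, 0, 0), "B": (-1, 0, 0),
    "R": (0, 1, 0), "L": (0, -1, 0),
    "U": (0, 0, 1), "D": (0, 0, -1),
}

MOVES = {
    "square": {m: UNIT[m] for m in "FBRL"},
    "cubic": dict(UNIT),
    # Each FCC move is the sum of two unit vectors, named in X-, Y-, Z-order.
    "fcc": {
        a + b: tuple(x + y for x, y in zip(UNIT[a], UNIT[b]))
        for a, b in ["FR", "FL", "BR", "BL", "FU", "FD",
                     "BU", "BD", "RU", "RD", "LU", "LD"]
    },
}


def moves_to_coords(move_string, lattice):
    """Return the list of monomer positions encoded by an absolute move string."""
    table = MOVES[lattice]
    width = 2 if lattice == "fcc" else 1  # letters per move
    if len(move_string) % width:
        raise ValueError("move string length does not fit the lattice")
    coords = [(0, 0, 0)]  # assumed start position
    for k in range(0, len(move_string), width):
        vec = table[move_string[k:k + width]]
        coords.append(tuple(c + v for c, v in zip(coords[-1], vec)))
    return coords


print(moves_to_coords("FRB", "square"))  # [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(moves_to_coords("FRLU", "fcc"))    # [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
```

A string of n moves always yields n+1 positions, since the first monomer is placed before any move is applied.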

? What is the parity problem?

The 2D square and the 3D cubic lattice allow only angles of 180° or 90° between successive monomers (see lattices). This restricts the possible contacts between monomers of the protein chain. Because of the right angles, only monomers whose sequence positions differ in parity (one even, one odd) can form a contact. Monomers with equal parity can never be neighbors, even if they are at opposite ends of the chain. This is known as the parity problem.

The figure below illustrates the problem. Lattice positions with even coordinate sum are given in blue, odd ones in green. Due to the self-avoidance and connectivity constraints, the chain can only be placed on alternating blue and green positions. As the figure shows, no two green and no two blue nodes are neighbors according to the neighboring vectors of the 2D-square or 3D-cubic lattice.

[Figure: 3D-Cubic-Lattice]
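The coloring argument can be checked directly on the neighboring vectors. The short Python sketch below (an illustration only, with names chosen for this example) tests whether a step along a vector changes the parity of the coordinate sum x+y+z. In the 2D-square and 3D-cubic lattice every step does, so positions of equal color are never neighbors. In the FCC lattice no step does, so the argument does not apply there.

```python
from itertools import product

SQUARE = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
CUBIC = SQUARE + [(0, 0, 1), (0, 0, -1)]
# FCC: all vectors with exactly two non-zero entries, each +1 or -1.
FCC = [v for v in product((-1, 0, 1), repeat=3)
       if sum(abs(c) for c in v) == 2]


def flips_parity(vector):
    """True if a step along `vector` changes the parity of x + y + z."""
    return sum(vector) % 2 == 1


for name, vectors in [("2D-square", SQUARE), ("3D-cubic", CUBIC), ("3D-FCC", FCC)]:
    # Prints True for 2D-square and 3D-cubic, False for 3D-FCC.
    print(name, all(flips_parity(v) for v in vectors))
```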

? What is the degeneracy of a lattice protein sequence?

The degeneracy of a lattice protein sequence is the number of optimal structures the sequence can adopt. This number can be immense in the HP-model due to the simple energy function. Here, the P-monomers have no energy contribution and their placement is not much constrained.

For example, consider the sequence HHPPPP. Its only two H-monomers are consecutive along the chain, so no structure can form an H-H contact and every possible structure is optimal with energy 0. The number of such structures is large (depending on the underlying lattice) because of the long 'P-tail'.
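For very short sequences the degeneracy can be determined by brute force. The following Python sketch enumerates all self-avoiding structures on the 2D-square lattice and counts those with minimal energy. It only illustrates the definition and is not the CPSP-approach; the function names are chosen for this example, and symmetric copies (rotations, reflections) are counted as separate structures, which is an assumption of this sketch.

```python
SQUARE_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def energy(sequence, coords):
    """HP energy: -1 per non-consecutive H-H pair on neighboring positions."""
    e = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" == sequence[j]:
                dx = coords[i][0] - coords[j][0]
                dy = coords[i][1] - coords[j][1]
                if abs(dx) + abs(dy) == 1:
                    e -= 1
    return e


def degeneracy(sequence):
    """Return (optimal energy, number of optimal structures) by enumeration."""
    if not sequence:
        raise ValueError("empty sequence")
    best, count = 0, 0  # HP energies are never positive

    def extend(coords):
        nonlocal best, count
        if len(coords) == len(sequence):
            e = energy(sequence, coords)
            if e < best:
                best, count = e, 1
            elif e == best:
                count += 1
            return
        x, y = coords[-1]
        for dx, dy in SQUARE_NEIGHBORS:
            nxt = (x + dx, y + dy)
            if nxt not in coords:  # self-avoidance
                extend(coords + [nxt])

    extend([(0, 0)])
    return best, count


print(degeneracy("HHPPPP"))  # (0, 284)
print(degeneracy("HPPH"))    # (-1, 8)
```

For HHPPPP all 284 self-avoiding structures of length 6 are optimal, while for HPPH only the 8 U-shaped structures reach the optimal energy of -1. The enumeration grows exponentially with the sequence length, so it is practical for toy examples only.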

? What is the CPSP-approach and what does 'CPSP' stand for?

CPSP stands for 'Constraint-based Protein Structure Prediction'. It is the first complete and exact approach to predict all optimal structures in the 3D-cubic HP-model and was extended to the 3D-FCC lattice as well. It is based on the observation that optimal structures show an (almost) optimal packing of their H-monomers in 3D-lattices. Thus a database of such (sub)optimal packings, so-called H-cores, is precalculated. In the final step, these cores are used to formulate Constraint Satisfaction Problems (CSPs). Utilizing the powerful methods of Constraint Programming, the CPSP-approach solves these problems and is capable of predicting all optimal structures of a given HP-sequence.

For an HP sequence with k H-monomers, the CPSP approach follows the workflow sketched in the following cartoon:

[Figure: CPSP workflow]


For a detailed description of the method please see the publication by Backofen and Will (2006) or check the introductory slides [pdf].

? Is there an offline version of the CPSP-tools?

The CPSP-tools are available as an open source package for local installation and usage at http://www.bioinf.uni-freiburg.de/sw/cpsp/.

The package is provided as C++ source code, including standard GNU automake and configure scripts.

? What is an H-core?

Within the CPSP-approach we use the term H-core to describe the set of all H-monomer positions an HP-protein structure adopts in the lattice. Thus no sequence-dependent connectivity information is present.

[Figures: fcc lattice protein; Structure vs. H-core; fcc lattice H-core]


An optimal H-core is a maximally compact set of lattice positions allowing for the maximal number of contacts between these points. For example, in the 2D-square lattice the optimal H-core of size 4 consists of the four corners of a unit square. Here it is the only one, but usually their number is (much) higher.
A suboptimal H-core is analogously a set of positions with a number of contacts below the maximum. These cores are usually still very compact and connected.

Note: H-cores are lattice-specific!
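The number of contacts of an H-core can be counted directly from its set of positions. The following Python sketch does this for the 2D-square lattice; the function name `core_contacts` and the two example cores are chosen for illustration and are not taken from the CPSP-tools.

```python
def core_contacts(core, neighbors):
    """Count unordered pairs of core positions that are lattice neighbors."""
    core = set(core)
    contacts = 0
    for pos in core:
        for vec in neighbors:
            if tuple(p + v for p, v in zip(pos, vec)) in core:
                contacts += 1
    return contacts // 2  # every pair was seen from both of its ends


SQUARE_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

square_core = [(0, 0), (1, 0), (0, 1), (1, 1)]  # optimal core of size 4
line_core = [(0, 0), (1, 0), (2, 0), (3, 0)]    # a suboptimal core of size 4

print(core_contacts(square_core, SQUARE_NEIGHBORS))  # 4
print(core_contacts(line_core, SQUARE_NEIGHBORS))    # 3
```

Since the neighboring vectors are a parameter, the same set of positions can have a different number of contacts in a different lattice.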

We recursively define the level of suboptimality of H-cores:
  • level 0 : optimal H-cores with maximal number of contacts
  • level i : H-cores with the maximal number of contacts that is less than the contacts of cores in level (i-1)
Note: The number of contacts represented by consecutive levels is not necessarily consecutive as well!
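The level definition amounts to ranking the distinct contact numbers in decreasing order. A minimal Python sketch, using a hypothetical function name and made-up contact numbers:

```python
def suboptimality_levels(contact_counts):
    """Map each distinct contact number to its level (0 = optimal)."""
    distinct = sorted(set(contact_counts), reverse=True)
    return {contacts: level for level, contacts in enumerate(distinct)}


# Hypothetical contact numbers of all H-cores of one size.
print(suboptimality_levels([4, 3, 3, 1]))  # {4: 0, 3: 1, 1: 2}
```

In this made-up example no core has 2 contacts, so level 2 corresponds to 1 contact. This is the situation described in the note above.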

The calculation of optimal and suboptimal H-cores (as needed for the CPSP-approach) is a hard computational problem on its own and relates to the densest packing of spheres in a lattice. It can be solved within the 3D-cubic and 3D-FCC lattice using Constraint Programming techniques. (See Backofen and Will, Optimally compact finite sphere packings - hydrophobic cores in the FCC, 2001).

HPstruct

? Why does HPstruct predict fewer structures than requested?

Usually the degeneracy of a sequence in the HP-model is very high. Therefore, the number of optimal structures to calculate is limited by a requested maximum.
If the degeneracy is below this threshold, HPstruct calculates all optimal structures and displays them in the list. Thus, the list is shorter than requested but complete.

? Why does HPstruct sometimes fail to predict optimal structures?

The HPstruct approach is based on a precalculated database of optimally and suboptimally densely packed H-monomer distributions, so-called H-cores (see FAQs). Currently we have computed a large number of these H-cores for several levels of suboptimality (see H-cores). They can be used for up to about 60 H-monomers.

The current set is sufficient for more than 90% of HP-sequences up to a length of a hundred or more monomers (depending on the number of Hs in the sequence). Some rare sequences have, due to special sequence properties, a sparse H-monomer distribution in their optimal structures. In such cases the current H-core database is not sufficient (not enough levels of suboptimality) to predict these structures. This can be solved by extending the database for the problematic H-monomer number, with one or two additional levels of suboptimal H-cores.

Thus the failure to predict an optimal structure is neither a bug nor an inconsistency of the CPSP-approach. It is a limitation of the currently available H-core database.

? Why does HPstruct allow only a restricted number of H-monomers in the sequence?

As described above, the CPSP approach is based on a precalculated database of H-cores. Therefore, HPstruct can only handle sequences for which corresponding H-cores are in the available database. So we restrict the number of H-monomers in the sequence to the largest H-core size currently available.

LatFit

? What is an RMSD and how are dRMSD and cRMSD calculated?

The root mean square deviation (RMSD) is often used to compare protein structures. Two types of this distance measure are in use:
  • cRMSD (coordinate RMSD) : measures the average displacement of each structure monomer compared to the corresponding one in the second structure. Thus, the measure yields reasonable results only if the two structures are superimposed onto each other.
  • dRMSD (distance RMSD) : measures the average deviation of the structure internal distances compared to the corresponding distances within the second structure. Therefore, this measure is independent of the relative positioning of the structures to each other.
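
Both measures can be sketched in Python as follows. The sketch uses the common textbook definitions (root of the mean squared coordinate displacement, and root of the mean squared difference over all pairwise internal distances) and is not taken from the LatFit source code. The function names are chosen for this example, and `crmsd` assumes that the two structures have already been superimposed; it performs no optimal superposition itself.

```python
import math
from itertools import combinations


def crmsd(a, b):
    """Coordinate RMSD of two equally long, already superimposed structures."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def drmsd(a, b):
    """Distance RMSD over all pairs of monomers within each structure."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("structures must have equal length of at least 2")
    pairs = list(combinations(range(len(a)), 2))
    squared = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(squared / len(pairs))


# Hypothetical example: the second structure is the first one shifted by
# one unit along the X-axis.
first = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
shifted = [(x + 1, y, z) for x, y, z in first]

print(crmsd(first, shifted))  # 1.0
print(drmsd(first, shifted))  # 0.0
```

The example shows the difference between the two measures: the shift changes every coordinate and thus the cRMSD, while all internal distances and therefore the dRMSD stay unchanged.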