CPSP Tools - FAQ
Constraint-based Protein Structure Prediction


Bioinformatics Group
Albert-Ludwigs-University Freiburg



The CPSP-tools package provides programs to solve exactly and completely the problems typical of studies using 3D lattice protein models. Among the tasks addressed are the prediction of globally optimal and/or suboptimal structures as well as sequence design and neutral network exploration.

You might start with the slides of an Introductory talk on lattice proteins, the CPSP approach, and its applications.

Publications

If you use the CPSP-tools for research or education, please cite the following publications:


If you use LatFit from the LatPack package please cite the following publication:


Frequently Asked Questions



Concerning Usage

When I press GO nothing happens. Why?

This might occur if Java applets or JavaScript are not enabled in your browser. Missing mandatory entries can also prevent the page from forwarding; in that case a context-sensitive help message is usually displayed. If no such message appears, please let us know.

You can download and install the Java Runtime from here.

Why is HPview or the HPstruct result page not showing any structure?

For interactive structure viewing we use the Jmol Java applet. Thus Java and JavaScript have to be enabled in your browser. Also check whether your browser is blocking the applet.

Why is the input length of some tools like HPstruct restricted?

Most of the problems connected to proteins are NP-complete, i.e. computationally very hard to solve. Unfortunately, many problems, such as the prediction of optimal structures or the inverse folding problem, remain hard even in the highly simplified HP-model: the computational effort increases exponentially with the input size. Due to hardware restrictions, we decided to restrict the input for the web applications to ensure feasible processing times for all users. If longer proteins are of interest, we suggest installing and using the offline CPSP-tools package. Furthermore, feel free to contact us for help and support.

Why are some tools restricted to the 3D-cubic and 3D-FCC lattices?

Most of the tools in the CPSP-tools package are based on the CPSP-approach by Backofen and Will. This approach is based on precalculated compact H-cores, i.e. compact H-monomer distributions in the lattice. The calculation of these cores is a complex, NP-complete problem on its own. In the 2D-square lattice, optimal protein structures do not necessarily show a compact or even a connected H-core. Thus a precalculated database, as used for the 3D-cubic or 3D-FCC lattice, is not applicable, and combinations of H-cores would have to be considered. This is currently not handled by the CPSP-approach.
Currently, we only know of approaches to calculate all optimal and suboptimal H-cores for the 3D-cubic and 3D-FCC lattices. Therefore, only these two lattices are supported by the CPSP-based tools.
CPSP-independent tools such as HPconvert support other lattices, e.g. the 2D-square lattice.

What CPSP-tools version is interfaced by the web tools?

The provided web tools interface CPSP-tools version 2.4.2.

What color coding is used in the 3D views?

We use the following default color coding for Jmol:

  • Backbone Model
    • green : H monomers
    • gray : P monomers
  • Sidechain Model
    • pink : backbone monomers
    • green : H side chain monomers
    • gray : P side chain monomers
[Figures: backbone model, side chain model]

Why is the side chain structure model not available for all tools?

Currently, only a few tools like HPstruct support the side chain HP model. We are gradually extending the other tools and will enable their online side chain usage as soon as possible. If you urgently need one of the tools, please contact us.

Concerning CPSP-tools

What is the CPSP-approach and what does 'CPSP' stand for?

CPSP stands for 'Constraint-based Protein Structure Prediction'. It is the first complete and exact approach to predict all optimal structures in the 3D-cubic HP-model and was extended to the 3D-FCC lattice as well. It is based on the observation that optimal structures show an (almost) optimal packing of their H-monomers in 3D lattices. Thus a database of such (sub)optimal packings, so-called H-cores, is precalculated. These cores are used in the final step to formulate Constraint Satisfaction Problems (CSPs). Utilizing the powerful methods of Constraint Programming, the CPSP-approach solves these problems and is capable of predicting all optimal structures of a given HP-sequence.

For an HP sequence with k H-monomers, the CPSP approach follows the workflow sketched in the following cartoon:

[Figure: CPSP workflow]


For a detailed description of the method please see the publication by Backofen and Will, 2006.

Where can I read more about the method?

For a detailed description of the CPSP approach, please read the CPSP publications. Furthermore, you can have a look at the introductory talk on lattice proteins, the CPSP approach, and its applications for further literature (last slide).

Is there an offline version of the CPSP-tools?

The CPSP-tools are available as an open source package for local installation and usage.

See: Where can I get the standalone CPSP-tools package?

What constraint programming framework is used?

To implement the CPSP approach we use the Gecode constraint programming library.

... Gecode is an open, free, portable, accessible, and efficient environment for developing constraint-based systems and applications. ...

Where can I get the standalone CPSP-tools package?

The CPSP-tools are freely available at http://www.bioinf.uni-freiburg.de/sw/cpsp/.

The package is provided as a C++ source code package including standard GNU automake and configure scripts.

Why does HPstruct predict fewer structures than requested?

Usually the degeneracy of a sequence in the HP-model is very high. Therefore, the number of calculated optimal structures is restricted by the requested threshold.
If the degeneracy is below the given threshold, HPstruct calculates all optimal structures and displays them in the list. Thus, the list is shorter but complete.

Why does HPstruct sometimes fail to predict optimal structures?

The HPstruct approach is based on a precalculated database of optimally and suboptimally densely packed H-monomer distributions, so-called H-cores (see HPstruct Help or the CPSP publications). Currently we have computed a large number of these H-cores for several levels of suboptimality. They can be used for up to 60 H-monomers.

The current set is sufficient for more than 90% of HP-sequences up to a length of a hundred or more monomers (depending on the number of Hs in the sequence). Some rare sequences have, due to special sequence properties, a sparse H-monomer distribution in their optimal structures. In such cases the current H-core database is not sufficient (not enough levels of suboptimality) to predict these structures. This can be solved by extending the database for the problematic H-monomer number with one or two additional levels of suboptimal H-cores.

Thus the failure to predict an optimal structure is neither a bug nor an inconsistency of the CPSP-approach. It is a limitation of the currently available H-core database.

What is an H-core?

Within the CPSP-approach we use the term H-core to describe the set of all H-monomer positions an HP-protein structure adopts in the lattice. Thus no sequence-dependent connectivity information is present.

[Figures: FCC lattice protein structure vs. its H-core]


An optimal H-core is a maximally compact set of lattice positions allowing for the maximal number of contacts between these points. For example, in the 2D-square lattice the optimal H-core of size 4 consists of the four corners of a unit square. Here it is the only one, but usually the number of optimal H-cores is (much) higher.
A suboptimal H-core is analogously a set of positions with a number of contacts below the maximum. These cores are usually still very compact and connected.

Note: H-cores are lattice-specific!

We recursively define the level of suboptimality of H-cores:
  • level 0 : optimal H-cores with maximal number of contacts
  • level i : H-cores with the largest number of contacts that is smaller than the number of contacts of the cores in level (i-1)
Note: The contact numbers of consecutive levels are not necessarily consecutive!
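
The contact counting and the level definition above can be sketched in a few lines of Python. This is an illustration only and not part of the CPSP-tools: the function names and the three candidate cores are hypothetical, and the levels are computed over this small hand-picked list instead of over all H-cores of a given size.

```python
# Neighbor vectors of the 2D-square lattice (z is always 0).
SQUARE_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]


def count_contacts(core, neighbors=SQUARE_NEIGHBORS):
    """Count unordered pairs of core positions that are lattice neighbors."""
    core = set(core)
    contacts = 0
    for (x, y, z) in core:
        for (dx, dy, dz) in neighbors:
            if (x + dx, y + dy, z + dz) in core:
                contacts += 1
    return contacts // 2  # every pair was seen from both of its ends


def suboptimality_levels(cores, neighbors=SQUARE_NEIGHBORS):
    """Map each distinct contact count to its level (0 = most contacts)."""
    counts = sorted({count_contacts(c, neighbors) for c in cores}, reverse=True)
    return {contacts: level for level, contacts in enumerate(counts)}


# Hypothetical candidate cores of size 4.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]     # unit square
l_shape = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]    # bent line
scattered = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (5, 5, 0)]  # mostly isolated

print(count_contacts(square))
print(suboptimality_levels([square, l_shape, scattered]))
```

In this toy list the contact counts are 4, 3 and 1, so the levels 1 and 2 correspond to non-consecutive contact numbers.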

The calculation of optimal and suboptimal H-cores (as needed for the CPSP-approach) is a hard computational problem on its own and relates to the densest packing of spheres in a lattice. It can be solved within the 3D-cubic and 3D-FCC lattice using Constraint Programming techniques. (See Backofen and Will, Optimally compact finite sphere packings - hydrophobic cores in the FCC, 2001)

Why does HPstruct allow only a restricted number of H-monomers in the sequence?

As described above, the CPSP approach is based on a precalculated database of H-cores. Therefore, HPstruct can only handle sequences for which corresponding H-cores are available in the database. So we restrict the number of H-monomers in the sequence to the largest H-core size currently available.

Concerning Lattice Proteins

What are lattice proteins?

Lattice proteins represent a class of protein models that restrict the conformation/structure space of the represented proteins. This is done in two ways. First, all model atoms are confined to nodes of a discrete lattice. Thus, a discretization of the structure space is achieved that enables a full enumeration of all structures a modeled protein can adopt. Furthermore, lattice protein models represent amino acids with one or a few model atoms (monomers) instead of the real number of different atoms like the C_alpha, H, etc. This yields a further restriction and simplification of the structure space.

[Figures: backbone model, side chain model]

Two common forms of lattice proteins are backbone and side chain models. The first represents each amino acid by one monomer only. Thus a self-avoiding consecutive chain of monomers in the lattice yields a valid structure. In side chain models each amino acid is represented by two monomers: one for the backbone atoms and one for the atoms that form the side chain. This yields a more realistic model at the cost of increased computational complexity and a larger structure space. For an example see the picture below. It shows a side chain lattice protein (thick balls and sticks) and the modeled full atom amino acids (thin lines). A backbone model would consist of the blue balls and sticks only. The monomers are placed in an FCC-lattice.

[Figure: side chain lattice protein]

What is the HP-model?

The HP-model was invented by Kit F. Lau and Ken A. Dill to model hydrophobic forces, which are known to be a driving force in a protein's folding process. First defined on the 2D-square lattice, it is applicable and used in various lattices and even in off-lattice models. In its simplest form it is a backbone model (i.e. one monomer per amino acid), but side chain models are possible as well. The model only distinguishes two groups of amino acids: (H)ydrophobic and (P)olar ones. To determine the energy of a protein structure, only hydrophobic contacts are considered. Thus the number of H-H monomer interactions is counted, excluding pairs that are consecutive along the chain. Two monomers interact if they occupy neighboring positions in the lattice; each such interaction contributes an energy of -1.

For a 2D example including energy calculation see the following link.
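
The energy rule described above can be sketched as follows for the backbone model on the 2D-square lattice. This is a minimal illustration, not code from the CPSP-tools; the function name `hp_energy` is hypothetical, and the structure is given as a move string using the letters F, B, R, L of the absolute move encoding described in this FAQ.

```python
# 2D-square moves (hypothetical helper table for this sketch).
MOVES_2D = {"F": (1, 0), "B": (-1, 0), "R": (0, 1), "L": (0, -1)}


def hp_energy(sequence, moves):
    """Energy of a backbone HP structure on the 2D-square lattice.

    sequence: string over 'H' and 'P'; moves: one move letter per bond.
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    # Walk the chain, starting at the origin.
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES_2D[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("structure is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip consecutive monomers
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:  # lattice neighbors
                    energy -= 1
    return energy


print(hp_energy("HPPH", "FRB"))  # U-shaped fold: the two Hs are in contact
print(hp_energy("HPPH", "FFF"))  # stretched chain: no contact
```

The U-shaped fold of HPPH brings the first and last monomer next to each other and has energy -1, while the stretched chain has energy 0.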

What does unrestricted lattice model mean?

Often lattice protein studies are restricted to compact structures only. Such structures completely fill a cuboid in the lattice, which leads to a further restriction of the structure space. Unfortunately, while such restrictions reduce the computational complexity, they also bias the studies.

CPSP-tools do not use this restriction and utilize the full unrestricted structure space of the protein in the lattice model. In short, we use the term unrestricted lattice model to distinguish these studies from e.g. the cuboid-confined ones.

How are the lattice neighboring vectors defined?

The simplest lattice, the 2D square lattice, is defined by 4 orthogonal neighboring positions/vectors. Despite its crude representation of protein structures, it is widely used in HP lattice protein studies.

The 3D cubic lattice with 6 neighboring vectors is also widely used. Like the 2D square lattice, it exhibits the parity problem.

The 3D face-centered cubic (FCC) lattice is defined by 12 neighboring vectors. The FCC lattice was shown to allow for the best protein structure approximations in a lattice.

[Figures: 2D-square lattice, 3D-cubic lattice, 3D-FCC lattice]


For the list of neighboring vectors see the absolute move description.

What are absolute move strings?

Absolute move strings are a compressed string representation of structures in lattice protein models. Instead of the exact coordinates of the monomers, they describe the lattice-specific neighboring vectors between successive monomers. For this, each possible vector is uniquely encoded; the number of vectors depends on the lattice. Encodings used by the CPSP-tools:

2D-square            3D-cubic             3D-FCC
Vector     Move      Vector     Move      Vector      Move
(+1,0,0)   F         (+1,0,0)   F         (+1,+1,0)   FR
(-1,0,0)   B         (-1,0,0)   B         (+1,-1,0)   FL
(0,+1,0)   R         (0,+1,0)   R         (-1,+1,0)   BR
(0,-1,0)   L         (0,-1,0)   L         (-1,-1,0)   BL
                     (0,0,+1)   U         (+1,0,+1)   FU
                     (0,0,-1)   D         (+1,0,-1)   FD
                                          (-1,0,+1)   BU
                                          (-1,0,-1)   BD
                                          (0,+1,+1)   RU
                                          (0,+1,-1)   RD
                                          (0,-1,+1)   LU
                                          (0,-1,-1)   LD

Note that we use a two-letter encoding in the face-centered cubic (FCC) lattice. This allows for an intuitively readable notation, since every neighboring vector in FCC is a combination of two standard 3D-cubic directions. To get a unique encoding, the two letters are given in the order of the X-, Y-, and Z-changes.
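
As an illustration of the encoding, the following Python sketch rebuilds the FCC move table from the six cubic moves and converts a move string into coordinates. It is not part of the CPSP-tools: the names `CUBIC`, `FCC` and `moves_to_coords` are hypothetical, the first monomer is placed at the origin by assumption, and FCC moves are assumed to be concatenated without separators.

```python
# Single-letter moves of the 3D-cubic lattice
# (the 2D-square lattice uses F, B, R, L only).
CUBIC = {
    "F": (1, 0, 0), "B": (-1, 0, 0),
    "R": (0, 1, 0), "L": (0, -1, 0),
    "U": (0, 0, 1), "D": (0, 0, -1),
}

# Each FCC move combines two cubic moves along different axes;
# the letter of the earlier axis (X before Y before Z) comes first.
AXIS_LETTERS = ["FB", "RL", "UD"]
FCC = {}
for i in range(3):
    for j in range(i + 1, 3):
        for a in AXIS_LETTERS[i]:
            for b in AXIS_LETTERS[j]:
                FCC[a + b] = tuple(p + q for p, q in zip(CUBIC[a], CUBIC[b]))


def moves_to_coords(moves, lattice="cubic"):
    """Convert an absolute move string into a list of 3D coordinates."""
    table, width = (FCC, 2) if lattice == "fcc" else (CUBIC, 1)
    if len(moves) % width != 0:
        raise ValueError("move string length does not fit the encoding")
    coords = [(0, 0, 0)]  # assumption: first monomer sits at the origin
    for k in range(0, len(moves), width):
        dx, dy, dz = table[moves[k:k + width]]
        x, y, z = coords[-1]
        coords.append((x + dx, y + dy, z + dz))
    return coords


print(len(FCC))
print(moves_to_coords("FRB"))
print(moves_to_coords("FRLU", lattice="fcc"))
```

Generating the FCC table this way yields exactly the 12 two-letter moves listed in the table.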

What is the parity problem?

The 2D square and the 3D cubic lattice allow only 180° or 90° angles between successive monomers (see lattices). This leads to a lattice-based restriction of the possible contacts between monomers of the protein chain. Because of the right angles, only monomers whose sequence positions differ in parity (one even, one odd) can form contacts. Monomers with equal parity can never be neighbors, even if they are at opposite ends of the chain. This is known as the parity problem.

The figure below illustrates the problem. Lattice positions with even coordinate sum are given in blue, odd ones in green. Due to the self-avoidance and connectivity constraints, the chain can only be placed on alternating blue and green positions. As the figure shows, no two green nodes and no two blue nodes are neighbors according to the neighboring vectors of the 2D-square or 3D-cubic lattice.

[Figure: 3D-cubic lattice with positions colored by parity]
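
The parity argument can be checked by brute force on short chains. The following Python sketch (hypothetical helper names, 2D-square lattice only) enumerates all self-avoiding walks of a given length and collects the index distances |i - j| of all contacts between non-consecutive monomers.

```python
SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n_steps):
    """Yield every self-avoiding walk with n_steps bonds from the origin."""
    def extend(walk, occupied):
        if len(walk) == n_steps + 1:
            yield list(walk)
            return
        x, y = walk[-1]
        for dx, dy in SQUARE:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                walk.append(nxt)
                occupied.add(nxt)
                yield from extend(walk, occupied)
                walk.pop()
                occupied.remove(nxt)
    yield from extend([(0, 0)], {(0, 0)})


def contact_index_gaps(n_steps):
    """Collect j - i for all non-consecutive contacts over all walks."""
    gaps = set()
    for walk in self_avoiding_walks(n_steps):
        for i in range(len(walk)):
            for j in range(i + 2, len(walk)):
                (xi, yi), (xj, yj) = walk[i], walk[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    gaps.add(j - i)
    return gaps


gaps = contact_index_gaps(7)  # all chains of 8 monomers
print(sorted(gaps))
print(all(g % 2 == 1 for g in gaps))
```

For chains of 8 monomers only odd index distances occur, i.e. contacts form only between one even and one odd sequence position.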

What is the degeneracy of a lattice protein sequence?

The degeneracy of a lattice protein sequence is the number of optimal structures the sequence can adopt. This number can be immense in the HP-model due to the simple energy function: the P-monomers do not contribute to the energy, so their placement is hardly constrained.

For example, have a look at the sequence HHPPPP. All possible structures are optimal structures with energy 0, and there are many of them (the exact number depends on the underlying lattice) due to the long 'P-tail'.
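
For such a short sequence the degeneracy can be determined by complete enumeration. The sketch below does this on the 2D-square lattice; it is an illustration with hypothetical function names, not the method used by the CPSP-tools. Note the assumption on counting: every walk starting at the origin is counted, so rotated and mirrored copies of a structure are counted separately, which may differ from the numbers reported by other tools.

```python
SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def walks(n_steps):
    """Yield every self-avoiding walk with n_steps bonds from the origin."""
    def extend(walk, occupied):
        if len(walk) == n_steps + 1:
            yield list(walk)
            return
        x, y = walk[-1]
        for dx, dy in SQUARE:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                walk.append(nxt)
                occupied.add(nxt)
                yield from extend(walk, occupied)
                walk.pop()
                occupied.remove(nxt)
    yield from extend([(0, 0)], {(0, 0)})


def energy(sequence, walk):
    """HP energy: -1 per non-consecutive H-H pair on neighboring positions."""
    e = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = walk[i], walk[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    e -= 1
    return e


def degeneracy(sequence):
    """Return (optimal energy, number of structures with that energy)."""
    best, count = None, 0
    for walk in walks(len(sequence) - 1):
        e = energy(sequence, walk)
        if best is None or e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count


print(degeneracy("HHPPPP"))
print(degeneracy("HPPH"))
```

Under this counting, HHPPPP has 284 optimal structures with energy 0 (every self-avoiding walk of 5 steps), whereas HPPH has only 8 optimal structures with energy -1.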

What is an RMSD and how to calculate dRMSD and cRMSD?

To compare protein structures, the root mean square deviation (RMSD) is often used. Two variants are common:
  • cRMSD (coordinate RMSD) : measures the root mean square displacement of each monomer of one structure from the corresponding monomer of the second structure. Thus, the measure depends on the superposition of the two structures to yield reasonable results.
  • dRMSD (distance RMSD) : measures the root mean square deviation of the distances within one structure from the corresponding distances within the second structure. Therefore, this measure is independent of the relative positioning of the structures.
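
Both measures can be sketched in a few lines of Python. The function names are hypothetical, and two assumptions are made: `crmsd` expects the two structures to be superimposed already (finding the optimal superposition is a separate step not shown here), and `drmsd` averages over all monomer pairs i < j, which is one common normalization.

```python
from math import dist, sqrt  # math.dist requires Python 3.8 or newer


def crmsd(a, b):
    """Coordinate RMSD of two equally long, already superimposed structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of monomers")
    return sqrt(sum(dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def drmsd(a, b):
    """Distance RMSD over all monomer pairs i < j."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of monomers")
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum((dist(a[i], a[j]) - dist(b[i], b[j])) ** 2 for i, j in pairs)
    return sqrt(total / len(pairs))


structure = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
shifted = [(x + 1, y, z) for (x, y, z) in structure]  # same shape, moved by 1

print(crmsd(structure, shifted))
print(drmsd(structure, shifted))
```

The shifted copy has the same shape as the original, so its dRMSD is 0, while the cRMSD without superposition equals the shift distance of 1. This shows why the cRMSD is only meaningful after superposition.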