The CPSP-tools package provides programs that exactly and completely solve
problems typical of studies using 3D lattice protein models.
Among the tasks addressed are the prediction of globally optimal
and/or suboptimal structures as well as sequence design and neutral
network exploration.
You might start with the slides of an
introductory talk on lattice proteins, the CPSP approach, and its
applications.
CPSP Tools - FAQ
Constraint-based Protein Structure Prediction
Bioinformatics Group
Albert-Ludwigs-University Freiburg
Publications
If you use the CPSP-tools for research or education, please cite the following publications:
- Martin Mann, Sebastian Will, and Rolf Backofen.
CPSP-tools - Exact and Complete Algorithms for High-throughput 3D Lattice Protein Studies.
BMC Bioinformatics, 9, 230, 2008.
- Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, and Rolf Backofen.
CPSP-web-tools: a server for 3D lattice protein studies.
Bioinformatics, 25 (5), 676-677, 2009.
- Rolf Backofen and Sebastian Will.
A constraint-based approach to fast and exact structure prediction in three-dimensional protein models.
Journal of Constraints, 11 (1), 5-30, 2006.
If you use LatFit from the LatPack package, please cite the following publication:
- Martin Mann, Daniel Maticzka, Rhodri Saunders, and Rolf Backofen.
Classifying protein-like sequences in arbitrary lattice protein models using LatPack.
HFSP Journal, 2 (6), 396, 2008.
Frequently Asked Questions
Concerning Usage:
- When I press GO nothing happens. Why?
- Why is HPview or the HPstruct result page not showing any structure?
- Why is the input length of some tools like HPstruct restricted?
- Why are some tools restricted to the 3D-cubic and 3D-FCC lattices?
- What CPSP-tools version is interfaced by the web tools?
- What color coding is used in the 3D views?
- Why is the side chain structure model not available for all tools?
Concerning CPSP-tools:
- What is the CPSP-approach and what does 'CPSP' stand for?
- Where can I read more about the method?
- Is there an offline version of the CPSP-tools?
- What constraint programming framework is used?
- Where can I get the standalone CPSP-tools package?
- Why does HPstruct predict fewer structures than requested?
- Why does HPstruct sometimes fail to predict optimal structures?
- What is an H-core?
- What is the level of suboptimality of an H-core?
- Why does HPstruct allow only a restricted number of H-monomers in the sequence?
Concerning Lattice Proteins:
- What are lattice proteins?
- What is the HP-model?
- What does unrestricted lattice model mean?
- How are the lattice neighboring vectors defined?
- What are absolute move strings?
- What is the parity problem?
- What is the degeneracy of a lattice protein sequence?
- What is an RMSD and how are cRMSD and dRMSD calculated?
Concerning Usage
When I press GO nothing happens. Why?
This might occur if Java applets or JavaScript are not enabled in your browser.
Furthermore, missing mandatory entries can prevent forwarding to the result page.
In this case, a context-sensitive help message is usually shown. If it is
missing, please let us know.
If needed, download and install the Java Runtime.
Why is HPview or the HPstruct result page not showing any structure?
For interactive structure viewing we use the
Jmol Java applet. Thus, Java and
JavaScript have to be enabled in your browser. Also check whether your browser is
blocking the applet.
Why is the input length of some tools like HPstruct restricted?
Many of the problems connected to proteins are NP-complete, i.e.
computationally very hard to solve. Unfortunately, many problems
such as the prediction of optimal structures or the inverse folding problem
stay hard even in the very simplified HP-model: the running time of exact
methods can grow exponentially with the input size.
Due to limited hardware resources, we restrict the input size of the
web applications to ensure feasible processing times for all users.
If longer proteins are of interest, we suggest installing and using
the offline CPSP-tools package. Furthermore, feel
free to contact us for
help and support.
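To illustrate how quickly the search space grows, the following minimal Python sketch counts all backbone structures (self-avoiding walks with the first monomer fixed at the origin) of a chain in the 3D-cubic lattice by plain backtracking. The function name is hypothetical, no symmetry reduction is applied, and the CPSP-tools do not enumerate structures this way; the sketch only shows the size of the space an exact method has to cope with.

```python
# Hypothetical illustration: size of the backbone structure space in the 3D-cubic lattice.
CUBIC = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def count_structures(n_monomers, vectors=CUBIC):
    """Count self-avoiding walks of n_monomers, first monomer fixed at the origin.

    Rotated or mirrored copies are counted separately (no symmetry reduction).
    """
    occupied = {(0, 0, 0)}

    def extend(pos, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy, dz in vectors:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in occupied:
                occupied.add(nxt)  # place the next monomer ...
                total += extend(nxt, remaining - 1)
                occupied.remove(nxt)  # ... and backtrack
        return total

    return extend((0, 0, 0), n_monomers - 1)


if __name__ == "__main__":
    previous = None
    for n in range(2, 9):
        count = count_structures(n)
        ratio = "" if previous is None else f"  (x{count / previous:.2f})"
        print(f"{n} monomers: {count} structures{ratio}")
        previous = count
```

For 2 to 6 monomers this yields 6, 30, 150, 726 and 3534 structures; every additional monomer multiplies the count by almost 5.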
Why are some tools restricted to the 3D-cubic and 3D-FCC lattices?
Most of the tools in the CPSP-tools package are based on the
CPSP-approach by Backofen and Will. This approach
is based on precalculated compact H-cores,
i.e. compact H-monomer distributions in the lattice. The calculation
of these cores is a complex, NP-complete problem on its own.
In the 2D-square lattice, optimal protein structures do not necessarily
show a compact or even a connected H-core. Thus, a precalculated
database, as used for the 3D-cubic and 3D-FCC lattices, is not applicable and
combinations of H-cores have to be considered. This is currently not
handled by the CPSP-approach.
Currently, we only know of approaches to calculate all optimal and suboptimal H-cores for the 3D-cubic and 3D-FCC lattices. Therefore, only these two lattices are supported by the CPSP-based tools.
CPSP-independent tools such as HPconvert support other lattices, e.g. the 2D-square lattice.
What CPSP-tools version is interfaced by the web tools?
The web tools interface CPSP-tools version
2.4.2.
What color coding is used in the 3D views?
We use the following default color coding for
Jmol:
Why is the side chain structure model not available for all tools?
Currently, only a few tools such as HPstruct can be applied to
the side chain HP model. We are gradually extending
the other tools as well and will enable their side chain variants online as soon as
possible. If you urgently need one of these tools, please
contact us.
Concerning CPSP-tools
What is the CPSP-approach and what does 'CPSP' stand for?
CPSP stands for 'Constraint-based Protein Structure Prediction'. It is
the first complete and exact approach to predict all optimal structures
in the 3D-cubic HP-model and was extended to the 3D-FCC lattice as well.
It is based on the observation that optimal structures show an (almost)
optimal packing of their H-monomers in 3D lattices. Thus, a database of
such (sub)optimal packings, so-called H-cores, is
precalculated. In the final step, these cores are used to formulate
constraint satisfaction problems (CSPs). Utilizing the powerful methods
of Constraint Programming, the CPSP-approach solves these problems and
is capable of predicting all optimal structures of a given HP-sequence.
For an HP-sequence with k H-monomers, the CPSP approach follows the workflow sketched in the following cartoon:
For a detailed description of the method, please see the publication by Backofen and Will, 2006.
Where can I read more about the method?
For a detailed description of the CPSP approach please read
- Rolf Backofen and Sebastian Will.
A constraint-based approach to fast and exact structure prediction in three-dimensional protein models.
Journal of Constraints, 11 (1), 5-30, 2006.
Is there an offline version of the CPSP-tools?
The CPSP-tools are available as an open source package for local
installation and usage.
See: Where can I get the standalone CPSP-tools package?
What constraint programming framework is used?
To implement the CPSP approach we use the
Gecode constraint
programming library:
"... Gecode is an open, free, portable, accessible, and efficient environment for developing constraint-based systems and applications. ..."
Where can I get the standalone CPSP-tools package?
The CPSP-tools are freely available at
http://www.bioinf.uni-freiburg.de/sw/cpsp/.
The package is provided as a C++ source code package including standard GNU automake and configure scripts.
Why does HPstruct predict fewer structures than requested?
Usually, the degeneracy of a sequence in the HP-model
is very high. Therefore, the number of
calculated optimal structures is restricted by the requested threshold.
If the degeneracy is below the given threshold, HPstruct calculates all optimal structures and displays them in the list. Thus, the list is shorter than requested but complete.
Why does HPstruct sometimes fail to predict optimal structures?
The HPstruct approach is based on a precalculated database of optimally
and suboptimally densely packed H-monomer distributions, so-called
H-cores (see HPstruct Help
or CPSP publications).
Currently, we have computed a large number of these H-cores
for several levels of suboptimality.
They can be used for up to 60 H-monomers.
The current set is sufficient for more than 90% of HP-sequences up to a length of a hundred or more monomers (depending on the number of Hs in the sequence). Some rare sequences have, due to special sequence properties, a sparse H-monomer distribution in their optimal structures. In such cases, the current H-core database does not contain enough levels of suboptimality to predict these structures. This can be solved by extending the database for the problematic number of H-monomers by one or two additional levels of suboptimal H-cores.
Thus, the failure to predict an optimal structure is neither a bug nor an inconsistency of the CPSP-approach. It is a limitation of the currently available H-core database.
What is an H-core?
Within the CPSP-approach we use the term H-core to describe the
set of all H-monomer positions an HP-protein structure adopts in the
lattice. Thus, an H-core contains no sequence-dependent connectivity information.
Structure vs. H-core
An optimal H-core is a maximally compact set of lattice positions, i.e. a set with the maximal number of contacts between its points. For example, in the 2D-square lattice the optimal H-core of size 4 consists of the corners of a unit square. Here it is the only one (up to translation), but usually the number of optimal H-cores is (much) higher.
A suboptimal H-core is analogously a set of positions with a number of contacts below the maximum. These cores are usually still very compact and connected.
Note: H-cores are lattice-specific!
We recursively define the level of suboptimality of H-cores:
- level 0: optimal H-cores with the maximal number of contacts
- level i: H-cores with the largest number of contacts that is smaller than the number of contacts of the cores in level (i-1)
The calculation of optimal and suboptimal H-cores (as needed for the CPSP-approach) is a hard computational problem on its own and relates to the densest packing of spheres in a lattice. It can be solved for the 3D-cubic and 3D-FCC lattices using Constraint Programming techniques (see Backofen and Will, Optimally compact finite sphere packings - hydrophobic cores in the FCC, 2001).
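The definitions above can be made concrete with a brute-force Python sketch for the 2D-square example. It counts the contacts of a point set and groups all point sets of a given size inside a small bounding box by their number of contacts, so that index i of the result corresponds to level i. The box size and all function names are hypothetical choices for illustration; such exhaustive enumeration only works for tiny cores and is not how the H-core database was computed.

```python
from itertools import combinations, product

SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def contacts(core, vectors=SQUARE):
    """Number of unordered pairs of positions in core that are lattice neighbors.

    Assumes vectors contains v and -v for every neighbor vector v.
    """
    points = set(core)
    hits = 0
    for p in points:
        for v in vectors:
            q = tuple(a + b for a, b in zip(p, v))
            if q in points:
                hits += 1
    return hits // 2  # every pair was seen from both of its ends


def normalize(core):
    """Translate a core so that its minimal coordinates are zero."""
    mins = [min(axis) for axis in zip(*core)]
    return frozenset(tuple(a - m for a, m in zip(p, mins)) for p in core)


def cores_by_level(size, box, vectors=SQUARE):
    """Group all cores of the given size inside a box by contacts, best first.

    Returns [(contacts, set_of_cores), ...]; cores are distinct up to translation
    only, so rotated copies count separately. Too small a box cuts off cores.
    """
    dims = len(vectors[0])
    cells = list(product(range(box), repeat=dims))
    by_contacts = {}
    for core in combinations(cells, size):
        by_contacts.setdefault(contacts(core, vectors), set()).add(normalize(core))
    return [(c, by_contacts[c]) for c in sorted(by_contacts, reverse=True)]


levels = cores_by_level(4, box=4)
for level, (n_contacts, cores) in enumerate(levels[:2]):
    print(f"level {level}: {n_contacts} contacts, {len(cores)} distinct core(s)")
```

For size 4 this reports the unit square as the single optimal core (level 0, 4 contacts) and 18 cores with 3 contacts on level 1.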
Why does HPstruct allow only a restricted number of H-monomers in the sequence?
As described above, the CPSP approach is based
on a precalculated database of H-cores.
Therefore, HPstruct can only handle sequences for which corresponding
H-cores are available in the database. Hence, the number of
H-monomers in the sequence is restricted to the largest H-core size
currently available.
Concerning Lattice Proteins
What are lattice proteins?
Lattice proteins represent a class of protein models that restrict the
conformation/structure space of the represented proteins. This is done
in two ways. First, all model atoms are confined to nodes of a discrete
lattice. Thus, a discretization of the structure space is achieved that
enables a full enumeration of all structures a modeled protein can
adopt. Second, lattice protein models represent amino acids with
one or a few model atoms (monomers) instead of the full set of
atoms like C-alpha, H, etc. This further restricts
and simplifies the structure space.
Two common forms of lattice proteins are backbone and side chain models. The first represents each amino acid by one monomer only. Thus, a self-avoiding chain whose consecutive monomers occupy neighboring lattice positions yields a valid structure. In side chain models each amino acid is represented by two monomers: one for the backbone atoms and one for the atoms that form the side chain. This yields a more realistic model at the cost of a larger structure space and increased computational complexity. For an example see the picture below. It shows a side chain lattice protein (thick balls and sticks) and the modeled full atom amino acids (thin lines). A backbone model would consist of the blue balls and sticks only. The monomers are placed in an FCC lattice.
(Figure panels: Backbone | Sidechain)
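The two model types can be summarized by the validity conditions they impose on a structure. The following Python sketch checks them for a lattice given by its neighboring vectors (2D-square here for brevity; the 12 FCC vectors work the same way). It assumes, as is common for side chain lattice models but not stated above, that every side chain monomer must occupy a position neighboring its own backbone monomer. The function names are hypothetical.

```python
SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def is_neighbor(p, q, vectors):
    """True if q - p is one of the lattice neighboring vectors."""
    return tuple(b - a for a, b in zip(p, q)) in vectors


def valid_backbone(backbone, vectors=SQUARE):
    """Backbone model: consecutive monomers are neighbors, no position is used twice."""
    if len(set(backbone)) != len(backbone):
        return False
    return all(is_neighbor(p, q, vectors) for p, q in zip(backbone, backbone[1:]))


def valid_side_chain(backbone, side_chains, vectors=SQUARE):
    """Side chain model: a valid backbone plus one side chain monomer per amino acid.

    Assumption: each side chain monomer neighbors its own backbone monomer, and
    all monomers (backbone and side chain) occupy pairwise distinct positions.
    """
    if len(side_chains) != len(backbone) or not valid_backbone(backbone, vectors):
        return False
    all_positions = list(backbone) + list(side_chains)
    if len(set(all_positions)) != len(all_positions):
        return False
    return all(is_neighbor(b, s, vectors) for b, s in zip(backbone, side_chains))


backbone = [(0, 0), (1, 0), (2, 0)]
print(valid_side_chain(backbone, [(0, 1), (1, 1), (2, 1)]))   # True
print(valid_side_chain(backbone, [(0, 1), (1, -1), (1, 1)]))  # False: (1, 1) does not neighbor (2, 0)
```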
What is the HP-model?
The HP-model was introduced by Kit F. Lau and Ken A. Dill to model
hydrophobic forces, which are known to be a driving force in the protein
folding process. First defined on the 2D-square lattice, it is applicable
to and used in various lattices and even in off-lattice models. In its simplest
form it is a backbone model (i.e. one monomer per amino acid), but
side chain models are possible as well. The model only distinguishes two groups of
amino acids: (H)ydrophobic and (P)olar ones. The energy of a protein
structure is determined by hydrophobic contacts only:
the number of H-H-monomer contacts is counted, excluding pairs that are
consecutive along the chain. Two monomers are in contact if they occupy
neighboring positions in the lattice; each H-H contact contributes an energy of -1.
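The energy function just described fits in a few lines. The following Python sketch takes a sequence and the lattice coordinates of its monomers (backbone model) and returns the HP energy. The function name is hypothetical, and the structure is assumed to be valid, i.e. self-avoiding with consecutive monomers on neighboring positions.

```python
SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(sequence, coords, vectors=SQUARE):
    """HP energy: -1 for every pair of H-monomers on neighboring lattice positions
    that are not consecutive along the chain."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per monomer is required")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # j >= i + 2 skips chain neighbors
            if sequence[i] == "H" and sequence[j] == "H":
                diff = tuple(b - a for a, b in zip(coords[i], coords[j]))
                if diff in vectors:
                    energy -= 1
    return energy


# HPPH folded into a U-shape: the two H-monomers end up on neighboring positions.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
```

Pairs with j = i + 1 are skipped because consecutive monomers are always lattice neighbors, so their contact does not depend on the structure.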
For a 2D example including energy calculation see the following link.
What does unrestricted lattice model mean?
Lattice protein studies are often restricted to compact
structures only. Such structures completely fill a cuboid in the lattice,
which further restricts the structure space. Unfortunately, while
such restrictions reduce the computational complexity, they also bias
the studies.
CPSP-tools do not use this restriction and utilize the full unrestricted structure space of the protein in the lattice model. For short, we use the term unrestricted lattice model to distinguish these studies from, e.g., the cuboid-confined ones.
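The size difference between the two structure spaces can be checked on a tiny example. The following Python sketch counts, in the 2D-square lattice, all structures of a 9-monomer chain (first monomer at the origin, no symmetry reduction) and the compact ones that completely fill a 3x3 square. The 2D setting, the chain length and the function names are illustrative assumptions.

```python
from itertools import product

SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def count_walks(n_monomers, start, allowed=None):
    """Count self-avoiding walks of n_monomers beginning at start.

    If allowed is given, every monomer must lie inside that set of positions.
    """
    occupied = {start}

    def extend(pos, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in SQUARE:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in occupied or (allowed is not None and nxt not in allowed):
                continue
            occupied.add(nxt)
            total += extend(nxt, remaining - 1)
            occupied.remove(nxt)
        return total

    return extend(start, n_monomers - 1)


def count_compact(width, height):
    """Count chains that fill a width x height box completely (Hamiltonian paths)."""
    box = set(product(range(width), range(height)))
    return sum(count_walks(width * height, start, box) for start in box)


unrestricted = count_walks(9, (0, 0))
compact = count_compact(3, 3)
print(unrestricted, compact)  # 5916 40
```

Under these assumptions only 40 of the 5916 structures are compact, i.e. less than 1%, which shows how strongly a cuboid restriction shrinks the structure space.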
How are the lattice neighboring vectors defined?
The 2D-square lattice, the simplest one, is defined by 4 orthogonal neighboring
vectors. Despite its crude representation of protein structures,
it is widely used in HP lattice protein studies.
The 3D-cubic lattice with 6 neighboring vectors is also widely used. Like the 2D-square lattice, it shows the parity problem.
The 3D face-centered cubic (FCC) lattice is defined by 12 neighboring vectors. The FCC lattice has been shown to allow for the best protein structure approximations in a lattice.
For the list of neighboring vectors see the description of absolute move strings.
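The three neighborhoods can also be generated programmatically. This Python sketch builds the 2D-square and 3D-cubic vectors as unit steps along the axes and the 3D-FCC vectors as sums of unit steps along two different axes; the function names are hypothetical.

```python
from itertools import combinations, product


def square_vectors():
    """4 unit steps along the X and Y axes (Z fixed to 0)."""
    return [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]


def cubic_vectors():
    """6 unit steps along the X, Y and Z axes."""
    vectors = []
    for axis in range(3):
        for sign in (1, -1):
            v = [0, 0, 0]
            v[axis] = sign
            vectors.append(tuple(v))
    return vectors


def fcc_vectors():
    """12 vectors: one unit step along each of two different axes."""
    vectors = []
    for axis_a, axis_b in combinations(range(3), 2):
        for sign_a, sign_b in product((1, -1), repeat=2):
            v = [0, 0, 0]
            v[axis_a], v[axis_b] = sign_a, sign_b
            vectors.append(tuple(v))
    return vectors


for name, vecs in [("2D-square", square_vectors()), ("3D-cubic", cubic_vectors()), ("3D-FCC", fcc_vectors())]:
    print(name, len(vecs))  # 4, 6 and 12 vectors
```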
What are absolute move strings?
Absolute move strings are a compressed string representation of
structures in lattice protein models. Here, the lattice-specific
neighboring vectors between successive monomers are described instead
of their exact coordinates.
Each possible vector is encoded uniquely, and the number of codes
depends on the lattice. Encodings used by the CPSP-tools:
2D-square:
| Vector | Move |
| (+1,0,0) | F |
| (-1,0,0) | B |
| (0,+1,0) | R |
| (0,-1,0) | L |
3D-cubic:
| Vector | Move |
| (+1,0,0) | F |
| (-1,0,0) | B |
| (0,+1,0) | R |
| (0,-1,0) | L |
| (0,0,+1) | U |
| (0,0,-1) | D |
3D-FCC:
| Vector | Move |
| (+1,+1,0) | FR |
| (+1,-1,0) | FL |
| (-1,+1,0) | BR |
| (-1,-1,0) | BL |
| (+1,0,+1) | FU |
| (+1,0,-1) | FD |
| (-1,0,+1) | BU |
| (-1,0,-1) | BD |
| (0,+1,+1) | RU |
| (0,+1,-1) | RD |
| (0,-1,+1) | LU |
| (0,-1,-1) | LD |
Note that we use a two-letter encoding in the face-centered cubic (FCC) lattice. This allows for an intuitive, readable notation, since every neighboring vector in FCC is the sum of two standard 3D-cubic directions. The letters are always given in the order of the X-, Y-, and Z-changes, which makes the encoding unique.
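Decoding an absolute move string back into coordinates is a cumulative sum over the vectors in the table. The following Python sketch does this for the three lattices, reading one letter per move in 2D-square and 3D-cubic and two letters per move in 3D-FCC, and checks self-avoidance. Only the letter codes are taken from the table; the function, the dictionary names and the lattice keywords are hypothetical and not part of the CPSP-tools interface.

```python
CUBIC_MOVES = {
    "F": (1, 0, 0), "B": (-1, 0, 0),
    "R": (0, 1, 0), "L": (0, -1, 0),
    "U": (0, 0, 1), "D": (0, 0, -1),
}
SQUARE_MOVES = {k: CUBIC_MOVES[k] for k in "FBRL"}
# Each FCC move is the sum of two cubic directions, written in X, Y, Z order.
FCC_MOVES = {
    a + b: tuple(x + y for x, y in zip(CUBIC_MOVES[a], CUBIC_MOVES[b]))
    for a, b in ["FR", "FL", "BR", "BL", "FU", "FD", "BU", "BD", "RU", "RD", "LU", "LD"]
}


def decode(moves, lattice):
    """Turn an absolute move string into monomer coordinates, starting at the origin."""
    table = {"square": SQUARE_MOVES, "cubic": CUBIC_MOVES, "fcc": FCC_MOVES}[lattice]
    width = 2 if lattice == "fcc" else 1
    if len(moves) % width:
        raise ValueError("FCC move strings must consist of two-letter moves")
    coords = [(0, 0, 0)]
    for i in range(0, len(moves), width):
        step = table[moves[i:i + width]]  # raises KeyError for unknown move codes
        coords.append(tuple(c + s for c, s in zip(coords[-1], step)))
    if len(set(coords)) != len(coords):
        raise ValueError("structure is not self-avoiding")
    return coords


print(decode("FRB", "square"))  # [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(decode("FRLU", "fcc"))    # [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
```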
What is the parity problem?
The 2D-square and the 3D-cubic lattice allow only for bond angles of 180° or
90° between
successive monomers (see the lattice neighboring vectors). This leads to
a lattice-based restriction of possible contacts between monomers of the
protein chain. Because of the right angles, only monomers whose sequence positions have different
parity can be in contact. Monomers with equal
parity can never be neighbors, even if they are at opposite ends of
the chain. This is known as the parity problem.
The figure below illustrates the problem. Lattice positions with even coordinate sum are given in blue, odd ones in green. Since successive monomers have to occupy neighboring positions, the chain can only be placed on alternating blue and green positions. As shown in the figure, no two nodes of the same color are neighbors according to the neighboring vectors of the 2D-square or 3D-cubic lattice.
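The parity argument can be checked mechanically. The following Python sketch enumerates all short chains in the 3D-cubic and the 3D-FCC lattice and records the sequence distances |i - j| of all non-consecutive monomer pairs that are lattice neighbors. In the cubic lattice only odd distances occur, while FCC also allows even ones (e.g. a triangle of three monomers). The chain lengths and function names are arbitrary choices for illustration.

```python
from itertools import combinations, product

CUBIC = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
FCC = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]


def walks(n_monomers, vectors):
    """Yield all self-avoiding walks with n_monomers, starting at the origin."""
    def extend(path):
        if len(path) == n_monomers:
            yield list(path)
            return
        last = path[-1]
        for v in vectors:
            nxt = tuple(a + b for a, b in zip(last, v))
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0, 0)])


def contact_distances(n_monomers, vectors):
    """Set of sequence distances j - i > 1 between monomers on neighboring positions."""
    neighbor_set = set(vectors)
    distances = set()
    for path in walks(n_monomers, vectors):
        for i, j in combinations(range(n_monomers), 2):
            diff = tuple(b - a for a, b in zip(path[i], path[j]))
            if j - i > 1 and diff in neighbor_set:
                distances.add(j - i)
    return distances


print("cubic:", sorted(contact_distances(6, CUBIC)))  # [3, 5]: only odd distances
print("FCC:  ", sorted(contact_distances(4, FCC)))    # [2, 3]
```

The root cause is visible in the vectors themselves: every 3D-cubic vector changes the coordinate sum by ±1, flipping its parity, whereas every FCC vector changes it by 0 or ±2.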
What is the degeneracy of a lattice protein sequence?
The degeneracy of a lattice protein sequence is the number of optimal
structures the sequence can adopt. This number can be immense in the
HP-model due to the simple energy function. Here, the P-monomers make
no energy contribution and their placement is hardly constrained.
For example, consider the sequence HHPPPP. Its two H-monomers are consecutive along the chain and can never form a contact, so all possible structures are optimal with energy 0. Their number is large (depending on the underlying lattice) due to the long 'P-tail'.
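For short chains the degeneracy can be determined by brute force: enumerate all structures, compute their HP energies, and count those with minimal energy. The following Python sketch does this in the 2D-square lattice. Note the assumptions: the first monomer is fixed at the origin, but rotated or mirrored copies are counted separately (no symmetry reduction), so the numbers may differ from degeneracies reported by tools that count structures up to symmetry. The function names are hypothetical.

```python
SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def structures(n_monomers):
    """Yield all self-avoiding walks (as coordinate lists) starting at the origin."""
    def extend(path):
        if len(path) == n_monomers:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in SQUARE:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0)])


def hp_energy(sequence, path):
    """-1 per pair of non-consecutive H-monomers on neighboring lattice positions."""
    energy = 0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):
            if sequence[i] == sequence[j] == "H":
                dx, dy = path[j][0] - path[i][0], path[j][1] - path[i][1]
                if abs(dx) + abs(dy) == 1:
                    energy -= 1
    return energy


def degeneracy(sequence):
    """Return (optimal energy, number of optimal structures)."""
    energies = [hp_energy(sequence, p) for p in structures(len(sequence))]
    best = min(energies)
    return best, energies.count(best)


print(degeneracy("HPPH"))    # (-1, 8)
print(degeneracy("HHPPPP"))  # (0, 284)
```

Under these assumptions all 284 structures of HHPPPP are optimal, which illustrates how the 'P-tail' inflates the degeneracy.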
What is an RMSD and how are cRMSD and dRMSD calculated?
The root mean square deviation (RMSD) is often used to compare protein structures.
There are two common types of this distance measure:
- cRMSD (coordinate RMSD): measures the root mean square displacement of each monomer of one structure from the corresponding monomer in the second structure. Thus, the measure depends on how the two structures are superimposed; they have to be optimally superimposed to yield reasonable results.
- dRMSD (distance RMSD): measures the root mean square deviation of the structure-internal distances (between all monomer pairs) from the corresponding distances within the second structure. Therefore, this measure is independent of the relative positioning of the structures to each other.
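Both measures are easy to compute for two structures of equal length. The Python sketch below assumes coordinate lists in which corresponding monomers share the same index. For simplicity, its cRMSD only superimposes the centroids and does not search for the optimal rotation; a full implementation would additionally rotate one structure, e.g. with the Kabsch algorithm. The dRMSD compares all pairwise internal distances and therefore needs no superposition. The function names are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def centered(coords):
    """Shift a structure so that its centroid lies at the origin."""
    n = len(coords)
    centroid = [sum(axis) / n for axis in zip(*coords)]
    return [tuple(c - m for c, m in zip(p, centroid)) for p in coords]


def crmsd(a, b):
    """Coordinate RMSD after centroid superposition (no rotational fitting)."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    a, b = centered(a), centered(b)
    return sqrt(sum(dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def drmsd(a, b):
    """Distance RMSD over all monomer pairs i < j; independent of rotation and translation."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    pairs = list(combinations(range(len(a)), 2))
    return sqrt(sum((dist(a[i], a[j]) - dist(b[i], b[j])) ** 2 for i, j in pairs) / len(pairs))


a = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1)]
rotated = [(-y, x, z) for x, y, z in a]     # 90 degree rotation around the Z-axis
shifted = [(x + 5, y, z) for x, y, z in a]  # pure translation
print(round(crmsd(a, shifted), 6), round(drmsd(a, shifted), 6))  # 0.0 0.0
print(crmsd(a, rotated) > 0, round(drmsd(a, rotated), 6))        # True 0.0
```

The rotated copy shows the difference between the two measures: without rotational fitting the cRMSD is nonzero, while the dRMSD correctly reports identical structures.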