Introduction

HPstruct calculates optimal structures, and the corresponding minimal energy, for a given HP-sequence in unrestricted 3D-lattice HP-models. The structures are given in absolute move representation, using the following coordinate encoding for visualization: F/B ±Z, L/R ±X, U/D ±Y.
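
As an illustration of the absolute move representation, the following minimal sketch converts a cubic-lattice move string into 3D coordinates. It assumes that the first letter of each pair (F, L, U) denotes the positive axis direction and that the first monomer sits at the origin; the function names are hypothetical and not part of HPstruct.

```python
# Assumed sign convention: F/L/U are the positive directions, B/R/D the negative ones.
MOVES = {
    "F": (0, 0, 1), "B": (0, 0, -1),   # F/B along Z
    "L": (1, 0, 0), "R": (-1, 0, 0),   # L/R along X
    "U": (0, 1, 0), "D": (0, -1, 0),   # U/D along Y
}


def moves_to_coords(moves, start=(0, 0, 0)):
    """Return the list of monomer positions described by an absolute move string."""
    coords = [start]
    for move in moves:
        dx, dy, dz = MOVES[move]
        x, y, z = coords[-1]
        coords.append((x + dx, y + dy, z + dz))
    return coords


def is_self_avoiding(coords):
    """A walk is self-avoiding if no lattice position is visited twice."""
    return len(set(coords)) == len(coords)
```

A move string of length n describes a structure of n + 1 monomers, since each move connects two successive monomers.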

To find these structures, a so-called H-core database is used, which contains precalculated optimal and suboptimal packings of H-monomers (H-cores). For a concrete sequence S, the approach systematically examines the H-cores compatible with S in order of decreasing contact number. For each core, it attempts to thread the sequence through the core. Threading means finding a placement of the monomers of S as a self-avoiding walk such that all H-monomers lie within the given H-core and all P-monomers lie outside of it. Since the H-cores are considered in order of decreasing contacts, the first successful threading yields a structure with globally minimal energy. At that point the algorithm has also proven that no structure of S forms more HH-contacts.
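
The threading condition and the core enumeration order can be sketched as follows for the cubic lattice. This is only an illustration under stated assumptions: the `thread` argument is a hypothetical placeholder for the actual constraint-based search, and none of the names below belong to HPstruct.

```python
def is_lattice_neighbor(a, b):
    """Cubic lattice: neighbors differ by exactly one unit step."""
    return sum(abs(p - q) for p, q in zip(a, b)) == 1


def is_valid_threading(sequence, coords, core):
    """Check that coords is a self-avoiding walk placing every H inside
    the core and every P outside of it."""
    if len(sequence) != len(coords):
        return False
    if len(set(coords)) != len(coords):  # self-avoidance
        return False
    if any(not is_lattice_neighbor(a, b) for a, b in zip(coords, coords[1:])):
        return False
    core = set(core)
    h_positions = {c for s, c in zip(sequence, coords) if s == "H"}
    p_positions = {c for s, c in zip(sequence, coords) if s == "P"}
    return h_positions <= core and p_positions.isdisjoint(core)


def first_threadable_core(sequence, cores_with_contacts, thread):
    """Try cores in order of decreasing contact number and return the first
    successful threading as (contacts, coords), or None if all cores fail.

    thread(sequence, core) is a hypothetical search routine that returns a
    coordinate list or None.
    """
    for contacts, core in sorted(cores_with_contacts, key=lambda item: -item[0]):
        coords = thread(sequence, core)
        if coords is not None and is_valid_threading(sequence, coords, core):
            return contacts, coords
    return None
```

Because the cores are visited from the highest contact number downwards, the first hit cannot be beaten by any later core, which mirrors the optimality argument given above.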

To handle the common case of highly degenerate HP-sequences (those with many optimal structures), HPstruct offers the possibility to limit the number of predicted structures or to generate only a representative subset. Such a subset contains only structures that are pairwise separated by at least a user-defined distance k. The distance measure is the Hamming distance on the move strings, i.e. the number of positions at which the strings differ.
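
The distance measure can be sketched as below. The greedy selection strategy shown here is an assumption made for illustration; how HPstruct actually chooses the representatives is not described on this page, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equally long move strings differ."""
    if len(a) != len(b):
        raise ValueError("move strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def representative_subset(move_strings, k):
    """Greedily keep each structure that is at least distance k away
    from every structure kept so far."""
    chosen = []
    for candidate in move_strings:
        if all(hamming(candidate, kept) >= k for kept in chosen):
            chosen.append(candidate)
    return chosen
```

For example, with k = 2 the strings "FLU" and "FLD" (distance 1) cannot both be in the subset, while "FLU" and "BRD" (distance 3) can.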

Currently, only a restricted number of suboptimality levels is precalculated for each H-core size. For a small fraction of sequences (less than 10%), these levels are not sufficient to find a valid optimal conformation. In such cases no structure can be computed, but a proven and usually close lower bound on the optimal energy is still given. This is neither a bug nor an inconsistency of the CPSP approach, only a limitation of the current H-core database, as described in the Frequently Asked Questions.

When using HPstruct please cite:

- Martin Mann, Sebastian Will, and Rolf Backofen. CPSP-tools - Exact and Complete Algorithms for High-throughput 3D Lattice Protein Studies. BMC Bioinformatics, 9, 230, 2008.
- Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, and Rolf Backofen. CPSP-web-tools: a server for 3D lattice protein studies. Bioinformatics, 25 (5), 676-677, 2009.
- Martin Mann, Rolf Backofen, and Sebastian Will. Equivalence Classes of Optimal Structures in HP Protein Models Including Side Chains. Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009.
- Rolf Backofen and Sebastian Will. A constraint-based approach to fast and exact structure prediction in three-dimensional protein models. Journal of Constraints, 11 (1), 5-30, 2006.

Results are computed with HPstruct version 2.4.6 linking Gecode 1.3.0 and BIU 2.3.6

Overview

The following parameters are used to control the execution of HPstruct:

- Input

Furthermore, additional information is available:

- Output Description
- Input Examples
  - Backbone-only representatives in FCC lattice
  - Sidechain structures in CUB lattice
- Frequently Asked Questions

Input

HP sequence

HP sequence for which optimal structures and their minimal energy are to be calculated. It should contain only the characters "H" (hydrophobic) and "P" (polar).

The parameter constraints are: String length has to be in range (1,300). Maximally 1 line is allowed. Only the HP alphabet is allowed for specification.
Defaults to ()
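
The constraints above can be checked before submitting a sequence. The sketch below assumes that the length range is inclusive (1 to 300) and that only upper-case letters are accepted; the function name is hypothetical.

```python
def validate_hp_sequence(text, max_length=300):
    """Return the cleaned HP sequence or raise ValueError if it violates
    the documented constraints (one line, length 1..max_length, HP alphabet)."""
    lines = text.strip().splitlines()
    if len(lines) != 1:
        raise ValueError("exactly one line is allowed")
    sequence = lines[0].strip()
    if not 1 <= len(sequence) <= max_length:
        raise ValueError(f"length must be between 1 and {max_length}")
    invalid = set(sequence) - set("HP")
    if invalid:
        raise ValueError(f"invalid characters: {sorted(invalid)}")
    return sequence
```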

Lattice

The 3D-lattice model to use for optimal structure calculation. Currently, HPstruct supports the following lattices:

CUB : Unrestricted 3D-cubic lattice
FCC : Unrestricted 3D face centered cubic lattice

Unrestricted means that the structures are not restricted e.g. to a compact cube as done by other approaches.
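
The two lattices differ in their neighborhood: each position has 6 neighbors in the cubic lattice and 12 in the face centered cubic lattice. The sketch below uses a common integer embedding of the FCC lattice (neighbor vectors with exactly two non-zero entries of ±1); this embedding is an assumption for illustration and not necessarily the one used internally by HPstruct.

```python
from itertools import product

# Cubic lattice: unit steps along exactly one axis (6 vectors).
CUB_NEIGHBORS = [v for v in product((-1, 0, 1), repeat=3)
                 if sum(abs(c) for c in v) == 1]

# FCC lattice (assumed embedding): steps along exactly two axes (12 vectors).
FCC_NEIGHBORS = [v for v in product((-1, 0, 1), repeat=3)
                 if sum(abs(c) for c in v) == 2]


def neighbors(point, neighbor_vectors):
    """All lattice positions adjacent to point for the given neighborhood."""
    x, y, z = point
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in neighbor_vectors]
```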

Structure Model

The CPSP approach enables the calculation of optimal structures for both the backbone and the sidechain HP-model. In backbone HP-models, each amino acid is represented by a single hydrophobic or polar monomer. Thus a structure is a connected self-avoiding chain of such monomers. In sidechain HP-models, an amino acid is modeled by two monomers: one representing the C_alpha atom (or all backbone atoms), the other representing the whole amino acid side chain group. The backbone and sidechain monomers of an amino acid occupy neighboring lattice positions. While backbone monomers are neutral, the sidechain monomers are considered to be hydrophobic or polar and thus define the energy of a structure. A structure in the sidechain model is a self-avoiding chain of successive backbone monomers in which each backbone monomer is also connected to its sidechain monomer, again without overlaps.

Allow symmetric structures

By default, HPstruct does not calculate symmetric optimal structures, i.e. no two resulting structures can be transformed into each other by rotation or reflection.

To allow for applications where these symmetric structures are needed, the symmetry breaking can be disabled.

The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)
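
To illustrate what symmetric structures are, the following sketch tests whether two cubic-lattice structures can be mapped onto each other by one of the 48 rotations and reflections of the cube (axis permutations combined with sign flips). It is only an explanatory check with hypothetical function names, not the symmetry breaking used inside HPstruct, and it does not consider reversing the chain direction.

```python
from itertools import permutations, product


def cubic_symmetries():
    """Yield the 48 rotations/reflections of the cubic lattice as functions."""
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            # Default arguments bind the current perm/signs to each function.
            yield lambda p, perm=perm, signs=signs: tuple(
                signs[i] * p[perm[i]] for i in range(3))


def normalize(coords):
    """Translate a structure so that its first monomer sits at the origin."""
    x0, y0, z0 = coords[0]
    return [(x - x0, y - y0, z - z0) for x, y, z in coords]


def are_symmetric(a, b):
    """True if some rotation or reflection maps structure a onto structure b."""
    a, b = normalize(a), normalize(b)
    return any([t(p) for p in a] == b for t in cubic_symmetries())
```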

Number of Structures

The maximal number of optimal structures to calculate.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 50.
Defaults to (10)

One representative per equivalence class

Identifies only one representative structure per structural equivalence class. Thus, resulting structures show different placements of their hydrophobic residues. For further information please refer to the publication by Mann et al. (2009).

The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)

Output Description

HPstruct computes the minimal energy structures for the given HP sequence and thus provides the global lower energy bound for any structure formed by the sequence. If no globally optimal structure can be computed, the lower bound is still provided.
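
For reference, the energy of a backbone-only structure in the cubic lattice can be recomputed from its coordinates. This page does not state the energy function explicitly; the sketch assumes the usual HP-model definition, in which each contact between two H-monomers that are lattice neighbors but not successive in the chain contributes -1. The function names are hypothetical.

```python
def hh_contacts(sequence, coords):
    """Count HH-contacts between non-successive monomers (cubic lattice)."""
    index_at = {position: i for i, position in enumerate(coords)}
    count = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in positive directions so that each pair is counted once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                count += 1
    return count


def energy(sequence, coords):
    """Assumed HP-model energy: minus the number of HH-contacts."""
    return -hh_contacts(sequence, coords)
```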

Input Examples

Backbone-only representatives in FCC lattice

Exemplifies the prediction of optimal backbone-only lattice protein structures within the FCC lattice that show different placements of their hydrophobic residues, i.e. they are representatives of the corresponding equivalence classes.

The example's result can be directly accessed here

Sidechain structures in CUB lattice

Exemplifies the prediction of optimal lattice protein structures including sidechain representations within the unrestricted cubic lattice.

The example's result can be directly accessed here

Frequently Asked Questions

Why does HPstruct predict fewer structures than requested?
Why does HPstruct sometimes fail to predict optimal structures?
Why does HPstruct allow only a restricted number of H-monomers in the sequence?

If your question is not listed, please send it to us!

Why does HPstruct predict fewer structures than requested?

Usually the degeneracy of a sequence in the HP-model is very high. Therefore, the number of calculated optimal structures is restricted.
If the degeneracy is below the requested number, HPstruct calculates all optimal structures and displays them in the list. In that case the list is shorter than requested but complete.

Why does HPstruct sometimes fail to predict optimal structures?

The HPstruct approach is based on a precalculated database of optimally and suboptimally densely packed H-monomer distributions, so-called H-cores (see FAQs). Currently we have computed a large number of these H-cores for several levels of suboptimality (see H-cores). They can be used for up to about 60 H-monomers.

The current set is sufficient for more than 90% of HP-sequences up to a length of a hundred or more monomers (depending on the number of Hs in the sequence). Some rare sequences have, due to special sequence properties, a sparse H-monomer distribution in their optimal structures. In such cases the current H-core database is not sufficient (not enough levels of suboptimality) to predict these structures. This can be solved by extending the database for the problematic H-monomer number, with one or two additional levels of suboptimal H-cores.

Thus the failure to predict an optimal structure is neither a bug nor an inconsistency of the CPSP approach. It is a limitation of the currently available H-core database.

Why does HPstruct allow only a restricted number of H-monomers in the sequence?

As described above, the CPSP approach is based on a precalculated database of H-cores. Therefore, HPstruct can only handle sequences for which corresponding H-cores are available in the database. For this reason, the number of H-monomers in the sequence is restricted to the largest H-core size currently available.
