HPstruct calculates the optimal structures and the corresponding minimal energy
of a given HP-sequence in unrestricted 3D-lattice HP-models. The structures are
given in absolute move representation, using the following coordinate encoding
for visualization: F/B +-Z, L/R +-X, U/D +-Y.
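
The following minimal sketch decodes an absolute move string into cubic-lattice coordinates and evaluates the usual HP-model energy, which is -1 for every pair of non-consecutive H-monomers on neighbouring sites. The encoding only states "F/B +-Z, L/R +-X, U/D +-Y", so the sketch *assumes* the first letter of each pair is the positive direction. The function names `decode` and `hp_energy` are hypothetical and are not part of HPstruct. FCC move strings use a different alphabet and are not covered.

```python
# Assumption: first letter of each pair = positive direction (cubic lattice only).
MOVES = {
    "F": (0, 0, 1), "B": (0, 0, -1),
    "L": (1, 0, 0), "R": (-1, 0, 0),
    "U": (0, 1, 0), "D": (0, -1, 0),
}


def decode(moves):
    """Turn an absolute move string into lattice coordinates, starting at the origin."""
    pos = (0, 0, 0)
    coords = [pos]
    for m in moves:
        dx, dy, dz = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        coords.append(pos)
    if len(set(coords)) != len(coords):
        raise ValueError("move string is not self-avoiding")
    return coords


def hp_energy(seq, moves):
    """HP energy: -1 per pair of non-consecutive H-monomers on neighbouring sites."""
    coords = decode(moves)
    if len(coords) != len(seq):
        raise ValueError("need exactly len(seq) - 1 moves")
    index = {p: i for i, p in enumerate(coords)}
    contacts = 0
    for i, p in enumerate(coords):
        if seq[i] != "H":
            continue
        for d in MOVES.values():
            j = index.get((p[0] + d[0], p[1] + d[1], p[2] + d[2]))
            # j > i + 1 skips chain neighbours and counts each contact once
            if j is not None and j > i + 1 and seq[j] == "H":
                contacts += 1
    return -contacts
```

For example, `hp_energy("HPPH", "LUR")` folds the chain into a square and closes one HH-contact, giving -1. This result does not depend on which sign convention is used.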

To find these structures, a so-called H-core database is used. It contains precalculated
optimal and suboptimal packings of H-monomers (H-cores). For a concrete sequence S,
the approach systematically examines the list of H-cores compatible with S in order of
decreasing number of contacts. For each core, it attempts to thread the sequence
through the core. Threading means finding a placement of the monomers of S as a
self-avoiding walk such that all H-monomers occupy positions of the given H-core and
all P-monomers lie outside of the core. Since the H-cores are considered in order
of decreasing contacts, the first successful threading yields a structure with
globally minimal energy. Note that at this point the algorithm has proven that no
structure of S forms more HH-contacts.
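
The core-ordering idea can be sketched on a toy scale. HPstruct performs threading with constraint programming (see the Gecode version noted below). This sketch swaps in a plain depth-first search over self-avoiding walks on the cubic lattice, and it uses a small hand-made list of `(contacts, core)` pairs instead of the real H-core database. All names (`thread`, `predict`, `start_sites`) are hypothetical. The result is only optimal if the supplied core list really contains every compatible core down to the optimal contact level.

```python
from itertools import product

# Unit steps of the unrestricted cubic (CUB) lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def shift(p, d):
    return (p[0] + d[0], p[1] + d[1], p[2] + d[2])


def start_sites(seq, core):
    """Sites for monomer 0 from which the first H can still reach the core."""
    k = seq.index("H")  # chain distance from monomer 0 to the first H
    sites = set()
    for c in core:
        for d in product(range(-k, k + 1), repeat=3):
            if sum(abs(x) for x in d) <= k:
                sites.add(shift(c, d))
    return sites


def thread(seq, core):
    """DFS for a self-avoiding walk with every H inside and every P outside the core."""
    def fits(i, site):
        return (site in core) == (seq[i] == "H")

    def extend(path, used):
        if len(path) == len(seq):
            return list(path)
        for d in STEPS:
            site = shift(path[-1], d)
            if site not in used and fits(len(path), site):
                path.append(site)
                used.add(site)
                found = extend(path, used)
                if found is not None:
                    return found
                path.pop()  # backtrack
                used.remove(site)
        return None

    for s in start_sites(seq, core):
        if fits(0, s):
            found = extend([s], {s})
            if found is not None:
                return found
    return None


def predict(seq, cores):
    """Try (contacts, core) pairs by decreasing contacts; return the first threading."""
    if "H" not in seq:
        raise ValueError("sequence must contain at least one H")
    n_h = seq.count("H")
    for contacts, core in sorted(cores, key=lambda c: -c[0]):
        if len(core) != n_h:
            continue  # core size does not match the number of H-monomers
        structure = thread(seq, core)
        if structure is not None:
            return contacts, structure
    return None  # no core in this toy list could be threaded
```

For `"HPH"`, the core with two adjacent H-sites cannot be threaded, because on the cubic lattice monomers 0 and 2 can never be neighbours. The search therefore falls back to the 0-contact core. This mirrors how a failed threading proves that no structure reaches the higher contact count.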

To handle the common case of highly degenerate HP-sequences (with many optima),
HPstruct offers the possibility to limit the number of predicted structures or to
generate only a representative subset. Such a subset only contains structures that
are separated by at least a user-defined distance k. The distance measure is the
Hamming distance on the move strings, i.e. the number of positions at which the strings differ.
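
Here is a minimal sketch of the distance and of one possible way to pick a subset whose members are pairwise at least k apart. The text does not say how HPstruct chooses the subset, so the greedy first-fit selection is an *assumption*, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equally long move strings differ."""
    if len(a) != len(b):
        raise ValueError("move strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def representative_subset(structures, k, limit=None):
    """Greedy first-fit: keep a structure only if it is >= k away from all kept ones."""
    kept = []
    for s in structures:
        if all(hamming(s, t) >= k for t in kept):
            kept.append(s)
            if limit is not None and len(kept) >= limit:
                break
    return kept
```

With `k = 2`, the strings `"FFF"`, `"FFL"` and `"LLL"` reduce to `["FFF", "LLL"]`, because `"FFL"` differs from `"FFF"` at only one position.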

Currently, only a restricted number of levels of suboptimality is calculated for each
H-core size. For a small fraction of sequences (less than 10%), these levels are not
sufficient to find a valid optimal conformation. In such cases, no structure can currently
be computed, but a proven and usually close lower bound on the optimal energy can be given.
This is neither a bug nor an inconsistency of the CPSP approach, but only a limitation of
the current H-core database.

**Introduction**

# When using HPstruct, please cite:

- Martin Mann, Sebastian Will, and Rolf Backofen. CPSP-tools - Exact and Complete Algorithms for High-throughput 3D Lattice Protein Studies. BMC Bioinformatics, 9, 230, 2008.
- Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, and Rolf Backofen. CPSP-web-tools: a server for 3D lattice protein studies. Bioinformatics, 25 (5), 676-677, 2009.
- Martin Mann, Rolf Backofen, and Sebastian Will. Equivalence Classes of Optimal Structures in HP Protein Models Including Side Chains. Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009.
- Rolf Backofen and Sebastian Will. A constraint-based approach to fast and exact structure prediction in three-dimensional protein models. Journal of Constraints, 11 (1), 5-30, 2006.

Results are computed with HPstruct version 2.4.6, linked against Gecode 1.3.0 and BIU 2.3.6.

**Overview**

The following parameters control the execution of HPstruct.

Furthermore, additional information is available on the output, on input examples, and in the frequently asked questions.

# Input

## HP sequence

HP sequence for which the optimal structures and their minimal energy are to be calculated.
It should only contain "H"ydrophobic and "P"olar characters.

The parameter constraints are: String length has to be in range (1,300). Maximally 1 line is allowed. Only the HP alphabet is allowed for specification.
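
The stated input constraints can be checked before submission. This sketch is a hypothetical client-side validator, not HPstruct's own check. It *assumes* that surrounding whitespace is ignored, that only uppercase `H`/`P` are accepted, and that both length bounds are inclusive, since the notation "(1,300)" leaves this open.

```python
import re

HP_PATTERN = re.compile(r"[HP]+")


def check_hp_sequence(text):
    """Return the cleaned sequence if it meets the constraints, else raise ValueError."""
    seq = text.strip()
    if len(seq.splitlines()) > 1:
        raise ValueError("only one line is allowed")
    if not 1 <= len(seq) <= 300:  # assumption: both bounds inclusive
        raise ValueError("length must be between 1 and 300")
    if not HP_PATTERN.fullmatch(seq):
        raise ValueError("only the characters H and P are allowed")
    return seq
```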

## Lattice

The 3D-lattice model to use for the optimal structure calculation. Currently, HPstruct supports the following lattices:

- CUB : Unrestricted 3D-cubic lattice
- FCC : Unrestricted 3D face centered cubic lattice
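
The two lattices differ in their neighbourhoods. The sketch below lists the neighbour vectors in the common integer-coordinate convention, where FCC sites are the integer points with an even coordinate sum. Under that convention the cubic lattice has 6 neighbours per site and FCC has 12. The helper name `are_neighbours` is hypothetical.

```python
from itertools import product

# Neighbour vectors: one non-zero coordinate (CUB) or two non-zero coordinates (FCC).
CUB = [d for d in product((-1, 0, 1), repeat=3) if sum(map(abs, d)) == 1]
FCC = [d for d in product((-1, 0, 1), repeat=3) if sum(map(abs, d)) == 2]


def are_neighbours(p, q, lattice):
    """True if sites p and q are adjacent in the given lattice."""
    diff = tuple(a - b for a, b in zip(p, q))
    return diff in lattice
```

One consequence is that FCC contains triangles, such as the sites `(0,0,0)`, `(1,1,0)` and `(0,1,1)`, while the cubic lattice does not. On the cubic lattice, monomers i and j can only be in contact if i - j is odd.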

## Structure Model

The CPSP approach enables the calculation of optimal structures for both the backbone and the sidechain HP-model.
In backbone HP-models, each amino acid is represented by a single hydrophobic or polar monomer.
Thus a structure is a connected self-avoiding chain of such monomers.
In sidechain HP-models, an amino acid is modeled via two monomers: one representing the C_alpha
atom (or all backbone atoms), the other symbolizing the whole amino acid side chain group. The backbone
and sidechain monomers of an amino acid are neighbours in the lattice. While backbone monomers are neutral, the sidechain
monomers are considered to be hydrophobic or polar and thus define the energy of a structure.
A structure in the sidechain model is a self-avoiding chain of successive backbone monomers in which
each backbone monomer is also connected to its sidechain monomer, again without any two monomers sharing a lattice site.
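
The structural rules of the sidechain model can be expressed as a validity check. This is a minimal sketch on the cubic lattice, with a hypothetical function name. It takes one backbone site and one side chain site per residue, and it only checks the geometric constraints described above, not the energy.

```python
# Unit steps of the cubic lattice (CUB); pass FCC vectors instead for the FCC lattice.
CUB_STEPS = {(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)}


def is_valid_sidechain_structure(backbone, sidechains, steps=CUB_STEPS):
    """Check the sidechain-model rules: one backbone and one side chain site per residue."""
    if len(backbone) != len(sidechains):
        return False

    def adjacent(p, q):
        return tuple(a - b for a, b in zip(p, q)) in steps

    sites = list(backbone) + list(sidechains)
    if len(set(sites)) != len(sites):
        return False  # two monomers share a lattice site: not self-avoiding
    chain_ok = all(adjacent(backbone[i], backbone[i + 1])
                   for i in range(len(backbone) - 1))
    side_ok = all(adjacent(b, s) for b, s in zip(backbone, sidechains))
    return chain_ok and side_ok
```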

## Allow symmetric structures

By default, HPstruct does not calculate symmetric optimal structures, i.e.
no two resulting structures can be transformed into each other by rotation or reflection.

To allow for applications where these symmetric structures are needed, the symmetry breaking can be disabled.

The parameter constraints are: Input value has to be parsable as Boolean.
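
HPstruct excludes symmetric structures through symmetry breaking during the search. The following sketch only illustrates the relation being excluded. Two chains in the cubic lattice are treated as symmetric if one becomes the other after one of the 48 rotations/reflections (signed axis permutations) and a translation. Chain direction is kept fixed because the sequence has a direction. The function names are hypothetical.

```python
from itertools import permutations, product


def cubic_symmetries():
    """All 48 axis permutations combined with sign flips (rotations and reflections)."""
    return [(perm, signs)
            for perm in permutations(range(3))
            for signs in product((1, -1), repeat=3)]


def transform(sym, p):
    perm, signs = sym
    return tuple(signs[i] * p[perm[i]] for i in range(3))


def canonical(coords):
    """Smallest image under all symmetries, translated so monomer 0 sits at the origin."""
    best = None
    for sym in cubic_symmetries():
        pts = [transform(sym, p) for p in coords]
        x0 = pts[0]
        image = tuple(tuple(a - b for a, b in zip(p, x0)) for p in pts)
        if best is None or image < best:
            best = image
    return best


def are_symmetric(a, b):
    return len(a) == len(b) and canonical(a) == canonical(b)
```

Because the canonical form is the minimum over the whole symmetry group, any two chains related by a symmetry and a translation map to the same canonical tuple.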

## Number of Structures

The maximal number of optimal structures to calculate.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 50.

## One representative per equivalence class

Identifies only one representative structure per structural
equivalence class. Thus, the resulting structures show different
placements of their hydrophobic residues. For further
information, please refer to the publication by
Mann et al. (2009).

The parameter constraints are: Input value has to be parsable as Boolean.
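
As a rough illustration, this sketch keeps the first structure for each distinct placement of the H-monomers. It *assumes* that two structures are equivalent exactly when every H-monomer occupies the same lattice site, and it ignores symmetry. For the precise definition, including the sidechain case, where the side chain sites would be compared instead, see Mann et al. (2009). The function names are hypothetical.

```python
def h_placement(seq, coords):
    """Coordinates of the H-monomers, in sequence order."""
    return tuple(p for m, p in zip(seq, coords) if m == "H")


def one_per_class(seq, structures):
    """Keep the first structure of every distinct H-monomer placement."""
    seen = set()
    reps = []
    for coords in structures:
        key = h_placement(seq, coords)
        if key not in seen:
            seen.add(key)
            reps.append(coords)
    return reps
```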

# Output Description

HPstruct computes the minimal energy structures for the given
HP sequence and thus provides the global lower energy bound for any
structure formed by the sequence. If no globally optimal structure can
be computed, the lower bound is still provided.

# Input Examples

## Backbone-only representatives in FCC lattice

Exemplifies the prediction of optimal backbone-only lattice protein
structures within the FCC lattice that show different placements
of their hydrophobic residues, i.e. they are representatives of the
corresponding equivalence classes.

## Sidechain structures in CUB lattice

Exemplifies the prediction of optimal lattice protein
structures including sidechain representations
within the unrestricted cubic lattice.

# Frequently Asked Questions

- Why does HPstruct predict fewer structures than requested?
- Why does HPstruct sometimes fail to predict optimal structures?
- Why does HPstruct allow only a restricted number of H-monomers in the sequence?

If your question is not listed, please send it to us!

## Why does HPstruct predict fewer structures than requested?

Usually, the degeneracy of a sequence in the HP-model
is very high. Therefore, the number of calculated
optimal structures is restricted to the requested number.

If the degeneracy is below the requested number, HPstruct calculates all optimal structures and displays them in the list. Thus, the list is shorter than requested but complete.

## Why does HPstruct sometimes fail to predict optimal structures?

The HPstruct approach is based on a precalculated database of optimally
and suboptimally dense packed H-monomer distributions, so-called
*H-cores*. Currently, we have computed a large number of these H-cores for several *levels of suboptimality*. They can be used for up to about 60 H-monomers.

The current set is sufficient for more than 90% of HP-sequences up to a length of a hundred or more monomers (depending on the number of Hs in the sequence). Due to special sequence properties, some rare sequences have a sparse H-monomer distribution in their optimal structures. In such cases, the current H-core database is not sufficient (not enough levels of suboptimality) to predict these structures. This can be solved by extending the database for the problematic H-monomer number with one or two additional levels of suboptimal H-cores.

Thus, the failure to predict an optimal structure is neither *a bug* nor *an inconsistency* of the CPSP approach. It is a limitation of the currently available H-core database.

## Why does HPstruct allow only a restricted number of H-monomers in the sequence?

As described above, the CPSP approach is based
on a precalculated database of H-cores.
Therefore, HPstruct can only handle sequences for which corresponding
H-cores are available in the database. Hence, the number of
H-monomers in the sequence is restricted to the largest H-core size
currently available.