Frequently Asked Questions
We collected here a list of frequently occurring problems and their solutions. The following topics are currently available:
- What about missing atoms or chain breaks?
- What about point mutations?
- What about ions?
- Domain definition for docking
- Clustering issues
- Running HADDOCK on a cluster using a queuing system (e.g. Torque or Slurm)
- Small ligand docking with HADDOCK
- Usage of dummy atoms/beads with HADDOCK
- Typical error messages
If your problem falls outside of the topics, please see the Getting support / How to ask for help section.
What about missing atoms?
Missing atoms will be automatically detected (if part of the HADDOCK library) and re-generated when running the [topoaa]
module.
For this reason, it is always used as the first module in a haddock3 workflow configuration file, not only to generate the topology of the input molecules but also to add and reconstruct missing atoms.
What about chain breaks?
In case of missing residues, chain breaks will be introduced.
This might cause segments of your molecule to move with respect to each other during the refinement stages.
To avoid that, you can define a few specific distance restraints, for example between CA atoms.
This can be easily performed by the haddock3-restraints
command line interface supporting the restrain_bodies
subcommand that allows the detection of such breaks and define distance restraints.
Here is the documentation to the haddock3-restraints restrain_bodies
subcommand.
Those restraints can then be provided to haddock3 as unambiguous restraints for example (using the unambig_fname
parameter in CNS modules).
What about point mutations?
To introduce mutations in your input PDB files you can do the following:
- edit the PDB file and rename the mutated residue to the proper amino acid name
- keep or rename appropriately the matching side-chain atoms
The extra/missing atoms will be automatically detected and the corrected topology and coordinates will be regenerated by the [topoaa]
module.
It is important to have at least the backbone atoms and at least the CB atom along the side-chain defined since their average position will be used as a starting point to "grow" the missing atoms.
Always check that the sequence of the various PDB files matches!
Note that this approach is only functional for residues supported in the HADDOCK library.
What about ions?
Some proteins contain ions such as for example calcium.
Their inclusion might be important for docking purposes, in particular for proper electrostatics!
In principle, they should be recognized when running the [topoaa]
module, provided their name in the PDB file matches the ion names in the list of supported ions (can be found here).
Domain definition for docking
In general, it is recommended to remove any part of your system such as flexible linkers that are not involved in the interaction with the partner for docking. Keeping these might cause trouble in the sorting of solutions. For example, such a linker can make contact with the partner molecule, resulting in lower total energy, and, in that way, "bad" solutions could still be kept.
The same applies to AlphaFold2 spaghetti like disordered regions that often surround the domain of interest. Indeed, these regions may induce van der Walls forces due to sterical clashes before the two domains of interest could even interact. Removing regions with low pLDDT (~< 60) can be an appropriate solution so use AlphaFold2 models for docking.
Clustering issues
When performing RMSD clustering, two modules can be used to compute the RMSD matrix:
[rmsdmatrix]
: computing the full complex (or single chain) RMSD matrix[ilrmsdmatrix]
: computing the interface-ligand RMSD matrix
The [rmsdmatrix]
module allows you to define a subset of residues used to perform both the structural alignment and the RMSD computation.
For this, you need to specify a list of residues for each chain, using the parameter resdic_*
, where *
is the chainID.
As an example, to perform the selection of residues 12, 13, 14 and 15 from chain A and 1, 2, 3 from chain B, refine the following parameters:
[rmsdmatrix]
resdic_A = [12, 13, 14, 15]
resdic_B = [1, 2, 3]
This will result in the selection of those 7 residues to perform the structural alignment onto the reference and then compute the RMSD.
While for the [ilrmsdmatrix]
module, a different approach is taken.
Two parameters must be defined
receptor_chain
: defining the chainID of the receptor. By default "A".ligand_chains
: a list of other chain IDs that should represent the "ligands". If not set, all the remaining chains will be considered as ligand.
During the computational workflow, first, all the residue-residue contacts between the receptor and ligand are selected. This selection is then used to perform later structural alignment and RMSD computation.
Those two modules must be followed by the [clustrmsd]
module, otherwise, only the pair-wise RMSD matrix will be computed, and clustering not performed.
Note that this is not an issue if fractions of common contact (FCC) clustering ([clustfcc
] module) is used as the matrix is computed within the clustering module directly (as much faster).
Running HADDOCK on a cluster using a queuing system (e.g. Torque or Slurm)
In order to submit to the queuing system we typically use a wrapper script that will add some directives to the job files.
First, we must define a haddock3 workflow (e.g.: haddock_run.cfg
):
#################################
# GLOBAL PARAMETERS
#################################
run_dir = "amazing_docking_experiment"
molecules ["protein1.pdb", "protein2.pdb"]
# Here we define the maximum number of available cores to use
ncores = 40
#################################
# WORKFLOW MODULES PARAMETERS
#################################
[topoaa]
[rigidbody]
[seletop]
[flexref]
[emref]
[clustfcc]
[seletopclusts]
[contactmap]
[caprieval]
Here is one example of such a wrapper script (named haddock3_run.job
) that would submit to the slurm queue:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=40
#SBATCH -J haddock3-run
#SBATCH -p short
# 1. Active the haddock3 virtual environment
# From venv
source /path/to/haddock3/install/dir/.haddock3-env/bin/activate
# Or using conda
source /path/to/conda/install/dir/bin/activate
conda activate haddock3-env
# Go to the base directory where the `haddock_run.cfg` workflow is written
cd /path/to/workflow/
# Execute haddock3 on the `haddock_run.cfg` workflow
haddock3 haddock_run.cfg
Note here that we set up the number of cores for both the haddock3 run (in haddock_run.cfg
) and slurm job (in haddock3_run.job
)to 40
Cofactors / Small-ligand docking with HADDOCK
It's possible to dock small ligands or cofactor using haddock3, but for that topology and parameter files for the ligand should be provided in CNS format.
Several sources exist to find such files:
-
ccp4-prodrg:
ccp4-prodrg
. -
the PRODRG server was maintained by Daan van Aalten at Dundee University. This server allows you to draw your molecule or paste coordinates and will return topologies and parameter files in various formats, including CNS. You should turn on the electrostatic to obtain partial charges. Save the resulting PDB file and the corresponding CNS parameter and topology files to use in HADDOCK.
Important: The generated parameter file contains a CNS
NBONds
statement which should be removed prior to their use in HADDOCK. Look in the parameter file for:NBONds
CUTNB=7.0 WMIN=1.5 REPEL=1.0 REXPONENT=4 IREXPONENT=1 RCONST=16.0 TOLERANCE=0.5 NBXMOD=5 CTONNB=5.5 CTOFNB=6.0 END
and remove or comment it out (by adding ! before each line).
- the Automated Topology Builder (ATB) and Repository developed in Prof. Alan Mark's group at the University of Queensland in Brisbane: https://atb.uq.edu.au/
Note: we have not yet tested those parameters in HADDOCK.
For docking small ligands with haddock3 using custom-made topology and parameter files, you should:
- Define the path to the files in CNS modules (
[topoaa]
,[flexref]
,[emref]
,[mdref]
,[emscoring]
,[mdscoring]
)- Input the topology file using
ligand_top_fname
parameter. - Input the parameter file using
ligand_param_fname
parameter.
- Input the topology file using
Also, we recommend setting the number of MD steps for the first two parts (rigid-body high temperature dynamic and slow cooling annealing) of the [flexref] module
to 0.
This is performed by tuning the mdsteps_rigid
and mdsteps_cool1
parameters and setting their values to 0.
Haddock3 comes with an example for protein-ligand docking. Check the setting in that example.
Important: When starting a run, always check for error messages in the 0_topoaa
directory in the various generated .out
files, especially for your ligand.
Beads dummy atoms docking with haddock3
Dummy atoms can be used in haddock3 and can be useful as distance restraints can be built towards them. This is used, for example:
Dummy atoms (also called shape beads) must be defined in a separate PDB file and have the following naming convention:
- Start with the
ATOM
- using
SHA
for both atom and residue name - defined as chain
S
- have the same atom and residue index
ATOM 1 SHA SHA S 1 24.222 -6.426 -14.545 1.00 1.00
ATOM 2 SHA SHA S 2 23.059 -6.675 -14.930 1.00 1.00
...
Because such types of dummy atoms neither have topology nor force-field parameters, they must be explicitly defined as shapes. To do so, two parameters must be set in your configuration file:
mol_shape_X = true
: allows to tell haddock3 that moleculeX
is a shape.mol_fix_origin_X = true
: allows telling haddock3 not to move moleculeX
, and keep the original coordinates.
Where X
is a number that corresponds to the molecule position in the input list of molecules in the configuration file.
Here is an example, where the shapes will be input at the second position in the molecules:
run_dir = "test_shape"
molecules = ["protein.pdb", "shapes.pdb"]
[topoaa]
[rigidbody]
# `_2` as the shape is placed second in the input molecules
mol_shape_2 = true # Defines second input molecule as `shape`
mol_fix_origin_2 = true # Fix origin/input coordinates of the second input molecule
Typical haddock3 error messages
In some cases, the haddock3 execution can stop for a given reason. While we are already trying the handle possible errors, some of them will lead to critical failure, terminating the workflow. If such an error occurs, we report it in the log file. The log file can be found at two locations:
- printed on your screen as standard output.
- written in a file named
log
located in the workflow run directory.
We often try to provide a meaningful error message that can help you figure out what could be the issue related to it. If not, please refer to the Getting support / How to ask for help section to get assistance.
Here is a list of the most common errors:
Tolerance issue
Here is a typical tolerance issue log error message:
[2024-09-09 20:04:33,709 libutil ERROR] 100.00% of output was not generated for this module and tolerance was set to 5.00%.
Traceback (most recent call last):
File "/data/haddock3/src/haddock/libs/libutil.py", line 335, in log_error_and_exit
yield
File "/data/haddock3/src/haddock/clis/cli.py", line 192, in main
workflow.run()
File "/data/haddock3/src/haddock/libs/libworkflow.py", line 43, in run
step.execute()
File "/data/haddock3/src/haddock/libs/libworkflow.py", line 162, in execute
self.module.run() # type: ignore
File "/data/haddock3/src/haddock/modules/base_cns_module.py", line 61, in run
self._run()
File "/data/haddock3/src/haddock/modules/sampling/rigidbody/__init__.py", line 246, in _run
self.export_io_models(faulty_tolerance=self.params["tolerance"])
File "/data/haddock3/src/haddock/modules/__init__.py", line 300, in export_io_models
self.finish_with_error(_msg)
File "/data/haddock3/src/haddock/modules/__init__.py", line 308, in finish_with_error
raise RuntimeError(reason)
RuntimeError: 100.00% of output was not generated for this module and tolerance was set to 5.00%.
This means that models that should have been generated by a module are missing from the file system, and therefore were not written. This can come from multiple reasons:
- There is an issue with the parameters/topology generation.
- The model contained clashes that led to extremely high energetics that blew up the system.
- The
tolerance
threshold was set too low.
If 100.00% of output was not generated
, there is probably an issue with the input molecules:
- unrecognized amino acids
- missing parameters/topology for residues outside of the HADDOCK library
- huge sterical clashes
If the value is inferior to 100.00%
, it can come from either one of the input conformers or a random error from the molecular dynamic simulation.
In this case, you could increase the tolerance threshold to a higher value (e.g.: tolerance = 10
), for the workflow to continue, as missing a few models can be acceptable.
Note that you could also restart the workflow from the next module (e.g.: haddock3 workflow.cfg --restart 5
to restart from module 5 if the module did not meet the tolerance threshold), allowing you to save computational resources not having to recompute data from module 0 to 4.