HADDOCK2.4 manual - the docking protocol
The entire docking protocol in HADDOCK consists of four stages:
- Topologies and structures generation
- Randomization of starting orientations and rigid body energy minimization
- Semi-flexible simulated annealing
- Flexible final refinement
Topologies and structures generation
Generation of all atoms topologies and coordinates for each molecule separately
The first step in HADDOCK is the generation of the CNS topology and coordinate files for the various molecules and for the complex from the input PDB files. HADDOCK should automatically recognize chain breaks, disulphide bonds, cis-prolines, D-amino acids and even ions, provided they are named as defined in the ion.top topology file located in the toppar directory. A number of modified amino acids are also supported; these should be given the proper residue names in the input PDB files. For the list of supported modified amino acids, refer to the online page on the HADDOCK2.4 webserver.
First, the topologies are generated for each molecule separately, and all atoms missing from the input PDB files are built. For each input structure, job files are generated in the run directory, and the topologies, structures and output files are generated in the begin directory. HADDOCK uses the fileroot names specified in the run.cns file by the prot_root_molX variable for each molecule and by the fileroot variable for the complex.
The following scripts will be run:
- fileroot_generate_X.job: Generates the CNS topology and coordinates file(s) (several if starting from an ensemble) for the various molecules (X indicates the molecule number).
The corresponding output files are:
- prot_root_molX.psf: topology file
- prot_root_molX.pdb: coordinates file
- prot_root_molX_1.pdb, prot_root_molX_2.pdb …: coordinate files when starting from an ensemble of models
- file_X.list: list of PDB coordinates files
- file_X.nam: list of PDB coordinates files
CNS scripts called (depending on the options defined):
- generate_X.inp
- initialize.cns: Initialize the iteration variable
- iterations.cns: Defines the iteration variable
- run.cns: Reads in all parameter settings for the run
- build-missing.cns: Builds all missing atoms
- flex_segment_back.cns: Defines semi-flexible segments
- prot_break.cns: Detects chain breaks in the protein
- dna_break.cns: Detects chain breaks in nucleic acids
- covalheme.cns: Detects covalent bonds for heme groups
- coval-ace-cys.cns: Detects a cyclic structure connecting an acetylated N-ter to a cysteine
- auto-his.cns: Automatically defines the protonation state of histidines.
Note: If solvated docking is turned on, generate-water_X.inp is used instead. It additionally calls generate_water.cns and rotate_pdb.cns, and generates additional output pdb files containing the water (prot_root_molX_1_water.pdbw, …).
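The chain-break detection done by prot_break.cns can be pictured with a short sketch. It assumes a hypothetical residue representation (a list of dicts holding backbone C and N coordinates) and a hypothetical 2.5 Å cutoff; the actual criterion and cutoff used by the CNS script may differ.

```python
import math

# Hypothetical cutoff: a peptide C-N bond is about 1.33 Å long, so a much longer
# C(i)-N(i+1) distance is treated as a break. prot_break.cns may use another value.
BREAK_CUTOFF = 2.5


def find_chain_breaks(residues, cutoff=BREAK_CUTOFF):
    """Return (i, i + 1) index pairs of consecutive residues that are not bonded.

    `residues` is a list of dicts with 'C' and 'N' backbone coordinates (x, y, z).
    """
    breaks = []
    for i in range(len(residues) - 1):
        c_prev = residues[i]["C"]       # carbonyl carbon of residue i
        n_next = residues[i + 1]["N"]   # amide nitrogen of residue i + 1
        if math.dist(c_prev, n_next) > cutoff:
            breaks.append((i, i + 1))
    return breaks
```

For example, three residues whose second C-N gap is 6.67 Å yield a single break between residues 1 and 2.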
Generation of coarse grained topologies and coordinates for each molecule separately
This will only be performed if the coarse-graining option is turned on. In that case the following scripts will be run:
- fileroot_generate-cg_X.job: Generates the coarse grained CNS topology and coordinates file(s) (several if starting from an ensemble) for the various molecules (X indicates the molecule number).
The corresponding output files in the begin directory are:
- prot_root_molX.psf: topology file
- prot_root_molX.pdb: coordinates file
- prot_root_molX_1.pdb, prot_root_molX_2.pdb …: coordinate files when starting from an ensemble of models
- file_X.list: list of PDB coordinates files
- file_X.nam: list of PDB coordinates files
Note that the all-atom topologies and files are created in the begin-aa directory when coarse graining is turned on.
CNS scripts called (depending on the options defined):
- generate_X.inp
- initialize.cns: Initialize the iteration variable
- iterations.cns: Defines the iteration variable
- run.cns: Reads in all parameter settings for the run
- patch-types-cg.cns: Defines secondary structure-specific bead types based on the encoding in the B-factor column of the input coarse grained PDB files.
- patch-bb-cg.cns: Defines secondary structure-specific backbone angles based on the encoding in the B-factor column of the input coarse grained PDB files.
- charge-beads-interactions.cns: Turns off vdW interactions for the fake charged beads; turns off electrostatic interactions between the fake charged beads within one residue
- prot_break.cns: Detects chain breaks in the protein
- dna_break.cns: Detects chain breaks in nucleic acids
- patch-breaks-cg-dna: Detects covalent bonds for heme groups
- flex_segment_back.cns: Defines semi-flexible segments
Generation of topologies and starting coordinates for the complex
Once the individual topologies and PDB files have been generated, these will be merged to generate the starting models of the complex.
The following scripts will be run:
- fileroot_generate_complex.job: Generates the CNS topology and coordinates file(s) for the complex by merging the various topologies and coordinates files. When starting from ensembles, all combinations will be generated.
Output files:
- fileroot.psf: topology file
- fileroot.pdb: coordinates file
- fileroot_1.pdb, fileroot_2.pdb, … : coordinates files when starting from an ensemble of structures
- file.cns, file.list, file.nam: list of PDB coordinates files
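When starting from ensembles, all combinations of the input models are generated. The sketch below enumerates those combinations and pairs them with output names; the function name is hypothetical, and the exact order in which HADDOCK numbers the combinations is an assumption.

```python
from itertools import product


def complex_combinations(ensemble_sizes, fileroot="fileroot"):
    """Enumerate every combination of ensemble members, one model per molecule.

    ensemble_sizes[i] is the number of models for molecule i + 1.
    Returns (output_filename, member_indices) tuples numbered from 1.
    """
    combos = product(*(range(1, n + 1) for n in ensemble_sizes))
    return [
        (f"{fileroot}_{k}.pdb", members)
        for k, members in enumerate(combos, start=1)
    ]
```

With two models for molecule 1 and three for molecule 2 this gives six starting complexes, from fileroot_1.pdb (models 1 and 1) to fileroot_6.pdb (models 2 and 3).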
CNS scripts called:
- generate_complex.inp
- initialize.cns: Initialize the iteration variable
- iterations.cns: Defines the iteration variable
- run.cns: Reads in all parameter settings for the run
- rebuild-unknown.cns: Rebuilds missing atoms in the context of the complex (if turned on for refinement mode).
Note: If solvated docking is turned on, generate_complex-water.inp is used instead, which generates additional output pdb files containing the water (fileroot_1_water.pdbw, …).
In case of problems (and in general, to make sure that everything is OK), look into the generated output files (.out) for error messages (search for ERR).
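Searching the output files for ERR can be scripted. This is a minimal sketch equivalent to running grep ERR over the .out files of one directory; the function name is hypothetical, and you pass whichever directory holds the .out files of your run.

```python
from pathlib import Path


def find_errors(out_dir, pattern="ERR"):
    """Return (file name, line number, line) for every line containing `pattern`
    in the *.out files of `out_dir`."""
    hits = []
    for out_file in sorted(Path(out_dir).glob("*.out")):
        with out_file.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if pattern in line:
                    hits.append((out_file.name, lineno, line.rstrip()))
    return hits
```

An empty result only means that no line contains the string ERR; it is still worth checking that all expected output files were written.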
Randomization of starting orientations and rigid body energy minimization
The first docking step in HADDOCK is a rigid body energy minimization.
First, the molecules are separated by a minimum of 25 Å and rotated randomly around their center of mass. This randomization step can be turned off in the run.cns parameter file. If you wish to decrease (or increase) the separation distance between the two molecules, edit the separate.cns CNS script in the protocols directory and change the value of the $minispacing parameter.
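The randomization step can be illustrated with a sketch that is not the separate.cns / random_rotations.cns implementation. It assumes equal atomic masses (geometric center), draws uniform random rotations with Shoemake's quaternion method, and places the molecules along the x axis so that no two atoms of neighbouring molecules are closer than min_spacing; all function names are hypothetical.

```python
import math
import random


def random_rotation_matrix(rng):
    """Uniformly distributed random rotation (Shoemake's quaternion method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    x = math.sqrt(1 - u1) * math.sin(2 * math.pi * u2)
    y = math.sqrt(1 - u1) * math.cos(2 * math.pi * u2)
    z = math.sqrt(u1) * math.sin(2 * math.pi * u3)
    w = math.sqrt(u1) * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def center_of_mass(coords):
    """Geometric center (assumes all atoms have the same mass)."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def randomize_and_separate(molecules, min_spacing=25.0, seed=None):
    """Rotate each molecule randomly about its center, then space them along x."""
    rng = random.Random(seed)
    placed, x_offset, prev_radius = [], 0.0, None
    for coords in molecules:
        com = center_of_mass(coords)
        centred = [tuple(c[k] - com[k] for k in range(3)) for c in coords]
        rot = random_rotation_matrix(rng)
        rotated = [
            tuple(sum(rot[i][j] * p[j] for j in range(3)) for i in range(3))
            for p in centred
        ]
        # Radius of the sphere enclosing this molecule around its center
        radius = max(math.dist(p, (0.0, 0.0, 0.0)) for p in rotated)
        if prev_radius is not None:
            # Centers this far apart keep all atom pairs >= min_spacing apart
            x_offset += prev_radius + min_spacing + radius
        placed.append([(p[0] + x_offset, p[1], p[2]) for p in rotated])
        prev_radius = radius
    return placed
```

Spacing the enclosing spheres is a conservative choice: it guarantees the minimum distance at the cost of sometimes separating the molecules more than strictly needed.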
The rigid body minimization is performed in multiple steps:
- four cycles of rotational minimization in which each molecule (molecule+associated solvent in case of solvated docking) is allowed to rotate in turn
- two cycles of rotational and translational rigid body minimization in which each molecule+associated solvent is treated as one rigid body
If solvated docking is turned on the following additional steps will be performed:
- rotational and translational rigid body minimization with each molecule and water molecule treated as separate rigid bodies
- biased Monte Carlo removal of water molecules, based on the propensity of finding a water-mediated contact, until a user-defined percentage of water molecules remains
- rotational and translational rigid body minimization with each molecule and water molecule treated as separate rigid bodies
For details of the solvated docking protocol, refer to:
- A.D.J. van Dijk and A.M.J.J. Bonvin, “Solvated docking: introducing water into the modelling of biomolecular complexes.” Bioinformatics, 22, 2340-2347 (2006).
- M. van Dijk, K. Visscher, P.L. Kastritis and A.M.J.J. Bonvin, “Solvated protein-DNA docking using HADDOCK.” J. Biomol. NMR, 56, 51-63 (2013).
- P.L. Kastritis, K.M. Visscher, A.D.J. van Dijk and A.M.J.J. Bonvin, “Solvated docking using Kyte-Doolittle-based water propensities.” Proteins: Struct. Funct. & Bioinformatics, 81, 510-518 (2013).
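To make the biased Monte Carlo water removal more concrete, here is an illustrative sketch, not the waterdock_remove-water.cns algorithm. It assumes each interfacial water already carries a propensity in [0, 1] for mediating a contact, and that a randomly chosen water is removed with probability 1 - propensity (with a hypothetical floor so the loop always terminates), until the requested fraction of waters remains.

```python
import random


def remove_waters(propensities, keep_fraction, rng=None, min_remove_prob=0.05):
    """Biased Monte Carlo removal of waters (illustrative sketch).

    propensities: {water_id: propensity in [0, 1]} (hypothetical input format).
    Returns the set of water ids that are kept.
    """
    rng = rng if rng is not None else random.Random()
    remaining = dict(propensities)
    target = round(keep_fraction * len(propensities))
    while len(remaining) > target:
        candidate = rng.choice(sorted(remaining))
        # Low-propensity waters are the most likely to be accepted for removal
        p_remove = max(1.0 - remaining[candidate], min_remove_prob)
        if rng.random() < p_remove:
            del remaining[candidate]
    return set(remaining)
```

Because removal is accepted with probability 1 - propensity, waters that are likely to mediate contacts tend to survive, while the stochastic acceptance still lets each docking trial keep a different set.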
If RDC, PCS or diffusion anisotropy restraints are used, two additional minimization steps are carried out to optimize the orientation of the molecules with respect to the alignment tensor(s).
For each starting structure combination, the rigid body minimisation step is repeated a number of times (given by the Ntrials parameter in the run.cns parameter file). In addition, 180-degree rotated solutions are systematically sampled if the rotate180_0 parameter in the run.cns parameter file is set to true (default behavior). Only the best solution from these docking trials is written to disk.
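The trial loop described above (repeat the minimization, optionally also minimize the 180-degree rotated solution, keep only the best) can be sketched as follows. The minimize and rotate180 callables are hypothetical stand-ins for the CNS minimization and rotation180.cns; lower scores are assumed to be better.

```python
def best_of_trials(start, n_trials, minimize, rotate180=None):
    """Return the best (score, model) pair over all trials.

    minimize(model, trial) -> (score, minimized_model)   (hypothetical)
    rotate180(model) -> model rotated by 180 degrees      (hypothetical)
    """
    best = None
    for trial in range(n_trials):
        candidates = [minimize(start, trial)]
        if rotate180 is not None:
            # Rotate the minimized solution by 180 degrees and minimize again
            candidates.append(minimize(rotate180(candidates[0][1]), trial))
        for score, model in candidates:
            if best is None or score < best[0]:
                best = (score, model)
    return best
```

With a toy scoring function such as abs(model + 4), a start of 5 scores 9, while its "rotated" counterpart -5 scores 1 and is the one kept.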
Note: The translational minimization can be turned off in run.cns by setting rigidmini to false (default is true). This option can be useful, for example, for small flexible molecules, to perform the docking during the simulated annealing stage, allowing conformational changes to take place during the docking process. The number of steps in the first two stages of the simulated annealing should then be increased by at least a factor of four to allow the molecules to approach each other.
The refine.inp CNS script is used for the rigid body minimisation step, and the resulting models are written as fileroot_1.pdb, fileroot_2.pdb, … in the structures/it0 directory.
Note1: If solvated docking is turned on (waterdock=true in run.cns), additional output pdb files containing the water will be written to disk (fileroot_1_water.pdbw, …).
Note2: If random removal of restraints is turned on (noecv=true in run.cns), additional files containing the random number seed will be written to disk (fileroot_1.seed, …). This seed is used in the refinement to make sure that the same restraints are removed.
Note3: If random AIR definition is turned on (ranair=true in run.cns), additional files containing the list of residues selected for the AIR definition will be written to disk (fileroot_1.disp, …).
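Note2 relies on a simple property: a random selection drawn from a generator initialised with a stored seed can be recreated exactly later. The sketch below shows that idea with Python's random module; the function name and the fraction parameter are hypothetical and do not reflect how HADDOCK partitions its restraints.

```python
import random


def select_removed_restraints(restraint_ids, fraction, seed):
    """Pick a random subset of restraints to discard, reproducibly from `seed`."""
    rng = random.Random(seed)
    n_remove = int(fraction * len(restraint_ids))
    # Sorting the input first makes the result independent of input order
    return sorted(rng.sample(sorted(restraint_ids), n_remove))
```

In this sketch the only state that must survive between stages is the integer seed, which mirrors the role of the fileroot_1.seed files.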
The CNS scripts called in sequential order for the rigid body EM are (depending on the options selected):
- initialize.cns: Initializes the iteration variable
- iterations.cns: Defines the iteration variable
- run.cns: Reads in all parameter settings for the run
- read_struc.cns: Reads in the topologies and parameters
- centroids_initialize.cns: Initialize dummy residues for centroids EM restraints
- covalions.cns: Defines covalent bonds to single ions
- setflags.cns: Defines the active energy terms
- read_data.cns: Reads the various restraints
- em_read_data.cns: Reads the EM restraints
- centroids_set_restraints: Defines centroid restraints for EM
- read_water1.cns: Reads water coordinates for solvated docking
- water_rest.cns: Defines restraints between interfacial waters and highly solvated amino acids
- read_data.cns: Reads the various restraints
- setflags.cns: Defines the active energy terms
- randomairs.cns: Defines ambiguous restraints based on random patches
- symmultimer.cns: Defines symmetry restraints
- zrestrainting.cns: Defines harmonic Z-restraints
- cm-restraints.cns: Defines center-of-mass distance restraints
- surf-restraints.cns: Defines surface restraints
- centroids_initialize.cns: Initializes the centroids for EM restraining
- centroids_set_map: Defines map centroids for EM restraining
- separate.cns: Separates the molecules in space
- random_rotations.cns: Applies random rotations to each molecule
- centroids_init_placement.cns: Sets the initial positions of molecules in case of EM restraints
- scale_inter_mini.cns: Defines the scaling of intermolecular interactions for rigid-body EM
- mini_tensor.cns: Optimizes the tensor orientation for RDC restraints
- mini_tensor_para.cns: Optimizes the tensor orientation for PCS restraints
- mini_tensor_dani.cns: Optimizes the tensor orientation for diffusion anisotropy restraints
- waterdock_remove-water.cns: Remove interfacial waters following a Monte Carlo approach
- db0.cns: Used in the removal of water for solvated docking
- db00.cns: Used in the removal of water for solvated docking
- db1.cns: Used in the removal of water for solvated docking
- waterdock_mini.cns: Minimizes the interfacial waters
- em_orien_search.cns: Performs a search to orient the complex properly in the EM density
- bestener.cns: Keeps track of the best generated model so far
- rotation180.cns: Performs a 180-degree rotation around a vector perpendicular to the interface and minimizes the complex again
- em_orien_search.cns: Performs a search to orient the complex properly in the EM density
- bestener.cns: Keeps track of the best generated model so far
- scale_inter_only.cns: Turns on only intermolecular interactions
- mini_tensor.cns: Optimizes the tensor orientation for RDC restraints
- mini_tensor_para.cns: Optimizes the tensor orientation for PCS restraints
- mini_tensor_dani.cns: Optimizes the tensor orientation for diffusion anisotropy restraints
- scale_intra_only.cns: Defines only intramolecular interactions
- read_noes.cns: Reads again the distance restraints
- symmultimer.cns: Defines symmetry restraints
- read_noes.cns: Reads again the distance restraints
- scale_inter_final.cns: Turns on only intermolecular interactions and applies the final scaling factor
- scale_intra_only.cns: Defines only intramolecular interactions (if only 1 molecule)
- print_coorheader.cns: Defines the remarks with energy statistics for the output PDB files
- waterdock_out0.cns: Writes output PDB files for water in case of solvated docking
When all structures have been generated (typically on the order of 1000 to 10000, depending on the number of starting conformations, the protocol settings and your CPU resources), HADDOCK sorts them according to the criterion defined in the run.cns parameter file and writes the sorted PDB filenames into file.cns, file.list and file.nam in the structures/it0 directory. These files are used for the next step (semi-flexible simulated annealing).
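Sorting by a weighted sum of energy terms can be sketched as below. The term names and weights are illustrative placeholders, not values read from run.cns; check the scoring weights defined in your own run.cns. Lower scores rank first.

```python
def weighted_score(terms, weights):
    """Weighted sum of energy terms; terms missing from a model count as zero."""
    return sum(weight * terms.get(name, 0.0) for name, weight in weights.items())


def rank_models(models, weights):
    """Return model filenames sorted from best (lowest score) to worst.

    models: {filename: {term_name: value}} (hypothetical input format).
    """
    return sorted(models, key=lambda name: weighted_score(models[name], weights))


# Illustrative weights only (buried surface area rewarded via a negative weight)
WEIGHTS = {"vdw": 0.01, "elec": 1.0, "desolv": 1.0, "bsa": -0.01, "air": 0.01}
```

The ranked list corresponds to the order in which filenames end up in the sorted file lists; how HADDOCK formats those files is not reproduced here.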
Semi-flexible simulated annealing
The best-ranked structures after rigid body docking (typically 200, but the number is left to the user’s choice; see the run.cns file section) are subjected to a semi-flexible simulated annealing (SA) in torsion angle space. This semi-flexible annealing consists of several stages:
- High temperature rigid body search
- Rigid body SA
- Semi-flexible SA with flexible side-chains at the interface
- Semi-flexible SA with fully flexible interface (both backbone and side-chains)
The temperatures and number of steps for the various stages are defined in the run.cns parameter file.
HADDOCK can automatically define the semi-flexible regions by considering all residues within 5 Å of another molecule. To use this option, set nseg_X to -1 in run.cns (or to another negative number if you still want to manually define segments for the random AIR definition from a limited region of the surface). This can be set for each molecule separately.
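The automatic definition of semi-flexible regions amounts to finding residues that have any atom within 5 Å of another molecule. The brute-force sketch below illustrates that criterion; the nested-dict input format and the function name are hypothetical, and a real implementation would use a neighbour search rather than comparing all atom pairs.

```python
import math


def interface_residues(molecules, cutoff=5.0):
    """For each molecule, return the residue ids with any atom within `cutoff` Å
    of an atom belonging to another molecule.

    molecules: {label: {residue_id: [(x, y, z), ...]}} (hypothetical format).
    """
    result = {label: set() for label in molecules}
    labels = list(molecules)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:  # each pair of molecules once
            for res_a, atoms_a in molecules[a].items():
                for res_b, atoms_b in molecules[b].items():
                    if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                        result[a].add(res_a)
                        result[b].add(res_b)
    return result
```

For two molecules whose only close pair of atoms is 4 Å apart, only those two residues are flagged as flexible.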
HADDOCK also allows the definition of fully flexible regions (defined by the nfle_X parameter in run.cns). These remain fully flexible throughout all four stages. This can be useful when parts of a structure are disordered or unstructured, or when docking small flexible ligands or peptides onto a protein. This option also allows the use of HADDOCK for structure calculations of complexes when classical NMR restraints are available to drive the folding.
The generated output files are:
- fileroot_1.pdb, fileroot_2.pdb, … written in the structures/it1 directory
- fileroot_runX_it1_refine_1.out, … written in the run directory
Note1: If solvated docking is turned on (waterdock=true in run.cns), additional output pdb files will be written to disk containing the water (fileroot_1_water.pdbw ,…).
Note2: If random removal of restraints is turned on (noecv=true in run.cns), additional files will be written to disk containing the random seed number (fileroot_1.seed ,…). This seed is used in the explicit solvent refinement to make sure that the same restraints are removed.
The refine.inp CNS script is used for this step and the CNS scripts called in sequential order for this semi-flexible refinement stage are (depending on the options selected):
- initialize.cns: Initialize the iteration variable
- iterations.cns: Defines the iteration variable
- run.cns: Reads in all parameter settings for the run
- read_struc.cns: Reads in the topologies and parameters
- centroids_initialize.cns: Initializes dummy residues for centroids EM restraints
- covalions.cns: Defines covalent bonds to single ions
- setflags.cns: Defines the active energy terms
- read_data.cns: Reads the various restraints
- em_read_data.cns: Reads the EM restraints
- centroids_set_restraints: Defines centroid restraints for EM
- read_water1.cns: Reads water coordinates for solvated docking
- water_rest.cns: Defines restraints between interfacial waters and highly solvated amino acids
- read_data.cns: Reads the various restraints
- setflags.cns: Defines the active energy terms
- centroids_initialize.cns: Initializes the centroids for EM restraining
- centroids_set_restraints.cns: Sets the centroid-based distance restraints
- expand.cns: Expands the initial structure and randomly rotates each component. Saves the initial center of mass positions
- contactairs.cns: Defines ambiguous distance restraints between contacting surfaces
- water_rest.cns: Defines restraints between interfacial waters and highly solvated amino acids
- symmultimer.cns: Defines symmetry restraints
- zrestrainting.cns: Defines harmonic Z-restraints
- cm-restraints.cns: Defines center-of-mass distance restraints
- contactairs.cns: Defines ambiguous distance restraints between contacting surfaces instead of surface restraints if defined
- dna-rna_restraints.def: Defines DNA/RNA specific restraints
- protein-ss-restraints-all.def: Defines secondary structure restraints for all residues if defined
- protein-ss-restraints-alpha.def: Defines secondary structure restraints for alpha helices if defined
- protein-ss-restraints-alpha-beta.def: Defines secondary structure restraints for alpha helices and beta sheets if defined
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- mini_tensor.cns: Optimizes the tensor orientation for RDC restraints
- mini_tensor_para.cns: Optimizes the tensor orientation for PCS restraints
- mini_tensor_dani.cns: Optimizes the tensor orientation for diffusion anisotropy restraints
- scale_inter_only.cns: Turns on only intermolecular interactions
- setflags.cns: Defines the active energy terms
- flex_segment_back.cns: Define flexible interface for initial energy minimization
- torsiontop.cns: Generates topology for torsion angle MD
- sa_ltad_hightemp.cns: Runs high temperature torsion angle molecular dynamics (TAD) stage (rigid bodies)
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- scale_inter.cns: Scaling of intermolecular interactions
- sa_ltad_cool1.cns: Runs first slow cooling simulated annealing TAD stage (rigid bodies)
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- scale_inter.cns: Scaling of intermolecular interactions
- torsiontop_flex.cns: Generates topology for semi-flexible side-chains torsion angle MD
- sa_ltad_cool2.cns: Runs second slow cooling simulated annealing TAD stage (flexible side-chains at interface)
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- scale_inter.cns: Scaling of intermolecular interactions
- torsiontop_flex_back.cns: Generates topology for semi-flexible side-chains and backbone torsion angle MD
- sa_ltad_cool3.cns: Runs third slow cooling simulated annealing TAD stage (flexible side-chains and backbone at interface)
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- scale_inter.cns: Scaling of intermolecular interactions
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- flex_segment_back.cns: Define flexible interface for final energy minimization
- calc_free-ene.cns: Calculate the total energy of each molecule in isolation
- scale_intra_only.cns: Defines only intramolecular interactions
- scale_inter_final.cns: Turns on only intermolecular interactions and applies the final scaling factor
- scale_intra_only.cns: Defines only intramolecular interactions (if only 1 molecule)
- symmultimer.cns: Defines symmetry restraints
- read_noes.cns: Reads again the distance restraints
- dna-rna_restraints.def: Defines DNA/RNA specific restraints
- em_calc_lcc.cns: Calculate the local EM cross-correlation
- print_coorheader.cns: Defines the remarks with energy statistics for the output PDB files
- waterdock_out1.cns: Writes output PDB files for water in case of solvated docking
At the end of the calculation, HADDOCK generates the file.cns, file.list and file.nam files containing the filenames of the generated structures, sorted according to the criterion defined in the run.cns parameter file.
At the end of this stage, the structures are analyzed and the results are placed in the structures/it1/analysis directory (see the analysis section).
Flexible final refinement
In this final step, the structures obtained after the semi-flexible simulated annealing are refined. The default in HADDOCK2.4 is to perform only a final energy minimization, but it is also possible to perform a short MD refinement in an explicit solvent layer (8 Å for water, 12.5 Å for DMSO). No spectacular changes are expected in this step; however, the scoring of the various structures is improved.
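The solvent layer thicknesses quoted above can be read as distance cutoffs around the solute. The sketch below keeps every solvent molecule with at least one atom within that distance of a solute atom; it only illustrates the layer definition and is not how generate_water.cns or generate_dmso.cns actually build the shell. Function and variable names are hypothetical.

```python
import math

# Layer thickness in Å, as given in the text above
SHELL_THICKNESS = {"water": 8.0, "dmso": 12.5}


def solvent_shell(solute_atoms, solvent_molecules, solvent="water"):
    """Keep solvent molecules with any atom within the layer thickness of the solute.

    solute_atoms: [(x, y, z), ...]; solvent_molecules: [[(x, y, z), ...], ...].
    """
    cutoff = SHELL_THICKNESS[solvent]
    return [
        mol for mol in solvent_molecules
        if any(math.dist(s, a) <= cutoff for s in mol for a in solute_atoms)
    ]
```

The thicker DMSO layer simply admits solvent molecules from further away.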
The generated output files are:
- fileroot_1w.pdb, fileroot_2w.pdb, … written in the structures/it1/water directory
- fileroot_runX_1w.out, … written in the run directory
Note1: The numbering of the structures from it1 is kept.
Note2: If keepwater is set to true in run.cns, two additional output pdb files will be written to disk, containing the entire water shell (fileroot_1_h2o-all.pdb) and only the intermolecular water molecules (fileroot_1_h2o-inter.pdb, …).
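One way to picture the difference between the full water shell and the intermolecular waters is a contact criterion: a water counts as intermolecular if it touches both partners. The sketch below assumes water oxygen positions and a hypothetical 3.5 Å contact cutoff; the definition HADDOCK uses when writing the h2o-inter files may differ.

```python
import math


def intermolecular_waters(waters, mol_a, mol_b, cutoff=3.5):
    """Return water positions within `cutoff` Å of at least one atom of each molecule.

    waters, mol_a, mol_b: lists of (x, y, z); the cutoff is a hypothetical choice.
    """
    return [
        w for w in waters
        if any(math.dist(w, a) <= cutoff for a in mol_a)
        and any(math.dist(w, b) <= cutoff for b in mol_b)
    ]
```

A water bridging two atoms 6 Å apart qualifies, while one touching only a single partner stays in the bulk shell.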
The re_h2o.inp or re_dmso.inp script is used for this step, and the CNS scripts called in sequential order for this final stage are (depending on the options selected):
- initialize.cns: Initialize the iteration variable
- iterations.cns: Defines the iteration variable
- run.cns: Reads in all parameter settings for the run
- read_struc.cns: Reads in the topologies and parameters
- read_struc-cg.cns: Reads in the coarse grained topologies and parameters
- cg-to-aa.cns: Performs the morphing of the all-atoms model onto the CG model
- read_water1.cns: Reads water coordinates for solvated docking
- generate_water.cns (or generate_dmso.cns): Generates the solvent shell
- read_data.cns: Reads the various restraints
- em_read_data.cns: Reads the EM restraints
- water_rest.cns: Defines restraints between interfacial waters and highly solvated amino acids
- setflags.cns: Defines the active energy terms
- water_rest.cns: Defines restraints between interfacial waters and highly solvated amino acids
- symmultimer.cns: Defines symmetry restraints
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- dna-rna_restraints.def: Defines DNA/RNA specific restraints
- mini_tensor.cns: Optimizes the tensor orientation for RDC restraints
- mini_tensor_para.cns: Optimizes the tensor orientation for PCS restraints
- mini_tensor_dani.cns: Optimizes the tensor orientation for diffusion anisotropy restraints
- protein-ss-restraints-all.def: Defines secondary structure restraints for all residues if defined
- protein-ss-restraints-alpha.def: Defines secondary structure restraints for alpha helices if defined
- protein-ss-restraints-alpha-beta.def: Defines secondary structure restraints for alpha helices and beta sheets if defined
- flex_segment_side.cns: Define flexible interface side-chains for initial energy minimization
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- flex_segment_back.cns: Define flexible interface for final energy minimization
- set_noe_scale.cns: Define the weight of distance restraints in case of automatic scaling
- calc_free-ene.cns: Calculate the total energy of each molecule in isolation
- scale_intra_only.cns: Defines only intramolecular interactions
- scale_inter_final.cns: Turns on only intermolecular interactions and applies the final scaling factor
- scale_intra_only.cns: Defines only intramolecular interactions (if only 1 molecule)
- symmultimer.cns: Defines symmetry restraints
- read_noes.cns: Reads again the distance restraints
- dna-rna_restraints.def: Defines DNA/RNA specific restraints
- em_calc_lcc.cns: Calculate the local EM cross-correlation
- print_coorheader.cns: Defines the remarks with energy statistics for the output PDB files
At the end of the explicit solvent refinement, HADDOCK generates the file.cns, file.list and file.nam files containing the filenames of the generated structures, sorted according to the criterion defined in the run.cns parameter file.
Finally, the structures are analyzed and the results are placed in the structures/it1/water/analysis directory (see the analysis section).