Structure preparation - HADDOCK3 User Manual

Structures

Best practice guide

The first step in a docking protocol is to decide which molecules you want to model as a complex and to obtain a structure for each of them. This might sound easy, but it can be quite tricky. This section explains where to find or how to model input structures, how to edit them, and how to prepare them for HADDOCK.

Which structures are available?

Experimental structures

In the best-case scenario, an experimental structure is available. Crystallographic, NMR, and cryo-EM structures of proteins are deposited in the following data banks:

Worldwide Protein Data Bank wwPDB
Protein Data Bank in Europe PDBe
The Research Collaboratory for Structural Bioinformatics Protein Data Bank RCSB PDB
Protein Data Bank Japan PDBj
Biological Magnetic Resonance Data Bank BMRB
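As an illustration of retrieving an experimental structure, the sketch below downloads one entry in PDB format from the RCSB PDB. It assumes a classic four-character PDB ID and the RCSB file download address https://files.rcsb.org/download/<ID>.pdb. The function names rcsb_pdb_url and fetch_pdb are made up for this example and are not part of HADDOCK.

```python
from typing import Optional
from urllib.request import urlopen

# Assumed download address pattern of the RCSB PDB file server.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def fetch_pdb(pdb_id: str, out_path: Optional[str] = None) -> str:
    """Download the entry and write it to disk; returns the file path."""
    url = rcsb_pdb_url(pdb_id)
    # Default file name is the last part of the URL, e.g. "<ID>.pdb".
    target = out_path or url.rsplit("/", 1)[-1]
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    with open(target, "w") as handle:
        handle.write(text)
    return target
```

Calling fetch_pdb with a valid ID writes <ID>.pdb to the current directory. The downloaded file still needs to be inspected and cleaned before docking.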

Sequence and homologous proteins

When no experimental structure is available for the proteins of interest, one can use homologous proteins as templates for protein modeling. Multiple tools help with this. Some online tools for homolog search are here:

Once the protein homologs have been found, some freely available software packages for building homology models are here:

SWISS-MODEL
- this online tool can both look for homologous proteins and build a protein model
MODELLER
- online version ModLoop for loop modeling
- local version for homology or comparative modeling of protein three-dimensional structures

Homology modeling using these tools is described in our tutorial here:

Homology Modeling of the mouse MDM2 protein

AI-generated structures

Using AI tools to generate structures is becoming standard practice. Nevertheless, one should always use them with care, as they can generate artifacts. In particular, steric clashes can be present. Also, long disordered regions with low pLDDT (the predicted per-residue confidence score) that wrap around the protein will not help during docking, as they may prevent interactions with the structured domain. To avoid this, energy minimize the structure and remove the disordered "spaghetti" regions around the domain of interest.

AlphaFoldDB: Hosted by EMBL-EBI, it contains more than 200 million predicted monomeric structures, covering a wide range of species, that can be downloaded.
UniProt: The UniProtKB now also provides, in the 'Structure' section, direct links to AlphaFoldDB, when available.
Online ColabFold: Written and maintained by Sergey Ovchinnikov & Martin Steinegger, it allows running AlphaFold2 in a Jupyter notebook using online resources.
Local ColabFold: The GitHub repository of ColabFold hosts multiple solutions for installing AlphaFold2 locally.
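Removing low-confidence regions from a predicted model can be sketched as follows. The example relies on the AlphaFold convention of storing the per-residue pLDDT in the B-factor column of the PDB file (columns 61-66). The cutoff of 70 and the function name drop_low_plddt are assumptions chosen for illustration, not HADDOCK settings.

```python
def drop_low_plddt(pdb_lines, cutoff=70.0):
    """Return the PDB lines without atoms whose pLDDT is below `cutoff`.

    Assumes the pLDDT is stored in the B-factor column (columns 61-66),
    as in AlphaFold models. Non-coordinate lines are kept unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            plddt = float(line[60:66])
            if plddt < cutoff:
                continue  # skip low-confidence atom
        kept.append(line)
    return kept
```

Because the filter acts on every atom, it can also remove short low-confidence loops inside a domain and leave chain breaks. The result should therefore be inspected visually before it is used as docking input.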

Modelling of peptides and mutations in proteins

Point mutations in HADDOCK are handled by changing the amino acid name; HADDOCK will then build the missing side-chain atoms. This step is further described here and can be done using the pdb_mutate.py tool in haddock-tools.

Note that pdb_mutate.py will not create the new side-chain atoms (this is handled by HADDOCK). If you prefer to control the side-chain conformation, use a tool such as PyMOL to introduce the mutation instead. This is even recommended for a mutation to histidine, as the server cannot automatically guess the protonation state if the side chain is missing.
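The renaming approach can be illustrated with a minimal sketch. It is not the implementation of pdb_mutate.py: the function name rename_residue and the choice to keep only the backbone atoms N, CA, C and O are assumptions made for this example. The missing side-chain atoms are then left for HADDOCK to build.

```python
BACKBONE = {"N", "CA", "C", "O"}


def rename_residue(pdb_lines, chain, resnum, new_resname):
    """Rename one residue and drop its non-backbone atoms.

    Works on fixed-column PDB ATOM lines: chain ID in column 22,
    residue number in columns 23-26, residue name in columns 18-20.
    """
    new_resname = new_resname.upper()
    if len(new_resname) != 3:
        raise ValueError("residue name must have three characters")
    out = []
    for line in pdb_lines:
        is_target = (
            line.startswith("ATOM")
            and line[21] == chain
            and int(line[22:26]) == resnum
        )
        if is_target:
            if line[12:16].strip() not in BACKBONE:
                continue  # old side-chain (and hydrogen) atoms are removed
            line = line[:17] + new_resname + line[20:]
        out.append(line)
    return out
```

All other residues and records pass through unchanged, so the function can be applied repeatedly to introduce several point mutations.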

PyMOL is an almost irreplaceable tool in the everyday life of a computational chemist. It is used in a number of HADDOCK tutorials for structure preparation as well as for the analysis of docking results.
- PyMOL offers many handy plugins that extend its usability, for example for peptide building. Some of them can be found here:
  - https://github.com/Pymol-Scripts/Pymol-script-repo
- PyMOL offers an option to mutate residues and choose the side-chain conformation manually.
- Modelling of peptides using PyMOL modeling scripts is described here.
Rosetta
- Rosetta, as well as plenty of online tools, now offers functionality to build peptides from their sequences.
A list of modified amino acids supported by HADDOCK can be found here.

Modeling of small molecules

OpenEye OMEGA
- OMEGA takes SMILES strings as input and generates three-dimensional (3D) conformations of ligands. OMEGA was used by our group in previous rounds of the D3R challenge.
- license necessary
RDKit
- open source chemoinformatics and machine learning software
OpenBabel
- open source chemoinformatics software, with an online version accessible here.

To prepare topology and parameter files for the ligand in CNS format, one can use:
- ccp4-prodrg
- the Automated Topology Builder (ATB) and Repository developed in the group of Prof. Alan Mark at the University of Queensland in Brisbane: https://atb.uq.edu.au/
- BioBB using acpype: The BioExcel BioBuildingBlock (BioBB) library hosts several tutorials on how to perform computations with a variety of different tools. Here is a link to the workflow used to parametrize ligands: https://mmb.irbbarcelona.org/biobb/workflows/tutorials/biobb_wf_ligand_parameterization.

The preparation of small molecules for docking is further described in the frequently asked questions page.

Using Molecular Dynamics for generating multiple conformations

Proteins are not rigid, and HADDOCK can handle flexibility at the interface to a certain extent. Ensemble docking of conformations generated by molecular dynamics (MD) is an elegant way to account for larger conformational changes. A number of MD engines are available for generating such conformations, for example:

GROMACS
OpenMM: Can also be used within haddock3 itself, as it is now available as a module (see refinement module / openmm)
GROMOS
AMBER
CHARMM
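Conformations extracted from an MD trajectory are commonly combined into a single multi-model PDB file for ensemble docking. The sketch below shows one way to do this with MODEL/ENDMDL records. It assumes that every conformer must contain the same atoms in the same order; the function names atom_identity and make_ensemble are hypothetical.

```python
def atom_identity(line):
    """Atom name, residue name, chain ID and residue number (+ insertion code)."""
    return (line[12:16], line[17:20], line[21], line[22:27])


def make_ensemble(conformers):
    """Merge several conformers (lists of PDB lines) into one multi-model PDB.

    Raises ValueError if a conformer does not contain the same atoms,
    in the same order, as the first one.
    """
    reference = None
    out = []
    for index, lines in enumerate(conformers, start=1):
        atoms = [l.rstrip("\n") for l in lines if l.startswith(("ATOM", "HETATM"))]
        identity = [atom_identity(a) for a in atoms]
        if reference is None:
            reference = identity
        elif identity != reference:
            raise ValueError(
                f"conformer {index} does not contain the same atoms as conformer 1"
            )
        out.append(f"MODEL     {index:4d}")  # model number in columns 11-14
        out.extend(atoms)
        out.append("ENDMDL")
    out.append("END")
    return out
```

Only coordinate records are copied, so headers and remarks from the individual snapshots are dropped. The atom check catches a frequent problem with MD snapshots, such as leftover solvent or ions in some frames.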

Examples of using MD for HADDOCK are shown here:

Editing pdb files

After acquiring the input structures, you might want to modify them in one way or another. This is not always straightforward, since pdb files have to meet strict formatting requirements and are rather tedious to edit manually. The HADDOCK group has therefore developed a pipeline called PDB-Tools, to which pdb files can be submitted and edited as needed. PDB-Tools is available here:

In your haddock3 environment: command line interface
PDB-Tools Web: online version
Local version of PDB-Tools: for a separate local installation

Tutorials:

Getting structures HADDOCK-ready

haddock3 manual -> Structure requirements
Preparation of coarse-grained pdb files
- HADDOCK can now handle large complexes containing up to 20 chains. An elegant way to increase the speed of these calculations is to use coarse graining with Martini.
Preparation of pdb files for the local version of HADDOCK2.4
Haddock tools are a collection of useful scripts, available on GitHub (https://github.com/haddocking/haddock-tools) for use with a local version of HADDOCK, that can be used to modify pdb or restraint files.
A list of modified amino acids and other molecule types supported by HADDOCK can be found here.

Dos and Don'ts

Don't	Do instead
input a pdb file without checking it first	carefully inspect your pdb and remove any unwanted atoms (water molecules, ions, crystallization agents)
edit pdb files in Word, OpenOffice or LibreOffice editor	edit pdb files in an ASCII text editor
use residues with multiple occupancies (e.g. 124A, 124B)	use pdb_selaltloc to keep only one alternate location per residue
use residues with overlapping numbering	use pdb_reres to renumber residues
use atoms with identical atom names for the same residue	edit your molecule with an ASCII text editor to make all atom names unique or use pdb_uniqname from our PDB-tools
use a pdb file with incorrect formatting	pdb formatting is very strict: check your file with pdb_validate and, if necessary, reload and export it in PyMOL
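Some of the checks in the table can be automated before submitting a file. The following sketch only reports potential problems; fixing them is still best done with the PDB-Tools commands named above. The function name check_pdb, the water residue names HOH and WAT, and the layout of the report are assumptions made for this example.

```python
from collections import Counter

WATER_NAMES = {"HOH", "WAT"}  # assumed residue names for water


def check_pdb(pdb_lines):
    """Count common problems in the coordinate records of a PDB file."""
    report = {
        "waters": 0,                # water atoms
        "other_hetatm": 0,          # ions, ligands, crystallization agents
        "altloc_atoms": 0,          # atoms with an alternate location flag
        "duplicate_atom_names": 0,  # (residue, atom name) pairs seen twice
    }
    seen = Counter()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20].strip() in WATER_NAMES:
            report["waters"] += 1
        elif line.startswith("HETATM"):
            report["other_hetatm"] += 1
        if line[16] != " ":
            report["altloc_atoms"] += 1
        # chain, residue number + insertion code, altloc flag, atom name
        seen[(line[21], line[22:27], line[16], line[12:16].strip())] += 1
    report["duplicate_atom_names"] = sum(1 for n in seen.values() if n > 1)
    return report
```

A report with only zeros does not guarantee a valid file, since formatting errors and overlapping residue numbering are not checked here.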

Once you have your structures HADDOCK-ready you can go to the next step and define restraints.

Any more questions about pdb preparation for HADDOCK?

Have a look at: