HADDOCK2.2 manual
Ambiguous Interaction Restraints (AIRs)
General:
Before starting HADDOCK, Ambiguous Interaction Restraints (AIRs) should be generated. For this, it is important to define the residues at the interface for each molecule based on NMR chemical shift perturbation data, mutagenesis data or any kind of data that provides information on the interaction interface.
In the definition of those residues, one distinguishes between "active" and "passive" residues.
- The "active" residues are those experimentally identified to be involved in the interaction between the two molecules AND solvent accessible (either main-chain or side-chain relative accessibility should typically be > 40-50%).
- Note that the accessibility cutoff is not a hard limit. You should carefully check the identity of the residues at the interface and possibly include residues with lower accessibility if they carry potentially important functional groups.
- The "passive" residues are all solvent accessible surface neighbors of active residues.
Note that the active and passive residues have to be defined by users based on their own interpretation of the experimental data, especially in the case of NMR titration data. One way to assess the significance of the shifts is to calculate the average perturbation and to consider all perturbations larger than the average as significant.
Once you have defined your active and passive residues:
- Go to the HADDOCK online page (https://www.bonvinlab.org/software/haddock2.2/haddock-start).
- Click on "Generate AIR restraint file" (or, for multi-body docking, use the dedicated option).
- Enter the residue numbers corresponding to the active and passive residues of each molecule.
- Define the upper distance limit for AIRs (the maximum distance between any atom of an active residue of one molecule and any atom of an active or passive residue of the second molecule).
Note that the current default upper distance limit is 2 Å, which may seem short; remember, however, that the effective distance d_eff can never exceed the shortest distance entering the sum:
$$d_{\mathrm{eff}} = \Big( \sum_{i,j} r_{ij}^{-6} \Big)^{-1/6}$$
where the r_ij are all distances between atoms of the active residue and atoms of the active or passive residues of the partner molecule. In addition, since the degree of ambiguity is very high (several thousand distances can enter the sum), the effective distance can be considerably shorter than the shortest distance entering the sum.
- Finally, click on "Generate AIR restraints". An AIR restraint file in CNS format is generated. Copy and paste it, or save it to disk using "File > Save As".
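To see why a 2 Å limit is less restrictive than it looks, the sum-averaged effective distance defined above can be evaluated directly. This is a minimal Python illustration of the formula, not HADDOCK or CNS code; the function name effective_distance is hypothetical.

```python
def effective_distance(distances):
    """Sum-averaged effective distance d_eff = (sum of r**-6) ** (-1/6)."""
    return sum(r ** -6 for r in distances) ** (-1.0 / 6.0)


# A single 5 Å contact: d_eff equals that distance
print(effective_distance([5.0]))
# A thousand 5 Å contacts: d_eff = 5 / 1000**(1/6) = sqrt(2.5), about 1.58 Å
print(effective_distance([5.0] * 1000))
# Mixed distances: d_eff falls below the shortest one (3 Å)
print(effective_distance([3.0, 4.0, 5.0]))
```

Because every term r**-6 is positive, adding more distances to the sum can only lower d_eff, which is why a restraint with thousands of contributing atom pairs can be satisfied even though no single pair is closer than 2 Å.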
Use of bioinformatic interface predictions:
In the absence of any experimental information on the interaction surfaces, you might want to try to predict them based on sequence conservation and other properties. For this purpose we have developed the interface prediction software WHISCY and CPORT. Both have been designed to provide an easy interface to HADDOCK and output, among others, lists of active and passive residues for HADDOCK. CPORT is a meta-predictor that integrates the results of various other servers.
For more information refer to:
- S.J. de Vries, A.D.J. van Dijk and A.M.J.J. Bonvin
"WHISCY: WHat Information does Surface Conservation Yield? Application to data-driven docking."
Proteins: Struct. Funct. Bioinf., 63, 479-489 (2006).
- S.J. de Vries and A.M.J.J. Bonvin
"CPORT: a Consensus Interface Predictor and its Performance in Prediction-driven Docking with HADDOCK."
PLoS ONE, 6, e17695 (2011).
Random AIR definition (ab-initio mode):
In the absence of any experimental and/or bioinformatic information to drive the docking, HADDOCK 2.x offers the possibility to randomly define AIRs from solvent-accessible residues (>20% relative accessibility). A different set of AIRs is used for each docking trial. These restraints are defined in the randomairs.cns CNS script.
The sampling of residues is limited to the defined semi-flexible segments (nseg_X and the following parameters in run.cns). If no semi-flexible segment is defined, all solvent-accessible residues will be sampled, provided enough structures are generated in the rigid-body docking stage (it0). By combining semi-flexible segment definitions with random AIR definition (ranair=true in run.cns), it is possible to limit the sampling to a selected region of the surface (e.g. the CDR loops in an antibody-antigen complex).
The random AIRs are defined (in the randomairs.cns CNS script) as follows, for the rigid-body energy minimization stage only:
- One residue on each molecule is selected randomly (Ai, Bi).
- All surface neighbors within 5 Å of each selected residue are also selected.
- AIRs are defined between each residue selected from molecule A (Ai + 5 Å neighbors) and the residue initially selected at random on molecule B together with all its surface neighbors within a 7.5 Å cutoff (Bi + 7.5 Å neighbors).
- AIRs are defined between each residue selected from molecule B (Bi + 5 Å neighbors) and the residue initially selected at random on molecule A together with all its surface neighbors within a 7.5 Å cutoff (Ai + 7.5 Å neighbors).
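The selection scheme above can be sketched in Python to make the asymmetry between the 5 Å (source) and 7.5 Å (target) neighborhoods explicit. This is a simplified sketch, not the randomairs.cns implementation: it assumes a hypothetical residue record of (residue_id, one representative xyz point, relative accessibility %) and measures neighborhoods between those single points, whereas CNS works on atoms.

```python
import math
import random

# Hypothetical residue record: (residue_id, representative_xyz, relative_accessibility_%).
# randomairs.cns works on atoms inside CNS; one point per residue is a simplification.


def surface_residues(residues, min_rsa=20.0):
    """Solvent-accessible residues (> 20% relative accessibility)."""
    return [r for r in residues if r[2] > min_rsa]


def neighbours(seed, residues, cutoff):
    """Residues whose point lies within `cutoff` Å of `seed` (seed included)."""
    return [r for r in residues if math.dist(r[1], seed[1]) <= cutoff]


def random_airs(mol_a, mol_b, rng=random):
    """One random AIR set as a list of (source_residue, [target_residues])."""
    surf_a, surf_b = surface_residues(mol_a), surface_residues(mol_b)
    seed_a, seed_b = rng.choice(surf_a), rng.choice(surf_b)  # Ai, Bi
    targets_a = [r[0] for r in neighbours(seed_a, surf_a, 7.5)]  # Ai + 7.5 Å
    targets_b = [r[0] for r in neighbours(seed_b, surf_b, 7.5)]  # Bi + 7.5 Å
    airs = [(r[0], targets_b) for r in neighbours(seed_a, surf_a, 5.0)]  # Ai + 5 Å -> B
    airs += [(r[0], targets_a) for r in neighbours(seed_b, surf_b, 5.0)]  # Bi + 5 Å -> A
    return airs


# Toy coordinates (Å); A4 is buried and can never be picked
mol_a = [("A1", (0.0, 0.0, 0.0), 50.0), ("A2", (4.0, 0.0, 0.0), 50.0),
         ("A3", (7.0, 0.0, 0.0), 50.0), ("A4", (1.0, 0.0, 0.0), 10.0)]
mol_b = [("B1", (0.0, 0.0, 10.0), 60.0), ("B2", (0.0, 6.0, 10.0), 60.0)]
for source, targets in random_airs(mol_a, mol_b, random.Random(1)):
    print(source, "->", targets)
```

Calling random_airs once per docking trial with a fresh random draw mirrors the behavior described above, where each trial uses a different set of AIRs.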
For the semi-flexible refinement stage, contact AIRs are automatically defined between all residues within 5 Å of each other across the interface. In the final explicit solvent refinement, no AIRs are defined.
Surface contact restraints:
Surface contact restraints between the various molecules can be automatically defined in HADDOCK 2.x (surfrest=true in run.cns). These restraints are defined in the surf-restraint.cns CNS script. This option is fully compatible with all other types of restraints.
If turned on, one surface contact restraint is defined between each pair of molecules as an ambiguous distance restraint with sum averaging (as for the AIRs) between all CA or P atoms (protein and/or DNA) of one molecule and all CA or P atoms of the other molecule. If fewer than 3 CA and P atoms are found, all atoms are selected instead. The upper distance limit is set to 7 Å if both molecules contain CA and/or P atoms, 4.5 Å if only one molecule does, and 2 Å if neither does.
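The atom-selection and upper-limit rules above can be summarized as a small decision function. This Python sketch only reproduces the rules as stated in the text, not surf-restraint.cns itself; the helper names are hypothetical, molecules are represented by bare lists of atom names, and "contains CA and/or P atoms" is interpreted (as an assumption) as "the CA/P selection was kept because at least 3 such atoms were found".

```python
import itertools


def restraint_selection(atom_names, min_atoms=3):
    """Return (selected_atoms, uses_ca_p): CA/P atoms, or all atoms if fewer than 3."""
    ca_p = [name for name in atom_names if name in ("CA", "P")]
    if len(ca_p) < min_atoms:
        return list(atom_names), False
    return ca_p, True


def upper_limit(a_uses_ca_p, b_uses_ca_p):
    """7 Å (both), 4.5 Å (one) or 2 Å (neither molecule with CA/P atoms)."""
    return {2: 7.0, 1: 4.5, 0: 2.0}[int(a_uses_ca_p) + int(b_uses_ca_p)]


def surface_contact_restraints(molecules):
    """One restraint per pair of molecules: (mol_a, mol_b, n_atoms_a, n_atoms_b, upper)."""
    restraints = []
    for (name_a, atoms_a), (name_b, atoms_b) in itertools.combinations(molecules.items(), 2):
        sel_a, ca_p_a = restraint_selection(atoms_a)
        sel_b, ca_p_b = restraint_selection(atoms_b)
        restraints.append((name_a, name_b, len(sel_a), len(sel_b),
                           upper_limit(ca_p_a, ca_p_b)))
    return restraints


# Toy atom-name lists: a protein, a DNA fragment and a small ligand
molecules = {"A": ["N", "CA", "C", "O"] * 5,
             "B": ["P", "C1'"] * 4,
             "L": ["C1", "C2", "O1"]}
for restraint in surface_contact_restraints(molecules):
    print(restraint)
```

Iterating over all pairs with itertools.combinations reflects the "one restraint between each pair of molecules" rule, so an N-body system gets N(N-1)/2 such restraints.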
Such restraints can be useful in multi-body (N>2) docking to ensure that all molecules are in contact, thus promoting compact docking solutions. As with random AIRs, surface contact restraints can be used in ab-initio docking; in that case it is important to sample the random starting orientations sufficiently, which significantly increases the number of structures needed for rigid-body docking.
Center of mass restraints:
Center of mass restraints between the various molecules can be automatically defined in HADDOCK 2.x (cmrest=true in run.cns). These restraints are defined in the cm-restraint.cns CNS script. This option is fully compatible with all other types of restraints.
If turned on, one center of mass restraint is defined between each pair of molecules as an ambiguous distance restraint with center averaging between all CA or P atoms (protein and/or DNA) of one molecule and all CA or P atoms of the other molecule. If fewer than 3 CA and P atoms are found, all atoms are selected instead. The upper distance limit is automatically defined as the sum of the "effective radii" of the two molecules.
The "effective radius" of a molecule is defined as half the average length of its three principal components.
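The center-averaged distance and the effective radius can be sketched with NumPy. This is an illustration under a stated assumption, not the cm-restraint.cns code: the "length" of each principal component is taken here as the extent (max minus min) of the coordinates projected onto that principal axis, which may differ from the exact definition used in HADDOCK. The helper names are hypothetical.

```python
import numpy as np


def center_distance(coords_a, coords_b):
    """Distance between the geometric centers of two atom selections (center averaging)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def effective_radius(coords):
    """Half the mean extent of the coordinates along their three principal axes.

    Assumption: a principal component's 'length' is the max - min of the
    coordinates projected onto that axis.
    """
    xyz = np.asarray(coords, dtype=float)
    centered = xyz - xyz.mean(axis=0)
    # Principal axes = eigenvectors of the 3x3 covariance matrix
    _, axes = np.linalg.eigh(np.cov(centered, rowvar=False))
    projected = centered @ axes
    lengths = projected.max(axis=0) - projected.min(axis=0)
    return float(lengths.mean() / 2.0)


def com_upper_limit(coords_a, coords_b):
    """Upper distance limit = sum of the two effective radii."""
    return effective_radius(coords_a) + effective_radius(coords_b)


# Corners of a 10 x 6 x 2 Å box and a copy shifted by 10 Å along z
box = [(x, y, z) for x in (-5, 5) for y in (-3, 3) for z in (-1, 1)]
shifted = [(x, y, z + 10) for x, y, z in box]
print(center_distance(box, shifted))  # 10.0
print(com_upper_limit(box, shifted))  # about 6.0 (3.0 per box)
```

Using principal axes rather than the x, y, z axes makes the radius independent of how the molecule happens to be oriented in its input file.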
Such restraints can be useful in multi-body (N>2) docking to ensure that all molecules are in contact, thus promoting compact docking solutions. As with random AIRs, center of mass restraints can be used in ab-initio docking; in that case it is important to sample the random starting orientations sufficiently, which significantly increases the number of structures needed for rigid-body docking.
Use of NMR chemical shift perturbation data:
Here we illustrate the process of defining AIRs from NMR chemical shift perturbation (CSP) data through the following steps:
- Defining residues with "significant" chemical shift perturbations
- Filtering active residues with solvent accessibility
- Defining passive residues
- Residue filtering from an ensemble of structures
- Generating the AIR restraint file
1. Defining residues with "significant" chemical shift perturbations
We will assume that we have a file called csp.dat containing the combined proton/nitrogen chemical shift changes as obtained from 15N HSQC titration experiments, in the following format:
1 0.0
2 0.0
3 0.06
4 0.3
...
The first column corresponds to the residue number and the second to the combined chemical shift perturbation.
HADDOCK comes with a number of awk, csh and Perl scripts to handle and analyze data. To calculate the average perturbation, use the average.perl script located in $HADDOCKTOOLS (see installation). The following command gives the average of the second column of the above file:
awk '{print $2}' csp.dat | $HADDOCKTOOLS/average.perl
Then select all residues whose combined chemical shift perturbation is larger than, for example, the average value avcsp. Pass that value to awk with -v, replacing <average> with the number returned by average.perl:
awk -v avcsp=<average> '{if ($2>avcsp) print $0}' csp.dat
This lists all the selected residues.
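The average-and-threshold selection performed by average.perl and awk can also be sketched in plain Python, which is convenient if you want to try other significance criteria later. This is a minimal sketch assuming the two-column csp.dat format shown above; read_csp and significant_residues are hypothetical helper names, not scripts shipped in $HADDOCKTOOLS.

```python
def read_csp(lines):
    """Parse 'residue_number perturbation' lines (the csp.dat format above)."""
    csp = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            csp[int(fields[0])] = float(fields[1])
    return csp


def significant_residues(csp):
    """Return (average perturbation, residues whose perturbation exceeds it)."""
    avcsp = sum(csp.values()) / len(csp)
    return avcsp, sorted(res for res, value in csp.items() if value > avcsp)


# Usage with the four example lines shown above
example = ["1 0.0", "2 0.0", "3 0.06", "4 0.3"]
avcsp, residues = significant_residues(read_csp(example))
print(avcsp, residues)
```

For a real file, pass the open file handle instead: `with open("csp.dat") as fh: avcsp, residues = significant_residues(read_csp(fh))`.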
The next step consists of filtering those residues according to their solvent accessibility.
2. Filtering active residues with solvent accessibility
An important parameter in defining AIRs is the relative residue solvent accessibility. It can be calculated with the program NACCESS (see software links).
NACCESS outputs a file with extension .rsa containing the per-residue solvent accessibilities divided into various classes:
REM  RES _ NUM      All-atoms   Total-Side   Main-Chain    Non-polar    All polar
REM                ABS   REL    ABS   REL    ABS   REL    ABS   REL    ABS   REL
RES MET     1   125.45  64.6  75.64  48.3  49.81 132.8  75.64  47.9  49.81 137.1
RES PHE     2    83.49  41.9  83.49  50.9   0.00   0.0  83.49  50.5   0.00   0.0
RES GLN     3    79.31  44.4  62.27  44.2  17.04  45.4  17.75  34.0  61.56  48.7
RES GLN     4    83.82  47.0  83.82  59.4   0.00   0.0  15.03  28.8  68.79  54.5
RES GLU     5   133.48  77.5 100.65  74.7  32.83  87.5  34.78  57.7  98.70  88.2
RES VAL     6    20.78  13.7  20.78  18.2   0.00   0.0  20.78  18.0   0.00   0.0
...
Only highly solvent-accessible residues should be selected. The selection can be done either on the all-atom relative accessibilities (e.g. >40%) using the following command at the Unix prompt:
awk '{if (NF==13 && $5>40) print $0; if (NF==14 && $6>40) print $0}' pdb_filename.rsa
(the two cases handle lines without and with a chain identifier), or by requiring that either the main-chain or the side-chain relative accessibility be larger than 40%:
awk '{if (NF==13 && ($7>40 || $9>40)) print $0; if (NF==14 && ($8>40 || $10>40)) print $0}' pdb_filename.rsa
By combining the experimental data (mutagenesis or chemical shift perturbation) and the solvent accessibility, you should be able to define precisely the active residues to use in HADDOCK.
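The two awk filters above can also be written as a small Python function, which is easier to extend (for example to keep the residue name next to the number). This sketch assumes the .rsa column layout shown in the excerpt above, with an optional chain identifier column; parse_rsa and accessible_residues are hypothetical helper names, not NACCESS or HADDOCK tools.

```python
def parse_rsa(lines):
    """Return (resname, resnum, all_rel, side_rel, main_rel) for each RES line of a .rsa file."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "RES":
            continue  # skip REM, CHAIN and TOTAL lines
        off = len(fields) - 13  # 0 without chain identifier, 1 with one
        if off not in (0, 1):
            continue
        records.append((fields[1], fields[2 + off], float(fields[4 + off]),
                        float(fields[6 + off]), float(fields[8 + off])))
    return records


def accessible_residues(records, cutoff=40.0, mode="all"):
    """mode='all': all-atom REL > cutoff; mode='side_or_main': side- or main-chain REL > cutoff."""
    selected = []
    for _resname, resnum, all_rel, side_rel, main_rel in records:
        if mode == "all":
            keep = all_rel > cutoff
        elif mode == "side_or_main":
            keep = side_rel > cutoff or main_rel > cutoff
        else:
            raise ValueError(f"unknown mode: {mode}")
        if keep:
            selected.append(resnum)
    return selected


# Usage:
# with open("pdb_filename.rsa") as fh:
#     print(accessible_residues(parse_rsa(fh), mode="side_or_main"))
```

Residue numbers are kept as strings so that numbers carrying insertion codes are not rejected.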
3. Defining passive residues
The passive residues are all solvent-accessible surface neighbors of active residues. To define them, you can display your molecule in a space-filling representation (RasMol will do) and color the active residues, for example, in red.
Then, filter out the residues having a low solvent accessibility (colored yellow in the figure).
Next, select all surface neighbors to define the passive residues (colored green in the figure) and filter them with the solvent accessibility criterion (see above).
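The visual procedure above can be approximated in code once you have atom coordinates and relative accessibilities. This Python sketch is an assumption-laden illustration, not a HADDOCK tool: "surface neighbor" is approximated by a minimum interatomic distance below a cutoff, and the 6.5 Å default for that cutoff is a hypothetical value chosen for the example. The 40% accessibility threshold follows the criterion used above. The function names are hypothetical.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Shortest distance between any atom of one residue and any atom of another."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def passive_residues(residues, rsa, active, cutoff=6.5, min_rsa=40.0):
    """Accessible, non-active residues within `cutoff` Å of at least one active residue.

    residues: {resnum: [(x, y, z), ...]}, rsa: {resnum: relative accessibility %}.
    The 6.5 Å default cutoff is a hypothetical choice, not a HADDOCK setting.
    """
    passive = []
    for res, atoms in sorted(residues.items()):
        if res in active or rsa.get(res, 0.0) <= min_rsa:
            continue  # active residues and buried residues are never passive
        if any(min_distance(atoms, residues[a]) <= cutoff for a in active):
            passive.append(res)
    return passive


# Toy example: residue 2 is a nearby exposed neighbor, 3 is too far, 4 is buried
residues = {1: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 2: [(6.0, 0.0, 0.0)],
            3: [(20.0, 0.0, 0.0)], 4: [(0.0, 4.0, 0.0)]}
rsa = {1: 60.0, 2: 55.0, 3: 70.0, 4: 10.0}
print(passive_residues(residues, rsa, active={1}))
```

The result should still be inspected visually, as described above, since a distance cutoff alone does not guarantee that a residue lies on the same face of the surface as the active residues.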
4. Residue filtering from an ensemble of structures
If you perform the docking from an ensemble of structures, the solvent accessibility filtering should be performed using the average relative accessibilities ASAav over the ensemble. In that case we use the following accessibility cut-off:
- ASAav + SD > 40% (for either all or main-chain or side-chain atoms)
where SD corresponds to the standard deviation.
We provide in the $HADDOCKTOOLS directory a csh script called calc_ave_rsa.csh that calculates the average accessibilities over an ensemble of structures using NACCESS.
To do so, split your PDB file into separate files, each containing one structure, and then run calc_ave_rsa.csh:
$HADDOCKTOOLS/calc_ave_rsa.csh *.pdb
A file named "rsa_ave.lis" will be created that contains the average solvent accessibility and the standard deviation for each residue:
# resnam resnum  < rsa_all >  (sd)  < rsa_back >  (sd)  < rsa_side >  (sd)
MET   1    69.323  10.370   125.390  13.626    55.903  12.599
PHE   2    37.490   5.216     0.320   0.753    45.500   6.390
GLN   3    53.793   8.246    50.147  14.108    54.770  10.873
GLN   4    40.907   5.578     0.070   0.306    51.757   7.042
GLU   5    70.330   6.312    68.017  15.608    70.963   7.614
VAL   6    16.183   4.345     0.133   0.483    21.397   5.791
...
To select the residues that satisfy the 40% accessibility cut-off (here applied to the main-chain or side-chain averages plus their standard deviations), type:
awk '{if (($5+$6)>=40 || ($7+$8)>=40) print $0}' rsa_ave.lis
Note that the 40% cut-off is not a hard limit and is left to the user's choice.
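The ASAav + SD criterion can be illustrated starting from per-model accessibilities. This Python sketch is not calc_ave_rsa.csh: it assumes you already have, for each residue, one relative accessibility value per model for a given atom class (all, main-chain or side-chain), and it uses the population standard deviation, which is an assumption since the text does not say which definition the script uses. The function name and the input values are hypothetical.

```python
import statistics


def ensemble_accessible(per_model_rsa, cutoff=40.0):
    """Residues whose ensemble-average relative accessibility + SD exceeds `cutoff` (%)."""
    selected = []
    for resnum, values in sorted(per_model_rsa.items()):
        average = statistics.mean(values)
        sd = statistics.pstdev(values)  # population SD: an assumption
        if average + sd > cutoff:
            selected.append(resnum)
    return selected


# Hypothetical side-chain relative accessibilities (%) of three residues in two models
per_model = {1: [30.0, 50.0], 2: [35.0, 35.0], 3: [38.0, 42.0]}
print(ensemble_accessible(per_model))  # [1, 3]
```

Running the function separately for all-atom, main-chain and side-chain values and taking the union of the results reproduces the "either all or main-chain or side-chain" wording of the cut-off.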
5. Generating the AIR restraints file
Once you have defined your active and passive residues, go to the HADDOCK online page (https://www.bonvinlab.org/software/haddock2.2/haddock-start) and click on "Generate AIR restraint file".
Enter the residue numbers corresponding to the active and passive residues of each molecule.
You can define the upper distance limit for AIRs (the maximum distance between any atom of an active residue of one molecule and any atom of an active or passive residue of the second molecule).
Click on "Generate AIR restraints". An AIR restraint file in CNS format is generated. Copy and paste it, or save it to disk using "File > Save As".
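To show what the generated file encodes, here is a Python sketch that writes one ambiguous restraint per active residue, pointing to all active and passive residues of the partner. It follows the general CNS NOE "assign" syntax, but the exact layout produced by the web server may differ; the segment IDs A and B, the residue numbers in the usage line, and the function name air_restraints are hypothetical. The three trailing numbers are the CNS distance, lower correction and upper correction, chosen here so that the restraint is satisfied when d_eff lies between 0 and the upper limit.

```python
def air_restraints(active_a, passive_a, active_b, passive_b,
                   segid_a="A", segid_b="B", upper=2.0):
    """Return ambiguous restraints in a CNS-style 'assign' layout (sketch)."""
    blocks = []
    directions = (
        (active_a, active_b + passive_b, segid_a, segid_b),  # A active -> B active+passive
        (active_b, active_a + passive_a, segid_b, segid_a),  # B active -> A active+passive
    )
    for sources, targets, src_seg, tgt_seg in directions:
        for res in sources:
            partners = "\n     or\n".join(
                f"       ( resid {t} and segid {tgt_seg})" for t in targets)
            # distance, lower correction, upper correction -> bounds [0, upper]
            blocks.append(
                f"assign ( resid {res} and segid {src_seg})\n"
                f"(\n{partners}\n"
                f") {upper:.1f} {upper:.1f} 0.0")
    return "\n\n".join(blocks) + "\n"


# Hypothetical residue numbers for two molecules
print(air_restraints([10, 12], [15], [40], [41, 43]))
```

Passive residues only ever appear as restraint targets, never as sources, which matches the definition of the AIR upper distance limit given above.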