LightDock+HADDOCK membrane proteins tutorial

This tutorial consists of the following sections:

Introduction
LightDock general concepts
Setup/Requirements
Data preparation
LightDock simulation
HADDOCK Refinement
Analysis
Conclusions
Congratulations!

Introduction

This tutorial demonstrates the use of LightDock for predicting the structure of membrane receptor–soluble protein complex using the topological information provided by the membrane to guide the modeling process. The resulting LightDock models are then refined using HADDOCK2.4 webserver. We will be following the protocol described in Roel-Touris et al, 2020.

Membrane proteins are among the most challenging systems to study with experimental structural biology techniques, thus computational techniques such as docking might offer invaluable insights on the modeling of those systems.

**Fig.1** 3X29 complex in a lipid bilayer as simulated by MemProtMD.

In this tutorial we will be working with the crystal structure of Mus musculus Claudin-19 transmembrane protein (PDB code 3X29, chain A) in complex with the unbound C-terminal fragment of the Clostridium perfringens Enteroxin (PDB code 2QUO, chain A). The PDB code of the complex is 3X29 (chains A and B).

3X29 complex is one of the cases covered in the MemCplxDB membrane protein complex benchmark (Koukos et al, 2018). Despite not being one of the most challenging cases covered in the benchmark in terms of flexibility, its relatively small size will help us describing the complete modeling protocol in the short time intended for a tutorial.

For this tutorial we will make use of the HADDOCK2.4 webserver and LightDock software.

R.V. Honorato, M.E. Trellet, B. Jiménez-García1, J.J. Schaarschmidt, M. Giulini, V. Reys, P.I. Koukos, J.P.G.L.M. Rodrigues, E. Karaca, G.C.P. van Zundert, J. Roel-Touris, C.W. van Noort, Z. Jandová, A.S.J. Melquiond and A.M.J.J. Bonvin. The HADDOCK2.4 web server: A leap forward in integrative modelling of biomolecular complexes. Nature Prot., Advanced Online Publication DOI: 10.1038/s41596-024-01011-0 (2024).

The integrative approach followed in this tutorial is described here:

J. Roel-Touris, B. Jimenez-Garcia and A.M.J.J. Bonvin. Integrative modeling of membrane-associated protein assemblies. Nat. Commun., 11, 6210 (2020).

LightDock docking framework is described in the following publications:

J. Roel-Touris, A.M.J.J. Bonvin and B. Jimenez-Garcia. LightDock goes information-driven. Bioinformatics, 36:3 950-952 (2020).
B. Jimenez-Garcia, J. Roel-Touris, M. Romero-Durana, M. Vidal, D. Jimenez-Gonzalez, J. Fernandez-Recio. LightDock: a new multi-scale approach to protein-protein docking. Bioinformatics, 34:1 49-55 (2018).

Throughout the tutorial, colored text will be used to refer to questions or instructions, commands to be executed in the terminal, and/or PyMOL commands.

This is a question prompt: try answering it! This an instruction prompt: follow it! This is a PyMOL prompt: write this in the PyMOL command line prompt! This is a Linux prompt: insert the commands in the terminal!

LightDock general concepts

LightDock is a macromolecular docking software written in the Python programming language, designed as a framework for rapid prototyping and test of scientific hypothesis in structural biology. It was designed to be easily extensible by users and scalable for high-throughput computing (HTC). LightDock is capable of modeling backbone flexibility of molecules using anisotropic model networks (ANM) and the energetic minimization is based on the Glowworm Swarm Optimization (GSO) algorithm.

LightDock protocol is divided in two main steps: setup and simulation. On setup step, input PDB structures for receptor and ligand partners are parsed and prepared for the simulation. Moreover, a set of swarms is arranged around the receptor surface. Each of these swarms represents an independent simulation in LightDock where a fixed number of agents, called glowworms, encodes for a possible receptor-ligand pose. During the simulation step, each of these glowworms will sample a given region of the energetic landscape depending on its neighboring glowworms.

**Fig.2** A receptor surface showing only two swarms. Each swarm contains a set of glowworms representing a possible receptor-ligand pose.

Swarms on the receptor surface can be easily filtered according to regions of interest. Figure 2 shows an example where only two swarms have been calculated to focus on two residues of interest on the receptor partner (depicted in orange). On this tutorial we will explore this capability in order to filter out incompatible transmembrane binding regions in membrane complex docking.

For more information about LightDock, please visit the tutorials section.

Setup/Requirements

In order to run this tutorial you will need to have the following software installed: LightDock, pdb-tools and PyMOL.

Also, if not provided with special workshop credentials to use the HADDOCK portal, make sure to register in order to be able to submit jobs. Use for this the following registration page: https://haddock.science.uu.nl/auth/register/haddock.

Installing LightDock

LightDock is distributed as a Python package through the Python Package Index (PyPI) repository.

Command line

Installing LightDock is as simple as creating a virtual environment for Python 3.6+ and running pip command (make sure your instances of virtualenv and pip are for Python 3.6+ versions). We will install the version 0.9.2post1 of LightDock which is the first released version with support for the membrane protocol and execution in Jupyter Notebooks (see next section):

virtualenv venv
source venv/bin/activate
pip install numpy && pip install lightdock==0.9.2post1

If the installation finished without errors, you should be able to execute LightDock in the terminal:

lightdock3.py -v

and to see an output similar to this:

lightdock3 0.9.2post1

Jupyter Notebook and Google Colab

Another option to use LightDock is through Google Colaboratory (“Colab” for short) which allows you to write and execute Python in the browser using notebooks. In case of choosing this option, simply execute in a new notebook in the first cell the following command:

!pip install lightdock==0.9.2post1

Installing PDB-Tools

PDB-Tools is a set of Python scripts for manipulating PDB files following the philosophy of one script, one task. For different manipulating tasks on a PDB file, the procedure would be to pipe the different PDB-Tools scripts to accomplish the different tasks.

PDB-Tools is distributed as a PyPI package. To install it, simply:

pip install pdb-tools

or alternatively in a Google Colab notebook:

!pip install pdb-tools

Tutorial Notebook

We have prepared a Colab Notebook ready to be imported. Download it: tutorial.ipynb

In case of using the provided Colab notebook, go to the Colab site and upload the provided notebook (see A):

Once imported, click on the icon pointed in (B) and upload the required files as marked in (C) by the orange arrow. The two required files (3x29_receptor_membrane.pdb and 3x29_ligand.pdb) should appear now as in (D).

Then, run one by one each of the code cells as several libraries will be installed in the first code cells.

Data preparation

We will make use of the 3X29 complex simulated in a membrane lipid bilayer from the MemProtMD database (Newport et al., 2018).

First, browse the 3X29 complex page at MemProtMD and locate the Data Download section.

Download the PDB file corresponding to the Coarse-grained snapshot (MARTINI representation)

This file in PDB format contains the MARTINI coarse-grained (CG) representation of the phospholipid bilayer membrane and the protein complex. We will use the phosphate beads as the boundary for the transmembrane region for filtering the sampling region of interest in LightDock.

**Fig.3** MARTINI Coarse-grained representation of the 3X29 complex in a lipid bilayer. Protein is depicted as blue surface, CG beads for phospholipids in white, except for NC3 beads in turquoise and PO4 beads in orange.

We have prepared a Python script to parse, rename and remove non-necessary beads for the membrane protocol in LightDock: prepare4lightdock.py. You will need to execute it in your terminal using the 3x29_default_dppc-coarsegrained.pdb PDB file as input:

python3 prepare4lightdock.py 3x29_default_dppc-coarsegrained.pdb membrane_cg.pdb

The output of this script is the membrane_cg.pdb PDB file (Figure 4).

**Fig.4** Lipid bilayer membrane and protein after using the `prepare4lightdock.py` script. Protein is depicted as blue surface (only CA), membrane beads ready for LightDock in orange.

The last step will be to open the just generated membrane_cg.pdb file in PyMOL to align the atomistic representation of the 3X29 receptor partner to the membrane CG bilayer:

pymol 3x29_receptor.pdb membrane_cg.pdb

Now execute the following commands on PyMOL to align the membrane with the atomistic receptor and saving the resulting PDB structure:

align 3x29_receptor and name CA, membrane_cg and name CA remove membrane_cg and name CA save 3x29_receptor_membrane.pdb, all

The last PyMOL command will save the aligned atomistic 3X29 receptor to the CG lipid bilayer: 3x29_receptor_membrane.pdb.

LightDock simulation

Simulation set up

The fist step in any LightDock simulation is setup. We will make use of lightdock3_setup.py command to initialize our 3X29 membrane simulation and the required input data is:

Use the lightdock3_setup.py command to set up the LightDock simulation:

lightdock3_setup.py 3x29_receptor_membrane.pdb 3x29_ligand.pdb --noxt --noh --membrane

In short, we are indicating to the setup command to use 3x29_receptor_membrane.pdb as the receptor partner, 3x29_ligand.pdb as the ligand, to skip NOXT and hydrogen atoms and to detect membrane beads with the --membrane flag. The output of the command should look similar to this:

[lightdock3_setup] INFO: Ignoring OXT atoms
[lightdock3_setup] INFO: Ignoring Hydrogen atoms
[lightdock3_setup] INFO: Reading structure from 3x29_receptor_membrane.pdb PDB file...
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue GLN.61
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue GLN.63
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LYS.65
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LEU.66
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue ASP.68
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue HIS.76
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue MET.95
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue MET.102
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LYS.103
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LYS.115
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue ARG.117
[lightdock3_setup] INFO: 1608 atoms, 601 residues read.
[lightdock3_setup] INFO: Ignoring OXT atoms
[lightdock3_setup] INFO: Ignoring Hydrogen atoms
[lightdock3_setup] INFO: Reading structure from 3x29_ligand.pdb PDB file...
[lightdock3_setup] INFO: 933 atoms, 117 residues read.
[lightdock3_setup] INFO: Calculating reference points for receptor 3x29_receptor_membrane.pdb...
[lightdock3_setup] INFO: Reference points for receptor found, skipping
[lightdock3_setup] INFO: Done.
[lightdock3_setup] INFO: Calculating reference points for ligand 3x29_ligand.pdb...
[lightdock3_setup] INFO: Reference points for ligand found, skipping
[lightdock3_setup] INFO: Done.
[lightdock3_setup] INFO: Saving processed structure to PDB file...
[lightdock3_setup] INFO: Done.
[lightdock3_setup] INFO: Saving processed structure to PDB file...
[lightdock3_setup] INFO: Done.
[lightdock3_setup] INFO: Calculating starting positions...
[lightdock3_setup] INFO: Generated 62 positions files
[lightdock3_setup] INFO: Done.
[lightdock3_setup] INFO: Number of calculated swarms is 62
[lightdock3_setup] INFO: Preparing environment
[lightdock3_setup] INFO: Done.
[lightdock3_setup] INFO: LightDock setup OK

In previous versions of LightDock, the number of swarms of the simulated was given by the user (typically around 400), but since version 0.9.0, the number of swarms of the simulation is automatically calculated depending on the surface area of the receptor structure. However, the number of swarms can be fixed by the user using the -s flag for reproducibility of old results purpose. Another difference with previous versions is that the number of glowworms is now default to 200. This value has been extensively tested on our previous work, but it may be defined by the user as well using the -g flag.

A complete list of lightdock3_setup.py command options might be obtained executing the command without arguments or with the --help flag:

usage: lightdock3_setup [-h] [-s SWARMS] [-g GLOWWORMS] [--seed_points STARTING_POINTS_SEED] [--noxt] [--noh] [--verbose_parser] [-anm] [--seed_anm ANM_SEED] [-ar ANM_REC]
                        [-al ANM_LIG] [-r restraints] [-membrane] [-transmembrane] [-sp] [-sd SURFACE_DENSITY] [-sr SWARM_RADIUS]
                        receptor_pdb_file ligand_pdb_file
lightdock3_setup: error: the following arguments are required: receptor_pdb_file, ligand_pdb_file

The setup command has generated several files and directories:

What is the content of the setup.json file?

What does the init directory contains?

We may visualize the distribution of swarms over the receptor:

pymol lightdock_3x29_receptor_membrane.pdb init/swarm_centers.pdb

**Fig.5** Distribution of swarms in the current simulation.

Is this a regid-body or a flexible simulation?

Running the simulation

The simulation is ready to run at this point. The number of swarms after focusing on the cytosolic region of the membrane is of 62.

LightDock optimization strategy (using the GSO algorithm) is agnostic of the scoring function (force-field). There are several scoring functions available at LightDock, from atomistic to coarse-grained. In this tutorial we will make use of fastdfire, which is the implementation of DFIRE using the Python C API and the default if no scoring function is specified by the user. Find here a complete list of the current supported scoring functions by LightDock.

Simulation is the most time-consuming part of the protocol. For that reason, we will only simulate one of the 62 total swarms. Pick a swarm number between [0..61] and use that id in the -l argument:

lightdock3.py setup.json 100 -c 1 -s fastdfire -l 60

In the command above, we specify the JSON file of the simulation (setup.json), the number of steps of the simulation (100), the number of CPU cores to use (-c 1), the scoring function (-s fastdfire). If no -l argument is provided, the protocol would simulate all the swarms.

For your convenience, you can download the full run as a compressed file (45MB).

Once the simulation has finished, navigate to the swarm_60 directory (or the one you have selected) and list the directory.

How many gso_* files have been generated? Which one corresponds to the last step of the simulation?

Generating models

Once the simulation has finished successfully, it is time to generate the predicted models. For each swarm, there is a gso_100.out file containing the information to generate as many models as glowworms were defined in the simulation (200 in this tutorial). The command in charge of generating the models is lgd_generate_conformations.py.

Pick a swarm folder and generate the 200 models simulated as in step 100:

lgd_generate_conformations.py 3x29_receptor_membrane.pdb 3x29_ligand.pdb swarm_60/gso_100.out 200

You should see an output similar to this:

[generate_conformations] INFO: Reading ../lightdock_3x29_receptor_membrane.pdb receptor PDB file...
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue GLN.61
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue GLN.63
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LYS.65
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LEU.66
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue ASP.68
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue HIS.76
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue MET.95
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue MET.102
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LYS.103
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue LYS.115
[pdb] WARNING: Possible problem: [SideChainError] Incomplete sidechain for residue ARG.117
[generate_conformations] INFO: 1608 atoms, 601 residues read.
[generate_conformations] INFO: Reading ../lightdock_3x29_ligand.pdb ligand PDB file...
[generate_conformations] INFO: 933 atoms, 117 residues read.
[generate_conformations] INFO: Read 200 coordinate lines
[generate_conformations] INFO: Generated 200 conformations

Clustering models

To remove very similar and redundant models in the same swarm, we will cluster the 200 generated models:

lgd_cluster_bsas.py swarm_60/gso_100.out

After a verbose output of the command above, a new file cluster.repr is generated inside the swarm_60 folder. This file should look like this:

0:3:26.80832:115:lightdock_115.pdb
1:9:24.45152:42:lightdock_42.pdb
2:62:22.70320:37:lightdock_37.pdb
3:35:20.35832:0:lightdock_0.pdb
4:41:16.69347:38:lightdock_38.pdb
5:3:15.71026:79:lightdock_79.pdb
6:7:13.89057:72:lightdock_72.pdb
7:7:11.84427:164:lightdock_164.pdb
8:1: 8.69611:92:lightdock_92.pdb
9:1: 2.92199:137:lightdock_137.pdb
10:22:-0.00771:95:lightdock_95.pdb
11:2:-24.59441:93:lightdock_93.pdb
12:7:-31.35821:57:lightdock_57.pdb

Each line represents a different cluster and lines are sorted from best to worst energy. For each line, there is information about the cluster id, the number of structures in the cluster, the best energy of the cluster, the glowworm id of the model with best energy and the PDB file name of the structure with best energy.

Open the best predicted model for this swarm in PyMOL and have a look.

pymol swarm_60/lightdock_115.pdb

How does this model look in general? What about the side chains?

HADDOCK (see https://www.bonvinlab.org/software/haddock2.4) is a collection of python scripts derived from ARIA that harness the power of CNS (Crystallography and NMR System – https://cns-online.org) for structure calculation of molecular complexes. What distinguishes HADDOCK from other docking software is its ability, inherited from CNS, to incorporate experimental data as restraints and use these to guide the docking process alongside traditional energetics and shape complementarity. Moreover, the intimate coupling with CNS endows HADDOCK with the ability to actually produce models of sufficient quality to be archived in the Protein Data Bank. In this tutorial we will make use of HADDOCK for removing clashes at the interface of protein complexes generated by rigid-body docking approaches.

Data preparation

The first step to refine out top predicted models by LightDock will be to prepare a multi-model ensemble PDB file containing those top predicted models.

First, download and decompress the provided complete run.
Using pdb-tools, we will remove MMB fake bead residues, copy the chain ID into the segid field and finally creating an ensemble of the top 20 models (we provide the generated top20_ensemble.pdb for your convenience):

cd clustered; pdb_mkensemble `head -20 rank_clustered.list | awk '{printf "%s%s",sep,$1; sep=" "} END{print ""}'` | pdb_delresname -MMB | pdb_chainxseg > top20_ensemble.pdb

Please note that the structures are located inside the clustered directory.

HADDOCK2.4 web server

We will make use of the HADDOCK2.4 web interface to set up the final refinement step of the membrane protocol. Please make sure you are already registered and authenticated on the HADDOCK2.4 server. To start the submission select the Input data tab at https://wenmr.science.uu.nl/haddock2.4/submit.

Input data

Step1: Define a name for your docking run in the field “Job name”, e.g. 3x29-Lightdock-CG-refine.
Step2: Select the number of molecules to dock, in this case the default 2.
Step3: Input the first protein PDB file. For this unfold the Molecule 1 - input if it isn’t already unfolded.

First molecule: where is the structure provided? -> “I am submitting it” Which chain to be used? -> A (for this particular case) PDB structure to submit -> Browse and select top20_ensemble.pdb (the file you edited to modify the histidine) Do you want to coarse-grain your molecule? -> Switch on

Note: Leave all other options to their default values.

Step4: Input the second protein PDB file. For this unfold the Molecule 2 - input if it isn’t already unfolded.

First molecule: where is the structure provided? -> “I am submitting it” Which chain to be used? -> B (for this particular case) PDB structure to submit -> Browse and select top20_ensemble.pdb (the file you saved)

Step 5: Click on the “Next” button at the bottom left of the interface. This will upload the structures to the HADDOCK webserver where they will be processed and validated (checked for formatting errors). The server makes use of Molprobity to check side-chain conformations, eventually swap them (e.g. for asparagines) and define the protonation state of histidine residues.

Input parameters

Step 6: Click on the “Next” button at the bottom left of the interface. Leave everything as if in this second step form and simply click on Next. This is a step intended for defining residues involved in binding for docking. Ignore as well any warning about the protonation of the system if it is the case.

Docking parameters

In this third and final step, we will need to set several options on three main sections: Distance restraints, Sampling parameters and Advanced sampling parameters.

Step 7: Modify the Distance restraints settings (for this unfold the menu if it isn’t already unfolded).

Turn off Randomly exclude a fraction of the ambiguous restraints (AIRs) Turn on Define surface contact restraints to enforce contact between the molecules

Step 8: Modify the Sampling parameters (for this unfold the menu if it isn’t already unfolded).

Number of structures for rigid body docking -> 200 Number of trials for rigid body minimisation -> 10 Number of structures for semi-flexible refinement -> 200 Number of structures for the final refinement -> 200 Number of structures to analyze -> 200

Step 9: Modify the Advanced sampling parameters (for this unfold the menu if it isn’t already unfolded).

Turn off Perform cross-docking Turn off Randomize starting orientations Turn off Perform initial rigid body minimisation Turn off Allow translation in rigid body minimisation Number of MD steps for rigid body high temperature TAD -> 0 Number of MD steps during first rigid body cooling stage -> 0 Number of MD steps during second cooling stage with flexible side-chains at interface -> 0 Number of MD steps during third cooling stage with fully flexible interface -> 0

Job submission

This interface allows us to modify many parameters that control the behavior of HADDOCK but in our case only the above changes are required and all other parameters can be left to their default values. The interface also allows us to download the input structures of the docking run (in the form of a tgz archive) and a haddockparameter file which contains all the settings and input structures for our run (in JSON format). We strongly recommend to download this file as it will allow you to repeat the run after uploading into the file upload interface of the HADDOCK webserver. It can serve as input reference for the run. This file can also be edited to change a few parameters for example. An excerpt of this file is shown here:

{
  "runname": "3x29-Lightdock-CG-refine",
  "auto_passive_radius" : 6.5,
  "create_narestraints" : true,
  "delenph": true,
  "ranair" : false,
  "cmrest" : false,
  "kcont" : 1.0,
  "surfrest" : true,
  "ksurf" : 1.0,
  "noecv" : false,
  "ncvpart" : 2.0,
  "structures_0" : 500,
  "ntrials" : 5,
  ...
}

Step 10: Click on the “Submit” button at the bottom left of the interface.

Upon submission you will be presented with a web page which also contains a link to the previously mentioned haddockparameter file as well as some information about the status of the run.

Currently your run should be queued but eventually its status will change to “Running”:

The page will automatically refresh and the results will appear upon completions (which can take between 1/2 hour to several hours depending on the size of your system and the load of the server). You will be notified by email once your job has successfully completed.

Results

Depending on the server load, your refinement job may take some time, but you will receive an email once the job has completed (and the results page will be automatically refreshed).

For your convenience, we provide the refinement job already calculated for you: https://wenmr.science.uu.nl/haddock2.4/run/4242424242/52544-3x29-Lightdock-CG-refine

Inspect the results page

How many clusters are generated?

Analysis

The results page for the refinement job displays a very useful summary of the energies of the top ranked clusters according to HADDOCK scoring energy. For each cluster, the mean and standard deviation of the HADDOCK score for this cluster is presented. The more the negative this score is, the better. The total energy (and the partial contributions from electrostatics, Van der Waals, desolvation, etc.) will give you a qualitative measure of the goodness of the models, but it is important to notice that the quality of the model does not guarantee it to be a near native solution.

Which metric to your knowledge might classify a model as a near native one?

In the CAPRI (Critical Prediction of Interactions) Méndez et al. 2003 experiment, one of the parameters used is the Ligand root-mean-square deviation (L-RMSD) which is calculated by superimposing the structures onto the backbone atoms of the receptor and calculating the RMSD on the backbone residues of the ligand. To calculate the L-RMSD it is possible to either use the software Profit or Pymol.

We will have a quick look at the top 10 models predicted by LightDock and the top 10 refined by the HADDOCK server and compare them with the 3X29 complex reference. Below you will find a table with the top 10 models according to the LightDock and HADDOCK refinement ranking (click on a name to download):

Top	Docking (LightDock)	Refinement (HADDOCK)
1	swarm_22_112.pdb	cluster17_1.pdb
2	swarm_37_11.pdb	cluster2_1.pdb
3	swarm_39_11.pdb	cluster16_1.pdb
4	swarm_60_115.pdb	cluster15_1.pdb
5	swarm_54_167.pdb	cluster14_1.pdb
6	swarm_37_34.pdb	cluster10_1.pdb
7	swarm_55_181.pdb	cluster13_1.pdb
8	swarm_60_42.pdb	cluster11_1.pdb
9	swarm_37_169.pdb	cluster1_1.pdb
10	swarm_37_83.pdb	cluster9_1.pdb

You can also download as compressed files:

Visualizing and aligning in PyMOL

First, open the target and reference structures in PyMOL (in this case top 1 from LightDock ranking):

pymol swarm_22_112.pdb 3x29_reference.pdb

Color by chain (and leave orange the membrane beads): util.cbc color orange, swarm_22_112 and name BJ

Align both structures to the receptor partner (chain A): align swarm_22_112 and chain A, 3x29_reference and chain A

Center visualization: z vis

Calculating L-RMSD in PyMOL

From the alignment of the previous section, we can easily calculate the L-RMSD in PyMOL:

First, remove all segid information to let PyMOL correctly find the target chains: alter all, segi = ‘ ‘

And now calculate L-RMSD using rms_cur command: rms_cur swarm_22_112 and chain B, 3x29_reference and chain B

Which leaves a L-RMSD of 22.7Å.

**Fig.7** L-RMSD calculation in PyMOL of the top 1 model against the 3X29 reference structure.

Repeat the same process for more structure on the top 10 ranking from LightDock and HADDOCK refinement

See the calculated L-RMSDs:

HADDOCK:
cluster17_1.pdb  22.531Å
cluster2_1.pdb   13.404Å
cluster16_1.pdb  15.082Å
cluster15_1.pdb  24.945Å
cluster14_1.pdb  24.022Å
cluster10_1.pdb   6.082Å
cluster13_1.pdb  20.806Å
cluster11_1.pdb  15.325Å
cluster1_1.pdb   28.809Å
cluster9_1.pdb   25.826Å

LightDock:
swarm_22_112.pdb 22.551Å
swarm_37_11.pdb  11.850Å
swarm_39_11.pdb  13.424Å
swarm_60_115.pdb  5.735Å
swarm_54_167.pdb 25.031Å
swarm_37_34.pdb  28.587Å
swarm_55_181.pdb 20.470Å
swarm_60_42.pdb  10.801Å
swarm_37_169.pdb 14.894Å
swarm_37_83.pdb  25.857Å

Which is the best structure in terms of L-RMSD in the HADDOCK ranking? And in the LightDock ranking?

In CAPRI, the L-RMSD value defines the quality of a model:

incorrect model: L-RMSD>10Å
acceptable model: L-RMSD<10Å
medium quality model: L-RMSD<5Å
high quality model: L-RMSD<1Å

What is the quality of these models? Did any model pass the acceptable threshold?

A more in deep look

The best models in terms of quality (L-RMSD) from the LightDock and HADDOCK rankings are swarm_60_115.pdb and cluster10_1.pdb respectively.

Open them in PyMOL, align both structures as explained before and compare both models qualitatively.

How similar do they look to you?

Now use the lines visualization in PyMOL:

show lines

Have a close look at the interface of both models. You may visualize atoms clashing in PyMOL:

show sphere, (chain A within 2.5 of chain B) or (chain B within 2.5 of chain A)

What is the best model in terms of clashes?

**Fig.8** Detail of the interface (in yellow the HADDOCK refined model).

Conclusions

We have demonstrated the use of membrane information on a protein docking simulation and how to further refine predicted models using a coarse-grained protocol.

It is of paramount importance not to blindly trust docking predictions and rankings. Always inspect the results and predictions and further assess their quality using more metrics and different criteria if possible.

Congratulations!

You have completed this tutorial. If you have any questions or suggestions, feel free to contact us via email or asking a question through our support center.

And check also our education web page where you will find more tutorials!

LightDock+HADDOCK membrane proteins tutorial

Introduction

LightDock general concepts

Setup/Requirements

Installing LightDock

Command line

Jupyter Notebook and Google Colab

Installing PDB-Tools

Tutorial Notebook

Data preparation

LightDock simulation

Simulation set up

Running the simulation

Generating models

Clustering models

HADDOCK Refinement

Data preparation

HADDOCK2.4 web server

Input data

Input parameters

Docking parameters

Job submission

Results

Analysis

Visualizing and aligning in PyMOL

Calculating L-RMSD in PyMOL

A more in deep look

Conclusions

Congratulations!