PowerFit Tutorial

This tutorial consists of the following sections:

Introduction
Setup
Inspecting the data
Rigid body fitting
Analyzing the results
Integrative modeling with HADDOCK

Introduction

PowerFit is a software application developed to fit atomic resolution structures of biomolecules into cryo-electron microscopy (cryo-EM) density maps. It is open-source and available for download on GitHub.

This tutorial shows you how to use PowerFit by applying it to an E. coli ribosome case. To follow it you need, in addition to PowerFit, the UCSF Chimera visualization software, a popular tool in the cryo-electron microscopy community for its volume visualization capabilities. We will also discuss the limits of rigid body fitting and how HADDOCK can alleviate some of its shortcomings. The data necessary to run this tutorial are provided in our GitHub data repository (see the Setup section). If you are following one of our workshops, where we use a Virtual Machine, then all the required software and data should already be installed.

The PowerFit and HADDOCK software are described in

G.C.P. van Zundert and A.M.J.J. Bonvin. Fast and sensitive rigid-body fitting into cryo-EM density maps with PowerFit. AIMS Biophysics. 2, 73-87 (2015).
G.C.P. van Zundert, A.S.J. Melquiond and A.M.J.J. Bonvin. Integrative modeling of biomolecular complexes: HADDOCKing with Cryo-EM data. Structure. 23, 949-960 (2015).

Throughout the tutorial, colored text marks questions, instructions, and Linux or Chimera commands.

This is a question prompt: try answering it!
This is an instruction prompt: follow it!
This is a Chimera prompt: write this in the Chimera command line prompt!
This is a Linux prompt: insert the commands in the terminal!

The case we will be investigating is a complex between the 30S maturing E. coli ribosome and KsgA, a methyltransferase. Models are available for both the E. coli ribosome and KsgA, together with a cryo-EM density map of around 13 Å resolution (EMD-2017).

Setup

If you are using one of our pre-packed VM images, the data should be directly available in the image. We prepared a folder that contains the cryo-EM density map file in CCP4 format and the starting models of the ribosome and KsgA. The ribosome has already been properly fitted in the density.

Copy the data to the Desktop and then move into the newly copied folder.

cp -r /opt/powerfit-tutorial ~/Desktop
cd ~/Desktop/powerfit-tutorial

If you are running this tutorial on your own, make sure the required software (UCSF Chimera and PowerFit) is installed, then download the tutorial data from our GitHub data repository or clone it from the command line:

git clone https://github.com/haddocking/powerfit-tutorial
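
Before continuing, it can help to confirm that the required programs are reachable from the terminal. The following is a minimal sketch, not part of PowerFit or Chimera: the helper name missing_tools is hypothetical, and it assumes the executables are called powerfit and chimera, as in the commands used throughout this tutorial. On some systems Chimera is installed without being added to the PATH, in which case it will be reported as missing even though it is installed.

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Executable names as used in the commands of this tutorial (assumption:
    # your installation exposes them under the same names).
    missing = missing_tools(["powerfit", "chimera"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```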

Inspecting the data

Let us first inspect the data we have available, namely the cryo-EM density map and the structures we will attempt to fit.

Using Chimera, we can easily visualize and inspect the density and models, mostly through a few mouse clicks.

Open the density map together with the ribosome and KsgA.

chimera ribosome-KsgA.map ribosome.pdb KsgA.pdb

In the Volume Viewer window, the middle slider bar controls the value at which the isosurface of the density is shown. At high values the envelope will shrink, while at low values even the noise in the map might be displayed. We will first make the density transparent, to see the fitted structure inside:

Within the Volume Viewer click on the gray box next to Color, which opens the Color Editor window.
In there, check the Opacity box. An extra slider bar appears in the box called A, for the alpha channel.
Set the alpha channel value to around 0.6.

Notice that the density becomes transparent, providing a better view of the fit of the ribosome model. On closer inspection, you can also discern a region of the density that is not accounted for by the ribosome structure alone; this is the binding location of KsgA. Although you could try to place the crystal structure in that region manually, finding the correct orientation is not straightforward. PowerFit can help here, as it attempts to find the best fit automatically and exhaustively, based on an objective score.

Rigid body fitting

PowerFit is a rigid body fitting software that quickly calculates the cross-correlation, a common measure of the goodness-of-fit, between the atomic structure and the density map. It performs a systematic 6-dimensional scan of the three translational and three rotational degrees of freedom. In short, PowerFit will try to fit the structure in many orientations at every position on the map and calculate a cross-correlation score for each of them.
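
To make the idea of scoring every position concrete, here is a toy sketch in one dimension. It is not PowerFit's implementation, which this tutorial does not show: the function names are hypothetical, the densities are plain lists of numbers, rotations are left out entirely, and the score is an ordinary normalized cross-correlation.

```python
import math


def cross_correlation(model_density, map_density):
    """Normalized cross-correlation of two equally long lists of density values."""
    if len(model_density) != len(map_density):
        raise ValueError("both densities must have the same number of values")
    n = len(model_density)
    mean_a = sum(model_density) / n
    mean_b = sum(map_density) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(model_density, map_density))
    var_a = sum((a - mean_a) ** 2 for a in model_density)
    var_b = sum((b - mean_b) ** 2 for b in map_density)
    if var_a == 0 or var_b == 0:
        # A flat density carries no shape information; score it as 0.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def scan_translations(template, density):
    """Score the template at every offset along a 1D density.

    Returns (best_offset, best_score).
    """
    width = len(template)
    best_offset, best_score = None, -2.0  # scores always lie within [-1, 1]
    for offset in range(len(density) - width + 1):
        score = cross_correlation(template, density[offset:offset + width])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

Calling scan_translations([0, 1, 0], [0, 0, 0, 1, 0, 0]) places the peak of the template on the peak of the density. A full 6-dimensional search repeats this kind of translational scan for every sampled orientation of the structure.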

Perform the rigid-body fitting of the KsgA structure on the cryo-EM density map.

powerfit ribosome-KsgA.map 13 KsgA.pdb -d run-KsgA -a 20 -p 2 -l

During the search, PowerFit reports its progress. The example case in this tutorial should run in about 10 minutes. If the ETA on your screen is substantially lower, your computer might be fast enough to use a finer rotational sampling interval of 10°.

While the calculation is running, open a second terminal window (or tab) and type powerfit --help to have a look at the several features and options of PowerFit and what each flag of the previous command means.

PowerFit requires three arguments: a high-resolution atomic structure of the biomolecule to be fitted (KsgA.pdb), a target cryo-EM density map to fit the structure in (ribosome-KsgA.map), and the resolution, in ångstrom, of the density map (13).

The -a (or --angle) option specifies the rotational sampling interval in degrees, i.e. how tightly the three rotational degrees of freedom will be sampled. Lower values will cause PowerFit to perform a finer search, at the expense of computational time. The default value is 10°, but it can be lowered to 5° for more sensitive searches, or raised to 20° if time is an issue or if there aren’t sufficient computational resources. For the sake of time in this tutorial, we set the sampling interval to this latter coarser value. The -d option specifies where the results will be stored while the -p option specifies the number of processors that PowerFit can use during the search, to leverage available CPU resources.
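
As a rough guide to the cost of a finer search, the sketch below estimates how the number of sampled orientations changes with the sampling interval. It rests on a stated assumption, not on PowerFit's actual rotation sets: since three rotational degrees of freedom are sampled, we assume the number of orientations grows with the inverse cube of the interval. The function name is hypothetical, and real rotation sets will not follow this scaling exactly.

```python
def relative_orientation_count(angle, reference_angle=20.0):
    """Estimate the factor by which the number of sampled orientations changes
    when moving from `reference_angle` to `angle` (both in degrees).

    Assumption: the count scales with the inverse cube of the sampling
    interval, one factor per rotational degree of freedom.
    """
    if angle <= 0 or reference_angle <= 0:
        raise ValueError("sampling intervals must be positive")
    return (reference_angle / angle) ** 3


if __name__ == "__main__":
    for angle in (20.0, 10.0, 5.0):
        factor = relative_orientation_count(angle)
        print(f"{angle:>4} degrees: roughly {factor:g}x the orientations of a 20 degree search")
```

Under this assumption, halving the interval multiplies the number of orientations, and therefore roughly the run time, by about eight.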

Finally, the -l flag applies a Laplace pre-filter on the density data, which increases the cross-correlation sensitivity by enhancing edges in the density. In this example scenario, all other options are left at their default values but feel free to explore them.

Analyzing the results

After the search, PowerFit creates a run-KsgA directory containing the following files:

fit_N.pdb: the best N fits, judged by the cross-correlation score.
solutions.out: all the non-redundant solutions found, ordered by their correlation score. The first column shows the rank, column 2 the correlation score, columns 3 and 4 the Fisher z-score and the number of standard deviations; columns 5 to 7 are the x, y and z coordinates of the center of the chain; columns 8 to 16 are the nine rotation matrix values.
lcc.mrc: a cross-correlation map showing, at each grid position, the highest cross-correlation score found during the search, thus showing the most likely location of the center of mass of the structure.
powerfit.log: a log file of the calculation, including the input parameters with date and timing information.
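
If you want to process the solutions programmatically, a line of solutions.out can be split into its columns. The sketch below is an illustration under assumptions, not an official PowerFit reader: it assumes whitespace-separated columns in the order described above, with the nine rotation matrix values starting at column 8 and listed row by row. The function and key names are hypothetical.

```python
def parse_solution_line(line):
    """Split one data line of solutions.out into named fields.

    Assumed layout (whitespace-separated): rank, correlation score,
    Fisher z-score, number of standard deviations, x, y, z, followed by
    nine rotation matrix values (assumed to be stored row by row).
    """
    fields = line.split()
    if len(fields) < 16:
        raise ValueError("expected at least 16 columns, got %d" % len(fields))
    values = [float(field) for field in fields[1:16]]
    return {
        "rank": int(fields[0]),
        "cc": values[0],
        "fisher_z": values[1],
        "rel_z": values[2],
        "center": tuple(values[3:6]),
        "rotation": [values[6:9], values[9:12], values[12:15]],
    }


if __name__ == "__main__":
    # Made-up values for illustration only, not real PowerFit output.
    example = "1 0.5 10.0 3.0 1.0 2.0 3.0 1 0 0 0 1 0 0 0 1"
    solution = parse_solution_line(example)
    print(solution["rank"], solution["center"])
```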

Open the density map, the lcc.mrc cross-correlation map, and the 10 best-ranked solutions in Chimera.

chimera ribosome-KsgA.map run-KsgA/lcc.mrc ribosome.pdb run-KsgA/fit_*.pdb

Make the density map transparent again, by adjusting the alpha channel value to 0.6. The values of the lcc.mrc slider bar correspond to the cross-correlation score found. In this way, you can selectively visualize regions of high or low cross-correlation values: i.e., pushing the slider to the right (higher cutoff) shows only regions of the grid with high cross-correlation scores.

As you can see, PowerFit found quite a few local optima, one of which stands out (provided the rotational sampling was fine enough). Further, the 10 best-ranked solutions are centered on regions corresponding to local cross-correlation maxima.

To view each fitted solution individually, in the main panel, go to Favorites → Model Panel to open the Model Panel window. The window shows each model and its associated color that Chimera has processed. To show or hide a specific model you can click the box in the S column.

Go through the 10 solutions one by one to appreciate their goodness-of-fit with the density. Do you agree with what PowerFit proposes as the best solution? In a new Chimera session, reopen the density map and the fit that you find best. Replace ? by the appropriate solution number.

chimera ribosome-KsgA.map ribosome.pdb run-KsgA/fit_?.pdb

You now have combined the ribosome structure with the rigid-body fit of KsgA calculated by PowerFit, yielding an initial model of the complex. Mutagenesis experiments performed on this complex indicate three charged residues of KsgA - R221, R222, and K223 - that are of special importance for the interaction.

In the same session of Chimera where you have your chosen fitted KsgA structure, go to Favorites → Command Line. A command line is now present below the main viewing window. In the command line of Chimera, type the following instructions to center your view on these residues and highlight their interactions:

show #2:221-223 zr<5 & #1 || #2:221-223
center #2:221-223 zr<5 & #1 || #2:221-223

Take some time to inspect the model, paying particular attention to these three residues and their spatial neighbors. Are there any clashes between the ribosome and KsgA chains? Is the mutagenesis data explained by the model, i.e. are the three charged amino acids involved in strong interactions?

Chimera also includes a tool to locally optimize the fit of a rigid structure against a given density map, which can be an additional help on top of the PowerFit calculations. Make the main display window active by clicking on it, then go to Tools → Volume data → Fit in Map. In the newly opened Fit in Map window, select the best-fitted structure of PowerFit (fit_?.pdb) as Fit model and the original density map (ribosome-KsgA.map) as the map. Press Fit to start the optimization.

Does the Chimera local fit optimization tool improve the results of PowerFit?

Optimizing against the scoring function that Chimera uses by default to estimate the quality of the fit makes our model worse: it increases the number of clashes between the ribosomal RNA and KsgA. Click Undo in the Fit in Map window to undo the optimization.

Next, we will try to optimize the fit using the cross-correlation that Chimera provides. Click Options, check the Use map simulated from atoms, resolution box and fill in 13 for the resolution. Select the correlation radio button and uncheck the Use only data above contour level from first map box. Press Fit.

Does this second strategy improve the quality of the fit? If not, undo it again.

Integrative modeling with HADDOCK

The obvious limitation of rigid-body fitting is that it cannot account for any conformational changes the structures might undergo. Further, the low resolution of this particular density map does not allow the identification of side-chain atoms. The quality of the fitted models by PowerFit is, therefore, limited.

Given the availability of both the cryo-EM density map and the mutagenesis experiments, we can integrate both in HADDOCK and benefit from its semi-flexible refinement protocols to improve the stereochemistry of our model. To use cryo-EM data, HADDOCK requires the map and also the approximate positions of each chain, as given by their centers of mass. This information is provided directly by PowerFit, in the solutions.out file, columns 5 to 7 (x, y, z coordinates):

head -n 10 run-KsgA/solutions.out
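
The same coordinates can be pulled out in a script. This is a minimal sketch with a hypothetical function name: it assumes whitespace-separated columns with x, y and z in columns 5 to 7, and it skips any line whose first column is not an integer rank, so that a header line, if present, is ignored.

```python
def top_centers(lines, count=10):
    """Return the (x, y, z) centers of the first `count` solutions.

    `lines` is any iterable of text lines, for example an open file.
    """
    centers = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or incomplete line
        try:
            int(fields[0])  # data lines are assumed to start with the rank
        except ValueError:
            continue  # header or comment line
        centers.append(tuple(float(value) for value in fields[4:7]))
        if len(centers) == count:
            break
    return centers
```

With the output directory used in this tutorial, it could be called as follows: open run-KsgA/solutions.out in a with statement and pass the file handle to top_centers(handle, 10).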

Unfortunately, running HADDOCK is out of the scope of this tutorial as it requires a significant amount of time. Therefore, we provide the best-ranked HADDOCK model, generated by combining the cryo-EM map, the PowerFit centroid positions, and the mutagenesis data, in the tutorial data folder.

Open the density map in Chimera and load the best-ranked HADDOCK model.

chimera ribosome-KsgA.map HADDOCK-ribosome.pdb HADDOCK-KsgA.pdb

Does HADDOCK improve the quality of the model, i.e. is the number of clashes reduced? Are the three residues identified by mutagenesis involved in any energetically favourable interaction?

Finally, to make the impact of HADDOCK more quantitative, we will make a distance histogram of the contacts between the ribosome and KsgA. First, combine the ribosome with your preferred fitted model.

cat ribosome.pdb run-KsgA/fit_?.pdb > ribosome-KsgA.pdb

To calculate all the contacts within a 5.0 Å cutoff distance, we make use of a standard tool (contact-chainID) that is shipped with HADDOCK.

./contact-chainID ribosome-KsgA.pdb 5.0 > ribosome-KsgA.contacts

Now we can generate the histogram and visualize it with xmgrace:

./make-contact-histogram.csh ribosome-KsgA.contacts
xmgrace ribosome-KsgA-contacts-histogram.xmgr

Are there any clashes to be found in the model? An interaction is typically considered clashing if the distance is smaller than 2.8 Å.
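
Besides reading the histogram, the clashes can be counted directly from the contacts file. The sketch below is not part of HADDOCK. It assumes, without having checked the contact-chainID output format, that each line describes one contact and that the distance is the last whitespace-separated column; lines that do not end in a number are skipped. The function name is hypothetical.

```python
def count_clashes(contact_lines, clash_distance=2.8):
    """Count the contacts whose distance is below `clash_distance` (in Å).

    Assumption: the distance is the last whitespace-separated field of each line.
    """
    clashes = 0
    for line in contact_lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            distance = float(fields[-1])
        except ValueError:
            continue  # line does not end in a number
        if distance < clash_distance:
            clashes += 1
    return clashes
```

For example, passing an open handle on ribosome-KsgA.contacts to count_clashes returns the number of contacts shorter than 2.8 Å, which can then be compared between the PowerFit and HADDOCK models.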

For the HADDOCK model we already combined the ribosome and KsgA (HADDOCK-ribosome-KsgA.pdb).

Make a distance histogram for the HADDOCK-generated model. Are there any clashes in the HADDOCK model?

The combination of cryo-EM and mutagenesis data, a physics-based force field, and a semi-flexible refinement protocol improves the quality of the resulting models. In this tutorial, we showed you how to use PowerFit to fit high-resolution structures to a cryo-EM density map and how to interpret the results. Further, we also showed how integrative modeling using HADDOCK can improve the stereochemistry of the models, in particular if done in combination with additional experimental data, such as mutagenesis.

Thank you for following this tutorial. If you have any questions or suggestions, feel free to contact us via email or by submitting an issue in the appropriate GitHub repository.