A swiss army knife for manipulating and editing PDB files.
Installation Instructions
pdb-tools are available on PyPI and can be installed through pip. This is the
recommended way as it makes updating/uninstalling rather simple:
pip install pdb-tools
Because we use semantic versioning in the development of pdb-tools, every bugfix
or new feature results in a new version of the software that is automatically published
on PyPI. As such, there is no difference between the code on GitHub and the latest version
you can install with pip. To update your installation to the latest version of the code,
run:
pip install --upgrade pdb-tools
What can I do with them?
The purpose of each tool should be obvious from its name. In any case, here
is a list of all the tools in the suite and their function. All tools share the
same command-line interface. Below are a couple of examples to get you started. If you want to
check out more examples of how to use the tools and their applications, or have any cool examples
of your own, check out the cookbook.
- Downloading a structure
pdb_fetch 1brs > 1brs.pdb # 6 chains
pdb_fetch -biounit 1brs > 1brs.pdb # 2 chains
- Renumbering a structure
pdb_reres -1 1ctf.pdb > 1ctf_renumbered.pdb
- Selecting chain(s)
pdb_selchain -A 1brs.pdb > 1brs_A.pdb
pdb_selchain -A,D 1brs.pdb > 1brs_AD.pdb
- Deleting hydrogens
pdb_delelem -H 1brs.pdb > 1brs_noH.pdb
- Selecting backbone atoms
pdb_selatom -CA,C,N,O 1brs.pdb > 1brs_bb.pdb
- Selecting chains, removing HETATM, and producing a valid PDB file
pdb_selchain -A,D 1brs.pdb | pdb_delhetatm | pdb_tidy > 1brs_AD_noHET.pdb
Note: On Windows the tools will have the .exe extension.
What can’t I do with them?
Operations that involve coordinates or numerical calculations are usually not in
the scope of pdb-tools. Use a proper library for that; it will be much faster
and more scalable. Also, although we provide mmCIF<->PDB converters, we do not support
large mmCIF files with more than 99999 atoms, or 9999 residues in a single chain.
Our tools will complain if you try using them on such a molecule.
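These limits come from the fixed-width columns of the PDB format: the atom serial number has five columns and the residue number has four. As a rough sketch of such a check (the function name fits_pdb_format and its inputs are hypothetical and not part of pdb-tools):

```python
# Hypothetical helper, not part of pdb-tools: checks whether a structure
# fits the fixed-width atom serial and residue number fields.
MAX_ATOMS = 99999               # atom serial number: 5 columns
MAX_RESIDUES_PER_CHAIN = 9999   # residue number: 4 columns


def fits_pdb_format(n_atoms, residues_per_chain):
    """Return True if the counts fit the PDB format.

    residues_per_chain maps a chain identifier to its number of residues.
    """
    if n_atoms > MAX_ATOMS:
        return False
    return all(n <= MAX_RESIDUES_PER_CHAIN for n in residues_per_chain.values())
```

For example, a structure with 100000 atoms fails the check even if every chain is short.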
About
Manipulating PDB files is often painful. Extracting a particular chain or set of
residues, renumbering residues, splitting or merging models and chains, or just
ensuring the file is conforming to the PDB specifications are examples of tasks
that can be done using any decent parsing library or graphical interface. These,
however, almost always require 1) scripting knowledge, 2) time, and 3) installing
one or more programs.
pdb-tools were designed to be a swiss army knife for the PDB format. The
philosophy of the scripts is simple: one script, one task. If you want to do two
things, pipe the scripts together. Requests for new scripts will be taken into
consideration: use the Issues button, or write them yourself and create a Pull
Request.
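To illustrate the "one script, one task" idea outside the shell, here is a minimal Python sketch where each step is a small generator over PDB lines and the steps are chained like a pipe. The function names are hypothetical stand-ins and are not the actual implementation of the tools; the sketch assumes standard PDB columns, with the chain identifier in column 22.

```python
# Illustrative only: hypothetical stand-ins for the command-line tools,
# written to show how single-purpose steps compose.
def select_chains(lines, chains):
    """Drop ATOM/HETATM records whose chain ID (column 22) is not in `chains`."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[21:22] not in chains:
            continue
        yield line


def drop_hetatm(lines):
    """Remove all HETATM records."""
    for line in lines:
        if not line.startswith("HETATM"):
            yield line


def pipeline(lines, chains):
    """Chain the two steps, as a shell pipe would."""
    return list(drop_hetatm(select_chains(lines, chains)))
```

Each step only knows about its own task, so adding a third step means wrapping one more generator around the chain.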
The Harms lab maintains a set of tools also called pdbtools, which perform a
slightly different set of functions. You can find them here.
Citation
We finally decided to write up a small publication describing the tools. If you
used them in a project that is going to be published, please cite us:
Rodrigues JPGLM, Teixeira JMC, Trellet M and Bonvin AMJJ.
pdb-tools: a swiss army knife for molecular structures.
F1000Research 2018, 7:1961 (https://doi.org/10.12688/f1000research.17456.1)
If you use a reference manager that supports BibTex, use this record:
@Article{ 10.12688/f1000research.17456.1,
AUTHOR = { Rodrigues, JPGLM and Teixeira, JMC and Trellet, M and Bonvin, AMJJ},
TITLE = {pdb-tools: a swiss army knife for molecular structures [version 1; peer review: 2 approved]},
JOURNAL = {F1000Research},
VOLUME = {7},
YEAR = {2018},
NUMBER = {1961},
DOI = {10.12688/f1000research.17456.1}
}
Requirements
pdb-tools should run on Python 2.7+ and Python 3.x. We test on Python 2.7, 3.6,
and 3.7. There are no dependencies.
Installing from Source
Download the zip archive or clone the repository with git. We recommend the git
approach since it makes updating the tools extremely simple.
# To download
git clone https://github.com/haddocking/pdb-tools
cd pdb-tools
# To update
git pull origin master
# To install
python setup.py install
Contributing
If you want to contribute to the development of pdb-tools, provide a bug fix,
or a new tool, read our CONTRIBUTING instructions here.
License
pdb-tools are open-source and licensed under the Apache License, version 2.0.
For details, see the LICENSE file.
pdb_b
Modifies the temperature factor column of a PDB file (default 10.0).
Usage:
python pdb_b.py -<bfactor> <pdb file>
Example:
python pdb_b.py -10.0 1CTF.pdb
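Column-editing tools of this kind rely on the fixed-width layout of the PDB format. As a minimal sketch of the idea (not the actual code of pdb_b), the following overwrites the temperature factor field, which sits in columns 61-66 of ATOM/HETATM records. The function name set_bfactor is hypothetical, and lines are assumed to carry no trailing newline.

```python
def set_bfactor(lines, value=10.0):
    """Yield lines with the temperature factor (columns 61-66) overwritten.

    Hypothetical sketch: only ATOM/HETATM records are modified.
    """
    field = "{:6.2f}".format(value)
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            line = line.ljust(66)  # make sure the field exists on short lines
            yield line[:60] + field + line[66:]
        else:
            yield line
```

The same slicing pattern applies to other single-column edits, such as the occupancy (columns 55-60) or the chain identifier (column 22).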
pdb_chain
Modifies the chain identifier column of a PDB file (default is an empty chain).
Usage:
python pdb_chain.py -<chain id> <pdb file>
Example:
python pdb_chain.py -C 1CTF.pdb
pdb_chainbows
Renames chain identifiers sequentially, based on TER records.
Since HETATM records are not separated by TER records and usually come together
at the end of the PDB file, this script will attempt to reassign their chain
identifiers based on the changes it made to ATOM lines. This might lead to bad
output in certain corner cases.
Usage:
python pdb_chainbows.py <pdb file>
Example:
python pdb_chainbows.py 1CTF.pdb
pdb_chainxseg
Swaps the segment identifier for the chain identifier.
Usage:
python pdb_chainxseg.py <pdb file>
Example:
python pdb_chainxseg.py 1CTF.pdb
pdb_chkensemble
Checks that all models in a multi-model PDB file have the same composition.
Composition is defined as same atoms/residues/chains.
Usage:
python pdb_chkensemble.py <pdb file>
Example:
python pdb_chkensemble.py 1CTF.pdb
pdb_delchain
Deletes all atoms matching specific chains in the PDB file.
Usage:
python pdb_delchain.py -<option> <pdb file>
Example:
python pdb_delchain.py -A 1CTF.pdb # removes chain A from PDB file
python pdb_delchain.py -A,B 1CTF.pdb # removes chains A and B from PDB file
pdb_delelem
Deletes all atoms matching the given element in the PDB file.
Elements are read from the element column.
Usage:
python pdb_delelem.py -<option> <pdb file>
Example:
python pdb_delelem.py -H 1CTF.pdb # deletes all protons
python pdb_delelem.py -N 1CTF.pdb # deletes all nitrogens
python pdb_delelem.py -H,N 1CTF.pdb # deletes all protons and nitrogens
pdb_delhetatm
Removes all HETATM records in the PDB file.
Usage:
python pdb_delhetatm.py <pdb file>
Example:
python pdb_delhetatm.py 1CTF.pdb
pdb_delinsertion
Deletes insertion codes in a PDB file.
Deleting an insertion code shifts the residue numbering of downstream
residues. Allows for picking specific residues to delete insertion codes for.
Usage:
python pdb_delinsertion.py [-<option>] <pdb file>
Example:
python pdb_delinsertion.py 1CTF.pdb # delete ALL insertion codes
python pdb_delinsertion.py -A9,B12 1CTF.pdb # deletes ins. codes for res
# 9 of chain A and 12 of chain B.
pdb_delres
Deletes a range of residues from a PDB file.
The range option has three components: start, end, and step. Start and end
are optional and if omitted the range will start at the first residue or
end at the last, respectively. The step option can only be used if both start
and end are provided. Note that the start and end values of the range are
purely numerical, while the step actually refers to every N-th residue,
regardless of their sequence number.
Usage:
python pdb_delres.py -[resid]:[resid]:[step] <pdb file>
Example:
python pdb_delres.py -1:10 1CTF.pdb # Deletes residues 1 to 10
python pdb_delres.py -1: 1CTF.pdb # Deletes residues 1 to END
python pdb_delres.py -:5 1CTF.pdb # Deletes residues from START to 5.
python pdb_delres.py -::5 1CTF.pdb # Deletes every 5th residue
python pdb_delres.py -1:10:5 1CTF.pdb # Deletes every 5th residue from 1 to 10
pdb_delresname
Removes all residues matching the given name in the PDB file.
Residue names are matched *without* taking into consideration spaces.
Usage:
python pdb_delresname.py -<option> <pdb file>
Example:
python pdb_delresname.py -ALA 1CTF.pdb # removes only Alanines
python pdb_delresname.py -ASP,GLU 1CTF.pdb # removes (-) charged residues
pdb_element
Assigns the elements in the PDB file from atom names.
Usage:
python pdb_element.py <pdb file>
Example:
python pdb_element.py 1CTF.pdb
pdb_fetch
Downloads a structure in PDB format from the RCSB website.
Allows downloading the (first) biological structure if selected.
Usage:
python pdb_fetch.py [-biounit] <pdb code>
Example:
python pdb_fetch.py 1brs # downloads unit cell, all 6 chains
python pdb_fetch.py -biounit 1brs # downloads biounit, 2 chains
pdb_fixinsert
Fixes insertion codes in a PDB file.
Works by deleting an insertion code and shifting the residue numbering of
downstream residues. Allows for picking specific residues to delete insertion
codes for.
Usage:
python pdb_fixinsert.py [-<option>] <pdb file>
Example:
python pdb_fixinsert.py 1CTF.pdb # delete ALL insertion codes
python pdb_fixinsert.py -A9,B12 1CTF.pdb # deletes ins. codes for res
# 9 of chain A and 12 of chain B.
pdb_fromcif
Rudimentarily converts an mmCIF file to the PDB format.
Will not convert if the file does not 'fit' in PDB format, e.g. too many
chains, residues, or atoms. Will convert only the coordinate section.
Usage:
python pdb_fromcif.py <cif file>
Example:
python pdb_fromcif.py 1CTF.cif
pdb_gap
Finds gaps between consecutive protein residues in the PDB.
Detects gaps both by a distance criterion and by discontinuous residue numbering.
Only applies to protein residues.
Usage:
python pdb_gap.py <pdb file>
Example:
python pdb_gap.py 1CTF.pdb
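As a rough illustration of the numbering criterion only (the distance criterion is left out, since its threshold is not stated here), the sketch below walks over the CA atoms of ATOM records and reports jumps in residue numbering within a chain. The function name find_numbering_gaps is hypothetical and this is not the tool's own code.

```python
def find_numbering_gaps(lines):
    """Return (chain, previous_resnum, next_resnum) for each numbering jump.

    Hypothetical sketch: inspects only the CA atom of each ATOM residue.
    """
    gaps = []
    prev = None  # (chain, residue number) of the previous CA atom
    for line in lines:
        if not line.startswith("ATOM") or line[12:16] != " CA ":
            continue
        chain, resnum = line[21], int(line[22:26])
        if prev is not None and prev[0] == chain and resnum - prev[1] > 1:
            gaps.append((chain, prev[1], resnum))
        prev = (chain, resnum)
    return gaps
```

A chain change is never reported as a gap, because the previous residue is only compared with the next one when both share the same chain identifier.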
pdb_head
Returns the first N coordinate (ATOM/HETATM) lines of the file.
Usage:
python pdb_head.py -<num> <pdb file>
Example:
python pdb_head.py -100 1CTF.pdb # first 100 ATOM/HETATM lines of the file
pdb_intersect
Returns a new PDB file only with atoms in common to all input PDB files.
Atoms are judged equal if their name, altloc, res. name, res. num, insertion
code and chain fields are the same. Coordinates are taken from the first input
file. Keeps matching TER/ANISOU records.
Usage:
python pdb_intersect.py <pdb file> <pdb file>
Example:
python pdb_intersect.py 1XYZ.pdb 1ABC.pdb
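The comparison described above can be sketched with Python sets: the identity fields listed are contiguous in an ATOM/HETATM record (columns 13-27), so that slice serves as a key. This is a minimal illustration, not the tool's own code. The function names are hypothetical, inputs are assumed to be lists of lines, and TER/ANISOU handling is omitted.

```python
def is_atom(line):
    return line.startswith(("ATOM", "HETATM"))


def atom_key(line):
    """Atom name, altloc, residue name, chain, residue number, insertion code."""
    return line[12:27]


def intersect(first, *others):
    """Return the ATOM/HETATM lines of `first` whose key occurs in every input.

    Hypothetical sketch: coordinates are taken from the first list.
    """
    common = {atom_key(line) for line in first if is_atom(line)}
    for other in others:
        common &= {atom_key(line) for line in other if is_atom(line)}
    return [line for line in first if is_atom(line) and atom_key(line) in common]
```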
pdb_keepcoord
Removes all non-coordinate records from the file.
Keeps only MODEL, ENDMDL, END, ATOM, HETATM, CONECT.
Usage:
python pdb_keepcoord.py <pdb file>
Example:
python pdb_keepcoord.py 1CTF.pdb
pdb_merge
Merges several PDB files into one.
The contents are not sorted and no lines are deleted (e.g. END, TER
statements) so we recommend piping the results through `pdb_tidy.py`.
Usage:
python pdb_merge.py <pdb file> <pdb file>
Example:
python pdb_merge.py 1ABC.pdb 1XYZ.pdb
pdb_mkensemble
Merges several PDB files into one multi-model (ensemble) file.
Strips all HEADER information and adds REMARK statements with the provenance
of each conformer.
Usage:
python pdb_mkensemble.py <pdb file> <pdb file>
Example:
python pdb_mkensemble.py 1ABC.pdb 1XYZ.pdb
pdb_occ
Modifies the occupancy column of a PDB file (default 1.0).
Usage:
python pdb_occ.py -<occupancy> <pdb file>
Example:
python pdb_occ.py -1.0 1CTF.pdb
pdb_reatom
Renumbers atom serials in the PDB file starting from a given value (default 1).
Usage:
python pdb_reatom.py -<number> <pdb file>
Example:
python pdb_reatom.py -10 1CTF.pdb # renumbers from 10
python pdb_reatom.py --1 1CTF.pdb # renumbers from -1
pdb_reres
Renumbers the residues of the PDB file starting from a given number (default 1).
Usage:
python pdb_reres.py -<number> <pdb file>
Example:
python pdb_reres.py -10 1CTF.pdb # renumbers from 10
python pdb_reres.py --1 1CTF.pdb # renumbers from -1
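A minimal sketch of sequential renumbering, assuming standard PDB columns (residue number in columns 23-26) and that a new residue starts whenever the residue name, chain, number, or insertion code changes. The function name renumber_residues is hypothetical. The real tool may treat details such as insertion codes or TER records differently.

```python
def renumber_residues(lines, start=1):
    """Yield lines with residues renumbered sequentially from `start`.

    Hypothetical sketch: only ATOM/HETATM records are rewritten and
    insertion codes are left untouched.
    """
    previous = None
    number = start - 1
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            residue = line[17:27]  # resName, chain, resSeq, iCode
            if residue != previous:
                number += 1
                previous = residue
            yield line[:22] + "{:>4d}".format(number) + line[26:]
        else:
            yield line
```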
pdb_rplchain
Performs in-place replacement of a chain identifier by another.
Usage:
python pdb_rplchain.py -<from>:<to> <pdb file>
Example:
python pdb_rplchain.py -A:B 1CTF.pdb # Replaces chain A with chain B
pdb_rplresname
Performs in-place replacement of a residue name by another.
Affects all residues with that name.
Usage:
python pdb_rplresname.py -<from>:<to> <pdb file>
Example:
python pdb_rplresname.py -HIP:HIS 1CTF.pdb # changes all HIP residues to HIS
pdb_seg
Modifies the segment identifier column of a PDB file (default is an empty segment).
Usage:
python pdb_seg.py -<segment id> <pdb file>
Example:
python pdb_seg.py -C 1CTF.pdb
pdb_segxchain
Swaps the chain identifier for the segment identifier.
If the segment identifier is longer than one character, the script will
truncate it. Does not ensure unique chain IDs.
Usage:
python pdb_segxchain.py <pdb file>
Example:
python pdb_segxchain.py 1CTF.pdb
pdb_selaltloc
Selects altloc labels for the entire PDB file.
By default, selects the label with the highest occupancy value for each atom,
but the user can define a specific altloc label to select.
Selecting by highest occupancy removes all altloc labels for all atoms. If the
user provides an option (e.g. -A), only atoms with conformers with an altloc A
are processed by the script. If you select -A and an atom has conformers with
altlocs B and C, both B and C will be kept in the output.
Usage:
python pdb_selaltloc.py [-<option>] <pdb file>
Example:
python pdb_selaltloc.py 1CTF.pdb # picks locations with highest occupancy
python pdb_selaltloc.py -A 1CTF.pdb # picks alternate locations labelled 'A'
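The default behaviour (keep the conformer with the highest occupancy and drop its altloc label) can be sketched as follows. This is an illustration under assumptions, not the tool's own code: the function name is hypothetical, ties keep the first conformer seen, non-coordinate records are dropped, and the output order relies on dictionaries preserving insertion order (Python 3.7+).

```python
def select_highest_occupancy(lines):
    """Return one ATOM/HETATM line per atom: the conformer with the highest
    occupancy, with its altloc column (17) blanked.

    Hypothetical sketch; atoms appear in order of first occurrence.
    """
    best = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = line[12:16] + line[17:27]   # atom identity without the altloc
        occupancy = float(line[54:60])
        if key not in best or occupancy > best[key][0]:
            best[key] = (occupancy, line)
    return [line[:16] + " " + line[17:] for _, line in best.values()]
```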
pdb_selatom
Selects all atoms matching the given name in the PDB file.
Atom names are matched *without* taking into consideration spaces, so ' CA '
(alpha carbon) and 'CA ' (calcium) will both be kept if -CA is passed.
Usage:
python pdb_selatom.py -<option> <pdb file>
Example:
python pdb_selatom.py -CA 1CTF.pdb # keeps only alpha-carbon atoms
python pdb_selatom.py -CA,C,N,O 1CTF.pdb # keeps only backbone atoms
pdb_selchain
Extracts one or more chains from a PDB file.
Usage:
python pdb_selchain.py -<chain id> <pdb file>
Example:
python pdb_selchain.py -C 1CTF.pdb # selects chain C
python pdb_selchain.py -A,C 1CTF.pdb # selects chains A and C
pdb_selelem
Selects all atoms that match the given element(s) in the PDB file.
Elements are read from the element column.
Usage:
python pdb_selelem.py -<option> <pdb file>
Example:
python pdb_selelem.py -H 1CTF.pdb # selects all protons
python pdb_selelem.py -N 1CTF.pdb # selects all nitrogens
python pdb_selelem.py -H,N 1CTF.pdb # selects all protons and nitrogens
pdb_selhetatm
Selects all HETATM records in the PDB file.
Usage:
python pdb_selhetatm.py <pdb file>
Example:
python pdb_selhetatm.py 1CTF.pdb
pdb_selmodel
Extracts one or more models from a PDB file.
If the PDB file has no MODEL records, returns the entire file.
Usage:
python pdb_selmodel.py -<model id> <pdb file>
Example:
python pdb_selmodel.py -1 1GGR.pdb # selects model 1
python pdb_selmodel.py -1,3 1GGR.pdb # selects models 1 and 3
pdb_selres
Selects residues by their index, piecewise or in a range.
The range option has three components: start, end, and step. Start and end
are optional and if omitted the range will start at the first residue or
end at the last, respectively.
Usage:
python pdb_selres.py -[resid]:[resid]:[step] <pdb file>
Example:
python pdb_selres.py -1,2,4,6 1CTF.pdb # Extracts residues 1, 2, 4 and 6
python pdb_selres.py -1:10 1CTF.pdb # Extracts residues 1 to 10
python pdb_selres.py -1:10,20:30 1CTF.pdb # Extracts residues 1 to 10 and 20 to 30
python pdb_selres.py -1: 1CTF.pdb # Extracts residues 1 to END
python pdb_selres.py -:5 1CTF.pdb # Extracts residues from START to 5.
python pdb_selres.py -::5 1CTF.pdb # Extracts every 5th residue
python pdb_selres.py -1:10:5 1CTF.pdb # Extracts every 5th residue from 1 to 10
pdb_selresname
Selects all residues matching the given name in the PDB file.
Residue names are matched *without* taking into consideration spaces.
Usage:
python pdb_selresname.py -<option> <pdb file>
Example:
python pdb_selresname.py -ALA 1CTF.pdb # keeps only Alanines
python pdb_selresname.py -ASP,GLU 1CTF.pdb # keeps (-) charged residues
pdb_selseg
Selects all atoms matching the given segment identifier.
Usage:
python pdb_selseg.py -<segment id> <pdb file>
Example:
python pdb_selseg.py -C 1CTF.pdb # selects segment C
python pdb_selseg.py -C,D 1CTF.pdb # selects segments C and D
pdb_shiftres
Shifts the residue numbers in the PDB file by a constant value.
Usage:
python pdb_shiftres.py -<number> <pdb file>
Example:
python pdb_shiftres.py -10 1CTF.pdb # adds 10 to the original numbering
python pdb_shiftres.py --5 1CTF.pdb # subtracts 5 from the original numbering
pdb_sort
Sorts the ATOM/HETATM/ANISOU/CONECT records in a PDB file.
Atoms are always sorted by their serial number, meaning the original ordering
of the atoms within each residue is not changed. Alternate locations are sorted
by default.
Residues are sorted according to their residue sequence number and then by their
insertion code (if any).
Chains are sorted by their chain identifier.
Finally, the file is sorted by all keys, and the records are placed in the
following order:
- ATOM/ANISOU, intercalated if the latter exist
- HETATM
- CONECT, sorted by the serial number of the central (first) atom
MASTER, TER, END statements are removed. Headers (HEADER, REMARK, etc) are kept
and placed first. Does NOT support multi-model files. Use pdb_splitmodel, then
pdb_sort on each model, and then pdb_mkensemble.
Usage:
python pdb_sort.py -<option> <pdb file>
Example:
python pdb_sort.py 1CTF.pdb # sorts by chain and residues
python pdb_sort.py -C 1CTF.pdb # sorts by chain (A, B, C ...) only
python pdb_sort.py -R 1CTF.pdb # sorts by residue number/icode only
pdb_splitchain
Splits a PDB file into several, each containing one chain.
Usage:
python pdb_splitchain.py <pdb file>
Example:
python pdb_splitchain.py 1CTF.pdb
pdb_splitmodel
Splits a PDB file into several, each containing one MODEL.
Usage:
python pdb_splitmodel.py <pdb file>
Example:
python pdb_splitmodel.py 1CTF.pdb
pdb_splitseg
Splits a PDB file into several, each containing one segment.
Usage:
python pdb_splitseg.py <pdb file>
Example:
python pdb_splitseg.py 1CTF.pdb
pdb_tidy
Modifies the file to adhere (as much as possible) to the format specifications.
Expects a sorted file - REMARK/ATOM/HETATM/END - so use pdb_sort in case you are
not sure.
This includes:
- Adding TER statements after chain breaks/changes
- Truncating/Padding all lines to 80 characters
- Adding an END statement at the end of the file
Will remove all original TER/END statements from the file.
Usage:
python pdb_tidy.py [-strict] <pdb file>
Example:
python pdb_tidy.py 1CTF.pdb
python pdb_tidy.py -strict 1CTF.pdb # does not add TER on chain breaks
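Two of the steps listed above (fixing the line width and ending the file with END) are simple enough to sketch in a few lines. The function name tidy is hypothetical and this is not the tool's own code. In particular, the sketch does not add new TER records, which requires detecting chain breaks.

```python
def tidy(lines):
    """Drop TER/END records, force every line to 80 columns, append END.

    Hypothetical sketch: new TER records are NOT inserted here.
    """
    out = []
    for line in lines:
        record = line[:6].strip()
        if record in ("TER", "END"):  # ENDMDL is a different record and is kept
            continue
        out.append(line[:80].ljust(80))
    out.append("END".ljust(80))
    return out
```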
pdb_tocif
Rudimentarily converts the PDB file to mmCIF format.
Will convert only the coordinate section.
Usage:
python pdb_tocif.py <pdb file>
Example:
python pdb_tocif.py 1CTF.pdb
pdb_tofasta
Extracts the residue sequence in a PDB file to FASTA format.
Canonical amino acids and nucleotides are represented by their
one-letter code while all others are represented by 'X'.
The -multi option splits the different chains into different records in the
FASTA file.
Usage:
python pdb_tofasta.py [-multi] <pdb file>
Example:
python pdb_tofasta.py 1CTF.pdb
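The one-letter conversion can be sketched with a lookup table that falls back to 'X'. The example below is a simplified illustration, not the tool's own code: the names THREE_TO_ONE and to_sequence are hypothetical, only the 20 canonical amino acids are mapped (nucleotides are left out), and only ATOM records are read.

```python
# Hypothetical lookup table: the 20 canonical amino acids only.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_sequence(lines):
    """Return the one-letter sequence of the ATOM records, 'X' for unknowns."""
    sequence = []
    previous = None
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        residue = line[17:27]  # resName, chain, resSeq, iCode
        if residue != previous:
            previous = residue
            sequence.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
    return "".join(sequence)
```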
pdb_uniqname
Renames atoms sequentially (C1, C2, O1, ...) for each HETATM residue.
Relies on an element column being present (see pdb_element).
Usage:
python pdb_uniqname.py <pdb file>
Example:
python pdb_uniqname.py 1CTF.pdb
pdb_validate
Validates the PDB file ATOM/HETATM lines according to the format specifications.
Does not catch all the errors though... people are creative!
Usage:
python pdb_validate.py <pdb file>
Example:
python pdb_validate.py 1CTF.pdb
pdb_wc
Summarizes the contents of a PDB file, like the wc command in UNIX.
By default, this tool produces a general summary, but you can use several
options to produce focused but more detailed summaries:
[m] - no. of models.
[c] - no. of chains (plus per-model if multi-model file).
[r] - no. of residues (plus per-model if multi-model file).
[a] - no. of atoms (plus per-model if multi-model file).
[h] - no. of HETATM (plus per-model if multi-model file).
[o] - presence of disordered atoms (altloc).
[i] - presence of insertion codes.
Usage:
python pdb_wc.py [-<option>] <pdb file>
Example:
python pdb_wc.py 1CTF.pdb
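A general summary of this kind boils down to counting records and collecting unique identifiers. The sketch below is a simplified illustration and not the tool's own code: the function name summarize is hypothetical, "atoms" counts ATOM records only (an assumption), and chains and residues are counted once across all models rather than per model.

```python
def summarize(lines):
    """Return counts of models, chains, residues, ATOM and HETATM records.

    Hypothetical sketch; unique chains/residues are pooled over all models.
    """
    chains, residues = set(), set()
    counts = {"models": 0, "atoms": 0, "hetatm": 0}
    for line in lines:
        if line.startswith("MODEL"):
            counts["models"] += 1
        elif line.startswith(("ATOM", "HETATM")):
            counts["atoms" if line.startswith("ATOM") else "hetatm"] += 1
            chains.add(line[21])
            residues.add(line[17:27])  # resName, chain, resSeq, iCode
    counts["chains"] = len(chains)
    counts["residues"] = len(residues)
    return counts
```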