PDB-Tools
PDB-tools is a set of Python scripts dedicated to manipulating PDB files: selecting and renaming chains and segids, renumbering residues, and much more (Rodrigues et al., F1000Research (2018)). The source code can be obtained from its GitHub repository. Alternatively, you can make use of our new PDB-tools webserver.
In addition, it is one of the dependencies installed by default in your haddock3 environment. Therefore, once the environment is activated, you can access all of its functionality from the command line.
Here is a list of all the command line interfaces installed together with haddock3:
- pdb_b: Modifies the temperature factor column of a PDB file (default 10.0).
- pdb_head: Returns the first N coordinate (ATOM/HETATM) lines of the file.
- pdb_rplchain: Performs in-place replacement of a chain identifier by another.
- pdb_selhetatm: Selects all HETATM records in the PDB file.
- pdb_splitmodel: Splits a PDB file into several, each containing one MODEL.
- pdb_chain: Modifies the chain identifier column of a PDB file (default is an empty chain).
- pdb_delres: Deletes a range of residues from a PDB file.
- pdb_intersect: Returns a new PDB file only with atoms in common to all input PDB files.
- pdb_rplresname: Performs in-place replacement of a residue name by another.
- pdb_selmodel: Extracts one or more models from a PDB file.
- pdb_splitseg: Splits a PDB file into several, each containing one segment.
- pdb_chainbows: Renames chain identifiers sequentially, based on TER records.
- pdb_delresname: Removes all residues matching the given name in the PDB file.
- pdb_keepcoord: Removes all non-coordinate records from the file.
- pdb_seg: Modifies the segment identifier column of a PDB file (default is an empty segment).
- pdb_selres: Selects residues by their index, piecewise or in a range.
- pdb_tidy: Modifies the file to adhere (as much as possible) to the format specifications.
- pdb_chainxseg: Swaps the segment identifier for the chain identifier.
- pdb_element: Assigns the elements in the PDB file from atom names.
- pdb_merge: Merges several PDB files into one.
- pdb_segxchain: Replaces the chain identifier with the segment identifier.
- pdb_selresname: Selects all residues matching the given name in the PDB file.
- pdb_tocif: Rudimentarily converts the PDB file to mmCIF format.
- pdb_chkensemble: Checks all models in a multi-model PDB file have the same composition.
- pdb_fetch: Downloads a structure in PDB format from the RCSB website.
- pdb_mkensemble: Merges several PDB files into one multi-model (ensemble) file.
- pdb_selaltloc: Selects altloc labels for the entire PDB file.
- pdb_selseg: Selects all atoms matching the given segment identifier.
- pdb_tofasta: Extracts the residue sequence in a PDB file to FASTA format.
- pdb_delchain: Deletes all atoms matching specific chains in the PDB file.
- pdb_fixinsert: Fixes insertion codes in a PDB file.
- pdb_occ: Modifies the occupancy column of a PDB file (default 1.0).
- pdb_selatom: Selects all atoms matching the given name in the PDB file.
- pdb_shiftres: Shifts the residue numbers in the PDB file by a constant value.
- pdb_uniqname: Renames atoms sequentially (C1, C2, O1, ...) for each HETATM residue.
- pdb_delelem: Deletes all atoms matching the given element in the PDB file.
- pdb_fromcif: Rudimentarily converts a mmCIF file to the PDB format.
- pdb_reatom: Renumbers atom serials in the PDB file starting from a given value (default 1).
- pdb_selchain: Extracts one or more chains from a PDB file.
- pdb_sort: Sorts the ATOM/HETATM/ANISOU/CONECT records in a PDB file.
- pdb_validate: Validates the PDB file ATOM/HETATM lines according to the format specifications.
- pdb_delhetatm: Removes all HETATM records in the PDB file.
- pdb_gap: Finds gaps between consecutive protein residues in the PDB.
- pdb_reres: Renumbers the residues of the PDB file starting from a given number (default 1).
- pdb_selelem: Selects all atoms that match the given element(s) in the PDB file.
- pdb_splitchain: Splits a PDB file into several, each containing one chain.
- pdb_wc: Summarizes the contents of a PDB file, like the wc command in UNIX.
pdb_b
Modifies the temperature factor column of a PDB file (default 10.0).
Usage:
python pdb_b.py -<bfactor> <pdb file>
Example:
python pdb_b.py -10.0 1CTF.pdb
pdb_head
Returns the first N coordinate (ATOM/HETATM) lines of the file.
Usage:
python pdb_head.py -<num> <pdb file>
Example:
python pdb_head.py -100 1CTF.pdb # first 100 ATOM/HETATM lines of the file
pdb_rplchain
Performs in-place replacement of a chain identifier by another.
Usage:
python pdb_rplchain.py -<from>:<to> <pdb file>
Example:
python pdb_rplchain.py -A:B 1CTF.pdb # Replaces chain A for chain B
pdb_selhetatm
Selects all HETATM records in the PDB file.
Usage:
python pdb_selhetatm.py <pdb file>
Example:
python pdb_selhetatm.py 1CTF.pdb
pdb_splitmodel
Splits a PDB file into several, each containing one MODEL.
Usage:
python pdb_splitmodel.py <pdb file>
Example:
python pdb_splitmodel.py 1CTF.pdb
pdb_chain
Modifies the chain identifier column of a PDB file (default is an empty chain).
Usage:
python pdb_chain.py -<chain id> <pdb file>
Example:
python pdb_chain.py -C 1CTF.pdb
pdb_delres
Deletes a range of residues from a PDB file.
The range option has three components: start, end, and step. Start and end are optional; if omitted, the range starts at the first residue or ends at the last, respectively. The step option can only be used if both start and end are provided. Note that the start and end values of the range are purely numerical, while the step refers to every N-th residue, regardless of its sequence number.
Usage:
python pdb_delres.py -[resid]:[resid]:[step] <pdb file>
Example:
python pdb_delres.py -1:10 1CTF.pdb # Deletes residues 1 to 10
python pdb_delres.py -1: 1CTF.pdb # Deletes residues 1 to END
python pdb_delres.py -:5 1CTF.pdb # Deletes residues from START to 5.
python pdb_delres.py -::5 1CTF.pdb # Deletes every 5th residue
python pdb_delres.py -1:10:5 1CTF.pdb # Deletes every 5th residue from 1 to 10
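The range semantics described above can be sketched in a few lines of Python. This is only an illustration, not the script's actual code: parse_range and delete_residues are hypothetical names, and the sketch assumes that the step is counted over residues in file order, starting from the first residue in the file.

```python
def parse_range(option):
    """Split 'start:end:step' into ints; empty parts become None (step defaults to 1)."""
    parts = option.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected start:end[:step], got {option!r}")
    parts += [""] * (3 - len(parts))
    start, end, step = (int(p) if p else None for p in parts)
    return start, end, step or 1


def delete_residues(residue_numbers, option):
    """Return the residue numbers that survive the deletion (hypothetical helper)."""
    start, end, step = parse_range(option)
    kept = []
    for position, resnum in enumerate(residue_numbers):
        in_range = (start is None or resnum >= start) and (end is None or resnum <= end)
        # Assumption: the step counts residues by position in the file, not by number.
        if in_range and position % step == 0:
            continue
        kept.append(resnum)
    return kept


residues = list(range(1, 13))               # residues numbered 1..12
print(delete_residues(residues, "1:10"))    # only residues above 10 remain
print(delete_residues(residues, "::5"))     # every 5th residue is removed
```

Because the step works on positions, a file whose residues are numbered 10, 20, 30, 40 loses its first and third residues with "::2", whatever their numbers are.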
pdb_intersect
Returns a new PDB file only with atoms in common to all input PDB files.
Atoms are judged equal if their name, altloc, residue name, residue number, insertion code and chain fields are the same. Coordinates are taken from the first input file. Keeps matching TER/ANISOU records.
Usage:
python pdb_intersect.py <pdb file> <pdb file>
Example:
python pdb_intersect.py 1XYZ.pdb 1ABC.pdb
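The matching rule can be illustrated with a short Python sketch. It is not the script's implementation: atom_line, atom_key and intersect are hypothetical names, the field positions follow the fixed-width PDB column layout, and TER/ANISOU handling is left out.

```python
def atom_line(serial, name, resname, chain, resnum, x=0.0):
    """Hypothetical helper: build a fixed-width ATOM record for the demo."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}    "
            f"{x:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


def atom_key(line):
    """Identity of an atom: name, altloc, residue name, chain, residue number, icode."""
    return (line[12:16], line[16], line[17:20], line[21], line[22:26], line[26])


def intersect(first, *others):
    """Keep the atoms of `first` whose key is present in every other structure."""
    coord = ("ATOM", "HETATM")
    common = {atom_key(l) for l in first if l.startswith(coord)}
    for other in others:
        common &= {atom_key(l) for l in other if l.startswith(coord)}
    return [l for l in first if l.startswith(coord) and atom_key(l) in common]


file_a = [atom_line(1, " N  ", "MET", "A", 1, x=1.0),
          atom_line(2, " CA ", "MET", "A", 1, x=2.0)]
file_b = [atom_line(7, " CA ", "MET", "A", 1, x=9.0),
          atom_line(8, " C  ", "MET", "A", 1, x=9.5)]
shared = intersect(file_a, file_b)   # only the CA atom, with coordinates from file_a
```

Note that the atom serial number is not part of the key, so the CA atoms match even though they are numbered differently in the two inputs.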
pdb_rplresname
Performs in-place replacement of a residue name by another.
Affects all residues with that name.
Usage:
python pdb_rplresname.py -<from>:<to> <pdb file>
Example:
python pdb_rplresname.py -HIP:HIS 1CTF.pdb # changes all HIP residues to HIS
pdb_selmodel
Extracts one or more models from a PDB file.
If the PDB file has no MODEL records, returns the entire file.
Usage:
python pdb_selmodel.py -<model id> <pdb file>
Example:
python pdb_selmodel.py -1 1GGR.pdb # selects model 1
python pdb_selmodel.py -1,3 1GGR.pdb # selects models 1 and 3
pdb_splitseg
Splits a PDB file into several, each containing one segment.
Usage:
python pdb_splitseg.py <pdb file>
Example:
python pdb_splitseg.py 1CTF.pdb
pdb_chainbows
Renames chain identifiers sequentially, based on TER records.
Since HETATM records are not separated by TER records and usually come together at the end of the PDB file, this script will attempt to reassign their chain identifiers based on the changes it made to ATOM lines. This might lead to bad output in certain corner cases.
Usage:
python pdb_chainbows.py <pdb file>
Example:
python pdb_chainbows.py 1CTF.pdb
pdb_delresname
Removes all residues matching the given name in the PDB file.
Residue names are matched without taking spaces into consideration.
Usage:
python pdb_delresname.py -<option> <pdb file>
Example:
python pdb_delresname.py -ALA 1CTF.pdb # removes only Alanines
python pdb_delresname.py -ASP,GLU 1CTF.pdb # removes (-) charged residues
pdb_keepcoord
Removes all non-coordinate records from the file.
Keeps only MODEL, ENDMDL, END, ATOM, HETATM, CONECT.
Usage:
python pdb_keepcoord.py <pdb file>
Example:
python pdb_keepcoord.py 1CTF.pdb
pdb_seg
Modifies the segment identifier column of a PDB file (default is an empty segment).
Usage:
python pdb_seg.py -<segment id> <pdb file>
Example:
python pdb_seg.py -C 1CTF.pdb
pdb_selres
Selects residues by their index, piecewise or in a range.
The range option has three components: start, end, and step. Start and end are optional; if omitted, the range starts at the first residue or ends at the last, respectively.
Usage:
python pdb_selres.py -[resid]:[resid]:[step] <pdb file>
Example:
python pdb_selres.py -1,2,4,6 1CTF.pdb # Extracts residues 1, 2, 4 and 6
python pdb_selres.py -1:10 1CTF.pdb # Extracts residues 1 to 10
python pdb_selres.py -1:10,20:30 1CTF.pdb # Extracts residues 1 to 10 and 20 to 30
python pdb_selres.py -1: 1CTF.pdb # Extracts residues 1 to END
python pdb_selres.py -:5 1CTF.pdb # Extracts residues from START to 5.
python pdb_selres.py -::5 1CTF.pdb # Extracts every 5th residue
python pdb_selres.py -1:10:5 1CTF.pdb # Extracts every 5th residue from 1 to 10
pdb_tidy
Modifies the file to adhere (as much as possible) to the format specifications.
Expects a sorted file - REMARK/ATOM/HETATM/END - so use pdb_sort in case you are not sure.
This includes:
- Adding TER statements after chain breaks/changes
- Truncating/padding all lines to 80 characters
- Adding an END statement at the end of the file
Will remove all original TER/END statements from the file.
Usage:
python pdb_tidy.py [-strict] <pdb file>
Example:
python pdb_tidy.py 1CTF.pdb
python pdb_tidy.py -strict 1CTF.pdb # does not add TER on chain breaks
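A much simplified Python sketch of these clean-up steps is given here for illustration. It is not the script's code: tidy is a hypothetical function, it only reacts to changes of chain identifier (not to chain breaks), and it writes bare TER records where the real format carries additional fields.

```python
def tidy(lines):
    """Drop old TER/END, add TER on chain change, END at the end, pad to 80 columns."""
    coord = ("ATOM", "HETATM")
    out, prev_chain = [], None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("TER") or line.strip() == "END":
            continue                      # original TER/END statements are discarded
        if line.startswith(coord):
            chain = line[21:22]
            if prev_chain is not None and chain != prev_chain:
                out.append("TER")         # assumption: a bare TER record
            prev_chain = chain
        out.append(line)
    if prev_chain is not None:
        out.append("TER")                 # assumption: the last chain is closed too
    out.append("END")
    # Width 80 with precision 80 pads short lines and truncates long ones.
    return [f"{line:<80.80}" for line in out]


demo = [
    f"ATOM  {1:>5}  N   MET A{1:>4}",
    f"ATOM  {2:>5}  N   GLY B{1:>4}",
    "END",
]
tidied = tidy(demo)
```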
pdb_chainxseg
Swaps the segment identifier for the chain identifier.
Usage:
python pdb_chainxseg.py <pdb file>
Example:
python pdb_chainxseg.py 1CTF.pdb
pdb_element
Assigns the elements in the PDB file from atom names.
Usage:
python pdb_element.py <pdb file>
Example:
python pdb_element.py 1CTF.pdb
pdb_merge
Merges several PDB files into one.
The contents are not sorted and no lines are deleted (e.g. END, TER statements) so we recommend piping the results through pdb_tidy.py.
Usage:
python pdb_merge.py <pdb file> <pdb file>
Example:
python pdb_merge.py 1ABC.pdb 1XYZ.pdb
pdb_segxchain
Replaces the chain identifier with the segment identifier.
If the segment identifier is longer than one character, the script will truncate it. Does not ensure unique chain IDs.
Usage:
python pdb_segxchain.py <pdb file>
Example:
python pdb_segxchain.py 1CTF.pdb
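The truncation can be pictured with the following Python sketch. It is an illustration only: seg_to_chain is a hypothetical name, and the sketch assumes the segment identifier sits in columns 73-76 and the chain identifier in column 22, as in the fixed-width PDB layout.

```python
def seg_to_chain(line):
    """Copy the first character of the segment identifier into the chain column."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    line = line.ljust(76)                       # make sure the segid columns exist
    chain = line[72:76].strip()[:1] or " "      # truncate to a single character
    return line[:21] + chain + line[22:]


# Demo record with chain 'A' and segment identifier 'PROA' (coordinates left blank).
record = f"ATOM  {1:>5}  N   MET A{1:>4}".ljust(72) + "PROA"
converted = seg_to_chain(record)
```

Two segments named PROA and PROB would both end up as chain P, which is why the output is not guaranteed to have unique chain IDs.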
pdb_selresname
Selects all residues matching the given name in the PDB file.
Residue names are matched without taking spaces into consideration.
Usage:
python pdb_selresname.py -<option> <pdb file>
Example:
python pdb_selresname.py -ALA 1CTF.pdb # keeps only Alanines
python pdb_selresname.py -ASP,GLU 1CTF.pdb # keeps (-) charged residues
pdb_tocif
Rudimentarily converts the PDB file to mmCIF format.
Will convert only the coordinate section.
Usage:
python pdb_tocif.py <pdb file>
Example:
python pdb_tocif.py 1CTF.pdb
pdb_chkensemble
Checks all models in a multi-model PDB file have the same composition.
Composition is defined as same atoms/residues/chains.
Usage:
python pdb_chkensemble.py <pdb file>
Example:
python pdb_chkensemble.py 1CTF.pdb
pdb_fetch
Downloads a structure in PDB format from the RCSB website.
Allows downloading the (first) biological structure if selected.
Usage:
python pdb_fetch.py [-biounit] <pdb code>
Example:
python pdb_fetch.py 1brs # downloads unit cell, all 6 chains
python pdb_fetch.py -biounit 1brs # downloads biounit, 2 chains
pdb_mkensemble
Merges several PDB files into one multi-model (ensemble) file.
Strips all HEADER information and adds REMARK statements with the provenance of each conformer.
Usage:
python pdb_mkensemble.py <pdb file> <pdb file>
Example:
python pdb_mkensemble.py 1ABC.pdb 1XYZ.pdb
pdb_selaltloc
Selects altloc labels for the entire PDB file.
By default, selects the label with the highest occupancy value for each atom, but the user can define a specific altloc label to select.
Selecting by highest occupancy removes all altloc labels for all atoms. If the user provides an option (e.g. -A), only atoms with conformers with an altloc A are processed by the script. If you select -A and an atom has conformers with altlocs B and C, both B and C will be kept in the output.
Usage:
python pdb_selaltloc.py [-<option>] <pdb file>
Example:
python pdb_selaltloc.py 1CTF.pdb # picks locations with highest occupancy
python pdb_selaltloc.py -A 1CTF.pdb # picks alternate locations labelled 'A'
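The default behaviour (highest occupancy) can be sketched in Python as follows. This is an illustration under assumptions, not the script's code: atom_line and select_highest_occupancy are hypothetical names, and selection by a specific label is not covered.

```python
def atom_line(serial, name, altloc, occ):
    """Hypothetical helper: a fixed-width ATOM record of residue SER A 5 for the demo."""
    return (f"ATOM  {serial:>5} {name:<4}{altloc}SER A{5:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occ:6.2f}{0.0:6.2f}")


def select_highest_occupancy(lines):
    """Keep one conformer per atom (the highest occupancy) and blank its altloc column."""
    best, order = {}, []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (line[12:16], line[17:20], line[21], line[22:27])  # identity without altloc
        occupancy = float(line[54:60])
        if key not in best:
            order.append(key)
            best[key] = (occupancy, line)
        elif occupancy > best[key][0]:
            best[key] = (occupancy, line)
    # Column 17 (index 16) holds the altloc label; it is removed from the output.
    return [best[key][1][:16] + " " + best[key][1][17:] for key in order]


atoms = [
    atom_line(1, " N  ", " ", 1.00),
    atom_line(2, " CA ", "A", 0.40),
    atom_line(3, " CA ", "B", 0.60),
]
selected = select_highest_occupancy(atoms)   # N, plus the 'B' conformer of CA
```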
pdb_selseg
Selects all atoms matching the given segment identifier.
Usage:
python pdb_selseg.py -<segment id> <pdb file>
Example:
python pdb_selseg.py -C 1CTF.pdb # selects segment C
python pdb_selseg.py -C,D 1CTF.pdb # selects segments C and D
pdb_tofasta
Extracts the residue sequence in a PDB file to FASTA format.
Canonical amino acids and nucleotides are represented by their one-letter code while all others are represented by 'X'.
The -multi option splits the different chains into different records in the FASTA file.
Usage:
python pdb_tofasta.py [-multi] <pdb file>
Example:
python pdb_tofasta.py 1CTF.pdb
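The residue-name translation can be sketched in Python as follows. The sketch is illustrative: THREE_TO_ONE, to_sequence and to_fasta are hypothetical names, and the header text and the 60-character line width are assumptions rather than the script's documented output.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    # Nucleotides (RNA and DNA residue names)
    "A": "A", "C": "C", "G": "G", "U": "U",
    "DA": "A", "DC": "C", "DG": "G", "DT": "T",
}


def to_sequence(residue_names):
    """Translate residue names to one-letter codes; anything unknown becomes 'X'."""
    return "".join(THREE_TO_ONE.get(name.strip().upper(), "X") for name in residue_names)


def to_fasta(header, sequence, width=60):
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{header}"] + rows)


# MSE (selenomethionine) is not a canonical residue, so it is written as 'X'.
print(to_fasta("demo", to_sequence(["MET", "ALA", "MSE", "GLY"])))
```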
pdb_delchain
Deletes all atoms matching specific chains in the PDB file.
Usage:
python pdb_delchain.py -<option> <pdb file>
Example:
python pdb_delchain.py -A 1CTF.pdb # removes chain A from PDB file
python pdb_delchain.py -A,B 1CTF.pdb # removes chains A and B from PDB file
pdb_fixinsert
Fixes insertion codes in a PDB file.
Works by deleting an insertion code and shifting the residue numbering of downstream residues. Allows for picking specific residues to delete insertion codes for.
Usage:
python pdb_fixinsert.py [-<option>] <pdb file>
Example:
python pdb_fixinsert.py 1CTF.pdb # delete ALL insertion codes
python pdb_fixinsert.py -A9,B12 1CTF.pdb # deletes ins. codes for res
# 9 of chain A and 12 of chain B.
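The renumbering idea can be sketched in Python for the case where all insertion codes of a single chain are removed. This is a minimal illustration, not the script's code; fix_insertions is a hypothetical name, and residues are represented as (number, insertion code) pairs rather than PDB lines.

```python
def fix_insertions(residues):
    """Return new residue numbers for a list of (resnum, icode) pairs in file order."""
    offset, renumbered = 0, []
    for resnum, icode in residues:
        if icode.strip():
            offset += 1          # an inserted residue pushes everything after it by one
        renumbered.append(resnum + offset)
    return renumbered


# Residues 9, 9A, 9B, 10 become 9, 10, 11, 12.
print(fix_insertions([(9, " "), (9, "A"), (9, "B"), (10, " ")]))
```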
pdb_occ
Modifies the occupancy column of a PDB file (default 1.0).
Usage:
python pdb_occ.py -<occupancy> <pdb file>
Example:
python pdb_occ.py -1.0 1CTF.pdb
pdb_selatom
Selects all atoms matching the given name in the PDB file.
Atom names are matched without taking into consideration spaces, so ' CA ' (alpha carbon) and 'CA ' (calcium) will both be kept if -CA is passed.
Usage:
python pdb_selatom.py -<option> <pdb file>
Example:
python pdb_selatom.py -CA 1CTF.pdb # keeps only alpha-carbon atoms
python pdb_selatom.py -CA,C,N,O 1CTF.pdb # keeps only backbone atoms
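The effect of ignoring spaces can be seen in a small Python sketch. It is an illustration only: atom_line and select_atoms are hypothetical names, and the atom name is assumed to occupy columns 13-16 of each record.

```python
def atom_line(serial, name, resname, record="ATOM"):
    """Hypothetical helper: a minimal fixed-width coordinate record for the demo."""
    return f"{record:<6}{serial:>5} {name:<4} {resname:>3} A{serial:>4}"


def select_atoms(lines, names):
    """Keep coordinate records whose atom name, stripped of spaces, is in `names`."""
    wanted = {name.strip() for name in names}
    return [l for l in lines
            if l.startswith(("ATOM", "HETATM")) and l[12:16].strip() in wanted]


atoms = [
    atom_line(1, " N  ", "MET"),
    atom_line(2, " CA ", "MET"),                     # alpha carbon
    atom_line(3, "CA  ", " CA", record="HETATM"),    # calcium ion
]
kept = select_atoms(atoms, ["CA"])   # both the alpha carbon and the calcium are kept
```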
pdb_shiftres
Shifts the residue numbers in the PDB file by a constant value.
Usage:
python pdb_shiftres.py -<number> <pdb file>
Example:
python pdb_shiftres.py -10 1CTF.pdb # adds 10 to the original numbering
python pdb_shiftres.py --5 1CTF.pdb # subtracts 5 from the original numbering
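The double dash in the second example is the option marker followed by a negative number. The following Python sketch illustrates this; parse_shift and shift_residues are hypothetical names, the list of record types is an assumption, and no check is made for numbers that no longer fit in the four-column residue number field.

```python
def parse_shift(option):
    """'-10' gives 10 and '--5' gives -5: the first dash only marks the option."""
    if not option.startswith("-"):
        raise ValueError(f"expected -<number>, got {option!r}")
    return int(option[1:])


def shift_residues(lines, shift):
    """Add `shift` to the residue number (columns 23-26) of every matching record."""
    records = ("ATOM", "HETATM", "ANISOU", "TER")    # assumption: records that are renumbered
    out = []
    for line in lines:
        if line.startswith(records) and line[22:26].strip():
            line = line[:22] + f"{int(line[22:26]) + shift:>4}" + line[26:]
        out.append(line)
    return out


demo = [f"ATOM  {1:>5}  N   MET A{7:>4}", "TER"]
shifted = shift_residues(demo, parse_shift("--5"))   # residue 7 becomes residue 2
```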
pdb_uniqname
Renames atoms sequentially (C1, C2, O1, ...) for each HETATM residue.
Relies on an element column being present (see pdb_element).
Usage:
python pdb_uniqname.py <pdb file>
Example:
python pdb_uniqname.py 1CTF.pdb
pdb_delelem
Deletes all atoms matching the given element in the PDB file.
Elements are read from the element column.
Usage:
python pdb_delelem.py -<option> <pdb file>
Example:
python pdb_delelem.py -H 1CTF.pdb # deletes all protons
python pdb_delelem.py -N 1CTF.pdb # deletes all nitrogens
python pdb_delelem.py -H,N 1CTF.pdb # deletes all protons and nitrogens
pdb_fromcif
Rudimentarily converts a mmCIF file to the PDB format.
Will not convert if the file does not 'fit' in PDB format, e.g. too many chains, residues, or atoms. Will convert only the coordinate section.
Usage:
python pdb_fromcif.py <cif file>
Example:
python pdb_fromcif.py 1CTF.cif
pdb_reatom
Renumbers atom serials in the PDB file starting from a given value (default 1).
Usage:
python pdb_reatom.py -<number> <pdb file>
Example:
python pdb_reatom.py -10 1CTF.pdb # renumbers from 10
python pdb_reatom.py --1 1CTF.pdb # renumbers from -1
pdb_selchain
Extracts one or more chains from a PDB file.
Usage:
python pdb_selchain.py -<chain id> <pdb file>
Example:
python pdb_selchain.py -C 1CTF.pdb # selects chain C
python pdb_selchain.py -A,C 1CTF.pdb # selects chains A and C
pdb_sort
Sorts the ATOM/HETATM/ANISOU/CONECT records in a PDB file.
Atoms are always sorted by their serial number, meaning the original ordering of the atoms within each residue is not changed. Alternate locations are sorted by default.
Residues are sorted according to their residue sequence number and then by their insertion code (if any).
Chains are sorted by their chain identifier.
Finally, the file is sorted by all keys, and the records are placed in the following order:
- ATOM/ANISOU, intercalated if the latter exist
- HETATM
- CONECT, sorted by the serial number of the central (first) atom
MASTER, TER, END statements are removed. Headers (HEADER, REMARK, etc) are kept and placed first. Does NOT support multi-model files. Use pdb_splitmodel, then pdb_sort on each model, and then pdb_mkensemble.
Usage:
python pdb_sort.py -<option> <pdb file>
Example:
python pdb_sort.py 1CTF.pdb # sorts by chain and residues
python pdb_sort.py -C 1CTF.pdb # sorts by chain (A, B, C ...) only
python pdb_sort.py -R 1CTF.pdb # sorts by residue number/icode only
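The default ordering of coordinate records can be approximated with a single sort key in Python. This sketch is illustrative only: atom_line, sort_key and sort_atoms are hypothetical names, and ANISOU, CONECT and header records are left out.

```python
def atom_line(serial, chain, resnum, icode=" ", record="ATOM"):
    """Hypothetical helper: a minimal fixed-width coordinate record for the demo."""
    return f"{record:<6}{serial:>5}  CA  GLY {chain}{resnum:>4}{icode}"


def sort_key(line):
    """ATOM before HETATM, then chain, residue number, insertion code, atom serial."""
    record_rank = 0 if line.startswith("ATOM") else 1
    return (record_rank, line[21], int(line[22:26]), line[26], int(line[6:11]))


def sort_atoms(lines):
    """Return the ATOM/HETATM records of `lines` in sorted order."""
    coord = [l for l in lines if l.startswith(("ATOM", "HETATM"))]
    return sorted(coord, key=sort_key)


records = [
    atom_line(4, "B", 1),
    atom_line(3, "A", 2),
    atom_line(2, "A", 1, icode="A"),
    atom_line(1, "A", 1),
    atom_line(5, "A", 1, record="HETATM"),
]
ordered = sort_atoms(records)   # A1, A1A, A2, B1, then the HETATM record
```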
pdb_validate
Validates the PDB file ATOM/HETATM lines according to the format specifications.
Does not catch all the errors though... people are creative!
Usage:
python pdb_validate.py <pdb file>
Example:
python pdb_validate.py 1CTF.pdb
pdb_delhetatm
Removes all HETATM records in the PDB file.
Usage:
python pdb_delhetatm.py <pdb file>
Example:
python pdb_delhetatm.py 1CTF.pdb
pdb_gap
Finds gaps between consecutive protein residues in the PDB.
Detects gaps either by a distance criterion or by discontinuous residue numbering. Only applies to protein residues.
Usage:
python pdb_gap.py <pdb file>
Example:
python pdb_gap.py 1CTF.pdb
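The two criteria can be sketched in Python as follows. This is an illustration under assumptions, not the script's code: find_gaps is a hypothetical name, the distance is measured between consecutive alpha carbons, and the 4.0 Angstrom cutoff is an assumed value that is not stated on this page.

```python
import math


def find_gaps(ca_atoms, max_ca_distance=4.0):
    """Report gaps between consecutive residues given (resnum, (x, y, z)) CA entries."""
    gaps = []
    for (num_a, xyz_a), (num_b, xyz_b) in zip(ca_atoms, ca_atoms[1:]):
        if math.dist(xyz_a, xyz_b) > max_ca_distance:
            gaps.append((num_a, num_b, "distance"))      # residues too far apart
        elif num_b != num_a + 1:
            gaps.append((num_a, num_b, "numbering"))     # numbering is discontinuous
    return gaps


chain = [
    (1, (0.0, 0.0, 0.0)),
    (2, (3.8, 0.0, 0.0)),
    (4, (7.6, 0.0, 0.0)),    # residue 3 is missing from the numbering
    (5, (20.0, 0.0, 0.0)),   # far away from residue 4
]
print(find_gaps(chain))
```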
pdb_reres
Renumbers the residues of the PDB file starting from a given number (default 1).
Usage:
python pdb_reres.py -<number> <pdb file>
Example:
python pdb_reres.py -10 1CTF.pdb # renumbers from 10
python pdb_reres.py --1 1CTF.pdb # renumbers from -1
pdb_selelem
Selects all atoms that match the given element(s) in the PDB file.
Elements are read from the element column.
Usage:
python pdb_selelem.py -<option> <pdb file>
Example:
python pdb_selelem.py -H 1CTF.pdb # selects all protons
python pdb_selelem.py -N 1CTF.pdb # selects all nitrogens
python pdb_selelem.py -H,N 1CTF.pdb # selects all protons and nitrogens
pdb_splitchain
Splits a PDB file into several, each containing one chain.
Usage:
python pdb_splitchain.py <pdb file>
Example:
python pdb_splitchain.py 1CTF.pdb
pdb_wc
Summarizes the contents of a PDB file, like the wc command in UNIX.
By default, this tool produces a general summary, but you can use several options to produce focused but more detailed summaries:
- [m] - no. of models.
- [c] - no. of chains (plus per-model if multi-model file).
- [r] - no. of residues (plus per-model if multi-model file).
- [a] - no. of atoms (plus per-model if multi-model file).
- [h] - no. of HETATM (plus per-model if multi-model file).
- [o] - presence of disordered atoms (altloc).
- [i] - presence of insertion codes.
Usage:
python pdb_wc.py [-<option>] <pdb file>
Example:
python pdb_wc.py 1CTF.pdb
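The kind of counting behind such a summary can be sketched in Python. This is only an illustration: atom_line and summarize are hypothetical names, the dictionary keys mirror the option letters listed above, and the sketch assumes that the atom count includes HETATM records and that a file without MODEL records counts as one model.

```python
def atom_line(serial, chain, resnum, altloc=" ", icode=" ", record="ATOM"):
    """Hypothetical helper: a minimal fixed-width coordinate record for the demo."""
    return f"{record:<6}{serial:>5}  CA {altloc}GLY {chain}{resnum:>4}{icode}"


def summarize(lines):
    """Count models, chains, residues, atoms and HETATM; flag altlocs and icodes."""
    chains, residues = set(), set()
    summary = {"m": 0, "a": 0, "h": 0, "o": False, "i": False}
    for line in lines:
        if line.startswith("MODEL"):
            summary["m"] += 1
        elif line.startswith(("ATOM", "HETATM")):
            line = line.ljust(27)
            chains.add(line[21])
            residues.add((line[21], line[22:26], line[26]))
            summary["a"] += 1
            summary["h"] += line.startswith("HETATM")
            summary["o"] |= line[16] != " "      # altloc label present
            summary["i"] |= line[26] != " "      # insertion code present
    summary["c"], summary["r"] = len(chains), len(residues)
    summary["m"] = max(summary["m"], 1)          # assumption: no MODEL record means one model
    return summary


demo = [
    atom_line(1, "A", 1),
    atom_line(2, "A", 2, altloc="A"),
    atom_line(3, "B", 1, record="HETATM"),
]
print(summarize(demo))
```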