PDB preprocessing
Process input PDB files to ensure compatibility with HADDOCK3.
This module checks and modifies PDB files for compatibility with HADDOCK3. There are three types of checks/modifications:
1) Performed on each PDB line by line, in the same fashion as pdb-tools. In fact, this step mostly uses the pdb-tools package.
2) Performed on each PDB as a whole.
3) Performed on all PDBs together.
Main functions
read_additional_residues()
Corrections performed on 1)
The following actions are performed sequentially over all PDBs:
- from pdb-tools: pdb_keepcoord
- from pdb-tools: pdb_tidy with strict=True
- from pdb-tools: pdb_element
- from pdb-tools: pdb_selaltloc
- from pdb-tools: pdb_occ with occupancy=1.00
- replace MSE with MET
- replace HSD with HIS
- replace HSE with HIS
- replace HID with HIS
- replace HIE with HIS
- add_charges_to_ions, see add_charges_to_ions()
- convert ATOM to HETATM for those atoms that should be HETATM. Considers the additional residues provided by the user. See convert_ATOM_to_HETATM().
- convert HETATM to ATOM for those atoms that should be ATOM. See convert_HETATM_to_ATOM().
- from pdb-tools: pdb_fixinsert, with option_list=[]
- remove unsupported HETATM. Considers residues provided by the user.
- remove unsupported ATOM. Considers residues provided by the user.
- from pdb-tools: pdb_reatom, starting from 1
- from pdb-tools: pdb_tidy with strict=True
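Sequential, line-by-line steps like the ones listed above can be chained as generators, each consuming the lines yielded by the previous step. The following is a minimal sketch of that idea and not the module's actual implementation; `drop_non_coordinate_lines`, `rename_residue`, and `run_pipeline` are hypothetical names introduced only for illustration.

```python
from functools import partial, reduce
from typing import Callable, Iterable, Iterator

Step = Callable[[Iterable[str]], Iterator[str]]


def drop_non_coordinate_lines(lines: Iterable[str]) -> Iterator[str]:
    """Keep only coordinate-related records (hypothetical step)."""
    keep = ("ATOM", "HETATM", "MODEL", "ENDMDL", "TER", "END")
    for line in lines:
        if line.startswith(keep):
            yield line


def rename_residue(lines: Iterable[str], resin: str, resout: str) -> Iterator[str]:
    """Rename residue `resin` to `resout` in columns 18-20 (hypothetical step)."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == resin:
            yield line[:17] + resout + line[20:]
        else:
            yield line


def run_pipeline(lines: Iterable[str], steps: list[Step]) -> list[str]:
    """Feed the output of each step into the next one, in order."""
    return list(reduce(lambda acc, step: step(acc), steps, lines))


pdb = [
    "REMARK example",
    "ATOM      1  CA  HSD A   1",
    "ATOM      2  CA  ALA A   2",
]
steps = [
    drop_non_coordinate_lines,
    partial(rename_residue, resin="HSD", resout="HIS"),
]
result = run_pipeline(pdb, steps)
```

Because every step is lazy, no intermediate list is built until `run_pipeline` collects the final lines.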
Corrections performed on 2)
The following actions are performed sequentially for each PDB:
- models_should_have_the_same_labels()
- solve_no_chainID_no_segID()
- homogenize_chains()
Read the documentation of the above functions for details on what they do.
Corrections performed on 3)
The following actions are performed on all PDBs together:
- correct_equal_chain_segids()
Read the documentation of the above function for details on what it does.
When it happens
The PDB processing step is performed by default when reading the input molecules and copying them to the data/ folder inside the run directory. When PDBs are processed, a copy of the original input PDBs is also stored in the data/ folder.
To deactivate this initial PDB processing, set skip_preprocess = true in the general parameters of the configuration file.
Additional information
If you are a developer and want to read more about the history of this preprocessing module, visit:
- exception haddock.gear.preprocessing.ModelsDifferError
Bases: HaddockError
MODELS of the PDB differ in atom labels.
- haddock.gear.preprocessing.add_charges_to_ions(fhandler: Iterable[str]) Generator[str, None, None]
Add charges to ions according to HADDOCK3 specifications.
Check if the charge is correctly defined in the residue name. If so, yield the line with the correct residue name and the charge at the end.
Check if the charge is correctly defined in the atom name.
Create the charge from the element. This might need manual editing in case the atom has an unconventional charge.
- Parameters:
fhandler (file handler, list, or list-like) – Lines of the PDB file. This function consumes lines in a for loop; mind this if you pass a generator.
- Yields:
line (str) – Line by line: modified ion lines and any other line.
- haddock.gear.preprocessing.convert_ATOM_to_HETATM(fhandler: Iterable[str], *, record: str = 'ATOM', other_record: str = 'HETATM', residues: Container[str] = {'A2G', 'ABE', 'ACD', 'ACN', 'ACT', 'ADY', 'AG', 'AG1', 'AL', 'AL3', 'AMN', 'AR', 'AS', 'AU', 'AU1', 'AU3', 'BDP', 'BDY', 'BEN', 'BGC', 'BMA', 'BR', 'BR1', 'BUT', 'CA', 'CA2', 'CD', 'CD2', 'CHE', 'CL', 'CL1', 'CO', 'CO2', 'CO3', 'COH', 'COM', 'CR', 'CR2', 'CR3', 'CS', 'CS1', 'CU', 'CU1', 'CU2', 'CYA', 'DFO', 'DME', 'DMS', 'DOD', 'EOL', 'ETA', 'ETH', 'F', 'F1', 'FCA', 'FCB', 'FE', 'FE2', 'FE3', 'FUC', 'FUL', 'GAL', 'GLA', 'GLC', 'GXL', 'HEB', 'HEC', 'HG', 'HG1', 'HG2', 'HO', 'HO3', 'HOH', 'I', 'I1', 'IMI', 'IR', 'IR3', 'K', 'K1', 'KR', 'LI1', 'MAG', 'MAN', 'MER', 'MG', 'MG2', 'MIY', 'MMA', 'MN', 'MN2', 'MN3', 'MO', 'MO3', 'NA', 'NA1', 'NAG', 'NDG', 'NGA', 'NI', 'NI2', 'O2', 'OS', 'OS4', 'PB', 'PB2', 'PHN', 'PO4', 'PT', 'PT2', 'RAM', 'SIA', 'SIB', 'SO4', 'SR', 'SR2', 'THS', 'TIP', 'TIP3', 'U3', 'U4', 'URE', 'V', 'V2', 'V3', 'WAT', 'WO4', 'XE', 'XYP', 'XYS', 'YB', 'YB2', 'YB3', 'ZN', 'ZN2'}) Generator[str, None, None]
Convert ATOM to HETATM for HADDOCK3 supported HETATM.
- haddock.gear.preprocessing.convert_HETATM_to_ATOM(fhandler: Iterable[str], *, record: str = 'HETATM', other_record: str = 'ATOM ', residues: Container[str] = {'A', 'ACE', 'ALA', 'ALY', 'ARG', 'ASH', 'ASN', 'ASP', 'C', 'CFE', 'CHX', 'CSP', 'CTN', 'CYC', 'CYF', 'CYM', 'CYS', 'DA', 'DC', 'DDZ', 'DG', 'DJ', 'DT', 'DUM', 'G', 'GLH', 'GLN', 'GLU', 'GLY', 'HIS', 'HLY', 'HY3', 'HYP', 'ILE', 'LEU', 'LYS', 'M3L', 'MET', 'MLY', 'MLZ', 'MSE', 'NEP', 'NME', 'PHE', 'PNS', 'PRO', 'PTR', 'QSR', 'SEC', 'SEP', 'SER', 'SHA', 'THR', 'TOP', 'TRP', 'TYP', 'TYR', 'TYS', 'U', 'VAL'}) Generator[str, None, None]
Convert HETATM to ATOM for HADDOCK3 supported ATOM.
- haddock.gear.preprocessing.convert_record(fhandler: Iterable[str], record: str, other_record: str, residues: Container[str]) Generator[str, None, None]
Convert one record to another for specified residues.
For example, replace ATOM by HETATM for specific residues.
- Parameters:
fhandler (list-like) – Contains lines of file.
record (str) – The PDB RECORD to match; for example, ATOM or HETATM.
other_record (str) – The PDB RECORD to replace with; for example, ATOM or HETATM.
residues (list, tuple, or set) – List of residues for which to replace the record.
- haddock.gear.preprocessing.correct_equal_chain_segids(structures: list[list[str]]) list[list[str]]
Correct for repeated chainID in the input PDB files.
Repeated chain IDs are replaced by an upper case character ([A-Z]) in order.
- Parameters:
structures (list of lists of str) – The input data.
- Returns:
list of lists of str – The new structures.
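One way to read the behaviour described above is sketched below. It assumes each structure carries a single chain ID (read from its first coordinate line) and that a repeated ID is replaced by the first upper-case letter not yet in use; both are assumptions made for this example. `assign_unique_chains` is a hypothetical name, and for brevity the sketch leaves the segID columns untouched.

```python
from string import ascii_uppercase


def is_coord(line: str) -> bool:
    """True for ATOM/HETATM lines long enough to hold a chain ID (column 22)."""
    return line.startswith(("ATOM", "HETATM")) and len(line) > 21


def assign_unique_chains(structures: list[list[str]]) -> list[list[str]]:
    """Give each structure a chain ID not used by an earlier structure."""
    used: set[str] = set()
    result = []
    for lines in structures:
        # Assumption: one chain per structure, taken from the first coordinate line.
        chain = next((ln[21] for ln in lines if is_coord(ln)), " ")
        if chain in used:
            # Pick the first upper-case letter that is still free.
            # Raises StopIteration if all 26 letters are already taken.
            chain = next(c for c in ascii_uppercase if c not in used)
            lines = [
                ln[:21] + chain + ln[22:] if is_coord(ln) else ln
                for ln in lines
            ]
        used.add(chain)
        result.append(lines)
    return result


structures = [
    ["ATOM      1  CA  ALA A   1"],
    ["ATOM      1  CA  GLY A   1"],
    ["ATOM      1  CA  SER B   1"],
]
fixed = assign_unique_chains(structures)
```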
- haddock.gear.preprocessing.homogenize_chains(lines: list[str]) list[str]
Homogenize chainIDs within the same PDB.
If there are multiple chain identifiers in the PDB file, make them all equal to the first one.
ChainIDs are copied to segIDs afterwards.
- Returns:
list – The modified lines.
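The effect of this homogenization can be sketched as follows, using the standard PDB columns (chainID in column 22, segID in columns 73-76). `homogenize_chains_sketch` is a hypothetical function for illustration only; writing the segID left-justified is an assumption of this example.

```python
from typing import Iterable


def homogenize_chains_sketch(lines: Iterable[str]) -> list[str]:
    """Set every chainID to the first one found and mirror it into the segID."""
    lines = list(lines)
    coord = ("ATOM", "HETATM")
    chains = [ln[21] for ln in lines if ln.startswith(coord) and len(ln) > 21]
    if not chains:
        return lines
    first = chains[0]
    out = []
    for ln in lines:
        if ln.startswith(coord):
            ln = ln.ljust(76)  # pad so the segID columns (73-76) exist
            # Assumption: the segID is written left-justified in its 4 columns.
            ln = ln[:21] + first + ln[22:72] + first.ljust(4) + ln[76:]
        out.append(ln)
    return out


homogenized = homogenize_chains_sketch([
    "ATOM      1  CA  ALA A   1",
    "HETATM    2 ZN    ZN B   2",
    "END",
])
```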
- haddock.gear.preprocessing.models_should_have_the_same_labels(lines: Iterable[str]) Iterable[str]
Confirm models have the same labels.
In an ensemble of structures, where the PDB file has multiple MODELS, all models should have the same labels; hence the same number and type of atoms.
- Parameters:
lines (list of strings) – List containing the lines of the PDB file. Must NOT be a generator.
- Returns:
list – The original lines in case no errors are found.
- Raises:
ModelsDifferError – In case MODELS differ. Reports on which models differ.
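A check of this kind could look like the sketch below. What exactly counts as a "label" is an assumption here: the sketch compares columns 13-27 of each coordinate line (atom name, alternate location, residue name, chain, residue number, insertion code). `ModelsDiffer` and `check_models_share_labels` are hypothetical names and do not reproduce the HADDOCK3 code.

```python
from typing import Iterable


class ModelsDiffer(Exception):
    """Hypothetical stand-in for ModelsDifferError."""


def check_models_share_labels(lines: Iterable[str]) -> list[str]:
    """Return the lines unchanged, or raise if any MODEL differs from the first."""
    lines = list(lines)
    models: dict[str, list[str]] = {}
    current = None
    for line in lines:
        if line.startswith("MODEL"):
            current = line[5:].strip()
            models[current] = []
        elif line.startswith("ENDMDL"):
            current = None
        elif current is not None and line.startswith(("ATOM", "HETATM")):
            # Assumed label: everything between the serial and the coordinates.
            models[current].append(line[12:27])
    labelled = list(models.items())
    for name, atoms in labelled[1:]:
        if atoms != labelled[0][1]:
            raise ModelsDiffer(f"MODEL {name} differs from MODEL {labelled[0][0]}")
    return lines


ensemble = [
    "MODEL        1",
    "ATOM      1  CA  ALA A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  CA  ALA A   1",
    "ENDMDL",
]
checked = check_models_share_labels(ensemble)
```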
- haddock.gear.preprocessing.process_pdbs(*inputdata: Iterable[str] | str | Path, dry: bool = False, user_supported_residues: Iterable[str] | None = None) list[list[str]]
Process PDB file contents for compatibility with HADDOCK3.
- Parameters:
inputdata (list of (str, path, list of str [lines], file handler)) – A flat list in which each item can be one of:
file objects
paths to files
strings representing paths
lists or tuples of lines
The above types can be mixed in the input list.
Files are read into lists of lines. Line separators are stripped.
Do not provide nested lists, that is, lists containing paths inside lists.
dry (bool) – Perform a dry run. That is, do not change anything and only report.
user_supported_residues (list, tuple, or set) – The new residues that are allowed.
- Returns:
list of (list of str) – The corrected (processed) PDB content in the same order as inputdata.
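The mixed input types described above (paths, path strings, file objects, and lists of lines) can be normalized to lists of stripped lines along these lines. This is a sketch under the stated assumptions only; `read_lines` and `read_all` are hypothetical helpers and not part of the HADDOCK3 API.

```python
import io
from pathlib import Path
from typing import IO, Iterable, Union

PDBInput = Union[str, Path, IO[str], Iterable[str]]


def read_lines(item: PDBInput) -> list[str]:
    """Return the lines of `item` with line separators stripped."""
    if isinstance(item, (str, Path)):
        # A bare string is treated as a path, never as file content.
        return Path(item).read_text().splitlines()
    # File objects and lists/tuples of lines are both iterables of strings.
    return [line.rstrip("\r\n") for line in item]


def read_all(*inputdata: PDBInput) -> list[list[str]]:
    """Normalize every input, preserving the input order."""
    return [read_lines(item) for item in inputdata]


from_memory = read_all(["ATOM      1\n", "END\n"], io.StringIO("TER\nEND\n"))
```

Treating every string as a path is why nested lists of paths cannot be supported by this scheme: a string inside a list is read as a line, not as a file name.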
- haddock.gear.preprocessing.remove_unsupported_atom(lines: Iterable[str], *, haddock3_defined: set[str] | None = {'A', 'ACE', 'ALA', 'ALY', 'ARG', 'ASH', 'ASN', 'ASP', 'C', 'CFE', 'CHX', 'CSP', 'CTN', 'CYC', 'CYF', 'CYM', 'CYS', 'DA', 'DC', 'DDZ', 'DG', 'DJ', 'DT', 'DUM', 'G', 'GLH', 'GLN', 'GLU', 'GLY', 'HIS', 'HLY', 'HY3', 'HYP', 'ILE', 'LEU', 'LYS', 'M3L', 'MET', 'MLY', 'MLZ', 'MSE', 'NEP', 'NME', 'PHE', 'PNS', 'PRO', 'PTR', 'QSR', 'SEC', 'SEP', 'SER', 'SHA', 'THR', 'TOP', 'TRP', 'TYP', 'TYR', 'TYS', 'U', 'VAL'}, user_defined: set[str] | None = None, line_startswith: str | tuple[str, ...] = 'ATOM') Generator[str, None, None]
Remove unsupported molecules in ATOM lines.
Uses remove_unsupported_molecules() by populating its haddock3_defined and line_startswith parameters.
- haddock.gear.preprocessing.remove_unsupported_hetatm(lines: Iterable[str], *, haddock3_defined: set[str] | None = {'A2G', 'ABE', 'ACD', 'ACN', 'ACT', 'ADY', 'AG', 'AG1', 'AL', 'AL3', 'AMN', 'AR', 'AS', 'AU', 'AU1', 'AU3', 'BDP', 'BDY', 'BEN', 'BGC', 'BMA', 'BR', 'BR1', 'BUT', 'CA', 'CA2', 'CD', 'CD2', 'CHE', 'CL', 'CL1', 'CO', 'CO2', 'CO3', 'COH', 'COM', 'CR', 'CR2', 'CR3', 'CS', 'CS1', 'CU', 'CU1', 'CU2', 'CYA', 'DFO', 'DME', 'DMS', 'DOD', 'EOL', 'ETA', 'ETH', 'F', 'F1', 'FCA', 'FCB', 'FE', 'FE2', 'FE3', 'FUC', 'FUL', 'GAL', 'GLA', 'GLC', 'GXL', 'HEB', 'HEC', 'HG', 'HG1', 'HG2', 'HO', 'HO3', 'HOH', 'I', 'I1', 'IMI', 'IR', 'IR3', 'K', 'K1', 'KR', 'LI1', 'MAG', 'MAN', 'MER', 'MG', 'MG2', 'MIY', 'MMA', 'MN', 'MN2', 'MN3', 'MO', 'MO3', 'NA', 'NA1', 'NAG', 'NDG', 'NGA', 'NI', 'NI2', 'O2', 'OS', 'OS4', 'PB', 'PB2', 'PHN', 'PO4', 'PT', 'PT2', 'RAM', 'SIA', 'SIB', 'SO4', 'SR', 'SR2', 'THS', 'TIP', 'TIP3', 'U3', 'U4', 'URE', 'V', 'V2', 'V3', 'WAT', 'WO4', 'XE', 'XYP', 'XYS', 'YB', 'YB2', 'YB3', 'ZN', 'ZN2'}, user_defined: set[str] | None = None, line_startswith: str | tuple[str, ...] = 'HETATM') Generator[str, None, None]
Remove unsupported molecules in HETATM lines.
Uses remove_unsupported_molecules() by populating its haddock3_defined and line_startswith parameters.
- haddock.gear.preprocessing.remove_unsupported_molecules(lines: Iterable[str], haddock3_defined: set[str] | None = None, user_defined: set[str] | None = None, line_startswith: str | tuple[str, ...] = ('ATOM', 'HETATM')) Generator[str, None, None]
Remove HADDOCK3 unsupported molecules.
This function is abstract and you need to provide the set of residues supported by HADDOCK3. See parameters.
Residues not provided in haddock3_defined and user_defined are removed from the PDB lines. Other lines are yielded unmodified.
- Parameters:
lines (list or list-like) – Lines of the PDB file. This function consumes lines in a for loop; mind this if you pass a generator.
haddock3_defined (set) – Set of residues supported by HADDOCK3. Defaults to None.
user_defined (set) – An additional set of allowed residues given by the user. Defaults to None.
line_startswith (tuple) – The lines to consider. Defaults to ("ATOM", "HETATM").
- Yields:
line (str) – Line by line. Lines for unsupported residues are not yielded.
See also
Other functions use this function to create context.
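The filtering described above amounts to a generator that drops coordinate lines whose residue name is in neither allowed set. The sketch below assumes the residue name sits in columns 18-20; `keep_supported` is a hypothetical name, and treating `None` as an empty set is a choice made for this example.

```python
from typing import Iterable, Iterator, Optional


def keep_supported(
    lines: Iterable[str],
    supported: Optional[set[str]] = None,
    user_defined: Optional[set[str]] = None,
    line_startswith: tuple[str, ...] = ("ATOM", "HETATM"),
) -> Iterator[str]:
    """Yield all lines except coordinate lines of residues that are not allowed."""
    allowed = (supported or set()) | (user_defined or set())
    for line in lines:
        if line.startswith(line_startswith) and line[17:20].strip() not in allowed:
            continue  # unsupported residue: drop the line
        yield line


lines = [
    "REMARK x",
    "ATOM      1  CA  ALA A   1",
    "HETATM    2  C1  XYZ A   2",
]
only_ala = list(keep_supported(lines, {"ALA"}))
with_user = list(keep_supported(lines, {"ALA"}, user_defined={"XYZ"}))
```

Specialized variants for ATOM-only or HETATM-only filtering can then be built with `functools.partial` by fixing `supported` and `line_startswith`.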
- haddock.gear.preprocessing.replace_HETATM_to_ATOM(fhandler: Iterable[str], res: str) Generator[str, None, None]
Replace record HETATM with ATOM for res.
Do not alter other lines.
- Parameters:
fhandler (file handler or list of lines) – List-like of file lines. Consumed in a for loop.
res (str) – Residue name to match for the substitution.
- Yields:
str – Yield line by line.
- haddock.gear.preprocessing.replace_HID_to_HIS(fhandler: Iterable[str], *, resin: str = 'HID', resout: str = 'HIS') Generator[str, None, None]
Replace HID with HIS.
- haddock.gear.preprocessing.replace_HIE_to_HIS(fhandler: Iterable[str], *, resin: str = 'HIE', resout: str = 'HIS') Generator[str, None, None]
Replace HIE with HIS.
- haddock.gear.preprocessing.replace_HSD_to_HIS(fhandler: Iterable[str], *, resin: str = 'HSD', resout: str = 'HIS') Generator[str, None, None]
Replace HSD with HIS.
- haddock.gear.preprocessing.replace_HSE_to_HIS(fhandler: Iterable[str], *, resin: str = 'HSE', resout: str = 'HIS') Generator[str, None, None]
Replace HSE with HIS.
- haddock.gear.preprocessing.replace_MSE_to_MET(fhandler: Iterable[str], *, resin: str = 'MSE', resout: str = 'MET') Generator[str, None, None]
Replace MSE with MET.
- haddock.gear.preprocessing.replace_residue(fhandler: Iterable[str], resin: str, resout: str) Generator[str, None, None]
Replace one residue by another and change HETATM to ATOM if needed.
Do not alter other lines.
- Parameters:
fhandler (file handler or list of lines) – List-like of file lines. Consumed in a for loop.
resin (str) – Residue name to match for the substitution.
resout (str) – Name of the new residue. Renames resin to resout.
- Yields:
str – Yield line by line.
See also
pdb_rplresname from pdb-tools
- haddock.gear.preprocessing.solve_no_chainID_no_segID(lines: Iterable[str]) Iterable[str]
Solve inconsistencies with chainID and segID.
If the segID is nonexistent, copy the chainID over the segID, and vice versa. If neither is present, add an upper case character starting from A. This character is not repeated until the alphabet is exhausted. If chainIDs and segIDs differ, copy chainIDs over segIDs.
- Parameters:
lines (list of str) – The lines of a PDB file.
- Returns:
list – With new lines. Or the input ones if no modification was made.
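The rules above can be sketched per line as follows, using the standard PDB columns (chainID in column 22, segID in columns 73-76). This is a simplified illustration: `fill_chain_and_segid` is a hypothetical function, the fixed `fallback` letter replaces the rotating alphabet described above, and taking the first segID character as chainID is an assumption.

```python
def fill_chain_and_segid(lines: list[str], fallback: str = "A") -> list[str]:
    """Make chainID and segID consistent on every coordinate line."""
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            line = line.ljust(76)  # pad so the segID columns (73-76) exist
            chain = line[21].strip()
            segid = line[72:76].strip()
            if chain:
                segid = chain      # chainID wins, also when the two differ
            elif segid:
                chain = segid[0]   # assumption: use the first segID character
            else:
                chain = segid = fallback
            line = line[:21] + chain + line[22:72] + segid.ljust(4) + line[76:]
        out.append(line)
    return out


with_chain = "ATOM      1  CA  ALA A   1"
no_chain = "ATOM      1  CA  ALA     1"
solved = fill_chain_and_segid([
    with_chain,                    # chainID only
    no_chain.ljust(72) + "B",      # segID only
    no_chain,                      # neither
    with_chain.ljust(72) + "XX",   # both present, but different
    "END",
])
```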