PDB preprocessing
Process input PDB files to ensure compatibility with HADDOCK3.
This module checks and modifies PDB files for compatibility with HADDOCK3. There are three types of checks/modifications:
1. Performed on each PDB line by line, in the same fashion as pdb-tools. In fact, this step mostly uses the pdb-tools package.
2. Performed on each PDB as a whole.
3. Performed on all PDBs together.
Main functions
read_additional_residues()
Corrections performed on 1)
The following actions are performed sequentially over all PDBs:
1. from pdb-tools: pdb_keepcoord
2. from pdb-tools: pdb_tidy with strict=True
3. from pdb-tools: pdb_element
4. from pdb-tools: pdb_selaltloc
5. from pdb-tools: pdb_occ with occupancy=1.00
6. replace MSE to MET
7. replace HSD to HIS
8. replace HSE to HIS
9. replace HID to HIS
10. replace HIE to HIS
11. add_charges_to_ions, see add_charges_to_ions()
12. convert ATOM to HETATM for those atoms that should be HETATM. Considers the additional residues provided by the user. See convert_ATOM_to_HETATM().
13. convert HETATM to ATOM for those atoms that should be ATOM.
14. from pdb-tools: pdb_fixinsert, with option_list=[].
15. remove unsupported HETATM. Considers residues provided by the user.
16. remove unsupported ATOM. Considers residues provided by the user.
17. from pdb-tools: pdb_reatom, start from 1.
18. from pdb-tools: pdb_tidy with strict=True
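The sequential, line-by-line design can be pictured as a chain of generator functions, each one consuming the lines yielded by the previous step. The sketch below is only an illustration under that assumption: `keep_coords`, `rename_residue`, and `apply_steps` are hypothetical stand-ins, not functions from HADDOCK3 or pdb-tools.

```python
from functools import partial, reduce
from typing import Callable, Iterable, Iterator

Step = Callable[[Iterable[str]], Iterator[str]]


def keep_coords(lines: Iterable[str]) -> Iterator[str]:
    """Yield only coordinate-related records (toy stand-in for a filter step)."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM", "TER", "END")):
            yield line


def rename_residue(lines: Iterable[str], resin: str, resout: str) -> Iterator[str]:
    """Rename residue `resin` to `resout`; the residue name is in columns 18-20."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == resin:
            line = line[:17] + resout + line[20:]
        yield line


def apply_steps(lines: Iterable[str], steps: list[Step]) -> list[str]:
    """Feed the output of each step into the next one, in order."""
    return list(reduce(lambda acc, step: step(acc), steps, lines))


pdb = [
    "REMARK example",
    "ATOM      1  CA  HSD A   1       0.000   0.000   0.000  1.00  0.00",
    "END",
]
steps = [keep_coords, partial(rename_residue, resin="HSD", resout="HIS")]
result = apply_steps(pdb, steps)
```

Because every step is lazy, no intermediate list is built until `apply_steps` collects the final result.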
Corrections performed on 2)
The following actions are performed sequentially for each PDB:
Read the documentation of the above functions for details on what they do.
Corrections performed on 3)
The following actions are performed on all PDBs together:
Read the documentation of the above functions for details on what they do.
When it happens
The PDB processing step is performed by default when reading the input molecules and copying them to the data/ folder inside the run directory. When PDBs are processed, a copy of the original input PDBs is also stored in the data/ folder.
To deactivate this initial PDB processing, set skip_preprocess = True
in the general parameters of the configuration file.
Additional information
If you are a developer and want to read more about the history of this preprocessing module, visit:
- exception haddock.gear.preprocessing.ModelsDifferError[source]
 Bases: HaddockError
MODELS of the PDB differ in atom labels.
- haddock.gear.preprocessing.add_charges_to_ions(fhandler: Iterable[str]) Generator[str, None, None][source]
 Add charges to ions according to HADDOCK3 specifications.
Check if charge is correctly defined in residue name. If so, yield the line with correct residue name and charge at the end.
Check if charge is correctly defined in atom name.
Create charge from element. This might need manual editing in case the atom has an unconventional charge.
- Parameters:
 fhandler (file handler, list, or list-like) – Lines of the PDB file. This function consumes the lines in a for loop; keep that in mind if you pass a generator.
- Yields:
 line (str) – Line-by-line: modified ion lines and any other line.
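The note about generators applies to any function that loops over its input once. A minimal illustration follows, using a hypothetical pass-through function `passthrough` rather than the HADDOCK3 code itself:

```python
from typing import Iterable, Iterator


def passthrough(fhandler: Iterable[str]) -> Iterator[str]:
    """Loop once over `fhandler`, yielding each line unchanged."""
    for line in fhandler:
        yield line


lines = ["HETATM    1 ZN    ZN A   1", "END"]

# A list can be iterated again after the function has used it.
from_list = list(passthrough(lines))
still_there = list(lines)

# A generator is exhausted by the first pass; nothing is left afterwards.
gen = (line for line in lines)
from_gen = list(passthrough(gen))
left_over = list(gen)
```

If the original lines are needed again later, materialize them in a list before passing them on.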
- haddock.gear.preprocessing.convert_ATOM_to_HETATM(fhandler: Iterable[str], *, record: str = 'ATOM', other_record: str = 'HETATM', residues: Container[str] = {'A2G', 'ABE', 'ACD', 'ACN', 'ACT', 'ADN', 'ADP', 'ADY', 'AG', 'AG1', 'AL', 'AL3', 'AMN', 'AMP', 'AR', 'AS', 'ATP', 'AU', 'AU1', 'AU3', 'BDP', 'BDY', 'BEN', 'BGC', 'BMA', 'BR', 'BR1', 'BUT', 'CA', 'CA2', 'CD', 'CD2', 'CHE', 'CIT', 'CL', 'CL1', 'CO', 'CO2', 'CO3', 'COH', 'COM', 'CR', 'CR2', 'CR3', 'CS', 'CS1', 'CU', 'CU1', 'CU2', 'CYA', 'DFO', 'DME', 'DMS', 'DOD', 'EOL', 'ETA', 'ETH', 'F', 'F1', 'FAD', 'FCA', 'FCB', 'FE', 'FE2', 'FE3', 'FLC', 'FUC', 'FUL', 'GAL', 'GDP', 'GLA', 'GLC', 'GMP', 'GTP', 'GXL', 'HEB', 'HEC', 'HG', 'HG1', 'HG2', 'HO', 'HO3', 'HOH', 'I', 'I1', 'IMI', 'IR', 'IR3', 'K', 'K1', 'KR', 'LI1', 'MAG', 'MAN', 'MER', 'MG', 'MG2', 'MIY', 'MMA', 'MN', 'MN2', 'MN3', 'MO', 'MO3', 'NA', 'NA1', 'NAD', 'NAG', 'NAP', 'NDG', 'NDP', 'NGA', 'NI', 'NI2', 'O2', 'OS', 'OS4', 'PB', 'PB2', 'PHN', 'PO4', 'PT', 'PT2', 'RAM', 'SIA', 'SIB', 'SO4', 'SR', 'SR2', 'THS', 'TIP', 'TIP3', 'U3', 'U4', 'URE', 'V', 'V2', 'V3', 'WAT', 'WO4', 'XE', 'XYP', 'XYS', 'YB', 'YB2', 'YB3', 'ZN', 'ZN2'}) Generator[str, None, None]
 Convert ATOM to HETATM for HADDOCK3 supported HETATM.
- haddock.gear.preprocessing.convert_HETATM_to_ATOM(fhandler: Iterable[str], *, record: str = 'HETATM', other_record: str = 'ATOM ', residues: Container[str] = {'A', 'ACE', 'ALA', 'ALY', 'ARG', 'ASH', 'ASN', 'ASP', 'C', 'CFE', 'CHX', 'CIR', 'CSP', 'CTN', 'CYC', 'CYF', 'CYM', 'CYS', 'DA', 'DC', 'DDZ', 'DG', 'DJ', 'DT', 'DUM', 'G', 'GLH', 'GLN', 'GLU', 'GLY', 'HIS', 'HLY', 'HY3', 'HYP', 'ILE', 'LEU', 'LYS', 'M3L', 'MET', 'MLY', 'MLZ', 'MSE', 'NEP', 'NME', 'PCA', 'PHE', 'PNS', 'PRO', 'PTR', 'QSR', 'SEC', 'SEP', 'SER', 'SHA', 'THR', 'TOP', 'TRP', 'TYP', 'TYR', 'TYS', 'U', 'VAL'}) Generator[str, None, None]
 Convert HETATM to ATOM for HADDOCK3 supported ATOM.
- haddock.gear.preprocessing.convert_record(fhandler: Iterable[str], record: str, other_record: str, residues: Container[str]) Generator[str, None, None][source]
 Convert one record to another for specified residues.
For example, replace ATOM by HETATM for specific residues.
- Parameters:
 fhandler (list-like) – Contains lines of file.
record (str) – The PDB RECORD to match; for example, ATOM or HETATM.
other_record (str) – The PDB RECORD to replace with; for example, ATOM or HETATM.
residues (list, tuple, or set) – List of residues to replace the record.
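The idea of a record swap can be sketched as below. This is an assumption-laden illustration, not the HADDOCK3 implementation: `convert_record_sketch` is a hypothetical name, and it assumes the standard PDB layout with the record name in columns 1-6 and the residue name in columns 18-20.

```python
from typing import Container, Iterable, Iterator


def convert_record_sketch(
    lines: Iterable[str],
    record: str,
    other_record: str,
    residues: Container[str],
) -> Iterator[str]:
    """Swap `record` for `other_record` on lines whose residue is in `residues`."""
    for line in lines:
        # Record name: columns 1-6; residue name: columns 18-20.
        if line.startswith(record) and line[17:20].strip() in residues:
            yield other_record.ljust(6) + line[6:]
        else:
            yield line


pdb = [
    "ATOM      1  O   HOH A   1       0.000   0.000   0.000  1.00  0.00",
    "ATOM      2  CA  ALA A   2       1.000   0.000   0.000  1.00  0.00",
]
converted = list(convert_record_sketch(pdb, "ATOM", "HETATM", {"HOH"}))
```

Padding the new record name to six characters keeps all later columns aligned.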
- haddock.gear.preprocessing.correct_equal_chain_segids(structures: list[list[str]]) list[list[str]][source]
 Correct for repeated chainID in the input PDB files.
Repeated chain IDs are replaced by an upper case character ([A-Z]) in order.
- Parameters:
 structures (list of lists of str) – The input data.
- Returns:
 list of lists of str – The new structures.
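One possible reading of this behaviour is sketched below. It rests on several assumptions: each structure carries a single chain ID, a repeated ID is replaced by the first upper case letter not present anywhere in the input, and segIDs are left untouched. The names `chain_of` and `dedupe_chains_sketch` are hypothetical.

```python
from string import ascii_uppercase
from typing import Optional

COORD = ("ATOM", "HETATM")


def chain_of(structure: list[str]) -> Optional[str]:
    """Return the chain ID (column 22) of the first coordinate line."""
    for line in structure:
        if line.startswith(COORD):
            return line[21]
    return None


def dedupe_chains_sketch(structures: list[list[str]]) -> list[list[str]]:
    """Give every structure a chain ID that no earlier structure uses."""
    used = {chain_of(s) for s in structures}  # never hand these out again
    seen = set()
    result = []
    for structure in structures:
        chain = chain_of(structure)
        if chain in seen:
            # Raises StopIteration once all 26 letters are taken.
            chain = next(c for c in ascii_uppercase if c not in used)
            used.add(chain)
            structure = [
                (line[:21] + chain + line[22:]) if line.startswith(COORD) else line
                for line in structure
            ]
        seen.add(chain)
        result.append(structure)
    return result


def one_atom(chain: str) -> str:
    return f"ATOM      1  CA  ALA {chain}   1       0.000   0.000   0.000  1.00  0.00"


structures = [[one_atom("A")], [one_atom("A")], [one_atom("B")]]
fixed = dedupe_chains_sketch(structures)
```

Here the second structure is moved to chain C because both A and B already occur in the input.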
- haddock.gear.preprocessing.homogenize_chains(lines: list[str]) list[str][source]
 Homogenize chainIDs within the same PDB.
If there are multiple chain identifiers in the PDB file, make them all equal to the first one.
ChainIDs are copied to segIDs afterwards.
- Returns:
 list – The modified lines.
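A minimal sketch of this homogenization, assuming the standard PDB columns (chain ID in column 22, segID in columns 73-76) and touching only ATOM and HETATM lines. `homogenize_chains_sketch` is a hypothetical name; the actual function may treat other records, such as TER, differently.

```python
def homogenize_chains_sketch(lines: list[str]) -> list[str]:
    """Set every chain ID to the first one found and mirror it in the segID."""
    coord = ("ATOM", "HETATM")
    chains = [ln[21] for ln in lines if ln.startswith(coord) and len(ln) > 21]
    if not chains:
        return lines
    chain = chains[0]
    new_lines = []
    for ln in lines:
        if ln.startswith(coord):
            ln = ln.ljust(80)  # make sure the segID columns exist
            # Chain ID: column 22; segID: columns 73-76.
            ln = ln[:21] + chain + ln[22:72] + chain.ljust(4) + ln[76:]
        new_lines.append(ln)
    return new_lines


pdb = [
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00",
    "ATOM      2  CA  GLY B   2       1.000   0.000   0.000  1.00  0.00",
    "END",
]
fixed = homogenize_chains_sketch(pdb)
```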
- haddock.gear.preprocessing.models_should_have_the_same_labels(lines: Iterable[str]) Iterable[str][source]
 Confirm models have the same labels.
In an ensemble of structures, where the PDB file has multiple MODELS, all models should have the same labels; hence the same number and type of atoms.
- Parameters:
 lines (list of strings) – List containing the lines of the PDB file. Must NOT be a generator.
- Returns:
 list – The original lines in case no errors are found.
- Raises:
 ModelsDifferError – In case MODELS differ. Reports on which models differ.
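A sketch of such a check is shown below. It assumes that a "label" is the atom name through insertion code field (columns 13-27) and compares every MODEL against the first one; the real function may define labels differently. `check_models_sketch` and `ModelsDifferSketchError` are hypothetical names.

```python
class ModelsDifferSketchError(Exception):
    """Raised by the sketch when MODELs carry different atom labels."""


def check_models_sketch(lines: list[str]) -> list[str]:
    """Return `lines` untouched if all MODELs share the same atom labels."""
    models: list[list[str]] = []
    for line in lines:
        if line.startswith("MODEL"):
            models.append([])
        elif line.startswith(("ATOM", "HETATM")) and models:
            # Atom name, altLoc, residue name, chain, residue number, iCode.
            models[-1].append(line[12:27])
    for number, labels in enumerate(models[1:], start=2):
        if labels != models[0]:
            raise ModelsDifferSketchError(f"MODEL {number} differs from MODEL 1")
    return lines


atom_ca = "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00"
atom_cb = "ATOM      1  CB  ALA A   1       0.000   0.000   0.000  1.00  0.00"
same = ["MODEL        1", atom_ca, "ENDMDL", "MODEL        2", atom_ca, "ENDMDL"]
differ = ["MODEL        1", atom_ca, "ENDMDL", "MODEL        2", atom_cb, "ENDMDL"]

checked = check_models_sketch(same)
try:
    check_models_sketch(differ)
    error_message = ""
except ModelsDifferSketchError as err:
    error_message = str(err)
```

The input has to be a list here as well, because it is both iterated and returned.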
- haddock.gear.preprocessing.process_pdbs(*inputdata: Iterable[str] | str | Path, dry: bool = False, user_supported_residues: Iterable[str] | None = None) list[list[str]][source]
 Process PDB file contents for compatibility with HADDOCK3.
- Parameters:
 inputdata (list of (str, path, list of str [lines], file handler)) – A flat list where each item can be one of:
file objects
paths to files
strings representing paths
lists or tuples of lines
The above types can be mixed in the input list.
Files are read to lines in a list. Line separators are stripped.
Do not provide nested lists, such as lists containing paths inside other lists.
dry (bool) – Perform a dry run. That is, do not change anything, just report.
user_supported_residues (list, tuple, or set) – The new residues that are allowed.
- Returns:
 list of (list of str) – The corrected (processed) PDB content in the same order as inputdata.
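How mixed inputs might be normalized to lists of lines can be sketched as follows. This only illustrates the accepted input types listed above; `to_lines_sketch` is a hypothetical helper and does not reproduce what process_pdbs does internally.

```python
import io
import tempfile
from pathlib import Path
from typing import Iterable, Union


def to_lines_sketch(item: Union[str, Path, Iterable[str]]) -> list[str]:
    """Read one input item into a list of lines without line separators."""
    if isinstance(item, (str, Path)):
        # Strings are treated as paths, as described for `inputdata`.
        return Path(item).read_text().splitlines()
    # File objects, lists, and tuples are all iterated line by line.
    return [line.rstrip("\r\n") for line in item]


content = "ATOM      1\nEND\n"
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "mol.pdb")
    path.write_text(content)
    inputs = [path, str(path), io.StringIO(content), ["ATOM      1\n", "END\n"]]
    structures = [to_lines_sketch(item) for item in inputs]
```

All four inputs end up as the same list of two lines, in the order they were given.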
- haddock.gear.preprocessing.remove_unsupported_atom(lines: Iterable[str], *, haddock3_defined: set[str] | None = {'A', 'ACE', 'ALA', 'ALY', 'ARG', 'ASH', 'ASN', 'ASP', 'C', 'CFE', 'CHX', 'CIR', 'CSP', 'CTN', 'CYC', 'CYF', 'CYM', 'CYS', 'DA', 'DC', 'DDZ', 'DG', 'DJ', 'DT', 'DUM', 'G', 'GLH', 'GLN', 'GLU', 'GLY', 'HIS', 'HLY', 'HY3', 'HYP', 'ILE', 'LEU', 'LYS', 'M3L', 'MET', 'MLY', 'MLZ', 'MSE', 'NEP', 'NME', 'PCA', 'PHE', 'PNS', 'PRO', 'PTR', 'QSR', 'SEC', 'SEP', 'SER', 'SHA', 'THR', 'TOP', 'TRP', 'TYP', 'TYR', 'TYS', 'U', 'VAL'}, user_defined: set[str] | None = None, line_startswith: str | tuple[str, ...] = 'ATOM') Generator[str, None, None]
 Remove unsupported molecules in ATOM lines.
Uses remove_unsupported_molecules() by populating its haddock3_defined and line_startswith parameters.
See also
- haddock.gear.preprocessing.remove_unsupported_hetatm(lines: Iterable[str], *, haddock3_defined: set[str] | None = {'A2G', 'ABE', 'ACD', 'ACN', 'ACT', 'ADN', 'ADP', 'ADY', 'AG', 'AG1', 'AL', 'AL3', 'AMN', 'AMP', 'AR', 'AS', 'ATP', 'AU', 'AU1', 'AU3', 'BDP', 'BDY', 'BEN', 'BGC', 'BMA', 'BR', 'BR1', 'BUT', 'CA', 'CA2', 'CD', 'CD2', 'CHE', 'CIT', 'CL', 'CL1', 'CO', 'CO2', 'CO3', 'COH', 'COM', 'CR', 'CR2', 'CR3', 'CS', 'CS1', 'CU', 'CU1', 'CU2', 'CYA', 'DFO', 'DME', 'DMS', 'DOD', 'EOL', 'ETA', 'ETH', 'F', 'F1', 'FAD', 'FCA', 'FCB', 'FE', 'FE2', 'FE3', 'FLC', 'FUC', 'FUL', 'GAL', 'GDP', 'GLA', 'GLC', 'GMP', 'GTP', 'GXL', 'HEB', 'HEC', 'HG', 'HG1', 'HG2', 'HO', 'HO3', 'HOH', 'I', 'I1', 'IMI', 'IR', 'IR3', 'K', 'K1', 'KR', 'LI1', 'MAG', 'MAN', 'MER', 'MG', 'MG2', 'MIY', 'MMA', 'MN', 'MN2', 'MN3', 'MO', 'MO3', 'NA', 'NA1', 'NAD', 'NAG', 'NAP', 'NDG', 'NDP', 'NGA', 'NI', 'NI2', 'O2', 'OS', 'OS4', 'PB', 'PB2', 'PHN', 'PO4', 'PT', 'PT2', 'RAM', 'SIA', 'SIB', 'SO4', 'SR', 'SR2', 'THS', 'TIP', 'TIP3', 'U3', 'U4', 'URE', 'V', 'V2', 'V3', 'WAT', 'WO4', 'XE', 'XYP', 'XYS', 'YB', 'YB2', 'YB3', 'ZN', 'ZN2'}, user_defined: set[str] | None = None, line_startswith: str | tuple[str, ...] = 'HETATM') Generator[str, None, None]
 Remove unsupported molecules in HETATM lines.
Uses remove_unsupported_molecules() by populating its haddock3_defined and line_startswith parameters.
See also
- haddock.gear.preprocessing.remove_unsupported_molecules(lines: Iterable[str], haddock3_defined: set[str] | None = None, user_defined: set[str] | None = None, line_startswith: str | tuple[str, ...] = ('ATOM', 'HETATM')) Generator[str, None, None][source]
 Remove HADDOCK3 unsupported molecules.
This function is abstract and you need to provide the set of residues supported by HADDOCK3. See parameters.
Residues not provided in haddock3_defined and user_defined are removed from the PDB lines.
Other lines are yielded unmodified.
- Parameters:
 lines (list or list-like) – Lines of the PDB file. This function consumes the lines in a for loop; keep that in mind if you pass a generator.
haddock3_defined (set) – Set of residues supported by HADDOCK3. Defaults to None.
user_defined (set) – An additional set of allowed residues given by the user. Defaults to None.
line_startswith (tuple) – The lines to consider. Defaults to ("ATOM", "HETATM").
- Yields:
 line (str) – Line-by-line. Lines for residues not supported are not yielded.
See also
Other functions use this function to create context.
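The filtering described above can be sketched as follows, assuming the residue name sits in columns 18-20. `remove_unsupported_sketch` is a hypothetical name that mirrors the documented parameters; it is not the HADDOCK3 source.

```python
from typing import Iterable, Iterator, Optional, Union


def remove_unsupported_sketch(
    lines: Iterable[str],
    haddock3_defined: Optional[set[str]] = None,
    user_defined: Optional[set[str]] = None,
    line_startswith: Union[str, tuple[str, ...]] = ("ATOM", "HETATM"),
) -> Iterator[str]:
    """Drop coordinate lines whose residue is in neither allowed set."""
    allowed = set(haddock3_defined or ()) | set(user_defined or ())
    for line in lines:
        if line.startswith(line_startswith) and line[17:20].strip() not in allowed:
            continue  # unsupported residue: skip the line
        yield line


pdb = [
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00",
    "HETATM    2  C1  XYZ A   2       1.000   0.000   0.000  1.00  0.00",
    "END",
]
kept = list(remove_unsupported_sketch(pdb, haddock3_defined={"ALA"}))
kept_user = list(
    remove_unsupported_sketch(pdb, haddock3_defined={"ALA"}, user_defined={"XYZ"})
)
```

Passing `XYZ` through `user_defined` keeps the second line, which is otherwise dropped.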
- haddock.gear.preprocessing.replace_HETATM_to_ATOM(fhandler: Iterable[str], res: str) Generator[str, None, None][source]
 Replace record HETATM to ATOM for res.
Do not alter other lines.
- Parameters:
 fhandler (file handler or list of lines) – List-like of file lines. Consumed in a for loop.
res (str) – Residue name to match for the substitution.
- Yields:
 str – Yield line-by-line.
- haddock.gear.preprocessing.replace_HID_to_HIS(fhandler: Iterable[str], *, resin: str = 'HID', resout: str = 'HIS') Generator[str, None, None]
 Replace HID to HIS.
See also
- haddock.gear.preprocessing.replace_HIE_to_HIS(fhandler: Iterable[str], *, resin: str = 'HIE', resout: str = 'HIS') Generator[str, None, None]
 Replace HIE to HIS.
See also
- haddock.gear.preprocessing.replace_HSD_to_HIS(fhandler: Iterable[str], *, resin: str = 'HSD', resout: str = 'HIS') Generator[str, None, None]
 Replace HSD to HIS.
See also
- haddock.gear.preprocessing.replace_HSE_to_HIS(fhandler: Iterable[str], *, resin: str = 'HSE', resout: str = 'HIS') Generator[str, None, None]
 Replace HSE to HIS.
See also
- haddock.gear.preprocessing.replace_MSE_to_MET(fhandler: Iterable[str], *, resin: str = 'MSE', resout: str = 'MET') Generator[str, None, None]
 Replace MSE to MET.
See also
- haddock.gear.preprocessing.replace_residue(fhandler: Iterable[str], resin: str, resout: str) Generator[str, None, None][source]
 Replace a residue by another and change HETATM to ATOM if needed.
Do not alter other lines.
- Parameters:
 fhandler (file handler or list of lines) – List-like of file lines. Consumed in a for loop.
resin (str) – Residue name to match for the substitution.
resout (str) – Name of the new residue. Renames resin to resout.
- Yields:
 str – Yield line-by-line.
See also
pdb_rplresname from pdb-tools
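The combined effect (rename the residue and make sure the record is ATOM) can be sketched without pdb-tools as below. `replace_residue_sketch` is a hypothetical stand-in that assumes standard PDB columns; the actual function may delegate the renaming to pdb_rplresname.

```python
from typing import Iterable, Iterator


def replace_residue_sketch(
    lines: Iterable[str], resin: str, resout: str
) -> Iterator[str]:
    """Rename `resin` to `resout` and set the record of those lines to ATOM."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == resin:
            # Record: columns 1-6; residue name: columns 18-20.
            yield "ATOM".ljust(6) + line[6:17] + resout.rjust(3) + line[20:]
        else:
            yield line


pdb = [
    "HETATM    1 SE   MSE A   1       0.000   0.000   0.000  1.00  0.00",
    "ATOM      2  CA  ALA A   2       1.000   0.000   0.000  1.00  0.00",
]
fixed = list(replace_residue_sketch(pdb, "MSE", "MET"))
```

Note that only the residue name and record change here; the atom name (`SE`) is left as it was.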
- haddock.gear.preprocessing.solve_no_chainID_no_segID(lines: Iterable[str]) Iterable[str][source]
 Solve inconsistencies with chainID and segID.
If segID is nonexistent, copy chainID over segID, and vice versa. If neither is present, add an upper case character starting from A. This character is not repeated until the alphabet is exhausted. If chainIDs and segIDs differ, copy chainIDs over segIDs.
- Parameters:
 lines (list of str) – The lines of a PDB file.
- Returns:
 list – With new lines. Or the input ones if no modification was made.
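The rules above can be sketched for a single structure as follows. Assumptions: chain ID in column 22, segID in columns 73-76, only the first character of a segID is copied into the chain ID, and a module-level letter cycle is shared across calls. `solve_ids_sketch` and `_letters` are hypothetical names.

```python
from itertools import cycle
from string import ascii_uppercase

# Shared across calls so consecutive ID-less structures get A, B, C, ...
_letters = cycle(ascii_uppercase)


def solve_ids_sketch(lines: list[str]) -> list[str]:
    """Make chain ID and segID consistent on every coordinate line."""
    new_lines = []
    fallback = None
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            new_lines.append(line)
            continue
        line = line.ljust(80)
        chain = line[21].strip()     # column 22
        segid = line[72:76].strip()  # columns 73-76
        if chain:
            segid = chain            # chain ID wins, also when the two differ
        elif segid:
            chain = segid[0]         # assumption: first character of the segID
        else:
            if fallback is None:
                fallback = next(_letters)
            chain = segid = fallback
        new_lines.append(
            line[:21] + chain + line[22:72] + segid.ljust(4) + line[76:]
        )
    return new_lines


no_ids = "ATOM      1  CA  ALA     1       0.000   0.000   0.000  1.00  0.00"
differ = (
    "ATOM      1  CA  ALA C   1       0.000   0.000   0.000  1.00  0.00".ljust(72)
    + "X"
)

first = solve_ids_sketch([no_ids])
second = solve_ids_sketch([no_ids])
third = solve_ids_sketch([differ])
```

In this run the two ID-less structures receive A and B in turn, while the third keeps chain C and has its segID overwritten.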