libalign: sequence and structural alignments
Library of functions to perform sequence and structural alignments.
Main functions
- exception haddock.libs.libalign.ALIGNError(msg: object = '')[source]
Bases: Exception
Raised when something goes wrong with the ALIGNMENT library.
- haddock.libs.libalign.ResCode
The single letter code of a residue.
Unrecognized residues are assigned the code X.
alias of Literal['C', 'D', 'S', 'Q', 'K', 'I', 'P', 'T', 'F', 'N', 'G', 'H', 'L', 'R', 'W', 'A', 'V', 'E', 'Y', 'M', 'X']
- haddock.libs.libalign.align_seq(reference, model, output_path)[source]
Sequence align and get the numbering relationship.
- Parameters:
reference (PosixPath or haddock.libs.libontology.PDBFile)
model (PosixPath or haddock.libs.libontology.PDBFile)
output_path (Path)
- Returns:
align_dic (dict) – dictionary of sequence alignments (one per chain)
- haddock.libs.libalign.align_strct(reference: PDBFile, model: PDBFile, output_path: str | Path, lovoalign_exec: str | Path | None = None) dict[str, dict[int, int]] [source]
Structurally align and get the numbering relationship.
- Parameters:
reference (haddock.libs.libontology.PDBFile)
model (haddock.libs.libontology.PDBFile)
output_path (Path)
lovoalign_exec (Path) – lovoalign executable
- Returns:
numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)
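To illustrate what a "numbering relationship" of type dict[str, dict[int, int]] can look like, here is a minimal sketch that derives a per-chain residue mapping from two gapped, equal-length aligned strings. The helper name numbering_from_alignment, its arguments, and the direction of the mapping (model residue number to reference residue number) are assumptions made for illustration; they are not taken from libalign.

```python
def numbering_from_alignment(aln_ref, aln_model, ref_resnums, model_resnums):
    """Map model residue numbers to reference residue numbers.

    Hypothetical helper: aln_ref and aln_model are equal-length aligned
    strings in which '-' marks a gap; ref_resnums and model_resnums list
    the residue numbers of the ungapped sequences, in order.
    """
    if len(aln_ref) != len(aln_model):
        raise ValueError("aligned strings must have equal length")
    mapping = {}
    i_ref = i_mod = 0
    for ref_char, mod_char in zip(aln_ref, aln_model):
        # Only columns where both sequences have a residue are paired.
        if ref_char != "-" and mod_char != "-":
            mapping[model_resnums[i_mod]] = ref_resnums[i_ref]
        if ref_char != "-":
            i_ref += 1
        if mod_char != "-":
            i_mod += 1
    return mapping


# One mapping per chain, mirroring the documented return shape.
numbering_dic = {
    "A": numbering_from_alignment("ACD-F", "A-DEF", [10, 11, 12, 13], [1, 2, 3, 4])
}
```

Residues that face a gap in the other sequence have no counterpart and are simply absent from the mapping.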
- haddock.libs.libalign.calc_rmsd(V: ndarray[Any, dtype[float64]], W: ndarray[Any, dtype[float64]]) float [source]
Calculate the RMSD from two vectors.
- Parameters:
V (np.array dtype=float, shape=(n_atoms,3))
W (np.array dtype=float, shape=(n_atoms,3))
- Returns:
rmsd (float)
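As a point of reference, the standard RMSD between two paired coordinate sets is the square root of the mean squared distance between corresponding atoms. The sketch below implements that textbook definition with NumPy; the name rmsd_sketch is hypothetical, and it is an assumption that calc_rmsd follows the same formula.

```python
import numpy as np


def rmsd_sketch(V, W):
    """Textbook RMSD between two (n_atoms, 3) coordinate arrays."""
    V = np.asarray(V, dtype=float)
    W = np.asarray(W, dtype=float)
    if V.shape != W.shape:
        raise ValueError("V and W must have the same shape")
    diff = V - W
    # Sum of squared per-atom distances, averaged over the atoms.
    return float(np.sqrt((diff ** 2).sum() / len(V)))


example = rmsd_sketch(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]],
)
```

No superposition is performed here: the two arrays are compared as given, so any alignment has to happen beforehand.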
- haddock.libs.libalign.centroid(X: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Get the centroid.
- Parameters:
X (np.array dtype=float, shape=(n_atoms,3))
- Returns:
C (np.array dtype=float, shape=(3,))
- haddock.libs.libalign.check_chains(obs_chains, inp_r_chain, inp_l_chains)[source]
Check observed chains against the expected ones.
Logic: if at least one of inp_l_chains is among the observed chains and is not selected as the receptor chain, then ligand_chains is equal to this intersection. Otherwise, ligand_chains becomes equal to all the other chains (once the receptor chain is removed).
- Parameters:
obs_chains (list) – List of observed chains.
inp_r_chain (str) – Receptor chain.
inp_l_chains (list) – List of ligand chains.
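The ligand-chain rule described above can be sketched as follows. This covers only the ligand selection; how the receptor chain is validated and what check_chains actually returns are not stated here, so the function name and return value below are assumptions.

```python
def select_ligand_chains(obs_chains, inp_r_chain, inp_l_chains):
    """Hypothetical sketch of the ligand-chain selection rule."""
    # Requested ligand chains that are observed and are not the receptor.
    intersection = [
        chain
        for chain in inp_l_chains
        if chain in obs_chains and chain != inp_r_chain
    ]
    if intersection:
        return intersection
    # Fallback: every observed chain except the receptor chain.
    return [chain for chain in obs_chains if chain != inp_r_chain]


matched = select_ligand_chains(["A", "B", "C"], "A", ["B"])
fallback = select_ligand_chains(["A", "B", "C"], "A", ["D"])
```

In the first call the requested chain B is present, so it is kept; in the second, chain D is not observed and the selection falls back to B and C.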
- haddock.libs.libalign.check_common_atoms(models, filter_resdic, allatoms, atom_similarity)[source]
Check if the models share the same atoms.
- Parameters:
models (list) – list of models
filter_resdic (dict) – dictionary of residues to be loaded (one list per chain)
allatoms (bool) – use all the heavy atoms
atom_similarity (float) – minimum atom similarity required between models
- Returns:
n_atoms (int) – number of common atoms
common_keys (list) – list of common atom keys
- haddock.libs.libalign.dump_as_izone(fname, numbering_dic, model2ref_chain_dict=None)[source]
Dump the numbering dictionary as .izone.
- Parameters:
fname (str) – output filename
numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)
- haddock.libs.libalign.get_align(method: str, lovoalign_exec: str | Path) partial[dict[str, dict[int, int]]] [source]
Get the alignment function.
- Parameters:
method (str) – Available options: sequence and structure.
lovoalign_exec (str) – Path to the lovoalign executable.
- Returns:
align_func (functools.partial) – desired alignment function
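Returning a functools.partial lets the caller invoke either alignment method with the same (reference, model, output_path) arguments. The sketch below shows that dispatch pattern with stand-in functions; the stand-ins, the name get_align_sketch, and the ValueError raised for an unknown method are all assumptions, not the behaviour of get_align itself.

```python
from functools import partial


def fake_align_seq(reference, model, output_path):
    """Stand-in for a sequence-based alignment function."""
    return {"method": "sequence"}


def fake_align_strct(reference, model, output_path, lovoalign_exec=None):
    """Stand-in for a structure-based alignment function."""
    return {"method": "structure", "exec": lovoalign_exec}


def get_align_sketch(method, lovoalign_exec=None):
    """Return a callable taking (reference, model, output_path)."""
    if method == "structure":
        # Bind the executable now so both callables share one signature.
        return partial(fake_align_strct, lovoalign_exec=lovoalign_exec)
    if method == "sequence":
        return partial(fake_align_seq)
    raise ValueError(f"unknown alignment method: {method!r}")


align_func = get_align_sketch("structure", "/opt/lovoalign")
result = align_func("ref.pdb", "model.pdb", "out")
```

The path /opt/lovoalign is a placeholder used only to show that the bound argument reaches the underlying function.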
- haddock.libs.libalign.get_atoms(pdb: PDBFile | Path, full: bool = False) dict[str, list[str]] [source]
Identify the molecule type of each PDB.
- Parameters:
pdb (PosixPath or haddock.libs.libontology.PDBFile) – PDB file to have its atoms identified
full (bool) – Whether or not to take full atoms into consideration. If False, only main-chain atoms are retrieved. If True, all heavy atoms are retrieved.
- Returns:
atom_dic (dict) – dictionary of atoms
- haddock.libs.libalign.kabsch(P: ndarray[Any, dtype[float64]], Q: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Find the rotation matrix using Kabsch algorithm.
- Parameters:
P (np.array dtype=float, shape=(n_atoms,3))
Q (np.array dtype=float, shape=(n_atoms,3))
- Returns:
U (np.array dtype=float, shape=(3,3))
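For orientation, here is a minimal sketch of the textbook Kabsch procedure: centre both coordinate sets, take the SVD of the covariance matrix, correct for a possible reflection, and compose the rotation. The names centroid_sketch and kabsch_sketch are hypothetical, and two conventions are assumed rather than taken from libalign: the inputs are already centred, and the returned matrix is applied as P @ U.

```python
import numpy as np


def centroid_sketch(X):
    """Mean position of an (n_atoms, 3) array."""
    return X.mean(axis=0)


def kabsch_sketch(P, Q):
    """Rotation U such that P @ U best matches Q (both centred)."""
    C = P.T @ Q  # 3x3 covariance matrix
    V, _, Wt = np.linalg.svd(C)
    # A negative determinant means the optimum is an improper rotation;
    # flipping the last column of V restores a proper one.
    if np.linalg.det(V) * np.linalg.det(Wt) < 0.0:
        V[:, -1] = -V[:, -1]
    return V @ Wt


# Usage: rotate a small point set by 30 degrees about z and recover it.
theta = np.radians(30.0)
R = np.array([
    [np.cos(theta), np.sin(theta), 0.0],
    [-np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
Q = P @ R + np.array([5.0, -2.0, 1.0])  # rotated, then translated

Pc = P - centroid_sketch(P)
Qc = Q - centroid_sketch(Q)
U = kabsch_sketch(Pc, Qc)
```

Centring first removes the translation, so the rotation is the only thing left to solve for.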
- haddock.libs.libalign.load_coords(pdb_f, atoms, filter_resdic=None, numbering_dic=None, model2ref_chain_dict=None, add_resname=None)[source]
Load coordinates from PDB.
- Parameters:
pdb_f (PDBFile)
atoms (dict) – dictionary of atoms
filter_resdic (dict) – dictionary of residues to be loaded (one list per chain)
numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)
add_resname (bool) – use the residue name in the identifier
- Returns:
coord_dic (dict) – dictionary of coordinates (one per chain)
chain_ranges (dict) – dictionary of chain ranges
- haddock.libs.libalign.make_range(chain_range_dic: dict[str, list[int]]) dict[str, tuple[int, int]] [source]
Expand a chain dictionary into ranges.
- Parameters:
chain_range_dic (dict) – dictionary of chain indexes (one list per chain)
- Returns:
chain_ranges (dict) – dictionary of chain ranges (one tuple per chain)
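Judging from the signature alone (a list of indexes per chain in, one tuple of two integers per chain out), one plausible reading is that each tuple holds the first and last index of the chain. The sketch below implements that reading; both the interpretation and the name make_range_sketch are assumptions.

```python
def make_range_sketch(chain_range_dic):
    """Collapse each chain's index list into a (lowest, highest) tuple."""
    return {
        chain: (min(indexes), max(indexes))
        for chain, indexes in chain_range_dic.items()
    }


chain_ranges = make_range_sketch({"A": [0, 1, 2], "B": [3, 4]})
```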
- haddock.libs.libalign.pdb2fastadic(pdb_f: PDBFile | Path) dict[str, dict[int, str]] [source]
Write the sequence as a fasta.
- Parameters:
pdb_f (PosixPath or haddock.libs.libontology.PDBFile)
- Returns:
seq_dic (dict) – dict of fasta sequences (one per chain)
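The return type dict[str, dict[int, str]] suggests a mapping from chain ID to residue number to one-letter code. A minimal sketch of how such a dictionary could be built from PDB ATOM records is shown below, using the fixed column positions of the PDB format and the code X for unrecognized residues. The helper names and the choice to read only ATOM records are assumptions, not the actual implementation of pdb2fastadic.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_lines_to_seqdic(lines):
    """Build {chain: {resnum: one_letter_code}} from PDB ATOM records."""
    seq_dic = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()  # PDB columns 18-20
        chain = line[21]               # PDB column 22
        resnum = int(line[22:26])      # PDB columns 23-26
        seq_dic.setdefault(chain, {})[resnum] = THREE_TO_ONE.get(resname, "X")
    return seq_dic


def ca_record(serial, resname, chain, resnum):
    """Format the leading columns of a CA ATOM record (illustrative only)."""
    return f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resnum:>4}"


records = [
    ca_record(1, "ALA", "A", 1),
    ca_record(2, "GLY", "A", 2),
    ca_record(3, "MSE", "B", 10),  # not in the table, so it maps to X
]
seq_dic = pdb_lines_to_seqdic(records)
```

Joining the values of one chain in residue order then gives the FASTA-style sequence for that chain.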
- haddock.libs.libalign.rearrange_xyz_files(output_name: str | Path, path: str | Path, ncores: int) None [source]
Combine different xyz outputs into a single file.
- Parameters:
output_name (FilePath) – output name
path (FilePath) – path to the output files
ncores (int) – number of cores
- haddock.libs.libalign.sequence_alignment(seq_ref, seq_model)[source]
Perform a sequence alignment.
- Parameters:
seq_ref (str) – reference sequence
seq_model (str) – model sequence
- Returns:
identity (float) – sequence identity
top_aln (Bio.Align.PairwiseAlignments) – alignment object
aln_ref_seg (tuple) – aligned reference segment
aln_mod_seg (tuple) – aligned model segment