haddock.modules.analysis.clustrmsd.clustrmsd module
RMSD clustering.
- haddock.modules.analysis.clustrmsd.clustrmsd.apply_min_population(cluster_arr: ndarray, min_population: int) ndarray [source]
Apply min_population to cluster list.
- Parameters:
cluster_arr (np.ndarray) – Array of clusters.
min_population (int) – Minimum population required for a cluster.
- Returns:
cluster_arr (np.ndarray) – Array of clusters (unclustered structures are labelled with -1)
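The thresholding described above could look roughly like the following. This is a minimal sketch, not the module's implementation: the helper name `apply_min_population_sketch` is hypothetical, and it assumes that a cluster counts as too small when it has strictly fewer than `min_population` members.

```python
import numpy as np


def apply_min_population_sketch(cluster_arr, min_population):
    """Label members of under-populated clusters with -1 (illustrative only)."""
    cluster_arr = np.asarray(cluster_arr).copy()
    # Count how many structures carry each cluster label.
    labels, counts = np.unique(cluster_arr, return_counts=True)
    # Assumption: "too small" means strictly fewer than min_population members.
    small = labels[counts < min_population]
    cluster_arr[np.isin(cluster_arr, small)] = -1
    return cluster_arr


example = apply_min_population_sketch(np.array([1, 1, 1, 2, 2, 3]), 2)
```

Here cluster 3 holds a single structure, so that structure is relabelled -1 while clusters 1 and 2 are kept.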
- haddock.modules.analysis.clustrmsd.clustrmsd.cond_index(i: int, j: int, n: int) float [source]
Get the condensed index from two matrix indexes.
- Parameters:
i (int) – Index of the first element.
j (int) – Index of the second element.
n (int) – Number of observations.
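For readers unfamiliar with condensed indexing: a symmetric n × n distance matrix can be stored as a flat vector holding only the upper triangle, row by row. The sketch below uses the standard mapping for that layout. The name `condensed_index` is hypothetical, the code assumes `i < j`, and it returns an `int`, whereas the documented function is annotated as returning `float`.

```python
def condensed_index(i, j, n):
    """Map matrix indexes (i, j), with i < j, to a condensed-vector index."""
    if not 0 <= i < j < n:
        raise ValueError("expected 0 <= i < j < n")
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... entries; then offset within row i.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# For n = 4 the pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3) map to 0..5.
pairs = [condensed_index(i, j, 4) for i in range(4) for j in range(i + 1, 4)]
```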
- haddock.modules.analysis.clustrmsd.clustrmsd.get_cluster_center(npw: ndarray, n_obs: int, rmsd_matrix: ndarray) int [source]
Get the cluster center.
- Parameters:
npw (np.ndarray) – Indexes of the cluster over cluster_list array
n_obs (int) – Number of overall observations (models).
rmsd_matrix (np.ndarray) – RMSD matrix.
- Returns:
cluster_center (int) – Index of cluster center
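One common way to define a cluster center is the member with the smallest summed RMSD to all other members. The sketch below illustrates that idea under two assumptions that this page does not confirm: that the RMSD matrix is stored in condensed (upper-triangle) form, and that the center is chosen by minimum summed distance. The name `cluster_center_sketch` is hypothetical.

```python
import numpy as np


def cluster_center_sketch(members, n_obs, condensed_rmsd):
    """Return the member index with the lowest summed RMSD to the other members."""

    def cidx(i, j):
        # Condensed index of the pair (i, j), in either order.
        i, j = min(i, j), max(i, j)
        return n_obs * i - i * (i + 1) // 2 + (j - i - 1)

    totals = [
        sum(condensed_rmsd[cidx(a, b)] for b in members if b != a)
        for a in members
    ]
    return int(members[int(np.argmin(totals))])


# Four models; pairs ordered (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
rmsd = np.array([1.0, 2.0, 9.0, 1.0, 9.0, 9.0])
center = cluster_center_sketch(np.array([0, 1, 2]), 4, rmsd)
```

Within the cluster {0, 1, 2}, model 1 has summed RMSD 2.0 against 3.0 for models 0 and 2, so it is picked as the center.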
- haddock.modules.analysis.clustrmsd.clustrmsd.get_clusters(dendrogram, tolerance, criterion)[source]
Obtain the clusters.
- haddock.modules.analysis.clustrmsd.clustrmsd.get_dendrogram(rmsd_matrix, linkage_type)[source]
Get and save the dendrogram.
- Parameters:
rmsd_matrix (numpy.ndarray) – Numpy array with the RMSD matrix.
linkage_type (str) – Linkage type for the clustering.
- Returns:
Z (numpy.ndarray) – Numpy array with the dendrogram.
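The `linkage_type`, `tolerance` and `criterion` arguments of `get_dendrogram` and `get_clusters` resemble the arguments of SciPy's hierarchical clustering routines. The sketch below shows that general workflow with `scipy.cluster.hierarchy.linkage` and `fcluster` on a toy condensed RMSD vector. Whether the module calls SciPy in exactly this way is an assumption, and the values are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy condensed RMSD vector for 4 models, pairs ordered
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3): models 0-1 and 2-3 are close.
condensed_rmsd = np.array([1.0, 9.0, 9.0, 9.0, 9.0, 1.0])

# Build the dendrogram (linkage matrix) with average linkage.
Z = linkage(condensed_rmsd, method="average")

# Cut the dendrogram at a distance tolerance to obtain flat cluster labels.
labels = fcluster(Z, t=2.0, criterion="distance")
```

With a cutoff of 2.0, models 0 and 1 share one label and models 2 and 3 share another.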
- haddock.modules.analysis.clustrmsd.clustrmsd.get_matrix_path(rmsd_matrix: RMSDFile) Path [source]
Return the RMSD matrix path from an RMSDFile object.
- Parameters:
rmsd_matrix (RMSDFile) – RMSDFile object with the path to the RMSD matrix.
- Returns:
matrix_fpath (Path) – Path to the RMSD matrix
- Raises:
TypeError – If input rmsd_matrix is not of type RMSDFile
- haddock.modules.analysis.clustrmsd.clustrmsd.iterate_min_population(cluster_arr: ndarray, min_population: int) ndarray [source]
Find one valid cluster satisfying the min_population parameter.
Logic: Iterate over the min_population values until we find at least one valid cluster.
- Parameters:
cluster_arr (np.ndarray) – Array of clusters.
min_population (int) – Minimum population required for a cluster.
- Returns:
new_cluster_arr (np.ndarray) – Array of clusters (unclustered structures are labelled with -1)
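The iteration described above can be sketched as follows. The page does not say in which direction `min_population` is varied, so this sketch assumes the threshold is lowered one step at a time until at least one cluster is large enough. The name `iterate_min_population_sketch` is hypothetical, and the input is assumed to contain no -1 labels yet.

```python
import numpy as np


def iterate_min_population_sketch(cluster_arr, min_population):
    """Relax the population threshold until at least one cluster survives."""
    cluster_arr = np.asarray(cluster_arr)
    labels, counts = np.unique(cluster_arr, return_counts=True)
    threshold = min_population
    # Assumption: lower the threshold until the largest cluster satisfies it.
    while threshold > 1 and counts.max() < threshold:
        threshold -= 1
    new_cluster_arr = cluster_arr.copy()
    # Clusters still below the relaxed threshold are marked as unclustered.
    small = labels[counts < threshold]
    new_cluster_arr[np.isin(new_cluster_arr, small)] = -1
    return new_cluster_arr


relaxed = iterate_min_population_sketch(np.array([1, 1, 2, 2, 2, 3]), 5)
```

In this example no cluster has 5 members, so the threshold drops to 3 and only cluster 2 is kept.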
- haddock.modules.analysis.clustrmsd.clustrmsd.order_clusters(cluster_arr)[source]
Order the clusters by population.
- The most populated cluster will be assigned the ID 1, the second most
populated the ID 2, and so on.
- Parameters:
cluster_arr (np.ndarray) – Array of clusters.
- Returns:
clusters (list) – List of clusters.
cluster_arr (np.ndarray) – Array of clusters.
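The relabelling by population can be sketched as below. This is illustrative rather than the module's code: the name `order_clusters_sketch` is hypothetical, and leaving -1 (unclustered) labels untouched and breaking ties by original label order are assumptions.

```python
import numpy as np


def order_clusters_sketch(cluster_arr):
    """Relabel clusters so that ID 1 is the most populated, ID 2 the next, etc."""
    cluster_arr = np.asarray(cluster_arr)
    # Assumption: -1 marks unclustered structures and is not ranked.
    labels, counts = np.unique(cluster_arr[cluster_arr != -1], return_counts=True)
    # Stable sort on negated counts gives descending population.
    order = np.argsort(-counts, kind="stable")
    mapping = {int(old): new for new, old in enumerate(labels[order], start=1)}
    relabelled = np.array([mapping.get(int(c), -1) for c in cluster_arr])
    return list(mapping.values()), relabelled


clusters, ordered = order_clusters_sketch(np.array([5, 5, 2, 2, 2, -1, 9]))
```

Original cluster 2 (three members) becomes ID 1, cluster 5 becomes ID 2, and cluster 9 becomes ID 3.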
- haddock.modules.analysis.clustrmsd.clustrmsd.read_matrix(rmsd_matrix: RMSDFile) ndarray [source]
Read the RMSD matrix.
- Parameters:
rmsd_matrix (RMSDFile) – RMSDFile object with the path to the RMSD matrix.
- Returns:
matrix (numpy.ndarray) – Numpy array with the RMSD matrix.
- haddock.modules.analysis.clustrmsd.clustrmsd.write_clusters(clusters, cluster_arr, models, rmsd_matrix, out_filename='cluster.out', centers=False)[source]
Write the clusters to a file.
- Parameters:
clusters (list) – List of clusters to write.
cluster_arr (np.array) – Array with the cluster assignment for each model.
models (list) – List of models.
rmsd_matrix (np.array) – RMSD matrix.
out_filename (str, optional) – Output filename. The default is “cluster.out”.
centers (bool, optional) – Whether to calculate the cluster centers. The default is False.
- Returns:
- haddock.modules.analysis.clustrmsd.clustrmsd.write_clustrmsd_file(clusters, clt_dic, cluster_centers, score_dic, sorted_score_dic, params, output_fname='clustrmsd.txt')[source]
Write the clustrmsd.txt file.
- Parameters:
clusters (np.ndarray) – Array of clusters.
clt_dic (dict) – Dictionary with the clusters.
cluster_centers (dict) – Dictionary with the cluster centers.
score_dic (dict) – Dictionary with the scores.
sorted_score_dic (dict) – Dictionary with the sorted scores.
params (dict) – Dictionary with the clustering parameters.
output_fname (str) – Output filename.