haddock.modules.analysis.clustrmsd.clustrmsd module

RMSD clustering.

haddock.modules.analysis.clustrmsd.clustrmsd.apply_min_population(cluster_arr: ndarray, min_population: int) → ndarray[source]

Apply min_population to cluster list.

Parameters:

cluster_arr (np.ndarray) – Array of clusters.
min_population (int) – Minimum number of models a cluster must contain to be kept.

Returns:

cluster_arr (np.ndarray) – Array of clusters (unclustered structures are labelled with -1)
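
As an illustration of the filtering described above, here is a minimal sketch, not the module's actual implementation. The name mask_small_clusters is hypothetical, and the sketch assumes that cluster_arr holds one integer cluster label per model.

```python
import numpy as np


def mask_small_clusters(cluster_arr, min_population):
    """Label members of clusters smaller than min_population with -1.

    Hypothetical helper written for illustration only.
    """
    # Work on a copy so the caller's array is left untouched.
    new_cluster_arr = cluster_arr.copy()
    for cl_id in np.unique(cluster_arr):
        members = np.where(cluster_arr == cl_id)[0]
        if len(members) < min_population:
            new_cluster_arr[members] = -1
    return new_cluster_arr


labels = np.array([1, 1, 1, 2, 2, 3])
filtered = mask_small_clusters(labels, min_population=2)
print(filtered)
```

In this example cluster 3 has a single member, so that model ends up labelled -1 while clusters 1 and 2 are kept.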

haddock.modules.analysis.clustrmsd.clustrmsd.cond_index(i: int, j: int, n: int) → float[source]

Get the condensed index from two matrix indexes.

Parameters:

i (int) – Index of the first element.
j (int) – Index of the second element.
n (int) – Number of observations.
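
For readers unfamiliar with condensed distance matrices, the sketch below shows the standard mapping from a pair (i, j), with i < j, to its position in the flattened upper triangle of an n x n matrix. The name condensed_index is hypothetical, and the sketch returns an int, whereas the documented function is annotated as returning a float (so its result would need casting before being used as an array index).

```python
def condensed_index(i, j, n):
    """Position of pair (i, j), with i < j, in a condensed matrix of n items.

    Hypothetical helper written for illustration only.
    """
    if not 0 <= i < j < n:
        raise ValueError("expected 0 <= i < j < n")
    # Number of pairs stored in the rows before row i,
    # plus the offset of column j within row i.
    return n * (n - 1) // 2 - (n - i) * (n - i - 1) // 2 + j - i - 1


n = 4
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
indexes = [condensed_index(i, j, n) for i, j in pairs]
print(indexes)
```

With n = 4 the six pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3) map to consecutive positions starting at 0.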

haddock.modules.analysis.clustrmsd.clustrmsd.get_cluster_center(npw: ndarray, n_obs: int, rmsd_matrix: ndarray) → int[source]

Get the cluster center.

Parameters:

npw (np.ndarray) – Indexes of the cluster's members in the cluster_list array.
n_obs (int) – Number of overall observations (models).
rmsd_matrix (np.ndarray) – RMSD matrix.

Returns:

cluster_center (int) – Index of cluster center

haddock.modules.analysis.clustrmsd.clustrmsd.get_clusters(dendrogram, tolerance, criterion)[source]

Obtain the clusters.

haddock.modules.analysis.clustrmsd.clustrmsd.get_dendrogram(rmsd_matrix, linkage_type)[source]

Get and save the dendrogram.

Parameters:

rmsd_matrix (numpy.ndarray) – Numpy array with the RMSD matrix.
linkage_type (str) – Linkage type for the clustering.

Returns:

Z (numpy.ndarray) – Numpy array with the dendrogram.
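
The two steps of building a dendrogram and cutting it into clusters can be sketched with SciPy's hierarchical clustering routines. This is an assumption about what get_dendrogram and get_clusters wrap, since the page does not say so. The mapping of linkage_type to method, and of tolerance and criterion to t and criterion, is likewise assumed.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Condensed RMSD values for 4 models, pair order: 01, 02, 03, 12, 13, 23.
# Models 0 and 1 are close to each other, as are models 2 and 3.
rmsd_condensed = np.array([0.5, 8.0, 8.2, 7.9, 8.1, 0.6])

# Build the dendrogram (SciPy calls it a linkage matrix) with average linkage.
Z = linkage(rmsd_condensed, method="average")

# Cut the dendrogram: models joined below a distance of 2.0 share a cluster.
cluster_arr = fcluster(Z, t=2.0, criterion="distance")
print(Z.shape, cluster_arr)
```

The linkage matrix has one row per merge (n - 1 rows for n models), and the cut yields two clusters: models 0 and 1 in one, models 2 and 3 in the other.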

haddock.modules.analysis.clustrmsd.clustrmsd.get_matrix_path(rmsd_matrix: RMSDFile) → Path[source]

Return the RMSD matrix path from an RMSDFile object.

Parameters:

rmsd_matrix (RMSDFile) – RMSDFile object with the path to the RMSD matrix.

Returns:

matrix_fpath (Path) – Path to the RMSD matrix.

Raises:

TypeError – If input rmsd_matrix is not of type RMSDFile.

haddock.modules.analysis.clustrmsd.clustrmsd.iterate_min_population(cluster_arr: ndarray, min_population: int) → ndarray[source]

Find one valid cluster satisfying the min_population parameter.

Logic: Iterate over the min_population values until we find at least one valid cluster.

Parameters:

cluster_arr (np.ndarray) – Array of clusters.
min_population (int) – Initial minimum number of models a cluster must contain to be kept.

Returns:

new_cluster_arr (np.ndarray) – Array of clusters (unclustered structures are labelled with -1)
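
The iteration described above could look like the sketch below. It assumes that the threshold is lowered by one at each step until at least one cluster survives. The direction of the iteration is an assumption, and both function names are hypothetical.

```python
import numpy as np


def mask_small_clusters(cluster_arr, min_population):
    """Label members of clusters smaller than min_population with -1."""
    new_cluster_arr = cluster_arr.copy()
    for cl_id in np.unique(cluster_arr):
        members = np.where(cluster_arr == cl_id)[0]
        if len(members) < min_population:
            new_cluster_arr[members] = -1
    return new_cluster_arr


def relax_min_population(cluster_arr, min_population):
    """Lower the threshold step by step until one cluster survives.

    Assumed behaviour, for illustration only.
    """
    for threshold in range(min_population, 0, -1):
        new_cluster_arr = mask_small_clusters(cluster_arr, threshold)
        if np.any(new_cluster_arr != -1):
            return new_cluster_arr
    # Nothing to filter (empty input or non-positive threshold).
    return cluster_arr.copy()


labels = np.array([1, 1, 2, 2, 3])
relaxed = relax_min_population(labels, min_population=4)
print(relaxed)
```

In this example no cluster has 4 or 3 members, so the threshold drops to 2, at which point clusters 1 and 2 are kept and the lone member of cluster 3 is labelled -1.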

haddock.modules.analysis.clustrmsd.clustrmsd.order_clusters(cluster_arr)[source]

Order the clusters by population.

The most populated cluster will be assigned the ID 1, the second most populated the ID 2, and so on.

Parameters:

cluster_arr (np.ndarray) – Array of clusters.

Returns:

clusters (list) – List of clusters.
cluster_arr (np.ndarray) – Array of clusters.
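
A possible way to renumber clusters by population is sketched below. Several details are assumptions that the page does not confirm: unclustered models labelled -1 are left as they are, ties are broken by the lower original ID, and the returned list holds the new cluster IDs. The name relabel_by_population is hypothetical.

```python
import numpy as np


def relabel_by_population(cluster_arr):
    """Renumber clusters so that ID 1 is the most populated.

    Assumed behaviour, for illustration only.
    """
    clustered = cluster_arr[cluster_arr != -1]
    ids, counts = np.unique(clustered, return_counts=True)
    # Sort by decreasing population; ties keep the lower original ID first.
    order = np.lexsort((ids, -counts))
    new_arr = np.full_like(cluster_arr, -1)
    clusters = []
    for new_id, pos in enumerate(order, start=1):
        new_arr[cluster_arr == ids[pos]] = new_id
        clusters.append(new_id)
    return clusters, new_arr


old = np.array([5, 5, 2, 2, 2, -1, 7])
clusters, ordered = relabel_by_population(old)
print(clusters, ordered)
```

Original cluster 2 has three members and becomes ID 1, cluster 5 becomes ID 2, and cluster 7 becomes ID 3.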

haddock.modules.analysis.clustrmsd.clustrmsd.read_matrix(rmsd_matrix: RMSDFile) → ndarray[source]

Read the RMSD matrix.

Parameters:

rmsd_matrix (RMSDFile) – RMSDFile object with the path to the RMSD matrix.

Returns:

matrix (numpy.ndarray) – Numpy array with the RMSD matrix.

haddock.modules.analysis.clustrmsd.clustrmsd.write_clusters(clusters, cluster_arr, models, rmsd_matrix, out_filename='cluster.out', centers=False)[source]

Write the clusters to a file.

Parameters:

clusters (list) – List of clusters to write.
cluster_arr (np.ndarray) – Array with the cluster assignment for each model.
models (list) – List of models.
rmsd_matrix (np.ndarray) – RMSD matrix.
out_filename (str, optional) – Output filename. The default is “cluster.out”.
centers (bool, optional) – Whether to calculate the cluster centers. The default is False.

Returns:

clt_dic (dict) – Dictionary with the clusters.
cluster_centers (dict) – Dictionary with the cluster ID as key and the cluster center as value.

haddock.modules.analysis.clustrmsd.clustrmsd.write_clustrmsd_file(clusters, clt_dic, cluster_centers, score_dic, sorted_score_dic, params, output_fname='clustrmsd.txt')[source]

Write the clustrmsd.txt file.

Parameters:

clusters (np.ndarray) – Array of clusters.
clt_dic (dict) – Dictionary with the clusters.
cluster_centers (dict) – Dictionary with the cluster centers.
score_dic (dict) – Dictionary with the scores.
sorted_score_dic (dict) – Dictionary with the sorted scores.
params (dict) – Dictionary with the clustering parameters.
output_fname (str) – Output filename.