haddock.modules.analysis.clustrmsd.clustrmsd module

RMSD clustering.

haddock.modules.analysis.clustrmsd.clustrmsd.apply_min_population(cluster_arr: ndarray, min_population: int) ndarray[source]

Apply the min_population threshold to the cluster array.

Parameters:
  • cluster_arr (np.ndarray) – Array of clusters.

  • min_population (int) – Minimum number of members a cluster must have to be kept.

Returns:

cluster_arr (np.ndarray) – Array of clusters (unclustered structures are labelled with -1)
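
As an illustration of the thresholding described above, the following is a minimal sketch and not the module's actual implementation. The function name apply_min_population_sketch and the example array are hypothetical; the only behaviour taken from the documentation is that structures left without a cluster are labelled -1.

```python
import numpy as np


def apply_min_population_sketch(cluster_arr: np.ndarray, min_population: int) -> np.ndarray:
    """Label members of under-populated clusters with -1 (illustrative only)."""
    # Count how many structures carry each cluster label.
    labels, counts = np.unique(cluster_arr, return_counts=True)
    # Labels whose population is below the threshold.
    too_small = labels[counts < min_population]
    out = cluster_arr.copy()
    out[np.isin(out, too_small)] = -1
    return out


# Hypothetical input: cluster 3 has a single member and is dropped.
example = np.array([1, 1, 1, 2, 2, 3])
filtered = apply_min_population_sketch(example, min_population=2)
```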

haddock.modules.analysis.clustrmsd.clustrmsd.cond_index(i: int, j: int, n: int) float[source]

Get the condensed index from two matrix indexes.

Parameters:
  • i (int) – Index of the first element.

  • j (int) – Index of the second element.

  • n (int) – Number of observations.
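
The sketch below shows one common way to map a pair of square-matrix indexes to a condensed index. It assumes the SciPy condensed distance layout (upper triangle, row by row) and i < j; the documentation above does not state which layout the module uses, and cond_index_sketch is a hypothetical name.

```python
import numpy as np
from scipy.spatial.distance import squareform


def cond_index_sketch(i: int, j: int, n: int) -> float:
    """Position of pair (i, j), with i < j, in a condensed vector of n observations."""
    # Assumed layout: (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
    return n * (n - 1) / 2 - (n - i) * (n - i - 1) / 2 + j - i - 1


n = 4
condensed = np.arange(1.0, 7.0)  # 4 * 3 / 2 = 6 pairwise values
square = squareform(condensed)   # equivalent 4 x 4 symmetric matrix
idx = int(cond_index_sketch(1, 3, n))
```

The division by 2 makes the result a float, which matches the documented return type; it has to be cast to int before being used as an array index.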

haddock.modules.analysis.clustrmsd.clustrmsd.get_cluster_center(npw: ndarray, n_obs: int, rmsd_matrix: ndarray) int[source]

Get the center of a cluster.

Parameters:
  • npw (np.ndarray) – Indexes of the cluster over cluster_list array

  • n_obs (int) – Number of overall observations (models).

  • rmsd_matrix (np.ndarray) – RMSD matrix.

Returns:

cluster_center (int) – Index of cluster center
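
One plausible definition of a cluster center is the medoid: the member with the smallest summed RMSD to all other members. The sketch below implements that definition under two assumptions that the documentation does not confirm: the RMSD matrix is stored in condensed form, and the returned value is the model index. All names are hypothetical.

```python
import numpy as np


def cond_index_sketch(i: int, j: int, n: int) -> int:
    """Condensed index of pair (i, j), i < j (assumed SciPy layout)."""
    return n * (n - 1) // 2 - (n - i) * (n - i - 1) // 2 + j - i - 1


def cluster_center_sketch(members: np.ndarray, n_obs: int, condensed_rmsd: np.ndarray) -> int:
    """Return the member with the lowest summed RMSD to the other members."""
    best_idx, best_sum = -1, float("inf")
    for a in members:
        total = 0.0
        for b in members:
            if a == b:
                continue
            i, j = (a, b) if a < b else (b, a)
            total += condensed_rmsd[cond_index_sketch(i, j, n_obs)]
        if total < best_sum:
            best_idx, best_sum = int(a), total
    return best_idx


# Hypothetical data: 4 models, pairs ordered (0,1),(0,2),(0,3),(1,2),(1,3),(2,3).
rmsd = np.array([1.0, 2.0, 9.0, 1.0, 9.0, 9.0])
center = cluster_center_sketch(np.array([0, 1, 2]), n_obs=4, condensed_rmsd=rmsd)
```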

haddock.modules.analysis.clustrmsd.clustrmsd.get_clusters(dendrogram, tolerance, criterion)[source]

Obtain the clusters.
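
The parameter names suggest that a dendrogram is cut at a given tolerance. A minimal sketch of that step with scipy.cluster.hierarchy.fcluster follows; whether get_clusters wraps fcluster in exactly this way is an assumption, and the RMSD values are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical condensed RMSD values for 4 models:
# pairs ordered (0,1),(0,2),(0,3),(1,2),(1,3),(2,3).
condensed = np.array([0.5, 8.0, 8.5, 8.2, 8.4, 0.4])

dendrogram = linkage(condensed, method="average")
# Cut the tree so that members of a cluster are joined below a distance of 2.0.
cluster_arr = fcluster(dendrogram, t=2.0, criterion="distance")
```

With these values, models 0 and 1 end up in one cluster and models 2 and 3 in another.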

haddock.modules.analysis.clustrmsd.clustrmsd.get_dendrogram(rmsd_matrix, linkage_type)[source]

Get and save the dendrogram.

Parameters:
  • rmsd_matrix (numpy.ndarray) – Numpy array with the RMSD matrix.

  • linkage_type (str) – Linkage type for the clustering.

Returns:

Z (numpy.ndarray) – Numpy array with the dendrogram.
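
A dendrogram of this kind can be built with scipy.cluster.hierarchy.linkage, as sketched below. The sketch assumes the RMSD matrix is passed in condensed (1-D) form and leaves out the saving step, because the output file format is not documented here. The input values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical condensed RMSD values for 4 models.
condensed = np.array([0.5, 8.0, 8.5, 8.2, 8.4, 0.4])

# linkage_type would be passed as the method, e.g. "average" or "complete".
Z = linkage(condensed, method="average")

# Each row of Z describes one merge: the two merged nodes, the distance
# at which they merge, and the size of the resulting cluster.
n_merges, n_columns = Z.shape
```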

haddock.modules.analysis.clustrmsd.clustrmsd.get_matrix_path(rmsd_matrix: RMSDFile) Path[source]

Return the path to the RMSD matrix from an RMSDFile object.

Parameters:

rmsd_matrix (RMSDFile) – RMSDFile object with the path to the RMSD matrix.

Returns:

matrix_fpath (Path) – Path to the RMSD matrix

Raises:

TypeError – If input rmsd_matrix is not of type RMSDFile

haddock.modules.analysis.clustrmsd.clustrmsd.iterate_min_population(cluster_arr: ndarray, min_population: int) ndarray[source]

Find at least one valid cluster satisfying the min_population parameter.

Logic: Iterate over the min_population values until we find at least one valid cluster.

Parameters:
  • cluster_arr (np.ndarray) – Array of clusters.

  • min_population (int) – Initial minimum number of members a cluster must have to be kept.

Returns:

new_cluster_arr (np.ndarray) – Array of clusters (unclustered structures are labelled with -1)
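
The iteration described above can be sketched as follows. The sketch assumes that the threshold is lowered one step at a time until at least one cluster survives; the documentation does not state the direction or the step of the iteration, and the function name is hypothetical.

```python
import numpy as np


def iterate_min_population_sketch(cluster_arr: np.ndarray, min_population: int) -> np.ndarray:
    """Relax the threshold until at least one cluster is kept (illustrative only)."""
    labels, counts = np.unique(cluster_arr, return_counts=True)
    threshold = min_population
    # Assumption: decrease the threshold by 1 until some cluster reaches it.
    while threshold > 1 and not np.any(counts >= threshold):
        threshold -= 1
    too_small = labels[counts < threshold]
    # Members of clusters below the final threshold are labelled -1.
    return np.where(np.isin(cluster_arr, too_small), -1, cluster_arr)


# Hypothetical input: no cluster has 4 members, so the threshold drops to 2.
relaxed = iterate_min_population_sketch(np.array([1, 1, 2, 3]), min_population=4)
```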

haddock.modules.analysis.clustrmsd.clustrmsd.order_clusters(cluster_arr)[source]

Order the clusters by population.

The most populated cluster will be assigned the ID 1, the second most populated the ID 2, and so on.

Parameters:

cluster_arr (np.ndarray) – Array of clusters.

Returns:

  • clusters (list) – List of clusters.

  • cluster_arr (np.ndarray) – Array of clusters.
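
The relabelling by population can be sketched as below. This is an illustration under assumptions, not the module's code: it keeps the -1 label for unclustered structures, excludes it from the returned list, and breaks ties by the original label order. The function name is hypothetical.

```python
import numpy as np


def order_clusters_sketch(cluster_arr: np.ndarray):
    """Renumber clusters so that ID 1 is the most populated (illustrative only)."""
    # Ignore unclustered structures (assumed to be labelled -1).
    labels, counts = np.unique(cluster_arr[cluster_arr != -1], return_counts=True)
    # Stable sort on negated counts: largest population first.
    order = np.argsort(-counts, kind="stable")
    mapping = {int(old): new for new, old in enumerate(labels[order], start=1)}
    relabelled = np.array([mapping.get(int(c), -1) for c in cluster_arr])
    return sorted(mapping.values()), relabelled


# Hypothetical input: label 2 has 3 members, label 5 has 2, label 9 has 1.
clusters, ordered_arr = order_clusters_sketch(np.array([5, 5, 2, 2, 2, -1, 9]))
```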

haddock.modules.analysis.clustrmsd.clustrmsd.read_matrix(rmsd_matrix: RMSDFile) ndarray[source]

Read the RMSD matrix.

Parameters:

rmsd_matrix (RMSDFile) – RMSDFile object with the path to the RMSD matrix.

Returns:

matrix (numpy.ndarray) – Numpy array with the RMSD matrix.

haddock.modules.analysis.clustrmsd.clustrmsd.write_clusters(clusters, cluster_arr, models, rmsd_matrix, out_filename='cluster.out', centers=False)[source]

Write the clusters to a file.

Parameters:
  • clusters (list) – List of clusters to write.

  • cluster_arr (np.ndarray) – Array with the cluster assignment for each model.

  • models (list) – List of models.

  • rmsd_matrix (np.ndarray) – RMSD matrix.

  • out_filename (str, optional) – Output filename. The default is “cluster.out”.

  • centers (bool, optional) – Whether to calculate the cluster centers. The default is False.

Returns:

  • clt_dic (dict) – Dictionary with the clusters.

  • cluster_centers (dict) – Dictionary with the cluster ID as key and the cluster center as value.

haddock.modules.analysis.clustrmsd.clustrmsd.write_clustrmsd_file(clusters, clt_dic, cluster_centers, score_dic, sorted_score_dic, params, output_fname='clustrmsd.txt')[source]

Write the clustrmsd.txt file.

Parameters:
  • clusters (np.ndarray) – Array of clusters.

  • clt_dic (dict) – Dictionary with the clusters.

  • cluster_centers (dict) – Dictionary with the cluster centers.

  • score_dic (dict) – Dictionary with the scores.

  • sorted_score_dic (dict) – Dictionary with the sorted scores.

  • params (dict) – Dictionary with the clustering parameters.

  • output_fname (str) – Output filename.