Writing a benchmark.yaml file

The benchmark.yaml file is a YAML configuration file that haddock-runner uses to run the benchmark. A single configuration file can define multiple scenarios, each being a set of parameters used to run HADDOCK.

This file should be the replicable part of the benchmark, i.e. the part that you want to share with others. It should contain all the information needed to run the benchmark, alongside the input list.

This file is divided into three main sections: general, slurm and scenarios.
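
As a rough sketch of the overall layout, the three top-level keys can be pictured as below. The comments are placeholders, not values; the fields of each section are described in the rest of this page.

```yaml
# Skeleton of a benchmark.yaml file (placeholders only)
general:
  # executable, max_concurrent, haddock_dir, suffixes, input_list, work_dir
slurm:
  # optional: partition, cpus_per_task, ...
scenarios:
  # list of scenarios, each with a name and its parameters
```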

General section

Here you must define the following parameters:

  • executable: Path to the run-haddock.sh script (see here for more details).
  • max_concurrent: Maximum number of jobs that can be executed at a given time
  • haddock_dir: Path to the HADDOCK installation; this is used to validate the parameters of the scenarios section
  • receptor_suffix: Suffix used to identify the receptor files
  • ligand_suffix: Suffix used to identify the ligand files
  • shape_suffix: Suffix used to identify the shape files
  • input_list: The path to the input list (see here for more details)
  • work_dir: The path where the results will be stored

See below an example:

general:
  executable: /workspaces/haddock-runner/example/haddock3.sh
  max_concurrent: 4
  haddock_dir: /opt/haddock3
  receptor_suffix: _r_u
  ligand_suffix: _l_u
  input_list: /workspaces/haddock-runner/example/input_list.txt
  work_dir: /workspaces/haddock-runner/bm-goes-here
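
The example above does not set shape_suffix. A minimal sketch of a general section that includes it could look like the following; the value `_shape` is a hypothetical suffix chosen for illustration, so use whatever suffix your shape files actually carry.

```yaml
general:
  executable: /workspaces/haddock-runner/example/haddock3.sh
  max_concurrent: 4
  haddock_dir: /opt/haddock3
  receptor_suffix: _r_u
  ligand_suffix: _l_u
  shape_suffix: _shape # hypothetical suffix, adjust to your file names
  input_list: /workspaces/haddock-runner/example/input_list.txt
  work_dir: /workspaces/haddock-runner/bm-goes-here
```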

Slurm section

This section is optional but highly recommended! For these parameters to take effect, you must be running the benchmark in an HPC environment. They are used internally by the runner to compose the .job file. You can define the following parameters; if left blank, SLURM will pick up the default values:

  • partition: The name of the partition to be used
  • cpus_per_task: Number of CPUs per task
  • ntasks_per_node: Number of tasks per node
  • nodes: Number of nodes
  • time: Maximum time for the job
  • account: Account to be used
  • mail_user: Email to be notified when the job starts and ends

See below an example:

slurm:
  partition: short # use the short partition
  cpus_per_task: 8 # use 8 cores per task
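
For reference, a sketch that sets every parameter listed above might look like this. All values are hypothetical placeholders (partition and account names depend on your cluster). Quoting the time value is an assumption made so that YAML reads it as a string rather than a number; the exact time format the runner accepts is not confirmed here.

```yaml
slurm:
  partition: short # hypothetical partition name
  cpus_per_task: 8
  ntasks_per_node: 1
  nodes: 1
  time: "01:00:00" # hypothetical limit; quoted so YAML keeps it as a string
  account: my-project # hypothetical account name
  mail_user: user@example.org # placeholder address
```

Any parameter you do not need can simply be left out, in which case SLURM falls back to its default.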

Scenario section

Here you must define the scenarios that you want to run. These are slightly different for HADDOCK2.4 and HADDOCK3.0.

HADDOCK2.4

For HADDOCK2.4 you must define the following:

  • name: the name of the scenario
  • parameters: the parameters to be used in the scenario
    • run_cns: parameters that will be used in the run.cns file
    • restraints: patterns used to identify the restraints files
      • ambig: pattern used to identify the ambiguous restraints file
      • unambig: pattern used to identify the unambiguous restraints file
      • hbonds: pattern used to identify the hydrogen bonds restraints file
    • custom_toppar: patterns used to identify the custom topology files
      • topology: pattern used to identify the topology file
      • param: pattern used to identify the parameter file

See below an example:

# HADDOCK2.4
scenarios:
  - name: true-interface
    parameters:
      run_cns:
        noecv: false
        structures_0: 1000
        structures_1: 200
        waterrefine: 200
      restraints:
        ambig: ti

HADDOCK3.0

Note: HADDOCK3.0 is still under development and is not meant to be used for production runs! Please use HADDOCK2.4 instead. For information about the available modules, please refer to the HADDOCK3 tutorial and the documentation.

For HADDOCK3.0 you must define the following:

  • name: the name of the scenario
  • parameters: the parameters to be used in the scenario
    • general: general parameters; those are the ones defined in the "top" section of the run.toml script
    • modules: this subsection is related to the parameters of each module in HADDOCK3.0
      • order: the order of the modules to be used in HADDOCK3.0
      • <module-name>: parameters for the module

See below an example:

# HADDOCK3.0
scenarios:
  - name: true-interface
    parameters:
      general:
        mode: local
        ncores: 4

      modules:
        order: [topoaa, rigidbody, seletop, flexref, emref]
        topoaa:
          autohis: true
        rigidbody:
          ambig_fname: _ti.tbl
        seletop:
          select: 200
        flexref:
        emref:
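
Each scenario carries its own modules subsection, so different scenarios can presumably describe different workflows. The sketch below repeats the example above and adds a second, shorter scenario; the second scenario's name is hypothetical, and it only reuses modules and parameters that already appear in the first one.

```yaml
# HADDOCK3.0, two scenarios with different workflows
scenarios:
  - name: true-interface
    parameters:
      general:
        mode: local
        ncores: 4
      modules:
        order: [topoaa, rigidbody, seletop, flexref, emref]
        topoaa:
          autohis: true
        rigidbody:
          ambig_fname: _ti.tbl
        seletop:
          select: 200
        flexref:
        emref:

  - name: true-interface-rigid-only # hypothetical scenario name
    parameters:
      general:
        mode: local
        ncores: 4
      modules:
        order: [topoaa, rigidbody] # shorter workflow, stops after rigidbody
        topoaa:
          autohis: true
        rigidbody:
          ambig_fname: _ti.tbl
```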