# Writing a `benchmark.yaml` file
The `benchmark.yaml` file is a configuration file in YAML format that is used by `haddock-runner` to run the benchmark. The idea is that one configuration file can define multiple scenarios, each scenario being a set of parameters that will be used to run HADDOCK.

This file should be the replicable part of the benchmark, i.e. the part that you want to share with others. It should contain all the information needed to run the benchmark, alongside the input list.

This file is divided into three main sections: `general`, `slurm` and `scenarios`.
## General section
Here you must define the following parameters:

- `executable`: path to the `run-haddock.sh` script (see here for more details)
- `max_concurrent`: maximum number of jobs that can be executed at a given time
- `haddock_dir`: path to the HADDOCK installation; this is used to validate the parameters of the `scenarios` section
- `receptor_suffix`: the suffix used to identify the receptor files
- `ligand_suffix`: the suffix used to identify the ligand files
- `shape_suffix`: the suffix used to identify the shape files
- `input_list`: the path to the input list (see here for more details)
- `work_dir`: the path where the results will be stored
See below an example:
```yaml
general:
  executable: /workspaces/haddock-runner/example/haddock3.sh
  max_concurrent: 4
  haddock_dir: /opt/haddock3
  receptor_suffix: _r_u
  ligand_suffix: _l_u
  input_list: /workspaces/haddock-runner/example/input_list.txt
  work_dir: /workspaces/haddock-runner/bm-goes-here
```
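The suffix fields work on file-name stems: with `receptor_suffix: _r_u`, a file such as `1abc_r_u.pdb` would be picked up as a receptor. A minimal sketch of that matching logic (the `classify` helper and the file names are illustrative, not part of `haddock-runner`):

```python
from pathlib import Path

# Suffixes taken from the example configuration above
RECEPTOR_SUFFIX = "_r_u"
LIGAND_SUFFIX = "_l_u"

def classify(filename: str) -> str:
    """Classify an input file by the suffix of its stem (hypothetical helper)."""
    stem = Path(filename).stem  # e.g. "1abc_r_u.pdb" -> "1abc_r_u"
    if stem.endswith(RECEPTOR_SUFFIX):
        return "receptor"
    if stem.endswith(LIGAND_SUFFIX):
        return "ligand"
    return "unknown"

print(classify("1abc_r_u.pdb"))  # receptor
print(classify("1abc_l_u.pdb"))  # ligand
```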
## Slurm section

This section is optional but highly recommended! For these settings to take effect you must be running the benchmark in an HPC environment. They are used internally by the runner to compose the `.job` file. Here you can define the following parameters; if left blank, SLURM will pick up the default values:
- `partition`: the name of the partition to be used
- `cpus_per_task`: number of CPUs per task
- `ntasks_per_node`: number of tasks per node
- `nodes`: number of nodes
- `time`: maximum time for the job
- `account`: account to be used
- `mail_user`: email to be notified when the job starts and ends
See below an example:
```yaml
slurm:
  partition: short # use the short partition
  cpus_per_task: 8 # use 8 cores per task
```
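With the settings above, the SLURM header of the composed `.job` file would look roughly like this (an illustrative sketch; the actual job body is generated by `haddock-runner`):

```shell
#!/bin/bash
#SBATCH --partition=short    # from slurm.partition
#SBATCH --cpus-per-task=8    # from slurm.cpus_per_task
# ... job body generated by haddock-runner ...
```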
## Scenario section

Here you must define the scenarios that you want to run. These are slightly different for HADDOCK2.4 and HADDOCK3.0.
### HADDOCK2.4

For HADDOCK2.4 you must define the following:

- `name`: the name of the scenario
- `parameters`: the parameters to be used in the scenario
  - `run_cns`: parameters that will be used in the `run.cns` file
  - `restraints`: patterns used to identify the restraints files
    - `ambig`: pattern used to identify the ambiguous restraints file
    - `unambig`: pattern used to identify the unambiguous restraints file
    - `hbonds`: pattern used to identify the hydrogen-bonds restraints file
  - `custom_toppar`: patterns used to identify the custom topology files
    - `topology`: pattern used to identify the topology file
    - `param`: pattern used to identify the parameter file
```yaml
# HADDOCK2.4
scenarios:
  - name: true-interface
    parameters:
      run_cns:
        noecv: false
        structures_0: 1000
        structures_1: 200
        waterrefine: 200
      restraints:
        ambig: ti
```
### HADDOCK3.0
Note: HADDOCK3.0 is still under development and is not meant to be used for production runs! Please use HADDOCK2.4 instead. For information about the available modules, please refer to the HADDOCK3 tutorial and the documentation.
For HADDOCK3.0 you must define the following:

- `name`: the name of the scenario
- `parameters`: the parameters to be used in the scenario
  - `general`: general parameters; these are the ones defined in the "top" section of the `run.toml` file
  - `modules`: this subsection holds the parameters of each module in HADDOCK3.0
    - `order`: the order of the modules to be used in HADDOCK3.0
    - `<module-name>`: parameters for the module
```yaml
# HADDOCK3.0
scenarios:
  - name: true-interface
    parameters:
      general:
        mode: local
        ncores: 4
      modules:
        order: [topoaa, rigidbody, seletop, flexref, emref]
        topoaa:
          autohis: true
        rigidbody:
          ambig_fname: _ti.tbl
        seletop:
          select: 200
        flexref:
        emref:
```
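For orientation, the scenario above maps roughly onto a HADDOCK3 `run.toml`: the `general` keys become top-level parameters and each entry under `modules` becomes its own section, in the order given. The following is an illustrative sketch of that correspondence, not a complete, runnable `run.toml` (restraint file names are resolved from the pattern by the runner):

```toml
# Top-level ("general") parameters
mode = "local"
ncores = 4

# Modules, in the order listed under `order`
[topoaa]
autohis = true

[rigidbody]
ambig_fname = "..."  # resolved from the _ti.tbl pattern by the runner

[seletop]
select = 200

[flexref]

[emref]
```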