HPC Parallel

Running on HPC machines is slightly more complex than the simple parallelism gained by using dask-distributed alone. The LocalCluster implementation enabled by the “--parallel” flag often will not launch on HPC platforms because of how they are configured, so we have implemented an HPC-friendly option that uses both dask-distributed and dask-jobqueue. Because of the complexity of how HPC machines operate, these workloads can be submitted in several different ways, and several parameters can be configured manually. The table below shows the HPC-specific CLI flags, what they do and their default values.

Flag              Description                                                                       Default
----------------  --------------------------------------------------------------------------------  ------------
--conda-exec      conda/mamba executable override; use if having issues.                            autodetect
--conda-env       Override for the name of the environment to load.                                 autodetect
--conda-path      Override for conda install.                                                       autodetect
--hpc             Turn on HPC parallelism.                                                          False
--hpc-account     Set HPC project/account to use for submissions.                                   empty string
--hpc-constraint  Constraints to apply to job allocation, such as hardware generation.              empty string
--hpc-cores       Override for the number of cores requested per node.                              autodetect
--hpc-memory      Override for memory request.                                                      autodetect
--hpc-nodes       How many nodes the job should run on.                                             1
--hpc-processes   Override for how many dask processes per node. This should usually match cores.   autodetect
--hpc-queue       Override for queue name to submit to.                                             standard
--hpc-qos         Override for QoS.                                                                 empty string
--hpc-walltime    Override for the wall time the job should request.                                24:00:00
--submit          Have WaterEntropy submit itself to an HPC cluster.                                False
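
The override flags in the table can be combined in a single invocation. The sketch below is a dry run: it only assembles and prints a command so that it can be inspected before use. The values chosen for cores, processes and wall time are assumptions for illustration and are not recommendations for any particular machine. The variable names are likewise illustrative.

```shell
# Assumed values for illustration; check your facility's documentation.
cores=128             # assumption: cores per node on the target machine
walltime="02:00:00"   # shorter than the 24:00:00 default

# Build the command as a string so it can be inspected before running.
cmd="waterEntropy --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop"
cmd="$cmd --file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc"
cmd="$cmd --start 0 --end 512 --step 1 --hpc --hpc-nodes 2"
# The table notes that processes should usually match cores.
cmd="$cmd --hpc-cores $cores --hpc-processes $cores"
cmd="$cmd --hpc-walltime $walltime --hpc-account c01-bio --hpc-qos standard --submit"

# Dry run: print the assembled command instead of executing it.
echo "$cmd"
```

Once the printed command looks right, it can be pasted into the shell or into a submission script.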

The easiest way to launch this version of WaterEntropy is from the command line, using our auto-detection features to look up various hardware features and submit WaterEntropy to the scheduler. Of the pure CLI ways to launch, this is likely to be the most compliant with HPC policies. To do this, simply run:

waterEntropy --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop \
    --file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc \
    --start 0 --end 512 --step 1 --hpc --hpc-nodes 4 --hpc-account c01-bio \
    --hpc-qos standard --submit

This will submit a master job to the scheduler, which will run the WaterEntropy master process. A separate dask cluster will then be orchestrated to run the work, so in the HPC queue you will see a single master job plus as many dask-worker jobs as the number of nodes requested.
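
One way to check this is to compare your queue entries against the expected count. The sketch below assumes a Slurm scheduler (as in the batch script later in this section) and that you have no other jobs queued. The options `-h` and `-u` are standard `squeue` options; the variable names are illustrative. Worker jobs may take a little while to appear after the master job starts.

```shell
nodes=4                   # value passed to --hpc-nodes
expected=$((nodes + 1))   # one master job plus one dask-worker job per node
echo "Expecting $expected queue entries"

# Only query the scheduler when squeue is available (i.e. on the cluster).
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header line, -u restricts output to the given user.
    actual=$(squeue -h -u "$(id -un)" | wc -l)
    echo "Found $actual queue entries"
fi
```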

It is possible to run the same command without “--submit”. This will run the master Python process on the HPC head node, with only the dask-worker cluster being sent to the scheduler. Note that this will tie up head-node resources shared with other users, which may be seen as bad practice or be against policy at some facilities:

waterEntropy --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop \
    --file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc \
    --start 0 --end 128 --step 1 --hpc --hpc-account c01-bio --hpc-qos standard

If you want more control over how the WaterEntropy master process is submitted, you can submit your own script like this:

#!/bin/bash --login

#SBATCH --job-name=waterentropy-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --account=c01-bio
#SBATCH --partition=standard
#SBATCH --qos=standard

# Initialise conda and mamba for this shell; adjust the path to your own install.
eval "$(/mnt/lustre/a2fs-nvme/work/c01/c01/jtg2/miniforge3/bin/conda shell.bash hook)"
eval "$(mamba shell hook --shell bash)"
mamba activate waterentropy

# Run the master process; the dask workers are submitted as separate jobs.
srun waterEntropy --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop \
    --file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc --start 0 --end 512 \
    --step 1 --hpc --hpc-nodes 4 --hpc-account c01-bio --hpc-qos standard
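
Assuming the script above is saved as submit_waterentropy.sh (a hypothetical filename) on a Slurm system, it could be submitted as sketched below. `sbatch --parsable` and `squeue -j` are standard Slurm options; the guard simply skips submission when run somewhere without Slurm or without the script.

```shell
script="submit_waterentropy.sh"   # hypothetical filename for the script above

if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
    # --parsable makes sbatch print just the job ID (optionally followed by ;cluster).
    job_id=$(sbatch --parsable "$script")
    echo "Submitted master job $job_id"
    # Show the state of the job that was just submitted.
    squeue -j "${job_id%%;*}"
else
    echo "sbatch or $script not found; run this from a cluster login node"
fi
```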

Topology and trajectory files are available in the tests/input_files directory.