HPC Parallel
Running on HPC machines is slightly more complex than the simple parallelism gained by using dask-distributed alone. The LocalCluster implementation enabled by the “--parallel” flag will often fail to launch on HPC platforms because of how they are configured, so we have implemented an HPC-friendly option that uses both dask-distributed and dask-jobqueue. Because of the complexity of how HPC machines operate, these workloads can be submitted in several different ways, and several parameters can be configured manually. The table below shows the HPC-specific CLI flags, what they do, and their default values.
| Flag | Description | Default |
|---|---|---|
| --conda-exec | conda/mamba executable override; use if having issues. | autodetect |
| --conda-env | Override for the name of the environment to load. | autodetect |
| --conda-path | Override for the conda install. | autodetect |
| --hpc | Turn on HPC parallelism. | False |
| --hpc-account | Set the HPC project/account to use for submissions. | empty string |
| --hpc-constraint | Constraints to apply to the job allocation, such as hardware generation. | empty string |
| --hpc-cores | Override for the number of cores requested per node. | autodetect |
| --hpc-memory | Override for the memory request. | autodetect |
| --hpc-nodes | How many nodes the job should run on. | 1 |
| --hpc-processes | Override for how many dask processes to run per node. This should usually match cores. | autodetect |
| --hpc-queue | Override for the queue name to submit to. | standard |
| --hpc-qos | Override for the QoS. | empty string |
| --hpc-walltime | Override for the wall time the job should request. | 24:00:00 |
| --submit | Have WaterEntropy submit itself to an HPC cluster. | False |
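As a rough sketch of how several of the override flags in the table combine on one command line, the snippet below assembles a command one flag at a time and prints it rather than running it. The helper name `build_we_command` is hypothetical, and the override values (2 nodes, a 2 hour wall time) are illustrative assumptions that simply mirror the format of the defaults in the table.

```bash
# Hypothetical helper: print (do not run) a waterEntropy command with HPC overrides.
# Override values are illustrative assumptions, not recommended settings.
build_we_command() {
  # Base command, using the example inputs referenced on this page.
  set -- waterEntropy \
    --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop \
    --file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc \
    --start 0 --end 512 --step 1 --hpc
  # Append one override per line so each can be commented or removed.
  set -- "$@" --hpc-nodes 2            # default: 1
  set -- "$@" --hpc-walltime 02:00:00  # default: 24:00:00
  set -- "$@" --hpc-queue standard     # default: standard
  set -- "$@" --hpc-account c01-bio    # account used in the examples on this page
  echo "$*"
}

build_we_command
```

Printing the command first makes it easy to check the flags before anything is sent to the scheduler.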
The easiest way for users to launch this version of WaterEntropy is from the command line, using our auto-detection features to look up various hardware features and submit WaterEntropy to the scheduler. Of the pure CLI ways to launch, this is likely to be the most compliant with HPC policies. To do this, simply run:
waterEntropy --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop \
--file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc \
--start 0 --end 512 --step 1 --hpc --hpc-nodes 4 --hpc-account c01-bio \
--hpc-qos standard --submit
This submits a master job to the scheduler, which runs the WaterEntropy master process. A separate dask cluster is then orchestrated to run the work, so in the HPC queue you will see a single master job plus one dask-worker job for each node requested.
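To check that the expected jobs have appeared, you can summarise the scheduler queue. The sketch below assumes a SLURM system (as in the batch script on this page) and job names without spaces. The helper `summarise_queue` is hypothetical, and the actual job names you see depend on how the jobs are configured.

```bash
# Hypothetical helper: count jobs per (name, state) pair.
# Reads lines of the form "<job name> <state>" from stdin,
# as produced by: squeue -h -u "$USER" -o "%j %T"
summarise_queue() {
  awk '{ count[$1 " " $2]++ } END { for (k in count) print count[k], k }' | sort
}

# On a SLURM system, feed it from the scheduler:
#   squeue -h -u "$USER" -o "%j %T" | summarise_queue
```

Each output line gives a count followed by the job name and state, so a run on four nodes should eventually show four worker jobs alongside the single master job.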
It is possible to run the same waterEntropy command without “--submit”. This runs the master Python process on the HPC head node, with only the dask-worker cluster being sent to the scheduler. Note that this ties up head node resources that are shared with other users, which may be seen as bad practice or be against policy at some facilities:
waterEntropy --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop \
--file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc \
--start 0 --end 128 --step 1 --hpc --hpc-account c01-bio --hpc-qos standard
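Before launching this way, it can help to confirm where the master process would end up. The following is a heuristic sketch, assuming a SLURM scheduler: SLURM sets `SLURM_JOB_ID` for processes running inside a job allocation, so an empty value usually means you are on a login or head node. The function name `in_slurm_job` is hypothetical.

```bash
# Hypothetical helper: succeed only when running inside a SLURM job allocation.
# Heuristic: SLURM exports SLURM_JOB_ID inside jobs; it is normally unset on a head node.
in_slurm_job() {
  [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
  echo "Inside SLURM job ${SLURM_JOB_ID}: the master process will run within this allocation."
else
  echo "No SLURM job detected: without --submit the master process runs on this node." >&2
fi
```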
If you want more control over how the WaterEntropy master process is submitted, you can submit your own script like this:
#!/bin/bash --login
#SBATCH --job-name=waterentropy-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --account=c01-bio
#SBATCH --partition=standard
#SBATCH --qos=standard
eval "$(/mnt/lustre/a2fs-nvme/work/c01/c01/jtg2/miniforge3/bin/conda shell.bash hook)"
eval "$(mamba shell hook --shell bash)"
mamba activate waterentropy
srun waterEntropy --file-topology example_inputs/BTN_longer_sims/BTN_solvated_box.prmtop \
--file-coords example_inputs/BTN_longer_sims/BTN_5000frames.nc --start 0 --end 512 \
--step 1 --hpc --hpc-nodes 4 --hpc-account c01-bio --hpc-qos standard
Topology and trajectory files are available in the tests/input_files directory.
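A hand-written batch script like the one above is submitted with the scheduler's own tools. As a minimal sketch, assuming SLURM and a script saved under the hypothetical name `submit_waterentropy.sh`, the job ID can be captured from `sbatch --parsable`, which prints either `<jobid>` or `<jobid>;<cluster>`. The helper `parse_job_id` is illustrative only.

```bash
# Hypothetical helper: extract the job ID from `sbatch --parsable` output,
# which is either "<jobid>" or "<jobid>;<cluster>".
parse_job_id() {
  printf '%s\n' "${1%%;*}"
}

# Typical use on a SLURM system (script name is an assumption):
#   job_id=$(parse_job_id "$(sbatch --parsable submit_waterentropy.sh)")
#   squeue -j "$job_id"
```

Keeping the job ID makes it straightforward to monitor or cancel the master job later.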