Remote launcher plugin

PatchedSlurmQueueConf dataclass

Bases: _AddedArgumentsConf, SlurmQueueConf

Adds more SLURM parameters to the config for the SLURM submitit launcher of Hydra.

signal_delay_s class-attribute instance-attribute

signal_delay_s: int = 120

Delay, in seconds, between the USR1 signal and the job timeout: the signal is sent this many seconds before the job times out.

max_num_timeout class-attribute instance-attribute

max_num_timeout: int = 0

Maximum number of retries on job timeout.

Change this only after confirming that your code can handle re-submission by properly resuming from the latest stored checkpoint. For more information on slurm_max_num_timeout, see https://github.com/facebookincubator/submitit/blob/master/docs/checkpointing.md
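The resume-from-checkpoint requirement can be illustrated with a minimal, standard-library-only sketch. It does not use submitit or the plugin; the `train` function, the JSON checkpoint format, and the `stop_after` argument (which stands in for a job timeout) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def train(checkpoint_path: Path, total_steps: int, stop_after: Optional[int] = None) -> int:
    """Run up to `total_steps`, resuming from `checkpoint_path` if it exists."""
    step = 0
    if checkpoint_path.exists():
        # Resume from the last stored step instead of starting over.
        step = json.loads(checkpoint_path.read_text())["step"]
    steps_this_run = 0
    while step < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # Hypothetical stand-in for the job being cut off by a timeout.
        step += 1
        steps_this_run += 1
        # Store progress after every step so a re-submitted job can pick it up.
        checkpoint_path.write_text(json.dumps({"step": step}))
    return step


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    first = train(ckpt, total_steps=10, stop_after=4)  # interrupted run
    second = train(ckpt, total_steps=10)  # re-submitted run resumes at step 4
```

If a second call restarted from step 0 instead of the stored step, each retry would repeat the same work, which is why max_num_timeout is best left at 0 until resumption has been verified.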

additional_parameters class-attribute instance-attribute

additional_parameters: dict[str, Any] = field(
    default_factory=dict
)

Useful to add parameters which are not currently available in the plugin.

Example: {"mail-user": "blublu@fb.com", "mail-type": "BEGIN"}
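As a rough illustration of what such a dictionary amounts to, the sketch below renders each entry as one `#SBATCH` directive. The `as_sbatch_lines` helper is hypothetical and is not the plugin's or submitit's code; the sorting and the handling of `True` values are assumptions made for the example.

```python
from typing import Any


def as_sbatch_lines(additional_parameters: dict[str, Any]) -> list[str]:
    """Render extra parameters as `#SBATCH` directives (illustrative only)."""
    lines = []
    for key in sorted(additional_parameters):
        value = additional_parameters[key]
        flag = key.replace("_", "-")
        if value is True:
            # Assumption: a bare flag is emitted for boolean switches.
            lines.append(f"#SBATCH --{flag}")
        else:
            lines.append(f"#SBATCH --{flag}={value}")
    return lines


lines = as_sbatch_lines({"mail-user": "blublu@fb.com", "mail-type": "BEGIN"})
```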

array_parallelism class-attribute instance-attribute

array_parallelism: int = 256

Maximum number of jobs running in parallel.

setup class-attribute instance-attribute

setup: list[str] | None = None

A list of commands to run in sbatch before running srun.
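The ordering described here can be sketched as follows: setup commands are placed in the batch script ahead of the `srun` line. The `build_batch_script` helper and the example commands are hypothetical, and the real generated script contains more than this.

```python
from typing import Optional


def build_batch_script(setup: Optional[list[str]], command: str) -> str:
    """Assemble a toy batch script with setup commands before srun."""
    lines = ["#!/bin/bash"]
    # `setup` may be None (the default), in which case nothing is added.
    lines += setup or []
    lines.append(f"srun {command}")
    return "\n".join(lines)


script = build_batch_script(
    ["module load python/3.10", "source ~/venv/bin/activate"],  # hypothetical commands
    "python main.py",
)
script_lines = script.splitlines()
```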

get_slurm_accounts cached

get_slurm_accounts(cluster: str) -> list[str]

Gets the SLURM accounts of the user by running sacctmgr on the SLURM cluster.
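A minimal sketch of how such a lookup might work is shown below. It is not the plugin's implementation: the `sacctmgr` invocation, the use of `ssh` with `cluster` as a host alias, and the names `parse_accounts` and `fetch_slurm_accounts` are assumptions. Only the parsing step is exercised, on made-up sample output.

```python
import functools
import subprocess


def parse_accounts(sacctmgr_output: str) -> list[str]:
    """Extract unique account names from one-account-per-line output."""
    accounts = []
    for line in sacctmgr_output.splitlines():
        name = line.strip()
        if name and name not in accounts:
            accounts.append(name)
    return accounts


@functools.cache  # Repeated calls for the same cluster reuse the first result.
def fetch_slurm_accounts(cluster: str) -> list[str]:
    # Assumption: `cluster` is an SSH host alias and sacctmgr is on the remote PATH.
    result = subprocess.run(
        [
            "ssh",
            cluster,
            "sacctmgr --noheader --parsable2 show associations where user=$USER format=Account",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_accounts(result.stdout)


# Made-up sample output; the account names are placeholders.
sample_output = "def-someprof\nrrg-someprof\ndef-someprof\n"
accounts = parse_accounts(sample_output)
```

Caching avoids a network round trip on every call, at the cost of not noticing account changes made during the lifetime of the process.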