Scaling
Milabench can select a batch size based on the capacity of the underlying GPU.
The feature is driven by the config/scaling.yaml file,
which records the memory usage of a given benchmark for each batch size.
convnext_large-fp32:
    arg: --batch-size
    default: 128
    model:
        8: 5824.75 MiB
        16: 8774.75 MiB
        32: 14548.75 MiB
        64: 26274.75 MiB
        128: 49586.75 MiB
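Conceptually, the selection amounts to picking the largest measured batch size whose memory usage fits the GPU capacity. The sketch below illustrates that idea only; the function name and fallback behavior are assumptions, not milabench's actual implementation.

```python
# Measured memory usage per batch size, in MiB, taken from the
# scaling.yaml entry above.
MODEL = {
    8: 5824.75,
    16: 8774.75,
    32: 14548.75,
    64: 26274.75,
    128: 49586.75,
}

def pick_batch_size(capacity_mib: float, default: int = 128) -> int:
    """Largest batch size whose measured memory fits the capacity.

    Falls back to the configured default when no measurement fits
    (hypothetical behavior, for illustration only).
    """
    fitting = [bs for bs, mem in MODEL.items() if mem <= capacity_mib]
    return max(fitting) if fitting else default
```

For example, with an 81920 MiB GPU every measured batch size fits, so 128 is selected; with roughly 16000 MiB the largest fitting entry is 32.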
Auto Batch size
To enable batch resizing, set the environment variable below. Milabench will use the GPU capacity declared in the system.yaml configuration file.
system:
    arch: cuda
    gpu:
        capacity: 81920 MiB
    nodes: []
MILABENCH_SIZER_AUTO=1 milabench run --system system.yaml
For better performance, a multiple constraint can be added. This forces the selected batch size to be a multiple of the given value, here 8.
MILABENCH_SIZER_MULTIPLE=8 milabench run
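The multiple constraint can be pictured as rounding the selected batch size down to the nearest multiple. This is a conceptual sketch under that assumption, not milabench's actual code:

```python
def apply_multiple(batch_size: int, multiple: int = 8) -> int:
    """Round batch_size down to the nearest multiple.

    Keeps at least one multiple so the result never drops to zero
    (an assumption made for this illustration).
    """
    return max(multiple, (batch_size // multiple) * multiple)
```

For instance, a candidate batch size of 100 with a multiple of 8 becomes 96, while 64 is already aligned and stays unchanged.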
Batch size override
The batch size can also be overridden globally:
MILABENCH_SIZER_BATCH_SIZE=64 milabench run
Memory Usage Extractor
To automate the gathering of batch size <=> memory usage data,
a validation layer that retrieves the batch size and the memory usage
of each run can be enabled.
In the example below, once milabench has finished running, it generates a new scaling configuration with the data extracted from the run.
export MILABENCH_SIZER_SAVE="newscaling.yaml"
MILABENCH_SIZER_BATCH_SIZE=64 milabench run
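Conceptually, the extractor merges the observed (batch size, memory usage) samples of a run into a scaling-style mapping like the one at the top of this page. The sketch below is purely illustrative; the function name and data shapes are assumptions, not milabench internals.

```python
def merge_observations(scaling: dict, bench: str, samples: list) -> dict:
    """Merge (batch_size, memory_mib) samples into a scaling-style dict.

    Hypothetical helper for illustration: creates the benchmark entry
    if needed, then records each sample under its batch size.
    """
    entry = scaling.setdefault(bench, {"arg": "--batch-size", "model": {}})
    for batch_size, mem_mib in samples:
        entry["model"][batch_size] = f"{mem_mib} MiB"
    return scaling

# Example: a single run at batch size 64 observed using 26274.75 MiB.
scaling = {}
merge_observations(scaling, "convnext_large-fp32", [(64, 26274.75)])
```

Serializing such a mapping to YAML would yield entries in the same shape as config/scaling.yaml shown earlier.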