Docker
Docker images are created for each release. They come with all the benchmarks installed and the necessary datasets, so no additional downloads are necessary.
CUDA
Requirements
NVIDIA driver
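Note that the --gpus flag used below also requires the NVIDIA Container Toolkit in addition to the driver. A quick sanity check (the CUDA base image tag here is an assumption; any CUDA image works):
# Verify the driver is loaded on the host
nvidia-smi
# Verify Docker can expose the GPUs to containers
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi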
Usage
The commands below will download the latest CUDA container and run milabench right away, storing the results inside the results folder on the host machine:
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly
# Pull the image we are going to run
docker pull $MILABENCH_IMAGE
# Run milabench
docker run -it --rm --ipc=host --gpus=all \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run
--ipc=host removes shared memory restrictions, but you can also set --shm-size to a high value instead (at least 8G, possibly more).
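For example, a sketch of the same run with an explicit shared memory size instead of --ipc=host:
# Alternative to --ipc=host: give the container a large shared memory segment
docker run -it --rm --shm-size=8G --gpus=all \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run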
Each run should store results in a unique directory under results/ on the host machine. To generate a readable report of the results, you can run:
# Show Performance Report
docker run -it --rm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench report --runs /milabench/envs/runs
ROCm
Requirements
ROCm
Docker
Usage
For ROCm the usage is similar to CUDA, but you must use a different image and slightly different Docker options:
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly
# Pull the image we are going to run
docker pull $MILABENCH_IMAGE
# Run milabench
docker run -it --rm --ipc=host \
--device=/dev/kfd --device=/dev/dri \
--security-opt seccomp=unconfined --group-add video \
-v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
-v /opt/rocm:/opt/rocm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run
For the performance report, the command is the same:
# Show Performance Report
docker run -it --rm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench report --runs /milabench/envs/runs
Multi-node benchmark
There are currently two multi-node benchmarks: opt-1_3b-multinode (data-parallel) and opt-6_7b-multinode (model-parallel; that model is too large to fit on a single GPU). Here is how to run them:
1. Make sure the machines can SSH to each other without passwords (see the sketch after this list).
2. Pull the milabench docker image you would like to run on all machines:
docker pull $MILABENCH_IMAGE
3. Create the output directory:
mkdir -p results
4. Create a list of the nodes that will participate in the benchmark in a results/system.yaml file (see the example below):
vi results/system.yaml
5. Call milabench, specifying the node list we created:
docker ... -v $(pwd)/results:/milabench/envs/runs -v <privatekey>:/milabench/id_milabench milabench run ... --system /milabench/envs/runs/system.yaml
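For step 1, a minimal sketch of setting up passwordless SSH from the main node to a worker, using the example address from the file below (the key path and <username> are placeholders):
# Generate a key pair if you do not already have one (no passphrase)
ssh-keygen -t rsa -f $HOME/.ssh/id_rsa -N ""
# Install the public key on each worker node
ssh-copy-id -i $HOME/.ssh/id_rsa.pub <username>@192.168.0.26
# Verify that login no longer prompts for a password
ssh <username>@192.168.0.26 true
Example results/system.yaml: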
system:
  sshkey: <privatekey>
  arch: cuda
  docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly
  nodes:
    - name: node1
      ip: 192.168.0.25
      main: true
      port: 8123
      user: <username>
    - name: node2
      ip: 192.168.0.26
      main: false
      user: <username>
Then, the command should look like this:
# On manager-node:
# Change if needed
export SSH_KEY_FILE=$HOME/.ssh/id_rsa
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly
docker run -it --rm --gpus all --network host --ipc=host --privileged \
-v $SSH_KEY_FILE:/milabench/id_milabench \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run --system /milabench/envs/runs/system.yaml \
--select multinode
The last line (--select multinode) specifically selects the multi-node benchmarks. Omit that line to run all benchmarks.
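If --select also accepts individual benchmark names (an assumption; the documented usage above selects by the multinode tag), a single benchmark could be run like this:
# Run only the data-parallel multi-node benchmark
docker run -it --rm --gpus all --network host --ipc=host --privileged \
-v $SSH_KEY_FILE:/milabench/id_milabench \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run --system /milabench/envs/runs/system.yaml \
--select opt-1_3b-multinode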
If you need to use more than two nodes, edit or copy system.yaml and simply add the other nodes’ addresses in nodes.
You will also need to update the benchmark definition and increase the maximum number of nodes by creating a new overrides.yaml file. For example, for 4 nodes:
# Name of the benchmark. You can also override values in other benchmarks.
opt-6_7b-multinode:
  num_machines: 4
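As the comment above notes, overrides.yaml can also override values in other benchmarks; for example, the data-parallel benchmark could be scaled the same way (a sketch):
# Hypothetical additional override for the data-parallel benchmark
opt-1_3b-multinode:
  num_machines: 4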
The updated results/system.yaml with four nodes would look like this:
system:
  arch: cuda
  docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly
  nodes:
    - name: node1
      ip: 192.168.0.25
      main: true
      port: 8123
      user: <username>
    - name: node2
      ip: 192.168.0.26
      main: false
      user: <username>
    - name: node3
      ip: 192.168.0.27
      main: false
      user: <username>
    - name: node4
      ip: 192.168.0.28
      main: false
      user: <username>
The command would then look like this:
docker ... milabench run ... --system /milabench/envs/runs/system.yaml --overrides /milabench/envs/runs/overrides.yaml
Note
The multi-node benchmark is sensitive to network performance. If the single-node benchmark opt-6_7b is significantly faster than opt-6_7b-multinode (e.g. it processes more than twice as many items per second), this likely indicates that InfiniBand is either not present or not used. (It is not abnormal for the multi-node benchmark to perform a bit worse than the single-node benchmark, since it has not been optimized to minimize the impact of communication costs.)
Even if InfiniBand is properly configured, the benchmark may fail to use it unless the --privileged flag is set when running the container.
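A quick way to check whether InfiniBand is present and up on a node, assuming the infiniband-diags package is installed (a diagnostic sketch, not part of milabench):
# List InfiniBand ports; an active link shows "State: Active" and
# "Physical state: LinkUp"
ibstat
# Alternatively, check that the kernel exposes any InfiniBand devices
ls /sys/class/infiniband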
Building images
Images can be built locally for prototyping and testing.
docker build -f docker/Dockerfile-cuda -t milabench:cuda-nightly --build-arg CONFIG=standard.yaml .
Or for ROCm:
docker build -f docker/Dockerfile-rocm -t milabench:rocm-nightly --build-arg CONFIG=standard.yaml .
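Once built, the local image can be used with the same commands shown above by pointing MILABENCH_IMAGE at the local tag, for example:
# Use the locally built CUDA image instead of the published one
export MILABENCH_IMAGE=milabench:cuda-nightly
docker run -it --rm --ipc=host --gpus=all \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run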