Docker
Docker images are created for each release. They come with all the benchmarks installed and the necessary datasets, so no additional downloads are necessary.
CUDA
Requirements
NVIDIA driver
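Note that the --gpus flag used below also requires the NVIDIA Container Toolkit in addition to the driver. A quick sanity check (the CUDA base image tag here is an assumption; any CUDA image works):
# Verify the driver is loaded on the host
nvidia-smi
# Verify Docker can expose the GPUs to containers
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi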
Usage
The commands below will download the latest CUDA container and run milabench right away, storing the results inside the results folder on the host machine:
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly
# Pull the image we are going to run
docker pull $MILABENCH_IMAGE
# Run milabench
docker run -it --rm --ipc=host --gpus=all \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run
--ipc=host removes shared memory restrictions, but you can also set --shm-size to a high value instead (at least 8G, possibly more).
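For example, a sketch of the same run with an explicit shared memory size instead of --ipc=host:
# Alternative to --ipc=host: give the container a large shared memory segment
docker run -it --rm --shm-size=8G --gpus=all \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run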
Each run should store results in a unique directory under results/ on the host machine. To generate a readable report of the results, you can run:
# Show Performance Report
docker run -it --rm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench report --runs /milabench/envs/runs
ROCm
Requirements
ROCm
Docker
Usage
For ROCm the usage is similar to CUDA, but you must use a different image and slightly different Docker options:
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly
# Pull the image we are going to run
docker pull $MILABENCH_IMAGE
# Run milabench
docker run -it --rm --ipc=host \
--device=/dev/kfd --device=/dev/dri \
--security-opt seccomp=unconfined --group-add video \
-v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
-v /opt/rocm:/opt/rocm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run
For the performance report, the command is the same:
# Show Performance Report
docker run -it --rm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench report --runs /milabench/envs/runs
Multi-node benchmark
There are currently two multi-node benchmarks: opt-1_3b-multinode (data-parallel) and opt-6_7b-multinode (model-parallel; that model is too large to fit on a single GPU). Here is how to run them:
1. Make sure the machines can SSH to each other without passwords (see the sketch after this list).
2. Pull the milabench docker image you would like to run on all machines:
docker pull $MILABENCH_IMAGE
3. Create the output directory:
mkdir -p results
4. Create a list of the nodes that will participate in the benchmark in a results/system.yaml file (see the example below):
vi results/system.yaml
5. Call milabench, specifying the node list we created:
docker ... -v $(pwd)/results:/milabench/envs/runs -v <privatekey>:/milabench/id_milabench milabench run ... --system /milabench/envs/runs/system.yaml
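For step 1, a minimal sketch of setting up passwordless SSH from the main node to a worker, using the example address from the file below (the key path and <username> are placeholders):
# Generate a key pair if you do not already have one (no passphrase)
ssh-keygen -t rsa -f $HOME/.ssh/id_rsa -N ""
# Install the public key on each worker node
ssh-copy-id -i $HOME/.ssh/id_rsa.pub <username>@192.168.0.26
# Verify that login no longer prompts for a password
ssh <username>@192.168.0.26 true
Example results/system.yaml: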
system:
  sshkey: <privatekey>
  arch: cuda
  docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly
  nodes:
    - name: node1
      ip: 192.168.0.25
      main: true
      port: 8123
      user: <username>
    - name: node2
      ip: 192.168.0.26
      main: false
      user: <username>
Then, the command should look like this:
# On manager-node:
# Change if needed
export SSH_KEY_FILE=$HOME/.ssh/id_rsa
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly
docker run -it --rm --gpus all --network host --ipc=host --privileged \
-v $SSH_KEY_FILE:/milabench/id_milabench \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run --system /milabench/envs/runs/system.yaml \
--select multinode
The last line (--select multinode) specifically selects the multi-node benchmarks. Omit that line to run all benchmarks.
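If --select also accepts individual benchmark names (an assumption; the documented usage above selects by the multinode tag), a single benchmark could be run like this:
# Run only the data-parallel multi-node benchmark
docker run -it --rm --gpus all --network host --ipc=host --privileged \
-v $SSH_KEY_FILE:/milabench/id_milabench \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run --system /milabench/envs/runs/system.yaml \
--select opt-1_3b-multinode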
If you need to use more than two nodes, edit or copy system.yaml and simply add the other nodes’ addresses in nodes.
You will also need to update the benchmark definition and increase the maximum number of nodes by creating a new overrides.yaml file. For example, for 4 nodes:
# Name of the benchmark. You can also override values in other benchmarks.
opt-6_7b-multinode:
  num_machines: 4
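As the comment above notes, overrides.yaml can also override values in other benchmarks; for example, the data-parallel benchmark could be scaled the same way (a sketch):
# Hypothetical additional override for the data-parallel benchmark
opt-1_3b-multinode:
  num_machines: 4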
The updated results/system.yaml with four nodes would look like this:
system:
  arch: cuda
  docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly
  nodes:
    - name: node1
      ip: 192.168.0.25
      main: true
      port: 8123
      user: <username>
    - name: node2
      ip: 192.168.0.26
      main: false
      user: <username>
    - name: node3
      ip: 192.168.0.27
      main: false
      user: <username>
    - name: node4
      ip: 192.168.0.28
      main: false
      user: <username>
The command would then look like this:
docker ... milabench run ... --system /milabench/envs/runs/system.yaml --overrides /milabench/envs/runs/overrides.yaml
Note
The multi-node benchmark is sensitive to network performance. If the single-node benchmark opt-6_7b is significantly faster than opt-6_7b-multinode (e.g. it processes more than twice as many items per second), this likely indicates that InfiniBand is either not present or not used. (It is not abnormal for the multi-node benchmark to perform a bit worse than the single-node benchmark, since it has not been optimized to minimize the impact of communication costs.)
Even if InfiniBand is properly configured, the benchmark may fail to use it unless the --privileged flag is set when running the container.
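A quick way to check whether InfiniBand is present and up on a node, assuming the infiniband-diags package is installed (a diagnostic sketch, not part of milabench):
# List InfiniBand ports; an active link shows "State: Active" and
# "Physical state: LinkUp"
ibstat
# Alternatively, check that the kernel exposes any InfiniBand devices
ls /sys/class/infiniband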
Building images
Images can be built locally for prototyping and testing.
docker build -f docker/Dockerfile-cuda -t milabench:cuda-nightly --build-arg CONFIG=standard.yaml .
Or for ROCm:
docker build -f docker/Dockerfile-rocm -t milabench:rocm-nightly --build-arg CONFIG=standard.yaml .
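Once built, the local image can be used with the same commands shown above by pointing MILABENCH_IMAGE at the local tag, for example:
# Use the locally built CUDA image instead of the published one
export MILABENCH_IMAGE=milabench:cuda-nightly
docker run -it --rm --ipc=host --gpus=all \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run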