Docker
======

`Docker Images `_ are created for each release. They come with all the
benchmarks installed and the necessary datasets. No additional downloads
are necessary.

CUDA
----

Requirements
^^^^^^^^^^^^

* NVIDIA driver
* `docker-ce `_
* `nvidia-docker `_

Usage
^^^^^

The commands below will download the latest CUDA container and run
milabench right away, storing the results inside the ``results`` folder
on the host machine:

.. code-block:: bash

   # Choose the image you want to use
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

   # Pull the image we are going to run
   docker pull $MILABENCH_IMAGE

   # Run milabench
   docker run -it --rm --ipc=host --gpus=all \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run

``--ipc=host`` removes shared memory restrictions; alternatively, you can
replace it with ``--shm-size`` set to a high value (at least ``8G``,
possibly more).

Each run should store results in a unique directory under ``results/`` on
the host machine. To generate a readable report of the results, run:

.. code-block:: bash

   # Show Performance Report
   docker run -it --rm \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench report --runs /milabench/envs/runs

ROCM
----

Requirements
^^^^^^^^^^^^

* rocm
* docker

Usage
^^^^^

For ROCM the usage is similar to CUDA, but you must use a different image
and slightly different Docker options:

.. code-block:: bash

   # Choose the image you want to use
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly

   # Pull the image we are going to run
   docker pull $MILABENCH_IMAGE

   # Run milabench
   docker run -it --rm --ipc=host \
         --device=/dev/kfd --device=/dev/dri \
         --security-opt seccomp=unconfined --group-add video \
         -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
         -v /opt/rocm:/opt/rocm \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run

For the performance report, it is the same command:

.. code-block:: bash

   # Show Performance Report
   docker run -it --rm \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench report --runs /milabench/envs/runs

Multi-node benchmark
^^^^^^^^^^^^^^^^^^^^

There are currently two multi-node benchmarks: ``opt-1_3b-multinode``
(data-parallel) and ``opt-6_7b-multinode`` (model-parallel, since that
model is too large to fit on a single GPU). Here is how to run them:

0. Make sure the machines can ssh to each other without passwords (a
   sketch of one way to set this up follows the example ``system.yaml``
   below)
1. Pull the milabench docker image you would like to run on all machines

   - ``docker pull``

2. Create the output directory

   - ``mkdir -p results``

3. Create a list of nodes that will participate in the benchmark inside a
   ``results/system.yaml`` file (see example below)

   - ``vi results/system.yaml``

4. Call milabench, specifying the node list we created

   - ``docker ... -v $(pwd)/results:/milabench/envs/runs -v <privatekey>:/milabench/id_milabench milabench run ... --system /milabench/envs/runs/system.yaml``

.. note::

   The main node is the node that will be in charge of managing the other
   worker nodes.

.. code-block:: yaml

   system:
     sshkey: <privatekey>
     arch: cuda
     docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly

     nodes:
       - name: node1
         ip: 192.168.0.25
         main: true
         port: 8123
         user: <username>

       - name: node2
         ip: 192.168.0.26
         main: false
         user: <username>
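Step 0 requires passwordless SSH from the main node to each worker node.
Below is a minimal sketch of one way to achieve that with standard
OpenSSH tools, assuming the ``<username>`` account and the node addresses
from the example above; key paths and addresses are placeholders to adapt
to your setup:

.. code-block:: bash

   # On the main node: create a key pair if one does not already exist
   ssh-keygen -t rsa -f $HOME/.ssh/id_rsa -N ""

   # Install the public key on each worker node (here, node2)
   ssh-copy-id -i $HOME/.ssh/id_rsa.pub <username>@192.168.0.26

   # Verify that login now works without a password prompt
   ssh <username>@192.168.0.26 true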
Then, the command should look like this:

.. code-block:: bash

   # On the manager (main) node:

   # Change if needed
   export SSH_KEY_FILE=$HOME/.ssh/id_rsa
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

   docker run -it --rm --gpus all --network host --ipc=host --privileged \
         -v $SSH_KEY_FILE:/milabench/id_milabench \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run --system /milabench/envs/runs/system.yaml \
         --select multinode

The last line (``--select multinode``) specifically selects the
multi-node benchmarks. Omit that line to run all benchmarks.

If you need to use more than two nodes, edit or copy ``system.yaml`` and
simply add the other nodes' addresses in ``nodes``. You will also need to
update the benchmark definition to increase the maximum number of nodes,
by creating a new ``overrides.yaml`` file. For example, for 4 nodes:

.. code-block:: yaml

   # Name of the benchmark. You can also override values in other benchmarks.
   opt-6_7b-multinode:
     num_machines: 4

.. code-block:: yaml

   system:
     arch: cuda
     docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly

     nodes:
       - name: node1
         ip: 192.168.0.25
         main: true
         port: 8123
         user: <username>

       - name: node2
         ip: 192.168.0.26
         main: false
         user: <username>

       - name: node3
         ip: 192.168.0.27
         main: false
         user: <username>

       - name: node4
         ip: 192.168.0.28
         main: false
         user: <username>

The command would then look like this:

.. code-block:: bash

   docker ... milabench run ... --system /milabench/envs/runs/system.yaml --overrides /milabench/envs/runs/overrides.yaml

.. note::

   The multi-node benchmark is sensitive to network performance. If the
   mono-node benchmark ``opt-6_7b`` is significantly faster than
   ``opt-6_7b-multinode`` (e.g. it processes more than twice the items
   per second), this likely indicates that Infiniband is either not
   present or not used. (It is not abnormal for the multi-node benchmark
   to perform *a bit* worse than the mono-node benchmark, since it has
   not been optimized to minimize the impact of communication costs.)

   Even if Infiniband is properly configured, the benchmark may fail to
   use it unless the ``--privileged`` flag is set when running the
   container.

Building images
---------------

Images can be built locally for prototyping and testing.

.. code-block:: bash

   docker build -f docker/Dockerfile-cuda -t milabench:cuda-nightly --build-arg CONFIG=standard.yaml .

Or for ROCm:

.. code-block:: bash

   docker build -f docker/Dockerfile-rocm -t milabench:rocm-nightly --build-arg CONFIG=standard.yaml .
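A locally built image can then be exercised with the same ``docker run``
commands shown earlier, by pointing ``MILABENCH_IMAGE`` at the local tag
(the ``-t`` argument of the build command) instead of the published
image. For example, a sketch for the CUDA build above:

.. code-block:: bash

   # Use the locally built image instead of pulling from ghcr.io
   export MILABENCH_IMAGE=milabench:cuda-nightly

   docker run -it --rm --ipc=host --gpus=all \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run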