Design

Milabench aims to simulate research workloads for benchmarking purposes.

  • Performance is measured as throughput (samples per second). For example, for a model like ResNet the throughput would be images per second.

  • Single-GPU workloads are spawned once per GPU to ensure the entire machine is used, simulating something similar to a hyperparameter search. The performance of the benchmark is the sum of the throughput of each process (see the sketch after this list).

  • Multi-GPU workloads

  • Multi-node workloads
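
To make the metric concrete, here is a minimal sketch, assuming a hypothetical four-GPU machine, of how one process per GPU could each report its samples-per-second throughput and how the results are summed into a single benchmark score. The worker body, batch size, and GPU count are placeholders for illustration, not milabench's actual implementation.

    import multiprocessing as mp
    import os
    import time

    def train_worker(gpu_id, results):
        # Pin this process to a single device; a real benchmark would set
        # CUDA_VISIBLE_DEVICES so the worker only sees its own GPU.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        batch_size, num_batches = 64, 100
        start = time.time()
        for _ in range(num_batches):
            time.sleep(0.001)  # stand-in for one training step
        elapsed = time.time() - start
        # Throughput is samples per second (images/sec for ResNet).
        results.put(batch_size * num_batches / elapsed)

    if __name__ == "__main__":
        num_gpus = 4  # hypothetical machine
        results = mp.Queue()
        procs = [mp.Process(target=train_worker, args=(i, results))
                 for i in range(num_gpus)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # The benchmark's score is the sum of the per-process throughputs.
        total = sum(results.get() for _ in range(num_gpus))
        print(f"total throughput: {total:.1f} samples/sec")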

Run

  • Milabench Manager Process
    • Handles messages from benchmark processes

    • Saves messages into a file for future analysis

  • Benchmark processes
    • Run using voir

    • voir is configured to intercept and send events during the training process

    • This allows us to add models from Git repositories without modification

    • voir sends data through a file descriptor created by the milabench main process (see the sketch below)
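
As an illustration of this pipeline, here is a minimal sketch of the file-descriptor protocol: a manager process creates a pipe, spawns a benchmark child that writes newline-delimited JSON events through the inherited write end (as voir does), and saves every message to a file for later analysis. The DATA_FD variable name, the event fields, and the events.jsonl path are assumptions made for this example, not voir's documented interface.

    import os
    import subprocess
    import sys
    import textwrap

    # Benchmark-side script: writes JSON events to the file descriptor
    # whose number arrives through an environment variable (illustrative
    # name DATA_FD; not voir's documented interface).
    CHILD = textwrap.dedent(r"""
        import json, os, time
        fd = int(os.environ["DATA_FD"])
        with os.fdopen(fd, "w") as out:
            for step in range(3):
                event = {"task": "train", "rate": 1000.0 + step, "units": "items/s"}
                out.write(json.dumps(event) + "\n")
                out.flush()
                time.sleep(0.01)
    """)

    if __name__ == "__main__":
        read_fd, write_fd = os.pipe()
        os.set_inheritable(write_fd, True)
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            env={**os.environ, "DATA_FD": str(write_fd)},
            pass_fds=(write_fd,),
        )
        os.close(write_fd)  # the manager keeps only the read end
        # Manager side: handle each message and save it for future analysis.
        with os.fdopen(read_fd) as stream, open("events.jsonl", "a") as log:
            for line in stream:  # EOF once the benchmark process exits
                log.write(line)
        proc.wait()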

What milabench is

  • Training-focused

  • milabench shows candid performance numbers
    • No optimization beyond batch size scaling is performed

    • We want to measure the performance our researchers will see, not the performance they could get.

  • PyTorch-centric
    • PyTorch has become the de facto library for research

    • We are looking for mature accelerators that can support this framework with limited code changes.

What milabench is not

  • milabench's goal is not to be a performance showcase for an accelerator.