Design
Milabench aims to simulate research workloads for benchmarking purposes.
Performance is measured as throughput (samples per second). For example, for a model like ResNet, the throughput would be images per second.
Single-GPU workloads are spawned once per GPU to ensure the entire machine is used, simulating something similar to a hyperparameter search. The performance of the benchmark is the sum of the throughput of each process (the sketch after this list illustrates the pattern).
Multi-GPU workloads
Multi-node workloads
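For illustration, here is a minimal sketch of the single-GPU pattern. It assumes a hypothetical `run_one_benchmark.py` script that prints its throughput (samples per second) as its last line of output; the real benchmarks report through voir instead, as described under Run below.

```python
# A minimal sketch of the single-GPU pattern, under stated assumptions:
# one benchmark process is pinned to each GPU via CUDA_VISIBLE_DEVICES,
# and the benchmark score is the sum of per-process throughput.
# `run_one_benchmark.py` is a hypothetical script that prints its
# throughput (samples/sec) as its last line of output.
import os
import subprocess

def run_per_gpu(script: str, gpu_count: int) -> float:
    procs = []
    for gpu in range(gpu_count):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # pin one GPU
        procs.append(subprocess.Popen(
            ["python", script],
            env=env,
            stdout=subprocess.PIPE,
            text=True,
        ))
    # Total throughput is the sum over all concurrent processes.
    return sum(
        float(proc.communicate()[0].strip().splitlines()[-1])
        for proc in procs
    )

if __name__ == "__main__":
    print(run_per_gpu("run_one_benchmark.py", gpu_count=8))
```

Summing per-process throughput rewards machines that sustain performance when every GPU is busy, which matches the hyperparameter-search usage pattern being simulated.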
Run
- Milabench Manager Process
Handles messages from benchmark processes
Saves messages into a file for future analysis
- Benchmark processes
Run using voir
voir is configured to intercept and send events during the training process
This allows us to add models from git repositories without modifying them
voir sends data through a file descriptor created by the milabench main process (see the sketch below)
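A minimal sketch of this channel, assuming the parent opens a pipe, the child inherits the write end through a `DATA_FD` environment variable (an illustrative name, not voir's actual protocol), and events are newline-delimited JSON:

```python
# A sketch of the manager/benchmark message channel, under stated assumptions:
# the parent opens a pipe, the child inherits the write end and reports
# newline-delimited JSON events on it, and the parent appends every event
# to a log file for later analysis. `DATA_FD` and the event fields are
# illustrative placeholders, not voir's actual protocol.
import json
import os
import subprocess
import sys

def manager(log_path: str) -> None:
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, __file__, "child"],
        env=dict(os.environ, DATA_FD=str(write_fd)),
        pass_fds=(write_fd,),          # child inherits the write end
    )
    os.close(write_fd)                 # parent keeps only the read end
    with open(log_path, "a") as log, os.fdopen(read_fd) as events:
        for line in events:            # one JSON event per line
            log.write(line)            # saved for future analysis
    proc.wait()

def child() -> None:
    out = os.fdopen(int(os.environ["DATA_FD"]), "w")
    for step in range(3):              # stand-in for the training loop
        out.write(json.dumps({"event": "rate", "rate": 1000.0 + step}) + "\n")
    out.close()

if __name__ == "__main__":
    child() if len(sys.argv) > 1 else manager("events.log")
```

Because the channel is just an inherited file descriptor, the training process needs no sockets or shared files to report events, which helps keep third-party model code untouched.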
What milabench is
Training-focused
- milabench shows candid performance numbers
No optimization beyond batch size scaling is performed
We want to measure the performance our researchers will see, not the performance they could get
- PyTorch-centric
PyTorch has become the de facto library for research
We are looking for accelerators with good maturity that can support this framework with limited code changes
What milabench is not
The goal of milabench is not to be a performance showcase for an accelerator.