Design

Milabench aims to simulate research workloads for benchmarking purposes.

  • Performance is measured as throughput (samples per second). For example, for a model like ResNet the throughput would be images per second.

  • Single-GPU workloads are spawned once per GPU to ensure the entire machine is used, simulating something similar to a hyperparameter search. The performance of the benchmark is the sum of the throughput of each process (see the sketch after this list).

  • Multi-GPU workloads

  • Multi-node workloads
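
To make the metric concrete, here is a minimal sketch, assuming a hypothetical four-GPU machine, of how one process per GPU could each report its samples-per-second throughput and how the results are summed into a single benchmark score. The worker body, batch size, and GPU count are placeholders for illustration, not milabench's actual implementation.

    import multiprocessing as mp
    import os
    import time

    def train_worker(gpu_id, results):
        # Pin this process to a single device; a real benchmark would set
        # CUDA_VISIBLE_DEVICES so the worker only sees its own GPU.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        batch_size, num_batches = 64, 100
        start = time.time()
        for _ in range(num_batches):
            time.sleep(0.001)  # stand-in for one training step
        elapsed = time.time() - start
        # Throughput is samples per second (images/sec for ResNet).
        results.put(batch_size * num_batches / elapsed)

    if __name__ == "__main__":
        num_gpus = 4  # hypothetical machine
        results = mp.Queue()
        procs = [mp.Process(target=train_worker, args=(i, results))
                 for i in range(num_gpus)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # The benchmark's score is the sum of the per-process throughputs.
        total = sum(results.get() for _ in range(num_gpus))
        print(f"total throughput: {total:.1f} samples/sec")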

Run

  • Milabench Manager Process
    • Handles messages from benchmark processes

    • Saves messages into a file for future analysis

  • Benchmark processes
    • Run using voir

    • voir is configured to intercept and send events during the training process

    • This allows us to add models from Git repositories without modification

    • voir sends data through a file descriptor created by the milabench main process (see the sketch below)
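
As an illustration of this pipeline, here is a minimal sketch of the file-descriptor protocol: a manager process creates a pipe, spawns a benchmark child that writes newline-delimited JSON events through the inherited write end (as voir does), and saves every message to a file for later analysis. The DATA_FD variable name, the event fields, and the events.jsonl path are assumptions made for this example, not voir's documented interface.

    import os
    import subprocess
    import sys
    import textwrap

    # Benchmark-side script: writes JSON events to the file descriptor
    # whose number arrives through an environment variable (illustrative
    # name DATA_FD; not voir's documented interface).
    CHILD = textwrap.dedent(r"""
        import json, os, time
        fd = int(os.environ["DATA_FD"])
        with os.fdopen(fd, "w") as out:
            for step in range(3):
                event = {"task": "train", "rate": 1000.0 + step, "units": "items/s"}
                out.write(json.dumps(event) + "\n")
                out.flush()
                time.sleep(0.01)
    """)

    if __name__ == "__main__":
        read_fd, write_fd = os.pipe()
        os.set_inheritable(write_fd, True)
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            env={**os.environ, "DATA_FD": str(write_fd)},
            pass_fds=(write_fd,),
        )
        os.close(write_fd)  # the manager keeps only the read end
        # Manager side: handle each message and save it for future analysis.
        with os.fdopen(read_fd) as stream, open("events.jsonl", "a") as log:
            for line in stream:  # EOF once the benchmark process exits
                log.write(line)
        proc.wait()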

What milabench is

  • Training-focused

  • milabench shows candid performance numbers
    • No optimization beyond batch size scaling is performed

    • We want to measure the performance our researchers will see, not the performance they could get.

  • PyTorch-centric
    • PyTorch has become the de facto library for research

    • We are looking for mature accelerators that can support this framework with limited code changes.

What milabench is not

  • milabench's goal is not to be a performance showcase for an accelerator.