Design
======

Milabench aims to simulate research workloads for benchmarking purposes.

* Performance is measured as throughput (samples / sec).
  For example, for a model like resnet the throughput would be images per second.
* Single GPU workloads are spawned once per GPU to ensure the entire machine
  is used, simulating something similar to a hyperparameter search.
  The performance of the benchmark is the sum of the throughput of each
  process (a sketch at the end of this page illustrates this scoring).
* Multi GPU workloads
* Multi Nodes

Run
---

* Milabench Manager Process

  * Handles messages from benchmark processes
  * Saves messages into a file for future analysis

* Benchmark processes

  * Run using ``voir``
  * voir is configured to intercept and send events during the training process
  * This allows us to add models from git repositories without modification
  * voir sends data through a file descriptor that was created by the
    milabench main process (a second sketch at the end of this page
    pictures this data path)

What milabench is
-----------------

* Training focused
* milabench shows candid performance numbers

  * No optimization beyond batch size scaling is performed
  * We want to measure the performance our researchers will see,
    not the performance they could get

* PyTorch centric

  * PyTorch has become the de facto library for research
  * We are looking for accelerators with good maturity that can support
    this framework with limited code changes

What milabench is not
---------------------

* milabench's goal is not to be a performance showcase of an accelerator.
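The first sketch below illustrates the scoring described above for single
GPU workloads. The ``throughput`` helper and the sample counts are
hypothetical, made up for illustration only:

.. code-block:: python

    def throughput(samples: int, seconds: float) -> float:
        """Samples processed per second (e.g. images/sec for resnet)."""
        return samples / seconds

    # One benchmark process per GPU, as in a hyperparameter search;
    # the benchmark's score is the sum of the per-process throughputs.
    per_gpu_rates = [
        throughput(12800, 10.2),  # process pinned to GPU 0
        throughput(12800, 10.5),  # process pinned to GPU 1
    ]
    score = sum(per_gpu_rates)
    print(f"benchmark throughput: {score:.1f} samples/sec")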
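The second sketch pictures the data path from the ``Run`` section with plain
``os.pipe`` plumbing: the manager creates a file descriptor, the benchmark
process writes events to it, and the manager saves them to a file. The
JSON-lines event format and the ``events.jsonl`` file name are assumptions
made for this illustration; milabench's real protocol and voir's
instrumentation are more involved.

.. code-block:: python

    import json
    import os
    import subprocess
    import sys

    read_fd, write_fd = os.pipe()

    # Stand-in for a voir-instrumented benchmark: it receives the fd
    # number on the command line and writes one JSON event per "step".
    child_code = """
    import json, os, sys, time
    fd = int(sys.argv[1])
    for step in range(3):
        t0 = time.time()
        time.sleep(0.01)  # stand-in for a training step
        event = {"task": "train", "rate": 128 / (time.time() - t0)}
        os.write(fd, (json.dumps(event) + "\\n").encode())
    os.close(fd)
    """
    child_code = "\n".join(line[4:] for line in child_code.splitlines())

    proc = subprocess.Popen(
        [sys.executable, "-c", child_code, str(write_fd)],
        pass_fds=(write_fd,),  # let the benchmark inherit the fd (POSIX only)
    )
    os.close(write_fd)  # the manager keeps only the read end

    # Manager loop: read events and save them for future analysis.
    with open("events.jsonl", "w") as out, os.fdopen(read_fd) as stream:
        for line in stream:  # EOF once the benchmark closes its end
            out.write(line)
            print("event:", json.loads(line))
    proc.wait()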