How to run an analysis in parallel

In this tutorial, we will demonstrate how to use the parallel resource executor of reportengine to run a validphys analysis (or any other reportengine app analysis). Typically, when running a validphys script, reportengine creates a directed acyclic graph (DAG) that is executed sequentially, meaning that each node must wait for the previous node to complete before it can be evaluated. This approach is not very efficient, especially if the nodes are independent of each other. The parallel execution of a reportengine task is based on dask.distributed (dask-distributed).

The main steps to follow when running a task in parallel are:

Initialize a dask-scheduler (this can be done, for instance, on a separate screen opened from command line)
```
$ dask-scheduler &
```
Note that the above command should output the scheduler address (e.g. Scheduler at: tcp://171.24.141.13:8786).
Assign some workers to the scheduler and specify the amount of memory each of them can use. For example:
```
$ dask-worker  <scheduler address> --nworkers 10 --nthreads 1 --memory-limit "14 GiB" &
```
Note that in the above example we also fixed the number of threads at disposal by each worker to be 1, this is important in order to avoid possible racing conditions triggered by the matplotlib library.
Run the process using the correct flags:
```
$ validphys <name of runcard> --parallel --scheduler <scheduler address>
```
running the process this way should output a client dashboard link (e.g.: http://172.24.142.17:8787/status) from which the status of the job can be monitored. Note that for this to work you will need the package bokeh with version >=2.4.3. This can be easily obtained, e.g., `pip install bokeh==2.4.3`.

The main thing to take care of when running a certain process is point 2 and, in particular, the amount of memory that is being assigned to each worker. In the following we will give some explicit examples of the use of memory for some standard validphys scripts.

Example 1: PDF plots, `validphys2/examples/plot_pdfs.yaml`

Suppose we have the following runcard

meta:
  title: PDF plot example
  author: Rosalyn Pearson
  keywords: [parallel, example]


pdfs:
  - {id: "NNPDF40_nlo_as_01180", label: "4.0 NLO"}
  - {id: "NNPDF40_nnlo_lowprecision", label: "4.0 NNLO low precision"}
  - {id: "NNPDF40_nnlo_as_01180", label: "4.0 NNLO"}


pdfs_noband: ["NNPDF40_nnlo_as_01180"] # Equivalently [3]

show_mc_errors: True

Q: 10

PDFnormalize:
  - normtitle: Absolute  # normalize_to default is None
  - normalize_to: 1      # Specify index in list of PDFs or name of PDF
    normtitle: Ratio

Basespecs:
  - basis: flavour
    basistitle: Flavour basis
  - basis: evolution
    basistitle: Evolution basis

PDFscalespecs:
  - xscale: log
    xscaletitle: Log
  - xscale: linear
    xscaletitle: Linear

template_text: |
  {@with PDFscalespecs@}
  {@xscaletitle@} scale
  =====================
  {@with Basespecs@}
  {@basistitle@}
  -------------
  {@with PDFnormalize@}
  {@normtitle@}
  {@plot_pdfs@}
  {@plot_pdf_uncertainties@}
  {@plot_pdfreplicas@}
  {@endwith@}
  {@endwith@}
  {@endwith@}

actions_:
  - report(main=True)

As an example we can run the above job by assigning 5 workers to the dask scheduler each of which has access to 5 GiB of memory for a total of 25 GiB:

$ dask-worker tcp://172.24.142.17:8786 --nworkers 5 --nthreads 1 --memory-limit "5 GiB"

We then run the task as

$ validphys plot_pdfs.yaml --parallel --scheduler tcp://172.24.142.17:8786

The time needed for this task (on a machine with 8 cores and 32 GiB of RAM) is

real  0m43.464s
user  0m2.419s
sys   0m0.607s

as compared to the sequential execution which gives

real  2m0.531s
user  8m20.506s
sys   1m45.868s

Example 2: Comparison of Fits

This example shows how to perform a comparison between two fits, that is, how to perform a vp-comparefits analysis using the parallel implementation. Note that this example is computationally more expensive, so it is recommended to run it on a computer with large memory availability.

Once a dask-scheduler has been initialised we assign to it the following workers

$ dask-worker <scheduler address> --nworkers 15 --nthreads 1 --memory-limit '13 GiB'

As a toy example we then compare the NNPDF40_nnlo_as_01180_1000 fit to itself:

$ vp-comparefits NNPDF40_nnlo_as_01180_1000 NNPDF40_nnlo_as_01180_1000 --title example --author mnc --keywords example --parallel --scheduler <scheduler address>

The time needed for this task on a computer with the following attributes

=========================================================================
Ubuntu 20.04.6 LTS (focal) in DAMTP
Host: zprime, Group: HEP, Kernel: Linux 5.4
Memory: 515890M, Swap: 16383M
Arch: x86_64, AMD EPYC 7453 28-Core Processor [28 cores]
Make: Giga Computing, Model: R182-Z91-00 Rack Mount Chassis
=========================================================================

is:

real  5m21.546s
user  0m17.064s
sys   0m4.401s

The time needed on the same machine when running the job sequentially is

real  30m22.245s
user  57m9.356s
sys   15m40.624s

Using dask without a Scheduler

It is possible to run validphys scripts without having to explicitly initialise a dask scheduler by simply adding a --parallel flag to the task:

validphys <name script> --parallel

this method, however, should not be used for analyses that are computationally more expensive than plot_pdfs.yaml since the default memory limit that is assigned to each worker could potentially not be enough to carry out the task.

How to run an analysis in parallel

Example 1: PDF plots, validphys2/examples/plot_pdfs.yaml

Example 2: Comparison of Fits

Using dask without a Scheduler

Example 1: PDF plots, `validphys2/examples/plot_pdfs.yaml`