How to run an analysis in parallel
==================================
In this tutorial, we will demonstrate how to use the parallel resource executor
of reportengine to run a ``validphys`` analysis (or an analysis of any other ``reportengine`` app).
Typically, when running a ``validphys`` script, ``reportengine`` creates a directed acyclic
graph (DAG) that is executed sequentially, meaning that each node must wait for the previous
node to complete before it can be evaluated. This approach is inefficient,
especially when the nodes are independent of each other. The parallel execution of a
reportengine task is based on ``dask.distributed`` (`dask-distributed <https://distributed.dask.org/>`_).
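The steps below require the ``dask`` and ``distributed`` packages to be available
in the environment. If they are not already provided by your ``validphys``
installation, they can be obtained with, e.g.:

.. code:: bash

    $ pip install dask distributed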
The main steps to follow when running a task in parallel are:
1. Initialize a ``dask-scheduler`` (this can be done, for instance, in a separate
   terminal or ``screen`` session opened from the command line):
.. code:: bash
$ dask-scheduler &
Note that the above command should output the scheduler address (e.g. ``Scheduler at: tcp://172.24.142.17:8786``).
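If you prefer to keep the scheduler running in the background of a detached
session, a minimal sketch is the following (assuming the ``screen`` utility is
installed; the session name ``scheduler`` is an arbitrary choice):

.. code:: bash

    # Start a detached screen session running dask-scheduler on the
    # default port 8786, so that the scheduler address is predictable.
    $ screen -dmS scheduler dask-scheduler --port 8786

    # Reattach later to inspect the scheduler logs.
    $ screen -r scheduler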
2. Assign some workers to the scheduler and specify the amount of memory each of them can use. For example:
.. code:: bash
    $ dask-worker tcp://172.24.142.17:8786 --nworkers 10 --nthreads 1 --memory-limit "14 GiB" &
Note that in the above example we also fixed the number of threads available to each worker
to 1; this is important in order to avoid possible race conditions triggered by the matplotlib
library.
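When choosing the memory limit, make sure that the total (number of workers
times the per-worker limit) fits within the RAM of the machine. As a minimal
sketch, on Linux one could split roughly 80% of the total memory evenly among
the workers (the 80% head-room factor and the worker count below are arbitrary
choices, not values required by dask):

.. code:: bash

    # Read the total memory in GiB from /proc/meminfo, keep ~20% head room,
    # and divide the rest evenly among the workers.
    NWORKERS=10
    TOTAL_GIB=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
    PER_WORKER_GIB=$(( TOTAL_GIB * 8 / 10 / NWORKERS ))

    # Attach the workers to the scheduler started in step 1.
    dask-worker tcp://172.24.142.17:8786 --nworkers "$NWORKERS" --nthreads 1 \
        --memory-limit "${PER_WORKER_GIB} GiB" &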
3. Run the process using the correct flags:
.. code:: bash
    $ validphys <runcard>.yaml --parallel --scheduler tcp://172.24.142.17:8786
Running the process this way should output a client dashboard link (e.g.
http://172.24.142.17:8787/status) from which the status of the job can be
monitored. Note that for this to work you will need the ``bokeh`` package
with version ``>=2.4.3``, which can be installed with, e.g., ``pip install bokeh==2.4.3``.
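To check which version of ``bokeh`` (if any) is installed in the current
environment, one can run:

.. code:: bash

    $ python -c "import bokeh; print(bokeh.__version__)"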
The main thing to take care of when running a process in parallel is point 2 and, in particular,
the amount of memory assigned to each worker. In the following we give some
explicit examples of the memory usage of some standard validphys scripts.
Example 1: PDF plots, ``validphys2/examples/plot_pdfs.yaml``
-------------------------------------------------------------
Suppose we have the following runcard:
.. code:: yaml
    meta:
        title: PDF plot example
        author: Rosalyn Pearson
        keywords: [parallel, example]

    pdfs:
        - {id: "NNPDF40_nlo_as_01180", label: "4.0 NLO"}
        - {id: "NNPDF40_nnlo_lowprecision", label: "4.0 NNLO low precision"}
        - {id: "NNPDF40_nnlo_as_01180", label: "4.0 NNLO"}

    pdfs_noband: ["NNPDF40_nnlo_as_01180"] # Equivalently [3]

    show_mc_errors: True

    Q: 10

    PDFnormalize:
        - normtitle: Absolute # normalize_to default is None
        - normalize_to: 1 # Specify index in list of PDFs or name of PDF
          normtitle: Ratio

    Basespecs:
        - basis: flavour
          basistitle: Flavour basis
        - basis: evolution
          basistitle: Evolution basis

    PDFscalespecs:
        - xscale: log
          xscaletitle: Log
        - xscale: linear
          xscaletitle: Linear

    template_text: |
        {@with PDFscalespecs@}
        {@xscaletitle@} scale
        =====================
        {@with Basespecs@}
        {@basistitle@}
        -------------
        {@with PDFnormalize@}
        {@normtitle@}
        {@plot_pdfs@}
        {@plot_pdf_uncertainties@}
        {@plot_pdfreplicas@}
        {@endwith@}
        {@endwith@}
        {@endwith@}

    actions_:
        - report(main=True)
As an example, we can run the above job by assigning 5 workers to the dask scheduler,
each of which has access to 5 GiB of memory, for a total of 25 GiB:
.. code:: bash
$ dask-worker tcp://172.24.142.17:8786 --nworkers 5 --nthreads 1 --memory-limit "5 GiB"
We then run the task as
.. code:: bash
$ validphys plot_pdfs.yaml --parallel --scheduler tcp://172.24.142.17:8786
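The timings quoted below are in the format produced by the shell's ``time``
builtin; timings of this kind can be obtained by prefixing the command with
``time``:

.. code:: bash

    $ time validphys plot_pdfs.yaml --parallel --scheduler tcp://172.24.142.17:8786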
The time needed for this task (on a machine with 8 cores and 32 GiB of RAM) is
.. code:: console
real 0m43.464s
user 0m2.419s
sys 0m0.607s
as compared to the sequential execution, which gives
.. code:: console
real 2m0.531s
user 8m20.506s
sys 1m45.868s
Example 2: Comparison of Fits
-----------------------------
This example shows how to perform a comparison between two fits,
that is, how to run a ``vp-comparefits`` analysis using the
parallel implementation. Note that this example is computationally more
expensive, so it is recommended to run it on a machine with a large amount of memory available.
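Before assigning the workers, it is worth checking how much memory is actually
available on the machine; on Linux this can be done, for instance, with:

.. code:: bash

    # Report total, used and available memory in units of GiB.
    $ free -g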
Once a ``dask-scheduler`` has been initialised, we assign the following workers to it:
.. code:: bash
    $ dask-worker tcp://172.24.142.17:8786 --nworkers 15 --nthreads 1 --memory-limit '13 GiB'
As a toy example, we then compare the ``NNPDF40_nnlo_as_01180_1000`` fit to itself:
.. code:: bash
    $ vp-comparefits NNPDF40_nnlo_as_01180_1000 NNPDF40_nnlo_as_01180_1000 --title example --author mnc --keywords example --parallel --scheduler tcp://172.24.142.17:8786
The time needed for this task on a computer with the following attributes
.. code::
=========================================================================
Ubuntu 20.04.6 LTS (focal) in DAMTP
Host: zprime, Group: HEP, Kernel: Linux 5.4
Memory: 515890M, Swap: 16383M
Arch: x86_64, AMD EPYC 7453 28-Core Processor [28 cores]
Make: Giga Computing, Model: R182-Z91-00 Rack Mount Chassis
=========================================================================
is:
.. code::
real 5m21.546s
user 0m17.064s
sys 0m4.401s
The time needed on the same machine when running the job sequentially is
.. code::
real 30m22.245s
user 57m9.356s
sys 15m40.624s
Using dask without a Scheduler
------------------------------
It is possible to run validphys scripts without having to explicitly
initialise a dask scheduler, by simply adding the ``--parallel`` flag to the task:

.. code:: bash

    $ validphys <runcard>.yaml --parallel
This method, however, should not be used for analyses that are computationally
more expensive than ``plot_pdfs.yaml``, since the default memory limit
assigned to each worker may not be enough to carry out the task.
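For instance, the PDF plot example of the first section is light enough to be
run this way:

.. code:: bash

    $ validphys plot_pdfs.yaml --parallel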