How to run a PDF fit
The user should perform the steps documented below in order to obtain a complete
PDF fit using the latest release of the NNPDF fitting code: n3fit
.
The fitting methodology is detailed in the [Methodology](methodology) page.
Preparing a fit runcard
The runcard is written in YAML. The runcard is the unique identifier of a fit and contains all required information to perform a fit, which includes the experimental data, the theory setup and the fitting setup.
A detailed explanation on the parameters accepted by the n3fit
runcards
can be found in the detailed guide.
For newcomers, it is recommended to start from an already existing runcard,
example runcards (and runcard used in NNPDF releases) are available at
n3fit/runcards.
The runcards are mostly self explanatory, see for instance below an
example of the parameter
dictionary that defines the Machine Learning framework.
# runcard example
...
parameters:
nodes_per_layer: [15, 10, 8]
activation_per_layer: ['sigmoid', 'sigmoid', 'linear']
initializer: 'glorot_normal'
optimizer:
optimizer_name: 'RMSprop'
learning_rate: 0.01
clipnorm: 1.0
epochs: 900
positivity:
multiplier: 1.05
threshold: 1e-5
stopping_patience: 0.30 # Ratio of the number of epochs
layer_type: 'dense'
dropout: 0.0
...
The runcard system is designed such that the user can utilize the program
without having to tinker with the codebase.
One can simply modify the options in parameters
to specify the
desired architecture of the Neural Network as well as the settings for the optimization algorithm.
An important feature of n3fit
is the ability to perform hyperparameter scans,
for this we have also introduced a hyperscan_config
key which specifies
the trial ranges for the hyperparameter scan procedure.
See the following self-explanatory example:
hyperscan_config:
stopping: # setup for stopping scan
min_epochs: 5e2 # minimum number of epochs
max_epochs: 40e2 # maximum number of epochs
min_patience: 0.10 # minimum stop patience
max_patience: 0.40 # maximum stop patience
positivity: # setup for the positivity scan
min_multiplier: 1.04 # minimum lagrange multiplier coeff.
max_multiplier: 1.1 # maximum lagrange multiplier coeff.
min_initial: 1.0 # minimum initial penalty
max_initial: 5.0 # maximum initial penalty
optimizer: # setup for the optimizer scan
- optimizer_name: 'Adadelta'
learning_rate:
min: 0.5
max: 1.5
- optimizer_name: 'Adam'
learning_rate:
min: 0.5
max: 1.5
architecture: # setup for the architecture scan
initializers: 'ALL' # Use all implemented initializers from keras
max_drop: 0.15 # maximum dropout probability
n_layers: [2,3,4] # number of layers
min_units: 5 # minimum number of nodes
max_units: 50 # maximum number of nodes
activations: ['sigmoid', 'tanh'] # list of activation functions
It is also possible to take the configuration of the hyperparameter scan from a previous
run in the NNPDF server by using the key from_hyperscan
:
hyperscan_config:
from_hyperscan: 'some_previous_hyperscan'
or to directly take the trials from said hyperscan:
hyperscan_config:
use_tries_from: 'some_previous_hyperscan'
Running the fitting code
After successfully installing the n3fit
package and preparing a runcard
following the points presented above you can proceed with a fit.
Prepare the fit:
vp-setupfit runcard.yml
. This command will generate a folder with the same name as the runcard (minus the file extension) in the current directory, which will contain a copy of the original YAML runcard. The required resources (such as the theory and t0 PDF set) will be downloaded automatically. Alternatively they can be obtained with thevp-get
tool.The
n3fit
program takes aruncard.yml
as input and a replica number, e.g.n3fit runcard.yml replica
wherereplica
goes from 1-n where n is the maximum number of desired replicas. Note that if you desire, for example, a 100 replica fit you should launch more than 100 replicas (e.g. 130) because not all of the replicas will pass the checks inpostfit
(see here for more info).Wait until you have fit results. Then run the
evolven3fit
program once to evolve all replicas using DGLAP. The arguments areevolven3fit evolve runcard_folder
.Wait until you have results, then use
postfit number_of_replicas runcard_folder
to finalize the PDF set by applying post selection criteria. This will produce a set ofnumber_of_replicas + 1
replicas. This time the number of replicas should be that which you desire in the final fit (100 in the above example). Note that the standard behaviour ofpostfit
can be modified by using various flags. More information can be found at Processing a fit.
It is possible to run more than one replica in one single run of n3fit
by
using the --replica_range
option. Running n3fit
in this way increases the
memory usage as all replicas need to be stored in memory but decreases disk load
as the reading of the datasets and fktables is only done once for all replicas.
If you are planning to perform a hyperparameter scan just perform exactly the
same steps by adding the --hyperopt number_of_trials
argument to n3fit
,
where number_of_trials
is the maximum allowed value of trials required by the
fit. Usually when running hyperparameter scan we switch-off the MC replica
generation so different replicas will correspond to different initial points for
the scan, this approach provides faster results. We provide the vp-hyperoptplot
script to analyse the output of the hyperparameter scan.
Output of the fit
Every time a replica is finalized, the output is saved to the `runcard/nnfit/replica_$replica`
folder, which contains a number of files:
chi2exps.log
: a json log file with the χ² of the training every 100 epochs.runcard.exportgrid
: a file containing the PDF grid.runcard.json
: Includes information about the fit (metadata, parameters, times) in json format.
Note
The reported χ² refers always to the actual χ², i.e., without positivity loss or other penalty terms.
Upload and analyse the fit
After obtaining the fit you can proceed with the fit upload and analisis by:
Uploading the results using
vp-upload runcard_folder
then install the fitted set withvp-get fit fit_name
.Analysing the results with
validphys
, see the vp-guide. Consider using thevp-comparefits
tool.
Performance of the fit
The n3fit
framework is currently based on Keras
and it is tested to run with the Tensorflow
and pytorch backends.
This also means that anything that make any of these packages faster will also
make n3fit
faster.
Note that at the time of writing, TensorFlow
is approximately 4 times faster than pytorch
.
The default backend for keras
is tensorflow
.
In order to change the backend, the environment variable KERAS_BACKENDD
need to be set (e.g., KERAS_BACKEND=torch
).
The best results are obtained with tensorflow[and-cuda]
installed from pip.
When you install the nnpdf conda package, you get the
tensorflow-eigen package,
which is not the default. This is due to a memory explosion found in some of
the conda mkl builds.
If you want to disable MKL without installing tensorflow-eigen
you can always
set the environment variable TF_DISABLE_MKL=1
before running n3fit
.
When running n3fit
all versions of the package show similar performance.
When using the MKL version of tensorflow you gain more control of the way Tensorflow will use the multithreading capabilities of the machine by using the following environment variables:
KMP_BLOCKTIME=0
KMP_AFFINITY=granularity=fine,verbose,compact,1,0
These are the best values found for n3fit
when using the mkl version of Tensorflow from conda
and were found for TF 2.1 as the default values were suboptimal.
For a more detailed explanation on the effects of KMP_AFFINITY
on the performance of
the code please see
here.
By default, n3fit
will try to use as many cores as possible, but this behaviour can be overriden
from the runcard with the maxcores
parameter. In our tests the point of diminishing returns is found
at maxcores=4
.
Note that everything stated above is machine dependent so the best parameters for you might be
very different. When testing, it is useful to set the environmental variable KMP_SETTINGS
to 1
to obtain detailed information about the current variables being used by OpenMP.
Below we present a benchmark that have been run for the Global NNPDF 3.1 case, as found in the example runcards folder.
- Settings of the benchmark:
TF version: tensorflow-eigen from conda, TF 2.2
NNPDF commit: f878fc95a4f32e8c3b4c454fc12d438cbb87ea80
Number of epochs: 5000
maxcores: 4
no early stopping
- Hardware:
Intel(R) Core(TM) i7-6700 CPU @ 4.00GHz
16 GB RAM 3000 MHz DDR4
- Timing for a fit:
Walltime: 397s
CPUtime: 1729s
Iterate the fit
It may be desirable to iterate a fit to achieve a higher degree of convergence/stability in the fit. To read more about this, see How to run an iterated fit.
QED fit
In order to run a QED fit see How to run a QED fit.