How to run a PDF fit

The user should perform the steps documented below in order to obtain a complete PDF fit using the latest release of the NNPDF fitting code: n3fit. The fitting methodology is detailed in the [Methodology](methodology) page.

Preparing a fit runcard

The runcard is written in YAML. The runcard is the unique identifier of a fit and contains all required information to perform a fit, which includes the experimental data, the theory setup and the fitting setup.

A detailed explanation of the parameters accepted by the n3fit runcards can be found in the detailed guide.

Newcomers are advised to start from an existing runcard: example runcards (and the runcards used in NNPDF releases) are available at n3fit/runcards. The runcards are mostly self-explanatory; see for instance the example below of the parameters dictionary that defines the Machine Learning framework.

# runcard example
...
parameters:
  nodes_per_layer: [15, 10, 8]
  activation_per_layer: ['sigmoid', 'sigmoid', 'linear']
  initializer: 'glorot_normal'
  optimizer:
    optimizer_name: 'RMSprop'
    learning_rate: 0.01
    clipnorm: 1.0
  epochs: 900
  positivity:
    multiplier: 1.05
    threshold: 1e-5
  stopping_patience: 0.30 # Ratio of the number of epochs
  layer_type: 'dense'
  dropout: 0.0
...

The runcard system is designed such that the user can utilize the program without having to tinker with the codebase. One can simply modify the options in parameters to specify the desired architecture of the Neural Network as well as the settings for the optimization algorithm.
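To make the stopping_patience comment in the example concrete, the following minimal sketch converts the ratio into a number of epochs and checks that every layer has an activation function. The helper names are hypothetical and not part of n3fit, and the rounding rule is an assumption; the detailed guide remains the reference for how n3fit interprets these values.

```python
# Values copied from the example runcard above.
parameters = {
    "nodes_per_layer": [15, 10, 8],
    "activation_per_layer": ["sigmoid", "sigmoid", "linear"],
    "epochs": 900,
    "stopping_patience": 0.30,  # ratio of the number of epochs
}


def patience_in_epochs(params):
    """Hypothetical helper: express the patience ratio as a number of epochs."""
    # Assumption: the ratio simply multiplies the total number of epochs.
    return round(params["stopping_patience"] * params["epochs"])


def layers_are_consistent(params):
    """Hypothetical sanity check: one activation function per layer."""
    return len(params["nodes_per_layer"]) == len(params["activation_per_layer"])


print(patience_in_epochs(parameters))
print(layers_are_consistent(parameters))
```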

An important feature of n3fit is the ability to perform hyperparameter scans. For this we have also introduced a hyperscan_config key, which specifies the trial ranges for the hyperparameter scan procedure. See the following self-explanatory example:

hyperscan_config:
    stopping: # setup for stopping scan
        min_epochs: 5e2  # minimum number of epochs
        max_epochs: 40e2 # maximum number of epochs
        min_patience: 0.10 # minimum stop patience
        max_patience: 0.40 # maximum stop patience
    positivity: # setup for the positivity scan
        min_multiplier: 1.04 # minimum lagrange multiplier coeff.
        max_multiplier: 1.1 # maximum lagrange multiplier coeff.
        min_initial: 1.0 # minimum initial penalty
        max_initial: 5.0 # maximum initial penalty
    optimizer: # setup for the optimizer scan
        - optimizer_name: 'Adadelta'
          learning_rate:
            min: 0.5
            max: 1.5
        - optimizer_name: 'Adam'
          learning_rate:
            min: 0.5
            max: 1.5
    architecture: # setup for the architecture scan
        initializers: 'ALL' # Use all implemented initializers from keras
        max_drop: 0.15 # maximum dropout probability
        n_layers: [2,3,4] # number of layers
        min_units: 5 # minimum number of nodes
        max_units: 50 # maximum number of nodes
        activations: ['sigmoid', 'tanh'] # list of activation functions
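To illustrate what "trial ranges" means, the sketch below draws one candidate configuration from a subset of the ranges in the example. It uses plain uniform draws from Python's random module purely for illustration; this is not how n3fit samples its trials, and draw_trial is a hypothetical name.

```python
import random

# Subset of the ranges from the hyperscan_config example above.
scan = {
    "stopping": {
        "min_epochs": 5e2,
        "max_epochs": 40e2,
        "min_patience": 0.10,
        "max_patience": 0.40,
    },
    "architecture": {
        "max_drop": 0.15,
        "n_layers": [2, 3, 4],
        "min_units": 5,
        "max_units": 50,
        "activations": ["sigmoid", "tanh"],
    },
}


def draw_trial(scan, rng):
    """Hypothetical helper: pick one point inside the scan ranges."""
    stop = scan["stopping"]
    arch = scan["architecture"]
    n_layers = rng.choice(arch["n_layers"])
    return {
        "epochs": rng.randint(int(stop["min_epochs"]), int(stop["max_epochs"])),
        "stopping_patience": rng.uniform(stop["min_patience"], stop["max_patience"]),
        "nodes_per_layer": [
            rng.randint(arch["min_units"], arch["max_units"]) for _ in range(n_layers)
        ],
        "activation": rng.choice(arch["activations"]),
        "dropout": rng.uniform(0.0, arch["max_drop"]),
    }


trial = draw_trial(scan, random.Random(0))
print(trial)
```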

It is also possible to take the configuration of the hyperparameter scan from a previous run in the NNPDF server by using the key from_hyperscan:

hyperscan_config:
  from_hyperscan: 'some_previous_hyperscan'

or to directly take the trials from said hyperscan:

hyperscan_config:
  use_tries_from: 'some_previous_hyperscan'

Running the fitting code

After successfully installing the n3fit package and preparing a runcard following the points presented above, you can proceed with a fit.

  1. Prepare the fit: vp-setupfit runcard.yml. This command will generate a folder with the same name as the runcard (minus the file extension) in the current directory, which will contain a copy of the original YAML runcard. The required resources (such as the theory and t0 PDF set) will be downloaded automatically. Alternatively they can be obtained with the vp-get tool.

  2. The n3fit program takes a runcard.yml and a replica number as input, e.g. n3fit runcard.yml replica, where replica runs from 1 to n and n is the number of replicas to be launched. Note that if you want, for example, a 100-replica fit you should launch more than 100 replicas (e.g. 130), because not all of the replicas will pass the checks in postfit (see here for more info).

  3. Wait until you have fit results. Then run the evolven3fit program once to evolve all replicas using DGLAP. The arguments are evolven3fit evolve runcard_folder.

  4. Wait until you have results, then use postfit number_of_replicas runcard_folder to finalize the PDF set by applying post-selection criteria. Here number_of_replicas should be the number of replicas you want in the final fit (100 in the above example); the resulting set contains number_of_replicas + 1 replicas. Note that the standard behaviour of postfit can be modified by using various flags. More information can be found at Processing a fit.
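The four steps above can be sketched as a list of command lines, which is convenient when preparing job scripts. This is only an illustration: fit_commands is a hypothetical helper, the commands are built but not executed, and the folder name is derived under the assumption that the runcard sits in the current directory.

```python
from pathlib import Path


def fit_commands(runcard, n_final, n_launch):
    """Hypothetical helper: list the commands for a full fit, in order."""
    # vp-setupfit creates a folder named after the runcard, minus the extension.
    folder = Path(runcard).stem
    commands = [["vp-setupfit", runcard]]
    # One n3fit call per replica, numbered from 1 to n_launch.
    commands += [["n3fit", runcard, str(rep)] for rep in range(1, n_launch + 1)]
    # Evolution is run once for all replicas, then postfit selects n_final of them.
    commands.append(["evolven3fit", "evolve", folder])
    commands.append(["postfit", str(n_final), folder])
    return commands


# Launch 130 replicas to end up with a 100-replica fit, as in the example above.
commands = fit_commands("runcard.yml", n_final=100, n_launch=130)
for cmd in commands[:2] + commands[-2:]:
    print(" ".join(cmd))
```

Each entry can then be passed to subprocess.run(cmd, check=True), or written to a batch script so that the n3fit calls run in parallel on a cluster.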

It is possible to run more than one replica in one single run of n3fit by using the --replica_range option. Running n3fit in this way increases the memory usage as all replicas need to be stored in memory but decreases disk load as the reading of the datasets and fktables is only done once for all replicas.

If you are planning to perform a hyperparameter scan, follow exactly the same steps and add the --hyperopt number_of_trials argument to n3fit, where number_of_trials is the maximum number of trials allowed for the scan. Usually, when running a hyperparameter scan, we switch off the MC replica generation, so that different replicas correspond to different initial points for the scan; this approach provides faster results. We provide the vp-hyperoptplot script to analyse the output of the hyperparameter scan.

Output of the fit

Every time a replica is finalized, the output is saved to the `runcard/nnfit/replica_$replica` folder, which contains a number of files:

  • chi2exps.log: a json log file with the χ² of the training every 100 epochs.

  • runcard.exportgrid: a file containing the PDF grid.

  • runcard.json: Includes information about the fit (metadata, parameters, times) in json format.
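Before moving on to the evolution step it can be useful to check which replicas have written all of their output. The sketch below assumes only the folder layout and file names listed above; replica_report is a hypothetical helper, and the demonstration runs on a throwaway directory rather than on a real fit.

```python
import json
import tempfile
from pathlib import Path


def replica_report(fit_folder):
    """Hypothetical helper: map each replica number to its missing files."""
    fit_folder = Path(fit_folder)
    name = fit_folder.name
    expected = ["chi2exps.log", f"{name}.exportgrid", f"{name}.json"]
    report = {}
    for rep_dir in sorted((fit_folder / "nnfit").glob("replica_*")):
        number = int(rep_dir.name.split("_")[1])
        report[number] = [f for f in expected if not (rep_dir / f).is_file()]
    return report


# Demonstration on a temporary directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    rep = Path(tmp) / "runcard" / "nnfit" / "replica_1"
    rep.mkdir(parents=True)
    (rep / "chi2exps.log").write_text(json.dumps({}))
    (rep / "runcard.json").write_text(json.dumps({}))
    # The exportgrid file is deliberately left out, so it is reported as missing.
    demo_report = replica_report(Path(tmp) / "runcard")

print(demo_report)
```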

Note

The reported χ² always refers to the actual χ², i.e., without positivity loss or other penalty terms.

Upload and analyse the fit

After obtaining the fit you can proceed with the fit upload and analysis by:

  1. Uploading the results using vp-upload runcard_folder, then installing the fitted set with vp-get fit fit_name.

  2. Analysing the results with validphys, see the vp-guide. Consider using the vp-comparefits tool.

Performance of the fit

The n3fit framework is currently based on Keras and is tested to run with the Tensorflow and pytorch backends. This also means that anything that makes either of these packages faster will also make n3fit faster. Note that, at the time of writing, TensorFlow is approximately 4 times faster than pytorch.

The default backend for keras is tensorflow. To change the backend, the environment variable KERAS_BACKEND needs to be set (e.g., KERAS_BACKEND=torch).

The best results are obtained with tensorflow[and-cuda] installed from pip. When you install the nnpdf conda package, you get the tensorflow-eigen package, which is not the default. This is due to a memory explosion found in some of the conda mkl builds.

If you want to disable MKL without installing tensorflow-eigen you can always set the environment variable TF_DISABLE_MKL=1 before running n3fit. When running n3fit all versions of the package show similar performance.

When using the MKL version of tensorflow you gain more control of the way Tensorflow will use the multithreading capabilities of the machine by using the following environment variables:

KMP_BLOCKTIME=0
KMP_AFFINITY=granularity=fine,verbose,compact,1,0

These are the best values found for n3fit when using the MKL version of Tensorflow from conda. They were determined for TF 2.1, for which the default values were suboptimal. For a more detailed explanation on the effects of KMP_AFFINITY on the performance of the code please see here.

By default, n3fit will try to use as many cores as possible, but this behaviour can be overridden from the runcard with the maxcores parameter. In our tests the point of diminishing returns is found at maxcores=4.

Note that everything stated above is machine dependent, so the best parameters for you might be very different. When testing, it is useful to set the environment variable KMP_SETTINGS to 1 to obtain detailed information about the current variables being used by OpenMP.
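When launching n3fit from a script, the environment variables discussed in this section can be collected in one place. The following is a minimal sketch: n3fit_environment is a hypothetical helper, not part of the package, and the values are simply those quoted above, which may not be optimal on your machine.

```python
import os

# Values quoted above for the MKL build of Tensorflow.
MKL_TUNING = {
    "KMP_BLOCKTIME": "0",
    "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
}


def n3fit_environment(base=None, backend=None, mkl_tuning=True, show_settings=False):
    """Hypothetical helper: build the environment for an n3fit subprocess."""
    env = dict(os.environ if base is None else base)
    if backend is not None:
        env["KERAS_BACKEND"] = backend  # e.g. "tensorflow" or "torch"
    if mkl_tuning:
        env.update(MKL_TUNING)
    if show_settings:
        env["KMP_SETTINGS"] = "1"  # ask OpenMP to report the variables in use
    return env


# An empty base is used here only to keep the printed output short.
env = n3fit_environment(base={}, backend="tensorflow", show_settings=True)
print(env)
```

The resulting dictionary can be passed as the env argument of subprocess.run when calling n3fit; in that case leave base unset so that the rest of the current environment, including PATH, is inherited.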

Below we present a benchmark that has been run for the Global NNPDF 3.1 case, as found in the example runcards folder.

Settings of the benchmark:
Hardware:
  • Intel(R) Core(TM) i7-6700 CPU @ 4.00GHz

  • 16 GB RAM 3000 MHz DDR4

Timing for a fit:
  • Walltime: 397s

  • CPUtime: 1729s
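As a rough reading of these timings, the ratio of CPU time to wall time estimates how many cores were busy on average during the fit. This is a back-of-the-envelope interpretation, not part of the benchmark itself, and the variable names are illustrative.

```python
# Timings quoted in the benchmark above, in seconds.
walltime_s = 397
cputime_s = 1729

# Average number of cores kept busy over the whole run.
effective_cores = cputime_s / walltime_s
print(f"Average busy cores: {effective_cores:.1f}")
```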

Iterate the fit

It may be desirable to iterate a fit to achieve a higher degree of convergence/stability in the fit. To read more about this, see How to run an iterated fit.

QED fit

In order to run a QED fit see How to run a QED fit.