n3fit runcard detailed guide
In this section we give a more fine-grained explanation of the different parameters that enter the runcard.
Dataset selection
The first thing one finds when building a fit runcard for nnpdf is the dataset selection, dataset_inputs.
dataset_inputs:
- { dataset: SLAC_NC_NOTFIXED_P_EM-F2, frac: 0.5, variant: legacy_dw}
- { dataset: NMC_NC_NOTFIXED_EM-F2, frac: 0.5, variant: legacy_dw }
- { dataset: ATLAS_Z0J_8TEV_PT-M, frac: 0.75, variant: legacy_10}
The dataset_inputs key contains a list of dictionaries defining the datasets to be used in the fit as well as their options (which are detailed in DataSetSpec - Core dataset object).
Training / Validation split
The fraction of events that are considered for the training and validation sets is defined by the frac key in each dataset_inputs entry of the nnpdf runcard. A fraction of X means that X of the events will go into the training set while 1-X will enter the validation set for that dataset.
dataset_inputs:
- { dataset: SLAC_NC_NOTFIXED_P_EM-F2, frac: 0.75, variant: legacy_dw}
It is possible to run a fit with no validation set by setting the fraction to 1.0; in this case the training set will also be used as the validation set.
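For example (a sketch reusing the dataset above), an entry with no validation split would read:
dataset_inputs:
- { dataset: SLAC_NC_NOTFIXED_P_EM-F2, frac: 1.0, variant: legacy_dw }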
The random seed for the training/validation split is defined by the variable trvlseed. By default the seed is further modified by the replica index, but it is possible to fix it such that it is the same for all replicas with same_trvl_per_replica (false by default).
trvlseed: 7
same_trvl_per_replica: true
Preprocessing
The behaviour of the preprocessing in the n3fit code is controlled, as in the old nnfit code, through the fitting:basis parameter of the nnpdf runcard.
The preprocessing factor applied to every flavour of the basis is \(x^{1-\alpha_{fl}} (1-x)^{\beta_{fl}}\).
This parameter accepts a list of the size of the chosen basis with a number of parameters for each flavour. The parameters used in n3fit are:
- fl: name of the flavour; this name will be used to define the names of the weights alpha_{fl} and beta_{fl}.
- smallx: range of the alpha exponent.
- largex: range of the beta exponent.
- trainable: sets the flavour basis to be trainable or not, defaults to True.
Setting the trainable flag to False is equivalent to recovering the old behaviour of nnfit.
fitting:
  basis:
    # smallx, largex: preprocessing ranges
    - { fl: sng, smallx: [1.05,1.19], largex: [1.47,2.70], trainable: False }
    - { fl: g, smallx: [0.94,1.25], largex: [0.11,5.87], trainable: False }
    - { fl: v, smallx: [0.54,0.75], largex: [1.15,2.76], trainable: False }
    - { fl: v3, smallx: [0.21,0.57], largex: [1.35,3.08] }
    - { fl: v8, smallx: [0.52,0.76], largex: [0.77,3.56], trainable: True }
    - { fl: t3, smallx: [-0.37,1.52], largex: [1.74,3.39] }
    - { fl: t8, smallx: [0.56,1.29], largex: [1.45,3.03] }
    - { fl: cp, smallx: [0.12,1.19], largex: [1.83,6.70] }
It is important to determine the correct values for the largex and smallx preprocessing ranges. For example, setting a poor range for those parameters can result in a conflict with the positivity or integrability constraints, making it such that no replicas can satisfy those constraints. In most cases when changes are made to a runcard, they will have a relatively small effect on the required preprocessing ranges. This includes common variations to runcards such as changing the datasets, or settings related to the training of the neural network. In these cases running an iterated fit is likely the easiest way to obtain satisfactory preprocessing ranges. However, in some cases, such as for example a change of PDF basis where the preprocessing ranges take on an entirely different meaning, we don't know what a good starting point for the ranges would be. One way to identify good ranges is by opening up the smallx and largex parameters to wide ranges and setting trainable: True. This way the preprocessing exponents will be considered part of the free parameters of the model, and as such they will be fitted by the optimization algorithm.
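As an illustrative sketch (the ranges below are placeholders rather than tuned values), a single flavour could be opened up and made trainable as:
fitting:
  basis:
    - { fl: g, smallx: [-1.0, 2.0], largex: [0.0, 10.0], trainable: True }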
NNPDF4.0 fits are run with trainable: False, because trainable preprocessing exponents can lead to an underestimation of the PDF uncertainties in the extrapolation domain. So after determining a reasonable range for the preprocessing exponents, a new runcard should be generated using vp-nextfitruncard as explained in How to run an iterated fit. In this runcard one should then manually set trainable: False for all preprocessing exponents before running the iterated fit.
It can take more than one iteration before the iterated fits have converged to stable values for the
preprocessing ranges.
Note that the script vp-nextfitruncard automatically enforces some constraints on the preprocessing ranges, which are required for the integrability of certain flavours. Specifically, it clips the maximum value of the small-x exponent to \(\alpha \leq 1\) for the valence PDFs and the triplets T3 and T8. More details on those limits, and how to disable them, can be found by running
$ vp-nextfitruncard --help
More information on vp-nextfitruncard can be found in How to run an iterated fit.
Network Architecture
There are different network architectures implemented in n3fit, which can be selected by changing the parameters:layer_type parameter in the runcard. All layer types accept the nodes_per_layer, activation_per_layer and initializer parameters.
parameters:
  nodes_per_layer: [5, 3, 8]
  activation_per_layer: ['tanh', 'tanh', 'linear']
  layer_type: 'dense_per_flavour'
  initializer: 'glorot_normal'
One single network (layer_type: dense):
Extra accepted parameters:
- dropout: float, see keras dropout
- regularizer: str
- regularizer_args: dict, arguments for the chosen regularizer
In this mode all nodes are connected with all nodes of the next layer. There is one single network which takes as input the value of x (and log(x)) and outputs all the different flavours. In this case the nodes_per_layer parameter represents the nodes each one of these layers has. For instance [40, 20, 8] corresponds to a network where the first layer is a matrix (2x40) (the input being x, log(x)), the second layer is a matrix (40x20) and the third and final one (20x8).
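As an illustrative sketch (the values are placeholders, and dropout is assumed to be accepted alongside the other keys listed above), a dense architecture matching this example could read:
parameters:
  nodes_per_layer: [40, 20, 8]
  activation_per_layer: ['tanh', 'tanh', 'linear']
  layer_type: 'dense'
  initializer: 'glorot_normal'
  dropout: 0.1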
One network per flavour (layer_type: dense_per_flavour):
This mode is designed to reproduce the methodology used by NNPDF before 3.1, where each flavour has a separate identical network.
In this case the nodes_per_layer parameter represents the nodes each layer of each flavour has. For instance [5, 3, 8] means that the first step is a list of 8 layers of shape (2x5), while the second layer is again a list that matches the previous one (i.e., 8 layers) with layers of shape (5x3). The last layer has two tasks: the output of each flavour should be a single element (i.e., 8 layers of shape (3x1)), and these outputs are then concatenated so that the final output of the neural network is an 8-element tensor. A report comparing the dense and dense_per_flavour architectures can be found here.
Optimizer
One of the most important parameters defining the training of the Neural Network is the choice of optimizer (and its corresponding options).
parameters:
  optimizer:
    optimizer_name: 'Adadelta'
    learning_rate: 1.0
    clipnorm: 1.0
The full list of optimizers accepted by n3fit and their arguments can be checked in the MetaModel file.
Positivity
In n3fit the behavior of the positivity observables has changed with respect to nnfit. In nnfit the loss due to the positivity observable was multiplied by a maxlambda for each observable, defined in the runcard as:
positivity:
  posdatasets:
  - {dataset: POSF2U, maxlambda: 1e6}
This behavior was found to be very inefficient for gradient descent based strategies and was exchanged for a dynamical Lagrange multiplier. The dynamical multiplier is defined in terms of an initial value and a multiplier to be applied every 100 epochs. Both the initial value and the 100-epoch multiplier are defined in an optional positivity dictionary alongside the hyperparameters of the Neural Network as:
parameters:
  positivity:
    threshold: 1e-6
    multiplier: 1.05
    initial: 14.5
Note that by defining the positivity in this way all datasets will share the same Lagrange multiplier.
It is also possible to not define the positivity hyperparameters (or define them only partially). In this case n3fit will set the initial Lagrange multiplier as initial (default: 1.0), while the multiplier will be such that after the last epoch the final Lagrange multiplier equals the maxlambda defined for the dataset.
Finally we have the positivity threshold, which is set to 1e-6
by default.
During the fit, the positivity loss will be compared to this value. If it is above it,
the positivity won’t be considered good (and thus the fit won’t stop).
If the replica reaches the maximum number of epochs with the positivity loss above
this value, it will be tagged as POS_VETO
and the replica removed from postfit.
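For instance (a sketch with an illustrative value), one could set only the threshold and let the initial value and multiplier be derived as described above:
parameters:
  positivity:
    threshold: 1e-7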
Integrability
Integrability in n3fit is enforced through a Lagrange multiplier; this is the same basic concept as for positivity, and therefore the input in the runcard is analogous to the positivity case: one can apply the integrability constraints through an optional integrability dictionary (note that, as opposed to positivity, no threshold value can be set for integrability):
parameters:
  integrability:
    multiplier: 1.05
    initial: 14.5
Again, similarly to positivity, it is also possible to leave either the initial or multiplier keys empty and instead define a maxlambda per dataset:
integrability:
  integdatasets:
  - {dataset: INTEGXT8, maxlambda: 1e2}
Regularized covariance matrices
The covariance matrix regularization is controlled by a norm_threshold parameter. By default, if the parameter is not set, no regularization is applied. The effect of aggressive correlation models can be tested by setting it to a value of 4 (which roughly corresponds to the assumption that correlations are controlled to an accuracy of better than 35%) or higher.
norm_threshold: 4
Inspecting and profiling the code
It is possible to inspect the n3fit code using TensorBoard. In order to enable the TensorBoard callback in n3fit it is enough to add the following options to the runcard:
tensorboard:
  weight_freq: 100
  profiling: True
The weight_freq flag controls how often (in number of epochs) the weights of the NN are stored. Note that smaller values will lead to slower performance and increased memory usage.
After the n3fit run has finished, details of the run can be found in the replica directory, under the tboard subfolder.
Logging details can be visualized in the browser with the following command:
tensorboard --logdir runcard_name/nnfit/replica_1/tboard
Logging details will include the value of the loss for each experiment over time, the values of the weights of the NN, as well as a detailed analysis of the amount of time that TensorFlow spent on each operation.
Running fits in parallel
It is possible to run fits in parallel with n3fit by setting the parallel_models flag in the runcard to true when running a range of replicas. Running in parallel can be quite hard on memory and it is only advantageous when fitting on a GPU, where one can find a speed-up equal to the number of models run in parallel (each model being a different replica).
When running in parallel it might be advantageous (e.g., for debugging) to set the training/validation split to be equal for all replicas; this can be done with the same_trvl_per_replica: true runcard flag.
In other words, in order to run several replicas in parallel in a machine
(be it a big CPU or, most likely, a GPU)
it is necessary to modify the n3fit
runcard by adding the following
top-level option:
parallel_models: true
Note that currently, in order to run with parallel models, one has to set savepseudodata: false
in the fitting
section of the runcard. Once this is done, the user can run n3fit
with a
replica range to be parallelized (in this case from replica 1 to replica 4).
n3fit runcard.yml 1 -r 4
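For reference, a sketch of the runcard fragment combining the options discussed in this section would read:
parallel_models: true
same_trvl_per_replica: true  # optional, e.g. for debugging
fitting:
  savepseudodata: false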
In machines with more than one GPU you can select the GPU on which the code should run by setting the environment variable CUDA_VISIBLE_DEVICES to the desired index (usually 0, 1, 2) or by leaving it explicitly empty to avoid running on a GPU: export CUDA_VISIBLE_DEVICES=""
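For example (a minimal sketch), to run the parallel fit above on the first GPU:
export CUDA_VISIBLE_DEVICES=0
n3fit runcard.yml 1 -r 4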
Note that in order to run the replicas in parallel using the GPUs of an Apple Silicon computer (such as an M1 Mac), it is necessary to also install the following packages:
conda install -c apple tensorflow-deps
pip install tensorflow-macos==2.13.0 tensorflow-metal wandb==0.15.9
See also the following issue for more information: protobuf issue.
Other options
Threshold \(\chi^2\)
parameters:
  threshold_chi2: 4.0
threshold_chi2: sets a maximum validation \(\chi^2\) for the stopping to activate. Avoids (too) early stopping.
Save and load weights of the model
save: "weights.h5"
load: "weights.h5"
- save: saves the weights of the PDF model to the selected file inside the replica folder.
- load: loads the weights of the PDF model from the selected file.
Since the weights depend only on the architecture of the Neural Network, it is possible to save the weights of a Neural Network trained with one set of hyperparameters and experiments and load it in a different runcard and continue the training from there.
While the load file is read as an absolute path, the file to save to will be found inside the replica folder.
Saving and loading fit pseudodata
If the user wishes to save the Monte Carlo pseudodata used for each replica within a fit,
they can do so using the savepseudodata
flag under the fitting
top-level namespace:
fitting:
  savepseudodata: true
This will cause a csv
file to be saved for each replica under
<fit_directory>/replica_<number>/datacuts_theory_fitting_training_pseudodata.csv
and
<fit_directory>/replica_<number>/datacuts_theory_fitting_validation_pseudodata.csv
for the training and validation splits respectively. The data points are indexed
according to their experiment. Additionally, the union of these two is saved in
<fit_directory>/replica_<number>/datacuts_theory_fitting_pseudodata_table.csv
if one is not interested in the exact nature of the splitting.
Imposing sum rules
By default in n3fit
sum rules are imposed following the definitions in Eq. (10) of
the NNPDF3.0 paper.
It is however possible to disable them by setting the sum_rules flag to false.
fitting:
  sum_rules: False
It is also possible to impose just the valence or the momentum sum rules by using the VSR or MSR flags, respectively (True is equal to All).
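As an illustrative sketch (assuming the flag is given directly to sum_rules as described above), imposing only the momentum sum rule would read:
fitting:
  sum_rules: MSR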