How to include a theory covariance matrix in a fit

This section details how to include scale variation covariance matrices (covmats) in a PDF fit.

First, decide which theory covmat you want

Choose the desired point prescription from those listed here.
Each prescription comes with a point_prescription flag to include in the runcard, one of ["3 point", "5 point", "5bar point", "7 point", "9 point"].
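
The flag has to match one of these strings exactly, so a typo is easy to make. The following is a minimal Python sketch of a check one could run before launching a fit. The helper name check_point_prescription is hypothetical and is not part of the NNPDF code; the allowed values are simply those listed above.

```python
# Allowed values of the point_prescription flag, copied from the list above.
ALLOWED_PRESCRIPTIONS = ("3 point", "5 point", "5bar point", "7 point", "9 point")


def check_point_prescription(value):
    """Return value unchanged if it is a known prescription, else raise.

    Hypothetical helper for illustration only; not an NNPDF function.
    """
    if value not in ALLOWED_PRESCRIPTIONS:
        raise ValueError(
            f"Unknown point_prescription {value!r}; "
            f"expected one of {list(ALLOWED_PRESCRIPTIONS)}"
        )
    return value


# The value used in the runcard snippets in this section passes the check.
prescription = check_point_prescription("3 point")
```

Passing a string such as "3pt" or "4 point" raises a ValueError that lists the accepted values.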

Next, add necessary flags to the runcard

Remember to list the required datasets using dataset_inputs (see Data specification).
Add theorycovmatconfig to the runcard. An example is in the following code snippet:

############################################################
theory:
  theoryid: 163        # database id

theorycovmatconfig:
  point_prescription: "3 point"
  theoryids:
    from_: scale_variation_theories
  pdf: NNPDF31_nlo_as_0118
  use_thcovmat_in_fitting: true
  use_thcovmat_in_sampling: true

############################################################

pdf is the PDF used to generate the scale-varied predictions from which the theory covmat is constructed. Choose something close to the PDF you are trying to fit, such as a previous iteration if available.
theoryids lists the theories needed to construct the theory covmat. To avoid user error in entering them in the correct configuration and order, they are filled in by the produce_scale_variation_theories action in config, using the information in the scalevariations module.
The flags use_thcovmat_in_fitting and use_thcovmat_in_sampling specify where to use the theory covmat in the code. There are two possible places: the fitting (i.e. \(\chi^2\) minimiser) and the sampling (i.e. pseudodata generation). The default is True for both.

Warning

Changing either of these to False will affect the fit outcome and should be avoided unless you know what you are doing.
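
As a rough illustration of what the two flags control, the sketch below assumes that the theory covmat S is simply added to the experimental covmat C wherever the corresponding flag is true, and evaluates a toy two-point \(\chi^2\) both ways. All function names and numbers are invented for illustration; this is not the NNPDF implementation.

```python
def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices given as nested lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (enough for this toy example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def total_covmat(exp_covmat, th_covmat, use_thcovmat):
    """Return C + S if the flag is set, otherwise C alone (assumed behaviour)."""
    return add_matrices(exp_covmat, th_covmat) if use_thcovmat else exp_covmat


def chi2(data, theory, covmat):
    """chi2 = (D - T)^T covmat^-1 (D - T) for two data points."""
    diff = [d - t for d, t in zip(data, theory)]
    inv = inverse_2x2(covmat)
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


# Toy inputs: two uncorrelated data points.
exp_covmat = [[1.0, 0.0], [0.0, 1.0]]  # experimental covmat C
th_covmat = [[1.0, 0.0], [0.0, 3.0]]   # theory covmat S
data = [2.0, 4.0]
theory = [0.0, 0.0]

# Analogue of use_thcovmat_in_fitting: false / true.
chi2_without = chi2(data, theory, total_covmat(exp_covmat, th_covmat, False))
chi2_with = chi2(data, theory, total_covmat(exp_covmat, th_covmat, True))
print(chi2_without)  # 20.0
print(chi2_with)     # 6.0
```

Under the same assumption, use_thcovmat_in_sampling would decide whether the pseudodata replicas are drawn from C + S or from C alone.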

If you want to compare data to another fit

Sometimes we want to compare data to another fit for validation. For example, we might want to compare predictions for the NLO fit with MHOUs to the known NNLO fit (see Tests).
To make sure the cuts match between these two fits, edit the datacuts section of the runcard to include the following:

use_cuts: fromintersection
cuts_intersection_spec:
- theoryid: 163
- theoryid: 53

This ensures that the cuts on the data are the intersection of the cuts in theory 53 (default NNLO) and theory 163 (central scale variation NLO). See here for theory definitions.
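
The idea of intersecting cuts can be sketched in a few lines of Python. The function name and the index lists below are hypothetical and only illustrate the logic: a data point is kept only if it survives the cuts for every theory listed in cuts_intersection_spec.

```python
def intersect_cuts(*kept_indices):
    """Return the sorted indices of data points that pass every set of cuts."""
    common = set(kept_indices[0])
    for kept in kept_indices[1:]:
        common &= set(kept)
    return sorted(common)


# Hypothetical indices of the points of one dataset that survive the cuts
# for each theory (made-up numbers, not real NNPDF cuts).
kept_theory_163 = [0, 1, 2, 4, 5, 7]  # central scale variation NLO
kept_theory_53 = [1, 2, 3, 4, 7, 8]   # default NNLO

common = intersect_cuts(kept_theory_163, kept_theory_53)
print(common)  # [1, 2, 4, 7]
```

Both fits then see exactly the same data points, so their predictions can be compared point by point.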

Example runcard

The following is an example runcard for an NLO NNPDF3.1-style fit with a 3 point theory covmat. It can be found here.

#
# Configuration file for NNPDF++
#
##########################################################################################
description: Example runcard for NLO NNPDF3.1 style fit with 3pt theory covariance matrix

##########################################################################################
# frac: training fraction
# ewk: apply ewk k-factors
# sys: systematics treatment (see systypes)
dataset_inputs:
  - {dataset: NMCPD, frac: 0.5}
  - {dataset: NMC, frac: 0.5}
  - {dataset: SLACP, frac: 0.5}
  - {dataset: SLACD, frac: 0.5}
  - {dataset: BCDMSP, frac: 0.5}
  - {dataset: BCDMSD, frac: 0.5}
  - {dataset: CHORUSNU, frac: 0.5}
  - {dataset: CHORUSNB, frac: 0.5}
  - {dataset: NTVNUDMN, frac: 0.5}
  - {dataset: NTVNBDMN, frac: 0.5}
  - {dataset: HERACOMBNCEM, frac: 0.5}
  - {dataset: HERACOMBNCEP460, frac: 0.5}
  - {dataset: HERACOMBNCEP575, frac: 0.5}
  - {dataset: HERACOMBNCEP820, frac: 0.5}
  - {dataset: HERACOMBNCEP920, frac: 0.5}
  - {dataset: HERACOMBCCEM, frac: 0.5}
  - {dataset: HERACOMBCCEP, frac: 0.5}
  - {dataset: HERAF2CHARM, frac: 0.5}
  - {dataset: CDFZRAP, frac: 1.0}
  - {dataset: D0ZRAP, frac: 1.0}
  - {dataset: D0WEASY, frac: 1.0}
  - {dataset: D0WMASY, frac: 1.0}
  - {dataset: ATLASWZRAP36PB, frac: 1.0}
  - {dataset: ATLASZHIGHMASS49FB, frac: 1.0}
  - {dataset: ATLASLOMASSDY11EXT, frac: 1.0}
  - {dataset: ATLASWZRAP11, frac: 0.5}
  - {dataset: ATLAS1JET11, frac: 0.5}
  - {dataset: ATLASZPT8TEVMDIST, frac: 0.5}
  - {dataset: ATLASZPT8TEVYDIST, frac: 0.5}
  - {dataset: ATLASTTBARTOT, frac: 1.0}
  - {dataset: ATLASTOPDIFF8TEVTRAPNORM, frac: 1.0}
  - {dataset: CMSWEASY840PB, frac: 1.0}
  - {dataset: CMSWMASY47FB, frac: 1.0}
  - {dataset: CMSDY2D11, frac: 0.5}
  - {dataset: CMSWMU8TEV, frac: 1.0}
  - {dataset: CMSZDIFF12, frac: 1.0, cfac: [NRM]}
  - {dataset: CMSJETS11, frac: 0.5}
  - {dataset: CMSTTBARTOT, frac: 1.0}
  - {dataset: CMSTOPDIFF8TEVTTRAPNORM, frac: 1.0}
  - {dataset: LHCBZ940PB, frac: 1.0}
  - {dataset: LHCBZEE2FB, frac: 1.0}
  - {dataset: LHCBWZMU7TEV, frac: 1.0, cfac: [NRM]}
  - {dataset: LHCBWZMU8TEV, frac: 1.0, cfac: [NRM]}

############################################################
datacuts:
  t0pdfset: 190310-tg-nlo-global     # PDF set to generate t0 covmat
  q2min: 13.96                       # Q2 minimum
  w2min: 12.5                        # W2 minimum
  combocuts: NNPDF31                 # NNPDF3.0 final kin. cuts
  jetptcut_tev: 0                    # jet pt cut for tevatron
  jetptcut_lhc: 0                    # jet pt cut for lhc
  wptcut_lhc: 30.0                   # Minimum pT for W pT diff distributions
  jetycut_tev: 1e30                  # jet rap. cut for tevatron
  jetycut_lhc: 1e30                  # jet rap. cut for lhc
  dymasscut_min: 0                   # dy inv.mass. min cut
  dymasscut_max: 1e30                # dy inv.mass. max cut
  jetcfactcut: 1e30                  # jet cfact. cut
  use_cuts: fromintersection
  cuts_intersection_spec:
  - theoryid: 163
  - theoryid: 53

############################################################
theory:
  theoryid: 163        # database id

theorycovmatconfig:
  point_prescription: "3 point"
  theoryids:
    from_: scale_variation_theories
  fivetheories: None
  pdf: NNPDF31_nlo_as_0118
  use_thcovmat_in_fitting: true
  use_thcovmat_in_sampling: true


############################################################
fitting:
  seed: 65532133530           # set the seed for the random generator
  genrep: on        # on = generate MC replicas, off = use real data
  rngalgo: 0        # 0 = ranlux, 1 = cmrg, see randomgenerator.cc
  fitmethod: NGA    # Minimization algorithm
  ngen: 30000       # Maximum number of generations
  nmutants: 80      # Number of mutants for replica
  paramtype: NN
  nnodes: [2, 5, 3, 1]

  # NN23(QED) = sng=0,g=1,v=2,t3=3,ds=4,sp=5,sm=6,(pht=7)
  # EVOL(QED) = sng=0,g=1,v=2,v3=3,v8=4,t3=5,t8=6,(pht=7)
  # EVOLS(QED)= sng=0,g=1,v=2,v8=3,t3=4,t8=5,ds=6,(pht=7)
  # FLVR(QED) = g=0, u=1, ubar=2, d=3, dbar=4, s=5, sbar=6, (pht=7)
  fitbasis: NN31IC # EVOL (7), EVOLQED (8), etc.
  basis:
      # remember to change the name of the PDF according to fitbasis
      # pos: on for NN squared
      # mutsize: mutation size
      # mutprob: mutation probability
      # smallx, largex: preprocessing ranges
  - {fl: sng, pos: off, mutsize: [15], mutprob: [0.05], smallx: [1.046, 1.188], largex: [
      1.437, 2.716]}
  - {fl: g, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.9604, 1.23], largex: [
      0.08459, 6.137]}
  - {fl: v, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.5656, 0.7242], largex: [
      1.153, 2.838]}
  - {fl: v3, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.1521, 0.5611], largex: [
      1.236, 2.976]}
  - {fl: v8, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.5264, 0.7246], largex: [
      0.6919, 3.198]}
  - {fl: t3, pos: off, mutsize: [15], mutprob: [0.05], smallx: [-0.3687, 1.459], largex: [
      1.664, 3.373]}
  - {fl: t8, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.5357, 1.267], largex: [
      1.433, 2.866]}
  - {fl: cp, pos: off, mutsize: [15], mutprob: [0.05], smallx: [-0.09635, 1.204],
    largex: [1.654, 7.456]}

############################################################
stopping:
  stopmethod: LOOKBACK  # Stopping method
  lbdelta: 0            # Delta for look-back stopping
  mingen: 0             # Minimum number of generations
  window: 500           # Window for moving average
  minchi2: 3.5          # Minimum chi2
  minchi2exp: 6.0       # Minimum chi2 for experiments
  nsmear: 200           # Smear for stopping
  deltasm: 200          # Delta smear for stopping
  rv: 2                 # Ratio for validation stopping
  rt: 0.5               # Ratio for training stopping
  epsilon: 1e-6         # Gradient epsilon

############################################################
positivity:
  posdatasets:
  - {dataset: POSF2U, poslambda: 1e6}        # Positivity Lagrange Multiplier
  - {dataset: POSF2DW, poslambda: 1e6}
  - {dataset: POSF2S, poslambda: 1e6}
  - {dataset: POSFLL, poslambda: 1e6}
  - {dataset: POSDYU, poslambda: 1e10}
  - {dataset: POSDYD, poslambda: 1e10}
  - {dataset: POSDYS, poslambda: 1e10}

############################################################
closuretest:
  filterseed: 0     # Random seed to be used in filtering data partitions
  fakedata: off     # on = to use FAKEPDF to generate pseudo-data
  fakepdf: MSTW2008nlo68cl      # Theory input for pseudo-data
  errorsize: 1.0    # uncertainties rescaling
  fakenoise: off    # on = to add random fluctuations to pseudo-data
  rancutprob: 1.0   # Fraction of data to be included in the fit
  rancutmethod: 0   # Method to select rancutprob data fraction
  rancuttrnval: off # 0(1) to output training(validation) chi2 in report
  printpdf4gen: off # To print info on PDFs during minimization

############################################################
lhagrid:
  nx: 150
  xmin: 1e-9
  xmed: 0.1
  xmax: 1.0
  nq: 50
  qmax: 1e5

############################################################
debug: off