Theory data files
In the nnpdf++ project, FK tables (or grids) are used to provide the
information required to compute perturbative QCD cross sections in a compact
fashion. With the FK method a typical hadronic observable data point
\(\mathcal{O}_d\) is computed as
\(\mathcal{O}_d= \sum_{\alpha,\beta}^{N_x}\sum_{i,j}^{N_{\mathrm{pdf}}} \sigma^{(d)}_{\alpha\beta i j}N_i^0(x_\alpha)N_j^0(x_\beta)\),
where \(\sigma_{\alpha\beta i j}^{(d)}\), the FK table, is a five-index object
with two indices in flavour (\(i\), \(j\)), two indices in \(x\) (\(\alpha\),
\(\beta\)) and a data point index \(d\). \(N^0_i({x_\alpha})\) is the
\(i^{\mathrm{th}}\) initial scale PDF in the evolution basis at \(x\)-grid point
\(x=x_\alpha\). Each FK table has an internally specified \(x\)-grid upon which
the PDFs are interpolated. The full 14-PDF evolution basis used in the FK
tables is given by:
\(\left\{ \gamma, \Sigma,g,V,V3,V8,V15,V24,V35,T3,T8,T15,T24,T35\right\}\).
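The convolution above can be sketched in a few lines of Python. This is only an illustration of the index structure, not the nnpdf++ implementation: the function name `fk_convolve`, the nested-list layout and the toy numbers are all assumptions, and plain loops are used for clarity rather than speed.

```python
def fk_convolve(sigma, pdf_grid):
    """Return [O_d] with O_d = sum_{a,b,i,j} sigma[d][a][b][i][j] * N_i(x_a) * N_j(x_b).

    sigma    : nested lists indexed [d][alpha][beta][i][j] (hypothetical layout)
    pdf_grid : nested lists indexed [i][alpha], i.e. N_i^0(x_alpha)
    """
    n_fl = len(pdf_grid)
    n_x = len(pdf_grid[0])
    observables = []
    for table in sigma:                      # loop over data points d
        total = 0.0
        for a in range(n_x):                 # x-index of the first PDF
            for b in range(n_x):             # x-index of the second PDF
                for i in range(n_fl):        # flavour of the first PDF
                    for j in range(n_fl):    # flavour of the second PDF
                        total += table[a][b][i][j] * pdf_grid[i][a] * pdf_grid[j][b]
        observables.append(total)
    return observables


# Toy input: 1 data point, 2 x-grid points, 2 flavours (a real table has 14).
sigma = [[[[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]],
          [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]]]]
pdf_grid = [[0.5, 0.25], [3.0, 4.0]]
print(fk_convolve(sigma, pdf_grid))
```

In the toy input only two grid entries are nonzero, so the result is the sum of two products, which makes the role of each index easy to check by hand.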
Additional information may be introduced via correction factors known internally
as \(C\)-factors. These consist of data point by data point multiplicative
corrections to the final result of the FK convolution \(\mathcal{O}\). These are
provided by CFACTOR files, typically used to apply NNLO and electroweak
corrections. For processes which depend non-linearly upon PDFs, such as
cross-section ratios or asymmetries, multiple FK tables may be required for one
observable. In this case information is provided in the form of a COMPOUND file
which specifies how the results from several FK tables may be combined to
produce the target observable. In this section we shall specify the layout of
the FK, COMPOUND and CFACTOR files.
FK table compression
It is important to note that the FK table format as described here pertains to the uncompressed tables. Typically FK tables as found and read by the NNPDF code are compressed individually with gzip.
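As a minimal sketch of how a reader might cope with both compressed and uncompressed tables, the following helper checks for the gzip magic number before choosing how to open the file. The helper name `open_fk_table` is hypothetical and is not part of the NNPDF code.

```python
import gzip


def open_fk_table(path):
    """Open an FK table as text, whether or not it is gzip-compressed."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":           # the two-byte gzip magic number
        return gzip.open(path, "rt")   # decompress transparently, text mode
    return open(path, "r")
```

The returned object can be iterated line by line in either case, so the parsing code that follows need not know whether the table was compressed.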
FK file format
FK preamble layout
The FK preamble is constructed from a set of data segments, of which there are two kinds. The first kind consists of a list of key-value pairs, and the second is a simple data ‘blob’ with no requirements as to its formatting. Each segment begins with a delineating line which for key-value pairs is
_SegmentName_____________________________________________
and for data blobs is
{SegmentName_____________________________________________
The key difference is the first character: an underscore (_) for key-value pair
segments, and an open curly brace ({) for data blobs. The name of the segment is
specified from the second character up to a terminating underscore (_). The line
is then typically padded out with underscores up to 60 characters. Following
this delineating line, for a key-value segment, the following lines must all be
of the format
*KEY: VALUE
with the first character required to be an asterisk (*), then specifying the
key, and value for that segment. For blob-type segments, no constraints are
placed upon the format, aside from that each line must not begin with one of the
delineating characters { or _, as these will trigger the construction of a new
segment.
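The segment rules above can be sketched as a small parser. This is an illustrative reading of the format, not the code used by nnpdf++: the function name `parse_segments`, the returned tuple layout and the key `EXAMPLEKEY` are assumptions made for the example, and skipping blank lines inside key-value segments is a leniency the text does not specify.

```python
def parse_segments(lines):
    """Split FK preamble lines into segments.

    Returns a list of (name, kind, content), where kind is "kv" (content is a
    dict of key -> value strings) or "blob" (content is a list of raw lines).
    """
    segments = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line[:1] in ("_", "{"):
            # The name runs from the second character to the first underscore.
            name = line[1:].split("_", 1)[0]
            kind = "kv" if line[0] == "_" else "blob"
            segments.append((name, kind, {} if kind == "kv" else []))
        elif segments:
            name, kind, content = segments[-1]
            if kind == "blob":
                content.append(line)
            elif line:  # blank lines in a key-value segment are skipped (assumption)
                if not line.startswith("*") or ":" not in line:
                    raise ValueError(f"malformed key-value line: {line!r}")
                key, value = line[1:].split(":", 1)
                content[key.strip()] = value.strip()
    return segments


example = [
    "_GridInfo___________________________________________________",
    "*EXAMPLEKEY: 42",          # hypothetical key, for illustration only
    "{GridDesc___________________________________________________",
    "A toy table",
]
for name, kind, content in parse_segments(example):
    print(name, kind, content)
```

Values are kept as strings here; converting them to numbers or booleans is left to whichever code knows the meaning of each key.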
While the user may specify additional segments, both key-value pair and blob-type for their own use, there are seven segments required by the code. These are, specified by their segment name:
GridDesc [BLOB]
This segment provides a ‘banner’ with a short description for the FK table. The contents of this banner are displayed when the table is read from file.
VersionInfo [K-V]
A list specifying the versions of the various pieces of code used in the generation of this FK table (minimally libnnpdf and apfel).
GridInfo [K-V]
This list specifies various architectural points of the FK table. The required keys are specified in fk_config_variables.
TheoryInfo [K-V]
A list of all the theory parameters used in the generation of the table. The required keys are specified in Theory parameter definitions.
FlavourMap [BLOB]
The segment describes the flavour structure of the grid by means of a flavour map. This map details which flavour channels are active in the grid, using the basis specified here. For DIS processes, an example section would be
{FlavourMap_____________________________________________
0 1 1 0 0 0 0 0 0 0 1 0 0 0
which specifies that only the Singlet, gluon and \(T_8\) channels are populated in the grid. In the case of hadronic FK tables, the full \(14\times 14\) flavour combination matrix is specified in the same manner. Consider the flavourmap for the CDFR2KT Dataset:
{FlavourMap_____________________________________________
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
This flavourmap contains 9 nonzero entries, demonstrating the importance of only computing those flavour combinations that are relevant to the process. Additionally this map instructs the nnpdf++ convolution code as to which elements of the FastKernel grid should be read, to minimise holding zero entries in memory.
xGrid [BLOB]
This segment defines the \(x\)-grid upon which the FK grid is defined, given as an \(N_x\) long list of the \(x\)-grid points. This grid should be optimised to minimise FK grid zeros in \(x\)-space. The blob is a simple list of the grid points, here is an example of an \(x\)-grid with \(N_x=5\) entries:
{xGrid_____________________________________________
0.10000000000000001
0.13750000000000001
0.17499999999999999
0.21250000000000002
1.00000000000000000
For examples of complete DIS and hadronic FK table headers, see
example_fk_preamble.
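To make the FlavourMap blob concrete, the sketch below turns its rows into a list of active channels, using the 14-flavour evolution basis quoted earlier. The function name `active_channels` and the plain-text flavour labels are assumptions, as is the convention that the row index of a hadronic map refers to the first PDF and the column index to the second.

```python
# Labels for the 14-flavour evolution basis, in the order given in the text.
BASIS = ["gamma", "Sigma", "g", "V", "V3", "V8", "V15", "V24", "V35",
         "T3", "T8", "T15", "T24", "T35"]


def active_channels(blob_lines):
    """Return the active channels named in a FlavourMap blob.

    One row   -> DIS map: a list of flavour labels.
    Many rows -> hadronic map: a list of (flavour_i, flavour_j) pairs,
                 assuming rows index i and columns index j.
    """
    rows = [[int(tok) for tok in line.split()] for line in blob_lines if line.strip()]
    if len(rows) == 1:
        return [BASIS[i] for i, flag in enumerate(rows[0]) if flag]
    return [(BASIS[i], BASIS[j])
            for i, row in enumerate(rows)
            for j, flag in enumerate(row) if flag]


dis_map = ["0 1 1 0 0 0 0 0 0 0 1 0 0 0"]
print(active_channels(dis_map))   # ['Sigma', 'g', 'T8']
```

Applied to the DIS example above, this recovers the Singlet, gluon and \(T_8\) channels named in the text.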
FK grid layout
To start the section of the file with the FK grid itself, we begin with a
blob-type segment delineator:
{FastKernel_____________________________________________
The grid itself is now written out. For hadronic data, the format is line by line as follows:
\(d \:\: \alpha \:\: \beta \:\: \sigma^d_{\alpha\beta 1 1} \:\: \sigma^d_{\alpha\beta 1 2}\:\: ....\:\: \sigma^d_{\alpha\beta n n}\)
where \(d\) is the index of the data point for that line, \(\alpha\) is the
\(x\)-index of the first PDF, \(\beta\) is the \(x\)-index of the second PDF, the
\(\sigma^d_{\alpha\beta i j}\) are the values of the FastKernel grid for data
point \(d\) as in the convolution equation above, and \(n=14\) is the total
number of parton flavours in the grid. Therefore the full \(14\times 14\) flavour
space for one combination of the indices \(\{d,\alpha,\beta\}\) is written out on
each line. These lines should be written out first in \(\beta\), then \(\alpha\)
and finally \(d\) so that the FK grids are written in blocks of data points. All
FK grid values should be written out in double precision. For DIS data the FK
grids must be written out as
\(d \:\: \alpha \:\: \sigma^d_{\alpha 1} \:\: \sigma^d_{\alpha 2}\:\: ....\:\: \sigma^d_{\alpha n}\)
Therefore here all \(n=14\) values are written out for each combination of \(\{d,\alpha\}\).
When writing out the grids, note that only \(x\)-grid points for which there are
nonzero FK entries are written out. For example, there should be no lines such
as:
\(d \:\: \alpha \:\: \beta \:\: 0 \:\: 0 \:\: 0 \:\: .... \:\: 0\)
However, for those \(x\)-grid points which do have nonzero \(\sigma\) contributions, the full set of flavour contributions must be written out regardless of the number of zero entries. This choice was made in order that the nonzero flavour entries may be examined/optimised by hand after the FK table is generated.
The FK file should end on the last entry in the grid, with no empty lines at the
end of the file.
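The hadronic line format described above can be read back with a short routine like the following. It is a sketch under stated assumptions rather than the nnpdf++ reader: the name `read_hadronic_grid` and the dictionary return type are hypothetical, and the toy line uses two flavours instead of 14 to stay readable.

```python
def read_hadronic_grid(lines, n_fl=14):
    """Parse FastKernel lines of the form 'd alpha beta sigma_11 ... sigma_nn'.

    Returns a dict mapping (d, alpha, beta) to an n_fl x n_fl nested list.
    Index triples absent from the file correspond to all-zero entries.
    """
    grid = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        expected = 3 + n_fl * n_fl
        if len(tokens) != expected:
            raise ValueError(f"expected {expected} columns, got {len(tokens)}")
        d, alpha, beta = (int(tok) for tok in tokens[:3])
        values = [float(tok) for tok in tokens[3:]]
        # Flavour pairs are written with j running fastest: sigma_{ij} sits
        # at position i * n_fl + j.
        grid[(d, alpha, beta)] = [values[i * n_fl:(i + 1) * n_fl] for i in range(n_fl)]
    return grid


# Toy line with n_fl = 2 (a real hadronic line carries 14 x 14 = 196 values).
toy = ["0 1 2 0.5 0.0 0.0 1.5"]
grid = read_hadronic_grid(toy, n_fl=2)
print(grid[(0, 1, 2)])
```

Storing only the index triples that appear in the file mirrors the rule that all-zero lines are never written.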
CFACTOR file format
Additional multiplicative factors to be applied to the output of the FK
convolution may be introduced by the use of CFACTOR files. These files have a
very simple format. They begin with a header providing a description of the
\(C\)-factor information stored in the file. This segment is initialised and
terminated by a line beginning with a star (*) character and consists of six
mandatory fields:
SetName - The Dataset name.
Author - The author of the CFACTOR file.
Date - The date of authorship.
CodesUsed - The code or codes used in generating the \(C\)-factors.
TheoryInput - Theory input parameters used in the \(C\)-factors (e.g. \(\alpha_S\), scales).
PDFset - The PDF set used in the \(C\)-factors.
These fields are formatted as
FieldName: FieldEntry
and may be accompanied by any additional information, within the star delineated header region. Consider the following as a complete example of the header,
***************************************
SetName: D0ZRAP
Author: John Doe john.doe@cern.ch
Date: 2014
CodesUsed: MCFM 15.01
TheoryInput: as 0.118, central scale 91.2 GeV
PDFset: NNPDF30_as_0118_nnlo
Warnings: None
Additional Information here
***************************************
The remainder of the file consists of the \(C\)-factors themselves, and the error
upon the \(C\)-factors. Each line gives the \(C\)-factor for one data point,
followed by its whitespace-separated uncertainty. For example, for a Dataset
with five points, the data section of a CFACTOR file may be:
1.1 0.1
1.2 0.12
1.3 0.13
1.4 0.14
1.5 0.15
where the \(i^{\text{th}}\) line corresponds to the \(C\)-factor to be applied to
the FK prediction for the \((i-1)^{\text{th}}\) data point. The first column
denotes the value of the \(C\)-factor and the second column denotes the
uncertainty upon it (in absolute terms, not as a percentage or otherwise
relative to the \(C\)-factor). For a complete example of a CFACTOR file, please
see Example: CFACTOR file format.
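A reader for this layout, together with the multiplicative application of the factors, might look like the sketch below. The function names `read_cfactor` and `apply_cfactors` are hypothetical, and the parser is an illustration of the format as described here rather than the NNPDF implementation; free-form header lines without a colon are simply ignored.

```python
def read_cfactor(lines):
    """Parse a CFACTOR file into (header_fields, [(value, uncertainty), ...])."""
    header, factors = {}, []
    star_lines = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            star_lines += 1             # first star line opens the header, second closes it
        elif star_lines < 2:
            if ":" in line:             # 'FieldName: FieldEntry'; other header text is skipped
                key, value = line.split(":", 1)
                header[key.strip()] = value.strip()
        else:
            value, error = (float(tok) for tok in line.split())
            factors.append((value, error))
    return header, factors


def apply_cfactors(predictions, factors):
    """Multiply each FK prediction by the C-factor for the same data point."""
    if len(predictions) != len(factors):
        raise ValueError("one C-factor per data point is required")
    return [obs * value for obs, (value, _) in zip(predictions, factors)]


example = """\
***************************************
SetName: D0ZRAP
Author: John Doe john.doe@cern.ch
Date: 2014
CodesUsed: MCFM 15.01
TheoryInput: as 0.118, central scale 91.2 GeV
PDFset: NNPDF30_as_0118_nnlo
***************************************
1.1 0.1
1.2 0.12
""".splitlines()
header, factors = read_cfactor(example)
print(header["SetName"], factors)
```

The uncertainty column is carried along but not used by `apply_cfactors`, since the text only specifies that it is an absolute error on the factor.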
COMPOUND file format
Some Datasets cover observables that depend non-linearly upon the input PDFs.
For example, the NMCPD Dataset is a measurement of the ratio of deuteron to
proton structure functions. In the nnpdf++ code such sets are denoted Compound
Datasets. In these cases, a prescription must be given for how the results from
FK convolutions, as in the convolution equation above, should be combined.
The COMPOUND files are a simple method of providing this information. For each
Compound Dataset a COMPOUND file is provided that contains the information on
how to build the observable from constituent FK tables. The following
operations are currently implemented:
Operation \((N_{\text{FK}})\) | Code | Output Observable |
---|---|---|
Null Operation (1) | NULL | \(\mathcal{O}_d = \mathcal{O}_d^{(1)}\) |
Sum (2) | ADD | \(\mathcal{O}_d = \mathcal{O}^{(1)}_d + \mathcal{O}^{(2)}_d\) |
Sum (10) | SMT | \(\mathcal{O}_d = \sum_{i=1}^{10}\mathcal{O}^{(i)}_d\) |
Normalised Sum (4) | SMN | \(\mathcal{O}_d = (\mathcal{O}^{(1)}_d + \mathcal{O}^{(2)}_d)/(\mathcal{O}^{(3)}_d + \mathcal{O}^{(4)}_d)\) |
Asymmetry (2) | ASY | \(\mathcal{O}_d = (\mathcal{O}^{(1)}_d - \mathcal{O}^{(2)}_d)/(\mathcal{O}^{(1)}_d + \mathcal{O}^{(2)}_d)\) |
Combination (20) | COM | \(\mathcal{O}_d = \sum_{i=1}^{10}\mathcal{O}^{(i)}_d/\sum_{i=11}^{20}\mathcal{O}^{(i)}_d\) |
Ratio (2) | RATIO | \(\mathcal{O}_d = \mathcal{O}^{(1)}_d / \mathcal{O}^{(2)}_d\) |
Here \(N_{\text{FK}}\) refers to the number of tables required for each compound
operation. \(\mathcal{O}_d\) is the final observable prediction for the
\(d^{\text{th}}\) point in the Dataset. \(\mathcal{O}_d^{(i)}\) refers to the
observable prediction for the \(d^{\text{th}}\) point arising from the
\(i^{\text{th}}\) FK table calculation. Note that here the ordering in \(i\) is
important.
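The operations in the table can be written out as a small dispatch function. This is a sketch for illustration only: the name `combine` and the list-of-lists input are assumptions, with `tables[i - 1]` holding the predictions \(\mathcal{O}^{(i)}_d\) for every data point \(d\).

```python
def _elementwise_sum(tables):
    """Sum several per-table prediction lists point by point."""
    return [sum(values) for values in zip(*tables)]


def combine(op, tables):
    """Combine per-table predictions, ordered as the FK tables are listed."""
    required = {"NULL": 1, "ADD": 2, "SMT": 10, "SMN": 4,
                "ASY": 2, "COM": 20, "RATIO": 2}
    if op not in required:
        raise ValueError(f"unknown operation {op!r}")
    if len(tables) != required[op]:
        raise ValueError(f"{op} needs {required[op]} tables, got {len(tables)}")
    if op == "NULL":
        return list(tables[0])
    if op in ("ADD", "SMT"):
        return _elementwise_sum(tables)
    if op == "ASY":
        return [(a - b) / (a + b) for a, b in zip(*tables)]
    if op == "RATIO":
        return [a / b for a, b in zip(*tables)]
    # SMN and COM: sum of the first half divided by the sum of the second half.
    half = len(tables) // 2
    numerator = _elementwise_sum(tables[:half])
    denominator = _elementwise_sum(tables[half:])
    return [n / d for n, d in zip(numerator, denominator)]


print(combine("RATIO", [[1.0, 4.0], [2.0, 8.0]]))   # [0.5, 0.5]
```

Checking the table count up front makes an ordering or bookkeeping mistake in the input fail loudly instead of silently producing a wrong observable.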
The COMPOUND file layout is as follows. The first line is once again a general
comment line and is not used by the code, and therefore has no particular
requirements other than its presence. Following this line should come a list of
the FK tables required for the calculation. This must be given as the table’s
filename without its path, preceded by the string ‘FK:’. For example,
FK: FK_SETNAME_1.dat
FK: FK_SETNAME_2.dat
The ordering of the list is once again important, and must match the above table. That is, the observables \(\mathcal{O}^{(i)}\) arise from the computation with the \(i^{\text{th}}\) element of this list. The final line specifies the operation to be performed upon the list of tables, and must take the form
OP: [CODE]
where the [CODE] is given in the above table. Here is an example of a complete
COMPOUND file:
# COMPOUND FK
FK: FK_NUMERATOR.dat
FK: FK_DENOMINATOR.dat
OP: RATIO
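A COMPOUND file with this layout can be read with a few lines of Python. The sketch below is an assumption-laden illustration, not the nnpdf++ parser: the name `read_compound` is hypothetical, and requiring exactly one `OP:` line in the final position is a strict reading of the description above.

```python
def read_compound(lines):
    """Return (fk_filenames, operation_code) from the lines of a COMPOUND file."""
    # The first line is a free-form comment and is skipped unconditionally.
    body = [line.strip() for line in lines[1:] if line.strip()]
    tables = [line[3:].strip() for line in body if line.startswith("FK:")]
    ops = [line[3:].strip() for line in body if line.startswith("OP:")]
    if not tables or len(ops) != 1 or not body[-1].startswith("OP:"):
        raise ValueError("expected FK: lines followed by a single final OP: line")
    return tables, ops[0]


compound = [
    "# COMPOUND FK",
    "FK: FK_NUMERATOR.dat",
    "FK: FK_DENOMINATOR.dat",
    "OP: RATIO",
]
print(read_compound(compound))
```

The filenames are returned in file order, which is the order in which the corresponding predictions must be passed to the chosen operation.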