.. _th_data_files:

=================
Theory data files
=================

In the ``nnpdf++`` project, ``FK`` tables (or grids) are used to provide the information required to compute perturbative QCD cross sections in a compact fashion. With the ``FK`` method a typical hadronic observable data point :math:`\mathcal{O}_d` is computed as

.. _observable:

:math:`\mathcal{O}_d= \sum_{\alpha,\beta}^{N_x}\sum_{i,j}^{N_{\mathrm{pdf}}} \sigma^{(d)}_{\alpha\beta i j}N_i^0(x_\alpha)N_j^0(x_\beta)`,

where :math:`\sigma_{\alpha\beta i j}^{(d)}`, the ``FK`` table, is a five-index object with two indices in flavour (:math:`i`, :math:`j`), two indices in :math:`x` (:math:`\alpha`, :math:`\beta`) and a data point index :math:`d`. :math:`N^0_i({x_\alpha})` is the :math:`i^{\mathrm{th}}` initial-scale PDF in the evolution basis at :math:`x`-grid point :math:`x=x_\alpha`. Each ``FK`` table has an internally specified :math:`x`-grid upon which the PDFs are interpolated. The full 14-PDF evolution basis used in the ``FK`` tables is given by:

.. _flavours:

:math:`\left\{ \gamma, \Sigma,g,V,V_3,V_8,V_{15},V_{24},V_{35},T_3,T_8,T_{15},T_{24},T_{35}\right\}`.

Additional information may be introduced via correction factors known internally as :math:`C`-factors. These are multiplicative corrections, applied data point by data point, to the final result of the ``FK`` convolution :math:`\mathcal{O}_d`. They are provided by ``CFACTOR`` files and are typically used to apply NNLO and electroweak corrections.

For processes which depend non-linearly upon PDFs, such as cross-section ratios or asymmetries, multiple ``FK`` tables may be required for one observable. In this case the information is provided in the form of a ``COMPOUND`` file, which specifies how the results from several ``FK`` tables are combined to produce the target observable.

In this section we shall specify the layout of the ``FK``, ``COMPOUND`` and ``CFACTOR`` files.

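
The convolution and the :math:`C`-factor correction described above can be sketched with NumPy. This is a minimal illustration and not the ``nnpdf++`` implementation: the array layout (``fk[d, a, b, i, j]`` and ``pdf[i, a]``), the function name ``fk_convolve`` and the random toy inputs are all assumptions made for the example.

```python
import numpy as np


def fk_convolve(fk, pdf):
    """Contract a hadronic FK table with initial-scale PDFs.

    fk  : array of shape (n_data, n_x, n_x, n_fl, n_fl), i.e. sigma[d, a, b, i, j]
    pdf : array of shape (n_fl, n_x), i.e. N_i^0(x_a)
    Returns one prediction per data point.
    """
    return np.einsum("dabij,ia,jb->d", fk, pdf, pdf)


# Toy inputs (random numbers, for illustration only).
rng = np.random.default_rng(0)
n_data, n_x, n_fl = 2, 3, 14
fk = rng.random((n_data, n_x, n_x, n_fl, n_fl))
pdf = rng.random((n_fl, n_x))

obs = fk_convolve(fk, pdf)

# C-factors multiply the convolution result point by point.
c_factors = np.array([1.1, 1.2])
corrected = c_factors * obs
```

The ``einsum`` call sums over both :math:`x` indices and both flavour indices in one step, leaving only the data point index.
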
FK table compression
--------------------

It is important to note that the ``FK`` table format as described here pertains to the *uncompressed* tables. Typically, ``FK`` tables as found and read by the NNPDF code are compressed individually with gzip.

``FK`` file format
==================

``FK`` preamble layout
----------------------

The ``FK`` preamble is built from a set of data segments, of which there are two kinds. The first kind consists of a list of key-value pairs, and the second is a simple data 'blob' with no requirements as to its formatting. Each segment begins with a delineating line, which for key-value pairs is

| _SegmentName_____________________________________________

and for data blobs is

| {SegmentName_____________________________________________

The key difference is in the first character: underscore (``_``) for key-value pair segments, and open curly brace (``{``) for data blobs. The name of the segment runs from the second character up to a terminating underscore (``_``). The line is then typically padded out with underscores up to 60 characters.

Following this delineating line, for a key-value segment, the following lines must all be of the format ``*KEY: VALUE``, with the first character required to be an asterisk (``*``), followed by the key and the value for that entry. For blob-type segments, no constraints are placed upon the format, except that each line **must not** begin with one of the delineating characters ``{`` or ``_``, as these will trigger the construction of a new segment.

While the user may specify additional segments, both key-value pair and blob-type, for their own use, there are seven segments required by the code. These are, specified by their segment name:

* **GridDesc** [BLOB]
  This segment provides a 'banner' with a short description for the ``FK`` table. The contents of this banner are displayed when the table is read from file.

* **VersionInfo** [K-V]
  A list specifying the versions of the various pieces of code used in the generation of this ``FK`` table (minimally libnnpdf and apfel).

* **GridInfo** [K-V]
  This list specifies various architectural points of the ``FK`` table. The required keys are specified in :ref:`fk_config_variables`.

* **TheoryInfo** [K-V]
  A list of all the theory parameters used in the generation of the table. The required keys are specified in :ref:`th_parameter_definitions`.

* **FlavourMap** [BLOB]
  This segment describes the flavour structure of the grid by means of a flavour map. This map details which flavour channels are active in the grid, using the basis specified :ref:`here <flavours>`. For DIS processes, an example section would be

  | {FlavourMap_____________________________________________
  | 0 1 1 0 0 0 0 0 0 0 1 0 0 0

  which specifies that only the Singlet, gluon and :math:`T_8` channels are populated in the grid. In the case of hadronic ``FK`` tables, the full :math:`14\times 14` flavour combination matrix is specified in the same manner. Consider the flavourmap for the CDFR2KT *Dataset*:

  | {FlavourMap_____________________________________________
  | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  | 0 1 1 0 0 0 0 0 0 0 0 0 0 0
  | 0 1 1 0 0 0 0 0 0 0 0 0 0 0
  | 0 0 0 1 0 0 0 0 0 0 0 0 0 0
  | 0 0 0 0 1 0 0 0 0 0 0 0 0 0
  | 0 0 0 0 0 1 0 0 0 0 0 0 0 0
  | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  | 0 0 0 0 0 0 0 0 0 1 0 0 0 0
  | 0 0 0 0 0 0 0 0 0 0 1 0 0 0
  | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  | 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  This flavourmap contains only 9 nonzero entries out of 196, demonstrating the importance of computing only those flavour combinations that are relevant to the process. Additionally, this map tells the ``nnpdf++`` convolution code which elements of the FastKernel grid should be read, to minimise holding zero entries in memory.

* **xGrid** [BLOB]
  This segment defines the :math:`x`-grid upon which the ``FK`` grid is defined, given as an :math:`N_x` long list of the :math:`x`-grid points. This grid should be optimised to minimise ``FK`` grid zeros in :math:`x`-space. The blob is a simple list of the grid points. Here is an example of an :math:`x`-grid with :math:`N_x=5` entries:

  | {xGrid_____________________________________________
  | 0.10000000000000001
  | 0.13750000000000001
  | 0.17499999999999999
  | 0.21250000000000002
  | 1.00000000000000000

For examples of complete DIS and hadronic ``FK`` table headers, see :ref:`example_fk_preamble`.

``FK`` grid layout
------------------

To start the section of the file with the ``FK`` grid itself, we begin with a blob-type segment delineator:

| {FastKernel_____________________________________________

The grid itself is now written out. For hadronic data, the format is line by line as follows:

:math:`d \:\: \alpha \:\: \beta \:\: \sigma^d_{\alpha\beta 1 1} \:\: \sigma^d_{\alpha\beta 1 2}\:\: ....\:\: \sigma^d_{\alpha\beta n n}`

where :math:`d` is the index of the data point for that line, :math:`\alpha` is the :math:`x`-index of the first PDF, :math:`\beta` is the :math:`x`-index of the second PDF, the :math:`\sigma^d_{\alpha\beta i j}` are the values of the FastKernel grid for data point :math:`d` as in :ref:`this equation <observable>`, and :math:`n=14` is the total number of parton flavours in the grid. Therefore the full :math:`14\times 14` flavour space for one combination of the indices :math:`\{d,\alpha,\beta\}` is written out on each line. The lines should be ordered so that :math:`\beta` varies fastest, then :math:`\alpha` and finally :math:`d`, so that the ``FK`` grids are written in blocks of data points. All ``FK`` grid values should be written out in double precision.

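
The segment rules and the hadronic grid layout described above can be sketched as a small reader. This is an illustrative sketch rather than the ``nnpdf++`` reader: the function names ``split_segments`` and ``parse_hadronic_grid`` are hypothetical, the indices :math:`d`, :math:`\alpha`, :math:`\beta` are assumed to be zero-based, the flavour pairs on each line are assumed to run with :math:`j` fastest, and the toy input uses two flavours and made-up keys and values to keep the lines short.

```python
import numpy as np


def split_segments(lines):
    """Group lines into segments keyed by segment name.

    Key-value segments map to ("kv", dict), blob segments to ("blob", list).
    Blank lines are skipped in this sketch.
    """
    segments = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line[:1] in ("_", "{"):
            # Delineating line: the name runs from the second character
            # up to the first terminating underscore.
            kind = "kv" if line[0] == "_" else "blob"
            current = line[1:].split("_", 1)[0]
            segments[current] = (kind, {} if kind == "kv" else [])
        elif current is not None and line.strip():
            kind, payload = segments[current]
            if kind == "kv":
                key, _, value = line.lstrip("*").partition(":")
                payload[key.strip()] = value.strip()
            else:
                payload.append(line)
    return segments


def parse_hadronic_grid(lines, n_data, n_x, n_fl=14):
    """Fill a dense (n_data, n_x, n_x, n_fl, n_fl) array from grid lines."""
    grid = np.zeros((n_data, n_x, n_x, n_fl, n_fl))
    for line in lines:
        fields = line.split()
        d, a, b = (int(tok) for tok in fields[:3])  # assumed zero-based
        values = np.array(fields[3:], dtype=float)
        if values.size != n_fl * n_fl:
            raise ValueError(f"expected {n_fl * n_fl} values, got {values.size}")
        grid[d, a, b] = values.reshape(n_fl, n_fl)
    return grid


# Toy file contents with two flavours; keys and values are made up.
example = """\
_VersionInfo_____________________________________________
*libnnpdf: 0.0.0
*apfel: 0.0.0
{xGrid_____________________________________________
0.1
1.0
{FastKernel_____________________________________________
0 0 1 1.0 2.0 3.0 4.0
1 1 0 5.0 6.0 7.0 8.0
""".splitlines()

segments = split_segments(example)
x_grid = [float(x) for x in segments["xGrid"][1]]
grid = parse_hadronic_grid(segments["FastKernel"][1], n_data=2, n_x=len(x_grid), n_fl=2)
```

Combinations of :math:`\{d,\alpha,\beta\}` that are absent from the file simply stay zero in the dense array.
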
For DIS data the ``FK`` grids must be written out as

:math:`d \:\: \alpha \:\: \sigma^d_{\alpha 1} \:\: \sigma^d_{\alpha 2}\:\: ....\:\: \sigma^d_{\alpha n}`

Therefore here all :math:`n=14` values are written out for each combination of :math:`\{d,\alpha\}`.

When writing out the grids, note that only :math:`x`-grid points for which there are nonzero ``FK`` entries are written out. For example, there should be no lines such as:

:math:`d \:\: \alpha \:\: \beta \:\: 0 \:\: 0 \:\: 0 \:\: .... \:\: 0`

However, for those :math:`x`-grid points which do have nonzero :math:`\sigma` contributions, the full set of flavour contributions must be written out regardless of the number of zero entries. This choice was made so that the nonzero flavour entries may be examined or optimised by hand after the ``FK`` table is generated.

The ``FK`` file should end on the last entry in the grid, without empty lines at the end of the file.

``CFACTOR`` file format
=======================

Additional multiplicative factors to be applied to the output of the ``FK`` convolution may be introduced by the use of ``CFACTOR`` files. These files have a very simple format. They begin with a header providing a description of the :math:`C`-factor information stored in the file. This segment is opened and closed by a line beginning with a star (``*``) character and consists of six mandatory fields:

* **SetName** - The *Dataset* name.
* **Author** - The author of the ``CFACTOR`` file.
* **Date** - The date of authorship.
* **CodesUsed** - The code or codes used in generating the :math:`C`-factors.
* **TheoryInput** - Theory input parameters used in the :math:`C`-factors (e.g. :math:`\alpha_S`, scales).
* **PDFset** - The PDF set used in the :math:`C`-factors.

These fields are formatted as

| FieldName: FieldEntry

and may be accompanied by any additional information within the star-delineated header region.

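
The header rules above can be checked with a short parser. This is a sketch under stated assumptions, not the reader used by ``nnpdf++``: the function name ``parse_cfactor_header`` is hypothetical, lines without a colon are treated as free-form additional information, and the field values in the toy input are made up.

```python
MANDATORY_FIELDS = ("SetName", "Author", "Date", "CodesUsed", "TheoryInput", "PDFset")


def parse_cfactor_header(lines):
    """Return (fields, n_header_lines) for a star-delineated CFACTOR header."""
    if not lines or not lines[0].startswith("*"):
        raise ValueError("header must open with a line beginning with '*'")
    fields = {}
    for pos, line in enumerate(lines[1:], start=1):
        if line.startswith("*"):
            # Closing line reached: check the six mandatory fields.
            missing = [name for name in MANDATORY_FIELDS if name not in fields]
            if missing:
                raise ValueError(f"missing mandatory fields: {missing}")
            return fields, pos + 1
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon are kept out of the field map
            fields[key.strip()] = value.strip()
    raise ValueError("header is not closed by a line beginning with '*'")


# Toy header with made-up values, followed by one line of data.
header = [
    "*****",
    "SetName: EXAMPLESET",
    "Author: A. N. Other",
    "Date: 2014",
    "CodesUsed: some-code",
    "TheoryInput: as 0.118",
    "PDFset: some-pdf-set",
    "free-form note without a field name",
    "*****",
    "1.1 0.1",
]
fields, n_header = parse_cfactor_header(header)
```

Returning the number of header lines lets a caller continue reading the rest of the file from ``header[n_header:]``.
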
Consider the following as a complete example of the header:

| *******************************************
| SetName: D0ZRAP
| Author: John Doe john.doe@cern.ch
| Date: 2014
| CodesUsed: MCFM 15.01
| TheoryInput: as 0.118, central scale 91.2 GeV
| PDFset: NNPDF30\_as\_0118\_nnlo
| Warnings: None
| Additional Information here
| *******************************************

The remainder of the file consists of the :math:`C`-factors themselves and the errors upon them. Each line gives the :math:`C`-factor for one data point, followed by its whitespace-separated uncertainty. For example, for a *Dataset* with five points, the data section of a ``CFACTOR`` file may be:

| 1.1 0.1
| 1.2 0.12
| 1.3 0.13
| 1.4 0.14
| 1.5 0.15

where the :math:`i^{\text{th}}` line corresponds to the :math:`C`-factor to be applied to the ``FK`` prediction for the :math:`(i-1)^{\text{th}}` data point, that is, data points are indexed from zero. The first column denotes the value of the :math:`C`-factor and the second column denotes the uncertainty upon it (in absolute terms, not as a percentage or otherwise relative to the :math:`C`-factor).

For a complete example of a ``CFACTOR`` file, please see :ref:`example_cfactor_file`.

``COMPOUND`` file format
========================

Some *Datasets* cover observables that depend non-linearly upon the input PDFs. For example, the NMCPD *Dataset* is a measurement of the ratio of deuteron to proton structure functions. In the ``nnpdf++`` code such sets are denoted *Compound Datasets*. In these cases, a prescription must be given for how the results from ``FK`` convolutions, as in this :ref:`equation <observable>`, should be combined.

The ``COMPOUND`` files are a simple method of providing this information. For each *Compound Dataset* a ``COMPOUND`` file is provided that contains the information on how to build the observable from constituent ``FK`` tables.

The following operations are currently implemented:

================================= ========= =================
Operation :math:`(N_{\text{FK}})` Code      Output Observable
================================= ========= =================
Null Operation (1)                **NULL**  :math:`\mathcal{O}_d = \mathcal{O}_d^{(1)}`
Sum (2)                           **ADD**   :math:`\mathcal{O}_d = \mathcal{O}^{(1)}_d + \mathcal{O}^{(2)}_d`
Sum (10)                          **SMT**   :math:`\mathcal{O}_d = \sum_{i=1}^{10}\mathcal{O}^{(i)}_d`
Normalised Sum (4)                **SMN**   :math:`\mathcal{O}_d = (\mathcal{O}^{(1)}_d + \mathcal{O}^{(2)}_d)/(\mathcal{O}^{(3)}_d + \mathcal{O}^{(4)}_d)`
Asymmetry (2)                     **ASY**   :math:`\mathcal{O}_d = (\mathcal{O}^{(1)}_d - \mathcal{O}^{(2)}_d)/(\mathcal{O}^{(1)}_d + \mathcal{O}^{(2)}_d)`
Combination (20)                  **COM**   :math:`\mathcal{O}_d = \sum_{i=1}^{10}\mathcal{O}^{(i)}_d/\sum_{i=11}^{20}\mathcal{O}^{(i)}_d`
Ratio (2)                         **RATIO** :math:`\mathcal{O}_d = \mathcal{O}^{(1)}_d / \mathcal{O}^{(2)}_d`
================================= ========= =================

Here :math:`N_{\text{FK}}` refers to the number of tables required for each compound operation. :math:`\mathcal{O}_d` is the final observable prediction for the :math:`d^{\text{th}}` point in the *Dataset*. :math:`\mathcal{O}_d^{(i)}` refers to the observable prediction for the :math:`d^{\text{th}}` point arising from the :math:`i^{\text{th}}` ``FK`` table calculation. Note that here the ordering in :math:`i` is important.

The ``COMPOUND`` file layout is as follows. The first line is a general comment line that is not used by the code, and therefore has no particular requirements other than its presence. Following this line should come a list of the ``FK`` tables required for the calculation. Each table must be given as its filename *without* its path, preceded by the string '**FK:**'. For example,

| FK: FK_SETNAME_1.dat
| FK: FK_SETNAME_2.dat

The ordering of the list is once again important, and must match the above table.
For example, the observables :math:`\mathcal{O}^{(i)}` arise from the computation with the :math:`i^{\text{th}}` element of this list. The final line specifies the operation to be performed upon the list of tables, and must take the form

| OP: **[CODE]**

where the **[CODE]** is given in the above table. Here is an example of a complete ``COMPOUND`` file:

| # COMPOUND FK
| FK: FK\_NUMERATOR.dat
| FK: FK\_DENOMINATOR.dat
| OP: RATIO
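
The ``COMPOUND`` layout and the operations in the table above can be sketched as follows. This is a minimal sketch, not the ``nnpdf++`` implementation: the names ``parse_compound``, ``OPERATIONS`` and ``combine`` are hypothetical, and the per-table predictions passed to ``combine`` are made-up numbers standing in for the results of the individual ``FK`` convolutions.

```python
import numpy as np


def parse_compound(text):
    """Return (list of FK filenames, operation code) from COMPOUND file text."""
    tables, op = [], None
    for line in text.splitlines()[1:]:  # the first line is a comment
        line = line.strip()
        if line.startswith("FK:"):
            tables.append(line[3:].strip())
        elif line.startswith("OP:"):
            op = line[3:].strip()
    if op is None:
        raise ValueError("no OP: line found")
    return tables, op


# Operation code -> (number of FK tables required, combination rule).
# Each rule receives the per-table predictions in file order.
OPERATIONS = {
    "NULL": (1, lambda o: o[0]),
    "ADD": (2, lambda o: o[0] + o[1]),
    "SMT": (10, lambda o: np.sum(o, axis=0)),
    "SMN": (4, lambda o: (o[0] + o[1]) / (o[2] + o[3])),
    "ASY": (2, lambda o: (o[0] - o[1]) / (o[0] + o[1])),
    "COM": (20, lambda o: np.sum(o[:10], axis=0) / np.sum(o[10:], axis=0)),
    "RATIO": (2, lambda o: o[0] / o[1]),
}


def combine(op, predictions):
    """Combine per-table predictions (one array per FK table) with code `op`."""
    n_fk, rule = OPERATIONS[op]
    if len(predictions) != n_fk:
        raise ValueError(f"{op} needs {n_fk} FK tables, got {len(predictions)}")
    return rule([np.asarray(p, dtype=float) for p in predictions])


text = """# COMPOUND FK
FK: FK_NUMERATOR.dat
FK: FK_DENOMINATOR.dat
OP: RATIO
"""
tables, op = parse_compound(text)

# Made-up predictions for two data points from each of the two tables.
result = combine(op, [[2.0, 9.0], [4.0, 3.0]])
```

Keeping the required table count next to each rule makes it easy to reject a ``COMPOUND`` file whose ``FK`` list does not match its operation.
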