validphys package

Subpackages

Submodules

validphys.api module

api.py

This module contains the reportengine programmatic API, initialized with the validphys providers, Config and Environment.

Example:

Simple Usage:

>>> from validphys.api import API
>>> fig = API.plot_pdfs(pdf="NNPDF_nlo_as_0118", Q=100)
>>> fig.show()

validphys.app module

app.py

Main loop of the validphys application. Here we define tailored extensions to the reportengine application (such as extra command line flags). Additionally, the provider modules that serve as sources for the validphys actions are declared here.

The entry point of the validphys application is the main function of this module.

class validphys.app.App(name='validphys', providers=['validphys.results', 'validphys.commondata', 'validphys.pdfgrids', 'validphys.pdfplots', 'validphys.dataplots', 'validphys.fitdata', 'validphys.arclength', 'validphys.sumrules', 'validphys.reweighting', 'validphys.kinematics', 'validphys.correlations', 'validphys.chi2grids', 'validphys.eff_exponents', 'validphys.asy_exponents', 'validphys.paramfits.dataops', 'validphys.paramfits.plots', 'validphys.theorycovariance.construction', 'validphys.theorycovariance.output', 'validphys.theorycovariance.tests', 'validphys.replica_selector', 'validphys.closuretest', 'validphys.mc_gen', 'validphys.theoryinfo', 'validphys.pseudodata', 'validphys.renametools', 'validphys.covmats', 'validphys.hyperoptplot', 'validphys.deltachi2', 'validphys.n3fit_data', 'validphys.mc2hessian', 'reportengine.report', 'validphys.overfit_metric'])[source]

Bases: App

property argparser
config_class

alias of Config

critical_message = 'A critical error ocurred. This is likely due to one of the following reasons:\n\n - A bug in validphys.\n - Corruption of the provided resources (e.g. incorrect plotting files).\n - Cosmic rays hitting your CPU and altering the registers.\n\nThe traceback above should help determine the cause of the problem. If you\nbelieve this is a bug in validphys (please discard the cosmic rays first),\nplease open an issue on GitHub<https://github.com/NNPDF/nnpdf/issues>,\nincluding the contents of the following file:\n\n%s\n'
property default_style
environment_class

alias of Environment

init()[source]
run()[source]
static upload_context(do_upload, output)[source]

If do_upload is False, do nothing. Otherwise, on entering the context, check the requirements for uploading, and on exit, upload the output path. Raise SystemExit on error.
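The enter/exit behaviour described above is the usual conditional context-manager pattern. A minimal sketch with contextlib follows; check_requirements and upload are hypothetical callables standing in for the real upload machinery, which is not documented here.

```python
import contextlib


@contextlib.contextmanager
def upload_context(do_upload, output, check_requirements, upload):
    """Sketch of a conditional upload context manager.

    check_requirements() and upload(output) are hypothetical callables
    standing in for the real upload helpers.
    """
    if not do_upload:
        # Nothing to check and nothing to upload
        yield
        return
    try:
        # On enter: verify that uploading is possible before doing any work
        check_requirements()
    except Exception as exc:
        raise SystemExit(f"Cannot upload: {exc}") from exc
    # Run the body of the with-statement; if it raises, the upload is skipped
    yield
    try:
        # On exit: upload the output path
        upload(output)
    except Exception as exc:
        raise SystemExit(f"Upload failed: {exc}") from exc
```

Checking the requirements on enter means that, for example, missing credentials are reported before a long run starts rather than after it finishes.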

validphys.app.main()[source]

validphys.arclength module

arclength.py

Module for the computation and presentation of arclengths.

class validphys.arclength.ArcLengthGrid(pdf, basis, flavours, stats)

Bases: tuple

basis

Alias for field number 1

flavours

Alias for field number 2

pdf

Alias for field number 0

stats

Alias for field number 3

validphys.arclength.arc_length_table(arc_lengths)[source]

Return a table with the descriptive statistics of the arc lengths over members of the PDF.

validphys.arclength.arc_lengths(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Compute arc lengths at scale Q

Set up a grid with three segments and compute the arc length for each segment. Note: the variation of the PDF over the grid is computed from the forward differences between adjacent grid points.

Parameters:
  • pdf (validphys.core.PDF object) –

  • Q (float) – scale at which to evaluate PDF

  • basis (default = "flavour") –

  • flavours (default = None) –

Returns:

A validphys.arclength.ArcLengthGrid object that contains the PDF, basis, flavours, and computed arc length statistics.
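A minimal sketch of the segmented forward-difference computation, under stated assumptions: func stands in for the PDF (or x times the PDF) of a single flavour at fixed Q, and the segment limits and number of points are illustrative values, not necessarily those used by validphys. Summing the lengths of the straight pieces joining adjacent grid points is the same as integrating \(\sqrt{1 + (\Delta y/\Delta x)^2}\) with forward differences.

```python
import numpy as np


def arc_length(func, seglimits=(1e-6, 1e-4, 1e-2, 1.0), npoints=200):
    """Arc length of the curve y = func(x), summed over consecutive segments.

    func, seglimits and npoints are illustrative assumptions.
    """
    total = 0.0
    for a, b in zip(seglimits[:-1], seglimits[1:]):
        # Uniform grid with npoints sub-intervals inside each segment
        x = np.linspace(a, b, npoints + 1)
        y = func(x)
        # Forward differences between adjacent grid points
        dx = np.diff(x)
        dy = np.diff(y)
        # Length of each straight piece: dx * sqrt(1 + (dy/dx)**2)
        total += np.sum(np.hypot(dx, dy))
    return float(total)
```

Splitting the range into segments gives each decade-sized region its own uniform grid, so the small-x region is not starved of points.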

validphys.arclength.integrability_number(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'evolution', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return \(\sum_i |x_i f(x_i)|\), with \(x_i \in \{10^{-9}, 10^{-8}, 10^{-7}\}\), for the selected flavours.
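The formula fits in a couple of lines. In this sketch f is a hypothetical callable returning the PDF of one flavour at fixed Q; validphys evaluates the PDF from its LHAPDF grid instead.

```python
import numpy as np


def integrability_number(f, xgrid=(1e-9, 1e-8, 1e-7)):
    """Return sum_i |x_i * f(x_i)| over the small-x points in xgrid.

    f is a hypothetical callable giving the PDF of one flavour at fixed Q.
    """
    x = np.asarray(xgrid, dtype=float)
    return float(np.sum(np.abs(x * f(x))))
```

For f(x) = 1/x every term equals 1, so the sum is 3.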

validphys.arclength.plot_arc_lengths(pdfs_arc_lengths: ~collections.abc.Sequence, Q: ~numbers.Real, normalize_to: (<class 'NoneType'>, <class 'int'>) = None)[source]

Plot the arc lengths of provided pdfs

validphys.asy_exponents module

Tools for computing and plotting asymptotic exponents.

class validphys.asy_exponents.AsyExponentBandPlotter(exponent, *args, **kwargs)[source]

Bases: BandPDFPlotter

Class inheriting from BandPDFPlotter, changing title and ylabel to reflect the asymptotic exponent being plotted.

get_title(parton_name)[source]
get_ylabel(parton_name)[source]
validphys.asy_exponents.alpha_asy(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Returns a list of xplotting_grids containing the value of the asymptotic exponent alpha, as defined by the first relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.
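A minimal sketch of the finite-difference estimate, assuming Eq. (4) of [arXiv:1604.00024] defines alpha as the derivative of ln|x f(x)| with respect to ln(1/x). Here xf is a hypothetical callable returning x f(x) for one flavour at fixed Q, and the grid nodes are log-spaced in x (an assumption of this sketch).

```python
import numpy as np


def alpha_exponent(xf, xmin=1e-6, xmax=1e-3, npoints=100):
    """Estimate d ln|x f(x)| / d ln(1/x) on npoints sub-intervals of [xmin, xmax].

    xf is a hypothetical callable returning x * f(x) at fixed Q.
    """
    # npoints sub-intervals need npoints + 1 log-spaced nodes
    x = np.geomspace(xmin, xmax, npoints + 1)
    log_xf = np.log(np.abs(xf(x)))
    log_inv_x = np.log(1.0 / x)
    # One forward-difference derivative per sub-interval
    alpha = np.diff(log_xf) / np.diff(log_inv_x)
    # Report each estimate at the left edge of its sub-interval (a choice of this sketch)
    return x[:-1], alpha
```

For a pure power law x f(x) = x^(-0.2) the estimate is 0.2 on every sub-interval; a real PDF gives an x-dependent effective exponent.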

validphys.asy_exponents.asymptotic_exponents_table(pdf: ~validphys.core.PDF, *, x_alpha: ~numbers.Real = 1e-06, x_beta: ~numbers.Real = 0.9, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None, npoints=100)[source]

Returns a table with the values of the asymptotic exponents alpha and beta, as defined in Eq. (4) of [arXiv:1604.00024], at the specified value of x and Q.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.

validphys.asy_exponents.beta_asy(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Returns a list of xplotting_grids containing the value of the asymptotic exponent beta, as defined by the second relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.

validphys.asy_exponents.plot_alpha_asy(pdfs, alpha_asy_pdfs, pdfs_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plots the alpha asymptotic exponent

validphys.asy_exponents.plot_beta_asy(pdfs, beta_asy_pdfs, pdfs_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plots the beta asymptotic exponent

validphys.calcutils module

calcutils.py

Low-level utilities to calculate χ² and related quantities. These are used to implement the higher-level functions in results.py.

validphys.calcutils.all_chi2(results)[source]

Return the chi² for all elements in the result, regardless of the Stats class. Note that the interpretation of the result will depend on the PDF error type.

validphys.calcutils.all_chi2_theory(results, totcov)[source]

Like all_chi2 but here the chi² are calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.

validphys.calcutils.bootstrap_values(data, nresamples, *, boot_seed: int | None = None, apply_func: Callable | None = None, args=None)[source]

General bootstrap sampling.

data is the data to be sampled; the replicas are assumed to be on the final axis, e.g. an array of shape (N_bins, N_replicas).

boot_seed can be specified if the user wishes to take exactly the same bootstrap samples multiple times. By default it is None, in which case a random seed is used.

If only data and nresamples are provided, bootstrap_values creates nresamples resamples of the data, where each resample is a Monte Carlo selection of the data across replicas, and returns the mean of each resample.

Alternatively, the user can specify a function apply_func to be applied to the resampled data, plus any additional arguments args required by that function. bootstrap_values then returns apply_func(bootstrap_data, *args), where bootstrap_data.shape == (*data.shape, nresamples). It is critical that apply_func can handle input in this format.
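A minimal sketch of the resampling described above, written with NumPy's Generator API. This is an assumption: the validphys implementation may draw its random numbers differently, so the same boot_seed need not reproduce its samples.

```python
import numpy as np


def bootstrap_values(data, nresamples, *, boot_seed=None, apply_func=None, args=None):
    """Bootstrap over the last (replica) axis of data."""
    data = np.atleast_2d(data)
    n_reps = data.shape[-1]
    rng = np.random.default_rng(boot_seed)
    # Each column of idx is one resample: n_reps replica indices drawn with replacement
    idx = rng.integers(n_reps, size=(n_reps, nresamples))
    # Fancy indexing on the last axis gives shape (*data.shape, nresamples)
    bootstrap_data = data[..., idx]
    if apply_func is None:
        # Mean over the resampled replica axis
        return bootstrap_data.mean(axis=-2)
    return apply_func(bootstrap_data, *(() if args is None else args))
```

Passing apply_func=lambda b: b.std(axis=-2), for instance, returns the replica standard deviation of each resample instead of its mean.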

validphys.calcutils.calc_chi2(sqrtcov, diffs)[source]

Elementary function to compute the chi², given a Cholesky decomposed lower triangular part and a vector of differences.

Parameters:
  • sqrtcov (matrix) – A lower triangular matrix corresponding to the lower part of the Cholesky decomposition of the covariance matrix.

  • diffs (array) – A vector of differences (e.g. between data and theory). The first dimension must match the shape of sqrtcov. The computation will be broadcast over the other dimensions.

Returns:

chi2 – The result of the χ² for each vector of differences. Will have the same shape as diffs.shape[1:].

Return type:

array

Notes

This function computes the χ² more efficiently and accurately than following the direct definition of inverting the covariance matrix, \(\chi^2 = d^T\Sigma^{-1}d\), by solving the triangular linear system instead.

Examples

>>> from validphys.calcutils import calc_chi2
>>> import numpy as np
>>> import scipy.linalg as la
>>> np.random.seed(0)
>>> diffs = np.random.rand(10)
>>> s = np.random.rand(10,10)
>>> cov = s@s.T
>>> calc_chi2(la.cholesky(cov, lower=True), diffs)
44.64401691354948
>>> diffs@la.inv(cov)@diffs
44.64401691354948
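The triangular-solve approach from the Notes can be sketched as follows, using scipy.linalg.solve_triangular; treat it as an illustration rather than the exact validphys source. With L the lower Cholesky factor, solving L v = d gives \(\chi^2 = v^T v = d^T (L L^T)^{-1} d\), and summing the squares over the first axis only broadcasts over any remaining axes.

```python
import numpy as np
import scipy.linalg as la


def calc_chi2(sqrtcov, diffs):
    """chi2 = diffs^T (L L^T)^{-1} diffs, with L = sqrtcov lower triangular."""
    # Solve L v = diffs by forward substitution instead of inverting the covmat
    vec = la.solve_triangular(sqrtcov, diffs, lower=True)
    # Sum the squares over the first axis only; other axes are kept
    return np.einsum("i...,i...->...", vec, vec)
```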
validphys.calcutils.calc_phi(sqrtcov, diffs)[source]

Low level function which calculates phi given a Cholesky decomposed lower triangular part and a vector of differences. Primarily used when phi is to be calculated independently from chi2.

The vector of differences diffs is expected to have N_bins on the first axis

validphys.calcutils.central_chi2(results)[source]

Calculate the chi² from the central value of the theory prediction to the data

validphys.calcutils.central_chi2_theory(results, totcov)[source]

Like central_chi2 but here the chi² is calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.

validphys.calcutils.get_df_block(matrix: DataFrame, key: str, level)[source]

Given a pandas dataframe whose index and column keys match, and whose data represents a symmetric matrix, return the diagonal block of this matrix corresponding to matrix[key, key] as a numpy array.

Additionally, the user can specify the level of the key at which the cross section is taken; for example, level 1 corresponds to the dataset level of a theory covariance matrix.
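A minimal sketch of the block extraction with pandas' DataFrame.xs, assuming the matrix carries the same MultiIndex on rows and columns. The (group, dataset, id) index layout used in the example is hypothetical.

```python
import numpy as np
import pandas as pd


def get_df_block(matrix, key, level):
    """Return the diagonal block matrix[key, key] as a numpy array."""
    # Keep the rows whose index entry at `level` equals `key` ...
    rows = matrix.xs(key, level=level, axis=0)
    # ... then the columns selected the same way
    return rows.xs(key, level=level, axis=1).to_numpy()


idx = pd.MultiIndex.from_tuples(
    [("G1", "d1", 0), ("G1", "d1", 1), ("G1", "d2", 0), ("G2", "d3", 0)],
    names=["group", "dataset", "id"],
)
m = pd.DataFrame(np.arange(16).reshape(4, 4), index=idx, columns=idx)
block = get_df_block(m, "d1", level=1)  # 2x2 block for dataset d1
```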

validphys.calcutils.regularize_covmat(covmat: array, norm_threshold=4)[source]

Given a covariance matrix, performs a regularization which is equivalent to performing regularize_l2 on the sqrt of covmat: the l2 norm of the inverse of the correlation matrix calculated from covmat is set to be less than or equal to norm_threshold. If the input covmat already fulfills this criterion it is returned.

Parameters:
  • covmat (array) – a covariance matrix which is to be regularized.

  • norm_threshold (float) – The acceptable l2 norm of the sqrt correlation matrix, by default set to 4.

Returns:

new_covmat – A new covariance matrix which has been regularized according to prescription above.

Return type:

array

validphys.calcutils.regularize_l2(sqrtcov, norm_threshold=4)[source]

Return a regularized version of sqrtcov.

Given sqrtcov, an (N, nsys) matrix whose Gram matrix is the covariance matrix (covmat = sqrtcov@sqrtcov.T), first decompose it as sqrtcov = D@A, where D is a positive diagonal matrix of standard deviations and A is the “square root” of the correlation matrix, corrmat = A@A.T. Then produce a new version of A which removes the unstable behaviour, and assemble and return a new square root covariance matrix.

The stability condition is controlled by norm_threshold. It is

\[\left\Vert A^+ \right\Vert_{L2} \leq \text{norm\_threshold}\]

Here \(A^+\) is the pseudoinverse of A; norm_threshold roughly corresponds to the square root of the maximum relative uncertainty in any systematic.

Parameters:
  • sqrtcov (2d array) – An (N, nsys) matrix specifying the uncertainties.

  • norm_threshold (float) – The tolerance for the regularization.

Returns:

newsqrtcov – A regularized version of sqrtcov.

Return type:

2d array
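A minimal NumPy sketch of the decomposition described above. It assumes the regularization clips the singular values of A from below at 1/norm_threshold, which bounds \(\Vert A^+\Vert_{L2} = 1/s_{\min}\) by norm_threshold, consistent with how norm_threshold is described for regularize_covmat. Treat it as an illustration rather than the validphys source.

```python
import numpy as np


def regularize_l2(sqrtcov, norm_threshold=4):
    """Clip the small singular values of the sqrt correlation matrix A."""
    sqrtcov = np.asarray(sqrtcov, dtype=float)
    # D: per-row standard deviations, since covmat_ii = sum_k sqrtcov_ik**2
    d = np.linalg.norm(sqrtcov, axis=1)[:, np.newaxis]
    a = sqrtcov / d
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # ||A^+||_2 = 1 / min(s); return the input unchanged if already stable
    if 1 / s.min() <= norm_threshold:
        return sqrtcov
    # Raise the small singular values so that ||A_new^+||_2 <= norm_threshold
    s_new = np.clip(s, 1 / norm_threshold, None)
    # Reassemble D @ A_new
    return d * ((u * s_new) @ vt)
```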

validphys.checks module

Created on Thu Jun 2 19:35:40 2016

@author: Zahari Kassabov

validphys.checks.check_at_least_two_replicas(pdf)[source]
validphys.checks.check_can_save_grid(ns, **kwags)[source]
validphys.checks.check_cuts_considered(use_cuts)[source]
validphys.checks.check_cuts_fromfit(use_cuts)[source]
validphys.checks.check_darwin_single_process(NPROC)[source]

Check that if we are on macOS (platform is Darwin), NPROC is equal to 1. This is related to the infamous issues with multiprocessing on macOS.

The “solution” is to run the code sequentially if NPROC is 1 and enforce that macOS users don’t set NPROC as anything else.

TODO: Once pseudodata is generated in python, try using spawn instead of fork with multiprocessing.

Notes

for the specific NNPDF issue: https://github.com/NNPDF/nnpdf/issues/931

General discussion: https://wefearchange.org/2018/11/forkmacos.rst.html

validphys.checks.check_data_cuts_match_theorycovmat(data, fitthcovmat)[source]
validphys.checks.check_dataset_cuts_match_theorycovmat(dataset, fitthcovmat)[source]
validphys.checks.check_dataspecs_fits_different(dataspecs_fit)[source]

This check is needed because otherwise the pandas object gets confused.

validphys.checks.check_fits_different(fits)[source]

This check is needed because otherwise the pandas object gets confused.

validphys.checks.check_has_fitted_replicas(ns, **kwargs)[source]
validphys.checks.check_have_two_pdfs(pdfs)[source]
validphys.checks.check_know_errors(ns, **kwargs)[source]
validphys.checks.check_mixband_as_replicas(pdfs, mixband_as_replicas)[source]

Same as check_pdfs_noband, but for the mixband_as_replicas key. Allows mixband_as_replicas to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).

validphys.checks.check_norm_threshold(norm_threshold)[source]

Check norm_threshold is not None

validphys.checks.check_not_using_pdferr(use_pdferr=False, **kwargs)[source]
validphys.checks.check_pdf_is_montecarlo(ns, **kwargs)[source]
validphys.checks.check_pdf_is_montecarlo_or_hessian(pdf, **kwargs)[source]
validphys.checks.check_pdf_normalize_to(pdfs, normalize_to)[source]

Transform normalize_to into an index.

validphys.checks.check_pdfs_noband(pdfs, pdfs_noband)[source]

Allows pdfs_noband to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).
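A hypothetical sketch of how such mixed input could be normalised to zero-based positions in the list of PDFs. The helper name, the zero-based output and the representation of PDFs as plain name strings are all assumptions.

```python
def noband_indices(pdf_names, pdfs_noband):
    """Map PDF IDs or one-based indexes to zero-based positions in pdf_names.

    Hypothetical helper: pdf_names is a list of PDF IDs (strings).
    """
    indices = []
    for item in pdfs_noband:
        if isinstance(item, int):
            # One-based index into the list of PDFs
            if not 1 <= item <= len(pdf_names):
                raise ValueError(f"Index {item} out of range 1..{len(pdf_names)}")
            indices.append(item - 1)
        else:
            # PDF ID: list.index raises ValueError if it is not present
            indices.append(pdf_names.index(item))
    return indices
```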

validphys.checks.check_scale(scalename, allow_none=False)[source]

Check that we have a valid matplotlib scale. With allow_none=True, None is also valid.

validphys.checks.check_speclabels_different(dataspecs_speclabel)[source]

This is needed for grouping dataframes (and because duplicated labels generally indicate a bug).

validphys.checks.check_two_dataspecs(dataspecs)[source]
validphys.checks.check_use_t0(ns, **kwargs)[source]

Check that use_t0 is set to True.

validphys.checks.check_using_theory_covmat(use_theorycovmat)[source]

Check that the use_theorycovmat is set to True

validphys.checks.check_xlimits(xmax, xmin)[source]

validphys.chi2grids module

chi2grids.py

Compute and store χ² data from replicas, possibly keeping the correlations between pseudoreplica fluctuations between different fits. This is applied here to parameter determinations such as those of αs.

validphys.chi2grids.PseudoReplicaExpChi2Data

alias of PseudoReplicaChi2Data

validphys.chi2grids.computed_pseudoreplicas_chi2(fitted_make_replicas, group_result_table_no_table, groups_sqrtcovmat)[source]

Return a dataframe with the chi² of each replica with its corresponding pseudodata (i.e. the one it was fitted with). The chi² is computed by group. The index of the output dataframe is

['group', 'ndata', 'nnfit_index']

where nnfit_index is the name of the corresponding replica.

validphys.chi2grids.export_fits_computed_pseudoreplicas_chi2(fits_computed_pseudoreplicas_chi2)[source]

Hack to force writing the CSV output.

validphys.commondata module

commondata.py

Module containing actions which return loaded commondata, leverages utils found in validphys.commondataparser, and returns objects from validphys.coredata

validphys.commondata.loaded_commondata_with_cuts(commondata, cuts)[source]

Load the commondata and apply cuts.

Parameters:
  • commondata (validphys.core.CommonDataSpec) – commondata to load and cut.

  • cuts (validphys.core.cuts, None) – valid cuts, used to cut loaded commondata.

Returns:

loaded_cut_commondata

Return type:

validphys.coredata.CommonData

validphys.commondataparser module

This module implements parsers for commondata and systype files into useful datastructures, contained in the validphys.coredata module.

The validphys commondata structure is an instance of validphys.coredata.CommonData

class validphys.commondataparser.CommonDataMetadata(name: str, nsys: int, ndata: int, process_type: str)[source]

Bases: object

Contains metadata information about the data being read

name: str
ndata: int
nsys: int
process_type: str
validphys.commondataparser.get_kinlabel_key(process_label)[source]

Since there is no 1:1 correspondence between LaTeX keys and the old libNNPDF names, we match the longest key such that the process label starts with it.
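The longest-prefix rule can be sketched in a few lines. Here kinlabels is a hypothetical mapping passed explicitly (the documented function takes only process_label), and the keys used in the example are made up.

```python
def get_kinlabel_key(process_label, kinlabels):
    """Return the longest key of kinlabels that process_label starts with."""
    candidates = [key for key in kinlabels if process_label.startswith(key)]
    if not candidates:
        raise KeyError(f"No kinematic label key matches {process_label!r}")
    # Longest matching prefix wins, so "DIS_NC" beats "DIS" for "DIS_NCE"
    return max(candidates, key=len)


labels = {"DIS": "dis labels", "DIS_NC": "nc labels", "EWK": "ewk labels"}
```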

validphys.commondataparser.get_plot_kinlabels(commondata)[source]

Return the LaTeX kinematic labels for a given CommonData.

validphys.commondataparser.load_commondata(spec)[source]

Load the data corresponding to a CommonDataSpec object. Returns an instance of CommonData

validphys.commondataparser.parse_commondata(commondatafile, systypefile, setname)[source]

Parse a commondata file and a systype file into a CommonData.

Parameters:
  • commondatafile (file or path to file) –

  • systypefile (file or path to file) –

Returns:

commondata – An object containing the data and information from the commondata and systype files.

Return type:

CommonData

validphys.commondataparser.parse_systypes(systypefile)[source]

Parses a systype file and returns a pandas dataframe.

validphys.commondataparser.peek_commondata_metadata(commondatafilename)[source]

Read some of the properties of the commondata object as a CommonDataMetadata.

validphys.commondatawriter module

This module contains functions to write commondata and systypes tables to files

validphys.commondatawriter.write_commondata_data(commondata, buffer)[source]

Write the commondata table to buffer, which can be a memory map, a compressed archive or a string (using, for instance, StringIO).

Parameters:

Example

>>> from validphys.loader import Loader
>>> from validphys.commondatawriter import write_commondata_data
>>> from io import StringIO
>>> l = Loader()
>>> cd = l.check_commondata("NMC").load_commondata_instance()
>>> sio = StringIO()
>>> write_commondata_data(cd, sio)
>>> print(sio.getvalue())
validphys.commondatawriter.write_commondata_to_file(commondata, path)[source]

write commondata table to file

validphys.commondatawriter.write_systype_data(commondata, buffer)[source]

Write the systype table to buffer, which can be a memory map, a compressed archive or a string (using, for instance, StringIO).

Parameters:

Example

>>> from validphys.loader import Loader
>>> from validphys.commondatawriter import write_systype_data
>>> from io import StringIO
>>> l = Loader()
>>> cd = l.check_commondata("NMC").load_commondata_instance()
>>> sio = StringIO()
>>> write_systype_data(cd, sio)
>>> print(sio.getvalue())
validphys.commondatawriter.write_systype_to_file(commondata, path)[source]

write systype table to file

validphys.config module

class validphys.config.Config(input_params, environment=None)[source]

Bases: Config, CoreConfig, ParamfitsConfig

The effective configuration parser class.

class validphys.config.CoreConfig(input_params, environment=None)[source]

Bases: Config

load_default_data_grouping(spec)[source]

Load the default grouping of data

load_default_default_filter_rules(spec)[source]
load_default_default_filter_settings(spec)[source]
property loader
parse_added_filter_rules(rules: (<class 'list'>, <class 'NoneType'>) = None)[source]
parse_additional_errors(bool)[source]

PDF set used to generate the photon additional errors: they are constructed using the replicas 101-107 of the PDF set LUXqed17_plus_PDF4LHC15_nnlo_100 (that are obtained varying some parameters of the LuxQED approach) in the way described in sec. 2.5 of https://arxiv.org/pdf/1712.07053.pdf

parse_cut_similarity_threshold(th: Real)[source]

Maximum relative ratio when using fromsimilarpredictions cuts.

parse_data_grouping(key)[source]

A key which indicates which default grouping to use. Mainly for internal use. It allows the default grouping of experiments to be applied to runcards which don’t specify metadata_group without causing a namespace conflict in the lockfile.

parse_dataset_input(dataset: Mapping)[source]

The mapping that corresponds to the dataset specifications in the fit files

parse_dataset_inputs(param: list)

A list of dataset_input objects.

parse_default_filter_rules(spec: (<class 'str'>, <class 'NoneType'>))[source]
parse_default_filter_rules_recorded_spec_(spec)[source]

This function is a hacky fix for parsing the recorded spec of filter rules. The reason we need this function is that without it reportengine detects a conflict in the dataset key.

parse_default_filter_settings(spec: (<class 'str'>, <class 'NoneType'>))[source]
parse_experiment(experiment: dict)[source]

A set of datasets where correlated systematics are taken into account. It is a mapping with the keys ‘experiment’ (the experiment name) and ‘datasets’ (a list of datasets).

parse_experiment_input(ei: dict)[source]

The mapping that corresponds to the experiment specification in the fit config files. Currently, this needs to be combined with experiment_from_input to yield a useful result.

parse_experiment_inputs(param: list)

A list of experiment_input objects.

parse_experiments(param: list)

A list of experiment objects.

parse_fakepdf(name)[source]

PDF set used to generate the fake data in a closure test.

parse_filter_defaults(filter_defaults: (<class 'dict'>, <class 'NoneType'>))[source]

A mapping containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min and w2min.

parse_filter_rules(filter_rules: (<class 'list'>, <class 'NoneType'>))[source]

A list of filter rules. See https://docs.nnpdf.science/vp/filters.html for details on the syntax

parse_fit(item)[source]

A fit in the results folder, containing at least a valid filter result. Either just an id (str), or a mapping with ‘id’ and ‘label’.

parse_fitdeclaration(label: str)[source]

Used to guess some information from the fit name, without having to download it. This is meant to be used with other providers like e.g.:

{@with fits_as_from_fitdeclarations::fits_name_from_fitdeclarations@} {@ …do stuff… @} {@endwith@}

parse_fitdeclarations(param: list)

A list of fitdeclaration objects.

parse_fits(param: list)

A list of fit objects.

parse_groupby(grouping: str)[source]

parses the groupby key and checks it is an allowed grouping

parse_hyperscan(hyperscan)[source]

A hyperscan in the hyperscan_results folder, containing at least one tries.json file

parse_hyperscan_config(hyperscan_config, hyperopt=None)[source]

Configuration of the hyperscan

parse_hyperscans(param: list)

A list of hyperscan objects.

parse_integdataset(integset: dict, *, theoryid)[source]

An observable corresponding to a PDF in the evolution basis, used as an integrability constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.

parse_integdatasets(param: list, *, theoryid)

A list of integdataset objects.

parse_lumi_channel(ch: str)[source]
parse_lumi_channels(param: list)

A list of lumi_channel objects.

parse_luxset(name)[source]

PDF set used to generate the photon with fiatlux.

parse_metadata_group(group: str)[source]

User-specified key to group data by. The key must exist in the PLOTTING file, for example experiment.

parse_norm_threshold(val: (<class 'numbers.Number'>, <class 'NoneType'>))[source]

The threshold to use for covariance matrix regularization: it sets the maximum l2 norm of the inverse covariance matrix by clipping the smallest eigenvalues.

If norm_threshold is set to None, then no covmat regularization is performed

parse_pdf(item)[source]

A PDF set installed in LHAPDF. Either just an id (str), or a mapping with ‘id’ and ‘label’.

parse_pdfs(param: list)

A list of pdf objects.

parse_posdataset(posset: dict, *, theoryid)[source]

An observable used as a positivity constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.

parse_posdatasets(param: list, *, theoryid)

A list of posdataset objects.

parse_reweighting_experiments(experiments, *, theoryid, use_cuts, fit=None)[source]

A list of experiments to be used for reweighting.

parse_speclabel(label: (<class 'str'>, <class 'NoneType'>))[source]

A label for a dataspec. To be used in some plots

parse_t0pdfset(name)[source]

PDF set used to generate the t0 covmat.

parse_theoryid(item)[source]

A number corresponding to the database theory ID where the corresponding theory folder is installed in the data directory. Either just an id (str or int), or a mapping with ‘id’ and ‘label’.

parse_theoryids(param: list)

A list of theoryid objects.

parse_use_cuts(use_cuts: (<class 'bool'>, <class 'str'>))[source]

Whether to filter the points based on the cuts applied in the fit, or the whole data in the dataset. The possible options are:

  • internal: Calculate the cuts based on the existing rules. This is the default.

  • fromfit: Read the cuts stored in the fit.

  • nocuts: Use the whole dataset.

parse_use_fitcommondata(do_use: bool)[source]

Use the commondata files in the fit instead of those in the data directory.

parse_use_t0(do_use_t0: bool)[source]

Whether to use the t0 PDF set to generate covariance matrices.

produce_all_commondata()[source]

produces all commondata using the loader function

produce_all_lumi_channels()[source]
produce_basisfromfit(fit)[source]

Set the basis from fit config. In the fit config file the basis is set using the key fitbasis, but it is exposed to validphys as basis.

The name of this production rule is intentionally set to not conflict with the existing fitbasis runcard key.

produce_combined_shift_and_theory_dataspecs(dataspecs)[source]
produce_commondata(*, dataset_input, use_fitcommondata=False, fit=None)[source]

Produce a CommonDataSpec from a dataset input.

produce_covariance_matrix(use_pdferr: bool = False)[source]

Modifies which action is used as covariance_matrix depending on the flag use_pdferr

produce_covmat_t0_considered(use_t0: bool = False)[source]

Modifies which action is used as covariance_matrix depending on the flag use_t0

produce_cuts(*, commondata, use_cuts)[source]

Obtain cuts for a given dataset input, based on the appropriate policy.

produce_data(data_input, *, group_name='data')[source]

A set of datasets where correlated systematics are taken into account

produce_data_input()[source]

Produce the data_input which is a flat list of dataset_input s. This production rule handles the backwards compatibility with old datasets which specify experiments in the runcard.

produce_dataset(*, dataset_input, theoryid, cuts, use_fitcommondata=False, fit=None, check_plotting: bool = False)[source]

Dataset specification from the theory and CommonData. Use the cuts from the fit, if provided. If check_plotting is set to True, attempt to load and check the PLOTTING files (note this may cause a noticeable slowdown in general).

produce_dataset_inputs_covariance_matrix(use_pdferr: bool = False)[source]

Modifies which action is used as experiment_covariance_matrix depending on the flag use_pdferr

produce_dataset_inputs_covmat_t0_considered(use_t0: bool = False)[source]

Modifies which action is used as experiment_covariance_matrix depending on the flag use_t0

produce_dataset_inputs_fitting_covmat(theory_covmat_flag=False, use_thcovmat_in_fitting=False)[source]

Produces the correct covmat to be used in fitting_data_dict according to some options: whether to include the theory covmat, whether to separate the multiplicative errors and whether to compute the experimental covmat using the t0 prescription.

produce_dataset_inputs_sampling_covmat(sep_mult, theory_covmat_flag=False, use_thcovmat_in_sampling=False)[source]

Produces the correct covmat to be used in make_replica according to some options: whether to include the theory covmat, whether to separate the multiplicative errors and whether to compute the experimental covmat using the t0 prescription.

produce_dataspecs_with_matched_cuts(dataspecs)[source]

Take a list of namespaces (dataspecs), resolve dataset within each of them, and return another list of dataspecs where the datasets all have the same cuts, corresponding to the intersection of the selected points. All the datasets must have the same name (i.e. correspond with the same experimental measurement), but can otherwise differ, for example in the theory used for the experimental predictions.

This rule can be combined with matched_datasets_from_dataspecs.
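At its core, the matching of cuts is a set intersection. A minimal sketch, assuming each dataspec's cuts are given as an array of selected data-point indices (a simplification of the validphys cuts objects):

```python
from functools import reduce

import numpy as np


def matched_cuts(cuts_per_spec):
    """Indices selected by every dataspec, i.e. the intersection of the cuts."""
    # np.intersect1d returns the sorted, unique common elements of two arrays
    return reduce(np.intersect1d, (np.asarray(c) for c in cuts_per_spec))
```

Only points that survive the cuts in every dataspec are kept, so all the resulting datasets can be compared point by point.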

produce_defaults(q2min=None, w2min=None, maxTau=None, default_filter_settings=None, filter_defaults={}, default_filter_settings_recorded_spec_=None)[source]

Produce default values for filters taking into account the values of q2min, w2min and maxTau defined at namespace level and those inside a filter_defaults mapping.

produce_experiment_from_input(experiment_input, theoryid, use_cuts, fit=None)[source]

Return a mapping containing a single experiment from an experiment input. NOTE: This might be deprecated in the future.

produce_filter_data(fakedata: bool = False, theorycovmatconfig=None)[source]

Set the action used to filter either real or closure data. If the closure data filter is being used and the theory covariance matrix is not being closure tested, then filter the data by experiment for efficiency.

produce_fit_id(fit) str[source]

Return a string containing the ID of the fit

produce_fitcontext(fitinputcontext, fitpdf)[source]

Set PDF, theory ID and data input from the fit config

produce_fitcontextwithcuts(fit, fitinputcontext)[source]

Like fitinputcontext but setting the cuts policy.

produce_fitenvironment(fit, fitinputcontext)[source]

Like fitcontext, but additionally forcing various other parameters, such as the cuts policy and Monte Carlo seeding to be the same as the fit.

Notes

produce_fitinputcontext(fit)[source]

Like fitcontext but without setting the PDF

produce_fitpdf(fit)[source]

Like fitcontext only setting the PDF

produce_fitpdfandbasis(fitpdf, basisfromfit)[source]

Set the PDF and basis from the fit config.

produce_fitq0fromfit(fitinputcontext)[source]

Given a fit, return the fitting scale according to the theory

produce_fitreplicas(fit)[source]

Production rule mapping the replica key to each Monte Carlo fit replica.

produce_fitthcovmat(use_thcovmat_if_present: bool = False, fit: (<class 'str'>, <class 'NoneType'>) = None)[source]

If a fit is specified and use_thcovmat_if_present is True then returns the corresponding covariance matrix for the given fit if it exists. If the fit doesn’t have a theory covariance matrix then returns False.

produce_fitunderlyinglaw(fit)[source]

Read the closuretest: fakepdf key from the fit config file and pass it as pdf.

produce_fivetheories(point_prescription)[source]
produce_group_dataset_inputs_by_experiment(data_input)[source]
produce_group_dataset_inputs_by_metadata(data_input, processed_metadata_group)[source]

Take the data and the processed_metadata_group key and attempt to group the data. Return a list where each element specifies the data_input for a single group and the group_name.
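The grouping can be pictured as bucketing dataset inputs by one metadata field. This is a minimal sketch, assuming each dataset input is a plain dict carrying its metadata; the helper group_by_metadata and the dataset and experiment names are hypothetical placeholders, not the validphys API.

```python
from collections import defaultdict


def group_by_metadata(data_input, metadata_key):
    """Group dataset inputs by the value of ``metadata_key``.

    Each element of ``data_input`` is a hypothetical dict holding a dataset
    name plus its metadata; groups keep their order of first appearance.
    """
    groups = defaultdict(list)
    for dsinput in data_input:
        groups[dsinput[metadata_key]].append(dsinput)
    return [
        {"data_input": inputs, "group_name": name} for name, inputs in groups.items()
    ]


data_input = [
    {"dataset": "DS_A", "experiment": "EXP1"},
    {"dataset": "DS_B", "experiment": "EXP2"},
    {"dataset": "DS_C", "experiment": "EXP1"},
]
for group in group_by_metadata(data_input, "experiment"):
    print(group["group_name"], [d["dataset"] for d in group["data_input"]])
# EXP1 ['DS_A', 'DS_C']
# EXP2 ['DS_B']
```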

produce_group_dataset_inputs_by_process(data_input)[source]
produce_inclusive_use_scalevar_uncertainties(use_scalevar_uncertainties: bool = False, point_prescription: (<class 'str'>, None) = None)[source]

Whether to use a scale variation uncertainty theory covmat. Checks whether a point prescription is included in the runcard and if so assumes scale uncertainties are to be used.

produce_integdatasets(integrability)[source]
produce_loaded_theory_covmat(output_path, data_input, theory_covmat_flag=False, use_user_uncertainties=False, use_scalevar_uncertainties=True)[source]

Loads the theory covmat from the correct file according to how it was generated by vp-setupfit.

produce_loaded_user_covmat_path(user_covmat_path: str = '', use_user_uncertainties: bool = False)[source]

Path to the user covmat provided by user_covmat_path in the runcard. If no path is provided, returns None. For use in theorycovariance.construction.user_covmat.

produce_matched_datasets_from_dataspecs(dataspecs)[source]

Take an arbitrary list of mappings called dataspecs and return a new list of mappings called dataspecs constructed as follows.

From each of the original dataspecs, resolve the key process, and all the experiments and datasets therein.

Compute the intersection of the dataset names, and for each element in the intersection construct a mapping with the following keys:

  • process : A string with the common process name.

  • experiment_name : A string with the common experiment name.

  • dataset_name : A string with the common dataset name.

  • dataspecs : A list of mappings matching the original “dataspecs”. Each mapping contains:

    • dataset: A dataset with the name dataset_name and the properties (cuts, theory, etc.) corresponding to the original dataspec.

    • dataset_input: The input line used to build dataset.

    • All the other keys in the original dataspec.
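The matching step can be sketched with plain dictionaries. In this simplified, hypothetical version each dataspec is a dict whose "datasets" key maps dataset names to already-resolved dataset objects (strings stand in for them here); the process and experiment_name keys are omitted, and sorting the common names is an assumption of the sketch.

```python
def match_datasets(dataspecs):
    """Build one entry per dataset name present in every dataspec.

    Each dataspec is a hypothetical dict with a ``"datasets"`` mapping from
    dataset name to a dataset object; all other keys are copied through.
    """
    common = set(dataspecs[0]["datasets"])
    for spec in dataspecs[1:]:
        common &= set(spec["datasets"])

    matched = []
    for name in sorted(common):
        inner = []
        for spec in dataspecs:
            # Keep every other key of the original dataspec alongside the dataset
            extra = {k: v for k, v in spec.items() if k != "datasets"}
            inner.append({"dataset": spec["datasets"][name], **extra})
        matched.append({"dataset_name": name, "dataspecs": inner})
    return matched


dataspecs = [
    {"speclabel": "theory A", "datasets": {"NMC": "NMC@A", "CMSZDIFF12": "CMS@A"}},
    {"speclabel": "theory B", "datasets": {"NMC": "NMC@B"}},
]
for entry in match_datasets(dataspecs):
    print(entry["dataset_name"], [d["dataset"] for d in entry["dataspecs"]])
# NMC ['NMC@A', 'NMC@B']
```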

produce_matched_positivity_from_dataspecs(dataspecs)[source]

Like produce_matched_datasets_from_dataspecs but for positivity datasets.

produce_multiclosure_underlyinglaw(fits)[source]

Produce the underlying law for a set of fits. This allows a single t0-like covariance matrix to be loaded for all fits, for use with statistical estimators on multiple closure fits. If the fits don’t all have the same underlying law, then an error is raised and the offending fit is identified.

produce_nnfit_theory_covmat(use_thcovmat_in_sampling: bool, use_thcovmat_in_fitting: bool, inclusive_use_scalevar_uncertainties, use_user_uncertainties: bool = False)[source]

Return the theory covariance matrix used in the fit.

produce_no_covmat_reg()[source]

Explicitly set norm_threshold to None so that no covariance matrix regularization is performed.

produce_pdf_id(pdf) str[source]

Return a string containing the PDF’s LHAPDF ID

produce_pdfreplicas(fitpdf)[source]

Production rule mapping the replica key to each postfit replica.

produce_posdatasets(positivity)[source]
produce_processed_data_grouping(use_thcovmat_in_fitting=False, use_thcovmat_in_sampling=False, data_grouping=None, data_grouping_recorded_spec_=None)[source]

Process the data_grouping key from the runcard, or lockfile. If data_grouping_recorded_spec_ is present then its value is taken, and the runcard is assumed to be a lockfile.

If data_grouping is None, then, if either use_thcovmat_in_fitting or use_thcovmat_in_sampling (or both) are true (which means that the fit is a thcovmat fit), group all the datasets together, otherwise fall back to the default behaviour of grouping by experiment (called standard_report).

Otherwise, the user can specify their own grouping, for example metadata_process.
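The decision logic described above can be sketched as a small function. This is an illustrative reimplementation, not the validphys code: in particular, the label "thcovmat_fit" for the "all datasets in one group" case is a hypothetical placeholder.

```python
def processed_data_grouping(
    use_thcovmat_in_fitting=False,
    use_thcovmat_in_sampling=False,
    data_grouping=None,
    data_grouping_recorded_spec_=None,
):
    """Return the name of the grouping to use, following the rules above."""
    if data_grouping_recorded_spec_ is not None:
        # Lockfile: the recorded value takes precedence
        return data_grouping_recorded_spec_
    if data_grouping is None:
        if use_thcovmat_in_fitting or use_thcovmat_in_sampling:
            # Hypothetical label for "all datasets in a single group"
            return "thcovmat_fit"
        return "standard_report"  # group by experiment
    return data_grouping
```

For example, processed_data_grouping(use_thcovmat_in_sampling=True) returns the single-group label, while passing data_grouping="metadata_process" returns that value unchanged.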

produce_processed_metadata_group(processed_data_grouping, metadata_group=None)[source]

Expose the final data grouping result. If metadata_group is specified by the user, it is used; otherwise processed_data_grouping is used, which is experiment by default.

produce_replicas(nreplica: int)[source]

Produce a replicas array

produce_reweight_all_datasets(experiments)[source]
produce_rules(theoryid, use_cuts, defaults, default_filter_rules=None, filter_rules=None, default_filter_rules_recorded_spec_=None, added_filter_rules: (<class 'list'>, <class 'NoneType'>) = None)[source]

Produce filter rules based on the user defined input and defaults.

produce_scale_variation_theories(theoryid, point_prescription)[source]

Produces a list of theoryids given a theoryid at central scales and a point prescription. The options for the latter are ‘3 point’, ‘5 point’, ‘5bar point’, ‘7 point’ and ‘9 point’. Note that these are defined in arXiv:1906.10698. This hard codes the theories needed for each prescription to avoid user error.

produce_sep_mult(separate_multiplicative=None)[source]

Specifies whether to separate the multiplicative errors in the experimental covmat construction. The default is True.

produce_seventheories(point_prescription)[source]
produce_t0set(t0pdfset=None, use_t0=False)[source]

Return the t0set if use_t0 is True and None otherwise. Raises an error if t0 is requested but no t0set is given.

produce_theory_database()[source]

Produces path to the theory.db file

produce_total_chi2_data(fitthcovmat)[source]

If there is no theory covmat for the fit, then calculate the total chi2 by summing the chi2 from each experiment.

produce_total_phi_data(fitthcovmat)[source]

If there is no theory covmat for the fit, then calculate the total phi using contributions from each experiment.

class validphys.config.Environment(*, this_folder=None, net=True, upload=False, dry=False, **kwargs)[source]

Bases: Environment

Container for information to be filled at run time

validphys.convolution module

This module implements tools for computing convolutions between PDFs and theory grids, which yield observables.

The high level predictions() function can be used to extract theory predictions for experimentally measured quantities:

import numpy as np
from validphys.api import API
from validphys.convolution import predictions


inp = {
    'fit': '181023-001-sc',
    'use_cuts': 'internal',
    'theoryid': 162,
    'pdf': 'NNPDF40_nnlo_lowprecision',
    'dataset_inputs': {'from_': 'fit'}
}


all_datasets = API.data(**inp).datasets

pdf = API.pdf(**inp)


all_preds = [predictions(ds, pdf) for ds in all_datasets]

Some variants such as central_predictions() and linear_predictions() are useful for more specialized tasks.

These functions work with validphys.core.DatasetSpec objects, which allows them to account for information on COMPOUND predictions and cuts. A lower level interface which operates with validphys.coredata.FKTableData objects is also available.

exception validphys.convolution.PredictionsRequireCutsError[source]

Bases: Exception

validphys.convolution.central_dis_predictions(loaded_fk, pdf)[source]

Implementation of central_fk_predictions() for DIS observables.

validphys.convolution.central_fk_predictions(loaded_fk, pdf)[source]

Same as fk_predictions(), but computing predictions for the central PDF member only.

validphys.convolution.central_hadron_predictions(loaded_fk, pdf)[source]

Implementation of central_fk_predictions() for hadronic observables.

validphys.convolution.central_predictions(dataset, pdf)[source]

Same as predictions() but computing the predictions for the central member of the PDF set only. For Monte Carlo PDFs, this is a faster alternative to computing the central predictions as the average of the replica predictions (although a small approximation is involved in the case of hadronic predictions).

validphys.convolution.dis_predictions(loaded_fk, pdf)[source]

Implementation of fk_predictions() for DIS observables.

validphys.convolution.fk_predictions(loaded_fk, pdf)[source]

Low level function to compute predictions from a FKTable.

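Parameters:
  • loaded_fk (validphys.coredata.FKTableData) – The FKTable corresponding to the partonic cross section.

  • pdf (validphys.core.PDF) – The PDF set to use for the convolutions.

Returns: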

df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points (use validphys.coredata.FKTableData.with_cuts() to filter out points). The columns correspond to the selected PDF members in the LHAPDF set.

Return type:

pandas.DataFrame

Notes

This function operates on a single FKTable, while the prediction for an experimental quantity generally involves several. Use predictions() to compute those.

Examples

>>> from validphys.loader import Loader
>>> from validphys.convolution import hadron_predictions
>>> from validphys.fkparser import load_fktable
>>> l = Loader()
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
>>> table = load_fktable(ds.fkspecs[0])
>>> hadron_predictions(table, pdf)
             1           2           3           4    ...         97          98          99          100
data                                                  ...
0     176.688118  170.172930  172.460771  173.792321  ...  179.504636  172.343792  168.372508  169.927820
1     252.682923  244.507916  247.840249  249.541798  ...  256.410844  247.805180  242.246438  244.415529
2     828.076008  813.452551  824.581569  828.213508  ...  838.707211  826.056388  810.310109  816.824167
validphys.convolution.hadron_predictions(loaded_fk, pdf)[source]

Implementation of fk_predictions() for hadronic observables.

validphys.convolution.linear_fk_predictions(loaded_fk, pdf)[source]

Same as predictions() for DIS, but compute linearized predictions for hadronic data, using linear_hadron_predictions().

validphys.convolution.linear_hadron_predictions(loaded_fk, pdf)[source]

Implementation of linear_fk_predictions() for hadronic observables. Specifically this computes:

central_value ⊗ FK ⊗ (2 * replica_values - central_value)

which is the linear expansion of the hadronic observable in the difference between each replica and the central value, replica_values - central_value.
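A toy NumPy check of the linearization follows. It assumes a flattened (flavour, x) grid so that a hadronic convolution is a bilinear form, and it symmetrizes the toy FK table so that c ⊗ FK ⊗ d equals d ⊗ FK ⊗ c; under that assumption, central_value ⊗ FK ⊗ (2 * replica_values - central_value) differs from the exact prediction only by the quadratic term. All arrays are random placeholders, not real FKTable data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nrep = 6, 4  # toy flattened (flavour, x) grid size and number of replicas

# Toy FK table, symmetrized so that c ⊗ FK ⊗ d equals d ⊗ FK ⊗ c
fk = rng.normal(size=(n, n))
fk = 0.5 * (fk + fk.T)

central = rng.normal(size=n)
replicas = central + 0.01 * rng.normal(size=(nrep, n))


def convolve(left, right):
    """Return left[r] ⊗ FK ⊗ right[r] for every replica r."""
    return np.einsum("ri,ij,rj->r", left, fk, right)


exact = convolve(replicas, replicas)
linear = convolve(np.broadcast_to(central, replicas.shape), 2 * replicas - central)

# Exact and linearized results differ only by the quadratic term d ⊗ FK ⊗ d
delta = replicas - central
quadratic = convolve(delta, delta)
print(np.allclose(exact - linear, quadratic))  # True
```

Because the neglected term is quadratic in the replica deviations, it is small whenever the replicas stay close to the central value.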

validphys.convolution.linear_predictions(dataset, pdf)[source]

Same as predictions() but computing linearized predictions. These are the same as predictions for DIS, but for hadronic predictions they are truncated to the terms that are linear in the difference between each member and the central value.

This is generally a very good approximation, in that it yields differences that are much smaller than the PDF uncertainty.

validphys.convolution.predictions(dataset, pdf)[source]

Compute theory predictions for a given PDF and dataset. Information regarding the dataset, cuts, CFactors and combinations of FKTables is taken into account to construct the predictions.

The result should be comparable to experimental predictions implemented in CommonData.

Parameters:
  • dataset (validphys.core.DatasetSpec) – The dataset containing information on the partonic cross section.

  • pdf (validphys.core.PDF) – The PDF set to use for the convolutions.

Returns:

df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points, based on the dataset cuts. The columns correspond to the selected PDF members in the LHAPDF set.

Return type:

pandas.DataFrame

Examples

Obtain descriptive statistics over PDF replicas for each of the three points in the ATLAS ttbar dataset:

>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53)
>>> from validphys.convolution import predictions
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> preds = predictions(ds, pdf)
>>> preds.T.describe()
data            0           1           2
count  100.000000  100.000000  100.000000
mean   161.271292  231.500367  767.816844
std      2.227304    2.883497    7.327617
min    156.638526  225.283254  750.850250
25%    159.652216  229.486793  762.773527
50%    161.066965  231.281248  767.619249
75%    162.620554  233.306836  772.390286
max    168.390840  240.287549  786.549380

validphys.core module

Core datastructures used in the validphys data model.

class validphys.core.CommonDataSpec(datafile, sysfile, plotfiles, name=None, metadata=None)[source]

Bases: TupleComp

load()[source]
load_commondata_instance()[source]

Load a validphys.core.CommonDataSpec into a validphys.coredata.CommonData object.

property metadata
property name
property ndata
property nsys
property plot_kinlabels
property process_type
class validphys.core.Cuts(commondata, path)[source]

Bases: TupleComp

load()[source]
class validphys.core.CutsPolicy(value)[source]

Bases: Enum

An enumeration.

FROMFIT = 'fromfit'
FROM_CUT_INTERSECTION_NAMESPACE = 'fromintersection'
FROM_SIMILAR_PREDICTIONS_NAMESPACE = 'fromsimilarpredictions'
INTERNAL = 'internal'
NOCUTS = 'nocuts'
class validphys.core.DataGroupSpec(name, datasets, dsinputs=None)[source]

Bases: TupleComp, NSList

property as_markdown
load_commondata()[source]
load_commondata_instance()[source]

Given an experiment, load a list of validphys.coredata.CommonData objects with cuts already applied.

property thspec
to_unweighted()[source]

Return a copy of the group with the weights for all experiments set to one. Note that the results cannot be used as a namespace.

class validphys.core.DataSetInput(*, name, sys, cfac, frac, weight, custom_group)[source]

Bases: TupleComp

Represents whatever the user enters in the YAML to specify a dataset.

class validphys.core.DataSetSpec(*, name, commondata, fkspecs, thspec, cuts, frac=1, op=None, weight=1)[source]

Bases: TupleComp

load_commondata()[source]

Strips the commondata loading from load

to_unweighted()[source]

Return a copy of the dataset with the weight set to one.

class validphys.core.ExperimentInput(*, name, datasets)[source]

Bases: TupleComp

as_dict()[source]
class validphys.core.FKTableSpec(fkpath, cfactors, metadata=None)[source]

Bases: TupleComp

Each FKTable is formed by a number of sub-fktables to be concatenated, each of which has its own path. Therefore the fkpath variable is a list of paths.

Before the pineappl implementation, FKTables were already pre-concatenated. The legacy interface therefore relies on fkpath being just a string or path instead.

The metadata of the FKTable for the given dataset is stored as an attribute of this object. This is transitional; eventually it will be held by the associated CommonData in the new format.

load_cfactors()[source]

Each of the sub-fktables that form the complete FKTable can have several cfactors applied to it. This function uses parse_cfactor to make them into CFactorData

load_with_cuts(cuts)[source]

Load the fktable and apply cuts immediately. Returns a FKTableData

class validphys.core.Filter(indexes, label, **kwargs)[source]

Bases: object

as_pair()[source]
class validphys.core.FitSpec(name, path)[source]

Bases: TupleComp

as_input()[source]
label
name
path
class validphys.core.HessianStats(data, rescale_factor=1)[source]

Bases: SymmHessianStats

Compute stats in the ‘asymmetric’ Hessian format: the first index (0) is the central value, the odd indexes are the results for the lower eigenvectors and the even indexes are the results for the upper eigenvectors. A rescale_factor is allowed in case the eigenvector confidence interval is not 68%.
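A sketch of the usual asymmetric Hessian "master formula" for this member layout follows: half the quadrature sum of the upper minus lower differences, divided by the rescale factor. It is an assumption that HessianStats.std_error() computes exactly this; the helper below is hypothetical.

```python
import numpy as np


def asymmetric_hessian_std(data, rescale_factor=1.0):
    """Half the quadrature sum of (upper - lower) eigenvector differences.

    ``data`` has shape (1 + 2 * neig, ...): row 0 is the central value, odd
    rows are lower eigenvectors and even rows (from 2) are upper eigenvectors.
    """
    data = np.asarray(data, dtype=float)
    lower = data[1::2]
    upper = data[2::2]
    return 0.5 * np.sqrt(np.sum((upper - lower) ** 2, axis=0)) / rescale_factor


# Central value, then (lower, upper) pairs for two eigenvectors
members = [10.0, 9.0, 11.0, 9.5, 10.5]
print(asymmetric_hessian_std(members))  # approximately 1.118 (= sqrt(5) / 2)
```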

moment(order)[source]
std_error()[source]
class validphys.core.HyperscanSpec(name, path)[source]

Bases: FitSpec

The hyperscan spec is just a special case of FitSpec

get_all_trials(base_params=None)[source]

Read all trials from all tries files. If there are original runcard-based parameters, a reference to them can be passed to the trials so that a full hyperparameter dictionary can be defined

Each hyperopt trial object will also have a reference to all trials in its own file

label
name
path
sample_trials(n=None, base_params=None, sigma=4.0)[source]

Parse all trials in the hyperscan object and then return an array of n trials read from the tries.json files and sampled according to their reward. If n is None, no sampling is performed and all trials are returned.

Returns:

Dictionary of the form {parameters: list of trials}

property tries_files

Return a dictionary with all tries.json files mapped to their replica number

class validphys.core.IntegrabilitySetSpec(name, commondataspec, fkspec, maxlambda, thspec)[source]

Bases: LagrangeSetSpec

class validphys.core.InternalCutsWrapper(commondata, rules)[source]

Bases: TupleComp

load()[source]
class validphys.core.LagrangeSetSpec(name, commondataspec, fkspec, maxlambda, thspec)[source]

Bases: DataSetSpec

Extends DataSetSpec to work around the particularities of the positivity, integrability and other Lagrange Multiplier datasets.

load_commondata()[source]

Strips the commondata loading from load

to_unweighted()[source]

Return a copy of the dataset with the weight set to one.

class validphys.core.MCStats(data)[source]

Bases: Stats

Result obtained from a Monte Carlo sample

errorbar68()[source]
moment(order)[source]
sample_values(size)[source]
std_error()[source]
class validphys.core.MatchedCuts(othercuts, ndata)[source]

Bases: TupleComp

load()[source]
class validphys.core.PDF(name)[source]

Bases: TupleComp

Base validphys PDF providing high level access to metadata.

Statistical estimators which depend on the PDF type (MC, Hessian…) are exposed as a Stats object through the stats_class attribute. The LHAPDF metadata can be accessed directly through the info attribute.

Examples

>>> from validphys.api import API
>>> from validphys.convolution import predictions
>>> args = {"dataset_input":{"dataset": "ATLASTTBARTOT"}, "theoryid":162, "use_cuts":"internal"}
>>> ds = API.dataset(**args)
>>> pdf = API.pdf(pdf="NNPDF40_nnlo_as_01180")
>>> preds = predictions(ds, pdf)
>>> preds.shape
(3, 100)
property alphas_mz

Alpha_s(M_Z) as defined in the LHAPDF .info file

property alphas_vals

List of alpha_s(Q) at various Q for interpolation based alphas. Values as defined in the LHAPDF .info file

property error_conf_level

Error confidence level as defined in the LHAPDF .info file. If no number is given in the .info file, it defaults to 68%.

property error_type

Error type as defined in the LHAPDF .info file

get_members()[source]

Return the number of members selected in pdf.load().grid_values

property info

Information contained in the LHAPDF .info file

property infopath
property isinstalled
property label
load()[source]
load_t0()[source]

Load the PDF as a t0 set

property q_min

Minimum Q as given by the LHAPDF .info file

property stats_class

Return the stats calculator for this error type

exception validphys.core.PDFDoesNotExist[source]

Bases: Exception

class validphys.core.PositivitySetSpec(name, commondataspec, fkspec, maxlambda, thspec)[source]

Bases: LagrangeSetSpec

class validphys.core.SimilarCuts(inputs, threshold)[source]

Bases: TupleComp

load()[source]
class validphys.core.Stats(data)[source]

Bases: object

Class holding statistical information about the objects used in validphys. This object can be a PDF or any function of a PDF (such as hadronic observable).

By convention, member 0 corresponds to the central value of the PDF. Accordingly, the method central_value will return the result held for member 0. Note that this is equal to the mean of the error_members only for the PDF itself and linear functions of the PDF (such as DIS-type observable). If you want to obtain the average of the error members you can do: np.mean(stats_instance.error_members, axis=0)
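The difference between central_value and the mean of the error members can be seen in a toy Monte Carlo example. The sketch assumes, as for MC sets, that member 0 holds the average of the replicas at PDF level; the two observable functions are hypothetical stand-ins for a linear (DIS-like) and a quadratic (hadronic-like) dependence on the PDF.

```python
import numpy as np

rng = np.random.default_rng(1)
replicas = 1.0 + 0.2 * rng.normal(size=1000)  # toy PDF value at one (x, Q) point
# Assumed MC convention: member 0 holds the average of the replicas
members = np.concatenate([[replicas.mean()], replicas])


def linear_obs(f):
    return 3.0 * f  # linear in the PDF, like a DIS-type observable


def quadratic_obs(f):
    return f**2  # quadratic in the PDF, like a hadronic observable


for obs in (linear_obs, quadratic_obs):
    values = obs(members)
    central = values[0]  # what central_value() reports
    mean_of_errors = np.mean(values[1:], axis=0)  # mean of the error members
    print(obs.__name__, np.isclose(central, mean_of_errors))
# linear_obs True
# quadratic_obs False
```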

central_value()[source]
error_members()[source]
errorbar68()[source]
errorbarstd()[source]
moment(order)[source]
sample_values(size)[source]
std_error()[source]
std_interval(nsigma)[source]
class validphys.core.SymmHessianStats(data, rescale_factor=1)[source]

Bases: Stats

Compute stats in the ‘symmetric’ Hessian format: the first index (0) is the central value and the remaining indexes are the results for each eigenvector. A rescale_factor is allowed in case the eigenvector confidence interval is not 68%.
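For the symmetric format, the usual Hessian uncertainty is the quadrature sum of each eigenvector's shift from the central member, divided by the rescale factor (for example, when the eigenvectors correspond to a 90% rather than a 68% confidence interval). This is a hypothetical helper under the assumption that SymmHessianStats.std_error() follows this standard formula.

```python
import numpy as np


def symmetric_hessian_std(data, rescale_factor=1.0):
    """Quadrature sum of eigenvector shifts from the central member (row 0)."""
    data = np.asarray(data, dtype=float)
    shifts = data[1:] - data[0]
    return np.sqrt(np.sum(shifts**2, axis=0)) / rescale_factor


members = [10.0, 10.3, 9.6]  # central value followed by two eigenvector members
print(symmetric_hessian_std(members))  # approximately 0.5 = sqrt(0.3**2 + 0.4**2)
```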

errorbar68()[source]
moment(order)[source]
std_error()[source]
class validphys.core.ThCovMatSpec(path)[source]

Bases: object

load()[source]
class validphys.core.TheoryIDSpec(id: int, path: pathlib.Path, dbpath: pathlib.Path)[source]

Bases: object

dbpath: Path
get_description()[source]
id: int
is_pineappl()[source]

Check whether this theory is a pineappl-based theory

path: Path
property yamldb_path
class validphys.core.TupleComp(*args, **kwargs)[source]

Bases: object

classmethod argnames()[source]
validphys.core.cut_mask(cuts)[source]

Return an object that will act as the cuts when applied as a slice.

validphys.coredata module

Data containers backed by Python managed memory (Numpy arrays and Pandas dataframes).

class validphys.coredata.CFactorData(description: str, central_value: array, uncertainty: array)[source]

Bases: object

Data contained in a CFactor

Parameters:
  • description (str) – Information on how the data was obtained.

  • central_value (array, shape(ndata)) – The value of the cfactor for each data point.

  • uncertainty (array, shape(ndata)) – The absolute uncertainty on the cfactor if available.

central_value: array
description: str
uncertainty: array
class validphys.coredata.CommonData(setname: str, ndata: int, commondataproc: str, nkin: int, nsys: int, commondata_table: DataFrame, systype_table: DataFrame)[source]

Bases: object

Data contained in Commondata files, relevant cuts applied.

Parameters:
  • setname (str) – Name of the dataset

  • ndata (int) – Number of data points

  • commondataproc (str) – Process type, one of 21 options

  • nkin (int) – Number of kinematics specified

  • nsys (int) – Number of systematics

  • commondata_table (pd.DataFrame) – Pandas dataframe containing the commondata

  • systype_table (pd.DataFrame) – Pandas dataframe containing the systype index for each systematic alongside the uncertainty type (ADD/MULT/RAND) and name (CORR/UNCORR/THEORYCORR/SKIP)

  • systematics_table (pd.DataFrame) – Pandas dataframe containing the table of systematics

property additive_errors

Returns the systematics which are additive (systype is ADD) as absolute uncertainties (same units as data), with SKIP uncertainties removed.

property central_values
commondata_table: DataFrame
commondataproc: str
export(path)[source]

Export the data and error types.

Use the same format as libNNPDF:

  • A DATA_<dataset>.dat file with the dataframe of accepted points

  • A systypes/STYPES_<dataset>.dat file with the error types

get_cv()[source]
get_kintable()[source]
property kinematics
property multiplicative_errors

Returns the systematics which are multiplicative (systype is MULT) in a percentage format, with SKIP uncertainties removed.

ndata: int
nkin: int
nsys: int
setname: str
property stat_errors
systematic_errors(central_values=None)[source]

Returns all systematic errors as absolute uncertainties, with a single column for each uncertainty. Converts multiplicative_errors to units of data and then appends onto additive_errors. By default uses the experimental central values to perform conversion, but the user can supply a 1-D array of central values, with length self.ndata, to use instead of the experimental central values to calculate the absolute contribution of the multiplicative systematics.

Parameters:

central_values (None, np.array) – 1-D array containing alternative central values to combine with multiplicative uncertainties. This array must have length equal to self.ndata. By default central_values is None, and the central values of the commondata are used.

Returns:

systematic_errors – Dataframe containing systematic errors.

Return type:

pd.DataFrame
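The conversion performed by systematic_errors() can be sketched with pandas: multiplicative uncertainties (in percent) are multiplied by the central values and divided by 100, then appended to the additive ones. The column names and numbers below are hypothetical, and the helper is an illustration rather than the CommonData method itself.

```python
import numpy as np
import pandas as pd

# Toy inputs: 3 data points, one ADD and one MULT systematic (hypothetical names)
central_values = np.array([100.0, 200.0, 50.0])
additive = pd.DataFrame({"ADD_1": [1.0, 2.0, 0.5]})
multiplicative_pct = pd.DataFrame({"MULT_1": [5.0, 5.0, 10.0]})  # in percent


def absolute_systematics(additive, multiplicative_pct, central_values):
    """Convert MULT (percent) to data units and append the columns to ADD."""
    cv = pd.Series(central_values, index=multiplicative_pct.index)
    converted = multiplicative_pct.mul(cv, axis=0) / 100
    return pd.concat([additive, converted], axis=1)


result = absolute_systematics(additive, multiplicative_pct, central_values)
print(result)

# Alternative central values (e.g. t0 predictions) only change the MULT part
t0_result = absolute_systematics(
    additive, multiplicative_pct, np.array([110.0, 190.0, 55.0])
)
```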

systematics_table: DataFrame
systype_table: DataFrame
with_central_value(cv)[source]
with_cuts(cuts)[source]

A method to return a CommonData object where an integer mask has been applied, keeping only data points which pass cuts.

Note if the first data point passes cuts, the first entry of cuts should be 0.

Parameters:

cuts (list or validphys.core.Cuts or None) – The cuts to be applied.
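The 0-based convention for cuts can be illustrated with a toy table. This sketch assumes, for illustration only, that the commondata table numbers its data points from 1, so the 0-based cut indices are shifted by one before selecting rows; the index name "entry" and the helper apply_cuts are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy commondata table; assumed here to number its data points from 1
table = pd.DataFrame(
    {"data": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index([1, 2, 3, 4], name="entry"),
)


def apply_cuts(table, cuts):
    """Keep the rows selected by 0-based ``cuts``; ``None`` keeps every row."""
    if cuts is None:
        return table
    # Shift the 0-based cut indices onto the 1-based table index
    return table.loc[np.asarray(cuts) + 1]


print(apply_cuts(table, [0, 2]))  # keeps entries 1 and 3
```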

class validphys.coredata.FKTableData(hadronic: bool, Q0: float, ndata: int, xgrid: ~numpy.ndarray, sigma: ~pandas.core.frame.DataFrame, metadata: dict = <factory>, protected: bool = False)[source]

Bases: object

Data contained in an FKTable

Parameters:
  • hadronic (bool) – Whether a hadronic (two PDFs) or a DIS (one PDF) convolution is needed.

  • Q0 (float) – The scale at which the PDFs should be evaluated (in GeV).

  • ndata (int) – The number of data points in the grid.

  • xgrid (array, shape (nx)) – The points in x at which the PDFs should be evaluated.

  • sigma (pd.DataFrame) –

    For hadronic data, the columns are the indexes in the NfxNf list of possible flavour combinations of two PDFs. The MultiIndex contains three keys: the data index, an index into xgrid for the first PDF and an index into xgrid for the second PDF, indicating the points in x where the PDFs should be evaluated.

    For DIS data, the columns are indexes in the Nf list of flavours. The MultiIndex contains two keys, the data index and an index into xgrid indicating the points in x where the PDF should be evaluated.

  • metadata (dict) – Other information contained in the FKTable.

  • protected (bool) – When a fktable is protected, cuts will not be applied. The most common use case is when a total cross section is used as a normalization table for a differential cross section; in legacy code (<= NNPDF4.0) both fktables would be cut using the differential index.

Q0: float
get_np_fktable()[source]

Returns the fktable as a dense numpy array that can be directly manipulated with numpy

The return shape is:

  • (ndata, nx, nbasis) for DIS

  • (ndata, nx, nx, nbasis) for hadronic

where nx is the length of the xgrid and nbasis the number of flavour combinations that contribute.
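For DIS, going from the sparse sigma dataframe to the dense (ndata, nx, nbasis) array is a scatter of each (data, x) row into place. The sketch below uses a hypothetical toy sigma whose index levels are named "data" and "x"; the final contraction with toy PDF values only illustrates how the dense array would be used, not the exact validphys convolution.

```python
import numpy as np
import pandas as pd

ndata, nx = 2, 3
# Toy DIS sigma: only non-zero (data, x) rows are stored; columns are flavour indexes
index = pd.MultiIndex.from_tuples([(0, 0), (0, 2), (1, 1)], names=["data", "x"])
sigma = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], index=index, columns=[1, 3])


def dense_dis_fktable(sigma, ndata, nx):
    """Scatter the sparse rows of ``sigma`` into an (ndata, nx, nbasis) array."""
    dense = np.zeros((ndata, nx, sigma.shape[1]))
    data_idx = sigma.index.get_level_values("data").to_numpy()
    x_idx = sigma.index.get_level_values("x").to_numpy()
    dense[data_idx, x_idx, :] = sigma.to_numpy()
    return dense


fk = dense_dis_fktable(sigma, ndata, nx)
print(fk.shape)  # (2, 3, 2)

# Contract with toy PDF values on the (x, flavour) grid to get one number per point
pdf_grid = np.ones((nx, sigma.shape[1]))
print(np.einsum("dxf,xf->d", fk, pdf_grid))  # [10. 11.]
```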

hadronic: bool
property luminosity_mapping

Return the flavour combinations that contribute to the fktable in the form of a single array

The return shape is:

  • (nbasis,) for DIS

  • (nbasis*2,) for hadronic

metadata: dict
ndata: int
protected: bool = False
sigma: DataFrame
with_cfactor(cfactor)[source]

Returns a copy of the FKTableData object with cfactors applied to the fktable

with_cuts(cuts)[source]

Return a copy of the FKTable with the cuts applied. The data index of the sigma operator (the outermost level) contains the data points that have been kept. The ndata property is updated to reflect the new number of data points. If cuts is None, return the object unmodified.

Parameters:

cuts (array_like or validphys.core.Cuts or None.) – The cuts to be applied.

Returns:

res – A copy of the FKTable with the cuts applied.

Return type:

FKTableData

Notes

The original number of points can be accessed with table.metadata['GridInfo'].ndata.

Examples

>>> from validphys.fkparser import load_fktable
>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
>>> table = load_fktable(ds.fkspecs[0])
>>> newtable = table.with_cuts([0,1])
>>> assert set(newtable.sigma.index.get_level_values(0)) == {0,1}
>>> assert newtable.ndata == 2
>>> assert newtable.metadata['GridInfo'].ndata == 3
xgrid: ndarray

validphys.correlations module

Utilities for computing correlations in batch.

@author: Zahari Kassabov

validphys.correlations.obs_obs_correlations(pdf, corrpair_results)[source]

Return the theoretical correlation matrix between a pair of observables.

validphys.correlations.obs_pdf_correlations(pdf, results, xplotting_grid)[source]

Return the correlations between each point in a dataset and the PDF values on a grid of (x,f) points in a format similar to xplotting_grid.

validphys.covmats module

Module for handling logic and manipulation of covariance and correlation matrices on different levels of abstraction

validphys.covmats.covmat_from_systematics(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, norm_threshold=None, _central_values=None)[source]

Take the statistical uncertainty and systematics table from a validphys.coredata.CommonData object and construct the covariance matrix accounting for correlations between systematics.

If the systematic has the name SKIP then it is ignored in the construction of the covariance matrix.

ADDitive (ADD) and MULTiplicative (MULT) systypes are handled differently. We convert uncertainties so that they are all in the same units as the data:

  • Additive (ADD) systematics are left unchanged.

  • Multiplicative (MULT) systematics need to be converted from a percentage by multiplying by the central value and dividing by 100.

Finally, the systematics are split into the five possible archetypes of systematic uncertainties: uncorrelated (UNCORR), correlated (CORR), theory uncorrelated (THEORYUNCORR), theory correlated (THEORYCORR) and special correlated (SPECIALCORR) systematics.

Uncorrelated contributions from statistical error, uncorrelated and theory uncorrelated are added in quadrature to the diagonal of the covmat.

The contribution to the covariance matrix arising due to correlated systematics is schematically A_correlated @ A_correlated.T, where A_correlated is an N_dat by N_sys matrix. The total contribution from correlated systematics is found by adding together the result of multiplying each correlated systematic matrix by its transpose (correlated, theory_correlated and special_correlated).

For more information on the generation of the covariance matrix see the paper outlining the procedure, specifically equation 2 and surrounding text.
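The construction can be sketched in a few lines of NumPy, assuming the uncertainties have already been converted to data units and SKIP columns dropped. The arrays below are toy placeholders, and covmat_sketch is a hypothetical helper that only mirrors the schematic description above.

```python
import numpy as np

# Toy inputs for 3 points, already converted to data units (SKIP columns dropped)
stat = np.array([1.0, 2.0, 1.5])
uncorr = np.array([[0.5], [0.4], [0.3]])  # UNCORR and THEORYUNCORR columns
corr = np.array([[1.0, 0.2], [0.8, 0.1], [0.6, 0.3]])  # CORR, THEORYCORR, SPECIALCORR


def covmat_sketch(stat, uncorr, corr, weight=1.0):
    """Uncorrelated terms in quadrature on the diagonal plus A_corr @ A_corr.T."""
    diagonal = stat**2 + np.sum(uncorr**2, axis=1)
    cov = np.diag(diagonal) + corr @ corr.T
    # Dividing by the dataset weight mirrors use_weights_in_covmat=True
    return cov / weight


cov = covmat_sketch(stat, uncorr, corr)
print(cov.shape)  # (3, 3)
```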

Parameters:
  • loaded_commondata_with_cuts (validphys.coredata.CommonData) – CommonData which stores information about systematic errors, their treatment and description.

  • dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • norm_threshold (number) – threshold used to regularize covariance matrix

  • _central_values (None, np.array) – 1-D array containing alternative central values to combine with the multiplicative errors to calculate their absolute contributions. By default this is None, and the experimental central values are used. However, this can be used to calculate, for example, the t0 covariance matrix by using the predictions from the central member of the t0 pdf.

Returns:

cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Return type:

np.array

Example

In order to use this function, simply call it from the API

>>> from validphys.api import API
>>> inp = dict(
...     dataset_input={'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10},
...     theoryid=162,
...     use_cuts="internal"
... )
>>> cov = API.covmat_from_systematics(**inp)
>>> cov.shape
(28, 28)
validphys.covmats.covmat_stability_characteristic(systematics_matrix_from_commondata)[source]

Return a number characterizing the stability of an experimental covariance matrix against uncertainties in the correlation. It is defined as the L2 norm (largest singular value) of the square root of the inverse correlation matrix. This is equivalent to the square root of the inverse of the smallest singular value of the correlation matrix:

\[Z = \left(\frac{1}{\lambda^0}\right)^{1/2}\]

where \(\lambda^0\) is the smallest eigenvalue of the correlation matrix.

This is the number used as threshold in calcutils.regularize_covmat(). Roughly, it indicates how precisely the worst-determined correlation needs to be known in order not to meaningfully affect the χ² computed with the covariance matrix: for example, a stability characteristic of 4 means that correlations need to be known with uncertainties smaller than 0.25.

Examples

>>> from validphys.api import API
>>> API.covmat_stability_characteristic(dataset_input={"dataset": "NMC"},
... theoryid=162, use_cuts="internal")
2.742658604186114
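The definition above can be sketched directly in NumPy. This is an illustration of the formula, not the validphys source: it starts from a covariance matrix rather than from systematics_matrix_from_commondata, and the function name stability_characteristic is hypothetical.

```python
import numpy as np


def stability_characteristic(covmat):
    """Z = (1 / lambda_0) ** 0.5, where lambda_0 is the smallest eigenvalue
    of the correlation matrix built from covmat."""
    sigma = np.sqrt(np.diag(covmat))
    corrmat = covmat / np.outer(sigma, sigma)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    lambda_0 = np.linalg.eigvalsh(corrmat)[0]
    return np.sqrt(1.0 / lambda_0)


# Correlation 0.5 between two points: eigenvalues 0.5 and 1.5, so Z = sqrt(2)
cov = np.array([[4.0, 1.0], [1.0, 1.0]])
z = stability_characteristic(cov)  # ≈ 1.414
```

An uncorrelated covmat gives Z = 1; stronger correlations push \(\lambda^0\) towards zero and Z upwards.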
validphys.covmats.dataset_inputs_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, data_input, use_weights_in_covmat=True, norm_threshold=None, _list_of_central_values=None, _only_additive=False)[source]

Given a list of validphys.coredata.CommonData objects, construct the full covariance matrix.

This is similar to covmat_from_systematics(), except that special correlated systematics are concatenated across all datasets before being multiplied by their transpose, which gives the off-block-diagonal contributions. The other systematics contribute to the block diagonal in the same way as in covmat_from_systematics().

Parameters:
  • dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – list of CommonData objects.

  • data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • norm_threshold (number) – threshold used to regularize covariance matrix

  • _list_of_central_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.

Returns:

cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Return type:

np.array

Example

This function can be called directly from the API:

>>> from validphys.api import API
>>> dsinps = [
...     {'dataset': 'NMC'},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD']},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10}
... ]
>>> inp = dict(dataset_inputs=dsinps, theoryid=162, use_cuts="internal")
>>> cov = API.dataset_inputs_covmat_from_systematics(**inp)
>>> cov.shape
(235, 235)

This properly accounts for all dataset settings and cuts.
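The idea of sharing special systematics across datasets can be sketched as follows. This is a minimal illustration, not the validphys implementation: it assumes each dataset's non-special contributions are already available as a block, and that special systematics come as pandas DataFrames whose column names identify the systematic. The function combined_covmat is hypothetical.

```python
import numpy as np
import pandas as pd


def combined_covmat(blocks, special_sys):
    """Hypothetical sketch of a multi-dataset covmat.

    blocks: list of per-dataset covmats (N_i x N_i) without the special
        systematics.
    special_sys: list of DataFrames (N_i x n_i) whose column names identify
        each special systematic.
    """
    # Place the per-dataset blocks on the diagonal
    sizes = [block.shape[0] for block in blocks]
    cov = np.zeros((sum(sizes), sum(sizes)))
    start = 0
    for block, size in zip(blocks, sizes):
        cov[start:start + size, start:start + size] = block
        start += size
    # Stack the special systematics: columns with the same name are shared,
    # entries missing for a dataset are filled with zero
    special = pd.concat(special_sys, axis=0, sort=False).fillna(0.0).to_numpy()
    # Shared columns generate the off-block-diagonal correlations
    return cov + special @ special.T


blocks = [np.array([[1.0]]), np.array([[1.0]])]
shared = [pd.DataFrame({"LUMI": [0.5]}), pd.DataFrame({"LUMI": [0.5]})]
cov = combined_covmat(blocks, shared)
```

Two datasets sharing a "LUMI" systematic become correlated (off-diagonal 0.25 here), while systematics with different names leave the off-diagonal blocks at zero.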

validphys.covmats.dataset_inputs_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it.

validphys.covmats.dataset_inputs_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated.

validphys.covmats.dataset_inputs_sqrt_covmat(dataset_inputs_covariance_matrix)[source]

Like sqrt_covmat, but for a group of datasets.

validphys.covmats.dataset_inputs_stability_table(dataset_inputs_stability, dataset_inputs)[source]

Return a table with covmat_stability_characteristic() for all dataset inputs.

validphys.covmats.dataset_inputs_t0_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]

Like t0_covmat_from_systematics(), except for all datasets in dataset_inputs.

Parameters:
  • dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – The CommonData for all datasets defined in dataset_inputs.

  • data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • dataset_inputs_t0_predictions (list[np.array]) – The t0 predictions for all datasets.

Returns:

t0_covmat – t0 covariance matrix for the list of datasets.

Return type:

np.array
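The weighting rule described for data_input (block (i, j) divided by sqrt(weight_i)*sqrt(weight_j)) can be sketched in NumPy. The helper apply_dataset_weights is hypothetical; it assumes the covmat rows are ordered dataset by dataset and that the number of points per dataset is known.

```python
import numpy as np


def apply_dataset_weights(covmat, sizes, weights):
    """Divide the (i, j) dataset block by sqrt(w_i) * sqrt(w_j).

    sizes: number of points (after cuts) of each dataset, in covmat order.
    weights: one weight per dataset.
    """
    # Expand the per-dataset weights to one entry per data point
    point_weights = np.repeat(np.asarray(weights, dtype=float), sizes)
    sqrt_w = np.sqrt(point_weights)
    return covmat / np.outer(sqrt_w, sqrt_w)


# First dataset (1 point) has weight 4, second (2 points) has weight 1
weighted = apply_dataset_weights(np.ones((3, 3)), sizes=[1, 2], weights=[4.0, 1.0])
```

With all weights equal to 1 the covmat is returned unchanged, matching the default described above.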

validphys.covmats.dataset_inputs_t0_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it.

validphys.covmats.dataset_inputs_t0_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated.

validphys.covmats.dataset_inputs_t0_total_covmat(dataset_inputs_t0_exp_covmat, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to the experimental covmat.

validphys.covmats.dataset_inputs_t0_total_covmat_separate(dataset_inputs_t0_exp_covmat_separate, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to the experimental covmat.

validphys.covmats.dataset_inputs_total_covmat(dataset_inputs_exp_covmat, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to the experimental covmat.

validphys.covmats.dataset_inputs_total_covmat_separate(dataset_inputs_exp_covmat_separate, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to the experimental covmat.

validphys.covmats.dataset_t0_predictions(dataset, t0set)[source]

Returns the t0 predictions for a dataset, which are the predictions calculated using the central member of the t0set PDF. Note that if the PDF has error type replicas and the dataset is a hadronic observable, then the predictions of the central member are subtly different from the central value of the replica predictions.

Parameters:
Returns:

t0_predictions – 1-D numpy array with predictions for each of the cut datapoints.

Return type:

np.array

validphys.covmats.datasets_covmat_differences_table(each_dataset, datasets_covmat_no_reg, datasets_covmat_reg, norm_threshold)[source]

For each dataset calculate and tabulate two max differences upon regularization given a value for norm_threshold:

  • max relative difference to the diagonal of the covariance matrix (%)

  • max absolute difference to the correlation matrix of each covmat
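The two quantities listed above can be sketched in NumPy as follows. This is an illustration under stated assumptions: the function covmat_differences is hypothetical, and validphys's own table may normalise or round the numbers differently.

```python
import numpy as np


def covmat_differences(cov_no_reg, cov_reg):
    """Return (max relative diagonal difference in %, max absolute
    correlation difference) between an unregularized and a regularized covmat."""
    diag_no_reg = np.diag(cov_no_reg)
    diag_reg = np.diag(cov_reg)
    max_rel_diag = 100 * np.max(np.abs((diag_reg - diag_no_reg) / diag_no_reg))

    def corr(cov):
        # Normalise the covmat to unit diagonal
        sigma = np.sqrt(np.diag(cov))
        return cov / np.outer(sigma, sigma)

    max_abs_corr = np.max(np.abs(corr(cov_reg) - corr(cov_no_reg)))
    return max_rel_diag, max_abs_corr


before = np.array([[1.0, 0.9], [0.9, 1.0]])
after = np.array([[1.1, 0.9], [0.9, 1.0]])
rel_diag, corr_diff = covmat_differences(before, after)  # 10% on the diagonal
```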

validphys.covmats.dataspecs_datasets_covmat_differences_table(dataspecs_speclabel, dataspecs_covmat_diff_tables)[source]

For each dataspec calculate and tabulate the two covmat differences described in datasets_covmat_differences_table (max relative difference in variance and max absolute correlation difference)

validphys.covmats.fit_name_with_covmat_label(fit, fitthcovmat)[source]

If a theory covariance matrix is used to calculate statistical estimators for the fit, append (exp + th) to the fit name for use in legends and column headers, so that the user can see which covariance matrix was used to produce the plot or table they are looking at.

validphys.covmats.generate_exp_covmat(datasets_input, data, use_weights, norm_threshold, _list_of_c_values, only_add)[source]

Function to generate the experimental covmat, optionally using the t0 prescription. It is also possible to compute it using only the additive errors.

Parameters:
  • datasets_input (list[validphys.coredata.CommonData]) – list of CommonData objects.

  • data (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights (bool) – Whether to weight the covmat.

  • norm_threshold (number) – threshold used to regularize covariance matrix

  • _list_of_c_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.

  • only_add (bool) – specifies whether to use only the additive errors to compute the covmat

Returns:

The experimental covariance matrix.

Return type:

np.array

validphys.covmats.groups_corrmat(groups_covmat)[source]

Generates the grouped experimental correlation matrix with groups_covmat as input

validphys.covmats.groups_covmat(groups_covmat_no_table)[source]

Duplicate of groups_covmat_no_table but with a table decorator.

validphys.covmats.groups_covmat_no_table(groups_data, groups_index, groups_covmat_collection)[source]

Export the covariance matrix for the groups. It exports the full (symmetric) matrix, with the first 3 rows and columns being:

  • group name

  • dataset name

  • index of the point within the dataset.

validphys.covmats.groups_invcovmat(groups_data, groups_index, groups_covmat_collection)[source]

Compute and export the inverse covariance matrix. Note that this inverts the matrices with the LU method, which is suboptimal.

validphys.covmats.groups_normcovmat(groups_covmat, groups_data_values)[source]

Calculates the grouped experimental covariance matrix normalised to data.

validphys.covmats.groups_sqrtcovmat(groups_data, groups_index, groups_sqrt_covmat)[source]

Like groups_covmat, but dump the lower triangular part of the Cholesky decomposition as used in the fit. The entries above the diagonal are set to zero.

validphys.covmats.pdferr_plus_covmat(dataset, pdf, covmat_t0_considered)[source]

For a given dataset, returns the sum of the covariance matrix given by covmat_t0_considered and the PDF error:

  • If the PDF error_type is ‘replicas’, a covariance matrix is estimated from the replica theory predictions.

Parameters:
  • dataset (DataSetSpec) – object parsed from the dataset_input runcard key

  • pdf (PDF) – Monte Carlo PDF used to estimate the PDF error

  • covmat_t0_considered (np.array) – experimental covariance matrix with the t0 considered

Returns:

covariance_matrix – sum of the experimental and pdf error as a numpy array

Return type:

np.array

Examples

Setting use_pdferr=True makes covariance_matrix use this action, so the two results below agree:

>>> from validphys.api import API
>>> import numpy as np
>>> inp = {
...     'dataset_input': {'dataset' : 'ATLASTTBARTOT'},
...     'theoryid': 53,
...     'pdf': 'NNPDF31_nlo_as_0118',
...     'use_cuts': 'nocuts'
... }
>>> a = API.covariance_matrix(**inp, use_pdferr=True)
>>> b = API.pdferr_plus_covmat(**inp)
>>> np.allclose(a, b)
True
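For replica PDFs, the PDF contribution can be sketched as the sample covariance of the replica predictions added to the experimental covmat. This is a minimal illustration, not the validphys code: the function pdferr_plus_covmat_sketch is hypothetical, and it assumes predictions are arranged as one row per replica and uses NumPy's default N-1 normalisation.

```python
import numpy as np


def pdferr_plus_covmat_sketch(replica_predictions, covmat_t0):
    """replica_predictions: (N_rep, N_dat) theory predictions, one row per replica.
    covmat_t0: (N_dat, N_dat) experimental covmat."""
    # rowvar=False: each column is a variable (data point), each row a replica
    pdf_cov = np.cov(replica_predictions, rowvar=False)
    return covmat_t0 + pdf_cov


preds = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 replicas, 2 data points
total = pdferr_plus_covmat_sketch(preds, np.eye(2))
```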
validphys.covmats.pdferr_plus_dataset_inputs_covmat(data, pdf, dataset_inputs_covmat_t0_considered, fitthcovmat)[source]

Like pdferr_plus_covmat, but for all the datasets in data (an experiment).

validphys.covmats.reorder_thcovmat_as_expcovmat(fitthcovmat, data)[source]

Reorder the theory covmat so that it matches the order of the experimental covmat, i.e. the order in which the datasets appear in the runcard.
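A minimal pandas sketch of such a reordering, assuming both matrices are labelled by a (dataset, point) MultiIndex. The function reorder_like and the labels are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd


def reorder_like(thcovmat, exp_index):
    """Reorder rows and columns of a labelled theory covmat to follow
    exp_index, the ordering used by the experimental covmat."""
    reordered = thcovmat.reindex(index=exp_index, columns=exp_index)
    # reindex silently inserts NaN for unknown labels, so check for them
    if reordered.isna().to_numpy().any():
        raise KeyError("theory covmat lacks some points of the experimental covmat")
    return reordered


th_index = pd.MultiIndex.from_tuples([("B", 0), ("A", 0), ("A", 1)])
th = pd.DataFrame(np.arange(9.0).reshape(3, 3), index=th_index, columns=th_index)
exp_index = pd.MultiIndex.from_tuples([("A", 0), ("A", 1), ("B", 0)])
reordered = reorder_like(th, exp_index)
```

Once both matrices share the same ordering they can be summed element by element, as the *_total_covmat functions do.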

validphys.covmats.sqrt_covmat(covariance_matrix)[source]

Function that computes the square root of the covariance matrix.

Parameters:

covariance_matrix (np.array) – A positive definite covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Returns:

sqrt_mat – The square root of the input covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts), and which is the lower triangular decomposition. The following should be True: np.allclose(sqrt_mat @ sqrt_mat.T, covariance_matrix).

Return type:

np.array

Notes

The square root is found using the Cholesky decomposition. However, rather than decomposing the covariance matrix directly, the (upper triangular) Cholesky decomposition of the corresponding correlation matrix is computed, then rescaled and transposed as sqrt_matrix = (decomp * sqrt_diags).T, where decomp is the Cholesky decomposition of the correlation matrix and sqrt_diags is the square root of the diagonal entries of the covariance matrix. This method is useful when the covariance matrix is near-singular. See here for more discussion on this.

The lower triangular form allows an efficient calculation of the \(\chi^2\).

Example

>>> import numpy as np
>>> from validphys.api import API
>>> API.sqrt_covmat(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
array([[0.0326543 , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
    [0.00314523, 0.01467259, 0.        , ..., 0.        , 0.        ,
        0.        ],
    [0.0037817 , 0.00544256, 0.02874822, ..., 0.        , 0.        ,
        0.        ],
    ...,
    [0.00043404, 0.00031169, 0.00020489, ..., 0.00441073, 0.        ,
        0.        ],
    [0.00048717, 0.00033792, 0.00022971, ..., 0.00126704, 0.00435696,
        0.        ],
    [0.00067353, 0.00050372, 0.0003203 , ..., 0.00107255, 0.00065041,
        0.01002952]])
>>> sqrt_cov = API.sqrt_covmat(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
>>> cov = API.covariance_matrix(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
>>> np.allclose(np.linalg.cholesky(cov), sqrt_cov)
True
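The recipe in the Notes can be sketched in NumPy as follows; this illustrates the described method and is not the validphys source. The names sqrt_covmat_sketch and chi2 are hypothetical. For brevity the χ² uses np.linalg.solve, whereas a dedicated triangular solver such as scipy.linalg.solve_triangular would exploit the structure more efficiently.

```python
import numpy as np


def sqrt_covmat_sketch(covmat):
    """Lower triangular square root of covmat, found via the Cholesky
    decomposition of the correlation matrix."""
    sqrt_diags = np.sqrt(np.diag(covmat))
    corrmat = covmat / np.outer(sqrt_diags, sqrt_diags)
    # np.linalg.cholesky returns the lower factor; its transpose is the
    # upper triangular factor U with U.T @ U == corrmat
    decomp = np.linalg.cholesky(corrmat).T
    # decomp * sqrt_diags scales column j by sqrt_diags[j]; transposing
    # gives a lower triangular matrix
    return (decomp * sqrt_diags).T


def chi2(residuals, sqrt_mat):
    """chi2 = r^T C^-1 r = |L^-1 r|^2, with C = L L^T."""
    y = np.linalg.solve(sqrt_mat, residuals)
    return y @ y


cov = np.array([[4.0, 1.2], [1.2, 1.0]])
sqrt_mat = sqrt_covmat_sketch(cov)
residuals = np.array([1.0, -0.5])
```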
validphys.covmats.systematics_matrix_from_commondata(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, _central_values=None)[source]

Returns a systematics matrix, \(A\), for the corresponding dataset. The systematics matrix is a square root of the covmat:

\[C = A A^T\]

and is obtained by concatenating a block diagonal of the uncorrelated uncertainties with the correlated systematics.

validphys.covmats.t0_covmat_from_systematics(loaded_commondata_with_cuts, *, dataset_input, use_weights_in_covmat=True, norm_threshold=None, dataset_t0_predictions)[source]

Like covmat_from_systematics(), except that it uses the t0 predictions to calculate the absolute contributions to the covmat from multiplicative uncertainties. For more info on the t0 predictions see validphys.covmats.dataset_t0_predictions().

Parameters:
  • loaded_commondata_with_cuts (validphys.coredata.CommonData) – commondata object for which to generate the covmat.

  • dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • dataset_t0_predictions (np.array) – 1-D array with t0 predictions.

Returns:

t0_covmat – t0 covariance matrix

Return type:

np.array

validphys.covmats_utils module

covmats_utils.py

Utility functions for constructing covariance matrices from systematics. They are used by validphys.covmats, which contains the relevant actions/providers.

validphys.covmats_utils.construct_covmat(stat_errors: array, sys_errors: DataFrame)[source]

Basic function to construct a covariance matrix (covmat), given the statistical error and a dataframe of systematics.

Errors with name UNCORR or THEORYUNCORR are added in quadrature with the statistical error to the diagonal of the covmat.

Other systematics are treated as correlated; their covmat contribution is found by multiplying them by their transpose.

Parameters:
  • stat_errors (np.array) – a 1-D array of statistical uncertainties

  • sys_errors (pd.DataFrame) – a dataframe with shape (N_data, N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.

Notes

This function contains no logic to ignore particular contributions to the covmat. To exclude a particular systematic or set of systematics (e.g. all uncertainties with MULT errors), filter them out of sys_errors before passing it to this function.
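The rule described above can be sketched with NumPy and pandas. This is an illustration, not the validphys source; the function construct_covmat_sketch is hypothetical and treats every column not named UNCORR or THEORYUNCORR as fully correlated.

```python
import numpy as np
import pandas as pd


def construct_covmat_sketch(stat_errors, sys_errors):
    """UNCORR/THEORYUNCORR columns go on the diagonal, in quadrature with
    the statistical error; all other columns are fully correlated."""
    is_uncorr = sys_errors.columns.isin(["UNCORR", "THEORYUNCORR"])
    uncorr = sys_errors.loc[:, is_uncorr].to_numpy()
    corr = sys_errors.loc[:, ~is_uncorr].to_numpy()
    diagonal = stat_errors**2 + (uncorr**2).sum(axis=1)
    return np.diag(diagonal) + corr @ corr.T


stat = np.array([1.0, 2.0])
sys = pd.DataFrame([[0.5, 1.0], [0.5, 2.0]], columns=["UNCORR", "CORR"])
cov = construct_covmat_sketch(stat, sys)
```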

validphys.covmats_utils.systematics_matrix(stat_errors: array, sys_errors: DataFrame)[source]

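Basic function to create a systematics matrix, \(A\), such that:

\[C = A A^T\]

where \(C\) is the covariance matrix. This is achieved by adding the statistical and uncorrelated uncertainties in quadrature, taking the square root to form a diagonal block, and concatenating the correlated systematics alongside it, schematically:

\[A = \left[\, \mathrm{diag}\left(\sqrt{\sigma_{\mathrm{stat}}^2 + \textstyle\sum_k \sigma_{\mathrm{uncorr},k}^2}\right) \;\big|\; \Sigma_{\mathrm{corr}} \,\right]\]

where \(\Sigma_{\mathrm{corr}}\) is the \(N_{\mathrm{dat}} \times N_{\mathrm{corr}}\) matrix of correlated systematics.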

Parameters:
  • stat_errors (np.array) – a 1-D array of statistical uncertainties

  • sys_errors (pd.DataFrame) – a dataframe with shape (N_data, N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.

Notes

This function contains no logic to ignore particular contributions to the covmat. To exclude a particular systematic or set of systematics (e.g. all uncertainties with MULT errors), filter them out of sys_errors before passing it to this function.
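A minimal sketch of building \(A\) as described, assuming (as for construct_covmat) that columns named UNCORR or THEORYUNCORR are the uncorrelated ones. The function systematics_matrix_sketch is hypothetical; the final check confirms that \(A A^T\) reproduces the covmat built directly.

```python
import numpy as np
import pandas as pd


def systematics_matrix_sketch(stat_errors, sys_errors):
    """Return A = [diag(sqrt(stat^2 + sum uncorr^2)) | correlated columns]."""
    is_uncorr = sys_errors.columns.isin(["UNCORR", "THEORYUNCORR"])
    uncorr = sys_errors.loc[:, is_uncorr].to_numpy()
    corr = sys_errors.loc[:, ~is_uncorr].to_numpy()
    diag_block = np.diag(np.sqrt(stat_errors**2 + (uncorr**2).sum(axis=1)))
    # Concatenate the diagonal block and the correlated systematics column-wise
    return np.concatenate([diag_block, corr], axis=1)


stat = np.array([1.0, 2.0])
sys = pd.DataFrame([[0.5, 1.0], [0.5, 2.0]], columns=["UNCORR", "CORR"])
a_mat = systematics_matrix_sketch(stat, sys)  # shape (N_dat, N_dat + N_corr)
```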

validphys.dataplots module

Plots of relations between data, PDFs and fits.

validphys.dataplots.check_normalize_to(ns, **kwargs)[source]

Transform normalize_to into an index.

validphys.dataplots.kde_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]

KDE plot for experiments chi2.

validphys.dataplots.plot_chi2_eigs(pdf, dataset, chi2_per_eig)[source]
validphys.dataplots.plot_chi2dist(dataset, abs_chi2_data, chi2_stats, pdf)[source]

Plot the distribution of chi²s of the members of the pdfset.

validphys.dataplots.plot_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]

Plot the distribution of chi²s of the members of the pdfset.

validphys.dataplots.plot_chi2dist_sv(dataset, abs_chi2_data_thcovmat, pdf)[source]

Same as plot_chi2dist, but also including the theory covmat in the calculation.

validphys.dataplots.plot_dataset_inputs_phi_dist(data, dataset_inputs_bootstrap_phi_data)[source]

Generates a bootstrap distribution of phi for dataset_inputs and then plots a histogram of the individual bootstrap samples. By default, the number of bootstrap samples is set to 500; this can be changed by specifying bootstrap_samples in the runcard.

validphys.dataplots.plot_datasets_chi2(groups_data, groups_chi2)[source]

Plot the chi² of all datasets with bars.

validphys.dataplots.plot_datasets_chi2_spider(groups_data, groups_chi2)[source]

Plot the chi² of all datasets as a spider plot.

validphys.dataplots.plot_datasets_pdfs_chi2(data, each_dataset_chi2_pdfs, pdfs)[source]

Plot the chi² of all datasets with bars, and for different pdfs.

validphys.dataplots.plot_datasets_pdfs_chi2_sv(data, each_dataset_chi2_pdfs_sv, pdfs)[source]

Same as plot_datasets_pdfs_chi2, with the chi²s computed including scale variations.

validphys.dataplots.plot_dataspecs_datasets_chi2(dataspecs_datasets_chi2_table)[source]

Same as plot_fits_datasets_chi2 but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_datasets_chi2_spider(dataspecs_datasets_chi2_table)[source]

Same as plot_fits_datasets_chi2_spider but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_groups_chi2(dataspecs_groups_chi2_table, processed_metadata_group)[source]

Same as plot_fits_groups_data_chi2 but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_groups_chi2_spider(dataspecs_groups_chi2_table)[source]
validphys.dataplots.plot_dataspecs_positivity(dataspecs_speclabel, dataspecs_positivity_predictions, dataspecs_posdataset, pos_use_kin=False)[source]

Like plot_positivity(), except that it plots positivity for each element of dataspecs, allowing positivity predictions to be generated with different theoryids as well as PDFs.

validphys.dataplots.plot_fancy(one_or_more_results, commondata, cuts, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]

Read the PLOTTING configuration for the dataset and generate the corresponding data-theory plot.

The input results are assumed to be such that the first one is the data, and the subsequent ones are the predictions for the PDFs. See one_or_more_results. The labelling of the predictions can be influenced by setting the label attribute of theories and PDFs.

normalize_to: should be either ‘data’, a pdf id or an index of the result (0 for the data, and i for the ith pdf). None means plotting absolute values.

See docs/plotting_format.md for details on the format of the PLOTTING files.
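The accepted values of normalize_to can be summarised with a small resolver. This is a hypothetical sketch of the mapping stated above (0 for the data, i for the ith PDF, None for absolute values); it is not check_normalize_to, whose implementation may differ.

```python
def resolve_normalize_to(normalize_to, pdf_names):
    """Hypothetical helper mapping normalize_to to a result index:
    0 is the data, i (>= 1) is the ith PDF, None means absolute values."""
    if normalize_to is None:
        return None
    if normalize_to == "data":
        return 0
    if isinstance(normalize_to, int):
        if not 0 <= normalize_to <= len(pdf_names):
            raise ValueError(f"normalize_to index out of range: {normalize_to}")
        return normalize_to
    if normalize_to in pdf_names:
        # PDFs come after the data, so the first PDF has index 1
        return pdf_names.index(normalize_to) + 1
    raise ValueError(f"Unrecognized normalize_to: {normalize_to!r}")


pdfs = ["NNPDF31_nlo_as_0118", "NNPDF40_nnlo_as_01180"]
idx = resolve_normalize_to("NNPDF40_nnlo_as_01180", pdfs)  # second PDF -> 2
```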

validphys.dataplots.plot_fancy_dataspecs(dataspecs_results, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]

General interface for data-theory comparison plots.

The user should define an arbitrary list of mappings called “dataspecs”. In each of these, dataset must resolve to a dataset with the same name (but could be e.g. different theories). The production rule matched_datasets_from_dataspecs may be used for this purpose.

The result will be a plot combining all the predictions from the dataspecs mapping (which could vary in theory, pdf, cuts, etc).

The user can define a “speclabel” key in each dataspec (or only in some). By default, the PDF label will be used in the legend (like in plot_fancy).

normalize_to must be either:

  • The string ‘data’ or the integer 0 to plot the ratio to data,

  • or the 1-based index of the dataspec to normalize to the corresponding prediction,

  • or None (default) to plot absolute values.

A current limitation is that the data cuts and errors will be taken from the first specification.

validphys.dataplots.plot_fancy_sv_dataspecs(dataspecs_results_with_scale_variations, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None)[source]

Exactly the same as plot_fancy_dataspecs but the theoretical results passed down are modified so that the 1-sigma error bands correspond to a combination of the PDF error and the scale variations collected over theoryids

See: validphys.results.results_with_scale_variations()

validphys.dataplots.plot_fits_chi2_spider(fits, fits_groups_chi2, fits_groups_data, processed_metadata_group)[source]

Plots the chi²s of all groups of datasets on a spider/radar diagram.

validphys.dataplots.plot_fits_datasets_chi2(fits_datasets_chi2_table)[source]

Generate a plot equivalent to plot_datasets_chi2 using all the fitted datasets as input.

validphys.dataplots.plot_fits_datasets_chi2_spider(fits_datasets_chi2_table)[source]

Generate a plot equivalent to plot_datasets_chi2_spider using all the fitted datasets as input.

validphys.dataplots.plot_fits_datasets_chi2_spider_bygroup(fits_datasets_chi2_table)[source]

Same as plot_fits_datasets_chi2_spider but one plot for each group.

validphys.dataplots.plot_fits_groups_data_chi2(fits_groups_chi2_table, processed_metadata_group)[source]

Generate a plot equivalent to plot_groups_data_chi2 using all the fitted groups of data as input.

validphys.dataplots.plot_fits_groups_data_phi(fits_groups_phi_table, processed_metadata_group)[source]

Plots a set of bars for each fit, each bar represents the value of phi for the corresponding group of datasets, which is defined according to the keys in the PLOTTING info file

validphys.dataplots.plot_fits_phi_spider(fits, fits_groups_data, fits_groups_data_phi, processed_metadata_group)[source]

Like plot_fits_chi2_spider but for phi.

validphys.dataplots.plot_groups_data_chi2(groups_data, groups_chi2, processed_metadata_group)[source]

Plot the chi² of all groups of datasets with bars.

validphys.dataplots.plot_groups_data_chi2_spider(groups_data, groups_chi2, processed_metadata_group, pdf)[source]

Plot the chi² of all groups of datasets as a spider plot.

validphys.dataplots.plot_groups_data_phi_spider(groups_data, groups_data_phi, processed_metadata_group, pdf)[source]

Plot the phi of all groups of datasets as a spider plot.

validphys.dataplots.plot_obscorrs(corrpair_datasets, obs_obs_correlations, pdf)[source]

NOTE: EXPERIMENTAL. Plot the correlation matrix between a pair of datasets.

validphys.dataplots.plot_phi(groups_data, groups_data_phi, processed_metadata_group)[source]

Plots phi for each group of data as a bar, for a single PDF input.

See phi_data for information on how phi is calculated

validphys.dataplots.plot_phi_scatter_dataspecs(dataspecs_groups, dataspecs_speclabel, dataspecs_groups_bootstrap_phi)[source]

For each of the dataspecs, a bootstrap distribution of phi is generated for all specified groups of datasets. The distribution is then represented as a scatter point which is the median of the bootstrap distribution and an errorbar which spans the 68% confidence interval. By default the number of bootstrap samples is set to a sensible value, however it can be controlled by specifying bootstrap_samples in the runcard.
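The bootstrap summary described above (median plus central 68% interval) can be sketched generically. This is an illustration under assumptions: the definition of phi itself (see phi_data) is not reproduced, statistic is a placeholder for whatever quantity is computed from the resampled replicas, and bootstrap_median_and_ci is a hypothetical name.

```python
import numpy as np


def bootstrap_median_and_ci(per_replica_values, statistic, n_samples=500, seed=0):
    """Resample replicas with replacement, evaluate statistic on each
    resample, and return the median and the central 68% interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(per_replica_values)
    n = len(values)
    samples = np.array([
        statistic(values[rng.integers(0, n, size=n)]) for _ in range(n_samples)
    ])
    low, median, high = np.percentile(samples, [16, 50, 84])
    return median, (low, high)


median, (low, high) = bootstrap_median_and_ci(np.arange(20.0), np.mean)
```

The scatter point would then be drawn at median, with an errorbar from low to high.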

validphys.dataplots.plot_positivity(pdfs, positivity_predictions_for_pdfs, posdataset, pos_use_kin=False)[source]

Plot an errorbar spanning the central 68% CI of a positivity observable as well as a point indicating the central value (according to the pdf.stats_class.central_value()).

Errorbars and points are plotted on a symlog scale as a function of the data point index (if pos_use_kin==False) or the first kinematic variable (if pos_use_kin==True).

validphys.dataplots.plot_replica_sum_rules(pdf, sum_rules, Q)[source]

Plot the value of each sum rule as a function of the replica index

validphys.dataplots.plot_smpdf(pdf, dataset, obs_pdf_correlations, mark_threshold: float = 0.9)[source]

Plot the correlations between the change in the observable and the change in the PDF in (x,fl) space.

mark_threshold is the proportion of the maximum absolute correlation that will be used to mark the corresponding area in x in the background of the plot. The maximum absolute values are used for the comparison.

Examples

>>> from validphys.api import API
>>> data_input = {
...     "dataset_input" : {"dataset": "HERACOMBNCEP920"},
...     "theoryid": 200,
...     "use_cuts": "internal",
...     "pdf": "NNPDF40_nnlo_as_01180",
...     "Q": 1.6,
...     "mark_threshold": 0.2
... }
>>> smpdf_gen = API.plot_smpdf(**data_input)
>>> fig = next(smpdf_gen)
>>> fig.show()
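The role of mark_threshold can be sketched for a single flavour on an x grid. This is a minimal illustration, not the plotting code: marked_x_region is a hypothetical helper, and the correlation values are made up for the example.

```python
import numpy as np


def marked_x_region(xgrid, correlations, mark_threshold=0.9):
    """Return the x values where |correlation| reaches at least
    mark_threshold times the maximum absolute correlation."""
    abs_corr = np.abs(correlations)
    mask = abs_corr >= mark_threshold * abs_corr.max()
    return xgrid[mask]


xgrid = np.array([1e-4, 1e-3, 1e-2, 1e-1])
correlations = np.array([0.1, -0.8, 0.95, 0.3])  # illustrative values only
marked = marked_x_region(xgrid, correlations, mark_threshold=0.8)
```

Because absolute values are compared, the strongly anticorrelated point at x = 1e-3 is marked as well.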
validphys.dataplots.plot_training_length(replica_data, fit)[source]

Generate a histogram of the distribution of training lengths in a given fit. Each bin is normalised by the total number of replicas.
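Normalising each bin by the number of replicas amounts to giving every replica a weight of 1/N_rep. A minimal NumPy sketch, where normalized_length_histogram is a hypothetical name:

```python
import numpy as np


def normalized_length_histogram(training_lengths, bins=10):
    """Histogram of training lengths in which each replica contributes
    1 / N_replicas, so the bin heights sum to 1."""
    lengths = np.asarray(training_lengths)
    weights = np.full(len(lengths), 1.0 / len(lengths))
    return np.histogram(lengths, bins=bins, weights=weights)


counts, edges = normalized_length_histogram([100, 200, 200, 300], bins=3)
```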

validphys.dataplots.plot_training_validation(fit, replica_data, replica_filters=None)[source]

Scatter plot with the training and validation chi² for each replica in the fit. The mean is also displayed as well as a line y=x to easily identify whether training or validation chi² is larger.

validphys.dataplots.plot_trainvaliddist(fit, replica_data)[source]

KDEs for the training and validation distributions for each replica in the fit.

validphys.dataplots.plot_xq2(dataset_inputs_by_groups_xq2map, use_cuts, data_input, display_cuts: bool = True, marker_by: str = 'process type', highlight_label: str = 'highlight', highlight_datasets: (<class 'collections.abc.Sequence'>, <class 'NoneType'>) = None, aspect: str = 'landscape')[source]

Plot the (x, Q²) coverage of the data based on some LO approximations. These are governed by the relevant kintransform.

The representation of the filtered data depends on the display_cuts and use_cuts options:

  • If cuts are disabled (use_cuts is CutsPolicy.NOCUTS), all the data will be plotted (and setting display_cuts to True is an error).

  • If cuts are enabled (use_cuts is either CutsPolicy.FROMFIT or CutsPolicy.INTERNAL) and display_cuts is False, the masked points will be ignored.

  • If cuts are enabled and display_cuts is True, the filtered points will be displayed and marked.

The points are grouped according to the marker_by option. The possible values are: “process type”, “experiment”, “group” or “dataset”.

Some datasets can be made to appear highlighted in the figure: Define a key called highlight_datasets containing the names of the datasets to be highlighted and a key highlight_label with a string containing the label of the highlight, which will appear in the legend.

Example

Obtain a plot with some reasonable defaults:

from validphys.api import API
inp = {'dataset_inputs': [{'dataset': 'NMCPD_dw'},
   {'dataset': 'NMC'},
   {'dataset': 'SLACP_dwsh'},
   {'dataset': 'SLACD_dw'},
   {'dataset': 'BCDMSP_dwsh'},
   {'dataset': 'BCDMSD_dw'},
   {'dataset': 'CHORUSNUPb_dw'},
   {'dataset': 'CHORUSNBPb_dw'},
   {'dataset': 'NTVNUDMNFe_dw', 'cfac': ['MAS']},
   {'dataset': 'NTVNBDMNFe_dw', 'cfac': ['MAS']},
   {'dataset': 'HERACOMBNCEM'},
   {'dataset': 'HERACOMBNCEP460'},
   {'dataset': 'HERACOMBNCEP575'},
   {'dataset': 'HERACOMBNCEP820'},
   {'dataset': 'HERACOMBNCEP920'},
   {'dataset': 'HERACOMBCCEM'},
   {'dataset': 'HERACOMBCCEP'},
   {'dataset': 'HERACOMB_SIGMARED_C'},
   {'dataset': 'HERACOMB_SIGMARED_B'},
   {'dataset': 'DYE886R_dw'},
   {'dataset': 'DYE886P', 'cfac': ['QCD']},
   {'dataset': 'DYE605_dw', 'cfac': ['QCD']},
   {'dataset': 'CDFZRAP_NEW', 'cfac': ['QCD']},
   {'dataset': 'D0ZRAP', 'cfac': ['QCD']},
   {'dataset': 'D0WMASY', 'cfac': ['QCD']},
   {'dataset': 'ATLASWZRAP36PB', 'cfac': ['QCD']},
   {'dataset': 'ATLASZHIGHMASS49FB', 'cfac': ['QCD']},
   {'dataset': 'ATLASLOMASSDY11EXT', 'cfac': ['QCD']},
   {'dataset': 'ATLASWZRAP11CC', 'cfac': ['QCD']},
   {'dataset': 'ATLASWZRAP11CF', 'cfac': ['QCD']},
   {'dataset': 'ATLASDY2D8TEV', 'cfac': ['QCDEWK']},
   {'dataset': 'ATLAS_WZ_TOT_13TEV', 'cfac': ['NRM', 'QCD']},
   {'dataset': 'ATLAS_WP_JET_8TEV_PT', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_WM_JET_8TEV_PT', 'cfac': ['QCD']},
   {'dataset': 'ATLASZPT8TEVMDIST', 'cfac': ['QCD'], 'sys': 10},
   {'dataset': 'ATLASZPT8TEVYDIST', 'cfac': ['QCD'], 'sys': 10},
   {'dataset': 'ATLASTTBARTOT', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_TTB_DIFF_8TEV_LJ_TRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_TTB_DIFF_8TEV_LJ_TTRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_TOPDIFF_DILEPT_8TEV_TTRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_1JET_8TEV_R06_DEC', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_2JET_7TEV_R06', 'cfac': ['QCD']},
   {'dataset': 'ATLASPHT15', 'cfac': ['QCD', 'EWK']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_R_7TEV', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_R_13TEV', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_7TEV_T_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_7TEV_TBAR_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_8TEV_T_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_8TEV_TBAR_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'CMSWEASY840PB', 'cfac': ['QCD']},
   {'dataset': 'CMSWMASY47FB', 'cfac': ['QCD']},
   {'dataset': 'CMSDY2D11', 'cfac': ['QCD']},
   {'dataset': 'CMSWMU8TEV', 'cfac': ['QCD']},
   {'dataset': 'CMSZDIFF12', 'cfac': ['QCD', 'NRM'], 'sys': 10},
   {'dataset': 'CMS_2JET_7TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_2JET_3D_8TEV', 'cfac': ['QCD']},
   {'dataset': 'CMSTTBARTOT', 'cfac': ['QCD']},
   {'dataset': 'CMSTOPDIFF8TEVTTRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'CMSTTBARTOT5TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_TTBAR_2D_DIFF_MTT_TRAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'CMS_TTB_DIFF_13TEV_2016_2L_TRAP', 'cfac': ['QCD']},
   {'dataset': 'CMS_TTB_DIFF_13TEV_2016_LJ_TRAP', 'cfac': ['QCD']},
   {'dataset': 'CMS_SINGLETOP_TCH_TOT_7TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_SINGLETOP_TCH_R_8TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_SINGLETOP_TCH_R_13TEV', 'cfac': ['QCD']},
   {'dataset': 'LHCBZ940PB', 'cfac': ['QCD']},
   {'dataset': 'LHCBZEE2FB', 'cfac': ['QCD']},
   {'dataset': 'LHCBWZMU7TEV', 'cfac': ['NRM', 'QCD']},
   {'dataset': 'LHCBWZMU8TEV', 'cfac': ['NRM', 'QCD']},
   {'dataset': 'LHCB_Z_13TEV_DIMUON', 'cfac': ['QCD']},
   {'dataset': 'LHCB_Z_13TEV_DIELECTRON', 'cfac': ['QCD']}],
  'use_cuts': 'internal',
  'display_cuts': False,
  'theoryid': 162,
  'highlight_label': 'Old',
  'highlight_datasets': ['NMC', 'CHORUSNUPb_dw', 'CHORUSNBPb_dw']}
API.plot_xq2(**inp)

validphys.deltachi2 module

deltachi2.py

Plots and data processing that can be used in a delta chi2 analysis

class validphys.deltachi2.PDFEpsilonPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

Subclassing PDFPlotter in order to plot epsilon (a measure of Gaussianity) for multiple PDFs, yielding a separate figure for each flavour.

draw(pdf, grid, flstate)[source]

Obtains the gridvalues of epsilon (measure of Gaussianity)

get_ylabel(parton_name)[source]
legend(flstate)[source]
setup_flavour(flstate)[source]
validphys.deltachi2.check_pdf_is_symmhessian(pdf, **kwargs)[source]

Check that the pdf has error type symmhessian.

validphys.deltachi2.check_pdfs_are_montecarlo(pdfs, **kwargs)[source]

Checks that the action is applied only to a pdf consisting of MC replicas.

validphys.deltachi2.delta_chi2_hessian(pdf, total_chi2_data)[source]

Return delta_chi2 (computed as in plot_delta_chi2_hessian) relative to each eigenvector of the Hessian set.

validphys.deltachi2.plot_delta_chi2_hessian_distribution(delta_chi2_hessian, pdf, total_chi2_data)[source]

Plot of the difference between the chi2 of each eigenvector of a symmHessian set and the chi2 of the central value, for all experiments in a fit: as a function of the eigenvector in a first plot, and as a distribution in a second plot.

validphys.deltachi2.plot_delta_chi2_hessian_eigenv(delta_chi2_hessian, pdf)[source]

Plot of the difference between the chi2 of each eigenvector of a symmHessian set and the chi2 of the central value, for all experiments in a fit: as a function of the eigenvector in a first plot, and as a distribution in a second plot.

validphys.deltachi2.plot_epsilon(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, eps=None)[source]

Plot the discrepancy (epsilon) of the 1-sigma and 68% bands at each grid value for all pdfs for a given Q. See https://arxiv.org/abs/1505.06736 eq. (11)

xscale is read from pdf plotting_grid scale, which is ‘log’ by default.

eps defines the value at which a simple horizontal line is plotted.
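As a rough illustration of the epsilon measure, the sketch below compares, point by point, the half-width of the central 68% interval of a set of replicas with their standard deviation. It assumes epsilon = |1 - sigma_68/sigma|, with sigma_68 taken as half the distance between the 16th and 84th percentiles. This is an assumed reading of the description above, not necessarily the exact formula of eq. (11) or of PDFEpsilonPlotter.draw(), and the function name gaussianity_epsilon is hypothetical.

```python
import numpy as np


def gaussianity_epsilon(replicas):
    """Hypothetical helper: epsilon = |1 - sigma_68 / sigma| at each point.

    replicas has shape (n_replicas, n_points). The formula is an assumed
    reading of "discrepancy between the 1-sigma and 68% bands".
    """
    replicas = np.asarray(replicas, dtype=float)
    # Standard deviation across replicas (half-width of the 1-sigma band)
    sigma = replicas.std(axis=0)
    # Half-width of the central 68% interval (16th to 84th percentile)
    low, high = np.percentile(replicas, [16, 84], axis=0)
    sigma_68 = (high - low) / 2
    return np.abs(1 - sigma_68 / sigma)


rng = np.random.default_rng(42)
gaussian = rng.standard_normal((100_000, 3))
uniform = rng.uniform(size=(100_000, 3))
print(gaussianity_epsilon(gaussian))  # close to zero for Gaussian replicas
print(gaussianity_epsilon(uniform))   # noticeably larger for a flat distribution
```

For Gaussian replicas the two bands nearly coincide, so epsilon stays small; departures from Gaussianity show up as larger values.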

validphys.deltachi2.plot_kullback_leibler(delta_chi2_hessian)[source]

Determines the Kullback–Leibler divergence by comparing the expectation value of Delta chi2 to the cumulative distribution function of the chi-square distribution with one degree of freedom (see: https://en.wikipedia.org/wiki/Chi-square_distribution).

The Kullback-Leibler divergence provides a measure of the difference between two distribution functions. Here we compare the chi-squared distribution and the cumulative distribution of the expectation value of Delta chi2.
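For two discrete distributions P and Q, the Kullback-Leibler divergence is D_KL(P||Q) = sum_i p_i ln(p_i/q_i). The sketch below is a generic illustration and not the validphys implementation. It swaps in a simple binned comparison: a set of Delta chi2 values is histogrammed, and the normalised histogram is compared with the chi-square density for one degree of freedom evaluated at the bin centres. The binning and the helper names are assumptions.

```python
import numpy as np


def kl_divergence(p, q):
    """Discrete D_KL(P || Q) = sum_i p_i * ln(p_i / q_i).

    Both inputs are normalised to sum to one; bins with p_i == 0 contribute
    nothing. q must be strictly positive wherever p is.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    nonzero = p > 0
    return float(np.sum(p[nonzero] * np.log(p[nonzero] / q[nonzero])))


def chi2_one_dof_pdf(x):
    """Probability density of the chi-square distribution with one dof."""
    x = np.asarray(x, dtype=float)
    return np.exp(-x / 2) / np.sqrt(2 * np.pi * x)


# Hypothetical Delta chi2 values: squares of standard normal draws are
# chi-square distributed with one degree of freedom.
rng = np.random.default_rng(0)
delta_chi2 = rng.standard_normal(5000) ** 2

edges = np.linspace(0.05, 5.0, 21)
counts, _ = np.histogram(delta_chi2, bins=edges)
centres = 0.5 * (edges[:-1] + edges[1:])
print(kl_divergence(counts, chi2_one_dof_pdf(centres)))
```

The divergence is zero only when the two normalised distributions agree bin by bin, and grows as they differ.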

validphys.deltachi2.plot_pos_neg_pdfs(pdf, pos_neg_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None)[source]

Plot the uncertainty of the original Hessian pdf, as well as that of the positive and negative subsets.

validphys.deltachi2.pos_neg_xplotting_grids(delta_chi2_hessian, xplotting_grid)[source]

Generates xplotting_grids corresponding to positive and negative delta chi2s.
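The bookkeeping behind delta_chi2_hessian and pos_neg_xplotting_grids can be sketched as follows. The sketch assumes an array of total chi2 values whose first entry is the central member, followed by one entry per eigenvector. Delta chi2 is taken relative to the central value, and the eigenvectors are then split by its sign. Counting Delta chi2 = 0 as "positive" is an assumption, and so are the helper names.

```python
import numpy as np


def delta_chi2_from_members(total_chi2):
    """Hypothetical helper: chi2 of each eigenvector minus the central chi2.

    Assumes total_chi2[0] is the central member and total_chi2[1:] are
    the eigenvector members of a symmhessian set.
    """
    total_chi2 = np.asarray(total_chi2, dtype=float)
    return total_chi2[1:] - total_chi2[0]


def split_positive_negative(delta_chi2):
    """Return eigenvector indices (0-based) with Delta chi2 >= 0 and < 0."""
    delta_chi2 = np.asarray(delta_chi2)
    positive = np.flatnonzero(delta_chi2 >= 0)
    negative = np.flatnonzero(delta_chi2 < 0)
    return positive, negative


delta = delta_chi2_from_members([1.10, 1.12, 1.08, 1.10, 1.05])
pos, neg = split_positive_negative(delta)
print(pos, neg)  # [0 2] [1 3]
```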

validphys.eff_exponents module

Tools for computing and plotting effective exponents.

class validphys.eff_exponents.ExponentBandPlotter(hlines, exponent, *args, **kwargs)[source]

Bases: BandPDFPlotter, PreprocessingPlotter

draw(pdf, grid, flstate)[source]

Overload BandPDFPlotter.draw() to plot bands of the effective exponent calculated from the replicas and horizontal lines for the effective exponents of the previous/next fits, if possible.

flstate is an element of the flavours for the first pdf specified in pdfs. If this flavour doesn’t exist in the current pdf’s fitbasis, or in the set of flavours for which the preprocessing exponents exist for the current pdf, no horizontal lines are plotted.

class validphys.eff_exponents.PreprocessingPlotter(exponent, *args, **kwargs)[source]

Bases: PDFPlotter

Class inheriting from PDFPlotter, changing the title and ylabel to reflect the effective exponent being plotted.

get_title(parton_name)[source]
get_ylabel(parton_name)[source]
validphys.eff_exponents.alpha_eff(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return a list of xplotting_grids containing the value of the effective exponent alpha at the specified values of x and flavour. alpha is relevant at small x, hence the logarithmic scale.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.

validphys.eff_exponents.beta_eff(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return a list of xplotting_grids containing the value of the effective exponent beta at the specified values of x and flavour. beta is relevant at large x, hence the linear scale.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.
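For reference, the effective exponents can be sketched as alpha_eff(x) = ln|f(x)| / ln(1/x) and beta_eff(x) = ln|f(x)| / ln(1 - x), where f = (x f)/x is recovered from the x*f(x) values held in the grids. These definitions follow the usual NNPDF convention but are stated here as an assumption. The helper functions are hypothetical and are not part of validphys. For a PDF behaving as x^(-alpha) (1 - x)^beta, alpha_eff tends to alpha at small x and beta_eff tends to beta at large x.

```python
import numpy as np


def alpha_eff_from_xf(x, xf):
    """Hypothetical helper: alpha_eff = ln|f| / ln(1/x), with f = xf / x."""
    x = np.asarray(x, dtype=float)
    f = np.abs(np.asarray(xf, dtype=float) / x)
    return np.log(f) / np.log(1.0 / x)


def beta_eff_from_xf(x, xf):
    """Hypothetical helper: beta_eff = ln|f| / ln(1 - x), with f = xf / x."""
    x = np.asarray(x, dtype=float)
    f = np.abs(np.asarray(xf, dtype=float) / x)
    return np.log(f) / np.log(1.0 - x)


# Toy PDF f(x) = x**(-alpha) * (1 - x)**beta, supplied as x*f(x)
alpha, beta = 0.2, 3.0


def toy_xf(x):
    return x ** (1 - alpha) * (1 - x) ** beta


x_small = np.geomspace(1e-6, 1e-3, 4)  # small-x region, log spaced
x_large = np.linspace(0.6, 0.9, 4)     # large-x region, linearly spaced
print(alpha_eff_from_xf(x_small, toy_xf(x_small)))  # approaches 0.2
print(beta_eff_from_xf(x_large, toy_xf(x_large)))   # approaches 3.0 as x -> 1
```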

validphys.eff_exponents.effective_exponents_table_internal(next_effective_exponents_table, *, fit=None, basis)[source]

Returns a table which concatenates previous_effective_exponents_table and next_effective_exponents_table if both tables contain effective exponents in the same basis.

If the previous exponents are in a different basis, or no fit was given to read the previous exponents from, then only the next exponents table is returned, for plotting purposes.

validphys.eff_exponents.fmt(a)
validphys.eff_exponents.get_alpha_lines(effective_exponents_table_internal)[source]

Given an effective_exponents_table_internal returns the rows with bounds of the alpha effective exponent for all flavours, used to plot horizontal lines on the alpha effective exponent plots.

validphys.eff_exponents.get_beta_lines(effective_exponents_table_internal)[source]

Same as get_alpha_lines but for beta

validphys.eff_exponents.iterate_preprocessing_yaml(fit, next_fit_eff_exps_table, _flmap_np_clip_arg=None)[source]

Using next_effective_exponents_table(), update the preprocessing exponents of the input fit. This is part of the usual pipeline referred to as “iterating a fit”; for more information see: How to run an iterated fit. A fully iterated runcard can be obtained from the action iterated_runcard_yaml().

This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

```yaml
{@iterate_preprocessing_yaml@}
```

Alternatively, using the API, the yaml dump returned by this function can be written to a file, e.g.:

>>> from validphys.api import API
>>> yaml_output = API.iterate_preprocessing_yaml(fit="<fit name>")
>>> with open("output.yml", "w") as f:
...     f.write(yaml_output)

Parameters:
  • fit (validphys.core.FitSpec) – Whose preprocessing range will be iterated, the output runcard will be the same as the one used to run this fit, except with new preprocessing range.

  • next_fit_eff_exps_table (pd.DataFrame) – Table outputted by next_fit_eff_exps_table() containing the next preprocessing ranges.

  • _flmap_np_clip_arg (dict) – Internal argument used by vp-nextfitruncard. Dictionary containing a mapping like {<flavour>: {<largex/smallx>: {a_min: <min value>, a_max: <max value>}}}. If a flavour is present in _flmap_np_clip_arg then the preprocessing ranges will be passed through np.clip with the arguments supplied in the mapping.
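Here is a minimal sketch of how such a mapping could be applied. It assumes the preprocessing entries are dictionaries with fl, smallx and largex keys, as in the fitting basis of an n3fit runcard. The helper clip_preprocessing and the numbers are illustrative only. A missing a_min or a_max entry is passed to np.clip as None, which leaves that side unbounded.

```python
import copy

import numpy as np


def clip_preprocessing(basis, flmap_np_clip_arg):
    """Hypothetical helper: clip preprocessing ranges flavour by flavour.

    basis is a list of dicts such as {"fl": "g", "smallx": [a, b], "largex": [c, d]};
    flmap_np_clip_arg maps flavour -> {"smallx"/"largex": {"a_min": ..., "a_max": ...}}.
    The input list is left unmodified.
    """
    new_basis = copy.deepcopy(basis)
    for entry in new_basis:
        for region, bounds in flmap_np_clip_arg.get(entry["fl"], {}).items():
            clipped = np.clip(entry[region], bounds.get("a_min"), bounds.get("a_max"))
            entry[region] = clipped.tolist()
    return new_basis


# Illustrative values only
basis = [
    {"fl": "sng", "smallx": [1.05, 1.19], "largex": [1.47, 2.70]},
    {"fl": "g", "smallx": [0.94, 1.25], "largex": [0.48, 5.50]},
]
clip_map = {"g": {"largex": {"a_min": 1.0, "a_max": 4.0}}}
print(clip_preprocessing(basis, clip_map)[1]["largex"])  # [1.0, 4.0]
```

Flavours absent from the mapping pass through unchanged, which matches the behaviour described for _flmap_np_clip_arg.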

validphys.eff_exponents.iterated_runcard_yaml(fit, update_runcard_description_yaml)[source]

Takes the runcard with iterated preprocessing and updated description, then:

  • Updates the t0 pdf set to be fit

  • Modifies the random seeds (to random unsigned long ints)

This should facilitate running a new fit with identical input settings as the specified fit with the t0, seeds and preprocessing iterated. For more information see: How to run an iterated fit

This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

```yaml
{@iterated_runcard_yaml@}
```

Alternatively, using the API, the yaml dump returned by this function can be written to a file, e.g.:

>>> from validphys.api import API
>>> yaml_output = API.iterated_runcard_yaml(
...     fit="<fit name>",
...     _updated_description="My iterated fit"
... )
>>> with open("output.yml", "w") as f:
...     f.write(yaml_output)

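The two steps listed for iterated_runcard_yaml can be sketched on a runcard loaded as a plain dictionary. The sketch assumes the n3fit runcard layout: the t0 set sits under datacuts: t0pdfset, and the seeds are stored in the top-level keys trvlseed, nnseed and mcseed. It also draws the seeds from the unsigned 32-bit range. All of these choices, and the helper name, are assumptions and do not describe the validphys implementation.

```python
import copy
import random

# Assumed names of the seed keys in an n3fit runcard
SEED_KEYS = ("trvlseed", "nnseed", "mcseed")


def iterate_t0_and_seeds(runcard, fitname, rng=None):
    """Hypothetical helper: return a copy of runcard with t0 and seeds iterated.

    The t0 PDF set becomes fitname, and every seed key present in the runcard
    is replaced by a random integer in [0, 2**32).
    """
    if rng is None:
        rng = random.Random()
    new_runcard = copy.deepcopy(runcard)
    new_runcard.setdefault("datacuts", {})["t0pdfset"] = fitname
    for key in SEED_KEYS:
        if key in new_runcard:
            new_runcard[key] = rng.randrange(2**32)
    return new_runcard


old = {"datacuts": {"t0pdfset": "old_t0_set"}, "trvlseed": 1, "nnseed": 2, "mcseed": 3}
new = iterate_t0_and_seeds(old, "my_previous_fit", random.Random(7))
print(new["datacuts"]["t0pdfset"])  # my_previous_fit
```

Passing an explicitly seeded random.Random makes the new seeds reproducible, which can help when regenerating the same iterated runcard.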
validphys.eff_exponents.next_effective_exponents_table(pdf: ~validphys.core.PDF, *, fitq0fromfit: (<class 'numbers.Real'>, <class 'NoneType'>) = None, x1_alpha: ~numbers.Real = 1e-06, x2_alpha: ~numbers.Real = 0.001, x1_beta: ~numbers.Real = 0.65, x2_beta: ~numbers.Real = 0.95, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Given a PDF, calculate the next effective exponents

By default x1_alpha = 1e-6, x2_alpha = 1e-3, x1_beta = 0.65, and x2_beta = 0.95, but different values can be specified in the runcard. The values control where the bounds of alpha and beta are evaluated:

alpha_min:

  • singlet/gluon: the 2x68% c.l. lower value evaluated at x=`x1_alpha`

  • others: min(2x68% c.l. lower value evaluated at x=`x1_alpha` and x=`x2_alpha`)

alpha_max:

  • singlet/gluon: min(2 and the 2x68% c.l. upper value evaluated at x=`x1_alpha`)

  • others: min(2 and max(2x68% c.l. upper value evaluated at x=`x1_alpha` and x=`x2_alpha`))

beta_min:

max(0 and min(2x68% c.l. lower value evaluated at x=`x1_beta` and x=`x2_beta`))

beta_max:

max(2x68% c.l. upper value evaluated at x=`x1_beta` and x=`x2_beta`)
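The bound rules above can be written out directly. In this sketch the inputs are the 2x68% c.l. lower and upper values, already evaluated at the two x points. How those values are obtained from the replicas is not shown, and the function names are hypothetical.

```python
def next_alpha_bounds(lower_x1, lower_x2, upper_x1, upper_x2, singlet_or_gluon):
    """Hypothetical helper: (alpha_min, alpha_max) following the rules above.

    lower_*/upper_* are the 2x68% c.l. values of alpha_eff at x1_alpha and x2_alpha.
    """
    if singlet_or_gluon:
        alpha_min = lower_x1
        alpha_max = min(2.0, upper_x1)
    else:
        alpha_min = min(lower_x1, lower_x2)
        alpha_max = min(2.0, max(upper_x1, upper_x2))
    return alpha_min, alpha_max


def next_beta_bounds(lower_x1, lower_x2, upper_x1, upper_x2):
    """Hypothetical helper: (beta_min, beta_max) from the values at x1_beta and x2_beta."""
    beta_min = max(0.0, min(lower_x1, lower_x2))
    beta_max = max(upper_x1, upper_x2)
    return beta_min, beta_max


print(next_alpha_bounds(0.9, 1.1, 1.3, 2.4, singlet_or_gluon=False))  # (0.9, 2.0)
print(next_beta_bounds(-0.2, 1.5, 3.0, 4.0))                         # (0.0, 4.0)
```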

validphys.eff_exponents.plot_alpha_eff(fits_pdf, alpha_eff_fits, fits_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the alpha effective or its corresponding index in the list, starting from one, or None to plot absolute values.

xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.

validphys.eff_exponents.plot_alpha_eff_internal(pdfs, alpha_eff_pdfs, pdfs_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the alpha effective or its corresponding index in the list, starting from one, or None to plot absolute values.

validphys.eff_exponents.plot_beta_eff(fits_pdf, beta_eff_fits, fits_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Same as plot_alpha_eff but for beta effective exponents

validphys.eff_exponents.plot_beta_eff_internal(pdfs, beta_eff_pdfs, pdfs_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Same as plot_alpha_eff_internal but for beta effective exponent

validphys.eff_exponents.previous_effective_exponents(basis: str, fit: (<class 'validphys.core.FitSpec'>, <class 'NoneType'>) = None)[source]

If provided with a fit, check that basis is the basis in which the fit was performed and, if so, return the previous effective exponents read from the fit runcard.

validphys.eff_exponents.previous_effective_exponents_table(fit: FitSpec)[source]

Given a fit, reads the previous exponents from the fit runcard

validphys.eff_exponents.update_runcard_description_yaml(iterate_preprocessing_yaml, _updated_description=None)[source]

Take the runcard with iterated preprocessing and update the description if _updated_description is provided. As with iterate_preprocessing_yaml() the result can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

```yaml
{@update_runcard_description_yaml@}
```

validphys.filters module

Filters for NNPDF fits

exception validphys.filters.BadPerturbativeOrder[source]

Bases: ValueError

Exception raised when the perturbative order string is not recognized.

exception validphys.filters.FatalRuleError[source]

Bases: Exception

Exception raised when a rule application failed at runtime.

exception validphys.filters.MissingRuleAttribute[source]

Bases: RuleProcessingError, AttributeError

Exception raised when a rule is missing required attributes.

class validphys.filters.PerturbativeOrder(string)[source]

Bases: object

Class that conveniently handles perturbative order declarations for use within the Rule class filter.

Parameters:

string (str) –

A string in the format of NNLO or equivalently N2LO. This can be followed by one of ! + - or none.

The syntax allows for rules to be executed only if the perturbative order is within a given range. The following enumerates all 4 cases as an example:

  • NNLO+: only execute the following rule if the pto is 2 or greater

  • NNLO-: only execute the following rule if the pto is strictly less than 2

  • NNLO!: only execute the following rule if the pto is strictly not 2

  • NNLO: only execute the following rule if the pto is exactly 2

Any unrecognized string will raise a BadPerturbativeOrder exception.

Example

>>> from validphys.filters import PerturbativeOrder
>>> pto = PerturbativeOrder("NNLO+")
>>> pto.numeric_pto
2
>>> 1 in pto
False
>>> 2 in pto
True
>>> 3 in pto
True
parse()[source]
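The range semantics of PerturbativeOrder can be reproduced in a few lines. The class below is a hypothetical, simplified stand-in written for illustration only. It accepts LO, NLO, NNLO, N3LO and similar strings with an optional suffix. On unrecognised input it raises a plain ValueError; the real class raises BadPerturbativeOrder, which is a ValueError subclass.

```python
import re

# Either a run of N's (e.g. "NN") or "N" followed by a number (e.g. "N2"),
# then "LO", then an optional modifier.
_PTO_PATTERN = re.compile(r"^(N*|N(\d+))LO([!+-]?)$")


class SimplePerturbativeOrder:
    """Hypothetical sketch of the PerturbativeOrder range semantics."""

    def __init__(self, string):
        match = _PTO_PATTERN.match(string)
        if match is None:
            raise ValueError(f"Unrecognised perturbative order: {string!r}")
        prefix, digits, self.modifier = match.groups()
        # "N2LO" -> 2 from the digits; "NNLO" -> 2 from the number of N's
        self.numeric_pto = int(digits) if digits is not None else len(prefix)

    def __contains__(self, pto):
        if self.modifier == "+":
            return pto >= self.numeric_pto
        if self.modifier == "-":
            return pto < self.numeric_pto
        if self.modifier == "!":
            return pto != self.numeric_pto
        return pto == self.numeric_pto


pto = SimplePerturbativeOrder("NNLO+")
print(pto.numeric_pto, 1 in pto, 2 in pto, 3 in pto)  # 2 False True True
```

Implementing __contains__ lets the `pto in order` syntax from the example above express the range check directly.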
class validphys.filters.Rule(initial_data: dict, *, defaults: dict, theory_parameters: dict, loader=None)[source]

Bases: object

Rule object to be used to generate cuts mask.

A rule object is created for each rule in ./cuts/filters.yaml

Parameters:
  • initial_data (dict) –

    A dictionary containing all the information regarding the rule. This contains the name of the dataset the rule applies to and/or the process type the rule applies to. Additionally, the rule itself is defined, alongside the reason the rule is used. Finally, the user can optionally define their own custom local variables.

    By default these are defined in cuts/filters.yaml

  • defaults (dict) –

    A dictionary containing default values to be used globally in all rules.

    By default these are defined in cuts/defaults.yaml

  • theory_parameters – Dict containing pairs of (theory_parameter, value)

  • loader (validphys.loader.Loader, optional) – A loader instance used to retrieve the datasets.

numpy_functions = {'fabs': <ufunc 'fabs'>, 'log': <ufunc 'log'>, 'sqrt': <ufunc 'sqrt'>}
exception validphys.filters.RuleProcessingError[source]

Bases: Exception

Exception raised when we couldn’t process a rule.

validphys.filters.check_additional_errors(additional_errors)[source]

Lux additional errors pdf check

validphys.filters.check_integrability(integdatasets)[source]

Verify integrability datasets are ready for the fit.

validphys.filters.check_luxset(luxset)[source]

Lux pdf check

validphys.filters.check_nonnegative(var: str)[source]

Ensure that var is non-negative.

validphys.filters.check_positivity(posdatasets)[source]

Verify positivity datasets are ready for the fit.

validphys.filters.check_t0pdfset(t0pdfset)[source]

T0 pdf check

validphys.filters.default_filter_rules_input()[source]

Return a dictionary with the input settings. These are defined in filters.yaml in the validphys.cuts module.

validphys.filters.default_filter_settings_input()[source]

Return a dictionary with the default hardcoded filter settings. These are defined in defaults.yaml in the validphys.cuts module.

validphys.filters.export_mask(path, mask)[source]

Dump mask to file

validphys.filters.filter(filter_data)[source]

Summarise filters applied to all datasets

validphys.filters.filter_closure_data(filter_path, data, fakepdf, fakenoise, filterseed, sep_mult)[source]

Filter closure data. In addition to cutting data points, the data is generated from an underlying fakepdf, applying a shift to the data if fakenoise is True, which emulates the experimental central values being shifted away from the underlying law.
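As an illustration only, the sketch below generates closure-test data from a set of theory predictions. Without noise, the predictions are returned unchanged. With noise, a single correlated Gaussian shift drawn from a covariance matrix is added, seeded so the result is reproducible. The use of a Cholesky factor of the experimental covariance matrix and the function name are assumptions. The real function also cuts data points and handles sep_mult, neither of which is shown.

```python
import numpy as np


def make_closure_data(predictions, covmat, fakenoise, seed):
    """Hypothetical helper: pseudodata generated from an underlying law.

    predictions: theory predictions computed with the underlying (fake) PDF.
    covmat: experimental covariance matrix, assumed positive definite.
    """
    predictions = np.asarray(predictions, dtype=float)
    if not fakenoise:
        # No shift: the data coincide with the underlying law
        return predictions.copy()
    rng = np.random.default_rng(seed)
    # Correlated Gaussian shift with covariance covmat: L @ z with covmat = L L^T
    chol = np.linalg.cholesky(covmat)
    shift = chol @ rng.standard_normal(predictions.size)
    return predictions + shift


theory = np.array([1.0, 2.0, 3.0])
cov = np.diag([0.01, 0.04, 0.09])
print(make_closure_data(theory, cov, fakenoise=True, seed=123))
```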

validphys.filters.filter_closure_data_by_experiment(filter_path, experiments_data, fakepdf, fakenoise, filterseed, data_index, sep_mult)[source]

Like filter_closure_data() except filters data by experiment.

This function just performs a for loop over experiments. The reason we don’t use reportengine.collect is that it can permute the order in which closure data is generated, which means that the pseudodata would not be reproducible.

validphys.filters.filter_real_data(filter_path, data)[source]

Filter real data, cutting any points which do not pass the filter rules.

validphys.filters.get_cuts_for_dataset(commondata, rules) → list[source]

Function to generate a list containing the index of all experimental points that passed kinematic cut rules stored in ./cuts/filters.yaml

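Parameters:

  • commondata – The commondata of the dataset to which the cuts are applied.

  • rules (list) – A list of Rule objects specifying the filters.

Returns: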

mask – List object containing index of all passed experimental values

Return type:

list

Example

>>> from validphys.filters import (get_cuts_for_dataset, Rule,
...     default_filter_settings_input, default_filter_rules_input)
>>> from validphys.loader import Loader
>>> l = Loader()
>>> cd = l.check_commondata("NMC")
>>> theory = l.check_theoryID(53)
>>> filter_defaults = default_filter_settings_input()
>>> params = theory.get_description()
>>> rule_list = [Rule(initial_data=i, defaults=filter_defaults, theory_parameters=params)
...     for i in default_filter_rules_input()]
>>> get_cuts_for_dataset(cd, rules=rule_list)

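The idea of a cuts mask can be illustrated without validphys. Each rule is a predicate on the kinematics of a data point, and the mask keeps the indices of the points that pass all of them. The DIS rule below uses W^2 = Q^2 (1 - x)/x + m_p^2 with illustrative thresholds. The function names and threshold values are assumptions, not the contents of cuts/filters.yaml.

```python
PROTON_MASS = 0.938272  # GeV


def passes_dis_rule(x, q2, q2min=3.49, w2min=12.5):
    """Hypothetical rule: keep DIS points with Q^2 > q2min and W^2 > w2min (GeV^2)."""
    w2 = q2 * (1 - x) / x + PROTON_MASS**2
    return q2 > q2min and w2 > w2min


def cuts_mask(points, rules):
    """Indices of the (x, Q^2) points that pass every rule."""
    return [i for i, (x, q2) in enumerate(points) if all(rule(x, q2) for rule in rules)]


points = [(0.01, 10.0), (0.5, 10.0), (0.1, 2.0)]
print(cuts_mask(points, [passes_dis_rule]))  # [0]
```

Here the second point fails the W^2 requirement and the third fails the Q^2 requirement, so only index 0 survives.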
validphys.filters.make_dataset_dir(path)[source]

Creates directory at path location.

validphys.fitdata module

Utilities for loading data from fit folders

class validphys.fitdata.DatasetComp(common, first_only, second_only)

Bases: tuple

common

Alias for field number 0

first_only

Alias for field number 1

second_only

Alias for field number 2

class validphys.fitdata.FitInfo(nite, training, validation, chi2, is_positive, arclengths, integnumbers)

Bases: tuple

arclengths

Alias for field number 5

chi2

Alias for field number 3

integnumbers

Alias for field number 6

is_positive

Alias for field number 4

nite

Alias for field number 0

training

Alias for field number 1

validation

Alias for field number 2

validphys.fitdata.check_lhapdf_info(results_dir, fitname)[source]

Check that an LHAPDF info metadata file is present in the fit results

validphys.fitdata.check_nnfit_results_path(path)[source]

Returns True if the requested path is a valid results directory, i.e. if it is a directory and has a ‘nnfit’ subdirectory.

validphys.fitdata.check_replica_files(replica_path, prefix)[source]

Verification of a replica results directory at replica_path for a fit named prefix. Returns True if the results directory is complete

validphys.fitdata.datasets_properties_table(data_input)[source]

Return dataset properties for each dataset in data_input

validphys.fitdata.fit_code_version(fit)[source]

Returns a table with the code version from the replica_1/{fitname}.json files. Note that the tensorflow version distinguishes between the mkl=on and mkl=off builds.

validphys.fitdata.fit_datasets_properties_table(fitinputcontext)[source]

Returns table of dataset properties for each dataset used in a fit.

validphys.fitdata.fit_summary(fit_name_with_covmat_label, replica_data, total_chi2_data, total_phi_data)[source]

Summary table of fit properties:

  • Central chi-squared

  • Average chi-squared

  • Training and validation error functions

  • Training lengths

  • Phi

Note: Chi-squared values from the replica_data are not used here (presumably they are fixed to being t0)

This uses a corrected form for the error on phi in comparison to the vp1 value. The error is propagated from the uncertainty on the average chi-squared only.
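For orientation, phi can be sketched as phi = sqrt(<chi2[T[f]]> - chi2[<T[f]>]), the square root of the difference between the average chi2 over replicas and the chi2 of the central prediction. Since phi^2 = <chi2> - chi2_central, propagating only the uncertainty on the average chi2 gives delta phi = delta<chi2> / (2 phi). This sketch takes delta<chi2> to be the replica standard deviation divided by sqrt(N). Both the definition and this choice of standard error are assumptions, and the function is hypothetical.

```python
import numpy as np


def phi_with_error(replica_chi2, central_chi2):
    """Hypothetical helper: phi and its error from per-replica chi2 values.

    replica_chi2: chi2 of each replica's predictions (per data point).
    central_chi2: chi2 of the central (replica-averaged) prediction.
    """
    replica_chi2 = np.asarray(replica_chi2, dtype=float)
    nrep = replica_chi2.size
    phi = np.sqrt(replica_chi2.mean() - central_chi2)
    # Only the error on the average chi2, std / sqrt(nrep), is propagated:
    # phi**2 = <chi2> - chi2_central  =>  delta phi = delta<chi2> / (2 phi)
    phi_err = replica_chi2.std() / (2.0 * phi * np.sqrt(nrep))
    return phi, phi_err


print(phi_with_error([1.20, 1.40], central_chi2=1.21))
```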

validphys.fitdata.fit_theory_covmat_summary(fit, fitthcovmat)[source]

Returns a table with a single column for the fit, with three rows indicating whether the theory covariance matrix was used in the ‘sampling’ of the pseudodata, the ‘fitting’, and the ‘validphys statistical estimators’ in the current namespace for that fit.

validphys.fitdata.fits_replica_data_correlated(fits_replica_data, fits_replica_indexes, fits)[source]

Return a table with the same columns as replica_data indexed by the replica fit ID. For identical fits, the values across rows should be the same.

If some replica ID is not present for a given fit (e.g. discarded by postfit), the corresponding entries in the table will be null.

validphys.fitdata.fits_version_table(fits_fit_code_version)[source]

Produces a table of version information for multiple fits.

validphys.fitdata.fitted_replica_indexes(pdf)[source]

Return nnfit index of replicas 1 to N.

validphys.fitdata.load_fitinfo(replica_path, prefix)[source]

Process the data in the .json file for a single replica into a FitInfo object. If the .json file does not exist, an old-format fit is assumed and old_load_fitinfo will be called instead.

validphys.fitdata.match_datasets_by_name(fits, fits_datasets)[source]

Return a tuple with common, first_only and second_only. The elements of the tuple are mappings where the keys are dataset names. For common, the values are the two datasets, one from each fit; for first_only and second_only, the values are the corresponding datasets included only in the first fit and only in the second fit, respectively.
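A minimal sketch of the same comparison, assuming each fit's datasets are given as a mapping from dataset name to dataset object. Plain strings stand in for DatasetSpec objects here, and the helper name is hypothetical. The namedtuple reuses the field names of DatasetComp.

```python
from collections import namedtuple

# Mirrors the field names of validphys.fitdata.DatasetComp
DatasetComp = namedtuple("DatasetComp", ("common", "first_only", "second_only"))


def match_by_name(first_fit, second_fit):
    """Hypothetical helper: compare two {dataset name: dataset} mappings."""
    first_names, second_names = first_fit.keys(), second_fit.keys()
    common = {
        name: (first_fit[name], second_fit[name])
        for name in sorted(first_names & second_names)
    }
    first_only = {name: first_fit[name] for name in sorted(first_names - second_names)}
    second_only = {name: second_fit[name] for name in sorted(second_names - first_names)}
    return DatasetComp(common, first_only, second_only)


fit_a = {"NMC": "NMC (fit A)", "SLACP_dwsh": "SLACP (fit A)"}
fit_b = {"NMC": "NMC (fit B)", "HERACOMBCCEM": "HERACC (fit B)"}
comparison = match_by_name(fit_a, fit_b)
print(comparison.common)       # {'NMC': ('NMC (fit A)', 'NMC (fit B)')}
print(comparison.first_only)   # {'SLACP_dwsh': 'SLACP (fit A)'}
print(comparison.second_only)  # {'HERACOMBCCEM': 'HERACC (fit B)'}
```

Dictionary key views support set operations, so the intersection and the two differences can be computed directly.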

validphys.fitdata.num_fitted_replicas(fit)[source]

Return the number of nnfit replicas, that is, the number of replicas before postfit was run.

validphys.fitdata.print_dataset_differences(fits, match_datasets_by_name, print_common: bool = True)[source]

Given exactly two fits, print the datasets that are included in one but not in the other. If print_common is True, also print the datasets that are common.

For the purposes of visual aid, everything is ordered by dataset name which, given the naming convention of the commondata, means that everything is ordered by:

  1. Experiment name

  2. Process

  3. Energy

validphys.fitdata.print_different_cuts(fits, test_for_same_cuts)[source]

Print a summary of the datasets that are included in both fits but have different cuts.

validphys.fitdata.print_systype_overlap(groups_commondata, group_dataset_inputs_by_metadata)[source]

Returns a set of systypes that overlap between groups. Discards the set of systypes which overlap but do not imply correlations

validphys.fitdata.replica_data(fit, replica_paths)[source]

Load the necessary data from the .json file of each of the replicas. The corresponding PDF set must be installed in the LHAPDF path.

The included information is:

(‘nite’, ‘training’, ‘validation’, ‘chi2’, ‘pos_status’, ‘arclenghts’)

validphys.fitdata.replica_paths(fit)[source]

Return the paths of all the replicas

validphys.fitdata.summarise_fits(collected_fit_summaries)[source]

Produces a table of basic comparisons between fits, includes all the fields used in fit_summary

validphys.fitdata.summarise_theory_covmat_fits(fits_theory_covmat_summary)[source]

Collects the theory covmat summary for all fits and concatenates them into a single table

validphys.fitdata.t0_chi2_info_table(pdf, dataset_inputs_abs_chi2_data, t0pdfset, use_t0)[source]

Provides a table with:

  • t0pdfset name

  • Central t0-chi-squared

  • Average t0-chi-squared

validphys.fitdata.test_for_same_cuts(fits, match_datasets_by_name)[source]

Given two fits, return a list of tuples (first, second) where first and second are DatasetSpecs that correspond to the same dataset but have different cuts, such that first is included in the first fit and second in the second.

validphys.fitveto module

fitveto.py

Module for the determination of passing fit replicas.

Current active vetoes:

  • Positivity: replicas with FitInfo.is_positive == False

  • ChiSquared: replicas with ChiSquared > nsigma_discard_chi2*StandardDev + Average

  • ArclengthX: replicas with ArcLengthX > nsigma_discard_arclength*StandardDev + Average

  • Integrability: replicas with IntegrabilityNumbers > integ_threshold

validphys.fitveto.determine_vetoes(fitinfos: list, nsigma_discard_chi2: float, nsigma_discard_arclength: float, integ_threshold: float)[source]

Assesses whether replica fitinfo passes the standard NNPDF vetoes. Returns a dictionary of vetoes and their passing boolean masks. Included in the dictionary is a ‘Total’ veto.

validphys.fitveto.distribution_veto(dist, prior_mask, nsigma_threshold)[source]

For a given distribution (a list of floats), returns a boolean mask specifying the passing elements. The result is a new mask of the elements that satisfy:

value <= mean + nsigma_threshold*standard_deviation

Only points passing the prior_mask are considered in the average or standard deviation.

validphys.fitveto.integrability_veto(dist, integ_threshold)[source]

For a given distribution (a list of floats), returns a boolean mask specifying the passing elements. The result is a new mask of the elements that satisfy: value <= integ_threshold
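The two veto rules above map onto boolean masks in a few lines of NumPy. This sketch assumes the population standard deviation (NumPy's default ddof=0) and takes the ‘Total’ mask to be the logical AND of the individual masks. Both choices, and the function names, are assumptions and do not describe the fitveto implementation.

```python
import numpy as np


def distribution_veto_sketch(dist, prior_mask, nsigma_threshold):
    """Pass values <= mean + nsigma * std, with mean/std over prior_mask only."""
    dist = np.asarray(dist, dtype=float)
    considered = dist[np.asarray(prior_mask, dtype=bool)]
    threshold = considered.mean() + nsigma_threshold * considered.std()
    return dist <= threshold


def integrability_veto_sketch(dist, integ_threshold):
    """Pass values <= integ_threshold."""
    return np.asarray(dist, dtype=float) <= integ_threshold


chi2 = [1.10, 1.20, 1.15, 3.00]
positivity = np.array([True, True, True, False])
chi2_mask = distribution_veto_sketch(chi2, positivity, nsigma_threshold=4.0)
integ_mask = integrability_veto_sketch([0.1, 0.3, 2.0, 0.2], integ_threshold=0.5)
total = positivity & chi2_mask & integ_mask
print(total)  # [ True  True False False]
```

Using the positivity mask as prior_mask keeps a non-positive replica with an outlying chi2 from inflating the mean and standard deviation used for the chi2 threshold.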

validphys.fitveto.save_vetoes_info(veto_dict: dict, chi2_threshold, arclength_threshold, integ_threshold, filepath)[source]

Saves to file the chi2 and arclength thresholds used by postfit as well as veto dictionaries which contain information on which replicas pass each veto.

validphys.fkparser module

This module implements parsers for FKtable and CFactor files into useful datastructures, contained in the validphys.coredata module, which can be easily pickled and interfaced with common Python libraries.

Most users will be interested in using the high level interface load_fktable(). Given a validphys.core.FKTableSpec object, it returns an instance of validphys.coredata.FKTableData, an object with the required information to compute a convolution, with the CFactors applied.

from validphys.fkparser import load_fktable
from validphys.loader import Loader
l = Loader()
fk = l.check_fktable(setname="ATLASTTBARTOT", theoryID=53, cfac=('QCD',))
res = load_fktable(fk)
exception validphys.fkparser.BadCFactorError[source]

Bases: Exception

Exception raised when a CFactor cannot be parsed correctly

exception validphys.fkparser.BadFKTableError[source]

Bases: Exception

Exception raised when an FKTable cannot be parsed correctly

class validphys.fkparser.GridInfo(setname: str, hadronic: bool, ndata: int, nx: int)[source]

Bases: object

Class containing the basic properties of an FKTable grid.

hadronic: bool
ndata: int
nx: int
setname: str
validphys.fkparser.load_fktable(spec)[source]

Load the data corresponding to an FKTableSpec object. The cfactors will be applied to the grid. If we have a new-type fktable, load() is called directly; otherwise, the old parser is used as a fallback.
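C-factors are multiplicative, point-by-point corrections. Below is a minimal sketch of applying them. It assumes the FK table is held as a NumPy array whose first axis runs over data points, with any number of remaining axes over flavours and x. It also assumes that several C-factor sets combine by multiplication. The function and the example shapes are hypothetical and do not reflect the FKTableData layout.

```python
import numpy as np


def apply_cfactors(fktable, cfactor_sets):
    """Hypothetical helper: rescale each data point of an FK table array.

    fktable: array of shape (ndata, ...), e.g. (ndata, nbasis, nx) or
    (ndata, nbasis, nx, nx) in this sketch's assumed layout.
    cfactor_sets: sequence of arrays of length ndata, one per C-factor set.
    """
    fktable = np.asarray(fktable, dtype=float)
    total = np.prod(np.asarray(cfactor_sets, dtype=float), axis=0)
    # Reshape to (ndata, 1, 1, ...) so the factor broadcasts over the other axes
    total = total.reshape((-1,) + (1,) * (fktable.ndim - 1))
    return fktable * total


table = np.ones((2, 3, 4))
corrected = apply_cfactors(table, [[1.0, 2.0], [3.0, 0.5]])
print(corrected[:, 0, 0])  # [3. 1.]
```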

validphys.fkparser.open_fkpath(path)[source]

Return a file-like object from the fktable path, regardless of whether it is compressed

Parameters:

path (Path or str) – Path-like file containing a valid FKTable. It can be either inside a tarball or in plain text.

Returns:

f – A file-like object for further processing.

Return type:

file
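The same idea can be sketched with the standard library, assuming a tarball contains a single FK table file. tarfile.is_tarfile() detects archives whether or not they are compressed, and tarfile.open() in its default read mode picks the compression automatically. The helper name is hypothetical, and the real function may handle cases not covered here.

```python
import tarfile
from pathlib import Path


def open_fk_file(path):
    """Hypothetical helper: binary file object for a plain or tarred FK table."""
    path = Path(path)
    if tarfile.is_tarfile(path):
        # Default mode "r" (i.e. "r:*") handles gzip, bz2 and xz transparently.
        # The archive stays open for as long as the returned object is in use.
        archive = tarfile.open(path)
        member = archive.getmembers()[0]  # assumption: one file per tarball
        return archive.extractfile(member)
    return open(path, "rb")
```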

validphys.fkparser.parse_cfactor(f)[source]

Parse an open byte stream into a CFactorData. Raise a BadCFactorError if problems are encountered.

Parameters:

f (file) – Binary file-like object

Returns:

cfac – An object containing the data on the cfactor for each point.

Return type:

CFactorData
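Here is a rough sketch of such a parser. It assumes the layout of NNPDF C-factor files: a header block between two lines starting with '*', followed by one line per data point holding the central value and its uncertainty. The namedtuple and function names are hypothetical stand-ins for CFactorData and parse_cfactor(), and ValueError takes the place of BadCFactorError.

```python
import io
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for validphys.coredata.CFactorData
CFactorSketch = namedtuple("CFactorSketch", ("description", "central_value", "uncertainty"))


def parse_cfactor_sketch(f):
    """Parse a binary stream laid out as: *** header *** then 'value error' rows."""
    if not f.readline().startswith(b"*"):
        raise ValueError("First line should start with '*'")
    header_lines = []
    for line in f:
        if line.startswith(b"*"):  # closing line of the header block
            break
        header_lines.append(line.decode())
    try:
        data = np.array(f.read().decode().split(), dtype=float).reshape(-1, 2)
    except ValueError as err:
        raise ValueError(f"Could not parse C-factor values: {err}") from err
    return CFactorSketch("".join(header_lines), data[:, 0], data[:, 1])


stream = io.BytesIO(b"****\nSetName: EXAMPLE\n****\n1.10 0.01\n0.95 0.02\n")
print(parse_cfactor_sketch(stream).central_value.tolist())  # [1.1, 0.95]
```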

validphys.fkparser.parse_fktable(f)[source]

Parse an open byte stream into an FKTableData. Raise a BadFKTableError if problems are encountered.

Parameters:

f (file) – Open file-like object. See open_fkpath() to obtain it.

Returns:

fktable – An object containing the FKTable data and information.

Return type:

FKTableData

Notes

This function operates at the level of a single file, and therefore it does not apply CFactors (see load_fktable() for that) or handle operations within COMPOUND ensembles.

validphys.gridvalues module

gridvalues.py

Core functionality needed to obtain a set of values from LHAPDF. The tools for representing these grids are in pdfgrids.py (the validphys provider module), and the basis transformations are in pdfbases.py

validphys.gridvalues.central_grid_values(pdf: PDF, flmat, xmat, qmat)[source]

Same as grid_values() but it returns only the central values. The return value is indexed as:

grid_values[replica][flavour][x][Q]

where the first dimension (corresponding to the central member of the PDF set) is always one.

validphys.gridvalues.evaluate_luminosity(pdf_set: LHAPDFSet, n: int, s: float, mx: float, x1: float, x2: float, channel)[source]

Returns PDF luminosity at specified values of mx, x1, x2, sqrts**2 for a given channel.

  • pdf_set: The PDF set of interest.

  • s: The square of the center of mass energy, in GeV^2.

  • mx: The invariant mass bin, in GeV.

  • x1 and x2: The partonic x1 and x2.

  • channel: The channel tag name from LUMI_CHANNELS.

validphys.gridvalues.grid_values(pdf: PDF, flmat, xmat, qmat)[source]

Evaluate x*f(x) on a grid of points in flavour, x and Q.

Parameters:
  • pdf (PDF) – Any PDF set

  • flmat (iterable) – A list of PDG IDs corresponding to the LHAPDF flavours in the grid.

  • xmat (iterable) – A list of x values

  • qmat (iterable) – A list of values in Q, expressed in GeV.

Returns:

A 4-dimensional array with the PDF values at the input parameters for each replica. The return value is indexed as follows: grid_values[replica][flavour][x][Q]

See also

validphys.pdfbases.Basis.grid_values() – a higher-level interface allowing basis transformations and aliases.

Examples

Compute the maximum difference across replicas between the u and ubar PDFs (times x) for x=0.5 and both Q=10 and Q=100:

>>> from validphys.loader import Loader
>>> from validphys.gridvalues import grid_values
>>> import numpy as np
>>> gv = grid_values(Loader().check_pdf('NNPDF31_nnlo_as_0118'), [-1, 1], [0.5], [10, 100])
>>> # Take the difference across the flavour dimension, the max
>>> # across the replica dimension, and leave the Q dimension untouched.
>>> np.diff(gv, axis=1).max(axis=0).ravel()
array([0.07904731, 0.04989902], dtype=float32)
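The indexing convention can be illustrated without LHAPDF. The sketch below uses a synthetic NumPy array as a hypothetical stand-in for the output of grid_values(); the sizes and values are made up. It shows how to pick a single point and how to reduce over the replica axis, with keepdims=True reproducing the leading dimension of one documented for central_grid_values().

```python
import numpy as np

# Hypothetical stand-in for the output of grid_values():
# axes are (replica, flavour, x, Q).
rng = np.random.default_rng(seed=0)
nrep, nfl, nx, nq = 5, 2, 3, 2
gv = rng.normal(loc=1.0, scale=0.1, size=(nrep, nfl, nx, nq))

# Replica 2, second flavour, first x point, last Q point.
value = gv[2, 1, 0, -1]

# Statistics across replicas: reduce over axis 0, keep (flavour, x, Q).
mean_over_replicas = gv.mean(axis=0)
std_over_replicas = gv.std(axis=0)

# keepdims=True keeps a leading axis of length one, the same layout
# documented for central_grid_values().
mean_with_replica_axis = gv.mean(axis=0, keepdims=True)

print(mean_over_replicas.shape)      # (2, 3, 2)
print(mean_with_replica_axis.shape)  # (1, 2, 3, 2)
```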

validphys.hyper_algorithm module

This module contains functions dedicated to processing the JSON dictionaries.

validphys.hyper_algorithm.autofilter_dataframe(dataframe, keys, n_to_combine=1, n_to_kill=1, threshold=-1)[source]

Receives a dataframe and a list of keys. Creates combinations of n_to_combine keys and computes the reward. Finally, removes from the dataframe the n_to_kill worst combinations.

Anything under threshold will be removed and will not count towards the n_to_kill (the default is threshold=-1, so only combinations with a reward below -1 are removed this way).

# Arguments:
  • dataframe: a pandas dataframe

  • keys: keys to combine

  • n_to_combine: how many keys do we want to combine

  • n_to_kill: how many combinations to kill

  • threshold: anything under this reward will be removed

# Returns:
  • dataframe_sliced: a slice of the dataframe with the weakest combinations removed

validphys.hyper_algorithm.bin_generator(df_values, max_n=10)[source]

Receives a dataframe with a list of unique values. If there are more than max_n of them and they are numeric, create max_n bins. If they are already discrete values or there are fewer than max_n options, output the same input.

# Arguments:
  • df_values: dataframe with unique values

  • max_n: maximum number of allowed different values

# Returns:
  • new_vals: list of tuples with (initial, end) value of the bin
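A minimal sketch of the binning idea, assuming equal-width bins between the smallest and largest value; the real function may place the bin edges differently. The helper name is hypothetical.

```python
import numbers

import numpy as np


def bin_values_sketch(unique_values, max_n=10):
    """Hypothetical sketch of bin_generator(): equal-width bins are an assumption."""
    values = list(unique_values)
    is_numeric = all(
        isinstance(v, numbers.Real) and not isinstance(v, bool) for v in values
    )
    # Discrete (non-numeric) values, or few enough options: return the input unchanged.
    if not is_numeric or len(values) <= max_n:
        return values
    # max_n + 1 edges define max_n bins spanning the full range.
    edges = np.linspace(min(values), max(values), max_n + 1)
    # One (initial, end) tuple per bin.
    return [(float(lo), float(hi)) for lo, hi in zip(edges[:-1], edges[1:])]


bins = bin_values_sketch(range(100), max_n=4)
print(bins[0])  # (0.0, 24.75)
print(bin_values_sketch(["Adam", "RMSprop"]))  # ['Adam', 'RMSprop']
```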

validphys.hyper_algorithm.compute_reward(mdict, biggest_ntotal)[source]

Given a combination dictionary computes the reward function:

If the fail rate for this combination is above the fail threshold, the reward is -100.

Otherwise, the reward formula takes into account:
  • The rate of ok fits that have a loss below the loss_threshold

  • The rate of fits that failed

  • The std deviation

  • How far the median is from the best loss

  • How far apart the median and the average are

validphys.hyper_algorithm.dataframe_removal(dataframe, hit_list)[source]

Removes all combinations defined in hit_list from the dataframe. The hit list is a list of dictionaries containing the ‘slice’ key, where ‘slice’ must be a slice of ‘dataframe’.

# Arguments:
  • dataframe: a pandas dataframe

  • hit_list: the list of elements to remove

# Returns:
  • new_dataframe: the same dataframe with all elements from hit_list removed

validphys.hyper_algorithm.get_combinations(key_info, ncomb)[source]

Given a dictionary mapping keys to iterables of possible values (key_info), return a list of the product of all possible mappings of a subset of ncomb keys to single values out of the corresponding possible values, for all such subsets.

For instance, given

    key_info = {
        'key1': [val1-1, val1-2, …],
        'key2': [val2-1, val2-2, …],
    }
    ncomb = 2

the function will return a list of dictionaries:

    [
        {'key1': val1-1, 'key2': val2-1},
        {'key1': val1-1, 'key2': val2-2},
        {'key1': val1-2, 'key2': val2-1},
        {'key1': val1-2, 'key2': val2-2},
        …
    ]

Get all combinations of ncomb elements for the keys and values given in the dictionary key_info:

# Arguments:
  • key_info: dictionary with the possible values for each key

  • ncomb: elements to combine

# Returns:
  • all_combinations: A list of dictionaries of parameters
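The behaviour described for get_combinations() can be sketched with itertools: combinations() picks every subset of ncomb keys and product() picks one value per key in that subset. The helper and the hyperparameter names are hypothetical.

```python
import itertools


def get_combinations_sketch(key_info, ncomb):
    """Hypothetical sketch of the behaviour described for get_combinations()."""
    all_combinations = []
    # Every subset of ncomb keys...
    for keys in itertools.combinations(key_info, ncomb):
        # ...combined with every choice of one value per key in the subset.
        for values in itertools.product(*(key_info[k] for k in keys)):
            all_combinations.append(dict(zip(keys, values)))
    return all_combinations


key_info = {"optimizer": ["Adam", "RMSprop"], "nodes": [10, 20], "dropout": [0.0]}
combos = get_combinations_sketch(key_info, ncomb=2)
print(len(combos))  # 8, i.e. 2*2 + 2*1 + 2*1
print(combos[0])    # {'optimizer': 'Adam', 'nodes': 10}
```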

validphys.hyper_algorithm.get_slice(dataframe, query_dict)[source]

Return a slice of the dataframe where the given keys match the given values. query_dict must be a dictionary {key1: value1, key2: value2, …}.

# Arguments:

  • dataframe: a pandas dataframe

  • query_dict: a dictionary for a combination as given by get_combinations
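Slicing by a combination dictionary and removing the slices listed in a hit list can be sketched with pandas boolean masks. Both helpers are hypothetical re-implementations that only handle exact matches (binned values would need a range check instead), and the column names are invented.

```python
import pandas as pd


def get_slice_sketch(dataframe, query_dict):
    """Hypothetical sketch: keep the rows where every key equals its value."""
    mask = pd.Series(True, index=dataframe.index)
    for key, value in query_dict.items():
        mask &= dataframe[key] == value
    return dataframe[mask]


def dataframe_removal_sketch(dataframe, hit_list):
    """Hypothetical sketch: drop the rows of every hit["slice"] from the dataframe."""
    to_drop = set()
    for hit in hit_list:
        to_drop.update(hit["slice"].index)
    return dataframe.drop(index=sorted(to_drop))


df = pd.DataFrame(
    {"optimizer": ["Adam", "Adam", "RMSprop"], "nodes": [10, 20, 10], "loss": [1.2, 3.4, 2.0]}
)
sliced = get_slice_sketch(df, {"optimizer": "Adam", "nodes": 10})
remaining = dataframe_removal_sketch(df, [{"slice": sliced}])
print(list(remaining.index))  # [1, 2]
```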

validphys.hyper_algorithm.parse_keys(dataframe, keys)[source]

Receives a dataframe and a set of keys, and looks into the dataframe to read the possible values of the keys.

Returns a dictionary { ‘key’ : [possible values] }.

If the values are not discrete, they need to be binned; this is done for any key with too many numerical values.

# Arguments:
  • dataframe: a pandas dataframe

  • keys: keys to combine

# Returns:
  • key_info: a dictionary with the possible values for each key

validphys.hyper_algorithm.process_slice(df_slice)[source]

Process a slice into a dictionary with useful stats. If the slice is None, the combination does not apply.

# Arguments:
  • df_slice: a slice of a pandas dataframe

# Returns:
  • proc_dict: a dictionary of stats

validphys.hyper_algorithm.study_combination(dataframe, query_dict)[source]

Given a dataframe and a dictionary of {key1 : value1, key2: value2}, returns a dictionary with a number of stats for that combination.

# Arguments:
  • dataframe: a pandas dataframe

  • query_dict: a dictionary for a combination as given by get_combinations

# Returns:
  • proc_dict: a dictionary of the “statistics” for this combination

validphys.hyperoptplot module

Module for the parsing and plotting of the results and output of previous hyperparameter scans

class validphys.hyperoptplot.HyperoptTrial(trial_dict, base_params=None, minimum_losses=1, linked_trials=None)[source]

Bases: object

Hyperopt trial class. Wraps the dictionary-like output of hyperopt into an object that can be easily managed.

Parameters:
  • trial_dict (dict) – one single result (a dictionary) from a tries.json file

  • base_params (dict) – Base parameters of the runcard which can be used to complete the hyperparameter dictionary when not all parameters were scanned

  • minimum_losses (int) – Minimum number of losses to be found in the trial for it to be considered successful

  • linked_trials (list) – List of trials coming from the same file as this trial

get(item, default=None)[source]

Link a list of trials to this trial

property loss

Return the loss of the hyperopt dict

property params

Parameters for the fit

property reward

Return and cache the reward value

property weighted_reward

Return the reward weighted to the mean value of the linked trials

validphys.hyperoptplot.best_setup(hyperopt_dataframe, hyperscan_config, commandline_args)[source]

Generates a clean table with information on the hyperparameter settings of the best setup.

validphys.hyperoptplot.evaluate_trial(trial_dict, validation_multiplier, fail_threshold, loss_target)[source]

Read a trial dictionary, compute the true loss and decide whether the run passes or not.

validphys.hyperoptplot.filter_by_string(filter_string)[source]

Receives a filter string and returns a function that takes a data_dict (a parsed trial) and returns True if the trial passes the filter.

The filter string must have the format key<operator>string, where <operator> can be any of !=, =, >, <.

# Arguments:
  • filter_string: the expression to evaluate

# Returns:
  • filter_function: a function that takes a data_dict and returns True if the condition in filter_string passes
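A minimal sketch of how such a filter string can be turned into a predicate, assuming the key and value are separated by exactly one of the listed operators and that numeric fields are compared as floats. The helper name is hypothetical and the real implementation may parse the string differently.

```python
import operator

# "!=" is checked before "=" so that it is not split on its "=" sign.
_OPERATORS = [("!=", operator.ne), ("=", operator.eq), (">", operator.gt), ("<", operator.lt)]


def filter_by_string_sketch(filter_string):
    """Hypothetical sketch: build a predicate from 'key<operator>value'."""
    for symbol, compare in _OPERATORS:
        if symbol in filter_string:
            key, raw_value = (part.strip() for part in filter_string.split(symbol, 1))
            break
    else:
        raise ValueError(f"no supported operator in {filter_string!r}")

    def filter_function(data_dict):
        actual = data_dict[key]
        # Assumption: numbers are compared numerically, everything else as strings.
        if isinstance(actual, (int, float)) and not isinstance(actual, bool):
            return compare(actual, float(raw_value))
        return compare(str(actual), raw_value)

    return filter_function


is_adam = filter_by_string_sketch("optimizer=Adam")
print(is_adam({"optimizer": "Adam"}))  # True
```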

validphys.hyperoptplot.generate_dictionary(replica_path, loss_target, json_name='tries.json', starting_index=0, val_multiplier=0.5, fail_threshold=10.0)[source]

Reads a json file and returns a list of dictionaries

# Arguments:
  • replica_path: folder in which the tries.json file can be found

  • starting_index: if the trials are to be added to an already existing set, make sure the id has the correct index!

  • val_multiplier: validation multiplier

  • fail_threshold: threshold for the loss to consider a configuration as a failure

validphys.hyperoptplot.hyperopt_dataframe(commandline_args)[source]

Loads the data generated by running hyperopt and stored in json files into a dataframe, and then filters the data according to the selection criteria provided by the command line arguments. It then returns both the entire dataframe as well as a dataframe object with the hyperopt parameters of the best setup.

validphys.hyperoptplot.hyperopt_table(hyperopt_dataframe)[source]

Generates a table containing complete information on all the tested setups that passed the filters set in the commandline arguments.

validphys.hyperoptplot.order_axis(df, bestdf, key)[source]

Helper function for ordering the axis and making sure the best setup is always first.

validphys.hyperoptplot.parse_architecture(trial)[source]

This function parses the family of parameters that concern the architecture of the NN: number_of_layers, activation_per_layer, nodes_per_layer, l1, l2, l3, l4…, max_layers, layer_type, dropout and initializer.

validphys.hyperoptplot.parse_optimizer(trial)[source]

This function parses the parameters that affect the optimization: optimizer and learning_rate (if it exists).

validphys.hyperoptplot.parse_statistics(trial)[source]

Parse the statistical information of the trial: validation loss, testing loss and status of the run.

validphys.hyperoptplot.parse_stopping(trial)[source]

This function parses the parameters that affect the stopping: epochs, stopping_patience, pos_initial and pos_multiplier.

validphys.hyperoptplot.parse_trial(trial)[source]

Trials are convoluted, heavily nested objects. The goal of this function is to separate said branching so that hierarchies can be created.

validphys.hyperoptplot.plot_activation_per_layer(hyperopt_dataframe)[source]

Generates a violin plot of the loss per activation function.

validphys.hyperoptplot.plot_clipnorm(hyperopt_dataframe, optimizer_name)[source]

Generates a scatter plot of the loss as a function of the clipnorm for a given optimizer.

validphys.hyperoptplot.plot_epochs(hyperopt_dataframe)[source]

Generates a scatter plot of the loss as a function of the number of epochs.

validphys.hyperoptplot.plot_initializer(hyperopt_dataframe)[source]

Generates a violin plot of the loss per initializer.

validphys.hyperoptplot.plot_iterations(hyperopt_dataframe)[source]

Generates a scatter plot of the loss as a function of the iteration index.

validphys.hyperoptplot.plot_learning_rate(hyperopt_dataframe, optimizer_name)[source]

Generates a scatter plot of the loss as a function of the learning rate for a given optimizer.

validphys.hyperoptplot.plot_number_of_layers(hyperopt_dataframe)[source]

Generates a violin plot of the loss as a function of the number of layers of the model.

validphys.hyperoptplot.plot_optimizers(hyperopt_dataframe)[source]

Generates a violin plot of the loss per optimizer.

validphys.hyperoptplot.plot_scans(df, best_df, plotting_parameter, include_best=True)[source]

This function performs the plotting and is called by the plot_ functions in this file.

validphys.kinematics module

Provides information on the kinematics involved in the data.

Uses the PLOTTING file specification.

class validphys.kinematics.XQ2Map(experiment, commondata, fitted, masked, group)

Bases: tuple

commondata

Alias for field number 1

experiment

Alias for field number 0

fitted

Alias for field number 2

group

Alias for field number 4

masked

Alias for field number 3

validphys.kinematics.all_commondata_grouping(all_commondata, metadata_group)[source]

Return a table with the grouping specified by metadata_group key for each dataset for all available commondata.

validphys.kinematics.all_kinlimits_table(all_kinlimits, use_kinoverride: bool = True)[source]

Return a table with the kinematic limits for the datasets given as input in dataset_inputs. If the PLOTTING overrides are not used, the information on sqrt(k2) will be displayed.

validphys.kinematics.describe_kinematics(commondata, titlelevel: int = 1)[source]

Output a markdown text describing the stored metadata for a given commondata.

titlelevel can be used to control the header level of the title.

validphys.kinematics.kinematics_table(kinematics_table_notable)[source]

Same as kinematics_table_notable, but also writes the table to file.

validphys.kinematics.kinematics_table_notable(commondata, cuts, show_extra_labels: bool = False)[source]

Table containing the kinematics of a commondata object, indexed by their datapoint id. The kinematics will be transformed as per the PLOTTING file of the dataset or process type, and the column headers will be the labels of the variables defined in the metadata.

If show_extra_labels is True, then the extra labels defined in the PLOTTING files will be displayed. Otherwise only the original three kinematics will be shown.

validphys.kinematics.kinlimits(commondata, cuts, use_cuts, use_kinoverride: bool = True)[source]

Return a mapping containing the number of fitted and used datapoints, as well as the label, minimum and maximum value for each of the three kinematics. If use_kinoverride is set to False, the PLOTTING files will be ignored and the kinematics will be interpreted based on the process type only. If use_cuts is ‘CutsPolicy.NOCUTS’, the information on the total number of points will be displayed, instead of the fitted ones.

validphys.kinematics.total_fitted_points(all_kinlimits_table) int[source]

Print the total number of fitted points in a given set of data

validphys.kinematics.xq2map_with_cuts(commondata, cuts, group_name=None)[source]

Return two (x,Q²) tuples: one for the fitted data and one for the cut data. If display_cuts is false or all data passes the cuts, the second tuple will be empty.

validphys.lhaindex module

Created on Fri Jan 23 12:11:23 2015

@author: zah

validphys.lhaindex.as_from_name(name)[source]

Annoying function needed to read alpha_s from the PDF name, because the value is not reliably available in the info files: what is labelled as(M_z) there is actually as(M_ref).
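A hypothetical sketch of reading alpha_s from a PDF name, assuming names follow the pattern used elsewhere in this documentation (e.g. NNPDF31_nnlo_as_0118, where 0118 stands for 0.118). The helper name and the regular expression are assumptions and will not cover every naming scheme.

```python
import re


def as_from_name_sketch(name):
    """Hypothetical sketch: read alpha_s from names such as 'NNPDF31_nnlo_as_0118'."""
    match = re.search(r"_as_(\d+)", name)
    if match is None:
        return None
    digits = match.group(1)
    # Assumed convention: the first digit is the integer part, e.g. 0118 -> 0.118.
    return float(digits[0] + "." + digits[1:])


print(as_from_name_sketch("NNPDF31_nnlo_as_0118"))  # 0.118
```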

validphys.lhaindex.expand_index_names(globstr)[source]
validphys.lhaindex.expand_local_names(globstr)[source]
validphys.lhaindex.expand_names(globstr)[source]

Return names of installed PDFs. If none is found, return names from the index.

validphys.lhaindex.finddir(name)[source]
validphys.lhaindex.get_collaboration(name)[source]
validphys.lhaindex.get_index_path(folder=None)[source]
validphys.lhaindex.get_indexes_to_names()[source]
validphys.lhaindex.get_lha_datapath()[source]
validphys.lhaindex.get_names_to_indexes()[source]
validphys.lhaindex.get_pdf_indexes(name)[source]

Get index in the amc@nlo format

validphys.lhaindex.get_pdf_name(index)[source]
validphys.lhaindex.infofilename(name)[source]
validphys.lhaindex.isinstalled(name)[source]

Check that name exists in LHAPDF dir

validphys.lhaindex.parse_index(index_file)[source]
validphys.lhaindex.parse_info(name)[source]

validphys.lhapdf_compatibility module

Module for LHAPDF compatibility backends

If LHAPDF is installed, the module will transparently hand over everything to LHAPDF. If LHAPDF is not available, it will try to use a combination of the packages lhapdf-management and pdfflow, which cover all the features of LHAPDF used during the fit (and likely most of validphys).

validphys.lhapdf_compatibility.make_pdf(pdf_name, member=None)[source]

Load a PDF. If member is given, load only that member; otherwise, load the entire set as a list.

If LHAPDF is available, it returns LHAPDF PDF instances; otherwise, it returns an object which is _compatible_ with the LHAPDF functions, using the selected backend.

Parameters:

pdf_name: str

name of the PDF to load

member: int

index of the member of the PDF to load

Returns:

list(pdf_sets)
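The fallback logic described for this module can be sketched with importlib, which checks whether a module is importable without importing it. The helper name, the returned labels and the import name lhapdf_management for the lhapdf-management package are assumptions for illustration, not the module's actual code.

```python
import importlib.util


def detect_backend_sketch():
    """Hypothetical helper: report which backend a compatibility layer would pick."""
    # Prefer LHAPDF itself whenever its Python bindings are importable.
    if importlib.util.find_spec("lhapdf") is not None:
        return "lhapdf"
    # Otherwise both fallback packages are needed (import names are assumed).
    fallback = ("lhapdf_management", "pdfflow")
    if all(importlib.util.find_spec(name) is not None for name in fallback):
        return "lhapdf_management+pdfflow"
    return None


print(detect_backend_sketch())
```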

validphys.lhapdfset module

Module containing an LHAPDF class compatible with validphys using the official lhapdf python interface.

The .members and .central_member of the LHAPDFSet are LHAPDF objects (the typical output from mkPDFs) and can be used normally.

Examples

>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> len(pdf.members)
101
>>> pdf.central_member.alphasQ(91.19)
0.11800
>>> pdf.members[0].xfxQ2(0.5, 15625)
{-5: 6.983360500601136e-05,
-4: 0.0021818063617227604,
-3: 0.00172453472243952,
-2: 0.0010906577230485718,
-1: 0.0022049272225017286,
1: 0.020051104853608722,
2: 0.0954139944889494,
3: 0.004116641378803191,
4: 0.002180124185625795,
5: 6.922722705177504e-05,
21: 0.007604124516892057}
class validphys.lhapdfset.LHAPDFSet(name, error_type)[source]

Bases: object

Wrapper for the lhapdf python interface.

Once instantiated this class will load the PDF set from LHAPDF. If it is a T0 set only the CV will be loaded.

property central_member

Returns a reference to member 0 of the PDF list

property flavors

Returns the list of accepted flavors by the LHAPDF set

grid_values(flavors: ndarray, xgrid: ndarray, qgrid: ndarray)[source]

Returns the PDF values for every member for the required flavours, points in x and points in Q. The return shape is

(members, flavors, xgrid, qgrid)

Return type:

ndarray of shape (members, flavors, xgrid, qgrid)

Examples

>>> import numpy as np
>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> xgrid = np.logspace(-4, -1, 10)
>>> qgrid = np.array([10.0, 50.0, 100.0])
>>> flavs = np.arange(-4,4)
>>> flavs[4] = 21
>>> results = pdf.grid_values(flavs, xgrid, qgrid)
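The (members, flavors, xgrid, qgrid) layout can be sketched with plain callables standing in for the LHAPDF members (real LHAPDF members expose xfxQ(pid, x, Q)). The helper and the toy members are hypothetical and only illustrate how the four axes are filled; the actual method may evaluate the points differently.

```python
import numpy as np


def grid_values_sketch(members, flavors, xgrid, qgrid):
    """Hypothetical sketch: fill a (members, flavors, xgrid, qgrid) array point by point.

    Each member is a callable member(flavor, x, q) -> float.
    """
    out = np.empty((len(members), len(flavors), len(xgrid), len(qgrid)))
    for m, member in enumerate(members):
        for f, flavor in enumerate(flavors):
            for i, x in enumerate(xgrid):
                for j, q in enumerate(qgrid):
                    out[m, f, i, j] = member(flavor, x, q)
    return out


# Two toy members that ignore flavour and Q and return k * x * (1 - x).
toy_members = [lambda fl, x, q, k=k: k * x * (1.0 - x) for k in (1.0, 2.0)]
values = grid_values_sketch(toy_members, [-1, 21, 1], [0.1, 0.5], [10.0, 100.0])
print(values.shape)  # (2, 3, 2, 2)
```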
property is_t0

Check whether we are in t0 mode

property members

Return the members of the set. The special error type t0 returns only member 0.

property n_members

Return the number of active members in the PDF set

xfxQ(x, Q, n, fl)[source]

Return the PDF value for one single point for one single member. If the flavour is not included in the PDF (for instance top/antitop), return 0.0.

validphys.lhio module

A module that reads and writes LHAPDF grids.

validphys.lhio.big_matrix(gridlist)[source]

Return a properly indexed matrix of the differences between each member and the central value.
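The relationship between rep_matrix() and big_matrix() can be sketched with NumPy. The layout below (rows are grid points, column 0 holds the central value, the other columns hold the members) is an assumption made for illustration; the real functions work on the gridlist read from LHAPDF files, whose layout may differ.

```python
import numpy as np

# Hypothetical layout: rows are grid points, column 0 is the central value,
# the remaining columns are the members.
grid_table = np.array([
    [1.0, 1.1, 0.9, 1.0],
    [2.0, 2.2, 1.8, 2.1],
])

central = grid_table[:, :1]       # shape (npoints, 1), broadcasts over members
members = grid_table[:, 1:]       # analogous to rep_matrix()
differences = members - central   # analogous to big_matrix()

print(differences.shape)  # (2, 3)
```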

validphys.lhio.generate_replica0(pdf, kin_grids=None, extra_fields=None)[source]
Generates a replica 0 as an average over an existing set of LHAPDF replicas and outputs it to the PDF’s parent folder.

Parameters:
  • pdf (validphys.core.PDF) – An existing validphys PDF object from which the average replica will be (re-)computed

  • kin_grids – Grids in (x, Q) on which to print replica 0. If None, the grids of the source replicas are used.

validphys.lhio.hessian_from_lincomb(pdf, V, set_name=None, folder=None, extra_fields=None)[source]

Construct a new LHAPDF grid from a linear combination of members

validphys.lhio.load_all_replicas(pdf, db=None)[source]
validphys.lhio.load_replica(pdf, rep, kin_grids=None)[source]
validphys.lhio.new_pdf_from_indexes(pdf, indexes, set_name=None, folder=None, extra_fields=None, installgrid=False, use_rep0grid=False)[source]

Create a new PDF set by selecting replicas from another one.

Parameters:
  • pdf (validphys.core.PDF) – An existing validphys PDF object from which the indexes will be selected.

  • indexes (Iterable[int]) – An iterable with integers corresponding to files in the LHAPDF set. Note that replica 0 will be calculated for you as the mean of the selected replicas.

  • set_name (str) – The name of the new PDF set.

  • folder (str, bytes, os.PathLike) – The path where the LHAPDF set will be written. Must exist.

  • installgrid (bool, optional, default=``False``.) – Whether to copy the grid to the LHAPDF path.

  • use_rep0grid (bool, optional, default=``False``) – Whether to fill the original replica 0 grid when computing replica 0, instead of relying on all grids being the same and averaging the files directly. It is slower and will call LHAPDF to fill the grids, but works for sets where the replicas have different grids.
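Computing replica 0 as the mean of the selected replicas can be sketched on a synthetic array. The values and the 0-based indexes are hypothetical (the real function selects LHAPDF member files); plain averaging like this is only meaningful when all replicas share the same grid, which is the case use_rep0grid is meant to relax.

```python
import numpy as np

# Hypothetical replica values on a shared grid: axis 0 runs over replicas.
all_replicas = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [7.0, 8.0],
])

# Hypothetical selection (0-based rows here, not LHAPDF member numbers).
indexes = [1, 3]
selected = all_replicas[indexes]

# Replica 0 of the new set is the point-by-point mean of the selected replicas.
replica0 = selected.mean(axis=0)
print(replica0)  # [5. 6.]
```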

validphys.lhio.read_all_xqf(f)[source]
validphys.lhio.read_xqf_from_file(f)[source]
validphys.lhio.read_xqf_from_lhapdf(pdf, replica, kin_grids)[source]
validphys.lhio.rep_matrix(gridlist)[source]

Return a properly indexed matrix of all the members.

validphys.lhio.split_sep(f)[source]
validphys.lhio.write_replica(rep, set_root, header, subgrids)[source]

validphys.loader module

Resolve paths to useful objects, and query the existence of different resources within the specified paths.

exception validphys.loader.CfactorNotFound[source]

Bases: LoadFailedError

exception validphys.loader.CompoundNotFound[source]

Bases: LoadFailedError

exception validphys.loader.CutsNotFound[source]

Bases: LoadFailedError

exception validphys.loader.DataNotFoundError[source]

Bases: LoadFailedError

exception validphys.loader.FKTableNotFound[source]

Bases: LoadFailedError

class validphys.loader.FallbackLoader(profile=None)[source]

Bases: Loader, RemoteLoader

A loader that first tries to find resources locally (calling Loader.check_*) and if it fails, it tries to download them (calling RemoteLoader.download_*).

make_checker(resource)[source]
exception validphys.loader.FitNotFound[source]

Bases: LoadFailedError

exception validphys.loader.HyperscanNotFound[source]

Bases: LoadFailedError

exception validphys.loader.InconsistentMetaDataError[source]

Bases: LoaderError

exception validphys.loader.LoadFailedError[source]

Bases: FileNotFoundError, LoaderError

class validphys.loader.Loader(profile=None)[source]

Bases: LoaderBase

Load various resources from the NNPDF data path.

property available_datasets
property available_fits
property available_hyperscans
property available_pdfs
property available_theories

Return a string token for each of the available theories

check_cfactor(theoryID, setname, cfactors)[source]
check_commondata(setname, sysnum=None, use_fitcommondata=False, fit=None)[source]
check_compound(theoryID, setname, cfac)[source]
check_dataset(name, *, rules=None, sysnum=None, theoryid, cfac=(), frac=1, cuts=CutsPolicy.INTERNAL, use_fitcommondata=False, fit=None, weight=1)[source]

Loads a given dataset. If the dataset contains new-type fktables, the pineappl loading function is used; otherwise, it falls back to the legacy loader.

check_default_filter_rules(theoryid, defaults=None)[source]
check_experiment(name: str, datasets: List[DataSetSpec]) DataGroupSpec[source]

Loader method for instantiating DataGroupSpec objects. The NNPDF::Experiment object can then be instantiated using the load method.

Parameters:
  • name (str) – A string denoting the name of the resulting DataGroupSpec object.

  • datasets (List[DataSetSpec]) – A list of DataSetSpec objects pre-created by the user. Note that these too will be loaded by Loader.

Return type:

DataGroupSpec

Example

>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset("NMC", theoryid=53)  # internal cuts are the default
>>> exp = l.check_experiment("My DataGroupSpec Name", [ds])
check_fit(fitname)[source]
check_fit_cuts(commondata, fit)[source]
check_fktable(theoryID, setname, cfac)[source]
check_fkyaml(name, theoryID, cfac)[source]

Load a pineappl fktable. Receives a yaml file describing the fktables necessary for a given observable, the theory ID and the corresponding cfactors. The cfactors should correspond directly to the fktables; the “compound folder” is not supported for pineappl theories. As such, the name of the cfactor is expected to be

CF_{cfactor_name}_{fktable_name}
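The naming convention can be written as a one-line helper. The function name and the fktable names below are hypothetical; only the CF_{cfactor_name}_{fktable_name} pattern comes from the text above.

```python
def expected_cfactor_names(cfactor_name, fktable_names):
    """Hypothetical helper: one cfactor name per fktable, CF_{cfactor_name}_{fktable_name}."""
    return [f"CF_{cfactor_name}_{fk}" for fk in fktable_names]


names = expected_cfactor_names("QCD", ["FKTABLE_A", "FKTABLE_B"])
print(names)  # ['CF_QCD_FKTABLE_A', 'CF_QCD_FKTABLE_B']
```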

check_hyperscan(hyperscan_name)[source]

Obtain a hyperscan run

check_integset(theoryID, setname, postlambda)[source]

Load an integrability dataset

check_internal_cuts(commondata, rules)[source]
check_pdf(name)[source]
check_posset(theoryID, setname, postlambda)[source]

Load a positivity dataset

check_theoryID(theoryID)[source]
check_vp_output_file(filename, extra_paths=('.',))[source]

Find a file in the vp-cache folder, or (with higher priority) in the extra_paths.

property commondata_folder
get_commondata(setname, sysnum)[source]

Get a Commondata from the set name and number.

get_fktable(theoryID, setname, cfac)[source]
get_pdf(name)[source]
get_posset(theoryID, setname, postlambda)[source]
property theorydb_file

Check that the theory db file exists and return the path to it.

class validphys.loader.LoaderBase(profile=None)[source]

Bases: object

Base class for the NNPDF loader. It can take as input a profile dictionary from which all data can be read. It is possible to override the datapath and resultpath when the class is instantiated.

property hyperscan_resultpath
exception validphys.loader.LoaderError[source]

Bases: Exception

exception validphys.loader.PDFNotFound[source]

Bases: LoadFailedError

exception validphys.loader.ProfileNotFound[source]

Bases: LoadFailedError

class validphys.loader.RemoteLoader(profile=None)[source]

Bases: LoaderBase

download_fit(fitname)[source]
download_hyperscan(hyperscan_name)[source]

Download a hyperscan run from the remote server. The run is downloaded to the results folder.

download_pdf(name)[source]
download_theoryID(thid)[source]
download_vp_output_file(filename, **kwargs)[source]
property downloadable_fits
property downloadable_hyperscans
property downloadable_pdfs
property downloadable_theories
property fit_index
property fit_urls
property hyperscan_index
property hyperscan_url
property lhapdf_pdfs
property lhapdf_urls
property nnpdf_pdfs
property nnpdf_pdfs_index
property nnpdf_pdfs_urls
remote_files(urls, index, thing='files')[source]
property remote_fits
property remote_hyperscans
property remote_keywords
property remote_nnpdf_pdfs
property remote_theories
property theory_index
property theory_urls
exception validphys.loader.RemoteLoaderError[source]

Bases: LoaderError

exception validphys.loader.SysNotFoundError[source]

Bases: LoadFailedError

exception validphys.loader.TheoryDataBaseNotFound[source]

Bases: LoadFailedError

exception validphys.loader.TheoryNotFound[source]

Bases: LoadFailedError

validphys.loader.download_and_extract(url, local_path)[source]

Download a compressed archive and then extract it to the given path

validphys.loader.download_file(url, stream_or_path, make_parents=False)[source]

Download a file and show a progress bar if the INFO log level is enabled. If make_parents is True and stream_or_path is path-like, all the parent folders will be created.

validphys.loader.rebuild_commondata_without_cuts(filename_with_cuts, cuts, datapath_filename, newpath)[source]

Take a CommonData file that is stored with the cuts applied and write another file with no cuts. The points that were not present in the original file have the same kinematics as the file in datapath_filename, which must correspond to the original CommonData file which does not have the cuts applied. However, to avoid confusion, the values and uncertainties are all set to zero. The new file is written to newpath.

validphys.mc2hessian module

mc2hessian.py

This module contains the functionality to compute a reduced set using the mc2hessian algorithm (see section 2.1 of 1602.00005).

validphys.mc2hessian.gridname(pdf, Neig, mc2hname: (str, NoneType) = None)[source]

If no custom mc2hname is specified, the name of the Hessian PDF is automatically generated.

validphys.mc2hessian.mc2hessian(pdf, Q, Neig: int, mc2hessian_xgrid, output_path, gridname, installgrid: bool = False)[source]

Produces a Hessian PDF by transforming a Monte Carlo PDF set.

Parameters:
  • pdf (validphys.core.PDF) – An existing validphys PDF object which will be converted into a Hessian PDF set

  • Q (float) – Energy scale at which the Monte Carlo PDF is sampled

  • Neig (int) – Number of basis eigenvectors in the Hessian PDF set

  • mc2hessian_xgrid (numpy.ndarray) – The points in x at which to sample the Monte Carlo PDF set

  • output_path – The validphys output path where the PDF will be written

  • gridname (str) – Name of the Hessian PDF set

  • installgrid (bool, optional, default=``False``) – Whether to copy the Hessian grid to the LHAPDF path

validphys.mc2hessian.mc2hessian_xgrid(xmin: float = 1e-05, xminlin: float = 0.1, xmax: Real = 1, nplog: int = 50, nplin: int = 50)[source]

Provides the points in x to sample the PDF. logspace and linspace will be called with the respective parameters.

Generates a grid with nplog logarithmically spaced points between xmin and xminlin followed by nplin linearly spaced points between xminlin and xmax
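A minimal NumPy sketch of the grid described above. It assumes that neither xminlin (at the end of the logarithmic part) nor xmax is included as an endpoint, which avoids a duplicated point at xminlin; whether the actual function includes them is not stated here, so treat the endpoint choice and the helper name as assumptions.

```python
import numpy as np


def mc2hessian_xgrid_sketch(xmin=1e-5, xminlin=0.1, xmax=1.0, nplog=50, nplin=50):
    """Hypothetical sketch: nplog log-spaced points, then nplin linearly spaced points."""
    # Logarithmic part from xmin up to (but excluding) xminlin.
    log_part = np.logspace(np.log10(xmin), np.log10(xminlin), num=nplog, endpoint=False)
    # Linear part from xminlin up to (but excluding) xmax.
    lin_part = np.linspace(xminlin, xmax, num=nplin, endpoint=False)
    return np.concatenate([log_part, lin_part])


xgrid = mc2hessian_xgrid_sketch()
print(len(xgrid))  # 100
```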

validphys.mc_gen module

mc_gen.py

Tools to check the pseudo-data MC generation.

validphys.mc_gen.art_data_comparison(art_rep_generation, nreplica: int)[source]

Plot, per datapoint, the distribution of replica values.

validphys.mc_gen.art_data_distribution(art_rep_generation, title='Artificial Data Distribution', color='green')[source]

Plot of the distribution of pseudodata.

validphys.mc_gen.art_data_mean_table(art_rep_generation, groups_data)[source]

Generate a table of the artificial data mean values.

validphys.mc_gen.art_data_moments(art_rep_generation, color='green')[source]

Returns the moments of the distributions per data point, as a histogram.

validphys.mc_gen.art_data_residuals(art_rep_generation, color='green')[source]

Plot the residuals distribution of pseudodata compared to experiment.

validphys.mc_gen.art_rep_generation(groups_data, make_replicas)[source]

Generates the nreplica pseudodata replicas

validphys.mc_gen.one_art_data_residuals(groups_data, indexed_make_replicas)[source]

Residuals plot for the first datapoint.

validphys.n3fit_data module

n3fit_data.py

Providers which prepare the data ready for n3fit.performfit.performfit().

validphys.n3fit_data.fittable_datasets_masked(data, tr_masks)[source]

Generate a list of validphys.n3fit_data_utils.FittableDataSet from a group of datasets and the corresponding training/validation masks.

validphys.n3fit_data.fitting_data_dict(data, make_replica, dataset_inputs_loaded_cd_with_cuts, dataset_inputs_fitting_covmat, tr_masks, kfold_masks, fittable_datasets_masked, diagonal_basis=None)[source]

Provider which takes the information from validphys data.

Returns:

all_dict_out – A dictionary containing all the information of the experiment/dataset for training, validation and experimental data, with the following keys:

‘datasets’

list of dictionaries for each of the datasets contained in data

‘name’

name of the data - typically experiment/group name

‘expdata_true’

non-replica data

‘covmat’

full covmat

‘invcovmat_true’

inverse of the covmat (non-replica)

‘trmask’

mask for the training data

‘invcovmat’

inverse of the covmat for the training data

‘ndata’

number of datapoints for the training data

‘expdata’

experimental data (replica’d) for training

‘vlmask’

(same as above for validation)

‘invcovmat_vl’

(same as above for validation)

‘ndata_vl’

(same as above for validation)

‘expdata_vl’

(same as above for validation)

‘positivity’

bool - is this a positivity set?

‘count_chi2’

whether this should be counted towards the chi2

Return type:

dict
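The relation between the full covmat, the masks and the training/validation entries can be sketched on a toy example. The assumption here is that the training inverse covariance matrix is the inverse of the covmat restricted to the training points (and likewise for validation); the data, covmat and mask values are invented.

```python
import numpy as np

# Hypothetical 4-point dataset with a covariance matrix and a training mask.
expdata = np.array([1.0, 2.0, 3.0, 4.0])
covmat = np.diag([0.1, 0.2, 0.3, 0.4])
trmask = np.array([True, False, True, True])
vlmask = ~trmask

# Training entries: restrict data and covmat to the mask, then invert the block
# (assumed construction, mirroring the 'invcovmat' and 'ndata' keys).
invcovmat = np.linalg.inv(covmat[np.ix_(trmask, trmask)])
ndata = int(trmask.sum())
expdata_tr = expdata[trmask]

# Validation entries, mirroring 'invcovmat_vl' and 'ndata_vl'.
invcovmat_vl = np.linalg.inv(covmat[np.ix_(vlmask, vlmask)])
ndata_vl = int(vlmask.sum())

print(ndata, ndata_vl)  # 3 1
```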

validphys.n3fit_data.integdatasets_fitting_integ_dict(integdatasets=None)[source]

Loads the integrability datasets. Calls the same function as fitting_pos_dict(), except on each element of integdatasets if integdatasets is not None.

Parameters:

integdatasets (list[validphys.core.IntegrabilitySetSpec]) – list containing the settings for the integrability sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.

Examples

>>> from validphys.api import API
>>> integdatasets = [{"dataset": "INTEGXT3", "maxlambda": 1e2}]
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=integdatasets, theoryid=53)
>>> len(res), len(res[0])
(1, 9)
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=None)
>>> print(res)
None
validphys.n3fit_data.kfold_masks(kpartitions, data)[source]

Collect the masks (if any) due to kfolding for this data. These will be applied to the experimental data before starting the training of each fold.

Parameters:
  • kpartitions (list[dict]) – list of partitions, each partition dictionary with key-value pair datasets and a list containing the names of all datasets in that partition. See n3fit/runcards/Basic_hyperopt.yml for an example runcard or the hyperopt documentation for an expanded discussion on k-fold partitions.

  • data (validphys.core.DataGroupSpec) – full list of data which is to be partitioned.

Returns:

kfold_masks – A list containing one boolean array per partition. Each array is 1-D, with length equal to the number of datapoints in data after cuts. If a dataset is included in a particular fold, the mask is True for the elements corresponding to that dataset, so that data.load().get_cv()[kfold_masks[i]] returns the datapoints in the i-th partition. See the example below.

Return type:

list[np.array]

Examples

>>> from validphys.api import API
>>> partitions=[
...     {"datasets": ["HERACOMBCCEM", "HERACOMBNCEP460", "NMC", "NTVNBDMNFe"]},
...     {"datasets": ["HERACOMBCCEP", "HERACOMBNCEP575", "NMCPD", "NTVNUDMNFe"]}
... ]
>>> ds_inputs = [{"dataset": ds} for part in partitions for ds in part["datasets"]]
>>> kfold_masks = API.kfold_masks(dataset_inputs=ds_inputs, kpartitions=partitions, theoryid=53, use_cuts="nocuts")
>>> len(kfold_masks) # one element for each partition
2
>>> kfold_masks[0] # mask which splits data into first partition
array([False, False, False, ...,  True,  True,  True])
>>> data = API.data(dataset_inputs=ds_inputs, theoryid=53, use_cuts="nocuts")
>>> fold_data = data.load().get_cv()[kfold_masks[0]]
>>> len(fold_data)
604
>>> kfold_masks[0].sum()
604
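The mask construction described above can be sketched with NumPy. This assumes that the datapoints in data are ordered dataset by dataset, in the same order as the `dataset_ndata` mapping; `kfold_masks_sketch` and the dataset names A, B, C are hypothetical.

```python
import numpy as np


def kfold_masks_sketch(kpartitions, dataset_ndata):
    """Hypothetical sketch of per-partition masks over concatenated data.

    kpartitions: list of {"datasets": [names]} dictionaries, as in the runcard.
    dataset_ndata: dict mapping dataset name -> number of (cut) datapoints,
        in the same order in which the datasets are concatenated in the data.
    """
    masks = []
    for partition in kpartitions:
        in_fold = set(partition["datasets"])
        # One block of True (in this fold) or False per dataset, in data order.
        blocks = [np.full(n, name in in_fold) for name, n in dataset_ndata.items()]
        masks.append(np.concatenate(blocks))
    return masks


partitions = [{"datasets": ["A", "C"]}, {"datasets": ["B"]}]
ndata = {"A": 2, "B": 1, "C": 3}
masks = kfold_masks_sketch(partitions, ndata)
print(masks[0])  # [ True  True False  True  True  True]
cv = np.arange(6.0)  # stand-in for the central values of the full data
print(cv[masks[1]])  # [2.]
```

As in the real action, each mask has one entry per datapoint, so selecting with it keeps exactly the points of the datasets in that fold.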
validphys.n3fit_data.posdatasets_fitting_pos_dict(posdatasets=None)[source]

Loads all positivity datasets. The list of positivity datasets must not be empty.

Parameters:

posdatasets (list[validphys.core.PositivitySetSpec]) – list containing the settings for the positivity sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.

validphys.n3fit_data.pseudodata_table(groups_replicas_indexed_make_replica, replicas)[source]

Creates a pandas DataFrame containing the generated pseudodata. The index is validphys.results.experiments_index() and the columns are the replica numbers.

Notes

Whilst running n3fit, this action is only called if fitting::savepseudodata is true (the default setting) and replicas are fitted one at a time. The table can be found in the replica folder, i.e., <fit dir>/nnfit/replica_*/.

validphys.n3fit_data.replica_luxseed(replica, luxseed)[source]

Generates the luxseed for a replica. Identical to replica_nnseed but used for a different purpose.

validphys.n3fit_data.replica_mcseed(replica, mcseed, genrep)[source]

Generates the mcseed for a replica.

validphys.n3fit_data.replica_nnseed(replica, nnseed)[source]

Generates the nnseed for a replica.

validphys.n3fit_data.replica_nnseed_fitting_data_dict(replica, exps_fitting_data_dict, replica_nnseed)[source]

For a single replica, returns a tuple of the inputs to this function. Used with collect over replicas to avoid having to perform multiple collects.

See also

replicas_nnseed_fitting_data_dict – this action collected over replicas

validphys.n3fit_data.replica_training_mask(exps_tr_masks, replica, experiments_index)[source]

Save the boolean mask used to split data into training and validation for a given replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.

Parameters:
  • exps_tr_masks (list[list[np.array]]) – Result of tr_masks() collected over experiments, which creates the nested structure. The outer list has length len(group_dataset_inputs_by_experiment), and each inner list has one array for each dataset in that particular experiment, as defined by the metadata. The arrays should be 1-D boolean arrays which can be used as masks.

  • replica (int) – The index of the replica.

  • experiments_index (pd.MultiIndex) – Index returned by validphys.results.experiments_index().

Example

>>> from validphys.api import API
>>> ds_inp = [
...     {'dataset': 'NMC', 'frac': 0.75},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD'], 'frac': 0.75},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10, 'frac': 0.75}
... ]
>>> API.replica_training_mask(dataset_inputs=ds_inp, replica=1, trvlseed=123, theoryid=162, use_cuts="nocuts", mcseed=None, genrep=False)
                     replica 1
group dataset    id
NMC   NMC        0        True
                1        True
                2       False
                3        True
                4        True
...                        ...
CMS   CMSZDIFF12 45       True
                46       True
                47       True
                48      False
                49       True

[345 rows x 1 columns]
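The layout of the table above (a MultiIndex with levels group, dataset and id, and a single replica column) can be reproduced with pandas. This is a sketch under the assumption that the nested masks are flattened in group, dataset, point order; `replica_mask_frame` is hypothetical and builds its own index instead of calling validphys.results.experiments_index().

```python
import numpy as np
import pandas as pd


def replica_mask_frame(masks_by_group, replica):
    """Hypothetical sketch: flatten nested per-dataset masks into one column.

    masks_by_group: dict group -> dict dataset -> 1-D boolean array.
    """
    index_tuples = []
    values = []
    for group, datasets in masks_by_group.items():
        for dataset, mask in datasets.items():
            for point_id, flag in enumerate(mask):
                index_tuples.append((group, dataset, point_id))
                values.append(bool(flag))
    index = pd.MultiIndex.from_tuples(index_tuples, names=["group", "dataset", "id"])
    return pd.DataFrame({f"replica {replica}": values}, index=index)


masks = {
    "NMC": {"NMC": np.array([True, True, False])},
    "CMS": {"CMSZDIFF12": np.array([True, False])},
}
df = replica_mask_frame(masks, replica=1)
print(df.shape)  # (5, 1)
print(int(df["replica 1"].sum()))  # 3 training points
```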

validphys.n3fit_data.replica_training_mask_table(replica_training_mask)[source]

Same as replica_training_mask but with a table decorator.

validphys.n3fit_data.replica_trvlseed(replica, trvlseed, same_trvl_per_replica=False)[source]

Generates the trvlseed for a replica.
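One way to derive a reproducible per-replica seed from a single runcard seed, including a switch analogous to same_trvl_per_replica, is sketched below. This is an illustrative scheme using NumPy's Generator, not necessarily the one validphys uses; `replica_seed_sketch` is hypothetical, and replicas are assumed to be 1-indexed as in the examples in this section.

```python
import numpy as np


def replica_seed_sketch(master_seed, replica, same_per_replica=False):
    """Hypothetical sketch: derive a reproducible seed for a 1-indexed replica.

    The generator is re-seeded from master_seed on every call, so the seed for
    a given replica does not depend on which other replicas are being run.
    """
    if replica < 1:
        raise ValueError("replicas are 1-indexed")
    rng = np.random.default_rng(master_seed)
    draws = rng.integers(0, 2**31, size=replica)
    if same_per_replica:
        # Every replica reuses the first draw, i.e. the same seed.
        return int(draws[0])
    # Otherwise replica r takes the r-th draw of the sequence.
    return int(draws[replica - 1])


print(replica_seed_sketch(123, 3) == replica_seed_sketch(123, 3))  # True
print(replica_seed_sketch(123, 1, True) == replica_seed_sketch(123, 5, True))  # True
```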

validphys.n3fit_data.tr_masks(data, replica_trvlseed, parallel_models=False, replica=1, replicas=(1,))[source]

Generates the boolean masks used to split data into training and validation points. Returns a list of 1-D boolean arrays, one for each dataset. Each array has length N_data, the number of datapoints in that dataset; the datapoints to be included in the training are True, such that

tr_data = data[tr_mask]
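A training/validation split of this kind can be sketched with a seeded NumPy permutation. The sketch assumes that each dataset keeps int(frac * N_data) training points, with frac being the per-dataset training fraction that appears as 'frac' in the dataset_inputs examples; `tr_masks_sketch` is hypothetical and does not reproduce the exact random stream of validphys.

```python
import numpy as np


def tr_masks_sketch(dataset_sizes, fracs, seed):
    """Hypothetical sketch: one shuffled training mask per dataset.

    dataset_sizes: number of datapoints N_data of each dataset.
    fracs: training fraction of each dataset (e.g. 0.75).
    """
    rng = np.random.default_rng(seed)
    masks = []
    for ndata, frac in zip(dataset_sizes, fracs):
        ntrain = int(frac * ndata)  # number of points kept for training
        mask = np.zeros(ndata, dtype=bool)
        mask[:ntrain] = True
        masks.append(rng.permutation(mask))  # randomly choose which points train
    return masks


data = [np.arange(8.0), np.arange(4.0)]
masks = tr_masks_sketch([len(d) for d in data], [0.75, 0.5], seed=123)
tr_data = [d[m] for d, m in zip(data, masks)]  # tr_data = data[tr_mask], per dataset
print([len(t) for t in tr_data])  # [6, 2]
```

Because the permutation only reorders the True/False entries, the number of training points per dataset is fixed by frac, while the seed decides which points they are.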

validphys.n3fit_data.training_mask(replicas_training_mask)[source]

Save the boolean mask used to split data into training and validation for each replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.

Parameters:

replicas_training_mask (list[pd.DataFrame]) – Result of replica_training_mask() collected over replicas, one DataFrame per replica.

Example

>>> from validphys.api import API
>>> from reportengine.namespaces import NSList
>>> # create namespace list for collects over replicas.
>>> reps = NSList(list(range(1, 4)), nskey="replica")
>>> ds_inp = [
...     {'dataset': 'NMC', 'frac': 0.75},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD'], 'frac': 0.75},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10, 'frac': 0.75}
... ]
>>> API.training_mask(dataset_inputs=ds_inp, replicas=reps, trvlseed=123, theoryid=162, use_cuts="nocuts", mcseed=None, genrep=False)
                    replica 1  replica 2  replica 3
group dataset    id
NMC   NMC        0        True      False      False
                1        True       True       True
                2       False       True       True
                3        True       True      False
                4        True       True       True
...                        ...        ...        ...
CMS   CMSZDIFF12 45       True       True       True
                46       True      False       True
                47       True       True       True
                48      False       True       True
                49       True       True       True

[345 rows x 3 columns]
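Assuming the per-replica tables share the same index, as in the example above, joining them side by side reproduces the layout of the multi-replica table. `combine_replica_masks` is a hypothetical helper built on pandas.concat, and the three-row index is made up for illustration.

```python
import pandas as pd


def combine_replica_masks(replica_frames):
    """Hypothetical sketch: join single-replica mask columns side by side."""
    return pd.concat(replica_frames, axis=1)


index = pd.MultiIndex.from_tuples(
    [("NMC", "NMC", 0), ("NMC", "NMC", 1), ("NMC", "NMC", 2)],
    names=["group", "dataset", "id"],
)
frames = [
    pd.DataFrame({"replica 1": [True, True, False]}, index=index),
    pd.DataFrame({"replica 2": [False, True, True]}, index=index),
    pd.DataFrame({"replica 3": [False, True, True]}, index=index),
]
table = combine_replica_masks(frames)
print(table.shape)  # (3, 3)
print(table.sum().tolist())  # [2, 2, 2] training points per replica
print(table.all(axis=1).tolist())  # [False, True, False] points in training for every replica
```

Such a table makes it easy to check, for instance, which datapoints were used for training in every replica.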