validphys package

Subpackages

Submodules

validphys.api module

api.py

This module contains the reportengine programmatic API, initialized with the validphys providers, Config and Environment.

Example:

Simple Usage:

>>> from validphys.api import API
>>> fig = API.plot_pdfs(pdf="NNPDF_nlo_as_0118", Q=100)
>>> fig.show()

validphys.app module

app.py

Mainloop of the validphys application. Here we define tailored extensions to the reportengine application (such as extra command line flags). Additionally, the provider modules that serve as sources for the validphys actions are declared here.

The entry point of the validphys application is the main function of this module.

class validphys.app.App(name='validphys', providers=['validphys.results', 'validphys.commondata', 'validphys.pdfgrids', 'validphys.pdfplots', 'validphys.dataplots', 'validphys.fitdata', 'validphys.arclength', 'validphys.sumrules', 'validphys.reweighting', 'validphys.kinematics', 'validphys.correlations', 'validphys.eff_exponents', 'validphys.asy_exponents', 'validphys.theorycovariance.construction', 'validphys.theorycovariance.output', 'validphys.theorycovariance.tests', 'validphys.replica_selector', 'validphys.closuretest', 'validphys.mc_gen', 'validphys.theoryinfo', 'validphys.pseudodata', 'validphys.renametools', 'validphys.covmats', 'validphys.hyperoptplot', 'validphys.deltachi2', 'validphys.n3fit_data', 'validphys.mc2hessian', 'reportengine.report', 'validphys.overfit_metric', 'validphys.hessian2mc'])[source]

Bases: App

property argparser

config_class: alias of Config

critical_message = 'A critical error occurred. This is likely due to one of the following reasons:\n\n - A bug in validphys.\n - Corruption of the provided resources (e.g. incorrect plotting files).\n - Cosmic rays hitting your CPU and altering the registers.\n\nThe traceback above should help determine the cause of the problem. If you\nbelieve this is a bug in validphys (please discard the cosmic rays first),\nplease open an issue on GitHub<https://github.com/NNPDF/nnpdf/issues>,\nincluding the contents of the following file:\n\n%s\n'

property default_style

environment_class: alias of Environment

init()[source]

run()[source]: TODO

static upload_context(do_upload, output)[source]: If do_upload is False, do nothing. Otherwise, on enter, check the requirements for uploading and, on exit, upload the output path. Raise SystemExit on error.
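The enter and exit behaviour of upload_context can be pictured with a small context manager. This is a sketch only, not the validphys implementation: check_requirements and upload are hypothetical callables standing in for whatever the real uploader does, and they are passed in explicitly so the example runs on its own.

```python
import contextlib


@contextlib.contextmanager
def upload_context(do_upload, output, check_requirements, upload):
    """Sketch: check on enter and upload `output` on exit, only if do_upload."""
    if not do_upload:
        # Nothing to check and nothing to upload.
        yield
        return
    try:
        check_requirements()  # hypothetical pre-flight check
    except Exception as e:
        # Mirror the documented behaviour: errors end the program.
        raise SystemExit(f"Cannot upload: {e}") from e
    yield
    upload(output)  # hypothetical upload step, reached only on a clean exit


calls = []
with upload_context(True, "out/", lambda: calls.append("check"), calls.append):
    calls.append("work")
```

In this sketch the upload step is skipped if the body of the with block raises, which is a design choice of the example rather than a documented property.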

validphys.app.main()[source]

validphys.arclength module

arclength.py

Module for the computation and presentation of arclengths.

class validphys.arclength.ArcLengthGrid(pdf, basis, flavours, stats)

Bases: tuple

basis: Alias for field number 1

flavours: Alias for field number 2

pdf: Alias for field number 0

stats: Alias for field number 3

validphys.arclength.arc_length_table(arc_lengths)[source]: Return a table with the descriptive statistics of the arc lengths over members of the PDF.

validphys.arclength.arc_lengths(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Compute arc lengths at scale Q

Set up a grid with three segments and compute the arc length for each segment. Note: the variation of the PDF over the grid is obtained from the forward differences between adjacent grid points.

Parameters

pdf (validphys.core.PDF object) –
Q (float) – scale at which to evaluate PDF
basis (default = "flavour") –
flavours (default = None) –

Returns

validphys.arclength.ArcLengthGrid object
object that contains the PDF, basis, flavours, and computed
arc length statistics.
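As an illustration of the forward-difference idea, the following pure-Python sketch approximates the arc length of a curve on a grid split into three segments. It is not the validphys implementation: the segment edges, the number of points, and the helper names arc_length and segmented_arc_length are assumptions made for the example.

```python
import math


def arc_length(xs, ys):
    """Approximate the arc length of the curve (xs, ys) from forward differences."""
    total = 0.0
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]  # forward difference in x
        dy = ys[i + 1] - ys[i]  # forward difference in the function
        total += math.hypot(dx, dy)
    return total


def segmented_arc_length(f, edges, npoints=100):
    """Sum the arc length of f over consecutive segments [edges[i], edges[i+1]]."""
    total = 0.0
    for lo, hi in zip(edges, edges[1:]):
        xs = [lo + (hi - lo) * k / npoints for k in range(npoints + 1)]
        total += arc_length(xs, [f(x) for x in xs])
    return total


# Toy check: the straight line y = x on [0, 1] has length sqrt(2).
length = segmented_arc_length(lambda x: x, [0.0, 0.1, 0.5, 1.0])
```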

validphys.arclength.integrability_number(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'evolution', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]: Return sum_i |x_i*f(x_i)|, x_i = {1e-9, 1e-8, 1e-7} for selected flavours
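The sum in integrability_number is simple enough to write out directly. The sketch below takes a plain callable for x*f(x) instead of a validphys PDF object, which is an assumption made to keep the example self-contained; the toy distribution is likewise hypothetical.

```python
def integrability_number(xf, xs=(1e-9, 1e-8, 1e-7)):
    """Return sum_i |x_i * f(x_i)| given a callable xf(x) = x * f(x)."""
    return sum(abs(xf(x)) for x in xs)


# Toy distribution with x*f(x) = x**0.5, which vanishes as x -> 0.
value = integrability_number(lambda x: x**0.5)
```

A small result indicates that x*f(x) tends to zero at small x for the chosen flavour.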

validphys.arclength.plot_arc_lengths(pdfs_arc_lengths: ~collections.abc.Sequence, Q: ~numbers.Real, normalize_to: (<class 'NoneType'>, <class 'int'>) = None)[source]: Plot the arc lengths of provided pdfs

validphys.asy_exponents module

Tools for computing and plotting asymptotic exponents.

class validphys.asy_exponents.AsyExponentBandPlotter(exponent, *args, **kwargs)[source]

Bases: BandPDFPlotter

Class inheriting from BandPDFPlotter, changing title and ylabel to reflect the asymptotic exponent being plotted.

get_title(parton_name)[source]

get_ylabel(parton_name)[source]

validphys.asy_exponents.alpha_asy(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Returns a list of xplotting_grids containing the value of the asymptotic exponent alpha, as defined by the first relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.
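To make the role of npoints concrete, here is a sketch of a finite-difference log-derivative on a log-spaced grid. It assumes that the exponent alpha is defined as d ln|x f| / d ln x; check this against Eq. (4) of the cited paper before relying on it. The function name alpha_asy_sketch and the toy input are hypothetical, and the code works on a plain callable, not on validphys objects.

```python
import math


def alpha_asy_sketch(xf, xmin=1e-6, xmax=1e-3, npoints=100):
    """Finite-difference estimate of d ln|x f| / d ln x on a log-spaced grid.

    Returns a list of (x, alpha) pairs, one per sub-interval.
    """
    log_lo, log_hi = math.log(xmin), math.log(xmax)
    step = (log_hi - log_lo) / npoints
    out = []
    for k in range(npoints):
        x0 = math.exp(log_lo + k * step)
        x1 = math.exp(log_lo + (k + 1) * step)
        slope = (math.log(abs(xf(x1))) - math.log(abs(xf(x0)))) / step
        out.append((x0, slope))
    return out


# Toy input: x*f(x) = 2 * x**0.3, whose log-derivative is 0.3 everywhere.
alphas = alpha_asy_sketch(lambda x: 2.0 * x**0.3)
```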

validphys.asy_exponents.asymptotic_exponents_table(pdf: ~validphys.core.PDF, *, x_alpha: ~numbers.Real = 1e-06, x_beta: ~numbers.Real = 0.9, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None, npoints=100)[source]

Returns a table with the values of the asymptotic exponents alpha and beta, as defined in Eq. (4) of [arXiv:1604.00024], at the specified value of x and Q.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.

validphys.asy_exponents.beta_asy(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Returns a list of xplotting_grids containing the value of the asymptotic exponent beta, as defined by the second relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.

validphys.asy_exponents.plot_alpha_asy(pdfs, alpha_asy_pdfs, pdfs_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]: Plots the alpha asymptotic exponent

validphys.asy_exponents.plot_beta_asy(pdfs, beta_asy_pdfs, pdfs_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]: Plots the beta asymptotic exponent

validphys.calcutils module

calcutils.py

Low level utilities to calculate χ² and such. These are used to implement the higher level functions in results.py

validphys.calcutils.all_chi2(results)[source]: Return the chi² for all elements in the result, regardless of the Stats class. Note that the interpretation of the result will depend on the PDF error type.

validphys.calcutils.all_chi2_theory(results, totcov)[source]: Like all_chi2 but here the chi² are calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.

validphys.calcutils.bootstrap_values(data, nresamples, *, boot_seed: int = None, apply_func: Callable = None, args=None)[source]

General bootstrap sample

data is the data to be sampled. Replicas are assumed to be on the final axis, e.g. N_bins*N_replicas.

boot_seed can be specified if the user wishes to take exactly the same bootstrap samples multiple times. By default it is None, in which case a random seed is used.

If only data and nresamples are provided, then bootstrap_values creates nresamples resamples of the data, where each resample is a Monte Carlo selection of the data across replicas. The mean of each resample is returned.

Alternatively, the user can specify a function to be sampled apply_func plus any additional arguments required by that function. bootstrap_values then returns apply_func(bootstrap_data, *args) where bootstrap_data.shape = (data.shape, nresamples). It is critical that apply_func can handle data input in this format.
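The default behaviour (resample across replicas, then take the mean) can be sketched without any dependencies. This is an illustration under assumptions, not the validphys code: it uses nested lists in place of arrays, the standard-library random module in place of whatever generator validphys uses, and the name bootstrap_means is hypothetical.

```python
import random
from statistics import fmean


def bootstrap_means(data, nresamples, boot_seed=None):
    """Resample replicas (the last axis) with replacement; return per-bin means.

    data: list of rows, one per bin, each holding N_replicas values.
    Returns a list of rows, one per bin, each holding nresamples means.
    """
    rng = random.Random(boot_seed)  # boot_seed=None gives a random seed
    nrep = len(data[0])
    out = [[] for _ in data]
    for _ in range(nresamples):
        # Use the same replica indices for every bin within one resample.
        idx = [rng.randrange(nrep) for _ in range(nrep)]
        for row, acc in zip(data, out):
            acc.append(fmean(row[i] for i in idx))
    return out


data = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
first = bootstrap_means(data, nresamples=5, boot_seed=7)
second = bootstrap_means(data, nresamples=5, boot_seed=7)
```

Fixing boot_seed makes the two calls return identical resamples, which is the use case described above.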

validphys.calcutils.calc_chi2(sqrtcov, diffs)[source]

Elementary function to compute the chi², given a Cholesky decomposed lower triangular part and a vector of differences.

Parameters

sqrtcov (matrix) – A lower triangular matrix corresponding to the lower part of the Cholesky decomposition of the covariance matrix.
diffs (array) – A vector of differences (e.g. between data and theory). The first dimension must match the shape of sqrtcov. The computation will be broadcast over the other dimensions.

Returns

chi2 – The result of the χ² for each vector of differences. Will have the same shape as diffs.shape[1:].

Return type

array

Notes

This function computes the χ² more efficiently and accurately than following the direct definition of inverting the covariance matrix, \(\chi^2 = d\Sigma^{-1}d\), by solving the triangular linear system instead.

Examples

>>> from validphys.calcutils import calc_chi2
>>> import numpy as np
>>> import scipy.linalg as la
>>> np.random.seed(0)
>>> diffs = np.random.rand(10)
>>> s = np.random.rand(10,10)
>>> cov = s@s.T
>>> calc_chi2(la.cholesky(cov, lower=True), diffs)
44.64401691354948
>>> diffs@la.inv(cov)@diffs
44.64401691354948

validphys.calcutils.calc_phi(sqrtcov, diffs)[source]

Low level function which calculates phi given a Cholesky decomposed lower triangular part and a vector of differences. Primarily used when phi is to be calculated independently from chi2.

The vector of differences diffs is expected to have N_bins on the first axis

validphys.calcutils.central_chi2(results)[source]: Calculate the chi² from the central value of the theory prediction to the data

validphys.calcutils.central_chi2_theory(results, totcov)[source]: Like central_chi2 but here the chi² is calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.

validphys.calcutils.get_df_block(matrix: DataFrame, key: str, level)[source]

Given a pandas dataframe whose index and column keys match, and whose data represents a symmetric matrix, return the diagonal block of this matrix corresponding to matrix[key, key] as a numpy array.

Additionally, the user can specify the level of the key for which the cross section is being taken. By default it is set to 1, which corresponds to the dataset level of a theory covariance matrix.

validphys.calcutils.regularize_covmat(covmat: array, norm_threshold=4)[source]

Given a covariance matrix, performs a regularization which is equivalent to performing regularize_l2 on the sqrt of covmat: the l2 norm of the inverse of the correlation matrix calculated from covmat is set to be less than or equal to norm_threshold. If the input covmat already fulfills this criterion it is returned.

Parameters

covmat (array) – a covariance matrix which is to be regularized.
norm_threshold (float) – The acceptable l2 norm of the sqrt correlation matrix, by default set to 4.

Returns

new_covmat – A new covariance matrix which has been regularized according to prescription above.

Return type

array

validphys.calcutils.regularize_l2(sqrtcov, norm_threshold=4)[source]

Return a regularized version of sqrtcov.

Given sqrtcov, an (N, nsys) matrix such that its Gram matrix is the covariance matrix (covmat = sqrtcov@sqrtcov.T), first decompose it as sqrtcov = D@A, where D is a positive diagonal matrix of standard deviations and A is the “square root” of the correlation matrix, corrmat = A@A.T. Then produce a new version of A which removes the unstable behaviour and assemble a new square root covariance matrix, which is returned.

The stability condition is controlled by norm_threshold. It is

\[\left\Vert A^+ \right\Vert_{L2} \leq \frac{1}{\text{norm_threshold}}\]

Here A+ is the pseudoinverse of A, and norm_threshold roughly corresponds to the sqrt of the maximum relative uncertainty in any systematic.

Parameters

sqrtcov (2d array) – An (N, nsys) matrix specifying the uncertainties.
norm_threshold (float) – The tolerance for the regularization.

Returns

newsqrtcov – A regularized version of sqrtcov.

Return type

2d array

validphys.checks module

validphys.checks.check_at_least_two_replicas(pdf)[source]

validphys.checks.check_can_save_grid(ns, **kwags)[source]

validphys.checks.check_cuts_considered(use_cuts)[source]

validphys.checks.check_cuts_fromfit(use_cuts)[source]

validphys.checks.check_darwin_single_process(NPROC)[source]

Check that if we are on macOS (platform is Darwin), NPROC is equal to 1. This is related to the infamous issues with multiprocessing on macOS.

The “solution” is to run the code sequentially if NPROC is 1 and enforce that macOS users don’t set NPROC as anything else.

TODO: Once pseudodata is generated in python, try using spawn instead of fork with multiprocessing.

Notes

for the specific NNPDF issue: https://github.com/NNPDF/nnpdf/issues/931

General discussion: https://wefearchange.org/2018/11/forkmacos.rst.html
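A check of this kind can be sketched with the standard library. The function below is hypothetical and raises a plain ValueError as a stand-in for whatever error type validphys uses; the system argument exists only so the example can be exercised on any platform.

```python
import platform


def check_single_process_on_macos(nproc, system=None):
    """Raise ValueError if running on macOS with more than one process."""
    system = platform.system() if system is None else system
    if system == "Darwin" and nproc != 1:
        raise ValueError("NPROC must be 1 on macOS")
    return nproc


ok = check_single_process_on_macos(4, system="Linux")
```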

validphys.checks.check_data_cuts_match_theorycovmat(data, fitthcovmat)[source]

validphys.checks.check_dataset_cuts_match_theorycovmat(dataset, fitthcovmat)[source]

validphys.checks.check_dataspecs_fits_different(dataspecs_fit)[source]: Need this check because otherwise the pandas object gets confused.

validphys.checks.check_fits_different(fits)[source]: Need this check because otherwise the pandas object gets confused.

validphys.checks.check_has_fitted_replicas(ns, **kwargs)[source]

validphys.checks.check_have_two_pdfs(pdfs)[source]

validphys.checks.check_know_errors(ns, **kwargs)[source]

validphys.checks.check_mixband_as_replicas(pdfs, mixband_as_replicas)[source]: Same as check_pdfs_noband, but for the mixband_as_replicas key. Allows mixband_as_replicas to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).

validphys.checks.check_norm_threshold(norm_threshold)[source]: Check norm_threshold is not None

validphys.checks.check_not_using_pdferr(use_pdferr=False, **kwargs)[source]

validphys.checks.check_pdf_is_hessian(pdf, **kwargs)[source]

validphys.checks.check_pdf_is_montecarlo(ns, **kwargs)[source]

validphys.checks.check_pdf_is_montecarlo_or_hessian(pdf, **kwargs)[source]

validphys.checks.check_pdf_normalize_to(pdfs, normalize_to)[source]: Transform normalize_to into an index.

validphys.checks.check_pdfs_noband(pdfs, pdfs_noband)[source]: Allows pdfs_noband to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).
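Accepting either PDF IDs or one-based indexes amounts to a small normalisation step. The sketch below shows one way to do it on plain strings; the helper name resolve_pdf_indexes and the toy PDF names are hypothetical, and the real check operates on validphys PDF objects.

```python
def resolve_pdf_indexes(pdf_names, selection):
    """Map a mix of PDF IDs and one-based indexes to zero-based positions."""
    resolved = []
    for item in selection:
        if isinstance(item, int):
            if not 1 <= item <= len(pdf_names):
                raise ValueError(f"Index {item} is out of range")
            resolved.append(item - 1)
        else:
            if item not in pdf_names:
                raise ValueError(f"Unknown PDF {item!r}")
            resolved.append(pdf_names.index(item))
    return resolved


names = ["NNPDF_nlo_as_0118", "toy_pdf_b", "toy_pdf_c"]
picked = resolve_pdf_indexes(names, [1, "toy_pdf_c"])
```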

validphys.checks.check_scale(scalename, allow_none=False)[source]: Check that we have a valid matplotlib scale. With allow_none=True, also None is valid.

validphys.checks.check_speclabels_different(dataspecs_speclabel)[source]: This is needed for grouping dataframes (and because identical labels generally indicate a bug).

validphys.checks.check_two_dataspecs(dataspecs)[source]

validphys.checks.check_use_t0(ns, **kwargs)[source]: Checks use_t0 is set to true

validphys.checks.check_xlimits(xmax, xmin)[source]

validphys.commondata module

commondata.py

Module containing actions which return loaded commondata, leverages utils found in nnpdf_data.commondataparser, and returns objects from nnpdf_data.coredata

validphys.commondata.loaded_commondata_with_cuts(commondata, cuts)[source]

Load the commondata and apply cuts.

Parameters

commondata (validphys.core.CommonDataSpec) – commondata to load and cut.
cuts (validphys.core.cuts, None) – valid cuts, used to cut loaded commondata.

Returns

loaded_cut_commondata

Return type

nnpdf_data.coredata.CommonData

validphys.config module

class validphys.config.Config(input_params, environment=None)[source]

Bases: Config, CoreConfig

The effective configuration parser class.

class validphys.config.CoreConfig(input_params, environment=None)[source]

Bases: Config

load_default_data_grouping(spec)[source]: Load the default grouping of data

load_default_default_filter_rules(spec)[source]

load_default_default_filter_settings(spec)[source]

property loader

parse_added_filter_rules(rules: (<class 'list'>, <class 'NoneType'>) = None)[source]: Returns a tuple of AddedFilterRule objects. Rules are immutable after parsing. AddedFilterRule objects inherit from FilterRule objects. It checks if the rules are unique, i.e. if there are no multiple filters for the same dataset or process with the same fields (reason is not used in the comparison).
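The uniqueness check can be pictured as comparing rules with the reason field left out. The following sketch works on plain dictionaries, not AddedFilterRule objects; apart from dataset and reason, the field names (process_type, rule) and the toy values are assumptions made for the example.

```python
def check_rules_unique(rules):
    """Raise ValueError if two rules match once `reason` is ignored."""
    seen = set()
    for rule in rules:
        # Build a hashable key from every field except `reason`.
        key = tuple(sorted((k, repr(v)) for k, v in rule.items() if k != "reason"))
        if key in seen:
            raise ValueError(f"Duplicate filter rule: {rule}")
        seen.add(key)
    return tuple(rules)


rules = [
    {"dataset": "TOY_DATASET", "rule": "x > 0.1", "reason": "first"},
    {"process_type": "TOY_PROCESS", "rule": "x > 0.1", "reason": "second"},
]
parsed = check_rules_unique(rules)
```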

parse_additional_errors(bool)[source]: PDF set used to generate the photon additional errors: they are constructed using the replicas 101-107 of the PDF set LUXqed17_plus_PDF4LHC15_nnlo_100 (that are obtained varying some parameters of the LuxQED approach) in the way described in sec. 2.5 of https://arxiv.org/pdf/1712.07053.pdf

parse_cut_similarity_threshold(th: Real)[source]: Maximum relative ratio when using fromsimilarpredictions cuts.

parse_data_grouping(key)[source]: a key which indicates which default grouping to use. Mainly for internal use. It allows the default grouping of experiment to be applied to runcards which don’t specify metadata_group without there being a namespace conflict in the lockfile

parse_dataset_input(dataset: Mapping, allow_legacy_names: bool = True)[source]

The mapping that corresponds to the dataset specifications in the fit files

This mapping is such that

dataset: str: name of the dataset to load
variant: str: variant of the dataset to load
cfac: list: list of cfactors to apply
frac: float: fraction of the data to consider for training purposes
weight: float: extra weight to give to the dataset
custom_group: str: custom group to apply to the dataset

Old-format names-sys will be translated to the new version in this function.
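The keys listed above map naturally onto a small immutable record. The sketch below is illustrative only: the class DatasetInputSketch, the default values, and the dataset and cfactor names are assumptions, and it does not perform the legacy-name translation that the real parser does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetInputSketch:
    dataset: str
    variant: Optional[str] = None  # assumed default
    cfac: tuple = ()               # assumed default
    frac: float = 1.0              # assumed default
    weight: float = 1.0            # assumed default
    custom_group: str = "unset"    # assumed default


def parse_dataset_input_sketch(mapping):
    """Reject unknown keys and build the record from a plain mapping."""
    known = {"dataset", "variant", "cfac", "frac", "weight", "custom_group"}
    unknown = set(mapping) - known
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")
    kwargs = dict(mapping)
    # Store cfactors as a tuple so the record stays hashable.
    kwargs["cfac"] = tuple(kwargs.get("cfac", ()))
    return DatasetInputSketch(**kwargs)


entry = parse_dataset_input_sketch(
    {"dataset": "TOY_DATASET", "cfac": ["TOY_CFAC"], "frac": 0.75}
)
```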

parse_dataset_inputs(param: list, allow_legacy_names: bool = True): A list of dataset_input objects.

parse_default_filter_rules(spec: (<class 'str'>, <class 'NoneType'>))[source]

parse_default_filter_rules_recorded_spec_(spec)[source]: This function is a hacky fix for parsing the recorded spec of filter rules. The reason we need this function is that without it reportengine detects a conflict in the dataset key.

parse_default_filter_settings(spec: (<class 'str'>, <class 'NoneType'>))[source]

parse_drop_internal_rules(drop_internal_rules: (<class 'list'>, <class 'NoneType'>) = None)[source]: Turns drop_internal_rules into a tuple for internal caching.

parse_experiment(experiment: dict)[source]: A set of datasets where correlated systematics are taken into account. It is a mapping where the keys are the experiment name ‘experiment’ and a list of datasets.

parse_experiment_input(ei: dict)[source]: The mapping that corresponds to the experiment specification in the fit config files. Currently, this needs to be combined with experiment_from_input to yield a useful result.

parse_experiment_inputs(param: list): A list of experiment_input objects.

parse_experiments(param: list): A list of experiment objects.

parse_fakepdf(name)[source]: PDF set used to generate the fake data in a closure test.

parse_filter_defaults(filter_defaults: (<class 'dict'>, <class 'NoneType'>))[source]

A mapping containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.

Parameters: filter_defaults (dict, None) – A mapping containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.
Returns: A hashable object containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.
Return type: FilterDefaults

parse_filter_rules(filter_rules: (<class 'list'>, <class 'NoneType'>))[source]: A tuple of FilterRule objects. Rules are immutable after parsing. See https://docs.nnpdf.science/vp/filters.html for details on the syntax

parse_fit(item)[source]: A fit in the results folder, containing at least a valid filter result. Either just an id (str), or a mapping with ‘id’ and ‘label’.

parse_fitdeclaration(label: str)[source]

Used to guess some information from the fit name, without having to download it. This is meant to be used with other providers, e.g.:

{@with fits_as_from_fitdeclarations::fits_name_from_fitdeclarations@} {@ …do stuff… @} {@endwith@}

parse_fitdeclarations(param: list): A list of fitdeclaration objects.

parse_fits(param: list): A list of fit objects.

parse_groupby(grouping: str)[source]: parses the groupby key and checks it is an allowed grouping

parse_hyperscan(hyperscan)[source]: A hyperscan in the hyperscan_results folder, containing at least one tries.json file

parse_hyperscan_config(hyperscan_config, hyperopt=None)[source]: Configuration of the hyperscan

parse_hyperscans(param: list): A list of hyperscan objects.

parse_inconsistent_data_settings(settings)[source]

Parse the inconsistent data settings from the yaml file.

Known keys:

treatment_names: list
list of the names of the treatments that should be rescaled. Possible values are: MULT, ADD.
names_uncertainties: list
list of the names of the uncertainties that should be rescaled. Possible values are: CORR, UNCORR, THEORYCORR, THEORYUNCORR, SPECIAL. SPECIAL is used for intra-dataset systematics.
inconsistent_datasets: list
list of the datasets for which an inconsistency should be introduced.
sys_rescaling_factor: float, int
the factor by which the systematics should be rescaled.
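A settings block with these four keys could be validated along the following lines. This is a sketch under assumptions: the function name validate_inconsistent_settings, the error types, and the toy dataset name are hypothetical, and only the allowed values listed above are taken from the text.

```python
ALLOWED_TREATMENTS = {"MULT", "ADD"}
ALLOWED_UNCERTAINTIES = {"CORR", "UNCORR", "THEORYCORR", "THEORYUNCORR", "SPECIAL"}


def validate_inconsistent_settings(settings):
    """Check the four documented keys and return a normalised copy."""
    bad_treat = set(settings["treatment_names"]) - ALLOWED_TREATMENTS
    bad_unc = set(settings["names_uncertainties"]) - ALLOWED_UNCERTAINTIES
    if bad_treat or bad_unc:
        raise ValueError(f"Unknown values: {sorted(bad_treat | bad_unc)}")
    factor = settings["sys_rescaling_factor"]
    if not isinstance(factor, (int, float)):
        raise TypeError("sys_rescaling_factor must be a number")
    return {
        "treatment_names": list(settings["treatment_names"]),
        "names_uncertainties": list(settings["names_uncertainties"]),
        "inconsistent_datasets": list(settings["inconsistent_datasets"]),
        "sys_rescaling_factor": float(factor),
    }


settings = validate_inconsistent_settings(
    {
        "treatment_names": ["MULT"],
        "names_uncertainties": ["CORR", "SPECIAL"],
        "inconsistent_datasets": ["TOY_DATASET"],
        "sys_rescaling_factor": 2,
    }
)
```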

parse_integdataset(integset: dict, *, theoryid, rules)[source]: An observable corresponding to a PDF in the evolution basis, used as an integrability constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.

parse_integdatasets(param: list, *, theoryid, rules): A list of integdataset objects.

parse_lumi_channel(ch: str)[source]

parse_lumi_channels(param: list): A list of lumi_channel objects.

parse_luxset(name)[source]: PDF set used to generate the photon with fiatlux.

parse_metadata_group(group: str)[source]: User-specified key to group data by. The key must exist in the PLOTTING file, for example experiment.

parse_norm_threshold(val: (<class 'numbers.Number'>, <class 'NoneType'>))[source]

The threshold to use for covariance matrix normalisation. It sets the maximum l2 norm of the inverse covariance matrix by clipping the smallest eigenvalues.

If norm_threshold is set to None, then no covmat regularization is performed.

parse_pdf(item, unpolarized_bc=None)[source]

A PDF set installed in LHAPDF. If an unpolarized boundary condition is defined, it will be registered as part of the PDF.

If name is already an instance of a vp PDF object, return it unchanged. Either just an id, or a mapping with ‘id’ and ‘label’.

parse_pdfs(param: list, unpolarized_bc=None): A list of pdf objects.

parse_point_prescriptions(point_prescriptions)[source]

parse_posdataset(posset: dict, *, theoryid, rules)[source]: An observable used as a positivity constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.

parse_posdatasets(param: list, *, theoryid, rules): A list of posdataset objects.

parse_reweighting_experiments(experiments, *, theoryid, use_cuts, fit=None)[source]: A list of experiments to be used for reweighting.

parse_speclabel(label: (<class 'str'>, <class 'NoneType'>))[source]: A label for a dataspec. To be used in some plots

parse_t0pdfset(name, unpolarized_bc=None)[source]: PDF set used to generate the t0 covmat.

parse_t0theoryid(theoryID: (<class 'str'>, <class 'int'>))[source]

A number corresponding to the database theory ID where the corresponding theory folder is installed in the data directory.

The t0theoryid is specifically used for SM parameter determinations (e.g. alphas) using the correlated replicas method of arXiv:1802.03398. To do an alphas determination we perform multiple fits, each with a different value of alphas in the DGLAP kernel and hard scattering cross section. Then we compute the chi2 for each fit to determine which alphas best describes the data. However, to make a fair comparison we need to ensure that the definition of the chi2 (and thus the t0 covariance matrix) is exactly the same for each fit. This requires fixing not only the t0pdfset between the different fits, but also the t0theoryid.

parse_theoryid(item)[source]: A number corresponding to the database theory ID where the corresponding theory folder is installed in the data directory. Either just an id (str or int), or a mapping with ‘id’ and ‘label’.

parse_theoryids(param: list): A list of theoryid objects.

parse_unpolarized_bc(item)[source]: Unpolarised PDF used as a Boundary Condition to impose positivity of pPDFs. Either just an id, or a mapping with ‘id’ and ‘label’.

parse_unpolarized_bcs(param: list): A list of unpolarized_bc objects.

parse_use_cuts(use_cuts: (<class 'bool'>, <class 'str'>))[source]

Whether to filter the points based on the cuts applied in the fit, or the whole data in the dataset. The possible options are:

internal: Calculate the cuts based on the existing rules. This is the default.
fromfit: Read the cuts stored in the fit.
nocuts: Use the whole dataset.
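The three options above lend themselves to an enumeration. The sketch below covers only these three policies and is not the validphys parser; the class name CutsPolicySketch is hypothetical, and the way booleans are mapped onto policies is an assumption that the text does not state.

```python
import enum


class CutsPolicySketch(enum.Enum):
    INTERNAL = "internal"
    FROMFIT = "fromfit"
    NOCUTS = "nocuts"


def parse_use_cuts_sketch(use_cuts):
    """Turn a bool or string into one of the three documented policies."""
    if isinstance(use_cuts, bool):
        # Assumed mapping for booleans; not stated in the text above.
        return CutsPolicySketch.INTERNAL if use_cuts else CutsPolicySketch.NOCUTS
    try:
        return CutsPolicySketch(use_cuts)
    except ValueError:
        valid = [p.value for p in CutsPolicySketch]
        raise ValueError(
            f"Invalid use_cuts {use_cuts!r}; expected one of {valid}"
        ) from None


policy = parse_use_cuts_sketch("fromfit")
```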

parse_use_fitcommondata(do_use: bool)[source]: Use the commondata files in the fit instead of those in the data directory.

parse_use_t0(do_use_t0: bool)[source]: Whether to use the t0 PDF set to generate covariance matrices.

produce_all_commondata()[source]: produces all commondata using the loader function

produce_all_lumi_channels()[source]

produce_basisfromfit(fit)[source]

Set the basis from fit config. In the fit config file the basis is set using the key fitbasis, but it is exposed to validphys as basis.

The name of this production rule is intentionally set to not conflict with the existing fitbasis runcard key.

produce_combined_shift_and_theory_dataspecs(dataspecs)[source]

produce_commondata(*, dataset_input, use_fitcommondata=False, fit=None)[source]: Produce a CommondataSpec from a dataset input

produce_covariance_matrix(use_pdferr: bool = False)[source]: Modifies which action is used as covariance_matrix depending on the flag use_pdferr

produce_covmat_t0_considered(use_t0: bool = False)[source]: Modifies which action is used as covariance_matrix depending on the flag use_t0

produce_cuts(*, commondata, use_cuts)[source]: Obtain cuts for a given dataset input, based on the appropriate policy.

produce_data(data_input, *, group_name='data')[source]: A set of datasets where correlated systematics are taken into account

produce_data_input()[source]: Produce the data_input which is a flat list of dataset_input s. This production rule handles the backwards compatibility with old datasets which specify experiments in the runcard.

produce_dataset(*, dataset_input, theoryid, cuts, use_fitcommondata=False, fit=None, check_plotting: bool = False)[source]: Dataset specification from the theory and CommonData. Use the cuts from the fit, if provided. If check_plotting is set to True, attempt to load and check the PLOTTING files (note this may cause a noticeable slowdown in general).

produce_dataset_inputs_covariance_matrix(use_pdferr: bool = False)[source]: Modifies which action is used as experiment_covariance_matrix depending on the flag use_pdferr

produce_dataset_inputs_covmat_t0_considered(use_t0: bool = False)[source]: Modifies which action is used as experiment_covariance_matrix depending on the flag use_t0

produce_dataset_inputs_fitting_covmat(use_thcovmat_in_fitting=False)[source]: Produces the correct covmat to be used in fitting_data_dict according to some options: whether to include the theory covmat, whether to separate the multiplicative errors and whether to compute the experimental covmat using the t0 prescription.

produce_dataset_inputs_sampling_covmat(sep_mult=False, use_thcovmat_in_sampling=False, use_t0_sampling=True)[source]

Produces the correct MC replica method sampling covmat to be used in make_replica according to some options: whether to sample using a t0 covariance matrix, whether to include the theory covmat and whether to separate the multiplicative errors.

Parameters

sep_mult (bool, default=False) – Whether to separate the multiplicative errors.
use_thcovmat_in_sampling (bool, default=False) – Whether to include the theory covariance matrix.
use_t0_sampling (bool, default=True) – Whether to sample using a t0 covariance matrix.

Return type

Callable

produce_dataspecs_with_matched_cuts(dataspecs)[source]

Take a list of namespaces (dataspecs), resolve dataset within each of them, and return another list of dataspecs where the datasets all have the same cuts, corresponding to the intersection of the selected points. All the datasets must have the same name (i.e. correspond with the same experimental measurement), but can otherwise differ, for example in the theory used for the experimental predictions.

This rule can be combined with matched_datasets_from_dataspecs.
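Matching cuts across dataspecs reduces to intersecting the sets of selected points. The sketch below represents each set of cuts as a list of surviving point indexes, which is an assumption about the representation; the helper name match_cuts is hypothetical.

```python
def match_cuts(cuts_per_dataspec):
    """Return the sorted intersection of the point indexes kept in each dataspec."""
    kept = [set(c) for c in cuts_per_dataspec]
    common = set.intersection(*kept) if kept else set()
    return sorted(common)


# Indexes of data points surviving the cuts in three hypothetical dataspecs.
matched = match_cuts([[0, 1, 2, 5], [1, 2, 3, 5], [1, 5, 7]])
```

Every dataspec would then use the same matched list, so that comparisons between them run over identical points.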

produce_defaults(q2min=None, w2min=None, maxTau=None, default_filter_settings=None, filter_defaults=None, default_filter_settings_recorded_spec_=None)[source]

Produce default values for filters taking into account the values of q2min, w2min and maxTau defined at namespace level and those inside a filter_defaults mapping.

Within this function the hashable type FilterDefaults is turned into a dictionary so as to allow for overwriting of the values of q2min, w2min and maxTau. The dictionary is then turned back into a FilterDefaults object.
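The round trip from a hashable object to a dictionary and back can be sketched with a frozen dataclass. The class FilterDefaultsSketch, the numeric values, and the rule that namespace-level values take precedence are all assumptions for illustration; the real production rule may resolve conflicts differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class FilterDefaultsSketch:
    q2min: Optional[float] = None
    w2min: Optional[float] = None
    maxTau: Optional[float] = None


def produce_defaults_sketch(filter_defaults, q2min=None, w2min=None, maxTau=None):
    """Overlay namespace-level values on top of a filter_defaults object."""
    values = asdict(filter_defaults)  # hashable object -> plain dict
    overrides = {"q2min": q2min, "w2min": w2min, "maxTau": maxTau}
    # Assumed precedence: explicitly given values overwrite the mapping.
    values.update({k: v for k, v in overrides.items() if v is not None})
    return FilterDefaultsSketch(**values)  # back to a hashable object


base = FilterDefaultsSketch(q2min=1.0, w2min=2.0, maxTau=0.5)
merged = produce_defaults_sketch(base, q2min=5.0)
```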

produce_experiment_from_input(experiment_input, theoryid, use_cuts, fit=None)[source]: Return a mapping containing a single experiment from an experiment input. NOTE: This might be deprecated in the future.

produce_filter_data(fakedata: bool = False, theorycovmatconfig=None, inconsistent_fakedata: bool = False)[source]

Set the action used to filter the data to filter either real or closure data. If the closure data filter is being used and if the theory covariance matrix is not being closure tested then filter data by experiment for efficiency.

Parameters

fakedata (bool, default False) – whether to use closure test data in a fit.
theorycovmatconfig (dict) –
inconsistent_fakedata (bool, default False) – If true it allows for the introduction of inconsistencies in a closure test fit and returns filter_inconsistent_closure_data_by_experiment.

produce_fit_id(fit) → str[source]: Return a string containing the ID of the fit

produce_fitcontext(fitinputcontext, fitpdf)[source]: Set PDF, theory ID and data input from the fit config

produce_fitcontextwithcuts(fit, fitinputcontext)[source]: Like fitinputcontext but setting the cuts policy.

produce_fitenvironment(fit, fitinputcontext)[source]

Like fitcontext, but additionally forcing various other parameters, such as the cuts policy and Monte Carlo seeding to be the same as the fit.

Notes

This production rule is designed to be used as a namespace to collect over, for use with validphys.pseudodata.recreate_fit_pseudodata(), and can be added to freely, e.g. by setting trvlseed to be from the fit runcard.

produce_fitinputcontext(fit)[source]: Like fitcontext but without setting the PDF

produce_fitpdf(fit)[source]: Like fitcontext only setting the PDF

produce_fitpdfandbasis(fitpdf, basisfromfit)[source]: Set the PDF and basis from the fit config.

produce_fitq0fromfit(fitinputcontext)[source]: Given a fit, return the fitting scale according to the theory

produce_fitreplicas(fit)[source]: Production rule mapping the replica key to each Monte Carlo fit replica.

produce_fitthcovmat(use_thcovmat_if_present: bool = False, fit: (<class 'str'>, <class 'NoneType'>) = None)[source]: If a fit is specified and use_thcovmat_if_present is True then returns the corresponding covariance matrix for the given fit if it exists. If the fit doesn’t have a theory covariance matrix then returns False.

produce_fitunderlyinglaw(fit)[source]: Reads closuretest: fakepdf from fit config file and passes as pdf

produce_group_dataset_inputs_by_experiment(data_input)[source]

produce_group_dataset_inputs_by_metadata(data_input, processed_metadata_group)[source]: Take the data and the processed_metadata_group key and attempt to group the data. Returns a list where each element specifies the data_input for a single group and the group_name.

produce_group_dataset_inputs_by_process(data_input)[source]

produce_integdatasets(integrability)[source]

produce_loaded_theory_covmat(output_path, data_input, user_covmat_path=None, point_prescriptions=None, use_thcovmat_in_sampling=False, use_thcovmat_in_fitting=False)[source]: Loads the theory covmat from the correct file according to how it was generated by vp-setupfit.

produce_loaded_user_covmat_path(user_covmat_path: str = '')[source]: Path to the user covmat provided by user_covmat_path in the runcard. If no path is provided, returns None. For use in theorycovariance.construction.user_covmat.

produce_masks(diagonal_basis: bool = False)[source]: Modifies which action is used as masks depending on the flag diagonal_basis

produce_matched_datasets_from_dataspecs(dataspecs)[source]

Take an arbitrary list of mappings called dataspecs and return a new list of mappings called dataspecs constructed as follows.

From each of the original dataspecs, resolve the key process, and all the experiments and datasets therein.

Compute the intersection of the dataset names, and for each element in the intersection construct a mapping with the following keys:

process : A string with the common process name.

experiment_name : A string with the common experiment name.

dataset_name : A string with the common dataset name.

dataspecs : A list of mappings matching the original “dataspecs”. Each mapping contains:

* dataset: A dataset with the name dataset_name and the properties (cuts, theory, etc) corresponding to the original dataspec.
* dataset_input: The input line used to build dataset.
* All the other keys in the original dataspec.
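
The matching procedure above can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: each dataspec is assumed to carry an already resolved "datasets" list whose entries have "name", "process" and "experiment" keys, and match_datasets is a hypothetical helper, not part of validphys.

```python
def match_datasets(dataspecs):
    """Keep only the datasets present in every dataspec, grouped by name."""
    if not dataspecs:
        return []
    # One name -> dataset lookup per dataspec.
    lookups = [{ds["name"]: ds for ds in spec["datasets"]} for spec in dataspecs]
    common = set(lookups[0]).intersection(*lookups[1:])
    matched = []
    for name in sorted(common):
        first = lookups[0][name]
        inner = []
        for spec, lookup in zip(dataspecs, lookups):
            # Carry over all the other keys of the original dataspec.
            entry = {k: v for k, v in spec.items() if k != "datasets"}
            entry["dataset"] = lookup[name]
            inner.append(entry)
        matched.append(
            {
                "process": first["process"],
                "experiment_name": first["experiment"],
                "dataset_name": name,
                "dataspecs": inner,
            }
        )
    return matched


specs = [
    {"speclabel": "A", "datasets": [
        {"name": "d1", "process": "DIS", "experiment": "E1"},
        {"name": "d2", "process": "DY", "experiment": "E2"},
    ]},
    {"speclabel": "B", "datasets": [
        {"name": "d2", "process": "DY", "experiment": "E2"},
        {"name": "d3", "process": "JETS", "experiment": "E3"},
    ]},
]
matched = match_datasets(specs)
```

Only d2 appears in both inputs, so the result contains a single mapping whose inner dataspecs list has one entry per original dataspec.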

produce_matched_positivity_from_dataspecs(dataspecs)[source]: Like produce_matched_datasets_from_dataspecs but for positivity datasets.

produce_multiclosure_underlyinglaw(fits)[source]: Produce the underlying law for a set of fits. This allows a single t0-like covariance matrix to be loaded for all fits, for use with statistical estimators on multiple closure fits. If the fits do not all have the same underlying law, an error is raised identifying the offending fit.

produce_nnfit_theory_covmat(point_prescriptions: list = None, user_covmat_path: str = None)[source]

Return the theory covariance matrix used in the fit.

This function is only used in vp-setupfit to store the necessary covmats as .csv files in the tables directory.

produce_no_covmat_reg()[source]: explicitly set norm_threshold to None so that no covariance matrix regularization is performed

produce_pdf_id(pdf) → str[source]: Return a string containing the PDF’s LHAPDF ID

produce_pdfreplicas(fitpdf)[source]: Production rule mapping the replica key to each postfit replica.

produce_posdatasets(positivity)[source]

produce_processed_data_grouping(use_thcovmat_in_fitting=False, use_thcovmat_in_sampling=False, diagonal_basis=False, data_grouping=None, data_grouping_recorded_spec_=None)[source]

Process the data_grouping key from the runcard, or lockfile. If data_grouping_recorded_spec_ is present then its value is taken, and the runcard is assumed to be a lockfile.

If data_grouping is None, then, if either use_thcovmat_in_fitting or use_thcovmat_in_sampling (or both) are true (which means that the fit is a thcovmat fit), group all the datasets together, otherwise fall back to the default behaviour of grouping by experiment (called standard_report).

Otherwise, the user can specify their own grouping, for example metadata_process.
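
The decision described above can be summarised in a short sketch. It is not the validphys code: resolve_data_grouping is a hypothetical helper, the recorded spec is treated as an already resolved value (which simplifies whatever structure a lockfile actually stores), and the label returned for the "all datasets together" case is invented for illustration.

```python
def resolve_data_grouping(
    data_grouping=None,
    recorded_spec=None,
    use_thcovmat_in_fitting=False,
    use_thcovmat_in_sampling=False,
):
    """Pick a grouping following the precedence described in the text."""
    if recorded_spec is not None:
        # A recorded value means the runcard is treated as a lockfile.
        return recorded_spec
    if data_grouping is not None:
        # Explicit user choice, e.g. "metadata_process".
        return data_grouping
    if use_thcovmat_in_fitting or use_thcovmat_in_sampling:
        # Hypothetical label standing for "group all the datasets together".
        return "all_datasets_together"
    return "standard_report"
```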

produce_processed_metadata_group(processed_data_grouping, metadata_group=None)[source]: Expose the final data grouping result. If metadata_group is specified by the user, it is used; otherwise processed_data_grouping is used, which is experiment by default.

produce_replicas(nreplica: int)[source]: Produce a replicas array

produce_reweight_all_datasets(experiments)[source]

produce_rules(theoryid, use_cuts, defaults, default_filter_rules=None, filter_rules=None, default_filter_rules_recorded_spec_=None, added_filter_rules: (<class 'tuple'>, <class 'NoneType'>) = None, drop_internal_rules: tuple = ())[source]

Produce filter rules based on the user defined input and defaults.

It is possible to overwrite or extend the internal rules from the runcard using the following variables:

filter_rules: tuple(rules): Drop all internal rules and take these instead
added_filter_rules: tuple(rules): Extend the internal rules with these
drop_internal_rules: tuple(dataset names): Drop the internal dataset-specific rules for these datasets; this is applied before added_filter_rules
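
The interplay of the three variables can be sketched as below. This is an illustration only: rules are represented as plain dictionaries, combine_rules is a hypothetical helper, and whether the added rules are also appended when filter_rules is given is an assumption of this sketch.

```python
def combine_rules(internal_rules, filter_rules=None, added_filter_rules=None,
                  drop_internal_rules=()):
    """Combine internal and user rules in the order described in the text."""
    if filter_rules is not None:
        # User rules replace all the internal ones.
        rules = list(filter_rules)
    else:
        # Drop only the dataset-specific internal rules that were requested;
        # rules without a "dataset" key (e.g. process-level ones) are kept.
        rules = [r for r in internal_rules
                 if r.get("dataset") not in drop_internal_rules]
    if added_filter_rules:
        # Applied after the drop, so an added rule may target a dropped dataset.
        rules.extend(added_filter_rules)
    return tuple(rules)


internal = (
    {"dataset": "A", "rule": "x > 0.1"},
    {"process_type": "DIS", "rule": "Q2 > 3.49"},
    {"dataset": "B", "rule": "pT > 20"},
)
new_rule = {"dataset": "A", "rule": "x > 0.05"}
combined = combine_rules(internal, added_filter_rules=(new_rule,),
                         drop_internal_rules=("A",))
```

Since the drop happens first, this pattern replaces the internal rule of dataset A with a custom one while leaving everything else in place.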

produce_sep_mult(separate_multiplicative=False)[source]

produce_t0dataset(*, dataset_input, t0id, cuts, use_fitcommondata=False, fit=None, check_plotting: bool = False)[source]: Same as produce_dataset, but if a t0theoryid has been defined in the runcard then those corresponding fktables will be linked.

produce_t0id(theoryid, t0theoryid=None)[source]: Return the t0id if t0theoryid is set and return theoryid otherwise.

produce_t0set(t0pdfset=None, use_t0=False)[source]: Return the t0set if use_t0 is True and None otherwise. Raises an error if t0 is requested but no t0set is given.

produce_theory_database()[source]: Produces path to the folder of the theory runcards

produce_theoryids(t0id, point_prescription)[source]: Produces a list of theoryids given a theoryid at central scales and a point prescription. The options for the latter are defined in pointprescriptions.yaml. This hard codes the theories needed for each prescription to avoid user error.

produce_total_chi2_data(fitthcovmat)[source]: If there is no theory covmat for the fit, then calculate the total chi2 by summing the chi2 from each experiment.

produce_total_phi_data(fitthcovmat)[source]: If there is no theory covmat for the fit, then calculate the total phi using contributions from each experiment.

class validphys.config.Environment(*, this_folder=None, net=True, upload=False, dry=False, **kwargs)[source]

Bases: Environment

Container for information to be filled at run time

validphys.convolution module

This module implements tools for computing convolutions between PDFs and theory grids, which yield observables.

The high level predictions() function can be used to extract theory predictions for experimentally measured quantities:

import numpy as np
from validphys.api import API
from validphys.convolution import predictions


inp = {
    'fit': '240921-02-rs-nnpdf40-baseline',
    'use_cuts': 'internal',
    'theoryid': 40_000_000,
    'pdf': 'NNPDF40_nnlo_lowprecision',
    'dataset_inputs': {'from_': 'fit'}
}


all_datasets = API.data(**inp).datasets

pdf = API.pdf(**inp)


all_preds = [predictions(ds, pdf) for ds in all_datasets]

These functions work with validphys.core.DataSetSpec objects, allowing them to account for information on the predictions that requires non-trivial operations, as well as for cuts. A lower level interface which operates on validphys.coredata.FKTableData objects is also available.

exception validphys.convolution.PredictionsRequireCutsError[source]: Bases: Exception

validphys.convolution.central_dis_predictions(loaded_fk, pdf)[source]: Implementation of central_fk_predictions() for DIS observables.

validphys.convolution.central_fk_predictions(loaded_fk, pdf)[source]: Same as fk_predictions(), but computing predictions for the central PDF member only.

validphys.convolution.central_hadron_predictions(loaded_fk, pdf)[source]: Implementation of central_fk_predictions() for hadronic observables.

validphys.convolution.central_predictions(dataset: DataSetSpec, pdf: PDF) → DataFrame[source]: Same as predictions() but computing the predictions for the central member of the PDF set only. For Monte Carlo PDFs, this is a faster alternative to computing the central predictions as the average of the replica predictions (although a small approximation is involved in the case of hadronic predictions).

validphys.convolution.dis_predictions(loaded_fk, pdf)[source]: Implementation of fk_predictions() for DIS observables.

validphys.convolution.fk_predictions(loaded_fk, pdf)[source]

Low level function to compute predictions from a FKTable.

Parameters

loaded_fk (validphys.coredata.FKTableData) – The FKTable corresponding to the partonic cross section.
pdf (validphys.core.PDF) – The PDF set to use for the convolutions.

Returns

df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points (use validphys.coredata.FKTableData.with_cuts() to filter out points). The columns correspond to the selected PDF members in the LHAPDF set.

Return type

pandas.DataFrame

Notes

This function operates on a single FKTable, while the prediction for an experimental quantity generally involves several. Use predictions() to compute those.

Examples

>>> from validphys.loader import Loader
>>> from validphys.convolution import hadron_predictions
>>> from validphys.fkparser import load_fktable
>>> l = Loader()
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
>>> table = load_fktable(ds.fkspecs[0])
>>> hadron_predictions(table, pdf)
             1           2           3           4    ...         97          98          99          100
data                                                  ...
0     176.688118  170.172930  172.460771  173.792321  ...  179.504636  172.343792  168.372508  169.927820
1     252.682923  244.507916  247.840249  249.541798  ...  256.410844  247.805180  242.246438  244.415529
2     828.076008  813.452551  824.581569  828.213508  ...  838.707211  826.056388  810.310109  816.824167

validphys.convolution.hadron_predictions(loaded_fk, pdf)[source]: Implementation of fk_predictions() for hadronic observables.

validphys.convolution.linear_fk_predictions(loaded_fk, pdf)[source]: Same as predictions() for DIS, but compute linearized predictions for hadronic data, using linear_hadron_predictions().

validphys.convolution.linear_hadron_predictions(loaded_fk, pdf)[source]

Implementation of linear_fk_predictions() for hadronic observables. Specifically this computes:

central_value ⊗ FK ⊗ (2 * replica_values - central_value)

which is the linear expansion of the hadronic observable in the difference between each replica and the central value, replica_values - central_value
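
The following toy check illustrates why the expression above is the linear expansion. It is a sketch under explicit assumptions: the FK kernel for a single data point is taken to be a symmetric matrix acting on one flavour, and the names fk, central, replicas and hadronic are invented here. Writing each replica as central + delta, the exact bilinear form equals the linearized one plus a term quadratic in delta.

```python
import numpy as np

rng = np.random.default_rng(0)
nx = 5

# Toy symmetric kernel for one data point (assumption of this sketch).
fk = rng.random((nx, nx))
fk = fk + fk.T

central = rng.random(nx)
# Replicas fluctuate slightly around the central value.
replicas = central + 1e-3 * rng.standard_normal((10, nx))
delta = replicas - central


def hadronic(f1, f2):
    """Bilinear form sum_ij f1_i FK_ij f2_j, broadcast over leading axes."""
    return np.einsum("...i,ij,...j->...", f1, fk, f2)


exact = hadronic(replicas, replicas)
linear = hadronic(central, 2 * replicas - central)
# What the linearization discards is second order in delta.
remainder = hadronic(delta, delta)
```

Here exact - linear coincides with remainder up to rounding, so the linearized prediction differs from the full one only by terms of second order in the replica fluctuations.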

validphys.convolution.linear_predictions(dataset, pdf)[source]

Same as predictions() but computing linearized predictions. These are the same as the full predictions for DIS, but for hadronic predictions they are truncated to the terms that are linear in the difference between each member and the central value.

This is generally a very good approximation, in that it yields differences that are much smaller than the PDF uncertainty.

validphys.convolution.predictions(dataset, pdf)[source]

Compute theory predictions for a given PDF and dataset. Information regarding the dataset, such as cuts, CFactors and combinations of FKTables, is taken into account to construct the predictions.

The result should be comparable to the experimental values implemented in CommonData.

Parameters

dataset (validphys.core.DataSetSpec) – The dataset containing information on the partonic cross section.
pdf (validphys.core.PDF) – The PDF set to use for the convolutions.

Returns

df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points, based on the dataset cuts. The columns correspond to the selected PDF members in the LHAPDF set.

Return type

pandas.DataFrame

Examples

Obtain descriptive statistics over PDF replicas for each of the three points in the ATLAS ttbar dataset:

>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53)
>>> from validphys.convolution import predictions
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> preds = predictions(ds, pdf)
>>> preds.T.describe()
data            0           1           2
count  100.000000  100.000000  100.000000
mean   161.271292  231.500367  767.816844
std      2.227304    2.883497    7.327617
min    156.638526  225.283254  750.850250
25%    159.652216  229.486793  762.773527
50%    161.066965  231.281248  767.619249
75%    162.620554  233.306836  772.390286
max    168.390840  240.287549  786.549380

validphys.core module

Core datastructures used in the validphys data model.

class validphys.core.CommonDataSpec(name, metadata, datafile=None)[source]

Bases: TupleComp

Holds all the information necessary to load a commondata file and provides methods to easily access them

Parameters

name (str) – name of the commondata
metadata (ObservableMetaData) – instance of ObservableMetaData holding all information about the dataset
legacy (bool) – whether this is an old or new format metadata file

The datafile, sysfile and plotfiles arguments are deprecated and only to be used with legacy=True

property legacy_names

load()[source]: load a validphys.core.CommonDataSpec to validphys.core.CommonData

property metadata

property name

property ndata

property nsys

property plot_kinlabels

property process_type

property theory_metadata

with_modified_data(central_data_file, uncertainties_file=None)[source]: Returns a copy of this instance with a new data file in the metadata

class validphys.core.Cuts(commondata, path)[source]

Bases: TupleComp

load()[source]

class validphys.core.CutsPolicy(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

FROMFIT = 'fromfit'

FROM_CUT_INTERSECTION_NAMESPACE = 'fromintersection'

FROM_SIMILAR_PREDICTIONS_NAMESPACE = 'fromsimilarpredictions'

INTERNAL = 'internal'

NOCUTS = 'nocuts'

class validphys.core.DataGroupSpec(name, datasets, dsinputs=None)[source]

Bases: TupleComp, NSList

property as_markdown

load_commondata()[source]

load_commondata_instance()[source]: Given an Experiment, load the list of nnpdf_data.coredata.CommonData objects with cuts already applied

property thspec

to_unweighted()[source]: Return a copy of the group with the weights for all experiments set to one. Note that the results cannot be used as a namespace.

class validphys.core.DataSetInput(*, name, cfac, frac, weight, custom_group, variant)[source]

Bases: TupleComp

Represents whatever the user enters in the YAML to specify a dataset.

name: str: name of the dataset_inputs
cfac: tuple: cfactors to apply to the final predictions (default: ())
frac: float: fraction of the data to be used during training (default: 1.0)
weight: float: extra weight to apply to the dataset (default: 1.0)
variant: str or tuple[str]: variant or variants to apply (default: None)

class validphys.core.DataSetSpec(*, name, commondata, fkspecs, thspec, cuts, frac=1, op=None, weight=1, rules=())[source]

Bases: TupleComp

load_commondata()[source]: Strips the commondata loading from load

to_unweighted()[source]: Return a copy of the dataset with the weight set to one.

class validphys.core.ExperimentInput(*, name, datasets)[source]

Bases: TupleComp

as_dict()[source]

class validphys.core.FKTableSpec(fkpath, cfactors, metadata=None)[source]

Bases: TupleComp

Each FKTable is formed by a number of sub-fktables to be concatenated, each of which has its own path. Therefore the fkpath variable is a list of paths.

Before the pineappl implementation, FKTables were already pre-concatenated. The legacy interface therefore relies on fkpath being just a string or path instead.

The metadata of the FKTable for the given dataset is stored as an attribute of this class. This is transitional; eventually it will be held by the associated CommonData in the new format.

load_cfactors()[source]: Each of the sub-fktables that form the complete FKTable can have several cfactors applied to it. This function uses parse_cfactor to make them into CFactorData

load_with_cuts(cuts)[source]: Load the fktable and apply cuts immediately. Returns a FKTableData

class validphys.core.Filter(indexes, label, **kwargs)[source]

Bases: object

as_pair()[source]

class validphys.core.FitSpec(name, path)[source]

Bases: TupleComp

as_input()[source]

label

name

path

class validphys.core.HessianStats(data, rescale_factor=1)[source]

Bases: SymmHessianStats

Compute stats in the ‘asymmetric’ hessian format: The first index (0) is the central value. The odd indexes are the results for lower eigenvectors and the even are the upper eigenvectors. A ‘rescale_factor’ is allowed in case the eigenvector confidence interval is not 68%.
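
As an illustration of this member layout, the sketch below computes a standard error with the usual symmetrized formula for asymmetric Hessian sets, half the quadrature sum of the upper minus lower differences, divided by the rescale factor. That this is exactly what std_error() does is an assumption, and asymmetric_hessian_std is a hypothetical helper.

```python
import numpy as np


def asymmetric_hessian_std(data, rescale_factor=1.0):
    """Symmetrized error from members ordered as [central, lo1, up1, lo2, up2, ...]."""
    data = np.asarray(data, dtype=float)
    lower = data[1::2]  # odd indexes: lower eigenvector members
    upper = data[2::2]  # even indexes (excluding 0): upper eigenvector members
    return np.sqrt(np.sum((upper - lower) ** 2, axis=0)) / 2 / rescale_factor


# One observable, two eigenvector pairs around a central value of 10.
data = np.array([[10.0], [9.0], [11.0], [8.0], [12.0]])
std = asymmetric_hessian_std(data)
```

The rescale factor converts eigenvectors quoted at a different confidence level (for example 90%) to a 68% interval.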

moment(order)[source]

std_error()[source]

class validphys.core.HyperscanSpec(name, path)[source]

Bases: FitSpec

The hyperscan spec is just a special case of FitSpec

get_all_trials(base_params=None)[source]

Read all trials from all tries files. If there are original runcard-based parameters, a reference to them can be passed to the trials so that a full hyperparameter dictionary can be defined

Each hyperopt trial object will also have a reference to all trials in its own file

label

name

path

sample_trials(n=None, base_params=None, sigma=4.0)[source]

Parse all trials in the hyperscan object and then return an array of n trials read from the tries.json files and sampled according to their reward. If n is None, no sampling is performed and all trials are returned.

Returns: Dictionary of the form {parameters: list of trials}

property tries_files: Return a dictionary with all tries.json files mapped to their replica number

class validphys.core.IntegrabilitySetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]: Bases: LagrangeSetSpec

class validphys.core.InternalCutsWrapper(commondata, rules)[source]

Bases: TupleComp

load()[source]

class validphys.core.LagrangeSetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]

Bases: DataSetSpec

Extends DataSetSpec to work around the particularities of the positivity, integrability and other Lagrange Multiplier datasets.

to_unweighted()[source]: Return a copy of the dataset with the weight set to one.

class validphys.core.MCStats(data)[source]

Bases: Stats

Result obtained from a Monte Carlo sample

errorbar68()[source]

moment(order)[source]

sample_values(size)[source]

std_error()[source]

class validphys.core.MatchedCuts(othercuts, ndata)[source]

Bases: TupleComp

load()[source]

class validphys.core.PDF(name, boundary=None)[source]

Bases: TupleComp

Base validphys PDF providing high level access to metadata.

Statistical estimators which depend on the PDF type (MC, Hessian…) are exposed as a Stats object through the stats_class attribute. The LHAPDF metadata can be accessed directly through the info attribute.

Examples

>>> from validphys.api import API
>>> from validphys.convolution import predictions
>>> args = { "dataset_input": {"dataset": "CMS_WPWM_7TEV_MUON_ASY"}, "theoryid":40_000_000, "use_cuts":"internal"}
>>> ds = API.dataset(**args)
>>> pdf = API.pdf(pdf="NNPDF40_nnlo_as_01180")
>>> preds = predictions(ds, pdf)
>>> preds.shape
(11, 100)

property alphas_mz: Alpha_s(M_Z) as defined in the LHAPDF .info file

property alphas_vals: List of alpha_s(Q) at various Q for interpolation based alphas. Values as defined in the LHAPDF .info file

property error_conf_level: Error confidence level as defined in the LHAPDF .info file. If no number is given in the LHAPDF .info file, it defaults to 68%

property error_type: Error type as defined in the LHAPDF .info file

get_members()[source]: Return the number of members selected in pdf.load().grid_values

property info: Information contained in the LHAPDF .info file

property infopath

property is_polarized: Returns True if the PDF has a boundary condition associated to it. At the moment LHAPDF provides no mechanism to know whether a PDF is polarized.

property isinstalled

property label

load()[source]

load_t0()[source]: Load the PDF as a t0 set

make_only_cv()[source]

property q_min: Minimum Q as given by the LHAPDF .info file

register_boundary(unpolarized_bc=None)[source]: Register other PDFs as boundary conditions of this PDF

property stats_class: Return the stats calculator for this error type

exception validphys.core.PDFDoesNotExist[source]: Bases: Exception

class validphys.core.PDFcv(name, boundary=None)[source]

Bases: PDF

An add-on for the PDF class that makes only the central value available

load()[source]

class validphys.core.PositivitySetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]: Bases: LagrangeSetSpec

class validphys.core.SimilarCuts(inputs, threshold)[source]

Bases: TupleComp

load()[source]

class validphys.core.Stats(data)[source]

Bases: object

Class holding statistical information about the objects used in validphys. This object can be a PDF or any function of a PDF (such as a hadronic observable).

By convention, member 0 corresponds to the central value of the PDF. Accordingly, the method central_value will return the result held for member 0. Note that this is equal to the mean of the error_members only for the PDF itself and for linear functions of the PDF (such as DIS-type observables). If you want to obtain the average of the error members you can do: np.mean(stats_instance.error_members(), axis=0)
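
The distinction between member 0 and the mean of the error members can be seen in a toy example. The arrays below are invented and merely mimic the member layout (member 0 first); no validphys object is involved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "PDF" values: 100 error members at 3 points; member 0 is their mean.
error_members = rng.normal(loc=1.0, scale=0.2, size=(100, 3))
central = error_members.mean(axis=0)
members = np.vstack([central, error_members])

linear_obs = 2.0 * members   # linear in the PDF (DIS-like)
quadratic_obs = members**2   # non-linear in the PDF (hadronic-like)

# Difference between the member 0 result and the mean over error members.
linear_gap = linear_obs[0] - linear_obs[1:].mean(axis=0)
quadratic_gap = quadratic_obs[0] - quadratic_obs[1:].mean(axis=0)
```

For the linear observable the gap vanishes up to rounding, while for the quadratic one it equals minus the variance of the members, which is why the two definitions of "central" differ for hadronic quantities.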

central_value()[source]

error_members()[source]

errorbar68()[source]

errorbarstd()[source]

moment(order)[source]

sample_values(size)[source]

std_error()[source]

std_interval(nsigma)[source]

class validphys.core.SymmHessianStats(data, rescale_factor=1)[source]

Bases: Stats

Compute stats in the ‘symmetric’ hessian format: The first index (0) is the central value. The rest of the indexes are results for each eigenvector. A ‘rescale_factor’ is allowed in case the eigenvector confidence interval is not 68%.

errorbar68()[source]

moment(order)[source]

std_error()[source]

class validphys.core.ThCovMatSpec(path)[source]

Bases: object

load()[source]

class validphys.core.TheoryIDSpec(id: int, path: pathlib.Path, dbpath: pathlib.Path)[source]

Bases: object

dbpath: Path

get_description()[source]

id: int

is_pineappl()[source]: Check whether this theory is a pineappl-based theory. Assume yes unless a compound directory is found

path: Path

class validphys.core.TupleComp(*args, **kwargs)[source]

Bases: object

classmethod argnames()[source]

validphys.core.cut_mask(cuts)[source]: Return an object that will act as the cuts when applied as a slice

validphys.coredata module

Data containers backed by Python managed memory (Numpy arrays and Pandas dataframes).

class validphys.coredata.CFactorData(description: str, central_value: array, uncertainty: array)[source]

Bases: object

Data contained in a CFactor

Parameters

description (str) – Information on how the data was obtained.
central_value (array, shape(ndata)) – The value of the cfactor for each data point.
uncertainty (array, shape(ndata)) – The absolute uncertainty on the cfactor if available.

central_value: array

description: str

uncertainty: array

class validphys.coredata.FKTableData(hadronic: bool, Q0: float, ndata: int, xgrid: ~numpy.ndarray, sigma: ~pandas.core.frame.DataFrame, convolution_types: ~typing.Optional[tuple[str]] = None, metadata: dict = <factory>, protected: bool = False)[source]

Bases: object

Data contained in an FKTable

Parameters

hadronic (bool) – Whether a hadronic (two PDFs) or a DIS (one PDF) convolution is needed.
Q0 (float) – The scale at which the PDFs should be evaluated (in GeV).
ndata (int) – The number of data points in the grid.
xgrid (array, shape (nx)) – The points in x at which the PDFs should be evaluated.
sigma (pd.DataFrame) –
For hadronic data, the columns are the indexes in the NfxNf list of possible flavour combinations of two PDFs. The MultiIndex contains three keys: the data index, an index into xgrid for the first PDF and an index into xgrid for the second PDF, indicating the points in x where the PDFs should be evaluated.

For DIS data, the columns are indexes in the Nf list of flavours. The MultiIndex contains two keys, the data index and an index into xgrid indicating the points in x where the PDF should be evaluated.
convolution_types (tuple[str]) – The type of convolution that the FkTable is expecting for each of the functions to be convolved with (usually the two types of PDF from the two incoming hadrons).
metadata (dict) – Other information contained in the FKTable.
protected (bool) – When a fktable is protected cuts will not be applied. The most common use-case is when a total cross section is used as a normalization table for a differential cross section, in legacy code (<= NNPDF4.0) both fktables would be cut using the differential index.

Q0: float

convolution_types: Optional[tuple[str]] = None

determine_pdfs(pdf)[source]: Determine the PDF (or PDFs) that should be used to be convoluted with this fktable. Uses the convolution_types key to decide the PDFs. If convolution_types is not defined, it returns the pdf object.

get_np_fktable()[source]

Returns the fktable as a dense numpy array that can be directly manipulated with numpy

The return shape is:

(ndata, nx, nbasis) for DIS
(ndata, nx, nx, nbasis) for hadronic

where nx is the length of the xgrid and nbasis the number of flavour contributions that contribute
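
A dense array with the DIS layout can be contracted with PDF values in a single call. The sketch below uses random numbers in place of a real table; the layout of pdf_grid (member, flavour, x) is an assumption made for this example and not necessarily the one used by validphys.

```python
import numpy as np

ndata, nx, nbasis, nmembers = 3, 4, 2, 5
rng = np.random.default_rng(2)

# Stand-in for a dense DIS table of shape (ndata, nx, nbasis).
fk = rng.random((ndata, nx, nbasis))
# Hypothetical PDF values of shape (nmembers, nbasis, nx).
pdf_grid = rng.random((nmembers, nbasis, nx))

# Sum over x and flavour for every data point and PDF member.
predictions = np.einsum("dxb,mbx->dm", fk, pdf_grid)
```

The result has one row per data point and one column per PDF member, mirroring the layout of the dataframes returned by the prediction functions.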

hadronic: bool

property luminosity_mapping

Return the flavour combinations that contribute to the fktable in the form of a single array

The return shape is:

(nbasis,) for DIS
(nbasis*2,) for hadronic

metadata: dict

ndata: int

protected: bool = False

sigma: DataFrame

with_cfactor(cfactor)[source]: Returns a copy of the FKTableData object with cfactors applied to the fktable

with_cuts(cuts)[source]

Return a copy of the FKTable with the cuts applied. The data index of the sigma operator (the outermost level) contains the data points that have been kept. The ndata property is updated to reflect the new number of data points. If cuts is None, return the object unmodified.

Parameters: cuts (array_like or validphys.core.Cuts or None) – The cuts to be applied.
Returns: res – A copy of the FKTable with the cuts applied.
Return type: FKTableData

Notes

The original number of points can be accessed with table.metadata['GridInfo'].ndata.

Examples

>>> from validphys.fkparser import load_fktable
>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
>>> table = load_fktable(ds.fkspecs[0])
>>> newtable = table.with_cuts([0,1])
>>> assert set(newtable.sigma.index.get_level_values(0)) == {0,1}
>>> assert newtable.ndata == 2
>>> assert newtable.metadata['GridInfo'].ndata == 3

xgrid: ndarray

validphys.correlations module

Utilities for computing correlations in batch.

@author: Zahari Kassabov

validphys.correlations.obs_obs_correlations(pdf, corrpair_results)[source]: Return the theoretical correlation matrix between a pair of observables.

validphys.correlations.obs_pdf_correlations(pdf, results, xplotting_grid)[source]: Return the correlations between each point in a dataset and the PDF values on a grid of (x,f) points in a format similar to xplotting_grid.

validphys.covmats module

Module for handling logic and manipulation of covariance and correlation matrices on different levels of abstraction

validphys.covmats.covmat_from_systematics(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, norm_threshold=None, _central_values=None)[source]

Take the statistical uncertainty and systematics table from a nnpdf_data.coredata.CommonData object and construct the covariance matrix accounting for correlations between systematics.

If the systematic has the name SKIP then it is ignored in the construction of the covariance matrix.

ADDitive or MULTiplicative systypes are handled by either multiplying the additive or multiplicative uncertainties respectively. We convert uncertainties so that they are all in the same units as the data:

Additive (ADD) systematics are left unchanged.

Multiplicative (MULT) systematics need to be converted from a percentage by multiplying by the central value and dividing by 100.

Finally, the systematics are split into the five possible archetypes of systematic uncertainties: uncorrelated (UNCORR), correlated (CORR), theory uncorrelated (THEORYUNCORR), theory correlated (THEORYCORR) and special correlated (SPECIALCORR) systematics.

Uncorrelated contributions from statistical error, uncorrelated and theory uncorrelated are added in quadrature to the diagonal of the covmat.

The contribution to the covariance matrix arising due to correlated systematics is schematically A_correlated @ A_correlated.T, where A_correlated is an N_dat by N_sys matrix. The total contribution from correlated systematics is found by adding together the result of multiplying each correlated systematic matrix by its transpose (correlated, theory_correlated and special_correlated).
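
The construction can be illustrated with a small numerical sketch. All numbers and array names below are invented; the example has three data points, one uncorrelated additive systematic and one correlated multiplicative systematic, and it ignores weights and regularization.

```python
import numpy as np

central = np.array([10.0, 20.0, 30.0])
stat = np.array([0.5, 0.8, 1.0])

# Uncorrelated additive systematic, already in the units of the data.
add_uncorr = np.array([[0.1], [0.2], [0.3]])
# Correlated multiplicative systematic, given in percent.
mult_corr_percent = np.array([[1.0], [2.0], [1.5]])
# Convert percentages to absolute values using the central values.
mult_corr = mult_corr_percent * central[:, np.newaxis] / 100.0

# Uncorrelated pieces add in quadrature on the diagonal.
diagonal = stat**2 + np.sum(add_uncorr**2, axis=1)
# Correlated pieces contribute A @ A.T.
covmat = np.diag(diagonal) + mult_corr @ mult_corr.T
```

Passing alternative central values in the percent conversion, instead of the experimental ones, is how a t0-like matrix would be obtained in this picture.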

For more information on the generation of the covariance matrix see the paper outlining the procedure, specifically equation 2 and surrounding text.

Parameters

loaded_commondata_with_cuts (nnpdf_data.coredata.CommonData) – CommonData which stores information about systematic errors, their treatment and description.
dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
norm_threshold (number) – threshold used to regularize covariance matrix
_central_values (None, np.array) – 1-D array containing alternative central values to combine with the multiplicative errors to calculate their absolute contributions. By default this is None, and the experimental central values are used. However, this can be used to calculate, for example, the t0 covariance matrix by using the predictions from the central member of the t0 pdf.

Returns

cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Return type

np.array

Example

In order to use this function, simply call it from the API

>>> from validphys.api import API
>>> inp = dict(
...     dataset_input={'dataset': 'CMS_Z0J_8TEV_PT-Y', 'cfac':('NRM',)},
...     theoryid=40_000_000,
...     use_cuts="internal"
... )
>>> cov = API.covmat_from_systematics(**inp)
>>> cov.shape
(28, 28)

validphys.covmats.covmat_stability_characteristic(systematics_matrix_from_commondata)[source]

Return a number characterizing the stability of an experimental covariance matrix against uncertainties in the correlation. It is defined as the L2 norm (largest singular value) of the square root of the inverse correlation matrix. This is equivalent to the square root of the inverse of the smallest singular value of the correlation matrix:

Z = (1/λ⁰)^½

Where λ⁰ is the smallest eigenvalue of the correlation matrix.

This is the number used as threshold in calcutils.regularize_covmat(). Roughly, it indicates how precisely the worst-known correlation must be determined for the χ² computed with the covariance matrix not to be meaningfully affected. For example, a stability characteristic of 4 means that correlations need to be known with uncertainties below 0.25.

Examples

>>> from validphys.api import API
>>> ds = {'dataset': 'NMC_NC_NOTFIXED_P_EM-SIGMARED', 'variant': 'legacy'}
>>> API.covmat_stability_characteristic(dataset_input=ds,
... theoryid=40_000_000, use_cuts="internal")
2.742658604186124
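As a rough illustration of the definition Z = (1/λ⁰)^½, the sketch below computes the quantity directly from a covariance matrix. The helper name and the toy matrices are assumptions made for this example; the validphys action takes a systematics matrix instead.

```python
import numpy as np

def stability_characteristic(cov):
    """Z = 1 / sqrt(smallest eigenvalue of the correlation matrix)."""
    sigma = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sigma, sigma)
    smallest = np.linalg.eigvalsh(corr)[0]  # eigvalsh returns ascending order
    return 1.0 / np.sqrt(smallest)

# Uncorrelated data: the correlation matrix is the identity, so Z = 1.
z_uncorrelated = stability_characteristic(np.diag([1.0, 4.0]))

# Strongly correlated data: eigenvalues are 0.1 and 1.9, so Z = 1/sqrt(0.1).
z_correlated = stability_characteristic(np.array([[1.0, 0.9], [0.9, 1.0]]))
```

Larger correlations push the smallest eigenvalue towards zero, which makes Z grow.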

validphys.covmats.dataset_inputs_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, data_input, use_weights_in_covmat=True, norm_threshold=None, _list_of_central_values=None, _only_additive=False)[source]

Given a list containing nnpdf_data.coredata.CommonData s, construct the full covariance matrix.

This is similar to covmat_from_systematics() except that special corr systematics are concatenated across all datasets before being multiplied by their transpose to give off-block-diagonal contributions. The other systematics contribute to the block diagonal in the same way as covmat_from_systematics().

Parameters

dataset_inputs_loaded_cd_with_cuts (list[nnpdf_data.coredata.CommonData]) – list of CommonData objects.
data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
norm_threshold (number) – threshold used to regularize covariance matrix
_list_of_central_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.

Returns

cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Return type

np.array

Example

This function can be called directly from the API:

>>> dsinps = [
...     {'dataset': 'NMC_NC_NOTFIXED_P_EM-SIGMARED', 'variant': 'legacy'},
...     {'dataset': 'ATLAS_TTBAR_7TEV_TOT_X-SEC', 'variant': 'legacy_theory'},
...     {'dataset': 'CMS_Z0J_8TEV_PT-Y', 'cfac':('NRM',)},
... ]
>>> inp = dict(dataset_inputs=dsinps, theoryid=40_000_000, use_cuts="internal")
>>> cov = API.dataset_inputs_covmat_from_systematics(**inp)
>>> cov.shape
(233, 233)

Which properly accounts for all dataset settings and cuts.
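The two ingredients described for this function, concatenating shared systematics across datasets and dividing by sqrt(weight_i)*sqrt(weight_j), can be sketched as follows. All arrays and names here are hypothetical stand-ins, not validphys objects.

```python
import numpy as np

# Two hypothetical datasets with 2 and 1 data points.
block_a = np.array([[1.0, 0.2], [0.2, 2.0]])  # dataset-local covmat
block_b = np.array([[3.0]])
special_a = np.array([[0.1], [0.2]])          # shared systematic, dataset a
special_b = np.array([[0.3]])                 # same systematic, dataset b

n_a, n_b = len(block_a), len(block_b)
cov = np.zeros((n_a + n_b, n_a + n_b))
cov[:n_a, :n_a] = block_a
cov[n_a:, n_a:] = block_b

# Concatenate the shared systematic over datasets, then multiply by its
# transpose: this fills the off-block-diagonal entries.
special = np.vstack([special_a, special_b])
cov += special @ special.T

# Per-point weights: element (i, j) is divided by sqrt(w_i) * sqrt(w_j).
weights = np.array([1.0, 1.0, 4.0])
sqrt_w = np.sqrt(weights)
weighted_cov = cov / np.outer(sqrt_w, sqrt_w)
```

With all weights equal to 1 the last step leaves the matrix unchanged, matching the default behaviour stated above.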

validphys.covmats.dataset_inputs_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]: Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it.

validphys.covmats.dataset_inputs_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]: Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated.

validphys.covmats.dataset_inputs_sqrt_covmat(dataset_inputs_covariance_matrix)[source]: Like sqrt_covmat but for a group of datasets

validphys.covmats.dataset_inputs_stability_table(dataset_inputs_stability, dataset_inputs)[source]: Return a table with covmat_stability_characteristic() for all dataset inputs

validphys.covmats.dataset_inputs_t0_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]

Like t0_covmat_from_systematics() except for all data

Parameters

dataset_inputs_loaded_cd_with_cuts (list[nnpdf_data.coredata.CommonData]) – The CommonData for all datasets defined in dataset_inputs.
data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
dataset_inputs_t0_predictions (list[np.array]) – The t0 predictions for all datasets.

Returns

t0_covmat – t0 covariance matrix for list of datasets.

Return type

np.array

validphys.covmats.dataset_inputs_t0_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]: Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it.

validphys.covmats.dataset_inputs_t0_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]: Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated.

validphys.covmats.dataset_inputs_t0_total_covmat(dataset_inputs_t0_exp_covmat, loaded_theory_covmat)[source]: Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_inputs_t0_total_covmat_separate(dataset_inputs_t0_exp_covmat_separate, loaded_theory_covmat)[source]: Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_inputs_total_covmat(dataset_inputs_exp_covmat, loaded_theory_covmat)[source]: Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_inputs_total_covmat_separate(dataset_inputs_exp_covmat_separate, loaded_theory_covmat)[source]: Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_t0_predictions(t0dataset, t0set)[source]

Returns the t0 predictions for a dataset which are the predictions calculated using the central member of pdf. Note that if pdf has errortype replicas, and the dataset is a hadronic observable then the predictions of the central member are subtly different to the central value of the replica predictions.

Parameters

dataset (validphys.core.DataSetSpec) – dataset for which to calculate t0 predictions
t0set (validphys.core.PDF) – pdf used to calculate the predictions

Returns

t0_predictions – 1-D numpy array with predictions for each of the cut datapoints.

Return type

np.array
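The remark about hadronic observables can be illustrated with a toy model. The sketch assumes that the central member equals the average over replicas and that the observable is quadratic in the PDF; both are simplifications chosen for this example, and the numbers are random.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical PDF values at a single point for 100 replicas.
replicas = rng.normal(loc=1.0, scale=0.1, size=100)
central_member = replicas.mean()

def observable(f):
    """Toy hadronic observable: quadratic in the PDF (two incoming hadrons)."""
    return f**2

prediction_central = observable(central_member)
mean_of_predictions = observable(replicas).mean()

# For a quadratic observable the gap equals the population variance
# of the replica values, so it is small but non-zero.
difference = mean_of_predictions - prediction_central
```

For an observable linear in the PDF the two numbers would coincide, which is why the distinction only matters in the hadronic case.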

validphys.covmats.datasets_covmat_differences_table(each_dataset, datasets_covmat_no_reg, datasets_covmat_reg, norm_threshold)[source]

For each dataset calculate and tabulate two max differences upon regularization given a value for norm_threshold:

max relative difference to the diagonal of the covariance matrix (%)
max absolute difference to the correlation matrix of each covmat

validphys.covmats.dataspecs_datasets_covmat_differences_table(dataspecs_speclabel, dataspecs_covmat_diff_tables)[source]: For each dataspec calculate and tabulate the two covmat differences described in datasets_covmat_differences_table (max relative difference in variance and max absolute correlation difference)

validphys.covmats.fit_name_with_covmat_label(fit, fitthcovmat)[source]: If theory covariance matrix is being used to calculate statistical estimators for the fit then appends (exp + th) onto the fit name for use in legends and column headers to help the user see what covariance matrix was used to produce the plot or table they are looking at.

validphys.covmats.generate_exp_covmat(datasets_input, data, use_weights, norm_threshold, _list_of_c_values, only_add)[source]

Function to generate the experimental covmat, optionally using the t0 prescription. It is also possible to compute it only with the additive errors.

Parameters

dataset_inputs (list[nnpdf_data.coredata.CommonData]) – list of CommonData objects.
data (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights (bool) – Whether to weight the covmat, True by default.
norm_threshold (number) – threshold used to regularize covariance matrix
_list_of_c_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.
only_add (bool) – specifies whether to use only the additive errors to compute the covmat

Returns

np.array
experimental covariance matrix

validphys.covmats.groups_corrmat(groups_covmat)[source]: Generates the grouped experimental correlation matrix with groups_covmat as input

validphys.covmats.groups_covmat(groups_covmat_no_table)[source]: Duplicate of groups_covmat_no_table but with a table decorator.

validphys.covmats.groups_covmat_no_table(groups_data, groups_index, groups_covmat_collection)[source]

Export the covariance matrix for the groups. It exports the full (symmetric) matrix, with the 3 first rows and columns being:

group name

dataset name

index of the point within the dataset.

validphys.covmats.groups_invcovmat(groups_data, groups_index, groups_covmat_collection)[source]: Compute and export the inverse covariance matrix. Note that this inverts the matrices with the LU method which is suboptimal.

validphys.covmats.groups_normcovmat(groups_covmat, groups_data_values)[source]: Calculates the grouped experimental covariance matrix normalised to data.

validphys.covmats.groups_sqrtcovmat(groups_data, groups_index, groups_sqrt_covmat)[source]: Like groups_covmat, but dump the lower triangular part of the Cholesky decomposition as used in the fit. The upper part indices are set to zero.

validphys.covmats.pdferr_plus_covmat(results_without_covmat, pdf, covmat_t0_considered)[source]

For a given dataset, returns the sum of the covariance matrix given by covmat_t0_considered and the PDF error:

If the PDF error_type is ‘replicas’, a covariance matrix is estimated from the replica theory predictions
If the PDF error_type is ‘symmhessian’, a covariance matrix is estimated using formulas from (mc2hessian) https://arxiv.org/pdf/1505.06736.pdf
If the PDF error_type is ‘hessian’ a covariance matrix is estimated using the hessian formula from Eq. 5 of https://arxiv.org/pdf/1401.0013.pdf

Parameters

dataset (DataSetSpec) – object parsed from the dataset_input runcard key
pdf (PDF) – monte carlo pdf used to estimate PDF error
covmat_t0_considered (np.array) – experimental covariance matrix with the t0 considered

Returns

covariance_matrix – sum of the experimental and pdf error as a numpy array

Return type

np.array

Examples

Setting use_pdferr=True makes covariance_matrix resolve to this action:

>>> from validphys.api import API
>>> import numpy as np
>>> inp = {
...     'dataset_input': {
...         'dataset': 'ATLAS_TTBAR_8TEV_LJ_DIF_YTTBAR-NORM',
...         'variant': 'legacy',
...     },
...     'theoryid': 40_000_000,
...     'pdf': 'NNPDF40_nlo_as_01180',
...     'use_cuts': 'internal',
... }
>>> a = API.covariance_matrix(**inp, use_pdferr=True)
>>> b = API.pdferr_plus_covmat(**inp)
>>> (a == b).all()
True
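For the ‘replicas’ case, the idea of estimating a covariance matrix from replica predictions and adding it to the experimental one can be sketched as below. The predictions are random numbers and exp_cov is a stand-in for covmat_t0_considered; the actual estimator used by validphys may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dat, n_rep = 3, 50
# Hypothetical theory predictions: one row per data point, one column per replica.
replica_predictions = rng.normal(loc=10.0, scale=0.5, size=(n_dat, n_rep))

# np.cov treats rows as variables and columns as observations.
pdf_cov = np.cov(replica_predictions)

exp_cov = np.diag([0.2, 0.3, 0.1])  # stand-in for covmat_t0_considered
total_cov = exp_cov + pdf_cov
```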

validphys.covmats.pdferr_plus_dataset_inputs_covmat(dataset_inputs_results_without_covmat, data, pdf, dataset_inputs_covmat_t0_considered, fitthcovmat)[source]: Like pdferr_plus_covmat except for an experiment

validphys.covmats.reorder_thcovmat_as_expcovmat(fitthcovmat, data)[source]: Reorder the thcovmat in such a way to match the order of the experimental covmat, which means the order of the runcard

validphys.covmats.sqrt_covmat(covariance_matrix)[source]

Function that computes the square root of the covariance matrix.

Parameters: covariance_matrix (np.array) – A positive definite covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.
Returns: sqrt_mat – The square root of the input covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts), and which is the lower triangular decomposition. The following should be True: np.allclose(sqrt_covmat @ sqrt_covmat.T, covariance_matrix).
Return type: np.array

Notes

The square root is found by using the Cholesky decomposition. However, rather than finding the decomposition of the covariance matrix directly, the (upper triangular) decomposition is found of the corresponding correlation matrix and then the output of this is rescaled and then transposed as sqrt_matrix = (decomp * sqrt_diags).T, where decomp is the Cholesky decomposition of the correlation matrix and sqrt_diags is the square root of the diagonal entries of the covariance matrix. This method is useful in situations in which the covariance matrix is near-singular. See here for more discussion on this.

The lower triangular is useful for efficient calculation of the \(\chi^2\)

Example

>>> import numpy as np
>>> from validphys.api import API
>>> ds = {'dataset': 'NMC_NC_NOTFIXED_P_EM-SIGMARED', 'variant': 'legacy'}
>>> sqrt_cov = API.sqrt_covmat(dataset_input=ds, theoryid=40_000_000, use_cuts="internal")
>>> cov = API.covariance_matrix(dataset_input=ds, theoryid=40_000_000, use_cuts="internal")
>>> np.allclose(np.linalg.cholesky(cov), sqrt_cov)
True
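The procedure in the Notes can be reproduced with NumPy alone. This is a sketch under the assumption that transposing NumPy's lower Cholesky factor is an acceptable stand-in for the upper triangular decomposition mentioned there; the function name and the toy matrix are made up for the example.

```python
import numpy as np

def sqrt_via_corrmat(cov):
    """Lower triangular square root obtained through the correlation matrix."""
    sqrt_diags = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sqrt_diags, sqrt_diags)
    # np.linalg.cholesky returns the lower factor; transpose to get the upper one.
    decomp = np.linalg.cholesky(corr).T
    # Rescale the columns by the standard deviations, then transpose back.
    return (decomp * sqrt_diags).T

cov = np.array([[4.0, 1.2, 0.4], [1.2, 1.0, 0.3], [0.4, 0.3, 0.25]])
sqrt_mat = sqrt_via_corrmat(cov)

# chi2 = d^T C^{-1} d = |L^{-1} d|^2, so no explicit inverse is needed.
diff = np.array([0.5, -0.2, 0.1])
vec = np.linalg.solve(sqrt_mat, diff)  # a triangular solver would be cheaper
chi2 = vec @ vec
```

The last lines show why the lower triangular factor is convenient for the \(\chi^2\): one linear solve against the factor replaces the inversion of the full matrix.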

validphys.covmats.systematics_matrix_from_commondata(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, _central_values=None)[source]

Returns a systematics matrix, \(A\), for the corresponding dataset. The systematics matrix is a square root of the covmat:

\[C = A A^T\]

and is obtained by concatenating a block diagonal of the uncorrelated uncertainties with the correlated systematics.

validphys.covmats.t0_covmat_from_systematics(loaded_commondata_with_cuts, *, dataset_input, use_weights_in_covmat=True, norm_threshold=None, dataset_t0_predictions)[source]

Like covmat_from_systematics() except uses the t0 predictions to calculate the absolute contributions to the covmat from multiplicative uncertainties. For more info on the t0 predictions see validphys.commondata.dataset_t0_predictions().

Parameters

loaded_commondata_with_cuts (nnpdf_data.coredata.CommonData) – commondata object for which to generate the covmat.
dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
dataset_t0_predictions (np.array) – 1-D array with t0 predictions.

Returns

t0_covmat – t0 covariance matrix

Return type

np.array
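To make the role of the t0 predictions concrete, the sketch below converts one multiplicative systematic into absolute errors using either the experimental central values or the t0 predictions. It assumes the multiplicative error is stored as a percentage of the central value; that convention, the helper name and all numbers are assumptions of this example.

```python
import numpy as np

# Hypothetical inputs: 3 data points and one multiplicative systematic,
# assumed here to be given as a percentage of the central value.
mult_percent = np.array([2.0, 2.0, 2.0])
data_central = np.array([10.0, 20.0, 5.0])
t0_predictions = np.array([11.0, 19.0, 5.5])

def absolute_mult_error(percent, central_values):
    """Absolute size of a multiplicative error for the given central values."""
    return percent * central_values / 100.0

abs_exp = absolute_mult_error(mult_percent, data_central)
abs_t0 = absolute_mult_error(mult_percent, t0_predictions)

# Contribution of this fully correlated systematic to each covmat.
cov_exp = np.outer(abs_exp, abs_exp)
cov_t0 = np.outer(abs_t0, abs_t0)
```

Only the multiplicative part changes between the two matrices; additive errors would enter both in the same way.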

validphys.covmats_utils module

covmat_utils.py

Utils functions for constructing covariance matrices from systematics. Leveraged by validphys.covmats which contains relevant actions/providers.

validphys.covmats_utils.construct_covmat(stat_errors: array, sys_errors: DataFrame)[source]

Basic function to construct a covariance matrix (covmat), given the statistical error and a dataframe of systematics.

Errors with name UNCORR or THEORYUNCORR are added in quadrature with the statistical error to the diagonal of the covmat.

Other systematics are treated as correlated; their covmat contribution is found by multiplying them by their transpose.

Parameters

stat_errors (np.array) – a 1-D array of statistical uncertainties
sys_errors (pd.DataFrame) – a dataframe with shape (N_data * N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.

Notes

This function doesn’t contain any logic to ignore certain contributions to the covmat, if you wanted to not include a particular systematic/set of systematics i.e all uncertainties with MULT errors, then filter those out of sys_errors before passing that to this function.
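The construction described above can be sketched without pandas by passing the systematic names separately. The function below is a hypothetical re-implementation for illustration only, with a different signature from the real construct_covmat.

```python
import numpy as np

def construct_covmat_sketch(stat_errors, sys_errors, sys_names):
    """stat_errors: (N_dat,), sys_errors: (N_dat, N_sys), sys_names: N_sys labels."""
    names = np.asarray(sys_names)
    is_uncorr = np.isin(names, ["UNCORR", "THEORYUNCORR"])
    # Uncorrelated errors add in quadrature with the statistical error.
    diagonal = stat_errors**2 + (sys_errors[:, is_uncorr] ** 2).sum(axis=1)
    # Every other systematic is correlated: multiply by its transpose.
    corr_sys = sys_errors[:, ~is_uncorr]
    return np.diag(diagonal) + corr_sys @ corr_sys.T

stat = np.array([1.0, 2.0])
systematics = np.array([[0.5, 0.1], [0.5, 0.2]])
cov = construct_covmat_sketch(stat, systematics, ["UNCORR", "CORR"])
```

To leave out a set of systematics, one would drop the corresponding columns and names before the call, as the note above suggests.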

validphys.covmats_utils.systematics_matrix(stat_errors: array, sys_errors: DataFrame)[source]

Basic function to create a systematics matrix, \(A\), such that:

\[C = A A^T\]

Where \(C\) is the covariance matrix. This is achieved by creating a block diagonal matrix by adding the uncorrelated systematics in quadrature then taking the square-root and concatenating the correlated systematics, schematically:

Parameters

stat_errors (np.array) – a 1-D array of statistical uncertainties
sys_errors (pd.DataFrame) – a dataframe with shape (N_data * N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.

Notes

This function doesn’t contain any logic to ignore certain contributions to the covmat, if you wanted to not include a particular systematic/set of systematics i.e all uncertainties with MULT errors, then filter those out of sys_errors before passing that to this function.
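A possible NumPy rendering of the systematics matrix follows. The function name and the split into separate uncorrelated and correlated arrays are assumptions of this sketch, not the actual signature.

```python
import numpy as np

def systematics_matrix_sketch(stat_errors, uncorr_sys, corr_sys):
    # Diagonal block: square root of the quadrature sum of uncorrelated errors.
    diag_block = np.diag(np.sqrt(stat_errors**2 + (uncorr_sys**2).sum(axis=1)))
    # Concatenate with the correlated systematics: A is N_dat x (N_dat + N_corr).
    return np.hstack([diag_block, corr_sys])

stat = np.array([1.0, 2.0])
uncorr = np.array([[0.5], [0.5]])
corr = np.array([[0.1], [0.2]])
A = systematics_matrix_sketch(stat, uncorr, corr)
C = A @ A.T  # recovers the covariance matrix
```

Note that \(A\) is not square in general, so it is a square root of \(C\) only in the sense that \(C = A A^T\).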

validphys.dataplots module

Plots of relations between data PDFs and fits.

validphys.dataplots.check_normalize_to(ns, **kwargs)[source]: Transform normalize_to into an index.

validphys.dataplots.kde_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]: KDE plot for experiments chi2.

validphys.dataplots.plot_chi2_eigs(pdf, dataset, chi2_per_eig)[source]

validphys.dataplots.plot_chi2dist(dataset, abs_chi2_data, chi2_stats, pdf)[source]: Plot the distribution of chi²s of the members of the pdfset.

validphys.dataplots.plot_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]: Plot the distribution of chi²s of the members of the pdfset.

validphys.dataplots.plot_chi2dist_sv(dataset, abs_chi2_data_thcovmat, pdf)[source]: Same as plot_chi2dist considering also the theory covmat in the calculation

validphys.dataplots.plot_dataset_inputs_phi_dist(data, dataset_inputs_bootstrap_phi_data)[source]: Generates a bootstrap distribution of phi and then plots a histogram of the individual bootstrap samples for dataset_inputs. By default the number of bootstrap samples is set to a sensible number (500) however this number can be changed by specifying bootstrap_samples in the runcard

validphys.dataplots.plot_datasets_chi2(groups_data, groups_chi2)[source]: Plot the chi² of all datasets with bars.

validphys.dataplots.plot_datasets_chi2_spider(groups_data, groups_chi2)[source]: Plot the chi² of all datasets as a spider plot.

validphys.dataplots.plot_datasets_pdfs_chi2(data, each_dataset_chi2_pdfs, pdfs)[source]: Plot the chi² of all datasets with bars, and for different pdfs.

validphys.dataplots.plot_datasets_pdfs_chi2_sv(data, each_dataset_chi2_pdfs_sv, pdfs)[source]: Same as plot_datasets_pdfs_chi2 with the chi²s computed including scale variations

validphys.dataplots.plot_dataspecs_datasets_chi2(dataspecs_datasets_chi2_table)[source]: Same as plot_fits_datasets_chi2 but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_datasets_chi2_spider(dataspecs_datasets_chi2_table)[source]: Same as plot_fits_datasets_chi2_spider but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_groups_chi2(dataspecs_groups_chi2_table, processed_metadata_group)[source]: Same as plot_fits_groups_data_chi2 but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_groups_chi2_spider(dataspecs_groups_chi2_table)[source]

validphys.dataplots.plot_dataspecs_positivity(dataspecs_speclabel, dataspecs_positivity_predictions, dataspecs_posdataset, pos_use_kin=False)[source]: Like plot_positivity() except plots positivity for each element of dataspecs, allowing positivity predictions to be generated with different theory_id s as well as pdf s

validphys.dataplots.plot_fancy(one_or_more_results, commondata, cuts, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]

Read the PLOTTING configuration for the dataset and generate the corresponding data theory plot.

The input results are assumed to be such that the first one is the data, and the subsequent ones are the predictions for the PDFs. See one_or_more_results. The labelling of the predictions can be influenced by setting the label attribute of theories and pdfs.

normalize_to: should be either ‘data’, a pdf id or an index of the result (0 for the data, and i for the ith pdf). None means plotting absolute values.

See docs/plotting_format.md for details on the format of the PLOTTING files.

validphys.dataplots.plot_fancy_dataspecs(dataspecs_results, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]

General interface for data-theory comparison plots.

The user should define an arbitrary list of mappings called “dataspecs”. In each of these, dataset must resolve to a dataset with the same name (but could be e.g. different theories). The production rule matched_datasets_from_dataspecs may be used for this purpose.

The result will be a plot combining all the predictions from the dataspecs mapping (which could vary in theory, pdf, cuts, etc).

The user can define a “speclabel” key in each dataspec (or only on some). By default, the PDF label will be used in the legend (like in plot_fancy).

normalize_to must be either:

The string ‘data’ or the integer 0 to plot the ratio to data,

or the 1-based index of the dataspec to normalize to the corresponding prediction,

or None (default) to plot absolute values.

A limitation at the moment is that the data cuts and errors will be taken from the first specification.
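The accepted values of normalize_to can be summarised by a small resolver. The function is hypothetical, written only to illustrate the rules listed above, and is not the check performed by validphys.

```python
def resolve_normalize_to(normalize_to, n_dataspecs):
    """Map normalize_to to a result index: 0 is the data, i the i-th dataspec."""
    if normalize_to is None:
        return None  # plot absolute values
    if normalize_to == "data" or normalize_to == 0:
        return 0  # ratio to data
    if isinstance(normalize_to, int) and 1 <= normalize_to <= n_dataspecs:
        return normalize_to  # 1-based index of the dataspec
    raise ValueError(f"Invalid normalize_to: {normalize_to!r}")
```

For example, with two dataspecs, resolve_normalize_to(2, 2) selects the second prediction, while a value of 3 raises a ValueError.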

validphys.dataplots.plot_fancy_sv_dataspecs(dataspecs_results_with_scale_variations, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None)[source]

Exactly the same as plot_fancy_dataspecs but the theoretical results passed down are modified so that the 1-sigma error bands correspond to a combination of the PDF error and the scale variations collected over theoryids

See: validphys.results.results_with_scale_variations()

validphys.dataplots.plot_fits_chi2_spider(fits, fits_groups_chi2, fits_groups_data, processed_metadata_group)[source]: Plots the chi²s of all groups of datasets on a spider/radar diagram.

validphys.dataplots.plot_fits_datasets_chi2(fits_datasets_chi2_table)[source]: Generate a plot equivalent to plot_datasets_chi2 using all the fitted datasets as input.

validphys.dataplots.plot_fits_datasets_chi2_spider(fits_datasets_chi2_table)[source]: Generate a plot equivalent to plot_datasets_chi2_spider using all the fitted datasets as input.

validphys.dataplots.plot_fits_datasets_chi2_spider_bygroup(fits_datasets_chi2_table)[source]: Same as plot_fits_datasets_chi2_spider but one plot for each group.

validphys.dataplots.plot_fits_groups_data_chi2(fits_groups_chi2_table, processed_metadata_group)[source]: Generate a plot equivalent to plot_groups_data_chi2 using all the fitted group of data as input.

validphys.dataplots.plot_fits_groups_data_phi(fits_groups_phi_table, processed_metadata_group)[source]: Plots a set of bars for each fit, each bar represents the value of phi for the corresponding group of datasets, which is defined according to the keys in the PLOTTING info file

validphys.dataplots.plot_fits_phi_spider(fits, fits_groups_data, fits_groups_data_phi, processed_metadata_group)[source]: Like plot_fits_chi2_spider but for phi.

validphys.dataplots.plot_groups_data_chi2(groups_data, groups_chi2, processed_metadata_group)[source]: Plot the chi² of all groups of datasets with bars.

validphys.dataplots.plot_groups_data_chi2_spider(groups_data, groups_chi2, processed_metadata_group, pdf)[source]: Plot the chi² of all groups of datasets as a spider plot.

validphys.dataplots.plot_groups_data_phi_spider(groups_data, groups_data_phi, processed_metadata_group, pdf)[source]: Plot the phi of all groups of datasets as a spider plot.

validphys.dataplots.plot_obscorrs(corrpair_datasets, obs_obs_correlations, pdf)[source]: NOTE: EXPERIMENTAL. Plot the correlation matrix between a pair of datasets.

validphys.dataplots.plot_orbital_momentum(pdf, Q, partial_polarized_sum_rules)[source]: In addition to plotting the correlated spin moments as in plot_polarized_momentum, it also plots the contributions from the Orbital Angular Momentum.

validphys.dataplots.plot_phi(groups_data, groups_data_phi, processed_metadata_group)[source]

plots phi for each group of data as a bar for a single PDF input

See phi_data for information on how phi is calculated

validphys.dataplots.plot_phi_scatter_dataspecs(dataspecs_groups, dataspecs_speclabel, dataspecs_groups_bootstrap_phi)[source]: For each of the dataspecs, a bootstrap distribution of phi is generated for all specified groups of datasets. The distribution is then represented as a scatter point which is the median of the bootstrap distribution and an errorbar which spans the 68% confidence interval. By default the number of bootstrap samples is set to a sensible value, however it can be controlled by specifying bootstrap_samples in the runcard.
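The summary of a bootstrap distribution by its median and 68% interval can be sketched as follows. The samples are random numbers standing in for bootstrap values of phi, and reading the 68% interval as the 16th to 84th percentile range is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical bootstrap samples of phi for one group of datasets.
phi_samples = rng.normal(loc=0.8, scale=0.05, size=500)

median = np.median(phi_samples)
# Central 68% interval: 16th to 84th percentile.
lower, upper = np.percentile(phi_samples, [16, 84])
# Asymmetric error bar lengths, in the (2, N) layout matplotlib's errorbar accepts.
yerr = np.array([[median - lower], [upper - median]])
```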

validphys.dataplots.plot_polarized_momentum(pdf, Q, partial_polarized_sum_rules, angular_momentum=False)[source]: Plot the correlated uncertainties for the truncated integrals of the polarized gluon and singlet distributions.

validphys.dataplots.plot_positivity(pdfs, positivity_predictions_for_pdfs, posdataset, pos_use_kin=False)[source]

Plot an errorbar spanning the central 68% CI of a positivity observable as well as a point indicating the central value (according to the pdf.stats_class.central_value()).

Errorbars and points are plotted on a symlog scale as a function of the data point index (if pos_use_kin==False) or the first kinematic variable (if pos_use_kin==True).

validphys.dataplots.plot_replica_sum_rules(pdf, sum_rules, Q)[source]: Plot the value of each sum rule as a function of the replica index

validphys.dataplots.plot_smpdf(pdf, dataset, obs_pdf_correlations, mark_threshold: float = 0.9)[source]

Plot the correlations between the change in the observable and the change in the PDF in (x,fl) space.

mark_threshold is the proportion of the maximum absolute correlation that will be used to mark the corresponding area in x in the background of the plot. The maximum absolute values are used for the comparison.

Examples

>>> from validphys.api import API
>>> data_input = {
...     "dataset_input": {"dataset": "HERACOMBNCEP920"},
...     "theoryid": 200,
...     "use_cuts": "internal",
...     "pdf": "NNPDF40_nnlo_as_01180",
...     "Q": 1.6,
...     "mark_threshold": 0.2
... }
>>> smpdf_gen = API.plot_smpdf(**data_input)
>>> fig = next(smpdf_gen)
>>> fig.show()
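How mark_threshold selects the region to mark can be sketched with a toy array of correlations. The values and variable names are illustrative only.

```python
import numpy as np

x = np.linspace(0.01, 1.0, 8)
# Hypothetical correlations between an observable and one PDF flavour.
corr = np.array([0.05, -0.3, 0.6, 0.95, -1.0, 0.4, 0.1, 0.0])

mark_threshold = 0.9
# Points whose absolute correlation reaches the given fraction of the maximum.
limit = mark_threshold * np.max(np.abs(corr))
marked = np.abs(corr) >= limit
marked_x = x[marked]
```

Because absolute values are compared, strongly anticorrelated regions are marked as well.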

validphys.dataplots.plot_training_length(replica_data, fit)[source]: Generate a histogram for the distribution of training lengths in a given fit. Each bin is normalised by the total number of replicas.

validphys.dataplots.plot_training_validation(fit, replica_data, replica_filters=None)[source]: Scatter plot with the training and validation chi² for each replica in the fit. The mean is also displayed as well as a line y=x to easily identify whether training or validation chi² is larger.

validphys.dataplots.plot_trainvaliddist(fit, replica_data)[source]: KDEs for the training and validation distributions for each replica in the fit.

validphys.dataplots.plot_xq2(dataset_inputs_by_groups_xq2map, use_cuts, data_input, display_cuts: bool = True, marker_by: str = 'process type', highlight_label: str = 'highlight', highlight_datasets: (<class 'collections.abc.Sequence'>, <class 'NoneType'>) = None, aspect: str = 'landscape')[source]

Plot the (x,Q²) coverage of the data, based on some LO approximations. These are governed by the relevant kintransform.

The representation of the filtered data depends on the display_cuts and use_cuts options:

If cuts are disabled (use_cuts is CutsPolicy.NOCUTS), all the data will be plotted (and setting display_cuts to True is an error).
If cuts are enabled (use_cuts is either CutsPolicy.FROMFIT or CutsPolicy.INTERNAL) and display_cuts is False, the masked points will be ignored.
If cuts are enabled and display_cuts is True, the filtered points will be displayed and marked.

The points are grouped according to the marker_by option. The possible values are: “process type”, “experiment”, “group” or “dataset”.

Some datasets can be made to appear highlighted in the figure: Define a key called highlight_datasets containing the names of the datasets to be highlighted and a key highlight_label with a string containing the label of the highlight, which will appear in the legend.

Example

Obtain a plot with some reasonable defaults:

>>> from validphys.api import API
>>> inp = {
...     'dataset_inputs': [
...         {'dataset': 'NMC_NC_NOTFIXED_EM-F2', 'variant': 'legacy_dw'},
...         {'dataset': 'NMC_NC_NOTFIXED_P_EM-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'SLAC_NC_NOTFIXED_P_EM-F2', 'variant': 'legacy_dw'},
...         {'dataset': 'SLAC_NC_NOTFIXED_D_EM-F2', 'variant': 'legacy_dw'},
...         {'dataset': 'BCDMS_NC_NOTFIXED_P_EM-F2', 'variant': 'legacy_dw'},
...         {'dataset': 'BCDMS_NC_NOTFIXED_D_EM-F2', 'variant': 'legacy_dw'},
...         {'dataset': 'CHORUS_CC_NOTFIXED_PB_NU-SIGMARED', 'variant': 'legacy_dw'},
...         {'dataset': 'CHORUS_CC_NOTFIXED_PB_NB-SIGMARED', 'variant': 'legacy_dw'},
...         {'dataset': 'NUTEV_CC_NOTFIXED_FE_NU-SIGMARED', 'cfac': ['MAS'], 'variant': 'legacy_dw'},
...         {'dataset': 'NUTEV_CC_NOTFIXED_FE_NB-SIGMARED', 'cfac': ['MAS'], 'variant': 'legacy_dw'},
...         {'dataset': 'HERA_NC_318GEV_EM-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_NC_225GEV_EP-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_NC_251GEV_EP-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_NC_300GEV_EP-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_NC_318GEV_EP-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_CC_318GEV_EM-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_CC_318GEV_EP-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_NC_318GEV_EAVG_CHARM-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'HERA_NC_318GEV_EAVG_BOTTOM-SIGMARED', 'variant': 'legacy'},
...         {'dataset': 'DYE866_Z0_800GEV_DW_RATIO_PDXSECRATIO', 'variant': 'legacy'},
...         {'dataset': 'DYE866_Z0_800GEV_PXSEC', 'variant': 'legacy'},
...         {'dataset': 'DYE605_Z0_38P8GEV_DW_PXSEC', 'variant': 'legacy'},
...         {'dataset': 'DYE906_Z0_120GEV_DW_PDXSECRATIO', 'cfac': ['ACC'], 'variant': 'legacy'},
...         {'dataset': 'CDF_Z0_1P96TEV_ZRAP', 'variant': 'legacy'},
...         {'dataset': 'D0_Z0_1P96TEV_ZRAP', 'variant': 'legacy'},
...         {'dataset': 'D0_WPWM_1P96TEV_ASY', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_WPWM_7TEV_36PB_ETA', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_7TEV_36PB_ETA', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_7TEV_49FB_HIMASS', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_7TEV_LOMASS_M', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_WPWM_7TEV_46FB_CC-ETA', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_7TEV_46FB_CC-Y', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_7TEV_46FB_CF-Y', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_8TEV_HIMASS_M-Y', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_8TEV_LOWMASS_M-Y', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0_13TEV_TOT', 'cfac': ['NRM'], 'variant': 'legacy'},
...         {'dataset': 'ATLAS_WPWM_13TEV_TOT', 'cfac': ['NRM'], 'variant': 'legacy'},
...         {'dataset': 'ATLAS_WJ_8TEV_WP-PT', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_WJ_8TEV_WM-PT', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_Z0J_8TEV_PT-M', 'variant': 'legacy_10'},
...         {'dataset': 'ATLAS_Z0J_8TEV_PT-Y', 'variant': 'legacy_10'},
...         {'dataset': 'ATLAS_TTBAR_7TEV_TOT_X-SEC', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_TTBAR_8TEV_TOT_X-SEC', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_TTBAR_13TEV_TOT_X-SEC', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_TTBAR_8TEV_LJ_DIF_YT-NORM', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_TTBAR_8TEV_LJ_DIF_YTTBAR-NORM', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_TTBAR_8TEV_2L_DIF_YTTBAR-NORM', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_1JET_8TEV_R06_PTY', 'variant': 'legacy_decorrelated'},
...         {'dataset': 'ATLAS_2JET_7TEV_R06_M12Y', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_PH_13TEV_XSEC', 'cfac': ['EWK'], 'variant': 'legacy'},
...         {'dataset': 'ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_SINGLETOP_7TEV_T-Y-NORM', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_SINGLETOP_8TEV_T-RAP-NORM', 'variant': 'legacy'},
...         {'dataset': 'ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM', 'variant': 'legacy'},
...         {'dataset': 'CMS_WPWM_7TEV_ELECTRON_ASY'},
...         {'dataset': 'CMS_WPWM_7TEV_MUON_ASY', 'variant': 'legacy'},
...         {'dataset': 'CMS_Z0_7TEV_DIMUON_2D'},
...         {'dataset': 'CMS_WPWM_8TEV_MUON_Y', 'variant': 'legacy'},
...         {'dataset': 'CMS_Z0J_8TEV_PT-Y', 'cfac': ['NRM'], 'variant': 'legacy_10'},
...         {'dataset': 'CMS_2JET_7TEV_M12Y'},
...         {'dataset': 'CMS_1JET_8TEV_PTY', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_7TEV_TOT_X-SEC', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_8TEV_TOT_X-SEC', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_13TEV_TOT_X-SEC', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_8TEV_LJ_DIF_YTTBAR-NORM', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_5TEV_TOT_X-SEC', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_8TEV_2L_DIF_MTTBAR-YT-NORM', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_13TEV_2L_DIF_YT', 'variant': 'legacy'},
...         {'dataset': 'CMS_TTBAR_13TEV_LJ_2016_DIF_YT', 'variant': 'legacy'},
...         {'dataset': 'CMS_SINGLETOP_7TEV_TCHANNEL-XSEC', 'variant': 'legacy'},
...         {'dataset': 'CMS_SINGLETOP_8TEV_TCHANNEL-XSEC', 'variant': 'legacy'},
...         {'dataset': 'CMS_SINGLETOP_13TEV_TCHANNEL-XSEC', 'variant': 'legacy'},
...         {'dataset': 'LHCB_Z0_7TEV_DIELECTRON_Y'},
...         {'dataset': 'LHCB_Z0_8TEV_DIELECTRON_Y'},
...         {'dataset': 'LHCB_WPWM_7TEV_MUON_Y', 'cfac': ['NRM']},
...         {'dataset': 'LHCB_Z0_7TEV_MUON_Y', 'cfac': ['NRM']},
...         {'dataset': 'LHCB_WPWM_8TEV_MUON_Y', 'cfac': ['NRM']},
...         {'dataset': 'LHCB_Z0_8TEV_MUON_Y', 'cfac': ['NRM']},
...         {'dataset': 'LHCB_Z0_13TEV_DIMUON-Y'},
...         {'dataset': 'LHCB_Z0_13TEV_DIELECTRON-Y'}
...     ],
...     'use_cuts': 'internal',
...     'display_cuts': False,
...     'theoryid': 40_000_000,
...     'highlight_label': 'Tevatron',
...     'highlight_datasets': ['CDF_Z0_1P96TEV_ZRAP', 'D0_Z0_1P96TEV_ZRAP', 'D0_WPWM_1P96TEV_ASY']
... }
>>> API.plot_xq2(**inp)
<Figure size 1024x768 with 1 Axes>

validphys.deltachi2 module

deltachi2.py

Plots and data processing that can be used in a delta chi2 analysis

class validphys.deltachi2.PDFEpsilonPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

Subclass of PDFPlotter that plots epsilon (a measure of Gaussianity) for multiple PDFs, yielding a separate figure for each flavour.

draw(pdf, grid, flstate)[source]: Obtains the gridvalues of epsilon (measure of Gaussianity)

get_ylabel(parton_name)[source]

legend(flstate)[source]

setup_flavour(flstate)[source]

validphys.deltachi2.check_pdf_is_symmhessian(pdf, **kwargs)[source]: Check pdf has error type of symmhessian

validphys.deltachi2.check_pdfs_are_montecarlo(pdfs, **kwargs)[source]: Checks that the action is applied only to a pdf consisting of MC replicas.

validphys.deltachi2.delta_chi2_hessian(pdf, total_chi2_data)[source]: Return delta_chi2 (computed as in plot_delta_chi2_hessian) relative to each eigenvector of the Hessian set.

validphys.deltachi2.plot_delta_chi2_hessian_distribution(delta_chi2_hessian, pdf, total_chi2_data)[source]: Plot of the chi2 difference between chi2 of each eigenvector of a symmHessian set and the central value for all experiments in a fit. As a function of every eigenvector in a first plot, and as a distribution in a second plot.

validphys.deltachi2.plot_delta_chi2_hessian_eigenv(delta_chi2_hessian, pdf)[source]: Plot of the chi2 difference between chi2 of each eigenvector of a symmHessian set and the central value for all experiments in a fit. As a function of every eigenvector in a first plot, and as a distribution in a second plot.

validphys.deltachi2.plot_epsilon(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, eps=None)[source]

Plot the discrepancy (epsilon) of the 1-sigma and 68% bands at each grid value for all pdfs for a given Q. See https://arxiv.org/abs/1505.06736 eq. (11)

xscale is read from pdf plotting_grid scale, which is ‘log’ by default.

eps defines the value at which to plot a simple horizontal line.

validphys.deltachi2.plot_kullback_leibler(delta_chi2_hessian)[source]

Determines the Kullback–Leibler divergence by comparing the expectation value of Delta chi2 to the cumulative distribution function of chi-square distribution with one degree of freedom (see: https://en.wikipedia.org/wiki/Chi-square_distribution).

The Kullback–Leibler divergence provides a measure of the difference between two distribution functions. Here we compare the chi-squared distribution and the cumulative distribution of the expectation value of Delta chi2.
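As a rough illustration of the quantity involved, the sketch below computes a discrete Kullback–Leibler divergence between a binned sample of Delta chi2 values and the chi-square distribution with one degree of freedom. It is not the validphys implementation: the helper names, the binning and the sample values are all assumptions made for the example.

```python
import math


def chi2_1dof_cdf(x):
    """CDF of the chi-square distribution with one degree of freedom (x >= 0)."""
    return math.erf(math.sqrt(x / 2.0))


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q); bins with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def binned_kl_to_chi2(samples, edges):
    """Histogram `samples` on `edges` and compare with chi2 (1 dof) on the same bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for value in samples:
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    p = [c / total for c in counts]
    # Probability of each bin under chi2 (1 dof), renormalised to the binned range
    q_raw = [chi2_1dof_cdf(edges[i + 1]) - chi2_1dof_cdf(edges[i]) for i in range(n_bins)]
    norm = sum(q_raw)
    q = [v / norm for v in q_raw]
    return kl_divergence(p, q)


delta_chi2 = [0.05, 0.2, 0.4, 0.9, 1.3, 2.1, 3.5]  # made-up values
edges = [0.0, 0.5, 1.0, 2.0, 4.0]                  # hypothetical, non-negative bin edges
divergence = binned_kl_to_chi2(delta_chi2, edges)
```

The divergence is zero only when the two binned distributions coincide and grows as they differ.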

validphys.deltachi2.plot_pos_neg_pdfs(pdf, pos_neg_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None)[source]: Plot the uncertainty of the original Hessian pdfs, as well as that of the positive and negative subsets.

validphys.deltachi2.pos_neg_xplotting_grids(delta_chi2_hessian, xplotting_grid)[source]: Generates xplotting_grids corresponding to positive and negative delta chi2s.

validphys.eff_exponents module

Tools for computing and plotting effective exponents.

class validphys.eff_exponents.ExponentBandPlotter(hlines, exponent, *args, **kwargs)[source]

Bases: BandPDFPlotter, PreprocessingPlotter

draw(pdf, grid, flstate)[source]

Overload BandPDFPlotter.draw() to plot bands of the effective exponent calculated from the replicas and horizontal lines for the effective exponents of the previous/next fits, if possible.

flstate is an element of the flavours for the first pdf specified in pdfs. If this flavour doesn’t exist in the current pdf’s fitbasis or the set of flavours for which the preprocessing exponents exist for the current pdf no horizontal lines are plotted.

class validphys.eff_exponents.PreprocessingPlotter(exponent, *args, **kwargs)[source]

Bases: PDFPlotter

Class inheriting from PDFPlotter, changing title and ylabel to reflect the effective exponent being plotted.

get_title(parton_name)[source]

get_ylabel(parton_name)[source]

validphys.eff_exponents.alpha_eff(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return a list of xplotting_grids containing the value of the effective exponent alpha at the specified values of x and flavour. alpha is relevant at small x, hence the linear scale.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.

validphys.eff_exponents.beta_eff(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return a list of xplotting_grids containing the value of the effective exponent beta at the specified values of x and flavour. beta is relevant at large x, hence the linear scale.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.

validphys.eff_exponents.effective_exponents_table_internal(next_effective_exponents_table, *, fit=None, basis)[source]

Returns a table which concatenates previous_effective_exponents_table and next_effective_exponents_table if both tables contain effective exponents in the same basis.

If the previous exponents are in a different basis, or no fit was given to read the previous exponents from, then only the next exponents table is returned, for plotting purposes.

validphys.eff_exponents.fmt(a)

validphys.eff_exponents.get_alpha_lines(effective_exponents_table_internal)[source]: Given an effective_exponents_table_internal returns the rows with bounds of the alpha effective exponent for all flavours, used to plot horizontal lines on the alpha effective exponent plots.

validphys.eff_exponents.get_beta_lines(effective_exponents_table_internal)[source]: Same as get_alpha_lines but for beta

validphys.eff_exponents.iterate_preprocessing_yaml(fit, next_fit_eff_exps_table, _flmap_np_clip_arg=None)[source]

Using next_effective_exponents_table(), update the preprocessing exponents of the input fit. This is part of the usual pipeline referred to as “iterating a fit”; for more information see: How to run an iterated fit. A fully iterated runcard can be obtained from the action iterated_runcard_yaml().

This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

```yaml
{@iterate_preprocessing_yaml@}
```

Alternatively, using the API, the yaml dump returned by this function can be written to a file, e.g.

>>> from validphys.api import API
>>> yaml_output = API.iterate_preprocessing_yaml(fit=<fit name>)
>>> with open("output.yml", "w+") as f:
...     f.write(yaml_output)

Parameters

fit (validphys.core.FitSpec) – Whose preprocessing range will be iterated, the output runcard will be the same as the one used to run this fit, except with new preprocessing range.
next_fit_eff_exps_table (pd.DataFrame) – Table outputted by next_fit_eff_exps_table() containing the next preprocessing ranges.
_flmap_np_clip_arg (dict) – Internal argument used by vp-nextfitruncard. Dictionary containing a mapping like {<flavour>: {<largex/smallx>: {a_min: <min value>, a_max: <max value>}}}. If a flavour is present in _flmap_np_clip_arg then the preprocessing ranges will be passed through np.clip with the arguments supplied in the mapping.
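
The clipping described for _flmap_np_clip_arg can be pictured with the sketch below. It is only an illustration under assumptions: the helper clip_ranges, the layout of the ranges dictionary and the sample numbers are hypothetical, and a plain min/max clamp stands in for np.clip on scalars.

```python
def clamp(value, a_min, a_max):
    """Scalar stand-in for np.clip(value, a_min, a_max), assuming a_min <= a_max."""
    return max(a_min, min(value, a_max))


def clip_ranges(ranges, flmap_clip):
    """Clip each (low, high) preprocessing range for the flavours listed in flmap_clip.

    ranges:     {flavour: {"smallx": (low, high), "largex": (low, high)}}  (hypothetical layout)
    flmap_clip: {flavour: {"smallx"/"largex": {"a_min": ..., "a_max": ...}}}
    """
    clipped = {}
    for flavour, exps in ranges.items():
        clipped[flavour] = {}
        for key, (low, high) in exps.items():
            limits = flmap_clip.get(flavour, {}).get(key)
            if limits is None:
                # Flavour or exponent not present in the mapping: leave untouched
                clipped[flavour][key] = (low, high)
            else:
                clipped[flavour][key] = (clamp(low, **limits), clamp(high, **limits))
    return clipped


ranges = {
    "g": {"smallx": (-0.5, 2.4), "largex": (1.0, 6.0)},
    "sng": {"smallx": (0.9, 1.2), "largex": (1.5, 3.0)},
}
clip_arg = {"g": {"smallx": {"a_min": 0.0, "a_max": 2.0}}}
result = clip_ranges(ranges, clip_arg)
```

Only the small-x range of "g" is altered here; every range without an entry in the mapping passes through unchanged.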

validphys.eff_exponents.iterated_runcard_yaml(fit, update_runcard_description_yaml)[source]

Takes the runcard with preprocessing iterated and description updated, then:

- Updates the t0 pdf, the fiatlux pdf, and the theory covmat pdf to be fit
- Modifies the random seeds (to random unsigned long ints)

This should facilitate running a new fit with identical input settings as the specified fit with the t0, seeds and preprocessing iterated. For more information see: How to run an iterated fit

This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

```yaml
{@iterated_runcard_yaml@}
```

Alternatively, using the API, the yaml dump returned by this function can be written to a file, e.g.

>>> from validphys.api import API
>>> yaml_output = API.iterated_runcard_yaml(
...     fit=<fit name>,
...     _updated_description="My iterated fit"
... )
>>> with open("output.yml", "w+") as f:
...     f.write(yaml_output)

validphys.eff_exponents.next_effective_exponents_table(pdf: ~validphys.core.PDF, *, fitq0fromfit: (<class 'numbers.Real'>, <class 'NoneType'>) = None, x1_alpha: ~numbers.Real = 1e-06, x2_alpha: ~numbers.Real = 0.001, x1_beta: ~numbers.Real = 0.65, x2_beta: ~numbers.Real = 0.95, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Given a PDF, calculate the next effective exponents

By default x1_alpha = 1e-6, x2_alpha = 1e-3, x1_beta = 0.65, and x2_beta = 0.95, but different values can be specified in the runcard. The values control where the bounds of alpha and beta are evaluated:

alpha_min:
    singlet/gluon: the 2x68% c.l. lower value evaluated at x=x1_alpha
    others: min(2x68% c.l. lower value evaluated at x=x1_alpha and x=x2_alpha)
alpha_max:
    singlet/gluon: min(2 and the 2x68% c.l. upper value evaluated at x=x1_alpha)
    others: min(2 and max(2x68% c.l. upper value evaluated at x=x1_alpha and x=x2_alpha))
beta_min:
    max(0 and min(2x68% c.l. lower value evaluated at x=x1_beta and x=x2_beta))
beta_max:
    max(2x68% c.l. upper value evaluated at x=x1_beta and x=x2_beta)
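
The bounds listed above can be written out as a small sketch. It assumes the 2x68% c.l. lower and upper values at the two x points have already been computed; the function name, the flavour labels and the numbers are hypothetical and only illustrate the min/max logic.

```python
def next_bounds(flavour, alpha_low, alpha_up, beta_low, beta_up):
    """Combine precomputed 2x68% c.l. values into the next exponent bounds.

    alpha_low / alpha_up: (value at x1_alpha, value at x2_alpha)
    beta_low / beta_up:   (value at x1_beta,  value at x2_beta)
    """
    if flavour in ("singlet", "gluon"):  # hypothetical labels for singlet/gluon
        alpha_min = alpha_low[0]
        alpha_max = min(2, alpha_up[0])
    else:
        alpha_min = min(alpha_low)
        alpha_max = min(2, max(alpha_up))
    beta_min = max(0, min(beta_low))
    beta_max = max(beta_up)
    return alpha_min, alpha_max, beta_min, beta_max


gluon = next_bounds("gluon", (0.1, 0.3), (2.5, 1.8), (-0.2, 1.0), (4.0, 5.5))
valence = next_bounds("V", (0.4, 0.2), (1.1, 1.6), (0.7, 1.0), (4.0, 5.5))
```

For the gluon only the value at x1_alpha enters the alpha bounds, and alpha_max is capped at 2; for other flavours both x points are compared.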

validphys.eff_exponents.plot_alpha_eff(fits_pdf, alpha_eff_fits, fits_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the alpha effective or its corresponding index in the list, starting from one, or None to plot absolute values.

xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.

validphys.eff_exponents.plot_alpha_eff_internal(pdfs, alpha_eff_pdfs, pdfs_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the alpha effective or its corresponding index in the list, starting from one, or None to plot absolute values.

validphys.eff_exponents.plot_beta_eff(fits_pdf, beta_eff_fits, fits_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]: Same as plot_alpha_eff but for beta effective exponents

validphys.eff_exponents.plot_beta_eff_internal(pdfs, beta_eff_pdfs, pdfs_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]: Same as plot_alpha_eff_internal but for beta effective exponent

validphys.eff_exponents.previous_effective_exponents(basis: str, fit: (<class 'validphys.core.FitSpec'>, <class 'NoneType'>) = None)[source]: If provided with a fit, check that the basis is the one that was fitted and, if so, return the previous effective exponents read from the fit runcard.

validphys.eff_exponents.previous_effective_exponents_table(fit: FitSpec)[source]: Given a fit, reads the previous exponents from the fit runcard

validphys.eff_exponents.update_runcard_description_yaml(iterate_preprocessing_yaml, _updated_description=None)[source]

Take the runcard with iterated preprocessing and update the description if _updated_description is provided. As with iterate_preprocessing_yaml() the result can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

```yaml
{@update_runcard_description_yaml@}
```

validphys.filters module

Filters for NNPDF fits

class validphys.filters.AddedFilterRule(dataset: str = None, process_type: str = None, rule: str = None, reason: str = None, local_variables: Mapping[str, Union[str, float]] = None, PTO: str = None, FNS: str = None, IC: str = None)[source]

Bases: FilterRule

Dataclass which carries extra filter rule that is added to the default rule.

exception validphys.filters.BadPerturbativeOrder[source]

Bases: ValueError

Exception raised when the perturbative order string is not recognized.

exception validphys.filters.FatalRuleError[source]

Bases: Exception

Exception raised when a rule application failed at runtime.

class validphys.filters.FilterDefaults(q2min: float = None, w2min: float = None, maxTau: float = None)[source]

Bases: object

Dataclass carrying default values for filters (cuts) taking into account the values of q2min, w2min and maxTau.

maxTau: float = None

q2min: float = None

to_dict()[source]

w2min: float = None

class validphys.filters.FilterRule(dataset: str = None, process_type: str = None, rule: str = None, reason: str = None, local_variables: Mapping[str, Union[str, float]] = None, PTO: str = None, FNS: str = None, IC: str = None)[source]

Bases: object

Dataclass which carries the filter rule information.

FNS: str = None

IC: str = None

PTO: str = None

dataset: str = None

local_variables: Mapping[str, Union[str, float]] = None

process_type: str = None

reason: str = None

rule: str = None

to_dict()[source]

exception validphys.filters.MissingRuleAttribute[source]

Bases: RuleProcessingError, AttributeError

Exception raised when a rule is missing required attributes.

class validphys.filters.PerturbativeOrder(string)[source]

Bases: object

Class that conveniently handles perturbative order declarations for use within the Rule class filter.

Parameters

string (str) –

A string in the format of NNLO or equivalently N2LO. This can be followed by one of ! + - or none.

The syntax allows for rules to be executed only if the perturbative order is within a given range. The following enumerates all 4 cases as an example:

NNLO+  only execute the following rule if the pto is 2 or greater
NNLO-  only execute the following rule if the pto is strictly less than 2
NNLO!  only execute the following rule if the pto is strictly not 2
NNLO   only execute the following rule if the pto is exactly 2

Any unrecognized string will raise a BadPerturbativeOrder exception.

Example

>>> from validphys.filters import PerturbativeOrder
>>> pto = PerturbativeOrder("NNLO+")
>>> pto.numeric_pto
2
>>> 1 in pto
False
>>> 2 in pto
True
>>> 3 in pto
True
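
To make the four cases concrete, here is a minimal stand-alone sketch of how such a string could be parsed and queried. It is not the validphys class: PtoRange and its regular expression are hypothetical, and it raises a plain ValueError where the real class raises BadPerturbativeOrder.

```python
import re

# "LO", "NLO", "NNLO", ... or the "N<k>LO" shorthand, optionally followed by !, + or -
_PTO_RE = re.compile(r"^(?:(N*)|N(\d+))LO([!+-]?)$")


class PtoRange:
    """Hypothetical stand-in mimicking the documented behaviour of PerturbativeOrder."""

    def __init__(self, string):
        match = _PTO_RE.match(string.strip().upper())
        if match is None:
            raise ValueError(f"Unrecognized perturbative order: {string!r}")
        n_letters, n_digits, self.operator = match.groups()
        # "NNLO" counts the leading N's; "N2LO" reads the number directly
        self.numeric_pto = int(n_digits) if n_digits is not None else len(n_letters)

    def __contains__(self, pto):
        if self.operator == "+":
            return pto >= self.numeric_pto
        if self.operator == "-":
            return pto < self.numeric_pto
        if self.operator == "!":
            return pto != self.numeric_pto
        return pto == self.numeric_pto


nnlo_plus = PtoRange("NNLO+")
n2lo = PtoRange("N2LO")
```

Supporting the `in` operator through `__contains__` keeps the rule check readable, for example `theory_pto in PtoRange("NNLO-")`.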

parse()[source]

class validphys.filters.Rule(initial_data: FilterRule, *, defaults: dict, theory_parameters: dict, loader=None)[source]

Bases: object

Rule object to be used to generate cuts mask.

A rule object is created for each rule in ./cuts/filters.yaml

Old commondata relied on the order of the kinematical variables being the same as specified in the KIN_LABEL dictionary set in this module. The new commondata specification instead defines the names of the variables explicitly in the metadata. Therefore, when using a new-format commondata, the KIN_LABEL dictionary will not be used and the variables defined in the metadata will be used instead.

Parameters

initial_data (dict) –
A dictionary containing all the information regarding the rule. This contains the name of the dataset the rule applies to and/or the process type the rule applies to. Additionally, the rule itself is defined, alongside the reason the rule is used. Finally, the user can optionally define their own custom local variables.

By default these are defined in cuts/filters.yaml
defaults (dict) –
A dictionary containing default values to be used globally in all rules.

By default these are defined in cuts/defaults.yaml
theory_parameters – Dict containing pairs of (theory_parameter, value)
loader (validphys.loader.Loader, optional) – A loader instance used to retrieve the datasets.

numpy_functions = {'fabs': <ufunc 'fabs'>, 'log': <ufunc 'log'>, 'sqrt': <ufunc 'sqrt'>}

exception validphys.filters.RuleProcessingError[source]

Bases: Exception

Exception raised when we couldn’t process a rule.

validphys.filters.check_additional_errors(additional_errors)[source]: Lux additional errors pdf check

validphys.filters.check_integrability(integdatasets)[source]: Verify integrability datasets are ready for the fit.

validphys.filters.check_luxset(luxset)[source]: Lux pdf check

validphys.filters.check_nonnegative(var: str)[source]: Ensure that var is non-negative

validphys.filters.check_positivity(posdatasets)[source]: Verify positive datasets are ready for the fit.

validphys.filters.check_t0pdfset(t0pdfset)[source]: T0 pdf check

validphys.filters.check_unpolarized_bc(unpolarized_bc)[source]: Check that unpolarized PDF bound can be loaded normally.

validphys.filters.default_filter_rules_input()[source]: Return a tuple of FilterRule objects. These are defined in filters.yaml in the validphys.cuts module. Similarly to parse_added_filter_rules, this function checks that the rules are unique, i.e. that there are no multiple rules for the same dataset or process with the same rule (reason and local_variables are not hashed).

validphys.filters.default_filter_settings_input()[source]: Return a FilterDefaults dataclass with the default hardcoded filter settings. These are defined in defaults.yaml in the validphys.cuts module.

validphys.filters.export_mask(path, mask)[source]: Dump mask to file

validphys.filters.filter(filter_data)[source]: Summarise filters applied to all datasets

validphys.filters.filter_closure_data_by_experiment(filter_path, experiments_data, fakepdf, fakenoise, filterseed, data_index, sep_mult)[source]

Applies _filter_closure_data() on each experiment in the closure test.

This function just performs a for loop over experiments. We don’t use reportengine.collect because it can permute the order in which closure data is generated, which would mean that the pseudodata is not reproducible.

validphys.filters.filter_inconsistent_closure_data_by_experiment(filter_path, experiments_data, fakepdf, fakenoise, filterseed, data_index, sep_mult, inconsistent_data_settings)[source]: Like filter_closure_data_by_experiment() except for inconsistent closure tests.

validphys.filters.filter_real_data(filter_path, data)[source]: Filter real data, cutting any points which do not pass the filter rules.

validphys.filters.get_cuts_for_dataset(commondata, rules) → list[source]

Function to generate a list containing the index of all experimental points that passed kinematic cut rules stored in ./cuts/filters.yaml

Parameters

commondata (nnpdf_data.coredata.CommonData) –
rules (List[Rule]) – A list of Rule objects specifying the filters.

Returns

mask – List object containing index of all passed experimental values

Return type

list

Example

>>> from validphys.filters import (get_cuts_for_dataset, Rule,
...     default_filter_settings, default_filter_rules_input)
>>> from validphys.loader import Loader
>>> l = Loader()
>>> cd = l.check_commondata("NMC")
>>> theory = l.check_theoryID(53)
>>> filter_defaults = default_filter_settings()
>>> params = theory.get_description()
>>> rule_list = [Rule(initial_data=i, defaults=filter_defaults, theory_parameters=params)
...     for i in default_filter_rules_input()]
>>> get_cuts_for_dataset(cd, rules=rule_list)
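
The idea of turning a list of rules into a list of passing indices can be sketched without any validphys objects. Everything below is hypothetical: points are plain dictionaries, rules are plain callables, and the convention that a rule returns None when it does not apply is an assumption made for the example.

```python
def passing_indices(points, rules):
    """Return the indices of the points that no rule rejects.

    points: list of dicts of kinematic variables (hypothetical layout)
    rules:  callables returning True (keep), False (cut) or None (rule not applicable)
    """
    mask = []
    for idx, point in enumerate(points):
        # A point survives unless at least one rule explicitly returns False
        if all(rule(point) is not False for rule in rules):
            mask.append(idx)
    return mask


points = [
    {"x": 0.01, "Q2": 2.0},
    {"x": 0.1, "Q2": 10.0},
    {"x": 0.5, "Q2": 50.0},
]
rules = [
    lambda p: p["Q2"] > 3.49,                            # made-up Q2 cut
    lambda p: None if p["x"] < 0.2 else p["Q2"] > 20.0,  # only applies at larger x
]
mask = passing_indices(points, rules)
```

Here the first point fails the Q2 cut, so `mask` contains only the indices of the second and third points.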

validphys.filters.make_dataset_dir(path)[source]: Creates directory at path location.

validphys.fitdata module

Utilities for loading data from fit folders

class validphys.fitdata.DatasetComp(common, first_only, second_only)

Bases: tuple

common: Alias for field number 0

first_only: Alias for field number 1

second_only: Alias for field number 2

class validphys.fitdata.FitInfo(nite: int, training: float, validation: float, chi2: float, pos_flag: bool, arclengths: ndarray, integnumbers: ndarray)[source]

Bases: object

Hold various metadata about the replicas of a fit

Parameters

nite: best (stop) epoch of the replica
training: Value of the training error function
validation: Value of the validation error function
chi2: Value of the experimental chi2
pos_flag: State of the positivity pass flag
arclengths: Array of the arc-length of each parton
integnumbers: Array of values for the integrability checks

arclengths: ndarray

chi2: float

property has_converged

Uses the positivity flag as a proxy for convergence, where convergence is defined mainly through two constraints: the validation loss being smaller than the threshold and the positivity criterion being satisfied. When a fit does not reach convergence, the positivity flag is never flipped to true.

integnumbers: ndarray

nite: int

pos_flag: bool

training: float

validation: float

validphys.fitdata.check_lhapdf_info(results_dir, fitname)[source]: Check that an LHAPDF info metadata file is present in the fit results

validphys.fitdata.check_nnfit_results_path(path)[source]: Returns True if the requested path is a valid results directory, i.e. if it is a directory and has a ‘nnfit’ subdirectory

validphys.fitdata.check_replica_files(replica_path, prefix)[source]: Verification of a replica results directory at replica_path for a fit named prefix. Returns True if the results directory is complete

validphys.fitdata.datasets_properties_table(data_input)[source]: Return dataset properties for each dataset in data_input

validphys.fitdata.fit_code_version(fit)[source]: Returns table with the code version from replica_1/{fitname}.json files. Note that the version for tensorflow distinguishes between the mkl=on and mkl=off versions.

validphys.fitdata.fit_datasets_properties_table(fitinputcontext)[source]: Returns table of dataset properties for each dataset used in a fit.

validphys.fitdata.fit_summary(fit_name_with_covmat_label, replica_data, total_chi2_data, total_phi_data)[source]

Summary table of fit properties:

- Central chi-squared
- Average chi-squared
- Training and Validation error functions
- Training lengths
- Phi

Note: Chi-squared values from the replica_data are not used here (presumably they are fixed to being t0)

This uses a corrected form for the error on phi in comparison to the vp1 value. The error is propagated from the uncertainty on the average chi-squared only.

validphys.fitdata.fit_theory_covmat_summary(fit, fitthcovmat)[source]: Returns a table with a single column for the fit, with three rows indicating if the theory covariance matrix was used in the ‘sampling’ of the pseudodata, the ‘fitting’, and the ‘validphys statistical estimators’ in the current namespace for that fit.

validphys.fitdata.fits_replica_data_correlated(fits_replica_data, fits_replica_indexes, fits)[source]

Return a table with the same columns as replica_data indexed by the replica fit ID. For identical fits, the values across rows should be the same.

If some replica ID is not present for a given fit (e.g. discarded by postfit), the corresponding entries in the table will be null.

validphys.fitdata.fits_version_table(fits_fit_code_version)[source]: Produces a table of version information for multiple fits.

validphys.fitdata.fitted_replica_indexes(pdf)[source]: Return nnfit index of replicas 1 to N.

validphys.fitdata.load_fitinfo(replica_path, prefix)[source]: Process the data in the .json file for a single replica into a FitInfo object. If the .json file does not exist, an old-format fit is assumed and old_load_fitinfo will be called instead.

validphys.fitdata.match_datasets_by_name(fits, fits_datasets)[source]: Return a tuple with common, first_only and second_only. The elements of the tuple are mappings where the keys are dataset names and the values are the two datasets contained in each fit for common, and the corresponding dataset included only in the first fit and only in the second fit.
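
The shape of the returned tuple can be illustrated with plain dictionaries. This is a sketch under assumptions, not the validphys code: match_by_name is a hypothetical helper, the dataset objects are replaced by strings, and the DatasetComp namedtuple is redefined locally with the three fields documented above.

```python
from collections import namedtuple

# Local re-definition mirroring the documented fields of validphys.fitdata.DatasetComp
DatasetComp = namedtuple("DatasetComp", ("common", "first_only", "second_only"))


def match_by_name(first, second):
    """first, second: mappings {dataset name: dataset object}, one per fit."""
    common = {name: (first[name], second[name]) for name in first if name in second}
    first_only = {name: ds for name, ds in first.items() if name not in second}
    second_only = {name: ds for name, ds in second.items() if name not in first}
    return DatasetComp(common, first_only, second_only)


fit_a = {"CDF_Z0_1P96TEV_ZRAP": "spec A1", "D0_Z0_1P96TEV_ZRAP": "spec A2"}
fit_b = {"D0_Z0_1P96TEV_ZRAP": "spec B2", "D0_WPWM_1P96TEV_ASY": "spec B3"}
comparison = match_by_name(fit_a, fit_b)
```

Each entry of `comparison.common` holds a pair, one object per fit, which is what lets later checks compare the same dataset across the two fits.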

validphys.fitdata.num_fitted_replicas(fit)[source]: Function to obtain the number of nnfit replicas. That is the number of replicas before postfit was run.

validphys.fitdata.print_dataset_differences(fits, match_datasets_by_name, print_common: bool = True)[source]

Given exactly two fits, print the datasets that are included in one but not in the other. If print_common is True, also print the datasets that are common.

As a visual aid, everything is ordered by dataset name, which under the commondata naming convention means that everything is ordered by:

Experiment name

Process

Energy

validphys.fitdata.print_different_cuts(fits, test_for_same_cuts)[source]: Print a summary of the datasets that are included in both fits but have different cuts.

validphys.fitdata.print_systype_overlap(groups_commondata, group_dataset_inputs_by_metadata)[source]: Returns a set of systypes that overlap between groups. Discards the set of systypes which overlap but do not imply correlations

validphys.fitdata.replica_data(fit, replica_paths)[source]

Load the necessary data from the .json file of each of the replicas. The corresponding PDF set must be installed in the LHAPDF path.

The included information is:

(‘nite’, ‘training’, ‘validation’, ‘chi2’, ‘pos_status’, ‘arclengths’)

validphys.fitdata.replica_paths(fit)[source]: Return the paths of all the replicas

validphys.fitdata.summarise_fits(collected_fit_summaries)[source]: Produces a table of basic comparisons between fits, includes all the fields used in fit_summary

validphys.fitdata.summarise_theory_covmat_fits(fits_theory_covmat_summary)[source]: Collects the theory covmat summary for all fits and concatenates them into a single table

validphys.fitdata.t0_chi2_info_table(pdf, dataset_inputs_abs_chi2_data, t0pdfset, use_t0)[source]

Provides a table with:

- t0pdfset name
- Central t0-chi-squared
- Average t0-chi-squared

validphys.fitdata.test_for_same_cuts(fits, match_datasets_by_name)[source]: Given two fits, return a list of tuples (first, second) where first and second are DatasetSpecs that correspond to the same dataset but have different cuts, such that first is included in the first fit and second in the second.

validphys.fitveto module

fitveto.py

Module for the determination of passing fit replicas.

Current active vetoes:

Convergence - Replicas with FitInfo.has_converged == False
ChiSquared - Replicas with ChiSquared > nsigma_discard_chi2*StandardDev + Average
ArclengthX - Replicas with ArcLengthX > nsigma_discard_arclength*StandardDev + Average
Integrability - Replicas with IntegrabilityNumbers < integ_threshold

validphys.fitveto.determine_vetoes(fitinfos: list, nsigma_discard_chi2: float, nsigma_discard_arclength: float, integ_threshold: float)[source]: Assesses whether each replica fitinfo passes the standard NNPDF vetoes. Returns a dictionary of vetoes and their passing boolean masks. Included in the dictionary is a ‘Total’ veto.
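
One natural reading of the ‘Total’ entry is that a replica passes overall only if it passes every individual veto. The sketch below shows that combination on plain lists of booleans; combine_vetoes and the veto names used are hypothetical, and whether validphys builds the total in exactly this way is an assumption.

```python
def combine_vetoes(vetoes):
    """Return a copy of `vetoes` with an extra 'Total' mask (logical AND of all masks).

    vetoes: {veto name: list of booleans, one per replica, True meaning 'passes'}
    """
    masks = list(vetoes.values())
    total = [all(flags) for flags in zip(*masks)]
    return {**vetoes, "Total": total}


vetoes = {
    "Convergence": [True, True, False, True],
    "ChiSquared": [True, False, True, True],
    "Integrability": [True, True, True, True],
}
with_total = combine_vetoes(vetoes)
```

In this example only the first and last replicas pass every veto, so only they are True in `with_total["Total"]`.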

validphys.fitveto.distribution_veto(dist, prior_mask, nsigma_threshold)[source]

For a given distribution (a list of floats), returns a boolean mask specifying the passing elements. The result is a new mask of the elements that satisfy:

value <= mean + nsigma_threshold*standard_deviation

Only points passing the prior_mask are considered in the average or standard deviation.
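
The criterion above can be sketched with the standard library alone. The function below is an illustration, not the validphys code: its name is hypothetical, it uses the population standard deviation (an assumption), and it evaluates the threshold on the prior-selected points before applying it to every element.

```python
from statistics import mean, pstdev


def distribution_veto_sketch(dist, prior_mask, nsigma_threshold):
    """Boolean mask of elements with value <= mean + nsigma_threshold * std.

    The mean and standard deviation use only elements where prior_mask is True.
    """
    selected = [value for value, keep in zip(dist, prior_mask) if keep]
    threshold = mean(selected) + nsigma_threshold * pstdev(selected)
    return [value <= threshold for value in dist]


chi2_values = [1.0, 1.1, 0.9, 5.0, 1.0]        # made-up values
prior_mask = [True, True, True, True, False]   # last replica excluded from the statistics
passing = distribution_veto_sketch(chi2_values, prior_mask, nsigma_threshold=1.0)
```

Note that the comparison is one-sided: values far below the mean still pass, and only outliers on the high side are vetoed.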

validphys.fitveto.integrability_veto(dist, integ_threshold)[source]: For a given distribution (a list of floats), returns a boolean mask specifying the passing elements. The result is a new mask of the elements that satisfy: value <= integ_threshold

validphys.fitveto.save_vetoes_info(veto_dict: dict, chi2_threshold, arclength_threshold, integ_threshold, filepath)[source]: Saves to file the chi2 and arclength thresholds used by postfit as well as veto dictionaries which contain information on which replicas pass each veto.

validphys.fkparser module

This module implements parsers for FKtable and CFactor files into useful datastructures, contained in the validphys.coredata module, which can be easily pickled and interfaced with common Python libraries.

Most users will be interested in using the high level interface load_fktable(). Given a validphys.core.FKTableSpec object, it returns an instance of validphys.coredata.FKTableData, an object with the required information to compute a convolution, with the CFactors applied.

from validphys.fkparser import load_fktable
from validphys.loader import Loader
l = Loader()
fk = l.check_fktable(setname="ATLASTTBARTOT", theoryID=53, cfac=('QCD',))
res = load_fktable(fk)

exception validphys.fkparser.BadCFactorError[source]

Bases: Exception

Exception raised when a CFactor cannot be parsed correctly

exception validphys.fkparser.BadFKTableError[source]

Bases: Exception

Exception raised when an FKTable cannot be parsed correctly

class validphys.fkparser.GridInfo(setname: str, hadronic: bool, ndata: int, nx: int)[source]

Bases: object

Class containing the basic properties of an FKTable grid.

hadronic: bool

ndata: int

nx: int

setname: str

validphys.fkparser.load_fktable(spec)[source]: Load the data corresponding to an FKTableSpec object. The cfactors will be applied to the grid. If we have a new-type fktable, call load() directly; otherwise fall back to the old parser.

validphys.fkparser.open_fkpath(path)[source]

Return a file-like object from the fktable path, regardless of whether it is compressed

Parameters

path: Path or str: Path like file containing a valid FKTable. It can be either inside a tarball or in plain text.

returns: f – A file like object for further processing.
rtype: file
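
One way to obtain a file-like object regardless of packaging is sketched below with the standard library. The helper name is hypothetical, and the sketch assumes that a tarball contains exactly one regular file. It is not the validphys implementation.

```python
import tarfile
from pathlib import Path


def open_maybe_tarball(path):
    """Return a binary file-like object for ``path``, plain or inside a tarball.

    Hypothetical helper: assumes a tarball holds exactly one regular file.
    """
    path = Path(path)
    if tarfile.is_tarfile(path):
        archive = tarfile.open(path)  # compression is detected automatically
        members = [m for m in archive.getmembers() if m.isfile()]
        if len(members) != 1:
            archive.close()
            raise ValueError(
                f"Expected exactly one file in {path}, found {len(members)}"
            )
        # extractfile returns a readable binary stream for the member.
        return archive.extractfile(members[0])
    return path.open("rb")
```

In both cases the caller receives a binary stream, so downstream parsers do not need to know how the table was stored.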

validphys.fkparser.parse_cfactor(f)[source]

Parse an open byte stream into a CFactorData. Raise a BadCFactorError if problems are encountered.

Parameters: f (file) – Binary file-like object
Returns: cfac – An object containing the data on the cfactor for each point.
Return type: CFactorData

validphys.fkparser.parse_fktable(f)[source]

Parse an open byte stream into an FKTableData. Raise a BadFKTableError if problems are encountered.

Parameters: f (file) – Open file-like object. See open_fkpath() to obtain it.
Returns: fktable – An object containing the FKTable data and information.
Return type: FKTableData

Notes

This function operates at the level of a single file, and therefore it does not apply CFactors (see load_fktable() for that) or handle operations within COMPOUND ensembles.

validphys.gridvalues module

gridvalues.py

Core functionality needed to obtain a set of values from LHAPDF. The tools for representing these grids are in pdfgrids.py (the validphys provider module), and the basis transformations are in pdfbases.py

validphys.gridvalues.central_grid_values(pdf: PDF, flmat, xmat, qmat)[source]

Same as grid_values() but it returns only the central values. The return value is indexed as:

grid_values[replica][flavour][x][Q]

where the first dimension (corresponding to the central member of the PDF set) is always one.

validphys.gridvalues.evaluate_luminosity(pdf_set: LHAPDFSet, n: int, s: float, mx: float, x1: float, x2: float, channel)[source]

Returns PDF luminosity at specified values of mx, x1, x2, sqrts**2 for a given channel.

pdf_set: The PDF set of interest.
s: The square of the center of mass energy, in GeV^2.
mx: The invariant mass bin, in GeV.
x1 and x2: The partonic x1 and x2.
channel: The channel tag name from LUMI_CHANNELS.

validphys.gridvalues.grid_values(pdf: PDF, flmat, xmat, qmat)[source]

Evaluate x*f(x) on a grid of points in flavour, x and Q.

Parameters

pdf (PDF) – Any PDF set
flmat (iterable) – A list of PDG IDs corresponding to the LHAPDF flavours in the grid.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.

Returns

A 4-dimension array with the PDF values at the input parameters for each replica. The return value is indexed as follows:

grid_values[replica][flavour][x][Q]
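
The indexing convention can be tried out on a stand-in array. The sketch below fills an array of the documented shape with random numbers (not real PDF values) and does not call validphys or LHAPDF.

```python
import numpy as np

n_replicas, n_flavours, n_x, n_q = 5, 3, 4, 2
rng = np.random.default_rng(0)

# Stand-in for the output of grid_values: random numbers, not real PDF values.
grid = rng.random((n_replicas, n_flavours, n_x, n_q))

# One number: replica 2, second flavour, first x point, last Q point.
single_value = grid[2][1][0][-1]

# All replicas of the first flavour at the first Q value: shape (replica, x).
flavour_slice = grid[:, 0, :, 0]

# Keeping a length-one leading axis mimics the shape documented for
# central_grid_values (member 0 is simply used as the "central" one here).
central_like = grid[:1]
```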

validphys.hessian2mc module

validphys.hessian2mc.py

This module contains the functions that can be used to convert Hessian sets like MSHT20 and CT18 to Monte Carlo sets. The functions implemented here follow Eq. (4.3) of the paper arXiv:2203.05506.

validphys.hessian2mc.write_hessian_to_mc_watt_thorne(pdf, mc_pdf_name, num_members, watt_thorne_rnd_seed=1)[source]

Writes the Monte Carlo representation of a PDF set that is in Hessian form using the Watt-Thorne (MSHT20) prescription described in Eq. 4.3 of arXiv:2203.05506.

Parameters

pdf (validphys.core.PDF) – The Hessian PDF set that is to be converted to Monte Carlo.
mc_pdf_name (str) – The name of the new Monte Carlo PDF set.

validphys.hessian2mc.write_mc_watt_thorne_replicas(Rjk_std_normal, replicas_df, mc_pdf_path)[source]

Writes the Monte Carlo representation of a PDF set that is in Hessian form using the Watt-Thorne prescription described in Eq. 4.3 of arXiv:2203.05506.

Parameters

Rjk_std_normal (np.ndarray) – Array of shape (num_members, n_eig) containing random standard normal numbers.
replicas_df (pd.DataFrame) – DataFrame containing replicas of the hessian set at all scales.
mc_pdf_path (pathlib.Path) – Path to the new Monte Carlo PDF set.
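
As a rough sketch of how such a conversion can work, the code below builds Monte Carlo replicas from Hessian members using the symmetric form F_k = F_0 + 1/2 Σ_j (F_j^+ - F_j^-) R_jk. That this matches Eq. 4.3 of arXiv:2203.05506 term by term is an assumption that should be checked against the paper. The function name, array layout and toy inputs are all hypothetical.

```python
import numpy as np


def hessian_to_mc_sketch(central, plus, minus, rjk):
    """Build Monte Carlo replicas from Hessian members (symmetric form).

    central: array of shape (n_points,)
    plus, minus: arrays of shape (n_eig, n_points), one row per eigenvector
    rjk: array of shape (num_members, n_eig) of standard normal numbers
    """
    half_diff = 0.5 * (plus - minus)  # shape (n_eig, n_points)
    return central + rjk @ half_diff  # shape (num_members, n_points)


rng = np.random.default_rng(1)  # seed chosen arbitrarily for the example
num_members, n_eig, n_points = 100, 3, 4
central = np.ones(n_points)
shifts = rng.random((n_eig, n_points))  # toy eigenvector displacements
replicas = hessian_to_mc_sketch(
    central,
    central + shifts,
    central - shifts,
    rng.standard_normal((num_members, n_eig)),
)
```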

validphys.hessian2mc.write_new_lhapdf_info_file_from_previous_pdf(path_old_pdfset, name_old_pdfset, path_new_pdfset, name_new_pdfset, num_members, description_set='MC representation of hessian PDF set', errortype='replicas')[source]: Writes a new LHAPDF set info file based on an existing set.

validphys.hyper_algorithm module

This module contains functions dedicated to processing the json dictionaries

validphys.hyper_algorithm.autofilter_dataframe(dataframe, keys, n_to_combine=1, n_to_kill=1, threshold=-1)[source]

Receives a dataframe and a list of keys. Creates combinations of n_to_combine keys and computes the reward. Finally, removes from the dataframe the n_to_kill worst combinations.

Anything under threshold will be removed and will not count towards the n_to_kill (by default threshold = -1, so only combinations with a negative enough reward will be removed)

# Arguments:

dataframe: a pandas dataframe
keys: keys to combine
n_to_combine: how many keys do we want to combine
n_to_kill: how many combinations to kill
threshold: anything under this reward will be removed

# Returns:

dataframe_sliced: a slice of the dataframe with the weakest combinations
removed

validphys.hyper_algorithm.bin_generator(df_values, max_n=10)[source]

Receives a dataframe with a list of unique values. If there are more than max_n of them and they are numeric, create max_n bins. If they are already discrete values or there are fewer than max_n options, output the same input.

# Arguments:

df_values: dataframe with unique values
max_n: maximum number of allowed different values

# Returns:

new_vals: list of tuples with (initial, end) value of the bin
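
The binning rule can be sketched in plain Python. The version below works on a list rather than a dataframe and assumes equal-width bins, which the description above does not specify. The function name is hypothetical.

```python
from numbers import Number


def bin_generator_sketch(values, max_n=10):
    """Return the unique ``values``, or ``max_n`` (start, end) bins covering them."""
    unique = list(dict.fromkeys(values))  # unique values, original order kept
    numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in unique
    )
    # Non-numeric or already small sets of options are returned as they are.
    if not numeric or len(unique) <= max_n:
        return unique
    low, high = min(unique), max(unique)
    width = (high - low) / max_n  # assumption: equal-width bins
    edges = [low + i * width for i in range(max_n)] + [high]
    return list(zip(edges[:-1], edges[1:]))


bins = bin_generator_sketch(range(101), max_n=4)
optimizers = bin_generator_sketch(["Adam", "RMSprop", "Adam"])
```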

validphys.hyper_algorithm.compute_reward(mdict, biggest_ntotal)[source]

Given a combination dictionary computes the reward function:

If the fail rate for this combination is above the fail threshold, the reward is -100

The formula below for the reward takes into account:

The rate of ok fits that have a loss below the loss_threshold
The rate of fits that failed
The std deviation
How far away is the median from the best loss
How far away are median and average

validphys.hyper_algorithm.dataframe_removal(dataframe, hit_list)[source]

Removes all combinations defined in hit_list from the dataframe. The hit list is a list of dictionaries containing the ‘slice’ key, where ‘slice’ must be a slice of ‘dataframe’.

# Arguments:

dataframe: a pandas dataframe
hit_list: the list of elements to remove

# Returns:

new_dataframe: the same dataframe with all elements from hit_list removed

validphys.hyper_algorithm.get_combinations(key_info, ncomb)[source]

Given a dictionary mapping keys to iterables of possible values (key_info), return a list of the product of all possible mappings of a subset of ncomb keys to single values out of the corresponding possible values, for all such subsets.

For instance,

key_info = {
    'key1' : [val1-1, val1-2, …],
    'key2' : [val2-1, val2-2, …],
}
ncomb = 2

will return a list of dictionaries:

[
    {'key1' : val1-1, 'key2' : val2-1, …},
    {'key1' : val1-1, 'key2' : val2-2, …},
    {'key1' : val1-2, 'key2' : val2-1, …},
    {'key1' : val1-2, 'key2' : val2-2, …},
]

Get all combinations of ncomb elements for the keys and values given in the dictionary key_info:

# Arguments:

key_info: dictionary with the possible values for each key
ncomb: elements to combine

# Returns:

all_combinations: A list of dictionaries of parameters
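
The combination logic described above can be sketched with itertools. The function name and the hyperparameter names in the example are illustrative only, and the real function may order or build its output differently.

```python
import itertools


def get_combinations_sketch(key_info, ncomb):
    """List every assignment of single values to every subset of ncomb keys."""
    all_combinations = []
    # Every subset of ncomb keys...
    for keys in itertools.combinations(key_info, ncomb):
        # ...combined with every choice of one value per key in the subset.
        for values in itertools.product(*(key_info[key] for key in keys)):
            all_combinations.append(dict(zip(keys, values)))
    return all_combinations


key_info = {"optimizer": ["Adam", "SGD"], "layers": [2, 3], "dropout": [0.0]}
combos = get_combinations_sketch(key_info, 2)
```

With three keys and ncomb = 2 there are three key subsets, contributing 4, 2 and 2 dictionaries respectively.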

validphys.hyper_algorithm.get_slice(dataframe, query_dict)[source]

Returns a slice of the dataframe where some keys match some values. query_dict must be a dictionary {key1 : value1, key2 : value2, …}

# Arguments:

dataframe: a pandas dataframe

query_dict: a dictionary of combination as given by get_combinations

validphys.hyper_algorithm.parse_keys(dataframe, keys)[source]

Receives a dataframe and a set of keys. Looks into the dataframe to read the possible values of the keys.

Returns a dictionary { ‘key’ : [possible values] }.

If the values are not discrete, they need to be binned. This is done for anything with too many numerical values.

# Arguments:

dataframe: a pandas dataframe
keys: keys to combine

# Returns:

key_info: a dictionary with the possible values for each key

validphys.hyper_algorithm.process_slice(df_slice)[source]

Function to process a slice into a dictionary with useful stats. If the slice is None, it means the combination does not apply.

# Arguments:

df_slice: a slice of a pandas dataframe

# Returns:

proc_dict: a dictionary of stats

validphys.hyper_algorithm.study_combination(dataframe, query_dict)[source]

Given a dataframe and a dictionary of {key1 : value1, key2: value2} returns a dictionary with a number of stats for that combination

# Arguments:

dataframe: a pandas dataframe
query_dict: a dictionary for a combination as given by get_combinations

# Returns:

proc_dict: a dictionary of the “statistics” for this combination

validphys.hyperoptplot module

Module for the parsing and plotting of the results and output of previous hyperparameter scans

class validphys.hyperoptplot.HyperoptTrial(trial_dict, base_params=None, minimum_losses=1, linked_trials=None)[source]

Bases: object

Hyperopt trial class. Makes the dictionary-like output of hyperopt into an object that can be easily managed

Parameters

trial_dict (dict) – one single result (a dictionary) from a tries.json file
base_params (dict) – Base parameters of the runcard which can be used to complete the hyperparameter dictionary when not all parameters were scanned
minimum_losses (int) – Minimum number of losses to be found in the trial for it to be considered successful
linked_trials (list) – List of trials coming from the same file as this trial

get(item, default=None)[source]

link_trials(list_of_trials)[source]: Link a list of trials to this trial

property loss: Return the loss of the hyperopt dict

property params: Parameters for the fit

property reward: Return and cache the reward value

property weighted_reward: Return the reward weighted to the mean value of the linked trials

validphys.hyperoptplot.best_setup(hyperopt_dataframe, hyperscan_config, commandline_args)[source]: Generates a clean table with information on the hyperparameter settings of the best setup.

validphys.hyperoptplot.evaluate_trial(trial_dict, validation_multiplier, fail_threshold, loss_target)[source]: Read a trial dictionary and compute the true loss and decide whether the run passes or not

validphys.hyperoptplot.filter_by_string(filter_string)[source]

Receives a filter string and returns a function that takes a data_dict (a parsed trial) and returns True if the trial passes the filter

filter string must have the format: key<operator>string where <operator> can be any of !=, =, >, <

# Arguments:

filter_string: the expression to evaluate

# Returns:

filter_function: a function that takes a data_dict and
returns true if the condition in filter_string passes
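
A possible way to turn such a string into a filter function is sketched below. The parsing rules are assumptions made for the example: values are compared numerically when both sides look like numbers and as strings otherwise, and trials missing the key do not pass. The real function may behave differently.

```python
import operator
import re

_OPERATORS = {"!=": operator.ne, "=": operator.eq, ">": operator.gt, "<": operator.lt}
# "!=" is listed before "=" so that it is matched as a single operator.
_PATTERN = re.compile(r"^\s*([^!=<>]+?)\s*(!=|=|>|<)\s*(.*?)\s*$")


def _as_number(value):
    """Return ``value`` as a float when possible, otherwise None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def filter_by_string_sketch(filter_string):
    """Parse ``key<operator>string`` and return a predicate on a data_dict."""
    match = _PATTERN.match(filter_string)
    if match is None:
        raise ValueError(f"Cannot parse filter: {filter_string!r}")
    key, symbol, target = match.groups()
    compare = _OPERATORS[symbol]

    def filter_function(data_dict):
        if key not in data_dict:
            return False  # assumption: trials without the key do not pass
        value = data_dict[key]
        left, right = _as_number(value), _as_number(target)
        if left is not None and right is not None:
            return compare(left, right)
        # Fall back to a string comparison when either side is not numeric.
        return compare(str(value), target)

    return filter_function


many_epochs = filter_by_string_sketch("epochs>100")
not_adam = filter_by_string_sketch("optimizer!=Adam")
```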

validphys.hyperoptplot.generate_dictionary(replica_path, loss_target, json_name='tries.json', starting_index=0, val_multiplier=0.5, fail_threshold=10.0)[source]

Reads a json file and returns a list of dictionaries

# Arguments:

replica_path: folder in which the tries.json file can be found
starting_index: if the trials are to be added to an already existing set, make sure the id has the correct index!
val_multiplier: validation multiplier
fail_threshold: threshold for the loss to consider a configuration as a failure

validphys.hyperoptplot.hyperopt_dataframe(commandline_args)[source]: Loads the data generated by running hyperopt and stored in json files into a dataframe, and then filters the data according to the selection criteria provided by the command line arguments. It then returns both the entire dataframe and a dataframe object with the hyperopt parameters of the best setup.

validphys.hyperoptplot.hyperopt_table(hyperopt_dataframe)[source]: Generates a table containing complete information on all the tested setups that passed the filters set in the commandline arguments.

validphys.hyperoptplot.order_axis(df, bestdf, key)[source]: Helper function for ordering the axis and make sure the best is always first

validphys.hyperoptplot.parse_architecture(trial)[source]

This function parses the family of parameters regarding the architecture of the NN:

number_of_layers
activation_per_layer
nodes_per_layer
l1, l2, l3, l4… max_layers
layer_type
dropout
initializer

validphys.hyperoptplot.parse_optimizer(trial)[source]

This function parses the parameters that affect the optimization

optimizer
learning_rate (if it exists)

validphys.hyperoptplot.parse_statistics(trial)[source]

Parse the statistical information of the trial

validation loss
testing loss
status of the run

validphys.hyperoptplot.parse_stopping(trial)[source]

This function parses the parameters that affect the stopping

epochs
stopping_patience
pos_initial
pos_multiplier

validphys.hyperoptplot.parse_trial(trial)[source]: Trials are very convoluted objects, heavily branched inside. The goal of this function is to separate said branching so we can create hierarchies.

validphys.hyperoptplot.plot_activation_per_layer(hyperopt_dataframe)[source]: Generates a violin plot of the loss per activation function.

validphys.hyperoptplot.plot_clipnorm(hyperopt_dataframe, optimizer_name)[source]: Generates a scatter plot of the loss as a function of the clipnorm for a given optimizer.

validphys.hyperoptplot.plot_epochs(hyperopt_dataframe)[source]: Generates a scatter plot of the loss as a function of the number of epochs.

validphys.hyperoptplot.plot_initializer(hyperopt_dataframe)[source]: Generates a violin plot of the loss per initializer.

validphys.hyperoptplot.plot_iterations(hyperopt_dataframe)[source]: Generates a scatter plot of the loss as a function of the iteration index.

validphys.hyperoptplot.plot_learning_rate(hyperopt_dataframe, optimizer_name)[source]: Generates a scatter plot of the loss as a function of the learning rate for a given optimizer.

validphys.hyperoptplot.plot_number_of_layers(hyperopt_dataframe)[source]: Generates a violin plot of the loss as a function of the number of layers of the model.

validphys.hyperoptplot.plot_optimizers(hyperopt_dataframe)[source]: Generates a violin plot of the loss per optimizer.

validphys.hyperoptplot.plot_scans(df, best_df, plotting_parameter, include_best=True)[source]: This function performs the plotting and is called by the plot_ functions in this file.

validphys.kinematics module

Provides information on the kinematics involved in the data.

Uses the PLOTTING file specification.

class validphys.kinematics.XQ2Map(experiment, commondata, fitted, masked, group)

Bases: tuple

commondata: Alias for field number 1

experiment: Alias for field number 0

fitted: Alias for field number 2

group: Alias for field number 4

masked: Alias for field number 3

validphys.kinematics.all_commondata_grouping(all_commondata, metadata_group)[source]: Return a table with the grouping specified by metadata_group key for each dataset for all available commondata.

validphys.kinematics.all_kinlimits_table(all_kinlimits, use_kinoverride: bool = True)[source]: Return a table with the kinematic limits for the datasets given as input in dataset_inputs. If the PLOTTING overrides are not used, the information on sqrt(k2) will be displayed.

validphys.kinematics.describe_kinematics(commondata, titlelevel: int = 1)[source]

Output a markdown text describing the stored metadata for a given commondata.

titlelevel can be used to control the header level of the title.

validphys.kinematics.kinematics_table(kinematics_table_notable)[source]: Same as kinematics_table_notable but writing the table to file

validphys.kinematics.kinematics_table_notable(commondata, cuts, show_extra_labels: bool = False)[source]

Table containing the kinematics of a commondata object, indexed by their datapoint id. The kinematics will be transformed as per the PLOTTING file of the dataset or process type, and the column headers will be the labels of the variables defined in the metadata.

If show_extra_labels is True then extra labels defined in the PLOTTING files will be displayed. Otherwise only the original three kinematics will be shown.

validphys.kinematics.kinlimits(commondata, cuts, use_cuts, use_kinoverride: bool = True)[source]: Return a mapping containing the number of fitted and used datapoints, as well as the label, minimum and maximum value for each of the three kinematics. If use_kinoverride is set to False, the PLOTTING files will be ignored and the kinematics will be interpreted based on the process type only. If use_cuts is ‘CutsPolicy.NOCUTS’, the information on the total number of points will be displayed, instead of the fitted ones.

validphys.kinematics.total_fitted_points(all_kinlimits_table) → int[source]: Print the total number of fitted points in a given set of data

validphys.kinematics.xq2map_with_cuts(commondata, cuts, group_name=None)[source]: Return two (x,Q²) tuples: one for the fitted data and one for the cut data. If display_cuts is false or all data passes the cuts, the second tuple will be empty.

validphys.lhaindex module

Created on Fri Jan 23 12:11:23 2015

@author: zah

validphys.lhaindex.as_from_name(name)[source]: Annoying function needed because this is not in the info files. as(M_z) there is actually as(M_ref).

validphys.lhaindex.expand_index_names(globstr)[source]

validphys.lhaindex.expand_local_names(globstr)[source]

validphys.lhaindex.expand_names(globstr)[source]: Return names of installed PDFs. If none is found, return names from index

validphys.lhaindex.finddir(name)[source]

validphys.lhaindex.get_collaboration(name)[source]

validphys.lhaindex.get_index_path(folder=None)[source]

validphys.lhaindex.get_indexes_to_names()[source]

validphys.lhaindex.get_lha_datapath()[source]

Return an existing datapath from LHAPDF, starting from the end. If no path is found to exist, recover the old behaviour and return the last path.

The check for existence intends to solve problems where a previously filled LHAPATH or LHAPDF_DATA_PATH environment variable is pointing to a non-existent path or shared systems where LHAPDF might be compiled with hard-coded paths not available to all users.

validphys.lhaindex.get_names_to_indexes()[source]

validphys.lhaindex.get_pdf_indexes(name)[source]: Get index in the amc@nlo format

validphys.lhaindex.get_pdf_name(index)[source]

validphys.lhaindex.infofilename(name)[source]

validphys.lhaindex.isinstalled(name)[source]: Check that name exists in LHAPDF dir

validphys.lhaindex.parse_index(index_file)[source]

validphys.lhaindex.parse_info(name)[source]

validphys.lhapdf_compatibility module

Module for LHAPDF compatibility backends

If LHAPDF is installed, the module will transparently hand over everything to LHAPDF. If LHAPDF is not available, it will try to use a combination of the packages

lhapdf-management and pdfflow

which cover all the features of LHAPDF used during the fit (and likely most of validphys).

validphys.lhapdf_compatibility.make_pdf(pdf_name, member=None)[source]

Load a PDF. If member is given, load the single member; otherwise, load the entire set as a list.

If LHAPDF is provided, it returns LHAPDF PDF instances; otherwise it returns an object which is _compatible_ with LHAPDF for lhapdf functions for the selected backend.

Parameters:

pdf_name: str
name of the PDF to load

member: int
index of the member of the PDF to load

Returns:

list(pdf_sets)
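
The backend selection described for this module can be pictured as a simple "first available wins" lookup. The helper below is a hypothetical sketch using only the standard library. The import names lhapdf and pdfflow are assumed to be those of the corresponding packages, and the actual module may select its backend differently.

```python
import importlib.util


def pick_backend(candidates):
    """Return the first importable module name in ``candidates``, or None."""
    for name in candidates:
        # find_spec looks the module up without importing it.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prefer LHAPDF; otherwise fall back to pdfflow (assumed import names).
backend = pick_backend(["lhapdf", "pdfflow"])
```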

validphys.lhapdfset module

Module containing an LHAPDF class compatible with validphys using the official lhapdf python interface.

The .members and .central_member of the LHAPDFSet are LHAPDF objects (the typical output from mkPDFs) and can be used normally.

Examples

>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> len(pdf.members)
101
>>> pdf.central_member.alphasQ(91.19)
0.11800
>>> pdf.members[0].xfxQ2(0.5, 15625)
{-5: 6.983360500601136e-05,
-4: 0.0021818063617227604,
-3: 0.00172453472243952,
-2: 0.0010906577230485718,
-1: 0.0022049272225017286,
1: 0.020051104853608722,
2: 0.0954139944889494,
3: 0.004116641378803191,
4: 0.002180124185625795,
5: 6.922722705177504e-05,
21: 0.007604124516892057}

class validphys.lhapdfset.LHAPDFSet(name, error_type)[source]

Bases: object

Wrapper for the lhapdf python interface.

Once instantiated this class will load the PDF set from LHAPDF. If it is a T0 set only the CV will be loaded.

property central_member: Returns a reference to member 0 of the PDF list

property flavors: Returns the list of accepted flavors by the LHAPDF set

grid_values(flavors: ndarray, xgrid: ndarray, qgrid: ndarray)[source]

Returns the PDF values for every member for the required flavours, points in x and points in q. The return shape is

(members, flavors, xgrid, qgrid)

Return type: ndarray of shape (members, flavors, xgrid, qgrid)

Examples

>>> import numpy as np
>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> xgrid = np.random.rand(10)
>>> qgrid = np.random.rand(3)
>>> flavs = np.arange(-4,4)
>>> flavs[4] = 21
>>> results = pdf.grid_values(flavs, xgrid, qgrid)

property is_t0: Check whether we are in t0 mode

property members: Return the members of the set the special error type t0 returns only member 0

property n_members: Return the number of active members in the PDF set

xfxQ(x, Q, n, fl)[source]: Return the PDF value for one single point for one single member. If the flavour is not included in the PDF (for instance top/antitop), return 0.0.

validphys.lhio module

A module that reads and writes LHAPDF grids.

validphys.lhio.big_matrix(gridlist)[source]: Return a properly indexed matrix of the differences between each member and the central value

validphys.lhio.generate_replica0(pdf, kin_grids=None, extra_fields=None)[source]

Generates a replica 0 as an average over an existing set of LHAPDF replicas and outputs it to the PDF’s parent folder

Parameters

pdf (validphys.core.PDF) – An existing validphys PDF object from which the average replica will be (re-)computed
kin_grids – Grids in (x,Q) used to print replica0 upon. If None, the grids of the source replicas are used.
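
The averaging itself can be sketched on toy arrays. The example assumes all replicas share the same (x, Q) grid and uses random numbers in place of real grid values. It also shows the member-minus-central differences that big_matrix is described as returning, without reproducing that function's indexing.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy stand-in for replicas 1..N on a common (x, Q) grid: shape (replica, x, Q).
replicas = rng.random((10, 6, 3))

# Replica 0 is taken as the plain average over the replica axis.
replica0 = replicas.mean(axis=0)

# Differences of each member with respect to the central value.
deviations = replicas - replica0
```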

validphys.lhio.hessian_from_lincomb(pdf, V, set_name=None, folder=None, extra_fields=None)[source]: Construct a new LHAPDF grid from a linear combination of members

validphys.lhio.load_all_replicas(pdf, db=None)[source]

validphys.lhio.load_replica(pdf, rep, kin_grids=None)[source]

validphys.lhio.new_pdf_from_indexes(pdf, indexes, set_name=None, folder=None, extra_fields=None, installgrid=False, use_rep0grid=False)[source]

Create a new PDF set by selecting replicas from another one.

Parameters

pdf (validphys.core.PDF) – An existing validphys PDF object from which the indexes will be selected.
indexes (Iterable[int]) – An iterable with integers corresponding to files in the LHAPDF set. Note that replica 0 will be calculated for you as the mean of the selected replicas.
set_name (str) – The name of the new PDF set.
folder (str, bytes, os.PathLike) – The path where the LHAPDF set will be written. Must exist.
installgrid (bool, optional, default=False) – Whether to copy the grid to the LHAPDF path.
use_rep0grid (bool, optional, default=False) – Whether to fill the original replica 0 grid when computing replica 0, instead of relying on all grids being the same and averaging the files directly. It is slower and will call LHAPDF to fill the grids, but works for sets where the replicas have different grids.

validphys.lhio.read_all_xqf(f)[source]

validphys.lhio.read_xqf_from_file(f)[source]

validphys.lhio.read_xqf_from_lhapdf(pdf, replica, kin_grids)[source]

validphys.lhio.rep_matrix(gridlist)[source]: Return a properly indexed matrix of all the members

validphys.lhio.split_sep(f)[source]

validphys.lhio.write_replica(rep, set_root, header, subgrids)[source]

validphys.loader module

Resolve paths to useful objects, and query the existence of different resources within the specified paths.

exception validphys.loader.CfactorNotFound[source]: Bases: LoadFailedError

exception validphys.loader.CompoundNotFound[source]: Bases: LoadFailedError

exception validphys.loader.CutsNotFound[source]: Bases: LoadFailedError

exception validphys.loader.DataNotFoundError[source]: Bases: LoadFailedError

exception validphys.loader.EkoNotFound[source]: Bases: LoadFailedError

exception validphys.loader.FKTableNotFound[source]: Bases: LoadFailedError

class validphys.loader.FallbackLoader(profile=None)[source]

Bases: Loader, RemoteLoader

A loader that first tries to find resources locally (calling Loader.check_*) and if it fails, it tries to download them (calling RemoteLoader.download_*).

make_checker(resource)[source]
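
The "check locally, then download and retry" pattern can be sketched with toy classes. Everything below is a stand-in: the class names, the exception, the in-memory store and the placeholder path are hypothetical and only mirror the check_* / download_* naming described above.

```python
class ResourceNotFound(FileNotFoundError):
    """Stand-in for a LoadFailedError-like exception."""


class LocalSketch:
    def __init__(self):
        self.store = {}  # pretend local resources: name -> path

    def check_pdf(self, name):
        try:
            return self.store[name]
        except KeyError:
            raise ResourceNotFound(name) from None


class RemoteSketch:
    def download_pdf(self, name):
        # A real loader would fetch files; here we only fill the local store.
        self.store[name] = f"/downloaded/{name}"  # placeholder path


class FallbackSketch(LocalSketch, RemoteSketch):
    def make_checker(self, resource):
        """Wrap check_<resource> so that a failure triggers download_<resource>."""
        check = getattr(self, f"check_{resource}")
        download = getattr(self, f"download_{resource}")

        def checker(name):
            try:
                return check(name)
            except ResourceNotFound:
                download(name)
                return check(name)  # retry locally after downloading

        return checker


loader = FallbackSketch()
check_pdf = loader.make_checker("pdf")
path = check_pdf("toy_set")  # not found locally, so it is "downloaded" first
```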

exception validphys.loader.FitNotFound[source]: Bases: LoadFailedError

exception validphys.loader.HyperscanNotFound[source]: Bases: LoadFailedError

exception validphys.loader.InconsistentMetaDataError[source]: Bases: LoaderError

exception validphys.loader.LoadFailedError[source]: Bases: FileNotFoundError, LoaderError

class validphys.loader.Loader(profile=None)[source]

Bases: LoaderBase

Load various resources from the NNPDF data path.

property available_datasets: Provide all available datasets that were available before the new commondata was implemented and that have a translation. Returns old names

property available_ekos: Return a string token for each of the available theories

property available_fits

property available_hyperscans

property available_pdfs

property available_theories: Return a string token for each of the available theories

check_cfactor(theoryID, setname, cfactors)[source]

check_commondata(setname, sysnum=None, use_fitcommondata=False, fit=None, variant=None)[source]

Prepare the commondata files to be loaded. A commondata is defined by its name (setname) and the variant(s) (variant)

The function parse_dataset_input in config.py translates all known old commondata into their new names (and variants), therefore this function should only receive requests for the new format.

Any action trying to request an old-format commondata from this function will log an error message. This error message will eventually become an actual error.

check_compound(theoryID, setname, cfac)[source]

check_dataset(name, *, rules=None, sysnum=None, theoryid, cfac=(), frac=1, cuts=CutsPolicy.INTERNAL, use_fitcommondata=False, fit=None, weight=1, variant=None)[source]: Loads a given dataset. If the dataset contains new-type fktables, use the pineappl loading function; otherwise fall back to legacy.

check_default_filter_rules(theoryid, defaults=None)[source]

check_eko(theoryID)[source]: Check the eko exists and return the path to it

check_experiment(name: str, datasets: list[validphys.core.DataSetSpec]) → DataGroupSpec[source]

Loader method for instantiating DataGroupSpec objects. The NNPDF::Experiment object can then be instantiated using the load method.

Parameters

name (str) – A string denoting the name of the resulting DataGroupSpec object.
datasets (List[DataSetSpec]) – A list of DataSetSpec objects pre-created by the user. Note, these too will be loaded by Loader.

Return type

DataGroupSpec

Example

>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset("CDF_Z0_1P96TEV_ZRAP", theoryid=40_000_000, cuts="internal")
>>> exp = l.check_experiment("My DataGroupSpec Name", [ds])

check_fit(fitname)[source]

check_fit_cuts(commondata, fit)[source]

check_fk_from_theory_metadata(theory_metadata, theoryID, cfac=None)[source]: Load a pineappl fktable in the new commondata format. Receives a theory metadata describing the fktables necessary for a given observable, the theory ID and the corresponding cfactors. The cfactors should correspond directly to the fktables; the “compound folder” is not supported for pineappl theories. As such, the name of the cfactor is expected to be

CF_{cfactor_name}_{fktable_name}
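
The naming pattern can be spelled out with a small helper. The function and the table names below are hypothetical and only illustrate the string pattern. No file extension or folder layout is implied.

```python
def cfactor_basename(cfactor_name, fktable_name):
    """Build the expected CFactor name, without any file extension."""
    return f"CF_{cfactor_name}_{fktable_name}"


# Hypothetical names, used only to show the pattern.
names = [cfactor_basename("QCD", fk) for fk in ("TABLE_A", "TABLE_B")]
```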

check_fktable(theoryID, setname, cfac)[source]

check_hyperscan(hyperscan_name)[source]: Obtain a hyperscan run

check_integset(theoryID, setname, postlambda, rules)[source]: Load an integrability dataset

check_internal_cuts(commondata, rules)[source]

check_pdf(name)[source]

check_posset(theoryID, setname, postlambda, rules)[source]: Load a positivity dataset

check_theoryID(theoryID)[source]

check_vp_output_file(filename, extra_paths=('.',))[source]: Find a file in the vp-cache folder, or (with higher priority) in the extra_paths.

property commondata_folder

property implemented_datasets: Provide all implemented datasets that can be found in the datafiles folder regardless of whether they can be used for fits (i.e., whether they include a theory), are “fake” (integrability/positivity) or are missing some information.

property theorydb_folder: Checks theory db file exists and returns path to it

class validphys.loader.LoaderBase(profile=None)[source]

Bases: object

Base class for the NNPDF loader. It can take as input a profile dictionary from which all data can be read. It is possible to override the datapath and resultpath when the class is instantiated.

property hyperscan_resultpath

exception validphys.loader.LoaderError[source]: Bases: Exception

exception validphys.loader.PDFNotFound[source]: Bases: LoadFailedError

exception validphys.loader.ProfileNotFound[source]: Bases: LoadFailedError

class validphys.loader.RemoteLoader(profile=None)[source]

Bases: LoaderBase

download_eko(thid)[source]: Download the EKO for a given theory ID

download_fit(fitname)[source]

download_hyperscan(hyperscan_name)[source]: Download a hyperscan run from the remote server. Downloads the run to the results folder.

download_pdf(name)[source]

download_theoryID(thid)[source]

download_vp_output_file(filename, **kwargs)[source]

property downloadable_ekos

property downloadable_fits

property downloadable_hyperscans

property downloadable_pdfs

property downloadable_theories

property eko_index

property eko_urls

property fit_index

property fit_urls

property hyperscan_index

property hyperscan_url

property lhapdf_pdfs

property lhapdf_urls

property nnpdf_pdfs

property nnpdf_pdfs_index

property nnpdf_pdfs_urls

property remote_ekos

remote_files(urls, index, thing='files')[source]

property remote_fits

property remote_hyperscans

property remote_keywords

property remote_nnpdf_pdfs

property remote_theories

property theory_index

property theory_urls

exception validphys.loader.RemoteLoaderError[source]: Bases: LoaderError

exception validphys.loader.SysNotFoundError[source]: Bases: LoadFailedError

exception validphys.loader.TheoryDataBaseNotFound[source]: Bases: LoadFailedError

exception validphys.loader.TheoryMetadataNotFound[source]: Bases: LoadFailedError

exception validphys.loader.TheoryNotFound[source]: Bases: LoadFailedError

validphys.loader.download_and_extract(url, local_path, target_name=None)[source]: Download a compressed archive and then extract it to the given path

validphys.loader.download_file(url, stream_or_path, make_parents=False, delete_on_failure=False)[source]: Download a file and show a progress bar if the INFO log level is enabled. If make_parents is True and stream_or_path is path-like, all the parent folders will be created.

validphys.mc2hessian module

mc2hessian.py

This module contains the functionality to compute reduced sets using the mc2hessian algorithm (see section 2.1 of 1602.00005).

validphys.mc2hessian.gridname(pdf, Neig, mc2hname: (<class 'str'>, <class 'NoneType'>) = None)[source]: If no custom `mc2hname` is specified, the name of the Hessian PDF is automatically generated.

validphys.mc2hessian.mc2hessian(pdf, Q, Neig: int, mc2hessian_xgrid, output_path, gridname, installgrid: bool = False)[source]

Produces a Hessian PDF by transforming a Monte Carlo PDF set.

Parameters

pdf (validphys.core.PDF) – An existing validphys PDF object which will be converted into a Hessian PDF set
Q (float) – Energy scale at which the Monte Carlo PDF is sampled
Neig (int) – Number of basis eigenvectors in the Hessian PDF set
mc2hessian_xgrid (numpy.ndarray) – The points in x at which to sample the Monte Carlo PDF set
output_path (path) – The validphys output path where the PDF will be written
gridname (str) – Name of the Hessian PDF set
installgrid (bool, optional, default=``False``) – Whether to copy the Hessian grid to the LHAPDF path

validphys.mc2hessian.mc2hessian_xgrid(xmin: float = 1e-05, xminlin: float = 0.1, xmax: Real = 1, nplog: int = 50, nplin: int = 50)[source]

Provides the points in x at which to sample the PDF. logspace and linspace will be called with the respective parameters.

Generates a grid with nplog logarithmically spaced points between xmin and xminlin followed by nplin linearly spaced points between xminlin and xmax
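The grid construction described above can be sketched with NumPy as follows. This is only an illustration, not the validphys source: the helper name sketch_xgrid is hypothetical, and excluding xminlin from the logarithmic part (so that the point is not repeated) is an assumption about the endpoint handling.

```python
import numpy as np


def sketch_xgrid(xmin=1e-5, xminlin=0.1, xmax=1.0, nplog=50, nplin=50):
    """Hypothetical helper: log-spaced points followed by linearly spaced points."""
    # nplog points, logarithmically spaced in [xmin, xminlin).
    # Assumption: the right endpoint is excluded so that xminlin is not duplicated.
    log_part = np.logspace(np.log10(xmin), np.log10(xminlin), num=nplog, endpoint=False)
    # nplin points, linearly spaced in [xminlin, xmax].
    lin_part = np.linspace(xminlin, xmax, num=nplin)
    return np.concatenate([log_part, lin_part])


xgrid = sketch_xgrid()
print(xgrid.shape)  # (100,)
```

With the default arguments the sketch yields nplog + nplin = 100 strictly increasing points between xmin and xmax.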

validphys.mc_gen module

mc_gen.py

Tools to check the pseudo-data MC generation.

validphys.mc_gen.art_data_comparison(art_rep_generation, nreplica: int)[source]: Plots per datapoint of the distribution of replica values.

validphys.mc_gen.art_data_distribution(art_rep_generation, title='Artificial Data Distribution', color='green')[source]: Plot of the distribution of pseudodata.

validphys.mc_gen.art_data_mean_table(art_rep_generation, groups_data)[source]: Generate table for artdata mean values

validphys.mc_gen.art_data_moments(art_rep_generation, color='green')[source]: Returns the moments of the distributions per data point, as a histogram.

validphys.mc_gen.art_data_residuals(art_rep_generation, color='green')[source]: Plot the residuals distribution of pseudodata compared to experiment.

validphys.mc_gen.art_rep_generation(groups_data, make_replicas)[source]: Generates the nreplica pseudodata replicas

validphys.mc_gen.one_art_data_residuals(groups_data, indexed_make_replicas)[source]: Residuals plot for the first datapoint.

validphys.n3fit_data module

n3fit_data.py

Providers which prepare the data ready for n3fit.performfit.performfit().

class validphys.n3fit_data.Hashrray(array)[source]

Bases: TupleComp

Wrapper class to hash a numpy array so it can be cached.
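The idea of wrapping an array so that it can serve as a cache key can be illustrated as below. This is a sketch under stated assumptions and not the Hashrray implementation (which derives from TupleComp): the class HashableArray and the function cached_sum are hypothetical names, and hashing by shape, dtype and raw bytes is one possible choice.

```python
import functools

import numpy as np


class HashableArray:
    """Hypothetical wrapper: hash and compare a NumPy array by its contents."""

    def __init__(self, array):
        self.array = np.asarray(array)
        # Shape, dtype and raw bytes together identify the contents.
        self._key = (self.array.shape, self.array.dtype.str, self.array.tobytes())

    def __hash__(self):
        return hash(self._key)

    def __eq__(self, other):
        return isinstance(other, HashableArray) and self._key == other._key


@functools.lru_cache(maxsize=None)
def cached_sum(wrapped):
    # Evaluated once per distinct array content.
    return float(wrapped.array.sum())


a = HashableArray(np.arange(4))
b = HashableArray(np.arange(4))  # equal contents, different object
cached_sum(a)
cached_sum(b)  # served from the cache
print(cached_sum.cache_info().hits)  # 1
```

A bare numpy.ndarray cannot be passed to functools.lru_cache because it is not hashable, which is why a wrapper of this kind is needed.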

validphys.n3fit_data.diagonal_masks(data, replica_trvlseed, dataset_inputs_fitting_covmat, diagonal_frac=1.0, threshold_eigvals=0)[source]

validphys.n3fit_data.fittable_datasets_masked(data)[source]: Generate a list of validphys.n3fit_data_utils.FittableDataSet from a group of datasets and the corresponding training/validation masks

validphys.n3fit_data.fitting_data_dict(data, make_replica, dataset_inputs_loaded_cd_with_cuts, dataset_inputs_fitting_covmat, _inv_covmat_prepared, kfold_masks, fittable_datasets_masked)[source]

Provider which takes the information from validphys data.

Returns

all_dict_out – Dictionary containing all the information of the experiment/dataset for training, validation and experimental data, with the following keys:

'datasets': list of dictionaries for each of the datasets contained in data
'name': name of the data - typically experiment/group name
'expdata_true': non-replica data
'covmat': full covmat
'invcovmat_true': inverse of the covmat (non-replica)
'trmask': mask for the training data
'invcovmat': inverse of the covmat for the training data
'ndata': number of datapoints for the training data
'expdata': experimental data (replica'd) for training
'vlmask': (same as above for validation)
'invcovmat_vl': (same as above for validation)
'ndata_vl': (same as above for validation)
'expdata_vl': (same as above for validation)
'positivity': bool - is this a positivity set?
'count_chi2': should this be counted towards the chi2

Return type

dict

validphys.n3fit_data.integdatasets_fitting_integ_dict(integdatasets=None)[source]

Loads the integrability datasets. Calls same function as fitting_pos_dict(), except on each element of integdatasets if integdatasets is not None.

Parameters: integdatasets (list[validphys.core.IntegrabilitySetSpec]) – list containing the settings for the integrability sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.

Examples

>>> from validphys.api import API
>>> integdatasets = [{"dataset": "INTEGXT3", "maxlambda": 1e2}]
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=integdatasets, theoryid=53)
>>> len(res), len(res[0])
(1, 9)
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=None)
>>> print(res)
None

validphys.n3fit_data.kfold_masks(kpartitions, data)[source]

Collect the masks (if any) due to kfolding for this data. These will be applied to the experimental data before starting the training of each fold.

Parameters

kpartitions (list[dict]) – list of partitions, each partition dictionary with key-value pair datasets and a list containing the names of all datasets in that partition. See n3fit/runcards/Basic_hyperopt.yml for an example runcard or the hyperopt documentation for an expanded discussion on k-fold partitions.
data (validphys.core.DataGroupSpec) – full list of data which is to be partitioned.

Returns

kfold_masks – A list containing a boolean array for each partition. Each array is a 1-D boolean array with length equal to the number of cut datapoints in data. If a dataset is included in a particular fold then the mask will be True for the elements corresponding to those datasets such that data.load().get_cv()[kfold_masks[i]] will return the datapoints in the ith partition. See example below.

Return type

list[np.array]

Examples

>>> from validphys.api import API
>>> partitions=[
...     {"datasets": ["HERACOMBCCEM", "HERACOMBNCEP460", "NMC", "NTVNBDMNFe"]},
...     {"datasets": ["HERACOMBCCEP", "HERACOMBNCEP575", "NMCPD", "NTVNUDMNFe"]}
... ]
>>> ds_inputs = [{"dataset": ds} for part in partitions for ds in part["datasets"]]
>>> kfold_masks = API.kfold_masks(dataset_inputs=ds_inputs, kpartitions=partitions, theoryid=53, use_cuts="nocuts")
>>> len(kfold_masks) # one element for each partition
2
>>> kfold_masks[0] # mask which splits data into first partition
array([False, False, False, ...,  True,  True,  True])
>>> data = API.data(dataset_inputs=ds_inputs, theoryid=53, use_cuts="nocuts")
>>> fold_data = data.load().get_cv()[kfold_masks[0]]
>>> len(fold_data)
604
>>> kfold_masks[0].sum()
604

validphys.n3fit_data.posdatasets_fitting_pos_dict(posdatasets=None)[source]

Loads all positivity datasets. It is not allowed to be empty.

Parameters: posdatasets (list[validphys.core.PositivitySetSpec]) – list containing the settings for the positivity sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.

validphys.n3fit_data.pseudodata_table(replica_pseudodata)[source]: Save the pseudodata for the given replica. Deactivate by setting fitting::savepseudodata: False from within the fit runcard.

validphys.n3fit_data.replica_luxseed(replica, luxseed)[source]: Generate the luxseed for a replica. Identical to replica_nnseed but used for a different purpose.

validphys.n3fit_data.replica_mask(exps_masks, replica, experiments_index, diagonal_basis=False)[source]

Save the boolean mask used to split data into training and validation for a given replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.

Parameters

exps_masks (list[list[np.array]]) – Result of tr_masks() collected over experiments, which creates the nested structure. The outer list is len(group_dataset_inputs_by_experiment) and the inner-most list has an array for each dataset in that particular experiment - as defined by the metadata. The arrays should be 1-D boolean arrays which can be used as masks.
replica (int) – The index of the replica.
experiments_index (pd.MultiIndex) – Index returned by validphys.results.experiments_index().

Example

>>> from validphys.api import API
>>> ds_inp = [
...     {'dataset': 'NMC_NC_NOTFIXED_P_EM-SIGMARED', 'variant': 'legacy', 'frac': 0.75},
...     {'dataset': 'ATLAS_TTBAR_7TEV_TOT_X-SEC', 'variant': 'legacy_theory', 'frac': 0.75},
...     {'dataset': 'CMS_Z0J_8TEV_PT-Y', 'cfac':('NRM',), 'frac': 0.75},
... ]
>>> API.replica_training_mask(dataset_inputs=ds_inp, replica=1, trvlseed=123, theoryid=40_000_000, use_cuts="nocuts", mcseed=None, genrep=False)
                                    replica 1
group dataset                       id
NMC   NMC_NC_NOTFIXED_P_EM-SIGMARED 0        True
                                    1        True
                                    2        True
                                    3        True
                                    4       False
...                                           ...
CMS   CMS_Z0J_8TEV_PT-Y             45       True
                                    46       True
                                    47       True
                                    48       True
                                    49       True

[343 rows x 1 columns]

validphys.n3fit_data.replica_mask_table(replica_mask)[source]: Same as replica_mask but with a table decorator.

validphys.n3fit_data.replica_mcseed(replica, mcseed, genrep)[source]: Generates the mcseed for a replica.

validphys.n3fit_data.replica_nnseed(replica, nnseed)[source]: Generates the nnseed for a replica.

validphys.n3fit_data.replica_nnseed_fitting_data_dict(replica, exps_fitting_data_dict, replica_nnseed)[source]: For a single replica return a tuple of the inputs to this function. Used with collect over replicas to avoid having to perform multiple collects.

See also

replicas_nnseed_fitting_data_dict

validphys.n3fit_data.replica_pseudodata(experiment_indexed_make_replica, replica)[source]

Creates a pandas DataFrame containing the generated pseudodata. The index is validphys.results.experiments_index() and the columns are the replica numbers.

Notes

Whilst running n3fit, this action will only be called if fitting::savepseudodata is true (as per the default setting). The table can be found in the replica folder, i.e. <fit dir>/nnfit/replica_*/

validphys.n3fit_data.replica_trvlseed(replica, trvlseed, same_trvl_per_replica=False)[source]: Generates the trvlseed for a replica.

validphys.n3fit_data.replica_validation_mask(exps_tr_masks, replica, experiments_index, diagonal_basis=False)[source]

Save the boolean mask used to split data into training and validation for a given replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.

Parameters

exps_tr_masks (list[list[np.array]]) – Result of tr_masks() collected over experiments, which creates the nested structure. The outer list is len(group_dataset_inputs_by_experiment) and the inner-most list has an array for each dataset in that particular experiment - as defined by the metadata. The arrays should be 1-D boolean arrays which can be used as masks.
replica (int) – The index of the replica.
experiments_index (pd.MultiIndex) – Index returned by validphys.results.experiments_index().

Example

>>> from validphys.api import API
>>> ds_inp = [
...     {'dataset': 'NMC', 'frac': 0.75},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD'], 'frac': 0.75},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10, 'frac': 0.75}
... ]
>>> API.replica_training_mask(dataset_inputs=ds_inp, replica=1, trvlseed=123, theoryid=162, use_cuts="nocuts", mcseed=None, genrep=False)
                     replica 1
group dataset    id
NMC   NMC        0        True
                1        True
                2       False
                3        True
                4        True
...                        ...
CMS   CMSZDIFF12 45       True
                46       True
                47       True
                48      False
                49       True

[345 rows x 1 columns]

validphys.n3fit_data.standard_masks(data, replica_trvlseed)[source]: Generate the boolean masks used to split data into training and validation points. Returns a list of 1-D boolean arrays, one for each dataset. Each array has length equal to N_data; the entries are True for the datapoints which will be included in the training, such that

tr_data = data[tr_mask]
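The indexing above can be illustrated with a toy mask. This is a minimal sketch, not the provider itself: the seed, the array sizes and the way the mask is drawn (a fixed number int(frac * ndata) of True entries, shuffled) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(123)  # seed chosen for illustration only

ndata, frac = 20, 0.75
data = np.linspace(0.0, 1.0, ndata)  # stand-in for the central values of one dataset

# Build a mask with int(frac * ndata) True entries, then shuffle it in place.
ntrain = int(frac * ndata)
tr_mask = np.concatenate([np.ones(ntrain, dtype=bool), np.zeros(ndata - ntrain, dtype=bool)])
rng.shuffle(tr_mask)

tr_data = data[tr_mask]   # training points
vl_data = data[~tr_mask]  # validation points (the complement of the training mask)
print(len(tr_data), len(vl_data))  # 15 5
```

The validation points are obtained by negating the same mask, so every datapoint ends up in exactly one of the two sets.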

validphys.n3fit_data.training_mask(replicas_mask)[source]

Save the boolean mask used to split data into training and validation for each replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.

Parameters: replicas_exps_tr_masks (list[list[list[np.array]]]) – Result of replica_tr_masks() collected over replicas

Example

>>> from validphys.api import API
>>> from reportengine.namespaces import NSList
>>> # create namespace list for collects over replicas.
>>> reps = NSList(list(range(1, 4)), nskey="replica")
>>> ds_inp = [
...     {'dataset': 'NMC_NC_NOTFIXED_P_EM-SIGMARED', 'variant': 'legacy', 'frac': 0.75},
...     {'dataset': 'ATLAS_TTBAR_7TEV_TOT_X-SEC', 'variant': 'legacy_theory', 'frac': 0.75},
...     {'dataset': 'CMS_Z0J_8TEV_PT-Y', 'cfac':('NRM',), 'frac': 0.75},
... ]
>>> API.training_mask(dataset_inputs=ds_inp, nreplica=3, trvlseed=123, theoryid=40_000_000, use_cuts="nocuts", mcseed=None, genrep=False)
                                                replica 1  replica 2  replica 3
    group dataset                       id
    NMC   NMC_NC_NOTFIXED_P_EM-SIGMARED 0        True      False      False
                                        1        True       True       True
                                        2        True      False       True
                                        3        True       True      False
                                        4       False       True       True
    ...                                           ...        ...        ...
    CMS   CMS_Z0J_8TEV_PT-Y             45       True      False       True
                                        46       True       True       True
                                        47       True      False       True
                                        48       True       True       True
                                        49       True      False       True
    [343 rows x 3 columns]

validphys.n3fit_data.training_mask_table(training_mask)[source]: Same as training_mask but with a table decorator

validphys.n3fit_data.training_pseudodata(replica_pseudodata, replica_mask)[source]: Save the training data for the given replica. Deactivate by setting fitting::savepseudodata: False from within the fit runcard.

See also

validphys.n3fit_data.validation_pseudodata()

validphys.n3fit_data.validation_pseudodata(replica_pseudodata, replica_mask)[source]: Save the validation data for the given replica. Deactivate by setting fitting::savepseudodata: False from within the fit runcard.

See also

validphys.n3fit_data.training_pseudodata()

validphys.n3fit_data_utils module

n3fit_data_utils.py

This module reads validphys.core.DataSetSpec objects and extracts the relevant information into validphys.n3fit_data_utils.FittableDataSet objects.

The validphys_group_extractor will loop over every dataset of a given group loading their fktables (and applying any necessary cuts).

class validphys.n3fit_data_utils.FittableDataSet(name: str, fktables_data: list, operation: str = 'NULL')[source]

Bases: object

Representation of the DataSet information necessary to run a fit

Parameters

name (str) – name of the dataset
fktables_data (list(validphys.coredata.FKTableData)) – list of coredata fktable objects
operation (str) – operation to be applied to the fktables in the dataset, default “NULL”
frac (float) – fraction of the data to enter the training set
training_mask (bool) – training mask to apply to the fktable

fktables()[source]: Return the list of fktable tensors for the dataset

fktables_data: list

property hadronic: Returns true if this is a hadronic collision dataset

name: str

property ndata: Number of datapoints in the dataset

operation: str = 'NULL'

validphys.n3fit_data_utils.validphys_group_extractor(datasets)[source]

Receives a grouping spec from validphys (most likely an experiment) and loops over its content extracting and parsing all information required for the fit

Parameters: datasets (list(validphys.core.DataSetSpec)) – List of dataset specs in this group
Returns: loaded_obs
Return type: list (validphys.n3fit_data_utils.FittableDataSet)

validphys.overfit_metric module

overfit_metric.py

This module contains the functions used to calculate the overfit metric and produce the corresponding tables and figures.

validphys.overfit_metric.array_expected_overfitting(calculate_chi2s_per_replica, replica_data, number_of_resamples=1000, resampling_fraction=0.95)[source]

Calculates the expected difference in chi2 between:

1. The chi2 of a PDF replica calculated using the corresponding pseudodata replica used during the fit
2. The chi2 of a PDF replica calculated using an alternative i.i.d. random pseudodata replica

The expected difference, along with an error estimate, is obtained through bootstrapping with number_of_resamples resamples per PDF replica, where each resample contains a fraction resampling_fraction of all replicas.

Parameters

calculate_chi2s_per_replica (np.ndarray) – validation chi2 per pdf replica
replica_data (list(vp.fitdata.FitInfo)) –
number_of_resamples (int, optional) – number of resamples per pdf replica, by default 1000
resampling_fraction (float, optional) – fraction of replicas used in the bootstrap resampling, by default 0.95

Returns

(number_of_resamples*Npdfs,) sized array containing the mean delta chi2 values per resampled list.

Return type

np.ndarray
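The bootstrap described above can be sketched as follows. This is a simplified illustration rather than the validphys implementation: the helper name sketch_expected_overfitting is hypothetical, replicas are assumed to be resampled with replacement, and the chi2 of each replica to its own pseudodata is read from the diagonal of the input matrix instead of from replica_data.

```python
import numpy as np


def sketch_expected_overfitting(chi2s, number_of_resamples=1000, resampling_fraction=0.95, seed=0):
    """Hypothetical helper: bootstrap the mean chi2 difference.

    chi2s[i, j] is taken to be the chi2 of PDF replica i against pseudodata
    replica j, so the diagonal holds the chi2 to the pseudodata used in the fit.
    """
    rng = np.random.default_rng(seed)
    npdfs = chi2s.shape[0]
    nsample = int(resampling_fraction * npdfs)
    deltas = np.empty(number_of_resamples * npdfs)
    for k in range(deltas.size):
        # Assumption: replicas are drawn with replacement.
        idx = rng.integers(0, npdfs, size=nsample)
        fitted = chi2s[idx, idx]                       # chi2 to the replica's own pseudodata
        expected = chi2s[np.ix_(idx, idx)].mean(axis=1)  # mean chi2 over the drawn pseudodata
        deltas[k] = (fitted - expected).mean()
    return deltas


toy_chi2s = np.random.default_rng(1).normal(1.0, 0.05, size=(10, 10))
deltas = sketch_expected_overfitting(toy_chi2s, number_of_resamples=20)
print(deltas.shape)  # (200,)
```

The mean and standard deviation of the returned array then give a central value and a bootstrap error for the chi2 difference.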

validphys.overfit_metric.calculate_chi2s_per_replica(pdf, fit_code_version, recreate_pdf_pseudodata_no_table, preds, dataset_inputs, groups_covmat_no_table)[source]

Calculates, for each PDF replica, the chi2 of the validation with the pseudodata generated for all other replicas in the fit

Parameters

recreate_pdf_pseudodata_no_table (list[namedtuple]) – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.
preds (list[pd.core.frame.DataFrame]) – List of pandas dataframes, each containing the predictions of the pdf replicas for a dataset_input
dataset_inputs (list[DataSetInput]) –
groups_covmat_no_table (pd.core.frame.DataFrame) –

Returns

(Npdfs, Npdfs) sized matrix containing the chi2 of a PDF replica calculated with respect to a given pseudodata replica. The diagonal values correspond to the cases where the PDF replica has been fitted to the corresponding pseudodata replica.

Return type

np.ndarray

validphys.overfit_metric.fit_overfitting_summary(fit, array_expected_overfitting)[source]: Creates a table containing the overfitting information:

- mean chi2 difference
- bootstrap error
- sigmas away from 0

validphys.overfit_metric.plot_overfitting_histogram(fit, array_expected_overfitting)[source]: Plots the bootstrap error and central value of the overfittedness in a histogram

validphys.overfit_metric.summarise_overfitting(fits_overfitting_summary)[source]: Same as fit_overfitting_summary, but collected over all fits in the runcard and put in a single table.

validphys.pdfbases module

pdfbases.py

This holds the concrete labels data relative to the PDF bases, as declaratively as possible.

class validphys.pdfbases.Basis(labels, *, aliases=None, default_elements=None, element_representations=None)[source]

Bases: ABC

A Basis maps a set of PDF flavours (typically as given by LHAPDF) to functions thereof. This abstract class provides functionalities to manage labels (used for plotting) and defaults, while the concrete implementation of the transformations is handled by the subclasses (by implementing the validphys.pdfbases.Basis.apply_grid_values() method). The high level validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() methods then provide convenient functionality to work with transformations.

labels

A list of strings representing the labels of each possible transformation, in order.

Type: list

aliases

A mapping from strings to labels appearing in labels, specifying equivalent ways to enter elements in the user interface.

Type: dict, optional

default_elements

A list of the labels to be computed by default when no subset of elements is specified. If not given it is assumed to be the same as labels.

Type: list, optional

element_representations

A mapping from strings to labels indicating the preferred string representation of the provided elements (to be used in plotting). If this parameter is not given or the element is not in the mapping, the label itself is used. It may be convenient to set this when heavy use of LaTeX is desired.

Type: dict, optional

abstract apply_grid_values(func, vmat, xmat, qmat)[source]

Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.

It should return an array indexed as

grid_values[N][flavour][x][Q]

Parameters

func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.

central_grid_values(pdf, vmat, xmat, qmat)[source]: Same as Basis.grid_values() but returning information on the central member of the PDF set.

elementlabel(element)[source]: Return the printable representation of a given element of this basis.

grid_values(pdf, vmat, xmat, qmat)[source]

Like validphys.gridvalues.grid_values(), but taking and returning vmat in terms of the vectors in this basis.

Parameters

pdf (PDF) – Any PDF set
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.

Returns

grid – A 4-dimensional array with the PDF values at the input parameters for each replica. The return value is indexed as follows:

grid_values[replica][flavour][x][Q]

Return type

np.ndarray

Examples

Compute the median ratio over replicas between singlet and gluon for a fixed point in x and a range of values in Q:

>>> import numpy as np
>>> from validphys.loader import Loader
>>> from validphys.pdfbases import evolution
>>> gv = evolution.grid_values(Loader().check_pdf("NNPDF31_nnlo_as_0118"), ["singlet", "gluon"], [0.01], [2,20,200])
>>> np.median(gv[:,0,...]/gv[:,1,...], axis=0)
array([[0.56694959, 0.53782002, 0.60348812]])

has_element(element)[source]: Return true if basis has knowledge of the given element

to_known_elements(vmat)[source]: Transform the list of aliases into an array of known labels. Raise UnknownElement on failure.

class validphys.pdfbases.LinearBasis(labels, from_flavour_mat, *args, **kwargs)[source]

Bases: Basis

A basis that implements a linear transformation of flavours.

from_flavour_mat

A matrix that rotates the flavour basis into this basis.

Type: np.ndarray

apply_grid_values(func, vmat, xmat, qmat)[source]

Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.

It should return an array indexed as

grid_values[N][flavour][x][Q]

Parameters

func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.

classmethod from_mapping(mapping, *, aliases=None, default_elements=None)[source]: Construct a basis from a mapping of the form {label:{pdf_flavour:coefficient}}.
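How a mapping of this form translates into a rotation of the flavour axis can be sketched with NumPy. This is an illustration under stated assumptions, not the LinearBasis code: the reduced flavour list, the labels u_v and d_v, and the use of einsum are all choices made for the example.

```python
import numpy as np

# Hypothetical, reduced flavour list; the ordering used by validphys is not shown here.
flavours = ["u", "ubar", "d", "dbar"]

# Mapping of the form {label: {pdf_flavour: coefficient}}.
mapping = {
    "u_v": {"u": 1, "ubar": -1},
    "d_v": {"d": 1, "dbar": -1},
}

labels = list(mapping)
# One row per label, one column per flavour; flavours not listed get coefficient 0.
from_flavour_mat = np.array(
    [[mapping[label].get(fl, 0) for fl in flavours] for label in labels], dtype=float
)

# Toy grid in the flavour basis, indexed as [N][flavour][x][Q].
grid = np.random.default_rng(0).random((3, len(flavours), 5, 2))
# Rotate the flavour axis; the result is indexed as [N][label][x][Q].
rotated = np.einsum("bf,nfxq->nbxq", from_flavour_mat, grid)
print(rotated.shape)  # (3, 2, 5, 2)
```

Each element of the new basis is the linear combination of flavours given by the corresponding row of the matrix, evaluated independently at every replica, x and Q.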

class validphys.pdfbases.ScalarFunctionTransformation(transform_func, *args, **kwargs)[source]

Bases: Basis

A basis that transforms the flavour basis into a single element given by transform_func.

Optional keyword arguments are passed to the constructor of validphys.pdfbases.Basis.

transform_func

A callable with the signature transform_func(func, xmat, qmat) that fills the grid in \(x\) and \(Q\) using func and returns a grid with a single basis element.

Type: callable

apply_grid_values(func, vmat, xmat, qmat)[source]

Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.

It should return an array indexed as

grid_values[N][flavour][x][Q]

Parameters

func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.

exception validphys.pdfbases.UnknownElement[source]: Bases: KeyError

validphys.pdfbases.check_basis(basis, flavours)[source]: Check to verify a given basis and set of flavours. Returns a dictionary with the relevant instance of the basis class and flavour specification

validphys.pdfbases.fitbasis_to_NN31IC(flav_info, fitbasis)[source]

Return a rotation matrix R_{ij} which takes one of the possible fitting bases (evolution, NN31IC, FLAVOUR) to the NN31IC basis, (sigma, g, v, v3, v8, t3, t8, cp), corresponding to the one used in NNPDF31. In R_{ij}, i is the flavour index and j is the evolution index. The evolution basis (NN31IC) is defined with cp = c + cbar = 2c and

sigma = u + ubar + d + dbar + s + sbar + cp
v = u - ubar + d - dbar + s - sbar + c - cbar
v3 = u - ubar - d + dbar
v8 = u - ubar + d - dbar - 2*s + 2*sbar
t3 = u + ubar - d - dbar
t8 = u + ubar + d + dbar - 2*s - 2*sbar

If the input is already in the evolution basis it returns the identity.

Parameters

flav_info (dict) – dictionary containing the information about each PDF (basis dictionary in the runcard)
fitbasis (str) – name of the fitting basis

Returns

mat.transpose() – matrix performing the change of basis from fitbasis to NN31IC

Return type

numpy matrix
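The combinations listed in the description can be written as a matrix acting on flavour-basis values. The sketch below only encodes those definitions; it is not the matrix returned by fitbasis_to_NN31IC (whose index convention and input basis depend on the runcard), and the column ordering and the toy PDF values are assumptions.

```python
import numpy as np

# Assumed column ordering, for illustration only.
flavours = ["u", "ubar", "d", "dbar", "s", "sbar", "c", "cbar", "g"]

# Coefficients of each NN31IC element, using cp = c + cbar inside sigma.
combinations = {
    "sigma": {"u": 1, "ubar": 1, "d": 1, "dbar": 1, "s": 1, "sbar": 1, "c": 1, "cbar": 1},
    "g": {"g": 1},
    "v": {"u": 1, "ubar": -1, "d": 1, "dbar": -1, "s": 1, "sbar": -1, "c": 1, "cbar": -1},
    "v3": {"u": 1, "ubar": -1, "d": -1, "dbar": 1},
    "v8": {"u": 1, "ubar": -1, "d": 1, "dbar": -1, "s": -2, "sbar": 2},
    "t3": {"u": 1, "ubar": 1, "d": -1, "dbar": -1},
    "t8": {"u": 1, "ubar": 1, "d": 1, "dbar": 1, "s": -2, "sbar": -2},
    "cp": {"c": 1, "cbar": 1},
}

# One row per NN31IC element, one column per flavour.
mat = np.array(
    [[coeffs.get(fl, 0) for fl in flavours] for coeffs in combinations.values()], dtype=float
)

# Toy flavour-basis values at a single (x, Q) point.
f = dict(u=0.5, ubar=0.1, d=0.3, dbar=0.1, s=0.05, sbar=0.05, c=0.01, cbar=0.01, g=2.0)
values = mat @ np.array([f[fl] for fl in flavours])
for label, value in zip(combinations, values):
    print(f"{label}: {value:.3f}")
```

With c = cbar, as in the toy values, the charm contribution to v cancels and cp equals 2c, consistent with the definition above.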

validphys.pdfbases.list_bases()[source]: List available PDF bases

validphys.pdfbases.parse_flarr(flarr)[source]: Parse a free form list into a list of PDG parton indexes (that may contain indexes or values from PDF_ALIASES)

validphys.pdfbases.pdg_id_to_canonical_index(flindex)[source]: Given an LHAPDF id, return its index in the ALL_FLAVOURS list.

validphys.pdfbases.scalar_function_transformation(label, *args, **kwargs)[source]

Convenience decorator factory to produce a validphys.pdfbases.ScalarFunctionTransformation basis from a function.

Parameters: label (str) – The single label of the element produced by the function transformation.

Notes

Optional keyword arguments are passed to the constructor of validphys.pdfbases.ScalarFunctionTransformation.

Returns: decorator – A decorator that can be applied to a suitable transformation function.
Return type: callable

validphys.pdfgrids module

High level providers for PDF and luminosity grids, formatted in such a way to facilitate plotting and analysis.

class validphys.pdfgrids.KineticXPlottingGrid(Q: float, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>), xgrid: ~numpy.ndarray, grid_values: ~validphys.core.Stats, scale: str, derivative_degree: int = 0)[source]

Bases: XPlottingGrid

Kinetic Energy version of the XPlottingGrid

derivative()[source]: Return the derivative of the grid with respect to dlogx A call to this function will return a new XPlottingGrid instance with the derivative as grid values and with an increased derivative_degree

process_label(base_label)[source]: Wraps the base_label inside the kinetic energy formula

class validphys.pdfgrids.Lumi1dGrid(m, grid_values)

Bases: tuple

grid_values: Alias for field number 1

m: Alias for field number 0

class validphys.pdfgrids.Lumi2dGrid(y, m, grid_values)

Bases: tuple

grid_values: Alias for field number 2

m: Alias for field number 1

y: Alias for field number 0

class validphys.pdfgrids.XPlottingGrid(Q: float, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>), xgrid: ~numpy.ndarray, grid_values: ~validphys.core.Stats, scale: str, derivative_degree: int = 0)[source]

Bases: object

DataClass holding the value of the PDF at the specified values of x, Q and flavour. The grid_values attribute corresponds to a Stats instance in order to compute statistical estimators in a sensible manner.

Q: float

basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>)

copy_grid(grid_values)[source]: Create a copy of the grid with potentially a different set of values

derivative()[source]: Return the derivative of the grid with respect to dlogx A call to this function will return a new XPlottingGrid instance with the derivative as grid values and with an increased derivative_degree

derivative_degree: int = 0

flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>)

grid_values: Stats

process_label(base_label)[source]: Process the base_label used for plotting. For instance, for derivatives it will add d/dlogx to the base_label.

scale: str

select_flavour(flindex)[source]: Return a new grid for one single flavour

xgrid: ndarray

validphys.pdfgrids.boundary_xplotting_grid(unpolarized_bc: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]: A wrapper around xplotting_grid to compute instead unpolarized_bcs.

validphys.pdfgrids.distance_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]

Return an object containing the value of the distance PDF at the specified values of x and flavour.

The parameter normalize_to identifies the reference PDF set with respect to which the distance is computed.

This method returns distance grids where the relative distance between both PDF sets is computed. At least one grid will be identical to zero.

validphys.pdfgrids.kinetic_xplotting_grid(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]: Returns an object containing the value of the kinetic energy of the PDF at the specified values of x and flavour for a given Q. Utilizes xplotting_grid. The kinetic energy of the PDF is defined as:

\[k = \sqrt{1 + (d/dlogx f)^2}\]
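The formula above can be illustrated with a small NumPy sketch. It is not the validphys implementation: the helper name kinetic_energy is hypothetical, and the derivative is approximated here with np.gradient on a toy function rather than obtained from XPlottingGrid.derivative.

```python
import numpy as np


def kinetic_energy(x, f):
    """Return sqrt(1 + (df/dlogx)^2) for values f sampled at points x.

    Hypothetical helper for illustration only; it is not part of validphys.
    """
    logx = np.log(x)
    # np.gradient accepts the sample coordinates, so the grid in log(x)
    # does not need to be uniform.
    dfdlogx = np.gradient(f, logx)
    return np.sqrt(1.0 + dfdlogx**2)


x = np.logspace(-5, 0, 200)
f = np.log(x)  # toy function whose derivative with respect to log(x) is 1
k = kinetic_energy(x, f)
```

For this toy input the result is constant and equal to sqrt(2), since the derivative is 1 at every grid point.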

validphys.pdfgrids.lumigrid1d(pdf: ~validphys.core.PDF, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'NoneType'>, <class 'numbers.Real'>) = None, nbins_m: int = 50, mxmin: ~numbers.Real = 10, mxmax: (<class 'NoneType'>, <class 'numbers.Real'>) = None, scale='log')[source]

Return the integrated luminosity in a grid of nbins_m points, for the values of invariant mass given (proton-proton) collider energy sqrts (given in GeV). A rapidity cut on the integration range (if specified) is taken into account.

By default, the grid is sampled logarithmically in mass. The limits are given by mxmin and mxmax, given in GeV. By default mxmin is 10 GeV and mxmax is set based on sqrts.

The results are computed for all relevant PDF members and wrapped in a stats class, to compute statistics regardless of the error_type.

validphys.pdfgrids.lumigrid2d(pdf: PDF, lumi_channel, sqrts: Real, y_lim: Real = 5, nbins_m: int = 100, nbins_y: int = 50)[source]

Return the differential luminosity in a grid of (nbins_m x nbins_y) points, for the allowed values of invariant mass and rapidity for given (proton-proton) collider energy sqrts (given in GeV). y_lim specifies the maximum rapidity.

The grid is sampled linearly in rapidity and logarithmically in mass.

The results are computed for all relevant PDF members and wrapped in a stats class, to compute statistics regardless of the error_type.

validphys.pdfgrids.pull_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]: Return an object containing the value of the pull between the two PDFs at the specified values of x and flavour. The parameter normalize_to identifies the reference PDF set with respect to which the pull is computed. This method returns pull grids where the relative pull between both PDF sets, defined as the distance in terms of the standard deviations of the reference PDF, is computed. At least one grid will be identical to zero.

validphys.pdfgrids.variance_distance_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]

Return an object containing the value of the variance distance PDF at the specified values of x and flavour.

The parameter normalize_to identifies the reference PDF set with respect to which the distance is computed.

This method returns distance grids where the relative distance between both PDF sets is computed. At least one grid will be identical to zero.

validphys.pdfgrids.xgrid(xmin: Real = 1e-05, xmax: Real = 1, scale: str = 'log', npoints: int = 200)[source]: Return a tuple (scale, array) where scale is the input scale (“linear” or “log”) and array is generated from the input parameters and distributed according to scale.
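A minimal sketch of the behaviour described for xgrid, assuming the points are spaced evenly in x for the “linear” scale and evenly in log10(x) for the “log” scale. The function name make_xgrid is hypothetical and the actual validphys implementation may differ in detail.

```python
import numpy as np


def make_xgrid(xmin=1e-5, xmax=1.0, scale="log", npoints=200):
    """Return (scale, array), with array distributed according to scale.

    Hypothetical stand-in for illustration; not the validphys function.
    """
    if scale == "log":
        arr = np.logspace(np.log10(xmin), np.log10(xmax), npoints)
    elif scale == "linear":
        arr = np.linspace(xmin, xmax, npoints)
    else:
        raise ValueError(f"Unknown scale: {scale!r}")
    return scale, arr


scale, arr = make_xgrid()
lin_scale, lin_arr = make_xgrid(xmin=0.1, xmax=1.0, scale="linear", npoints=10)
```

Returning the scale together with the array lets downstream plotting code choose a matching axis scale without being told again.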

validphys.pdfgrids.xplotting_grid(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None, derivative: int = 0)[source]

Return an object containing the value of the PDF at the specified values of x and flavour.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.

derivative (int): how many derivatives of the PDF should be taken (default=0)

validphys.pdfoutput module

pdfoutput.py

reportengine helpers to enable outputting PDFs.

This module provides one decorator, pdfset that is used to mark a provider as generating a PDF set. The providers must take a set_name and an output_path argument. set_name will be required to be a unique string that does not correspond to any installed LHAPDF grid, and output_path will be modified to actually correspond to <output>/pdfsets. Within reportengine, the return value of the sets marked with @pdfset will be discarded, and the relative path to the output folder will be used instead. This can be used to formulate links within the report.

validphys.pdfoutput.pdfset(f)[source]: Mark the function as returning a PDF set. Make sure that providers marked with this decorator take set_name and output_path as arguments.

validphys.pdfplots module

pdfplots.py

Plots of quantities that are mostly functions of the PDFs only.

class validphys.pdfplots.AllFlavoursPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

Auxiliary class which groups multiple PDF flavours in one plot.

get_ylabel(parton_name)[source]

setup_flavour(flstate)[source]

class validphys.pdfplots.BandPDFPlotter(*args, pdfs_noband=None, show_mc_errors=True, legend_stat_labels=True, **kwargs)[source]

Bases: PDFPlotter

draw(pdf, grid, flstate)[source]: Plot the desired function of the grid and return the array to be used for autoscaling

legend(flstate)[source]

setup_flavour(flstate)[source]

class validphys.pdfplots.BandPDFPlotterBC(*args, unpolarized_bcs, boundary_xplotting_grids, **kwargs)[source]

Bases: BandPDFPlotter

draw(pdf, grid, flstate)[source]: Plot the desired function of the grid and return the array to be used for autoscaling

class validphys.pdfplots.DistancePDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

Auxiliary class which draws the distance plots.

draw(pdf, grid, flstate)[source]: Plot the desired function of the grid and return the array to be used for autoscaling

get_title(parton_name)[source]

get_ylabel(parton_name)[source]

normalize()[source]

class validphys.pdfplots.FlavourState[source]

Bases: SimpleNamespace

This is the namespace for the parts specific for each flavour

class validphys.pdfplots.FlavoursDistancePlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]: Bases: DistancePDFPlotter, AllFlavoursPlotter

class validphys.pdfplots.FlavoursPlotter(*args, pdfs_noband=None, show_mc_errors=True, legend_stat_labels=True, **kwargs)[source]

Bases: AllFlavoursPlotter, BandPDFPlotter

get_title(parton_name)[source]

class validphys.pdfplots.FlavoursVarDistancePlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]: Bases: VarDistancePDFPlotter, AllFlavoursPlotter

class validphys.pdfplots.MixBandPDFPlotter(*args, mixband_as_replicas, **kwargs)[source]

Bases: BandPDFPlotter

Special wrapper class to plot, in the same figure, PDF bands and PDF replicas depending on the type of PDF. Practical use: plot together the PDF central values with the NNPDF bands

draw(pdf, grid, flstate)[source]: Plot the desired function of the grid and return the array to be used for autoscaling

class validphys.pdfplots.PDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: object

Stateful object that breaks plotting grids by flavour, as a function of x and for fixed Q.

This class has a lot of state, but it should all be defined at initialization time. Things that change e.g. per flavour should be passed explicitly as arguments.

property Q

abstract draw(pdf, grid, flstate)[source]: Plot the desired function of the grid and return the array to be used for autoscaling

property firstgrid

get_title(parton_name)[source]

get_ylabel(parton_name)[source]

legend(flstate)[source]

normalize()[source]

property normalize_pdf

setup_flavour(flstate)[source]

property xscale

class validphys.pdfplots.PullPDFPlotter(pdfs_list, pull_grids_list, xscale, normalize_to, ymin, ymax)[source]

Bases: object

Auxiliary class which groups multiple pulls in one plot.

pdfs_list is a list of dictionaries, each containing the two PDFs to be used for the pull. pull_grids_list is the list of the pull computed for the PDF pairs described by pdfs_list.

property Q

draw(pdfs, grid, flstate)[source]

get_title(flstate)[source]

get_ylabel()[source]

legend(flstate)[source]

plot_call()[source]

property xscale

class validphys.pdfplots.ReplicaPDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

draw(pdf, grid, flstate)[source]: Plot the desired function of the grid and return the array to be used for autoscaling

class validphys.pdfplots.UncertaintyPDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

draw(pdf, grid, flstate)[source]: Plot the desired function of the grid and return the array to be used for autoscaling

get_ylabel(parton_name)[source]

class validphys.pdfplots.VarDistancePDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: DistancePDFPlotter

Auxiliary class which draws the variance distance plots

get_title(parton_name)[source]

get_ylabel(parton_name)[source]

validphys.pdfplots.plot_flavours(pdf, xplotting_grid, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]

Plot the absolute central value and the uncertainty of all the flavours of a pdf as a function of x for a given value of Q.

xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.

validphys.pdfplots.plot_lumi1d(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, show_mc_errors: bool = True, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, pdfs_noband=None, scale='log', legend_stat_labels: bool = True)[source]

Plot PDF luminosities at a given center of mass energy. sqrts is the center of mass energy (GeV).

This action plots the luminosity (as computed by lumigrid1d) as a function of invariant mass for all PDFs for a single lumi channel. normalize_to works as for plot_pdfs and allows plotting a ratio to the central value of some of the PDFs. ymin and ymax can be used to set exact bounds for the scale. y_cut can be used to specify a rapidity cut over the integration range. show_mc_errors controls whether the 1σ error bands are shown in addition to the 68% confidence intervals for Monte Carlo PDFs. A list pdfs_noband can be passed to suppress the error bands for certain PDFs and plot the central values only. legend_stat_labels controls whether to show detailed information on what kind of confidence interval is being plotted in the legend labels.

validphys.pdfplots.plot_lumi1d_replicas(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, scale='log')[source]

This function is similar to plot_lumi1d, but instead of plotting the standard deviation and 68% c.i. it plots the luminosities for individual replicas.

Plot PDF replica luminosities at a given center of mass energy. sqrts is the center of mass energy (GeV).

This action plots the luminosity (as computed by lumigrid1d) as a function of invariant mass for all PDFs for a single lumi channel. normalize_to works as for plot_pdfs and allows plotting a ratio to the central value of some of the PDFs. ymin and ymax can be used to set exact bounds for the scale. y_cut can be used to specify a rapidity cut over the integration range.

validphys.pdfplots.plot_lumi1d_uncertainties(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, scale='log')[source]

Plot PDF luminosity uncertainties at a given center of mass energy. sqrts is the center of mass energy (GeV).

If normalize_to is set, the values are normalized to the central value of the corresponding PDFs. y_cut can be used to specify a rapidity cut over the integration range.

validphys.pdfplots.plot_lumi2d(pdf, lumi_channel, lumigrid2d, sqrts, display_negative: bool = True)[source]

Plot the absolute luminosity on a grid of invariant mass and rapidity for a given center of mass energy sqrts. The color scale is logarithmic. If display_negative is True, mark the negative values.

The luminosity is calculated for positive rapidity, and reflected for negative rapidity for display purposes.

validphys.pdfplots.plot_lumi2d_uncertainty(pdf, lumi_channel, lumigrid2d, sqrts: Real)[source]

Plot the 2D luminosity uncertainty at a given center of mass energy. Code ported from https://github.com/scarrazza/lumi2d.

The luminosity is calculated for positive rapidity, and reflected for negative rapidity for display purposes.

validphys.pdfplots.plot_pdf_pulls(pdfs_list, pull_grids_list, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]: Plot the PDF standard deviations as a function of x. If normalize_to is set, the ratio to that PDF’s central value is plotted. Otherwise it is the absolute values.

validphys.pdfplots.plot_pdf_uncertainties(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]: Plot the PDF standard deviations as a function of x. If normalize_to is set, the ratio to that PDF’s central value is plotted. Otherwise it is the absolute values.

validphys.pdfplots.plot_pdfdistances(pdfs, distance_grids, *, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>), ymin=None, ymax=None)[source]: Plots the distances between different PDF sets and a reference PDF set for all flavours. Distances are normalized such that a value of order 10 is unlikely to be explained by purely statistical fluctuations

validphys.pdfplots.plot_pdfreplicas(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]

Plot the replicas of the specified PDFs. Otherwise it works the same as plot_pdfs.

xscale sets the scale of the plot, e.g. ‘linear’ or ‘log’. The default is deduced from the xplotting_grid, which in turn is ‘log’ by default.

normalize_to should be a pdf id or an index of the pdf (starting from one).

validphys.pdfplots.plot_pdfreplicas_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]: Plot the kinetic energy of the replicas of the specified PDFs. Otherwise it works the same as plot_pdfs_kinetic_energy.

validphys.pdfplots.plot_pdfs(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]

Plot the central value and the uncertainty of a list of pdfs as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding PDF. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the PDFs or its corresponding index in the list, starting from one, or None to plot absolute values.

xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.

pdfs_noband: A list of PDFs to plot without error bands, i.e. only the central values of these PDFs will be plotted. The list can be formed of strings, corresponding to PDF IDs, integers (starting from one), corresponding to the index of the PDF in the list of PDFs, or a mixture of both.

show_mc_errors (bool): Plot 1σ bands in addition to 68% errors for Monte Carlo PDF.

legend_stat_labels (bool): Show detailed information on what kind of confidence interval is being plotted in the legend labels.
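The mixed string and integer convention accepted by pdfs_noband can be sketched as follows. This is only an illustration of the one-based indexing rule described above: the helper resolve_noband and the names toy_pdf_b and toy_pdf_c are hypothetical, and validphys resolves these entries through its own configuration machinery.

```python
def resolve_noband(pdf_names, pdfs_noband=None):
    """Return the set of PDF names to be drawn without error bands.

    Entries may be PDF IDs (strings) or one-based indices into pdf_names.
    Hypothetical helper for illustration; not part of validphys.
    """
    if pdfs_noband is None:
        return set()
    selected = set()
    for entry in pdfs_noband:
        if isinstance(entry, int):
            if not 1 <= entry <= len(pdf_names):
                raise ValueError(f"PDF index {entry} is out of range")
            # Indices start from one, so shift to Python's zero-based lists.
            selected.add(pdf_names[entry - 1])
        elif entry in pdf_names:
            selected.add(entry)
        else:
            raise ValueError(f"Unknown PDF ID: {entry!r}")
    return selected


names = ["NNPDF40_nnlo_as_01180", "toy_pdf_b", "toy_pdf_c"]
no_band = resolve_noband(names, [1, "toy_pdf_c"])
```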

validphys.pdfplots.plot_pdfs_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]: Band plotting of the “kinetic energy” of the PDF as a function of x for a given value of Q. The input of this function is similar to those of plot_pdfs.

validphys.pdfplots.plot_pdfs_mixed(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True, mixband_as_replicas: (<class 'list'>, <class 'NoneType'>) = None)[source]

This function is similar to plot_pdfs, except instead of only plotting the central value and the uncertainty of the PDFs, those PDFs indicated by mixband_as_replicas will be plotted as replicas without the central value.

Inputs are the same as plot_pdfs, with the exception of mixband_as_replicas, which only exists here.

mixband_as_replicas: A list of PDFs to plot as replicas, i.e. the central values and replicas of these PDFs will be plotted. The list can be formed of strings, corresponding to PDF IDs, integers (starting from one), corresponding to the index of the PDF in the list of PDFs, or a mixture of both.

validphys.pdfplots.plot_pdfs_mixed_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True, mixband_as_replicas: (<class 'list'>, <class 'NoneType'>) = None)[source]: Mixed band and replica plotting of the “kinetic energy” of the PDF as a function of x for a given value of Q.

validphys.pdfplots.plot_pdfvardistances(pdfs, variance_distance_grids, *, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>), ymin=None, ymax=None)[source]: Plots the distances between different PDF sets and a reference PDF set for all flavours. Distances are normalized such that a value of order 10 is unlikely to be explained by purely statistical fluctuations

validphys.pdfplots.plot_polarized_boundaries(pdfs, xplotting_grids, unpolarized_bcs, boundary_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]: Has the exact same functionality as plot_pdfs but for a list of polarized PDF sets. In addition, it plots the unpolarized PDF set used as a boundary condition.

validphys.pineparser module

Loader for the pineappl-based FKTables

The FKTables for pineappl have the pineappl.lz4 extension and can be used directly with the pineappl cli as well as read with pineappl.fk_table

exception validphys.pineparser.GridFileNotFound[source]

Bases: FileNotFoundError

PineAPPL file for FK table not found.

validphys.pineparser.get_yaml_information(yaml_file, theorypath)[source]

Reads the yaml information from a yaml compound file

Transitional function: the call to “pineko” might be to some other commondata reader that will know how to extract the information from the commondata

validphys.pineparser.pineappl_reader(fkspec)[source]

Receives a fkspec, which contains the path to the fktables that are to be read by pineappl as well as metadata that fixes things like conversion factors or apfelcomb flag. The fkspec contains also the cfactors which are applied _directly_ to each of the fktables.

The output of this function is an instance of FKTableData which can be generated from reading several FKTable files which get concatenated on the ndata (bin) axis.

For more information on the reading of pineappl tables:

https://pineappl.readthedocs.io/en/latest/modules/pineappl/pineappl.html#pineappl.pineappl.PyFkTable

About the reader:

Each pineappl table is a 4-dimensional grid with: (ndata, active channels, x1, x2)

For DIS grids x2 will contain one single number. The luminosity channels are given in a (flav1, flav2) format and thus need to be converted to the 1-D index of a (14x14) luminosity tensor in order to put in the form of a dataframe.
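The conversion from a (flav1, flav2) channel to a 1-D index of the (14x14) luminosity tensor can be sketched as below. The row-major ordering and the zero-based flavour indices are assumptions made for illustration, and the helper names are hypothetical; the reader itself may use a different convention.

```python
N_FLAVOURS = 14  # size of each axis of the luminosity tensor


def channel_to_index(flav1, flav2, nfl=N_FLAVOURS):
    """Flatten a (flav1, flav2) pair into a single row-major index."""
    return flav1 * nfl + flav2


def index_to_channel(index, nfl=N_FLAVOURS):
    """Invert channel_to_index, returning the (flav1, flav2) pair."""
    return divmod(index, nfl)


idx = channel_to_index(1, 2)
pair = index_to_channel(idx)
```

With this convention the flattened index runs from 0 to 195, so each active channel maps to exactly one column of the flattened table.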

All grids in pineappl are constructed with the exact same xgrid, the active channels can vary and so when grids are concatenated for an observable the gaps are filled with 0s.

The pineappl grids are such that obs = sum_{bins} fk * f (*f) * bin_w so in order to use them together with old-style grids (obs = sum_{bins} fk * xf (*xf)) it is necessary to remove the factor of x and the normalization of the bins.

About apfelcomb flags in yamldb files:

Old commondata files and old grids have over time been through various iterations while remaining compatible with each other, and fixes and hacks have been incorporated in one or the other. For the new theory to be compatible with old commondata it is necessary to keep track of said hacks (and to apply conversion factors when required).

NOTE: both conversion factors and apfelcomb flags will be eventually removed.

Returns: an FKTableData object containing all necessary information to compute predictions
Return type: validphys.coredata.FKTableData

validphys.pineparser.pineko_yaml(yaml_file, grids_folder)[source]

Given a yaml_file, returns the corresponding dictionary and grids.

The dictionary contains all information and we return an extra field with all the grids to be loaded for the given dataset.

Parameters

yaml_file (pathlib.Path) – path of the yaml file for the given dataset
grids_folder (pathlib.Path) – path of the grids folder
check_grid_existence (bool) – if True (default) checks whether the grid exists

Returns

yaml_content (dict) – Metadata prepared for the FKTables
paths (list(list(path))) – List (of lists) with all the grids that will need to be loaded

validphys.plotutils module

Basic utilities for plotting functions.

class validphys.plotutils.ComposedHandler[source]

Bases: object

Legend artist for PDF plots.

legend_artist(legend, orig_handle, fontsize, handlebox)[source]

validphys.plotutils.HandlerSpec: alias of HandelrSpec

validphys.plotutils.add_subplot(figsize=None, projection=None, **kwargs)[source]

matplotlib.figure wrapper used to generate a figure and add a subplot.

Use matplotlib.figure.Figure() objects to avoid importing pyplot anywhere. The reason is that pyplot maintains a global state that makes it misbehave in multithreaded applications, such as when executed under dask parallel mode.

Parameters

figsize (2-tuple of floats) – default is None
projection (The projection type of the subplot (Axes).) – default is None

Returns

fig, ax = (matplotlib.figure.Figure, fig.add_subplot)

Return type

tuple

validphys.plotutils.ax_or_gca(f)[source]: A decorator. When applied to a function, the keyword argument ax will automatically be filled with the current axis, if it was None.

validphys.plotutils.ax_or_newfig(f)[source]: A decorator. When applied to a function, the keyword argument ax will automatically be filled with the a new axis corresponding to an empty, if it was None.

validphys.plotutils.barplot(values, collabels, datalabels, orientation='auto')[source]

The barplot as matplotlib should have it. It resizes on overflow. values should be one or two dimensional and should contain the values for the barplot. collabels must have as many elements as values has columns (or total elements if it is one dimensional), and contains the labels for each column in the bar plot. datalabels should have as many elements as values has rows, and contains the labels for the individual items to be compared. If orientation is "auto", the barplot will be horizontal or vertical depending on the number of items. Otherwise, the orientation can be fixed as "horizontal" or "vertical".

Parameters

values (array of dimensions M×N or N.) – The input data.
collabels (Iterable[str] of dimensions N) – The labels for each of the bars.
datalabels (Iterable[str] of dimensions M or 1) – The label for each of the datasets to be compared.
orientation ({'auto', 'horizontal', 'vertical'}, 'optional') – The orientation of the bars.

Returns

(fig, ax) – a tuple of a matplotlib figure and an axis, like matplotlib.pyplot.subplots. The axis will have a _bar_orientation attribute that will either be ‘horizontal’ or ‘vertical’ and will correspond to the actual orientation of the plot.

Return type

tuple

Examples

>>> import numpy as np
>>> from validphys.plotutils import barplot
>>> vals = np.random.rand(2,5)
>>> collabels = ["A", "B", "C", "D", "e"]
>>> fig, ax = barplot(vals, collabels, ['First try', 'Second try'])
>>> ax.legend()

validphys.plotutils.centered_range(n, value=0, distance=1)[source]: Generate a range of n points centered around value, uniformly sampled at intervals of distance.
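One way to read this description is sketched below, assuming that "centered" means the mean of the points equals value. The function centered_range_sketch is a hypothetical stand-in and may differ from the validphys implementation, for example in its return type.

```python
def centered_range_sketch(n, value=0, distance=1):
    """Return n points spaced by distance whose mean is value.

    Hypothetical stand-in for illustration; not the validphys function.
    """
    # The first point sits half of the total span below the centre.
    first = value - distance * (n - 1) / 2
    return [first + i * distance for i in range(n)]


odd = centered_range_sketch(3)
even = centered_range_sketch(4, value=10, distance=2)
```

For an odd n the central point coincides with value, while for an even n the two middle points straddle it.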

validphys.plotutils.color_iter()[source]: Yield the colors in the cycle defined in the matplotlib style. When the colores are exhausted a warning will be logged and the cycle will be repeated infinitely. Therefore this avoids the overflow error at runtime when using matplotlib’s f'C{i}' color specification (equivalent to colors[i]) when i>len(colors)

validphys.plotutils.expand_margin(a, b, proportion)[source]: Return a pair of numbers that have the same mean as (a,b) and their distance is proportion times bigger.
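The arithmetic implied by this description is short enough to sketch directly: keep the midpoint and scale the half-width. The name expand_margin_sketch is hypothetical and used only for illustration.

```python
def expand_margin_sketch(a, b, proportion):
    """Return a pair with the same mean as (a, b), proportion times wider.

    Hypothetical stand-in for illustration; not the validphys function.
    """
    mean = (a + b) / 2
    half_width = (b - a) / 2 * proportion
    return mean - half_width, mean + half_width


low, high = expand_margin_sketch(0, 10, 2)
```

Here the interval (0, 10) becomes (-5, 15): the mean stays at 5 and the width doubles from 10 to 20.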

validphys.plotutils.frame_center(ax, x, values)[source]: Set the ylims of the axis ax to appropriately display values, which can be 1 or 2D and are assumed to be sampled uniformly in the coordinates of the plot (in the second dimension, for 2D arrays).

validphys.plotutils.hatch_iter()[source]: An infinite iterator that yields increasingly denser patterns of hatches suitable for passing as the hatch argument of matplotlib functions.

validphys.plotutils.kde_plot(a, height=0.05, ax=None, label=None, color=None, max_marks=100000)[source]

Plot a Kernel Density Estimate of a 1D array, together with individual occurrences.

This plot provides a quick visualization of the distribution of one dimensional data in a more complete way than a histogram would. It produces both a Kernel Density Estimate (KDE) and individual occurrences of the data (rug plot). The KDE uses a Gaussian Kernel with the Silverman rule to select the bandwidth (this is the optimal choice if the input data is Gaussian). The individual occurrences are displayed as marks along the bottom axis. For performance reasons, and to avoid cluttering the plot, a maximum of max_marks marks are displayed; if the length of the data is bigger, a random sample of max_marks is taken.

Parameters

a (vector) – 1D array of observations.
height (scalar, optional) – Height of marks in the rug plot as proportion of the axis height.
ax (matplotlib axes, optional) – Axes to draw plot into; otherwise grabs current axes.
label (string, optional) – The label for the legend (note that you have to generate the legend yourself).
color (optional) – A matplotlib color specification, used for both the KDE and the rugplot. If not given, the next in the underlying axis cycle will be consumed and used.
max_marks (integer, optional) – The maximum number of points that will be displayed individually.

Returns

ax – The Axes object with the plot on it, allowing further customization.

Return type

matplotlib axes

Example

>>> import numpy as np
>>> dist = np.random.normal(size=100)
>>> ax = kde_plot(dist)
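The density estimate itself can be sketched in NumPy. This is not the code used by kde_plot: the helper names are hypothetical, and the bandwidth below is the Gaussian-reference form of the Silverman rule, h = σ (4 / (3n))^(1/5), which is an assumption about which variant is meant.

```python
import numpy as np


def silverman_bandwidth(a):
    """Gaussian-reference bandwidth: sigma * (4 / (3 n)) ** (1 / 5)."""
    a = np.asarray(a, dtype=float)
    return a.std(ddof=1) * (4.0 / (3.0 * a.size)) ** 0.2


def gaussian_kde_1d(a, points):
    """Evaluate a Gaussian KDE of the sample a at the given points."""
    a = np.asarray(a, dtype=float)
    points = np.asarray(points, dtype=float)
    h = silverman_bandwidth(a)
    # One standardised distance per (evaluation point, observation) pair.
    z = (points[:, None] - a[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth to get a density.
    return kernels.sum(axis=1) / (a.size * h)


rng = np.random.default_rng(0)
dist = rng.normal(size=100)
grid = np.linspace(-6.0, 6.0, 601)
density = gaussian_kde_1d(dist, grid)
```

The resulting density is non-negative and integrates to approximately one over a grid that covers the sample, which is a convenient sanity check before plotting.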

validphys.plotutils.marker_iter_plot()[source]: Because of the mpl strange interface, markers work differently in plots and scatter. This is the same as marker_iter_scatter, but returns kwargs to be passed to plt.plot()

validphys.plotutils.marker_iter_scatter()[source]: Yield the possible matplotplib.markers.Markersyle instances with different fillsyles and markers. This can be passed to plt.scatter. For plt.plot, use marker_iter_scatter.

validphys.plotutils.offset_xcentered(n, ax, *, offset_prop=0.05)[source]: Yield n matplotlib transforms in such a way that the corresponding n transformed x values are centered around the middle. The offset between two consecutive points is offset_prop in units of the figure dpi scale.

validphys.plotutils.plot_horizontal_errorbars(cvs, errors, categorylabels, datalabels=None, xlim=None)[source]: A plot with a list of horizontal errorbars oriented vertically. cvs and errors are the central values and errors, both of shape ndatasets x ncategories; categorylabels are the labels of each element for which errorbars are drawn and datalabels are the labels of the different datasets that are compared.

validphys.plotutils.scalar_log_formatter()[source]

Return a matplotlib formatter to display powers of 10 in a log rather than exponential notation.

Returns: formatter – an object that can be passed to the set_major_formatter matplotlib functions.
Return type: ticker.FuncFormatter

Examples

>>> from matplotlib.figure import Figure
>>> fig = Figure()
>>> ax = fig.subplots()
>>> ax.plot([0.01, 0.1, 1, 10, 100])
>>> ax.set_yscale("log")
>>> ax.yaxis.set_major_formatter(scalar_log_formatter())

validphys.plotutils.spiderplot(xticks, vals, label, ax)[source]

Makes a spider/radar plot.

xticks: list of names of x tick labels, e.g. datasets

vals: list of values to plot corresponding to each xtick

label: label for values, e.g. fit name

ax: a PolarAxes instance

validphys.plotutils.subplots(figsize=None, nrows=1, ncols=1, sharex=False, sharey=False, **kwargs)[source]

matplotlib.figure wrapper used to generate a figure and add subplots.

Use matplotlib.figure.Figure() objects to avoid importing pyplot anywhere. The reason is that pyplot maintains a global state that makes it misbehave in multithreaded applications, such as when executed under dask parallel mode.

Parameters

figsize (2-tuple of floats) – default is None
nrows (int, default 1) –
ncols (int, default 1) –
sharex (bool, default False) –
sharey (bool, default False) –

Returns

fig, ax = (matplotlib.figure.Figure, fig.subplots)

Return type

tuple

validphys.promptutils module

Module which extends the functionality of prompt_toolkit for user inputs/interactivity

class validphys.promptutils.KeywordsWithCache(loader)[source]: Bases: object

validphys.promptutils.confirm(message, default=None)[source]

This is like prompt_toolkit.shortcuts.confirm (implemented by create_confirm_session) except that it doesn’t bind control+c to “No”, but instead raises an exception.

It also supports defaults.

validphys.promptutils.yes_no_str(default=None)[source]: Return a yes or no string for the prompt, with the default highlighted
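
The handling of defaults can be pictured with the sketch below. It is not the validphys code: the helper names are hypothetical, and marking the default with an upper-case letter is an assumed convention for "highlighted".

```python
def yes_no_hint(default=None):
    """Build a prompt hint, upper-casing the default answer (assumed convention)."""
    yes = "Y" if default is True else "y"
    no = "N" if default is False else "n"
    return f"[{yes}/{no}]"


def resolve_answer(text, default=None):
    """Map user input to True/False, falling back to the default on empty input."""
    text = text.strip().lower()
    if not text:
        return default
    if text in ("y", "yes"):
        return True
    if text in ("n", "no"):
        return False
    raise ValueError(f"Unrecognised answer: {text!r}")


hint = yes_no_hint(default=True)
```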

validphys.pseudodata module

Tools to obtain and analyse the pseudodata that was seen by the neural networks during the fitting.

class validphys.pseudodata.DataTrValSpec(pseudodata, tr_idx, val_idx)

Bases: tuple

pseudodata: Alias for field number 0

tr_idx: Alias for field number 1

val_idx: Alias for field number 2

exception validphys.pseudodata.ReplicaGenerationError[source]: Bases: Exception

validphys.pseudodata.indexed_make_replica(groups_index, make_replica)[source]: Index the make_replica pseudodata appropriately

validphys.pseudodata.level0_commondata_wc(data, fakepdf)[source]

Given a validphys.core.DataGroupSpec object, load commondata and generate a new commondata instance with central values replaced by the fakepdf predictions.

Parameters

data (validphys.core.DataGroupSpec) –
fakepdf (validphys.core.PDF) –

Returns

list of nnpdf_data.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 0 fake data.

Return type

list

Example

>>> from validphys.api import API
>>> API.level0_commondata_wc(dataset_inputs = [{"dataset":"NMC"}], use_cuts="internal", theoryid=200,fakepdf = "NNPDF40_nnlo_as_01180")

[CommonData(setname='NMC', ndata=204, commondataproc='DIS_NCE', nkin=3, nsys=16)]

validphys.pseudodata.make_level1_data(data, level0_commondata_wc, filterseed, data_index, sep_mult)[source]

Given a list of Level 0 commondata instances, return the same list with central values replaced by Level 1 data.

Level 1 data is generated using validphys.make_replica. The covariance matrix, from which the stochastic Level 1 noise is sampled, is built from Level 0 commondata instances (level0_commondata_wc). This, in particular, means that the multiplicative systematics are generated from the Level 0 central values.

Note that the covariance matrix used to generate Level 2 pseudodata is consistent with the one used at Level 1 up to corrections of the order eta * eps, where eta and eps are defined as shown below:

Generate L1 data: L1 = L0 + eta, eta ~ N(0, CL0)
Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0, CL1)

where CL0 and CL1 mean that the multiplicative entries have been constructed from Level 0 and Level 1 central values respectively.

Parameters

data (validphys.core.DataGroupSpec) –
level0_commondata_wc (list) – list of nnpdf_data.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 0 fake data. Cuts already applied.
filterseed (int) – random seed used for the generation of Level 1 data
data_index (pandas.MultiIndex) –

Returns

list of nnpdf_data.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 1 fake data.

Return type

list

Example

>>> from validphys.api import API
>>> dataset='NMC'
>>> l1_cd = API.make_level1_data(dataset_inputs = [{"dataset":dataset}],use_cuts="internal", theoryid=200,
                         fakepdf = "NNPDF40_nnlo_as_01180",filterseed=1)
>>> l1_cd
[CommonData(setname='NMC', ndata=204, commondataproc='DIS_NCE', nkin=3, nsys=16)]
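
The relation between the Level 0, Level 1 and Level 2 data can be sketched with a toy model. The sketch assumes a diagonal covariance made of a single multiplicative uncertainty, so the noise width scales with the central values it is built from; the function name sample_level and all numbers are hypothetical, and the real code samples from the full covariance matrix.

```python
import random


def sample_level(central, mult_frac, rng):
    """Add Gaussian noise whose width is a fixed fraction of each central value."""
    return [c + rng.gauss(0.0, mult_frac * abs(c)) for c in central]


rng = random.Random(1)
mult_frac = 0.05                # assumed 5% multiplicative uncertainty
level0 = [1.0, 2.0, 4.0]        # stand-in for the fakepdf predictions

# L1 = L0 + eta, with the noise width built from the Level 0 central values.
level1 = sample_level(level0, mult_frac, rng)

# L2_k = L1 + eps_k, with the noise width built from the Level 1 central values.
level2 = [sample_level(level1, mult_frac, rng) for _ in range(3)]
```

Because the Level 2 widths are built from Level 1 values that already contain the shift eta, the two covariance matrices differ by terms proportional to the product of eta and eps.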

validphys.pseudodata.make_replica(groups_dataset_inputs_loaded_cd_with_cuts, replica_mcseed, dataset_inputs_sampling_covmat, sep_mult=False, genrep=True, max_tries=1000000, resample_negative_pseudodata=False)[source]

Function that takes in a list of nnpdf_data.coredata.CommonData objects and returns a pseudodata replica accounting for possible correlations between systematic uncertainties.

The function loops until positive definite pseudodata is generated for any non-asymmetry datasets. In the case of an asymmetry dataset negative values are permitted so the loop block executes only once.

Parameters

groups_dataset_inputs_loaded_cd_with_cuts (list[nnpdf_data.coredata.CommonData]) – List of CommonData objects which stores information about systematic errors, their treatment and description, for each dataset.
replica_mcseed (int, None) – Seed used to initialise the numpy random number generator. If None then a random seed is allocated using the default numpy behaviour.
dataset_inputs_sampling_covmat (np.array) – Full covmat to be used. It can be either only experimental or also theoretical.
sep_mult (bool) – Specifies whether computing the shifts with the full covmat or whether multiplicative errors should be separated
genrep (bool) – Specifies whether computing replicas or not
max_tries (int) – The stochastic nature of replica generation means one can obtain (unphysical) negative predictions. If after max_tries (default=1e6) no physical configuration is found, it will raise a ReplicaGenerationError
resample_negative_pseudodata (bool) – When True, replicas that produce negative predictions will be resampled for max_tries until all points are positive (default: False)

Returns

pseudodata – NumPy array of length N_dat (where N_dat is the combined number of data points after cuts) containing Monte Carlo samples of data centered around the data central value.

Return type

np.array

Example

>>> from validphys.api import API
>>> pseudodata = API.make_replica(
                                dataset_inputs=[{"dataset":"NMC"}, {"dataset": "NMCPD"}],
                                use_cuts="nocuts",
                                theoryid=53,
                                replica=1,
                                mcseed=123,
                                genrep=True,
                            )
array([0.25640033, 0.25986534, 0.27165461, 0.29001009, 0.30863588,
   0.30100351, 0.31781208, 0.30827054, 0.30258217, 0.32116842,
   0.34206012, 0.31866286, 0.2790856 , 0.33257621, 0.33680007,
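
The retry logic described for max_tries can be sketched as follows. This is a simplified stand-in, not the validphys code: it draws uncorrelated Gaussian noise, treats every point as required to be positive, and defines its own exception class with the same name as the one documented above.

```python
import random


class ReplicaGenerationError(Exception):
    """Stand-in for validphys.pseudodata.ReplicaGenerationError."""


def draw_positive_replica(central, sigma, seed=None, max_tries=1000):
    """Resample until all points are positive, giving up after max_tries draws."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        replica = [rng.gauss(c, s) for c, s in zip(central, sigma)]
        if all(value > 0 for value in replica):
            return replica
    raise ReplicaGenerationError(f"No positive replica found after {max_tries} tries")


replica = draw_positive_replica([1.0, 2.0], [0.1, 0.2], seed=123)
```

Fixing the seed makes the draw reproducible, which mirrors the role of replica_mcseed.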

validphys.pseudodata.read_replica_pseudodata(fit, context_index, replica)[source]

Function to handle the reading of training and validation splits for a fit that has been produced with the savepseudodata flag set to True.

The data is read from the PDF to handle the mixing introduced by postfit.

The data files are concatenated to return all the data that went into a fit. The training and validation indices are also returned so one can access the splits using pandas indexing.

Raises

FileNotFoundError – If the training or validation files for the PDF set cannot be found.
CheckError – If the use_cuts flag is not set to fromfit

Returns

data_indices_list – List of namedtuples where each entry corresponds to a given replica. Each element contains the attributes pseudodata, tr_idx, and val_idx. The latter two are used to slice the former to return training and validation data respectively.

Return type

list[namedtuple]

Example

>>> from validphys.api import API
>>> data_indices_list = API.read_fit_pseudodata(fit="pseudodata_test_fit_n3fit")
>>> len(data_indices_list) # Same as nrep
10
>>> rep_info = data_indices_list[0]
>>> rep_info.pseudodata.loc[rep_info.tr_idx].head()
                            replica 1
group dataset           id
ATLAS ATLASZPT8TEVMDIST 1   30.665835
                        3   15.795880
                        4    8.769734
                        5    3.117819
                        6    0.771079

validphys.pseudodata.recreate_fit_pseudodata(_recreate_fit_pseudodata, fitreplicas, fit_masks)[source]

Function used to reconstruct the pseudodata seen by each of the Monte Carlo fit replicas.

Returns: res – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.
Return type: list[namedtuple]

Example

>>> from validphys.api import API
>>> API.recreate_fit_pseudodata(fit="pseudodata_test_fit_n3fit")

Notes

This function does not account for the postfit reshuffling.

validphys.renametools module

A collection of utility functions to handle logistics of LHAPDFs and fits. For use by vp-scripts.

class validphys.renametools.Spinner(delay=0.1)[source]

Bases: object

Context manager to provide a spinning cursor while validphys performs some other task silently.

When executed in a TTY, it shows a spinning cursor for the duration of the context manager. In non-interactive prompts, it prints to stdout at the beginning and end.

Example

>>> from validphys.renametools import Spinner
>>> with Spinner():
...     import time
...     time.sleep(5)

property interactive

spinner_task()[source]

static spinning_cursor()[source]
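
A spinning cursor of this kind is commonly built from a generator that cycles through a few characters. The sketch below shows that pattern; the choice of the four frames is an assumption and the validphys method may use different ones.

```python
import itertools


def spinning_cursor():
    """Yield cursor frames forever, cycling through four characters."""
    yield from itertools.cycle("|/-\\")


# Take the first five frames to show that the sequence wraps around.
frames = list(itertools.islice(spinning_cursor(), 5))
```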

validphys.renametools.change_name(initial_path, final_name)[source]: Function that takes initial fit name and final fit name and performs the renaming

validphys.renametools.rename_nnfit(nnfit_path, initial_fit_name, final_name)[source]

validphys.renametools.rename_pdf(pdf_folder, initial_fit_name, final_name)[source]

validphys.renametools.rename_postfit(postfit_path, initial_fit_name, final_name)[source]

validphys.replica_selector module

replica_selector.py

Tools for filtering replica sets based on criteria on the replicas.

validphys.replica_selector.alpha_s_bundle_pdf(pdf, pdfs, output_path, target_name: (<class 'str'>, <class 'NoneType'>) = None)[source]

Action that bundles PDFs for distribution in the LHAPDF format. The baseline PDF is declared as the pdf key, and the PDFs whose replica 0 is to be added are declared as the pdfs list.

The bundled PDF set is stored inside the output directory.

Parameters

pdf (validphys.core.PDF) – The baseline PDF to which the new replicas will be added
pdfs (list of validphys.core.PDF) – The list of PDFs from which replica0 will be appended
target_name (str or None) – Optional argument specifying the name of the output PDF. If None, then the name of the original pdf is used but with _pdfas appended

validphys.results module

results.py

Tools to obtain theory predictions and basic statistical estimators.

class validphys.results.Chi2Data(replica_result, central_result, ndata)

Bases: tuple

central_result: Alias for field number 1

ndata: Alias for field number 2

replica_result: Alias for field number 0

class validphys.results.DataResult(dataset, covmat, sqrtcovmat)[source]

Bases: StatsResult

Holds the relevant information from a given dataset

property central_value

property covmat

property label

property name

property sqrtcovmat: Lower part of the Cholesky decomposition

property std_error

class validphys.results.PositivityResult(stats)[source]

Bases: StatsResult

classmethod from_convolution(pdf, posset)[source]

class validphys.results.Result[source]: Bases: object

class validphys.results.StatsResult(stats)[source]

Bases: Result

property central_value

property error_members: Returns the error members with shape (Npoints, Npdf)

property rawdata: Returns the raw data with shape (Npoints, Npdf)

property std_error

class validphys.results.ThPredictionsResult(dataobj, stats_class, datasetnames=None, label=None, pdf=None, theoryid=None)[source]

Bases: StatsResult

Class holding theory predictions; inherits from StatsResult. When created with from_convolution, it keeps track of the PDF for which it was computed.

property datasetnames

classmethod from_convolution(pdf, dataset, central_only=False)[source]

static make_label(pdf, dataset)[source]: Deduce a reasonable label for the result based on pdf and dataspec

class validphys.results.ThUncertaintiesResult(central, std_err, label=None)[source]

Bases: StatsResult

Class holding central theory predictions and the error bar corresponding to the theory uncertainties considered. The error members of this class correspond to central +- error_bar

property central_value

property error_members: Returns the error members with shape (Npoints, Npdf)

property rawdata: Returns the raw data with shape (Npoints, Npdf)

property std_error

validphys.results.abs_chi2_data(results)[source]: Return a tuple (member_chi², central_chi², numpoints) for a given dataset
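
For orientation, a chi² of the form d^T C^-1 d can be evaluated from the lower Cholesky factor L of the covariance matrix (C = L L^T) without inverting C. The sketch below does this in plain Python; it is a hedged illustration, with hypothetical function names, and not the validphys implementation.

```python
def solve_lower(lower, rhs):
    """Forward substitution: solve lower @ y = rhs for a lower-triangular matrix."""
    y = []
    for i, row in enumerate(lower):
        partial = sum(row[j] * y[j] for j in range(i))
        y.append((rhs[i] - partial) / row[i])
    return y


def chi2(sqrt_covmat, data, prediction):
    """Return d^T C^-1 d with C = L L^T, computed as |L^-1 d|^2."""
    diff = [p - d for p, d in zip(prediction, data)]
    y = solve_lower(sqrt_covmat, diff)
    return sum(v * v for v in y)


sqrt_covmat = [[2.0, 0.0], [1.0, 1.0]]  # corresponds to C = [[4, 2], [2, 2]]
value = chi2(sqrt_covmat, data=[0.0, 0.0], prediction=[2.0, 3.0])
```

Applying the same function to each replica prediction and to the central prediction would give the per-member and central values of the tuple.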

validphys.results.abs_chi2_data_thcovmat(results_with_theory_covmat)[source]: The same as abs_chi2_data but considering as well the theory uncertainties

validphys.results.chi2_stats(abs_chi2_data)[source]

Compute several estimators from the chi²:

central_mean
npoints
perreplica_mean
perreplica_std
chi2_per_data
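
One plausible reading of these estimators is sketched below, starting from a stand-in for the Chi2Data tuple. The mapping from names to formulas is an assumption based on the names alone (in particular the use of the population standard deviation), and in validphys the replica results are held in a statistics object rather than a list.

```python
import statistics
from collections import namedtuple

# Stand-in with the same field order as validphys.results.Chi2Data.
Chi2Data = namedtuple("Chi2Data", ("replica_result", "central_result", "ndata"))


def chi2_estimators(chi2_data):
    """Compute summary estimators from per-replica and central chi2 values."""
    replica_chi2, central_chi2, ndata = chi2_data
    return {
        "central_mean": central_chi2,
        "npoints": ndata,
        "perreplica_mean": statistics.mean(replica_chi2),
        "perreplica_std": statistics.pstdev(replica_chi2),
        "chi2_per_data": central_chi2 / ndata,
    }


stats = chi2_estimators(Chi2Data([110.0, 90.0, 100.0], 95.0, 100))
```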

validphys.results.count_negative_points(possets_predictions)[source]: Return the number of replicas with negative predictions for each bin in the positivity observable.

validphys.results.data_index(data)[source]

Given a core.DataGroupSpec instance, return pd.MultiIndex with the following levels:

experiment
datasets
datapoint indices (with cuts already applied)

Parameters: data (core.DataGroupSpec) –
Return type: pd.MultiIndex

validphys.results.dataset_chi2_table(chi2_stats, dataset)[source]: Show the chi² estimators for a given dataset

validphys.results.dataset_inputs_abs_chi2_data(dataset_inputs_results)[source]: Like abs_chi2_data but for a group of inputs

validphys.results.dataset_inputs_bootstrap_chi2_central(dataset_inputs_results, bootstrap_samples=500, boot_seed=123)[source]: Takes the data result and theory prediction given dataset_inputs and then returns a bootstrap distribution of central chi2. By default bootstrap_samples is set to a sensible value (500). However a different value can be specified in the runcard.

validphys.results.dataset_inputs_bootstrap_phi_data(dataset_inputs_results, bootstrap_samples=500)[source]

Takes the data result and theory prediction given dataset_inputs and then returns a bootstrap distribution of phi. By default bootstrap_samples is set to a sensible value (500). However a different value can be specified in the runcard.

For more information on how phi is calculated see phi_data
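
The bootstrap idea behind these actions can be sketched generically: resample with replacement, recompute the estimator, and collect the results. The example below bootstraps the mean of a plain list of numbers, which is an assumption made for brevity; the function name and the input values are hypothetical, and the quantities resampled inside validphys are not reproduced here.

```python
import random
import statistics


def bootstrap_means(values, bootstrap_samples=500, boot_seed=123):
    """Resample values with replacement and return the mean of each resample."""
    rng = random.Random(boot_seed)
    n = len(values)
    return [statistics.mean(rng.choices(values, k=n)) for _ in range(bootstrap_samples)]


replica_values = [98.0, 101.5, 103.2, 99.7, 100.9]  # illustrative numbers
distribution = bootstrap_means(replica_values)
```

The spread of distribution then estimates the uncertainty of the mean due to the finite number of replicas.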

validphys.results.dataset_inputs_chi2_per_point_data(dataset_inputs_abs_chi2_data)[source]: Return the total chi²/ndata for all data, specified by dataset_inputs. Covariance matrix is fully correlated across datasets, with all known correlations.

validphys.results.dataset_inputs_phi_data(dataset_inputs_abs_chi2_data)[source]: Like phi_data but for group of datasets

validphys.results.dataset_inputs_results(data, pdf: PDF, dataset_inputs_covariance_matrix, dataset_inputs_sqrt_covmat)[source]: Like results but for a group of datasets

validphys.results.dataset_inputs_results_central(data, pdf: PDF, dataset_inputs_covariance_matrix, dataset_inputs_sqrt_covmat)[source]: Like dataset_inputs_results but for a group of datasets and replica0.

validphys.results.dataset_inputs_results_without_covmat(data, pdf: PDF)[source]: Like dataset_inputs_results but skipping the computation of the covmat

validphys.results.dataspecs_chi2_differences_table(dataspecs, dataspecs_chi2_table)[source]: Given two dataspecs, print the chi² (using dataspecs_chi2_table) and the difference between the first and the second.

validphys.results.dataspecs_chi2_table(dataspecs_total_chi2_data, dataspecs_datasets_chi2_table, dataspecs_groups_chi2_table, show_total: bool = False)[source]: Same as fits_chi2_table but for an arbitrary list of dataspecs

validphys.results.dataspecs_dataset_chi2_difference_table(dataspecs_each_dataset, dataspecs_each_dataset_chi2, dataspecs_speclabel)[source]

Returns a table with difference between the chi2 and the expected chi2 in units of the expected chi2 standard deviation, given by

chi2_diff = (chi2 - N)/sqrt(2N)

for each dataset for each dataspec.
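
The normalisation follows from the fact that a chi² distribution with N degrees of freedom has mean N and variance 2N. A direct transcription of the formula, with a hypothetical function name, reads:

```python
import math


def chi2_diff(chi2, ndata):
    """Distance of chi2 from its expectation in units of its standard deviation."""
    return (chi2 - ndata) / math.sqrt(2 * ndata)


value = chi2_diff(chi2=120.0, ndata=100)
```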

validphys.results.dataspecs_datasets_chi2_table(dataspecs_speclabel, dataspecs_groups, dataspecs_datasets_chi2_data, per_point_data: bool = True)[source]: Same as fits_datasets_chi2_table but for arbitrary dataspecs.

validphys.results.dataspecs_datasets_nsigma_table(dataspecs_datasets_chi2_table)[source]: Same as dataspecs_datasets_chi2_table but for nsigma.

validphys.results.dataspecs_groups_chi2_table(dataspecs_speclabel, dataspecs_groups, dataspecs_groups_chi2_data, per_point_data: bool = True)[source]: Same as fits_groups_chi2_table but for an arbitrary list of dataspecs.

validphys.results.dataspecs_groups_nsigma_table(dataspecs_groups_chi2_table)[source]: Same as fits_groups_nsigma_table but for an arbitrary list of dataspecs.

validphys.results.dataspecs_nsigma_table(dataspecs_total_chi2_data, dataspecs_datasets_nsigma_table, dataspecs_groups_nsigma_table, show_total: bool = False)[source]: Same as fits_nsigma_table but for an arbitrary list of dataspecs

validphys.results.experiments_chi2_stats(total_chi2_data)[source]

Compute several estimators from the chi² for an aggregate of experiments:

central_mean
npoints
perreplica_mean
perreplica_std
chi2_per_data

validphys.results.experiments_covmat_no_table(experiments_data, experiments_index, experiments_covmat_collection)[source]: Makes the total experiments covariance matrix, which can then be reindexed appropriately by the chosen grouping. The covariance matrix must first be grouped by experiments to ensure correlations within experiments are preserved.

validphys.results.experiments_index(experiments_data, diagonal_basis=False)[source]

validphys.results.experiments_invcovmat(experiments_data, experiments_index, experiments_covmat_collection)[source]: Compute and export the inverse covariance matrix. Note that this inverts the matrices with the LU method which is suboptimal.

validphys.results.experiments_sqrtcovmat(experiments_data, experiments_index, experiments_sqrt_covmat)[source]: Like experiments_covmat, but dump the lower triangular part of the Cholesky decomposition as used in the fit. The upper part indices are set to zero.

validphys.results.fits_chi2_table(fits_total_chi2_data, fits_datasets_chi2_table, fits_groups_chi2_table, show_total: bool = False)[source]: Show the chi² and the number of points of each dataset and experiment for each fit, where an experiment is a group of datasets according to the experiment key in the PLOTTING info file, computed with the theory corresponding to the fit. Datasets that are not included in some fit appear as NaN.

validphys.results.fits_datasets_chi2_table(fits_name_with_covmat_label, fits_groups, fits_datasets_chi2_data, per_point_data: bool = True)[source]: A table with the chi² for each dataset included in the fits, computed with the theory corresponding to each fit. The results are indexed in two levels by experiment and dataset, where experiment is the grouping of datasets according to the experiment key in the PLOTTING info file. If per_point_data is True, the chi² will be shown divided by ndata. Otherwise the values will be absolute.

validphys.results.fits_datasets_nsigma_table(fits_datasets_chi2_table)[source]: A table with nsigma values for each dataset included in the fit. nsigma is defined as (chi2 - 1) / sqrt(2/ndata), when the chi2 is normalized by ndata.
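
The nsigma definition above is written for a chi² already divided by ndata. The sketch below transcribes it with a hypothetical function name and, as a consistency check, compares it with the equivalent expression for the absolute chi², (chi2 - N)/sqrt(2N).

```python
import math


def nsigma(chi2_per_point, ndata):
    """nsigma for a chi2 that has been normalised by the number of points."""
    return (chi2_per_point - 1) / math.sqrt(2 / ndata)


ndata = 100
value = nsigma(chi2_per_point=1.2, ndata=ndata)

# Same quantity computed from the absolute chi2 (here 1.2 * 100 = 120).
absolute_form = (1.2 * ndata - ndata) / math.sqrt(2 * ndata)
```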

validphys.results.fits_groups_chi2_table(fits_name_with_covmat_label, fits_groups, fits_groups_chi2_data, per_point_data: bool = True)[source]

A table with the chi² computed with the theory corresponding to each fit for all datasets in the fit, grouped according to a key in the metadata. The grouping can be controlled with metadata_group.

If per_point_data is True, the chi² will be shown divided by ndata. Otherwise chi² values will be absolute.

validphys.results.fits_groups_nsigma_table(fits_groups_chi2_table)[source]: Similar to fits_groups_chi2_table but for nsigma. nsigma is defined as (chi2 - 1) / sqrt(2/ndata), when the chi2 is normalized by ndata.

validphys.results.fits_groups_phi_table(fits_name_with_covmat_label, fits_groups, fits_groups_phi)[source]: For every fit, returns phi and number of data points for each group of datasets, which are grouped according to a key in the metadata. The behaviour of the grouping can be controlled with metadata_group runcard key.

validphys.results.fits_nsigma_table(fits_total_chi2_data, fits_datasets_nsigma_table, fits_groups_nsigma_table, show_total: bool = False)[source]: Show the nsigma and the number of points of each dataset and experiment for each fit, computed with the theory corresponding to the fit. Datasets that are not included in one of the fits appear as NaN.

validphys.results.group_result_central_table_no_table(groups_results_central, groups_index)[source]: Generate a table containing the data central value and the central prediction

validphys.results.group_result_table(group_result_table_no_table)[source]: Duplicate of group_result_table_no_table but with a table decorator.

validphys.results.group_result_table_68cl(groups_results, group_result_table_no_table: DataFrame, pdf: PDF)[source]: Generate a table containing the data central value, the data 68% confidence levels, the central prediction, and 68% confidence level bounds of the prediction.

validphys.results.group_result_table_no_table(groups_results, groups_index)[source]: Generate a table containing the data central value, the central prediction, and the prediction for each PDF member.

validphys.results.groups_central_values(group_result_central_table_no_table)[source]: Duplicate of groups_central_values_no_table but takes group_result_table rather than groups_central_values_no_table, and has a table decorator.

validphys.results.groups_central_values_no_table(group_result_central_table_no_table)[source]: Returns a theoryid-dependent list of central theory predictions for a given group.

validphys.results.groups_chi2_table(groups_data, pdf, groups_chi2, groups_each_dataset_chi2)[source]: Return a table with the chi² to the groups and each dataset in the groups, grouped by metadata.

validphys.results.groups_corrmat(groups_covmat)[source]: Generates the grouped experimental correlation matrix with groups_covmat as input
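
A correlation matrix is obtained from a covariance matrix by dividing each entry by the product of the corresponding standard deviations. The sketch below shows this with nested lists; the function name is hypothetical and the validphys action operates on labelled tables instead.

```python
import math


def correlation_matrix(covmat):
    """Normalise a covariance matrix by the standard deviations on its diagonal."""
    size = len(covmat)
    std = [math.sqrt(covmat[i][i]) for i in range(size)]
    return [[covmat[i][j] / (std[i] * std[j]) for j in range(size)] for i in range(size)]


corrmat = correlation_matrix([[4.0, 2.0], [2.0, 2.0]])
```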

validphys.results.groups_covmat(groups_covmat_no_table)[source]: Duplicate of groups_covmat_no_table but with a table decorator.

validphys.results.groups_covmat_no_table(experiments_covmat_no_table, groups_index)[source]

Export the covariance matrix for the groups. It exports the full (symmetric) matrix, with the first three rows and columns being:

group name

dataset name

index of the point within the dataset.

validphys.results.groups_data_values(group_result_table)[source]: Returns list of data values for the input groups.

validphys.results.groups_index(groups_data, diagonal_basis=False)[source]

Return a pandas.MultiIndex with levels for group, dataset and point respectively. The group is determined by a key in the dataset metadata and is controlled by the metadata_group key in the runcard.

In case diagonal_basis is True, the dataset name is replaced by the eigenmode, because individual datasets appear mixed in the diagonal basis.

Example

TODO: add example

validphys.results.groups_invcovmat(experiments_invcovmat, groups_index)[source]: Like experiments_invcovmat but relabelled to the chosen grouping.

validphys.results.groups_normcovmat(groups_covmat, groups_data_values)[source]: Calculates the grouped experimental covariance matrix normalised to data.

validphys.results.groups_sqrtcovmat(experiments_sqrtcovmat, groups_index)[source]: Like experiments_sqrtcovmat but relabelled to the chosen grouping.

validphys.results.one_or_more_results(dataset: (<class 'validphys.core.DataSetSpec'>, <class 'validphys.core.DataGroupSpec'>), covariance_matrix, sqrt_covmat, pdfs: (<class 'NoneType'>, <class 'collections.abc.Sequence'>) = None, pdf: (<class 'NoneType'>, <class 'validphys.core.PDF'>) = None)[source]: Generate a list of results, where the first element is the data values and the following elements are either the prediction for pdf or the predictions for each of the pdfs. Which of the two is used is selected automatically depending on the namespace when executing as an action.

validphys.results.pdf_results(dataset: (<class 'validphys.core.DataSetSpec'>, <class 'validphys.core.DataGroupSpec'>), pdfs: ~collections.abc.Sequence, covariance_matrix, sqrt_covmat)[source]: Return a list of results, the first for the data and the rest for each of the PDFs.

validphys.results.perreplica_chi2_table(groups_data, groups_chi2, total_chi2_data)[source]: Chi² per point for each replica for each group. Also outputs the total chi² per replica. The columns come in two levels: The first is the name of the group, and the second is the number of points.

validphys.results.phi_data(abs_chi2_data)[source]

Calculate phi using values returned by abs_chi2_data.

Returns tuple of (float, int): (phi, numpoints)

For more information on how phi is calculated see Eq.(24) in 1410.8849

validphys.results.positivity_predictions_data_result(pdf, posdataset)[source]: Return an object containing the values of the positivity observable.

validphys.results.predictions_by_kinematics_table(results, kinematics_table_notable)[source]: Return a table combining the output of validphys.kinematics.kinematics_table() with the data and theory central values.

validphys.results.proc_result_table(proc_result_table_no_table)[source]

validphys.results.proc_result_table_experiment(procs_results_experiment, experiments_index)[source]

validphys.results.proc_result_table_no_table(procs_results, procs_index)[source]

validphys.results.procs_central_values(procs_central_values_no_table)[source]

validphys.results.procs_central_values_no_table(proc_result_table_no_table)[source]

validphys.results.procs_chi2_table(procs_data, pdf, groups_chi2_by_process, groups_each_dataset_chi2_by_process)[source]: Same as groups_chi2_table but by process

validphys.results.procs_corrmat(procs_covmat)[source]

validphys.results.procs_covmat(procs_covmat_no_table)[source]

validphys.results.procs_covmat_no_table(experiments_covmat_no_table, procs_index)[source]

validphys.results.procs_data_values(proc_result_table)[source]: Like groups_data_values but grouped by process.

validphys.results.procs_data_values_experiment(proc_result_table_experiment)[source]: Like groups_data_values but grouped by experiment.

validphys.results.procs_index(procs_data)[source]

validphys.results.procs_normcovmat(procs_covmat, procs_data_values)[source]

validphys.results.relabel_experiments_to_groups(input_covmat, groups_index)[source]: Takes a covmat grouped by experiments and relabels it by groups. This allows grouping over experiments to preserve experimental correlations outwith the chosen grouping.

validphys.results.results(dataset: DataSetSpec, pdf: PDF, covariance_matrix, sqrt_covmat)[source]

Tuple of data and theory results for a single pdf. The data will have an associated covariance matrix, which can include a contribution from the theory covariance matrix which is constructed from scale variation.

The theory is specified as part of the dataset (a remnant of the old C++ layout). A group of datasets is also allowed.

validphys.results.results_central(dataset: DataSetSpec, pdf: PDF, covariance_matrix, sqrt_covmat)[source]: Same as results() but only calculates the prediction for replica0.

validphys.results.results_with_scale_variations(results, theory_covmat_dataset)[source]

Use the theory covariance matrix to generate a ThPredictionsResult-compatible object modified so that its uncertainties correspond to a combination of the PDF and theory (scale variations) errors added in quadrature. This allows plotting results including scale variations.

By doing this we lose all information about prediction for the individual replicas or theories

validphys.results.results_with_theory_covmat(dataset, results, theory_covmat_dataset)[source]: Returns results with a modified DataResult such that the covariance matrix also includes the theory covmat. This can be used to make use of results that consider scale variations without including the theory covmat as part of the covariance matrix used by other validphys functions. Most notably, this can be used to compute the chi2 including theory errors while plotting data theory covariance in which the experimental uncertainties are not stained by the thcovmat.

validphys.results.results_without_covmat(dataset: DataSetSpec, pdf: PDF)[source]: Return a results object with a diagonal covmat so that it can be used to generate results-dependent covmats elsewhere. Uses results() under the hood.

validphys.results.theory_description(theoryid)[source]: A table with the theory settings.

validphys.results.total_chi2_data_from_experiments(experiments_chi2_data, pdf)[source]

Like dataset_inputs_abs_chi2_data(), except sums the contribution from each experiment which is more efficient in the case that the total covariance matrix is block diagonal in experiments.

This is valid as long as there are no cross experiment correlations from e.g. theory covariance matrices.

validphys.results.total_chi2_per_point_data(total_chi2_data)[source]

validphys.results.total_phi_data_from_experiments(experiments_phi_data)[source]

Like dataset_inputs_phi_data() except calculate phi for each experiment and then sum the contributions. Note that since the definition of phi is

phi = sqrt( (<chi2[T_k]> - chi2[<T_k>]) / n_data ),

where k is the replica index, the total phi is

sqrt( sum(n_data*phi**2) / sum(n_data) )

where the sums run over experiments.

This is only a valid method of calculating total phi provided that there are no inter-experimental correlations.
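
The two formulas above can be checked numerically with a short sketch. The function names and chi² values are hypothetical; the per-experiment chi² values are simply added, which is what the absence of inter-experimental correlations implies.

```python
import math


def phi(replica_chi2, central_chi2, ndata):
    """phi = sqrt((<chi2[T_k]> - chi2[<T_k>]) / n_data)."""
    mean_replica_chi2 = sum(replica_chi2) / len(replica_chi2)
    return math.sqrt((mean_replica_chi2 - central_chi2) / ndata)


def total_phi(phis, ndatas):
    """Combine per-experiment phi values: sqrt(sum(n * phi**2) / sum(n))."""
    weighted = sum(n * p**2 for p, n in zip(phis, ndatas))
    return math.sqrt(weighted / sum(ndatas))


# Two toy experiments with two replicas each.
phi_a = phi([108.0, 112.0], central_chi2=100.0, ndata=10)
phi_b = phi([56.0, 60.0], central_chi2=50.0, ndata=2)
combined = total_phi([phi_a, phi_b], [10, 2])

# Direct evaluation on the summed chi2 values gives the same number.
direct = phi([108.0 + 56.0, 112.0 + 60.0], central_chi2=150.0, ndata=12)
```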

validphys.reweighting module

Utilities for reweighting studies.

Implements utilities for calculating the NNPDF weights and unweighted PDF sets. It also allows for some basic statistics.

validphys.reweighting.chi2_data_for_reweighting_experiments(chi2_data_for_reweighting_experiments_inner, use_t0)[source]

validphys.reweighting.make_pdf_from_filtered_outliers(fit, chi2filtered_index, set_name: str, output_path=None, installgrid: bool = True)[source]: Produce a new grid with the result of chi2filtered_index

validphys.reweighting.make_unweighted_pdf(pdf, unweighted_index, set_name: str, output_path=None, installgrid: bool = True)[source]: Generate an unweighted PDF set, from the prior pdf and the reweighting_experiments. The PDF is written to a pdfsets directory of the output folder. Return the relative path of the newly created PDF.

validphys.reweighting.nnpdf_weights(chi2_data_for_reweighting_experiments)[source]: Compute the replica weights according to the NNPDF formula.

validphys.reweighting.nnpdf_weights_numerator(chi2_data_for_reweighting_experiments)[source]: Compute the numerator of the NNPDF weights. This is useful for P(α), which uses a different normalization.

validphys.reweighting.p_alpha_study(chi2_data_for_reweighting_experiments)[source]: Compute P(α) in an automatic range

validphys.reweighting.plot_p_alpha(p_alpha_study)[source]: Plot the results of p_alpha_study.

validphys.reweighting.reweighting_stats(pdf, nnpdf_weights, p_alpha_study)[source]

Compute various statistics related to reweighting.

Those are:

Number of initial replicas.
Effective number of replicas.
Median of the weights.
The maximum value of P(alpha) in some sensible range.
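
As a hedged sketch of the first two quantities, the code below uses the weight formula from the NNPDF reweighting literature, w_k proportional to (chi2_k)^((n-1)/2) exp(-chi2_k/2) normalised so that the weights average to one, and the Shannon-entropy definition of the effective number of replicas. The function names are hypothetical, and the validphys actions are not guaranteed to follow this exact form.

```python
import math


def nnpdf_style_weights(chi2_values, ndata):
    """Weights proportional to chi2^((n-1)/2) * exp(-chi2/2), averaging to one."""
    log_w = [0.5 * (ndata - 1) * math.log(c) - 0.5 * c for c in chi2_values]
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    unnormalised = [math.exp(lw - shift) for lw in log_w]
    norm = sum(unnormalised) / len(unnormalised)
    return [u / norm for u in unnormalised]


def effective_replicas(weights):
    """N_eff = exp((1/N) * sum_k w_k * ln(N / w_k))."""
    n = len(weights)
    entropy = sum(w * math.log(n / w) for w in weights if w > 0)
    return math.exp(entropy / n)


weights = nnpdf_style_weights([10.0, 10.0, 10.0, 10.0], ndata=10)
n_eff = effective_replicas(weights)
```

When all replicas describe the new data equally well the weights are all one and no replicas are lost; unequal chi² values reduce the effective number.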

validphys.reweighting.unweighted_index(nnpdf_weights, nreplicas: int = 100)[source]: The index of the input replicas that corresponds to an unweighted set, for the given weights. This can be saved for testing purposes.

validphys.sumrules module

sumrules.py

Module for the computation of sum rules

Note that this contains only the code for the computation of sum rules from scratch using LHAPDF tables. The code reading the sum rule information output from the fit is present in fitinfo.py

validphys.sumrules.bad_replica_sumrules(pdf, sum_rules, threshold: Real = 0.01)[source]: Return a table with the sum rules for the replicas in which some sum rule is farther from the correct value than threshold (in absolute value).

validphys.sumrules.central_sum_rules(pdf: PDF, Q: Real)[source]: Compute the sum rules for the central member, at the scale Q

validphys.sumrules.central_sum_rules_table(central_sum_rules)[source]: Construct a table with the value of each sum rule for the central member

validphys.sumrules.partial_polarized_sum_rules(pdf: PDF, Q: Real, lims: tuple = ((0.0001, 0.001), (0.001, 1)))[source]: Compute the partial low- and large-x polarized sum rules. Return a SumRulesGrid object with the list of values for each sum rule. The integration is performed with absolute and relative tolerance of 1e-4.

validphys.sumrules.polarized_sum_rules(partial_polarized_sum_rules)[source]: Compute the full polarized sum rules. The integration is performed with absolute and relative tolerance of 1e-4.

validphys.sumrules.polarized_sum_rules_table(polarized_sum_rules)[source]: Return a table with the descriptive statistics of the polarized sum rules, over members of the PDF.

validphys.sumrules.sum_rules(pdf: PDF, Q: Real)[source]: Compute the momentum, uvalence, dvalence, svalence and cvalence sum rules for each member, at the energy scale Q. Return a SumRulesGrid object with the list of values for each sum rule. The integration is performed with absolute and relative tolerance of 1e-4.
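To make the quantities concrete, the sketch below integrates two toy analytic distributions, chosen so that the u valence integral is 2 and the momentum integral is 1. It is only an illustration: the toy shapes, the function names and the fixed-step midpoint rule are all assumptions, whereas the actual module evaluates LHAPDF tables with the tolerances quoted above.

```python
def integrate_midpoint(func, lower, upper, npoints=20000):
    """Fixed-step midpoint rule, used here instead of adaptive quadrature."""
    step = (upper - lower) / npoints
    return step * sum(func(lower + (i + 0.5) * step) for i in range(npoints))


def toy_uvalence(x):
    # Toy shape whose integral over [0, 1] is exactly 2.
    return 24.0 * x**2 * (1.0 - x)


def toy_momentum_density(x):
    # Toy stand-in for x times the sum over all flavours; integrates to 1.
    return 6.0 * x * (1.0 - x)


uvalence = integrate_midpoint(toy_uvalence, 0.0, 1.0)
momentum = integrate_midpoint(toy_momentum_density, 0.0, 1.0)
```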

validphys.sumrules.sum_rules_table(sum_rules)[source]: Return a table with the descriptive statistics of the sum rules, over members of the PDF.

validphys.sumrules.unknown_sum_rules(pdf: PDF, Q: Real)[source]

Compute the following integrals:

u momentum fraction
ubar momentum fraction
d momentum fraction
dbar momentum fraction
s momentum fraction
sbar momentum fraction
cp momentum fraction
cm momentum fraction
g momentum fraction
T3
T8

validphys.sumrules.unknown_sum_rules_table(unknown_sum_rules)[source]

validphys.tableloader module

tableloader.py

Load from file some of the tables that validphys produces. Unlike validphys.loader, this module consists of functions that take absolute paths and mostly return dataframes.

exception validphys.tableloader.TableLoaderError[source]

Bases: Exception

Errors in the tableloader module.

validphys.tableloader.combine_pseudoreplica_tables(dfs, combined_names, *, blacklist_datasets=None, min_points_required=2)[source]: Return a table in the same format as perreplica_chi2_table with the minimum value of the chi² for each batch of fits.

validphys.tableloader.combine_pseudorreplica_tables(dfs, combined_names, *, blacklist_datasets=None, min_points_required=2): Return a table in the same format as perreplica_chi2_table with the minimum value of the chi² for each batch of fits.

validphys.tableloader.fixup_header(df, head_index, dtype)[source]: Set the type of the column index in place

validphys.tableloader.get_extrasum_slice(df, components)[source]: Extract a slice of a table that has the components in the format that extra_sums expects.

validphys.tableloader.load_adapted_fits_chi2_table(filename)[source]: Load the fits_chi2_table and adapt it in the way that suits the paramfits module. That is, return a table with the total chi² and another with the number of points.

validphys.tableloader.load_experiments_covmat(filename): Parse a dump of a matrix like experiments_covmat.

validphys.tableloader.load_experiments_invcovmat(filename): Parse a dump of a matrix like experiments_covmat.

validphys.tableloader.load_fits_chi2_table(filename)[source]: Load the result of fits_chi2_table or similar.

validphys.tableloader.load_perreplica_chi2_table(filename)[source]: Load the output of perreplica_chi2_table.

validphys.tableloader.parse_data_cv(filename)[source]: Useful for reading DataFrames with just one column.

validphys.tableloader.parse_exp_mat(filename)[source]: Parse a dump of a matrix like experiments_covmat.

validphys.tableloader.set_actual_column_level0(df, new_levels)[source]: Set the first level of the index to new_levels. Note: This is a separate function mostly because it breaks in every patch update of pandas.

validphys.theoryinfo module

theoryinfo.py

Actions for displaying theory info for one or more theories.

validphys.theoryinfo.all_theory_info_table(theory_database)[source]

Produces a DataFrame with all theory info and saves it

Returns: all_theory_info_table – dataframe filled with all entries in theorydb file
Return type: pd.DataFrame

Example

>>> from validphys.api import API
>>> df = API.all_theory_info_table()
>>> df['Comments'].iloc[:5]
ID
1                 3.0 LO benchmark
2                3.0 NLO benchmark
3               3.0 NNLO benchmark
4     3.0 NLO - Q0=1.3 For IC Test
5    3.0 NNLO - Q0=1.3 For IC Test
Name: Comments, dtype: object

validphys.theoryinfo.theory_info_table(theory_database, theory_db_id)[source]

Fetches the theory info for the given theory_db_id and constructs a DataFrame from it.

Parameters: theory_db_id (int) – numeric identifier of theory to be queried. Can be specified at the runcard level.
Returns: theory_info_table – dataframe filled with theory info for specified theory_db_id
Return type: pd.DataFrame

Example

>>> from validphys.api import API
>>> df = API.theory_info_table(theory_db_id=53)
>>> df.loc['Comments']
Info for theory 53    NNPDF3.1 NNLO central
Name: Comments, dtype: object

validphys.uploadutils module

uploadutils.py

Tools to upload resources to remote servers.

class validphys.uploadutils.ArchiveUploader[source]

Bases: FileUploader

Uploader for objects comprising many files such as fits or PDFs

get_relative_path(output_path=None)[source]: Return the relative path to the target_dir.

root_url = None

target_dir = None

upload_context(output_path, force)[source]: Before entering the context, check that uploading is feasible. On exiting the context, upload output.

upload_or_exit_context(output, force)[source]: Like upload_context, but log and call sys.exit on error.

upload_output(output_path, force)[source]: Rsync output_path to the server and print the resulting URL. If specific_file is given

exception validphys.uploadutils.BadSSH[source]: Bases: UploadError

class validphys.uploadutils.FileUploader[source]

Bases: Uploader

Uploader for individual files for single-file resources. It does the same but prints the URL of the file.

upload_context(output_and_file)[source]: Before entering the context, check that uploading is feasible. On exiting the context, upload output.

class validphys.uploadutils.FitUploader[source]

Bases: ArchiveUploader

An uploader for fits. Fits will be automatically compressed before uploading.

check_fit_md5(output_path)[source]: When vp-setupfit is run successfully, it creates an md5 from the config. We check that the md5 matches the filter.yml, which verifies both that vp-setupfit was run and that the filter.yml inside the fit folder wasn’t modified.

property root_url

property target_dir

upload_output(output_path, force)[source]: Rsync output_path to the server and print the resulting URL. If specific_file is given

class validphys.uploadutils.HyperscanUploader[source]

Bases: FitUploader

Uploader for hyperopt scans, which are just special cases of fits

property root_url

property target_dir

class validphys.uploadutils.PDFUploader[source]

Bases: ArchiveUploader

An uploader for PDFs. PDFs will be automatically compressed before uploading.

property root_url

property target_dir

class validphys.uploadutils.ReportFileUploader[source]: Bases: FileUploader, ReportUploader

class validphys.uploadutils.ReportUploader[source]

Bases: Uploader

An uploader for validphys reports.

property root_url

property target_dir

exception validphys.uploadutils.UploadError[source]: Bases: Exception

class validphys.uploadutils.Uploader[source]

Bases: object

Base class for implementing upload behaviour. The main abstraction is a context manager upload_context which checks that the upload seems possible, then does the work inside the context and then uploads the result. The various derived classes should be used.

check_auth()[source]: Check that we can authenticate with a certificate.

check_rsync()[source]: Check that the rsync command exists

check_upload()[source]: Check that it looks possible to upload something. Raise an UploadError if not.

get_relative_path(output_path)[source]: Return the relative path to the target_dir.

upload_context(output)[source]: Before entering the context, check that uploading is feasible. On exiting the context, upload output.

property upload_host

upload_or_exit_context(output)[source]: Like upload_context, but log and call sys.exit on error.

upload_output(output_path)[source]: Rsync output_path to the server and print the resulting URL. If specific_file is given
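The check, work, upload sequence described for Uploader can be sketched with contextlib as follows. This is a toy illustration rather than the validphys class: ToyUploader, its log attribute and the method bodies are hypothetical stand-ins, and no rsync or network call is made.

```python
from contextlib import contextmanager


class ToyUploader:
    """Hypothetical stand-in that records what a real uploader would do."""

    def __init__(self):
        self.log = []

    def check_upload(self):
        # A real uploader would test for rsync and server access here.
        self.log.append("checked")

    def upload_output(self, output_path):
        self.log.append(f"uploaded {output_path}")

    @contextmanager
    def upload_context(self, output):
        self.check_upload()          # fail early, before any work is done
        yield                        # the caller produces the output here
        self.upload_output(output)   # reached only if the block did not raise


uploader = ToyUploader()
with uploader.upload_context("output"):
    uploader.log.append("work done")
```

Because the upload step follows the yield without a try/finally, an exception raised inside the with block propagates and nothing is uploaded.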

validphys.uploadutils.check_for_meta(path)[source]

Function that checks if a report input has a meta.yaml file. If not, it prompts the user to either create one or follow an interactive prompt which assists the user in creating one.

Parameters: path (pathlib.Path) – Input path
Return type: None

validphys.uploadutils.check_input(path)[source]

A function that checks the type of the input for vp-upload. The type determines where on the vp server the file will end up

A fit is defined as any folder structure containing a filter.yml file at its root.

A pdf is defined as any folder structure that contains a .info file and a replica 0 at its root.

A report is defined as any folder structure that contains an index.html at its root.

If the input file does not fall under any such category, a ValueError exception is raised and the user is prompted to use either rsync or validphys.scripts.wiki_upload.

Parameters: path (pathlib.Path) – Path of the input file
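The classification rules above can be sketched as follows. This is an illustration under stated assumptions, not the validphys function: the name classify_input_sketch, the returned strings, the order of the checks and the LHAPDF-style "_0000.dat" name used to detect replica 0 are all assumptions.

```python
import pathlib
import tempfile


def classify_input_sketch(path):
    """Hypothetical helper: guess the resource type from the folder contents."""
    path = pathlib.Path(path)
    names = [p.name for p in path.iterdir()] if path.is_dir() else []
    if "filter.yml" in names:
        return "fit"
    has_info = any(name.endswith(".info") for name in names)
    # Assumption: replica 0 follows the LHAPDF naming "<set>_0000.dat".
    has_replica0 = any(name.endswith("_0000.dat") for name in names)
    if has_info and has_replica0:
        return "pdf"
    if "index.html" in names:
        return "report"
    raise ValueError(f"Cannot determine the type of input {path}")


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "index.html").touch()
    kind = classify_input_sketch(root)
```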

validphys.uploadutils.interactive_meta(path)[source]

Function to interactively create a meta.yaml file

Parameters: path (pathlib.Path) – Input path
Return type: None

validphys.utils module

validphys.utils.common_prefix(*s)[source]: Return the longest string that is a prefix of all the given strings.
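A stand-alone sketch of one way to compute such a prefix is given below. The function name is hypothetical and the code is an illustration, not necessarily how validphys implements it.

```python
def common_prefix_sketch(*strings):
    """Hypothetical helper: longest prefix shared by all the given strings."""
    # The lexicographically smallest and largest strings differ the most,
    # so the prefix they share is shared by every string in between.
    small, big = min(strings), max(strings)
    for i, char in enumerate(small):
        if big[i] != char:
            return small[:i]
    return small


prefix = common_prefix_sketch("NNPDF31_nnlo", "NNPDF31_nlo", "NNPDF30_nlo")
```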

validphys.utils.experiments_to_dataset_inputs(experiments_list)[source]

Flatten a list of old style experiment inputs to the new, flat, dataset_inputs style.

Example

>>> from validphys.api import API
>>> from validphys.utils import experiments_to_dataset_inputs
>>> fit = API.fit(fit='NNPDF31_nnlo_as_0118_1000')
>>> experiments = fit.as_input()['experiments']
>>> dataset_inputs = experiments_to_dataset_inputs(experiments)
>>> dataset_inputs[:3]
[{'dataset': 'NMCPD', 'frac': 0.5},
 {'dataset': 'NMC', 'frac': 0.5},
 {'dataset': 'SLACP', 'frac': 0.5}]

validphys.utils.sane_groupby_iter(df, by, *args, **kwargs)[source]

Iterate groupby in such a way that first value is always the tuple of the common values.

As a convenience for plotting, if by is None, yield the empty string and the whole dataframe.

validphys.utils.scale_from_grid(grid)[source]: Guess the appropriate matplotlib scale from a grid object. Returns 'linear' if the scale of the grid object is linear, and otherwise 'log'.

validphys.utils.split_by(it, crit)[source]: Split it into two lists: the first contains the elements for which crit evaluates to True and the second those for which it doesn’t. Crit can be either a function or an iterable (in this case the original it will be sliced if the length of crit is smaller).
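The two accepted forms of crit can be illustrated with the following sketch. It is a pure-Python approximation with a hypothetical name, and the real function may differ in details such as the return types.

```python
def split_by_sketch(it, crit):
    """Hypothetical helper: partition items by a predicate or by a mask."""
    items = list(it)
    if callable(crit):
        flags = [bool(crit(item)) for item in items]
    else:
        flags = [bool(flag) for flag in crit]
    true_items, false_items = [], []
    # zip stops at the shorter input, so a short mask truncates the items.
    for item, flag in zip(items, flags):
        (true_items if flag else false_items).append(item)
    return true_items, false_items


evens, odds = split_by_sketch(range(6), lambda x: x % 2 == 0)
kept, dropped = split_by_sketch("abcd", [True, False])
```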

validphys.utils.split_ranges(a, cond=None, *, filter_falses=False)[source]: Split a so that each range has the same value for cond. If filter_falses is true, only the ranges for which the condition is true will be returned.
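The behaviour can be pictured with the sketch below, which uses itertools.groupby. It rests on assumptions: cond is taken to be a boolean sequence parallel to a, the truthiness of the elements of a is used when cond is None, and plain lists are returned where the real function may return arrays. The function name is hypothetical.

```python
from itertools import groupby


def split_ranges_sketch(a, cond=None, *, filter_falses=False):
    """Hypothetical helper: split a into runs over which cond is constant."""
    items = list(a)
    # Assumption: with no explicit condition, use the truthiness of each item.
    flags = [bool(x) for x in (items if cond is None else cond)]
    ranges = []
    for flag, group in groupby(zip(flags, items), key=lambda pair: pair[0]):
        if flag or not filter_falses:
            ranges.append([item for _, item in group])
    return ranges


mask = [True, True, False, False, True]
all_ranges = split_ranges_sketch([1, 2, 3, 4, 5], mask)
true_ranges = split_ranges_sketch([1, 2, 3, 4, 5], mask, filter_falses=True)
```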

validphys.utils.tempfile_cleaner(root, exit_func, exc, prefix=None, **kwargs)[source]

A context manager to handle temporary directory creation and clean-up upon raising an expected exception.

Parameters

root (str) – The root directory to create the temporary directory in.
exit_func (Callable) – The exit function to call upon exiting the context manager. Usually one of shutil.move or shutil.rmtree. Use the former if the temporary directory will be the final result directory and the latter if the temporary directory will contain the result directory, for example when downloading a resource.
exc (Exception) – The exception to catch within the with block.
prefix (optional[str]) – A prefix to prepend to the temporary directory.
**kwargs (dict) – Keyword arguments to provide to exit_func.

Returns

tempdir – The path to the temporary directory.

Return type

pathlib.Path

Example

The following example creates a temporary directory, prefixed with tutorial_, in the /tmp directory. The context manager listens for a KeyboardInterrupt and cleans up if this exception is raised. Upon completion of the with block, it moves the temporary directory to completed, the dst argument that is passed on to shutil.move. The final directory will contain an empty file called new_file, which is created within the with block.

  import shutil

  from validphys.utils import tempfile_cleaner

  with tempfile_cleaner(
      root="/tmp",
      exit_func=shutil.move,
      exc=KeyboardInterrupt,
      prefix="tutorial_",
      dst="completed",
  ) as tempdir:
      new_file = tempdir / "new_file"
      input("Press enter to continue or Ctrl-C to interrupt:\n")
      new_file.touch()
