validphys package

Subpackages

Submodules

validphys.api module

api.py

This module contains the reportengine programmatic API, initialized with the validphys providers, Config and Environment.

Example:

Simple Usage:

>>> from validphys.api import API
>>> fig = API.plot_pdfs(pdf="NNPDF_nlo_as_0118", Q=100)
>>> fig.show()

validphys.app module

app.py

Main loop of the validphys application. Here we define tailored extensions to the reportengine application (such as extra command line flags). Additionally, the provider modules that serve as sources for the validphys actions are declared here.

The entry point of the validphys application is the main function of this module.

class validphys.app.App(name='validphys', providers=['validphys.results', 'validphys.commondata', 'validphys.pdfgrids', 'validphys.pdfplots', 'validphys.dataplots', 'validphys.fitdata', 'validphys.arclength', 'validphys.sumrules', 'validphys.reweighting', 'validphys.kinematics', 'validphys.correlations', 'validphys.eff_exponents', 'validphys.asy_exponents', 'validphys.theorycovariance.construction', 'validphys.theorycovariance.output', 'validphys.theorycovariance.tests', 'validphys.replica_selector', 'validphys.closuretest', 'validphys.mc_gen', 'validphys.theoryinfo', 'validphys.pseudodata', 'validphys.renametools', 'validphys.covmats', 'validphys.hyperoptplot', 'validphys.deltachi2', 'validphys.n3fit_data', 'validphys.mc2hessian', 'reportengine.report', 'validphys.overfit_metric', 'validphys.hessian2mc'])[source]

Bases: App

property argparser
config_class

alias of Config

critical_message = 'A critical error occurred. This is likely due to one of the following reasons:\n\n - A bug in validphys.\n - Corruption of the provided resources (e.g. incorrect plotting files).\n - Cosmic rays hitting your CPU and altering the registers.\n\nThe traceback above should help determine the cause of the problem. If you\nbelieve this is a bug in validphys (please discard the cosmic rays first),\nplease open an issue on GitHub<https://github.com/NNPDF/nnpdf/issues>,\nincluding the contents of the following file:\n\n%s\n'
property default_style
environment_class

alias of Environment

init()[source]
run()[source]

TODO

static upload_context(do_upload, output)[source]

If do_upload is False, do nothing. Otherwise, on enter, check the requirements for uploading and, on exit, upload the output path. Raise SystemExit on error.

validphys.app.main()[source]

validphys.arclength module

arclength.py

Module for the computation and presentation of arclengths.

class validphys.arclength.ArcLengthGrid(pdf, basis, flavours, stats)

Bases: tuple

basis

Alias for field number 1

flavours

Alias for field number 2

pdf

Alias for field number 0

stats

Alias for field number 3

validphys.arclength.arc_length_table(arc_lengths)[source]

Return a table with the descriptive statistics of the arc lengths over members of the PDF.

validphys.arclength.arc_lengths(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Compute arc lengths at scale Q

Set up a grid with three segments and compute the arc length for each segment. Note: the variation of the PDF over the grid is computed from the forward differences between adjacent grid points.

Parameters:
  • pdf (validphys.core.PDF object)

  • Q (float) – scale at which to evaluate PDF

  • basis (default = "flavour")

  • flavours (default = None)

Returns:

validphys.arclength.ArcLengthGrid object – object that contains the PDF, basis, flavours, and computed arc length statistics.
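
The forward-difference approach mentioned in the description can be sketched as follows. This is a minimal illustration and not the validphys implementation: the helper polyline_arc_length and the test curve are hypothetical, and a single uniform grid is used instead of the three segments that arc_lengths sets up.

```python
import math


def polyline_arc_length(xs, ys):
    """Approximate the arc length of the curve y(x) sampled at the points xs.

    Each term is built from the forward differences between adjacent
    grid points.
    """
    total = 0.0
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        dy = ys[i + 1] - ys[i]
        total += math.hypot(dx, dy)
    return total


# Hypothetical test curve: the straight line y = x on [0, 1].
xs = [i / 100 for i in range(101)]
ys = list(xs)
length = polyline_arc_length(xs, ys)
```

For a straight line the polyline is exact, so length agrees with sqrt(2) up to rounding; for a curved function the estimate improves as the grid is refined.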

validphys.arclength.integrability_number(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'evolution', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return sum_i |x_i*f(x_i)|, x_i = {1e-9, 1e-8, 1e-7} for selected flavours
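
A minimal sketch of the sum above for a single flavour. The callable toy_xf and the helper integrability_sum are hypothetical stand-ins; the actual function evaluates a validphys PDF object in the chosen basis.

```python
def integrability_sum(xf, points=(1e-9, 1e-8, 1e-7)):
    """Return sum_i |xf(x_i)| over the given small-x points.

    `xf` is a hypothetical callable returning x*f(x) for one flavour.
    """
    return sum(abs(xf(x)) for x in points)


def toy_xf(x):
    # Hypothetical toy distribution with x*f(x) = -sqrt(x).
    return -(x ** 0.5)


value = integrability_sum(toy_xf)
```

Because of the absolute value, the sign of the distribution does not matter: the sum is small only when x*f(x) is small in magnitude at all three points.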

validphys.arclength.plot_arc_lengths(pdfs_arc_lengths: ~collections.abc.Sequence, Q: ~numbers.Real, normalize_to: (<class 'NoneType'>, <class 'int'>) = None)[source]

Plot the arc lengths of provided pdfs

validphys.asy_exponents module

Tools for computing and plotting asymptotic exponents.

class validphys.asy_exponents.AsyExponentBandPlotter(exponent, *args, **kwargs)[source]

Bases: BandPDFPlotter

Class inheriting from BandPDFPlotter, changing title and ylabel to reflect the asymptotic exponent being plotted.

get_title(parton_name)[source]
get_ylabel(parton_name)[source]
validphys.asy_exponents.alpha_asy(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Returns a list of xplotting_grids containing the value of the asymptotic exponent alpha, as defined by the first relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.

validphys.asy_exponents.asymptotic_exponents_table(pdf: ~validphys.core.PDF, *, x_alpha: ~numbers.Real = 1e-06, x_beta: ~numbers.Real = 0.9, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None, npoints=100)[source]

Returns a table with the values of the asymptotic exponents alpha and beta, as defined in Eq. (4) of [arXiv:1604.00024], at the specified value of x and Q.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.

validphys.asy_exponents.beta_asy(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Returns a list of xplotting_grids containing the value of the asymptotic exponent beta, as defined by the second relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.

validphys.asy_exponents.plot_alpha_asy(pdfs, alpha_asy_pdfs, pdfs_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plots the alpha asymptotic exponent

validphys.asy_exponents.plot_beta_asy(pdfs, beta_asy_pdfs, pdfs_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plots the beta asymptotic exponent

validphys.calcutils module

calcutils.py

Low level utilities to calculate χ² and such. These are used to implement the higher level functions in results.py

validphys.calcutils.all_chi2(results)[source]

Return the chi² for all elements in the result, regardless of the Stats class. Note that the interpretation of the result will depend on the PDF error type.

validphys.calcutils.all_chi2_theory(results, totcov)[source]

Like all_chi2 but here the chi² are calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.

validphys.calcutils.bootstrap_values(data, nresamples, *, boot_seed: int = None, apply_func: Callable = None, args=None)[source]

General bootstrap sample

data is the data which is to be sampled; replicas are assumed to be on the final axis, e.g. N_bins*N_replicas.

boot_seed can be specified if the user wishes to take the exact same bootstrap samples multiple times. By default it is None, in which case a random seed is used.

If only data and nresamples are provided, then bootstrap_values creates nresamples resamples of the data, where each resample is a Monte Carlo selection of the data across replicas. The mean of each resample is returned.

Alternatively, the user can specify a function to be sampled apply_func plus any additional arguments required by that function. bootstrap_values then returns apply_func(bootstrap_data, *args) where bootstrap_data.shape = (data.shape, nresamples). It is critical that apply_func can handle data input in this format.
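
The shape convention described above can be illustrated with a short NumPy sketch. This is an assumption-laden illustration rather than the library code: the name bootstrap_sketch, its seed argument, and the choice to average over the replica axis when no function is given are all hypothetical.

```python
import numpy as np


def bootstrap_sketch(data, nresamples, seed=None, apply_func=None, args=()):
    """Resample `data` with replacement along its final (replica) axis."""
    rng = np.random.default_rng(seed)
    nreplicas = data.shape[-1]
    # One row of replica indices per resample, drawn with replacement.
    indices = rng.integers(nreplicas, size=(nresamples, nreplicas))
    # Shape (*data.shape[:-1], nresamples, nreplicas).
    resampled = data[..., indices]
    # Move the resample axis last: shape (*data.shape, nresamples).
    bootstrap_data = np.moveaxis(resampled, -2, -1)
    if apply_func is None:
        # Average over the replica axis, leaving one mean per resample.
        return bootstrap_data.mean(axis=-2)
    return apply_func(bootstrap_data, *args)


data = np.arange(12.0).reshape(3, 4)  # 3 bins, 4 replicas
means = bootstrap_sketch(data, nresamples=5, seed=0)
```

Any function passed as apply_func in this sketch receives an array whose last axis indexes the resamples and whose second-to-last axis indexes the replicas.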

validphys.calcutils.calc_chi2(sqrtcov, diffs)[source]

Elementary function to compute the chi², given a Cholesky decomposed lower triangular part and a vector of differences.

Parameters:
  • sqrtcov (matrix) – A lower triangular matrix corresponding to the lower part of the Cholesky decomposition of the covariance matrix.

  • diffs (array) – A vector of differences (e.g. between data and theory). The first dimension must match the shape of sqrtcov. The computation will be broadcast over the other dimensions.

Returns:

chi2 – The result of the χ² for each vector of differences. Will have the same shape as diffs.shape[1:].

Return type:

array

Notes

This function computes the χ² more efficiently and accurately than following the direct definition of inverting the covariance matrix, \(\chi^2 = d\Sigma^{-1}d\), by solving the triangular linear system instead.

Examples

>>> from validphys.calcutils import calc_chi2
>>> import numpy as np
>>> import scipy.linalg as la
>>> np.random.seed(0)
>>> diffs = np.random.rand(10)
>>> s = np.random.rand(10,10)
>>> cov = s@s.T
>>> calc_chi2(la.cholesky(cov, lower=True), diffs)
44.64401691354948
>>> diffs@la.inv(cov)@diffs
44.64401691354948
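
The triangular solve mentioned in the Notes can be sketched in plain NumPy. The helper chi2_by_forward_substitution is hypothetical and handles only a one-dimensional diffs vector; it is meant to show the idea, not to reproduce calc_chi2.

```python
import numpy as np


def chi2_by_forward_substitution(sqrtcov, diffs):
    """Compute d^T Sigma^{-1} d from the lower Cholesky factor L of Sigma.

    Solving L y = d gives chi2 = y^T y, so Sigma is never inverted.
    """
    n = sqrtcov.shape[0]
    y = np.zeros(n)
    for i in range(n):
        # Row i of L y = d, using the already computed y[:i].
        y[i] = (diffs[i] - sqrtcov[i, :i] @ y[:i]) / sqrtcov[i, i]
    return float(y @ y)


rng = np.random.default_rng(0)
s = rng.random((5, 5))
cov = s @ s.T + 5 * np.eye(5)  # well-conditioned toy covariance matrix
diffs = rng.random(5)
chi2 = chi2_by_forward_substitution(np.linalg.cholesky(cov), diffs)
direct = diffs @ np.linalg.inv(cov) @ diffs
```

Both routes give the same number up to rounding; the substitution route avoids forming the inverse of the covariance matrix.
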
validphys.calcutils.calc_phi(sqrtcov, diffs)[source]

Low level function which calculates phi given a Cholesky decomposed lower triangular part and a vector of differences. Primarily used when phi is to be calculated independently from chi2.

The vector of differences diffs is expected to have N_bins on the first axis

validphys.calcutils.central_chi2(results)[source]

Calculate the chi² from the central value of the theory prediction to the data

validphys.calcutils.central_chi2_theory(results, totcov)[source]

Like central_chi2 but here the chi² is calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.

validphys.calcutils.get_df_block(matrix: DataFrame, key: str, level)[source]

Given a pandas dataframe whose index and column keys match, and whose data represents a symmetric matrix, return the diagonal block of this matrix corresponding to matrix[key, key] as a numpy array.

Additionally, the user can specify the level of the key for which the cross section is being taken. By default it is set to 1, which corresponds to the dataset level of a theory covariance matrix.

validphys.calcutils.regularize_covmat(covmat: array, norm_threshold=4)[source]

Given a covariance matrix, performs a regularization which is equivalent to performing regularize_l2 on the sqrt of covmat: the l2 norm of the inverse of the correlation matrix calculated from covmat is set to be less than or equal to norm_threshold. If the input covmat already fulfills this criterion it is returned.

Parameters:
  • covmat (array) – a covariance matrix which is to be regularized.

  • norm_threshold (float) – The acceptable l2 norm of the sqrt correlation matrix, by default set to 4.

Returns:

new_covmat – A new covariance matrix which has been regularized according to prescription above.

Return type:

array

validphys.calcutils.regularize_l2(sqrtcov, norm_threshold=4)[source]

Return a regularized version of sqrtcov.

Given sqrtcov, an (N, nsys) matrix such that its Gram matrix is the covariance matrix (covmat = sqrtcov@sqrtcov.T), first decompose it as sqrtcov = D@A, where D is a positive diagonal matrix of standard deviations and A is the “square root” of the correlation matrix, corrmat = A@A.T. Then produce a new version of A which removes the unstable behaviour and assemble a new square root covariance matrix, which is returned.

The stability condition is controlled by norm_threshold. It is

\[\left\Vert A^+ \right\Vert_{L2} \leq \frac{1}{\text{norm_threshold}}\]

A+ is the pseudoinverse of A; norm_threshold roughly corresponds to the sqrt of the maximum relative uncertainty in any systematic.

Parameters:
  • sqrtcov (2d array) – An (N, nsys) matrix specifying the uncertainties.

  • norm_threshold (float) – The tolerance for the regularization.

Returns:

newsqrtcov – A regularized version of sqrtcov.

Return type:

2d array
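
The decomposition sqrtcov = D@A used by regularize_l2 can be checked numerically with the sketch below. It covers only the decomposition, not the regularization step itself, and the random input matrix is a hypothetical example.

```python
import numpy as np

rng = np.random.default_rng(0)
sqrtcov = rng.normal(size=(4, 6))  # hypothetical (N, nsys) matrix
covmat = sqrtcov @ sqrtcov.T

# D holds the standard deviations, i.e. the square roots of diag(covmat).
d = np.sqrt(np.diag(covmat))
D = np.diag(d)
# A is sqrtcov with each row divided by its standard deviation.
A = sqrtcov / d[:, np.newaxis]
# The Gram matrix of A is the correlation matrix (unit diagonal).
corrmat = A @ A.T
```

Working with A rather than sqrtcov makes the stability condition independent of the overall scale of each data point.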

validphys.checks module

validphys.checks.check_at_least_two_replicas(pdf)[source]
validphys.checks.check_can_save_grid(ns, **kwags)[source]
validphys.checks.check_cuts_considered(use_cuts)[source]
validphys.checks.check_cuts_fromfit(use_cuts)[source]
validphys.checks.check_darwin_single_process(NPROC)[source]

Check that if we are on macOS (platform is Darwin), NPROC is equal to 1. This is related to the infamous issues with multiprocessing on macOS.

The “solution” is to run the code sequentially if NPROC is 1 and enforce that macOS users don’t set NPROC as anything else.

TODO: Once pseudodata is generated in python, try using spawn instead of fork with multiprocessing.

Notes

for the specific NNPDF issue: https://github.com/NNPDF/nnpdf/issues/931

General discussion: https://wefearchange.org/2018/11/forkmacos.rst.html

validphys.checks.check_data_cuts_match_theorycovmat(data, fitthcovmat)[source]
validphys.checks.check_dataset_cuts_match_theorycovmat(dataset, fitthcovmat)[source]
validphys.checks.check_dataspecs_fits_different(dataspecs_fit)[source]

Need this check because otherwise the pandas object gets confused.

validphys.checks.check_fits_different(fits)[source]

Need this check because otherwise the pandas object gets confused.

validphys.checks.check_has_fitted_replicas(ns, **kwargs)[source]
validphys.checks.check_have_two_pdfs(pdfs)[source]
validphys.checks.check_know_errors(ns, **kwargs)[source]
validphys.checks.check_mixband_as_replicas(pdfs, mixband_as_replicas)[source]

Same as check_pdfs_noband, but for the mixband_as_replicas key. Allows mixband_as_replicas to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).

validphys.checks.check_norm_threshold(norm_threshold)[source]

Check norm_threshold is not None

validphys.checks.check_not_using_pdferr(use_pdferr=False, **kwargs)[source]
validphys.checks.check_pdf_is_hessian(pdf, **kwargs)[source]
validphys.checks.check_pdf_is_montecarlo(ns, **kwargs)[source]
validphys.checks.check_pdf_is_montecarlo_or_hessian(pdf, **kwargs)[source]
validphys.checks.check_pdf_normalize_to(pdfs, normalize_to)[source]

Transform normalize_to into an index.

validphys.checks.check_pdfs_noband(pdfs, pdfs_noband)[source]

Allows pdfs_noband to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).

validphys.checks.check_scale(scalename, allow_none=False)[source]

Check that we have a valid matplotlib scale. With allow_none=True, also None is valid.

validphys.checks.check_speclabels_different(dataspecs_speclabel)[source]

This is needed for grouping dataframes (and because a repeated label generally indicates a bug).

validphys.checks.check_two_dataspecs(dataspecs)[source]
validphys.checks.check_use_t0(ns, **kwargs)[source]

Checks use_t0 is set to true

validphys.checks.check_xlimits(xmax, xmin)[source]

validphys.commondata module

commondata.py

Module containing actions which return loaded commondata. It leverages utilities found in validphys.commondataparser and returns objects from validphys.coredata.

validphys.commondata.loaded_commondata_with_cuts(commondata, cuts)[source]

Load the commondata and apply cuts.

Parameters:
  • commondata (validphys.core.CommonDataSpec) – commondata to load and cut.

  • cuts (validphys.core.cuts, None) – valid cuts, used to cut loaded commondata.

Returns:

loaded_cut_commondata

Return type:

validphys.coredata.CommonData

validphys.commondataparser module

This module implements parsers for commondata and its associated metadata and uncertainties files into useful structures that can be fed to the main validphys.coredata.CommonData class.

A CommonData file is completely defined by a dataset name (which defines the folder in which the information is) and observable name (which defines the specific data, fktables and plotting settings to read).

<experiment>_<process>_<energy>{_<extras>}_<observable>

Where the folder name is <experiment>_<process>_<energy>{_<extras>}
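
As a rough illustration of the naming scheme, the sketch below splits a full dataset name into folder and observable. It assumes that the observable is everything after the last underscore, which the load_commondata_new entry describes as a convention that is not yet fully defined; the helper split_dataset_name and the example name are hypothetical.

```python
def split_dataset_name(dataset_name):
    """Split '<folder>_<observable>' at the last underscore (assumed convention)."""
    folder, _, observable = dataset_name.rpartition("_")
    return folder, observable


# Hypothetical name following <experiment>_<process>_<energy>_<extras>_<observable>.
folder, observable = split_dataset_name("EXP_PROC_13TEV_EXTRA_OBS")
```

Under this assumption an observable name cannot itself contain an underscore, which is one reason to treat the split as a sketch only.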

The definition of all information for a given dataset (and all its observables) is in the metadata.yaml file and its implemented_observables.

This module defines a number of parsers using the validobj library.

The full metadata.yaml is read as a SetMetaData object which contains a list of ObservableMetaData. These ObservableMetaData are the “datasets” of NNPDF for all intents and purposes. The parent SetMetaData collects some shared variables such as the version of the dataset, arxiv, inspire or hepdata ids, the folder in which the data is, etc.

The main class in this module is thus ObservableMetaData which holds _all_ information about the particular dataset-observable that we are interested in (and a reference to its parent).

Inside the ObservableMetaData we can find:
  • TheoryMeta: contains the necessary information to read the (new style) fktables

  • KinematicsMeta: contains metadata about the kinematics

  • PlottingOptions: plotting style and information for validphys

  • Variant: variant to be used

The CommonMetaData defines how the CommonData file is to be loaded. By modifying the CommonMetaData using one of the loaded Variants, one can change the resulting validphys.coredata.CommonData object.

class validphys.commondataparser.CommonDataMetadata(name: str, nsys: int, ndata: int, process_type: str)[source]

Bases: object

Contains metadata information about the data being read

name: str
ndata: int
nsys: int
process_type: str
class validphys.commondataparser.ObservableMetaData(observable_name: str, observable: dict, ndata: int, plotting: validphys.plotoptions.plottingoptions.PlottingOptions, process_type: Annotated[Union[validphys.process_options._Process, str], InputType(Any), Validator(<function ValidProcess at 0x7f81df87fba0>)], kinematic_coverage: list[str], kinematics: validphys.commondataparser.ValidKinematics, data_uncertainties: list[typing.Annotated[pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)]], data_central: Optional[Annotated[pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)]] = None, theory: Optional[validphys.commondataparser.TheoryMeta] = None, tables: Optional[list] = <factory>, npoints: Optional[list] = <factory>, variants: Optional[dict[str, validphys.commondataparser.Variant]] = <factory>, applied_variant: Optional[str] = None, ported_from: Optional[str] = None, _parent: Optional[Any] = None)[source]

Bases: object

applied_variant: str | None = None
apply_variant(variant_name)[source]

Return a new instance of this class with the variant applied

This class also defines how the variant is applied to the commondata. If more than one variant is being used, this function will be called recursively until all variants are applied.

check()[source]

Various checks to apply manually to the observable before it is used anywhere. These are not part of the __post_init__ call since they can only happen after the metadata has been read, the observable selected and (likely) variants applied.

property cm_energy
data_central: Annotated[Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)] | None = None
data_uncertainties: list[~typing.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)]]
digest_plotting_variable(variable)[source]

Digest plotting variables in the line_by or figure_by fields and return the appropriate kX or other label such that the plotting functions of validphys can understand it.

These might be variables included as part of the kinematics or extra labels defined in the plotting dictionary.

property experiment
property is_integrability
property is_nnpdf_special

Is this an NNPDF special dataset used for e.g., Lagrange multipliers or QED fits

property is_ported_dataset

Return True if this is an automatically ported dataset that has not been updated

property is_positivity
kinematic_coverage: list[str]
kinematics: ValidKinematics
property kinlabels

Return the kinematic labels in the same order as they are set in kinematic_coverage (which in turn follows the key kinematic_coverage). If this is a ported dataset, rely on the process type using the legacy labels.

load_data_central()[source]

Loads the data for this commondata and returns it as a dataframe.

Returns:

a dataframe containing the data

Return type:

pd.DataFrame

load_kinematics(fill_to_three=True, drop_minmax=True)[source]

Returns a dataframe with the kinematic information

Parameters:
  • fill_to_three (bool) – ensure that there are always three columns (repeat the last one) in the kinematics

  • drop_minmax (bool) – Drop the min and max value, necessary for legacy comparisons

Returns:

a dataframe containing the kinematics

Return type:

pd.DataFrame

load_uncertainties()[source]

Returns a dataframe with all appropriate uncertainties

Returns:

a dataframe containing the uncertainties

Return type:

pd.DataFrame

property name
ndata: int
property nnpdf_metadata
npoints: list | None
observable: dict
observable_name: str
property path_data_central
property path_kinematics
property paths_uncertainties
plotting: PlottingOptions
property plotting_options
ported_from: str | None = None
property process
process_type: Annotated[Union[validphys.process_options._Process, str], InputType(Any), Validator(<function ValidProcess at 0x7f81df87fba0>)]
property setname
tables: list | None
theory: TheoryMeta | None = None
variants: dict[str, Variant] | None
class validphys.commondataparser.SetMetaData(setname: str, version: int, version_comment: str, nnpdf_metadata: dict, implemented_observables: list[ObservableMetaData], arXiv: ValidReference | None = None, iNSPIRE: ValidReference | None = None, hepdata: ValidReference | None = None)[source]

Bases: object

Metadata of the whole set

property allowed_datasets

Return the implemented datasets as a list <setname>_<observable>

property allowed_observables

Returns the implemented observables as a {observable_name.upper(): observable} dictionary

arXiv: ValidReference | None = None
property cm_energy

Return the center of mass energy in GeV if it can be understood from the name, otherwise return None.

property folder
hepdata: ValidReference | None = None
iNSPIRE: ValidReference | None = None
implemented_observables: list[ObservableMetaData]
nnpdf_metadata: dict
select_observable(obs_name_raw)[source]

Check whether the observable is implemented and return said observable

setname: str
version: int
version_comment: str
class validphys.commondataparser.TheoryMeta(FK_tables: list[tuple], operation: ~typing.Annotated[str, InputType(typing.Optional[str]), Validator(<function ValidOperation at 0x7f81df8d4d60>)] = 'NULL', conversion_factor: float = 1.0, shifts: dict | None = None, normalization: dict | None = None, comment: str | None = None)[source]

Bases: object

Contains the necessary information to load the associated fktables

The theory metadata must always contain a key FK_tables which defines the fktables to be loaded. The FK_tables is organized as a double list such that:

The inner list is concatenated. In practice these are different fktables that might refer to the same observable but that are divided in subgrids for practical reasons. The outer list instead contains the operands for whatever operation needs to be computed in order to match the experimental data.

In addition there are other flags that can affect how the fktables are read or used:
  • operation: defines the operation to apply to the outer list

  • shifts: mapping with the single fktables and their respective shifts, useful to create “gaps” so that the fktables and the respective experimental data are ordered in the same way (for instance, when some points are missing from a grid)

This class is immutable; what is read from the commondata metadata should be considered final.

Example

>>> from validphys.commondataparser import TheoryMeta
>>> from validobj import parse_input
>>> from ruamel.yaml import YAML
>>> theory_raw = '''
... FK_tables:
...   - - fk1
...   - - fk2
...     - fk3
... operation: ratio
... '''
>>> theory = YAML(typ='safe').load(theory_raw)
>>> parse_input(theory, TheoryMeta)
TheoryMeta(FK_tables=[['fk1'], ['fk2', 'fk3']], operation='RATIO', shifts=None, conversion_factor=1.0, comment=None, normalization=None)
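
To make the double-list structure concrete, the following sketch combines made-up predictions for the fktables of the example above. The numbers, the predictions mapping, and the reading of the ratio operation as first operand divided by second are all assumptions for illustration; no validphys code is called.

```python
# Hypothetical predictions per fktable; names and numbers are invented.
predictions = {"fk1": [2.0, 4.0, 6.0], "fk2": [1.0], "fk3": [2.0, 3.0]}
fk_tables = [["fk1"], ["fk2", "fk3"]]

# Inner lists: concatenate the predictions of the subgrids.
operands = []
for inner in fk_tables:
    combined = []
    for name in inner:
        combined.extend(predictions[name])
    operands.append(combined)

# Outer list: the operands of the operation (assumed: first over second).
numerator, denominator = operands
ratio = [n / d for n, d in zip(numerator, denominator)]
```

Here fk2 and fk3 together cover the same three data points as fk1, which is why they are concatenated before the operation is applied.
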
FK_tables: list[tuple]
comment: str | None = None
conversion_factor: float = 1.0
fktables_to_paths(grids_folder)[source]

Given a source for pineappl grids, constructs the lists of fktables to be loaded

normalization: dict | None = None
operation: Annotated[str, InputType(Optional[str]), Validator(<function ValidOperation at 0x7f81df8d4d60>)] = 'NULL'
classmethod parser(yaml_file)[source]

The yaml databases in the server use “operands” as key instead of “FK_tables”

shifts: dict | None = None
class validphys.commondataparser.ValidKinematics(file: ~typing.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)], variables: dict[str, ~validphys.commondataparser.ValidVariable])[source]

Bases: object

Contains the metadata necessary to load the kinematics of the dataset. The variables should be a dictionary with the key naming the variable and the content complying with the ValidVariable spec.

Only the kinematics defined by the key kinematic_coverage will be loaded, which must be three.

Three shall be the number of the counting and the number of the counting shall be three. Four shalt thou not count, neither shalt thou count two, excepting that thou then proceedeth to three. Once the number three, being the number of the counting, be reached, then the kinematics be loaded in the direction of thine validobject.

apply_label(var, value)[source]

For a given value of a given variable, return the label as label = value (unit). If the variable is not included in the list of variables, returns None, as the variable could have been transformed by a kinematic transformation.

file: Annotated[Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)]
get_label(var)[source]

For the given variable, return the label as label (unit). If the label is an “extra”, return the last one.

variables: dict[str, ValidVariable]
class validphys.commondataparser.ValidReference(url: str, version: int | None = None, journal: str | None = None, tables: list[int] = <factory>)[source]

Bases: object

Holds literature information for the dataset

journal: str | None = None
tables: list[int]
url: str
version: int | None = None
class validphys.commondataparser.ValidVariable(label: str, description: str = '', units: str = '')[source]

Bases: object

Defines the variables

apply_label(value)[source]

Return a string formatted as label = value (units)

description: str = ''
full_label()[source]
label: str
units: str = ''
class validphys.commondataparser.Variant(data_uncertainties: list[~typing.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)]] | None = None, theory: ~validphys.commondataparser.TheoryMeta | None = None, data_central: ~typing.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)] | None = None, experiment: str | None = None)[source]

Bases: object

The new commondata format allows the usage of variants. A variant can overwrite a number of keys, as defined by this dataclass:
  • data_uncertainties

  • theory

  • data_central

This class may overwrite some other keys for the benefit of reproducibility of old NNPDF fits, but the usage of these features is undocumented and discouraged.

data_central: Annotated[Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)] | None = None
data_uncertainties: list[~typing.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f81df8d4540>)]] | None = None
experiment: str | None = None
theory: TheoryMeta | None = None
validphys.commondataparser.get_kinlabel_key(process_label)[source]

Since there is no 1:1 correspondence between latex keys and the old libNNPDF names we match the longest key such that the proc label starts with it.

validphys.commondataparser.get_plot_kinlabels(commondata)[source]

Return the LaTeX kinematic labels for a given Commondata

validphys.commondataparser.load_commondata(spec)[source]

Load the data corresponding to a CommonDataSpec object. Returns an instance of CommonData

validphys.commondataparser.load_commondata_new(metadata)[source]

TODO: update this docstring since now the load_commondata_new takes the information from the metadata, and the name -> split is done outside

In the current iteration of the commondata, each commondata (i.e., an observable from a data publication) corresponds to one single observable inside a folder which is named as “<experiment>_<process>_<energy>_<extra>”. The observable is defined by a last suffix of the form “_<obs>” so that the full name of the dataset is always:

“<experiment>_<process>_<energy>{_<extra>}_<obs>”

where <extra> is optional.

This function currently works under the assumption that the folder and the observable are separated at the last _ so that:

folder_name = <experiment>_<process>_<energy>{_<extra>}

but note that this convention is still not fully defined.

This function returns a commondata object constructed by parsing the metadata.

Once a variant is selected, it can no longer be changed

Note that this function reproduces parse_commondata below, which parses the _old_ file format

validphys.commondataparser.load_commondata_old(commondatafile, systypefile, setname)[source]

Parse a commondata file and a systype file into a CommonData.

Parameters:
  • commondatafile (file or path to file)

  • systypefile (file or path to file)

Returns:

commondata – An object containing the data and information from the commondata and systype files.

Return type:

CommonData

validphys.commondataparser.parse_new_metadata(metadata_file, observable_name, variant=None)[source]

Given a metadata file in the new format and the specific observable to be read, load and parse the metadata and select the observable. If any variants are selected, apply them.

The triplet (metadata_file, observable_name, variant) unequivocally defines the information to be parsed from the commondata library.

validphys.commondataparser.parse_set_metadata(metadata_file)[source]

Read the metadata file

validphys.commondataparser.parse_systypes(systypefile)[source]

Parses a systype file and returns a pandas dataframe.

validphys.commondataparser.peek_commondata_metadata(commondatafilename)[source]

Read some of the properties of the commondata object as a CommonDataMetadata

validphys.config module

class validphys.config.Config(input_params, environment=None)[source]

Bases: Config, CoreConfig

The effective configuration parser class.

class validphys.config.CoreConfig(input_params, environment=None)[source]

Bases: Config

load_default_data_grouping(spec)[source]

Load the default grouping of data

load_default_default_filter_rules(spec)[source]
load_default_default_filter_settings(spec)[source]
property loader
parse_added_filter_rules(rules: (<class 'list'>, <class 'NoneType'>) = None)[source]

Returns a tuple of AddedFilterRule objects. Rules are immutable after parsing. AddedFilterRule objects inherit from FilterRule objects.

parse_additional_errors(bool)[source]

PDF set used to generate the photon additional errors: they are constructed using the replicas 101-107 of the PDF set LUXqed17_plus_PDF4LHC15_nnlo_100 (that are obtained varying some parameters of the LuxQED approach) in the way described in sec. 2.5 of https://arxiv.org/pdf/1712.07053.pdf

parse_cut_similarity_threshold(th: Real)[source]

Maximum relative ratio when using fromsimilarpredictions cuts.

parse_data_grouping(key)[source]

A key which indicates which default grouping to use. Mainly for internal use. It allows the default grouping of experiment to be applied to runcards which don’t specify metadata_group without there being a namespace conflict in the lockfile.

parse_dataset_input(dataset: Mapping)[source]

The mapping that corresponds to the dataset specifications in the fit files

This mapping is such that
dataset: str

name of the dataset to load

variant: str

variant of the dataset to load

cfac: list

list of cfactors to apply

frac: float

fraction of the data to consider for training purposes

weight: float

extra weight to give to the dataset

custom_group: str

custom group to apply to the dataset

Note that the sys key is deprecated and allowed only for old-format dataset.

Old-format commondata will be translated to the new version in this function.
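The mapping described above can be pictured with the following sketch, which checks the keys of a dataset_input entry and fills in defaults. The function normalise_dataset_input is hypothetical and is not the validphys implementation; the defaults for cfac, frac, weight and variant follow the DataSetInput documentation, while the None default for custom_group is an assumption.

```python
ALLOWED_KEYS = {"dataset", "variant", "cfac", "frac", "weight", "custom_group", "sys"}


def normalise_dataset_input(entry):
    """Validate a dataset_input mapping and fill in default values.

    Hypothetical stand-in for illustration; not a validphys function.
    """
    unknown = set(entry) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown keys in dataset_input: {sorted(unknown)}")
    if "dataset" not in entry:
        raise ValueError("dataset_input requires a 'dataset' key")
    return {
        "name": entry["dataset"],
        "variant": entry.get("variant"),            # default: None
        "cfac": tuple(entry.get("cfac", ())),       # default: ()
        "frac": float(entry.get("frac", 1.0)),      # default: 1.0
        "weight": float(entry.get("weight", 1.0)),  # default: 1.0
        "custom_group": entry.get("custom_group"),  # assumed default: None
        "sys": entry.get("sys"),                    # deprecated, old format only
    }


spec = normalise_dataset_input({"dataset": "ATLASTTBARTOT", "cfac": ["QCD"]})
print(spec)
```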

parse_dataset_inputs(param: list)

A list of dataset_input objects.

parse_default_filter_rules(spec: (<class 'str'>, <class 'NoneType'>))[source]
parse_default_filter_rules_recorded_spec_(spec)[source]

This function is a hacky fix for parsing the recorded spec of filter rules. The reason we need this function is that without it reportengine detects a conflict in the dataset key.

parse_default_filter_settings(spec: (<class 'str'>, <class 'NoneType'>))[source]
parse_experiment(experiment: dict)[source]

A set of datasets where correlated systematics are taken into account. It is a mapping where the keys are the experiment name ‘experiment’ and a list of datasets.

parse_experiment_input(ei: dict)[source]

The mapping that corresponds to the experiment specification in the fit config files. Currently, this needs to be combined with experiment_from_input to yield a useful result.

parse_experiment_inputs(param: list)

A list of experiment_input objects.

parse_experiments(param: list)

A list of experiment objects.

parse_fakepdf(name)[source]

PDF set used to generate the fake data in a closure test.

parse_filter_defaults(filter_defaults: (<class 'dict'>, <class 'NoneType'>))[source]

A mapping containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.

Parameters:

filter_defaults (dict, None) – A mapping containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.

Returns:

A hashable object containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.

Return type:

FilterDefaults

parse_filter_rules(filter_rules: (<class 'list'>, <class 'NoneType'>))[source]

A tuple of FilterRule objects. Rules are immutable after parsing. See https://docs.nnpdf.science/vp/filters.html for details on the syntax

parse_fit(item)[source]

A fit in the results folder, containing at least a valid filter result. Either just an id (str), or a mapping with ‘id’ and ‘label’.
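Several parsers in this class accept either a bare id or a mapping with ‘id’ and ‘label’. A minimal sketch of how such an input could be normalised is given below. The helper normalise_labelled_id is hypothetical, and the choice to fall back to the id when no label is given is an assumption rather than documented behaviour.

```python
from collections.abc import Mapping


def normalise_labelled_id(item):
    """Return (id, label) from a bare id or a mapping with 'id' and 'label'.

    Hypothetical helper for illustration; not a validphys function.
    """
    if isinstance(item, Mapping):
        if "id" not in item:
            raise ValueError("mapping must contain an 'id' key")
        identifier = item["id"]
        # Assumption: the label falls back to the id when it is not given.
        label = item.get("label", str(identifier))
    else:
        identifier = item
        label = str(item)
    return identifier, label


print(normalise_labelled_id("181023-001-sc"))
print(normalise_labelled_id({"id": "181023-001-sc", "label": "Baseline fit"}))
```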

parse_fitdeclaration(label: str)[source]

Used to guess some information from the fit name, without having to download it. This is meant to be used with other providers like e.g.:

{@with fits_as_from_fitdeclarations::fits_name_from_fitdeclarations@} {@ …do stuff… @} {@endwith@}

parse_fitdeclarations(param: list)

A list of fitdeclaration objects.

parse_fits(param: list)

A list of fit objects.

parse_groupby(grouping: str)[source]

Parses the groupby key and checks that it is an allowed grouping.

parse_hyperscan(hyperscan)[source]

A hyperscan in the hyperscan_results folder, containing at least one tries.json file

parse_hyperscan_config(hyperscan_config, hyperopt=None)[source]

Configuration of the hyperscan

parse_hyperscans(param: list)

A list of hyperscan objects.

parse_integdataset(integset: dict, *, theoryid, rules)[source]

An observable corresponding to a PDF in the evolution basis, used as an integrability constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.

parse_integdatasets(param: list, *, theoryid, rules)

A list of integdataset objects.

parse_lumi_channel(ch: str)[source]
parse_lumi_channels(param: list)

A list of lumi_channel objects.

parse_luxset(name)[source]

PDF set used to generate the photon with fiatlux.

parse_metadata_group(group: str)[source]

User-specified key to group data by. The key must exist in the PLOTTING file, for example experiment.

parse_norm_threshold(val: (<class 'numbers.Number'>, <class 'NoneType'>))[source]

The threshold to use for covariance matrix normalisation, sets the maximum l2 norm of the inverse covariance matrix, by clipping smallest eigenvalues

If norm_threshold is set to None, then no covmat regularization is performed

parse_pdf(item, unpolarized_bc=None)[source]

A PDF set installed in LHAPDF. If an unpolarized boundary condition is defined, it will be registered as part of the PDF.

Either just an id (str), or a mapping with ‘id’ and ‘label’.

parse_pdfs(param: list, unpolarized_bc=None)

A list of pdf objects.

parse_point_prescriptions(point_prescriptions)[source]
parse_posdataset(posset: dict, *, theoryid, rules)[source]

An observable used as a positivity constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.

parse_posdatasets(param: list, *, theoryid, rules)

A list of posdataset objects.

parse_reweighting_experiments(experiments, *, theoryid, use_cuts, fit=None)[source]

A list of experiments to be used for reweighting.

parse_speclabel(label: (<class 'str'>, <class 'NoneType'>))[source]

A label for a dataspec. To be used in some plots

parse_t0pdfset(name, unpolarized_bc=None)[source]

PDF set used to generate the t0 covmat.

parse_t0theoryid(theoryID: (<class 'str'>, <class 'int'>))[source]

A number corresponding to the database theory ID where the corresponding theory folder is installed in the data directory.

The t0theoryid is specifically used for SM parameter determinations (e.g. alphas) using the correlated replicas method of arXiv: 1802.03398. To do an alphas determination we perform multiple fits, each with a different value of alphas in the DGLAP kernel and hard scattering cross section. Then we compute the chi2 for each fit to determine which alphas best describes the data. However, to make a fair comparison we need to ensure that the definition of the chi2 (and thus the t0 covariance matrix) is exactly the same for each fit. This requires fixing not only the t0pdfset between the different fits, but also the t0theoryid.

parse_theoryid(item)[source]

A number corresponding to the database theory ID where the corresponding theory folder is installed in the data directory. Either just an id (str or int), or a mapping with ‘id’ and ‘label’.

parse_theoryids(param: list)

A list of theoryid objects.

parse_unpolarized_bc(item)[source]

Unpolarised PDF used as a Boundary Condition to impose positivity of pPDFs. Either just an id , or a mapping with ‘id’ and ‘label’.

parse_unpolarized_bcs(param: list)

A list of unpolarized_bc objects.

parse_use_cuts(use_cuts: (<class 'bool'>, <class 'str'>))[source]

Whether to filter the points based on the cuts applied in the fit, or the whole data in the dataset. The possible options are:

  • internal: Calculate the cuts based on the existing rules. This is the default.

  • fromfit: Read the cuts stored in the fit.

  • nocuts: Use the whole dataset.
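The three string options listed above can be mapped onto an enumeration, as in the sketch below. ToyCutsPolicy is a reduced, hypothetical stand-in for validphys.core.CutsPolicy (which has further members), and parse_cuts_policy is not the real parser: in particular it ignores the bool inputs that parse_use_cuts also accepts.

```python
from enum import Enum


class ToyCutsPolicy(Enum):
    """Reduced stand-in for validphys.core.CutsPolicy (illustration only)."""

    INTERNAL = "internal"
    FROMFIT = "fromfit"
    NOCUTS = "nocuts"


def parse_cuts_policy(value="internal"):
    """Convert a use_cuts string into a ToyCutsPolicy member."""
    try:
        return ToyCutsPolicy(value)
    except ValueError:
        valid = ", ".join(policy.value for policy in ToyCutsPolicy)
        raise ValueError(
            f"Invalid use_cuts value {value!r}; expected one of: {valid}"
        ) from None


print(parse_cuts_policy())           # the default policy
print(parse_cuts_policy("fromfit"))
```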

parse_use_fitcommondata(do_use: bool)[source]

Use the commondata files in the fit instead of those in the data directory.

parse_use_t0(do_use_t0: bool)[source]

Whether to use the t0 PDF set to generate covariance matrices.

produce_all_commondata()[source]

Produces all commondata using the loader function.

produce_all_lumi_channels()[source]
produce_basisfromfit(fit)[source]

Set the basis from fit config. In the fit config file the basis is set using the key fitbasis, but it is exposed to validphys as basis.

The name of this production rule is intentionally set to not conflict with the existing fitbasis runcard key.

produce_combined_shift_and_theory_dataspecs(dataspecs)[source]
produce_commondata(*, dataset_input, use_fitcommondata=False, fit=None)[source]

Produce a CommondataSpec from a dataset input

produce_covariance_matrix(use_pdferr: bool = False)[source]

Modifies which action is used as covariance_matrix depending on the flag use_pdferr

produce_covmat_t0_considered(use_t0: bool = False)[source]

Modifies which action is used as covariance_matrix depending on the flag use_t0

produce_cuts(*, commondata, use_cuts)[source]

Obtain cuts for a given dataset input, based on the appropriate policy.

produce_data(data_input, *, group_name='data')[source]

A set of datasets where correlated systematics are taken into account

produce_data_input()[source]

Produce the data_input which is a flat list of dataset_input s. This production rule handles the backwards compatibility with old datasets which specify experiments in the runcard.

produce_dataset(*, dataset_input, theoryid, cuts, use_fitcommondata=False, fit=None, check_plotting: bool = False)[source]

Dataset specification from the theory and CommonData. Use the cuts from the fit, if provided. If check_plotting is set to True, attempt to load and check the PLOTTING files (note this may cause a noticeable slowdown in general).

produce_dataset_inputs_covariance_matrix(use_pdferr: bool = False)[source]

Modifies which action is used as experiment_covariance_matrix depending on the flag use_pdferr

produce_dataset_inputs_covmat_t0_considered(use_t0: bool = False)[source]

Modifies which action is used as experiment_covariance_matrix depending on the flag use_t0

produce_dataset_inputs_fitting_covmat(use_thcovmat_in_fitting=False)[source]

Produces the correct covmat to be used in fitting_data_dict according to some options: whether to include the theory covmat, whether to separate the multiplicative errors and whether to compute the experimental covmat using the t0 prescription.

produce_dataset_inputs_sampling_covmat(sep_mult=False, use_thcovmat_in_sampling=False)[source]

Produces the correct covmat to be used in make_replica according to some options: whether to include the theory covmat and whether to separate the multiplicative errors.

produce_dataspecs_with_matched_cuts(dataspecs)[source]

Take a list of namespaces (dataspecs), resolve dataset within each of them, and return another list of dataspecs where the datasets all have the same cuts, corresponding to the intersection of the selected points. All the datasets must have the same name (i.e. correspond with the same experimental measurement), but can otherwise differ, for example in the theory used for the experimental predictions.

This rule can be combined with matched_datasets_from_dataspecs.
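The idea of matched cuts, keeping only the points selected in every dataspec, can be sketched as a set intersection. The helper intersect_cuts and the index lists are made up for illustration and do not reproduce the validphys implementation.

```python
def intersect_cuts(cuts_per_dataspec):
    """Return the sorted indices of data points that survive the cuts in every dataspec.

    Hypothetical helper for illustration; not a validphys function.
    """
    if not cuts_per_dataspec:
        return []
    common = set(cuts_per_dataspec[0])
    for cuts in cuts_per_dataspec[1:]:
        common &= set(cuts)
    return sorted(common)


# Selected point indices for the same dataset in three dataspecs (toy values)
matched = intersect_cuts([[0, 1, 2, 5], [1, 2, 3, 5], [2, 5, 7]])
print(matched)  # [2, 5]
```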

produce_defaults(q2min=None, w2min=None, maxTau=None, default_filter_settings=None, filter_defaults=None, default_filter_settings_recorded_spec_=None)[source]

Produce default values for filters taking into account the values of q2min, w2min and maxTau defined at namespace level and those inside a filter_defaults mapping.

Within this function the hashable type FilterDefaults is turned into a dictionary so as to allow for overwriting of the values of q2min, w2min and maxTau. The dictionary is then turned back into a FilterDefaults object.
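The round trip described above (hashable object to dictionary, overwrite, back to a hashable object) might look like the following sketch. ToyFilterDefaults is a hypothetical stand-in for FilterDefaults, the numerical values are purely illustrative, and letting namespace-level values override the mapping is an assumption: the text does not say how conflicts are resolved.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyFilterDefaults:
    """Hashable container of kinematic limits (stand-in for FilterDefaults)."""

    q2min: Optional[float] = None
    w2min: Optional[float] = None
    maxTau: Optional[float] = None


def merge_defaults(filter_defaults, q2min=None, w2min=None, maxTau=None):
    """Overwrite selected limits and return a new hashable object."""
    values = asdict(filter_defaults)  # hashable object -> mutable dict
    overrides = {"q2min": q2min, "w2min": w2min, "maxTau": maxTau}
    for key, value in overrides.items():
        if value is not None:  # assumption: the namespace-level value wins
            values[key] = value
    return ToyFilterDefaults(**values)  # dict -> hashable object again


base = ToyFilterDefaults(q2min=3.49, w2min=12.5)  # illustrative numbers
merged = merge_defaults(base, q2min=5.0)
print(merged)
```

The original object is left untouched, so it can still be used as a dictionary key or cache entry.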

produce_experiment_from_input(experiment_input, theoryid, use_cuts, fit=None)[source]

Return a mapping containing a single experiment from an experiment input. NOTE: This might be deprecated in the future.

produce_filter_data(fakedata: bool = False, theorycovmatconfig=None)[source]

Set the action used to filter the data to filter either real or closure data. If the closure data filter is being used and the theory covariance matrix is not being closure tested, then filter data by experiment for efficiency.

produce_fit_id(fit) str[source]

Return a string containing the ID of the fit

produce_fitcontext(fitinputcontext, fitpdf)[source]

Set PDF, theory ID and data input from the fit config

produce_fitcontextwithcuts(fit, fitinputcontext)[source]

Like fitinputcontext but setting the cuts policy.

produce_fitenvironment(fit, fitinputcontext)[source]

Like fitcontext, but additionally forcing various other parameters, such as the cuts policy and Monte Carlo seeding to be the same as the fit.

Notes

produce_fitinputcontext(fit)[source]

Like fitcontext but without setting the PDF

produce_fitpdf(fit)[source]

Like fitcontext only setting the PDF

produce_fitpdfandbasis(fitpdf, basisfromfit)[source]

Set the PDF and basis from the fit config.

produce_fitq0fromfit(fitinputcontext)[source]

Given a fit, return the fitting scale according to the theory

produce_fitreplicas(fit)[source]

Production rule mapping the replica key to each Monte Carlo fit replica.

produce_fitthcovmat(use_thcovmat_if_present: bool = False, fit: (<class 'str'>, <class 'NoneType'>) = None)[source]

If a fit is specified and use_thcovmat_if_present is True then returns the corresponding covariance matrix for the given fit if it exists. If the fit doesn’t have a theory covariance matrix then returns False.

produce_fitunderlyinglaw(fit)[source]

Reads closuretest: fakepdf from fit config file and passes as pdf

produce_group_dataset_inputs_by_experiment(data_input)[source]
produce_group_dataset_inputs_by_metadata(data_input, processed_metadata_group)[source]

Take the data and the processed_metadata_group key and attempt to group the data. Returns a list where each element specifies the data_input for a single group and the group_name.

produce_group_dataset_inputs_by_process(data_input)[source]
produce_integdatasets(integrability)[source]
produce_loaded_theory_covmat(output_path, data_input, user_covmat_path=None, point_prescriptions=None, use_thcovmat_in_sampling=False, use_thcovmat_in_fitting=False)[source]

Loads the theory covmat from the correct file according to how it was generated by vp-setupfit.

produce_loaded_user_covmat_path(user_covmat_path: str = '')[source]

Path to the user covmat provided by user_covmat_path in the runcard. If no path is provided, returns None. For use in theorycovariance.construction.user_covmat.

produce_matched_datasets_from_dataspecs(dataspecs)[source]

Take an arbitrary list of mappings called dataspecs and return a new list of mappings called dataspecs constructed as follows.

From each of the original dataspecs, resolve the key process, and all the experiments and datasets therein.

Compute the intersection of the dataset names, and for each element in the intersection construct a mapping with the following keys:

  • process : A string with the common process name.

  • experiment_name : A string with the common experiment name.

  • dataset_name : A string with the common dataset name.

  • dataspecs : A list of mappings matching the original “dataspecs”. Each mapping contains:

    • dataset: A dataset with the name dataset_name and the properties (cuts, theory, etc) corresponding to the original dataspec.

    • dataset_input: The input line used to build dataset.

    • All the other keys in the original dataspec.

produce_matched_positivity_from_dataspecs(dataspecs)[source]

Like produce_matched_datasets_from_dataspecs but for positivity datasets.

produce_multiclosure_underlyinglaw(fits)[source]

Produce the underlying law for a set of fits. This allows a single t0-like covariance matrix to be loaded for all fits, for use with statistical estimators on multiple closure fits. If the fits don’t all have the same underlying law, an error is raised and the offending fit is identified.

produce_nnfit_theory_covmat(point_prescriptions: list = None, user_covmat_path: str = None)[source]

Return the theory covariance matrix used in the fit.

This function is only used in vp-setupfit to store the necessary covmats as .csv files in the tables directory.

produce_no_covmat_reg()[source]

Explicitly set norm_threshold to None so that no covariance matrix regularization is performed.

produce_pdf_id(pdf) str[source]

Return a string containing the PDF’s LHAPDF ID

produce_pdfreplicas(fitpdf)[source]

Production rule mapping the replica key to each postfit replica.

produce_posdatasets(positivity)[source]
produce_processed_data_grouping(use_thcovmat_in_fitting=False, use_thcovmat_in_sampling=False, data_grouping=None, data_grouping_recorded_spec_=None)[source]

Process the data_grouping key from the runcard, or lockfile. If data_grouping_recorded_spec_ is present then its value is taken, and the runcard is assumed to be a lockfile.

If data_grouping is None, then, if either use_thcovmat_in_fitting or use_thcovmat_in_sampling (or both) are true (which means that the fit is a thcovmat fit), group all the datasets together, otherwise fall back to the default behaviour of grouping by experiment (called standard_report).

Otherwise, the user can specify their own grouping, for example metadata_process.
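The decision logic described above can be summarised in a short sketch. The function choose_data_grouping is hypothetical, it treats the recorded spec as a plain value for simplicity, and the string "ALL_TOGETHER" is a made-up placeholder for whatever key validphys uses to group all datasets together.

```python
def choose_data_grouping(
    data_grouping=None,
    data_grouping_recorded_spec_=None,
    use_thcovmat_in_fitting=False,
    use_thcovmat_in_sampling=False,
):
    """Pick a grouping key following the rules described in the text.

    Hypothetical helper for illustration; not a validphys function.
    """
    if data_grouping_recorded_spec_ is not None:
        # Lockfile case: the recorded value is taken.
        return data_grouping_recorded_spec_
    if data_grouping is not None:
        # The user specified their own grouping, e.g. "metadata_process".
        return data_grouping
    if use_thcovmat_in_fitting or use_thcovmat_in_sampling:
        # Theory covmat fit: group all datasets together (placeholder name).
        return "ALL_TOGETHER"
    # Default behaviour: group by experiment.
    return "standard_report"


print(choose_data_grouping())
print(choose_data_grouping(use_thcovmat_in_sampling=True))
print(choose_data_grouping(data_grouping="metadata_process"))
```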

produce_processed_metadata_group(processed_data_grouping, metadata_group=None)[source]

Expose the final data grouping result. If metadata_group is specified by the user it is used; otherwise processed_data_grouping is used, which is experiment by default.

produce_replicas(nreplica: int)[source]

Produce a replicas array

produce_reweight_all_datasets(experiments)[source]
produce_rules(theoryid, use_cuts, defaults, default_filter_rules=None, filter_rules=None, default_filter_rules_recorded_spec_=None, added_filter_rules: (<class 'tuple'>, <class 'NoneType'>) = None)[source]

Produce filter rules based on the user defined input and defaults.

produce_sep_mult(separate_multiplicative=False)[source]
produce_t0dataset(*, dataset_input, t0id, cuts, use_fitcommondata=False, fit=None, check_plotting: bool = False)[source]

Same as produce_dataset, but if a t0theoryid has been defined in the runcard then those corresponding fktables will be linked.

produce_t0id(theoryid, t0theoryid=None)[source]

Return the t0id if t0theoryid is set and return theoryid otherwise.

produce_t0set(t0pdfset=None, use_t0=False)[source]

Return the t0set if use_t0 is True and None otherwise. Raises an error if t0 is requested but no t0set is given.
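The selection rules of produce_t0id and produce_t0set can be written out as the sketch below. The functions select_t0id and select_t0set are hypothetical stand-ins, and the use of ValueError is an assumption, since the text only says that an error is raised.

```python
def select_t0id(theoryid, t0theoryid=None):
    """Return t0theoryid if it is set, and theoryid otherwise."""
    return theoryid if t0theoryid is None else t0theoryid


def select_t0set(t0pdfset=None, use_t0=False):
    """Return the t0 PDF set if use_t0 is True, and None otherwise."""
    if not use_t0:
        return None
    if t0pdfset is None:
        # Assumed exception type; the documentation only states that an error is raised.
        raise ValueError("use_t0 is True but no t0pdfset was given")
    return t0pdfset


print(select_t0id(162))
print(select_t0set("NNPDF40_nnlo_as_01180", use_t0=True))
```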

produce_theory_database()[source]

Produces path to the folder of the theory runcards

produce_theoryids(t0id, point_prescription)[source]

Produces a list of theoryids given a theoryid at central scales and a point prescription. The options for the latter are defined in pointprescriptions.yaml. This hard codes the theories needed for each prescription to avoid user error.

produce_total_chi2_data(fitthcovmat)[source]

If there is no theory covmat for the fit, then calculate the total chi2 by summing the chi2 from each experiment.

produce_total_phi_data(fitthcovmat)[source]

If there is no theory covmat for the fit, then calculate the total phi using contributions from each experiment.

class validphys.config.Environment(*, this_folder=None, net=True, upload=False, dry=False, **kwargs)[source]

Bases: Environment

Container for information to be filled at run time

validphys.convolution module

This module implements tools for computing convolutions between PDFs and theory grids, which yield observables.

The high level predictions() function can be used to extract theory predictions for experimentally measured quantities:

import numpy as np
from validphys.api import API
from validphys.convolution import predictions


inp = {
    'fit': '181023-001-sc',
    'use_cuts': 'internal',
    'theoryid': 162,
    'pdf': 'NNPDF40_nnlo_lowprecision',
    'dataset_inputs': {'from_': 'fit'}
}


all_datasets = API.data(**inp).datasets

pdf = API.pdf(**inp)


all_preds = [predictions(ds, pdf) for ds in all_datasets]

Some variants such as central_predictions() and linear_predictions() are useful for more specialized tasks.

These functions work with validphys.core.DatasetSpec objects, which allows information on COMPOUND predictions and cuts to be taken into account. A lower level interface which operates with validphys.coredata.FKTableData objects is also available.

exception validphys.convolution.PredictionsRequireCutsError[source]

Bases: Exception

validphys.convolution.central_dis_predictions(loaded_fk, pdf)[source]

Implementation of central_fk_predictions() for DIS observables.

validphys.convolution.central_fk_predictions(loaded_fk, pdf)[source]

Same as fk_predictions(), but computing predictions for the central PDF member only.

validphys.convolution.central_hadron_predictions(loaded_fk, pdf)[source]

Implementation of central_fk_predictions() for hadronic observables.

validphys.convolution.central_predictions(dataset, pdf)[source]

Same as predictions() but computing the predictions for the central member of the PDF set only. For Monte Carlo PDFs, this is a faster alternative to computing the central predictions as the average of the replica predictions (although a small approximation is involved in the case of hadronic predictions).
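The origin of the small approximation for hadronic predictions can be seen in a toy model, sketched below under strong simplifying assumptions: a single PDF value per replica, a one-element FK table, a DIS prediction that is linear in the PDF and a hadronic prediction that is quadratic in it. None of the names are validphys objects. For the linear case the prediction of the average equals the average of the predictions; for the quadratic case they differ by a term proportional to the variance over replicas.

```python
import math
from statistics import fmean, pvariance

replicas = [0.9, 1.0, 1.1, 1.2]  # toy PDF values, one per replica
fk = 2.0                         # toy one-element FK table
central = fmean(replicas)        # central member taken as the replica average


def dis_prediction(f):
    """Toy DIS observable: linear in the PDF."""
    return fk * f


def hadron_prediction(f):
    """Toy hadronic observable: quadratic in the PDF."""
    return fk * f * f


# Average of replica predictions minus prediction of the central member
dis_gap = fmean(dis_prediction(f) for f in replicas) - dis_prediction(central)
had_gap = fmean(hadron_prediction(f) for f in replicas) - hadron_prediction(central)

print(math.isclose(dis_gap, 0.0, abs_tol=1e-12))
print(math.isclose(had_gap, fk * pvariance(replicas), rel_tol=1e-9))
```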

validphys.convolution.dis_predictions(loaded_fk, pdf)[source]

Implementation of fk_predictions() for DIS observables.

validphys.convolution.fk_predictions(loaded_fk, pdf)[source]

Low level function to compute predictions from a FKTable.

Parameters:
Returns:

df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points (use validphys.coredata.FKTableData.with_cuts() to filter out points). The columns correspond to the selected PDF members in the LHAPDF set.

Return type:

pandas.DataFrame

Notes

This function operates on a single FKTable, while the prediction for an experimental quantity generally involves several. Use predictions() to compute those.

Examples

>>> from validphys.loader import Loader
>>> from validphys.convolution import hadron_predictions
>>> from validphys.fkparser import load_fktable
>>> l = Loader()
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
>>> table = load_fktable(ds.fkspecs[0])
>>> hadron_predictions(table, pdf)
             1           2           3           4    ...         97          98          99          100
data                                                  ...
0     176.688118  170.172930  172.460771  173.792321  ...  179.504636  172.343792  168.372508  169.927820
1     252.682923  244.507916  247.840249  249.541798  ...  256.410844  247.805180  242.246438  244.415529
2     828.076008  813.452551  824.581569  828.213508  ...  838.707211  826.056388  810.310109  816.824167
validphys.convolution.hadron_predictions(loaded_fk, pdf)[source]

Implementation of fk_predictions() for hadronic observables.

validphys.convolution.linear_fk_predictions(loaded_fk, pdf)[source]

Same as predictions() for DIS, but compute linearized predictions for hadronic data, using linear_hadron_predictions().

validphys.convolution.linear_hadron_predictions(loaded_fk, pdf)[source]

Implementation of linear_fk_predictions() for hadronic observables. Specifically this computes:

central_value ⊗ FK ⊗ (2 * replica_values - central_value)

which is the linear expansion of the hadronic observable in the difference between each replica and the central value, replica_values - central_value
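The expression above can be checked numerically in a toy setting. The sketch below assumes a symmetric 2×2 kernel as a stand-in for the FK table and two-component vectors as stand-ins for the PDFs; all names and numbers are illustrative. Under that assumption, the difference between the exact bilinear result and the linearized one is exactly the term quadratic in replica_values - central_value.

```python
import math


def bilinear(a, kernel, b):
    """Compute sum_ij a_i * kernel_ij * b_j for small lists."""
    return sum(
        a[i] * kernel[i][j] * b[j] for i in range(len(a)) for j in range(len(b))
    )


kernel = [[1.0, 0.5], [0.5, 2.0]]  # symmetric toy stand-in for the FK table
central = [1.0, 2.0]               # toy central PDF values
replica = [1.1, 1.9]               # toy replica PDF values

delta = [r - c for r, c in zip(replica, central)]
shifted = [2 * r - c for r, c in zip(replica, central)]

exact = bilinear(replica, kernel, replica)    # full hadronic observable
linear = bilinear(central, kernel, shifted)   # linearized expression from the text
remainder = bilinear(delta, kernel, delta)    # term quadratic in delta

print(math.isclose(exact - linear, remainder, abs_tol=1e-12))
```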

validphys.convolution.linear_predictions(dataset, pdf)[source]

Same as predictions() but computing linearized predictions. These are the same as predictions for DIS, but for hadronic predictions they are truncated to the terms that are linear in the difference between each member and the central value.

This is generally a very good approximation, in that it yields differences that are much smaller than the PDF uncertainty.

validphys.convolution.predictions(dataset, pdf)[source]

Compute theory predictions for a given PDF and dataset. Information regarding the dataset, on cuts, CFactors and combinations of FKTables is taken into account to construct the predictions.

The result should be comparable to experimental predictions implemented in CommonData.

Parameters:
  • dataset (validphys.core.DatasetSpec) – The dataset containing information on the partonic cross section.

  • pdf (validphys.core.PDF) – The PDF set to use for the convolutions.

Returns:

df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points, based on the dataset cuts. The columns correspond to the selected PDF members in the LHAPDF set.

Return type:

pandas.DataFrame

Examples

Obtain descriptive statistics over PDF replicas for each of the three points in the ATLAS ttbar dataset:

>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53)
>>> from validphys.convolution import predictions
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> preds = predictions(ds, pdf)
>>> preds.T.describe()
data            0           1           2
count  100.000000  100.000000  100.000000
mean   161.271292  231.500367  767.816844
std      2.227304    2.883497    7.327617
min    156.638526  225.283254  750.850250
25%    159.652216  229.486793  762.773527
50%    161.066965  231.281248  767.619249
75%    162.620554  233.306836  772.390286
max    168.390840  240.287549  786.549380

validphys.core module

Core datastructures used in the validphys data model.

class validphys.core.CommonDataSpec(name, metadata, legacy=False, datafile=None, sysfile=None, plotfiles=None)[source]

Bases: TupleComp

Holds all the information necessary to load a commondata file and provides methods to easily access them

Parameters:
  • name (str) – name of the commondata

  • metadata (ObservableMetaData) – instance of ObservableMetaData holding all information about the dataset

  • legacy (bool) – whether this is an old or new format metadata file

The datafile, sysfile and plotfiles arguments are deprecated and only to be used with legacy=True

property legacy_names
load()[source]

Load a validphys.core.CommonDataSpec to validphys.core.CommonData

property metadata
property name
property ndata
property nsys
property plot_kinlabels
property process_type
property theory_metadata
with_modified_data(central_data_file, uncertainties_file=None)[source]

Returns a copy of this instance with a new data file in the metadata

class validphys.core.Cuts(commondata, path)[source]

Bases: TupleComp

load()[source]
class validphys.core.CutsPolicy(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

FROMFIT = 'fromfit'
FROM_CUT_INTERSECTION_NAMESPACE = 'fromintersection'
FROM_SIMILAR_PREDICTIONS_NAMESPACE = 'fromsimilarpredictions'
INTERNAL = 'internal'
NOCUTS = 'nocuts'
class validphys.core.DataGroupSpec(name, datasets, dsinputs=None)[source]

Bases: TupleComp, NSList

property as_markdown
load_commondata()[source]
load_commondata_instance()[source]

Given Experiment load list of validphys.coredata.CommonData objects with cuts already applied

property thspec
to_unweighted()[source]

Return a copy of the group with the weights for all experiments set to one. Note that the results cannot be used as a namespace.

class validphys.core.DataSetInput(*, name, cfac, frac, weight, custom_group, variant, sys=None)[source]

Bases: TupleComp

Represents whatever the user enters in the YAML to specify a dataset.

name: str

name of the dataset_inputs

cfac: tuple

cfactors to apply to the final predictions (default: ())

frac: float

fraction of the data to be used during training (default: 1.0)

weight: float

extra weight to apply to the dataset (default: 1.0)

variant: str or tuple[str]

variant or variants to apply (default: None)

sysnum: int

deprecated, systematic file to load for the dataset

class validphys.core.DataSetSpec(*, name, commondata, fkspecs, thspec, cuts, frac=1, op=None, weight=1, rules=())[source]

Bases: TupleComp

load_commondata()[source]

Strips the commondata loading from load

to_unweighted()[source]

Return a copy of the dataset with the weight set to one.

class validphys.core.ExperimentInput(*, name, datasets)[source]

Bases: TupleComp

as_dict()[source]
class validphys.core.FKTableSpec(fkpath, cfactors, metadata=None)[source]

Bases: TupleComp

Each FKTable is formed by a number of sub-fktables to be concatenated each of which having its own path. Therefore the fkpath variable is a list of paths.

Before the pineappl implementation, FKTable were already pre-concatenated. The Legacy interface therefore relies on fkpath being just a string or path instead

The metadata of the FKTable for the given dataset is stored as an attribute to this function. This is transitional, eventually it will be held by the associated CommonData in the new format.

load_cfactors()[source]

Each of the sub-fktables that form the complete FKTable can have several cfactors applied to it. This function uses parse_cfactor to make them into CFactorData

load_with_cuts(cuts)[source]

Load the fktable and apply cuts immediately. Returns a FKTableData

class validphys.core.Filter(indexes, label, **kwargs)[source]

Bases: object

as_pair()[source]
class validphys.core.FitSpec(name, path)[source]

Bases: TupleComp

as_input()[source]
label
name
path
class validphys.core.HessianStats(data, rescale_factor=1)[source]

Bases: SymmHessianStats

Compute stats in the ‘asymmetric’ hessian format: The first index (0) is the central value. The odd indexes are the results for lower eigenvectors and the even are the upper eigenvectors. A ‘rescale_factor’ is allowed in case the eigenvector confidence interval is not 68%.
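The member layout described above can be illustrated with the usual symmetrised error formula for asymmetric Hessian sets. Whether std_error() uses exactly this expression is an assumption; the function asymmetric_hessian_error and the numbers are illustrative only.

```python
import math


def asymmetric_hessian_error(values, rescale_factor=1.0):
    """Symmetrised error from values ordered as [central, lower1, upper1, lower2, upper2, ...].

    Hypothetical helper for illustration; not a validphys function.
    """
    members = values[1:]        # drop the central value at index 0
    if len(members) % 2:
        raise ValueError("expected an even number of eigenvector members")
    lower = members[0::2]       # original indexes 1, 3, 5, ...
    upper = members[1::2]       # original indexes 2, 4, 6, ...
    total = sum((up - lo) ** 2 for lo, up in zip(lower, upper))
    return math.sqrt(total) / 2 / rescale_factor


toy_values = [10.0, 9.0, 11.0, 9.5, 10.5]  # central value plus two eigenvector pairs
print(asymmetric_hessian_error(toy_values))
```

A rescale_factor different from one would be used to bring, for example, 90% confidence-level eigenvectors back to 68%.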

moment(order)[source]
std_error()[source]
class validphys.core.HyperscanSpec(name, path)[source]

Bases: FitSpec

The hyperscan spec is just a special case of FitSpec

get_all_trials(base_params=None)[source]

Read all trials from all tries files. If there are original runcard-based parameters, a reference to them can be passed to the trials so that a full hyperparameter dictionary can be defined

Each hyperopt trial object will also have a reference to all trials in its own file

label
name
path
sample_trials(n=None, base_params=None, sigma=4.0)[source]

Parse all trials in the hyperscan object and then return an array of n trials read from the tries.json files and sampled according to their reward. If n is None, no sampling is performed and all trials are returned.

Returns:

Dictionary of the form {parameters: list of trials}

property tries_files

Return a dictionary with all tries.json files mapped to their replica number

class validphys.core.IntegrabilitySetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]

Bases: LagrangeSetSpec

class validphys.core.InternalCutsWrapper(commondata, rules)[source]

Bases: TupleComp

load()[source]
class validphys.core.LagrangeSetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]

Bases: DataSetSpec

Extends DataSetSpec to work around the particularities of the positivity, integrability and other Lagrange Multiplier datasets.

to_unweighted()[source]

Return a copy of the dataset with the weight set to one.

class validphys.core.MCStats(data)[source]

Bases: Stats

Result obtained from a Monte Carlo sample

errorbar68()[source]
moment(order)[source]
sample_values(size)[source]
std_error()[source]
class validphys.core.MatchedCuts(othercuts, ndata)[source]

Bases: TupleComp

load()[source]
class validphys.core.PDF(name, boundary=None)[source]

Bases: TupleComp

Base validphys PDF providing high level access to metadata.

Statistical estimators which depend on the PDF type (MC, Hessian…) are exposed as a Stats object through the stats_class attribute. The LHAPDF metadata can be accessed directly through the info attribute.

Examples

>>> from validphys.api import API
>>> from validphys.convolution import predictions
>>> args = {"dataset_input":{"dataset": "ATLASTTBARTOT"}, "theoryid":162, "use_cuts":"internal"}
>>> ds = API.dataset(**args)
>>> pdf = API.pdf(pdf="NNPDF40_nnlo_as_01180")
>>> preds = predictions(ds, pdf)
>>> preds.shape
(3, 100)
property alphas_mz

Alpha_s(M_Z) as defined in the LHAPDF .info file

property alphas_vals

List of alpha_s(Q) at various Q for interpolation based alphas. Values as defined in the LHAPDF .info file

property error_conf_level

Error confidence level as defined in the LHAPDF .info file. If no number is given in the LHAPDF .info file, it defaults to 68%.

property error_type

Error type as defined in the LHAPDF .info file

get_members()[source]

Return the number of members selected in pdf.load().grid_values

property info

Information contained in the LHAPDF .info file

property infopath
property is_polarized

Returns True if the PDF has a boundary condition associated to it. At the moment LHAPDF provides no mechanism to know whether a PDF is polarized.

property isinstalled
property label
load()[source]
load_t0()[source]

Load the PDF as a t0 set

make_only_cv()[source]
property q_min

Minimum Q as given by the LHAPDF .info file

register_boundary(unpolarized_bc=None)[source]

Register other PDFs as boundary conditions of this PDF

property stats_class

Return the stats calculator for this error type

exception validphys.core.PDFDoesNotExist[source]

Bases: Exception

class validphys.core.PDFcv(name, boundary=None)[source]

Bases: PDF

An add-on for the PDF class that makes only the central value available

load()[source]
class validphys.core.PositivitySetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]

Bases: LagrangeSetSpec

class validphys.core.SimilarCuts(inputs, threshold)[source]

Bases: TupleComp

load()[source]
class validphys.core.Stats(data)[source]

Bases: object

Class holding statistical information about the objects used in validphys. This object can be a PDF or any function of a PDF (such as a hadronic observable).

By convention, member 0 corresponds to the central value of the PDF. Accordingly, the method central_value will return the result held for member 0. Note that this is equal to the mean of the error_members only for the PDF itself and linear functions of the PDF (such as DIS-type observables). If you want to obtain the average of the error members you can do: np.mean(stats_instance.error_members(), axis=0)

central_value()[source]
error_members()[source]
errorbar68()[source]
errorbarstd()[source]
moment(order)[source]
sample_values(size)[source]
std_error()[source]
std_interval(nsigma)[source]
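The distinction between member 0 and the mean of the error members can be checked numerically. The sketch below uses made-up numbers and free functions in place of a Stats instance; the "DIS-like" and "hadronic-like" labels only stand for a linear and a quadratic function of the PDF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical PDF-like sample: member 0 is the average of the error members.
error_members = rng.normal(loc=1.0, scale=0.2, size=(100, 3))
pdf_members = np.vstack([error_members.mean(axis=0), error_members])

linear = 2.0 * pdf_members      # DIS-like: linear in the PDF
quadratic = pdf_members ** 2    # hadronic-like: quadratic in the PDF

def central_value(data):
    return data[0]

def mean_of_error_members(data):
    return np.mean(data[1:], axis=0)

linear_gap = central_value(linear) - mean_of_error_members(linear)
quadratic_gap = central_value(quadratic) - mean_of_error_members(quadratic)
```

For the linear function the gap vanishes up to rounding, while for the quadratic one it equals minus the variance of the error members.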
class validphys.core.SymmHessianStats(data, rescale_factor=1)[source]

Bases: Stats

Compute stats in the ‘symmetric’ Hessian format: The first index (0) is the central value. The rest of the indexes are results for each eigenvector. A ‘rescale_factor’ is allowed in case the eigenvector confidence interval is not 68%.

errorbar68()[source]
moment(order)[source]
std_error()[source]
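A minimal sketch of the symmetric format follows, together with one plausible way of obtaining a rescale factor from a confidence level. Both helper names are hypothetical, and the Gaussian conversion (90% CL corresponds to about 1.645 sigma) is an assumption, not a statement about how validphys derives the factor.

```python
import numpy as np
from statistics import NormalDist

def rescale_factor_for(conf_level_percent):
    """Number of Gaussian sigmas spanned by a central interval (assumption)."""
    return NormalDist().inv_cdf(0.5 + conf_level_percent / 200.0)

def symmetric_hessian_std(data, rescale_factor=1.0):
    data = np.asarray(data, dtype=float)
    # Quadrature sum of each eigenvector's shift from the central value.
    shifts = data[1:] - data[0]
    return np.sqrt((shifts ** 2).sum(axis=0)) / rescale_factor

members = np.array([[10.0], [13.0], [14.0]])   # central value, two eigenvectors
std_68 = symmetric_hessian_std(members)        # eigenvectors already at 68% CL
factor_90 = rescale_factor_for(90)             # roughly 1.645
std_from_90 = symmetric_hessian_std(members, rescale_factor=factor_90)
```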
class validphys.core.ThCovMatSpec(path)[source]

Bases: object

load()[source]
class validphys.core.TheoryIDSpec(id: int, path: pathlib.Path, dbpath: pathlib.Path)[source]

Bases: object

dbpath: Path
get_description()[source]
id: int
is_pineappl()[source]

Check whether this theory is a pineappl-based theory. Assume yes unless a compound directory is found.

path: Path
class validphys.core.TupleComp(*args, **kwargs)[source]

Bases: object

classmethod argnames()[source]
validphys.core.cut_mask(cuts)[source]

Return an object that will act as the cuts when applied as a slice

validphys.coredata module

Data containers backed by Python managed memory (Numpy arrays and Pandas dataframes).

class validphys.coredata.CFactorData(description: str, central_value: array, uncertainty: array)[source]

Bases: object

Data contained in a CFactor

Parameters:
  • description (str) – Information on how the data was obtained.

  • central_value (array, shape(ndata)) – The value of the cfactor for each data point.

  • uncertainty (array, shape(ndata)) – The absolute uncertainty on the cfactor if available.

central_value: array
description: str
uncertainty: array
class validphys.coredata.CommonData(setname: str, ndata: int, commondataproc: str, nkin: int, nsys: int, commondata_table: DataFrame, systype_table: DataFrame, legacy: bool = False, legacy_names: list | None = None, kin_variables: list | None = None)[source]

Bases: object

Data contained in Commondata files, relevant cuts applied.

Parameters:
  • setname (str) – Name of the dataset

  • ndata (int) – Number of data points

  • commondataproc (str) – Process type, one of 21 options

  • nkin (int) – Number of kinematics specified

  • nsys (int) – Number of systematics

  • commondata_table (pd.DataFrame) – Pandas dataframe containing the commondata

  • systype_table (pd.DataFrame) – Pandas dataframe containing the systype index for each systematic alongside the uncertainty type (ADD/MULT/RAND) and name (CORR/UNCORR/THEORYCORR/SKIP)

  • systematics_table (pd.DataFrame) – Pandas dataframe containing the table of systematics

property additive_errors

Returns the systematics which are additive (systype is ADD) as absolute uncertainties (same units as data), with SKIP uncertainties removed.

property central_values
commondata_table: DataFrame
commondataproc: str
export(folder_path)[source]

Wrapper around export_data and export_uncertainties to write both uncertainties and data after filtering to a given folder

export_data(buffer)[source]

Exports the central data defined by this commondata instance to the given buffer

export_uncertainties(buffer)[source]

Exports the uncertainties defined by this commondata instance to the given buffer

get_cv()[source]
get_kintable()[source]
kin_variables: list | None = None
property kinematics
legacy: bool = False
legacy_names: list | None = None
property multiplicative_errors

Returns the systematics which are multiplicative (systype is MULT) in a percentage format, with SKIP uncertainties removed.

ndata: int
nkin: int
nsys: int
setname: str
property stat_errors
systematic_errors(central_values=None)[source]

Returns all systematic errors as absolute uncertainties, with a single column for each uncertainty. Converts multiplicative_errors to units of data and then appends onto additive_errors. By default uses the experimental central values to perform conversion, but the user can supply a 1-D array of central values, with length self.ndata, to use instead of the experimental central values to calculate the absolute contribution of the multiplicative systematics.

Parameters:

central_values (None, np.array) – 1-D array containing alternative central values to combine with multiplicative uncertainties. This array must have length equal to self.ndata. By default central_values is None, and the central values of the commondata are used.

Returns:

systematic_errors – Dataframe containing systematic errors.

Return type:

pd.DataFrame
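The conversion described for systematic_errors can be sketched with bare NumPy arrays instead of dataframes. The helper `absolute_systematics` and its argument layout are hypothetical; only the percentage conversion and the ordering (additive columns first, converted multiplicative columns appended) follow the description above.

```python
import numpy as np

def absolute_systematics(additive, multiplicative_percent, central_values):
    """Hypothetical helper: combine ADD and MULT systematics in data units.

    additive:               (ndata, n_add) absolute uncertainties
    multiplicative_percent: (ndata, n_mult) uncertainties in percent
    central_values:         (ndata,) values used for the conversion
    """
    central_values = np.asarray(central_values, dtype=float)
    converted = np.asarray(multiplicative_percent) * central_values[:, np.newaxis] / 100.0
    # One column per uncertainty: additive first, converted multiplicative after.
    return np.hstack([np.asarray(additive, dtype=float), converted])

data_cv = np.array([200.0, 50.0])
add = np.array([[1.0], [0.5]])
mult = np.array([[2.0, 10.0], [4.0, 10.0]])   # percent

with_data = absolute_systematics(add, mult, data_cv)
# Alternative central values, e.g. theory predictions, change only the MULT part.
with_theory = absolute_systematics(add, mult, np.array([100.0, 100.0]))
```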

systematics_table: DataFrame | None
systype_table: DataFrame
with_central_value(cv)[source]
with_cuts(cuts)[source]

A method to return a CommonData object where an integer mask has been applied, keeping only data points which pass cuts.

Note if the first data point passes cuts, the first entry of cuts should be 0.

Parameters

cuts: list or validphys.core.Cuts or None
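The zero-based convention for the integer mask can be illustrated on a bare array. The function `apply_cuts` below is a hypothetical stand-in, not the CommonData method, and treating cuts=None as "keep everything" is an assumption.

```python
import numpy as np

def apply_cuts(values, cuts):
    """Keep only the points whose zero-based index appears in cuts.

    Hypothetical stand-in for CommonData.with_cuts acting on a bare array;
    cuts=None is assumed to mean that no cuts are applied.
    """
    values = np.asarray(values)
    if cuts is None:
        return values
    return values[np.asarray(cuts, dtype=int)]

central_values = np.array([1.5, 2.5, 3.5, 4.5])
kept = apply_cuts(central_values, [0, 2])    # first and third points pass
```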

class validphys.coredata.FKTableData(hadronic: bool, Q0: float, ndata: int, xgrid: ~numpy.ndarray, sigma: ~pandas.core.frame.DataFrame, convolution_types: tuple[str] = None, metadata: dict = <factory>, protected: bool = False)[source]

Bases: object

Data contained in an FKTable

Parameters:
  • hadronic (bool) – Whether a hadronic (two PDFs) or a DIS (one PDF) convolution is needed.

  • Q0 (float) – The scale at which the PDFs should be evaluated (in GeV).

  • ndata (int) – The number of data points in the grid.

  • xgrid (array, shape (nx)) – The points in x at which the PDFs should be evaluated.

  • sigma (pd.DataFrame) –

    For hadronic data, the columns are the indexes in the NfxNf list of possible flavour combinations of two PDFs. The MultiIndex contains three keys: the data index, an index into xgrid for the first PDF and an index into xgrid for the second PDF, indicating the points in x where the PDFs should be evaluated.

    For DIS data, the columns are indexes in the Nf list of flavours. The MultiIndex contains two keys, the data index and an index into xgrid indicating the points in x where the PDF should be evaluated.

  • convolution_types (tuple[str]) – The type of convolution that the FkTable is expecting for each of the functions to be convolved with (usually the two types of PDF from the two incoming hadrons).

  • metadata (dict) – Other information contained in the FKTable.

  • protected (bool) – When a fktable is protected cuts will not be applied. The most common use-case is when a total cross section is used as a normalization table for a differential cross section, in legacy code (<= NNPDF4.0) both fktables would be cut using the differential index.

Q0: float
convolution_types: tuple[str] = None
determine_pdfs(pdf)[source]

Determine the PDF (or PDFs) that should be used to be convoluted with this fktable. Uses the convolution_types key to decide the PDFs. If convolution_types is not defined, it returns the pdf object.

get_np_fktable()[source]

Returns the fktable as a dense numpy array that can be directly manipulated with numpy

The return shape is:

  • (ndata, nx, nbasis) for DIS

  • (ndata, nx, nx, nbasis) for hadronic

where nx is the length of the xgrid and nbasis the number of flavour contributions that contribute
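To show how a dense table of the DIS shape could be used, the sketch below contracts a random stand-in array with made-up PDF values. The `(nbasis, nx)` layout of `pdf_values` is an assumption for illustration, and no validphys object is involved.

```python
import numpy as np

rng = np.random.default_rng(1)
ndata, nx, nbasis = 2, 5, 3

# Stand-ins: a dense DIS fktable and PDF values for each contributing flavour.
fktable = rng.random((ndata, nx, nbasis))
pdf_values = rng.random((nbasis, nx))    # hypothetical layout: f_b(x_k)

# Sum over the x grid and the flavour basis for every data point.
predictions = np.einsum("nxb,bx->n", fktable, pdf_values)

# Same contraction written as explicit loops, for comparison.
explicit = np.zeros(ndata)
for n in range(ndata):
    for k in range(nx):
        for b in range(nbasis):
            explicit[n] += fktable[n, k, b] * pdf_values[b, k]
```

The hadronic case would contract two x axes against a product of two PDFs in the same way.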

hadronic: bool
property luminosity_mapping

Return the flavour combinations that contribute to the fktable in the form of a single array

The return shape is:

  • (nbasis,) for DIS

  • (nbasis*2,) for hadronic
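The hadronic shape can be read as one pair of flavour indices per contribution, flattened into a single array. The sketch below assumes a basis of 14 flavours and a row-major layout of the NfxNf combinations; both are assumptions made for illustration, and `pairs_from_columns` is a hypothetical helper.

```python
import numpy as np

N_FLAVOURS = 14    # assumed size of the flavour basis

def pairs_from_columns(columns, nf=N_FLAVOURS):
    """Decode flat flavour-pair indices into a flattened array of (i, j) pairs."""
    first, second = np.divmod(np.asarray(columns), nf)
    # Interleave as [i0, j0, i1, j1, ...], giving shape (nbasis * 2,).
    return np.stack([first, second], axis=1).reshape(-1)

columns = [15, 16, 30]              # hypothetical hadronic sigma columns
mapping = pairs_from_columns(columns)
pairs = mapping.reshape(-1, 2)      # back to one (i, j) row per contribution
```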

metadata: dict
ndata: int
protected: bool = False
sigma: DataFrame
with_cfactor(cfactor)[source]

Returns a copy of the FKTableData object with cfactors applied to the fktable

with_cuts(cuts)[source]

Return a copy of the FKTable with the cuts applied. The data index of the sigma operator (the outermost level), contains the data point that have been kept. The ndata property is updated to reflect the new number of datapoints. If cuts is None, return the object unmodified.

Parameters:

cuts (array_like or validphys.core.Cuts or None.) – The cuts to be applied.

Returns:

res – A copy of the FKtable with the cuts applied.

Return type:

FKTableData

Notes

The original number of points can be accessed with table.metadata['GridInfo'].ndata.

Examples

>>> from validphys.fkparser import load_fktable
>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
>>> table = load_fktable(ds.fkspecs[0])
>>> newtable = table.with_cuts([0,1])
>>> assert set(newtable.sigma.index.get_level_values(0)) == {0,1}
>>> assert newtable.ndata == 2
>>> assert newtable.metadata['GridInfo'].ndata == 3
xgrid: ndarray

validphys.correlations module

Utilities for computing correlations in batch.

@author: Zahari Kassabov

validphys.correlations.obs_obs_correlations(pdf, corrpair_results)[source]

Return the theoretical correlation matrix between a pair of observables.

validphys.correlations.obs_pdf_correlations(pdf, results, xplotting_grid)[source]

Return the correlations between each point in a dataset and the PDF values on a grid of (x,f) points in a format similar to xplotting_grid.

validphys.covmats module

Module for handling logic and manipulation of covariance and correlation matrices on different levels of abstraction

validphys.covmats.covmat_from_systematics(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, norm_threshold=None, _central_values=None)[source]

Take the statistical uncertainty and systematics table from a validphys.coredata.CommonData object and construct the covariance matrix accounting for correlations between systematics.

If the systematic has the name SKIP then it is ignored in the construction of the covariance matrix.

ADDitive and MULTiplicative systypes are handled differently, because all uncertainties are converted to the same units as the data:

  • Additive (ADD) systematics are left unchanged

  • Multiplicative (MULT) systematics need to be converted from a percentage by multiplying by the central value and dividing by 100.

Finally, the systematics are split into the five possible archetypes of systematic uncertainties: uncorrelated (UNCORR), correlated (CORR), theory uncorrelated (THEORYUNCORR), theory correlated (THEORYCORR) and special correlated (SPECIALCORR) systematics.

Uncorrelated contributions from statistical error, uncorrelated and theory uncorrelated are added in quadrature to the diagonal of the covmat.

The contribution to the covariance matrix arising due to correlated systematics is schematically A_correlated @ A_correlated.T, where A_correlated is an N_dat by N_sys matrix. The total contribution from correlated systematics is found by adding together the result of multiplying each correlated systematic matrix by its transpose (correlated, theory_correlated and special_correlated).

For more information on the generation of the covariance matrix see the paper outlining the procedure, specifically equation 2 and surrounding text.

Parameters:
  • loaded_commondata_with_cuts (validphys.coredata.CommonData) – CommonData which stores information about systematic errors, their treatment and description.

  • dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • norm_threshold (number) – threshold used to regularize covariance matrix

  • _central_values (None, np.array) – 1-D array containing alternative central values to combine with the multiplicative errors to calculate their absolute contributions. By default this is None, and the experimental central values are used. However, this can be used to calculate, for example, the t0 covariance matrix by using the predictions from the central member of the t0 pdf.

Returns:

cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Return type:

np.array

Example

In order to use this function, simply call it from the API

>>> from validphys.api import API
>>> inp = dict(
...     dataset_input={'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10},
...     theoryid=162,
...     use_cuts="internal"
... )
>>> cov = API.covmat_from_systematics(**inp)
>>> cov.shape
(28, 28)
validphys.covmats.covmat_stability_characteristic(systematics_matrix_from_commondata)[source]

Return a number characterizing the stability of an experimental covariance matrix against uncertainties in the correlation. It is defined as the L2 norm (largest singular value) of the square root of the inverse correlation matrix. This is equivalent to the square root of the inverse of the smallest singular value of the correlation matrix:

Z = (1/λ⁰)^½

Where λ⁰ is the smallest eigenvalue of the correlation matrix.

This is the number used as threshold in calcutils.regularize_covmat(). Roughly, it indicates the precision the worst-known correlation needs to have in order not to affect meaningfully the χ² computed using the covariance matrix. For example, a stability characteristic of 4 means that correlations need to be known with uncertainties less than 0.25.
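The definition Z = (1/λ⁰)^½ can be evaluated directly from a covariance matrix. The sketch below is an assumption-laden illustration: the helper name is hypothetical and it takes a covariance matrix as input, whereas the validphys action receives a systematics matrix.

```python
import numpy as np

def stability_characteristic(covmat):
    """Z = (1 / smallest eigenvalue of the correlation matrix) ** 0.5."""
    covmat = np.asarray(covmat, dtype=float)
    sigma = np.sqrt(np.diag(covmat))
    corrmat = covmat / np.outer(sigma, sigma)
    smallest = np.linalg.eigvalsh(corrmat)[0]    # eigenvalues come sorted ascending
    return 1.0 / np.sqrt(smallest)

# Two points with uncertainties 2 and 3 and correlation 0.75.
cov = np.array([[4.0, 4.5], [4.5, 9.0]])
z = stability_characteristic(cov)
```

For this matrix the correlation eigenvalues are 0.25 and 1.75, so Z is 2.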

Examples

>>> from validphys.api import API
>>> API.covmat_stability_characteristic(dataset_input={"dataset": "NMC"},
... theoryid=162, use_cuts="internal")
2.742658604186114
validphys.covmats.dataset_inputs_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, data_input, use_weights_in_covmat=True, norm_threshold=None, _list_of_central_values=None, _only_additive=False)[source]

Given a list of validphys.coredata.CommonData objects, construct the full covariance matrix.

This is similar to covmat_from_systematics() except that special corr systematics are concatenated across all datasets before being multiplied by their transpose to give off-block-diagonal contributions. The other systematics contribute to the block diagonal in the same way as covmat_from_systematics().

Parameters:
  • dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – list of CommonData objects.

  • data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • norm_threshold (number) – threshold used to regularize covariance matrix

  • _list_of_central_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.

Returns:

cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Return type:

np.array

Example

This function can be called directly from the API:

>>> dsinps = [
...     {'dataset': 'NMC'},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD']},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10}
... ]
>>> inp = dict(dataset_inputs=dsinps, theoryid=162, use_cuts="internal")
>>> cov = API.dataset_inputs_covmat_from_systematics(**inp)
>>> cov.shape
(235, 235)

Which properly accounts for all dataset settings and cuts.
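The structure of the combined matrix (per-dataset blocks, cross-dataset terms from the special correlated systematics, and the division by sqrt(weight_i)*sqrt(weight_j)) can be sketched as follows. The function `combined_covmat` and its inputs are hypothetical, and the numbers are made up.

```python
import numpy as np

def combined_covmat(block_covmats, special_systematics, weights):
    """Hypothetical sketch: block-diagonal covmats plus cross-dataset correlations.

    block_covmats:       list of (n_i, n_i) per-dataset covariance matrices
    special_systematics: list of (n_i, n_special) arrays sharing their columns
    weights:             one weight per dataset
    """
    sizes = [block.shape[0] for block in block_covmats]
    total = sum(sizes)
    covmat = np.zeros((total, total))
    start = 0
    for block, size in zip(block_covmats, sizes):
        covmat[start:start + size, start:start + size] = block
        start += size
    # Concatenate over datasets first, so the product fills off-diagonal blocks.
    stacked = np.vstack(special_systematics)
    covmat += stacked @ stacked.T
    # Element (a, b) is divided by sqrt(weight of a's dataset * weight of b's).
    sqrt_weights = np.sqrt(np.repeat(np.asarray(weights, dtype=float), sizes))
    return covmat / np.outer(sqrt_weights, sqrt_weights)

blocks = [np.diag([1.0, 1.0]), np.diag([4.0])]
special = [np.array([[0.5], [0.5]]), np.array([[2.0]])]
cov = combined_covmat(blocks, special, weights=[1.0, 4.0])
```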

validphys.covmats.dataset_inputs_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it.

validphys.covmats.dataset_inputs_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated.

validphys.covmats.dataset_inputs_sqrt_covmat(dataset_inputs_covariance_matrix)[source]

Like sqrt_covmat but for a group of datasets

validphys.covmats.dataset_inputs_stability_table(dataset_inputs_stability, dataset_inputs)[source]

Return a table with covmat_stability_characteristic() for all dataset inputs

validphys.covmats.dataset_inputs_t0_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]

Like t0_covmat_from_systematics() except for all data

Parameters:
  • dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – The CommonData for all datasets defined in dataset_inputs.

  • data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • dataset_inputs_t0_predictions (list[np.array]) – The t0 predictions for all datasets.

Returns:

t0_covmat – t0 covariance matrix for the list of datasets.

Return type:

np.array

validphys.covmats.dataset_inputs_t0_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it.

validphys.covmats.dataset_inputs_t0_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated.

validphys.covmats.dataset_inputs_t0_total_covmat(dataset_inputs_t0_exp_covmat, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_inputs_t0_total_covmat_separate(dataset_inputs_t0_exp_covmat_separate, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_inputs_total_covmat(dataset_inputs_exp_covmat, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_inputs_total_covmat_separate(dataset_inputs_exp_covmat_separate, loaded_theory_covmat)[source]

Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to experimental covmat.

validphys.covmats.dataset_t0_predictions(t0dataset, t0set)[source]

Returns the t0 predictions for a dataset which are the predictions calculated using the central member of pdf. Note that if pdf has errortype replicas, and the dataset is a hadronic observable then the predictions of the central member are subtly different to the central value of the replica predictions.

Returns:

t0_predictions – 1-D numpy array with predictions for each of the cut datapoints.

Return type:

np.array

validphys.covmats.datasets_covmat_differences_table(each_dataset, datasets_covmat_no_reg, datasets_covmat_reg, norm_threshold)[source]

For each dataset calculate and tabulate two max differences upon regularization given a value for norm_threshold:

  • max relative difference to the diagonal of the covariance matrix (%)

  • max absolute difference to the correlation matrix of each covmat
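The two quantities in the list above could be computed along these lines. The exact definitions are assumptions (in particular, taking the change in variance relative to the unregularized diagonal), and the helper names and input matrices are made up.

```python
import numpy as np

def correlation(covmat):
    sigma = np.sqrt(np.diag(covmat))
    return covmat / np.outer(sigma, sigma)

def regularization_differences(covmat_no_reg, covmat_reg):
    """Return (max relative variance change in %, max absolute correlation change)."""
    var_before = np.diag(covmat_no_reg)
    var_after = np.diag(covmat_reg)
    max_rel_diag = 100.0 * np.max(np.abs(var_after - var_before) / var_before)
    max_abs_corr = np.max(np.abs(correlation(covmat_reg) - correlation(covmat_no_reg)))
    return max_rel_diag, max_abs_corr

before = np.array([[1.0, 0.9], [0.9, 1.0]])
after = np.array([[1.1, 0.88], [0.88, 1.1]])    # made-up regularized matrix
rel_diag, abs_corr = regularization_differences(before, after)
```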

validphys.covmats.dataspecs_datasets_covmat_differences_table(dataspecs_speclabel, dataspecs_covmat_diff_tables)[source]

For each dataspec calculate and tabulate the two covmat differences described in datasets_covmat_differences_table (max relative difference in variance and max absolute correlation difference)

validphys.covmats.fit_name_with_covmat_label(fit, fitthcovmat)[source]

If theory covariance matrix is being used to calculate statistical estimators for the fit then appends (exp + th) onto the fit name for use in legends and column headers to help the user see what covariance matrix was used to produce the plot or table they are looking at.

validphys.covmats.generate_exp_covmat(datasets_input, data, use_weights, norm_threshold, _list_of_c_values, only_add)[source]

Function to generate the experimental covmat, optionally using the t0 prescription. It is also possible to compute it only with the additive errors.

Parameters:
  • dataset_inputs (list[validphys.coredata.CommonData]) – list of CommonData objects.

  • data (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights (bool) – Whether to weight the covmat, True by default.

  • norm_threshold (number) – threshold used to regularize covariance matrix

  • _list_of_c_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.

  • only_add (bool) – specifies whether to use only the additive errors to compute the covmat

Returns:

Experimental covariance matrix

Return type:

np.array

validphys.covmats.groups_corrmat(groups_covmat)[source]

Generates the grouped experimental correlation matrix with groups_covmat as input

validphys.covmats.groups_covmat(groups_covmat_no_table)[source]

Duplicate of groups_covmat_no_table but with a table decorator.

validphys.covmats.groups_covmat_no_table(groups_data, groups_index, groups_covmat_collection)[source]

Export the covariance matrix for the groups. It exports the full (symmetric) matrix, with the first three rows and columns being:

  • group name

  • dataset name

  • index of the point within the dataset.

validphys.covmats.groups_invcovmat(groups_data, groups_index, groups_covmat_collection)[source]

Compute and export the inverse covariance matrix. Note that this inverts the matrices with the LU method which is suboptimal.

validphys.covmats.groups_normcovmat(groups_covmat, groups_data_values)[source]

Calculates the grouped experimental covariance matrix normalised to data.
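A plausible reading of "normalised to data" is that each element (i, j) is divided by the product of the corresponding data values. The sketch below assumes exactly that; `normalised_covmat` is a hypothetical helper working on plain arrays.

```python
import numpy as np

def normalised_covmat(covmat, data_values):
    """Divide element (i, j) by data_i * data_j (assumed normalisation)."""
    data_values = np.asarray(data_values, dtype=float)
    return np.asarray(covmat, dtype=float) / np.outer(data_values, data_values)

cov = np.array([[4.0, 2.0], [2.0, 25.0]])
data = np.array([10.0, 50.0])
normcov = normalised_covmat(cov, data)
# The diagonal now holds squared relative uncertainties: (2/10)^2 and (5/50)^2.
```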

validphys.covmats.groups_sqrtcovmat(groups_data, groups_index, groups_sqrt_covmat)[source]

Like groups_covmat, but dump the lower triangular part of the Cholesky decomposition as used in the fit. The upper part indices are set to zero.

validphys.covmats.pdferr_plus_covmat(results_without_covmat, pdf, covmat_t0_considered)[source]

For a given dataset, returns the sum of the covariance matrix given by covmat_t0_considered and the PDF error:

  • If the PDF error_type is ‘replicas’, a covariance matrix is estimated from the replica theory predictions

Parameters:
  • dataset (DataSetSpec) – object parsed from the dataset_input runcard key

  • pdf (PDF) – monte carlo pdf used to estimate PDF error

  • covmat_t0_considered (np.array) – experimental covariance matrix with the t0 considered

Returns:

covariance_matrix – sum of the experimental and pdf error as a numpy array

Return type:

np.array
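For the ‘replicas’ case, estimating a covariance matrix from replica predictions and adding it to the experimental one might look like the sketch below. The arrays are random stand-ins, and the use of np.cov with its default normalisation is an assumption about the estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
ndata, nreplicas = 3, 200

# Stand-ins for the replica theory predictions and the experimental covmat.
replica_predictions = rng.normal(size=(ndata, nreplicas))
exp_covmat = np.diag([0.5, 0.5, 0.5])

# np.cov treats each row as a variable, so this is an (ndata, ndata) matrix.
pdf_covmat = np.cov(replica_predictions)
total_covmat = exp_covmat + pdf_covmat
```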

Examples

use_pdferr makes this action be used for covariance_matrix

>>> from validphys.api import API
>>> import numpy as np
>>> inp = {
...     'dataset_input': {
...         'dataset': 'ATLAS_TTBAR_8TEV_LJ_DIF_YTTBAR-NORM',
...         'variant': 'legacy',
...     },
...     'theoryid': 700,
...     'pdf': 'NNPDF40_nlo_as_01180',
...     'use_cuts': 'internal',
... }
>>> a = API.covariance_matrix(**inp, use_pdferr=True)
>>> b = API.pdferr_plus_covmat(**inp)
>>> (a == b).all()
True
validphys.covmats.pdferr_plus_dataset_inputs_covmat(dataset_inputs_results_without_covmat, data, pdf, dataset_inputs_covmat_t0_considered, fitthcovmat)[source]

Like pdferr_plus_covmat except for an experiment

validphys.covmats.reorder_thcovmat_as_expcovmat(fitthcovmat, data)[source]

Reorder the thcovmat in such a way to match the order of the experimental covmat, which means the order of the runcard

validphys.covmats.sqrt_covmat(covariance_matrix)[source]

Function that computes the square root of the covariance matrix.

Parameters:

covariance_matrix (np.array) – A positive definite covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.

Returns:

sqrt_mat – The square root of the input covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts), and which is the lower triangular decomposition. The following should be True: np.allclose(sqrt_covmat @ sqrt_covmat.T, covariance_matrix).

Return type:

np.array

Notes

The square root is found by using the Cholesky decomposition. However, rather than finding the decomposition of the covariance matrix directly, the (upper triangular) decomposition is found of the corresponding correlation matrix and then the output of this is rescaled and then transposed as sqrt_matrix = (decomp * sqrt_diags).T, where decomp is the Cholesky decomposition of the correlation matrix and sqrt_diags is the square root of the diagonal entries of the covariance matrix. This method is useful in situations in which the covariance matrix is near-singular. See here for more discussion on this.

The lower triangular is useful for efficient calculation of the \(\chi^2\)
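The procedure in these notes can be sketched with NumPy alone. The helper name is hypothetical; note that np.linalg.cholesky returns the lower factor, so this version rescales the rows directly instead of transposing an upper factor as described above. The χ² step at the end is an illustration of why the lower triangular form is convenient.

```python
import numpy as np

def sqrt_covmat_sketch(covmat):
    """Lower-triangular square root via the Cholesky factor of the correlation matrix."""
    covmat = np.asarray(covmat, dtype=float)
    sqrt_diags = np.sqrt(np.diag(covmat))
    corrmat = covmat / np.outer(sqrt_diags, sqrt_diags)
    # np.linalg.cholesky returns the lower factor, so no transpose is needed here.
    return sqrt_diags[:, np.newaxis] * np.linalg.cholesky(corrmat)

cov = np.array([[4.0, 1.2], [1.2, 9.0]])
sqrt_cov = sqrt_covmat_sketch(cov)

# chi2 = d^T C^-1 d, computed by solving L v = d instead of inverting C.
diff = np.array([1.0, -2.0])
v = np.linalg.solve(sqrt_cov, diff)
chi2 = v @ v
```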

Example

>>> import numpy as np
>>> from validphys.api import API
>>> API.sqrt_covmat(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
array([[0.0326543 , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
    [0.00314523, 0.01467259, 0.        , ..., 0.        , 0.        ,
        0.        ],
    [0.0037817 , 0.00544256, 0.02874822, ..., 0.        , 0.        ,
        0.        ],
    ...,
    [0.00043404, 0.00031169, 0.00020489, ..., 0.00441073, 0.        ,
        0.        ],
    [0.00048717, 0.00033792, 0.00022971, ..., 0.00126704, 0.00435696,
        0.        ],
    [0.00067353, 0.00050372, 0.0003203 , ..., 0.00107255, 0.00065041,
        0.01002952]])
>>> sqrt_cov = API.sqrt_covmat(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
>>> cov = API.covariance_matrix(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
>>> np.allclose(np.linalg.cholesky(cov), sqrt_cov)
True
validphys.covmats.systematics_matrix_from_commondata(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, _central_values=None)[source]

Returns a systematics matrix, \(A\), for the corresponding dataset. The systematics matrix is a square root of the covmat:

\[C = A A^T\]

and is obtained by concatenating a block diagonal of the uncorrelated uncertainties with the correlated systematics.
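A small numerical sketch of this construction is given below. The helper `systematics_matrix_sketch` and its three-argument interface are hypothetical; all inputs are assumed to be in the same units as the data already.

```python
import numpy as np

def systematics_matrix_sketch(stat_errors, uncorrelated, correlated):
    """Build A with C = A @ A.T (hypothetical helper, inputs in data units)."""
    stat_errors = np.asarray(stat_errors, dtype=float)
    # Uncorrelated pieces add in quadrature on the diagonal.
    diagonal = np.sqrt(stat_errors ** 2 + (np.asarray(uncorrelated) ** 2).sum(axis=1))
    # Shape (ndata, ndata + n_correlated): diagonal block, then correlated columns.
    return np.hstack([np.diag(diagonal), np.asarray(correlated, dtype=float)])

stat = np.array([3.0, 1.0])
uncorr = np.array([[4.0], [0.0]])
corr = np.array([[1.0], [2.0]])

A = systematics_matrix_sketch(stat, uncorr, corr)
covmat = A @ A.T
```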

validphys.covmats.t0_covmat_from_systematics(loaded_commondata_with_cuts, *, dataset_input, use_weights_in_covmat=True, norm_threshold=None, dataset_t0_predictions)[source]

Like covmat_from_systematics() except that it uses the t0 predictions to calculate the absolute contributions to the covmat from multiplicative uncertainties. For more info on the t0 predictions see validphys.commondata.dataset_t0_predictions().

Parameters:
  • loaded_commondata_with_cuts (validphys.coredata.CommonData) – commondata object for which to generate the covmat.

  • dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.

  • use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.

  • dataset_t0_predictions (np.array) – 1-D array with t0 predictions.

Returns:

t0_covmat – t0 covariance matrix

Return type:

np.array
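The role of the t0 predictions and of the dataset weight can be sketched as follows. This is an illustration under stated assumptions, not the validphys implementation: it assumes multiplicative uncertainties are stored as percentages of the central value, treats all of them as correlated, and uses the hypothetical helper names t0_absolute_mult_errors and weighted_covmat.

```python
import numpy as np


def t0_absolute_mult_errors(mult_percent, t0_predictions):
    """Convert percentage multiplicative errors to absolute ones using t0 predictions.

    Assumption: mult_percent has shape (N_data, N_sys) and is given in percent.
    """
    return mult_percent * t0_predictions[:, np.newaxis] / 100.0


def weighted_covmat(covmat, weight=1.0, use_weights_in_covmat=True):
    """Divide the covmat by the dataset weight when weighting is enabled."""
    return covmat / weight if use_weights_in_covmat else covmat


mult_percent = np.array([[2.0, 1.0], [2.0, 3.0]])
t0_predictions = np.array([10.0, 20.0])

abs_errors = t0_absolute_mult_errors(mult_percent, t0_predictions)
t0_cov = abs_errors @ abs_errors.T          # correlated contribution only
t0_cov_weighted = weighted_covmat(t0_cov, weight=2.0)
```

With the default weight of 1 the division leaves the matrix unchanged, which matches the description of dataset_input above.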

validphys.covmats_utils module

covmat_utils.py

Utility functions for constructing covariance matrices from systematics. Leveraged by validphys.covmats, which contains the relevant actions/providers.

validphys.covmats_utils.construct_covmat(stat_errors: array, sys_errors: DataFrame)[source]

Basic function to construct a covariance matrix (covmat), given the statistical error and a dataframe of systematics.

Errors with name UNCORR or THEORYUNCORR are added in quadrature with the statistical error to the diagonal of the covmat.

Other systematics are treated as correlated; their covmat contribution is found by multiplying them by their transpose.

Parameters:
  • stat_errors (np.array) – a 1-D array of statistical uncertainties

  • sys_errors (pd.DataFrame) – a dataframe with shape (N_data * N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.

Notes

This function doesn’t contain any logic to ignore certain contributions to the covmat. To exclude a particular systematic or set of systematics (e.g. all uncertainties with MULT errors), filter those out of sys_errors before passing it to this function.
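The construction described above, together with the filtering mentioned in the notes, can be sketched in a few lines. This is a minimal sketch rather than the library code: construct_covmat_sketch is a hypothetical name, and the column names CORR and LUMI and all numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

UNCORRELATED_NAMES = ("UNCORR", "THEORYUNCORR")


def construct_covmat_sketch(stat_errors, sys_errors):
    """Diagonal part from stat + uncorrelated errors, plus corr @ corr.T."""
    is_uncorr = sys_errors.columns.isin(UNCORRELATED_NAMES)
    uncorr = sys_errors.loc[:, is_uncorr].to_numpy()
    corr = sys_errors.loc[:, ~is_uncorr].to_numpy()
    # Uncorrelated errors are added in quadrature with the statistical error.
    diagonal = stat_errors**2 + (uncorr**2).sum(axis=1)
    return np.diag(diagonal) + corr @ corr.T


stat = np.array([0.1, 0.2])
sys_df = pd.DataFrame(
    [[0.3, 0.05, 0.02], [0.4, 0.01, 0.03]],
    columns=["UNCORR", "CORR", "LUMI"],
)

cov_full = construct_covmat_sketch(stat, sys_df)
# Excluding a systematic is done by the caller, before the call.
cov_no_lumi = construct_covmat_sketch(stat, sys_df.drop(columns=["LUMI"]))
```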

validphys.covmats_utils.systematics_matrix(stat_errors: array, sys_errors: DataFrame)[source]

Basic function to create a systematics matrix, \(A\), such that:

\[C = A A^T\]

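where \(C\) is the covariance matrix. This is achieved by creating a diagonal block from the statistical and uncorrelated systematic uncertainties, added in quadrature and then square-rooted, and concatenating the correlated systematics to it. Schematically, \(A\) consists of an (N_data × N_data) diagonal block followed by an (N_data × N_corr) block of correlated systematics.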

Parameters:
  • stat_errors (np.array) – a 1-D array of statistical uncertainties

  • sys_errors (pd.DataFrame) – a dataframe with shape (N_data * N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.

Notes

This function doesn’t contain any logic to ignore certain contributions to the covmat. To exclude a particular systematic or set of systematics (e.g. all uncertainties with MULT errors), filter those out of sys_errors before passing it to this function.
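The block structure of \(A\) can be checked numerically. The sketch below assumes the layout described above (a diagonal block followed by the correlated columns); systematics_matrix_sketch is a hypothetical name and the inputs are synthetic.

```python
import numpy as np
import pandas as pd

UNCORRELATED_NAMES = ("UNCORR", "THEORYUNCORR")


def systematics_matrix_sketch(stat_errors, sys_errors):
    """Return A = [diag(sqrt(stat^2 + sum uncorr^2)) | correlated systematics]."""
    is_uncorr = sys_errors.columns.isin(UNCORRELATED_NAMES)
    uncorr = sys_errors.loc[:, is_uncorr].to_numpy()
    corr = sys_errors.loc[:, ~is_uncorr].to_numpy()
    diagonal = np.sqrt(stat_errors**2 + (uncorr**2).sum(axis=1))
    return np.hstack([np.diag(diagonal), corr])


stat = np.array([0.1, 0.2, 0.3])
sys_df = pd.DataFrame(
    [[0.3, 0.05, 0.02], [0.4, 0.01, 0.03], [0.1, 0.04, 0.06]],
    columns=["UNCORR", "CORR", "LUMI"],
)

A = systematics_matrix_sketch(stat, sys_df)
cov = A @ A.T  # reproduces the covariance matrix, C = A A^T
```

Here A has shape (N_data, N_data + N_corr), so it is a square root of the covmat without being square itself.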

validphys.dataplots module

Plots of relations between data PDFs and fits.

validphys.dataplots.check_normalize_to(ns, **kwargs)[source]

Transform normalize_to into an index.

validphys.dataplots.kde_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]

KDE plot for experiments chi2.

validphys.dataplots.plot_chi2_eigs(pdf, dataset, chi2_per_eig)[source]
validphys.dataplots.plot_chi2dist(dataset, abs_chi2_data, chi2_stats, pdf)[source]

Plot the distribution of chi²s of the members of the pdfset.

validphys.dataplots.plot_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]

Plot the distribution of chi²s of the members of the pdfset.

validphys.dataplots.plot_chi2dist_sv(dataset, abs_chi2_data_thcovmat, pdf)[source]

Same as plot_chi2dist, but also including the theory covmat in the calculation.

validphys.dataplots.plot_dataset_inputs_phi_dist(data, dataset_inputs_bootstrap_phi_data)[source]

Generates a bootstrap distribution of phi and then plots a histogram of the individual bootstrap samples for dataset_inputs. By default the number of bootstrap samples is set to a sensible number (500); however, this number can be changed by specifying bootstrap_samples in the runcard.

validphys.dataplots.plot_datasets_chi2(groups_data, groups_chi2)[source]

Plot the chi² of all datasets with bars.

validphys.dataplots.plot_datasets_chi2_spider(groups_data, groups_chi2)[source]

Plot the chi² of all datasets as a spider plot.

validphys.dataplots.plot_datasets_pdfs_chi2(data, each_dataset_chi2_pdfs, pdfs)[source]

Plot the chi² of all datasets with bars, and for different pdfs.

validphys.dataplots.plot_datasets_pdfs_chi2_sv(data, each_dataset_chi2_pdfs_sv, pdfs)[source]

Same as plot_datasets_pdfs_chi2 with the chi²s computed including scale variations.

validphys.dataplots.plot_dataspecs_datasets_chi2(dataspecs_datasets_chi2_table)[source]

Same as plot_fits_datasets_chi2 but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_datasets_chi2_spider(dataspecs_datasets_chi2_table)[source]

Same as plot_fits_datasets_chi2_spider but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_groups_chi2(dataspecs_groups_chi2_table, processed_metadata_group)[source]

Same as plot_fits_groups_data_chi2 but for arbitrary dataspecs

validphys.dataplots.plot_dataspecs_groups_chi2_spider(dataspecs_groups_chi2_table)[source]
validphys.dataplots.plot_dataspecs_positivity(dataspecs_speclabel, dataspecs_positivity_predictions, dataspecs_posdataset, pos_use_kin=False)[source]

Like plot_positivity(), except that it plots positivity for each element of dataspecs, allowing positivity predictions to be generated with different values of theory_id as well as pdf.

validphys.dataplots.plot_fancy(one_or_more_results, commondata, cuts, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]

Read the PLOTTING configuration for the dataset and generate the corresponding data-theory plot.

The input results are assumed to be such that the first one is the data, and the subsequent ones are the predictions for the PDFs. See one_or_more_results. The labelling of the predictions can be influenced by setting the label attribute of theories and pdfs.

normalize_to: should be either ‘data’, a pdf id or an index of the result (0 for the data, and i for the ith pdf). None means plotting absolute values.

See docs/plotting_format.md for details on the format of the PLOTTING files.

validphys.dataplots.plot_fancy_dataspecs(dataspecs_results, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]

General interface for data-theory comparison plots.

The user should define an arbitrary list of mappings called “dataspecs”. In each of these, dataset must resolve to a dataset with the same name (but could be e.g. different theories). The production rule matched_datasets_from_dataspecs may be used for this purpose.

The result will be a plot combining all the predictions from the dataspecs mapping (which could vary in theory, pdf, cuts, etc).

The user can define a “speclabel” key in each dataspec (or only in some). By default, the PDF label will be used in the legend (like in plot_fancy).

normalize_to must be either:

  • The string ‘data’ or the integer 0 to plot the ratio to data,

  • or the 1-based index of the dataspec to normalize to the corresponding prediction,

  • or None (default) to plot absolute values.

A limitation at the moment is that the data cuts and errors will be taken from the first specification.

validphys.dataplots.plot_fancy_sv_dataspecs(dataspecs_results_with_scale_variations, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None)[source]

Exactly the same as plot_fancy_dataspecs, but the theoretical results passed down are modified so that the 1-sigma error bands correspond to a combination of the PDF error and the scale variations collected over theoryids.

See: validphys.results.results_with_scale_variations()

validphys.dataplots.plot_fits_chi2_spider(fits, fits_groups_chi2, fits_groups_data, processed_metadata_group)[source]

Plots the chi²s of all groups of datasets on a spider/radar diagram.

validphys.dataplots.plot_fits_datasets_chi2(fits_datasets_chi2_table)[source]

Generate a plot equivalent to plot_datasets_chi2 using all the fitted datasets as input.

validphys.dataplots.plot_fits_datasets_chi2_spider(fits_datasets_chi2_table)[source]

Generate a plot equivalent to plot_datasets_chi2_spider using all the fitted datasets as input.

validphys.dataplots.plot_fits_datasets_chi2_spider_bygroup(fits_datasets_chi2_table)[source]

Same as plot_fits_datasets_chi2_spider but one plot for each group.

validphys.dataplots.plot_fits_groups_data_chi2(fits_groups_chi2_table, processed_metadata_group)[source]

Generate a plot equivalent to plot_groups_data_chi2 using all the fitted group of data as input.

validphys.dataplots.plot_fits_groups_data_phi(fits_groups_phi_table, processed_metadata_group)[source]

Plots a set of bars for each fit. Each bar represents the value of phi for the corresponding group of datasets, which is defined according to the keys in the PLOTTING info file.

validphys.dataplots.plot_fits_phi_spider(fits, fits_groups_data, fits_groups_data_phi, processed_metadata_group)[source]

Like plot_fits_chi2_spider but for phi.

validphys.dataplots.plot_groups_data_chi2(groups_data, groups_chi2, processed_metadata_group)[source]

Plot the chi² of all groups of datasets with bars.

validphys.dataplots.plot_groups_data_chi2_spider(groups_data, groups_chi2, processed_metadata_group, pdf)[source]

Plot the chi² of all groups of datasets as a spider plot.

validphys.dataplots.plot_groups_data_phi_spider(groups_data, groups_data_phi, processed_metadata_group, pdf)[source]

Plot the phi of all groups of datasets as a spider plot.

validphys.dataplots.plot_obscorrs(corrpair_datasets, obs_obs_correlations, pdf)[source]

NOTE: EXPERIMENTAL. Plot the correlation matrix between a pair of datasets.

validphys.dataplots.plot_orbital_momentum(pdf, Q, partial_polarized_sum_rules)[source]

In addition to plotting the correlated spin moments as in plot_polarized_momentum, it also plots the contributions from the Orbital Angular Momentum.

validphys.dataplots.plot_phi(groups_data, groups_data_phi, processed_metadata_group)[source]

Plots phi for each group of data as a bar for a single PDF input.

See phi_data for information on how phi is calculated

validphys.dataplots.plot_phi_scatter_dataspecs(dataspecs_groups, dataspecs_speclabel, dataspecs_groups_bootstrap_phi)[source]

For each of the dataspecs, a bootstrap distribution of phi is generated for all specified groups of datasets. The distribution is then represented as a scatter point which is the median of the bootstrap distribution and an errorbar which spans the 68% confidence interval. By default the number of bootstrap samples is set to a sensible value, however it can be controlled by specifying bootstrap_samples in the runcard.
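The summary statistics described above (median as the point, 68% confidence interval as the error bar) can be sketched with percentiles. This is only an illustration: the bootstrap samples are synthetic Gaussian numbers, median_and_68_interval is a hypothetical helper, and the use of the 16th and 84th percentiles for the central 68% interval is an assumption about how the interval is defined.

```python
import numpy as np


def median_and_68_interval(samples):
    """Return (median, lower, upper) with lower/upper bounding the central 68%."""
    low, median, high = np.percentile(samples, [16, 50, 84])
    return median, low, high


# Synthetic stand-in for a bootstrap distribution of phi (500 samples).
rng = np.random.default_rng(42)
bootstrap_phi = rng.normal(loc=0.8, scale=0.05, size=500)

median, low, high = median_and_68_interval(bootstrap_phi)
# Asymmetric error bar lengths, as expected by e.g. matplotlib's errorbar.
lower_err, upper_err = median - low, high - median
```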

validphys.dataplots.plot_polarized_momentum(pdf, Q, partial_polarized_sum_rules, angular_momentum=False)[source]

Plot the correlated uncertainties for the truncated integrals of the polarized gluon and singlet distributions.

validphys.dataplots.plot_positivity(pdfs, positivity_predictions_for_pdfs, posdataset, pos_use_kin=False)[source]

Plot an errorbar spanning the central 68% CI of a positivity observable as well as a point indicating the central value (according to the pdf.stats_class.central_value()).

Errorbars and points are plotted on a symlog scale as a function of the data point index (if pos_use_kin==False) or the first kinematic variable (if pos_use_kin==True).

validphys.dataplots.plot_replica_sum_rules(pdf, sum_rules, Q)[source]

Plot the value of each sum rule as a function of the replica index

validphys.dataplots.plot_smpdf(pdf, dataset, obs_pdf_correlations, mark_threshold: float = 0.9)[source]

Plot the correlations between the change in the observable and the change in the PDF in (x,fl) space.

mark_threshold is the proportion of the maximum absolute correlation that will be used to mark the corresponding area in x in the background of the plot. The maximum absolute values are used for the comparison.

Examples

>>> from validphys.api import API
>>> data_input = {
...     "dataset_input": {"dataset": "HERACOMBNCEP920"},
...     "theoryid": 200,
...     "use_cuts": "internal",
...     "pdf": "NNPDF40_nnlo_as_01180",
...     "Q": 1.6,
...     "mark_threshold": 0.2,
... }
>>> smpdf_gen = API.plot_smpdf(**data_input)
>>> fig = next(smpdf_gen)
>>> fig.show()
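The meaning of mark_threshold can be illustrated with a small sketch. It assumes that a point is marked when its absolute correlation is at least mark_threshold times the maximum absolute correlation; whether the comparison is strict or not in the actual code is not stated here, and marked_regions is a hypothetical helper operating on made-up numbers.

```python
import numpy as np


def marked_regions(correlations, mark_threshold=0.9):
    """Boolean mask of points whose |correlation| reaches the threshold fraction."""
    abs_corr = np.abs(correlations)
    return abs_corr >= mark_threshold * abs_corr.max()


# Made-up correlations at a few x values for one flavour.
x = np.array([1e-4, 1e-3, 1e-2, 1e-1, 0.5])
corr = np.array([0.1, -0.45, 0.8, -0.9, 0.3])

mask_default = marked_regions(corr)            # threshold 0.9 -> |corr| >= 0.81
mask_loose = marked_regions(corr, 0.2)         # threshold 0.2 -> |corr| >= 0.18
```

A lower mark_threshold, such as the 0.2 used in the example above, therefore marks a wider region in x.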
validphys.dataplots.plot_training_length(replica_data, fit)[source]

Generate a histogram for the distribution of training lengths in a given fit. Each bin is normalised by the total number of replicas.

validphys.dataplots.plot_training_validation(fit, replica_data, replica_filters=None)[source]

Scatter plot with the training and validation chi² for each replica in the fit. The mean is also displayed as well as a line y=x to easily identify whether training or validation chi² is larger.

validphys.dataplots.plot_trainvaliddist(fit, replica_data)[source]

KDEs for the training and validation distributions for each replica in the fit.

validphys.dataplots.plot_xq2(dataset_inputs_by_groups_xq2map, use_cuts, data_input, display_cuts: bool = True, marker_by: str = 'process type', highlight_label: str = 'highlight', highlight_datasets: (<class 'collections.abc.Sequence'>, <class 'NoneType'>) = None, aspect: str = 'landscape')[source]

Plot the (x,Q²) coverage of the data based on some LO approximations. These are governed by the relevant kintransform.

The representation of the filtered data depends on the display_cuts and use_cuts options:

  • If cuts are disabled (use_cuts is CutsPolicy.NOCUTS), all the data will be plotted (and setting display_cuts to True is an error).

  • If cuts are enabled (use_cuts is either CutsPolicy.FROMFIT or CutsPolicy.INTERNAL) and display_cuts is False, the masked points will be ignored.

  • If cuts are enabled and display_cuts is True, the filtered points will be displayed and marked.
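The three cases above amount to a small decision table, sketched below. The enum CutsPolicySketch is a hypothetical stand-in for the real CutsPolicy (its member values are assumptions), and point_treatment only returns a description of what would be plotted.

```python
from enum import Enum


class CutsPolicySketch(Enum):
    """Hypothetical mirror of the cut policies named in the text."""
    NOCUTS = "nocuts"
    INTERNAL = "internal"
    FROMFIT = "fromfit"


def point_treatment(use_cuts, display_cuts):
    """Describe how points are shown for a given combination of options."""
    if use_cuts is CutsPolicySketch.NOCUTS:
        if display_cuts:
            # Nothing is filtered, so there is nothing to display as cut.
            raise ValueError("display_cuts=True requires cuts to be enabled")
        return "plot all points"
    if display_cuts:
        return "plot all points and mark the filtered ones"
    return "plot only the points passing the cuts"


no_cuts = point_treatment(CutsPolicySketch.NOCUTS, display_cuts=False)
hidden = point_treatment(CutsPolicySketch.INTERNAL, display_cuts=False)
shown = point_treatment(CutsPolicySketch.FROMFIT, display_cuts=True)
```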

The points are grouped according to the marker_by option. The possible values are: “process type”, “experiment”, “group” or “dataset”.

Some datasets can be made to appear highlighted in the figure: Define a key called highlight_datasets containing the names of the datasets to be highlighted and a key highlight_label with a string containing the label of the highlight, which will appear in the legend.

Example

Obtain a plot with some reasonable defaults:

from validphys.api import API
inp = {'dataset_inputs': [{'dataset': 'NMCPD_dw'},
   {'dataset': 'NMC'},
   {'dataset': 'SLACP_dwsh'},
   {'dataset': 'SLACD_dw'},
   {'dataset': 'BCDMSP_dwsh'},
   {'dataset': 'BCDMSD_dw'},
   {'dataset': 'CHORUSNUPb_dw'},
   {'dataset': 'CHORUSNBPb_dw'},
   {'dataset': 'NTVNUDMNFe_dw', 'cfac': ['MAS']},
   {'dataset': 'NTVNBDMNFe_dw', 'cfac': ['MAS']},
   {'dataset': 'HERACOMBNCEM'},
   {'dataset': 'HERACOMBNCEP460'},
   {'dataset': 'HERACOMBNCEP575'},
   {'dataset': 'HERACOMBNCEP820'},
   {'dataset': 'HERACOMBNCEP920'},
   {'dataset': 'HERACOMBCCEM'},
   {'dataset': 'HERACOMBCCEP'},
   {'dataset': 'HERACOMB_SIGMARED_C'},
   {'dataset': 'HERACOMB_SIGMARED_B'},
   {'dataset': 'DYE886R_dw'},
   {'dataset': 'DYE886P', 'cfac': ['QCD']},
   {'dataset': 'DYE605_dw', 'cfac': ['QCD']},
   {'dataset': 'CDFZRAP_NEW', 'cfac': ['QCD']},
   {'dataset': 'D0ZRAP', 'cfac': ['QCD']},
   {'dataset': 'D0WMASY', 'cfac': ['QCD']},
   {'dataset': 'ATLASWZRAP36PB', 'cfac': ['QCD']},
   {'dataset': 'ATLASZHIGHMASS49FB', 'cfac': ['QCD']},
   {'dataset': 'ATLASLOMASSDY11EXT', 'cfac': ['QCD']},
   {'dataset': 'ATLASWZRAP11CC', 'cfac': ['QCD']},
   {'dataset': 'ATLASWZRAP11CF', 'cfac': ['QCD']},
   {'dataset': 'ATLASDY2D8TEV', 'cfac': ['QCDEWK']},
   {'dataset': 'ATLAS_WZ_TOT_13TEV', 'cfac': ['NRM', 'QCD']},
   {'dataset': 'ATLAS_WP_JET_8TEV_PT', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_WM_JET_8TEV_PT', 'cfac': ['QCD']},
   {'dataset': 'ATLASZPT8TEVMDIST', 'cfac': ['QCD'], 'sys': 10},
   {'dataset': 'ATLASZPT8TEVYDIST', 'cfac': ['QCD'], 'sys': 10},
   {'dataset': 'ATLASTTBARTOT', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_TTB_DIFF_8TEV_LJ_TRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_TTB_DIFF_8TEV_LJ_TTRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_TOPDIFF_DILEPT_8TEV_TTRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_1JET_8TEV_R06_DEC', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_2JET_7TEV_R06', 'cfac': ['QCD']},
   {'dataset': 'ATLASPHT15', 'cfac': ['QCD', 'EWK']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_R_7TEV', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_R_13TEV', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_7TEV_T_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_7TEV_TBAR_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_8TEV_T_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_8TEV_TBAR_RAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'CMSWEASY840PB', 'cfac': ['QCD']},
   {'dataset': 'CMSWMASY47FB', 'cfac': ['QCD']},
   {'dataset': 'CMSDY2D11', 'cfac': ['QCD']},
   {'dataset': 'CMSWMU8TEV', 'cfac': ['QCD']},
   {'dataset': 'CMSZDIFF12', 'cfac': ['QCD', 'NRM'], 'sys': 10},
   {'dataset': 'CMS_2JET_7TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_2JET_3D_8TEV', 'cfac': ['QCD']},
   {'dataset': 'CMSTTBARTOT', 'cfac': ['QCD']},
   {'dataset': 'CMSTOPDIFF8TEVTTRAPNORM', 'cfac': ['QCD']},
   {'dataset': 'CMSTTBARTOT5TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_TTBAR_2D_DIFF_MTT_TRAP_NORM', 'cfac': ['QCD']},
   {'dataset': 'CMS_TTB_DIFF_13TEV_2016_2L_TRAP', 'cfac': ['QCD']},
   {'dataset': 'CMS_TTB_DIFF_13TEV_2016_LJ_TRAP', 'cfac': ['QCD']},
   {'dataset': 'CMS_SINGLETOP_TCH_TOT_7TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_SINGLETOP_TCH_R_8TEV', 'cfac': ['QCD']},
   {'dataset': 'CMS_SINGLETOP_TCH_R_13TEV', 'cfac': ['QCD']},
   {'dataset': 'LHCBZ940PB', 'cfac': ['QCD']},
   {'dataset': 'LHCBZEE2FB', 'cfac': ['QCD']},
   {'dataset': 'LHCBWZMU7TEV', 'cfac': ['NRM', 'QCD']},
   {'dataset': 'LHCBWZMU8TEV', 'cfac': ['NRM', 'QCD']},
   {'dataset': 'LHCB_Z_13TEV_DIMUON', 'cfac': ['QCD']},
   {'dataset': 'LHCB_Z_13TEV_DIELECTRON', 'cfac': ['QCD']}],
  'use_cuts': 'internal',
  'display_cuts': False,
  'theoryid': 162,
  'highlight_label': 'Old',
  'highlight_datasets': ['NMC', 'CHORUSNUPb_dw', 'CHORUSNBPb_dw']}
API.plot_xq2(**inp)

validphys.deltachi2 module

deltachi2.py

Plots and data processing that can be used in a delta chi2 analysis

class validphys.deltachi2.PDFEpsilonPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

Subclass of PDFPlotter that plots epsilon (a measure of Gaussianity) for multiple PDFs, yielding a separate figure for each flavour.

draw(pdf, grid, flstate)[source]

Obtains the gridvalues of epsilon (measure of Gaussianity)

get_ylabel(parton_name)[source]
legend(flstate)[source]
setup_flavour(flstate)[source]
validphys.deltachi2.check_pdf_is_symmhessian(pdf, **kwargs)[source]

Check pdf has error type of symmhessian

validphys.deltachi2.check_pdfs_are_montecarlo(pdfs, **kwargs)[source]

Checks that the action is applied only to a pdf consisting of MC replicas.

validphys.deltachi2.delta_chi2_hessian(pdf, total_chi2_data)[source]

Return delta_chi2 (computed as in plot_delta_chi2_hessian) relative to each eigenvector of the Hessian set.

validphys.deltachi2.plot_delta_chi2_hessian_distribution(delta_chi2_hessian, pdf, total_chi2_data)[source]

Plot of the chi2 difference between chi2 of each eigenvector of a symmHessian set and the central value for all experiments in a fit. As a function of every eigenvector in a first plot, and as a distribution in a second plot.

validphys.deltachi2.plot_delta_chi2_hessian_eigenv(delta_chi2_hessian, pdf)[source]

Plot of the chi2 difference between chi2 of each eigenvector of a symmHessian set and the central value for all experiments in a fit. As a function of every eigenvector in a first plot, and as a distribution in a second plot.

validphys.deltachi2.plot_epsilon(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, eps=None)[source]

Plot the discrepancy (epsilon) of the 1-sigma and 68% bands at each grid value for all pdfs for a given Q. See https://arxiv.org/abs/1505.06736 eq. (11)

xscale is read from pdf plotting_grid scale, which is ‘log’ by default.

eps defines the value at which to plot a simple horizontal line.

validphys.deltachi2.plot_kullback_leibler(delta_chi2_hessian)[source]

Determines the Kullback–Leibler divergence by comparing the expectation value of Delta chi2 to the cumulative distribution function of chi-square distribution with one degree of freedom (see: https://en.wikipedia.org/wiki/Chi-square_distribution).

The Kullback-Leibler divergence provides a measure of the difference between two distribution functions, here we compare the chi-squared distribution and the cumulative distribution of the expectation value of Delta chi2.
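A binned estimate of such a divergence can be sketched as follows. This is one common way to compare a sample with the chi-square distribution with one degree of freedom, and it is not claimed to be what the action does internally: kl_divergence_to_chi2 and chi2_one_dof_cdf are hypothetical helpers, the binning is arbitrary, and the samples are synthetic.

```python
import math

import numpy as np


def chi2_one_dof_cdf(x):
    """CDF of the chi-square distribution with one degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))


def kl_divergence_to_chi2(delta_chi2, bin_edges):
    """Discrete KL(q || p): q from a histogram of the sample, p from the chi2(1) CDF."""
    counts, _ = np.histogram(delta_chi2, bins=bin_edges)
    q = counts / counts.sum()
    cdf = np.array([chi2_one_dof_cdf(edge) for edge in bin_edges])
    p = np.diff(cdf)
    p = p / p.sum()  # renormalise to the binned range
    mask = q > 0     # empty bins contribute zero
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


rng = np.random.default_rng(1)
samples = rng.standard_normal(20000) ** 2   # distributed as chi2 with 1 dof
edges = np.linspace(0.0, 8.0, 17)

kl_matching = kl_divergence_to_chi2(samples, edges)
kl_rescaled = kl_divergence_to_chi2(3.0 * samples, edges)
```

The divergence is close to zero when the sample follows the reference distribution and grows when the sample is rescaled away from it.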

validphys.deltachi2.plot_pos_neg_pdfs(pdf, pos_neg_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None)[source]

Plot the uncertainty of the original Hessian pdfs, as well as that of the positive and negative subsets.

validphys.deltachi2.pos_neg_xplotting_grids(delta_chi2_hessian, xplotting_grid)[source]

Generates xplotting_grids corresponding to positive and negative delta chi2s.

validphys.eff_exponents module

Tools for computing and plotting effective exponents.

class validphys.eff_exponents.ExponentBandPlotter(hlines, exponent, *args, **kwargs)[source]

Bases: BandPDFPlotter, PreprocessingPlotter

draw(pdf, grid, flstate)[source]

Overload BandPDFPlotter.draw() to plot bands of the effective exponent calculated from the replicas and horizontal lines for the effective exponents of the previous/next fits, if possible.

flstate is an element of the flavours for the first pdf specified in pdfs. If this flavour doesn’t exist in the current pdf’s fitbasis or the set of flavours for which the preprocessing exponents exist for the current pdf no horizontal lines are plotted.

class validphys.eff_exponents.PreprocessingPlotter(exponent, *args, **kwargs)[source]

Bases: PDFPlotter

Class inheriting from PDFPlotter, changing title and ylabel to reflect the effective exponent being plotted.

get_title(parton_name)[source]
get_ylabel(parton_name)[source]
validphys.eff_exponents.alpha_eff(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return a list of xplotting_grids containing the value of the effective exponent alpha at the specified values of x and flavour. alpha is relevant at small x, hence the logarithmic scale.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.

validphys.eff_exponents.beta_eff(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Return a list of xplotting_grids containing the value of the effective exponent beta at the specified values of x and flavour. beta is relevant at large x, hence the linear scale.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.
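For orientation, the effective exponents can be sketched on a toy function. The definitions used below, \(\alpha_{\rm eff}(x) = \ln f(x) / \ln(1/x)\) and \(\beta_{\rm eff}(x) = \ln f(x) / \ln(1-x)\), are the ones commonly quoted in the NNPDF literature and are an assumption here; the exact convention used by alpha_eff and beta_eff (for instance whether \(f\) or \(xf\) enters) should be checked against the source. The toy PDF and all helper names are hypothetical.

```python
import numpy as np


def toy_pdf(x, a=0.2, b=3.0):
    """Toy function f(x) = x^(-a) (1 - x)^b with known asymptotic exponents."""
    return x ** (-a) * (1.0 - x) ** b


def alpha_eff_sketch(f_values, x):
    """Effective small-x exponent: ln f / ln(1/x)."""
    return np.log(f_values) / np.log(1.0 / x)


def beta_eff_sketch(f_values, x):
    """Effective large-x exponent: ln f / ln(1 - x)."""
    return np.log(f_values) / np.log(1.0 - x)


# Grids mirroring the default ranges in the signatures above.
x_small = np.logspace(-6, -3, 200)
x_large = np.linspace(0.6, 0.9, 200)

alpha_values = alpha_eff_sketch(toy_pdf(x_small), x_small)
beta_values = beta_eff_sketch(toy_pdf(x_large), x_large)
```

For the toy function, alpha_values approaches a as x goes to 0 and beta_values approaches b as x goes to 1, which is the sense in which the exponents are "effective".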

validphys.eff_exponents.effective_exponents_table_internal(next_effective_exponents_table, *, fit=None, basis)[source]

Returns a table which concatenates previous_effective_exponents_table and next_effective_exponents_table if both tables contain effective exponents in the same basis.

If the previous exponents are in a different basis, or no fit was given to read the previous exponents from, then only the next exponents table is returned, for plotting purposes.

validphys.eff_exponents.fmt(a)
validphys.eff_exponents.get_alpha_lines(effective_exponents_table_internal)[source]

Given an effective_exponents_table_internal returns the rows with bounds of the alpha effective exponent for all flavours, used to plot horizontal lines on the alpha effective exponent plots.

validphys.eff_exponents.get_beta_lines(effective_exponents_table_internal)[source]

Same as get_alpha_lines but for beta

validphys.eff_exponents.iterate_preprocessing_yaml(fit, next_fit_eff_exps_table, _flmap_np_clip_arg=None)[source]

Using next_effective_exponents_table(), update the preprocessing exponents of the input fit. This is part of the usual pipeline referred to as “iterating a fit”; for more information see: How to run an iterated fit. A fully iterated runcard can be obtained from the action iterated_runcard_yaml().

This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

`yaml {@iterate_preprocessing_yaml@} `

Alternatively, using the API, the yaml dump returned by this function can be written to a file e.g

>>> from validphys.api import API
>>> yaml_output = API.iterate_preprocessing_yaml(fit=<fit name>)
>>> with open("output.yml", "w+") as f:
...     f.write(yaml_output)
Parameters:
  • fit (validphys.core.FitSpec) – Whose preprocessing range will be iterated, the output runcard will be the same as the one used to run this fit, except with new preprocessing range.

  • next_fit_eff_exps_table (pd.DataFrame) – Table produced by next_fit_eff_exps_table() containing the next preprocessing ranges.

  • _flmap_np_clip_arg (dict) – Internal argument used by vp-nextfitruncard. Dictionary containing a mapping like {<flavour>: {<largex/smallx>: {a_min: <min value>, a_max: <max value>}}}. If a flavour is present in _flmap_np_clip_arg then the preprocessing ranges will be passed through np.clip with the arguments supplied in the mapping.
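The effect of _flmap_np_clip_arg can be sketched with np.clip. The mapping layout follows the description above; the layout of the ranges dictionary, the flavour names and the helper clip_ranges are assumptions made only for this illustration.

```python
import numpy as np


def clip_ranges(ranges, clip_map):
    """Clip (min, max) preprocessing ranges for flavours present in clip_map."""
    clipped = {}
    for flavour, exponents in ranges.items():
        clipped[flavour] = {}
        for key, (low, high) in exponents.items():
            limits = clip_map.get(flavour, {}).get(key)
            if limits is None:
                # Flavour or exponent not in the mapping: leave untouched.
                clipped[flavour][key] = (low, high)
                continue
            new = np.clip([low, high], limits["a_min"], limits["a_max"])
            clipped[flavour][key] = (float(new[0]), float(new[1]))
    return clipped


# Hypothetical ranges: one entry per flavour and per exponent type.
ranges = {
    "g": {"smallx": (0.7, 1.3), "largex": (1.5, 6.0)},
    "v": {"smallx": (0.4, 0.8), "largex": (1.0, 4.0)},
}
clip_map = {"g": {"largex": {"a_min": 2.0, "a_max": 5.0}}}

clipped = clip_ranges(ranges, clip_map)
```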

validphys.eff_exponents.iterated_runcard_yaml(fit, update_runcard_description_yaml)[source]

Takes the runcard with preprocessing iterated and description updated, then:

  • Updates the t0 pdf, the fiatlux pdf, and the theory covmat pdf to be fit

  • Modifies the random seeds (to random unsigned long ints)

This should facilitate running a new fit with identical input settings as the specified fit with the t0, seeds and preprocessing iterated. For more information see: How to run an iterated fit

This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

`yaml {@iterated_runcard_yaml@} `

Alternatively, using the API, the yaml dump returned by this function can be written to a file e.g

>>> from validphys.api import API
>>> yaml_output = API.iterated_runcard_yaml(
...     fit=<fit name>,
...     _updated_description="My iterated fit"
... )
>>> with open("output.yml", "w+") as f:
...     f.write(yaml_output)
validphys.eff_exponents.next_effective_exponents_table(pdf: ~validphys.core.PDF, *, fitq0fromfit: (<class 'numbers.Real'>, <class 'NoneType'>) = None, x1_alpha: ~numbers.Real = 1e-06, x2_alpha: ~numbers.Real = 0.001, x1_beta: ~numbers.Real = 0.65, x2_beta: ~numbers.Real = 0.95, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Given a PDF, calculate the next effective exponents

By default x1_alpha = 1e-6, x2_alpha = 1e-3, x1_beta = 0.65, and x2_beta = 0.95, but different values can be specified in the runcard. The values control where the bounds of alpha and beta are evaluated:

alpha_min:

  • singlet/gluon: the 2x68% c.l. lower value evaluated at x=`x1_alpha`

  • others: min(2x68% c.l. lower value evaluated at x=`x1_alpha` and x=`x2_alpha`)

alpha_max:

  • singlet/gluon: min(2 and the 2x68% c.l. upper value evaluated at x=`x1_alpha`)

  • others: min(2 and max(2x68% c.l. upper value evaluated at x=`x1_alpha` and x=`x2_alpha`))

beta_min:

  • max(0 and min(2x68% c.l. lower value evaluated at x=`x1_beta` and x=`x2_beta`))

beta_max:

  • max(2x68% c.l. upper value evaluated at x=`x1_beta` and x=`x2_beta`)
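The rules above can be written compactly as follows. The sketch takes the 2x68% c.l. lower and upper values at the two x points as plain numbers; how those values are obtained from the PDF is outside its scope, and the function names alpha_bounds and beta_bounds are hypothetical.

```python
def alpha_bounds(lower_x1, lower_x2, upper_x1, upper_x2, is_singlet_or_gluon):
    """Bounds for alpha from the 2x68% c.l. values at x1_alpha and x2_alpha."""
    if is_singlet_or_gluon:
        # Singlet and gluon only use the values at x1_alpha.
        alpha_min = lower_x1
        alpha_max = min(2.0, upper_x1)
    else:
        alpha_min = min(lower_x1, lower_x2)
        alpha_max = min(2.0, max(upper_x1, upper_x2))
    return alpha_min, alpha_max


def beta_bounds(lower_x1, lower_x2, upper_x1, upper_x2):
    """Bounds for beta from the 2x68% c.l. values at x1_beta and x2_beta."""
    beta_min = max(0.0, min(lower_x1, lower_x2))
    beta_max = max(upper_x1, upper_x2)
    return beta_min, beta_max


gluon_alpha = alpha_bounds(0.9, 0.7, 2.4, 1.8, is_singlet_or_gluon=True)
valence_alpha = alpha_bounds(0.9, 0.7, 1.5, 1.8, is_singlet_or_gluon=False)
some_beta = beta_bounds(-0.3, 1.2, 4.0, 5.5)
```

Note that alpha_max is capped at 2 in both branches, while beta_min is floored at 0.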

validphys.eff_exponents.plot_alpha_eff(fits_pdf, alpha_eff_fits, fits_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the alpha effective or its corresponding index in the list, starting from one, or None to plot absolute values.

xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.

validphys.eff_exponents.plot_alpha_eff_internal(pdfs, alpha_eff_pdfs, pdfs_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the alpha effective or its corresponding index in the list, starting from one, or None to plot absolute values.

validphys.eff_exponents.plot_beta_eff(fits_pdf, beta_eff_fits, fits_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Same as plot_alpha_eff but for beta effective exponents

validphys.eff_exponents.plot_beta_eff_internal(pdfs, beta_eff_pdfs, pdfs_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]

Same as plot_alpha_eff_internal but for beta effective exponent

validphys.eff_exponents.previous_effective_exponents(basis: str, fit: (<class 'validphys.core.FitSpec'>, <class 'NoneType'>) = None)[source]

If provided with a fit, check that the basis is the basis which was fitted; if so, return the previous effective exponents read from the fit runcard.

validphys.eff_exponents.previous_effective_exponents_table(fit: FitSpec)[source]

Given a fit, reads the previous exponents from the fit runcard

validphys.eff_exponents.update_runcard_description_yaml(iterate_preprocessing_yaml, _updated_description=None)[source]

Take the runcard with iterated preprocessing and update the description if _updated_description is provided. As with iterate_preprocessing_yaml() the result can be used in a report but should be wrapped in a code block to be formatted correctly, for example:

`yaml {@update_runcard_description_yaml@} `

validphys.filters module

Filters for NNPDF fits

class validphys.filters.AddedFilterRule(dataset: str = None, process_type: str = None, rule: str = None, reason: str = None, local_variables: Mapping[str, str | float] = None, PTO: str = None, FNS: str = None, IC: str = None)[source]

Bases: FilterRule

Dataclass which carries an extra filter rule that is added to the default rules.

exception validphys.filters.BadPerturbativeOrder[source]

Bases: ValueError

Exception raised when the perturbative order string is not recognized.

exception validphys.filters.FatalRuleError[source]

Bases: Exception

Exception raised when a rule application failed at runtime.

class validphys.filters.FilterDefaults(q2min: float = None, w2min: float = None, maxTau: float = None)[source]

Bases: object

Dataclass carrying default values for filters (cuts) taking into account the values of q2min, w2min and maxTau.

maxTau: float = None
q2min: float = None
to_dict()[source]
w2min: float = None
class validphys.filters.FilterRule(dataset: str = None, process_type: str = None, rule: str = None, reason: str = None, local_variables: Mapping[str, str | float] = None, PTO: str = None, FNS: str = None, IC: str = None)[source]

Bases: object

Dataclass which carries the filter rule information.

FNS: str = None
IC: str = None
PTO: str = None
dataset: str = None
local_variables: Mapping[str, str | float] = None
process_type: str = None
reason: str = None
rule: str = None
to_dict()[source]
exception validphys.filters.MissingRuleAttribute[source]

Bases: RuleProcessingError, AttributeError

Exception raised when a rule is missing required attributes.

class validphys.filters.PerturbativeOrder(string)[source]

Bases: object

Class that conveniently handles perturbative order declarations for use within the Rule class filter.

Parameters:

string (str) –

A string in the format of NNLO or equivalently N2LO. This can be followed by one of ! + - or none.

The syntax allows for rules to be executed only if the perturbative order is within a given range. The following enumerates all 4 cases as an example:

  • NNLO+: only execute the following rule if the pto is 2 or greater

  • NNLO-: only execute the following rule if the pto is strictly less than 2

  • NNLO!: only execute the following rule if the pto is strictly not 2

  • NNLO: only execute the following rule if the pto is exactly 2

Any unrecognized string will raise a BadPerturbativeOrder exception.

Example

>>> from validphys.filters import PerturbativeOrder
>>> pto = PerturbativeOrder("NNLO+")
>>> pto.numeric_pto
2
>>> 1 in pto
False
>>> 2 in pto
True
>>> 3 in pto
True
parse()[source]
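The range syntax described for PerturbativeOrder can be illustrated with a small standalone sketch. This is not the validphys implementation: the class name PtoSketch and its regular expression are hypothetical, and only the four documented cases are modelled.

```python
import re


class PtoSketch:
    """Hypothetical stand-in for PerturbativeOrder (illustration only)."""

    # Accepts LO, NLO, NNLO, ... or N2LO, N3LO, ..., plus an optional modifier.
    _pattern = re.compile(r"^(?:(N*)LO|N(\d+)LO)([!+-]?)$")

    def __init__(self, string):
        match = self._pattern.match(string)
        if match is None:
            # The real class is documented to raise BadPerturbativeOrder,
            # which derives from ValueError.
            raise ValueError(f"Unrecognized perturbative order: {string!r}")
        letters, digits, self.modifier = match.groups()
        # "NNLO" counts the leading N characters; "N2LO" reads the number.
        self.numeric_pto = int(digits) if digits is not None else len(letters)

    def __contains__(self, pto):
        if self.modifier == "+":
            return pto >= self.numeric_pto
        if self.modifier == "-":
            return pto < self.numeric_pto
        if self.modifier == "!":
            return pto != self.numeric_pto
        return pto == self.numeric_pto


print(2 in PtoSketch("NNLO+"), 2 in PtoSketch("N2LO-"), 3 in PtoSketch("NNLO!"))
```

Implementing `__contains__` is what allows the `pto in rule_order` spelling used in the documented example.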
class validphys.filters.Rule(initial_data: FilterRule, *, defaults: dict, theory_parameters: dict, loader=None)[source]

Bases: object

Rule object to be used to generate cuts mask.

A rule object is created for each rule in ./cuts/filters.yaml

Old commondata relied on the order of the kinematical variables to be the same as specified in the KIN_LABEL dictionary set in this module. The new commondata specification instead defines explicitly the name of the variables in the metadata. Therefore, when using a new-format commondata, the KIN_LABEL dictionary will not be used and the variables defined in the metadata will be used instead.

Parameters:
  • initial_data (dict) –

    A dictionary containing all the information regarding the rule. This contains the name of the dataset the rule applies to and/or the process type the rule applies to. Additionally, the rule itself is defined, alongside the reason the rule is used. Finally, the user can optionally define their own custom local variables.

    By default these are defined in cuts/filters.yaml

  • defaults (dict) –

    A dictionary containing default values to be used globally in all rules.

    By default these are defined in cuts/defaults.yaml

  • theory_parameters – Dict containing pairs of (theory_parameter, value)

  • loader (validphys.loader.Loader, optional) – A loader instance used to retrieve the datasets.

numpy_functions = {'fabs': <ufunc 'fabs'>, 'log': <ufunc 'log'>, 'sqrt': <ufunc 'sqrt'>}
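How a rule string could turn into a list of passing points is sketched below. This is a simplified illustration, not the Rule class itself: the function name passing_indices, the kinematic names x, Q2 and w2, and the threshold values are assumptions made for the example, and stdlib math functions stand in for the numpy functions listed above.

```python
import math

# Stand-ins for the numpy functions exposed to rule expressions.
ALLOWED_FUNCTIONS = {"sqrt": math.sqrt, "log": math.log, "fabs": math.fabs}


def passing_indices(rule, local_variables, defaults, points):
    """Return the indices of the points for which ``rule`` evaluates to True."""
    passed = []
    for index, kinematics in enumerate(points):
        namespace = {**defaults, **kinematics}
        # Local variables are expressions evaluated in order, so later ones
        # may refer to earlier ones.
        for name, expression in local_variables.items():
            namespace[name] = eval(expression, dict(ALLOWED_FUNCTIONS), namespace)
        if eval(rule, dict(ALLOWED_FUNCTIONS), namespace):
            passed.append(index)
    return passed


points = [{"x": 0.1, "Q2": 10.0}, {"x": 0.5, "Q2": 2.0}, {"x": 0.9, "Q2": 20.0}]
mask = passing_indices(
    rule="Q2 > q2min and w2 > w2min",
    local_variables={"w2": "Q2 * (1 - x) / x"},
    defaults={"q2min": 3.49, "w2min": 12.5},
    points=points,
)
print(mask)
```

The sketch relies on eval, which is only acceptable for trusted input such as rule files shipped with the code.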
exception validphys.filters.RuleProcessingError[source]

Bases: Exception

Exception raised when we couldn’t process a rule.

validphys.filters.check_additional_errors(additional_errors)[source]

Lux additional errors pdf check

validphys.filters.check_integrability(integdatasets)[source]

Verify integrability datasets are ready for the fit.

validphys.filters.check_luxset(luxset)[source]

Lux pdf check

validphys.filters.check_nonnegative(var: str)[source]

Ensure that var is non-negative.

validphys.filters.check_positivity(posdatasets)[source]

Verify positive datasets are ready for the fit.

validphys.filters.check_t0pdfset(t0pdfset)[source]

T0 pdf check

validphys.filters.check_unpolarized_bc(unpolarized_bc)[source]

Check that unpolarized PDF bound can be loaded normally.

validphys.filters.default_filter_rules_input()[source]

Return a tuple of FilterRule objects. These are defined in filters.yaml in the validphys.cuts module.

validphys.filters.default_filter_settings_input()[source]

Return a FilterDefaults dataclass with the default hardcoded filter settings. These are defined in defaults.yaml in the validphys.cuts module.

validphys.filters.export_mask(path, mask)[source]

Dump mask to file

validphys.filters.filter(filter_data)[source]

Summarise filters applied to all datasets

validphys.filters.filter_closure_data_by_experiment(filter_path, experiments_data, fakepdf, fakenoise, filterseed, data_index, sep_mult)[source]

Applies _filter_closure_data() on each experiment in the closure test.

This function just performs a for loop over experiments. The reason we don’t use reportengine.collect is that it can permute the order in which closure data is generated, which means that the pseudodata is not reproducible.

validphys.filters.filter_real_data(filter_path, data)[source]

Filter real data, cutting any points which do not pass the filter rules.

validphys.filters.get_cuts_for_dataset(commondata, rules) list[source]

Function to generate a list containing the index of all experimental points that passed kinematic cut rules stored in ./cuts/filters.yaml

Returns:

mask – List object containing index of all passed experimental values

Return type:

list

Example

>>> from validphys.filters import (get_cuts_for_dataset, Rule,
...     default_filter_settings_input, default_filter_rules_input)
>>> from validphys.loader import Loader
>>> l = Loader()
>>> cd = l.check_commondata("NMC")
>>> theory = l.check_theoryID(53)
>>> # Rule takes the defaults as a dict, hence the to_dict() call
>>> filter_defaults = default_filter_settings_input().to_dict()
>>> params = theory.get_description()
>>> rule_list = [Rule(initial_data=i, defaults=filter_defaults, theory_parameters=params)
...     for i in default_filter_rules_input()]
>>> get_cuts_for_dataset(cd, rules=rule_list)
validphys.filters.make_dataset_dir(path)[source]

Creates directory at path location.

validphys.fitdata module

Utilities for loading data from fit folders

class validphys.fitdata.DatasetComp(common, first_only, second_only)

Bases: tuple

common

Alias for field number 0

first_only

Alias for field number 1

second_only

Alias for field number 2

class validphys.fitdata.FitInfo(nite, training, validation, chi2, is_positive, arclengths, integnumbers)

Bases: tuple

arclengths

Alias for field number 5

chi2

Alias for field number 3

integnumbers

Alias for field number 6

is_positive

Alias for field number 4

nite

Alias for field number 0

training

Alias for field number 1

validation

Alias for field number 2

validphys.fitdata.check_lhapdf_info(results_dir, fitname)[source]

Check that an LHAPDF info metadata file is present in the fit results

validphys.fitdata.check_nnfit_results_path(path)[source]

Returns True if the requested path is a valid results directory, i.e. if it is a directory and has a ‘nnfit’ subdirectory.
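A minimal sketch of such a check, using only pathlib, could look as follows. The function name looks_like_results_dir is hypothetical, and requiring ‘nnfit’ to be a directory (rather than any entry with that name) is an assumption.

```python
import tempfile
from pathlib import Path


def looks_like_results_dir(path):
    """True if ``path`` is a directory containing an 'nnfit' subdirectory."""
    path = Path(path)
    return path.is_dir() and (path / "nnfit").is_dir()


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    before = looks_like_results_dir(tmp)
    (Path(tmp) / "nnfit").mkdir()
    after = looks_like_results_dir(tmp)
print(before, after)
```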

validphys.fitdata.check_replica_files(replica_path, prefix)[source]

Verification of a replica results directory at replica_path for a fit named prefix. Returns True if the results directory is complete

validphys.fitdata.datasets_properties_table(data_input)[source]

Return dataset properties for each dataset in data_input

validphys.fitdata.fit_code_version(fit)[source]

Returns table with the code version from replica_1/{fitname}.json files. Note that the version for tensorflow distinguishes between the mkl=on and off versions.

validphys.fitdata.fit_datasets_properties_table(fitinputcontext)[source]

Returns table of dataset properties for each dataset used in a fit.

validphys.fitdata.fit_summary(fit_name_with_covmat_label, replica_data, total_chi2_data, total_phi_data)[source]

Summary table of fit properties:

  • Central chi-squared

  • Average chi-squared

  • Training and Validation error functions

  • Training lengths

  • Phi

Note: Chi-squared values from the replica_data are not used here (presumably they are fixed to being t0)

This uses a corrected form for the error on phi in comparison to the vp1 value. The error is propagated from the uncertainty on the average chi-squared only.

validphys.fitdata.fit_theory_covmat_summary(fit, fitthcovmat)[source]

Returns a table with a single column for the fit, with three rows indicating if the theory covariance matrix was used in the ‘sampling’ of the pseudodata, the ‘fitting’, and the ‘validphys statistical estimators’ in the current namespace for that fit.

validphys.fitdata.fits_replica_data_correlated(fits_replica_data, fits_replica_indexes, fits)[source]

Return a table with the same columns as replica_data indexed by the replica fit ID. For identical fits, the values across rows should be the same.

If some replica ID is not present for a given fit (e.g. discarded by postfit), the corresponding entries in the table will be null.

validphys.fitdata.fits_version_table(fits_fit_code_version)[source]

Produces a table of version information for multiple fits.

validphys.fitdata.fitted_replica_indexes(pdf)[source]

Return nnfit index of replicas 1 to N.

validphys.fitdata.load_fitinfo(replica_path, prefix)[source]

Process the data in the .json file for a single replica into a FitInfo object. If the .json file does not exist an old-format fit is assumed and old_load_fitinfo will be called instead.

validphys.fitdata.match_datasets_by_name(fits, fits_datasets)[source]

Return a tuple with common, first_only and second_only. The elements of the tuple are mappings where the keys are dataset names and the values are the two datasets contained in each fit for common, and the corresponding dataset included only in the first fit and only in the second fit.
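The shape of the returned tuple can be sketched with plain dictionaries. The names DatasetCompSketch and match_by_name are hypothetical, and the inputs here are simple name-to-object mappings rather than the fits and fits_datasets arguments of the real function.

```python
from collections import namedtuple

# Mirrors the field layout of DatasetComp; the name is hypothetical.
DatasetCompSketch = namedtuple(
    "DatasetCompSketch", ("common", "first_only", "second_only")
)


def match_by_name(first, second):
    """Split two name -> dataset mappings into shared and exclusive entries."""
    shared = first.keys() & second.keys()
    # For common datasets keep the pair (dataset in first, dataset in second).
    common = {name: (first[name], second[name]) for name in sorted(shared)}
    first_only = {name: ds for name, ds in first.items() if name not in shared}
    second_only = {name: ds for name, ds in second.items() if name not in shared}
    return DatasetCompSketch(common, first_only, second_only)


comp = match_by_name(
    {"NMC": "nmc_a", "SLAC": "slac_a"},
    {"NMC": "nmc_b", "BCDMS": "bcdms_b"},
)
print(comp.common, comp.first_only, comp.second_only)
```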

validphys.fitdata.num_fitted_replicas(fit)[source]

Function to obtain the number of nnfit replicas. That is the number of replicas before postfit was run.

validphys.fitdata.print_dataset_differences(fits, match_datasets_by_name, print_common: bool = True)[source]

Given exactly two fits, print the datasets that are included in one but not in the other. If print_common is True, also print the datasets that are common.

For the purposes of visual aid, everything is ordered by the dataset name, which, given the convention for the commondata, means that everything is ordered by:

  1. Experiment name

  2. Process

  3. Energy

validphys.fitdata.print_different_cuts(fits, test_for_same_cuts)[source]

Print a summary of the datasets that are included in both fits but have different cuts.

validphys.fitdata.print_systype_overlap(groups_commondata, group_dataset_inputs_by_metadata)[source]

Returns a set of systypes that overlap between groups. Discards the set of systypes which overlap but do not imply correlations

validphys.fitdata.replica_data(fit, replica_paths)[source]

Load the necessary data from the .json file of each of the replicas. The corresponding PDF set must be installed in the LHAPDF path.

The included information is:

(‘nite’, ‘training’, ‘validation’, ‘chi2’, ‘pos_status’, ‘arclengths’)

validphys.fitdata.replica_paths(fit)[source]

Return the paths of all the replicas

validphys.fitdata.summarise_fits(collected_fit_summaries)[source]

Produces a table of basic comparisons between fits, includes all the fields used in fit_summary

validphys.fitdata.summarise_theory_covmat_fits(fits_theory_covmat_summary)[source]

Collects the theory covmat summary for all fits and concatenates them into a single table

validphys.fitdata.t0_chi2_info_table(pdf, dataset_inputs_abs_chi2_data, t0pdfset, use_t0)[source]

Provides table with:

  • t0pdfset name

  • Central t0-chi-squared

  • Average t0-chi-squared

validphys.fitdata.test_for_same_cuts(fits, match_datasets_by_name)[source]

Given two fits, return a list of tuples (first, second) where first and second are DatasetSpecs that correspond to the same dataset but have different cuts, such that first is included in the first fit and second in the second.

validphys.fitveto module

fitveto.py

Module for the determination of passing fit replicas.

Current active vetoes:

  • Positivity - Replicas with FitInfo.is_positive == False

  • ChiSquared - Replicas with ChiSquared > nsigma_discard_chi2*StandardDev + Average

  • ArclengthX - Replicas with ArcLengthX > nsigma_discard_arclength*StandardDev + Average

  • Integrability - Replicas with IntegrabilityNumbers < integ_threshold

validphys.fitveto.determine_vetoes(fitinfos: list, nsigma_discard_chi2: float, nsigma_discard_arclength: float, integ_threshold: float)[source]

Assesses whether replica fitinfo passes standard NNPDF vetoes. Returns a dictionary of vetoes and their passing boolean masks. Included in the dictionary is a ‘Total’ veto.
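A sketch of how a ‘Total’ entry could be derived from the individual masks is given below. It assumes that ‘Total’ is the element-wise logical AND of all other vetoes, which the description does not state explicitly; the function name combine_vetoes is hypothetical.

```python
def combine_vetoes(vetoes):
    """Add a 'Total' entry that is True only where every veto passes."""
    masks = list(vetoes.values())
    # zip(*masks) groups the flags of each replica across all vetoes.
    total = [all(flags) for flags in zip(*masks)]
    return {**vetoes, "Total": total}


vetoes = combine_vetoes(
    {
        "Positivity": [True, True, False, True],
        "ChiSquared": [True, False, True, True],
    }
)
print(vetoes["Total"])
```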

validphys.fitveto.distribution_veto(dist, prior_mask, nsigma_threshold)[source]

For a given distribution (a list of floats), returns a boolean mask specifying the passing elements. The result is a new mask of the elements that satisfy:

value <= mean + nsigma_threshold*standard_deviation

Only points passing the prior_mask are considered in the average or standard deviation.
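The condition can be sketched with the standard library alone. The function name distribution_veto_sketch is hypothetical, and the use of the population standard deviation is an assumption, since the estimator is not specified in the description.

```python
import statistics


def distribution_veto_sketch(dist, prior_mask, nsigma_threshold):
    """Mask of the values satisfying value <= mean + nsigma * stdev."""
    # Only values passing the prior mask enter the mean and the deviation.
    passing = [value for value, keep in zip(dist, prior_mask) if keep]
    average = statistics.fmean(passing)
    stdev = statistics.pstdev(passing)  # population estimator (assumption)
    # The returned mask covers every element, including those excluded above.
    return [value <= average + nsigma_threshold * stdev for value in dist]


chi2 = [1.0, 2.0, 3.0, 100.0, 4.0]
loose = distribution_veto_sketch(chi2, [True] * 5, nsigma_threshold=4.0)
strict = distribution_veto_sketch(
    chi2, [True, True, True, False, True], nsigma_threshold=4.0
)
print(loose, strict)
```

In this example the outlier inflates the mean and the standard deviation enough to pass when it is included in the statistics, and it is rejected once the prior mask excludes it from them.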

validphys.fitveto.integrability_veto(dist, integ_threshold)[source]

For a given distribution (a list of floats), returns a boolean mask specifying the passing elements. The result is a new mask of the elements that satisfy: value <= integ_threshold

validphys.fitveto.save_vetoes_info(veto_dict: dict, chi2_threshold, arclength_threshold, integ_threshold, filepath)[source]

Saves to file the chi2 and arclength thresholds used by postfit as well as veto dictionaries which contain information on which replicas pass each veto.

validphys.fkparser module

This module implements parsers for FKtable and CFactor files into useful datastructures, contained in the validphys.coredata module, which can be easily pickled and interfaced with common Python libraries.

Most users will be interested in using the high level interface load_fktable(). Given a validphys.core.FKTableSpec object, it returns an instance of validphys.coredata.FKTableData, an object with the required information to compute a convolution, with the CFactors applied.

from validphys.fkparser import load_fktable
from validphys.loader import Loader
l = Loader()
fk = l.check_fktable(setname="ATLASTTBARTOT", theoryID=53, cfac=('QCD',))
res = load_fktable(fk)
exception validphys.fkparser.BadCFactorError[source]

Bases: Exception

Exception raised when a CFactor cannot be parsed correctly.

exception validphys.fkparser.BadFKTableError[source]

Bases: Exception

Exception raised when an FKTable cannot be parsed correctly

class validphys.fkparser.GridInfo(setname: str, hadronic: bool, ndata: int, nx: int)[source]

Bases: object

Class containing the basic properties of an FKTable grid.

hadronic: bool
ndata: int
nx: int
setname: str
validphys.fkparser.load_fktable(spec)[source]

Load the data corresponding to an FKSpec object. The cfactors will be applied to the grid. If we have a new-type fktable, call load() directly; otherwise fall back to the old parser.

validphys.fkparser.open_fkpath(path)[source]

Return a file-like object from the fktable path, regardless of whether it is compressed

Parameters:

path (Path or str) – Path-like file containing a valid FKTable. It can be either inside a tarball or in plain text.

Returns:

f – A file-like object for further processing.

Return type:

file
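The idea of returning a file-like object regardless of compression can be sketched with the tarfile module. This is an illustration under assumptions, not the validphys code: the function name open_fkpath_sketch is hypothetical, the archive is assumed to contain exactly one regular file, and the content is copied into memory for simplicity.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def open_fkpath_sketch(path):
    """Return a binary file-like object for a plain or tarred table file."""
    path = Path(path)
    if tarfile.is_tarfile(str(path)):
        # Mode "r" lets tarfile detect gzip/bz2/xz compression by itself.
        with tarfile.open(str(path), mode="r") as archive:
            files = [member for member in archive.getmembers() if member.isfile()]
            if len(files) != 1:
                raise ValueError(f"Expected one file in {path}, found {len(files)}")
            # Copy to memory so the archive can be closed before returning.
            return io.BytesIO(archive.extractfile(files[0]).read())
    return path.open("rb")


# Demonstration: the same content is read from a plain file and a tarball.
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "table.dat"
    plain.write_bytes(b"FK data\n")
    packed = Path(tmp) / "table.tar.gz"
    with tarfile.open(str(packed), "w:gz") as archive:
        archive.add(str(plain), arcname="table.dat")
    contents = []
    for candidate in (plain, packed):
        with open_fkpath_sketch(candidate) as handle:
            contents.append(handle.read())
print(contents)
```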

validphys.fkparser.parse_cfactor(f)[source]

Parse an open byte stream into a CFactorData. Raise a BadCFactorError if problems are encountered.

Parameters:

f (file) – Binary file-like object

Returns:

cfac – An object containing the data on the cfactor for each point.

Return type:

CFactorData

validphys.fkparser.parse_fktable(f)[source]

Parse an open byte stream into an FKTableData. Raise a BadFKTableError if problems are encountered.

Parameters:

f (file) – Open file-like object. See open_fkpath() to obtain it.

Returns:

fktable – An object containing the FKTable data and information.

Return type:

FKTableData

Notes

This function operates at the level of a single file, and therefore it does not apply CFactors (see load_fktable() for that) or handle operations within COMPOUND ensembles.

validphys.gridvalues module

gridvalues.py

Core functionality needed to obtain a set of values from LHAPDF. The tools for representing these grids are in pdfgrids.py (the validphys provider module), and the basis transformations are in pdfbases.py

validphys.gridvalues.central_grid_values(pdf: PDF, flmat, xmat, qmat)[source]

Same as grid_values() but it returns only the central values. The return value is indexed as:

grid_values[replica][flavour][x][Q]

where the first dimension (corresponding to the central member of the PDF set) is always one.

validphys.gridvalues.evaluate_luminosity(pdf_set: LHAPDFSet, n: int, s: float, mx: float, x1: float, x2: float, channel)[source]

Returns PDF luminosity at specified values of mx, x1, x2, sqrts**2 for a given channel.

  • pdf_set: The PDF set of interest.

  • s: The square of the center of mass energy, in GeV^2.

  • mx: The invariant mass bin, in GeV.

  • x1 and x2: The partonic x1 and x2.

  • channel: The channel tag name from LUMI_CHANNELS.

validphys.gridvalues.grid_values(pdf: PDF, flmat, xmat, qmat)[source]

Evaluate x*f(x) on a grid of points in flavour, x and Q.

Parameters:
  • pdf (PDF) – Any PDF set

  • flmat (iterable) – A list of PDG IDs corresponding to the LHAPDF flavours in the grid.

  • xmat (iterable) – A list of x values

  • qmat (iterable) – A list of values in Q, expressed in GeV.

Returns:

A 4-dimensional array with the PDF values at the input parameters for each replica. The return value is indexed as follows:

grid_values[replica][flavour][x][Q]

See also

validphys.pdfbases.Basis.grid_values(), interface, allowing, and, aliases

Examples

Compute the maximum difference across replicas between the u and ubar PDFs (times x) for x=0.5 and both Q=10 and Q=100:

>>> from validphys.loader import Loader
>>> from validphys.gridvalues import grid_values
>>> import numpy as np
>>> gv = grid_values(Loader().check_pdf('NNPDF31_nnlo_as_0118'), [-1, 1], [0.5], [10, 100])
>>> #Take the difference across the flavour dimension, the max
>>> #across the replica dimension, and leave the Q dimension untouched.
>>> np.diff(gv, axis=1).max(axis=0).ravel()
array([0.07904731, 0.04989902], dtype=float32)

validphys.hessian2mc module

validphys.hessian2mc.py

This module contains the functions that can be used to convert Hessian sets like MSHT20 and CT18 to Monte Carlo sets. The functions implemented here follow Eq. (4.3) of the paper arXiv:2203.05506.

validphys.hessian2mc.write_hessian_to_mc_watt_thorne(pdf, mc_pdf_name, num_members, watt_thorne_rnd_seed=1)[source]

Writes the Monte Carlo representation of a PDF set that is in Hessian form using the Watt-Thorne (MSHT20) prescription described in Eq. 4.3 of arXiv:2203.05506.

Parameters:
  • pdf (validphys.core.PDF) – The Hessian PDF set that is to be converted to Monte Carlo.

  • mc_pdf_name (str) – The name of the new Monte Carlo PDF set.

validphys.hessian2mc.write_mc_watt_thorne_replicas(Rjk_std_normal, replicas_df, mc_pdf_path)[source]

Writes the Monte Carlo representation of a PDF set that is in Hessian form using the Watt-Thorne prescription described in Eq. 4.3 of arXiv:2203.05506.

Parameters:
  • Rjk_std_normal (np.ndarray) – Array of shape (num_members, n_eig) containing random standard normal numbers.

  • replicas_df (pd.DataFrame) – DataFrame containing replicas of the hessian set at all scales.

  • mc_pdf_path (pathlib.Path) – Path to the new Monte Carlo PDF set.
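The conversion can be sketched in plain Python. The sketch assumes the symmetric form commonly quoted for the Watt-Thorne prescription, in which replica k is F_0 + (1/2) Σ_j (F_j^+ − F_j^−) R_jk with R_jk drawn from a standard normal distribution; this form should be checked against Eq. 4.3 of arXiv:2203.05506. The function name and the list-based layout are hypothetical and unrelated to the DataFrame-based functions documented here.

```python
import random


def watt_thorne_replicas(central, plus, minus, num_members, seed=1):
    """Generate Monte Carlo replicas from Hessian eigenvector members.

    central: values of the central member, one per grid point.
    plus, minus: one list of values per eigenvector, for the positive
        and negative eigenvector directions respectively.
    """
    rng = random.Random(seed)
    n_eig = len(plus)
    replicas = []
    for _ in range(num_members):
        # One standard normal number per eigenvector (a row of R_jk).
        r = [rng.gauss(0.0, 1.0) for _ in range(n_eig)]
        replica = [
            f0 + 0.5 * sum(r[j] * (plus[j][i] - minus[j][i]) for j in range(n_eig))
            for i, f0 in enumerate(central)
        ]
        replicas.append(replica)
    return replicas


central = [1.0, 2.0]
plus = [[1.1, 2.2], [1.0, 2.1]]
minus = [[0.9, 1.8], [1.0, 1.9]]
replicas = watt_thorne_replicas(central, plus, minus, num_members=3)
print(len(replicas), len(replicas[0]))
```

Fixing the seed makes the generated set reproducible, which is the role played by watt_thorne_rnd_seed in the documented signature.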

validphys.hessian2mc.write_new_lhapdf_info_file_from_previous_pdf(path_old_pdfset, name_old_pdfset, path_new_pdfset, name_new_pdfset, num_members, description_set='MC representation of hessian PDF set', errortype='replicas')[source]

Writes a new LHAPDF set info file based on an existing set.

validphys.hyper_algorithm module

This module contains functions dedicated to process the json dictionaries

validphys.hyper_algorithm.autofilter_dataframe(dataframe, keys, n_to_combine=1, n_to_kill=1, threshold=-1)[source]

Receives a dataframe and a list of keys. Creates combinations of n_to_combine keys and computes the reward. Finally, removes from the dataframe the n_to_kill worst combinations.

Anything under threshold will be removed and will not count towards the n_to_kill (by default threshold = -1, so only things which are really bad will be removed).

# Arguments:
  • dataframe: a pandas dataframe

  • keys: keys to combine

  • n_to_combine: how many keys do we want to combine

  • n_to_kill: how many combinations to kill

  • threshold: anything under this reward will be removed

# Returns:
  • dataframe_sliced: a slice of the dataframe with the weakest combinations removed

validphys.hyper_algorithm.bin_generator(df_values, max_n=10)[source]

Receives a dataframe with a list of unique values. If there are more than max_n of them and they are numeric, create max_n bins. If they are already discrete values or there are fewer than max_n options, output the same input.

# Arguments:
  • df_values: dataframe with unique values

  • max_n: maximum number of allowed different values

# Returns:
  • new_vals: list of tuples with (initial, end) value of the bin
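The binning logic can be sketched without pandas. The function name bin_generator_sketch is hypothetical, it takes a plain iterable instead of a dataframe, and the use of equal-width bins is an assumption; the description only states that max_n bins of the form (initial, end) are created.

```python
def bin_generator_sketch(values, max_n=10):
    """Return the unique values, or max_n (start, end) bins if too many."""
    unique = sorted(set(values))
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique
    )
    if len(unique) <= max_n or not numeric:
        # Discrete or already small enough: hand the input back unchanged.
        return unique
    low, high = unique[0], unique[-1]
    step = (high - low) / max_n
    # max_n + 1 edges; the last one is set exactly to the maximum.
    edges = [low + i * step for i in range(max_n)] + [high]
    return list(zip(edges[:-1], edges[1:]))


bins = bin_generator_sketch(range(20), max_n=4)
print(bins)
```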

validphys.hyper_algorithm.compute_reward(mdict, biggest_ntotal)[source]

Given a combination dictionary computes the reward function:

If the fail rate for this combination is above the fail threshold, the reward is -100.

The formula below for the reward takes into account:
  • The rate of ok fits that have a loss below the loss_threshold

  • The rate of fits that failed

  • The std deviation

  • How far away is the median from the best loss

  • How far away are median and average

validphys.hyper_algorithm.dataframe_removal(dataframe, hit_list)[source]

Removes all combinations defined in hit_list from the dataframe. The hit list is a list of dictionaries containing the ‘slice’ key, where ‘slice’ must be a slice of ‘dataframe’.

# Arguments:
  • dataframe: a pandas dataframe

  • hit_list: the list of elements to remove

# Returns:
  • new_dataframe: the same dataframe with all elements from hit_list removed

validphys.hyper_algorithm.get_combinations(key_info, ncomb)[source]

Given a dictionary mapping keys to iterables of possible values (key_info), return a list of the product of all possible mappings of a subset of ncomb keys to single values out of the corresponding possible values, for all such subsets.

For instance,

key_info = {'key1': [val1-1, val1-2, …], 'key2': [val2-1, val2-2, …]}

ncomb = 2

will return a list of dictionaries:

[{'key1': val1-1, 'key2': val2-1, …}, {'key1': val1-1, 'key2': val2-2, …}, {'key1': val1-2, 'key2': val2-1, …}, {'key1': val1-2, 'key2': val2-2, …}]

Get all combinations of ncomb elements for the keys and values given in the dictionary key_info:

# Arguments:
  • key_info: dictionary with the possible values for each key

  • ncomb: elements to combine

# Returns:
  • all_combinations: A list of dictionaries of parameters
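The enumeration described for get_combinations can be reproduced with itertools. The function name get_combinations_sketch is hypothetical and the ordering of the output is an assumption; the sketch only illustrates picking every subset of ncomb keys and, for each subset, every assignment of one value per key.

```python
import itertools


def get_combinations_sketch(key_info, ncomb):
    """List every mapping of ncomb keys to one of their possible values."""
    all_combinations = []
    # Every subset of ncomb keys ...
    for keys in itertools.combinations(key_info, ncomb):
        # ... combined with every choice of one value per selected key.
        for values in itertools.product(*(key_info[key] for key in keys)):
            all_combinations.append(dict(zip(keys, values)))
    return all_combinations


key_info = {"optimizer": ["Adam", "RMSprop"], "layers": [2, 3], "dropout": [0.0]}
combinations = get_combinations_sketch(key_info, 2)
print(len(combinations), combinations[0])
```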

validphys.hyper_algorithm.get_slice(dataframe, query_dict)[source]

Returns a slice of the dataframe where some keys match some values. query_dict must be a dictionary {key1: value1, key2: value2, …}.

# Arguments:

  • dataframe: a pandas dataframe

  • query_dict: a dictionary of combination as given by get_combinations

validphys.hyper_algorithm.parse_keys(dataframe, keys)[source]

Receives a dataframe and a set of keys. Looks into the dataframe to read the possible values of the keys.

Returns a dictionary { ‘key’ : [possible values] }.

If the values are not discrete then they need to be binned; this is done for anything with too many numerical values.

# Arguments:
  • dataframe: a pandas dataframe

  • keys: keys to combine

# Returns:
  • key_info: a dictionary with the possible values for each key

validphys.hyper_algorithm.process_slice(df_slice)[source]

Function to process a slice into a dictionary with useful stats. If the slice is None it means the combination does not apply.

# Arguments:
  • df_slice: a slice of a pandas dataframe

# Returns:
  • proc_dict: a dictionary of stats

validphys.hyper_algorithm.study_combination(dataframe, query_dict)[source]

Given a dataframe and a dictionary of {key1 : value1, key2: value2} returns a dictionary with a number of stats for that combination

# Arguments:
  • dataframe: a pandas dataframe

  • query_dict: a dictionary for a combination as given by get_combinations

# Returns:
  • proc_dict: a dictionary of the “statistics” for this combination

validphys.hyperoptplot module

Module for the parsing and plotting of the results and output of previous hyperparameter scans

class validphys.hyperoptplot.HyperoptTrial(trial_dict, base_params=None, minimum_losses=1, linked_trials=None)[source]

Bases: object

Hyperopt trial class. Makes the dictionary-like output of hyperopt into an object that can be easily managed

Parameters:
  • trial_dict (dict) – one single result (a dictionary) from a tries.json file

  • base_params (dict) – Base parameters of the runcard which can be used to complete the hyperparameter dictionary when not all parameters were scanned

  • minimum_losses (int) – Minimum number of losses to be found in the trial for it to be considered successful

  • linked_trials (list) – List of trials coming from the same file as this trial

get(item, default=None)[source]

Link a list of trials to this trial

property loss

Return the loss of the hyperopt dict

property params

Parameters for the fit

property reward

Return and cache the reward value

property weighted_reward

Return the reward weighted to the mean value of the linked trials

validphys.hyperoptplot.best_setup(hyperopt_dataframe, hyperscan_config, commandline_args)[source]

Generates a clean table with information on the hyperparameter settings of the best setup.

validphys.hyperoptplot.evaluate_trial(trial_dict, validation_multiplier, fail_threshold, loss_target)[source]

Read a trial dictionary and compute the true loss and decide whether the run passes or not

validphys.hyperoptplot.filter_by_string(filter_string)[source]

Receives a data_dict (a parsed trial) and a filter string, returns True if the trial passes the filter

filter string must have the format: key<operator>string where <operator> can be any of !=, =, >, <

# Arguments:
  • filter_string: the expression to evaluate

# Returns:
  • filter_function: a function that takes a data_dict and returns true if the condition in filter_string passes
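A closure-based sketch of the key<operator>string format is shown below. The function name filter_by_string_sketch is hypothetical, and the conversion of numeric-looking values to float is an assumption about how comparisons are typed.

```python
import operator
import re

# The four operators named in the format description.
_OPERATORS = {"!=": operator.ne, "=": operator.eq, ">": operator.gt, "<": operator.lt}


def filter_by_string_sketch(filter_string):
    """Build a function that tests a data_dict against key<operator>string."""
    match = re.match(r"^(.+?)(!=|=|>|<)(.+)$", filter_string)
    if match is None:
        raise ValueError(f"Cannot parse filter: {filter_string!r}")
    key, symbol, raw_value = match.groups()
    try:
        value = float(raw_value)  # numeric comparison when possible
    except ValueError:
        value = raw_value  # otherwise compare as a string

    def filter_function(data_dict):
        return _OPERATORS[symbol](data_dict[key], value)

    return filter_function


keep_adam = filter_by_string_sketch("optimizer=Adam")
long_runs = filter_by_string_sketch("epochs>1000")
trial = {"optimizer": "Adam", "epochs": 2000}
print(keep_adam(trial), long_runs(trial))
```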

validphys.hyperoptplot.generate_dictionary(replica_path, loss_target, json_name='tries.json', starting_index=0, val_multiplier=0.5, fail_threshold=10.0)[source]

Reads a json file and returns a list of dictionaries

# Arguments:
  • replica_path: folder in which the tries.json file can be found

  • starting_index: if the trials are to be added to an already existing set, make sure the id has the correct index!

  • val_multiplier: validation multiplier

  • fail_threshold: threshold for the loss to consider a configuration as a failure

validphys.hyperoptplot.hyperopt_dataframe(commandline_args)[source]

Loads the data generated by running hyperopt and stored in json files into a dataframe, and then filters the data according to the selection criteria provided by the command line arguments. It then returns both the entire dataframe as well as a dataframe object with the hyperopt parameters of the best setup.

validphys.hyperoptplot.hyperopt_table(hyperopt_dataframe)[source]

Generates a table containing complete information on all the tested setups that passed the filters set in the commandline arguments.

validphys.hyperoptplot.order_axis(df, bestdf, key)[source]

Helper function for ordering the axis and making sure the best is always first.

validphys.hyperoptplot.parse_architecture(trial)[source]

This function parses the family of parameters which regard the architecture of the NN:

  • number_of_layers

  • activation_per_layer

  • nodes_per_layer

  • l1, l2, l3, l4… max_layers

  • layer_type

  • dropout

  • initializer

validphys.hyperoptplot.parse_optimizer(trial)[source]

This function parses the parameters that affect the optimization:

  • optimizer

  • learning_rate (if it exists)

validphys.hyperoptplot.parse_statistics(trial)[source]

Parse the statistical information of the trial:

  • validation loss

  • testing loss

  • status of the run

validphys.hyperoptplot.parse_stopping(trial)[source]

This function parses the parameters that affect the stopping:

  • epochs

  • stopping_patience

  • pos_initial

  • pos_multiplier

validphys.hyperoptplot.parse_trial(trial)[source]

Trials are very convoluted objects, very branched inside. The goal of this function is to separate said branching so we can create hierarchies.

validphys.hyperoptplot.plot_activation_per_layer(hyperopt_dataframe)[source]

Generates a violin plot of the loss per activation function.

validphys.hyperoptplot.plot_clipnorm(hyperopt_dataframe, optimizer_name)[source]

Generates a scatter plot of the loss as a function of the clipnorm for a given optimizer.

validphys.hyperoptplot.plot_epochs(hyperopt_dataframe)[source]

Generates a scatter plot of the loss as a function of the number of epochs.

validphys.hyperoptplot.plot_initializer(hyperopt_dataframe)[source]

Generates a violin plot of the loss per initializer.

validphys.hyperoptplot.plot_iterations(hyperopt_dataframe)[source]

Generates a scatter plot of the loss as a function of the iteration index.

validphys.hyperoptplot.plot_learning_rate(hyperopt_dataframe, optimizer_name)[source]

Generates a scatter plot of the loss as a function of the learning rate for a given optimizer.

validphys.hyperoptplot.plot_number_of_layers(hyperopt_dataframe)[source]

Generates a violin plot of the loss as a function of the number of layers of the model.

validphys.hyperoptplot.plot_optimizers(hyperopt_dataframe)[source]

Generates a violin plot of the loss per optimizer.

validphys.hyperoptplot.plot_scans(df, best_df, plotting_parameter, include_best=True)[source]

This function performs the plotting and is called by the plot_ functions in this file.

validphys.kinematics module

Provides information on the kinematics involved in the data.

Uses the PLOTTING file specification.

class validphys.kinematics.XQ2Map(experiment, commondata, fitted, masked, group)

Bases: tuple

commondata

Alias for field number 1

experiment

Alias for field number 0

fitted

Alias for field number 2

group

Alias for field number 4

masked

Alias for field number 3

validphys.kinematics.all_commondata_grouping(all_commondata, metadata_group)[source]

Return a table with the grouping specified by metadata_group key for each dataset for all available commondata.

validphys.kinematics.all_kinlimits_table(all_kinlimits, use_kinoverride: bool = True)[source]

Return a table with the kinematic limits for the datasets given as input in dataset_inputs. If the PLOTTING overrides are not used, the information on sqrt(k2) will be displayed.

validphys.kinematics.describe_kinematics(commondata, titlelevel: int = 1)[source]

Output a markdown text describing the stored metadata for a given commondata.

titlelevel can be used to control the header level of the title.

validphys.kinematics.kinematics_table(kinematics_table_notable)[source]

Same as kinematics_table_notable but writing the table to file

validphys.kinematics.kinematics_table_notable(commondata, cuts, show_extra_labels: bool = False)[source]

Table containing the kinematics of a commondata object, indexed by their datapoint id. The kinematics will be transformed as per the PLOTTING file of the dataset or process type, and the column headers will be the labels of the variables defined in the metadata.

If show_extra_labels is True then extra labels defined in the PLOTTING files will be displayed. Otherwise only the original three kinematics will be shown.

validphys.kinematics.kinlimits(commondata, cuts, use_cuts, use_kinoverride: bool = True)[source]

Return a mapping containing the number of fitted and used datapoints, as well as the label, minimum and maximum value for each of the three kinematics. If use_kinoverride is set to False, the PLOTTING files will be ignored and the kinematics will be interpreted based on the process type only. If use_cuts is ‘CutsPolicy.NOCUTS’, the information on the total number of points will be displayed, instead of the fitted ones.

validphys.kinematics.total_fitted_points(all_kinlimits_table) int[source]

Print the total number of fitted points in a given set of data

validphys.kinematics.xq2map_with_cuts(commondata, cuts, group_name=None)[source]

Return two (x,Q²) tuples: one for the fitted data and one for the cut data. If display_cuts is false or all data passes the cuts, the second tuple will be empty.

validphys.lhaindex module

Created on Fri Jan 23 12:11:23 2015

@author: zah

validphys.lhaindex.as_from_name(name)[source]

Annoying function needed because this is not in the info files. as(M_z) there is actually as(M_ref).

validphys.lhaindex.expand_index_names(globstr)[source]
validphys.lhaindex.expand_local_names(globstr)[source]
validphys.lhaindex.expand_names(globstr)[source]

Return names of installed PDFs. If none is found, return names from index

validphys.lhaindex.finddir(name)[source]
validphys.lhaindex.get_collaboration(name)[source]
validphys.lhaindex.get_index_path(folder=None)[source]
validphys.lhaindex.get_indexes_to_names()[source]
validphys.lhaindex.get_lha_datapath()[source]

Return an existing datapath from LHAPDF, starting from the end. If no existing path is found, fall back to the old behaviour and return the last path.

The existence check is intended to solve problems where a previously set LHAPATH or LHAPDF_DATA_PATH environment variable points to a non-existent path, or where, on shared systems, LHAPDF was compiled with hard-coded paths not available to all users.
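
The selection rule described above can be sketched as follows. This is only an illustration: `last_existing_path` is a hypothetical helper, and the way validphys obtains the candidate list from LHAPDF is not reproduced here.

```python
import os
import tempfile


def last_existing_path(paths):
    """Return the last path in ``paths`` that exists on disk.

    If none exists, fall back to the last entry of the list.
    """
    for path in reversed(paths):
        if os.path.exists(path):
            return path
    return paths[-1]


with tempfile.TemporaryDirectory() as existing:
    missing = os.path.join(existing, "does_not_exist")
    # The search starts from the end, so the missing path is skipped.
    chosen = last_existing_path([existing, missing])
    # With no valid candidate the last entry is returned unchanged.
    fallback = last_existing_path([missing])
```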

validphys.lhaindex.get_names_to_indexes()[source]
validphys.lhaindex.get_pdf_indexes(name)[source]

Get index in the amc@nlo format

validphys.lhaindex.get_pdf_name(index)[source]
validphys.lhaindex.infofilename(name)[source]
validphys.lhaindex.isinstalled(name)[source]

Check that name exists in LHAPDF dir

validphys.lhaindex.parse_index(index_file)[source]
validphys.lhaindex.parse_info(name)[source]

validphys.lhapdf_compatibility module

Module for LHAPDF compatibility backends

If LHAPDF is installed, the module will transparently hand over everything to LHAPDF. If LHAPDF is not available, it will try to use a combination of the packages

lhapdf-management and pdfflow

which cover all the features of LHAPDF used during the fit (and likely most of validphys)

validphys.lhapdf_compatibility.make_pdf(pdf_name, member=None)[source]

Load a PDF. If member is given, load the single member; otherwise, load the entire set as a list.

If LHAPDF is available, it returns LHAPDF PDF instances; otherwise it returns an object which is _compatible_ with LHAPDF for the lhapdf functions of the selected backend.

Parameters:

pdf_name: str

name of the PDF to load

member: int

index of the member of the PDF to load

Returns:

list(pdf_sets)
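
One way such a backend fallback could be detected is by checking whether a module is importable before using it. The sketch below is an assumption about the general pattern, not the validphys implementation: `pick_backend` is a hypothetical helper and `json` merely stands in for a fallback package that is always importable.

```python
import importlib.util


def pick_backend(candidates):
    """Return the first importable module name in ``candidates``, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# The first name is deliberately not installed; "json" plays the fallback.
backend = pick_backend(["a_module_that_is_not_installed_xyz", "json"])
```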

validphys.lhapdfset module

Module containing an LHAPDF class compatible with validphys using the official lhapdf python interface.

The .members and .central_member of the LHAPDFSet are LHAPDF objects (the typical output from mkPDFs) and can be used normally.

Examples

>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> len(pdf.members)
101
>>> pdf.central_member.alphasQ(91.19)
0.11800
>>> pdf.members[0].xfxQ2(0.5, 15625)
{-5: 6.983360500601136e-05,
-4: 0.0021818063617227604,
-3: 0.00172453472243952,
-2: 0.0010906577230485718,
-1: 0.0022049272225017286,
1: 0.020051104853608722,
2: 0.0954139944889494,
3: 0.004116641378803191,
4: 0.002180124185625795,
5: 6.922722705177504e-05,
21: 0.007604124516892057}
class validphys.lhapdfset.LHAPDFSet(name, error_type)[source]

Bases: object

Wrapper for the lhapdf python interface.

Once instantiated this class will load the PDF set from LHAPDF. If it is a T0 set only the CV will be loaded.

property central_member

Returns a reference to member 0 of the PDF list

property flavors

Returns the list of accepted flavors by the LHAPDF set

grid_values(flavors: ndarray, xgrid: ndarray, qgrid: ndarray)[source]

Returns the PDF values for every member for the required flavours, points in x and points in q. The return shape is

(members, flavors, xgrid, qgrid)

Return type:

ndarray of shape (members, flavors, xgrid, qgrid)

Examples

>>> import numpy as np
>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> xgrid = np.random.rand(10)
>>> qgrid = np.random.rand(3)
>>> flavs = np.arange(-4,4)
>>> flavs[4] = 21
>>> results = pdf.grid_values(flavs, xgrid, qgrid)
property is_t0

Check whether we are in t0 mode

property members

Return the members of the set. The special error type t0 returns only member 0.

property n_members

Return the number of active members in the PDF set

xfxQ(x, Q, n, fl)[source]

Return the PDF value for one single point for one single member. If the flavour is not included in the PDF (for instance top/antitop), return 0.0.
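
The zero default for missing flavours can be illustrated with a plain mapping keyed by PDG flavour id, like the one shown in the module example. This is a sketch only: `value_or_zero` is a hypothetical helper and the numbers are made up.

```python
def value_or_zero(flavour_values, fl):
    """Return the value stored for PDG id ``fl``, or 0.0 if it is absent."""
    return flavour_values.get(fl, 0.0)


# Illustrative numbers only, keyed by PDG flavour id (21 is the gluon).
flavour_values = {1: 0.02, 2: 0.095, 21: 0.0076}

up = value_or_zero(flavour_values, 2)
top = value_or_zero(flavour_values, 6)  # the top quark is not in the mapping
```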

validphys.lhio module

A module that reads and writes LHAPDF grids.

validphys.lhio.big_matrix(gridlist)[source]

Return a properly indexed matrix of the differences between each member and the central value

validphys.lhio.generate_replica0(pdf, kin_grids=None, extra_fields=None)[source]
Generates a replica 0 as an average over an existing set of LHAPDF replicas and outputs it to the PDF’s parent folder

Parameters:
  • pdf (validphys.core.PDF) – An existing validphys PDF object from which the average replica will be (re-)computed

  • kin_grids – Grids in (x,Q) used to print replica0 upon. If None, the grids of the source replicas are used.
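
The averaging step can be sketched on plain lists. This assumes that every replica is sampled on the same grid points; `average_replicas` is a hypothetical helper, the numbers are made up, and the reading and writing of LHAPDF grid files is not covered.

```python
def average_replicas(replicas):
    """Average equally sized lists of grid values point by point."""
    n_replicas = len(replicas)
    return [sum(point_values) / n_replicas for point_values in zip(*replicas)]


# Three made-up replicas sampled on the same three grid points.
replicas = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
]
replica0 = average_replicas(replicas)
```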

validphys.lhio.hessian_from_lincomb(pdf, V, set_name=None, folder=None, extra_fields=None)[source]

Construct a new LHAPDF grid from a linear combination of members

validphys.lhio.load_all_replicas(pdf, db=None)[source]
validphys.lhio.load_replica(pdf, rep, kin_grids=None)[source]
validphys.lhio.new_pdf_from_indexes(pdf, indexes, set_name=None, folder=None, extra_fields=None, installgrid=False, use_rep0grid=False)[source]

Create a new PDF set by selecting replicas from another one.

Parameters:
  • pdf (validphys.core.PDF) – An existing validphys PDF object from which the indexes will be selected.

  • indexes (Iterable[int]) – An iterable with integers corresponding to files in the LHAPDF set. Note that replica 0 will be calculated for you as the mean of the selected replicas.

  • set_name (str) – The name of the new PDF set.

  • folder (str, bytes, os.PathLike) – The path where the LHAPDF set will be written. Must exist.

  • installgrid (bool, optional, default=``False``) – Whether to copy the grid to the LHAPDF path.

  • use_rep0grid (bool, optional, default=``False``) – Whether to fill the original replica 0 grid when computing replica 0, instead of relying on all grids being the same and averaging the files directly. It is slower and will call LHAPDF to fill the grids, but works for sets where the replicas have different grids.

validphys.lhio.read_all_xqf(f)[source]
validphys.lhio.read_xqf_from_file(f)[source]
validphys.lhio.read_xqf_from_lhapdf(pdf, replica, kin_grids)[source]
validphys.lhio.rep_matrix(gridlist)[source]

Return a properly indexed matrix of all the members

validphys.lhio.split_sep(f)[source]
validphys.lhio.write_replica(rep, set_root, header, subgrids)[source]

validphys.loader module

Resolve paths to useful objects, and query the existence of different resources within the specified paths.

exception validphys.loader.CfactorNotFound[source]

Bases: LoadFailedError

exception validphys.loader.CompoundNotFound[source]

Bases: LoadFailedError

exception validphys.loader.CutsNotFound[source]

Bases: LoadFailedError

exception validphys.loader.DataNotFoundError[source]

Bases: LoadFailedError

exception validphys.loader.EkoNotFound[source]

Bases: LoadFailedError

exception validphys.loader.FKTableNotFound[source]

Bases: LoadFailedError

class validphys.loader.FallbackLoader(profile=None)[source]

Bases: Loader, RemoteLoader

A loader that first tries to find resources locally (calling Loader.check_*) and if it fails, it tries to download them (calling RemoteLoader.download_*).
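
The local-then-remote pattern described above can be sketched with small stand-in classes. Every name below (`LocalStore`, `RemoteStore`, `FallbackStore`, the `resources` attribute) is hypothetical and exists only for illustration; the real `make_checker` may differ in its details.

```python
class LocalStore:
    """Stand-in for a loader that only looks at local resources."""

    def __init__(self):
        self.resources = {"pdf": {"installed_set"}}

    def check_pdf(self, name):
        if name not in self.resources["pdf"]:
            raise FileNotFoundError(name)
        return name


class RemoteStore:
    """Stand-in for a loader that can fetch missing resources."""

    def download_pdf(self, name):
        # Pretend the download succeeded and the resource is now local.
        # ``resources`` is provided by LocalStore in the combined class.
        self.resources["pdf"].add(name)


class FallbackStore(LocalStore, RemoteStore):
    def make_checker(self, resource):
        check = getattr(self, f"check_{resource}")
        download = getattr(self, f"download_{resource}")

        def checker(name):
            try:
                return check(name)
            except FileNotFoundError:
                # Local lookup failed: download, then check again.
                download(name)
                return check(name)

        return checker


store = FallbackStore()
check_pdf = store.make_checker("pdf")
local = check_pdf("installed_set")
fetched = check_pdf("remote_only_set")
```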

make_checker(resource)[source]
exception validphys.loader.FitNotFound[source]

Bases: LoadFailedError

exception validphys.loader.HyperscanNotFound[source]

Bases: LoadFailedError

exception validphys.loader.InconsistentMetaDataError[source]

Bases: LoaderError

exception validphys.loader.LoadFailedError[source]

Bases: FileNotFoundError, LoaderError

class validphys.loader.Loader(profile=None)[source]

Bases: LoaderBase

Load various resources from the NNPDF data path.

property available_datasets

Provide all datasets that were available before the new commondata was implemented and that have a translation. Returns the old names.

TODO: This should be substituted by a subset of implemented_dataset that returns only complete datasets.

property available_ekos

Return a string token for each of the available theories

property available_fits
property available_hyperscans
property available_pdfs
property available_theories

Return a string token for each of the available theories

check_cfactor(theoryID, setname, cfactors)[source]
check_commondata(setname, sysnum=None, use_fitcommondata=False, fit=None, variant=None)[source]

Prepare the commondata files to be loaded. A commondata is defined by its name (setname) and the variant(s) (variant)

At the moment both old-format and new-format commondata can be loaded; however, old-format commondata are deprecated and will be removed in future releases.

The function parse_dataset_input in config.py translates all known old commondata into their new names (and variants), therefore this function should only receive requests for the new format.

Any action trying to request an old-format commondata from this function will log an error message. This error message will eventually become an actual error.

check_compound(theoryID, setname, cfac)[source]
check_dataset(name, *, rules=None, sysnum=None, theoryid, cfac=(), frac=1, cuts=CutsPolicy.INTERNAL, use_fitcommondata=False, fit=None, weight=1, variant=None)[source]

Loads a given dataset. If the dataset contains new-type fktables, use the pineappl loading function, otherwise fall back to legacy.

check_default_filter_rules(theoryid, defaults=None)[source]
check_eko(theoryID)[source]

Check that the eko (and the parent theory) both exist and return the path to it

check_experiment(name: str, datasets: list[DataSetSpec]) DataGroupSpec[source]

Loader method for instantiating DataGroupSpec objects. The NNPDF::Experiment object can then be instantiated using the load method.

Parameters:
  • name (str) – A string denoting the name of the resulting DataGroupSpec object.

  • datasets (List[DataSetSpec]) – A list of DataSetSpec objects pre-created by the user. Note, these too will be loaded by Loader.

Return type:

DataGroupSpec

Example

>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset("NMC", theoryid=53, cuts="internal")
>>> exp = l.check_experiment("My DataGroupSpec Name", [ds])
check_fit(fitname)[source]
check_fit_cuts(commondata, fit)[source]
check_fk_from_theory_metadata(theory_metadata, theoryID, cfac=None)[source]

Load a pineappl fktable in the new commondata format. Receives a theory metadata describing the fktables necessary for a given observable, the theory ID and the corresponding cfactors. The cfactors should correspond directly to the fktables; the “compound folder” is not supported for pineappl theories. As such, the name of the cfactor is expected to be

CF_{cfactor_name}_{fktable_name}
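
The naming pattern above can be spelled out as a small formatting helper. This is a sketch only: `cfactor_file_stem` and the fktable name `EXAMPLE_FKTABLE` are hypothetical, and any file extension or folder layout is deliberately left out since it is not stated here.

```python
def cfactor_file_stem(cfactor_name, fktable_name):
    """Build the expected cfactor name from the documented pattern."""
    return f"CF_{cfactor_name}_{fktable_name}"


# "QCD" is a cfactor label; the fktable name is a made-up placeholder.
stem = cfactor_file_stem("QCD", "EXAMPLE_FKTABLE")
```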

check_fktable(theoryID, setname, cfac)[source]
check_hyperscan(hyperscan_name)[source]

Obtain a hyperscan run

check_integset(theoryID, setname, postlambda, rules)[source]

Load an integrability dataset

check_internal_cuts(commondata, rules)[source]
check_pdf(name)[source]
check_posset(theoryID, setname, postlambda, rules)[source]

Load a positivity dataset

check_theoryID(theoryID)[source]
check_vp_output_file(filename, extra_paths=('.',))[source]

Find a file in the vp-cache folder, or (with higher priority) in the extra_paths.
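
The lookup order can be sketched with pathlib. This assumes that "higher priority" means the extra paths are searched before the cache folder; `find_output_file` and its `cache_dir` argument are hypothetical, and raising FileNotFoundError when nothing is found is a choice made for this example only.

```python
from pathlib import Path
import tempfile


def find_output_file(filename, cache_dir, extra_paths=(".",)):
    """Look for ``filename`` in ``extra_paths`` first, then in ``cache_dir``."""
    for folder in (*extra_paths, cache_dir):
        candidate = Path(folder) / filename
        if candidate.exists():
            return candidate
    raise FileNotFoundError(filename)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    extra = Path(tmp) / "extra"
    cache.mkdir()
    extra.mkdir()
    # The same file name exists in both places, plus one cache-only file.
    (cache / "table.csv").write_text("cached")
    (extra / "table.csv").write_text("local")
    (cache / "only_cached.csv").write_text("cached")

    preferred = find_output_file("table.csv", cache, extra_paths=(extra,))
    from_cache = find_output_file("only_cached.csv", cache, extra_paths=(extra,))
    preferred_parent = preferred.parent.name
    from_cache_parent = from_cache.parent.name
```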

property commondata_folder
get_commondata(setname, sysnum)[source]

Get a Commondata from the set name and number.

get_fktable(theoryID, setname, cfac)[source]
get_pdf(name)[source]
get_posset(theoryID, setname, postlambda, rules)[source]
property implemented_datasets

Provide all implemented datasets that can be found in the datafiles folder regardless of whether they can be used for fits (i.e., whether they include a theory), are “fake” (integrability/positivity) or are missing some information.

property theorydb_folder

Checks theory db file exists and returns path to it

class validphys.loader.LoaderBase(profile=None)[source]

Bases: object

Base class for the NNPDF loader. It can take as input a profile dictionary from which all data can be read. It is possible to override the datapath and resultpath when the class is instantiated.

property hyperscan_resultpath
exception validphys.loader.LoaderError[source]

Bases: Exception

exception validphys.loader.PDFNotFound[source]

Bases: LoadFailedError

exception validphys.loader.ProfileNotFound[source]

Bases: LoadFailedError

class validphys.loader.RemoteLoader(profile=None)[source]

Bases: LoaderBase

download_eko(thid)[source]

Download the EKO for a given theory ID

download_fit(fitname)[source]
download_hyperscan(hyperscan_name)[source]

Download a hyperscan run from the remote server. Downloads the run to the results folder.

download_pdf(name)[source]
download_theoryID(thid)[source]
download_vp_output_file(filename, **kwargs)[source]
property downloadable_ekos
property downloadable_fits
property downloadable_hyperscans
property downloadable_pdfs
property downloadable_theories
property eko_index
property eko_urls
property fit_index
property fit_urls
property hyperscan_index
property hyperscan_url
property lhapdf_pdfs
property lhapdf_urls
property nnpdf_pdfs
property nnpdf_pdfs_index
property nnpdf_pdfs_urls
property remote_ekos
remote_files(urls, index, thing='files')[source]
property remote_fits
property remote_hyperscans
property remote_keywords
property remote_nnpdf_pdfs
property remote_theories
property theory_index
property theory_urls
exception validphys.loader.RemoteLoaderError[source]

Bases: LoaderError

exception validphys.loader.SysNotFoundError[source]

Bases: LoadFailedError

exception validphys.loader.TheoryDataBaseNotFound[source]

Bases: LoadFailedError

exception validphys.loader.TheoryMetadataNotFound[source]

Bases: LoadFailedError

exception validphys.loader.TheoryNotFound[source]

Bases: LoadFailedError
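
Because LoadFailedError derives from both FileNotFoundError and LoaderError, the *NotFound exceptions listed in this module can be caught through either base. A minimal sketch that re-declares part of the documented hierarchy locally (these classes are stand-ins, not imports from validphys, and `caught_as` is a hypothetical helper):

```python
class LoaderError(Exception):
    pass


class LoadFailedError(FileNotFoundError, LoaderError):
    pass


class PDFNotFound(LoadFailedError):
    pass


def caught_as(exc, base):
    """Return True if raising ``exc`` is caught by ``except base``."""
    try:
        raise exc
    except base:
        return True
    except Exception:
        return False


via_file_not_found = caught_as(PDFNotFound("missing set"), FileNotFoundError)
via_loader_error = caught_as(PDFNotFound("missing set"), LoaderError)
plain_loader_error = caught_as(LoaderError("other failure"), FileNotFoundError)
```

Code that only wants to handle missing resources can therefore catch LoadFailedError, while a broader handler can catch LoaderError.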

validphys.loader.download_and_extract(url, local_path, target_name=None)[source]

Download a compressed archive and then extract it to the given path

validphys.loader.download_file(url, stream_or_path, make_parents=False, delete_on_failure=False)[source]

Download a file and show a progress bar if the INFO log level is enabled. If make_parents is True and stream_or_path is path-like, all the parent folders will be created.

validphys.mc2hessian module

mc2hessian.py

This module contains the functionality to compute reduced sets using the mc2hessian algorithm (see section 2.1 of 1602.00005).

validphys.mc2hessian.gridname(pdf, Neig, mc2hname: (<class 'str'>, <class 'NoneType'>) = None)[source]

If no custom mc2hname is specified, the name of the Hessian PDF is automatically generated.

validphys.mc2hessian.mc2hessian(pdf, Q, Neig: int, mc2hessian_xgrid, output_path, gridname, installgrid: bool = False)[source]

Produces a Hessian PDF by transforming a Monte Carlo PDF set.

Parameters:
  • pdf (validphys.core.PDF) – An existing validphys PDF object which will be converted into a Hessian PDF set

  • Q (float) – Energy scale at which the Monte Carlo PDF is sampled

  • Neig (int) – Number of basis eigenvectors in the Hessian PDF set

  • mc2hessian_xgrid (numpy.ndarray) – The points in x at which to sample the Monte Carlo PDF set

  • output_path – The validphys output path where the PDF will be written

  • gridname (str) – Name of the Hessian PDF set

  • installgrid (bool, optional, default=``False``) – Whether to copy the Hessian grid to the LHAPDF path

validphys.mc2hessian.mc2hessian_xgrid(xmin: float = 1e-05, xminlin: float = 0.1, xmax: Real = 1, nplog: int = 50, nplin: int = 50)[source]

Provides the points in x to sample the PDF. logspace and linspace will be called with the respective parameters.

Generates a grid with nplog logarithmically spaced points between xmin and xminlin followed by nplin linearly spaced points between xminlin and xmax
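
The grid layout described above can be sketched in pure Python. One detail is an assumption of this example: the logarithmic part excludes xminlin so that the point is not duplicated when the linear part starts there. `sample_xgrid` is a hypothetical name and requires nplin to be at least 2.

```python
import math


def sample_xgrid(xmin=1e-05, xminlin=0.1, xmax=1.0, nplog=50, nplin=50):
    """Return nplog log-spaced points followed by nplin linearly spaced ones."""
    log_lo, log_hi = math.log10(xmin), math.log10(xminlin)
    # Logarithmic part: nplog points in [xmin, xminlin), upper edge excluded.
    log_part = [
        10 ** (log_lo + i * (log_hi - log_lo) / nplog) for i in range(nplog)
    ]
    # Linear part: nplin points in [xminlin, xmax], both edges included.
    lin_part = [
        xminlin + i * (xmax - xminlin) / (nplin - 1) for i in range(nplin)
    ]
    return log_part + lin_part


xgrid = sample_xgrid()
```

With the default arguments this gives 100 strictly increasing points, with the switch from logarithmic to linear spacing at x = 0.1.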

validphys.mc_gen module

mc_gen.py

Tools to check the pseudo-data MC generation.

validphys.mc_gen.art_data_comparison(art_rep_generation, nreplica: int)[source]

Plots per datapoint of the distribution of replica values.

validphys.mc_gen.art_data_distribution(art_rep_generation, title='Artificial Data Distribution', color='green')[source]

Plot of the distribution of pseudodata.

validphys.mc_gen.art_data_mean_table(art_rep_generation, groups_data)[source]

Generate table for artdata mean values

validphys.mc_gen.art_data_moments(art_rep_generation, color='green')[source]

Returns the moments of the distributions per data point, as a histogram.

validphys.mc_gen.art_data_residuals(art_rep_generation, color='green')[source]

Plot the residuals distribution of pseudodata compared to experiment.

validphys.mc_gen.art_rep_generation(groups_data, make_replicas)[source]

Generates the nreplica pseudodata replicas

validphys.mc_gen.one_art_data_residuals(groups_data, indexed_make_replicas)[source]

Residuals plot for the first datapoint.

validphys.n3fit_data module

n3fit_data.py

Providers which prepare the data ready for n3fit.performfit.performfit().

validphys.n3fit_data.fittable_datasets_masked(data, tr_masks)[source]

Generate a list of validphys.n3fit_data_utils.FittableDataSet from a group of dataset and the corresponding training/validation masks

validphys.n3fit_data.fitting_data_dict(data, make_replica, dataset_inputs_loaded_cd_with_cuts, dataset_inputs_fitting_covmat, tr_masks, kfold_masks, fittable_datasets_masked, diagonal_basis=None)[source]

Provider which takes the information from validphys data.

Returns:

all_dict_out – Containing all the information of the experiment/dataset for training, validation and experimental data, with the following keys:

’datasets’

list of dictionaries for each of the datasets contained in data

’name’

name of the data - typically experiment/group name

’expdata_true’

non-replica data

’covmat’

full covmat

’invcovmat_true’

inverse of the covmat (non-replica)

’trmask’

mask for the training data

’invcovmat’

inverse of the covmat for the training data

’ndata’

number of datapoints for the training data

’expdata’

experimental data (replica’d) for training

’vlmask’

(same as above for validation)

’invcovmat_vl’

(same as above for validation)

’ndata_vl’

(same as above for validation)

’expdata_vl’

(same as above for validation)

’positivity’

bool - is this a positivity set?

’count_chi2’

should this be counted towards the chi2

Return type:

dict

validphys.n3fit_data.integdatasets_fitting_integ_dict(integdatasets=None)[source]

Loads the integrability datasets. Calls same function as fitting_pos_dict(), except on each element of integdatasets if integdatasets is not None.

Parameters:

integdatasets (list[validphys.core.IntegrabilitySetSpec]) – list containing the settings for the integrability sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.

Examples

>>> from validphys.api import API
>>> integdatasets = [{"dataset": "INTEGXT3", "maxlambda": 1e2}]
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=integdatasets, theoryid=53)
>>> len(res), len(res[0])
(1, 9)
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=None)
>>> print(res)
None
validphys.n3fit_data.kfold_masks(kpartitions, data)[source]

Collect the masks (if any) due to kfolding for this data. These will be applied to the experimental data before starting the training of each fold.

Parameters:
  • kpartitions (list[dict]) – list of partitions, each partition dictionary with key-value pair datasets and a list containing the names of all datasets in that partition. See n3fit/runcards/Basic_hyperopt.yml for an example runcard or the hyperopt documentation for an expanded discussion on k-fold partitions.

  • data (validphys.core.DataGroupSpec) – full list of data which is to be partitioned.

Returns:

kfold_masks – A list containing a boolean array for each partition. Each array is a 1-D boolean array with length equal to the number of cut datapoints in data. If a dataset is included in a particular fold then the mask will be True for the elements corresponding to those datasets such that data.load().get_cv()[kfold_masks[i]] will return the datapoints in the ith partition. See example below.

Return type:

list[np.array]

Examples

>>> from validphys.api import API
>>> partitions=[
...     {"datasets": ["HERACOMBCCEM", "HERACOMBNCEP460", "NMC", "NTVNBDMNFe"]},
...     {"datasets": ["HERACOMBCCEP", "HERACOMBNCEP575", "NMCPD", "NTVNUDMNFe"]}
... ]
>>> ds_inputs = [{"dataset": ds} for part in partitions for ds in part["datasets"]]
>>> kfold_masks = API.kfold_masks(dataset_inputs=ds_inputs, kpartitions=partitions, theoryid=53, use_cuts="nocuts")
>>> len(kfold_masks) # one element for each partition
2
>>> kfold_masks[0] # mask which splits data into first partition
array([False, False, False, ...,  True,  True,  True])
>>> data = API.data(dataset_inputs=ds_inputs, theoryid=53, use_cuts="nocuts")
>>> fold_data = data.load().get_cv()[kfold_masks[0]]
>>> len(fold_data)
604
>>> kfold_masks[0].sum()
604
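
How such a mask selects the points of one fold can be sketched without validphys. In the example below `build_kfold_masks`, the dataset names and their sizes are all hypothetical; in validphys the number of points per dataset comes from the loaded data rather than from a hand-written mapping.

```python
def build_kfold_masks(partitions, dataset_sizes):
    """Build one boolean mask per partition.

    ``dataset_sizes`` maps dataset name to its number of points, in the
    order in which the datasets appear in the full data vector.
    """
    masks = []
    for partition in partitions:
        in_fold = set(partition["datasets"])
        mask = []
        for name, npoints in dataset_sizes.items():
            # Every point of a dataset shares the same True/False value.
            mask.extend([name in in_fold] * npoints)
        masks.append(mask)
    return masks


# Made-up datasets with 3, 2 and 4 points respectively.
dataset_sizes = {"A": 3, "B": 2, "C": 4}
partitions = [{"datasets": ["A", "C"]}, {"datasets": ["B"]}]
masks = build_kfold_masks(partitions, dataset_sizes)

# Applying the first mask keeps only the points of datasets A and C.
values = list(range(9))
fold0 = [value for value, keep in zip(values, masks[0]) if keep]
```
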
validphys.n3fit_data.posdatasets_fitting_pos_dict(posdatasets=None)[source]

Loads all positivity datasets. It is not allowed to be empty.

Parameters:

posdatasets (list[validphys.core.PositivitySetSpec]) – list containing the settings for the positivity sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.

validphys.n3fit_data.pseudodata_table(groups_replicas_indexed_make_replica, replicas)[source]

Creates a pandas DataFrame containing the generated pseudodata. The index is validphys.results.experiments_index() and the columns are the replica numbers.

Notes

Whilst running n3fit, this action will only be called if fitting::savepseudodata is true (as per the default setting) and replicas are fitted one at a time. The table can be found in the replica folder i.e. <fit dir>/nnfit/replica_*/

validphys.n3fit_data.replica_luxseed(replica, luxseed)[source]

Generate the luxseed for a replica. Identical to replica_nnseed but used for a different purpose.

validphys.n3fit_data.replica_mcseed(replica, mcseed, genrep)[source]

Generates the mcseed for a replica.

validphys.n3fit_data.replica_nnseed(replica, nnseed)[source]

Generates the nnseed for a replica.

validphys.n3fit_data.replica_nnseed_fitting_data_dict(replica, exps_fitting_data_dict, replica_nnseed)[source]

For a single replica return a tuple of the inputs to this function. Used with collect over replicas to avoid having to perform multiple collects.

See also

replicas_nnseed_fitting_data_dict, over

validphys.n3fit_data.replica_training_mask(exps_tr_masks, replica, experiments_index)[source]

Save the boolean mask used to split data into training and validation for a given replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.

Parameters:
  • exps_tr_masks (list[list[np.array]]) – Result of tr_masks() collected over experiments, which creates the nested structure. The outer list is len(group_dataset_inputs_by_experiment) and the inner-most list has an array for each dataset in that particular experiment - as defined by the metadata. The arrays should be 1-D boolean arrays which can be used as masks.

  • replica (int) – The index of the replica.

  • experiments_index (pd.MultiIndex) – Index returned by validphys.results.experiments_index().

Example

>>> from validphys.api import API
>>> ds_inp = [
...     {'dataset': 'NMC', 'frac': 0.75},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD'], 'frac': 0.75},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10, 'frac': 0.75}
... ]
>>> API.replica_training_mask(dataset_inputs=ds_inp, replica=1, trvlseed=123, theoryid=162, use_cuts="nocuts", mcseed=None, genrep=False)
                     replica 1
group dataset    id
NMC   NMC        0        True
                1        True
                2       False
                3        True
                4        True
...                        ...
CMS   CMSZDIFF12 45       True
                46       True
                47       True
                48      False
                49       True

[345 rows x 1 columns]

validphys.n3fit_data.replica_training_mask_table(replica_training_mask)[source]

Same as replica_training_mask but with a table decorator.

validphys.n3fit_data.replica_trvlseed(replica, trvlseed, same_trvl_per_replica=False)[source]

Generates the trvlseed for a replica.

validphys.n3fit_data.tr_masks(data, replica_trvlseed, parallel_models=False, replica=1, replicas=(1,))[source]

Generate the boolean masks used to split data into training and validation points. Returns a list of 1-D boolean arrays, one for each dataset. Each array has length equal to N_data; the datapoints which will be included in the training are True, such that

tr_data = data[tr_mask]
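
A seeded training/validation split of this kind can be sketched with the standard library. This is an illustration under stated assumptions: `split_mask` is a hypothetical helper, the training size is taken as int(ndata * frac), and validphys uses its own random number handling and rounding, which are not reproduced here.

```python
import random


def split_mask(ndata, frac, seed):
    """Return a shuffled boolean list with int(ndata * frac) True entries."""
    rng = random.Random(seed)
    n_train = int(ndata * frac)
    mask = [True] * n_train + [False] * (ndata - n_train)
    rng.shuffle(mask)
    return mask


mask = split_mask(ndata=8, frac=0.75, seed=123)
data = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]

# True entries go to training, the remaining points to validation.
tr_data = [point for point, keep in zip(data, mask) if keep]
vl_data = [point for point, keep in zip(data, mask) if not keep]
```

Using the same seed again reproduces the same mask, which is what allows the training and validation data of a fit to be reconstructed later.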

validphys.n3fit_data.training_mask(replicas_training_mask)[source]

Save the boolean mask used to split data into training and validation for each replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.

Parameters:

replicas_exps_tr_masks (list[list[list[np.array]]]) – Result of replica_tr_masks() collected over replicas

Example

>>> from validphys.api import API
>>> from reportengine.namespaces import NSList
>>> # create namespace list for collects over replicas.
>>> reps = NSList(list(range(1, 4)), nskey="replica")
>>> ds_inp = [
...     {'dataset': 'NMC', 'frac': 0.75},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD'], 'frac': 0.75},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10, 'frac': 0.75}
... ]
>>> API.training_mask(dataset_inputs=ds_inp, replicas=reps, trvlseed=123, theoryid=162, use_cuts="nocuts", mcseed=None, genrep=False)
                    replica 1  replica 2  replica 3
group dataset    id
NMC   NMC        0        True      False      False
                1        True       True       True
                2       False       True       True
                3        True       True      False
                4        True       True       True
...                        ...        ...        ...
CMS   CMSZDIFF12 45       True       True       True
                46       True      False       True
                47       True       True       True
                48      False       True       True
                49       True       True       True

[345 rows x 3 columns]

validphys.n3fit_data.training_mask_table(training_mask)[source]

Same as training_mask but with a table decorator

validphys.n3fit_data.training_pseudodata(pseudodata_table, training_mask)[source]

Save the training data for the given replica. Deactivate by setting fitting::savepseudodata: False from within the fit runcard.

validphys.n3fit_data.validation_pseudodata(pseudodata_table, training_mask)[source]

Save the validation data for the given replica. Deactivate by setting fitting::savepseudodata: False from within the fit runcard.

validphys.n3fit_data_utils module

n3fit_data_utils.py

This module reads validphys validphys.core.DataSetSpec and extracts the relevant information into validphys.n3fit_data_utils.FittableDataSet

The validphys_group_extractor will loop over every dataset of a given group loading their fktables (and applying any necessary cuts).

class validphys.n3fit_data_utils.FittableDataSet(name: str, fktables_data: list, operation: str = 'NULL', frac: float = 1.0, training_mask: ndarray = None)[source]

Bases: object

Representation of the DataSet information necessary to run a fit

Parameters:
  • name (str) – name of the dataset

  • fktables_data (list(validphys.coredata.FKTableData)) – list of coredata fktable objects

  • operation (str) – operation to be applied to the fktables in the dataset, default “NULL”

  • frac (float) – fraction of the data to enter the training set

  • training_mask (bool) – training mask to apply to the fktable

fktables()[source]

Return the list of fktable tensors for the dataset

fktables_data: list
frac: float = 1.0
property hadronic

Returns true if this is a hadronic collision dataset

name: str
property ndata

Number of datapoints in the dataset

operation: str = 'NULL'
training_fktables()[source]

Return the fktable tensors for the training data

training_mask: ndarray = None
validation_fktables()[source]

Return the fktable tensors for the validation data

validphys.n3fit_data_utils.validphys_group_extractor(datasets, tr_masks)[source]

Receives a grouping spec from validphys (most likely an experiment) and loops over its content extracting and parsing all information required for the fit

Parameters:
  • datasets (list(validphys.core.DataSetSpec)) – List of dataset specs in this group

  • tr_masks (list(np.array)) – List of training masks to be set for each dataset

Returns:

loaded_obs

Return type:

list (validphys.n3fit_data_utils.FittableDataSet)

validphys.overfit_metric module

overfit_metric.py

This module contains the functions used to calculate the overfit metric and produce the corresponding tables and figures.

validphys.overfit_metric.array_expected_overfitting(calculate_chi2s_per_replica, replica_data, number_of_resamples=1000, resampling_fraction=0.95)[source]

Calculates the expected difference in chi2 between:

  1. The chi2 of a PDF replica calculated using the corresponding pseudodata replica used during the fit

  2. The chi2 of a PDF replica calculated using alternative i.i.d. random pseudodata replicas

The expected difference along with an error estimate is obtained through a bootstrapping consisting of number_of_resamples resamples per pdf replica where each resampling contains a fraction resampling_fraction of all replicas.

Parameters:
  • calculate_chi2s_per_replica (np.ndarray) – validation chi2 per pdf replica

  • replica_data (list(vp.fitdata.FitInfo))

  • number_of_resamples (int, optional) – number of resamples per pdf replica, by default 1000

  • resampling_fraction (float, optional) – fraction of replicas used in the bootstrap resampling, by default 0.95

Returns:

(number_of_resamples*Npdfs,) sized array containing the mean delta chi2 values per resampled list.

Return type:

np.ndarray
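
The bootstrap idea can be sketched on a small chi2 matrix whose entry [i][j] is the chi2 of PDF replica i against pseudodata replica j. This is a simplified illustration, not the validphys algorithm: `bootstrap_delta_chi2` is a hypothetical helper, subsets are drawn without replacement, the sign convention (own chi2 minus the mean over other pseudodata) is an assumption, and one value is returned per resample rather than the shape documented above.

```python
import random
from statistics import mean


def bootstrap_delta_chi2(
    chi2_matrix, number_of_resamples=1000, resampling_fraction=0.95, seed=0
):
    """Return the mean delta chi2 of each bootstrap resample."""
    rng = random.Random(seed)
    npdfs = len(chi2_matrix)
    # At least two replicas are needed to have "other" pseudodata.
    sample_size = max(2, int(npdfs * resampling_fraction))
    deltas = []
    for _ in range(number_of_resamples):
        subset = rng.sample(range(npdfs), sample_size)
        per_replica = []
        for i in subset:
            own = chi2_matrix[i][i]
            others = mean(chi2_matrix[i][j] for j in subset if j != i)
            per_replica.append(own - others)
        deltas.append(mean(per_replica))
    return deltas


# Toy matrix: chi2 of 1.0 against the fitted pseudodata, 2.0 otherwise.
chi2_matrix = [[1.0 if i == j else 2.0 for j in range(4)] for i in range(4)]
deltas = bootstrap_delta_chi2(
    chi2_matrix, number_of_resamples=10, resampling_fraction=0.75
)
central_value = mean(deltas)
```

The spread of `deltas` across resamples plays the role of the bootstrap error estimate.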

validphys.overfit_metric.calculate_chi2s_per_replica(pdf, fit_code_version, recreate_pdf_pseudodata_no_table, preds, dataset_inputs, groups_covmat_no_table)[source]

Calculates, for each PDF replica, the chi2 of the validation with the pseudodata generated for all other replicas in the fit

Parameters:
  • recreate_pdf_pseudodata_no_table (list[namedtuple]) – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.

  • preds (list[pd.core.frame.DataFrame]) – List of pandas dataframes, each containing the predictions of the pdf replicas for a dataset_input

  • dataset_inputs (list[DataSetInput])

  • groups_covmat_no_table (pd.core.frame.DataFrame)

Returns:

(Npdfs, Npdfs) sized matrix containing the chi2 of a pdf replica calculated to a given pseudodata replica. The diagonal values correspond to the cases where the PDF replica has been fitted to the corresponding pseudodata replica.

Return type:

np.ndarray
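
The layout of the returned matrix can be illustrated with a toy calculation. The sketch below is not the validphys code: it assumes uncorrelated uncertainties (a list sigma) instead of the full covariance matrix, and it normalises by the number of points, which is also an assumption. The helper name chi2_matrix is hypothetical.

```python
def chi2_matrix(predictions, pseudodata, sigma):
    """chi2 of each prediction vector against each pseudodata vector.

    Assumes uncorrelated uncertainties `sigma`; a full covariance
    matrix would replace the per-point division below.
    """
    matrix = []
    for pred in predictions:
        row = []
        for data in pseudodata:
            chi2 = sum(((p - d) / s) ** 2 for p, d, s in zip(pred, data, sigma))
            row.append(chi2 / len(sigma))  # normalised per data point (assumption)
        matrix.append(row)
    return matrix


# Two toy "replicas", each matching its own pseudodata exactly.
preds = [[1.0, 2.0], [1.5, 2.5]]
pseudo = [[1.0, 2.0], [1.5, 2.5]]
mat = chi2_matrix(preds, pseudo, sigma=[0.5, 0.5])
print(mat)  # rows: PDF replica, columns: pseudodata replica
```

Row i, column j holds the chi2 of replica i against pseudodata replica j, so the diagonal corresponds to the fitted pairs.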

validphys.overfit_metric.fit_overfitting_summary(fit, array_expected_overfitting)[source]

Creates a table containing the overfitting information:

  • mean chi2 difference

  • bootstrap error

  • sigmas away from 0
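
As a rough illustration of how these three quantities relate, the sketch below derives them from a list of resampled means. The function name, the dictionary keys, and the use of the population standard deviation as the bootstrap error are assumptions for the example; the actual table is built by validphys.

```python
import statistics


def overfitting_summary(resampled_means):
    """Summarise a bootstrap sample of mean chi2 differences (toy version)."""
    mean = statistics.fmean(resampled_means)
    error = statistics.pstdev(resampled_means)  # assumed error estimator
    return {
        "mean chi2 difference": mean,
        "bootstrap error": error,
        "sigmas away from 0": mean / error,
    }


summary = overfitting_summary([-0.2, -0.1, 0.0])
print(summary)
```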

validphys.overfit_metric.plot_overfitting_histogram(fit, array_expected_overfitting)[source]

Plots the bootstrap error and central value of the overfittedness in a histogram.

validphys.overfit_metric.summarise_overfitting(fits_overfitting_summary)[source]

Same as fit_overfitting_summary, but collected over all fits in the runcard and put in a single table.

validphys.pdfbases module

pdfbases.py

This holds the concrete labels data relative to the PDF bases, as declaratively as possible.

class validphys.pdfbases.Basis(labels, *, aliases=None, default_elements=None, element_representations=None)[source]

Bases: ABC

A Basis maps a set of PDF flavours (typically as given by LHAPDF) to functions thereof. This abstract class provides functionalities to manage labels (used for plotting) and defaults, while the concrete implementation of the transformations is handled by the subclasses (by implementing the validphys.pdfbases.Basis.apply_grid_values() method). The high level validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() methods then provide convenient functionality to work with transformations.

labels

A list of strings representing the labels of each possible transformation, in order.

Type:

list

aliases

A mapping from strings to labels appearing in labels, specifying equivalent ways to enter elements in the user interface.

Type:

dict, optional

default_elements

A list of the labels to be computed by default when no subset of elements is specified. If not given it is assumed to be the same as labels.

Type:

list, optional

element_representations

A mapping from strings to labels indicating the preferred string representation of the provided elements (to be used in plotting). If this parameter is not given or the element is not in the mapping, the label itself is used. It may be convenient to set this when heavy use of LaTeX is desired.

Type:

dict, optional

abstract apply_grid_values(func, vmat, xmat, qmat)[source]

Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.

It should return an array indexed as

grid_values[N][flavour][x][Q]

Parameters:
  • func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.

  • vmat (iterable) – A list of flavour aliases valid for the basis.

  • xmat (iterable) – A list of x values

  • qmat (iterable) – A list of values in Q, expressed in GeV.

central_grid_values(pdf, vmat, xmat, qmat)[source]

Same as Basis.grid_values() but returning information on the central member of the PDF set.

elementlabel(element)[source]

Return the printable representation of a given element of this basis.

grid_values(pdf, vmat, xmat, qmat)[source]

Like validphys.gridvalues.grid_values(), but taking and returning vmat in terms of the vectors in this basis.

Parameters:
  • pdf (PDF) – Any PDF set

  • vmat (iterable) – A list of flavour aliases valid for the basis.

  • xmat (iterable) – A list of x values

  • qmat (iterable) – A list of values in Q, expressed in GeV.

Returns:

grid – A 4-dimension array with the PDF values at the input parameters for each replica. The return value is indexed as follows:

grid_values[replica][flavour][x][Q]

Return type:

np.ndarray

Examples

Compute the median ratio over replicas between singlet and gluon for a fixed point in x and a range of values in Q:

>>> import numpy as np
>>> from validphys.loader import Loader
>>> from validphys.pdfbases import evolution
>>> gv = evolution.grid_values(Loader().check_pdf("NNPDF31_nnlo_as_0118"), ["singlet", "gluon"], [0.01], [2,20,200])
>>> np.median(gv[:,0,...]/gv[:,1,...], axis=0)
array([[0.56694959, 0.53782002, 0.60348812]])
has_element(element)[source]

Return true if basis has knowledge of the given element

to_known_elements(vmat)[source]

Transform the list of aliases into an array of known labels. Raise UnknownElement on failure.
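
The interplay between labels, aliases and the failure mode can be sketched as follows. This is a minimal stand-alone illustration, not the validphys implementation: UnknownElementError and to_known_labels are hypothetical names, and the labels and aliases are toy values.

```python
class UnknownElementError(KeyError):
    """Hypothetical stand-in for validphys.pdfbases.UnknownElement."""


def to_known_labels(vmat, labels, aliases):
    """Map each entry of vmat to a known label, resolving aliases."""
    known = []
    for element in vmat:
        if element in labels:
            known.append(element)
        elif element in aliases:
            known.append(aliases[element])
        else:
            raise UnknownElementError(element)
    return known


labels = ["singlet", "gluon"]
aliases = {"sigma": "singlet", "g": "gluon"}
resolved = to_known_labels(["sigma", "gluon"], labels, aliases)
print(resolved)
```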

class validphys.pdfbases.LinearBasis(labels, from_flavour_mat, *args, **kwargs)[source]

Bases: Basis

A basis that implements a linear transformation of flavours.

from_flavour_mat

A matrix that rotates the flavour basis into this basis.

Type:

np.ndarray

apply_grid_values(func, vmat, xmat, qmat)[source]

Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.

It should return an array indexed as

grid_values[N][flavour][x][Q]

Parameters:
  • func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.

  • vmat (iterable) – A list of flavour aliases valid for the basis.

  • xmat (iterable) – A list of x values

  • qmat (iterable) – A list of values in Q, expressed in GeV.

classmethod from_mapping(mapping, *, aliases=None, default_elements=None)[source]

Construct a basis from a mapping of the form {label:{pdf_flavour:coefficient}}.
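
To make the mapping format concrete, the sketch below turns such a mapping into a matrix with one row per label and one column per flavour. It is only an illustration: the flavour list is a toy subset, the helper matrix_from_mapping is hypothetical, and the real classmethod may order and store things differently.

```python
FLAVOURS = ["ubar", "dbar", "g", "d", "u"]  # toy subset, assumed ordering


def matrix_from_mapping(mapping, flavours=FLAVOURS):
    """Build (labels, matrix) from {label: {pdf_flavour: coefficient}}."""
    labels = list(mapping)
    matrix = [
        [mapping[label].get(flavour, 0.0) for flavour in flavours]
        for label in labels
    ]
    return labels, matrix


# Valence-like combinations as an example mapping.
mapping = {
    "u_v": {"u": 1, "ubar": -1},
    "d_v": {"d": 1, "dbar": -1},
}
labels, mat = matrix_from_mapping(mapping)
print(labels)
print(mat)
```

Flavours missing from a label's inner dictionary get a zero coefficient in this sketch.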

class validphys.pdfbases.ScalarFunctionTransformation(transform_func, *args, **kwargs)[source]

Bases: Basis

A basis that transforms the flavour basis into a single element given by transform_func.

Optional keyword arguments are passed to the constructor of validphys.pdfbases.Basis.

transform_func

A callable with the signature transform_func(func, xmat, qmat) that fills the grid in \(x\) and \(Q\) using func and returns a grid with a single basis element.

Type:

callable

apply_grid_values(func, vmat, xmat, qmat)[source]

Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.

It should return an array indexed as

grid_values[N][flavour][x][Q]

Parameters:
  • func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.

  • vmat (iterable) – A list of flavour aliases valid for the basis.

  • xmat (iterable) – A list of x values

  • qmat (iterable) – A list of values in Q, expressed in GeV.

exception validphys.pdfbases.UnknownElement[source]

Bases: KeyError

validphys.pdfbases.check_basis(basis, flavours)[source]

Check to verify a given basis and set of flavours. Returns a dictionary with the relevant instance of the basis class and flavour specification

validphys.pdfbases.fitbasis_to_NN31IC(flav_info, fitbasis)[source]

Return a rotation matrix R_{ij} which takes from one of the possible fitting bases (evolution, NN31IC, FLAVOUR) to the NN31IC basis, (sigma, g, v, v3, v8, t3, t8, cp), corresponding to the one used in NNPDF31. Denoting the rotation matrix as R_{ij}, i is the flavour index and j is the evolution index. The evolution basis (NN31IC) is defined as

cp = c + cbar = 2c

and

sigma = u + ubar + d + dbar + s + sbar + cp
v = u - ubar + d - dbar + s - sbar + c - cbar
v3 = u - ubar - d + dbar
v8 = u - ubar + d - dbar - 2*s + 2*sbar
t3 = u + ubar - d - dbar
t8 = u + ubar + d + dbar - 2*s - 2*sbar

If the input is already in the evolution basis it returns the identity.

Parameters:
  • flav_info (dict) – dictionary containing the information about each PDF (basis dictionary in the runcard)

  • fitbasis (str) – name of the fitting basis

Returns:

mat.transpose() – matrix performing the change of basis from fitbasis to NN31IC

Return type:

numpy matrix
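
The combinations defining the NN31IC basis can be checked numerically with a small sketch. It evaluates the definitions given in the docstring at a single point for toy flavour values; the function name is hypothetical, the gluon is assumed to pass through unchanged, and no rotation matrix is built here.

```python
def nn31ic_combinations(f):
    """Evaluate the NN31IC combinations from a dict of flavour values."""
    cp = f["c"] + f["cbar"]
    return {
        "sigma": f["u"] + f["ubar"] + f["d"] + f["dbar"] + f["s"] + f["sbar"] + cp,
        "g": f["g"],  # assumed to be unchanged by the rotation
        "v": f["u"] - f["ubar"] + f["d"] - f["dbar"] + f["s"] - f["sbar"] + f["c"] - f["cbar"],
        "v3": f["u"] - f["ubar"] - f["d"] + f["dbar"],
        "v8": f["u"] - f["ubar"] + f["d"] - f["dbar"] - 2 * f["s"] + 2 * f["sbar"],
        "t3": f["u"] + f["ubar"] - f["d"] - f["dbar"],
        "t8": f["u"] + f["ubar"] + f["d"] + f["dbar"] - 2 * f["s"] - 2 * f["sbar"],
        "cp": cp,
    }


# Toy values with c == cbar, so that cp = 2c.
flavours = {"u": 4.0, "ubar": 1.0, "d": 2.0, "dbar": 1.0,
            "s": 0.5, "sbar": 0.5, "c": 0.25, "cbar": 0.25, "g": 8.0}
evol = nn31ic_combinations(flavours)
print(evol)
```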

validphys.pdfbases.list_bases()[source]

List available PDF bases

validphys.pdfbases.parse_flarr(flarr)[source]

Parse a free form list into a list of PDG parton indexes (that may contain indexes or values from PDF_ALIASES)

validphys.pdfbases.pdg_id_to_canonical_index(flindex)[source]

Given an LHAPDF id, return its index in the ALL_FLAVOURS list.

validphys.pdfbases.scalar_function_transformation(label, *args, **kwargs)[source]

Convenience decorator factory to produce a validphys.pdfbases.ScalarFunctionTransformation basis from a function.

Parameters:

label (str) – The single label of the element produced by the function transformation.

Notes

Optional keyword arguments are passed to the constructor of validphys.pdfbases.ScalarFunctionTransformation.

Returns:

decorator – A decorator that can be applied to a suitable transformation function.

Return type:

callable
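
The decorator-factory pattern described here can be sketched generically. The classes and names below are hypothetical stand-ins and do not reproduce the validphys classes or the transform_func(func, xmat, qmat) signature; they only show how a label and optional keyword arguments are captured first and the function is wrapped afterwards.

```python
class ScalarTransformation:
    """Hypothetical minimal stand-in, not the validphys class."""

    def __init__(self, transform_func, label, element_representations=None):
        self.transform_func = transform_func
        self.labels = [label]  # a single basis element
        self.element_representations = element_representations or {}

    def __call__(self, *args, **kwargs):
        return self.transform_func(*args, **kwargs)


def scalar_transformation(label, **kwargs):
    """Return a decorator that wraps a function in ScalarTransformation."""
    def decorator(func):
        return ScalarTransformation(func, label, **kwargs)
    return decorator


@scalar_transformation("double", element_representations={"double": "2f"})
def double(value):
    # Toy transformation used only to exercise the decorator.
    return 2 * value


print(double.labels, double(3))
```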

validphys.pdfgrids module

High level providers for PDF and luminosity grids, formatted in such a way to facilitate plotting and analysis.

class validphys.pdfgrids.KineticXPlottingGrid(Q: float, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>), xgrid: ~numpy.ndarray, grid_values: ~validphys.core.Stats, scale: str, derivative_degree: int = 0)[source]

Bases: XPlottingGrid

Kinetic Energy version of the XPlottingGrid

derivative()[source]

Return the derivative of the grid with respect to dlogx. A call to this function will return a new XPlottingGrid instance with the derivative as grid values and with an increased derivative_degree.

process_label(base_label)[source]

Wraps the base_label inside the kinetic energy formula

class validphys.pdfgrids.Lumi1dGrid(m, grid_values)

Bases: tuple

grid_values

Alias for field number 1

m

Alias for field number 0

class validphys.pdfgrids.Lumi2dGrid(y, m, grid_values)

Bases: tuple

grid_values

Alias for field number 2

m

Alias for field number 1

y

Alias for field number 0

class validphys.pdfgrids.XPlottingGrid(Q: float, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>), xgrid: ~numpy.ndarray, grid_values: ~validphys.core.Stats, scale: str, derivative_degree: int = 0)[source]

Bases: object

DataClass holding the value of the PDF at the specified values of x, Q and flavour. The grid_values attribute corresponds to a Stats instance in order to compute statistical estimators in a sensible manner.

Q: float
basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>)
copy_grid(grid_values)[source]

Create a copy of the grid with potentially a different set of values

derivative()[source]

Return the derivative of the grid with respect to dlogx. A call to this function will return a new XPlottingGrid instance with the derivative as grid values and with an increased derivative_degree.

derivative_degree: int = 0
flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>)
grid_values: Stats
process_label(base_label)[source]

Process the base_label used for plotting. For instance, for derivatives it will add d/dlogx to the base_label.

scale: str
select_flavour(flindex)[source]

Return a new grid for one single flavour

xgrid: ndarray
validphys.pdfgrids.boundary_xplotting_grid(unpolarized_bc: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

A wrapper around xplotting_grid that computes the grid for unpolarized_bcs instead.

validphys.pdfgrids.distance_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]

Return an object containing the value of the distance PDF at the specified values of x and flavour.

The parameter normalize_to identifies the reference PDF set with respect to which the distance is computed.

This method returns distance grids where the relative distance between both PDF sets is computed. At least one grid will be identical to zero.

validphys.pdfgrids.kinetic_xplotting_grid(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]

Returns an object containing the value of the kinetic energy of the PDF at the specified values of x and flavour for a given Q. Utilizes xplotting_grid. The kinetic energy of the PDF is defined as:

\[k = \sqrt{1 + (d/dlogx f)^2}\]
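
The formula can be evaluated numerically on a grid in x. The sketch below is only an illustration under stated assumptions: it uses simple finite differences in log x (central in the interior, one-sided at the edges), whereas validphys obtains the derivative through its own grid machinery. The function name kinetic_energy is hypothetical.

```python
import math


def kinetic_energy(xgrid, values):
    """k = sqrt(1 + (df/dlogx)^2), with finite differences in log x."""
    logx = [math.log(x) for x in xgrid]
    last = len(xgrid) - 1
    result = []
    for i in range(len(xgrid)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        derivative = (values[hi] - values[lo]) / (logx[hi] - logx[lo])
        result.append(math.sqrt(1.0 + derivative ** 2))
    return result


xs = [10 ** (-4 + 0.5 * i) for i in range(9)]  # 1e-4 ... 1
flat = kinetic_energy(xs, [1.0] * len(xs))               # constant f: k = 1
linear = kinetic_energy(xs, [math.log(x) for x in xs])   # f = log x: k = sqrt(2)
print(flat[0], linear[0])
```

A function that is flat in log x gives k = 1 everywhere, and larger values of k indicate faster variation.
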
validphys.pdfgrids.lumigrid1d(pdf: ~validphys.core.PDF, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'NoneType'>, <class 'numbers.Real'>) = None, nbins_m: int = 50, mxmin: ~numbers.Real = 10, mxmax: (<class 'NoneType'>, <class 'numbers.Real'>) = None, scale='log')[source]

Return the integrated luminosity in a grid of nbins_m points in invariant mass, for a given (proton-proton) collider energy sqrts (given in GeV). A rapidity cut on the integration range (if specified) is taken into account.

By default, the grid is sampled logarithmically in mass. The limits are given by mxmin and mxmax, given in GeV. By default mxmin is 10 GeV and mxmax is set based on sqrts.

The results are computed for all relevant PDF members and wrapped in a stats class, to compute statistics regardless of the error_type.

validphys.pdfgrids.lumigrid2d(pdf: PDF, lumi_channel, sqrts: Real, y_lim: Real = 5, nbins_m: int = 100, nbins_y: int = 50)[source]

Return the differential luminosity in a grid of (nbins_m x nbins_y) points, for the allowed values of invariant mass and rapidity for given (proton-proton) collider energy sqrts (given in GeV). y_lim specifies the maximum rapidity.

The grid is sampled linearly in rapidity and logarithmically in mass.

The results are computed for all relevant PDF members and wrapped in a stats class, to compute statistics regardless of the error_type.

validphys.pdfgrids.pull_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]

Return an object containing the value of the pull between the two PDFs at the specified values of x and flavour. The parameter normalize_to identifies the reference PDF set with respect to which the pull is computed. This method returns pull grids where the relative pull between both PDF sets, defined as the distance in units of the standard deviation of the reference PDF, is computed. At least one grid will be identical to zero.
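
For a single grid point, the idea of a pull measured in standard deviations of the reference set can be sketched as follows. This is a toy illustration of the description above, not the validphys formula: the function name, the use of the replica mean as central value, and the population standard deviation are assumptions.

```python
import statistics


def pull(reference_replicas, other_replicas):
    """Distance between central values in units of the reference std."""
    ref_mean = statistics.fmean(reference_replicas)
    ref_std = statistics.pstdev(reference_replicas)  # assumed estimator
    return (statistics.fmean(other_replicas) - ref_mean) / ref_std


reference = [1.0, 2.0, 3.0]
other = [3.0, 4.0, 5.0]
print(pull(reference, reference))  # the reference against itself
print(pull(reference, other))
```

The pull of the reference set against itself vanishes, which matches the remark that at least one grid is identically zero.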

validphys.pdfgrids.variance_distance_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]

Return an object containing the value of the variance distance PDF at the specified values of x and flavour.

The parameter normalize_to identifies the reference PDF set with respect to which the distance is computed.

This method returns distance grids where the relative distance between both PDF sets is computed. At least one grid will be identical to zero.

validphys.pdfgrids.xgrid(xmin: Real = 1e-05, xmax: Real = 1, scale: str = 'log', npoints: int = 200)[source]

Return a tuple (scale, array) where scale is the input scale (“linear” or “log”) and array is generated from the input parameters and distributed according to scale.
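
The (scale, array) return convention can be illustrated with a plain-Python sketch. The helper make_xgrid is hypothetical and uses evenly spaced points (in x or in log x) with both endpoints included; the exact spacing and endpoint handling of the validphys function may differ.

```python
def make_xgrid(xmin=1e-5, xmax=1.0, scale="log", npoints=200):
    """Return (scale, points) with points spaced according to scale."""
    if scale == "log":
        ratio = xmax / xmin
        points = [xmin * ratio ** (i / (npoints - 1)) for i in range(npoints)]
    elif scale == "linear":
        step = (xmax - xmin) / (npoints - 1)
        points = [xmin + i * step for i in range(npoints)]
    else:
        raise ValueError(f"Unknown scale: {scale!r}")
    return scale, points


scale, xs = make_xgrid(npoints=6)
print(scale, xs)
```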

validphys.pdfgrids.xplotting_grid(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None, derivative: int = 0)[source]

Return an object containing the value of the PDF at the specified values of x and flavour.

basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.

flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.

Q: The PDF scale in GeV.

derivative (int): how many derivatives of the PDF should be taken (default=0)

validphys.pdfoutput module

pdfoutput.py

reportengine helpers to enable outputting PDFs.

This module provides one decorator, pdfset that is used to mark a provider as generating a PDF set. The providers must take a set_name and an output_path argument. set_name will be required to be a unique string that does not correspond to any installed LHAPDF grid, and output_path will be modified to actually correspond to <output>/pdfsets. Within reportengine, the return value of the sets marked with @pdfset will be discarded, and the relative path to the output folder will be used instead. This can be used to formulate links within the report.

validphys.pdfoutput.pdfset(f)[source]

Mark the function as returning a PDF set. Make sure that providers marked with this decorator take set_name and output_path as arguments.

validphys.pdfplots module

pdfplots.py

Plots of quantities that are mostly functions of the PDFs only.

class validphys.pdfplots.AllFlavoursPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

Auxiliary class which groups multiple PDF flavours in one plot.

get_ylabel(parton_name)[source]
setup_flavour(flstate)[source]
class validphys.pdfplots.BandPDFPlotter(*args, pdfs_noband=None, show_mc_errors=True, legend_stat_labels=True, **kwargs)[source]

Bases: PDFPlotter

draw(pdf, grid, flstate)[source]

Plot the desired function of the grid and return the array to be used for autoscaling

legend(flstate)[source]
setup_flavour(flstate)[source]
class validphys.pdfplots.BandPDFPlotterBC(*args, unpolarized_bcs, boundary_xplotting_grids, **kwargs)[source]

Bases: BandPDFPlotter

draw(pdf, grid, flstate)[source]

Plot the desired function of the grid and return the array to be used for autoscaling

class validphys.pdfplots.DistancePDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

Auxiliary class which draws the distance plots.

draw(pdf, grid, flstate)[source]

Plot the desired function of the grid and return the array to be used for autoscaling

get_title(parton_name)[source]
get_ylabel(parton_name)[source]
normalize()[source]
class validphys.pdfplots.FlavourState[source]

Bases: SimpleNamespace

This is the namespace for the parts specific to each flavour.

class validphys.pdfplots.FlavoursDistancePlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: DistancePDFPlotter, AllFlavoursPlotter

class validphys.pdfplots.FlavoursPlotter(*args, pdfs_noband=None, show_mc_errors=True, legend_stat_labels=True, **kwargs)[source]

Bases: AllFlavoursPlotter, BandPDFPlotter

get_title(parton_name)[source]
class validphys.pdfplots.FlavoursVarDistancePlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: VarDistancePDFPlotter, AllFlavoursPlotter

class validphys.pdfplots.MixBandPDFPlotter(*args, mixband_as_replicas, **kwargs)[source]

Bases: BandPDFPlotter

Special wrapper class to plot, in the same figure, PDF bands and PDF replicas depending on the type of PDF. Practical use: plot together the PDF central values with the NNPDF bands

draw(pdf, grid, flstate)[source]

Plot the desired function of the grid and return the array to be used for autoscaling

class validphys.pdfplots.PDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: object

Stateful object that breaks plotting grids by flavour, as a function of x and for fixed Q.

This class has a lot of state, but it should all be defined at initialization time. Things that change e.g. per flavour should be passed explicitly as arguments.

property Q
abstract draw(pdf, grid, flstate)[source]

Plot the desired function of the grid and return the array to be used for autoscaling

property firstgrid
get_title(parton_name)[source]
get_ylabel(parton_name)[source]
legend(flstate)[source]
normalize()[source]
property normalize_pdf
setup_flavour(flstate)[source]
property xscale
class validphys.pdfplots.PullPDFPlotter(pdfs_list, pull_grids_list, xscale, normalize_to, ymin, ymax)[source]

Bases: object

Auxiliary class which groups multiple pulls in one plot.

pdfs_list is a list of dictionaries, each containing the two PDFs to be used for the pull. pull_grids_list is the list of the pull computed for the PDF pairs described by pdfs_list.

property Q
draw(pdfs, grid, flstate)[source]
get_title(flstate)[source]
get_ylabel()[source]
legend(flstate)[source]
plot_call()[source]
property xscale
class validphys.pdfplots.ReplicaPDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

draw(pdf, grid, flstate)[source]

Plot the desired function of the grid and return the array to be used for autoscaling

class validphys.pdfplots.UncertaintyPDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: PDFPlotter

draw(pdf, grid, flstate)[source]

Plot the desired function of the grid and return the array to be used for autoscaling

get_ylabel(parton_name)[source]
class validphys.pdfplots.VarDistancePDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]

Bases: DistancePDFPlotter

Auxiliary class which draws the variance distance plots

get_title(parton_name)[source]
get_ylabel(parton_name)[source]
validphys.pdfplots.plot_flavours(pdf, xplotting_grid, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]

Plot the absolute central value and the uncertainty of all the flavours of a pdf as a function of x for a given value of Q.

xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.

validphys.pdfplots.plot_lumi1d(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, show_mc_errors: bool = True, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, pdfs_noband=None, scale='log', legend_stat_labels: bool = True)[source]

Plot PDF luminosities at a given center of mass energy. sqrts is the center of mass energy (GeV).

This action plots the luminosity (as computed by lumigrid1d) as a function of invariant mass for all PDFs for a single lumi channel. normalize_to works as for plot_pdfs and allows plotting a ratio to the central value of some of the PDFs. ymin and ymax can be used to set exact bounds for the scale. y_cut can be used to specify a rapidity cut over the integration range. show_mc_errors controls whether the 1σ error bands are shown in addition to the 68% confidence intervals for Monte Carlo PDFs. A list pdfs_noband can be passed to suppress the error bands for certain PDFs and plot the central values only. legend_stat_labels controls whether to show detailed information on what kind of confidence interval is being plotted in the legend labels.

validphys.pdfplots.plot_lumi1d_replicas(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, scale='log')[source]

This function is similar to plot_lumi1d, but instead of plotting the standard deviation and 68% c.i. it plots the luminosities for individual replicas.

Plot PDF replica luminosities at a given center of mass energy. sqrts is the center of mass energy (GeV).

This action plots the luminosity (as computed by lumigrid1d) as a function of invariant mass for all PDFs for a single lumi channel. normalize_to works as for plot_pdfs and allows plotting a ratio to the central value of some of the PDFs. ymin and ymax can be used to set exact bounds for the scale. y_cut can be used to specify a rapidity cut over the integration range. show_mc_errors controls whether the 1σ error bands are shown in addition to the 68% confidence intervals for Monte Carlo PDFs.

validphys.pdfplots.plot_lumi1d_uncertainties(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, scale='log')[source]

Plot PDF luminosity uncertainties at a given center of mass energy. sqrts is the center of mass energy (GeV).

If normalize_to is set, the values are normalized to the central value of the corresponding PDFs. y_cut can be used to specify a rapidity cut over the integration range.

validphys.pdfplots.plot_lumi2d(pdf, lumi_channel, lumigrid2d, sqrts, display_negative: bool = True)[source]

Plot the absolute luminosity on a grid of invariant mass and rapidity for a given center of mass energy sqrts. The color scale is logarithmic. If display_negative is True, mark the negative values.

The luminosity is calculated for positive rapidity, and reflected for negative rapidity for display purposes.

validphys.pdfplots.plot_lumi2d_uncertainty(pdf, lumi_channel, lumigrid2d, sqrts: Real)[source]

Plot the 2D luminosity uncertainty at a given center of mass energy. Code ported from https://github.com/scarrazza/lumi2d.

The luminosity is calculated for positive rapidity, and reflected for negative rapidity for display purposes.

validphys.pdfplots.plot_pdf_pulls(pdfs_list, pull_grids_list, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]

Plot the PDF standard deviations as a function of x. If normalize_to is set, the ratio to that PDF’s central value is plotted. Otherwise the absolute values are plotted.

validphys.pdfplots.plot_pdf_uncertainties(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]

Plot the PDF standard deviations as a function of x. If normalize_to is set, the ratio to that PDF’s central value is plotted. Otherwise the absolute values are plotted.

validphys.pdfplots.plot_pdfdistances(pdfs, distance_grids, *, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>), ymin=None, ymax=None)[source]

Plots the distances between different PDF sets and a reference PDF set for all flavours. Distances are normalized such that a value of order 10 is unlikely to be explained by purely statistical fluctuations

validphys.pdfplots.plot_pdfreplicas(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]

Plot the replicas of the specified PDFs. Otherwise it works the same as plot_pdfs.

  • xscale sets the scale of the plot, e.g. ‘linear’ or ‘log’. The default is deduced from the xplotting_grid, which in turn is ‘log’ by default.

  • normalize_to should be a pdf id or an index of the pdf (starting from one).

validphys.pdfplots.plot_pdfreplicas_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]

Plot the kinetic energy of the replicas of the specified PDFs. Otherwise it works the same as plot_pdfs_kinetic_energy.

validphys.pdfplots.plot_pdfs(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]

Plot the central value and the uncertainty of a list of pdfs as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding PDF. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.

normalize_to: Either the name of one of the PDFs or its corresponding index in the list, starting from one, or None to plot absolute values.

xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.

pdfs_noband: A list of PDFs to plot without error bands, i.e. only the central values of these PDFs will be plotted. The list can be formed of strings, corresponding to PDF IDs, integers (starting from one), corresponding to the index of the PDF in the list of PDFs, or a mixture of both.

show_mc_errors (bool): Plot 1σ bands in addition to 68% errors for Monte Carlo PDF.

legend_stat_labels (bool): Show detailed information on what kind of confidence interval is being plotted in the legend labels.
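
The convention shared by normalize_to and pdfs_noband (PDF IDs, one-based integer indices, or a mixture) can be sketched as a small resolver. The helper resolve_pdf_indices and the second PDF name are hypothetical, and the code only illustrates the indexing convention, not how validphys parses these options.

```python
def resolve_pdf_indices(spec, pdf_names):
    """Turn a mixed list of PDF IDs and one-based indices into zero-based indices."""
    indices = []
    for item in spec:
        if isinstance(item, int):
            if not 1 <= item <= len(pdf_names):
                raise ValueError(f"Index {item} out of range (indices start from one)")
            indices.append(item - 1)
        else:
            indices.append(pdf_names.index(item))  # raises ValueError if unknown
    return indices


names = ["NNPDF31_nnlo_as_0118", "another_pdf_set"]  # second name is a placeholder
noband = resolve_pdf_indices([1, "another_pdf_set"], names)
print(noband)
```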

validphys.pdfplots.plot_pdfs_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]

Band plotting of the “kinetic energy” of the PDF as a function of x for a given value of Q. The input of this function is similar to those of plot_pdfs.

validphys.pdfplots.plot_pdfs_mixed(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True, mixband_as_replicas: (<class 'list'>, <class 'NoneType'>) = None)[source]

This function is similar to plot_pdfs, except instead of only plotting the central value and the uncertainty of the PDFs, those PDFs indicated by mixband_as_replicas will be plotted as replicas without the central value.

Inputs are the same as plot_pdfs, with the exception of mixband_as_replicas, which only exists here.

mixband_as_replicas: A list of PDFs to plot as replicas, i.e. the central values and replicas of these PDFs will be plotted. The list can be formed of strings, corresponding to PDF IDs, integers (starting from one), corresponding to the index of the PDF in the list of PDFs, or a mixture of both.

validphys.pdfplots.plot_pdfs_mixed_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True, mixband_as_replicas: (<class 'list'>, <class 'NoneType'>) = None)[source]

Mixed band and replica plotting of the “kinetic energy” of the PDF as a function of x for a given value of Q.

validphys.pdfplots.plot_pdfvardistances(pdfs, variance_distance_grids, *, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>), ymin=None, ymax=None)[source]

Plots the distances between different PDF sets and a reference PDF set for all flavours. Distances are normalized such that a value of order 10 is unlikely to be explained by purely statistical fluctuations.

validphys.pdfplots.plot_polarized_boundaries(pdfs, xplotting_grids, unpolarized_bcs, boundary_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]

Provides exactly the same functionality as plot_pdfs, but for a list of polarized PDF sets. In addition, it plots the unpolarized PDF set used as a boundary condition.

validphys.pineparser module

Loader for the pineappl-based FKTables

The FKTables for pineappl have the pineappl.lz4 extension and can be used directly with the pineappl CLI, as well as read with pineappl.fk_table.

exception validphys.pineparser.GridFileNotFound[source]

Bases: FileNotFoundError

PineAPPL file for FK table not found.

validphys.pineparser.get_yaml_information(yaml_file, theorypath)[source]

Reads the yaml information from a yaml compound file

Transitional function: the call to “pineko” might be to some other commondata reader that will know how to extract the information from the commondata

validphys.pineparser.pineappl_reader(fkspec)[source]

Receives an fkspec, which contains the path to the fktables that are to be read by pineappl, as well as metadata that fixes things like conversion factors or the apfelcomb flag. The fkspec also contains the cfactors, which are applied _directly_ to each of the fktables.

The output of this function is an instance of FKTableData which can be generated from reading several FKTable files which get concatenated on the ndata (bin) axis.

For more information on the reading of pineappl tables:

https://pineappl.readthedocs.io/en/latest/modules/pineappl/pineappl.html#pineappl.pineappl.PyFkTable

About the reader:
Each pineappl table is a 4-dimensional grid with:

(ndata, active channels, x1, x2)

For DIS grids x2 will contain one single number. The luminosity channels are given in a (flav1, flav2) format and thus need to be converted to the 1-D index of a (14x14) luminosity tensor in order to be put in the form of a dataframe.
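The conversion from a pair of flavours to a single index can be sketched as follows. This assumes a row-major flattening of the 14x14 tensor and that each flavour has already been translated to its position (0 to 13) in the flavour basis; both are assumptions made for illustration, and the function names are hypothetical.

```python
N_FLAVOURS = 14  # size of each axis of the luminosity tensor


def lumi_to_flat_index(i, j, n=N_FLAVOURS):
    """Map basis positions (i, j) to a row-major index of an n x n tensor."""
    if not (0 <= i < n and 0 <= j < n):
        raise ValueError("flavour positions must lie in [0, n)")
    return i * n + j


def flat_index_to_lumi(k, n=N_FLAVOURS):
    """Inverse of lumi_to_flat_index."""
    return divmod(k, n)


print(lumi_to_flat_index(1, 2))
print(flat_index_to_lumi(16))
```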

All grids in pineappl are constructed with the exact same xgrid. The active channels can vary, so when grids are concatenated for an observable the gaps are filled with 0s.

The pineappl grids are such that obs = sum_{bins} fk * f (*f) * bin_w so in order to use them together with old-style grids (obs = sum_{bins} fk * xf (*xf)) it is necessary to remove the factor of x and the normalization of the bins.

About apfelcomb flags in yamldb files:

Old commondata files and old grids have been through various iterations over time while remaining compatible with each other, and fixes and hacks have been incorporated in one or the other. For the new theory to be compatible with old commondata, it is necessary to keep track of said hacks (and to apply conversion factors when required).

NOTE: both conversion factors and apfelcomb flags will be eventually removed.

Returns:

an FKTableData object containing all necessary information to compute predictions

Return type:

validphys.coredata.FKTableData

validphys.pineparser.pineko_yaml(yaml_file, grids_folder)[source]

Given a yaml_file, returns the corresponding dictionary and grids.

The dictionary contains all information and we return an extra field with all the grids to be loaded for the given dataset.

Parameters:
  • yaml_file (pathlib.Path) – path of the yaml file for the given dataset

  • grids_folder (pathlib.Path) – path of the grids folder

  • check_grid_existence (bool) – if True (default) checks whether the grid exists

Returns:

  • yaml_content (dict) – Metadata prepared for the FKTables

  • paths (list(list(path))) – List (of lists) with all the grids that will need to be loaded
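The shape of the returned paths (a list of lists) can be illustrated with a small sketch. The "operands" key, the helper name and the grid names are assumptions made for this example only; the file extension is the one quoted in the module description.

```python
from pathlib import Path

EXT = "pineappl.lz4"  # extension quoted in the module description


def grid_paths_sketch(yaml_content, grids_folder):
    """Build one list of grid paths per operand.

    Hypothetical helper: assumes the metadata stores grid names
    under an "operands" key as a list of lists of strings.
    """
    grids_folder = Path(grids_folder)
    return [
        [grids_folder / f"{name}.{EXT}" for name in operand]
        for operand in yaml_content["operands"]
    ]


metadata = {"operands": [["grid_a", "grid_b"], ["grid_c"]]}
for operand_paths in grid_paths_sketch(metadata, "fastkernel"):
    print([p.name for p in operand_paths])
```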

validphys.plotutils module

Basic utilities for plotting functions.

class validphys.plotutils.ComposedHandler[source]

Bases: object

Legend artist for PDF plots.

legend_artist(legend, orig_handle, fontsize, handlebox)[source]
validphys.plotutils.HandlerSpec

alias of HandelrSpec

validphys.plotutils.add_subplot(figsize=None, projection=None, **kwargs)[source]

matplotlib.figure wrapper used to generate a figure and add a subplot.

Use matplotlib.figure.Figure() objects to avoid importing pyplot anywhere. The reason is that pyplot maintains a global state that makes it misbehave in multithreaded applications, such as when executed under dask parallel mode.

Parameters:
  • figsize (2-tuple of floats) – default is None

  • projection (The projection type of the subplot (Axes).) – default is None

Returns:

fig, ax = (matplotlib.figure.Figure, fig.add_subplot)

Return type:

tuple

validphys.plotutils.ax_or_gca(f)[source]

A decorator. When applied to a function, the keyword argument ax will automatically be filled with the current axis, if it was None.
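The pattern of filling in a missing ax keyword can be sketched without matplotlib. The factory fill_ax_sketch and the placeholder string standing in for an axis are hypothetical; the real decorator obtains the axis from matplotlib.

```python
import functools


def fill_ax_sketch(make_ax):
    """Build a decorator that fills a missing ``ax`` keyword with make_ax()."""

    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # Only the keyword form is handled, as in the description.
            if kwargs.get("ax") is None:
                kwargs["ax"] = make_ax()
            return f(*args, **kwargs)

        return wrapper

    return decorator


@fill_ax_sketch(lambda: "current-axis")  # stand-in for a real matplotlib axis
def draw(values, ax=None):
    return ax


print(draw([1, 2, 3]))
print(draw([1, 2, 3], ax="my-axis"))
```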

validphys.plotutils.ax_or_newfig(f)[source]

A decorator. When applied to a function, the keyword argument ax will automatically be filled with a new axis corresponding to an empty figure, if it was None.

validphys.plotutils.barplot(values, collabels, datalabels, orientation='auto')[source]

The barplot as matplotlib should have it. It resizes on overflow. values should be one or two dimensional and should contain the values for the barplot. collabels must have as many elements as values has columns (or total elements if it is one dimensional), and contains the labels for each column in the bar plot. datalabels should have as many elements as values has rows, and contains the labels for the individual items to be compared. If orientation is "auto", the barplot will be horizontal or vertical depending on the number of items. Otherwise, the orientation can be fixed as "horizontal" or "vertical".

Parameters:
  • values (array of dimensions M×N or N.) – The input data.

  • collabels (Iterable[str] of dimensions N) – The labels for each of the bars.

  • datalabels (Iterable[str] of dimensions M or 1) – The label for each of the datasets to be compared.

  • orientation ({'auto', 'horizontal', 'vertical'}, 'optional') – The orientation of the bars.

Returns:

(fig, ax) – a tuple of a matplotlib figure and an axis, like matplotlib.pyplot.subplots. The axis will have a _bar_orientation attribute that will either be ‘horizontal’ or ‘vertical’ and will correspond to the actual orientation of the plot.

Return type:

tuple

Examples

>>> import numpy as np
>>> from validphys.plotutils import barplot
>>> vals = np.random.rand(2,5)
>>> collabels = ["A", "B", "C", "D", "e"]
>>> fig, ax = barplot(vals, collabels, ['First try', 'Second try'])
>>> ax.legend()
validphys.plotutils.centered_range(n, value=0, distance=1)[source]

Generate a range of n points centered around value, uniformly sampled at intervals of distance.
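A plain-Python sketch of this behaviour, returning a list rather than whatever array type the real function uses (the name centered_range_sketch is hypothetical):

```python
def centered_range_sketch(n, value=0, distance=1):
    """Return n points spaced by `distance` whose mean is `value`."""
    return [value + distance * (i - (n - 1) / 2) for i in range(n)]


print(centered_range_sketch(3))
print(centered_range_sketch(2, value=10, distance=4))
```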

validphys.plotutils.color_iter()[source]

Yield the colors in the cycle defined in the matplotlib style. When the colors are exhausted a warning will be logged and the cycle will be repeated infinitely. Therefore this avoids the overflow error at runtime when using matplotlib’s f'C{i}' color specification (equivalent to colors[i]) when i>len(colors).
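The "warn once, then repeat" behaviour can be sketched with the standard library. Here the list of colors is passed in explicitly, whereas the real function reads it from the matplotlib style; the name color_iter_sketch and the wording of the warning are assumptions.

```python
import itertools
import logging

log = logging.getLogger(__name__)


def color_iter_sketch(colors):
    """Yield colors forever, logging a warning once they have been exhausted."""
    yield from colors
    log.warning("Color cycle exhausted; colors will be repeated.")
    yield from itertools.cycle(colors)


first_five = list(itertools.islice(color_iter_sketch(["C0", "C1"]), 5))
print(first_five)
```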

validphys.plotutils.expand_margin(a, b, proportion)[source]

Return a pair of numbers that have the same mean as (a,b) and their distance is proportion times bigger.
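The arithmetic behind this can be sketched in a few lines (the name expand_margin_sketch is hypothetical): keep the midpoint fixed and scale the half-width by proportion.

```python
def expand_margin_sketch(a, b, proportion):
    """Return (lo, hi) with the same mean as (a, b) and proportion times the distance."""
    center = (a + b) / 2
    half_width = proportion * (b - a) / 2
    return center - half_width, center + half_width


print(expand_margin_sketch(0, 2, 2))
```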

validphys.plotutils.frame_center(ax, x, values)[source]

Set the ylims of the axis ax to appropriately display values, which can be 1 or 2D and are assumed to be sampled uniformly in the coordinates of the plot (in the second dimension, for 2D arrays).

validphys.plotutils.hatch_iter()[source]

An infinite iterator that yields increasingly denser patterns of hatches suitable for passing as the hatch argument of matplotlib functions.
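One way to obtain increasingly dense hatches is to repeat each hatch character more times on every pass, since matplotlib draws denser hatching for repeated characters. The sketch below assumes a particular set and ordering of base patterns, which may differ from the one used by validphys.

```python
import itertools


def hatch_iter_sketch():
    """Yield hatch strings forever, one character longer on each pass."""
    base = ["/", "\\", "|", "-", "+", "x", "o", "O", ".", "*"]  # assumed set
    for repeat in itertools.count(1):
        for hatch in base:
            yield hatch * repeat


print(list(itertools.islice(hatch_iter_sketch(), 12)))
```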

validphys.plotutils.kde_plot(a, height=0.05, ax=None, label=None, color=None, max_marks=100000)[source]

Plot a Kernel Density Estimate of a 1D array, together with individual occurrences.

This plot provides a quick visualization of the distribution of one dimensional data in a more complete way than a histogram would. It produces both a Kernel Density Estimate (KDE) and individual occurrences of the data (rug plot). The KDE uses a Gaussian Kernel with the Silverman rule to select the bandwidth (this is the optimal choice if the input data is Gaussian). The individual occurrences are displayed as marks along the bottom axis. For performance reasons, and to avoid cluttering the plot, a maximum of max_marks marks are displayed; if the length of the data is bigger, a random sample of max_marks is taken.
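The two ingredients described above can be sketched with the standard library. The bandwidth below assumes the 1-D Silverman factor (3n/4)^(-1/5) multiplied by the sample standard deviation, which is the convention of scipy.stats.gaussian_kde; whether kde_plot uses exactly this form is an assumption. Both function names are hypothetical.

```python
import random
import statistics


def silverman_bandwidth_sketch(data):
    """Kernel width for 1-D data: sample std times (3n/4)^(-1/5) (assumed form)."""
    n = len(data)
    return statistics.stdev(data) * (3 * n / 4) ** (-1 / 5)


def rug_marks_sketch(data, max_marks=100000, seed=None):
    """Return all points, or a random sample of max_marks of them if there are more."""
    data = list(data)
    if len(data) <= max_marks:
        return data
    return random.Random(seed).sample(data, max_marks)


observations = [0.1, 0.4, 0.35, 0.8, 0.55, 0.2]
print(silverman_bandwidth_sketch(observations))
print(len(rug_marks_sketch(range(1000), max_marks=100, seed=0)))
```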

Parameters:
  • a (vector) – 1D array of observations.

  • height (scalar, optional) – Height of marks in the rug plot as proportion of the axis height.

  • ax (matplotlib axes, optional) – Axes to draw plot into; otherwise grabs current axes.

  • label (string, optional) – The label for the legend (note that you have to generate the legend yourself).

  • color (optional) – A matplotlib color specification, used for both the KDE and the rugplot. If not given, the next in the underlying axis cycle will be consumed and used.

  • max_marks (integer, optional) – The maximum number of points that will be displayed individually.

Returns:

ax – The Axes object with the plot on it, allowing further customization.

Return type:

matplotlib axes

Example

>>> import numpy as np
>>> dist = np.random.normal(size=100)
>>> ax = kde_plot(dist)
validphys.plotutils.marker_iter_plot()[source]

Because of matplotlib's strange interface, markers work differently in plots and scatter. This is the same as marker_iter_scatter, but returns kwargs to be passed to plt.plot().

validphys.plotutils.marker_iter_scatter()[source]

Yield the possible matplotlib.markers.MarkerStyle instances with different fillstyles and markers. This can be passed to plt.scatter. For plt.plot, use marker_iter_plot.

validphys.plotutils.offset_xcentered(n, ax, *, offset_prop=0.05)[source]

Yield n matplotlib transforms in such a way that the corresponding n transformed x values are centered around the middle. The offset between two consecutive points is offset_prop in units of the figure dpi scale.

validphys.plotutils.plot_horizontal_errorbars(cvs, errors, categorylabels, datalabels=None, xlim=None)[source]

A plot with a list of horizontal errorbars oriented vertically. cvs and errors are the central values and errors, both of shape ndatasets x ncategories. categorylabels are the labels of each element for which errorbars are drawn, and datalabels are the labels of the different datasets that are compared.

validphys.plotutils.scalar_log_formatter()[source]

Return a matplotlib formatter to display powers of 10 in a log rather than exponential notation.

Returns:

formatter – an object that can be passed to the set_major_formatter matplotlib functions.

Return type:

ticker.FuncFormatter

Examples

>>> from matplotlib.figure import Figure
>>> fig = Figure()
>>> ax = fig.subplots()
>>> ax.plot([0.01, 0.1, 1, 10, 100])
>>> ax.set_yscale("log")
>>> ax.yaxis.set_major_formatter(scalar_log_formatter())
validphys.plotutils.spiderplot(xticks, vals, label, ax)[source]

Makes a spider/radar plot.

xticks: list of names of x tick labels, e.g. datasets

vals: list of values to plot corresponding to each xtick

label: label for values, e.g. fit name

ax: a PolarAxes instance

validphys.plotutils.subplots(figsize=None, nrows=1, ncols=1, sharex=False, sharey=False, **kwargs)[source]

matplotlib.figure wrapper used to generate a figure and add subplots.

Use matplotlib.figure.Figure() objects to avoid importing pyplot anywhere. The reason is that pyplot maintains a global state that makes it misbehave in multithreaded applications, such as when executed under dask parallel mode.

Parameters:
  • figsize (2-tuple of floats) – default is None

  • nrows (int, default 1)

  • ncols (int, default 1)

  • sharex (bool, default False)

  • sharey (bool, default False)

Returns:

fig, ax = (matplotlib.figure.Figure, fig.subplots)

Return type:

tuple

validphys.process_options module

Module to hold process dependent options

Only variables included in the _Vars enum and processes included in the Processes dictionary are allowed.

validphys.promptutils module

Module which extends the functionality of prompt_toolkit for user inputs/interactivity

class validphys.promptutils.KeywordsWithCache(loader)[source]

Bases: object

validphys.promptutils.confirm(message, default=None)[source]

This is like prompt_toolkit.shortcuts.confirm (implemented by create_confirm_session) except that it doesn’t bind control+c to “No”, but instead raises an exception.

It also supports defaults.

validphys.promptutils.yes_no_str(default=None)[source]

Return a yes or no string for the prompt, with the default highlighted

validphys.pseudodata module

Tools to obtain and analyse the pseudodata that was seen by the neural networks during the fitting.

class validphys.pseudodata.DataTrValSpec(pseudodata, tr_idx, val_idx)

Bases: tuple

pseudodata

Alias for field number 0

tr_idx

Alias for field number 1

val_idx

Alias for field number 2

exception validphys.pseudodata.ReplicaGenerationError[source]

Bases: Exception

validphys.pseudodata.indexed_make_replica(groups_index, make_replica)[source]

Index the make_replica pseudodata appropriately

validphys.pseudodata.level0_commondata_wc(data, fakepdf)[source]

Given a validphys.core.DataGroupSpec object, load commondata and generate a new commondata instance with central values replaced by fakepdf prediction

Parameters:
Returns:

list of validphys.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 0 fake data.

Return type:

list

Example

>>> from validphys.api import API
>>> API.level0_commondata_wc(dataset_inputs = [{"dataset":"NMC"}], use_cuts="internal", theoryid=200,fakepdf = "NNPDF40_nnlo_as_01180")

[CommonData(setname='NMC', ndata=204, commondataproc='DIS_NCE', nkin=3, nsys=16)]

validphys.pseudodata.make_level1_data(data, level0_commondata_wc, filterseed, data_index, sep_mult)[source]

Given a list of Level 0 commondata instances, return the same list with central values replaced by Level 1 data.

Level 1 data is generated using validphys.make_replica. The covariance matrix, from which the stochastic Level 1 noise is sampled, is built from Level 0 commondata instances (level0_commondata_wc). This, in particular, means that the multiplicative systematics are generated from the Level 0 central values.

Note that the covariance matrix used to generate Level 2 pseudodata is consistent with the one used at Level 1 up to corrections of the order eta * eps, where eta and eps are defined as shown below:

Generate L1 data: L1 = L0 + eta, eta ~ N(0,CL0)

Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,CL1)

where CL0 and CL1 mean that the multiplicative entries have been constructed from Level 0 and Level 1 central values respectively.
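The two-step generation can be illustrated with a toy model. This sketch assumes uncorrelated, purely multiplicative uncertainties, so that each covariance matrix is diagonal with sigma_i = frac_i * central_i; the numbers and the helper add_noise are hypothetical, and the real code samples from the full covariance matrix.

```python
import random


def add_noise(central, frac_unc, rng):
    """Shift each point by Gaussian noise with sigma = frac_unc * central.

    Toy stand-in for sampling from a covariance matrix whose
    multiplicative entries are built from `central`.
    """
    return [c + rng.gauss(0.0, abs(f * c)) for c, f in zip(central, frac_unc)]


rng = random.Random(1)
level0 = [10.0, 20.0, 30.0]       # hypothetical Level 0 central values
frac_unc = [0.05, 0.05, 0.05]     # hypothetical multiplicative uncertainties

# L1 = L0 + eta, with eta drawn using Level 0 central values (CL0).
level1 = add_noise(level0, frac_unc, rng)

# L2_k = L1 + eps_k, with eps_k drawn using Level 1 central values (CL1).
level2 = [add_noise(level1, frac_unc, rng) for _ in range(3)]

print(level1)
print(level2)
```

In this toy model the widths used at the two levels differ by frac * eta, so the Level 2 shifts change by an amount proportional to the product of a Level 1 and a Level 2 fluctuation, in line with the eta * eps statement above.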

Parameters:
  • data (validphys.core.DataGroupSpec)

  • level0_commondata_wc (list) – list of validphys.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 0 fake data. Cuts already applied.

  • filterseed (int) – random seed used for the generation of Level 1 data

  • data_index (pandas.MultiIndex)

Returns:

list of validphys.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 1 fake data.

Return type:

list

Example

>>> from validphys.api import API
>>> dataset='NMC'
>>> l1_cd = API.make_level1_data(dataset_inputs = [{"dataset":dataset}],use_cuts="internal", theoryid=200,
                         fakepdf = "NNPDF40_nnlo_as_01180",filterseed=1)
>>> l1_cd
[CommonData(setname='NMC', ndata=204, commondataproc='DIS_NCE', nkin=3, nsys=16)]
validphys.pseudodata.make_replica(groups_dataset_inputs_loaded_cd_with_cuts, replica_mcseed, dataset_inputs_sampling_covmat, sep_mult=False, genrep=True, max_tries=1000000, resample_negative_pseudodata=True)[source]

Function that takes in a list of validphys.coredata.CommonData objects and returns a pseudodata replica accounting for possible correlations between systematic uncertainties.

The function loops until positive definite pseudodata is generated for any non-asymmetry datasets. In the case of an asymmetry dataset negative values are permitted so the loop block executes only once.
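The resampling loop can be sketched as follows. This toy version draws independent Gaussian points instead of using a covariance matrix, and the exception class and function names are hypothetical stand-ins for the real ones.

```python
import random


class ReplicaGenerationErrorSketch(Exception):
    """Stand-in for validphys.pseudodata.ReplicaGenerationError."""


def make_replica_sketch(central, sigma, is_asymmetry, seed=None,
                        max_tries=1000, resample_negative=True):
    """Draw pseudodata, retrying while any non-asymmetry point is negative."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        replica = [rng.gauss(c, s) for c, s in zip(central, sigma)]
        # Asymmetry points may be negative, so they are not checked.
        checked = [x for x, asy in zip(replica, is_asymmetry) if not asy]
        if not resample_negative or all(x >= 0 for x in checked):
            return replica
    raise ReplicaGenerationErrorSketch(
        f"No physical replica found after {max_tries} tries"
    )


print(make_replica_sketch([5.0, 0.0], [0.1, 1.0], [False, True], seed=3))
```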

Parameters:
  • groups_dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – List of CommonData objects which stores information about systematic errors, their treatment and description, for each dataset.

  • replica_mcseed (int, None) – Seed used to initialise the numpy random number generator. If None then a random seed is allocated using the default numpy behaviour.

  • dataset_inputs_sampling_covmat (np.array) – Full covmat to be used. It can be either only experimental or also theoretical.

  • sep_mult (bool) – Specifies whether the shifts are computed with the full covmat or whether multiplicative errors should be separated

  • genrep (bool) – Specifies whether to compute replicas or not

  • max_tries (int) – The stochastic nature of replica generation means one can obtain (unphysical) negative predictions. If after max_tries (default=1e6) no physical configuration is found, it will raise a ReplicaGenerationError

  • resample_negative_pseudodata (bool) – When True, replicas that produce negative predictions will be resampled for max_tries until all points are positive (default: True)

Returns:

pseudodata – Numpy array of length N_dat (where N_dat is the combined number of data points after cuts) containing Monte Carlo samples of data centered around the data central value.

Return type:

np.array

Example

>>> from validphys.api import API
>>> pseudodata = API.make_replica(
                                dataset_inputs=[{"dataset":"NMC"}, {"dataset": "NMCPD"}],
                                use_cuts="nocuts",
                                theoryid=53,
                                replica=1,
                                mcseed=123,
                                genrep=True,
                            )
array([0.25640033, 0.25986534, 0.27165461, 0.29001009, 0.30863588,
   0.30100351, 0.31781208, 0.30827054, 0.30258217, 0.32116842,
   0.34206012, 0.31866286, 0.2790856 , 0.33257621, 0.33680007,
validphys.pseudodata.read_replica_pseudodata(fit, context_index, replica)[source]

Function to handle the reading of training and validation splits for a fit that has been produced with the savepseudodata flag set to True.

The data is read from the PDF to handle the mixing introduced by postfit.

The data files are concatenated to return all the data that went into a fit. The training and validation indices are also returned so one can access the splits using pandas indexing.

Raises:
  • FileNotFoundError – If the training or validation files for the PDF set cannot be found.

  • CheckError – If the use_cuts flag is not set to fromfit

Returns:

data_indices_list – List of namedtuple where each entry corresponds to a given replica. Each element contains attributes pseudodata, tr_idx, and val_idx. The latter two being used to slice the former to return training and validation data respectively.

Return type:

list[namedtuple]

Example

>>> from validphys.api import API
>>> data_indices_list = API.read_fit_pseudodata(fit="pseudodata_test_fit_n3fit")
>>> len(data_indices_list) # Same as nrep
10
>>> rep_info = data_indices_list[0]
>>> rep_info.pseudodata.loc[rep_info.tr_idx].head()
                            replica 1
group dataset           id
ATLAS ATLASZPT8TEVMDIST 1   30.665835
                        3   15.795880
                        4    8.769734
                        5    3.117819
                        6    0.771079
validphys.pseudodata.recreate_fit_pseudodata(_recreate_fit_pseudodata, fitreplicas, fit_tr_masks)[source]

Function used to reconstruct the pseudodata seen by each of the Monte Carlo fit replicas.

Returns:

res – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.

Return type:

list[namedtuple]

Example

>>> from validphys.api import API
>>> API.recreate_fit_pseudodata(fit="pseudodata_test_fit_n3fit")

Notes

  • This function does not account for the postfit reshuffling.

validphys.pseudodata.recreate_pdf_pseudodata(_recreate_pdf_pseudodata, pdfreplicas, pdf_tr_masks)[source]

Like validphys.pseudodata.recreate_fit_pseudodata() but accounts for the postfit reshuffling of replicas.

Returns:

res – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.

Return type:

list[namedtuple]

Example

>>> from validphys.api import API
>>> API.recreate_pdf_pseudodata(fit="pseudodata_test_fit_n3fit")
validphys.pseudodata.recreate_pdf_pseudodata_no_table(_recreate_pdf_pseudodata, pdfreplicas, pdf_tr_masks_no_table)[source]

validphys.renametools module

A collection of utility functions to handle logistics of LHAPDFs and fits. For use by vp-scripts.

class validphys.renametools.Spinner(delay=0.1)[source]

Bases: object

Context manager to provide a spinning cursor while validphys performs some other task silently.

When executed in a TTY, it shows a spinning cursor for the duration of the context manager. In non-interactive prompts, it prints to stdout at the beginning and end.

Example

>>> from validphys.renametools import Spinner
>>> with Spinner():
...     import time
...     time.sleep(5)
property interactive
spinner_task()[source]
static spinning_cursor()[source]
validphys.renametools.change_name(initial_path, final_name)[source]

Function that takes initial fit name and final fit name and performs the renaming

validphys.renametools.rename_nnfit(nnfit_path, initial_fit_name, final_name)[source]
validphys.renametools.rename_pdf(pdf_folder, initial_fit_name, final_name)[source]
validphys.renametools.rename_postfit(postfit_path, initial_fit_name, final_name)[source]

validphys.replica_selector module

replica_selector.py

Tools for filtering replica sets based on criteria on the replicas.

validphys.replica_selector.alpha_s_bundle_pdf(pdf, pdfs, output_path, target_name: (<class 'str'>, <class 'NoneType'>) = None)[source]

Action that bundles PDFs for distribution in the LHAPDF format. The baseline pdf is declared as the pdf key and the PDFs from which the replica 0s are to be added are declared as the pdfs list.

The bundled PDF set is stored inside the output directory.

Parameters:
  • pdf (validphys.core.PDF) – The baseline PDF to which the new replicas will be added

  • pdfs (list of validphys.core.PDF) – The list of PDFs from which replica0 will be appended

  • target_name (str or None) – Optional argument specifying the name of the output PDF. If None, then the name of the original pdf is used but with _pdfas appended

validphys.results module

results.py

Tools to obtain theory predictions and basic statistical estimators.

class validphys.results.Chi2Data(replica_result, central_result, ndata)

Bases: tuple

central_result

Alias for field number 1

ndata

Alias for field number 2

replica_result

Alias for field number 0

class validphys.results.DataResult(dataset, covmat, sqrtcovmat)[source]

Bases: StatsResult

Holds the relevant information from a given dataset

property central_value
property covmat
property label
property name
property sqrtcovmat

Lower part of the Cholesky decomposition

property std_error
class validphys.results.PositivityResult(stats)[source]

Bases: StatsResult

classmethod from_convolution(pdf, posset)[source]
class validphys.results.Result[source]

Bases: object

class validphys.results.StatsResult(stats)[source]

Bases: Result

property central_value
property error_members

Returns the error members with shape (Npoints, Npdf)

property rawdata

Returns the raw data with shape (Npoints, Npdf)

property std_error
class validphys.results.ThPredictionsResult(dataobj, stats_class, datasetnames=None, label=None, pdf=None, theoryid=None)[source]

Bases: StatsResult

Class holding theory predictions; inherits from StatsResult. When created with from_convolution, it keeps track of the PDF for which it was computed.

property datasetnames
classmethod from_convolution(pdf, dataset, central_only=False)[source]
static make_label(pdf, dataset)[source]

Deduce a reasonable label for the result based on pdf and dataspec

class validphys.results.ThUncertaintiesResult(central, std_err, label=None)[source]

Bases: StatsResult

Class holding central theory predictions and the error bar corresponding to the theory uncertainties considered. The error members of this class correspond to central +- error_bar

property central_value
property error_members

Returns the error members with shape (Npoints, Npdf)

property rawdata

Returns the raw data with shape (Npoints, Npdf)

property std_error
validphys.results.abs_chi2_data(results)[source]

Return a tuple (member_chi², central_chi², numpoints) for a given dataset
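For reference, a chi² can be evaluated from the lower Cholesky factor of the covariance matrix by solving a triangular system, which avoids inverting the matrix. The pure-Python sketch below illustrates that technique for a single prediction vector; the function names are hypothetical and this is not the validphys implementation.

```python
import math


def cholesky_lower(cov):
    """Return the lower-triangular L with L L^T = cov (cov positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def chi2_sketch(data, theory, sqrtcov):
    """chi2 = |L^-1 (theory - data)|^2, using forward substitution."""
    diff = [t - d for t, d in zip(theory, data)]
    z = []
    for i, row in enumerate(sqrtcov):
        z.append((diff[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return sum(v * v for v in z)


cov = [[2.0, 1.0], [1.0, 2.0]]
print(chi2_sketch([0.0, 0.0], [1.0, 1.0], cholesky_lower(cov)))
```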

validphys.results.abs_chi2_data_thcovmat(results_with_theory_covmat)[source]

The same as abs_chi2_data but considering as well the theory uncertainties

validphys.results.chi2_stats(abs_chi2_data)[source]

Compute several estimators from the chi²:

  • central_mean

  • npoints

  • perreplica_mean

  • perreplica_std

  • chi2_per_data
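A plain-Python sketch of these estimators, taking the per-replica chi² values, the central chi² and the number of points as inputs. The function name is hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def chi2_stats_sketch(member_chi2, central_chi2, npoints):
    """Collect the estimators listed above in a dictionary."""
    return {
        "central_mean": central_chi2,
        "npoints": npoints,
        "perreplica_mean": statistics.fmean(member_chi2),
        # Population standard deviation; the real convention may differ.
        "perreplica_std": statistics.pstdev(member_chi2),
        "chi2_per_data": central_chi2 / npoints,
    }


print(chi2_stats_sketch([10.0, 14.0], 11.0, 10))
```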

validphys.results.count_negative_points(possets_predictions)[source]

Return the number of replicas with negative predictions for each bin in the positivity observable.
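The counting itself can be sketched for a single positivity observable, assuming predictions are stored as one row per bin and one column per replica (the layout and the function name are assumptions):

```python
def count_negative_points_sketch(predictions):
    """For each bin (row), count the replicas (columns) with a negative prediction."""
    return [sum(1 for value in row if value < 0) for row in predictions]


predictions = [
    [0.2, -0.1, 0.3],    # bin 0: one negative replica
    [-0.5, -0.2, 0.1],   # bin 1: two negative replicas
]
print(count_negative_points_sketch(predictions))
```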

validphys.results.data_index(data)[source]

Given a core.DataGroupSpec instance, return pd.MultiIndex with the following levels:

  1. experiment

  2. datasets

  3. datapoint indices (cuts already applied)

Parameters:

data (core.DataGroupSpec)

Return type:

pd.MultiIndex

validphys.results.dataset_chi2_table(chi2_stats, dataset)[source]

Show the chi² estimators for a given dataset

validphys.results.dataset_inputs_abs_chi2_data(dataset_inputs_results)[source]

Like abs_chi2_data but for a group of inputs

validphys.results.dataset_inputs_bootstrap_chi2_central(dataset_inputs_results, bootstrap_samples=500, boot_seed=123)[source]

Takes the data result and theory prediction given dataset_inputs and then returns a bootstrap distribution of central chi2. By default bootstrap_samples is set to a sensible value (500). However a different value can be specified in the runcard.
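The bootstrap idea can be sketched as follows: resample the replicas with replacement, recompute the central prediction as their mean, and evaluate the chi² each time. This toy version assumes uncorrelated uncertainties (a diagonal covariance matrix) and uses the standard library random module; the function name and input layout are hypothetical.

```python
import random


def bootstrap_central_chi2_sketch(replica_predictions, data, sigma,
                                  bootstrap_samples=500, boot_seed=123):
    """Return a list of central chi2 values, one per bootstrap resample.

    replica_predictions[r][i] is the prediction of replica r for point i.
    """
    rng = random.Random(boot_seed)
    nrep = len(replica_predictions)
    distribution = []
    for _ in range(bootstrap_samples):
        sample = rng.choices(replica_predictions, k=nrep)
        central = [sum(column) / nrep for column in zip(*sample)]
        distribution.append(
            sum(((c - d) / s) ** 2 for c, d, s in zip(central, data, sigma))
        )
    return distribution


replicas = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [1.1, 2.2]]
chi2_dist = bootstrap_central_chi2_sketch(replicas, [1.0, 2.0], [0.1, 0.1])
print(len(chi2_dist), min(chi2_dist), max(chi2_dist))
```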

validphys.results.dataset_inputs_bootstrap_phi_data(dataset_inputs_results, bootstrap_samples=500)[source]

Takes the data result and theory prediction given dataset_inputs and then returns a bootstrap distribution of phi. By default bootstrap_samples is set to a sensible value (500). However a different value can be specified in the runcard.

For more information on how phi is calculated see phi_data

validphys.results.dataset_inputs_chi2_per_point_data(dataset_inputs_abs_chi2_data)[source]

Return the total chi²/ndata for all data, specified by dataset_inputs. Covariance matrix is fully correlated across datasets, with all known correlations.

validphys.results.dataset_inputs_phi_data(dataset_inputs_abs_chi2_data)[source]

Like phi_data but for group of datasets

validphys.results.dataset_inputs_results(data, pdf: PDF, dataset_inputs_covariance_matrix, dataset_inputs_sqrt_covmat)[source]

Like results but for a group of datasets

validphys.results.dataset_inputs_results_central(data, pdf: PDF, dataset_inputs_covariance_matrix, dataset_inputs_sqrt_covmat)[source]

Like dataset_inputs_results but for a group of datasets and replica0.

validphys.results.dataset_inputs_results_without_covmat(data, pdf: PDF)[source]

Like dataset_inputs_results but skipping the computation of the covmat

validphys.results.dataspecs_chi2_differences_table(dataspecs, dataspecs_chi2_table)[source]

Given two dataspecs, print the chi² (using dataspecs_chi2_table) and the difference between the first and the second.

validphys.results.dataspecs_chi2_table(dataspecs_total_chi2_data, dataspecs_datasets_chi2_table, dataspecs_groups_chi2_table, show_total: bool = False)[source]

Same as fits_chi2_table but for an arbitrary list of dataspecs

validphys.results.dataspecs_dataset_chi2_difference_table(dataspecs_each_dataset, dataspecs_each_dataset_chi2, dataspecs_speclabel)[source]

Returns a table with difference between the chi2 and the expected chi2 in units of the expected chi2 standard deviation, given by

chi2_diff = (chi2 - N)/sqrt(2N)

for each dataset for each dataspec.
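The formula above is simple to evaluate directly. In this sketch (function name hypothetical) chi2 is the absolute chi² and ndata is the number of points N, whose chi² distribution has mean N and variance 2N:

```python
import math


def chi2_difference_sketch(chi2, ndata):
    """(chi2 - N) / sqrt(2N): deviation from the expected chi2 in units of its std."""
    return (chi2 - ndata) / math.sqrt(2 * ndata)


print(chi2_difference_sketch(120.0, 100))
```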

validphys.results.dataspecs_datasets_chi2_table(dataspecs_speclabel, dataspecs_groups, dataspecs_datasets_chi2_data, per_point_data: bool = True)[source]

Same as fits_datasets_chi2_table but for arbitrary dataspecs.

validphys.results.dataspecs_datasets_nsigma_table(dataspecs_datasets_chi2_table)[source]

Same as dataspecs_datasets_chi2_table but for nsigma.

validphys.results.dataspecs_groups_chi2_table(dataspecs_speclabel, dataspecs_groups, dataspecs_groups_chi2_data, per_point_data: bool = True)[source]

Same as fits_groups_chi2_table but for an arbitrary list of dataspecs.

validphys.results.dataspecs_groups_nsigma_table(dataspecs_groups_chi2_table)[source]

Same as fits_groups_nsigma_table but for an arbitrary list of dataspecs.

validphys.results.dataspecs_nsigma_table(dataspecs_total_chi2_data, dataspecs_datasets_nsigma_table, dataspecs_groups_nsigma_table, show_total: bool = False)[source]

Same as fits_nsigma_table but for an arbitrary list of dataspecs

validphys.results.experiments_chi2_stats(total_chi2_data)[source]

Compute several estimators from the chi² for an aggregate of experiments:

  • central_mean

  • npoints

  • perreplica_mean

  • perreplica_std

  • chi2_per_data

validphys.results.experiments_covmat_no_table(experiments_data, experiments_index, experiments_covmat_collection)[source]

Makes the total experiments covariance matrix, which can then be reindexed appropriately by the chosen grouping. The covariance matrix must first be grouped by experiments to ensure correlations within experiments are preserved.

validphys.results.experiments_index(experiments_data)[source]
validphys.results.experiments_invcovmat(experiments_data, experiments_index, experiments_covmat_collection)[source]

Compute and export the inverse covariance matrix. Note that this inverts the matrices with the LU method which is suboptimal.

validphys.results.experiments_sqrtcovmat(experiments_data, experiments_index, experiments_sqrt_covmat)[source]

Like experiments_covmat, but dump the lower triangular part of the Cholesky decomposition as used in the fit. The upper part indices are set to zero.

validphys.results.fits_chi2_table(fits_total_chi2_data, fits_datasets_chi2_table, fits_groups_chi2_table, show_total: bool = False)[source]

Show the chi² and number of points of each dataset and experiment of each fit, where experiment is a group of datasets according to the experiment key in the PLOTTING info file, computed with the theory corresponding to the fit. Datasets that are not included in some fit appear as NaN.

validphys.results.fits_datasets_chi2_table(fits_name_with_covmat_label, fits_groups, fits_datasets_chi2_data, per_point_data: bool = True)[source]

A table with the chi2 for each included dataset in the fits, computed with the theory corresponding to the fit. The results are indexed in two levels by experiment and dataset, where experiment is the grouping of datasets according to the experiment key in the PLOTTING info file. If per_point_data is True, the chi² will be shown divided by ndata. Otherwise they will be absolute.

validphys.results.fits_datasets_nsigma_table(fits_datasets_chi2_table)[source]

A table with nsigma values for each dataset included in the fit. nsigma is defined as (chi2 - 1) / sqrt(2/ndata), when the chi2 is normalized by ndata.
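As a rough illustration of the definition above, the following sketch evaluates nsigma for a single dataset. The helper name nsigma_from_chi2 is hypothetical and is not part of validphys; it only restates the formula.

```python
import math


def nsigma_from_chi2(chi2_per_point, ndata):
    """Hypothetical helper: (chi2 - 1) / sqrt(2 / ndata) for a chi2 normalized by ndata."""
    return (chi2_per_point - 1.0) / math.sqrt(2.0 / ndata)


# A dataset with 50 points and chi2/ndata = 1.2 sits about one sigma above 1.
value = nsigma_from_chi2(1.2, 50)
```

Larger datasets give a larger nsigma for the same chi² per point, because the expected fluctuation sqrt(2/ndata) shrinks as ndata grows.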

validphys.results.fits_groups_chi2_table(fits_name_with_covmat_label, fits_groups, fits_groups_chi2_data, per_point_data: bool = True)[source]

A table with the chi² computed with the theory corresponding to each fit for all datasets in the fit, grouped according to a key in the metadata. The grouping can be controlled with metadata_group.

If per_point_data is True, the chi² will be shown divided by ndata. Otherwise chi² values will be absolute.

validphys.results.fits_groups_nsigma_table(fits_groups_chi2_table)[source]

Similar to fits_groups_chi2_table but for nsigma. nsigma is defined as (chi2 - 1) / sqrt(2/ndata), when the chi2 is normalized by ndata.

validphys.results.fits_groups_phi_table(fits_name_with_covmat_label, fits_groups, fits_groups_phi)[source]

For every fit, returns phi and number of data points for each group of datasets, which are grouped according to a key in the metadata. The behaviour of the grouping can be controlled with metadata_group runcard key.

validphys.results.fits_nsigma_table(fits_total_chi2_data, fits_datasets_nsigma_table, fits_groups_nsigma_table, show_total: bool = False)[source]

Show the nsigma and number of points of each dataset and experiment for each fit, computed with the theory corresponding to the fit. Datasets that are not included in one of the fits appear as NaN.

validphys.results.group_result_central_table_no_table(groups_results_central, groups_index)[source]

Generate a table containing the data central value and the central prediction

validphys.results.group_result_table(group_result_table_no_table)[source]

Duplicate of group_result_table_no_table but with a table decorator.

validphys.results.group_result_table_68cl(groups_results, group_result_table_no_table: DataFrame, pdf: PDF)[source]

Generate a table containing the data central value, the data 68% confidence levels, the central prediction, and 68% confidence level bounds of the prediction.

validphys.results.group_result_table_no_table(groups_results, groups_index)[source]

Generate a table containing the data central value, the central prediction, and the prediction for each PDF member.

validphys.results.groups_central_values(group_result_central_table_no_table)[source]

Duplicate of groups_central_values_no_table but takes group_result_table rather than groups_central_values_no_table, and has a table decorator.

validphys.results.groups_central_values_no_table(group_result_central_table_no_table)[source]

Returns a theoryid-dependent list of central theory predictions for a given group.

validphys.results.groups_chi2_table(groups_data, pdf, groups_chi2, groups_each_dataset_chi2)[source]

Return a table with the chi² to the groups and each dataset in the groups, grouped by metadata.

validphys.results.groups_corrmat(groups_covmat)[source]

Generates the grouped experimental correlation matrix with groups_covmat as input
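For orientation, a correlation matrix is obtained from a covariance matrix by dividing each entry by the product of the corresponding standard deviations. The sketch below uses plain nested lists and a hypothetical helper name; it is not the validphys implementation, which operates on labelled tables.

```python
import math


def corrmat_from_covmat(covmat):
    """Hypothetical helper: corr_ij = cov_ij / sqrt(cov_ii * cov_jj)."""
    size = len(covmat)
    sigmas = [math.sqrt(covmat[i][i]) for i in range(size)]
    return [
        [covmat[i][j] / (sigmas[i] * sigmas[j]) for j in range(size)]
        for i in range(size)
    ]


covmat = [[4.0, 1.0], [1.0, 9.0]]
corrmat = corrmat_from_covmat(covmat)  # diagonal entries are 1, off-diagonal 1/6
```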

validphys.results.groups_covmat(groups_covmat_no_table)[source]

Duplicate of groups_covmat_no_table but with a table decorator.

validphys.results.groups_covmat_no_table(experiments_covmat_no_table, groups_index)[source]

Export the covariance matrix for the groups. It exports the full (symmetric) matrix, with the 3 first rows and columns being:

  • group name

  • dataset name

  • index of the point within the dataset.

validphys.results.groups_data_values(group_result_table)[source]

Returns list of data values for the input groups.

validphys.results.groups_index(groups_data)[source]

Return a pandas.MultiIndex with levels for group, dataset and point respectively. The group is determined by a key in the dataset metadata, and controlled by the metadata_group key in the runcard.

Example

TODO: add example

validphys.results.groups_invcovmat(experiments_invcovmat, groups_index)[source]

Like experiments_invcovmat but relabelled to the chosen grouping.

validphys.results.groups_normcovmat(groups_covmat, groups_data_values)[source]

Calculates the grouped experimental covariance matrix normalised to data.

validphys.results.groups_sqrtcovmat(experiments_sqrtcovmat, groups_index)[source]

Like experiments_sqrtcovmat but relabelled to the chosen grouping.

validphys.results.one_or_more_results(dataset: (<class 'validphys.core.DataSetSpec'>, <class 'validphys.core.DataGroupSpec'>), covariance_matrix, sqrt_covmat, pdfs: (<class 'NoneType'>, <class 'collections.abc.Sequence'>) = None, pdf: (<class 'NoneType'>, <class 'validphys.core.PDF'>) = None)[source]

Generate a list of results, where the first element is the data values, and the next is either the prediction for pdf or for each of the pdfs. Which of the two is selected intelligently depending on the namespace, when executing as an action.

validphys.results.pdf_results(dataset: (<class 'validphys.core.DataSetSpec'>, <class 'validphys.core.DataGroupSpec'>), pdfs: ~collections.abc.Sequence, covariance_matrix, sqrt_covmat)[source]

Return a list of results, the first for the data and the rest for each of the PDFs.

validphys.results.perreplica_chi2_table(groups_data, groups_chi2, total_chi2_data)[source]

Chi² per point for each replica for each group. Also outputs the total chi² per replica. The columns come in two levels: The first is the name of the group, and the second is the number of points.

validphys.results.phi_data(abs_chi2_data)[source]

Calculate phi using values returned by abs_chi2_data.

Returns tuple of (float, int): (phi, numpoints)

For more information on how phi is calculated see Eq. (24) in arXiv:1410.8849.

validphys.results.positivity_predictions_data_result(pdf, posdataset)[source]

Return an object containing the values of the positivity observable.

validphys.results.predictions_by_kinematics_table(results, kinematics_table_notable)[source]

Return a table combining the output of validphys.kinematics.kinematics_table() with the data and theory central values.

validphys.results.proc_result_table(proc_result_table_no_table)[source]
validphys.results.proc_result_table_experiment(procs_results_experiment, experiments_index)[source]
validphys.results.proc_result_table_no_table(procs_results, procs_index)[source]
validphys.results.procs_central_values(procs_central_values_no_table)[source]
validphys.results.procs_central_values_no_table(proc_result_table_no_table)[source]
validphys.results.procs_chi2_table(procs_data, pdf, groups_chi2_by_process, groups_each_dataset_chi2_by_process)[source]

Same as groups_chi2_table but by process

validphys.results.procs_corrmat(procs_covmat)[source]
validphys.results.procs_covmat(procs_covmat_no_table)[source]
validphys.results.procs_covmat_no_table(experiments_covmat_no_table, procs_index)[source]
validphys.results.procs_data_values(proc_result_table)[source]

Like groups_data_values but grouped by process.

validphys.results.procs_data_values_experiment(proc_result_table_experiment)[source]

Like groups_data_values but grouped by experiment.

validphys.results.procs_index(procs_data)[source]
validphys.results.procs_normcovmat(procs_covmat, procs_data_values)[source]
validphys.results.relabel_experiments_to_groups(input_covmat, groups_index)[source]

Takes a covmat grouped by experiments and relabels it by groups. This allows grouping over experiments to preserve experimental correlations outwith the chosen grouping.

validphys.results.results(dataset: DataSetSpec, pdf: PDF, covariance_matrix, sqrt_covmat)[source]

Tuple of data and theory results for a single pdf. The data will have an associated covariance matrix, which can include a contribution from the theory covariance matrix which is constructed from scale variation.

The theory is specified as part of the dataset (a remnant of the old C++ layout). A group of datasets is also allowed.

validphys.results.results_central(dataset: DataSetSpec, pdf: PDF, covariance_matrix, sqrt_covmat)[source]

Same as results() but only calculates the prediction for replica0.

validphys.results.results_with_scale_variations(results, theory_covmat_dataset)[source]

Use the theory covariance matrix to generate a ThPredictionsResult-compatible object modified so that its uncertainties correspond to a combination of the PDF and theory (scale variations) errors added in quadrature. This allows plotting results that include scale variations.

By doing this we lose all information about the predictions for the individual replicas or theories.

validphys.results.results_with_theory_covmat(dataset, results, theory_covmat_dataset)[source]

Returns results with a modified DataResult such that the covariance matrix also includes the theory covmat. This can be used to make use of results that consider scale variations without including the theory covmat as part of the covariance matrix used by other validphys functions. Most notably, this can be used to compute the chi2 including theory errors while plotting data theory covariance in which the experimental uncertainties are not stained by the thcovmat.

validphys.results.results_without_covmat(dataset: DataSetSpec, pdf: PDF)[source]

Return a results object with a diagonal covmat so that it can be used to generate results-dependent covmats elsewhere. Uses results() under the hood.

validphys.results.theory_description(theoryid)[source]

A table with the theory settings.

validphys.results.total_chi2_data_from_experiments(experiments_chi2_data, pdf)[source]

Like dataset_inputs_abs_chi2_data(), except sums the contribution from each experiment which is more efficient in the case that the total covariance matrix is block diagonal in experiments.

This is valid as long as there are no cross experiment correlations from e.g. theory covariance matrices.

validphys.results.total_chi2_per_point_data(total_chi2_data)[source]
validphys.results.total_phi_data_from_experiments(experiments_phi_data)[source]

Like dataset_inputs_phi_data() except calculate phi for each experiment and then sum the contributions. Note that since the definition of phi is

phi = sqrt( (<chi2[T_k]> - chi2[<T_k>]) / n_data ),

where k is the replica index, the total phi is

sqrt( sum(n_data*phi**2) / sum(n_data) )

where the sums run over experiments.

This is only a valid method of calculating total phi provided that there are no inter-experimental correlations.
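The combination rule above can be sketched in a few lines. The function name and the (phi, n_data) input layout are assumptions made for illustration, not the validphys interface.

```python
import math


def total_phi(per_experiment):
    """Combine (phi, n_data) pairs, one per experiment, into a total phi.

    Implements sqrt(sum(n_data * phi**2) / sum(n_data)), which is only valid
    when there are no correlations between experiments.
    """
    pairs = list(per_experiment)
    weighted = sum(n * phi**2 for phi, n in pairs)
    total_points = sum(n for _, n in pairs)
    return math.sqrt(weighted / total_points), total_points


phi, npoints = total_phi([(0.5, 100), (1.0, 300)])
```

The result is a point-weighted quadratic mean, so experiments with more data points pull the total phi towards their own value.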

validphys.reweighting module

Utilities for reweighting studies.

Implements utilities for calculating the NNPDF weights and unweighted PDF sets. It also allows for some basic statistics.

validphys.reweighting.chi2_data_for_reweighting_experiments(chi2_data_for_reweighting_experiments_inner, use_t0)[source]
validphys.reweighting.make_pdf_from_filtered_outliers(fit, chi2filtered_index, set_name: str, output_path=None, installgrid: bool = True)[source]

Produce a new grid with the result of chi2filtered_index

validphys.reweighting.make_unweighted_pdf(pdf, unweighted_index, set_name: str, output_path=None, installgrid: bool = True)[source]

Generate an unweighted PDF set, from the prior pdf and the reweighting_experiments. The PDF is written to a pdfsets directory of the output folder. Return the relative path of the newly created PDF.

validphys.reweighting.nnpdf_weights(chi2_data_for_reweighting_experiments)[source]

Compute the replica weights according to the NNPDF formula.
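As a hedged sketch, the weight formula from the NNPDF reweighting literature assigns each replica k a weight proportional to chi2_k**((n - 1) / 2) * exp(-chi2_k / 2), where chi2_k is the unnormalized chi² of the replica and n is the number of data points, with the weights normalized to sum to the number of replicas. The function below is an illustrative assumption and may differ in detail from the validphys code.

```python
import math


def nnpdf_weights_sketch(chi2s, ndata):
    """Illustrative weights, normalized so that they sum to the number of replicas."""
    # Work with logarithms to avoid overflow for large chi2 or ndata.
    logs = [0.5 * (ndata - 1) * math.log(c) - 0.5 * c for c in chi2s]
    shift = max(logs)
    numerators = [math.exp(value - shift) for value in logs]
    norm = sum(numerators)
    return [len(chi2s) * x / norm for x in numerators]


# Three replicas with absolute chi2 of 10, 12 and 30 for 10 data points.
weights = nnpdf_weights_sketch([10.0, 12.0, 30.0], ndata=10)
```

Subtracting the largest logarithm before exponentiating leaves the normalized weights unchanged while keeping every intermediate value at or below one.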

validphys.reweighting.nnpdf_weights_numerator(chi2_data_for_reweighting_experiments)[source]

Compute the numerator of the NNPDF weights. This is useful for P(α), which uses a different normalization.

validphys.reweighting.p_alpha_study(chi2_data_for_reweighting_experiments)[source]

Compute P(α) in an automatic range

validphys.reweighting.plot_p_alpha(p_alpha_study)[source]

Plot the results of p_alpha_study.

validphys.reweighting.reweighting_stats(pdf, nnpdf_weights, p_alpha_study)[source]

Compute various statistics related to reweighting.

Those are:
  • Number of initial replicas.

  • Effective number of replicas.

  • Median of the weights.

  • The maximum value of P(alpha) in some sensible range.
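One common definition of the effective number of replicas, taken here as an assumption from the NNPDF reweighting literature, is the exponential of the Shannon entropy of the weights. The helper below is a sketch under that assumption, not the validphys implementation.

```python
import math


def effective_replicas(weights):
    """Shannon-entropy estimate; the weights are assumed to sum to len(weights)."""
    n = len(weights)
    # Replicas with zero weight contribute nothing to the entropy.
    entropy = sum(w * math.log(n / w) for w in weights if w > 0)
    return math.exp(entropy / n)


uniform = effective_replicas([1.0, 1.0, 1.0, 1.0])  # all replicas count equally
skewed = effective_replicas([4.0, 0.0, 0.0, 0.0])   # a single replica dominates
```

With uniform weights the estimate equals the number of initial replicas, and it drops to one when a single replica carries all the weight.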

validphys.reweighting.unweighted_index(nnpdf_weights, nreplicas: int = 100)[source]

The index of the input replicas that corresponds to an unweighted set, for the given weights. This can be saved for testing purposes.

validphys.sumrules module

sumrules.py

Module for the computation of sum rules

Note that this contains only the code for the computation of sum rules from scratch using LHAPDF tables. The code reading the sum rule information output from the fit is present in fitinfo.py

validphys.sumrules.bad_replica_sumrules(pdf, sum_rules, threshold: Real = 0.01)[source]

Return a table with the sum rules for the replica where some sum rule is farther from the correct value than threshold (in absolute value).
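The selection can be pictured as a simple filter over replicas. In the sketch below the expected values (momentum 1, uvalence 2, dvalence 1, svalence 0, cvalence 0 for the proton) and the dictionary layout are assumptions chosen for illustration; the real action works on the output of sum_rules.

```python
# Assumed expected values of the proton sum rules (illustration only).
EXPECTED = {
    "momentum": 1.0,
    "uvalence": 2.0,
    "dvalence": 1.0,
    "svalence": 0.0,
    "cvalence": 0.0,
}


def bad_replicas(replica_rules, threshold=0.01):
    """Return the indices of replicas with any sum rule off by more than threshold."""
    return [
        index
        for index, rules in enumerate(replica_rules)
        if any(abs(rules[name] - EXPECTED[name]) > threshold for name in EXPECTED)
    ]


replicas = [
    dict(EXPECTED, momentum=1.001),  # within the default threshold
    dict(EXPECTED, uvalence=2.05),   # uvalence is off by 0.05
]
flagged = bad_replicas(replicas)
```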

validphys.sumrules.central_sum_rules(pdf: PDF, Q: Real)[source]

Compute the sum rules for the central member, at the scale Q

validphys.sumrules.central_sum_rules_table(central_sum_rules)[source]

Construct a table with the value of each sum rule for the central member

validphys.sumrules.partial_polarized_sum_rules(pdf: PDF, Q: Real, lims: tuple = ((0.0001, 0.001), (0.001, 1)))[source]

Compute the partial low- and large-x polarized sum rules. Return a SumRulesGrid object with the list of values for each sum rule. The integration is performed with absolute and relative tolerance of 1e-4.

validphys.sumrules.polarized_sum_rules(partial_polarized_sum_rules)[source]

Compute the full polarized sum rules. The integration is performed with absolute and relative tolerance of 1e-4.

validphys.sumrules.polarized_sum_rules_table(polarized_sum_rules)[source]

Return a table with the descriptive statistics of the polarized sum rules, over members of the PDF.

validphys.sumrules.sum_rules(pdf: PDF, Q: Real)[source]

Compute the momentum, uvalence, dvalence, svalence and cvalence sum rules for each member, at the energy scale Q. Return a SumRulesGrid object with the list of values for each sum rule. The integration is performed with absolute and relative tolerance of 1e-4.

validphys.sumrules.sum_rules_table(sum_rules)[source]

Return a table with the descriptive statistics of the sum rules, over members of the PDF.

validphys.sumrules.unknown_sum_rules(pdf: PDF, Q: Real)[source]

Compute the following integrals:

  • u momentum fraction

  • ubar momentum fraction

  • d momentum fraction

  • dbar momentum fraction

  • s momentum fraction

  • sbar momentum fraction

  • cp momentum fraction

  • cm momentum fraction

  • g momentum fraction

  • T3

  • T8

validphys.sumrules.unknown_sum_rules_table(unknown_sum_rules)[source]

validphys.tableloader module

tableloader.py

Load from file some of the tables that validphys produces. Contrary to validphys.loader this module consists of functions that take absolute paths, and return mostly dataframes.

exception validphys.tableloader.TableLoaderError[source]

Bases: Exception

Errors in the tableloader module.

validphys.tableloader.combine_pseudoreplica_tables(dfs, combined_names, *, blacklist_datasets=None, min_points_required=2)[source]

Return a table in the same format as perreplica_chi2_table with the minimum value of the chi² for each batch of fits.

validphys.tableloader.combine_pseudorreplica_tables(dfs, combined_names, *, blacklist_datasets=None, min_points_required=2)

Return a table in the same format as perreplica_chi2_table with the minimum value of the chi² for each batch of fits.

validphys.tableloader.fixup_header(df, head_index, dtype)[source]

Set the type of the column index in place

validphys.tableloader.get_extrasum_slice(df, components)[source]

Extract a slice of a table that has the components in the format that extra_sums expects.

validphys.tableloader.load_adapted_fits_chi2_table(filename)[source]

Load the fits_chi2_table and adapt it in the way that suits the paramfits module. That is, return a table with the total chi² and another with the number of points.

validphys.tableloader.load_experiments_covmat(filename)

Parse a dump of a matrix like experiments_covmat.

validphys.tableloader.load_experiments_invcovmat(filename)

Parse a dump of a matrix like experiments_covmat.

validphys.tableloader.load_fits_chi2_table(filename)[source]

Load the result of fits_chi2_table or similar.

validphys.tableloader.load_perreplica_chi2_table(filename)[source]

Load the output of perreplica_chi2_table.

validphys.tableloader.parse_data_cv(filename)[source]

Useful for reading DataFrames with just one column.

validphys.tableloader.parse_exp_mat(filename)[source]

Parse a dump of a matrix like experiments_covmat.

validphys.tableloader.set_actual_column_level0(df, new_levels)[source]

Set the first level of the index to new_levels. Note: This is a separate function mostly because it breaks in every patch update of pandas.

validphys.theoryinfo module

theoryinfo.py

Actions for displaying theory info for one or more theories.

validphys.theoryinfo.all_theory_info_table(theory_database)[source]

Produces a DataFrame with all theory info and saves it

Returns:

all_theory_info_table – dataframe filled with all entries in theorydb file

Return type:

pd.DataFrame

Example

>>> from validphys.api import API
>>> df = API.all_theory_info_table()
>>> df['Comments'].iloc[:5]
ID
1                 3.0 LO benchmark
2                3.0 NLO benchmark
3               3.0 NNLO benchmark
4     3.0 NLO - Q0=1.3 For IC Test
5    3.0 NNLO - Q0=1.3 For IC Test
Name: Comments, dtype: object

validphys.theoryinfo.theory_info_table(theory_database, theory_db_id)[source]

Fetches the theory info for the given theory_db_id and constructs a DataFrame from it.

Parameters:

theory_db_id (int) – numeric identifier of theory to be queried. Can be specified at the runcard level.

Returns:

theory_info_table – dataframe filled with theory info for specified theory_db_id

Return type:

pd.DataFrame

Example

>>> from validphys.api import API
>>> df = API.theory_info_table(theory_db_id=53)
>>> df.loc['Comments']
Info for theory 53    NNPDF3.1 NNLO central
Name: Comments, dtype: object

validphys.uploadutils module

uploadutils.py

Tools to upload resources to remote servers.

class validphys.uploadutils.ArchiveUploader[source]

Bases: FileUploader

Uploader for objects comprising many files such as fits or PDFs

get_relative_path(output_path=None)[source]

Return the relative path to the target_dir.

root_url = None
target_dir = None
upload_context(output_path, force)[source]

Before entering the context, check that uploading is feasible. On exiting the context, upload output.

upload_or_exit_context(output, force)[source]

Like upload_context, but log and call sys.exit on error.

upload_output(output_path, force)[source]

Rsync output_path to the server and print the resulting URL. If specific_file is given

exception validphys.uploadutils.BadSSH[source]

Bases: UploadError

class validphys.uploadutils.FileUploader[source]

Bases: Uploader

Uploader for individual files for single-file resources. It does the same but prints the URL of the file.

upload_context(output_and_file)[source]

Before entering the context, check that uploading is feasible. On exiting the context, upload output.

class validphys.uploadutils.FitUploader[source]

Bases: ArchiveUploader

An uploader for fits. Fits will be automatically compressed before uploading.

check_fit_md5(output_path)[source]

When vp-setupfit runs successfully, it creates an md5 from the config. We check that this md5 matches the filter.yml, which verifies both that vp-setupfit was run and that the filter.yml inside the fit folder was not modified.
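The idea of the check can be sketched with hashlib. The stored digest file name ("md5") and the helper md5_matches are hypothetical, chosen only for illustration; the actual file layout used by vp-setupfit may differ.

```python
import hashlib
import pathlib
import tempfile


def md5_matches(fit_path, md5_name="md5"):
    """Hypothetical check: compare a stored digest with the digest of filter.yml."""
    fit_path = pathlib.Path(fit_path)
    stored = (fit_path / md5_name).read_text().strip()
    actual = hashlib.md5((fit_path / "filter.yml").read_bytes()).hexdigest()
    return stored == actual


with tempfile.TemporaryDirectory() as tmp:
    fit = pathlib.Path(tmp)
    (fit / "filter.yml").write_text("description: example\n")
    digest = hashlib.md5((fit / "filter.yml").read_bytes()).hexdigest()
    (fit / "md5").write_text(digest)
    unmodified = md5_matches(fit)  # digest still matches
    (fit / "filter.yml").write_text("description: edited\n")
    modified = md5_matches(fit)    # editing the file breaks the match
```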

property root_url
property target_dir
upload_output(output_path, force)[source]

Rsync output_path to the server and print the resulting URL. If specific_file is given

class validphys.uploadutils.HyperscanUploader[source]

Bases: FitUploader

Uploader for hyperopt scans, which are just special cases of fits

property root_url
property target_dir
class validphys.uploadutils.PDFUploader[source]

Bases: ArchiveUploader

An uploader for PDFs. PDFs will be automatically compressed before uploading.

property root_url
property target_dir
class validphys.uploadutils.ReportFileUploader[source]

Bases: FileUploader, ReportUploader

class validphys.uploadutils.ReportUploader[source]

Bases: Uploader

An uploader for validphys reports.

property root_url
property target_dir
exception validphys.uploadutils.UploadError[source]

Bases: Exception

class validphys.uploadutils.Uploader[source]

Bases: object

Base class for implementing upload behaviour. The main abstraction is a context manager upload_context which checks that the upload seems possible, then does the work inside the context and then uploads the result. The various derived classes should be used.
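The check, work, upload sequence can be sketched with contextlib. SketchUploader is a hypothetical stand-in that records calls instead of running rsync; it only mirrors the method names listed for Uploader and is not the validphys class.

```python
from contextlib import contextmanager


class SketchUploader:
    """Hypothetical stand-in that logs each step instead of uploading."""

    def __init__(self):
        self.log = []

    def check_upload(self):
        self.log.append("check")

    def upload_output(self, output_path):
        self.log.append(f"upload {output_path}")

    @contextmanager
    def upload_context(self, output):
        self.check_upload()         # fail early, before any work is done
        yield
        self.upload_output(output)  # only reached if the block did not raise


uploader = SketchUploader()
with uploader.upload_context("output"):
    uploader.log.append("work")
```

Checking before entering the block means that a missing rsync or a failed authentication is reported before any expensive work is done.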

check_auth()[source]

Check that we can authenticate with a certificate.

check_rsync()[source]

Check that the rsync command exists

check_upload()[source]

Check that it looks possible to upload something. Raise an UploadError if not.

get_relative_path(output_path)[source]

Return the relative path to the target_dir.

upload_context(output)[source]

Before entering the context, check that uploading is feasible. On exiting the context, upload output.

property upload_host
upload_or_exit_context(output)[source]

Like upload_context, but log and call sys.exit on error.

upload_output(output_path)[source]

Rsync output_path to the server and print the resulting URL. If specific_file is given

validphys.uploadutils.check_for_meta(path)[source]

Function that checks if a report input has a meta.yaml file. If not it prompts the user to either create one or follow an interactive prompt which assists the user in creating one.

Parameters:

path (pathlib.Path) – Input path

Return type:

None

validphys.uploadutils.check_input(path)[source]

A function that checks the type of the input for vp-upload. The type determines where on the vp server the file will end up.

A fit is defined as any folder structure containing a filter.yml file at its root.

A pdf is defined as any folder structure that contains a .info file and a replica 0 at its root.

A report is defined as any folder structure that contains an index.html at its root.

If the input file does not fall under any such category, a ValueError exception is raised and the user is prompted to use either rsync or validphys.scripts.wiki_upload.

Parameters:

path (pathlib.Path) – Path of the input file
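The three rules above can be sketched with pathlib. The function classify_input, the order of the checks, and the assumption that replica 0 is stored as a file ending in _0000.dat (as in LHAPDF grids) are illustrative guesses rather than the actual vp-upload logic.

```python
import pathlib
import tempfile


def classify_input(path):
    """Hypothetical classifier following the fit, pdf and report rules."""
    path = pathlib.Path(path)
    if (path / "filter.yml").is_file():
        return "fit"
    # Assumption: replica 0 is stored as <name>_0000.dat.
    if any(path.glob("*.info")) and any(path.glob("*_0000.dat")):
        return "pdf"
    if (path / "index.html").is_file():
        return "report"
    raise ValueError(f"Cannot determine the type of {path}")


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "index.html").touch()
    kind = classify_input(root)  # the folder only contains index.html
```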

validphys.uploadutils.interactive_meta(path)[source]

Function to interactively create a meta.yaml file

Parameters:

path (pathlib.Path) – Input path

Return type:

None

validphys.utils module

validphys.utils.common_prefix(*s)[source]

Return the longest string that is a prefix of all the strings passed in s.
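One way to obtain such a prefix, shown here as a sketch that may differ from the validphys code, is to compare only the lexicographically smallest and largest strings: any prefix shared by those two is shared by everything in between.

```python
def common_prefix_sketch(*strings):
    """Longest common prefix via the lexicographic minimum and maximum."""
    small, big = min(strings), max(strings)
    for i, char in enumerate(small):
        if big[i] != char:
            return small[:i]
    return small


prefix = common_prefix_sketch("NNPDF31_nlo", "NNPDF31_nnlo", "NNPDF40_nnlo")
```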

validphys.utils.experiments_to_dataset_inputs(experiments_list)[source]

Flatten a list of old style experiment inputs to the new, flat, dataset_inputs style.

Example

>>> from validphys.api import API
>>> from validphys.utils import experiments_to_dataset_inputs
>>> fit = API.fit(fit='NNPDF31_nnlo_as_0118_1000')
>>> experiments = fit.as_input()['experiments']
>>> dataset_inputs = experiments_to_dataset_inputs(experiments)
>>> dataset_inputs[:3]
[{'dataset': 'NMCPD', 'frac': 0.5},
 {'dataset': 'NMC', 'frac': 0.5},
 {'dataset': 'SLACP', 'frac': 0.5}]

validphys.utils.generate_path_filtered_data(fit_path, setname)[source]

Utility to ensure that both the loader and tools like setupfit utilize the same convention to generate the names of generated pseudodata

validphys.utils.sane_groupby_iter(df, by, *args, **kwargs)[source]

Iterate groupby in such a way that first value is always the tuple of the common values.

As a convenience for plotting, if by is None, yield the empty string and the whole dataframe.

validphys.utils.scale_from_grid(grid)[source]

Guess the appropriate matplotlib scale from a grid object. Returns 'linear' if the scale of the grid object is linear, and otherwise 'log'.

validphys.utils.split_by(it, crit)[source]

Split it in two lists: the first contains the elements for which crit evaluates to True and the second those for which it does not. crit can be either a function or an iterable (in this case the original it will be sliced if the length of crit is smaller).

validphys.utils.split_ranges(a, cond=None, *, filter_falses=False)[source]

Split a so that each range has the same value for cond. If filter_falses is true, only the ranges for which the condition is true will be returned.
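A standard-library sketch of this splitting is given below. It assumes that cond defaults to a itself and that values are compared by truthiness; both are assumptions for illustration, and the function name is hypothetical.

```python
from itertools import groupby


def split_ranges_sketch(a, cond=None, *, filter_falses=False):
    """Split a into consecutive runs over which bool(cond) is constant."""
    a = list(a)
    cond = a if cond is None else list(cond)
    ranges = []
    start = 0
    for value, run in groupby(cond, key=bool):
        length = len(list(run))
        if value or not filter_falses:
            ranges.append(a[start:start + length])
        start += length
    return ranges


all_ranges = split_ranges_sketch([1, 2, 0, 0, 5])
true_ranges = split_ranges_sketch([1, 2, 0, 0, 5], filter_falses=True)
```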

validphys.utils.tempfile_cleaner(root, exit_func, exc, prefix=None, **kwargs)[source]

A context manager to handle temporary directory creation and clean-up upon raising an expected exception.

Parameters:
  • root (str) – The root directory to create the temporary directory in.

  • exit_func (Callable) – The exit function to call upon exiting the context manager. Usually one of shutil.move or shutil.rmtree. Use the former if the temporary directory will be the final result directory and the latter if the temporary directory will contain the result directory, for example when downloading a resource.

  • exc (Exception) – The exception to catch within the with block.

  • prefix (optional[str]) – A prefix to prepend to the temporary directory.

  • **kwargs (dict) – Keyword arguments to provide to exit_func.

Returns:

tempdir – The path to the temporary directory.

Return type:

pathlib.Path

Example

The following example creates a temporary directory prefixed with tutorial_ in the /tmp directory. The context manager will listen for a KeyboardInterrupt and will clean up if this exception is raised. Upon completion of the with block, it will rename the temporary directory to completed (the dst argument), using shutil.move. The final directory will contain an empty file called new_file, which we created within the with block.

import shutil

from validphys.utils import tempfile_cleaner

with tempfile_cleaner(
    root="/tmp",
    exit_func=shutil.move,
    exc=KeyboardInterrupt,
    prefix="tutorial_",
    dst="completed",
) as tempdir:
    new_file = tempdir / "new_file"
    input("Press enter to continue or Ctrl-C to interrupt:\n")
    new_file.touch()

Module contents