validphys.closuretest package
Subpackages
Submodules
validphys.closuretest.closure_checks module
closuretest/checks.py
Module containing checks specific to the closure tests.
- validphys.closuretest.closure_checks.check_fit_isclosure(fit)[source]
Check the input fit is a closure test
- validphys.closuretest.closure_checks.check_fits_areclosures(fits)[source]
Check all fits are closures
- validphys.closuretest.closure_checks.check_fits_different_filterseed(fits)[source]
Input fits should have different filter seeds if they are being used for multiple closure test studies because, in high-level hand-waving terms, the different level 1 shifts represent different ‘runs of the universe’!
- validphys.closuretest.closure_checks.check_fits_have_same_basis(fits_basis)[source]
Check the basis is the same for all fits
- validphys.closuretest.closure_checks.check_fits_same_filterseed(fits)[source]
Input fits should have the same filter seed if they are being compared
- validphys.closuretest.closure_checks.check_fits_underlying_law_match(fits)[source]
Check that the fits being compared have the same underlying law
- validphys.closuretest.closure_checks.check_multifit_replicas(fits_pdf, _internal_max_reps, _internal_min_reps)[source]
Checks that all the fit pdfs have the same number of replicas N_rep. Then check that N_rep is greater than the smallest number of replicas used in actions which subsample the replicas of each fit.
This check also has the secondary effect of filling in the namespace key _internal_max_reps, which can be used to override the number of replicas used at the level of the runcard but by default is filled in as the number of replicas in each fit.
validphys.closuretest.closure_plots module
closuretest/plots.py
Plots of statistical estimators for single closure test. See multiclosure module for more estimators and plots.
validphys.closuretest.closure_results module
closuretest/closure_results.py
Module containing actions to calculate single closure test estimators. This is useful for quickly checking the bias of a fit without having to run the full multiclosure analysis.
- validphys.closuretest.closure_results.delta_chi2_bootstrap(fits_level_1_noise, fits_exps_bootstrap_chi2_central, fits, use_fitcommondata)[source]
Bootstraps delta chi2 for specified fits. Delta chi2 measures whether the level one data is fitted better by the underlying law or by the specified fit; it is a measure of overfitting.
delta chi2 = (chi2(T[<f>], D_1) - chi2(T[f_in], D_1))/chi2(T[f_in], D_1)
where T[<f>] is central theory prediction from fit, T[f_in] is theory prediction from t0 pdf (input) and D_1 is level 1 closure data
Exact details on delta chi2 can be found in 1410.8849 eq (28).
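The normalisation in the formula above can be sketched in a few lines. This is a minimal illustration, not the validphys implementation: the function name `delta_chi2` and the chi2 values are hypothetical, and the two chi2 inputs are assumed to have been computed already against the same level 1 data.

```python
def delta_chi2(chi2_fit: float, chi2_law: float) -> float:
    """Relative difference (chi2_fit - chi2_law) / chi2_law.

    chi2_fit: chi2 of the central fit prediction T[<f>] against D_1.
    chi2_law: chi2 of the underlying-law prediction T[f_in] against D_1.
    """
    if chi2_law <= 0:
        raise ValueError("chi2_law must be positive")
    return (chi2_fit - chi2_law) / chi2_law


# Hypothetical chi2 values, for illustration only.
overfit = delta_chi2(chi2_fit=95.0, chi2_law=100.0)    # fit describes D_1 better than the law
underfit = delta_chi2(chi2_fit=110.0, chi2_law=100.0)  # fit describes D_1 worse than the law
```

A negative value means the fit describes the level 1 data better than the underlying law does, which is the overfitting signature mentioned above.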
- validphys.closuretest.closure_results.delta_chi2_table(fits_exps_chi2, fits_exps_level_1_noise, fits_name_with_covmat_label, fits_experiments, fits, use_fitcommondata)[source]
Calculate delta chi2 per experiment and put it in a table. Here delta chi2 is just normalised by ndata and is equal to
delta_chi2 = (chi2(T[<f>], D_1) - chi2(T[f_in], D_1))/ndata
validphys.closuretest.inconsistent_ct module
This module contains the InconsistentCommonData class which is meant to have all the methods needed in order to introduce an inconsistency within a Closure Test.
- class validphys.closuretest.inconsistent_ct.InconsistentCommonData(setname: str, ndata: int, commondataproc: str, nkin: int, nsys: int, commondata_table: DataFrame, systype_table: DataFrame, legacy: bool = False, systematics_table: DataFrame = None, legacy_names: Optional[list] = None, kin_variables: Optional[list] = None)[source]
Bases: CommonData
Class that inherits all of the methods of the coredata.CommonData class.
This class is meant to have all the methods needed in order to introduce an inconsistency within a Closure Test.
- commondata_table: DataFrame
- export_uncertainties(buffer)[source]
Same as the export_uncertainties method of the CommonData class. The only difference is that systematic_errors is now a property of the class and not a method.
- process_commondata(treatment_names, names_uncertainties, sys_rescaling_factor, inconsistent_datasets)[source]
Returns a commondata instance with modified systematics. Note that if commondata.setname is not within the inconsistent_datasets or if both ADD and MULT are False, then the commondata object will not be modified.
- Parameters
treatment_names (list) – list of the names of the treatments that should be rescaled; possible values are: MULT, ADD
names_uncertainties (list) – list of the names of the uncertainties that should be rescaled; possible values are: CORR, UNCORR, THEORYCORR, THEORYUNCORR, SPECIAL. SPECIAL is used for intra-dataset systematics.
inconsistent_datasets (list) – list of the datasets for which an inconsistency should be introduced
- Return type
validphys.inconsistent_ct.InconsistentCommonData
- rescale_systematics(treatment_names, names_uncertainties, sys_rescaling_factor)[source]
Rescale the columns of the systematic_errors() that are included in the names_uncertainties list, and return the rescaled table.
- Parameters
treatment_names (list) – list of the names of the treatments that should be rescaled; possible values are: MULT, ADD
names_uncertainties (list) – list of the names of the uncertainties that should be rescaled; possible values are: CORR, UNCORR, THEORYCORR, THEORYUNCORR, SPECIAL. SPECIAL is used for intra-dataset systematics.
sys_rescaling_factor (float) – factor by which the systematics should be rescaled
- Returns
self.systematics_table
- Return type
pd.DataFrame
- select_systype_table_indices(treatment_names, names_uncertainties)[source]
Is used to get the indices of the systype_table that correspond to the intersection of the treatment_names and names_uncertainties lists.
- Parameters
treatment_names (list) – list of the names of the treatments that should be selected; possible values are: MULT, ADD
names_uncertainties (list) – list of the names of the uncertainties that should be selected; possible values are: CORR, UNCORR, THEORYCORR, THEORYUNCORR, SPECIAL. SPECIAL is used for intra-dataset systematics.
- Returns
systype_tab.index
- Return type
pd.Index
- property systematic_errors
Overrides the systematic_errors method of the CommonData class.
This is done in order to allow the systematic_errors to be a property and hence to be able to assign values to it (setter).
- systematics_table: DataFrame = None
- systype_table: DataFrame
validphys.closuretest.multiclosure module
closuretest/multiclosure.py
Module containing all of the statistical estimators which are
averaged across multiple fits or a single replica proxy fit. The actions
in this module are used to produce results which are plotted in
multiclosure_output.py
- class validphys.closuretest.multiclosure.MulticlosureLoader(closure_theories: list, law_theory: ThPredictionsResult, covmat_reps_mean: array)[source]
Bases: object
Stores the basic information for a multiclosure study.
- law_theory
ThPredictionsResult object for the underlying law.
- covmat_reps_mean
Covariance matrix of the theory predictions averaged over fits.
- Type
np.array
- covmat_reps_mean: array
- law_theory: ThPredictionsResult
- class validphys.closuretest.multiclosure.RegularizedMulticlosureLoader(closure_theories: list, law_theory: ThPredictionsResult, covmat_reps_mean: array, pc_basis: array, n_comp: int, reg_covmat_reps_mean: array, sqrt_reg_covmat_reps_mean: array, std_covmat_reps: array)[source]
Bases: MulticlosureLoader
- pc_basis
Basis of principal components.
- Type
np.array
- reg_covmat_reps_mean
Diagonal, regularised covariance matrix computed from replicas of theory predictions.
- Type
np.array
- sqrt_reg_covmat_reps_mean
Sqrt of the regularised covariance matrix.
- Type
np.array
- std_covmat_reps
Square root of diagonal entries of the original covariance matrix.
- Type
np.array
- pc_basis: array
- reg_covmat_reps_mean: array
- sqrt_reg_covmat_reps_mean: array
- std_covmat_reps: array
- validphys.closuretest.multiclosure.bias_data(regularized_multiclosure_data_loader)[source]
Similar to bias_dataset but for all data.
- validphys.closuretest.multiclosure.bias_dataset(regularized_multiclosure_dataset_loader)[source]
Computes the normalized bias for a RegularizedMulticlosureLoader object for a single dataset.
- Parameters
regularized_multiclosure_dataset_loader (RegularizedMulticlosureLoader) –
- Returns
bias_fits n_comp
- Return type
- validphys.closuretest.multiclosure.compute_normalized_bias(regularized_multiclosure_loader: RegularizedMulticlosureLoader, corrmat: bool = False) array[source]
Compute the normalized bias for a RegularizedMulticlosureLoader object. If corrmat is True, the bias is computed assuming that RegularizedMulticlosureLoader contains the correlation matrix, this is needed when computing the bias for the entire data.
- Parameters
regularized_multiclosure_loader (RegularizedMulticlosureLoader) –
corrmat (bool, default is False) –
- Returns
Array of shape len(fits) containing the normalized bias for each fit.
- Return type
np.array
- validphys.closuretest.multiclosure.eigendecomposition(covmat: array) tuple[source]
Computes the eigendecomposition of a covariance matrix and returns the eigenvalues, eigenvectors and the normalized eigenvalues ordered from largest to smallest.
- Parameters
covmat (np.array) – covariance matrix
- Returns
3D tuple containing the eigenvalues, eigenvectors and the normalized eigenvalues. Note that the eigenvalues are sorted from largest to smallest.
- Return type
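As a rough illustration of the behaviour described for eigendecomposition, the sketch below uses NumPy directly. The name `eigendecomposition_sketch` is hypothetical, and the assumption that "normalized" means dividing each eigenvalue by the sum of all eigenvalues is not confirmed by the text above.

```python
import numpy as np


def eigendecomposition_sketch(covmat):
    """Return eigenvalues, eigenvectors and normalized eigenvalues, largest first."""
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(covmat)
    order = np.argsort(eigvals)[::-1]  # indices that sort from largest to smallest
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]        # eigenvectors are stored as columns
    # Assumption: "normalized" means each eigenvalue divided by the total.
    eigvals_norm = eigvals / eigvals.sum()
    return eigvals, eigvecs, eigvals_norm


covmat = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, eigvecs, eigvals_norm = eigendecomposition_sketch(covmat)
```

Under this normalisation the cumulative sum of the third output gives the fraction of variance explained by the leading components.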
- validphys.closuretest.multiclosure.fits_normed_dataset_central_delta(multiclosure_dataset_loader, _internal_max_reps=None, _internal_min_reps=20)[source]
For each fit calculate the difference between central expectation value and true val. Normalize this value by the variance of the differences between replicas and central expectation value (different for each fit but expected to vary only a little). Each observable central exp value is expected to be gaussianly distributed around the true value set by the fakepdf.
- Parameters
- Returns
deltas – 2-D array with shape (n_fits, n_obs)
- Return type
np.array
- validphys.closuretest.multiclosure.mean_covmat_multiclosure(closure_theories: list) array[source]
Computes the ‘PDF’ covariance matrices obtained from each multiclosure fit and averages over them.
- Parameters
closure_theories (list) – list of ThPredictionsResult
- Returns
covmat_reps_mean
- Return type
np.array
- validphys.closuretest.multiclosure.multiclosure_data_loader(data: DataGroupSpec, fits_pdf: list, multiclosure_underlyinglaw: PDF, t0set: PDF) MulticlosureLoader[source]
Like multiclosure_dataset_loader except for all data
- validphys.closuretest.multiclosure.multiclosure_dataset_loader(dataset: DataSetSpec, fits_pdf: list, multiclosure_underlyinglaw: PDF, t0set: PDF) MulticlosureLoader[source]
Internal function for loading multiple theory predictions and underlying law for a given dataset. This function is used to avoid memory issues when caching the load function of a group of datasets.
- Parameters
dataset ((DataSetSpec, DataGroupSpec)) – dataset for which the theory predictions and t0 covariance matrix will be loaded. Note that due to the structure of validphys this function can be overloaded to accept a DataGroupSpec.
fits_pdf (list) – list of PDF objects produced from performing multiple closure tests fits. Each fit should have a different filterseed but the same underlying law used to generate the pseudodata.
multiclosure_underlyinglaw (PDF) – PDF used to generate the pseudodata which the closure tests fitted. This is inferred from the fit runcards.
t0set (validphys.core.PDF) – t0 pdfset, is only used to check that the underlying law matches the t0set.
- Returns
A dataclass storing the theory predictions for the fits and the underlying law.
- Return type
Notes
This function replicates behaviour found elsewhere in validphys, the reason for this is that due to the default caching behaviour one can run into memory issues when loading the theory predictions for the amount of fits typically used in these studies.
- validphys.closuretest.multiclosure.normalized_delta_bias_data(regularized_multiclosure_data_loader: RegularizedMulticlosureLoader) tuple[source]
Compute for all data only the normalized delta after PCA regularization.
- validphys.closuretest.multiclosure.regularized_multiclosure_data_loader(multiclosure_data_loader: MulticlosureLoader, explained_variance_ratio=0.95, _internal_max_reps=None, _internal_min_reps=20)[source]
Similar to multiclosure.regularized_multiclosure_dataset_loader except for all data. In this case we regularize the correlation matrix rather than the covariance matrix, because different experiments can have different units.
- Parameters
multiclosure_data_loader (MulticlosureLoader) –
explained_variance_ratio (float, default is 0.95) –
_internal_max_reps (int, default is None) – Maximum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits
_internal_min_reps (int, default is 20) – Minimum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits
- Return type
- validphys.closuretest.multiclosure.regularized_multiclosure_dataset_loader(multiclosure_dataset_loader: MulticlosureLoader, explained_variance_ratio=0.95, _internal_max_reps=None, _internal_min_reps=20) RegularizedMulticlosureLoader[source]
Similar to multiclosure.multiclosure_dataset_loader but computes the regularized PDF covariance matrix by only keeping the largest eigenvalues that sum to the explained_variance_ratio.
- Parameters
multiclosure_dataset_loader (MulticlosureLoader) –
explained_variance_ratio (float, default is 0.95) –
_internal_max_reps (int, default is None) – Maximum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits
_internal_min_reps (int, default is 20) – Minimum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits
- Return type
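The selection of principal components from explained_variance_ratio can be sketched as follows. This is only an illustration of the idea of keeping the largest eigenvalues until their cumulative share reaches the requested ratio; the helper name `n_components_for_ratio` and the exact threshold convention (first component count whose cumulative share is at least the ratio) are assumptions, not taken from the validphys source.

```python
import numpy as np


def n_components_for_ratio(eigvals_desc, explained_variance_ratio=0.95):
    """Smallest number of leading eigenvalues whose share reaches the ratio.

    eigvals_desc: eigenvalues sorted from largest to smallest.
    """
    eigvals_desc = np.asarray(eigvals_desc, dtype=float)
    cumulative = np.cumsum(eigvals_desc) / eigvals_desc.sum()
    # searchsorted returns the first index whose cumulative share is >= the ratio.
    idx = int(np.searchsorted(cumulative, explained_variance_ratio))
    # Guard against round-off pushing the index past the last component.
    return min(idx + 1, len(eigvals_desc))


# Hypothetical spectrum: cumulative shares are 0.6, 0.9 and 1.0.
spectrum = np.array([6.0, 3.0, 1.0])
n_comp_default = n_components_for_ratio(spectrum)        # ratio 0.95
n_comp_loose = n_components_for_ratio(spectrum, 0.8)     # ratio 0.80
```

A lower ratio discards more of the small eigenvalues, which stabilises the inverse of the covariance matrix at the cost of ignoring some directions in data space.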
- validphys.closuretest.multiclosure.xq2_dataset_map(xq2map_with_cuts, multiclosure_dataset_loader, _internal_max_reps=None, _internal_min_reps=20)[source]
For a single dataset and a set of fits, define a dictionary which contains, for each datapoint of the dataset, the following information:
- x coordinate
- Q**2 coordinate
- value of the bias-variance ratio at that point for the given fits
- value of xi at that point for the given fits
- Parameters
xq2map_with_cuts (validphys.kinematics.XQ2Map) – contains kinematic information of the dataset’s datapoints
multiclosure_dataset_loader (tuple) – closure fits theory predictions, underlying law theory predictions, covariance matrix, sqrt covariance matrix
_internal_max_reps (int) – maximum number of replicas to use for each fit
_internal_min_reps (int) – minimum number of replicas to use for each fit
- Returns
xq2map – dictionary containing: x coordinate, Q**2 coordinate, Ratio bias-variance, xi
- Return type
dictionary
validphys.closuretest.multiclosure_bootstrap module
Module for bootstrapping multiclosure fits.
- class validphys.closuretest.multiclosure_bootstrap.BootstrappedTheoryResult(data)[source]
Bases: object
Proxy class which mimics results.ThPredictionsResult so that pre-existing bias/variance actions can be used with bootstrapped replicas
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_bias_data(bootstrapped_regularized_multiclosure_data_loader)[source]
Computes Bias and Variance for each bootstrap sample. Returns a DataFrame with the results.
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_bias_dataset(bootstrapped_regularized_multiclosure_dataset_loader, dataset)[source]
Computes Bias for each bootstrap sample. Returns a DataFrame with the results.
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_indicator_function_data(bootstrapped_normalized_delta_bias_data, nsigma=1)[source]
Compute the indicator function for each bootstrap sample.
- Parameters
- Returns
- list
list of length N_boot whose entries are arrays of dim Npca x Nfits containing the indicator function for each bootstrap sample.
- float
average number of degrees of freedom
- Return type
2-D tuple
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_multiclosure_data_loader(multiclosure_data_loader: MulticlosureLoader, n_fit_max: int, n_fit: int, n_rep_max: int, n_rep: int, n_boot_multiclosure: int, use_repeats: bool = True)[source]
Like bootstrapped_multiclosure_dataset_loader except for all data.
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_multiclosure_dataset_loader(multiclosure_dataset_loader: MulticlosureLoader, n_fit_max: int, n_fit: int, n_rep_max: int, n_rep: int, n_boot_multiclosure: int, use_repeats: bool = True)[source]
Returns a tuple of MulticlosureLoader objects each of which is a bootstrap resample of the original dataset.
- Parameters
multiclosure_dataset_loader (MulticlosureLoader) –
n_fit_max (int) – maximum number of fits, should be smaller or equal to number of multiclosure fits
n_fit (int) – number of fits to draw for each resample
n_rep_max (int) – maximum number of replicas, should be smaller or equal to number of replicas in each fit
n_rep (int) – number of replicas to draw for each resample
n_boot_multiclosure (int) – number of bootstrap resamples to perform
rng_seed_mct_boot (int) – seed for random number generator
use_repeats (bool, default is True) – whether to allow repeated fits and replicas in each resample
- Returns
resampled_multiclosure – tuple of MulticlosureLoader objects each of which is a bootstrap resample of the original dataset
- Return type
tuple of shape (n_boot_multiclosure,)
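The resampling controlled by n_fit_max, n_fit, n_rep_max, n_rep, n_boot_multiclosure and use_repeats can be illustrated with index draws alone. The sketch below is hypothetical: the function name `bootstrap_indices`, the `seed` argument, and the choice to draw replica indices independently for each selected fit are assumptions, and no MulticlosureLoader objects are constructed.

```python
import numpy as np


def bootstrap_indices(n_fit_max, n_fit, n_rep_max, n_rep, n_boot, seed=0, use_repeats=True):
    """Draw fit and replica indices for n_boot bootstrap resamples."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_boot):
        # Pick n_fit fits out of the first n_fit_max, with or without repeats.
        fit_idx = rng.choice(n_fit_max, size=n_fit, replace=use_repeats)
        # Assumption: replicas are resampled independently for each selected fit.
        rep_idx = [rng.choice(n_rep_max, size=n_rep, replace=use_repeats) for _ in fit_idx]
        samples.append((fit_idx, rep_idx))
    return samples


samples = bootstrap_indices(n_fit_max=5, n_fit=3, n_rep_max=10, n_rep=4, n_boot=2)
```

With use_repeats=False the draws are without replacement, so n_fit and n_rep must not exceed n_fit_max and n_rep_max respectively.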
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_normalized_delta_bias_data(bootstrapped_regularized_multiclosure_data_loader)[source]
Compute the normalized deltas for each bootstrap sample. Note: delta is the bias in the diagonal basis.
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_regularized_multiclosure_data_loader(multiclosure_data_loader: MulticlosureLoader, n_fit_max: int, n_fit: int, n_rep_max: int, n_rep: int, n_boot_multiclosure: int, use_repeats: bool = True, explained_variance_ratio: float = 0.95, _internal_max_reps=None, _internal_min_reps=20) tuple[source]
Same as bootstrapped_regularized_multiclosure_dataset_loader but for all the data.
- validphys.closuretest.multiclosure_bootstrap.bootstrapped_regularized_multiclosure_dataset_loader(multiclosure_dataset_loader: MulticlosureLoader, n_fit_max: int, n_fit: int, n_rep_max: int, n_rep: int, n_boot_multiclosure: int, use_repeats: bool = True, explained_variance_ratio: float = 0.95, _internal_max_reps=None, _internal_min_reps=20) tuple[source]
Similar to multiclosure.bootstrapped_multiclosure_dataset_loader but returns PCA regularised covariance matrix, where the covariance matrix has been computed from the replicas of the theory predictions.
Returns a tuple of RegularizedMulticlosureLoader objects.
- validphys.closuretest.multiclosure_bootstrap.standard_indicator_function(standard_variable, nsigma=1)[source]
Calculate the indicator function for a standardised variable.
- Parameters
standard_variable (np.array) – array of variables that have been standardised: (x - mu)/sigma
nsigma (float) – number of standard deviations to consider
- Returns
array of ones and zeros. If 1 then the variable is within nsigma standard deviations from the mean, otherwise it is 0.
- Return type
np.array
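A minimal sketch of such an indicator function is given below. The name `indicator_sketch` is hypothetical, and the use of a strict inequality at exactly nsigma is an assumption; the documented behaviour only states that values within nsigma standard deviations map to 1 and all others to 0.

```python
import numpy as np


def indicator_sketch(standard_variable, nsigma=1):
    """Return 1 where |standard_variable| < nsigma, and 0 elsewhere."""
    standard_variable = np.asarray(standard_variable, dtype=float)
    # Assumption: the boundary value |x| == nsigma is counted as outside.
    return np.asarray(np.abs(standard_variable) < nsigma, dtype=int)


# Hypothetical standardised values (x - mu) / sigma.
values = np.array([-2.0, -0.5, 0.0, 0.9, 1.5])
inside = indicator_sketch(values)
xi = inside.mean()  # fraction of values within one standard deviation
```

For a standard Gaussian variable the mean of the indicator with nsigma=1 is expected to be close to 0.68.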
validphys.closuretest.multiclosure_inconsistent_output module
multiclosure_inconsistent_output
Module containing the actions which produce some output in validphys reports i.e figures or tables for (inconsistent) multiclosure estimators in the space of data
- validphys.closuretest.multiclosure_inconsistent_output.lambdavalues_bootstrapped_table_bias_datasets = <reportengine.resourcebuilder.collect object>
Collects bootstrapped_table_bias_data over multiple lambda values dataspecs.
- validphys.closuretest.multiclosure_inconsistent_output.plot_l2_condition_number(each_dataset, internal_multiclosure_data_collected_loader, evr_min=0.9, evr_max=0.995, evr_n=20)[source]
Plot the L2 condition number of the covariance matrix as a function of the explained variance ratio. The plot gives an idea of the stability of the covariance matrix as a function of the explained variance ratio and hence the number of principal components used to reduce the dimensionality.
The ideal explained variance ratio is chosen based on a threshold L2 condition number, in general this threshold number (and the derived explained variance ratio) should be chosen so that
relative error in output (inverse covmat) <= relative error in input (covmat) * condition number
Note that in a closure test the relative error in the covariance matrix is very small and only numerical.
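The dependence of the condition number on the explained variance ratio can be sketched with NumPy. For a symmetric positive-definite matrix the L2 condition number is the ratio of the largest to the smallest eigenvalue, so after truncating to the leading components it is the ratio of the first to the last retained eigenvalue. The function name `l2_condition_numbers`, the truncation convention, and the example matrix are assumptions for illustration, not the validphys implementation.

```python
import numpy as np


def l2_condition_numbers(covmat, evr_values):
    """Condition number of the PCA-truncated covmat for each explained variance ratio."""
    eigvals = np.linalg.eigvalsh(covmat)[::-1]  # largest to smallest
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    conds = []
    for evr in evr_values:
        # Assumption: keep the fewest components whose cumulative share is >= evr.
        n_comp = min(int(np.searchsorted(cumulative, evr)) + 1, len(eigvals))
        kept = eigvals[:n_comp]
        conds.append(kept[0] / kept[-1])
    return np.array(conds)


# Hypothetical covariance matrix with a wide spread of eigenvalues.
covmat = np.diag([100.0, 10.0, 1.0, 0.01])
conds = l2_condition_numbers(covmat, [0.9, 0.95, 0.995])
```

In this toy case each additional retained component raises the condition number by an order of magnitude, which is the kind of growth the plot is meant to expose.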
- validphys.closuretest.multiclosure_inconsistent_output.plot_lambdavalues_bias_values(lambdavalues_bootstrapped_table_bias_datasets, lambdavalues, each_dataset)[source]
Plot sqrt of bias and its bootstrap uncertainty as a function of lambda for each dataset.
validphys.closuretest.multiclosure_nsigma module
This module contains the functions used in Sec. 4 of paper: arXiv: 2503.17447
set_1, set_2, and set_3 correspond to (S_1), (S_2), and (S_3) in Eq. 4.3, 4.6, and 4.7.
- class validphys.closuretest.multiclosure_nsigma.MulticlosureNsigma(nsigma_table: DataFrame, is_weighted: bool)[source]
Bases: object
Dataclass containing nsigma values for all datasets and fits, also used to keep track of whether the multiclosure fit is weighted or not.
- nsigma_table
A table containing n_sigma values.
- Type
pd.DataFrame
- nsigma_table: DataFrame
- class validphys.closuretest.multiclosure_nsigma.NsigmaAlpha(alpha_dict: dict, is_weighted: bool)[source]
Bases: object
Dataclass storing the set 1 values (can be used both for the set 1 and its complement).
- validphys.closuretest.multiclosure_nsigma.Z_ALPHA_RANGE = array([ inf, 2.57235211, 2.32257453, 2.16610675, 2.04959427, 1.95566144, 1.87635856, 1.8073542 , 1.74601652, 1.69062163, 1.63997627, 1.59321882, 1.5497059 , 1.50894386, 1.47054524, 1.43420016, 1.39965665, 1.36670697, 1.33517774, 1.30492264, 1.27581704, 1.24775386, 1.22064035, 1.19439566, 1.16894884, 1.14423727, 1.12020535, 1.09680356, 1.0739875 , 1.05171725, 1.02995676, 1.00867336, 0.98783733, 0.96742157, 0.94740127, 0.92775369, 0.90845787, 0.88949451, 0.87084575, 0.85249503, 0.83442701, 0.81662736, 0.79908276, 0.78178075, 0.76470967, 0.74785859, 0.73121725, 0.71477599, 0.69852571, 0.68245784, 0.66656426, 0.65083731, 0.63526971, 0.61985457, 0.60458535, 0.5894558 , 0.57445999, 0.55959227, 0.54484724, 0.53021973, 0.51570479, 0.50129771, 0.48699394, 0.47278912, 0.45867907, 0.44465976, 0.4307273 , 0.41687796, 0.40310812, 0.3894143 , 0.37579311, 0.3622413 , 0.3487557 , 0.33533322, 0.32197089, 0.30866581, 0.29541514, 0.28221615, 0.26906614, 0.25596249, 0.24290266, 0.22988412, 0.21690443, 0.20396118, 0.19105201, 0.1781746 , 0.16532667, 0.15250597, 0.1397103 , 0.12693746, 0.11418529, 0.10145167, 0.08873448, 0.07603162, 0.06334102, 0.05066062, 0.03798835, 0.02532218, 0.01266008, 0. ])
Quantile range for computing the true positive rate and true negative rate.
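Judging only from the printed numbers, the tabulated values appear consistent with standard normal quantiles z = Phi^{-1}(1 - alpha) evaluated on 100 equally spaced alpha values between 0 and 0.5. This is an inference from the values shown, not a statement about how the module builds the array. The sketch below reproduces the pattern with the Python standard library.

```python
from statistics import NormalDist

# Assumption: 100 equally spaced alpha values in [0, 0.5].
n_points = 100
alphas = [0.5 * i / (n_points - 1) for i in range(n_points)]

standard_normal = NormalDist()
# alpha = 0 corresponds to an infinite quantile, which inv_cdf cannot return.
z_values = [
    float("inf") if alpha == 0 else standard_normal.inv_cdf(1 - alpha)
    for alpha in alphas
]
```

Under this reading, a larger Z_alpha corresponds to a smaller alpha and therefore to a stricter threshold for flagging a fit.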
- validphys.closuretest.multiclosure_nsigma.comp_nsigma_alpha(multiclosurefits_nsigma: DataFrame, weighted_dataset: str) NsigmaAlpha[source]
Computes the complement set 1 alpha values.
- validphys.closuretest.multiclosure_nsigma.comp_set_1(dataspecs_comp_nsigma_alpha: list) dict[source]
Returns the complement set 1 alpha values.
- validphys.closuretest.multiclosure_nsigma.dataspecs_comp_nsigma_alpha = <reportengine.resourcebuilder.collect object>
Collect complement set 1 alpha over dataspecs.
- validphys.closuretest.multiclosure_nsigma.dataspecs_multiclosurefits_nsigma = <reportengine.resourcebuilder.collect object>
Collect the multiclosurefits_nsigma over dataspecs.
- validphys.closuretest.multiclosure_nsigma.dataspecs_nsigma_alpha = <reportengine.resourcebuilder.collect object>
Collect set 1 alpha over dataspecs.
- validphys.closuretest.multiclosure_nsigma.def_of_nsigma_alpha(multiclosurefits_nsigma: DataFrame, weighted_dataset: str, complement: bool = False) NsigmaAlpha[source]
Defines how the set 1 alpha values are computed. It can compute both the set 1 and its complement.
- Parameters
- Return type
- validphys.closuretest.multiclosure_nsigma.def_set_3(dataspecs_multiclosurefits_nsigma: list, weighted_dataset: str, complement: bool = False) dict[source]
Defines how the set 3 values are computed. It can compute both the set 3 and its complement.
- validphys.closuretest.multiclosure_nsigma.multiclosurefits_nsigma(fits: NSList, fits_data: list, fits_datasets_chi2_nsigma_deviation: list, is_weighted: bool) MulticlosureNsigma[source]
Returns a table (dataframe) containing n_sigma values. Index: dataset names, Columns: Level 1 seeds (filterseed).
- Parameters
- Return type
- validphys.closuretest.multiclosure_nsigma.nsigma_alpha(multiclosurefits_nsigma: DataFrame, weighted_dataset: str) NsigmaAlpha[source]
Computes the set 1 alpha values.
- validphys.closuretest.multiclosure_nsigma.probability_inconsistent(set_1, set_2, set_3, comp_set_1, n_fits, weighted_dataset)[source]
The set of inconsistent fits can be defined in different ways, two possible cases are:
C_2: (S_1 intersect S_2) union (S_3)
C_3 = S_1 union (~S_1 intersect S_3)
- The probability of a dataset being inconsistent is defined as:
P(inconsistent) = |I_alpha| / N
where N is the total number of fits.
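The C_3 combination and the resulting probability can be sketched with plain Python sets. The function name `probability_c3` is hypothetical, fits are assumed to be labelled by integer indices 0 to N-1, and |I_alpha| is assumed to be the size of the chosen combination at a fixed alpha.

```python
def probability_c3(s1, s3, n_fits):
    """P(inconsistent) for C_3 = S_1 union (~S_1 intersect S_3)."""
    all_fits = set(range(n_fits))
    not_s1 = all_fits - s1          # complement of S_1 within the N fits
    c3 = s1 | (not_s1 & s3)
    return len(c3) / n_fits


# Hypothetical sets of flagged fit indices at one value of alpha.
s1 = {1, 2}
s3 = {2, 5}
p_inconsistent = probability_c3(s1, s3, n_fits=10)
```

As a set identity, S_1 union (~S_1 intersect S_3) contains the same elements as S_1 union S_3; the longer form only makes explicit that S_3 is consulted for the fits not already flagged by S_1.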
- validphys.closuretest.multiclosure_nsigma.set_1(dataspecs_nsigma_alpha: list) dict[source]
Returns the set 1 alpha values, these are defined as
S_1 = {j | n_{sigma}^{j} > Z_{alpha}}
where j is the index of the fit and n_{sigma}^{j} is the n-sigma value computed for fit j.
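The definition of S_1 translates directly into a set comprehension. In this sketch the function name `set_1_sketch`, the dictionary layout (fit index mapped to its n-sigma value) and the numbers are hypothetical; the actual action works on collected dataspecs and returns a dict.

```python
def set_1_sketch(nsigma_by_fit, z_alpha):
    """Indices j of the fits whose n-sigma value exceeds z_alpha."""
    return {j for j, nsigma in nsigma_by_fit.items() if nsigma > z_alpha}


# Hypothetical n-sigma values for four fits.
nsigma_by_fit = {0: 0.3, 1: 2.1, 2: 1.2, 3: -0.4}
flagged = set_1_sketch(nsigma_by_fit, z_alpha=1.0)
```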
- validphys.closuretest.multiclosure_nsigma.set_2(dataspecs_nsigma_alpha: list) dict[source]
Same as the set 1 alpha values, but for the weighted fits.
S_2 = {i | n_{weighted, sigma}^{i} > Z_{alpha}}
where i is the index of the fit and n_{weighted, sigma}^{i} is the n-sigma value computed on the weighted dataset for fit i.
- validphys.closuretest.multiclosure_nsigma.set_3(dataspecs_multiclosurefits_nsigma: list, weighted_dataset: str) dict[source]
Computes the set 3 alpha values. The set 3 is defined as:
S_3 = {i | n_{weighted, sigma}^{i} - n_{ref, sigma}^{i}> + Z_{alpha}}
where the n-sigma is computed on all datasets that are not the weighted dataset. Moreover, if for a fit i any dataset has an n-sigma value greater than Z_{alpha}, then the fit i is included in the set.
validphys.closuretest.multiclosure_nsigma_helpers module
This module contains some helper functions that are used for the computation of nsigma in the context of a multi-closure test.
- class validphys.closuretest.multiclosure_nsigma_helpers.CentralChi2Data(value: float, ndata: int, dataset: validphys.core.DataSetSpec)[source]
Bases: object
- dataset: DataSetSpec
- property reduced
- validphys.closuretest.multiclosure_nsigma_helpers.central_member_chi2(central_predictions: DataFrame, sqrt_covmat: ndarray, dataset: DataSetSpec, loaded_commondata_with_cuts: CommonData) CentralChi2Data[source]
Computes the chi2 value for a dataset.
- Parameters
central_predictions – The central predictions for the dataset.
sqrt_covmat (np.ndarray) – The square root of the covariance matrix.
dataset (DataSetSpec) – The dataset.
loaded_commondata_with_cuts (nnpdf_data.coredata.CommonData) –
- Return type
- validphys.closuretest.multiclosure_nsigma_helpers.chi2_nsigma_deviation(central_member_chi2: CentralChi2Data) float[source]
Computes n_sigma as: (chi2 - ndata) / sqrt(2 * ndata)
- Parameters
central_member_chi2 (CentralChi2Data) –
- Returns
The deviation in units of sigma.
- Return type
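The n_sigma formula above follows from the fact that a chi2 distribution with ndata degrees of freedom has mean ndata and variance 2 * ndata. A minimal sketch with plain floats is shown below; the function name `nsigma_sketch` and the numbers are hypothetical, and the real action takes a CentralChi2Data object rather than two numbers.

```python
import math


def nsigma_sketch(chi2: float, ndata: int) -> float:
    """Deviation of chi2 from its expected value ndata, in units of sqrt(2 * ndata)."""
    if ndata <= 0:
        raise ValueError("ndata must be positive")
    return (chi2 - ndata) / math.sqrt(2 * ndata)


# Hypothetical dataset with 50 points and a total chi2 of 60.
deviation = nsigma_sketch(chi2=60.0, ndata=50)
```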
- validphys.closuretest.multiclosure_nsigma_helpers.datasets_chi2_nsigma_deviation = <reportengine.resourcebuilder.collect object>
Collect the n_sigma values over list of dataset_input.
- validphys.closuretest.multiclosure_nsigma_helpers.fits_data = <reportengine.resourcebuilder.collect object>
Collects the data for each fit.
- validphys.closuretest.multiclosure_nsigma_helpers.fits_datasets_chi2_nsigma_deviation = <reportengine.resourcebuilder.collect object>
Collects over fits and for all datasets the n_sigma values.
validphys.closuretest.multiclosure_nsigma_output module
Module for plotting the results of the multiclosure_nsigma.py script.
Can be used to reproduce the plots in Sec. 4 of arXiv: 2503.17447
- validphys.closuretest.multiclosure_nsigma_output.plot_1_minus_all_sets(set_1, set_3, set_2, n_fits)[source]
Plots complement of S_1, S_2 and S_3.
- validphys.closuretest.multiclosure_nsigma_output.plot_all_sets(set_1, set_3, set_2, n_fits)[source]
Plots S_1, S_2 and S_3.
- validphys.closuretest.multiclosure_nsigma_output.plot_probability_consistent(probability_inconsistent, comp_set_1, weighted_dataset, n_fits)[source]
Plots the probability of dataset being flagged as consistent.
- validphys.closuretest.multiclosure_nsigma_output.plot_probability_inconsistent(probability_inconsistent, set_1, weighted_dataset, n_fits)[source]
The set of inconsistent fits:
C_1 = S_1
C_2 = (S_1 intersect S_3) union (S_2)
C_3 = S_1 union (~S_1 intersect S_3)
- The probability of a dataset being inconsistent is defined as:
P(inconsistent) = |I_alpha| / N
where N is the total number of fits.
validphys.closuretest.multiclosure_output module
multiclosure_output
Module containing the actions which produce some output in validphys reports i.e figures or tables for multiclosure estimators in the space of data.
- validphys.closuretest.multiclosure_output.bootstrapped_table_bias_data(bootstrapped_bias_data)[source]
Compute the bias, sqrt bias and their bootstrap errors for a DataGroup and return a DataFrame with the results.
- validphys.closuretest.multiclosure_output.bootstrapped_table_bias_datasets(bootstrapped_bias_datasets)[source]
Compute the bias, variance, ratio and sqrt(ratio) for each dataset and return a DataFrame with the results. Uncertainty on ratio and sqrt ratio is computed by Gaussian error propagation of the bootstrap uncertainty on bias and variance.
- validphys.closuretest.multiclosure_output.plot_xq2_data_prcs_maps(xq2_data_map, each_dataset)[source]
Heat map of the ratio bias variance and xi quantile estimator for each datapoint in each dataset.
- Parameters
xq2_data_map (dictionary) –
containing (dictionary) –
x coordinate
Q**2 coordinate
Ratio bias-variance
xi
each_dataset (list) –
- Yields
figure
- validphys.closuretest.multiclosure_output.table_bias_data(bias_data)[source]
Same as table_bias_datasets but for all the data, meaning that the correlations between the datasets are taken into account.
- Parameters
bias_data (list) – Same as bias_dataset but for all the data
- Returns
DataFrame containing the bias, variance, ratio and sqrt(ratio) for each dataset
- Return type
pd.DataFrame
- validphys.closuretest.multiclosure_output.table_bias_datasets(bias_datasets, each_dataset)[source]
Compute the bias and sqrt bias and associated errors for each dataset and return a DataFrame with the results.
- validphys.closuretest.multiclosure_output.table_xi_indicator_function_data(bootstrapped_indicator_function_data)[source]
Computes the bootstrap average and std of the indicator function for the data.
- Parameters
bootstrapped_indicator_function_data (tuple) –
- Returns
DataFrame containing the average and std of the indicator function for the data.
- Return type
pd.DataFrame
validphys.closuretest.multiclosure_pdf module
multiclosure_pdf.py
Module containing all of the actions related to statistical estimators across
multiple closure fits or proxy fits defined in PDF space. The actions
in this module are used to produce results which are plotted in
multiclosure_pdf_output.py
- validphys.closuretest.multiclosure_pdf.bootstrap_pdf_differences(fits_xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw, rng)[source]
Generate a single bootstrap sample of pdf_central_difference and pdf_replica_difference given the multiclosure fits grid values (fits_xi_grid_values); the underlying law grid values and the underlying law; and a numpy random state which is used to generate random indices for the bootstrap sample. The bootstrap does include repeats and has the same number of fits and replicas as the original fits_xi_grid_values which is being resampled.
- Returns
pdf_difference – a tuple of 2 lists: the central differences and the replica differences. Each list is n_fits long and each element is a resampled differences array for a randomly selected fit, randomly selected replicas.
- Return type
- validphys.closuretest.multiclosure_pdf.fits_bootstrap_pdf_expected_xi(fits_bootstrap_pdf_sqrt_ratio)[source]
Using fits_bootstrap_pdf_sqrt_ratio calculate a bootstrap of the expected xi using the same procedure as in
validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance().
- validphys.closuretest.multiclosure_pdf.fits_bootstrap_pdf_ratio(fits_xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw, multiclosure_nx=4, n_boot=100, boot_seed=1234)[source]
Perform a bootstrap sampling across fits and replicas of the sqrt ratio, by flavour and total and then tabulate the mean and error
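The resampling across fits and replicas can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: it uses numpy's Generator API rather than a random state object, the toy array and helper name are hypothetical, and a plain mean stands in for the ratio estimator.

```python
import numpy as np


def bootstrap_indices(n_fits, n_reps, rng):
    """Draw one bootstrap sample of fit and replica indices, with repeats."""
    fit_idx = rng.integers(0, n_fits, size=n_fits)
    # Independent replica draw for each resampled fit.
    rep_idx = rng.integers(0, n_reps, size=(n_fits, n_reps))
    return fit_idx, rep_idx


rng = np.random.default_rng(1234)
# Toy stand-in for the fits grid values: (fits, replicas, flavours, x).
grid = rng.normal(size=(5, 10, 3, 4))

n_boot = 100
boot_means = []
for _ in range(n_boot):
    fit_idx, rep_idx = bootstrap_indices(grid.shape[0], grid.shape[1], rng)
    # Select resampled fits, then resampled replicas within each fit.
    resampled = grid[fit_idx[:, None], rep_idx]
    # Placeholder statistic; the real action computes the bias/variance ratio.
    boot_means.append(resampled.mean())

# Tabulate the mean and error across bootstrap samples.
boot_mean = np.mean(boot_means)
boot_err = np.std(boot_means)
```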
- validphys.closuretest.multiclosure_pdf.fits_bootstrap_pdf_sqrt_ratio(fits_bootstrap_pdf_ratio)[source]
Take the square root of fits_bootstrap_pdf_ratio
- validphys.closuretest.multiclosure_pdf.fits_correlation_matrix_totalpdf(fits_covariance_matrix_totalpdf)[source]
Given the fits_covariance_matrix_totalpdf, returns the corresponding correlation matrix
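As a sketch of the standard conversion from a covariance matrix to a correlation matrix (the function name and the example matrix are hypothetical):

```python
import numpy as np


def covmat_to_corrmat(covmat):
    """Divide each element by the product of the corresponding standard deviations."""
    std = np.sqrt(np.diag(covmat))
    return covmat / np.outer(std, std)


covmat = np.array([[4.0, 1.0], [1.0, 9.0]])
corrmat = covmat_to_corrmat(covmat)
```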
- validphys.closuretest.multiclosure_pdf.fits_covariance_matrix_by_flavour(fits_replica_difference)[source]
Given a set of PDF grids from multiple closure tests, obtain an estimate of the covariance matrix for each flavour separately, return as a list of covmats
- validphys.closuretest.multiclosure_pdf.fits_covariance_matrix_totalpdf(fits_replica_difference, multiclosure_nx=4)[source]
Given a set of PDF grids from multiple closure tests, obtain an estimate of the covariance matrix allowing for correlations across flavours
- validphys.closuretest.multiclosure_pdf.fits_pdf_flavour_ratio(fits_sqrt_covmat_by_flavour, fits_central_difference, fits_replica_difference)[source]
Calculate the bias (chi2 between central PDF and underlying PDF) for each flavour and the variance (mean chi2 between replica and central PDF), then return a numpy array with shape (flavours, 2) with second axis being bias, variance
- validphys.closuretest.multiclosure_pdf.fits_pdf_total_ratio(fits_central_difference, fits_replica_difference, fits_covariance_matrix_totalpdf, multiclosure_nx=4)[source]
Calculate the total bias and variance for all flavours and x allowing for correlations across flavour.
Returns:
- ratio_data: tuple
required data for calculating mean(bias) over mean(variance) across fits in form of tuple (bias, variance)
- validphys.closuretest.multiclosure_pdf.fits_sqrt_covmat_by_flavour(fits_covariance_matrix_by_flavour)[source]
For each flavour covariance matrix calculate the sqrt covmat (cholesky lower triangular)
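A minimal sketch of how a lower-triangular Cholesky factor can be used to evaluate a chi2, in the spirit of the bias and variance described for fits_pdf_flavour_ratio. The toy data, the helper name, and the use of a single set of replicas (rather than many fits) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy differences for one flavour: (replicas, x). Hypothetical data.
replica_diffs = rng.normal(size=(200, 4))
replica_diffs -= replica_diffs.mean(axis=0)  # differences about the central value
central_diff = rng.normal(size=4)

covmat = np.cov(replica_diffs, rowvar=False)
sqrt_covmat = np.linalg.cholesky(covmat)  # lower triangular, L @ L.T == covmat


def chi2_from_sqrt_covmat(diff, sqrt_cov):
    """chi2 = diff^T C^-1 diff, computed as |L^-1 diff|^2."""
    whitened = np.linalg.solve(sqrt_cov, diff)
    return float(whitened @ whitened)


# Bias: chi2 between central value and underlying law.
bias = chi2_from_sqrt_covmat(central_diff, sqrt_covmat)
# Variance: mean chi2 between replicas and central value.
variance = np.mean([chi2_from_sqrt_covmat(d, sqrt_covmat) for d in replica_diffs])
```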
- validphys.closuretest.multiclosure_pdf.internal_nonsinglet_xgrid(multiclosure_nx=4)[source]
Given the number of x points, set up the xgrid for flavours which are not singlet or gluon, defined as being linearly spaced points between 0.1 and 0.5
- validphys.closuretest.multiclosure_pdf.internal_singlet_gluon_xgrid(multiclosure_nx=4)[source]
Given the number of x points, set up the singlet and gluon xgrids, which are defined as half the points being logarithmically spaced between 10^-3 and 0.1 and the other half of the points being linearly spaced between 0.1 and 0.5
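The two x-grids described above can be sketched with numpy as follows. The function names are hypothetical, and two details are assumptions: the logarithmic half excludes its upper endpoint so that x = 0.1 is not duplicated, and the number of points is taken to be even.

```python
import numpy as np


def nonsinglet_xgrid(nx=4):
    """Linearly spaced points between 0.1 and 0.5."""
    return np.linspace(0.1, 0.5, nx)


def singlet_gluon_xgrid(nx=4):
    """Half log-spaced in [1e-3, 0.1), half linearly spaced in [0.1, 0.5]."""
    half = nx // 2
    log_part = np.logspace(-3, -1, half, endpoint=False)
    lin_part = np.linspace(0.1, 0.5, half)
    return np.concatenate([log_part, lin_part])


sg_grid = singlet_gluon_xgrid(4)
ns_grid = nonsinglet_xgrid(4)
```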
- validphys.closuretest.multiclosure_pdf.pdf_central_difference(xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw)[source]
Calculate the difference between the underlying law and the central PDF, specifically:
underlying_grid - mean(grid_vals)
where mean is across replicas.
Returns:
- diffs: np.array
array of diffs with shape (flavour, x)
- validphys.closuretest.multiclosure_pdf.pdf_replica_difference(xi_grid_values)[source]
Calculate the difference between the central PDF and the replica PDFs, specifically:
mean(grid_vals) - grid_vals
where the mean is across replicas.
Returns:
- diffs: np.array
array of diffs with shape (replicas, flavour, x)
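A short sketch of the two differences defined for pdf_central_difference and pdf_replica_difference, using toy arrays with hypothetical sizes in place of real grid values:

```python
import numpy as np

rng = np.random.default_rng(42)
grid_vals = rng.normal(size=(50, 3, 4))    # (replicas, flavour, x)
underlying_grid = rng.normal(size=(3, 4))  # (flavour, x)

central = grid_vals.mean(axis=0)           # mean across replicas
central_diff = underlying_grid - central   # shape (flavour, x)
replica_diff = central - grid_vals         # shape (replicas, flavour, x)
```

By construction the replica differences average to zero across replicas.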
- validphys.closuretest.multiclosure_pdf.replica_and_central_diff_totalpdf(fits_replica_difference, fits_central_difference, fits_covariance_matrix_totalpdf, multiclosure_nx=4, use_x_basis=False)[source]
Calculate sigma and delta, like xi_flavour_x(), but return before calculating xi.
- validphys.closuretest.multiclosure_pdf.underlying_xi_grid_values(multiclosure_underlyinglaw: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), internal_singlet_gluon_xgrid, internal_nonsinglet_xgrid)[source]
Like xi_pdfgrids but setting the PDF as the underlying law, extracted from a set of fits
- validphys.closuretest.multiclosure_pdf.xi_flavour_x(fits_replica_difference, fits_central_difference, fits_covariance_matrix_by_flavour, use_x_basis=False)[source]
For a set of fits calculate the indicator function
I_{[-sigma, sigma]}(delta)
where sigma is the RMS difference between the central and replica PDFs and delta is the difference between the central PDF and the underlying law.
The differences are all rotated to the basis which diagonalises the covariance matrix that was estimated from the superset of all fit replicas.
Finally take the mean across fits to get xi in flavour and x.
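The steps above can be sketched for a single flavour as follows. This is an illustration on toy data with hypothetical array sizes, not the module's implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
n_fits, n_reps, n_x = 20, 50, 4
# Hypothetical toy inputs for a single flavour.
replica_diffs = rng.normal(size=(n_fits, n_reps, n_x))
central_diffs = rng.normal(size=(n_fits, n_x))

# Covariance estimated from the superset of all fit replicas.
all_reps = replica_diffs.reshape(-1, n_x)
covmat = np.cov(all_reps, rowvar=False)
_, eigvecs = np.linalg.eigh(covmat)

# Rotate differences into the basis that diagonalises the covariance matrix.
rot_reps = replica_diffs @ eigvecs     # (fits, replicas, x)
rot_central = central_diffs @ eigvecs  # (fits, x)

# sigma: RMS of the replica differences; delta: central difference.
sigma = np.sqrt(np.mean(rot_reps**2, axis=1))  # (fits, x)
inside = np.abs(rot_central) < sigma           # indicator function
xi = inside.mean(axis=0)                       # mean across fits, one value per direction
```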
- validphys.closuretest.multiclosure_pdf.xi_grid_values(xi_pdfgrids)[source]
Grid values from the xi_pdfgrids concatenated as single numpy array
- validphys.closuretest.multiclosure_pdf.xi_pdfgrids(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), internal_singlet_gluon_xgrid, internal_nonsinglet_xgrid)[source]
Generate PDF grids which are required for calculating xi in PDF space in the NN31IC basis, excluding the charm. We want to specify different xgrids for different flavours to avoid sampling PDFs in deep extrapolation regions. The limits are chosen to achieve this and specifically they are chosen to be:
gluon and singlet: 10^-3 < x < 0.5
other non-singlets: 0.1 < x < 0.5
- Returns
tuple of xplotting_grids, one for gluon and singlet and one for other
non-singlets
- validphys.closuretest.multiclosure_pdf.xi_totalpdf(replica_and_central_diff_totalpdf)[source]
Like xi_flavour_x() except calculate the total xi across flavours and x, accounting for correlations.
validphys.closuretest.multiclosure_pdf_output module
multiclosure_pdf_output.py
Module containing all of the plots and tables for multiclosure estimators in PDF space.
- validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_compare_xi_to_expected(fits_bootstrap_pdf_expected_xi_table, fits_bootstrap_pdf_xi_table)[source]
Table comparing the mean and standard deviation across bootstrap samples of the measured value of xi to the value calculated from bias/variance in PDF space. This is done for each flavour and for the total across all flavours accounting for correlations.
- validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_expected_xi_table(fits_bootstrap_pdf_expected_xi)[source]
Tabulate the mean and standard deviation across bootstrap samples of fits_bootstrap_pdf_expected_xi() with a row for each flavour and the total expected xi.
- validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_sqrt_ratio_table(fits_bootstrap_pdf_sqrt_ratio)[source]
Tabulate the mean and standard deviation across bootstrap samples of the sqrt ratio of bias/variance in PDF space, with a row for each flavour and the total. For more information on the bootstrap sampling see
fits_bootstrap_pdf_ratio().
- validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_xi_table(fits_xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw, multiclosure_nx=4, n_boot=100, boot_seed=1234, use_x_basis=False)[source]
Perform a bootstrap sampling across fits and replicas of xi, by flavour and total and then tabulate the mean and error.
- validphys.closuretest.multiclosure_pdf_output.fits_pdf_bias_variance_ratio(fits_pdf_flavour_ratio, fits_pdf_total_ratio)[source]
Returns a table with the values of mean bias / mean variance with mean referring to mean across fits, by flavour. Includes total across all flavours allowing for correlations.
- validphys.closuretest.multiclosure_pdf_output.fits_pdf_compare_xi_to_expected(fits_pdf_expected_xi_from_ratio, xi_flavour_table)[source]
Two-column table comparing the measured value of xi for each flavour to the value calculated from the bias/variance.
- validphys.closuretest.multiclosure_pdf_output.fits_pdf_expected_xi_from_ratio(fits_pdf_sqrt_ratio)[source]
Like validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance() but in PDF space. An estimate is made of the integral across the central difference distribution, with domain defined by the replica distribution. For more details see validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance().
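One way to read the integral described above, as a sketch under stated assumptions: if the central differences are Gaussian with zero mean and standard deviation sqrt(bias), and the domain is the interval of half-width sqrt(variance), the integral reduces to an error function of 1/sqrt(ratio). The Gaussian assumption and the function name are not confirmed by this page.

```python
import math


def expected_xi(sqrt_ratio):
    """Probability that a zero-mean Gaussian with std sqrt(bias)
    falls within +/- sqrt(variance), where sqrt_ratio = sqrt(bias / variance)."""
    n_sigma = 1.0 / sqrt_ratio
    return math.erf(n_sigma / math.sqrt(2.0))


# sqrt_ratio = 1 corresponds to the usual 1-sigma Gaussian probability.
xi_faithful = expected_xi(1.0)
# sqrt_ratio > 1 (bias larger than variance) lowers the expected xi.
xi_underestimated = expected_xi(2.0)
```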
- validphys.closuretest.multiclosure_pdf_output.fits_pdf_sqrt_ratio(fits_pdf_bias_variance_ratio)[source]
Like fits_pdf_bias_variance_ratio() except taking the sqrt. This is to see how faithful our uncertainty is in units of the standard deviation.
- validphys.closuretest.multiclosure_pdf_output.plot_multiclosure_correlation_eigenvalues(fits_correlation_matrix_totalpdf)[source]
Plot scatter points for each of the eigenvalues from the estimated correlation matrix from the multiclosure PDFs in flavour and x.
In the legend add the ratio of the largest eigenvalue over the smallest eigenvalue, aka the l2 condition number of the correlation matrix.
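A brief sketch of the eigenvalue ratio quoted in the legend, using a small hypothetical correlation matrix. For a symmetric positive-definite matrix this ratio coincides with numpy's 2-norm condition number.

```python
import numpy as np

# Hypothetical 3x3 correlation matrix (symmetric, positive definite).
corrmat = np.array(
    [
        [1.0, 0.8, 0.2],
        [0.8, 1.0, 0.5],
        [0.2, 0.5, 1.0],
    ]
)

# eigvalsh is appropriate for symmetric matrices and returns real eigenvalues.
eigvals = np.linalg.eigvalsh(corrmat)
condition_number = eigvals.max() / eigvals.min()
```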
- validphys.closuretest.multiclosure_pdf_output.plot_multiclosure_correlation_matrix(fits_correlation_matrix_totalpdf, multiclosure_nx=4)[source]
Like plot_multiclosure_covariance_matrix but plots the total correlation matrix.
- validphys.closuretest.multiclosure_pdf_output.plot_multiclosure_covariance_matrix(fits_covariance_matrix_totalpdf, multiclosure_nx=4)[source]
Plot the covariance matrix for all flavours. The covariance matrix has shape n_flavours * n_x, where each block is the covariance of the replica PDFs on the x-grid defined in
xi_pdfgrids().
- validphys.closuretest.multiclosure_pdf_output.plot_pdf_central_diff_histogram(replica_and_central_diff_totalpdf)[source]
Histogram of the difference between central PDF and underlying law normalised by the corresponding replica standard deviation for all points in x and flavour alongside a scaled Gaussian. Total xi is proportion of the histogram which falls within the central 1-sigma confidence interval.
- validphys.closuretest.multiclosure_pdf_output.plot_pdf_matrix(matrix, n_x, **kwargs)[source]
Utility function which, given a covmat/corrmat for all flavours and x, plots it with appropriate labels. Input matrix is expected to be size (n_flavours*n_x) * (n_flavours*n_x).
- Parameters
matrix (np.array) – square matrix which must be (n_flavours*n_x) * (n_flavours*n_x) with elements ordered like: (flavour0_x0, flavour0_x1, …, flavourN_x0, …, flavourN_xN) i.e. the points along x for flavour 0, then points along x for flavour 1 etc.
**kwargs – keyword arguments for the matplotlib.axes.Axes.imshow function
Notes
See matplotlib.axes.Axes.imshow for more details on the plotting function.
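To illustrate the element ordering described for the matrix parameter, the sketch below maps a (flavour, x-point) pair to a flat index and extracts one flavour-flavour block. The helper name and the sizes are hypothetical.

```python
import numpy as np

n_flavours, n_x = 3, 4
size = n_flavours * n_x


def flat_index(flavour, x_point, n_x):
    """Position of (flavour, x_point) when x runs fastest within each flavour."""
    return flavour * n_x + x_point


# Toy matrix whose entries encode their own position.
matrix = np.arange(size * size, dtype=float).reshape(size, size)

# View the matrix as (flavour, x, flavour, x) to pick out a single block.
blocks = matrix.reshape(n_flavours, n_x, n_flavours, n_x)
block_01 = blocks[0, :, 1, :]  # flavour 0 rows against flavour 1 columns
```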
- validphys.closuretest.multiclosure_pdf_output.plot_xi_flavour_x(xi_flavour_x, Q, internal_singlet_gluon_xgrid, internal_nonsinglet_xgrid, multiclosure_nx=4, use_x_basis=False)[source]
For each flavour plot xi for each x-point. By default xi is calculated and plotted in the basis which diagonalises the covmat, which is estimated from the union of all the replicas. However, if use_x_basis is True then xi will be calculated and plotted in the x-basis.
- validphys.closuretest.multiclosure_pdf_output.xi_flavour_table(xi_flavour_x, xi_totalpdf)[source]
For each flavour take the mean of xi_flavour_x across x to get a single number, which is the proportion of points on the central PDF which are within 1 sigma. This is calculated from the replicas of the underlying PDF.
- Returns
xi_flavour – table of xi by flavour
- Return type
pd.DataFrame
validphys.closuretest.multiclosure_preprocessing module
multiclosure_preprocessing.py
Module containing all of the actions related to preprocessing exponents. In particular, comparing the next preprocessing exponents across the multiple closure fits with the previous effective exponents, to see if there is a big dependence on the level 1 shift.
- validphys.closuretest.multiclosure_preprocessing.next_multiclosure_alpha_preprocessing_table(fits, fits_basis, fits_pdf, fits_fitbasis_alpha_lines)[source]
Returns a table with the next alpha preprocessing exponent for each fit with a multiindex column of flavour and next preprocessing range limits.
For more information see
_next_multiclosure_preprocessing_table()
- validphys.closuretest.multiclosure_preprocessing.next_multiclosure_beta_preprocessing_table(fits, fits_basis, fits_pdf, fits_fitbasis_beta_lines)[source]
Returns a table with the next beta preprocessing exponent for each fit with a multiindex column of flavour and next preprocessing range limits.
For more information see
_next_multiclosure_preprocessing_table()
- validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_alpha_preprocessing(fits_fitbasis_alpha_lines, fits_pdf, next_multiclosure_alpha_preprocessing_table)[source]
Using the table produced by
next_multiclosure_alpha_preprocessing_table(), plot the next alpha preprocessing exponent ranges. The ranges are represented by horizontal error bars, with vertical lines indicating the previous range limits of the first fit.
- validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_alpha_preprocessing_range_width(fits_fitbasis_alpha_lines, fits_pdf, next_multiclosure_alpha_preprocessing_table)[source]
Using the table produced by
next_multiclosure_alpha_preprocessing_table(), plot the width of the next alpha preprocessing exponent range, i.e. max alpha - min alpha, as a histogram over fits for each flavour. Add a vertical line at the previous range width of the first fit for reference.
- validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_beta_preprocessing(fits_fitbasis_beta_lines, fits_pdf, next_multiclosure_beta_preprocessing_table)[source]
Using the table produced by
next_multiclosure_beta_preprocessing_table(), plot the next beta preprocessing exponent ranges. The ranges are represented by horizontal error bars, with vertical lines indicating the previous range limits of the first fit.
- validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_beta_preprocessing_range_width(fits_fitbasis_beta_lines, fits_pdf, next_multiclosure_beta_preprocessing_table)[source]
Using the table produced by
next_multiclosure_beta_preprocessing_table(), plot the width of the next beta preprocessing exponent range, i.e. max beta - min beta, as a histogram over fits for each flavour. Add a vertical line at the previous range width of the first fit for reference.
validphys.closuretest.multiclosure_pseudodata module
multiclosure_pseudodata
Actions which load fit pseudodata and compute estimators related to overfitting. Estimators here can only be calculated on data used in the fit.
- validphys.closuretest.multiclosure_pseudodata.expected_data_delta_chi2(data_fits_cv, multiclosure_data_loader)[source]
For data, calculate the mean of delta chi2 across all fits. Returns a tuple of the number of data points and the unnormalised delta chi2.
- validphys.closuretest.multiclosure_pseudodata.expected_delta_chi2_table(groups_expected_delta_chi2, group_dataset_inputs_by_metadata, total_expected_data_delta_chi2)[source]
Tabulate the expectation value of delta chi2 across fits for groups with an additional row with the total across all data at the bottom.
- validphys.closuretest.multiclosure_pseudodata.fits_dataset_cvs(fits_dataset)[source]
Internal function for loading the level one data for all fits for a single dataset. This function avoids the stringent metadata checks of the newer python commondata parser.
- validphys.closuretest.multiclosure_pseudodata.total_expected_data_delta_chi2(exps_expected_delta_chi2)[source]
Takes expected_data_delta_chi2() evaluated for each experiment and then sums across experiments. Returns the total number of datapoints and unnormalised delta chi2.
Module contents
closuretest
module containing all actions specific to closure test