validphys.closuretest package

Submodules

validphys.closuretest.closure_checks module

closuretest/checks.py

Module containing checks specific to the closure tests.

validphys.closuretest.closure_checks.check_at_least_10_fits(fits)[source]

validphys.closuretest.closure_checks.check_fit_isclosure(fit)[source]

Check the input fit is a closure test

validphys.closuretest.closure_checks.check_fits_areclosures(fits)[source]

Check all fits are closures

validphys.closuretest.closure_checks.check_fits_different_filterseed(fits)[source]

Input fits should have different filter seeds if they are being used for multiple closure test studies because, in high-level hand-waving terms, the different level 1 shifts represent different ‘runs of the universe’!

validphys.closuretest.closure_checks.check_fits_have_same_basis(fits_basis)[source]

Check the basis is the same for all fits

validphys.closuretest.closure_checks.check_fits_same_filterseed(fits)[source]

Input fits should have the same filter seed if they are being compared

validphys.closuretest.closure_checks.check_fits_underlying_law_match(fits)[source]

Check that the fits being compared have the same underlying law

validphys.closuretest.closure_checks.check_multifit_replicas(fits_pdf, _internal_max_reps, _internal_min_reps)[source]

Checks that all the fit PDFs have the same number of replicas N_rep, then checks that N_rep is greater than the smallest number of replicas used in actions which subsample the replicas of each fit.

This check also has the secondary effect of filling in the namespace key _internal_max_reps, which can be used to override the number of replicas at the level of the runcard but by default is filled in with the number of replicas in each fit.

validphys.closuretest.closure_checks.check_t0pdfset_matches_law(t0pdfset, fit)[source]

validphys.closuretest.closure_checks.check_t0pdfset_matches_multiclosure_law(multiclosure_underlyinglaw, t0set)[source]

Checks that, if a multiclosure_underlyinglaw is present, it matches the t0set. The check uses t0set instead of t0pdfset because different mechanisms can fill t0set.

validphys.closuretest.closure_checks.check_use_fitcommondata(use_fitcommondata)[source]

Base check that use_fitcommondata is being used. This check should be used with all actions which require comparison to fitcommondata.

validphys.closuretest.closure_plots module

closuretest/plots.py

Plots of statistical estimators for closure tests

validphys.closuretest.closure_plots.errorbar_figure_from_table(df)[source]

Given a table with even columns as central values and odd columns as errors, produce an errorbar plot.

validphys.closuretest.closure_plots.plot_biases(biases_table)[source]

Plot the bias of each experiment for all fits with bars. For information on how the bias is calculated see bias_experiment.

validphys.closuretest.closure_plots.plot_delta_chi2(delta_chi2_bootstrap, fits)[source]

Plots distributions of delta chi2 for each fit in fits. Distribution is generated by bootstrapping. For more information on delta chi2 see delta_chi2_bootstrap

validphys.closuretest.closure_plots.plot_fits_bootstrap_bias(fits_bootstrap_bias_table)[source]

Plot the bias for each experiment for all fits as a point with an error bar, where the error bar is given by bootstrapping the bias across replicas

The number of bootstrap samples can be controlled by the parameter bootstrap_samples

validphys.closuretest.closure_plots.plot_fits_bootstrap_variance(fits_bootstrap_variance_table)[source]

Plot variance as error bars, with mean and central value calculated from bootstrap sample

validphys.closuretest.closure_results module

closuretest/results.py

Underlying actions to calculate closure test estimators, plus some table actions.

class validphys.closuretest.closure_results.BiasData(bias, ndata)

Bases: tuple

bias

Alias for field number 0

ndata

Alias for field number 1

class validphys.closuretest.closure_results.VarianceData(variance, ndata)

Bases: tuple

ndata

Alias for field number 1

variance

Alias for field number 0

validphys.closuretest.closure_results.bias_dataset(results, underlying_results, fit, use_fitcommondata)[source]

Calculate the bias for a given dataset and fit. The bias is defined as the chi2 between the prediction from the underlying PDF (which was used to generate the closure pseudodata), also known as level zero closure data, and the central prediction calculated from the fitted PDF.

We require that use_fitcommondata is true because the generated closure data is used to generate the multiplicative contributions to the covariance matrix.
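As an illustration of the definition above, the following is a minimal sketch rather than the validphys code: it evaluates the chi2 between a central prediction and the underlying-law prediction for a given covariance matrix. The function name bias_chi2_sketch and the toy numbers are assumptions made for this example.

```python
import numpy as np


def bias_chi2_sketch(central_prediction, underlying_prediction, covmat):
    """Unnormalised bias: chi2 between central and underlying predictions.

    Hypothetical helper for illustration only, not part of validphys.
    """
    diff = np.asarray(central_prediction) - np.asarray(underlying_prediction)
    # Solve C x = diff instead of forming the explicit inverse of C.
    return float(diff @ np.linalg.solve(covmat, diff))


# Toy example: two data points and a diagonal covariance matrix.
central = np.array([1.0, 2.0])
underlying = np.array([0.0, 0.0])
covmat = np.diag([1.0, 4.0])
bias = bias_chi2_sketch(central, underlying, covmat)  # 1/1 + 4/4 = 2.0
```

Dividing this number by the number of data points gives the normalised quantity that the bootstrap actions report.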

validphys.closuretest.closure_results.bias_experiment(dataset_inputs_results, underlying_dataset_inputs_results, fit, use_fitcommondata)[source]

Like bias_dataset but for a whole experiment.

validphys.closuretest.closure_results.biases_table(fits_experiments, fits_experiments_bias, fits, show_total: bool = False)[source]

Creates a table with fits as the columns and the experiments from both fits as the row index.

validphys.closuretest.closure_results.bootstrap_bias_experiment(dataset_inputs_results, underlying_dataset_inputs_results, bootstrap_samples=500)[source]

Calculates bias as per bias_experiment but performs a bootstrap sample across replicas. Note that bias_experiment returns a named tuple like (unnormalised_bias, ndata), whereas this action simply returns an array boostrap_bias with length bootstrap_samples. Each element of the returned array is bias/n_data (bias normalised by the number of data points).
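The resampling described above can be sketched as follows. This is an assumption-laden illustration, not the validphys routine: replicas are drawn with replacement, the central prediction is recomputed as the replica mean, and the bias is divided by the number of data points. The name bootstrap_bias_sketch and the seed argument are hypothetical.

```python
import numpy as np


def bootstrap_bias_sketch(replica_predictions, underlying_prediction, covmat,
                          bootstrap_samples=500, seed=0):
    """Return an array of bias/n_data, one entry per bootstrap resample.

    Hypothetical helper; `seed` is an illustrative parameter.
    """
    reps = np.asarray(replica_predictions, dtype=float)  # (n_rep, n_data)
    n_rep, n_data = reps.shape
    rng = np.random.default_rng(seed)
    out = np.empty(bootstrap_samples)
    for k in range(bootstrap_samples):
        # Draw n_rep replica indices with replacement.
        idx = rng.integers(0, n_rep, size=n_rep)
        # Central prediction of the resample minus the underlying law.
        diff = reps[idx].mean(axis=0) - underlying_prediction
        out[k] = diff @ np.linalg.solve(covmat, diff) / n_data
    return out


toy_rng = np.random.default_rng(1)
toy_replicas = toy_rng.normal(size=(50, 3))
samples = bootstrap_bias_sketch(
    toy_replicas, np.zeros(3), np.eye(3), bootstrap_samples=20
)
```

The spread of the returned array is what an error bar on the bias would be built from.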

validphys.closuretest.closure_results.bootstrap_variance_experiment(dataset_inputs_results, bootstrap_samples=500)[source]

Calculate the variance as in variance_experiment but perform a bootstrap sample of the estimator. Returns an array of variance for each resample, normalised to the number of data in the experiment.

validphys.closuretest.closure_results.delta_chi2_bootstrap(fits_level_1_noise, fits_exps_bootstrap_chi2_central, fits, use_fitcommondata)[source]

Bootstraps delta chi2 for specified fits. Delta chi2 measures whether the level one data is fitted better by the underlying law or the specified fit; it is a measure of overfitting.

delta chi2 = (chi2(T[<f>], D_1) - chi2(T[f_in], D_1))/chi2(T[f_in], D_1)

where T[<f>] is the central theory prediction from the fit, T[f_in] is the theory prediction from the t0 PDF (input), and D_1 is the level 1 closure data.

Exact details on delta chi2 can be found in 1410.8849 eq (28).
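A small numerical sketch of the formula above, taking the two chi2 values as already computed. The function name delta_chi2_sketch and the numbers are illustrative assumptions.

```python
def delta_chi2_sketch(chi2_fit, chi2_law):
    """Relative difference between the fit chi2 and the underlying-law chi2.

    chi2_fit stands for chi2(T[<f>], D_1) and chi2_law for chi2(T[f_in], D_1).
    Hypothetical helper for illustration only.
    """
    return (chi2_fit - chi2_law) / chi2_law


# Toy numbers: the fit describes the level 1 data slightly better
# than the underlying law does.
example = delta_chi2_sketch(chi2_fit=95.0, chi2_law=100.0)  # -0.05
```

Under this definition a negative value means the fit reproduces the level 1 data better than the law that generated it, which is the overfitting signal the description refers to.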

validphys.closuretest.closure_results.delta_chi2_table(fits_exps_chi2, fits_exps_level_1_noise, fits_name_with_covmat_label, fits_experiments, fits, use_fitcommondata)[source]

Calculate delta chi2 per experiment and put it in a table. Here delta chi2 is just normalised by ndata and is equal to

delta_chi2 = (chi2(T[<f>], D_1) - chi2(T[f_in], D_1))/ndata

validphys.closuretest.closure_results.fit_underlying_pdfs_summary(fit, fitunderlyinglaw)[source]

Returns a table with a single column for the fit, with rows indicating the PDF used to generate the data and the t0 PDF.

validphys.closuretest.closure_results.fits_bootstrap_bias_table(fits_experiments_bootstrap_bias, fits_name_with_covmat_label, fits_experiments, fits, use_fitcommondata)[source]

Produce a table with bias for each experiment for each fit, along with variance calculated from doing a bootstrap sample

validphys.closuretest.closure_results.fits_bootstrap_variance_table(fits_exps_bootstrap_var, fits_name_with_covmat_label, fits_experiments, fits, use_fitcommondata)[source]

Produce a table with variance and its error. Variance is defined as

var = sum_ij E_rep[(E_rep[T_i] - T_i) invcov_ij (E_rep[T_j] - T_j)] / N_data

which is the expectation value across replicas of the chi2 between the central theory predictions and replica theory predictions. It is the same as phi^2 and gives the variance of the theory predictions in units of the covariance matrix, normalised by the number of data points.

The error is the standard deviation across bootstrap samples.

validphys.closuretest.closure_results.summarise_closure_underlying_pdfs(fits_underlying_pdfs_summary)[source]

Collects the underlying pdfs for all fits and concatenates them into a single table

validphys.closuretest.closure_results.variance_dataset(results, fit, use_fitcommondata)[source]

Calculate the variance for a given dataset, which is the spread of replicas measured in the space of the covariance matrix. Given by:

E_rep [ (T - E_rep[T])_i C^{-1}_ij (T - E_rep[T])_j ]

where E_rep is the expectation value across replicas. The quantity is the same as squaring phi_data, however it is redefined here in a way which can be made fully independent of the closure data. This is useful when checking the variance of data which was not included in the fit.
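The expression above can be illustrated with a short sketch. It is not the validphys code: the name variance_chi2_sketch, the array layout (replicas along the first axis), and the toy numbers are assumptions.

```python
import numpy as np


def variance_chi2_sketch(replica_predictions, covmat):
    """Mean over replicas of the chi2 between each replica and the replica mean.

    Hypothetical helper; replica_predictions has shape (n_rep, n_data).
    """
    reps = np.asarray(replica_predictions, dtype=float)
    diffs = reps - reps.mean(axis=0)  # (T - E_rep[T]) for every replica
    # C^{-1} applied to every difference vector, without an explicit inverse.
    solved = np.linalg.solve(covmat, diffs.T).T
    chi2_per_replica = np.einsum("ri,ri->r", diffs, solved)
    return float(chi2_per_replica.mean())


# Toy example: four replicas, two data points, diagonal covariance matrix.
toy_reps = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
toy_covmat = np.diag([1.0, 4.0])
variance = variance_chi2_sketch(toy_reps, toy_covmat)  # each replica gives 1.0
```

Note that nothing in this calculation refers to the closure data itself, only to the replica predictions and the covariance matrix.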

validphys.closuretest.closure_results.variance_experiment(dataset_inputs_results, fit, use_fitcommondata)[source]

Like variance_dataset but for a whole experiment

validphys.closuretest.multiclosure module

closuretest/multiclosure.py

Module containing all of the statistical estimators which are averaged across multiple fits or a single replica proxy fit. The actions in this module are used to produce results which are plotted in multiclosure_output.py

class validphys.closuretest.multiclosure.BootstrappedTheoryResult(data)[source]

Bases: object

Proxy class which mimics results.ThPredictionsResult so that pre-existing bias/variance actions can be used with bootstrapped replicas

class validphys.closuretest.multiclosure.PCAInternalMulticlosureLoader(closures_th: list, law_th: ThPredictionsResult, pc_basis: array, n_comp: int, covmat_pca: array, sqrt_covmat_pca: array)[source]

Bases: object

Parameters:
  • closures_th (list) – list containing validphys.results.ThPredictionsResult objects for each fit

  • law_th (ThPredictionsResult object) – underlying law theory predictions

  • pc_basis (np.array) – basis of principal components

  • n_comp (int) – number of principal components kept after regularisation

  • covmat_pca (np.array) – regularised covariance matrix computed from replicas of theory predictions

  • sqrt_covmat_pca (np.array) – cholesky decomposed covariance matrix

closures_th: list
covmat_pca: array
law_th: ThPredictionsResult
n_comp: int
pc_basis: array
sqrt_covmat_pca: array

validphys.closuretest.multiclosure.bias_variance_resampling_data(internal_multiclosure_data_loader, n_fit_samples, n_replica_samples, bootstrap_samples=100, boot_seed=9689372, use_repeats=True)[source]

Like bias_variance_resampling_dataset except for all data.

Notes

The bootstrap samples are seeded in this function. If this action is collected over multiple experiments, the resamples for every experiment use the same fits/replicas and can therefore be added together.

validphys.closuretest.multiclosure.bias_variance_resampling_dataset(internal_multiclosure_dataset_loader, n_fit_samples, n_replica_samples, bootstrap_samples=100, boot_seed=9689372, use_repeats=True)[source]

For a single dataset, create bootstrap distributions of bias and variance varying the number of fits and replicas drawn for each resample. Return two 3-D arrays with dimensions

(number of n_rep samples, number of n_fit samples, n_boot)

filled with resampled bias and variance respectively. The number of bootstrap_samples is 100 by default. The number of n_rep samples is determined by varying n_rep between 10 and the number of replicas each fit has in intervals of 5. This action requires that each fit has the same number of replicas which also must be at least 10. The number of n_fit samples is determined analogously to the number of n_rep samples, also requiring at least 10 fits.

Returns:

resamples – tuple of two 3-D arrays with resampled bias and variance respectively for each n_rep samples and each n_fit samples

Return type:

tuple

Notes

The bootstrap samples are seeded in this function. If this action is collected over multiple datasets, the resamples for every dataset use the same replicas and fits.

validphys.closuretest.multiclosure.bias_variance_resampling_total(exps_bias_var_resample)[source]

Sum the bias_variance_resampling_data for all experiments, giving the total bias and variance resamples. This relies on the bootstrap seed being the same for all experiments, such that the fits/replicas are the same, and there being no inter-experiment correlations.

validphys.closuretest.multiclosure.bootstrapped_indicator_function_data(bootstrapped_principal_components_normalized_delta_data, nsigma=1)[source]

Compute the indicator function for each bootstrap sample.

Parameters:
  • bootstrapped_principal_components_normalized_delta_data (list) – list of tuples containing the normalized deltas and the number of principal components. Each tuple corresponds to a bootstrap sample.

  • nsigma (int, default is 1)

Returns:

list

list of length N_boot whose entries are arrays of dimension Npca x Nfits containing the indicator function for each bootstrap sample.

float

average number of degrees of freedom

Return type:

tuple of length 2

validphys.closuretest.multiclosure.bootstrapped_internal_multiclosure_data_loader(internal_multiclosure_data_loader, n_fit_max, n_fit, n_rep_max, n_rep, n_boot_multiclosure, rng_seed_mct_boot, use_repeats=True)[source]

Like bootstrapped_internal_multiclosure_dataset_loader except for all data

validphys.closuretest.multiclosure.bootstrapped_internal_multiclosure_data_loader_pca(internal_multiclosure_data_loader, n_fit_max, n_fit, n_rep_max, n_rep, n_boot_multiclosure, rng_seed_mct_boot, use_repeats=True, explained_variance_ratio=0.99, _internal_max_reps=None, _internal_min_reps=20)[source]

Same as bootstrapped_internal_multiclosure_dataset_loader_pca but for all the data.

validphys.closuretest.multiclosure.bootstrapped_internal_multiclosure_dataset_loader(internal_multiclosure_dataset_loader, n_fit_max, n_fit, n_rep_max, n_rep, n_boot_multiclosure, rng_seed_mct_boot, use_repeats=True)[source]

Returns a tuple of internal_multiclosure_dataset_loader objects each of which is a bootstrap resample of the original dataset

Parameters:
  • internal_multiclosure_dataset_loader (tuple) – closure fits theory predictions, underlying law theory predictions, covariance matrix, sqrt covariance matrix

  • n_fit_max (int) – maximum number of fits, should be smaller than or equal to the number of multiclosure fits

  • n_fit (int) – number of fits to draw for each resample

  • n_rep_max (int) – maximum number of replicas, should be smaller than or equal to the number of replicas in each fit

  • n_rep (int) – number of replicas to draw for each resample

  • n_boot_multiclosure (int) – number of bootstrap resamples to perform

  • rng_seed_mct_boot (int) – seed for random number generator

  • use_repeats (bool, default is True) – whether to allow repeated fits and replicas in each resample

Returns:

resampled_multiclosure – tuple of internal_multiclosure_dataset_loader objects each of which is a bootstrap resample of the original dataset

Return type:

tuple of shape (n_boot_multiclosure,)
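To make the roles of the parameters above concrete, here is a minimal sketch of drawing the fit and replica indices for each bootstrap resample. It only produces indices and is not the validphys code; the function name is hypothetical, and drawing a single set of replica indices per resample is an assumption of this sketch.

```python
import numpy as np


def resample_fit_and_replica_indices(n_fit_max, n_fit, n_rep_max, n_rep,
                                     n_boot_multiclosure, rng_seed_mct_boot,
                                     use_repeats=True):
    """Return a tuple of (fit_indices, replica_indices), one pair per resample.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(rng_seed_mct_boot)
    resamples = []
    for _ in range(n_boot_multiclosure):
        # use_repeats=True means sampling with replacement.
        fit_idx = rng.choice(n_fit_max, size=n_fit, replace=use_repeats)
        rep_idx = rng.choice(n_rep_max, size=n_rep, replace=use_repeats)
        resamples.append((fit_idx, rep_idx))
    return tuple(resamples)


resamples = resample_fit_and_replica_indices(
    n_fit_max=25, n_fit=10, n_rep_max=100, n_rep=40,
    n_boot_multiclosure=3, rng_seed_mct_boot=1,
)
```

Because the generator is seeded, calling the function twice with the same arguments yields identical index sets, which is what allows resamples of different datasets to be combined.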

validphys.closuretest.multiclosure.bootstrapped_internal_multiclosure_dataset_loader_pca(internal_multiclosure_dataset_loader, n_fit_max, n_fit, n_rep_max, n_rep, n_boot_multiclosure, rng_seed_mct_boot, use_repeats=True, explained_variance_ratio=0.99, _internal_max_reps=None, _internal_min_reps=20)[source]

Similar to multiclosure.bootstrapped_internal_multiclosure_dataset_loader but returns PCA regularised covariance matrix, where the covariance matrix has been computed from the replicas of the theory predictions.

validphys.closuretest.multiclosure.bootstrapped_principal_components_bias_variance_data(bootstrapped_internal_multiclosure_data_loader_pca)[source]

Computes Bias and Variance for each bootstrap sample. Returns a DataFrame with the results.

validphys.closuretest.multiclosure.bootstrapped_principal_components_bias_variance_dataset(bootstrapped_internal_multiclosure_dataset_loader_pca, dataset)[source]

Computes Bias and Variance for each bootstrap sample. Returns a DataFrame with the results.

validphys.closuretest.multiclosure.bootstrapped_principal_components_normalized_delta_data(bootstrapped_internal_multiclosure_data_loader_pca)[source]

Compute the normalized deltas for each bootstrap sample.

Parameters:

bootstrapped_internal_multiclosure_data_loader_pca (list) – list of tuples containing the results of multiclosure fits after pca regularization

Returns:

list of tuples containing the normalized deltas and the number of principal components. Each tuple corresponds to a bootstrap sample.

Return type:

list

validphys.closuretest.multiclosure.data_replica_and_central_diff(internal_multiclosure_data_loader, diagonal_basis=True)[source]

Like dataset_replica_and_central_diff but for all data

validphys.closuretest.multiclosure.data_xi(data_replica_and_central_diff)[source]

Like dataset_xi but for all data

validphys.closuretest.multiclosure.dataset_fits_bias_replicas_variance_samples(internal_multiclosure_dataset_loader, _internal_max_reps=None, _internal_min_reps=20)[source]

For a single dataset, calculate the samples of chi2-quantities which are used to calculate the bias and variance for each fit. The output of this function is similar to fits_dataset_bias_variance() except that the mean is not taken across replicas when calculating the mean squared difference between replica predictions and central predictions and instead the results are concatenated. The mean of this array would be the expected value of the variance across fits.

Return tuple (fits_bias, fits_replica_variance, n_data), where fits_bias is 1-D array of length N_fits and fits_replica_variance is 1-D array length N_fits * N_replicas.

For more information on bias see closuretest.bias_dataset and for more information on variance see validphys.closuretest.closure_results.variance_dataset().

The fits should each have the same underlying law and t0 PDF, but have different filterseeds, so that the level 1 shift is different.

Can control the number of replicas taken from each fit with _internal_max_reps.

validphys.closuretest.multiclosure.dataset_inputs_fits_bias_replicas_variance_samples(internal_multiclosure_data_loader, _internal_max_reps=None, _internal_min_reps=20)[source]

validphys.closuretest.multiclosure.dataset_replica_and_central_diff(internal_multiclosure_dataset_loader, diagonal_basis=True)[source]

For a given dataset calculate sigma, the RMS difference between replica predictions and central predictions, and delta, the difference between the central prediction and the underlying prediction.

If diagonal_basis is True the differences are calculated in the basis which would diagonalise the dataset’s covariance matrix. This is the default behaviour.

validphys.closuretest.multiclosure.dataset_xi(dataset_replica_and_central_diff)[source]

Take sigma and delta for a dataset, where sigma is the RMS difference between replica predictions and central predictions, and delta is the difference between the central prediction and the underlying prediction.

Then the indicator function is evaluated elementwise for sigma and delta

\(I_{[-\sigma_j, \sigma_j]}(\delta_j)\)

which is 1 when \(|\delta_j| < \sigma_j\) and 0 otherwise. Finally, take the mean across fits.

Returns:

xi_1sigma_i – a 1-D array where each element is the value of xi_1sigma for that particular eigenvector. We note that the eigenvectors are ordered by ascending eigenvalues

Return type:

np.array
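The evaluation of the indicator function and the mean across fits can be sketched as below. The array layout (fits along the first axis, eigenvector directions along the second) and the name xi_1sigma_sketch are assumptions made for this example.

```python
import numpy as np


def xi_1sigma_sketch(delta, sigma):
    """Fraction of fits for which |delta_j| < sigma_j, per direction j.

    Hypothetical helper; delta and sigma have shape (n_fits, n_directions).
    """
    inside = np.abs(delta) < sigma  # elementwise indicator function
    return inside.mean(axis=0)      # mean across fits


# Toy example: four fits, two directions, unit sigma everywhere.
toy_delta = np.array([[0.5, 2.0], [-0.2, 0.1], [1.5, -3.0], [0.9, 0.4]])
toy_sigma = np.ones_like(toy_delta)
xi = xi_1sigma_sketch(toy_delta, toy_sigma)  # array([0.75, 0.5])
```

For Gaussian distributed deltas with a faithful sigma, each entry would be expected to sit near 0.68.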

validphys.closuretest.multiclosure.eigendecomposition(covmat)[source]

Compute the eigendecomposition of a covariance matrix.

Parameters:

covmat (np.array) – covariance matrix

Returns:

Tuple of length 3 containing the eigenvalues, eigenvectors and the normalized eigenvalues. Note that the eigenvalues are sorted from largest to smallest.

Return type:

tuple
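A sketch of such a decomposition using numpy is given below. It assumes that "normalized eigenvalues" means each eigenvalue divided by the sum of all eigenvalues; that interpretation, and the function name, are assumptions rather than a statement about the validphys code.

```python
import numpy as np


def eigendecomposition_sketch(covmat):
    """Eigenvalues (descending), matching eigenvectors, normalised eigenvalues.

    Hypothetical helper for illustration only.
    """
    # eigh is appropriate for symmetric matrices and returns ascending order.
    eigvals, eigvecs = np.linalg.eigh(covmat)
    order = np.argsort(eigvals)[::-1]  # reorder from largest to smallest
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    normalized = eigvals / eigvals.sum()
    return eigvals, eigvecs, normalized


toy_cov = np.array([[2.0, 0.0], [0.0, 6.0]])
eigvals, eigvecs, normalized = eigendecomposition_sketch(toy_cov)
# eigvals is [6., 2.] and normalized is [0.75, 0.25]
```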

validphys.closuretest.multiclosure.expected_data_bias_variance(fits_data_bias_variance)[source]

Like expected_dataset_bias_variance except for all data

validphys.closuretest.multiclosure.expected_dataset_bias_variance(fits_dataset_bias_variance)[source]

For a given dataset calculate the expected bias and variance across fits then return tuple (expected bias, expected variance, n_data)

validphys.closuretest.multiclosure.expected_total_bias_variance(fits_total_bias_variance)[source]

Like expected_dataset_bias_variance except for all data

validphys.closuretest.multiclosure.experiments_bootstrap_expected_xi(experiments_bootstrap_sqrt_ratio)[source]

Calculate a bootstrap resampling of the expected xi from experiments_bootstrap_sqrt_ratio, using the same formula as validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance().

validphys.closuretest.multiclosure.experiments_bootstrap_ratio(experiments_bootstrap_bias_variance, total_bootstrap_ratio)[source]

Returns a bootstrap resampling of the ratio of bias/variance for each experiment and total. Total is calculated as sum(bias)/sum(variance) where each sum refers to the sum across experiments.

Returns:

ratios_resampled – list of bootstrap samples of ratio of bias/var, length of list is len(experiments) + 1 because the final element is the total ratio resampled.

Return type:

list

validphys.closuretest.multiclosure.experiments_bootstrap_sqrt_ratio(experiments_bootstrap_ratio)[source]

Square root of experiments_bootstrap_ratio

validphys.closuretest.multiclosure.fits_bootstrap_data_bias_variance(internal_multiclosure_data_loader, fits, _internal_max_reps=None, _internal_min_reps=20, bootstrap_samples=100, boot_seed=9689372)[source]

Perform a bootstrap resample of fits_data_bias_variance. Returns a tuple (bias_samples, variance_samples), where each element is a 1-D np.array of length bootstrap_samples. The elements of the arrays are bootstrap samples of bias and variance respectively.

validphys.closuretest.multiclosure.fits_bootstrap_data_xi(internal_multiclosure_data_loader, fits, _internal_max_reps=None, _internal_min_reps=20, bootstrap_samples=100, boot_seed=9689372)[source]

Perform a bootstrap resample of data_xi. Returns a list where each element is an independent resampling of data_xi.

For more information on bootstrapping see _bootstrap_multiclosure_fits. For more information on xi see dataset_xi.

validphys.closuretest.multiclosure.fits_data_bias_variance(internal_multiclosure_data_loader, _internal_max_reps=None, _internal_min_reps=20)[source]

Like fits_dataset_bias_variance but for all data

validphys.closuretest.multiclosure.fits_dataset_bias_variance(internal_multiclosure_dataset_loader, _internal_max_reps=None, _internal_min_reps=20)[source]

For a single dataset, calculate the bias and variance for each fit and return tuple (bias, variance, n_data), where bias and variance are 1-D arrays of length len(fits).

For more information on bias see closuretest.bias_dataset and for more information on variance see validphys.closuretest.closure_results.variance_dataset().

The fits should each have the same underlying law and t0 PDF, but have different filterseeds, so that the level 1 shift is different.

Can control the number of replicas taken from each fit with _internal_max_reps.

validphys.closuretest.multiclosure.fits_normed_dataset_central_delta(internal_multiclosure_dataset_loader, _internal_max_reps=None, _internal_min_reps=20)[source]

For each fit, calculate the difference between the central expectation value and the true value. Normalize this difference by the variance of the differences between replicas and the central expectation value (different for each fit but expected to vary only a little). The central expectation value of each observable is expected to be Gaussian distributed around the true value set by the fakepdf.

Parameters:
  • internal_multiclosure_dataset_loader (tuple) – closure fits theory predictions, underlying law theory predictions, covariance matrix, sqrt covariance matrix

  • _internal_max_reps (int) – maximum number of replicas to use for each fit

  • _internal_min_reps (int) – minimum number of replicas to use for each fit

Returns:

deltas – 2-D array with shape (n_fits, n_obs)

Return type:

np.array

validphys.closuretest.multiclosure.fits_total_bias_variance(fits_experiments_bias_variance)[source]

Like fits_dataset_bias_variance except for all data, assumes there are no inter-experiment correlations. That assumption is broken if a theory covariance matrix is used.

validphys.closuretest.multiclosure.groups_bootstrap_expected_xi(groups_bootstrap_sqrt_ratio)[source]

Like experiments_bootstrap_expected_xi() but for metadata groups.

validphys.closuretest.multiclosure.groups_bootstrap_ratio(groups_bootstrap_bias_variance, total_bootstrap_ratio)[source]

Like experiments_bootstrap_ratio() but for metadata groups.

validphys.closuretest.multiclosure.groups_bootstrap_sqrt_ratio(groups_bootstrap_ratio)[source]

Like experiments_bootstrap_sqrt_ratio() but for metadata groups.

validphys.closuretest.multiclosure.internal_multiclosure_data_loader(data, fits_pdf, multiclosure_underlyinglaw, fits, dataset_inputs_t0_covmat_from_systematics)[source]

Like internal_multiclosure_dataset_loader except for all data

validphys.closuretest.multiclosure.internal_multiclosure_data_loader_pca(internal_multiclosure_data_loader, explained_variance_ratio=0.99, _internal_max_reps=None, _internal_min_reps=20)[source]

Like multiclosure.internal_multiclosure_dataset_loader_pca except for all data

Parameters:
  • internal_multiclosure_data_loader (tuple) – closure fits theory predictions, underlying law theory predictions, covariance matrix, sqrt covariance matrix

  • explained_variance_ratio (float, default is 0.99)

  • _internal_max_reps (int, default is None) – Maximum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits

  • _internal_min_reps (int, default is 20) – Minimum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits

Return type:

PCAInternalMulticlosureLoader

validphys.closuretest.multiclosure.internal_multiclosure_dataset_loader(dataset, fits_pdf, multiclosure_underlyinglaw, fits, t0_covmat_from_systematics)[source]

Internal function for loading multiple theory predictions for a given dataset and a single covariance matrix using the underlying law as t0 PDF, for use with multiclosure statistical estimators. It avoids the memory issues that arise from caching the load function of a group of datasets.

Parameters:
  • dataset ((DataSetSpec, DataGroupSpec)) – dataset for which the theory predictions and t0 covariance matrix will be loaded. Note that due to the structure of validphys this function can be overloaded to accept a DataGroupSpec.

  • fits_pdf (list) – list of PDF objects produced from performing multiple closure tests fits. Each fit should have a different filterseed but the same underlying law used to generate the pseudodata.

  • multiclosure_underlyinglaw (PDF) – PDF used to generate the pseudodata which the closure tests fitted. This is inferred from the fit runcards.

  • fits (list) – list of closure test fits, used to collect fits_pdf

Returns:

multiclosure_results – a tuple of length 4 containing all necessary dependencies of multiclosure statistical estimators in order:

closure fits theory predictions, underlying law theory predictions, covariance matrix, sqrt covariance matrix

Return type:

tuple

Notes

This function replicates behaviour found elsewhere in validphys. The reason is that, with the default caching behaviour, one can run into memory issues when loading the theory predictions for the number of fits typically used in these studies.

validphys.closuretest.multiclosure.internal_multiclosure_dataset_loader_pca(internal_multiclosure_dataset_loader, explained_variance_ratio=0.99, _internal_max_reps=None, _internal_min_reps=20)[source]

Similar to multiclosure.internal_multiclosure_dataset_loader but returns PCA regularised covariance matrix, where the covariance matrix has been computed from the replicas of the theory predictions.

Parameters:
  • internal_multiclosure_dataset_loader (tuple) – closure fits theory predictions, underlying law theory predictions, covariance matrix, sqrt covariance matrix

  • explained_variance_ratio (float, default is 0.99)

  • _internal_max_reps (int, default is None) – Maximum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits

  • _internal_min_reps (int, default is 20) – Minimum number of replicas used in the fits; this is needed to check that the number of replicas is the same for all fits

Return type:

PCAInternalMulticlosureLoader
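One way to picture a PCA-regularised covariance matrix built from replicas of theory predictions is sketched below. This is an illustration under stated assumptions, not the validphys procedure: it keeps the smallest number of principal components whose cumulative explained variance reaches explained_variance_ratio, and it works with the plain sample covariance. The function name and toy data are hypothetical; the returned names mirror the fields listed for PCAInternalMulticlosureLoader.

```python
import numpy as np


def pca_regularise_sketch(replica_predictions, explained_variance_ratio=0.99):
    """Project the replica covariance onto its leading principal components.

    Hypothetical helper; replica_predictions has shape (n_rep, n_data).
    """
    reps = np.asarray(replica_predictions, dtype=float)
    covmat = np.cov(reps, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(covmat)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components reaching the requested ratio.
    n_comp = int(np.searchsorted(cumulative, explained_variance_ratio)) + 1
    n_comp = min(n_comp, len(eigvals))
    pc_basis = eigvecs[:, :n_comp]
    covmat_pca = pc_basis.T @ covmat @ pc_basis
    sqrt_covmat_pca = np.linalg.cholesky(covmat_pca)
    return pc_basis, n_comp, covmat_pca, sqrt_covmat_pca


# Toy replicas: the third data point barely varies, so its direction is dropped.
toy_rng = np.random.default_rng(0)
toy_replicas = toy_rng.normal(size=(200, 3)) * np.array([10.0, 3.0, 0.001])
pc_basis, n_comp, covmat_pca, sqrt_covmat_pca = pca_regularise_sketch(toy_replicas)
```

Dropping the near-degenerate directions is what keeps the reduced covariance matrix safely invertible.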

validphys.closuretest.multiclosure.n_fit_samples(fits)[source]

Return a range object where each item is a number of fits to use for resampling a multiclosure quantity.

It is determined by varying n_fits between 10 and number of fits provided by user in steps of 5. User must provide at least 10 fits.
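As a concrete reading of the rule above, the sketch below builds such a range. Whether the upper endpoint is included when it falls on a step is an assumption of this sketch, as is the function name.

```python
def n_fit_samples_sketch(n_fits, start=10, step=5):
    """Range of fit counts from `start` up to n_fits in steps of `step`.

    Hypothetical helper; the inclusive upper bound is an assumption.
    """
    if n_fits < start:
        raise ValueError(f"at least {start} fits are required, got {n_fits}")
    return range(start, n_fits + 1, step)


fit_counts = list(n_fit_samples_sketch(25))  # [10, 15, 20, 25]
```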

validphys.closuretest.multiclosure.n_replica_samples(fits_pdf, _internal_max_reps=None, _internal_min_reps=20)[source]

Return a range object where each item is a number of replicas to use for resampling a multiclosure quantity.

It is determined by varying n_reps between 20 and number of replicas that each provided closure fit has. All provided fits must have the same number of replicas and that number must be at least 20.

The number of replicas used from each fit can be overridden by supplying _internal_max_reps.

validphys.closuretest.multiclosure.principal_components_bias_variance_data(internal_multiclosure_data_loader_pca)[source]

Like principal_components_bias_variance_dataset but for all data

Parameters:

internal_multiclosure_data_loader_pca (tuple) – Tuple containing the results of multiclosure fits after pca regularization

Returns:

tuple of length 3:
  • biases – 1-D array of shape (Nfits,)

  • variances – 1-D array of shape (Nfits,)

  • n_comp – number of principal components kept

Return type:

tuple

validphys.closuretest.multiclosure.principal_components_bias_variance_dataset(internal_multiclosure_dataset_loader_pca)[source]

Compute the bias and variance for one dataset using the principal component reduced covariance matrix.

Parameters:
  • internal_multiclosure_dataset_loader (tuple) – Tuple containing the results of multiclosure fits

  • explained_variance_ratio (float, default is 0.99) – 3D tuple containing the principal components of the theory predictions

Returns:

tuple of length 3:
  • biases – 1-D array of shape (Nfits,)

  • variances – 1-D array of shape (Nfits,)

  • n_comp – number of principal components kept

Return type:

tuple

validphys.closuretest.multiclosure.principal_components_normalized_delta_data(internal_multiclosure_data_loader_pca)[source]

Compute for all data only the normalized delta after PCA regularization

Parameters:

internal_multiclosure_data_loader_pca (tuple) – Tuple containing the results of multiclosure fits after pca regularization

Returns:

deltas

Return type:

np.array

validphys.closuretest.multiclosure.standard_indicator_function(standard_variable, nsigma=1)[source]

Calculate the indicator function for a standardised variable.

Parameters:
  • standard_variable (np.array) – array of variables that have been standardised: (x - mu)/sigma

  • nsigma (float) – number of standard deviations to consider

Returns:

array of ones and zeros. If 1 then the variable is within nsigma standard deviations from the mean, otherwise it is 0.

Return type:

np.array
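A minimal sketch of this indicator function follows. The strict inequality matches the convention used for dataset_xi, but treating the boundary that way here is an assumption, as is the function name.

```python
import numpy as np


def standard_indicator_sketch(standard_variable, nsigma=1):
    """1 where |z| < nsigma, 0 otherwise, for a standardised variable z.

    Hypothetical helper for illustration only.
    """
    z = np.asarray(standard_variable, dtype=float)
    return (np.abs(z) < nsigma).astype(int)


z = np.array([-0.3, 1.2, 0.99, -2.5])
flags_1sigma = standard_indicator_sketch(z)            # [1, 0, 1, 0]
flags_2sigma = standard_indicator_sketch(z, nsigma=2)  # [1, 1, 1, 0]
```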

validphys.closuretest.multiclosure.total_bootstrap_ratio(experiments_bootstrap_bias_variance)[source]

Calculate the total bootstrap ratio for all data. Leverages the fact that the covariance matrix is block diagonal in experiments so

Total ratio = sum(bias) / sum(variance)

Which is valid provided there are no inter-experimental correlations.

Returns:

bias_var_total – tuple of the total bias and variance

Return type:

tuple
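The total ratio above is a ratio of sums, not a mean of per-experiment ratios. The toy numbers below are invented for illustration and the function name is hypothetical.

```python
def total_ratio_sketch(biases, variances):
    """sum(bias) / sum(variance) across experiments.

    Hypothetical helper; valid only without inter-experiment correlations.
    """
    return sum(biases) / sum(variances)


toy_biases = [12.0, 30.0]
toy_variances = [10.0, 32.0]
ratio = total_ratio_sketch(toy_biases, toy_variances)  # 42 / 42 = 1.0
# For comparison, averaging the per-experiment ratios gives a different number.
mean_of_ratios = sum(b / v for b, v in zip(toy_biases, toy_variances)) / 2
```

The sum form weights each experiment by its contribution to the total chi2-like quantity, which is why it relies on the block-diagonal structure mentioned above.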

validphys.closuretest.multiclosure.total_bootstrap_xi(experiments_bootstrap_xi)[source]

Given the bootstrap samples of xi_1sigma for all experiments, concatenate the result to get xi_1sigma for all data points in a single array

validphys.closuretest.multiclosure.total_expected_xi_resample(bias_variance_resampling_total)[source]

Using the bias and variance resample, return a resample of expected xi using the method outlined in validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance().

The general concept is based on assuming all of the distributions are gaussians and using the ratio of bias/variance to predict the corresponding integral. To see a more in depth explanation, see validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance().

validphys.closuretest.multiclosure.total_xi_resample(exps_xi_resample)[source]

Concatenate the xi for each datapoint for all data

validphys.closuretest.multiclosure.xi_resampling_data(internal_multiclosure_data_loader, n_fit_samples, n_replica_samples, bootstrap_samples=100, boot_seed=9689372, use_repeats=True)[source]

Like xi_resampling_dataset except for all data.

Notes

The bootstrap samples are seeded in this function. If this action is collected over multiple experiments then the set of resamples all used corresponding replicas and can be added together.

validphys.closuretest.multiclosure.xi_resampling_dataset(internal_multiclosure_dataset_loader, n_fit_samples, n_replica_samples, bootstrap_samples=100, boot_seed=9689372, use_repeats=True)[source]

For a single dataset, create bootstrap distributions of xi_1sigma varying the number of fits and replicas drawn for each resample. Return a 4-D array with dimensions

(number of n_rep samples, number of n_fit samples, n_boot, n_data)

filled with resampled xi. The number of bootstrap_samples is 100 by default. The number of n_rep samples is determined by varying n_rep between 10 and the number of replicas each fit has in intervals of 5. This action requires that each fit has the same number of replicas, which must also be at least 10. The number of n_fit samples is determined analogously to the number of n_rep samples, also requiring at least 10 fits.

Returns:

resamples – 4-D array with resampled xi for each n_rep samples and each n_fit samples

Return type:

array

Notes

The bootstrap samples are seeded in this function. If this action is collected over multiple datasets then the set of resamples all used corresponding replicas.
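
The point of seeding inside the function can be illustrated with a small sketch. It assumes that use_repeats=True means sampling with replacement, and it uses the standard library random module with a hypothetical helper bootstrap_indices; the actual implementation may draw its indices differently.

```python
import random


def bootstrap_indices(n_fits, n_reps, n_fit_sample, n_rep_sample, n_boot, seed):
    """Draw n_boot resamples of (fit indices, replica indices), with repeats."""
    rng = random.Random(seed)  # seeded inside the function, as in the note above
    samples = []
    for _ in range(n_boot):
        fits = [rng.randrange(n_fits) for _ in range(n_fit_sample)]
        reps = [rng.randrange(n_reps) for _ in range(n_rep_sample)]
        samples.append((fits, reps))
    return samples


# Two independent calls, e.g. for two different datasets, with the same seed.
first = bootstrap_indices(25, 100, 10, 20, n_boot=3, seed=9689372)
second = bootstrap_indices(25, 100, 10, 20, n_boot=3, seed=9689372)
```

Because both calls start from the same seed they select the same fits and replicas, which is what lets resamples from different datasets be combined.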

validphys.closuretest.multiclosure.xq2_dataset_map(xq2map_with_cuts, internal_multiclosure_dataset_loader, _internal_max_reps=None, _internal_min_reps=20)[source]

Load in a dictionary all the specs of a dataset, meaning:
  • ds name

  • ds coords

  • standard deviation (in multiclosure)

  • mean (in multiclosure again)

  • (x,Q^2) coords

validphys.closuretest.multiclosure_output module

multiclosure_output

Module containing the actions which produce some output in validphys reports, i.e. figures or tables, for multiclosure estimators in the space of data.

validphys.closuretest.multiclosure_output.compare_measured_expected_xi(fits_measured_xi, expected_xi_from_bias_variance)[source]

Table with measured xi and expected xi from bias/variance for each experiment and total. For details on expected xi, see expected_xi_from_bias_variance. For more details on measured xi see fits_measured_xi.

validphys.closuretest.multiclosure_output.dataset_ratio_error_finite_effects(bias_variance_resampling_dataset, n_fit_samples, n_replica_samples)[source]

For a single dataset vary number of fits and number of replicas used to perform bootstrap sample of expected bias and variance. For each combination of n_rep and n_fit tabulate the std deviation across bootstrap samples of

ratio = bias / variance

The resulting table gives an approximation of how the error varies with the number of fits and number of replicas for each dataset.

validphys.closuretest.multiclosure_output.dataset_std_xi_error_finite_effects(xi_resampling_dataset, n_fit_samples, n_replica_samples)[source]

For a single dataset vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the standard deviation of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the standard deviation of std(xi_1sigma) across bootstrap samples.

validphys.closuretest.multiclosure_output.dataset_std_xi_means_finite_effects(xi_resampling_dataset, n_fit_samples, n_replica_samples)[source]

For a single dataset vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the standard deviation of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the mean of std(xi_1sigma) across bootstrap samples.

validphys.closuretest.multiclosure_output.dataset_xi_error_finite_effects(xi_resampling_dataset, n_fit_samples, n_replica_samples)[source]

For a single dataset vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the mean of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the standard deviation of xi_1sigma across bootstrap samples.

validphys.closuretest.multiclosure_output.dataset_xi_means_finite_effects(xi_resampling_dataset, n_fit_samples, n_replica_samples)[source]

For a single dataset vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the mean of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the mean of xi_1sigma across bootstrap samples.

validphys.closuretest.multiclosure_output.datasets_bias_variance_ratio(datasets_expected_bias_variance, each_dataset)[source]

For each dataset calculate the expected bias and expected variance across fits, then calculate the ratio

ratio = expected bias / expected variance

and tabulate the results.

This gives an idea of how faithful uncertainties are for a set of datasets.

Notes

If uncertainties are faithfully estimated then we would expect to see ratio = 1. We should note that the ratio is a squared quantity and sqrt(ratio) is more appropriate for seeing how much uncertainties are over or underestimated. An over-estimate of uncertainty leads to sqrt(ratio) < 1, similarly an under-estimate of uncertainty leads to sqrt(ratio) > 1.

validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance(sqrt_experiments_bias_variance_ratio)[source]

Given the sqrt_experiments_bias_variance_ratio calculate a predicted value of \(\xi_{1 \sigma}\) for each experiment. The predicted value is based on the assumption that the difference between replica and central prediction and the difference between central prediction and underlying prediction are both gaussians centered on zero.

For example, if sqrt(expected bias/expected variance) is 0.5, then we would expect xi_{1 sigma} to be given by performing an integral of the distribution of

diffs = (central - underlying predictions)

over the domain defined by the variance. In this case the sqrt(variance) is twice as large as the sqrt(bias) which is the same as integrating a normal distribution mean = 0, std = 1 over the interval [-2, 2], given by

integral = erf(2/sqrt(2))

where erf is the error function.

In general the equation is

integral = erf(sqrt(variance / (2*bias)))
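
The formula can be checked numerically with a short sketch using math.erf from the standard library. The function name expected_xi and the input values are chosen for the example and are not part of validphys.

```python
import math


def expected_xi(bias, variance):
    """Evaluate erf(sqrt(variance / (2 * bias))), the general formula above."""
    return math.erf(math.sqrt(variance / (2.0 * bias)))


# sqrt(bias / variance) = 0.5, i.e. variance = 4 * bias: the worked example.
xi_overestimated = expected_xi(bias=1.0, variance=4.0)  # erf(2 / sqrt(2)), about 0.954

# bias = variance: faithful uncertainties recover the usual 1-sigma fraction.
xi_faithful = expected_xi(bias=1.0, variance=1.0)  # erf(1 / sqrt(2)), about 0.683
```

When bias equals variance the prediction is about 0.68, the reference value used in the xi plots.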

validphys.closuretest.multiclosure_output.experiments_bias_variance_ratio(experiments_expected_bias_variance, experiments_data, expected_total_bias_variance)[source]

Like datasets_bias_variance_ratio except for each experiment. Also calculate and tabulate

(total expected bias) / (total expected variance)

where the total refers to summing over all experiments.

validphys.closuretest.multiclosure_output.experiments_bias_variance_table(experiments_expected_bias_variance, group_dataset_inputs_by_experiment, expected_total_bias_variance)[source]

Tabulate the values of bias and variance for each experiment as well as the sqrt ratio of the two as in sqrt_experiments_bias_variance_ratio(). Used as a performance indicator.

validphys.closuretest.multiclosure_output.experiments_bootstrap_expected_xi_table(experiments_bootstrap_expected_xi, experiments_data)[source]

Tabulate the mean and standard deviation across bootstrap samples of the expected xi calculated from the ratio of bias/variance. Returns a table with two columns, for the bootstrap mean and standard deviation and a row for each experiment plus the total across all experiments.

validphys.closuretest.multiclosure_output.experiments_bootstrap_sqrt_ratio_table(experiments_bootstrap_sqrt_ratio, experiments_data)[source]

Given experiments_bootstrap_sqrt_ratio, which is a bootstrap resampling of the sqrt(bias/variance) for each experiment and the total across all data, tabulate the mean and standard deviation across bootstrap samples.

validphys.closuretest.multiclosure_output.experiments_bootstrap_xi_comparison(experiments_bootstrap_xi_table, experiments_bootstrap_expected_xi_table)[source]

Table comparing the mean and standard deviation across bootstrap samples of the measured xi_1sigma and the expected xi_1sigma calculated from bias/variance.

validphys.closuretest.multiclosure_output.experiments_bootstrap_xi_table(experiments_bootstrap_xi, experiments_data, total_bootstrap_xi)[source]

Tabulate the mean and standard deviation of xi_1sigma across bootstrap samples. Note that the mean has already been taken across data points (or eigenvectors in the basis which diagonalises the covariance matrix) for each individual bootstrap sample.

Tabulate the results for each experiment and for the total xi across all data.

validphys.closuretest.multiclosure_output.fits_measured_xi(experiments_xi_measured, experiments_data)[source]

Tabulate the measured value of xi_{1sigma} for each experiment, as calculated by data_xi (collected over experiments). Note that the mean is taken across eigenvectors of the covariance matrix.

validphys.closuretest.multiclosure_output.groups_bootstrap_expected_xi_table(groups_bootstrap_expected_xi, groups_data)[source]

Like experiments_bootstrap_expected_xi_table() but for metadata groups.

validphys.closuretest.multiclosure_output.groups_bootstrap_sqrt_ratio_table(groups_bootstrap_sqrt_ratio, groups_data)[source]

Like experiments_bootstrap_sqrt_ratio_table() but for metadata groups.

validphys.closuretest.multiclosure_output.groups_bootstrap_xi_comparison(groups_bootstrap_xi_table, groups_bootstrap_expected_xi_table)[source]

Like experiments_bootstrap_xi_comparison() but for metadata groups.

validphys.closuretest.multiclosure_output.groups_bootstrap_xi_table(groups_bootstrap_xi, groups_data, total_bootstrap_xi)[source]

Like experiments_bootstrap_xi_table() but for metadata groups.

validphys.closuretest.multiclosure_output.plot_bias_variance_distributions(experiments_fits_bias_replicas_variance_samples, group_dataset_inputs_by_experiment)[source]

For each experiment, plot the distribution across fits of bias and the distribution across fits and replicas of

fit_rep_var = (E[g] - g)_i inv(cov)_ij (E[g] - g)_j

where g is the replica prediction for fit l, replica k and E[g] is the mean across replicas of g for fit l.

validphys.closuretest.multiclosure_output.plot_data_central_diff_histogram(experiments_replica_central_diff)[source]

Histogram of the difference between central prediction and underlying law prediction, normalised by the corresponding replica standard deviation, with the differences concatenated across all data. A scaled gaussian is plotted for reference. Total xi is the number of central differences which fall within the 1-sigma confidence interval of the scaled gaussian.

validphys.closuretest.multiclosure_output.plot_data_fits_bias_variance(fits_data_bias_variance, data)[source]

Like plot_dataset_fits_bias_variance but for all data. Can use alongside group_dataset_inputs_by_experiment to plot for each experiment.

validphys.closuretest.multiclosure_output.plot_data_xi(data_xi, data)[source]

Like plot_dataset_xi except for all data. Can be used alongside group_dataset_inputs_by_experiment to plot for each experiment.

validphys.closuretest.multiclosure_output.plot_data_xi_histogram(data_xi, data)[source]

Like plot_dataset_xi_histogram but for all data. Can be used alongside group_dataset_inputs_by_experiment to plot for each experiment.

validphys.closuretest.multiclosure_output.plot_dataset_fits_bias_variance(fits_dataset_bias_variance, dataset)[source]

For a set of closure fits, calculate the bias and variance across fits and then plot scatter points so we can see the distribution of each quantity with fits. The spread of the variance across fits is assumed to be small compared to the spread of the biases; deviation from this assumption could suggest that there are finite size effects due to too few replicas.

validphys.closuretest.multiclosure_output.plot_dataset_xi(dataset_xi, dataset)[source]

For a given dataset, plot the value of xi_{1 sigma} for each eigenvector of the covariance matrix, along with the expected value of xi_{1 sigma} if the replicas distribution perfectly matches the central distribution (0.68). In the legend include the mean across eigenvectors.

validphys.closuretest.multiclosure_output.plot_dataset_xi_histogram(dataset_xi, dataset)[source]

For a given dataset, bin the values of xi_{1 sigma} for each eigenvector of the covariance matrix, plot as a histogram with a vertical line for the expected value: 0.68. In the legend print the mean and standard deviation of the distribution.

validphys.closuretest.multiclosure_output.plot_experiments_sqrt_ratio_bootstrap_distribution(experiments_bootstrap_sqrt_ratio, experiments_data)[source]

Plots a histogram for each experiment and the total, showing the distribution of bootstrap samples. Takes the mean and std deviation of the bootstrap sample and plots the corresponding scaled normal distribution for comparison. The limits are set to be +/- 3 std deviations of the mean.

validphys.closuretest.multiclosure_output.plot_experiments_xi_bootstrap_distribution(experiments_bootstrap_xi, total_bootstrap_xi, experiments_data)[source]

Similar to plot_experiments_sqrt_ratio_bootstrap_distribution() except plots the bootstrap distribution of xi_1sigma, along with a corresponding scaled gaussian, for each experiment (and for all data).

validphys.closuretest.multiclosure_output.plot_total_fits_bias_variance(fits_total_bias_variance)[source]

Like plot_dataset_fits_bias_variance but for the total bias/variance for all data, with the total calculated by summing up contributions from each experiment.

validphys.closuretest.multiclosure_output.sqrt_datasets_bias_variance_ratio(datasets_bias_variance_ratio)[source]

Given datasets_bias_variance_ratio take the sqrt and tabulate the results. This gives an idea of how faithful the uncertainties are in sensible units. As noted in datasets_bias_variance_ratio, bias/variance is a squared quantity and so when considering how much uncertainty has been over or underestimated it is more natural to consider sqrt(bias/variance).

validphys.closuretest.multiclosure_output.sqrt_experiments_bias_variance_ratio(experiments_bias_variance_ratio)[source]

Like sqrt_datasets_bias_variance_ratio except for each experiment.

validphys.closuretest.multiclosure_output.table_datasets_bias_variance_fits(fits_datasets_bias_variance, each_dataset)[source]

Table with ratio bias variance value and associated uncertainty computed with simple gaussian error propagation for each dataset.

validphys.closuretest.multiclosure_output.total_bias_variance_ratio(experiments_bias_variance_ratio, datasets_bias_variance_ratio, experiments_data)[source]

Combine datasets_bias_variance_ratio and experiments_bias_variance_ratio into single table with MultiIndex of experiment and dataset.

validphys.closuretest.multiclosure_output.total_expected_xi_error_finite_effects(total_expected_xi_resample, n_fit_samples, n_replica_samples)[source]

Given the resampled ratio of bias/variance, returns table of standard deviation of resampled expected xi across bootstrap samples.

See expected_xi_from_bias_variance() for more details on how to calculate expected xi.

validphys.closuretest.multiclosure_output.total_expected_xi_means_finite_effects(total_expected_xi_resample, n_fit_samples, n_replica_samples)[source]

Given the resampled ratio of bias/variance, returns table of mean of resampled expected xi across bootstrap samples.

See expected_xi_from_bias_variance for more details on how to calculate expected xi.

validphys.closuretest.multiclosure_output.total_ratio_error_finite_effects(bias_variance_resampling_total, n_fit_samples, n_replica_samples)[source]

Like dataset_ratio_error_finite_effects except for the total bias / variance (across all data).

validphys.closuretest.multiclosure_output.total_ratio_means_finite_effects(bias_variance_resampling_total, n_fit_samples, n_replica_samples)[source]

Vary number of fits and number of replicas used to perform bootstrap sample of expected bias and variance. For each combination of n_rep and n_fit tabulate the mean across bootstrap samples of

ratio = total bias / total variance

which can give context to total_ratio_error_finite_effects.

validphys.closuretest.multiclosure_output.total_std_xi_error_finite_effects(exps_xi_resample, n_fit_samples, n_replica_samples)[source]

For all data vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the std deviation of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the standard deviation of std(xi_1sigma) across bootstrap samples.

validphys.closuretest.multiclosure_output.total_std_xi_means_finite_effects(exps_xi_resample, n_fit_samples, n_replica_samples)[source]

For all data vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the std deviation of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the mean of std(xi_1sigma) across bootstrap samples.

validphys.closuretest.multiclosure_output.total_xi_error_finite_effects(total_xi_resample, n_fit_samples, n_replica_samples)[source]

For all data vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the mean of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the standard deviation of xi_1sigma across bootstrap samples.

validphys.closuretest.multiclosure_output.total_xi_means_finite_effects(total_xi_resample, n_fit_samples, n_replica_samples)[source]

For all data vary number of fits and number of replicas used to perform bootstrap sample of xi. Take the mean of xi across datapoints (note that points here refers to points in the basis which diagonalises the covmat) and then tabulate the mean of xi_1sigma across bootstrap samples.

validphys.closuretest.multiclosure_output.xq2_data_prcs_maps(xq2_data_map, each_dataset)[source]

Heat map of the ratio bias variance (and xi, quantile estimator) for each datapoint in a dataset. The x and y axis are the x and Q2 coordinates of the datapoints. The color of each point is determined by the value of the ratio bias variance (and xi, quantile estimator).

validphys.closuretest.multiclosure_pdf module

multiclosure_pdf.py

Module containing all of the actions related to statistical estimators across multiple closure fits or proxy fits defined in PDF space. The actions in this module are used to produce results which are plotted in multiclosure_pdf_output.py

validphys.closuretest.multiclosure_pdf.bootstrap_pdf_differences(fits_xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw, rng)[source]

Generate a single bootstrap sample of pdf_central_difference and pdf_replica_difference given the multiclosure fits grid values (fits_xi_grid_values); the underlying law grid values and the underlying law; and a numpy random state which is used to generate random indices for bootstrap sample. The bootstrap does include repeats and has the same number of fits and replicas as the original fits_xi_grid_values which is being resampled.

Returns:

pdf_difference – a tuple of 2 lists: the central differences and the replica differences. Each list is n_fits long and each element is a resampled differences array for a randomly selected fit, randomly selected replicas.

Return type:

tuple

validphys.closuretest.multiclosure_pdf.fits_bootstrap_pdf_expected_xi(fits_bootstrap_pdf_sqrt_ratio)[source]

Using fits_bootstrap_pdf_sqrt_ratio calculate a bootstrap of the expected xi using the same procedure as in validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance().

validphys.closuretest.multiclosure_pdf.fits_bootstrap_pdf_ratio(fits_xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw, multiclosure_nx=4, n_boot=100, boot_seed=9689372)[source]

Perform a bootstrap sampling across fits and replicas of the sqrt ratio, by flavour and total and then tabulate the mean and error

validphys.closuretest.multiclosure_pdf.fits_bootstrap_pdf_sqrt_ratio(fits_bootstrap_pdf_ratio)[source]

Take the square root of fits_bootstrap_pdf_ratio

validphys.closuretest.multiclosure_pdf.fits_correlation_matrix_totalpdf(fits_covariance_matrix_totalpdf)[source]

Given the fits_covariance_matrix_totalpdf, returns the corresponding correlation matrix
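
For reference, the standard conversion from a covariance matrix to a correlation matrix divides each element by the product of the two corresponding standard deviations. The sketch below uses nested lists and a hypothetical helper name; it is not taken from the validphys source.

```python
import math


def correlation_from_covariance(cov):
    """corr_ij = cov_ij / sqrt(cov_ii * cov_jj) for a square covariance matrix."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]


# Made-up 2x2 covariance matrix with standard deviations 2 and 3.
cov = [[4.0, 1.0], [1.0, 9.0]]
corr = correlation_from_covariance(cov)
```

The result has ones on the diagonal and 1/6 off the diagonal.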

validphys.closuretest.multiclosure_pdf.fits_covariance_matrix_by_flavour(fits_replica_difference)[source]

Given a set of PDF grids from multiple closure tests, obtain an estimate of the covariance matrix for each flavour separately, return as a list of covmats

validphys.closuretest.multiclosure_pdf.fits_covariance_matrix_totalpdf(fits_replica_difference, multiclosure_nx=4)[source]

Given a set of PDF grids from multiple closure tests, obtain an estimate of the covariance matrix allowing for correlations across flavours

validphys.closuretest.multiclosure_pdf.fits_pdf_flavour_ratio(fits_sqrt_covmat_by_flavour, fits_central_difference, fits_replica_difference)[source]

Calculate the bias (chi2 between central PDF and underlying PDF) for each flavour and the variance (mean chi2 between replica and central PDF), then return a numpy array with shape (flavours, 2) with second axis being bias, variance

validphys.closuretest.multiclosure_pdf.fits_pdf_total_ratio(fits_central_difference, fits_replica_difference, fits_covariance_matrix_totalpdf, multiclosure_nx=4)[source]

Calculate the total bias and variance for all flavours and x allowing for correlations across flavour.

Returns:

ratio_data: tuple

required data for calculating mean(bias) over mean(variance) across fits in form of tuple (bias, variance)

validphys.closuretest.multiclosure_pdf.fits_sqrt_covmat_by_flavour(fits_covariance_matrix_by_flavour)[source]

For each flavour covariance matrix calculate the sqrt covmat (cholesky lower triangular)
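
As a reminder of what the lower-triangular Cholesky factor is, the following sketch implements the textbook algorithm in plain Python for a small symmetric positive-definite matrix. The function name cholesky_lower and the example matrix are assumptions for illustration; in practice a library routine would be used.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with L @ L^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low


# Made-up covariance matrix for a single flavour on two x points.
cov = [[4.0, 2.0], [2.0, 5.0]]
low = cholesky_lower(cov)  # [[2.0, 0.0], [1.0, 2.0]]

# Rebuild the covariance matrix from the factor as a consistency check.
rebuilt = [[sum(low[i][k] * low[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
```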

validphys.closuretest.multiclosure_pdf.internal_nonsinglet_xgrid(multiclosure_nx=4)[source]

Given the number of x points, set up the xgrid for flavours which are not singlet or gluon, defined as being linearly spaced points between 0.1 and 0.5

validphys.closuretest.multiclosure_pdf.internal_singlet_gluon_xgrid(multiclosure_nx=4)[source]

Given the number of x points, set up the singlet and gluon xgrids, which are defined as half the points being logarithmically spaced between 10^-3 and 0.1 and the other half of the points being linearly spaced between 0.1 and 0.5
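
The two grid definitions can be sketched in plain Python as below. The helper names are hypothetical, and the treatment of the shared boundary at x = 0.1 (here the logarithmic half stops short of 0.1 so the point is not duplicated) is an assumption that the description does not settle.

```python
def linspace(start, stop, n, endpoint=True):
    """n evenly spaced points from start towards stop."""
    if n == 1:
        return [start]
    step = (stop - start) / ((n - 1) if endpoint else n)
    return [start + i * step for i in range(n)]


def nonsinglet_xgrid(nx=4):
    """Linearly spaced points between 0.1 and 0.5."""
    return linspace(0.1, 0.5, nx)


def singlet_gluon_xgrid(nx=4):
    """Half log-spaced from 10^-3 up to (not including) 0.1, half linear on [0.1, 0.5]."""
    half = nx // 2
    exponents = linspace(-3.0, -1.0, half, endpoint=False)  # assumption: 0.1 excluded here
    low_x = [10.0 ** e for e in exponents]
    high_x = linspace(0.1, 0.5, nx - half)
    return low_x + high_x


sg_grid = singlet_gluon_xgrid(4)  # approximately [0.001, 0.01, 0.1, 0.5]
ns_grid = nonsinglet_xgrid(4)     # 0.1 to 0.5 in equal steps
```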

validphys.closuretest.multiclosure_pdf.pdf_central_difference(xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw)[source]

Calculate the difference between the underlying law and the central PDF, specifically:

underlying_grid - mean(grid_vals)

where mean is across replicas.

Returns:

diffs: np.array

array of diffs with shape (flavour, x)

validphys.closuretest.multiclosure_pdf.pdf_replica_difference(xi_grid_values)[source]

Calculate the difference between the central PDF and the replica PDFs, specifically:

mean(grid_vals) - grid_vals

where the mean is across replicas.

Returns:

diffs: np.array

array of diffs with shape (replicas, flavour, x)
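
The two differences defined above (central and replica) can be sketched with nested lists of shape (replicas, flavour, x). This is only an illustration of the sign conventions and shapes; the helper names and the numbers are made up, and the real actions operate on numpy arrays.

```python
def mean_over_replicas(grid_vals):
    """Central value: mean across the leading replica axis, shape (flavour, x)."""
    n_rep = len(grid_vals)
    n_fl, n_x = len(grid_vals[0]), len(grid_vals[0][0])
    return [[sum(rep[f][x] for rep in grid_vals) / n_rep for x in range(n_x)]
            for f in range(n_fl)]


def central_difference(grid_vals, underlying):
    """underlying_grid - mean(grid_vals), shape (flavour, x)."""
    central = mean_over_replicas(grid_vals)
    return [[u - c for u, c in zip(u_row, c_row)]
            for u_row, c_row in zip(underlying, central)]


def replica_difference(grid_vals):
    """mean(grid_vals) - grid_vals, shape (replicas, flavour, x)."""
    central = mean_over_replicas(grid_vals)
    return [[[c - g for c, g in zip(c_row, g_row)]
             for c_row, g_row in zip(central, rep)]
            for rep in grid_vals]


# Two replicas, one flavour, two x points (invented numbers).
grid_vals = [[[1.0, 2.0]], [[3.0, 6.0]]]
underlying = [[2.5, 3.0]]

central_diff = central_difference(grid_vals, underlying)  # [[0.5, -1.0]]
replica_diff = replica_difference(grid_vals)              # [[[1.0, 2.0]], [[-1.0, -2.0]]]
```

By construction the replica differences sum to zero across replicas for every flavour and x point.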

validphys.closuretest.multiclosure_pdf.replica_and_central_diff_totalpdf(fits_replica_difference, fits_central_difference, fits_covariance_matrix_totalpdf, multiclosure_nx=4, use_x_basis=False)[source]

Calculate sigma and delta, like xi_flavour_x() but return before calculating xi.

validphys.closuretest.multiclosure_pdf.underlying_xi_grid_values(multiclosure_underlyinglaw: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), internal_singlet_gluon_xgrid, internal_nonsinglet_xgrid)[source]

Like xi_pdfgrids but setting the PDF as the underlying law, extracted from a set of fits

validphys.closuretest.multiclosure_pdf.xi_flavour_x(fits_replica_difference, fits_central_difference, fits_covariance_matrix_by_flavour, use_x_basis=False)[source]

For a set of fits calculate the indicator function

I_{[-sigma, sigma]}(delta)

where sigma is the RMS difference between the central and replica PDFs and delta is the difference between the central PDF and the underlying law.

The differences are all rotated to the basis which diagonalises the covariance matrix that was estimated from the super set of all fit replicas.

Finally take the mean across fits to get xi in flavour and x.

validphys.closuretest.multiclosure_pdf.xi_grid_values(xi_pdfgrids)[source]

Grid values from the xi_pdfgrids concatenated as single numpy array

validphys.closuretest.multiclosure_pdf.xi_pdfgrids(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), internal_singlet_gluon_xgrid, internal_nonsinglet_xgrid)[source]

Generate PDF grids which are required for calculating xi in PDF space in the NN31IC basis, excluding the charm. We want to specify different xgrids for different flavours to avoid sampling PDFs in deep extrapolation regions. The limits are chosen to achieve this and specifically they are chosen to be:

gluon and singlet: 10^-3 < x < 0.5
other non-singlets: 0.1 < x < 0.5

Returns:

tuple of xplotting_grids, one for gluon and singlet and one for other non-singlets

validphys.closuretest.multiclosure_pdf.xi_totalpdf(replica_and_central_diff_totalpdf)[source]

Like xi_flavour_x() except calculate the total xi across flavours and x accounting for correlations

validphys.closuretest.multiclosure_pdf_output module

multiclosure_pdf_output.py

Module containing all of the plots and tables for multiclosure estimators in PDF space.

validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_compare_xi_to_expected(fits_bootstrap_pdf_expected_xi_table, fits_bootstrap_pdf_xi_table)[source]

Table comparing the mean and standard deviation across bootstrap samples of the measured value of xi to the value calculated from bias/variance in PDF space. This is done for each flavour and for the total across all flavours accounting for correlations.

validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_expected_xi_table(fits_bootstrap_pdf_expected_xi)[source]

Tabulate the mean and standard deviation across bootstrap samples of fits_bootstrap_pdf_expected_xi() with a row for each flavour and the total expected xi.

validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_sqrt_ratio_table(fits_bootstrap_pdf_sqrt_ratio)[source]

Tabulate the mean and standard deviation across bootstrap samples of the sqrt ratio of bias/variance in PDF space, with a row for each flavour and the total. For more information on the bootstrap sampling see fits_bootstrap_pdf_ratio().

validphys.closuretest.multiclosure_pdf_output.fits_bootstrap_pdf_xi_table(fits_xi_grid_values, underlying_xi_grid_values, multiclosure_underlyinglaw, multiclosure_nx=4, n_boot=100, boot_seed=9689372, use_x_basis=False)[source]

Perform a bootstrap sampling across fits and replicas of xi, by flavour and total and then tabulate the mean and error.

validphys.closuretest.multiclosure_pdf_output.fits_pdf_bias_variance_ratio(fits_pdf_flavour_ratio, fits_pdf_total_ratio)[source]

Returns a table with the values of mean bias / mean variance with mean referring to mean across fits, by flavour. Includes total across all flavours allowing for correlations.

validphys.closuretest.multiclosure_pdf_output.fits_pdf_compare_xi_to_expected(fits_pdf_expected_xi_from_ratio, xi_flavour_table)[source]

Two-column table comparing the measured value of xi for each flavour to the value calculated from the bias/variance.

validphys.closuretest.multiclosure_pdf_output.fits_pdf_expected_xi_from_ratio(fits_pdf_sqrt_ratio)[source]

Like validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance() but in PDF space. An estimate is made of the integral across the central difference distribution, with domain defined by the replica distribution. For more details see validphys.closuretest.multiclosure_output.expected_xi_from_bias_variance().

validphys.closuretest.multiclosure_pdf_output.fits_pdf_sqrt_ratio(fits_pdf_bias_variance_ratio)[source]

Like fits_pdf_bias_variance_ratio() except taking the sqrt. This is to see how faithful our uncertainty is in units of the standard deviation.

validphys.closuretest.multiclosure_pdf_output.plot_multiclosure_correlation_eigenvalues(fits_correlation_matrix_totalpdf)[source]

Plot scatter points for each of the eigenvalues from the estimated correlation matrix from the multiclosure PDFs in flavour and x.

In the legend add the ratio of the largest eigenvalue over the smallest eigenvalue, aka the l2 condition number of the correlation matrix.
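
For a symmetric positive-definite matrix this ratio is straightforward to compute once the eigenvalues are known. The eigenvalues below are invented for a hypothetical 4x4 correlation matrix (they sum to 4, the trace of such a matrix).

```python
# Hypothetical eigenvalues of a 4x4 correlation matrix; not real results.
eigenvalues = [2.5, 1.0, 0.375, 0.125]

# Largest over smallest eigenvalue: the l2 condition number for this kind of matrix.
condition_number = max(eigenvalues) / min(eigenvalues)  # 20.0
```

A large value signals that the matrix is close to singular, so its inverse is numerically unstable.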

validphys.closuretest.multiclosure_pdf_output.plot_multiclosure_correlation_matrix(fits_correlation_matrix_totalpdf, multiclosure_nx=4)[source]

Like plot_multiclosure_covariance_matrix but plots the total correlation matrix.

validphys.closuretest.multiclosure_pdf_output.plot_multiclosure_covariance_matrix(fits_covariance_matrix_totalpdf, multiclosure_nx=4)[source]

Plot the covariance matrix for all flavours. The covariance matrix has shape (n_flavours*n_x) * (n_flavours*n_x), where each block is the covariance of the replica PDFs on the x-grid defined in xi_pdfgrids().

validphys.closuretest.multiclosure_pdf_output.plot_pdf_central_diff_histogram(replica_and_central_diff_totalpdf)[source]

Histogram of the difference between central PDF and underlying law normalised by the corresponding replica standard deviation for all points in x and flavour alongside a scaled Gaussian. Total xi is proportion of the histogram which falls within the central 1-sigma confidence interval.

validphys.closuretest.multiclosure_pdf_output.plot_pdf_matrix(matrix, n_x, **kwargs)[source]

Utility function which, given a covmat/corrmat for all flavours and x, plots it with appropriate labels. Input matrix is expected to be size (n_flavours*n_x) * (n_flavours*n_x).

Parameters:
  • matrix (np.array) – square matrix which must be (n_flavours*n_x) * (n_flavours*n_x) with elements ordered like: (flavour0_x0, flavour0_x1, …, flavourN_x0, …, flavourN_xN) i.e. the points along x for flavour 0, then points along x for flavour 1 etc.

  • **kwargs – keyword arguments for the matplotlib.axes.Axes.imshow function

Notes

See matplotlib.axes.Axes.imshow for more details on the plotting function.
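
The element ordering expected for the input matrix corresponds to a flavour-major flat index. A minimal sketch, with hypothetical helper names, of how a (flavour, x) pair maps to a row or column of the matrix:

```python
def flat_index(flavour, ix, n_x):
    """Row/column of point ix along x for the given flavour (flavour-major order)."""
    return flavour * n_x + ix


def unflatten(index, n_x):
    """Inverse mapping: return (flavour, ix) for a flat row/column index."""
    return divmod(index, n_x)


n_x = 4
first_point_flavour_1 = flat_index(1, 0, n_x)  # 4: comes right after flavour 0's four points
flavour, ix = unflatten(6, n_x)                # (1, 2)
```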

validphys.closuretest.multiclosure_pdf_output.plot_xi_flavour_x(xi_flavour_x, Q, internal_singlet_gluon_xgrid, internal_nonsinglet_xgrid, multiclosure_nx=4, use_x_basis=False)[source]

For each flavour plot xi for each x-point. By default xi is calculated and plotted in the basis which diagonalises the covmat, which is estimated from the union of all the replicas. However, if use_x_basis is True then xi will be calculated and plotted in the x-basis.

validphys.closuretest.multiclosure_pdf_output.xi_flavour_table(xi_flavour_x, xi_totalpdf)[source]

For each flavour take the mean of xi_flavour_x across x to get a single number, which is the proportion of points on the central PDF which are within 1 sigma. This is calculated from the replicas of the underlying PDF.

Returns:

xi_flavour – table of xi by flavour

Return type:

pd.DataFrame
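The averaging across x can be sketched with plain NumPy. The flavour labels and xi values below are made up for illustration, and the sketch returns a dict rather than the pd.DataFrame produced by the real action.

```python
import numpy as np

# Hypothetical xi values: one row per flavour, one column per x-point.
flavours = ["flavour_a", "flavour_b"]
xi_flavour_x_example = np.array(
    [
        [0.6, 0.7, 0.8],
        [0.5, 0.7, 0.9],
    ]
)

# Mean across x (axis 1) gives a single xi per flavour.
xi_by_flavour = dict(zip(flavours, xi_flavour_x_example.mean(axis=1)))
```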

validphys.closuretest.multiclosure_preprocessing module

multiclosure_preprocessing.py

Module containing all of the actions related to preprocessing exponents. In particular, the next preprocessing exponents across the multiple closure fits are compared with the previous effective exponents, to see whether there is a strong dependence on the level 1 shift.

validphys.closuretest.multiclosure_preprocessing.next_multiclosure_alpha_preprocessing_table(fits, fits_basis, fits_pdf, fits_fitbasis_alpha_lines)[source]

Returns a table with the next alpha preprocessing exponent for each fit with a multiindex column of flavour and next preprocessing range limits.

For more information see _next_multiclosure_preprocessing_table()

validphys.closuretest.multiclosure_preprocessing.next_multiclosure_beta_preprocessing_table(fits, fits_basis, fits_pdf, fits_fitbasis_beta_lines)[source]

Returns a table with the next beta preprocessing exponent for each fit with a multiindex column of flavour and next preprocessing range limits.

For more information see _next_multiclosure_preprocessing_table()

validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_alpha_preprocessing(fits_fitbasis_alpha_lines, fits_pdf, next_multiclosure_alpha_preprocessing_table)[source]

Using the table produced by next_multiclosure_alpha_preprocessing_table(), plot the next alpha preprocessing exponent ranges. The ranges are represented by horizontal error bars, with vertical lines indicating the previous range limits of the first fit.

validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_alpha_preprocessing_range_width(fits_fitbasis_alpha_lines, fits_pdf, next_multiclosure_alpha_preprocessing_table)[source]

Using the table produced by next_multiclosure_alpha_preprocessing_table(), plot the width of the next alpha preprocessing exponent range, i.e. max alpha - min alpha, as a histogram over fits for each flavour. A vertical line marks the previous range width of the first fit for reference.
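The range width can be sketched as below for a single flavour. The numbers are invented for illustration and the variable names are hypothetical; only the definition width = max - min is taken from the description above.

```python
import numpy as np

# Hypothetical next preprocessing ranges: one (min, max) pair per fit.
next_ranges = np.array(
    [
        [0.8, 1.3],
        [0.9, 1.5],
        [0.7, 1.2],
    ]
)

# Width of the next range for each fit.
widths = next_ranges[:, 1] - next_ranges[:, 0]

# Bin the widths over fits, as a histogram plot would.
counts, edges = np.histogram(widths, bins=3)

# Hypothetical previous range of the first fit, used as the reference line.
previous_width = 1.4 - 0.8
```

If the widths cluster tightly around previous_width, the preprocessing ranges depend only weakly on the level 1 shift.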

validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_beta_preprocessing(fits_fitbasis_beta_lines, fits_pdf, next_multiclosure_beta_preprocessing_table)[source]

Using the table produced by next_multiclosure_beta_preprocessing_table(), plot the next beta preprocessing exponent ranges. The ranges are represented by horizontal error bars, with vertical lines indicating the previous range limits of the first fit.

validphys.closuretest.multiclosure_preprocessing.plot_next_multiclosure_beta_preprocessing_range_width(fits_fitbasis_beta_lines, fits_pdf, next_multiclosure_beta_preprocessing_table)[source]

Using the table produced by next_multiclosure_beta_preprocessing_table(), plot the width of the next beta preprocessing exponent range, i.e. max beta - min beta, as a histogram over fits for each flavour. A vertical line marks the previous range width of the first fit for reference.

validphys.closuretest.multiclosure_pseudodata module

multiclosure_pseudodata

Actions which load fit pseudodata and compute estimators related to overfitting. The estimators here can only be calculated on data used in the fit.

validphys.closuretest.multiclosure_pseudodata.expected_data_delta_chi2(data_fits_cv, internal_multiclosure_data_loader)[source]

For data, calculate the mean of delta chi2 across all fits. Returns a tuple of the number of data points and the unnormalised delta chi2.

validphys.closuretest.multiclosure_pseudodata.expected_delta_chi2_table(groups_expected_delta_chi2, group_dataset_inputs_by_metadata, total_expected_data_delta_chi2)[source]

Tabulate the expectation value of delta chi2 across fits for each group, with an additional row at the bottom giving the total across all data.

validphys.closuretest.multiclosure_pseudodata.fits_dataset_cvs(fits_dataset)[source]

Internal function for loading the level one data for all fits for a single dataset. This function avoids the stringent metadata checks of the newer python commondata parser.

validphys.closuretest.multiclosure_pseudodata.total_expected_data_delta_chi2(exps_expected_delta_chi2)[source]

Takes expected_data_delta_chi2() evaluated for each experiment and then sums across experiments. Returns the total number of datapoints and unnormalised delta chi2.
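The sum across experiments can be sketched as follows. The function name and the input numbers are hypothetical; the sketch only assumes that each experiment contributes a tuple of (number of data points, unnormalised delta chi2), as described above.

```python
def sum_expected_delta_chi2(per_experiment):
    """Sum (ndata, unnormalised delta chi2) tuples across experiments."""
    ndata_total = sum(ndata for ndata, _ in per_experiment)
    delta_chi2_total = sum(dchi2 for _, dchi2 in per_experiment)
    return ndata_total, delta_chi2_total


# Made-up per-experiment results, for illustration only.
per_experiment = [(100, -12.0), (50, -3.0), (250, -20.0)]

ndata, delta_chi2 = sum_expected_delta_chi2(per_experiment)

# Normalising by the number of data points is done only after summing.
normalised_delta_chi2 = delta_chi2 / ndata
```

Keeping delta chi2 unnormalised until the end means the total is weighted by the number of data points in each experiment, rather than being an average of per-experiment averages.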

Module contents

closuretest

Module containing all actions specific to closure tests.