validphys package
Subpackages
- validphys.closuretest package
- Submodules
- validphys.closuretest.closure_checks module
check_at_least_10_fits()
check_fit_isclosure()
check_fits_areclosures()
check_fits_different_filterseed()
check_fits_have_same_basis()
check_fits_same_filterseed()
check_fits_underlying_law_match()
check_multifit_replicas()
check_t0pdfset_matches_law()
check_t0pdfset_matches_multiclosure_law()
check_use_fitcommondata()
- validphys.closuretest.closure_plots module
- validphys.closuretest.closure_results module
BiasData
VarianceData
bias_dataset()
bias_experiment()
biases_table()
bootstrap_bias_experiment()
bootstrap_variance_experiment()
delta_chi2_bootstrap()
delta_chi2_table()
fit_underlying_pdfs_summary()
fits_bootstrap_bias_table()
fits_bootstrap_variance_table()
summarise_closure_underlying_pdfs()
variance_dataset()
variance_experiment()
- validphys.closuretest.multiclosure module
BootstrappedTheoryResult
PCAInternalMulticlosureLoader
bias_variance_resampling_data()
bias_variance_resampling_dataset()
bias_variance_resampling_total()
bootstrapped_indicator_function_data()
bootstrapped_internal_multiclosure_data_loader()
bootstrapped_internal_multiclosure_data_loader_pca()
bootstrapped_internal_multiclosure_dataset_loader()
bootstrapped_internal_multiclosure_dataset_loader_pca()
bootstrapped_principal_components_bias_variance_data()
bootstrapped_principal_components_bias_variance_dataset()
bootstrapped_principal_components_normalized_delta_data()
data_replica_and_central_diff()
data_xi()
dataset_fits_bias_replicas_variance_samples()
dataset_inputs_fits_bias_replicas_variance_samples()
dataset_replica_and_central_diff()
dataset_xi()
eigendecomposition()
expected_data_bias_variance()
expected_dataset_bias_variance()
expected_total_bias_variance()
experiments_bootstrap_expected_xi()
experiments_bootstrap_ratio()
experiments_bootstrap_sqrt_ratio()
fits_bootstrap_data_bias_variance()
fits_bootstrap_data_xi()
fits_data_bias_variance()
fits_dataset_bias_variance()
fits_normed_dataset_central_delta()
fits_total_bias_variance()
groups_bootstrap_expected_xi()
groups_bootstrap_ratio()
groups_bootstrap_sqrt_ratio()
internal_multiclosure_data_loader()
internal_multiclosure_data_loader_pca()
internal_multiclosure_dataset_loader()
internal_multiclosure_dataset_loader_pca()
n_fit_samples()
n_replica_samples()
principal_components_bias_variance_data()
principal_components_bias_variance_dataset()
principal_components_normalized_delta_data()
standard_indicator_function()
total_bootstrap_ratio()
total_bootstrap_xi()
total_expected_xi_resample()
total_xi_resample()
xi_resampling_data()
xi_resampling_dataset()
xq2_dataset_map()
- validphys.closuretest.multiclosure_output module
compare_measured_expected_xi()
dataset_ratio_error_finite_effects()
dataset_std_xi_error_finite_effects()
dataset_std_xi_means_finite_effects()
dataset_xi_error_finite_effects()
dataset_xi_means_finite_effects()
datasets_bias_variance_ratio()
expected_xi_from_bias_variance()
experiments_bias_variance_ratio()
experiments_bias_variance_table()
experiments_bootstrap_expected_xi_table()
experiments_bootstrap_sqrt_ratio_table()
experiments_bootstrap_xi_comparison()
experiments_bootstrap_xi_table()
fits_measured_xi()
groups_bootstrap_expected_xi_table()
groups_bootstrap_sqrt_ratio_table()
groups_bootstrap_xi_comparison()
groups_bootstrap_xi_table()
plot_bias_variance_distributions()
plot_data_central_diff_histogram()
plot_data_fits_bias_variance()
plot_data_xi()
plot_data_xi_histogram()
plot_dataset_fits_bias_variance()
plot_dataset_xi()
plot_dataset_xi_histogram()
plot_experiments_sqrt_ratio_bootstrap_distribution()
plot_experiments_xi_bootstrap_distribution()
plot_total_fits_bias_variance()
sqrt_datasets_bias_variance_ratio()
sqrt_experiments_bias_variance_ratio()
table_datasets_bias_variance_fits()
total_bias_variance_ratio()
total_expected_xi_error_finite_effects()
total_expected_xi_means_finite_effects()
total_ratio_error_finite_effects()
total_ratio_means_finite_effects()
total_std_xi_error_finite_effects()
total_std_xi_means_finite_effects()
total_xi_error_finite_effects()
total_xi_means_finite_effects()
xq2_data_prcs_maps()
- validphys.closuretest.multiclosure_pdf module
bootstrap_pdf_differences()
fits_bootstrap_pdf_expected_xi()
fits_bootstrap_pdf_ratio()
fits_bootstrap_pdf_sqrt_ratio()
fits_correlation_matrix_totalpdf()
fits_covariance_matrix_by_flavour()
fits_covariance_matrix_totalpdf()
fits_pdf_flavour_ratio()
fits_pdf_total_ratio()
fits_sqrt_covmat_by_flavour()
internal_nonsinglet_xgrid()
internal_singlet_gluon_xgrid()
pdf_central_difference()
pdf_replica_difference()
replica_and_central_diff_totalpdf()
underlying_xi_grid_values()
xi_flavour_x()
xi_grid_values()
xi_pdfgrids()
xi_totalpdf()
- validphys.closuretest.multiclosure_pdf_output module
fits_bootstrap_pdf_compare_xi_to_expected()
fits_bootstrap_pdf_expected_xi_table()
fits_bootstrap_pdf_sqrt_ratio_table()
fits_bootstrap_pdf_xi_table()
fits_pdf_bias_variance_ratio()
fits_pdf_compare_xi_to_expected()
fits_pdf_expected_xi_from_ratio()
fits_pdf_sqrt_ratio()
plot_multiclosure_correlation_eigenvalues()
plot_multiclosure_correlation_matrix()
plot_multiclosure_covariance_matrix()
plot_pdf_central_diff_histogram()
plot_pdf_matrix()
plot_xi_flavour_x()
xi_flavour_table()
- validphys.closuretest.multiclosure_preprocessing module
- validphys.closuretest.multiclosure_pseudodata module
- Module contents
- validphys.compareclosuretemplates package
- validphys.comparefittemplates package
- validphys.cuts package
- validphys.deltachi2templates package
- validphys.hyperplottemplates package
- validphys.mplstyles package
- validphys.photon package
- validphys.plotoptions package
- Submodules
- validphys.plotoptions.core module
- validphys.plotoptions.kintransforms module
DIJET3DXQ2MapMixin
DIJETATLASXQ2MapMixin
DIJETXQ2MapMixin
DISXQ2MapMixin
DYMXQ2MapMixin
DYXQ2MapMixin
EWPTXQ2MapMixin
HQPTXQ2MapMixin
HQQPTXQ2MapMixin
JETXQ2MapMixin
Kintransform
SqrtScaleMixin
dijet_CMS_3D
dijet_CMS_5TEV
dijet_sqrt_scale
dijet_sqrt_scale_ATLAS
dis_sqrt_scale
dyp_sqrt_scale
ewj_jpt_sqrt_scale
ewj_jrap_sqrt_scale
ewj_mll_sqrt_scale
ewj_pt_sqrt_scale
ewj_ptrap_sqrt_scale
ewj_rap_sqrt_scale
ewk_mll_sqrt_scale
ewk_pseudorapity_sqrt_scale
ewk_pt_sqrt_scale
ewk_ptrap_sqrt_scale
ewk_rap_sqrt_scale
hig_rap_sqrt_scale
hqp_mqq_sqrt_scale
hqp_ptq_sqrt_scale
hqp_ptqq_sqrt_scale
hqp_yq_sqrt_scale
hqp_yqq_sqrt_scale
identity
inc_sqrt_scale
jet_sqrt_scale
nmc_process
pht_sqrt_scale
sia_sqrt_scale
- validphys.plotoptions.labelers module
- validphys.plotoptions.plottingoptions module
PlottingOptions
PlottingOptions.all_labels
PlottingOptions.already_digested
PlottingOptions.data_reference
PlottingOptions.dataset_label
PlottingOptions.experiment
PlottingOptions.extra_labels
PlottingOptions.figure_by
PlottingOptions.func_labels
PlottingOptions.kinematics_override
PlottingOptions.line_by
PlottingOptions.nnpdf31_process
PlottingOptions.normalize
PlottingOptions.parse_figure_by()
PlottingOptions.parse_line_by()
PlottingOptions.parse_x()
PlottingOptions.plot_x
PlottingOptions.process_description
PlottingOptions.result_transform
PlottingOptions.theory_reference
PlottingOptions.x
PlottingOptions.x_label
PlottingOptions.x_scale
PlottingOptions.y_label
PlottingOptions.y_scale
ResultTransformations
Scale
TransformFunctions
TransformFunctions.dijet_CMS_3D
TransformFunctions.dijet_CMS_5TEV
TransformFunctions.dijet_sqrt_scale
TransformFunctions.dijet_sqrt_scale_ATLAS
TransformFunctions.dis_sqrt_scale
TransformFunctions.dyp_sqrt_scale
TransformFunctions.ewj_jpt_sqrt_scale
TransformFunctions.ewj_jrap_sqrt_scale
TransformFunctions.ewj_mll_sqrt_scale
TransformFunctions.ewj_pt_sqrt_scale
TransformFunctions.ewj_ptrap_sqrt_scale
TransformFunctions.ewj_rap_sqrt_scale
TransformFunctions.ewk_mll_sqrt_scale
TransformFunctions.ewk_pseudorapity_sqrt_scale
TransformFunctions.ewk_pt_sqrt_scale
TransformFunctions.ewk_ptrap_sqrt_scale
TransformFunctions.ewk_rap_sqrt_scale
TransformFunctions.hig_rap_sqrt_scale
TransformFunctions.hqp_mqq_sqrt_scale
TransformFunctions.hqp_ptq_sqrt_scale
TransformFunctions.hqp_ptqq_sqrt_scale
TransformFunctions.hqp_yq_sqrt_scale
TransformFunctions.hqp_yqq_sqrt_scale
TransformFunctions.identity
TransformFunctions.inc_sqrt_scale
TransformFunctions.jet_sqrt_scale
TransformFunctions.nmc_process
TransformFunctions.pht_sqrt_scale
TransformFunctions.sia_sqrt_scale
- validphys.plotoptions.resulttransforms module
- validphys.plotoptions.utils module
- Module contents
- validphys.scalevariations package
- validphys.scripts package
- Submodules
- validphys.scripts.main module
- validphys.scripts.postfit module
- validphys.scripts.vp_checktheory module
- validphys.scripts.vp_comparefits module
CompareFitApp
CompareFitApp.add_positional_arguments()
CompareFitApp.complete_mapping()
CompareFitApp.get_commandline_arguments()
CompareFitApp.get_config()
CompareFitApp.interactive_author()
CompareFitApp.interactive_current_fit()
CompareFitApp.interactive_current_fit_label()
CompareFitApp.interactive_keywords()
CompareFitApp.interactive_reference_fit()
CompareFitApp.interactive_reference_fit_label()
CompareFitApp.interactive_thcovmat_if_present()
CompareFitApp.interactive_title()
CompareFitApp.try_complete_args()
main()
- validphys.scripts.vp_deltachi2 module
- validphys.scripts.vp_fitrename module
- validphys.scripts.vp_get module
- validphys.scripts.vp_hyperoptplot module
- validphys.scripts.vp_list module
- validphys.scripts.vp_nextfitruncard module
- validphys.scripts.vp_pdffromreplicas module
- validphys.scripts.vp_pdfrename module
- validphys.scripts.vp_upload module
- validphys.scripts.wiki_upload module
- Module contents
- validphys.tests package
- Subpackages
- Submodules
- validphys.tests.conftest module
- validphys.tests.test_alpha_s_bundle_pdf module
- validphys.tests.test_arclengths module
- validphys.tests.test_calcutils module
- validphys.tests.test_closuretest module
- validphys.tests.test_commondataparser module
- validphys.tests.test_core module
- validphys.tests.test_covmatreg module
- validphys.tests.test_covmats module
- validphys.tests.test_cuts module
- validphys.tests.test_datafiles module
- validphys.tests.test_effexponents module
- validphys.tests.test_filter_rules module
- validphys.tests.test_fitdata module
- validphys.tests.test_fitveto module
- validphys.tests.test_hessian2mc module
- validphys.tests.test_loader module
- validphys.tests.test_mc2hessian module
- validphys.tests.test_metaexps module
- validphys.tests.test_multiclosure module
- validphys.tests.test_overfit_metric module
- validphys.tests.test_plots module
- validphys.tests.test_postfit module
- validphys.tests.test_pseudodata module
- validphys.tests.test_pyfkdata module
- validphys.tests.test_pythonmakereplica module
- validphys.tests.test_regressions module
- validphys.tests.test_results module
- validphys.tests.test_sumrules module
- validphys.tests.test_tableloader module
- validphys.tests.test_theorydbutils module
- validphys.tests.test_totalchi2 module
- validphys.tests.test_utils module
- validphys.tests.test_vplistscript module
- validphys.tests.test_weights module
- Module contents
- validphys.theorycovariance package
- Submodules
- validphys.theorycovariance.construction module
ProcessInfo
combine_by_type()
compute_covs_pt_prescrip()
covmat_3fpt()
covmat_3pt()
covmat_5barpt()
covmat_5pt()
covmat_7pt()
covmat_9pt()
covmat_n3lo_ad()
covmat_n3lo_fhmv()
covmat_n3lo_singlet()
covs_pt_prescrip()
experimentplustheory_corrmat_custom()
fromfile_covmat()
procs_index_matched()
theory_corrmat_custom()
theory_covmat_custom()
theory_covmat_custom_fitting()
theory_covmat_custom_per_prescription()
theory_covmat_dataset()
theory_normcovmat_custom()
total_theory_covmat()
total_theory_covmat_fitting()
user_covmat()
user_covmat_fitting()
- validphys.theorycovariance.output module
matrix_plot_labels()
plot_corrmat_heatmap()
plot_covmat_heatmap()
plot_diag_cov_comparison()
plot_diag_cov_comparison_by_experiment()
plot_diag_cov_comparison_by_process()
plot_expcorrmat_heatmap()
plot_expplusthcorrmat_heatmap_custom()
plot_normexpcovmat_heatmap()
plot_normthcovmat_heatmap_custom()
plot_thcorrmat_heatmap_custom()
- validphys.theorycovariance.tests module
LabeledShifts
alltheory_vector()
concatenated_shx_vector()
dataset_alltheory()
deltamiss_plot()
diagdf_theory_covmat()
doubleindex_set_byprocess()
doubleindex_thcovmat()
efficiency()
eigenvector_plot()
evals_nonzero_basis()
fnorm_shifts_byprocess()
fnorm_shifts_ordered()
ordered_alltheory_vector()
projected_condition_num()
projector_eigenvalue_ratio()
shift_diag_cov_comparison()
shift_vector()
sqrtdiags_thcovmat_byprocess()
theory_covmat_eigenvalues()
theory_shift_test()
theta()
ticklocs_thcovmat()
tripleindex_thcovmat_complete()
validation_theory_chi2()
vectors_3pt()
vectors_5barpt()
vectors_5pt()
vectors_7pt()
vectors_9pt()
- validphys.theorycovariance.theorycovarianceutils module
- Module contents
Submodules
validphys.api module
api.py
This module contains the reportengine programmatic API, initialized with the validphys providers, Config and Environment.
Example:
Simple Usage:
>>> from validphys.api import API
>>> fig = API.plot_pdfs(pdf="NNPDF_nlo_as_0118", Q=100)
>>> fig.show()
validphys.app module
app.py
Mainloop of the validphys application. Here we define tailored extensions to the reportengine application (such as extra command line flags). Additionally, the provider modules that serve as source to the validphys actions are declared here.
The entry point of the validphys application is the main function of this module.
- class validphys.app.App(name='validphys', providers=['validphys.results', 'validphys.commondata', 'validphys.pdfgrids', 'validphys.pdfplots', 'validphys.dataplots', 'validphys.fitdata', 'validphys.arclength', 'validphys.sumrules', 'validphys.reweighting', 'validphys.kinematics', 'validphys.correlations', 'validphys.eff_exponents', 'validphys.asy_exponents', 'validphys.theorycovariance.construction', 'validphys.theorycovariance.output', 'validphys.theorycovariance.tests', 'validphys.replica_selector', 'validphys.closuretest', 'validphys.mc_gen', 'validphys.theoryinfo', 'validphys.pseudodata', 'validphys.renametools', 'validphys.covmats', 'validphys.hyperoptplot', 'validphys.deltachi2', 'validphys.n3fit_data', 'validphys.mc2hessian', 'reportengine.report', 'validphys.overfit_metric', 'validphys.hessian2mc'])[source]
Bases:
App
- property argparser
- critical_message = 'A critical error occurred. This is likely due to one of the following reasons:\n\n - A bug in validphys.\n - Corruption of the provided resources (e.g. incorrect plotting files).\n - Cosmic rays hitting your CPU and altering the registers.\n\nThe traceback above should help determine the cause of the problem. If you\nbelieve this is a bug in validphys (please discard the cosmic rays first),\nplease open an issue on GitHub<https://github.com/NNPDF/nnpdf/issues>,\nincluding the contents of the following file:\n\n%s\n'
- property default_style
- environment_class
alias of
Environment
validphys.arclength module
arclength.py
Module for the computation and presentation of arclengths.
- class validphys.arclength.ArcLengthGrid(pdf, basis, flavours, stats)
Bases:
tuple
- basis
Alias for field number 1
- flavours
Alias for field number 2
- pdf
Alias for field number 0
- stats
Alias for field number 3
- validphys.arclength.arc_length_table(arc_lengths)[source]
Return a table with the descriptive statistics of the arc lengths over members of the PDF.
- validphys.arclength.arc_lengths(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Compute arc lengths at scale Q: set up a grid with three segments and compute the arc length for each segment. Note: the variation of the PDF over the grid is computed from the forward differences between adjacent grid points.
- Parameters:
pdf (validphys.core.PDF object)
Q (float) – scale at which to evaluate PDF
basis (default = "flavour")
flavours (default = None)
- Returns:
validphys.arclength.ArcLengthGrid object
object that contains the PDF, basis, flavours, and computed arc length statistics.
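A minimal usage sketch, driving these providers through the programmatic API (see validphys.api); the PDF set name and scale below are illustrative assumptions and require the set to be installed locally.
# Hedged sketch: arc lengths via the API; the pdf name and Q value are assumptions.
from validphys.api import API

pdf_name = "NNPDF40_nnlo_as_01180"  # any locally installed LHAPDF set
arc_grid = API.arc_lengths(pdf=pdf_name, Q=1.65, basis="flavour")
table = API.arc_length_table(pdf=pdf_name, Q=1.65, basis="flavour")
print(table)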
- validphys.arclength.integrability_number(pdf: ~validphys.core.PDF, Q: ~numbers.Real, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'evolution', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Return sum_i |x_i*f(x_i)|, x_i = {1e-9, 1e-8, 1e-7} for selected flavours
validphys.asy_exponents module
Tools for computing and plotting asymptotic exponents.
- class validphys.asy_exponents.AsyExponentBandPlotter(exponent, *args, **kwargs)[source]
Bases:
BandPDFPlotter
Class inheriting from BandPDFPlotter, changing title and ylabel to reflect the asymptotic exponent being plotted.
- validphys.asy_exponents.alpha_asy(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Returns a list of xplotting_grids containing the value of the asymptotic exponent alpha, as defined by the first relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].
basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.
flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.
npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.
- validphys.asy_exponents.asymptotic_exponents_table(pdf: ~validphys.core.PDF, *, x_alpha: ~numbers.Real = 1e-06, x_beta: ~numbers.Real = 0.9, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None, npoints=100)[source]
Returns a table with the values of the asymptotic exponents alpha and beta, as defined in Eq. (4) of [arXiv:1604.00024], at the specified value of x and Q.
basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.
flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.
npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.
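A hedged usage sketch via the programmatic API; the PDF set name is an assumption and the remaining arguments take the defaults quoted above.
# Hedged sketch: asymptotic exponent table via the API (pdf name is an assumption).
from validphys.api import API

table = API.asymptotic_exponents_table(
    pdf="NNPDF40_nnlo_as_01180", basis="evolution", Q=1.65
)
print(table)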
- validphys.asy_exponents.beta_asy(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 100, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Returns a list of xplotting_grids containing the value of the asymptotic exponent beta, as defined by the second relationship in Eq. (4) of [arXiv:1604.00024], at the specified value of Q (in GeV), in the interval [xmin, xmax].
basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.
flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.
npoints: the number of sub-intervals in the range [xmin, xmax] on which the derivative is computed.
validphys.calcutils module
calcutils.py
Low level utilities to calculate χ² and such. These are used to implement the higher level functions in results.py
- validphys.calcutils.all_chi2(results)[source]
Return the chi² for all elements in the result, regardless of the Stats class. Note that the interpretation of the result will depend on the PDF error type
- validphys.calcutils.all_chi2_theory(results, totcov)[source]
Like all_chi2 but here the chi² are calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.
- validphys.calcutils.bootstrap_values(data, nresamples, *, boot_seed: int | None = None, apply_func: Callable | None = None, args=None)[source]
General bootstrap sample
data is the data which is to be sampled; replicas are assumed to be on the final axis, e.g. N_bins*N_replicas
boot_seed can be specified if the user wishes to take the exact same bootstrap samples multiple times; by default it is None, in which case a random seed is used.
If just data and nresamples are provided, then bootstrap_values creates N resamples of the data, where each resample is a Monte Carlo selection of the data across replicas. The mean of each resample is returned
Alternatively, the user can specify a function to be sampled apply_func plus any additional arguments required by that function. bootstrap_values then returns apply_func(bootstrap_data, *args) where bootstrap_data.shape = (data.shape, nresamples). It is critical that apply_func can handle data input in this format.
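A small self-contained sketch of both calling modes described above, on toy data following the N_bins x N_replicas convention; the shapes and statistic are purely illustrative.
# Toy illustration of bootstrap_values (both calling modes described above).
import numpy as np
from validphys.calcutils import bootstrap_values

data = np.random.rand(20, 100)  # N_bins x N_replicas, replicas on the final axis

# Default mode: the mean of each Monte Carlo resample is returned.
boot_means = bootstrap_values(data, 1000, boot_seed=42)

# Custom statistic: apply_func receives the resampled array, whose last axis
# indexes the resamples, so reduce over the replica axis explicitly.
boot_std = bootstrap_values(
    data, 1000, boot_seed=42, apply_func=lambda d: d.std(axis=1), args=()
)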
- validphys.calcutils.calc_chi2(sqrtcov, diffs)[source]
Elementary function to compute the chi², given a Cholesky decomposed lower triangular part and a vector of differences.
- Parameters:
sqrtcov (matrix) – A lower triangular matrix corresponding to the lower part of the Cholesky decomposition of the covariance matrix.
diffs (array) – A vector of differences (e.g. between data and theory). The first dimension must match the shape of sqrtcov. The computation will be broadcast over the other dimensions.
- Returns:
chi2 – The result of the χ² for each vector of differences. Will have the same shape as diffs.shape[1:].
- Return type:
array
Notes
This function computes the χ² more efficiently and accurately than following the direct definition of inverting the covariance matrix, \(\chi^2 = d\Sigma^{-1}d\), by solving the triangular linear system instead.
Examples
>>> from validphys.calcutils import calc_chi2
>>> import numpy as np
>>> import scipy.linalg as la
>>> np.random.seed(0)
>>> diffs = np.random.rand(10)
>>> s = np.random.rand(10,10)
>>> cov = s@s.T
>>> calc_chi2(la.cholesky(cov, lower=True), diffs)
44.64401691354948
>>> diffs@la.inv(cov)@diffs
44.64401691354948
- validphys.calcutils.calc_phi(sqrtcov, diffs)[source]
Low level function which calculates phi given a Cholesky decomposed lower triangular part and a vector of differences. Primarily used when phi is to be calculated independently from chi2.
The vector of differences diffs is expected to have N_bins on the first axis
- validphys.calcutils.central_chi2(results)[source]
Calculate the chi² from the central value of the theory prediction to the data
- validphys.calcutils.central_chi2_theory(results, totcov)[source]
Like central_chi2 but here the chi² is calculated using a covariance matrix that is the sum of the experimental covmat and the theory covmat.
- validphys.calcutils.get_df_block(matrix: DataFrame, key: str, level)[source]
Given a pandas dataframe whose index and column keys match, and whose data represents a symmetric matrix, return the diagonal block of this matrix corresponding to matrix[key, key] as a numpy array.
Additionally, the user can specify the level of the key at which the cross section is taken; by default it is set to 1, which corresponds to the dataset level of a theory covariance matrix.
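A hypothetical illustration on a small symmetric DataFrame with a two-level (group, dataset) index; the index names and values are made up.
# Hypothetical illustration of get_df_block on a small symmetric dataframe.
import numpy as np
import pandas as pd
from validphys.calcutils import get_df_block

idx = pd.MultiIndex.from_tuples(
    [("GROUP1", "ds1"), ("GROUP1", "ds1"), ("GROUP2", "ds2")],
    names=["group", "dataset"],
)
a = np.arange(9.0).reshape(3, 3)
symmetric = pd.DataFrame(a + a.T, index=idx, columns=idx)

# Extract the 2x2 diagonal block belonging to "ds1" at the dataset level (1).
block = get_df_block(symmetric, "ds1", level=1)
print(block)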
- validphys.calcutils.regularize_covmat(covmat: array, norm_threshold=4)[source]
Given a covariance matrix, performs a regularization which is equivalent to performing regularize_l2 on the sqrt of covmat: the l2 norm of the inverse of the correlation matrix calculated from covmat is set to be less than or equal to norm_threshold. If the input covmat already fulfills this criterion it is returned.
- Parameters:
covmat (array) – a covariance matrix which is to be regularized.
norm_threshold (float) – The acceptable l2 norm of the sqrt correlation matrix, by default set to 4.
- Returns:
new_covmat – A new covariance matrix which has been regularized according to prescription above.
- Return type:
array
- validphys.calcutils.regularize_l2(sqrtcov, norm_threshold=4)[source]
Return a regularized version of sqrtcov.
Given sqrtcov, an (N, nsys) matrix whose Gram matrix is the covariance matrix (covmat = sqrtcov@sqrtcov.T), first decompose it as sqrtcov = D@A, where D is a positive diagonal matrix of standard deviations and A is the “square root” of the correlation matrix, corrmat = A@A.T. Then produce a new version of A which removes the unstable behaviour and assemble a new square root covariance matrix, which is returned. The stability condition is controlled by norm_threshold. It is
\[\left\Vert A^+ \right\Vert_{L2} \leq \frac{1}{\text{norm_threshold}}\]
where A+ is the pseudoinverse of A; norm_threshold roughly corresponds to the sqrt of the maximum relative uncertainty in any systematic.
- Parameters:
sqrtcov (2d array) – An (N, nsys) matrix specifying the uncertainties.
norm_threshold (float) – The tolerance for the regularization.
- Returns:
newsqrtcov – A regularized version of sqrtcov.
- Return type:
2d array
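A short sketch of the covariance-level wrapper on toy input; the matrix is random and only meant to illustrate the call.
# Toy illustration of covariance regularization (cf. regularize_covmat above).
import numpy as np
from validphys.calcutils import regularize_covmat

rng = np.random.default_rng(0)
s = rng.random((10, 10))
covmat = s @ s.T  # a possibly ill-conditioned covariance matrix
new_covmat = regularize_covmat(covmat, norm_threshold=4)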
validphys.checks module
Created on Thu Jun 2 19:35:40 2016
@author: Zahari Kassabov
- validphys.checks.check_darwin_single_process(NPROC)[source]
Check that if we are on macOS (platform is Darwin), NPROC is equal to 1. This is related to the infamous issues with multiprocessing on macOS.
The “solution” is to run the code sequentially if NPROC is 1 and enforce that macOS users don’t set NPROC as anything else.
TODO: Once pseudodata is generated in python, try using spawn instead of fork with multiprocessing.
Notes
for the specific NNPDF issue: https://github.com/NNPDF/nnpdf/issues/931
General discussion: https://wefearchange.org/2018/11/forkmacos.rst.html
- validphys.checks.check_dataspecs_fits_different(dataspecs_fit)[source]
Need this check because otherwise the pandas object gets confused
- validphys.checks.check_fits_different(fits)[source]
Need this check because otherwise the pandas object gets confused
- validphys.checks.check_mixband_as_replicas(pdfs, mixband_as_replicas)[source]
Same as check_pdfs_noband, but for the mixband_as_replicas key. Allows mixband_as_replicas to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).
- validphys.checks.check_pdf_normalize_to(pdfs, normalize_to)[source]
Transform normalize_to into an index.
- validphys.checks.check_pdfs_noband(pdfs, pdfs_noband)[source]
Allows pdfs_noband to be specified as a list of PDF IDs or a list of PDF indexes (starting from one).
- validphys.checks.check_scale(scalename, allow_none=False)[source]
Check that we have a valid matplotlib scale. With allow_none=True, also None is valid.
- validphys.checks.check_speclabels_different(dataspecs_speclabel)[source]
This is needed for grouping dataframes (and because it generally indicates a bug)
validphys.commondata module
commondata.py
Module containing actions which return loaded commondata, leverages utils found in validphys.commondataparser, and returns objects from validphys.coredata.
- validphys.commondata.loaded_commondata_with_cuts(commondata, cuts)[source]
Load the commondata and apply cuts.
- Parameters:
commondata (validphys.core.CommonDataSpec) – commondata to load and cut.
cuts (validphys.core.cuts, None) – valid cuts, used to cut loaded commondata.
- Returns:
loaded_cut_commondata
- Return type:
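A possible way to obtain the cut commondata through the programmatic API; the dataset name, variant and theory id below are illustrative assumptions and must correspond to resources installed locally.
# Hedged sketch: cut commondata via the API (inputs are illustrative assumptions).
from validphys.api import API

cd = API.loaded_commondata_with_cuts(
    dataset_input={"dataset": "NMC_NC_NOTFIXED_P_EM-SIGMARED", "variant": "legacy"},
    use_cuts="internal",
    theoryid=700,
)
print(cd.ndata)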
validphys.commondataparser module
This module implements parsers for commondata and its associated metadata and uncertainties files into useful structures that can be fed to the main validphys.coredata.CommonData class.
A CommonData file is completely defined by a dataset name (which defines the folder in which the information is) and observable name (which defines the specific data, fktables and plotting settings to read).
<experiment>_<process>_<energy>{_<extras>}_<observable>
Where the folder name is <experiment>_<process>_<energy>{_<extras>}
The definition of all information for a given dataset (and all its observables) is in the metadata.yaml file and its implemented_observables.
This module defines a number of parsers using the validobj library.
The full metadata.yaml is read as a SetMetaData object which contains a list of ObservableMetaData.
These ObservableMetaData are the “datasets” of NNPDF for all intents and purposes.
The parent SetMetaData collects some shared variables such as the version of the dataset, arxiv, inspire or hepdata ids, the folder in which the data is, etc.
The main class in this module is thus ObservableMetaData, which holds _all_ information about the particular dataset-observable that we are interested in (and a reference to its parent).
- Inside the ObservableMetaData we can find:
  - TheoryMeta: contains the necessary information to read the (new style) fktables
  - KinematicsMeta: contains metadata about the kinematics
  - PlottingOptions: plotting style and information for validphys
  - Variant: variant to be used
The CommonMetaData defines how the CommonData file is to be loaded; by modifying the CommonMetaData using one of the loaded Variants one can change the resulting validphys.coredata.CommonData object.
- class validphys.commondataparser.CommonDataMetadata(name: str, nsys: int, ndata: int, process_type: str)[source]
Bases:
object
Contains metadata information about the data being read
- class validphys.commondataparser.ObservableMetaData(observable_name: str, observable: dict, ndata: int, plotting: validphys.plotoptions.plottingoptions.PlottingOptions, process_type: Annotated[Union[validphys.process_options._Process, str], InputType(Any), Validator(<function ValidProcess at 0x7f0d6c4a39a0>)], kinematic_coverage: list[str], kinematics: validphys.commondataparser.ValidKinematics, data_uncertainties: list[typing.Annotated[pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)]], data_central: Optional[Annotated[pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)]] = None, theory: Optional[validphys.commondataparser.TheoryMeta] = None, tables: Optional[list] = <factory>, npoints: Optional[list] = <factory>, variants: Optional[Dict[str, validphys.commondataparser.Variant]] = <factory>, applied_variant: Optional[str] = None, ported_from: Optional[str] = None, _parent: Optional[Any] = None)[source]
Bases:
object
- apply_variant(variant_name)[source]
Return a new instance of this class with the variant applied.
This class also defines how the variant is applied to the commondata.
- check()[source]
Various checks to apply manually to the observable before it is used anywhere. These are not part of the __post_init__ call since they can only happen after the metadata has been read, the observable selected and (likely) variants applied.
- property cm_energy
- data_central: Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)] | None = None
- data_uncertainties: list[~pathlib.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)]]
- digest_plotting_variable(variable)[source]
Digest plotting variables in the line_by or figure_by fields and return the appropriate kX or other label such that the plotting functions of validphys can understand it. These might be variables included as part of the kinematics or extra labels defined in the plotting dictionary.
- property experiment
- property is_integrability
- property is_nnpdf_special
Is this an NNPDF special dataset used for e.g., Lagrange multipliers or QED fits
- property is_ported_dataset
Return True if this is an automatically ported dataset that has not been updated
- property is_positivity
- kinematics: ValidKinematics
- property kinlabels
Return the kinematic labels in the same order as they are set in kinematic_coverage (which in turn follows the key kinematic_coverage). If this is a ported dataset, rely on the process type using the legacy labels.
- load_data_central()[source]
Loads the data for this commondata and returns a dataframe
- Returns:
a dataframe containing the data
- Return type:
pd.DataFrame
- load_kinematics(fill_to_three=True, drop_minmax=True)[source]
Returns a dataframe with the kinematic information
- load_uncertainties()[source]
Returns a dataframe with all appropriate uncertainties
- Returns:
a dataframe containing the uncertainties
- Return type:
pd.DataFrame
- property name
- property nnpdf_metadata
- property path_data_central
- property path_kinematics
- property paths_uncertainties
- plotting: PlottingOptions
- property plotting_options
- property process
- process_type: Any), Validator(<function ValidProcess at 0x7f0d6c4a39a0>)]
- property setname
- theory: TheoryMeta | None = None
- class validphys.commondataparser.SetMetaData(setname: str, version: int, version_comment: str, nnpdf_metadata: dict, implemented_observables: list[ObservableMetaData], arXiv: ValidReference | None = None, iNSPIRE: ValidReference | None = None, hepdata: ValidReference | None = None)[source]
Bases:
object
Metadata of the whole set
- property allowed_datasets
Return the implemented datasets as a list <setname>_<observable>
- property allowed_observables
Returns the implemented observables as a {observable_name.upper(): observable} dictionary
- arXiv: ValidReference | None = None
- property cm_energy
Return the center of mass energy as GeV if it can be understood from the name otherwise return None
- property folder
- hepdata: ValidReference | None = None
- iNSPIRE: ValidReference | None = None
- implemented_observables: list[ObservableMetaData]
- class validphys.commondataparser.TheoryMeta(FK_tables: list[tuple], operation: ~typing.Annotated[str, InputType(typing.Optional[str]), Validator(<function ValidOperation at 0x7f0d6c1bf250>)] = 'NULL', conversion_factor: float = 1.0, shifts: dict | None = None, normalization: dict | None = None, comment: str | None = None)[source]
Bases:
object
Contains the necessary information to load the associated fktables
The theory metadata must always contain a key FK_tables which defines the fktables to be loaded. The FK_tables is organized as a double list such that:
- The inner list is concatenated. In practice these are different fktables that might refer to the same observable but that are divided in subgrids for practical reasons.
- The outer list contains the operands for whatever operation needs to be computed in order to match the experimental data.
In addition there are other flags that can affect how the fktables are read or used:
- operation: defines the operation to apply to the outer list
- shifts: mapping with the single fktables and their respective shifts, useful to create “gaps” so that the fktables and the respective experimental data are ordered in the same way (for instance, when some points are missing from a grid)
This class is immutable; what is read from the commondata metadata should be considered final.
Example
>>> from validphys.commondataparser import TheoryMeta
... from validobj import parse_input
... from reportengine.compat import yaml
... theory_raw = '''
... FK_tables:
...   - - fk1
...   - - fk2
...     - fk3
... operation: ratio
... '''
... theory = yaml.safe_load(theory_raw)
... parse_input(theory, TheoryMeta)
TheoryMeta(FK_tables=[['fk1'], ['fk2', 'fk3']], operation='RATIO', shifts=None, conversion_factor=1.0, comment=None, normalization=None)
- fktables_to_paths(grids_folder)[source]
Given a source for pineappl grids, constructs the lists of fktables to be loaded
- operation: Optional[str]), Validator(<function ValidOperation at 0x7f0d6c1bf250>)] = 'NULL'
- class validphys.commondataparser.ValidKinematics(file: ~pathlib.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)], variables: ~typing.Dict[str, ~validphys.commondataparser.ValidVariable])[source]
Bases:
object
Contains the metadata necessary to load the kinematics of the dataset. The variables should be a dictionary with the key naming the variable and the content complying with the
ValidVariable
spec.Only the kinematics defined by the key
kinematic_coverage
will be loaded, which must be three.Three shall be the number of the counting and the number of the counting shall be three. Four shalt thou not count, neither shalt thou count two, excepting that thou then proceedeth to three. Once the number three, being the number of the counting, be reached, then the kinematics be loaded in the direction of thine validobject.
- apply_label(var, value)[source]
For a given value for a given variable, return the labels as label = value (unit). If the variable is not included in the list of variables, returns None, as the variable could’ve been transformed by a kinematic transformation.
- file: Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)]
- get_label(var)[source]
For the given variable, return the label as label (unit). If the label is an “extra”, return the last one.
- variables: Dict[str, ValidVariable]
- class validphys.commondataparser.ValidReference(url: str, version: int | None = None, journal: str | None = None, tables: list[int] = <factory>)[source]
Bases:
object
Holds literature information for the dataset
- class validphys.commondataparser.ValidVariable(label: str, description: str = '', units: str = '')[source]
Bases:
object
Defines the variables
- class validphys.commondataparser.Variant(data_uncertainties: list[~pathlib.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)]] | None = None, theory: ~validphys.commondataparser.TheoryMeta | None = None, data_central: ~pathlib.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)] | None = None, experiment: str | None = None)[source]
Bases:
object
The new commondata format allows the usage of variants. A variant can overwrite a number of keys, as defined by this dataclass:
- data_uncertainties
- theory
- data_central
This class may overwrite some other keys for the benefit of reproducibility of old NNPDF fits, but the usage of these features is undocumented and discouraged.
- data_central: Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)] | None = None
- data_uncertainties: list[~pathlib.Annotated[~pathlib.Path, InputType(<class 'str'>), Validator(<function ValidPath at 0x7f0d6c1bf1c0>)]] | None = None
- theory: TheoryMeta | None = None
- validphys.commondataparser.get_kinlabel_key(process_label)[source]
Since there is no 1:1 correspondence between latex keys and the old libNNPDF names we match the longest key such that the proc label starts with it.
- validphys.commondataparser.get_plot_kinlabels(commondata)[source]
Return the LaTeX kinematic labels for a given Commondata
- validphys.commondataparser.load_commondata(spec)[source]
Load the data corresponding to a CommonDataSpec object. Returns an instance of CommonData
- validphys.commondataparser.load_commondata_new(metadata)[source]
TODO: update this docstring since now the load_commondata_new takes the information from the metadata, and the name -> split is done outside
In the current iteration of the commondata, each of the commondata (i.e., an observable from a data publication) corresponds to one single observable inside a folder which is named as “<experiment>_<process>_<energy>_<extra>”. The observable is defined by a last suffix of the form “_<obs>” so that the full name of the dataset is always:
“<experiment>_<process>_<energy>{_<extra>}_<obs>”
where <extra> is optional.
This function right now works under the assumption that the folder/observable is separated at the last _ so that:
folder_name = <experiment>_<process>_<energy>{_<extra>}
but note that this convention is still not fully defined.
This function returns a commondata object constructed by parsing the metadata.
Once a variant is selected, it can no longer be changed
Note that this function reproduces parse_commondata below, which parses the _old_ file format
- validphys.commondataparser.load_commondata_old(commondatafile, systypefile, setname)[source]
Parse a commondata file and a systype file into a CommonData.
- Parameters:
commondatafile (file or path to file)
systypefile (file or path to file)
- Returns:
commondata – An object containing the data and information from the commondata and systype files.
- Return type:
- validphys.commondataparser.parse_new_metadata(metadata_file, observable_name, variant=None)[source]
Given a metadata file in the new format and the specific observable to be read, load and parse the metadata and select the observable. If any variants are selected, apply them.
The triplet (metadata_file, observable_name, variant) defines unequivocally the information to be parsed from the commondata library
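A sketch of calling this entry point directly; the path and observable name are placeholders for an actual metadata.yaml shipped with the data files.
# Hedged sketch of parse_new_metadata; path and observable name are placeholders.
from pathlib import Path
from validphys.commondataparser import parse_new_metadata

metadata_file = Path("commondata/<experiment>_<process>_<energy>/metadata.yaml")
observable = parse_new_metadata(metadata_file, "<observable>", variant=None)
print(observable.name, observable.ndata)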
validphys.config module
- class validphys.config.Config(input_params, environment=None)[source]
Bases:
Config
,CoreConfig
The effective configuration parser class.
- class validphys.config.CoreConfig(input_params, environment=None)[source]
Bases:
Config
- property loader
- parse_added_filter_rules(rules: (<class 'list'>, <class 'NoneType'>) = None)[source]
Returns a tuple of AddedFilterRule objects. Rules are immutable after parsing. AddedFilterRule objects inherit from FilterRule objects.
- parse_additional_errors(bool)[source]
PDF set used to generate the photon additional errors: they are constructed using the replicas 101-107 of the PDF set LUXqed17_plus_PDF4LHC15_nnlo_100 (that are obtained varying some parameters of the LuxQED approach) in the way described in sec. 2.5 of https://arxiv.org/pdf/1712.07053.pdf
- parse_cut_similarity_threshold(th: Real)[source]
Maximum relative ratio when using fromsimilarpredictions cuts.
- parse_data_grouping(key)[source]
a key which indicates which default grouping to use. Mainly for internal use. It allows the default grouping of experiment to be applied to runcards which don’t specify metadata_group without there being a namespace conflict in the lockfile
- parse_dataset_input(dataset: Mapping)[source]
The mapping that corresponds to the dataset specifications in the fit files
- This mapping is such that
- dataset: str
name of the dataset to load
- variant: str
variant of the dataset to load
- cfac: list
list of cfactors to apply
- frac: float
fraction of the data to consider for training purposes
- weight: float
extra weight to give to the dataset
- custom_group: str
custom group to apply to the dataset
Note that the sys key is deprecated and allowed only for old-format datasets.
Old-format commondata will be translated to the new version in this function.
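For illustration, a mapping with the keys described above as it could appear when driving validphys programmatically; the dataset name is a placeholder.
# Illustrative dataset_input mapping (placeholder dataset name).
dataset_input = {
    "dataset": "<experiment>_<process>_<energy>_<observable>",
    "variant": "legacy",
    "cfac": [],
    "frac": 0.75,
    "weight": 1.0,
}
# e.g. passed on to API.dataset(dataset_input=dataset_input, theoryid=..., use_cuts="internal")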
- parse_default_filter_rules_recorded_spec_(spec)[source]
This function is a hacky fix for parsing the recorded spec of filter rules. The reason we need this function is that without it reportengine detects a conflict in the dataset key.
- parse_experiment(experiment: dict)[source]
A set of datasets where correlated systematics are taken into account. It is a mapping whose keys are the experiment name (‘experiment’) and a list of datasets (‘datasets’).
- parse_experiment_input(ei: dict)[source]
The mapping that corresponds to the experiment specification in the fit config files. Currently, this needs to be combined with experiment_from_input to yield a useful result.
- parse_filter_defaults(filter_defaults: (<class 'dict'>, <class 'NoneType'>))[source]
A mapping containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.
- Parameters:
filter_defaults (dict, None) – A mapping containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.
- Returns:
A hashable object containing the default kinematic limits to be used when filtering data (when using internal cuts). Currently these limits are q2min, w2min, and maxTau.
- Return type:
- parse_filter_rules(filter_rules: (<class 'list'>, <class 'NoneType'>))[source]
A tuple of FilterRule objects. Rules are immutable after parsing. See https://docs.nnpdf.science/vp/filters.html for details on the syntax
- parse_fit(item)[source]
A fit in the results folder, containing at least a valid filter result. Either just an id (str), or a mapping with ‘id’ and ‘label’.
- parse_fitdeclaration(label: str)[source]
Used to guess some information from the fit name, without having to download it. This is meant to be used with other providers like e.g.:
{@with fits_as_from_fitdeclarations::fits_name_from_fitdeclarations@} {@ …do stuff… @} {@endwith@}
- parse_hyperscan(hyperscan)[source]
A hyperscan in the hyperscan_results folder, containing at least one tries.json file
- parse_integdataset(integset: dict, *, theoryid, rules)[source]
An observable corresponding to a PDF in the evolution basis, used as an integrability constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.
- parse_metadata_group(group: str)[source]
User specified key to group data by. The key must exist in the PLOTTING file, for example experiment.
- parse_norm_threshold(val: (<class 'numbers.Number'>, <class 'NoneType'>))[source]
The threshold to use for covariance matrix normalisation, sets the maximum l2 norm of the inverse covariance matrix, by clipping smallest eigenvalues
If norm_threshold is set to None, then no covmat regularization is performed
- parse_pdf(item, unpolarized_bc=None)[source]
A PDF set installed in LHAPDF. If an unpolarized boundary condition is defined, it will be registered as part of the PDF.
Either just an id (str), or a mapping with ‘id’ and ‘label’.
- parse_posdataset(posset: dict, *, theoryid, rules)[source]
An observable used as a positivity constraint in the fit. It is a mapping containing ‘dataset’ and ‘maxlambda’.
- parse_reweighting_experiments(experiments, *, theoryid, use_cuts, fit=None)[source]
A list of experiments to be used for reweighting.
- parse_speclabel(label: (<class 'str'>, <class 'NoneType'>))[source]
A label for a dataspec. To be used in some plots
- parse_t0theoryid(theoryID: (<class 'str'>, <class 'int'>))[source]
A number corresponding to the database theory ID where the corresponding theory folder is installed in the data directory.
The t0theoryid is specifically used for SM parameter determinations (e.g. alphas) using the correlated replicas method of arXiv: 1802.03398. To do an alphas determination we perform multiple fits, each with a different value of alphas in the DGLAP kernel and hard scattering cross section. Then we compute the chi2 for each fit to determine which alphas best describes the data; however, to make a fair comparison we need to ensure that the chi2 (and thus the t0 covariance matrix) is exactly the same for each fit. This requires not only fixing the t0pdfset between the different fits, but also fixing the t0theoryid.
- parse_theoryid(item)[source]
A number corresponding to the database theory ID where the corresponding theory folder is installed in the data directory. Either just an id (str or int), or a mapping with ‘id’ and ‘label’.
- parse_unpolarized_bc(item)[source]
Unpolarised PDF used as a Boundary Condition to impose positivity of pPDFs. Either just an id , or a mapping with ‘id’ and ‘label’.
- parse_use_cuts(use_cuts: (<class 'bool'>, <class 'str'>))[source]
Whether to filter the points based on the cuts applied in the fit, or the whole data in the dataset. The possible options are:
internal: Calculate the cuts based on the existing rules. This is the default.
fromfit: Read the cuts stored in the fit.
nocuts: Use the whole dataset.
- parse_use_fitcommondata(do_use: bool)[source]
Use the commondata files in the fit instead of those in the data directory.
- parse_use_t0(do_use_t0: bool)[source]
Whether to use the t0 PDF set to generate covariance matrices.
- produce_basisfromfit(fit)[source]
Set the basis from fit config. In the fit config file the basis is set using the key fitbasis, but it is exposed to validphys as basis. The name of this production rule is intentionally set to not conflict with the existing fitbasis runcard key.
- produce_commondata(*, dataset_input, use_fitcommondata=False, fit=None)[source]
Produce a CommondataSpec from a dataset input
- produce_covariance_matrix(use_pdferr: bool = False)[source]
Modifies which action is used as covariance_matrix depending on the flag use_pdferr
- produce_covmat_t0_considered(use_t0: bool = False)[source]
Modifies which action is used as covariance_matrix depending on the flag use_t0
- produce_cuts(*, commondata, use_cuts)[source]
Obtain cuts for a given dataset input, based on the appropriate policy.
- produce_data(data_input, *, group_name='data')[source]
A set of datasets where correlated systematics are taken into account
- produce_data_input()[source]
Produce the data_input which is a flat list of dataset_inputs. This production rule handles the backwards compatibility with old datasets which specify experiments in the runcard.
- produce_dataset(*, dataset_input, theoryid, cuts, use_fitcommondata=False, fit=None, check_plotting: bool = False)[source]
Dataset specification from the theory and CommonData. Use the cuts from the fit, if provided. If check_plotting is set to True, attempt to load and check the PLOTTING files (note this may cause a noticeable slowdown in general).
- produce_dataset_inputs_covariance_matrix(use_pdferr: bool = False)[source]
Modifies which action is used as experiment_covariance_matrix depending on the flag use_pdferr
- produce_dataset_inputs_covmat_t0_considered(use_t0: bool = False)[source]
Modifies which action is used as experiment_covariance_matrix depending on the flag use_t0
- produce_dataset_inputs_fitting_covmat(use_thcovmat_in_fitting=False)[source]
Produces the correct covmat to be used in fitting_data_dict according to some options: whether to include the theory covmat, whether to separate the multiplicative errors and whether to compute the experimental covmat using the t0 prescription.
- produce_dataset_inputs_sampling_covmat(sep_mult=False, use_thcovmat_in_sampling=False)[source]
Produces the correct covmat to be used in make_replica according to some options: whether to include the theory covmat and whether to separate the multiplicative errors.
- produce_dataspecs_with_matched_cuts(dataspecs)[source]
Take a list of namespaces (dataspecs), resolve dataset within each of them, and return another list of dataspecs where the datasets all have the same cuts, corresponding to the intersection of the selected points. All the datasets must have the same name (i.e. correspond with the same experimental measurement), but can otherwise differ, for example in the theory used for the experimental predictions.
This rule can be combined with matched_datasets_from_dataspecs.
- produce_defaults(q2min=None, w2min=None, maxTau=None, default_filter_settings=None, filter_defaults=None, default_filter_settings_recorded_spec_=None)[source]
Produce default values for filters taking into account the values of q2min, w2min and maxTau defined at namespace level and those inside a filter_defaults mapping.
Within this function the hashable type FilterDefaults is turned into a dictionary so as to allow for overwriting of the values of q2min, w2min and maxTau. The dictionary is then turned back into a FilterDefaults object.
- produce_experiment_from_input(experiment_input, theoryid, use_cuts, fit=None)[source]
Return a mapping containing a single experiment from an experiment input. NOTE: This might be deprecated in the future.
- produce_filter_data(fakedata: bool = False, theorycovmatconfig=None)[source]
Set the action used to filter the data to filter either real or closure data. If the closure data filter is being used and if the theory covariance matrix is not being closure tested then filter data by experiment for efficiency
- produce_fitcontext(fitinputcontext, fitpdf)[source]
Set PDF, theory ID and data input from the fit config
- produce_fitcontextwithcuts(fit, fitinputcontext)[source]
Like fitinputcontext but setting the cuts policy.
- produce_fitenvironment(fit, fitinputcontext)[source]
Like fitcontext, but additionally forcing various other parameters, such as the cuts policy and Monte Carlo seeding to be the same as the fit.
Notes
This production rule is designed to be used as a namespace to collect over, for use with validphys.pseudodata.recreate_fit_pseudodata() and can be added to freely, e.g. by setting trvlseed to be from the fit runcard.
- produce_fitq0fromfit(fitinputcontext)[source]
Given a fit, return the fitting scale according to the theory
- produce_fitreplicas(fit)[source]
Production rule mapping the replica key to each Monte Carlo fit replica.
- produce_fitthcovmat(use_thcovmat_if_present: bool = False, fit: (<class 'str'>, <class 'NoneType'>) = None)[source]
If a fit is specified and use_thcovmat_if_present is True then returns the corresponding covariance matrix for the given fit if it exists. If the fit doesn’t have a theory covariance matrix then returns False.
- produce_fitunderlyinglaw(fit)[source]
Reads closuretest: fakepdf from fit config file and passes as pdf
- produce_group_dataset_inputs_by_metadata(data_input, processed_metadata_group)[source]
Take the data and the processed_metadata_group key and attempt to group the data, returns a list where each element specifies the data_input for a single group and the group_name
- produce_loaded_theory_covmat(output_path, data_input, user_covmat_path=None, point_prescriptions=None, use_thcovmat_in_sampling=False, use_thcovmat_in_fitting=False)[source]
Loads the theory covmat from the correct file according to how it was generated by vp-setupfit.
- produce_loaded_user_covmat_path(user_covmat_path: str = '')[source]
Path to the user covmat provided by user_covmat_path in the runcard. If no path is provided, returns None. For use in theorycovariance.construction.user_covmat.
- produce_matched_datasets_from_dataspecs(dataspecs)[source]
Take an arbitrary list of mappings called dataspecs and return a new list of mappings called dataspecs constructed as follows.
From each of the original dataspecs, resolve the key process, and all the experiments and datasets therein.
Compute the intersection of the dataset names, and for each element in the intersection construct a mapping with the following keys:
process : A string with the common process name.
experiment_name : A string with the common experiment name.
dataset_name : A string with the common dataset name.
dataspecs : A list of mappings matching the original “dataspecs”. Each mapping contains:
- dataset: A dataset with the common dataset name and the properties (cuts, theory, etc.) corresponding to the original dataspec.
- dataset_input: The input line used to build dataset.
- All the other keys in the original dataspec.
- produce_matched_positivity_from_dataspecs(dataspecs)[source]
Like produce_matched_datasets_from_dataspecs but for positivity datasets.
- produce_multiclosure_underlyinglaw(fits)[source]
Produce the underlying law for a set of fits. This allows a single t0 like covariance matrix to be loaded for all fits, for use with statistical estimators on multiple closure fits. If the fits don’t all have the same underlying law then an error is raised and the offending fit is identified.
- produce_nnfit_theory_covmat(point_prescriptions: list = None, user_covmat_path: str = None)[source]
Return the theory covariance matrix used in the fit.
This function is only used in vp-setupfit to store the necessary covmats as .csv files in the tables directory.
- produce_no_covmat_reg()[source]
Explicitly set norm_threshold to None so that no covariance matrix regularization is performed.
- produce_pdfreplicas(fitpdf)[source]
Production rule mapping the replica key to each postfit replica.
- produce_processed_data_grouping(use_thcovmat_in_fitting=False, use_thcovmat_in_sampling=False, data_grouping=None, data_grouping_recorded_spec_=None)[source]
Process the data_grouping key from the runcard, or lockfile. If data_grouping_recorded_spec_ is present then its value is taken, and the runcard is assumed to be a lockfile.
If data_grouping is None, then, if either use_thcovmat_in_fitting or use_thcovmat_in_sampling (or both) are true (which means that the fit is a thcovmat fit), group all the datasets together, otherwise fall back to the default behaviour of grouping by experiment (called standard_report).
Else, the user can specify their own grouping, for example metadata_process.
- produce_processed_metadata_group(processed_data_grouping, metadata_group=None)[source]
Expose the final data grouping result. Either metadata_group is specified by the user and used directly, or else processed_data_grouping is used, which is experiment by default.
- produce_rules(theoryid, use_cuts, defaults, default_filter_rules=None, filter_rules=None, default_filter_rules_recorded_spec_=None, added_filter_rules: (<class 'tuple'>, <class 'NoneType'>) = None)[source]
Produce filter rules based on the user defined input and defaults.
- produce_t0dataset(*, dataset_input, t0id, cuts, use_fitcommondata=False, fit=None, check_plotting: bool = False)[source]
Same as produce_dataset, but if a t0theoryid has been defined in the runcard then the corresponding fktables will be linked.
- produce_t0id(theoryid, t0theoryid=None)[source]
Return the t0id if t0theoryid is set and return theoryid otherwise.
- produce_t0set(t0pdfset=None, use_t0=False)[source]
Return the t0set if use_t0 is True and None otherwise. Raises an error if t0 is requested but no t0set is given.
- produce_theoryids(t0id, point_prescription)[source]
Produces a list of theoryids given a theoryid at central scales and a point prescription. The options for the latter are defined in pointprescriptions.yaml. This hard codes the theories needed for each prescription to avoid user error.
validphys.convolution module
This module implements tools for computing convolutions between PDFs and theory grids, which yield observables.
The high level predictions() function can be used to extract theory predictions for experimentally measured quantities:
import numpy as np
from validphys.api import API
from validphys.convolution import predictions
inp = {
'fit': '181023-001-sc',
'use_cuts': 'internal',
'theoryid': 162,
'pdf': 'NNPDF40_nnlo_lowprecision',
'dataset_inputs': {'from_': 'fit'}
}
all_datasets = API.data(**inp).datasets
pdf = API.pdf(**inp)
all_preds = [predictions(ds, pdf) for ds in all_datasets]
Some variants such as central_predictions() and linear_predictions() are useful for more specialized tasks.
These functions work with validphys.core.DatasetSpec objects, allowing one to account for information on COMPOUND predictions and cuts. A lower level interface which operates on validphys.coredata.FKTableData objects is also available.
- validphys.convolution.central_dis_predictions(loaded_fk, pdf)[source]
Implementation of central_fk_predictions() for DIS observables.
- validphys.convolution.central_fk_predictions(loaded_fk, pdf)[source]
Same as fk_predictions(), but computing predictions for the central PDF member only.
- validphys.convolution.central_hadron_predictions(loaded_fk, pdf)[source]
Implementation of central_fk_predictions() for hadronic observables.
- validphys.convolution.central_predictions(dataset, pdf)[source]
Same as predictions() but computing the predictions for the central member of the PDF set only. For Monte Carlo PDFs, this is a faster alternative to computing the central predictions as the average of the replica predictions (although a small approximation is involved in the case of hadronic predictions).
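A hedged sketch illustrating the note above, comparing the central predictions with the average over the replica predictions for a Monte Carlo set; the dataset and PDF choices follow the examples in this module and are only illustrative:
from validphys.api import API
from validphys.convolution import predictions, central_predictions

args = {"dataset_input": {"dataset": "ATLASTTBARTOT"}, "theoryid": 162, "use_cuts": "internal"}
ds = API.dataset(**args)
pdf = API.pdf(pdf="NNPDF40_nnlo_as_01180")

central = central_predictions(ds, pdf)             # predictions of the central member only
replica_mean = predictions(ds, pdf).mean(axis=1)   # average over the member columns
# For DIS-like (linear) observables the two coincide; for hadronic observables
# they differ by the small approximation mentioned above.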
- validphys.convolution.dis_predictions(loaded_fk, pdf)[source]
Implementation of fk_predictions() for DIS observables.
- validphys.convolution.fk_predictions(loaded_fk, pdf)[source]
Low level function to compute predictions from a FKTable.
- Parameters:
loaded_fk (validphys.coredata.FKTableData) – The FKTable corresponding to the partonic cross section.
pdf (validphys.core.PDF) – The PDF set to use for the convolutions.
- Returns:
df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points (use validphys.coredata.FKTableData.with_cuts() to filter out points). The columns correspond to the selected PDF members in the LHAPDF set.
- Return type:
pandas.DataFrame
Notes
This function operates on a single FKTable, while the prediction for an experimental quantity generally involves several. Use predictions() to compute those.
Examples
>>> from validphys.loader import Loader
>>> from validphys.convolution import hadron_predictions
>>> from validphys.fkparser import load_fktable
>>> l = Loader()
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
>>> table = load_fktable(ds.fkspecs[0])
>>> hadron_predictions(table, pdf)
              1           2           3           4    ...         97          98          99          100
data                                                    ...
0     176.688118  170.172930  172.460771  173.792321   ...  179.504636  172.343792  168.372508  169.927820
1     252.682923  244.507916  247.840249  249.541798   ...  256.410844  247.805180  242.246438  244.415529
2     828.076008  813.452551  824.581569  828.213508   ...  838.707211  826.056388  810.310109  816.824167
- validphys.convolution.hadron_predictions(loaded_fk, pdf)[source]
Implementation of fk_predictions() for hadronic observables.
- validphys.convolution.linear_fk_predictions(loaded_fk, pdf)[source]
Same as predictions() for DIS, but compute linearized predictions for hadronic data, using linear_hadron_predictions().
- validphys.convolution.linear_hadron_predictions(loaded_fk, pdf)[source]
Implementation of linear_fk_predictions() for hadronic observables. Specifically this computes:
central_value ⊗ FK ⊗ (2 * replica_values - central_value)
which is the linear expansion of the hadronic observable in the difference between each replica and the central value, replica_values - central_value.
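A short sketch of why this corresponds to the linear expansion (not part of the original docstring; it assumes the FK table is symmetric under exchange of the two PDFs). Writing each replica as \(r = c + \delta\), with \(c\) the central value:
\[ r \otimes FK \otimes r = (c + \delta) \otimes FK \otimes (c + \delta) \approx c \otimes FK \otimes c + 2\, c \otimes FK \otimes \delta = c \otimes FK \otimes (2r - c), \]
where the quadratic term \(\delta \otimes FK \otimes \delta\) has been dropped.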
- validphys.convolution.linear_predictions(dataset, pdf)[source]
Same as predictions() but computing linearized predictions. These are the same as predictions for DIS, but for hadronic predictions they are truncated to the terms that are linear in the difference between each member and the central value.
This is generally a very good approximation, in that it yields differences that are much smaller than the PDF uncertainty.
- validphys.convolution.predictions(dataset, pdf)[source]
Compute theory predictions for a given PDF and dataset. Information regarding the dataset, on cuts, CFactors and combinations of FKTables, is taken into account to construct the predictions.
The result should be comparable to experimental predictions implemented in CommonData.
- Parameters:
dataset (validphys.core.DatasetSpec) – The dataset containing information on the partonic cross section.
pdf (validphys.core.PDF) – The PDF set to use for the convolutions.
- Returns:
df – A dataframe corresponding to the hadronic prediction for each data point for the PDF members. The index of the dataframe corresponds to the selected data points, based on the dataset cuts. The columns correspond to the selected PDF members in the LHAPDF set.
- Return type:
pandas.DataFrame
Examples
Obtain descriptive statistics over PDF replicas for each of the three points in the ATLAS ttbar dataset:
>>> from validphys.loader import Loader
>>> l = Loader()
>>> ds = l.check_dataset('ATLASTTBARTOT', theoryid=53)
>>> from validphys.convolution import predictions
>>> pdf = l.check_pdf('NNPDF31_nnlo_as_0118')
>>> preds = predictions(ds, pdf)
>>> preds.T.describe()
data            0           1           2
count  100.000000  100.000000  100.000000
mean   161.271292  231.500367  767.816844
std      2.227304    2.883497    7.327617
min    156.638526  225.283254  750.850250
25%    159.652216  229.486793  762.773527
50%    161.066965  231.281248  767.619249
75%    162.620554  233.306836  772.390286
max    168.390840  240.287549  786.549380
validphys.core module
Core datastructures used in the validphys data model.
- class validphys.core.CommonDataSpec(name, metadata, legacy=False, datafile=None, sysfile=None, plotfiles=None)[source]
Bases:
TupleComp
Holds all the information necessary to load a commondata file and provides methods to easily access them
- Parameters:
name (str) – name of the commondata
metadata (ObservableMetaData) – instance of ObservableMetaData holding all information about the dataset
legacy (bool) – whether this is an old or new format metadata file
The datafile, sysfile and plotfiles arguments are deprecated and only to be used with legacy=True.
- property legacy_names
- property metadata
- property name
- property ndata
- property nsys
- property plot_kinlabels
- property process_type
- property theory_metadata
- class validphys.core.CutsPolicy(value)[source]
Bases:
Enum
An enumeration.
- FROMFIT = 'fromfit'
- FROM_CUT_INTERSECTION_NAMESPACE = 'fromintersection'
- FROM_SIMILAR_PREDICTIONS_NAMESPACE = 'fromsimilarpredictions'
- INTERNAL = 'internal'
- NOCUTS = 'nocuts'
- class validphys.core.DataGroupSpec(name, datasets, dsinputs=None)[source]
Bases:
TupleComp
,NSList
- property as_markdown
- load_commondata_instance()[source]
Given the experiment, load a list of validphys.coredata.CommonData objects with cuts already applied.
- property thspec
- class validphys.core.DataSetInput(*, name, sys, cfac, frac, weight, custom_group, variant)[source]
Bases:
TupleComp
Represents whatever the user enters in the YAML to specify a dataset.
- class validphys.core.DataSetSpec(*, name, commondata, fkspecs, thspec, cuts, frac=1, op=None, weight=1, rules=())[source]
Bases:
TupleComp
- class validphys.core.FKTableSpec(fkpath, cfactors, metadata=None)[source]
Bases:
TupleComp
Each FKTable is formed by a number of sub-fktables to be concatenated, each of which has its own path. Therefore the fkpath variable is a list of paths.
Before the pineappl implementation, FKTables were already pre-concatenated. The legacy interface therefore relies on fkpath being just a string or path instead.
The metadata of the FKTable for the given dataset is stored as an attribute of this class. This is transitional; eventually it will be held by the associated CommonData in the new format.
- class validphys.core.HessianStats(data, rescale_factor=1)[source]
Bases:
SymmHessianStats
Compute stats in the ‘asymmetric’ hessian format: the first index (0) is the central value, the odd indexes are the results for lower eigenvectors and the even ones are the upper eigenvectors. A rescale_factor is allowed in case the eigenvector confidence interval is not 68%.
- class validphys.core.HyperscanSpec(name, path)[source]
Bases:
FitSpec
The hyperscan spec is just a special case of FitSpec
- get_all_trials(base_params=None)[source]
Read all trials from all tries files. If there are original runcard-based parameters, a reference to them can be passed to the trials so that a full hyperparameter dictionary can be defined
Each hyperopt trial object will also have a reference to all trials in its own file
- label
- name
- path
- sample_trials(n=None, base_params=None, sigma=4.0)[source]
Parse all trials in the hyperscan object and then return an array of n trials read from the tries.json files and sampled according to their reward. If n is None, no sampling is performed and all trials are returned.
- Returns:
Dictionary of the form {parameters: list of trials}
- property tries_files
Return a dictionary with all tries.json files mapped to their replica number
- class validphys.core.IntegrabilitySetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]
Bases:
LagrangeSetSpec
- class validphys.core.LagrangeSetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]
Bases:
DataSetSpec
Extends DataSetSpec to work around the particularities of the positivity, integrability and other Lagrange Multiplier datasets.
- class validphys.core.PDF(name, boundary=None)[source]
Bases:
TupleComp
Base validphys PDF providing high level access to metadata.
Statistical estimators which depend on the PDF type (MC, Hessian, …) are exposed as a Stats object through the stats_class attribute. The LHAPDF metadata can be accessed directly through the info attribute.
Examples
>>> from validphys.api import API
>>> from validphys.convolution import predictions
>>> args = {"dataset_input":{"dataset": "ATLASTTBARTOT"}, "theoryid":162, "use_cuts":"internal"}
>>> ds = API.dataset(**args)
>>> pdf = API.pdf(pdf="NNPDF40_nnlo_as_01180")
>>> preds = predictions(ds, pdf)
>>> preds.shape
(3, 100)
- property alphas_mz
Alpha_s(M_Z) as defined in the LHAPDF .info file
- property alphas_vals
List of alpha_s(Q) at various Q for interpolation based alphas. Values as defined in the LHAPDF .info file
- property error_conf_level
Error confidence level as defined in the LHAPDF .info file; if no number is given in the .info file it defaults to 68%.
- property error_type
Error type as defined in the LHAPDF .info file
- property info
Information contained in the LHAPDF .info file
- property infopath
- property is_polarized
Returns True if the PDF has a boundary condition associated to it. At the moment LHAPDF provides no mechanism to know whether a PDF is polarized.
- property isinstalled
- property label
- property q_min
Minimum Q as given by the LHAPDF .info file
- register_boundary(unpolarized_bc=None)[source]
Register other PDFs as boundary conditions of this PDF
- property stats_class
Return the stats calculator for this error type
- class validphys.core.PDFcv(name, boundary=None)[source]
Bases:
PDF
An add-on for the PDF class that makes only the central value available
- class validphys.core.PositivitySetSpec(name, commondataspec, fkspec, maxlambda, thspec, rules)[source]
Bases:
LagrangeSetSpec
- class validphys.core.Stats(data)[source]
Bases:
object
Class holding statistical information about the objects used in validphys. This object can be a PDF or any function of a PDF (such as hadronic observable).
By convention, member 0 corresponds to the central value of the PDF. Accordingly, the method central_value will return the result held for member 0. Note that this is equal to the mean of the error_members only for the PDF itself and linear functions of the PDF (such as DIS-type observables). If you want to obtain the average of the error members you can do: np.mean(stats_instance.error_members, axis=0)
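A minimal numpy sketch of the member-indexing convention described above (the array and names are illustrative only and do not use the validphys API):
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(101, 3))           # member 0 plus 100 error members, for 3 data points
central = data[0]                          # what central_value is documented to return
error_members = data[1:]                   # the remaining members
replica_mean = error_members.mean(axis=0)  # equals `central` only for linear quantities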
- class validphys.core.SymmHessianStats(data, rescale_factor=1)[source]
Bases:
Stats
Compute stats in the ‘symmetric’ hessian format: the first index (0) is the central value. The rest of the indexes are results for each eigenvector. A rescale_factor is allowed in case the eigenvector confidence interval is not 68%.
- class validphys.core.TheoryIDSpec(id: int, path: pathlib.Path, dbpath: pathlib.Path)[source]
Bases:
object
validphys.coredata module
Data containers backed by Python managed memory (Numpy arrays and Pandas dataframes).
- class validphys.coredata.CFactorData(description: str, central_value: array, uncertainty: array)[source]
Bases:
object
Data contained in a CFactor
- Parameters:
description (str) – Information on how the data was obtained.
central_value (array, shape(ndata)) – The value of the cfactor for each data point.
uncertainty (array, shape(ndata)) – The absolute uncertainty on the cfactor if available.
- central_value: array
- uncertainty: array
- class validphys.coredata.CommonData(setname: str, ndata: int, commondataproc: str, nkin: int, nsys: int, commondata_table: DataFrame, systype_table: DataFrame, legacy: bool = False, legacy_names: list | None = None, kin_variables: list | None = None)[source]
Bases:
object
Data contained in Commondata files, relevant cuts applied.
- Parameters:
setname (str) – Name of the dataset
ndata (int) – Number of data points
commondataproc (str) – Process type, one of 21 options
nkin (int) – Number of kinematics specified
nsys (int) – Number of systematics
commondata_table (pd.DataFrame) – Pandas dataframe containing the commondata
systype_table (pd.DataFrame) – Pandas dataframe containing the systype index for each systematic alongside the uncertainty type (ADD/MULT/RAND) and name (CORR/UNCORR/THEORYCORR/SKIP)
systematics_table (pd.DataFrame) – Pandas dataframe containing the table of systematics
- property additive_errors
Returns the systematics which are additive (systype is ADD) as absolute uncertainties (same units as data), with SKIP uncertainties removed.
- property central_values
- commondata_table: DataFrame
- export(folder_path)[source]
Wrapper around export_data and export_uncertainties to write both uncertainties and data after filtering to a given folder
- export_data(buffer)[source]
Exports the central data defined by this commondata instance to the given buffer
- export_uncertainties(buffer)[source]
Exports the uncertainties defined by this commondata instance to the given buffer
- property kinematics
- property multiplicative_errors
Returns the systematics which are multiplicative (systype is MULT) in a percentage format, with SKIP uncertainties removed.
- property stat_errors
- systematic_errors(central_values=None)[source]
Returns all systematic errors as absolute uncertainties, with a single column for each uncertainty. Converts
multiplicative_errors
to units of data and then appends ontoadditive_errors
. By default uses the experimental central values to perform conversion, but the user can supply a 1-D array of central values, with lengthself.ndata
, to use instead of the experimental central values to calculate the absolute contribution of the multiplicative systematics.- Parameters:
central_values (None, np.array) – 1-D array containing alternative central values to combine with multiplicative uncertainties. This array must have length equal to
self.ndata
. By defaultcentral_values
is None, and the central values of the commondata are used.- Returns:
systematic_errors – Dataframe containing systematic errors.
- Return type:
pd.DataFrame
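A hedged pandas sketch of the conversion described above, with small illustrative inputs (not the actual CommonData tables):
import numpy as np
import pandas as pd

central_values = np.array([10.0, 20.0, 30.0])
additive_errors = pd.DataFrame({"CORR": [0.1, 0.2, 0.3]})          # already in data units
multiplicative_errors = pd.DataFrame({"UNCORR": [1.0, 2.0, 0.5]})  # percentages

# MULT errors are converted to data units using the central values, then appended
converted = multiplicative_errors.multiply(central_values, axis=0) / 100.0
systematic_errors = pd.concat([additive_errors, converted], axis=1)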
- systype_table: DataFrame
- class validphys.coredata.FKTableData(hadronic: bool, Q0: float, ndata: int, xgrid: ~numpy.ndarray, sigma: ~pandas.core.frame.DataFrame, convolution_types: tuple[str] | None = None, metadata: dict = <factory>, protected: bool = False)[source]
Bases:
object
Data contained in an FKTable
- Parameters:
hadronic (bool) – Whether a hadronic (two PDFs) or a DIS (one PDF) convolution is needed.
Q0 (float) – The scale at which the PDFs should be evaluated (in GeV).
ndata (int) – The number of data points in the grid.
xgrid (array, shape (nx)) – The points in x at which the PDFs should be evaluated.
sigma (pd.DataFrame) – For hadronic data, the columns are the indexes in the NfxNf list of possible flavour combinations of two PDFs. The MultiIndex contains three keys: the data index, an index into xgrid for the first PDF and an index into xgrid for the second PDF, indicating the points in x where the PDFs should be evaluated. For DIS data, the columns are indexes in the Nf list of flavours. The MultiIndex contains two keys: the data index and an index into xgrid indicating the points in x where the PDF should be evaluated.
convolution_types (tuple[str]) – The type of convolution that the FkTable is expecting for each of the functions to be convolved with (usually the two types of PDF from the two incoming hadrons).
metadata (dict) – Other information contained in the FKTable.
protected (bool) – When an fktable is protected, cuts will not be applied. The most common use-case is when a total cross section is used as a normalization table for a differential cross section; in legacy code (<= NNPDF4.0) both fktables would be cut using the differential index.
- determine_pdfs(pdf)[source]
Determine the PDF (or PDFs) that should be used to be convoluted with this fktable. Uses the convolution_types key to decide the PDFs. If convolution_types is not defined, it returns the pdf object.
- get_np_fktable()[source]
Returns the fktable as a dense numpy array that can be directly manipulated with numpy
- The return shape is:
(ndata, nx, nbasis) for DIS
(ndata, nx, nx, nbasis) for hadronic
where nx is the length of the xgrid and nbasis the number of flavour contributions that contribute
- property luminosity_mapping
Return the flavour combinations that contribute to the fktable in the form of a single array
- The return shape is:
(nbasis,) for DIS
(nbasis*2,) for hadronic
- sigma: DataFrame
- with_cfactor(cfactor)[source]
Returns a copy of the FKTableData object with cfactors applied to the fktable
- with_cuts(cuts)[source]
Return a copy of the FKTable with the cuts applied. The data index of the sigma operator (the outermost level), contains the data point that have been kept. The ndata property is updated to reflect the new number of datapoints. If cuts is None, return the object unmodified.
- Parameters:
cuts (array_like or validphys.core.Cuts or None.) – The cuts to be applied.
- Returns:
res – A copy of the FKTable with the cuts applied.
- Return type:
validphys.coredata.FKTableData
Notes
The original number of points can be accessed with table.metadata['GridInfo'].ndata.
Examples
>>> from validphys.fkparser import load_fktable
... from validphys.loader import Loader
... l = Loader()
... ds = l.check_dataset('ATLASTTBARTOT', theoryid=53, cfac=('QCD',))
... table = load_fktable(ds.fkspecs[0])
... newtable = table.with_cuts([0,1])
>>> assert set(newtable.sigma.index.get_level_values(0)) == {0,1}
>>> assert newtable.ndata == 2
>>> assert newtable.metadata['GridInfo'].ndata == 3
- xgrid: ndarray
validphys.correlations module
Utilities for computing correlations in batch.
@author: Zahari Kassabov
validphys.covmats module
Module for handling logic and manipulation of covariance and correlation matrices on different levels of abstraction
- validphys.covmats.covmat_from_systematics(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, norm_threshold=None, _central_values=None)[source]
Take the statistical uncertainty and systematics table from a validphys.coredata.CommonData object and construct the covariance matrix accounting for correlations between systematics.
If a systematic has the name SKIP then it is ignored in the construction of the covariance matrix.
ADDitive and MULTiplicative systypes are handled separately: we convert uncertainties so that they are all in the same units as the data:
Additive (ADD) systematics are left unchanged
Multiplicative (MULT) systematics need to be converted from a percentage by multiplying by the central value and dividing by 100.
Finally, the systematics are split into the five possible archetypes of systematic uncertainties: uncorrelated (UNCORR), correlated (CORR), theory uncorrelated (THEORYUNCORR), theory correlated (THEORYCORR) and special correlated (SPECIALCORR) systematics.
Uncorrelated contributions from statistical error, uncorrelated and theory uncorrelated are added in quadrature to the diagonal of the covmat.
The contribution to the covariance matrix arising due to correlated systematics is schematically A_correlated @ A_correlated.T, where A_correlated is an N_dat by N_sys matrix. The total contribution from correlated systematics is found by adding together the result of multiplying each correlated systematic matrix by its transpose (correlated, theory_correlated and special_correlated). A schematic numerical sketch is given after the example below.
For more information on the generation of the covariance matrix see the paper outlining the procedure, specifically equation 2 and surrounding text.
- Parameters:
loaded_commondata_with_cuts (validphys.coredata.CommonData) – CommonData which stores information about systematic errors, their treatment and description.
dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
norm_threshold (number) – threshold used to regularize covariance matrix
_central_values (None, np.array) – 1-D array containing alternative central values to combine with the multiplicative errors to calculate their absolute contributions. By default this is None, and the experimental central values are used. However, this can be used to calculate, for example, the t0 covariance matrix by using the predictions from the central member of the t0 pdf.
- Returns:
cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.
- Return type:
np.array
Example
In order to use this function, simply call it from the API
>>> from validphys.api import API
>>> inp = dict(
...     dataset_input={'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10},
...     theoryid=162,
...     use_cuts="internal"
... )
>>> cov = API.covmat_from_systematics(**inp)
>>> cov.shape
(28, 28)
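A schematic numpy sketch of the construction described above, with illustrative arrays; this is not the validphys implementation and omits the weighting and regularization options:
import numpy as np

central = np.array([10.0, 20.0, 30.0])        # data central values
stat = np.array([0.5, 0.7, 0.9])              # statistical errors
uncorr = np.array([0.1, 0.2, 0.1])            # an UNCORR systematic, in data units
mult_corr_pct = np.array([2.0, 1.5, 1.0])     # a CORR MULT systematic, in percent

# MULT systematics are converted to data units with the central values
corr = mult_corr_pct * central / 100.0
A_correlated = corr[:, np.newaxis]            # N_dat x N_sys (here N_sys = 1)

# uncorrelated contributions add in quadrature on the diagonal;
# correlated ones contribute A_correlated @ A_correlated.T
covmat = np.diag(stat**2 + uncorr**2) + A_correlated @ A_correlated.T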
- validphys.covmats.covmat_stability_characteristic(systematics_matrix_from_commondata)[source]
Return a number characterizing the stability of an experimental covariance matrix against uncertainties in the correlation. It is defined as the L2 norm (largest singular value) of the square root of the inverse correlation matrix. This is equivalent to the square root of the inverse of the smallest singular value of the correlation matrix:
\[Z = \left(\frac{1}{\lambda_0}\right)^{1/2}\]
where \(\lambda_0\) is the smallest eigenvalue of the correlation matrix.
This is the number used as threshold in calcutils.regularize_covmat(). The interpretation is roughly the precision that the worst correlation needs to have in order not to affect meaningfully the χ² computed using the covariance matrix; for example, a stability characteristic of 4 means that correlations need to be known with uncertainties less than 0.25. A small numerical sketch of the definition is given after the example below.
Examples
>>> from validphys.api import API
>>> API.covmat_stability_characteristic(dataset_input={"dataset": "NMC"},
...     theoryid=162, use_cuts="internal")
2.742658604186114
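A minimal numpy sketch of the definition above, on an illustrative 2x2 correlation matrix (not the validphys code path):
import numpy as np

corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])
lambda_min = np.linalg.eigvalsh(corr).min()   # smallest eigenvalue of the correlation matrix
Z = 1.0 / np.sqrt(lambda_min)                 # stability characteristic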
- validphys.covmats.dataset_inputs_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, data_input, use_weights_in_covmat=True, norm_threshold=None, _list_of_central_values=None, _only_additive=False)[source]
Given a list containing validphys.coredata.CommonData objects, construct the full covariance matrix.
This is similar to covmat_from_systematics() except that special corr systematics are concatenated across all datasets before being multiplied by their transpose to give the off-block-diagonal contributions. The other systematics contribute to the block diagonal in the same way as covmat_from_systematics().
- Parameters:
dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – list of CommonData objects.
data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
norm_threshold (number) – threshold used to regularize covariance matrix
_list_of_central_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.
- Returns:
cov_mat – Numpy array which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.
- Return type:
np.array
Example
This function can be called directly from the API:
>>> dsinps = [
...     {'dataset': 'NMC'},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD']},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10}
... ]
>>> inp = dict(dataset_inputs=dsinps, theoryid=162, use_cuts="internal")
>>> cov = API.dataset_inputs_covmat_from_systematics(**inp)
>>> cov.shape
(235, 235)
Which properly accounts for all dataset settings and cuts.
- validphys.covmats.dataset_inputs_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]
Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it.
- validphys.covmats.dataset_inputs_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None)[source]
Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated.
- validphys.covmats.dataset_inputs_sqrt_covmat(dataset_inputs_covariance_matrix)[source]
Like sqrt_covmat but for a group of datasets.
- validphys.covmats.dataset_inputs_stability_table(dataset_inputs_stability, dataset_inputs)[source]
Return a table with covmat_stability_characteristic for all dataset inputs.
- validphys.covmats.dataset_inputs_t0_covmat_from_systematics(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]
Like t0_covmat_from_systematics() except for all data.
- Parameters:
dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – The CommonData for all datasets defined in dataset_inputs.
data_input (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
dataset_inputs_t0_predictions (list[np.array]) – The t0 predictions for all datasets.
- Returns:
t0_covmat – t0 covariance matrix for the list of datasets.
- Return type:
np.array
- validphys.covmats.dataset_inputs_t0_exp_covmat(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]
Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it.
- validphys.covmats.dataset_inputs_t0_exp_covmat_separate(dataset_inputs_loaded_cd_with_cuts, *, data_input, use_weights_in_covmat=True, norm_threshold=None, dataset_inputs_t0_predictions)[source]
Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated.
- validphys.covmats.dataset_inputs_t0_total_covmat(dataset_inputs_t0_exp_covmat, loaded_theory_covmat)[source]
Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to experimental covmat.
- validphys.covmats.dataset_inputs_t0_total_covmat_separate(dataset_inputs_t0_exp_covmat_separate, loaded_theory_covmat)[source]
Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to experimental covmat.
- validphys.covmats.dataset_inputs_total_covmat(dataset_inputs_exp_covmat, loaded_theory_covmat)[source]
Function to compute the covmat to be used for the sampling by make_replica and for the chi2 by fitting_data_dict. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are included in it. Moreover, the theory covmat is added to experimental covmat.
- validphys.covmats.dataset_inputs_total_covmat_separate(dataset_inputs_exp_covmat_separate, loaded_theory_covmat)[source]
Function to compute the covmat to be used for the sampling by make_replica. In this case the t0 prescription is not used for the experimental covmat and the multiplicative errors are separated. Moreover, the theory covmat is added to experimental covmat.
- validphys.covmats.dataset_t0_predictions(t0dataset, t0set)[source]
Returns the t0 predictions for a dataset, which are the predictions calculated using the central member of the t0 PDF (t0set). Note that if the t0 PDF has errortype replicas, and the dataset is a hadronic observable, then the predictions of the central member are subtly different to the central value of the replica predictions.
- Parameters:
dataset (validphys.core.DataSetSpec) – dataset for which to calculate t0 predictions
t0set (validphys.core.PDF) – pdf used to calculate the predictions
- Returns:
t0_predictions – 1-D numpy array with predictions for each of the cut datapoints.
- Return type:
np.array
- validphys.covmats.datasets_covmat_differences_table(each_dataset, datasets_covmat_no_reg, datasets_covmat_reg, norm_threshold)[source]
For each dataset calculate and tabulate two max differences upon regularization given a value for norm_threshold:
max relative difference to the diagonal of the covariance matrix (%)
max absolute difference to the correlation matrix of each covmat
- validphys.covmats.dataspecs_datasets_covmat_differences_table(dataspecs_speclabel, dataspecs_covmat_diff_tables)[source]
For each dataspec calculate and tabulate the two covmat differences described in datasets_covmat_differences_table (max relative difference in variance and max absolute correlation difference)
- validphys.covmats.fit_name_with_covmat_label(fit, fitthcovmat)[source]
If the theory covariance matrix is being used to calculate statistical estimators for the fit, then append (exp + th) onto the fit name for use in legends and column headers, to help the user see which covariance matrix was used to produce the plot or table they are looking at.
- validphys.covmats.generate_exp_covmat(datasets_input, data, use_weights, norm_threshold, _list_of_c_values, only_add)[source]
Function to generate the experimental covmat eventually using the t0 prescription. It is also possible to compute it only with the additive errors.
- Parameters:
dataset_inputs (list[validphys.coredata.CommonData]) – list of CommonData objects.
data (list[validphys.core.DataSetInput]) – Settings for each dataset, each element contains the weight for the current dataset. The elements of the returned covmat for dataset i and j will be divided by sqrt(weight_i)*sqrt(weight_j), if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights (bool) – Whether to weight the covmat, True by default.
norm_threshold (number) – threshold used to regularize covariance matrix
_list_of_c_values (None, list[np.array]) – list of 1-D arrays which contain alternative central values which are combined with the multiplicative errors to calculate their absolute contribution. By default this is None and the experimental central values are used.
only_add (bool) – specifies whether to use only the additive errors to compute the covmat
- Returns:
experimental covariance matrix
- Return type:
np.array
- validphys.covmats.groups_corrmat(groups_covmat)[source]
Generates the grouped experimental correlation matrix with groups_covmat as input
- validphys.covmats.groups_covmat(groups_covmat_no_table)[source]
Duplicate of groups_covmat_no_table but with a table decorator.
- validphys.covmats.groups_covmat_no_table(groups_data, groups_index, groups_covmat_collection)[source]
Export the covariance matrix for the groups. It exports the full (symmetric) matrix, with the 3 first rows and columns being:
group name
dataset name
index of the point within the dataset.
- validphys.covmats.groups_invcovmat(groups_data, groups_index, groups_covmat_collection)[source]
Compute and export the inverse covariance matrix. Note that this inverts the matrices with the LU method which is suboptimal.
- validphys.covmats.groups_normcovmat(groups_covmat, groups_data_values)[source]
Calculates the grouped experimental covariance matrix normalised to data.
- validphys.covmats.groups_sqrtcovmat(groups_data, groups_index, groups_sqrt_covmat)[source]
Like groups_covmat, but dump the lower triangular part of the Cholesky decomposition as used in the fit. The upper part indices are set to zero.
- validphys.covmats.pdferr_plus_covmat(results_without_covmat, pdf, covmat_t0_considered)[source]
For a given dataset, returns the sum of the covariance matrix given by covmat_t0_considered and the PDF error (a small numerical sketch of the ‘replicas’ case follows the example below):
If the PDF error_type is ‘replicas’, a covariance matrix is estimated from the replica theory predictions
If the PDF error_type is ‘symmhessian’, a covariance matrix is estimated using formulas from (mc2hessian) https://arxiv.org/pdf/1505.06736.pdf
If the PDF error_type is ‘hessian’, a covariance matrix is estimated using the hessian formula from Eq. 5 of https://arxiv.org/pdf/1401.0013.pdf
- Parameters:
dataset (DataSetSpec) – object parsed from the dataset_input runcard key
pdf (PDF) – monte carlo pdf used to estimate PDF error
covmat_t0_considered (np.array) – experimental covariance matrix with the t0 considered
- Returns:
covariance_matrix – sum of the experimental and pdf error as a numpy array
- Return type:
np.array
Examples
use_pdferr makes this action be used for covariance_matrix
>>> from validphys.api import API
>>> import numpy as np
>>> inp = {
...     'dataset_input': {
...         'dataset': 'ATLAS_TTBAR_8TEV_LJ_DIF_YTTBAR-NORM',
...         'variant': 'legacy',
...     },
...     'theoryid': 700,
...     'pdf': 'NNPDF40_nlo_as_01180',
...     'use_cuts': 'internal',
... }
>>> a = API.covariance_matrix(**inp, use_pdferr=True)
>>> b = API.pdferr_plus_covmat(**inp)
>>> (a == b).all()
True
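A minimal numpy sketch of the ‘replicas’ branch described above, with illustrative arrays (the actual action works on validphys results objects):
import numpy as np

rng = np.random.default_rng(0)
replica_preds = rng.normal(size=(100, 3))         # (n_replicas, n_data) hypothetical predictions
pdf_covmat = np.cov(replica_preds, rowvar=False)  # PDF covariance estimated from the replicas
exp_covmat = np.diag([0.1, 0.2, 0.3])             # stand-in for covmat_t0_considered
total_covmat = exp_covmat + pdf_covmat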
- validphys.covmats.pdferr_plus_dataset_inputs_covmat(dataset_inputs_results_without_covmat, data, pdf, dataset_inputs_covmat_t0_considered, fitthcovmat)[source]
Like pdferr_plus_covmat except for an experiment
- validphys.covmats.reorder_thcovmat_as_expcovmat(fitthcovmat, data)[source]
Reorder the thcovmat in such a way to match the order of the experimental covmat, which means the order of the runcard
- validphys.covmats.sqrt_covmat(covariance_matrix)[source]
Function that computes the square root of the covariance matrix.
- Parameters:
covariance_matrix (np.array) – A positive definite covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts) containing uncertainty and correlation information.
- Returns:
sqrt_mat – The square root of the input covariance matrix, which is N_dat x N_dat (where N_dat is the number of data points after cuts), and which is the lower triangular decomposition. The following should be True: np.allclose(sqrt_covmat @ sqrt_covmat.T, covariance_matrix).
- Return type:
np.array
Notes
The square root is found by using the Cholesky decomposition. However, rather than finding the decomposition of the covariance matrix directly, the (upper triangular) decomposition is found of the corresponding correlation matrix and the output of this is rescaled and then transposed as sqrt_matrix = (decomp * sqrt_diags).T, where decomp is the Cholesky decomposition of the correlation matrix and sqrt_diags is the square root of the diagonal entries of the covariance matrix. This method is useful in situations in which the covariance matrix is near-singular. A small sketch of this procedure is given after the example below.
The lower triangular part is useful for efficient calculation of the \(\chi^2\).
Example
>>> import numpy as np
>>> from validphys.api import API
>>> API.sqrt_covmat(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
array([[0.0326543 , 0.        , 0.        , ..., 0.        , 0.        , 0.        ],
       [0.00314523, 0.01467259, 0.        , ..., 0.        , 0.        , 0.        ],
       [0.0037817 , 0.00544256, 0.02874822, ..., 0.        , 0.        , 0.        ],
       ...,
       [0.00043404, 0.00031169, 0.00020489, ..., 0.00441073, 0.        , 0.        ],
       [0.00048717, 0.00033792, 0.00022971, ..., 0.00126704, 0.00435696, 0.        ],
       [0.00067353, 0.00050372, 0.0003203 , ..., 0.00107255, 0.00065041, 0.01002952]])
>>> sqrt_cov = API.sqrt_covmat(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
>>> cov = API.covariance_matrix(dataset_input={"dataset":"NMC"}, theoryid=162, use_cuts="internal")
>>> np.allclose(np.linalg.cholesky(cov), sqrt_cov)
True
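A minimal numpy sketch of the procedure described in the Notes above, on an illustrative 2x2 covariance matrix:
import numpy as np

cov = np.array([[4.0, 0.6],
                [0.6, 1.0]])
sqrt_diags = np.sqrt(np.diag(cov))
corr = cov / np.outer(sqrt_diags, sqrt_diags)   # correlation matrix
decomp = np.linalg.cholesky(corr).T             # upper triangular decomposition of corr
sqrt_matrix = (decomp * sqrt_diags).T           # rescale columns and transpose: lower triangular
assert np.allclose(sqrt_matrix @ sqrt_matrix.T, cov)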
- validphys.covmats.systematics_matrix_from_commondata(loaded_commondata_with_cuts, dataset_input, use_weights_in_covmat=True, _central_values=None)[source]
Returns a systematics matrix, \(A\), for the corresponding dataset. The systematics matrix is a square root of the covmat:
\[C = A A^T\]
and is obtained by concatenating a block diagonal of the uncorrelated uncertainties with the correlated systematics.
- validphys.covmats.t0_covmat_from_systematics(loaded_commondata_with_cuts, *, dataset_input, use_weights_in_covmat=True, norm_threshold=None, dataset_t0_predictions)[source]
Like covmat_from_systematics() except uses the t0 predictions to calculate the absolute contributions to the covmat from multiplicative uncertainties. For more info on the t0 predictions see validphys.commondata.dataset_t0_predictions().
- Parameters:
loaded_commondata_with_cuts (validphys.coredata.CommonData) – commondata object for which to generate the covmat.
dataset_input (validphys.core.DataSetInput) – Dataset settings, contains the weight for the current dataset. The returned covmat will be divided by the dataset weight if use_weights_in_covmat. The default weight is 1, which means the returned covmat will be unmodified.
use_weights_in_covmat (bool) – Whether to weight the covmat, True by default.
dataset_t0_predictions (np.array) – 1-D array with t0 predictions.
- Returns:
t0_covmat – t0 covariance matrix
- Return type:
np.array
validphys.covmats_utils module
covmat_utils.py
Utility functions for constructing covariance matrices from systematics.
Leveraged by validphys.covmats which contains relevant actions/providers.
- validphys.covmats_utils.construct_covmat(stat_errors: array, sys_errors: DataFrame)[source]
Basic function to construct a covariance matrix (covmat), given the statistical error and a dataframe of systematics.
Errors with name UNCORR or THEORYUNCORR are added in quadrature with the statistical error to the diagonal of the covmat.
Other systematics are treated as correlated; their covmat contribution is found by multiplying them by their transpose.
- Parameters:
stat_errors (np.array) – a 1-D array of statistical uncertainties
sys_errors (pd.DataFrame) – a dataframe with shape (N_data * N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.
Notes
This function doesn’t contain any logic to ignore certain contributions to the covmat. If you want to exclude a particular systematic or set of systematics, e.g. all uncertainties with MULT errors, then filter those out of sys_errors before passing them to this function.
- validphys.covmats_utils.systematics_matrix(stat_errors: array, sys_errors: DataFrame)[source]
Basic function to create a systematics matrix, \(A\), such that:
\[C = A A^T\]
where \(C\) is the covariance matrix. This is achieved by creating a block diagonal matrix by adding the uncorrelated systematics in quadrature, then taking the square root and concatenating the correlated systematics; schematically, see the sketch after this entry.
- Parameters:
stat_errors (np.array) – a 1-D array of statistical uncertainties
sys_errors (pd.DataFrame) – a dataframe with shape (N_data * N_sys) and systematic name as the column headers. The uncertainties should be in the same units as the data.
Notes
This function doesn’t contain any logic to ignore certain contributions to the covmat. If you want to exclude a particular systematic or set of systematics, e.g. all uncertainties with MULT errors, then filter those out of sys_errors before passing them to this function.
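A minimal numpy/pandas sketch of the construction described above, with illustrative inputs (not the validphys implementation):
import numpy as np
import pandas as pd

stat_errors = np.array([0.5, 0.7, 0.9])
sys_errors = pd.DataFrame({
    "UNCORR": [0.1, 0.2, 0.1],   # uncorrelated: added in quadrature on the diagonal block
    "CORR":   [0.3, 0.3, 0.4],   # correlated: kept as a full column
})

uncorr = sys_errors["UNCORR"].to_numpy()
diag_block = np.diag(np.sqrt(stat_errors**2 + uncorr**2))  # block-diagonal part
corr_block = sys_errors[["CORR"]].to_numpy()               # N_data x N_corr
A = np.concatenate([diag_block, corr_block], axis=1)       # systematics matrix
C = A @ A.T   # reproduces diag(stat² + uncorr²) + corr @ corr.T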
validphys.dataplots module
Plots of relations between data, PDFs and fits.
- validphys.dataplots.kde_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]
KDE plot for experiments chi2.
- validphys.dataplots.plot_chi2dist(dataset, abs_chi2_data, chi2_stats, pdf)[source]
Plot the distribution of chi²s of the members of the pdfset.
- validphys.dataplots.plot_chi2dist_experiments(total_chi2_data, experiments_chi2_stats, pdf)[source]
Plot the distribution of chi²s of the members of the pdfset.
- validphys.dataplots.plot_chi2dist_sv(dataset, abs_chi2_data_thcovmat, pdf)[source]
Same as plot_chi2dist, considering also the theory covmat in the calculation.
- validphys.dataplots.plot_dataset_inputs_phi_dist(data, dataset_inputs_bootstrap_phi_data)[source]
Generates a bootstrap distribution of phi and then plots a histogram of the individual bootstrap samples for dataset_inputs. By default the number of bootstrap samples is set to a sensible number (500) however this number can be changed by specifying bootstrap_samples in the runcard
- validphys.dataplots.plot_datasets_chi2(groups_data, groups_chi2)[source]
Plot the chi² of all datasets with bars.
- validphys.dataplots.plot_datasets_chi2_spider(groups_data, groups_chi2)[source]
Plot the chi² of all datasets on a spider/radar diagram.
- validphys.dataplots.plot_datasets_pdfs_chi2(data, each_dataset_chi2_pdfs, pdfs)[source]
Plot the chi² of all datasets with bars, and for different pdfs.
- validphys.dataplots.plot_datasets_pdfs_chi2_sv(data, each_dataset_chi2_pdfs_sv, pdfs)[source]
Same as plot_datasets_pdfs_chi2 but with the chi²s computed including scale variations.
- validphys.dataplots.plot_dataspecs_datasets_chi2(dataspecs_datasets_chi2_table)[source]
Same as plot_fits_datasets_chi2 but for arbitrary dataspecs
- validphys.dataplots.plot_dataspecs_datasets_chi2_spider(dataspecs_datasets_chi2_table)[source]
Same as plot_fits_datasets_chi2_spider but for arbitrary dataspecs
- validphys.dataplots.plot_dataspecs_groups_chi2(dataspecs_groups_chi2_table, processed_metadata_group)[source]
Same as plot_fits_groups_data_chi2 but for arbitrary dataspecs
- validphys.dataplots.plot_dataspecs_positivity(dataspecs_speclabel, dataspecs_positivity_predictions, dataspecs_posdataset, pos_use_kin=False)[source]
Like plot_positivity() except plots positivity for each element of dataspecs, allowing positivity predictions to be generated with different theory_ids as well as pdfs.
- validphys.dataplots.plot_fancy(one_or_more_results, commondata, cuts, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]
Read the PLOTTING configuration for the dataset and generate the corresponding data-theory plot.
The input results are assumed to be such that the first one is the data, and the subsequent ones are the predictions for the PDFs. See one_or_more_results. The labelling of the predictions can be influenced by setting the label attribute of theories and pdfs.
normalize_to should be either ‘data’, a pdf id or an index of the result (0 for the data, and i for the ith pdf). None means plotting absolute values.
See docs/plotting_format.md for details on the format of the PLOTTING files.
- validphys.dataplots.plot_fancy_dataspecs(dataspecs_results, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None, use_pdferr: bool = False)[source]
General interface for data-theory comparison plots.
The user should define an arbitrary list of mappings called “dataspecs”. In each of these, dataset must resolve to a dataset with the same name (but could have e.g. different theories). The production rule matched_datasets_from_dataspecs may be used for this purpose.
The result will be a plot combining all the predictions from the dataspecs mapping (which could vary in theory, pdf, cuts, etc.).
The user can define a “speclabel” key in each dataspec (or only in some). By default, the PDF label will be used in the legend (like in plot_fancy).
normalize_to must be either:
The string ‘data’ or the integer 0 to plot the ratio to data,
or the 1-based index of the dataspec to normalize to the corresponding prediction,
or None (default) to plot absolute values.
A limitation at the moment is that the data cuts and errors will be taken from the first specification.
- validphys.dataplots.plot_fancy_sv_dataspecs(dataspecs_results_with_scale_variations, dataspecs_commondata, dataspecs_cuts, dataspecs_speclabel, normalize_to: (<class 'str'>, <class 'int'>, <class 'NoneType'>) = None)[source]
Exactly the same as plot_fancy_dataspecs but the theoretical results passed down are modified so that the 1-sigma error bands correspond to a combination of the PDF error and the scale variations collected over theoryids.
- validphys.dataplots.plot_fits_chi2_spider(fits, fits_groups_chi2, fits_groups_data, processed_metadata_group)[source]
Plots the chi²s of all groups of datasets on a spider/radar diagram.
- validphys.dataplots.plot_fits_datasets_chi2(fits_datasets_chi2_table)[source]
Generate a plot equivalent to plot_datasets_chi2 using all the fitted datasets as input.
- validphys.dataplots.plot_fits_datasets_chi2_spider(fits_datasets_chi2_table)[source]
Generate a plot equivalent to plot_datasets_chi2_spider using all the fitted datasets as input.
- validphys.dataplots.plot_fits_datasets_chi2_spider_bygroup(fits_datasets_chi2_table)[source]
Same as plot_fits_datasets_chi2_spider but one plot for each group.
- validphys.dataplots.plot_fits_groups_data_chi2(fits_groups_chi2_table, processed_metadata_group)[source]
Generate a plot equivalent to plot_groups_data_chi2 using all the fitted groups of data as input.
- validphys.dataplots.plot_fits_groups_data_phi(fits_groups_phi_table, processed_metadata_group)[source]
Plots a set of bars for each fit, each bar represents the value of phi for the corresponding group of datasets, which is defined according to the keys in the PLOTTING info file
- validphys.dataplots.plot_fits_phi_spider(fits, fits_groups_data, fits_groups_data_phi, processed_metadata_group)[source]
Like plot_fits_chi2_spider but for phi.
- validphys.dataplots.plot_groups_data_chi2(groups_data, groups_chi2, processed_metadata_group)[source]
Plot the chi² of all groups of datasets with bars.
- validphys.dataplots.plot_groups_data_chi2_spider(groups_data, groups_chi2, processed_metadata_group, pdf)[source]
Plot the chi² of all groups of datasets as a spider plot.
- validphys.dataplots.plot_groups_data_phi_spider(groups_data, groups_data_phi, processed_metadata_group, pdf)[source]
Plot the phi of all groups of datasets as a spider plot.
- validphys.dataplots.plot_obscorrs(corrpair_datasets, obs_obs_correlations, pdf)[source]
NOTE: EXPERIMENTAL. Plot the correlation matrix between a pair of datasets.
- validphys.dataplots.plot_orbital_momentum(pdf, Q, partial_polarized_sum_rules)[source]
In addition to plotting the correlated spin moments as in plot_polarized_momentum, it also plots the contributions from the Orbital Angular Momentum.
- validphys.dataplots.plot_phi(groups_data, groups_data_phi, processed_metadata_group)[source]
Plots phi for each group of data as a bar, for a single PDF input.
See phi_data for information on how phi is calculated
- validphys.dataplots.plot_phi_scatter_dataspecs(dataspecs_groups, dataspecs_speclabel, dataspecs_groups_bootstrap_phi)[source]
For each of the dataspecs, a bootstrap distribution of phi is generated for all specified groups of datasets. The distribution is then represented as a scatter point which is the median of the bootstrap distribution and an errorbar which spans the 68% confidence interval. By default the number of bootstrap samples is set to a sensible value, however it can be controlled by specifying bootstrap_samples in the runcard.
- validphys.dataplots.plot_polarized_momentum(pdf, Q, partial_polarized_sum_rules, angular_momentum=False)[source]
Plot the correlated uncertainties for the truncated integrals of the polarized gluon and singlet distributions.
- validphys.dataplots.plot_positivity(pdfs, positivity_predictions_for_pdfs, posdataset, pos_use_kin=False)[source]
Plot an errorbar spanning the central 68% CI of a positivity observable as well as a point indicating the central value (according to pdf.stats_class.central_value()).
Errorbars and points are plotted on a symlog scale as a function of the data point index (if pos_use_kin==False) or the first kinematic variable (if pos_use_kin==True).
- validphys.dataplots.plot_replica_sum_rules(pdf, sum_rules, Q)[source]
Plot the value of each sum rule as a function of the replica index
- validphys.dataplots.plot_smpdf(pdf, dataset, obs_pdf_correlations, mark_threshold: float = 0.9)[source]
Plot the correlations between the change in the observable and the change in the PDF in (x,fl) space.
mark_threshold is the proportion of the maximum absolute correlation that will be used to mark the corresponding area in x in the background of the plot. The maximum absolute values are used for the comparison.
Examples
>>> from validphys.api import API
>>> data_input = {
>>>     "dataset_input": {"dataset": "HERACOMBNCEP920"},
>>>     "theoryid": 200,
>>>     "use_cuts": "internal",
>>>     "pdf": "NNPDF40_nnlo_as_01180",
>>>     "Q": 1.6,
>>>     "mark_threshold": 0.2
>>> }
>>> smpdf_gen = API.plot_smpdf(**data_input)
>>> fig = next(smpdf_gen)
>>> fig.show()
- validphys.dataplots.plot_training_length(replica_data, fit)[source]
Generate a histogram for the distribution of training lengths in a given fit. Each bin is normalised by the total number of replicas.
- validphys.dataplots.plot_training_validation(fit, replica_data, replica_filters=None)[source]
Scatter plot with the training and validation chi² for each replica in the fit. The mean is also displayed as well as a line y=x to easily identify whether training or validation chi² is larger.
- validphys.dataplots.plot_trainvaliddist(fit, replica_data)[source]
KDEs for the training and validation distributions for each replica in the fit.
- validphys.dataplots.plot_xq2(dataset_inputs_by_groups_xq2map, use_cuts, data_input, display_cuts: bool = True, marker_by: str = 'process type', highlight_label: str = 'highlight', highlight_datasets: (<class 'collections.abc.Sequence'>, <class 'NoneType'>) = None, aspect: str = 'landscape')[source]
Plot the (x,Q²) coverage of the data based on some LO approximations. These are governed by the relevant kintransform.
The representation of the filtered data depends on the display_cuts and use_cuts options:
If cuts are disabled (use_cuts is CutsPolicy.NOCUTS), all the data will be plotted (and setting display_cuts to True is an error).
If cuts are enabled (use_cuts is either CutsPolicy.FROMFIT or CutsPolicy.INTERNAL) and display_cuts is False, the masked points will be ignored.
If cuts are enabled and display_cuts is True, the filtered points will be displayed and marked.
The points are grouped according to the marker_by option. The possible values are: “process type”, “experiment”, “group” or “dataset”.
Some datasets can be made to appear highlighted in the figure: define a key called highlight_datasets containing the names of the datasets to be highlighted and a key highlight_label with a string containing the label of the highlight, which will appear in the legend.
Example
Obtain a plot with some reasonable defaults:
from validphys.api import API
inp = {
    'dataset_inputs': [
        {'dataset': 'NMCPD_dw'}, {'dataset': 'NMC'},
        {'dataset': 'SLACP_dwsh'}, {'dataset': 'SLACD_dw'},
        {'dataset': 'BCDMSP_dwsh'}, {'dataset': 'BCDMSD_dw'},
        {'dataset': 'CHORUSNUPb_dw'}, {'dataset': 'CHORUSNBPb_dw'},
        {'dataset': 'NTVNUDMNFe_dw', 'cfac': ['MAS']}, {'dataset': 'NTVNBDMNFe_dw', 'cfac': ['MAS']},
        {'dataset': 'HERACOMBNCEM'}, {'dataset': 'HERACOMBNCEP460'},
        {'dataset': 'HERACOMBNCEP575'}, {'dataset': 'HERACOMBNCEP820'},
        {'dataset': 'HERACOMBNCEP920'}, {'dataset': 'HERACOMBCCEM'},
        {'dataset': 'HERACOMBCCEP'}, {'dataset': 'HERACOMB_SIGMARED_C'},
        {'dataset': 'HERACOMB_SIGMARED_B'}, {'dataset': 'DYE886R_dw'},
        {'dataset': 'DYE886P', 'cfac': ['QCD']}, {'dataset': 'DYE605_dw', 'cfac': ['QCD']},
        {'dataset': 'CDFZRAP_NEW', 'cfac': ['QCD']}, {'dataset': 'D0ZRAP', 'cfac': ['QCD']},
        {'dataset': 'D0WMASY', 'cfac': ['QCD']}, {'dataset': 'ATLASWZRAP36PB', 'cfac': ['QCD']},
        {'dataset': 'ATLASZHIGHMASS49FB', 'cfac': ['QCD']}, {'dataset': 'ATLASLOMASSDY11EXT', 'cfac': ['QCD']},
        {'dataset': 'ATLASWZRAP11CC', 'cfac': ['QCD']}, {'dataset': 'ATLASWZRAP11CF', 'cfac': ['QCD']},
        {'dataset': 'ATLASDY2D8TEV', 'cfac': ['QCDEWK']}, {'dataset': 'ATLAS_WZ_TOT_13TEV', 'cfac': ['NRM', 'QCD']},
        {'dataset': 'ATLAS_WP_JET_8TEV_PT', 'cfac': ['QCD']}, {'dataset': 'ATLAS_WM_JET_8TEV_PT', 'cfac': ['QCD']},
        {'dataset': 'ATLASZPT8TEVMDIST', 'cfac': ['QCD'], 'sys': 10}, {'dataset': 'ATLASZPT8TEVYDIST', 'cfac': ['QCD'], 'sys': 10},
        {'dataset': 'ATLASTTBARTOT', 'cfac': ['QCD']}, {'dataset': 'ATLAS_TTB_DIFF_8TEV_LJ_TRAPNORM', 'cfac': ['QCD']},
        {'dataset': 'ATLAS_TTB_DIFF_8TEV_LJ_TTRAPNORM', 'cfac': ['QCD']}, {'dataset': 'ATLAS_TOPDIFF_DILEPT_8TEV_TTRAPNORM', 'cfac': ['QCD']},
        {'dataset': 'ATLAS_1JET_8TEV_R06_DEC', 'cfac': ['QCD']}, {'dataset': 'ATLAS_2JET_7TEV_R06', 'cfac': ['QCD']},
        {'dataset': 'ATLASPHT15', 'cfac': ['QCD', 'EWK']}, {'dataset': 'ATLAS_SINGLETOP_TCH_R_7TEV', 'cfac': ['QCD']},
        {'dataset': 'ATLAS_SINGLETOP_TCH_R_13TEV', 'cfac': ['QCD']}, {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_7TEV_T_RAP_NORM', 'cfac': ['QCD']},
        {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_7TEV_TBAR_RAP_NORM', 'cfac': ['QCD']}, {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_8TEV_T_RAP_NORM', 'cfac': ['QCD']},
        {'dataset': 'ATLAS_SINGLETOP_TCH_DIFF_8TEV_TBAR_RAP_NORM', 'cfac': ['QCD']}, {'dataset': 'CMSWEASY840PB', 'cfac': ['QCD']},
        {'dataset': 'CMSWMASY47FB', 'cfac': ['QCD']}, {'dataset': 'CMSDY2D11', 'cfac': ['QCD']},
        {'dataset': 'CMSWMU8TEV', 'cfac': ['QCD']}, {'dataset': 'CMSZDIFF12', 'cfac': ['QCD', 'NRM'], 'sys': 10},
        {'dataset': 'CMS_2JET_7TEV', 'cfac': ['QCD']}, {'dataset': 'CMS_2JET_3D_8TEV', 'cfac': ['QCD']},
        {'dataset': 'CMSTTBARTOT', 'cfac': ['QCD']}, {'dataset': 'CMSTOPDIFF8TEVTTRAPNORM', 'cfac': ['QCD']},
        {'dataset': 'CMSTTBARTOT5TEV', 'cfac': ['QCD']}, {'dataset': 'CMS_TTBAR_2D_DIFF_MTT_TRAP_NORM', 'cfac': ['QCD']},
        {'dataset': 'CMS_TTB_DIFF_13TEV_2016_2L_TRAP', 'cfac': ['QCD']}, {'dataset': 'CMS_TTB_DIFF_13TEV_2016_LJ_TRAP', 'cfac': ['QCD']},
        {'dataset': 'CMS_SINGLETOP_TCH_TOT_7TEV', 'cfac': ['QCD']}, {'dataset': 'CMS_SINGLETOP_TCH_R_8TEV', 'cfac': ['QCD']},
        {'dataset': 'CMS_SINGLETOP_TCH_R_13TEV', 'cfac': ['QCD']}, {'dataset': 'LHCBZ940PB', 'cfac': ['QCD']},
        {'dataset': 'LHCBZEE2FB', 'cfac': ['QCD']}, {'dataset': 'LHCBWZMU7TEV', 'cfac': ['NRM', 'QCD']},
        {'dataset': 'LHCBWZMU8TEV', 'cfac': ['NRM', 'QCD']}, {'dataset': 'LHCB_Z_13TEV_DIMUON', 'cfac': ['QCD']},
        {'dataset': 'LHCB_Z_13TEV_DIELECTRON', 'cfac': ['QCD']}
    ],
    'use_cuts': 'internal',
    'display_cuts': False,
    'theoryid': 162,
    'highlight_label': 'Old',
    'highlight_datasets': ['NMC', 'CHORUSNUPb_dw', 'CHORUSNBPb_dw']
}
API.plot_xq2(**inp)
validphys.deltachi2 module
deltachi2.py
Plots and data processing that can be used in a delta chi2 analysis
- class validphys.deltachi2.PDFEpsilonPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
PDFPlotter
Subclassing PDFPlotter in order to plot epsilon (a measure of gaussianity) for multiple PDFs, yielding a separate figure for each flavour
- validphys.deltachi2.check_pdf_is_symmhessian(pdf, **kwargs)[source]
Check that pdf has error type of symmhessian
- validphys.deltachi2.check_pdfs_are_montecarlo(pdfs, **kwargs)[source]
Checks that the action is applied only to a pdf consisting of MC replicas.
- validphys.deltachi2.delta_chi2_hessian(pdf, total_chi2_data)[source]
Return delta_chi2 (computed as in plot_delta_chi2_hessian) relative to each eigenvector of the Hessian set.
- validphys.deltachi2.plot_delta_chi2_hessian_distribution(delta_chi2_hessian, pdf, total_chi2_data)[source]
Plot of the chi2 difference between chi2 of each eigenvector of a symmHessian set and the central value for all experiments in a fit. As a function of every eigenvector in a first plot, and as a distribution in a second plot.
- validphys.deltachi2.plot_delta_chi2_hessian_eigenv(delta_chi2_hessian, pdf)[source]
Plot of the chi2 difference between chi2 of each eigenvector of a symmHessian set and the central value for all experiments in a fit. As a function of every eigenvector in a first plot, and as a distribution in a second plot.
- validphys.deltachi2.plot_epsilon(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, eps=None)[source]
Plot the discrepancy (epsilon) of the 1-sigma and 68% bands at each grid value for all pdfs for a given Q. See https://arxiv.org/abs/1505.06736 eq. (11)
xscale is read from pdf plotting_grid scale, which is ‘log’ by default.
eps defines the value at which plot a simple hline
- validphys.deltachi2.plot_kullback_leibler(delta_chi2_hessian)[source]
Determines the Kullback–Leibler divergence by comparing the expectation value of Delta chi2 to the cumulative distribution function of chi-square distribution with one degree of freedom (see: https://en.wikipedia.org/wiki/Chi-square_distribution).
The Kullback-Leibler divergence provides a measure of the difference between two distribution functions, here we compare the chi-squared distribution and the cumulative distribution of the expectation value of Delta chi2.
- validphys.deltachi2.plot_pos_neg_pdfs(pdf, pos_neg_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None)[source]
Plot the uncertainty of the original hessian pdfs, as well as that of the positive and negative subset.
validphys.eff_exponents module
Tools for computing and plotting effective exponents.
- class validphys.eff_exponents.ExponentBandPlotter(hlines, exponent, *args, **kwargs)[source]
Bases:
BandPDFPlotter
,PreprocessingPlotter
- draw(pdf, grid, flstate)[source]
Overload BandPDFPlotter.draw() to plot bands of the effective exponent calculated from the replicas and horizontal lines for the effective exponents of the previous/next fits, if possible. flstate is an element of the flavours for the first pdf specified in pdfs. If this flavour doesn't exist in the current pdf's fitbasis, or in the set of flavours for which the preprocessing exponents exist for the current pdf, no horizontal lines are plotted.
- class validphys.eff_exponents.PreprocessingPlotter(exponent, *args, **kwargs)[source]
Bases:
PDFPlotter
Class inheriting from BandPDFPlotter, changing title and ylabel to reflect the effective exponent being plotted.
- validphys.eff_exponents.alpha_eff(pdf: ~validphys.core.PDF, *, xmin: ~numbers.Real = 1e-06, xmax: ~numbers.Real = 0.001, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Return a list of xplotting_grids containing the value of the effective exponent alpha at the specified values of x and flavour. alpha is relevant at small x, hence the linear scale.
basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.
flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.
Q: The PDF scale in GeV.
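For illustration, the provider can be called through the validphys API; the PDF set name below is only a placeholder and the keywords mirror the signature above, so this is a hedged sketch rather than a prescribed workflow:
from validphys.api import API

# List of xplotting_grids with alpha_eff in the evolution basis, at the default x range
grids = API.alpha_eff(pdf="NNPDF40_nnlo_as_01180", basis="evolution", Q=1.65)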
- validphys.eff_exponents.beta_eff(pdf, *, xmin: ~numbers.Real = 0.6, xmax: ~numbers.Real = 0.9, npoints: int = 200, Q: ~numbers.Real = 1.65, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Return a list of xplotting_grids containing the value of the effective exponent beta at the specified values of x and flavour. beta is relevant at large x, hence the linear scale.
basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.
flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.
Q: The PDF scale in GeV.
- validphys.eff_exponents.effective_exponents_table_internal(next_effective_exponents_table, *, fit=None, basis)[source]
Returns a table which concatenates previous_effective_exponents_table and next_effective_exponents_table if both tables contain effective exponents in the same basis.
If the previous exponents are in a different basis, or no fit was given to read the previous exponents from, then only the next exponents table is returned, for plotting purposes.
- validphys.eff_exponents.fmt(a)
- validphys.eff_exponents.get_alpha_lines(effective_exponents_table_internal)[source]
Given an effective_exponents_table_internal returns the rows with bounds of the alpha effective exponent for all flavours, used to plot horizontal lines on the alpha effective exponent plots.
- validphys.eff_exponents.get_beta_lines(effective_exponents_table_internal)[source]
Same as get_alpha_lines but for beta
- validphys.eff_exponents.iterate_preprocessing_yaml(fit, next_fit_eff_exps_table, _flmap_np_clip_arg=None)[source]
Using next_effective_exponents_table(), update the preprocessing exponents of the input fit. This is part of the usual pipeline referred to as “iterating a fit”; for more information see: How to run an iterated fit. A fully iterated runcard can be obtained from the action iterated_runcard_yaml().
This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:
```yaml
{@iterate_preprocessing_yaml@}
```
Alternatively, using the API, the yaml dump returned by this function can be written to a file, e.g.:
>>> from validphys.api import API
>>> yaml_output = API.iterate_preprocessing_yaml(fit=<fit name>)
>>> with open("output.yml", "w+") as f:
...     f.write(yaml_output)
- Parameters:
fit (validphys.core.FitSpec) – Whose preprocessing range will be iterated, the output runcard will be the same as the one used to run this fit, except with new preprocessing range.
next_fit_eff_exps_table (pd.DataFrame) – Table outputted by next_fit_eff_exps_table() containing the next preprocessing ranges.
_flmap_np_clip_arg (dict) – Internal argument used by vp-nextfitruncard. Dictionary containing a mapping like {<flavour>: {<largex/smallx>: {a_min: <min value>, a_max: <max value>}}}. If a flavour is present in _flmap_np_clip_arg then the preprocessing ranges will be passed through np.clip with the arguments supplied in the mapping.
- validphys.eff_exponents.iterated_runcard_yaml(fit, update_runcard_description_yaml)[source]
Takes the runcard with preprocessing iterated and description updated, then:
- Updates the t0 pdf set to be fit
- Modifies the random seeds (to random unsigned long ints)
This should facilitate running a new fit with identical input settings as the specified fit, with the t0, seeds and preprocessing iterated. For more information see: How to run an iterated fit.
This action can be used in a report but should be wrapped in a code block to be formatted correctly, for example:
```yaml
{@iterated_runcard_yaml@}
```
Alternatively, using the API, the yaml dump returned by this function can be written to a file, e.g.:
>>> from validphys.api import API
>>> yaml_output = API.iterated_runcard_yaml(
...     fit=<fit name>,
...     _updated_description="My iterated fit"
... )
>>> with open("output.yml", "w+") as f:
...     f.write(yaml_output)
- validphys.eff_exponents.next_effective_exponents_table(pdf: ~validphys.core.PDF, *, fitq0fromfit: (<class 'numbers.Real'>, <class 'NoneType'>) = None, x1_alpha: ~numbers.Real = 1e-06, x2_alpha: ~numbers.Real = 0.001, x1_beta: ~numbers.Real = 0.65, x2_beta: ~numbers.Real = 0.95, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Given a PDF, calculate the next effective exponents
By default x1_alpha = 1e-6, x2_alpha = 1e-3, x1_beta = 0.65, and x2_beta = 0.95, but different values can be specified in the runcard. The values control where the bounds of alpha and beta are evaluated:
- alpha_min:
  - singlet/gluon: the 2x68% c.l. lower value evaluated at x=`x1_alpha`
  - others: min(2x68% c.l. lower value evaluated at x=`x1_alpha` and x=`x2_alpha`)
- alpha_max:
  - singlet/gluon: min(2 and the 2x68% c.l. upper value evaluated at x=`x1_alpha`)
  - others: min(2 and max(2x68% c.l. upper value evaluated at x=`x1_alpha` and x=`x2_alpha`))
- beta_min:
  max(0 and min(2x68% c.l. lower value evaluated at x=`x1_beta` and x=`x2_beta`))
- beta_max:
  max(2x68% c.l. upper value evaluated at x=`x1_beta` and x=`x2_beta`)
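To make the nesting of the min/max operations above explicit, here is a minimal standalone sketch (not the validphys implementation); lo_x1/up_x1 and lo_x2/up_x2 are illustrative names for the 2x68% c.l. lower/upper values of the effective exponent at the two sampling points:
def alpha_bounds_sketch(lo_x1, up_x1, lo_x2, up_x2, is_singlet_or_gluon):
    """alpha_min/alpha_max from the 2x68% c.l. values at x1_alpha and x2_alpha."""
    if is_singlet_or_gluon:
        alpha_min = lo_x1
        alpha_max = min(2.0, up_x1)
    else:
        alpha_min = min(lo_x1, lo_x2)
        alpha_max = min(2.0, max(up_x1, up_x2))
    return alpha_min, alpha_max


def beta_bounds_sketch(lo_x1, up_x1, lo_x2, up_x2):
    """beta_min/beta_max from the 2x68% c.l. values at x1_beta and x2_beta."""
    return max(0.0, min(lo_x1, lo_x2)), max(up_x1, up_x2)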
- validphys.eff_exponents.plot_alpha_eff(fits_pdf, alpha_eff_fits, fits_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]
Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.
normalize_to: Either the name of one of the alpha effective exponents, its corresponding index in the list (starting from one), or None to plot absolute values.
xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.
- validphys.eff_exponents.plot_alpha_eff_internal(pdfs, alpha_eff_pdfs, pdfs_alpha_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]
Plot the central value and the uncertainty of a list of effective exponents as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding alpha effective. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.
normalize_to: Either the name of one of the alpha effective exponents, its corresponding index in the list (starting from one), or None to plot absolute values.
- validphys.eff_exponents.plot_beta_eff(fits_pdf, beta_eff_fits, fits_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]
Same as plot_alpha_eff but for beta effective exponents
- validphys.eff_exponents.plot_beta_eff_internal(pdfs, beta_eff_pdfs, pdfs_beta_lines, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ybottom=None, ytop=None)[source]
Same as plot_alpha_eff_internal but for beta effective exponent
- validphys.eff_exponents.previous_effective_exponents(basis: str, fit: (<class 'validphys.core.FitSpec'>, <class 'NoneType'>) = None)[source]
If provided with a fit, check that the basis is the basis which was fitted; if so, return the previous effective exponents read from the fit runcard.
- validphys.eff_exponents.previous_effective_exponents_table(fit: FitSpec)[source]
Given a fit, reads the previous exponents from the fit runcard
- validphys.eff_exponents.update_runcard_description_yaml(iterate_preprocessing_yaml, _updated_description=None)[source]
Take the runcard with iterated preprocessing and update the description if _updated_description is provided. As with iterate_preprocessing_yaml(), the result can be used in a report but should be wrapped in a code block to be formatted correctly, for example:
```yaml
{@update_runcard_description_yaml@}
```
validphys.filters module
Filters for NNPDF fits
- class validphys.filters.AddedFilterRule(dataset: str | None = None, process_type: str | None = None, rule: str | None = None, reason: str | None = None, local_variables: Mapping[str, str | float] | None = None, PTO: str | None = None, FNS: str | None = None, IC: str | None = None)[source]
Bases:
FilterRule
Dataclass which carries an extra filter rule that is added to the default rules.
- exception validphys.filters.BadPerturbativeOrder[source]
Bases:
ValueError
Exception raised when the perturbative order string is not recognized.
- exception validphys.filters.FatalRuleError[source]
Bases:
Exception
Exception raised when a rule application failed at runtime.
- class validphys.filters.FilterDefaults(q2min: float | None = None, w2min: float | None = None, maxTau: float | None = None)[source]
Bases:
object
Dataclass carrying default values for filters (cuts) taking into account the values of q2min, w2min and maxTau.
- class validphys.filters.FilterRule(dataset: str | None = None, process_type: str | None = None, rule: str | None = None, reason: str | None = None, local_variables: Mapping[str, str | float] | None = None, PTO: str | None = None, FNS: str | None = None, IC: str | None = None)[source]
Bases:
object
Dataclass which carries the filter rule information.
- exception validphys.filters.MissingRuleAttribute[source]
Bases:
RuleProcessingError
,AttributeError
Exception raised when a rule is missing required attributes.
- class validphys.filters.PerturbativeOrder(string)[source]
Bases:
object
Class that conveniently handles perturbative order declarations for use within the Rule class filter.
- Parameters:
string (str) –
A string in the format of NNLO or equivalently N2LO. This can be followed by one of ! + - or none.
The syntax allows for rules to be executed only if the perturbative order is within a given range. The following enumerates all 4 cases as an example:
- NNLO+: only execute the following rule if the pto is 2 or greater
- NNLO-: only execute the following rule if the pto is strictly less than 2
- NNLO!: only execute the following rule if the pto is strictly not 2
- NNLO: only execute the following rule if the pto is exactly 2
Any unrecognized string will raise a BadPerturbativeOrder exception.
Example
>>> from validphys.filters import PerturbativeOrder
>>> pto = PerturbativeOrder("NNLO+")
>>> pto.numeric_pto
2
>>> 1 in pto
False
>>> 2 in pto
True
>>> 3 in pto
True
- class validphys.filters.Rule(initial_data: FilterRule, *, defaults: dict, theory_parameters: dict, loader=None)[source]
Bases:
object
Rule object to be used to generate cuts mask.
A rule object is created for each rule in ./cuts/filters.yaml
Old commondata relied on the order of the kinematical variables being the same as specified in the KIN_LABEL dictionary set in this module. The new commondata specification instead defines the names of the variables explicitly in the metadata. Therefore, when using a new-format commondata, the KIN_LABEL dictionary will not be used and the variables defined in the metadata will be used instead.
- Parameters:
initial_data (dict) –
A dictionary containing all the information regarding the rule. This contains the name of the dataset the rule applies to and/or the process type the rule applies to. Additionally, the rule itself is defined, alongside the reason the rule is used. Finally, the user can optionally define their own custom local variables.
By default these are defined in cuts/filters.yaml
defaults (dict) –
A dictionary containing default values to be used globally in all rules.
By default these are defined in cuts/defaults.yaml
theory_parameters – Dict containing pairs of (theory_parameter, value)
loader (validphys.loader.Loader, optional) – A loader instance used to retrieve the datasets.
- numpy_functions = {'fabs': <ufunc 'fabs'>, 'log': <ufunc 'log'>, 'sqrt': <ufunc 'sqrt'>}
- exception validphys.filters.RuleProcessingError[source]
Bases:
Exception
Exception raised when we couldn’t process a rule.
- validphys.filters.check_additional_errors(additional_errors)[source]
Lux additional errors pdf check
- validphys.filters.check_integrability(integdatasets)[source]
Verify integrability datasets are ready for the fit.
- validphys.filters.check_positivity(posdatasets)[source]
Verify positive datasets are ready for the fit.
- validphys.filters.check_unpolarized_bc(unpolarized_bc)[source]
Check that unpolarized PDF bound can be loaded normally.
- validphys.filters.default_filter_rules_input()[source]
Return a tuple of FilterRule objects. These are defined in filters.yaml in the validphys.cuts module.
- validphys.filters.default_filter_settings_input()[source]
Return a FilterDefaults dataclass with the default hardcoded filter settings. These are defined in defaults.yaml in the validphys.cuts module.
- validphys.filters.filter_closure_data_by_experiment(filter_path, experiments_data, fakepdf, fakenoise, filterseed, data_index, sep_mult)[source]
Applies _filter_closure_data() on each experiment in the closure test.
This function just performs a for loop over experiments; the reason we don't use reportengine.collect is that it can permute the order in which closure data is generated, which means that the pseudodata is not reproducible.
- validphys.filters.filter_real_data(filter_path, data)[source]
Filter real data, cutting any points which do not pass the filter rules.
- validphys.filters.get_cuts_for_dataset(commondata, rules) list [source]
Function to generate a list containing the index of all experimental points that passed kinematic cut rules stored in ./cuts/filters.yaml
- Parameters:
commondata (validphys.coredata.CommonData)
rules (List[Rule]) – A list of Rule objects specifying the filters.
- Returns:
mask – List containing the indices of all experimental points that passed the cuts
- Return type:
list
Example
>>> from validphys.filters import (get_cuts_for_dataset, Rule,
...     default_filter_settings, default_filter_rules_input)
>>> from validphys.loader import Loader
>>> l = Loader()
>>> cd = l.check_commondata("NMC")
>>> theory = l.check_theoryID(53)
>>> filter_defaults = default_filter_settings()
>>> params = theory.get_description()
>>> rule_list = [Rule(initial_data=i, defaults=filter_defaults, theory_parameters=params)
...     for i in default_filter_rules_input()]
>>> get_cuts_for_dataset(cd, rules=rule_list)
validphys.fitdata module
Utilities for loading data from fit folders
- class validphys.fitdata.DatasetComp(common, first_only, second_only)
Bases:
tuple
- common
Alias for field number 0
- first_only
Alias for field number 1
- second_only
Alias for field number 2
- class validphys.fitdata.FitInfo(nite, training, validation, chi2, is_positive, arclengths, integnumbers)
Bases:
tuple
- arclengths
Alias for field number 5
- chi2
Alias for field number 3
- integnumbers
Alias for field number 6
- is_positive
Alias for field number 4
- nite
Alias for field number 0
- training
Alias for field number 1
- validation
Alias for field number 2
- validphys.fitdata.check_lhapdf_info(results_dir, fitname)[source]
Check that an LHAPDF info metadata file is present in the fit results
- validphys.fitdata.check_nnfit_results_path(path)[source]
Returns True if the requested path is a valid results directory, i.e if it is a directory and has a ‘nnfit’ subdirectory
- validphys.fitdata.check_replica_files(replica_path, prefix)[source]
Verification of a replica results directory at replica_path for a fit named prefix. Returns True if the results directory is complete
- validphys.fitdata.datasets_properties_table(data_input)[source]
Return dataset properties for each dataset in
data_input
- validphys.fitdata.fit_code_version(fit)[source]
Returns a table with the code version from the replica_1/{fitname}.json files. Note that the version for tensorflow distinguishes between the mkl=on and off versions.
- validphys.fitdata.fit_datasets_properties_table(fitinputcontext)[source]
Returns table of dataset properties for each dataset used in a fit.
- validphys.fitdata.fit_summary(fit_name_with_covmat_label, replica_data, total_chi2_data, total_phi_data)[source]
Summary table of fit properties:
- Central chi-squared
- Average chi-squared
- Training and Validation error functions
- Training lengths
- Phi
Note: Chi-squared values from the replica_data are not used here (presumably they are fixed to being t0)
This uses a corrected form for the error on phi in comparison to the vp1 value. The error is propagated from the uncertainty on the average chi-squared only.
- validphys.fitdata.fit_theory_covmat_summary(fit, fitthcovmat)[source]
returns a table with a single column for the fit, with three rows indicating if the theory covariance matrix was used in the ‘sampling’ of the pseudodata, the ‘fitting’, and the ‘validphys statistical estimators’ in the current namespace for that fit.
Return a table with the same columns as replica_data indexed by the replica fit ID. For identical fits, the values across rows should be the same. If some replica ID is not present for a given fit (e.g. discarded by postfit), the corresponding entries in the table will be null.
- validphys.fitdata.fits_version_table(fits_fit_code_version)[source]
Produces a table of version information for multiple fits.
- validphys.fitdata.load_fitinfo(replica_path, prefix)[source]
Process the data in the .json file for a single replica into a FitInfo object. If the .json file does not exist, an old-format fit is assumed and old_load_fitinfo will be called instead.
- validphys.fitdata.match_datasets_by_name(fits, fits_datasets)[source]
Return a tuple with common, first_only and second_only. The elements of the tuple are mappings where the keys are dataset names and the values are the two datasets contained in each fit for common, and the corresponding dataset included only in the first fit and only in the second fit.
- validphys.fitdata.num_fitted_replicas(fit)[source]
Function to obtain the number of nnfit replicas. That is the number of replicas before postfit was run.
- validphys.fitdata.print_dataset_differences(fits, match_datasets_by_name, print_common: bool = True)[source]
Given exactly two fits, print the datasets that are included in one but not in the other. If print_common is True, also print the datasets that are common.
For the purposes of visual aid, everything is ordered by the dataset name; in terms of the convention for the commondata this means that everything is ordered by:
Experiment name
Process
Energy
- validphys.fitdata.print_different_cuts(fits, test_for_same_cuts)[source]
Print a summary of the datasets that are included in both fits but have different cuts.
- validphys.fitdata.print_systype_overlap(groups_commondata, group_dataset_inputs_by_metadata)[source]
Returns a set of systypes that overlap between groups. Discards the set of systypes which overlap but do not imply correlations
- validphys.fitdata.replica_data(fit, replica_paths)[source]
Load the necessary data from the .json file of each of the replicas. The corresponding PDF set must be installed in the LHAPDF path.
The included information is:
('nite', 'training', 'validation', 'chi2', 'pos_status', 'arclengths')
- validphys.fitdata.summarise_fits(collected_fit_summaries)[source]
Produces a table of basic comparisons between fits, includes all the fields used in fit_summary
- validphys.fitdata.summarise_theory_covmat_fits(fits_theory_covmat_summary)[source]
Collects the theory covmat summary for all fits and concatenates them into a single table
- validphys.fitdata.t0_chi2_info_table(pdf, dataset_inputs_abs_chi2_data, t0pdfset, use_t0)[source]
Provides a table with:
- t0pdfset name
- Central t0-chi-squared
- Average t0-chi-squared
- validphys.fitdata.test_for_same_cuts(fits, match_datasets_by_name)[source]
Given two fits, return a list of tuples (first, second) where first and second are DatasetSpecs that correspond to the same dataset but have different cuts, such that first is included in the first fit and second in the second.
validphys.fitveto module
fitveto.py
Module for the determination of passing fit replicas.
- Current active vetoes:
- Positivity: Replicas with FitInfo.is_positive == False
- ChiSquared: Replicas with ChiSquared > nsigma_discard_chi2*StandardDev + Average
- ArclengthX: Replicas with ArcLengthX > nsigma_discard_arclength*StandardDev + Average
- Integrability: Replicas with IntegrabilityNumbers < integ_threshold
- validphys.fitveto.determine_vetoes(fitinfos: list, nsigma_discard_chi2: float, nsigma_discard_arclength: float, integ_threshold: float)[source]
Assesses whether replica fitinfo passes standard NNPDF vetoes. Returns a dictionary of vetoes and their passing boolean masks. Included in the dictionary is a ‘Total’ veto.
- validphys.fitveto.distribution_veto(dist, prior_mask, nsigma_threshold)[source]
For a given distribution (a list of floats), returns a boolean mask specifying the passing elements. The result is a new mask of the elements that satisfy:
value <= mean + nsigma_threshold*standard_deviation
Only points passing the prior_mask are considered in the average or standard deviation.
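The rule above can be written compactly with NumPy; this is a minimal standalone sketch (not the validphys implementation), where dist and prior_mask are illustrative inputs and combining the result with the prior mask is a choice made for this example:
import numpy as np

def distribution_veto_sketch(dist, prior_mask, nsigma_threshold):
    # Mean and standard deviation are computed only over points passing the prior mask
    dist = np.asarray(dist, dtype=float)
    passing = dist[prior_mask]
    threshold = passing.mean() + nsigma_threshold * passing.std()
    # Keep points at or below the threshold, combined here with the prior mask
    return (dist <= threshold) & prior_mask

chi2_like = np.array([1.1, 0.9, 1.2, 5.0, 1.0])
prior = np.ones_like(chi2_like, dtype=bool)
# The outlier at 5.0 fails the veto for this threshold choice
print(distribution_veto_sketch(chi2_like, prior, nsigma_threshold=1))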
validphys.fkparser module
This module implements parsers for FKTable and CFactor files into useful datastructures, contained in the validphys.coredata module, which can be easily pickled and interfaced with common Python libraries.
Most users will be interested in using the high level interface load_fktable(). Given a validphys.core.FKTableSpec object, it returns an instance of validphys.coredata.FKTableData, an object with the required information to compute a convolution, with the CFactors applied.
from validphys.fkparser import load_fktable
from validphys.loader import Loader
l = Loader()
fk = l.check_fktable(setname="ATLASTTBARTOT", theoryID=53, cfac=('QCD',))
res = load_fktable(fk)
- exception validphys.fkparser.BadCFactorError[source]
Bases:
Exception
Exception raised when a CFactor cannot be parsed correctly
- exception validphys.fkparser.BadFKTableError[source]
Bases:
Exception
Exception raised when an FKTable cannot be parsed correctly
- class validphys.fkparser.GridInfo(setname: str, hadronic: bool, ndata: int, nx: int)[source]
Bases:
object
Class containing the basic properties of an FKTable grid.
- validphys.fkparser.load_fktable(spec)[source]
Load the data corresponding to a FKSpec object. The cfactors will be applied to the grid. If we have a new-type fktable, call load() directly; otherwise fall back to the old parser.
- validphys.fkparser.open_fkpath(path)[source]
Return a file-like object from the fktable path, regardless of whether it is compressed
- Parameters:
path (Path or str) – Path-like file containing a valid FKTable. It can be either inside a tarball or in plain text.
- Returns:
f – A file-like object for further processing.
- Return type:
file
- validphys.fkparser.parse_cfactor(f)[source]
Parse an open byte stream into a CFactorData. Raise a BadCFactorError if problems are encountered.
- Parameters:
f (file) – Binary file-like object
- Returns:
cfac – An object containing the data on the cfactor for each point.
- Return type:
- validphys.fkparser.parse_fktable(f)[source]
Parse an open byte stream into an FKTableData. Raise a BadFKTableError if problems are encountered.
- Parameters:
f (file) – Open file-like object. See open_fkpath() to obtain it.
- Returns:
fktable – An object containing the FKTable data and information.
- Return type:
Notes
This function operates at the level of a single file, and therefore it does not apply CFactors (see load_fktable() for that) or handle operations within COMPOUND ensembles.
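As an illustration, the low-level parser can be combined with open_fkpath(); the path below is a placeholder, and no CFactors are applied at this level (use load_fktable() for that). This is a hedged sketch, not a prescribed workflow:
from validphys.fkparser import open_fkpath, parse_fktable

fkpath = "path/to/FK_SOMESET.dat"  # placeholder: plain text or inside a tarball

f = open_fkpath(fkpath)
fktable = parse_fktable(f)  # a validphys.coredata.FKTableData instance
print(fktable.ndata)        # number of data points described by the table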
validphys.gridvalues module
gridvalues.py
Core functionality needed to obtain a set of values from LHAPDF. The tools for representing these grids are in pdfgrids.py (the validphys provider module), and the basis transformations are in pdfbases.py
- validphys.gridvalues.central_grid_values(pdf: PDF, flmat, xmat, qmat)[source]
Same as grid_values() but it returns only the central values. The return value is indexed as:
grid_values[replica][flavour][x][Q]
where the first dimension (corresponding to the central member of the PDF set) is always one.
- validphys.gridvalues.evaluate_luminosity(pdf_set: LHAPDFSet, n: int, s: float, mx: float, x1: float, x2: float, channel)[source]
Returns PDF luminosity at specified values of mx, x1, x2, sqrts**2 for a given channel.
pdf_set: The PDF set of interest.
s: The square of the center of mass energy, in GeV^2.
mx: The invariant mass bin, in GeV.
x1 and x2: The partonic x1 and x2.
channel: The channel tag name from LUMI_CHANNELS.
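A hedged usage sketch; the PDF set name, the kinematic values and the "gg" channel tag are illustrative (valid tags are those defined in LUMI_CHANNELS in this module):
from validphys.lhapdfset import LHAPDFSet
from validphys.gridvalues import evaluate_luminosity

pdf_set = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")  # example set name

sqrts = 13000.0                     # centre of mass energy in GeV
s = sqrts**2                        # GeV^2, as expected by the function
mx = 100.0                          # invariant mass bin in GeV
x1 = 0.01                           # illustrative partonic momentum fractions
x2 = mx**2 / (s * x1)

# Luminosity for member 0 in the gluon-gluon channel
lumi = evaluate_luminosity(pdf_set, 0, s, mx, x1, x2, "gg")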
- validphys.gridvalues.grid_values(pdf: PDF, flmat, xmat, qmat)[source]
Evaluate
x*f(x)
on a grid of points in flavour, x and Q.- Parameters:
pdf (PDF) – Any PDF set
flmat (iterable) – A list of PDG IDs corresponding the the LHAPDF flavours in the grid.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.
- Returns:
A 4-dimensional array with the PDF values at the input parameters for each replica. The return value is indexed as follows:
grid_values[replica][flavour][x][Q]
See also
validphys.pdfbases.Basis.grid_values(), which offers a higher-level interface allowing the use of flavour names and aliases.
Examples
Compute the maximum difference across replicas between the u and ubar PDFs (times x) for x=0.05 and both Q=10 and Q=100:
>>> from validphys.loader import Loader
>>> from validphys.gridvalues import grid_values
>>> import numpy as np
>>> gv = grid_values(Loader().check_pdf('NNPDF31_nnlo_as_0118'), [-1, 1], [0.5], [10, 100])
>>> # Take the difference across the flavour dimension, the max
>>> # across the replica dimension, and leave the Q dimension untouched.
>>> np.diff(gv, axis=1).max(axis=0).ravel()
array([0.07904731, 0.04989902], dtype=float32)
validphys.hessian2mc module
validphys.hessian2mc.py
This module contains the functions that can be used to convert Hessian sets like MSHT20 and CT18 to Monte Carlo sets. The functions implemented here follow equation (4.3) of the paper arXiv:2203.05506
- validphys.hessian2mc.write_hessian_to_mc_watt_thorne(pdf, mc_pdf_name, num_members, watt_thorne_rnd_seed=1)[source]
Writes the Monte Carlo representation of a PDF set that is in Hessian form using the Watt-Thorne (MSHT20) prescription described in Eq. 4.3 of arXiv:2203.05506.
- Parameters:
pdf (validphys.core.PDF) – The Hessian PDF set that is to be converted to Monte Carlo.
mc_pdf_name (str) – The name of the new Monte Carlo PDF set.
- validphys.hessian2mc.write_mc_watt_thorne_replicas(Rjk_std_normal, replicas_df, mc_pdf_path)[source]
Writes the Monte Carlo representation of a PDF set that is in Hessian form using the Watt-Thorne prescription described in Eq. 4.3 of arXiv:2203.05506.
- Parameters:
Rjk_std_normal (np.ndarray) – Array of shape (num_members, n_eig) containing random standard normal numbers.
replicas_df (pd.DataFrame) – DataFrame containing replicas of the hessian set at all scales.
mc_pdf_path (pathlib.Path) – Path to the new Monte Carlo PDF set.
validphys.hyper_algorithm module
This module contains functions dedicated to processing the json dictionaries
- validphys.hyper_algorithm.autofilter_dataframe(dataframe, keys, n_to_combine=1, n_to_kill=1, threshold=-1)[source]
Receives a dataframe and a list of keys. Creates combinations of n_to_combine keys and computes the reward. Finally removes from the dataframe the n_to_kill worst combinations.
Anything under threshold will be removed and will not count towards the n_to_kill (by default threshold = -50 so only things which are really bad will be removed)
- # Arguments:
dataframe: a pandas dataframe
keys: keys to combine
n_to_combine: how many keys do we want to combine
n_to_kill: how many combinations to kill
threshold: anything under this reward will be removed
- # Returns:
- dataframe_sliced: a slice of the dataframe with the weakest combinations
removed
- validphys.hyper_algorithm.bin_generator(df_values, max_n=10)[source]
Receives a dataframe with a list of unique values. If there are more than max_n of them and they are numeric, create max_n bins. If they are already discrete values or there are fewer than max_n options, output the same input.
- # Arguments:
df_values: dataframe with unique values
maximum: maximum number of allowed different values
- # Returns:
new_vals: list of tuples with (initial, end) value of the bin
- validphys.hyper_algorithm.compute_reward(mdict, biggest_ntotal)[source]
Given a combination dictionary computes the reward function:
If the fail rate for this combination is above the fail threshold, rewards is -100
- The formula below for the reward takes into account:
The rate of ok fits that have a loss below the loss_threshold
The rate of fits that failed
The std deviation
How far away is the median from the best loss
How far away are median and average
- validphys.hyper_algorithm.dataframe_removal(dataframe, hit_list)[source]
Removes all combinations defined in hit_list from the dataframe. The hit list is a list of dictionaries containing the ‘slice’ key, where ‘slice’ must be a slice of ‘dataframe’.
- # Arguments:
dataframe: a pandas dataframe
hit_list: the list of element to remove
- # Returns:
new_dataframe: the same dataframe with all elements from hit_list removed
- validphys.hyper_algorithm.get_combinations(key_info, ncomb)[source]
Given a dictionary mapping keys to iterables of possible values (key_info), return a list of the product of all possible mappings of a subset of ncomb keys to single values out of the corresponding possible values, for all such subsets.
For instance,
key_info = {
    'key1': [val1-1, val1-2, ...],
    'key2': [val2-1, val2-2, ...],
}
ncomb = 2
will return a list of dictionaries:
[
    {'key1': val1-1, 'key2': val2-1, ...},
    {'key1': val1-1, 'key2': val2-2, ...},
    {'key1': val1-2, 'key2': val2-1, ...},
    {'key1': val1-2, 'key2': val2-2, ...},
]
Get all combinations of ncomb elements for the keys and values given in the dictionary key_info:
- # Arguments:
key_info: dictionary with the possible values for each key
ncomb: elements to combine
- # Returns:
all_combinations: A list of dictionaries of parameters
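A minimal standalone sketch of this behaviour using itertools (not the validphys implementation):
from itertools import combinations, product

def get_combinations_sketch(key_info, ncomb):
    """For every subset of ncomb keys, yield one dict per choice of values."""
    all_combinations = []
    for keys in combinations(key_info, ncomb):
        for values in product(*(key_info[k] for k in keys)):
            all_combinations.append(dict(zip(keys, values)))
    return all_combinations

key_info = {"key1": ["val1-1", "val1-2"], "key2": ["val2-1", "val2-2"]}
print(get_combinations_sketch(key_info, 2))
# [{'key1': 'val1-1', 'key2': 'val2-1'}, {'key1': 'val1-1', 'key2': 'val2-2'}, ...]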
- validphys.hyper_algorithm.get_slice(dataframe, query_dict)[source]
Returns a slice of the dataframe where some keys match some values. keys_info must be a dictionary {key1: value1, key2: value2, ...}
- # Arguments:
dataframe: a pandas dataframe
query_dict: a dictionary of combination as given by get_combinations
- validphys.hyper_algorithm.parse_keys(dataframe, keys)[source]
Receives a dataframe and a set of keys. Looks into the dataframe to read the possible values of the keys.
Returns a dictionary { ‘key’ : [possible values] }.
If the values are not discrete then we need to bin them; let's do this for anything with too many numerical values.
- # Arguments:
dataframe: a pandas dataframe
keys: keys to combine
- # Returns:
key_info: a dictionary with the possible values for each key
- validphys.hyper_algorithm.process_slice(df_slice)[source]
Function to process a slice into a dictionary with useful stats. If the slice is None it means the combination does not apply.
- # Arguments:
df_slice: a slice of a pandas dataframe
- # Returns:
proc_dict: a dictionary of stats
- validphys.hyper_algorithm.study_combination(dataframe, query_dict)[source]
Given a dataframe and a dictionary of {key1 : value1, key2: value2} returns a dictionary with a number of stats for that combination
- # Arguments:
dataframe: a pandas dataframe
query_dict: a dictionary for a combination as given by get_combinations
- # Returns:
proc_dict: a dictionary of the “statistics” for this combination
validphys.hyperoptplot module
Module for the parsing and plotting of the results and output of previous hyperparameter scans
- class validphys.hyperoptplot.HyperoptTrial(trial_dict, base_params=None, minimum_losses=1, linked_trials=None)[source]
Bases:
object
Hyperopt trial class. Makes the dictionary-like output of hyperopt into an object that can be easily managed.
- Parameters:
trial_dict (dict) – one single result (a dictionary) from a tries.json file
base_params (dict) – Base parameters of the runcard which can be used to complete the hyperparameter dictionary when not all parameters were scanned
minimum_losses (int) – Minimum number of losses to be found in the trial for it to be considered successful
linked_trials (list) – List of trials coming from the same file as this trial
- property loss
Return the loss of the hyperopt dict
- property params
Parameters for the fit
- property reward
Return and cache the reward value
- property weighted_reward
Return the reward weighted to the mean value of the linked trials
- validphys.hyperoptplot.best_setup(hyperopt_dataframe, hyperscan_config, commandline_args)[source]
Generates a clean table with information on the hyperparameter settings of the best setup.
- validphys.hyperoptplot.evaluate_trial(trial_dict, validation_multiplier, fail_threshold, loss_target)[source]
Read a trial dictionary and compute the true loss and decide whether the run passes or not
- validphys.hyperoptplot.filter_by_string(filter_string)[source]
Receives a data_dict (a parsed trial) and a filter string, returns True if the trial passes the filter
filter string must have the format: key<operator>string where <operator> can be any of !=, =, >, <
- # Arguments:
filter_string: the expression to evaluate
- # Returns:
- filter_function: a function that takes a data_dict and
returns true if the condition in filter_string passes
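A standalone sketch of the documented filter-string format (not the validphys code); the parsing choices here, such as treating < and > as numerical comparisons, are assumptions made for illustration:
import re

def make_filter_sketch(filter_string):
    """Turn 'key<operator>value' (operator in !=, =, >, <) into a predicate on a data_dict."""
    key, op, value = re.fullmatch(r"(\w+)(!=|=|>|<)(.+)", filter_string).groups()

    def predicate(data_dict):
        lhs = data_dict[key]
        if op in ("<", ">"):  # numerical comparison
            lhs, rhs = float(lhs), float(value)
            return lhs < rhs if op == "<" else lhs > rhs
        return (str(lhs) == value) if op == "=" else (str(lhs) != value)

    return predicate

passes = make_filter_sketch("optimizer=Adam")
print(passes({"optimizer": "Adam"}))  # True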
- validphys.hyperoptplot.generate_dictionary(replica_path, loss_target, json_name='tries.json', starting_index=0, val_multiplier=0.5, fail_threshold=10.0)[source]
Reads a json file and returns a list of dictionaries
- # Arguments:
replica_path: folder in which the tries.json file can be found
- starting_index: if the trials are to be added to an already existing
set, make sure the id has the correct index!
val_multiplier: validation multiplier
fail_threshold: threshold for the loss to consider a configuration as a failure
- validphys.hyperoptplot.hyperopt_dataframe(commandline_args)[source]
Loads the data generated by running hyperopt and stored in json files into a dataframe, and then filters the data according to the selection criteria provided by the command line arguments. It then returns both the entire dataframe as well as a dataframe object with the hyperopt parameters of the best setup.
- validphys.hyperoptplot.hyperopt_table(hyperopt_dataframe)[source]
Generates a table containing complete information on all the tested setups that passed the filters set in the commandline arguments.
- validphys.hyperoptplot.order_axis(df, bestdf, key)[source]
Helper function for ordering the axis and making sure the best is always first
- validphys.hyperoptplot.parse_architecture(trial)[source]
This function parses the family of parameters which regards the architecture of the NN
number_of_layers activation_per_layer nodes_per_layer l1, l2, l3, l4… max_layers layer_type dropout initializer
- validphys.hyperoptplot.parse_optimizer(trial)[source]
This function parses the parameters that affect the optimization
optimizer learning_rate (if it exists)
- validphys.hyperoptplot.parse_statistics(trial)[source]
Parse the statistical information of the trial
validation loss testing loss status of the run
- validphys.hyperoptplot.parse_stopping(trial)[source]
This function parses the parameters that affect the stopping
epochs stopping_patience pos_initial pos_multiplier
- validphys.hyperoptplot.parse_trial(trial)[source]
Trials are very convoluted objects, very branched inside. The goal of this function is to separate said branching so we can create hierarchies.
- validphys.hyperoptplot.plot_activation_per_layer(hyperopt_dataframe)[source]
Generates a violin plot of the loss per activation function.
- validphys.hyperoptplot.plot_clipnorm(hyperopt_dataframe, optimizer_name)[source]
Generates a scatter plot of the loss as a function of the clipnorm for a given optimizer.
- validphys.hyperoptplot.plot_epochs(hyperopt_dataframe)[source]
Generates a scatter plot of the loss as a function of the number of epochs.
- validphys.hyperoptplot.plot_initializer(hyperopt_dataframe)[source]
Generates a violin plot of the loss per initializer.
- validphys.hyperoptplot.plot_iterations(hyperopt_dataframe)[source]
Generates a scatter plot of the loss as a function of the iteration index.
- validphys.hyperoptplot.plot_learning_rate(hyperopt_dataframe, optimizer_name)[source]
Generates a scatter plot of the loss as a function of the learning rate for a given optimizer.
- validphys.hyperoptplot.plot_number_of_layers(hyperopt_dataframe)[source]
Generates a violin plot of the loss as a function of the number of layers of the model.
validphys.kinematics module
Provides information on the kinematics involved in the data.
Uses the PLOTTING file specification.
- class validphys.kinematics.XQ2Map(experiment, commondata, fitted, masked, group)
Bases:
tuple
- commondata
Alias for field number 1
- experiment
Alias for field number 0
- fitted
Alias for field number 2
- group
Alias for field number 4
- masked
Alias for field number 3
- validphys.kinematics.all_commondata_grouping(all_commondata, metadata_group)[source]
Return a table with the grouping specified by metadata_group key for each dataset for all available commondata.
- validphys.kinematics.all_kinlimits_table(all_kinlimits, use_kinoverride: bool = True)[source]
Return a table with the kinematic limits for the datasets given as input in dataset_inputs. If the PLOTTING overrides are not used, the information on sqrt(k2) will be displayed.
- validphys.kinematics.describe_kinematics(commondata, titlelevel: int = 1)[source]
Output a markdown text describing the stored metadata for a given commondata.
titlelevel can be used to control the header level of the title.
- validphys.kinematics.kinematics_table(kinematics_table_notable)[source]
Same as kinematics_table_notable but writing the table to file
- validphys.kinematics.kinematics_table_notable(commondata, cuts, show_extra_labels: bool = False)[source]
Table containing the kinematics of a commondata object, indexed by their datapoint id. The kinematics will be transformed as per the PLOTTING file of the dataset or process type, and the column headers will be the labels of the variables defined in the metadata.
If show_extra_labels is True then extra labels defined in the PLOTTING files will be displayed. Otherwise only the original three kinematics will be shown.
- validphys.kinematics.kinlimits(commondata, cuts, use_cuts, use_kinoverride: bool = True)[source]
Return a mapping containing the number of fitted and used datapoints, as well as the label, minimum and maximum value for each of the three kinematics. If use_kinoverride is set to False, the PLOTTING files will be ignored and the kinematics will be interpreted based on the process type only. If use_cuts is ‘CutsPolicy.NOCUTS’, the information on the total number of points will be displayed, instead of the fitted ones.
validphys.lhaindex module
Created on Fri Jan 23 12:11:23 2015
@author: zah
- validphys.lhaindex.as_from_name(name)[source]
Annoying function needed because this is not in the info files. as(M_z) there is actually as(M_ref).
- validphys.lhaindex.expand_names(globstr)[source]
Return names of installed PDFs. If none is found, return names from index
- validphys.lhaindex.get_lha_datapath()[source]
Return an existing datapath from LHAPDF, starting from the end. If no path is found to exist, recover the old behaviour and return the last path.
The check for existence intends to solve problems where a previously filled LHAPATH or LHAPDF_DATA_PATH environment variable is pointing to a non-existent path or shared systems where LHAPDF might be compiled with hard-coded paths not available to all users.
validphys.lhapdf_compatibility module
Module for LHAPDF compatibility backends
If LHAPDF is installed, the module will transparently hand over everything to LHAPDF. If LHAPDF is not available, it will try to use a combination of the packages lhapdf-management and pdfflow, which cover all the features of LHAPDF used during the fit (and likely most of validphys).
- validphys.lhapdf_compatibility.make_pdf(pdf_name, member=None)[source]
Load a PDF. If member is given, load only that member; otherwise, load the entire set as a list.
If LHAPDF is available, it returns LHAPDF PDF instances; otherwise it returns an object which is _compatible_ with LHAPDF for the lhapdf functions used by the selected backend.
Parameters:
- pdf_name: str
name of the PDF to load
- member: int
index of the member of the PDF to load
Returns:
list(pdf_sets)
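A hedged usage sketch; the set name is only an example and must be available to the chosen backend:
from validphys.lhapdf_compatibility import make_pdf

# Load every member of the set as a list of PDF-like objects
members = make_pdf("NNPDF40_nnlo_as_01180")

# Load only member 0 (the central member)
central = make_pdf("NNPDF40_nnlo_as_01180", member=0)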
validphys.lhapdfset module
Module containing an LHAPDF class compatible with validphys using the official lhapdf python interface.
The .members and .central_member of the LHAPDFSet are LHAPDF objects (the typical output from mkPDFs) and can be used normally.
Examples
>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> len(pdf.members)
101
>>> pdf.central_member.alphasQ(91.19)
0.11800
>>> pdf.members[0].xfxQ2(0.5, 15625)
{-5: 6.983360500601136e-05,
-4: 0.0021818063617227604,
-3: 0.00172453472243952,
-2: 0.0010906577230485718,
-1: 0.0022049272225017286,
1: 0.020051104853608722,
2: 0.0954139944889494,
3: 0.004116641378803191,
4: 0.002180124185625795,
5: 6.922722705177504e-05,
21: 0.007604124516892057}
- class validphys.lhapdfset.LHAPDFSet(name, error_type)[source]
Bases:
object
Wrapper for the lhapdf python interface.
Once instantiated this class will load the PDF set from LHAPDF. If it is a T0 set only the CV will be loaded.
- property central_member
Returns a reference to member 0 of the PDF list
- property flavors
Returns the list of accepted flavors by the LHAPDF set
- grid_values(flavors: ndarray, xgrid: ndarray, qgrid: ndarray)[source]
Returns the PDF values for every member for the required flavours, points in x and points in q. The return shape is
(members, flavors, xgrid, qgrid)
- Return type:
ndarray of shape (members, flavors, xgrid, qgrid)
Examples
>>> import numpy as np
>>> from validphys.lhapdfset import LHAPDFSet
>>> pdf = LHAPDFSet("NNPDF40_nnlo_as_01180", "replicas")
>>> xgrid = np.random.rand(10)
>>> qgrid = np.random.rand(3)
>>> flavs = np.arange(-4,4)
>>> flavs[4] = 21
>>> results = pdf.grid_values(flavs, xgrid, qgrid)
- property is_t0
Check whether we are in t0 mode
- property members
Return the members of the set; the special error type t0 returns only member 0
- property n_members
Return the number of active members in the PDF set
validphys.lhio module
A module that reads and writes LHAPDF grids.
- validphys.lhio.big_matrix(gridlist)[source]
Return a properly indexed matrix of the differences between each member and the central value
- validphys.lhio.generate_replica0(pdf, kin_grids=None, extra_fields=None)[source]
- Generates a replica 0 as an average over an existing set of LHAPDF
replicas and outputs it to the PDF’s parent folder
- Parameters:
pdf (validphys.core.PDF) – An existing validphys PDF object from which the average replica will be (re-)computed
kin_grids – Grids in (x, Q) used to print replica0 upon. If None, the grids of the source replicas are used.
- validphys.lhio.hessian_from_lincomb(pdf, V, set_name=None, folder=None, extra_fields=None)[source]
Construct a new LHAPDF grid from a linear combination of members
- validphys.lhio.new_pdf_from_indexes(pdf, indexes, set_name=None, folder=None, extra_fields=None, installgrid=False, use_rep0grid=False)[source]
Create a new PDF set by selecting replicas from another one.
- Parameters:
pdf (validphys.core.PDF) – An existing validphys PDF object from which the indexes will be selected.
indexes (Iterable[int]) – An iterable with integers corresponding to files in the LHAPDF set. Note that replica 0 will be calculated for you as the mean of the selected replicas.
set_name (str) – The name of the new PDF set.
folder (str, bytes, os.PathLike) – The path where the LHAPDF set will be written. Must exist.
installgrid (bool, optional, default=``False``.) – Whether to copy the grid to the LHAPDF path.
use_rep0grid (bool, optional, default=``False``) – Whether to fill the original replica 0 grid when computing replica 0, instead of relying that all grids are the same and averaging the files directly. It is slower and will call LHAPDF to fill the grids, but works for sets where the replicas have different grids.
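A hedged usage sketch; the source set, replica indexes and output name are placeholders, and the source PDF must be available locally:
from validphys.loader import Loader
from validphys.lhio import new_pdf_from_indexes

pdf = Loader().check_pdf("NNPDF40_nnlo_as_01180")
# Build a reduced set from replicas 1, 5 and 7; replica 0 is recomputed as their mean
new_pdf_from_indexes(pdf, [1, 5, 7], set_name="NNPDF40_example_subset", folder=".")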
validphys.loader module
Resolve paths to useful objects, and query the existence of different resources within the specified paths.
- exception validphys.loader.CfactorNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.CompoundNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.CutsNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.DataNotFoundError[source]
Bases:
LoadFailedError
- exception validphys.loader.EkoNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.FKTableNotFound[source]
Bases:
LoadFailedError
- class validphys.loader.FallbackLoader(profile=None)[source]
Bases:
Loader
,RemoteLoader
A loader that first tries to find resources locally (calling Loader.check_*) and if it fails, it tries to download them (calling RemoteLoader.download_*).
- exception validphys.loader.FitNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.HyperscanNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.InconsistentMetaDataError[source]
Bases:
LoaderError
- exception validphys.loader.LoadFailedError[source]
Bases:
FileNotFoundError
,LoaderError
- class validphys.loader.Loader(profile=None)[source]
Bases:
LoaderBase
Load various resources from the NNPDF data path.
- property available_datasets
Provide all available datasets that were available before the new commondata was implemented and that have a translation. Returns old names
TODO: This should be substituted by a subset of implemented_dataset that returns only complete datasets.
- property available_ekos
Return a string token for each of the available theories
- property available_fits
- property available_hyperscans
- property available_pdfs
- property available_theories
Return a string token for each of the available theories
- check_commondata(setname, sysnum=None, use_fitcommondata=False, fit=None, variant=None)[source]
Prepare the commondata files to be loaded. A commondata is defined by its name (setname) and the variant (variant).
At the moment both old-format and new-format commondata can be utilized and loaded; however, old-format commondata are deprecated and will be removed in future releases.
The function parse_dataset_input in config.py translates all known old commondata into their new names (and variants), therefore this function should only receive requests for the new format.
Any action trying to request an old-format commondata from this function will log an error message. This error message will eventually become an actual error.
- check_dataset(name, *, rules=None, sysnum=None, theoryid, cfac=(), frac=1, cuts=CutsPolicy.INTERNAL, use_fitcommondata=False, fit=None, weight=1, variant=None)[source]
Loads a given dataset. If the dataset contains new-type fktables, use the pineappl loading function, otherwise fall back to legacy.
- check_eko(theoryID)[source]
Check that the eko (and the parent theory) both exist and return the path to it
- check_experiment(name: str, datasets: list[DataSetSpec]) DataGroupSpec [source]
Loader method for instantiating DataGroupSpec objects. The NNPDF::Experiment object can then be instantiated using the load method.
- Parameters:
name (str) – A string denoting the name of the resulting DataGroupSpec object.
dataset (List[DataSetSpec]) – A list of DataSetSpec objects pre-created by the user. Note, these too will be loaded by Loader.
- Return type:
Example
>>> from validphys.loader import Loader >>> l = Loader() >>> ds = l.check_dataset("NMC", theoryid=53, cuts="internal") >>> exp = l.check_experiment("My DataGroupSpec Name", [ds])
- check_fk_from_theory_metadata(theory_metadata, theoryID, cfac=None)[source]
Load a pineappl fktable in the new commondata format. Receives a theory metadata describing the fktables necessary for a given observable, the theory ID and the corresponding cfactors. The cfactors should correspond directly to the fktables; the “compound folder” is not supported for pineappl theories. As such, the name of the cfactor is expected to be
CF_{cfactor_name}_{fktable_name}
- check_vp_output_file(filename, extra_paths=('.',))[source]
Find a file in the vp-cache folder, or (with higher priority) in the extra_paths.
- property commondata_folder
- property implemented_datasets
Provide all implemented datasets that can be found in the datafiles folder regardless of whether they can be used for fits (i.e., whether they include a theory), are “fake” (integrability/positivity) or are missing some information.
- property theorydb_folder
Checks theory db file exists and returns path to it
- class validphys.loader.LoaderBase(profile=None)[source]
Bases:
object
Base class for the NNPDF loader. It can take as input a profile dictionary from which all data can be read. It is possible to override the datapath and resultpath when the class is instantiated.
- property hyperscan_resultpath
- exception validphys.loader.PDFNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.ProfileNotFound[source]
Bases:
LoadFailedError
- class validphys.loader.RemoteLoader(profile=None)[source]
Bases:
LoaderBase
- download_hyperscan(hyperscan_name)[source]
Download a hyperscan run from the remote server. The run is downloaded to the results folder.
- property downloadable_ekos
- property downloadable_fits
- property downloadable_hyperscans
- property downloadable_pdfs
- property downloadable_theories
- property eko_index
- property eko_urls
- property fit_index
- property fit_urls
- property hyperscan_index
- property hyperscan_url
- property lhapdf_pdfs
- property lhapdf_urls
- property nnpdf_pdfs
- property nnpdf_pdfs_index
- property nnpdf_pdfs_urls
- property remote_ekos
- property remote_fits
- property remote_hyperscans
- property remote_keywords
- property remote_nnpdf_pdfs
- property remote_theories
- property theory_index
- property theory_urls
- exception validphys.loader.RemoteLoaderError[source]
Bases:
LoaderError
- exception validphys.loader.SysNotFoundError[source]
Bases:
LoadFailedError
- exception validphys.loader.TheoryDataBaseNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.TheoryMetadataNotFound[source]
Bases:
LoadFailedError
- exception validphys.loader.TheoryNotFound[source]
Bases:
LoadFailedError
validphys.mc2hessian module
mc2hessian.py
This module contains the functionality to compute a reduced set using the mc2hessian algorithm (see section 2.1 of 1602.00005).
- validphys.mc2hessian.gridname(pdf, Neig, mc2hname: (<class 'str'>, <class 'NoneType'>) = None)[source]
If no custom mc2hname is specified, the name of the Hessian PDF is automatically generated.
- validphys.mc2hessian.mc2hessian(pdf, Q, Neig: int, mc2hessian_xgrid, output_path, gridname, installgrid: bool = False)[source]
Produces a Hessian PDF by transforming a Monte Carlo PDF set.
- Parameters:
pdf (validphys.core.PDF) – An existing validphys PDF object which will be converted into a Hessian PDF set
Q (float) – Energy scale at which the Monte Carlo PDF is sampled
Neig (int) – Number of basis eigenvectors in the Hessian PDF set
mc2hessian_xgrid (numpy.ndarray) – The points in x at which to sample the Monte Carlo PDF set
output_path – The validphys output path where the PDF will be written
gridname (str) – Name of the Hessian PDF set
installgrid (bool, optional, default=``False``) – Whether to copy the Hessian grid to the LHAPDF path
- validphys.mc2hessian.mc2hessian_xgrid(xmin: float = 1e-05, xminlin: float = 0.1, xmax: Real = 1, nplog: int = 50, nplin: int = 50)[source]
Provides the points in x to sample the PDF. logspace and linspace will be called with the respective parameters.
Generates a grid with nplog logarithmically spaced points between xmin and xminlin, followed by nplin linearly spaced points between xminlin and xmax.
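The grid construction described above amounts to concatenating a log-spaced and a linearly spaced segment; a minimal NumPy sketch follows (not the validphys implementation, which may handle the endpoints differently):
import numpy as np

def mc2hessian_xgrid_sketch(xmin=1e-5, xminlin=0.1, xmax=1.0, nplog=50, nplin=50):
    """nplog log-spaced points from xmin up to xminlin, then nplin linear points up to xmax."""
    log_part = np.logspace(np.log10(xmin), np.log10(xminlin), num=nplog, endpoint=False)
    lin_part = np.linspace(xminlin, xmax, num=nplin)
    return np.concatenate([log_part, lin_part])

xgrid = mc2hessian_xgrid_sketch()
print(xgrid.shape)  # (100,)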
validphys.mc_gen module
mc_gen.py
Tools to check the pseudo-data MC generation.
- validphys.mc_gen.art_data_comparison(art_rep_generation, nreplica: int)[source]
Plots, per datapoint, the distribution of replica values.
- validphys.mc_gen.art_data_distribution(art_rep_generation, title='Artificial Data Distribution', color='green')[source]
Plot of the distribution of pseudodata.
- validphys.mc_gen.art_data_mean_table(art_rep_generation, groups_data)[source]
Generate a table of the artificial data mean values.
- validphys.mc_gen.art_data_moments(art_rep_generation, color='green')[source]
Returns the moments of the distributions per data point, as a histogram.
- validphys.mc_gen.art_data_residuals(art_rep_generation, color='green')[source]
Plot the residuals distribution of pseudodata compared to experiment.
validphys.n3fit_data module
n3fit_data.py
Providers which prepare the data ready for
n3fit.performfit.performfit()
.
- validphys.n3fit_data.fittable_datasets_masked(data, tr_masks)[source]
Generate a list of validphys.n3fit_data_utils.FittableDataSet from a group of datasets and the corresponding training/validation masks.
- validphys.n3fit_data.fitting_data_dict(data, make_replica, dataset_inputs_loaded_cd_with_cuts, dataset_inputs_fitting_covmat, tr_masks, kfold_masks, fittable_datasets_masked, diagonal_basis=None)[source]
Provider which takes the information from validphys data.
- Returns:
all_dict_out – Contains all the information of the experiment/dataset for training, validation and experimental data, with the following keys:
- 'datasets': list of dictionaries for each of the datasets contained in data
- 'name': name of the data - typically the experiment/group name
- 'expdata_true': non-replica data
- 'covmat': full covmat
- 'invcovmat_true': inverse of the covmat (non-replica)
- 'trmask': mask for the training data
- 'invcovmat': inverse of the covmat for the training data
- 'ndata': number of datapoints for the training data
- 'expdata': experimental data (replica'd) for training
- 'vlmask': (same as above for validation)
- 'invcovmat_vl': (same as above for validation)
- 'ndata_vl': (same as above for validation)
- 'expdata_vl': (same as above for validation)
- 'positivity': bool - is this a positivity set?
- 'count_chi2': should this be counted towards the chi2
- Return type:
- validphys.n3fit_data.integdatasets_fitting_integ_dict(integdatasets=None)[source]
Loads the integrability datasets. Calls the same function as fitting_pos_dict(), except on each element of integdatasets, if integdatasets is not None.
- Parameters:
integdatasets (list[validphys.core.IntegrabilitySetSpec]) – list containing the settings for the integrability sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.
Examples
>>> from validphys.api import API
>>> integdatasets = [{"dataset": "INTEGXT3", "maxlambda": 1e2}]
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=integdatasets, theoryid=53)
>>> len(res), len(res[0])
(1, 9)
>>> res = API.integdatasets_fitting_integ_dict(integdatasets=None)
>>> print(res)
None
- validphys.n3fit_data.kfold_masks(kpartitions, data)[source]
Collect the masks (if any) due to kfolding for this data. These will be applied to the experimental data before starting the training of each fold.
- Parameters:
kpartitions (list[dict]) – list of partitions, each partition dictionary with key-value pair datasets and a list containing the names of all datasets in that partition. See n3fit/runcards/Basic_hyperopt.yml for an example runcard or the hyperopt documentation for an expanded discussion on k-fold partitions.
data (validphys.core.DataGroupSpec) – full list of data which is to be partitioned.
- Returns:
kfold_masks – A list containing a boolean array for each partition. Each array is a 1-D boolean array with length equal to the number of cut datapoints in data. If a dataset is included in a particular fold then the mask will be True for the elements corresponding to those datasets, such that data.load().get_cv()[kfold_masks[i]] will return the datapoints in the ith partition. See example below.
- Return type:
list[np.array]
Examples
>>> from validphys.api import API
>>> partitions=[
...     {"datasets": ["HERACOMBCCEM", "HERACOMBNCEP460", "NMC", "NTVNBDMNFe"]},
...     {"datasets": ["HERACOMBCCEP", "HERACOMBNCEP575", "NMCPD", "NTVNUDMNFe"]}
... ]
>>> ds_inputs = [{"dataset": ds} for part in partitions for ds in part["datasets"]]
>>> kfold_masks = API.kfold_masks(dataset_inputs=ds_inputs, kpartitions=partitions, theoryid=53, use_cuts="nocuts")
>>> len(kfold_masks)  # one element for each partition
2
>>> kfold_masks[0]  # mask which splits data into first partition
array([False, False, False, ..., True, True, True])
>>> data = API.data(dataset_inputs=ds_inputs, theoryid=53, use_cuts="nocuts")
>>> fold_data = data.load().get_cv()[kfold_masks[0]]
>>> len(fold_data)
604
>>> kfold_masks[0].sum()
604
- validphys.n3fit_data.posdatasets_fitting_pos_dict(posdatasets=None)[source]
Loads all positivity datasets. It is not allowed to be empty.
- Parameters:
posdatasets (list[validphys.core.PositivitySetSpec]) – list containing the settings for the positivity sets. Examples of these can be found in the runcards located in n3fit/runcards. They have a format similar to dataset_input.
- validphys.n3fit_data.pseudodata_table(groups_replicas_indexed_make_replica, replicas)[source]
Creates a pandas DataFrame containing the generated pseudodata. The index is validphys.results.experiments_index() and the columns are the replica numbers.
Notes
Whilst running n3fit, this action will only be called if fitting::savepseudodata is true (as per the default setting) and replicas are fitted one at a time. The table can be found in the replica folder, i.e. <fit dir>/nnfit/replica_*/
- validphys.n3fit_data.replica_luxseed(replica, luxseed)[source]
Generate the luxseed for a replica. Identical to replica_nnseed but used for a different purpose.
- validphys.n3fit_data.replica_mcseed(replica, mcseed, genrep)[source]
Generates the mcseed for a replica.
- validphys.n3fit_data.replica_nnseed_fitting_data_dict(replica, exps_fitting_data_dict, replica_nnseed)[source]
For a single replica return a tuple of the inputs to this function. Used with collect over replicas to avoid having to perform multiple collects.
See also
replicas_nnseed_fitting_data_dict
,over
- validphys.n3fit_data.replica_training_mask(exps_tr_masks, replica, experiments_index)[source]
Save the boolean mask used to split data into training and validation for a given replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.
- Parameters:
exps_tr_masks (list[list[np.array]]) – Result of tr_masks() collected over experiments, which creates the nested structure. The outer list has length len(group_dataset_inputs_by_experiment) and the inner-most list has an array for each dataset in that particular experiment - as defined by the metadata. The arrays should be 1-D boolean arrays which can be used as masks.
replica (int) – The index of the replica.
experiments_index (pd.MultiIndex) – Index returned by validphys.results.experiments_index().
Example
>>> from validphys.api import API
>>> ds_inp = [
...     {'dataset': 'NMC', 'frac': 0.75},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD'], 'frac': 0.75},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10, 'frac': 0.75}
... ]
>>> API.replica_training_mask(dataset_inputs=ds_inp, replica=1, trvlseed=123, theoryid=162, use_cuts="nocuts", mcseed=None, genrep=False)
                     replica 1
group dataset    id
NMC   NMC        0        True
                 1        True
                 2       False
                 3        True
                 4        True
...                        ...
CMS   CMSZDIFF12 45       True
                 46       True
                 47       True
                 48      False
                 49       True
[345 rows x 1 columns]
- validphys.n3fit_data.replica_training_mask_table(replica_training_mask)[source]
Same as replica_training_mask but with a table decorator.
- validphys.n3fit_data.replica_trvlseed(replica, trvlseed, same_trvl_per_replica=False)[source]
Generates the trvlseed for a replica.
- validphys.n3fit_data.tr_masks(data, replica_trvlseed, parallel_models=False, replica=1, replicas=(1,))[source]
Generate the boolean masks used to split data into training and validation points. Returns a list of 1-D boolean arrays, one for each dataset. Each array has length equal to N_data; the datapoints which will be included in the training set are True, such that tr_data = data[tr_mask].
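A minimal numpy illustration of how such a boolean mask splits a dataset's points (made-up numbers, not actual validphys data):
>>> import numpy as np
>>> data = np.arange(5.0)                 # stand-in for the cut data central values
>>> tr_mask = np.array([True, False, True, True, False])
>>> tr_data = data[tr_mask]               # training points
>>> vl_data = data[~tr_mask]              # validation points
>>> len(tr_data), len(vl_data)
(3, 2)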
- validphys.n3fit_data.training_mask(replicas_training_mask)[source]
Save the boolean mask used to split data into training and validation for each replica as a pandas DataFrame, indexed by validphys.results.experiments_index(). Can be used to reconstruct the training and validation data used in a fit.
- Parameters:
replicas_exps_tr_masks (list[list[list[np.array]]]) – Result of replica_tr_masks() collected over replicas.
Example
>>> from validphys.api import API
>>> from reportengine.namespaces import NSList
>>> # create namespace list for collects over replicas.
>>> reps = NSList(list(range(1, 4)), nskey="replica")
>>> ds_inp = [
...     {'dataset': 'NMC', 'frac': 0.75},
...     {'dataset': 'ATLASTTBARTOT', 'cfac':['QCD'], 'frac': 0.75},
...     {'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10, 'frac': 0.75}
... ]
>>> API.training_mask(dataset_inputs=ds_inp, replicas=reps, trvlseed=123, theoryid=162, use_cuts="nocuts", mcseed=None, genrep=False)
                     replica 1  replica 2  replica 3
group dataset    id
NMC   NMC        0        True      False      False
                 1        True       True       True
                 2       False       True       True
                 3        True       True      False
                 4        True       True       True
...                        ...        ...        ...
CMS   CMSZDIFF12 45       True       True       True
                 46       True      False       True
                 47       True       True       True
                 48      False       True       True
                 49       True       True       True
[345 rows x 3 columns]
- validphys.n3fit_data.training_mask_table(training_mask)[source]
Same as training_mask but with a table decorator.
validphys.n3fit_data_utils module
n3fit_data_utils.py
This module reads validphys.core.DataSetSpec objects and extracts the relevant information into validphys.n3fit_data_utils.FittableDataSet.
The validphys_group_extractor will loop over every dataset of a given group, loading their fktables (and applying any necessary cuts).
- class validphys.n3fit_data_utils.FittableDataSet(name: str, fktables_data: list, operation: str = 'NULL', frac: float = 1.0, training_mask: ndarray | None = None)[source]
Bases:
object
Representation of the DataSet information necessary to run a fit
- Parameters:
name (str) – name of the dataset
fktables_data (list(validphys.coredata.FKTableData)) – list of coredata fktable objects
operation (str) – operation to be applied to the fktables in the dataset, default “NULL”
frac (float) – fraction of the data to enter the training set
training_mask (np.ndarray, optional) – training mask to apply to the fktable
- property hadronic
Returns true if this is a hadronic collision dataset
- property ndata
Number of datapoints in the dataset
- training_mask: ndarray = None
- validphys.n3fit_data_utils.validphys_group_extractor(datasets, tr_masks)[source]
Receives a grouping spec from validphys (most likely an experiment) and loops over its content, extracting and parsing all information required for the fit.
- Parameters:
datasets (list(validphys.core.DataSetSpec)) – List of dataset specs in this group
tr_masks (list(np.array)) – List of training masks to be set for each dataset
- Returns:
loaded_obs
- Return type:
validphys.overfit_metric module
overfit_metric.py
This module contains the functions used to calculate the overfit metric and produce the corresponding tables and figures.
- validphys.overfit_metric.array_expected_overfitting(calculate_chi2s_per_replica, replica_data, number_of_resamples=1000, resampling_fraction=0.95)[source]
Calculates the expected difference in chi2 between:
1. the chi2 of a PDF replica calculated using the corresponding pseudodata replica used during the fit,
2. the chi2 of a PDF replica calculated using alternative i.i.d. random pseudodata replicas.
The expected difference, along with an error estimate, is obtained through a bootstrap consisting of number_of_resamples resamples per PDF replica, where each resampling contains a fraction resampling_fraction of all replicas.
- Parameters:
calculate_chi2s_per_replica (np.ndarray) – validation chi2 per pdf replica
replica_data (list(vp.fitdata.FitInfo))
number_of_resamples (int, optional) – number of resamples per pdf replica, by default 1000
resampling_fraction (float, optional) – fraction of replicas used in the bootstrap resampling, by default 0.95
- Returns:
(number_of_resamples*Npdfs,) sized array containing the mean delta chi2 values per resampled list.
- Return type:
np.ndarray
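A schematic numpy sketch of the bootstrap described above (hypothetical array names, not the actual implementation):
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> delta_chi2 = rng.normal(size=100)          # stand-in for the per-replica chi2 differences
>>> number_of_resamples, resampling_fraction = 1000, 0.95
>>> size = int(resampling_fraction * delta_chi2.size)
>>> resampled_means = np.array([
...     rng.choice(delta_chi2, size=size, replace=True).mean()
...     for _ in range(number_of_resamples)
... ])
>>> expected, error = resampled_means.mean(), resampled_means.std()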
- validphys.overfit_metric.calculate_chi2s_per_replica(pdf, fit_code_version, recreate_pdf_pseudodata_no_table, preds, dataset_inputs, groups_covmat_no_table)[source]
Calculates, for each PDF replica, the validation chi2 computed with respect to the pseudodata generated for all the other replicas in the fit.
- Parameters:
recreate_pdf_pseudodata_no_table (list[namedtuple]) – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.
preds (list[pd.core.frame.DataFrame]) – List of pandas dataframes, each containing the predictions of the pdf replicas for a dataset_input
dataset_inputs (list[DatasetInput])
groups_covmat_no_table (pd.core.frame.DataFrame)
- Returns:
(Npdfs, Npdfs) sized matrix containing the chi2 of a PDF replica calculated with respect to a given pseudodata replica. The diagonal values correspond to the cases where the PDF replica has been fitted to the corresponding pseudodata replica.
- Return type:
np.ndarray
- validphys.overfit_metric.fit_overfitting_summary(fit, array_expected_overfitting)[source]
Creates a table containing the overfitting information:
- mean chi2 difference
- bootstrap error
- sigmas away from 0
validphys.pdfbases module
pdfbases.py
This holds the concrete labels data relative to the PDF bases, as declaratively as possible.
- class validphys.pdfbases.Basis(labels, *, aliases=None, default_elements=None, element_representations=None)[source]
Bases:
ABC
A Basis maps a set of PDF flavours (typically as given by LHAPDF) to functions thereof. This abstract class provides functionalities to manage labels (used for plotting) and defaults, while the concrete implementation of the transformations is handled by the subclasses (by implementing the validphys.pdfbases.Basis.apply_grid_values() method). The high level validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() methods then provide convenient functionality to work with transformations.
- labels
A list of strings representing the labels of each possible transformation, in order.
- Type:
- aliases
A mapping from strings to labels appearing in labels, specifying equivalent ways to enter elements in the user interface.
- Type:
dict, optional
- default_elements
A list of the labels to be computed by default when no subset of elements is specified. If not given, it is assumed to be the same as labels.
- Type:
list, optional
- element_representations
A mapping from strings to labels indicating the preferred string representation of the provided elements (to be used in plotting). If this parameter is not given or the element is not in the mapping, the label itself is used. It may be convenient to set this when heavy use of LaTeX is desired.
- Type:
dict, optional
- abstract apply_grid_values(func, vmat, xmat, qmat)[source]
Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.
It should return an array indexed as
grid_values[N][flavour][x][Q]
- Parameters:
func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.
- central_grid_values(pdf, vmat, xmat, qmat)[source]
Same as Basis.grid_values() but returning information on the central member of the PDF set.
- elementlabel(element)[source]
Return the printable representation of a given element of this basis.
- grid_values(pdf, vmat, xmat, qmat)[source]
Like validphys.gridvalues.grid_values(), but taking and returning vmat in terms of the vectors in this base.
- Parameters:
pdf (PDF) – Any PDF set
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.
- Returns:
grid – A 4-dimension array with the PDF values at the input parameters for each replica. The return value is indexed as follows:
grid_values[replica][flavour][x][Q]
- Return type:
np.ndarray
Examples
Compute the median ratio over replicas between singlet and gluon for a fixed point in x and a range of values in Q:
>>> import numpy as np
>>> from validphys.loader import Loader
>>> from validphys.pdfbases import evolution
>>> gv = evolution.grid_values(Loader().check_pdf("NNPDF31_nnlo_as_0118"), ["singlet", "gluon"], [0.01], [2,20,200])
>>> np.median(gv[:,0,...]/gv[:,1,...], axis=0)
array([[0.56694959, 0.53782002, 0.60348812]])
- class validphys.pdfbases.LinearBasis(labels, from_flavour_mat, *args, **kwargs)[source]
Bases:
Basis
A basis that implements a linear transformation of flavours.
- from_flavour_mat
A matrix that rotates the flavour basis into this basis.
- Type:
np.ndarray
- apply_grid_values(func, vmat, xmat, qmat)[source]
Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.
It should return an array indexed as
grid_values[N][flavour][x][Q]
- Parameters:
func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.
- class validphys.pdfbases.ScalarFunctionTransformation(transform_func, *args, **kwargs)[source]
Bases:
Basis
A basis that transforms the flavour basis into a single element given by transform_func.
Optional keyword arguments are passed to the constructor of validphys.pdfbases.Basis.
- transform_func
A callable with the signature transform_func(func, xmat, qmat) that fills the grid in \(x\) and \(Q\) using func and returns a grid with a single basis element.
- Type:
callable
- apply_grid_values(func, vmat, xmat, qmat)[source]
Abstract method to implement basis transformations. It outsources the filling of the grid in the flavour basis to func and implements the transformation from the flavour basis to the basis. Methods like validphys.pdfbases.Basis.grid_values() and validphys.pdfbases.Basis.central_grid_values() are derived from this method by selecting the appropriate func.
It should return an array indexed as
grid_values[N][flavour][x][Q]
- Parameters:
func (callable) – A function that fills the grid defined by the rest of the input with elements in the flavour basis.
vmat (iterable) – A list of flavour aliases valid for the basis.
xmat (iterable) – A list of x values
qmat (iterable) – A list of values in Q, expressed in GeV.
- validphys.pdfbases.check_basis(basis, flavours)[source]
Check to verify a given basis and set of flavours. Returns a dictionary with the relevant instance of the basis class and flavour specification
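A hedged usage sketch (the basis name and flavour aliases below are assumed to be valid inputs, as in the grid_values example above):
>>> from validphys.pdfbases import check_basis
>>> spec = check_basis("evolution", ["singlet", "gluon"])
>>> spec  # dictionary with the resolved basis instance and flavour specification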
- validphys.pdfbases.fitbasis_to_NN31IC(flav_info, fitbasis)[source]
Return a rotation matrix R_{ij} which takes from one of the possible fitting bases (evolution, NN31IC, FLAVOUR) to the NN31IC basis (sigma, g, v, v3, v8, t3, t8, cp), corresponding to the one used in NNPDF31. Denoting the rotation matrix as R_{ij}, i is the flavour index and j is the evolution index. The evolution basis (NN31IC) is defined as:
cp = c + cbar = 2c
sigma = u + ubar + d + dbar + s + sbar + cp
v = u - ubar + d - dbar + s - sbar + c - cbar
v3 = u - ubar - d + dbar
v8 = u - ubar + d - dbar - 2*s + 2*sbar
t3 = u + ubar - d - dbar
t8 = u + ubar + d + dbar - 2*s - 2*sbar
If the input is already in the evolution basis it returns the identity.
- validphys.pdfbases.parse_flarr(flarr)[source]
Parse a free-form list (which may contain PDG indexes or values from PDF_ALIASES) into a list of PDG parton indexes.
- validphys.pdfbases.pdg_id_to_canonical_index(flindex)[source]
Given an LHAPDF id, return its index in the ALL_FLAVOURS list.
- validphys.pdfbases.scalar_function_transformation(label, *args, **kwargs)[source]
Convenience decorator factory to produce a validphys.pdfbases.ScalarFunctionTransformation basis from a function.
- Parameters:
label (str) – The single label of the element produced by the function transformation.
Notes
Optional keyword arguments are passed to the constructor of validphys.pdfbases.ScalarFunctionTransformation.
- Returns:
decorator – A decorator that can be applied to a suitable transformation function.
- Return type:
callable
validphys.pdfgrids module
High level providers for PDF and luminosity grids, formatted in such a way to facilitate plotting and analysis.
- class validphys.pdfgrids.KineticXPlottingGrid(Q: float, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>), xgrid: ~numpy.ndarray, grid_values: ~validphys.core.Stats, scale: str, derivative_degree: int = 0)[source]
Bases:
XPlottingGrid
Kinetic Energy version of the XPlottingGrid
- class validphys.pdfgrids.Lumi1dGrid(m, grid_values)
Bases:
tuple
- grid_values
Alias for field number 1
- m
Alias for field number 0
- class validphys.pdfgrids.Lumi2dGrid(y, m, grid_values)
Bases:
tuple
- grid_values
Alias for field number 2
- m
Alias for field number 1
- y
Alias for field number 0
- class validphys.pdfgrids.XPlottingGrid(Q: float, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>), flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>), xgrid: ~numpy.ndarray, grid_values: ~validphys.core.Stats, scale: str, derivative_degree: int = 0)[source]
Bases:
object
DataClass holding the value of the PDF at the specified values of x, Q and flavour. The grid_values attribute corresponds to a Stats instance in order to compute statistical estimators in a sensible manner.
- basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>)
- copy_grid(grid_values)[source]
Create a copy of the grid with potentially a different set of values
- derivative()[source]
Return the derivative of the grid with respect to dlogx. A call to this function will return a new XPlottingGrid instance with the derivative as grid values and with an increased derivative_degree.
- flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>)
- process_label(base_label)[source]
Process the base_label used for plotting. For instance, for derivatives it will add d/dlogx to the base_label.
- xgrid: ndarray
- validphys.pdfgrids.boundary_xplotting_grid(unpolarized_bc: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
A wrapper around xplotting_grid that computes the grid for the unpolarized boundary condition (unpolarized_bc) instead.
- validphys.pdfgrids.distance_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]
Return an object containing the value of the distance PDF at the specified values of x and flavour.
The parameter normalize_to identifies the reference PDF set with respect to which the distance is computed.
This method returns distance grids where the relative distance between both PDF sets is computed. At least one grid will be identically zero.
- validphys.pdfgrids.kinetic_xplotting_grid(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None)[source]
Returns an object containing the value of the kinetic energy of the PDF at the specified values of x and flavour for a given Q. Utilizes xplotting_grid.
The kinetic energy of the PDF is defined as:
\[k = \sqrt{1 + \left(\frac{d f}{d \log x}\right)^{2}}\]
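In terms of an already-computed logarithmic derivative, the formula amounts to the following (schematic numpy, with a made-up array standing in for df/dlogx):
>>> import numpy as np
>>> dlogx_f = np.array([0.1, -0.5, 2.0])   # stand-in for df/dlogx on some x grid
>>> kinetic = np.sqrt(1 + dlogx_f**2)
>>> kinetic
array([1.00498756, 1.11803399, 2.23606798])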
- validphys.pdfgrids.lumigrid1d(pdf: ~validphys.core.PDF, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'NoneType'>, <class 'numbers.Real'>) = None, nbins_m: int = 50, mxmin: ~numbers.Real = 10, mxmax: (<class 'NoneType'>, <class 'numbers.Real'>) = None, scale='log')[source]
Return the integrated luminosity in a grid of nbins_m points, as a function of invariant mass, for the given (proton-proton) collider energy sqrts (in GeV). A rapidity cut on the integration range (if specified) is taken into account.
By default, the grid is sampled logarithmically in mass. The limits are given by mxmin and mxmax, in GeV. By default mxmin is 10 GeV and mxmax is set based on sqrts.
The results are computed for all relevant PDF members and wrapped in a stats class, to compute statistics regardless of the error_type.
- validphys.pdfgrids.lumigrid2d(pdf: PDF, lumi_channel, sqrts: Real, y_lim: Real = 5, nbins_m: int = 100, nbins_y: int = 50)[source]
Return the differential luminosity in a grid of (nbins_m x nbins_y) points, for the allowed values of invariant mass and rapidity, for the given (proton-proton) collider energy sqrts (in GeV). y_lim specifies the maximum rapidity.
The grid is sampled linearly in rapidity and logarithmically in mass.
The results are computed for all relevant PDF members and wrapped in a stats class, to compute statistics regardless of the error_type.
- validphys.pdfgrids.pull_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]
Return an object containing the value of the pull between the two PDFs at the specified values of x and flavour. The parameter normalize_to identifies the reference PDF set with respect to which the pull is computed. This method returns pull grids where the relative pull between both PDF sets, defined as the distance in terms of the standard deviations of the reference PDF, is computed. At least one grid will be identically zero.
- validphys.pdfgrids.variance_distance_grids(pdfs, xplotting_grids, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None)[source]
Return an object containing the value of the variance distance PDF at the specified values of x and flavour.
The parameter normalize_to identifies the reference PDF set with respect to which the variance distance is computed.
This method returns distance grids where the relative distance between both PDF sets is computed. At least one grid will be identically zero.
- validphys.pdfgrids.xgrid(xmin: Real = 1e-05, xmax: Real = 1, scale: str = 'log', npoints: int = 200)[source]
Return a tuple (scale, array) where scale is the input scale (“linear” or “log”) and array is generated from the input parameters and distributed according to scale.
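A usage sketch based on the signature above (assuming the keyword names shown there):
>>> from validphys.pdfgrids import xgrid
>>> scale, x = xgrid(xmin=1e-5, xmax=1, scale="log", npoints=200)
>>> scale
'log'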
- validphys.pdfgrids.xplotting_grid(pdf: ~validphys.core.PDF, Q: (<class 'float'>, <class 'int'>), xgrid=None, basis: (<class 'str'>, <class 'validphys.pdfbases.Basis'>) = 'flavour', flavours: (<class 'list'>, <class 'tuple'>, <class 'NoneType'>) = None, derivative: int = 0)[source]
Return an object containing the value of the PDF at the specified values of x and flavour.
basis: Is one of the bases defined in pdfbases.py. This includes ‘flavour’ and ‘evolution’.
flavours: A set of elements from the basis. If None, the defaults for that basis will be selected.
Q: The PDF scale in GeV.
derivative (int): how many derivatives of the PDF should be taken (default=0)
validphys.pdfoutput module
pdfoutput.py
reportengine helpers to enable outputting PDFs.
This module provides one decorator, pdfset, that is used to mark a provider as generating a PDF set. The providers must take a set_name and an output_path argument. set_name will be required to be a unique string that does not correspond to any installed LHAPDF grid, and output_path will be modified to actually correspond to <output>/pdfsets.
Within reportengine, the return value of the providers marked with @pdfset will be discarded, and the relative path to the output folder will be used instead. This can be used to formulate links within the report.
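A schematic sketch of a provider marked with this decorator (the function body is illustrative only and not taken from the codebase):
>>> from validphys.pdfoutput import pdfset
>>> @pdfset
... def my_pdf_producer(set_name, output_path):
...     # write an LHAPDF grid called `set_name` under `output_path`,
...     # which reportengine has already redirected to <output>/pdfsets
...     ...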
validphys.pdfplots module
pdfplots.py
Plots of quantities that are mostly functions of the PDFs only.
- class validphys.pdfplots.AllFlavoursPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
PDFPlotter
Auxiliary class which groups multiple PDF flavours in one plot.
- class validphys.pdfplots.BandPDFPlotter(*args, pdfs_noband=None, show_mc_errors=True, legend_stat_labels=True, **kwargs)[source]
Bases:
PDFPlotter
- class validphys.pdfplots.BandPDFPlotterBC(*args, unpolarized_bcs, boundary_xplotting_grids, **kwargs)[source]
Bases:
BandPDFPlotter
- class validphys.pdfplots.DistancePDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
PDFPlotter
Auxiliary class which draws the distance plots.
- class validphys.pdfplots.FlavourState[source]
Bases:
SimpleNamespace
This is the namespace for the parts specific to each flavour.
- class validphys.pdfplots.FlavoursDistancePlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
DistancePDFPlotter
,AllFlavoursPlotter
- class validphys.pdfplots.FlavoursPlotter(*args, pdfs_noband=None, show_mc_errors=True, legend_stat_labels=True, **kwargs)[source]
Bases:
AllFlavoursPlotter
,BandPDFPlotter
- class validphys.pdfplots.FlavoursVarDistancePlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
- class validphys.pdfplots.MixBandPDFPlotter(*args, mixband_as_replicas, **kwargs)[source]
Bases:
BandPDFPlotter
Special wrapper class to plot, in the same figure, PDF bands and PDF replicas depending on the type of PDF. Practical use: plot together the PDF central values with the NNPDF bands
- class validphys.pdfplots.PDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
object
Stateful object that breaks plotting grids by flavour, as a function of x and for fixed Q.
This class has a lot of state, but it should all be defined at initialization time. Things that change e.g. per flavour should be passed explicitly as arguments.
- property Q
- abstract draw(pdf, grid, flstate)[source]
Plot the desired function of the grid and return the array to be used for autoscaling
- property firstgrid
- property normalize_pdf
- property xscale
- class validphys.pdfplots.PullPDFPlotter(pdfs_list, pull_grids_list, xscale, normalize_to, ymin, ymax)[source]
Bases:
object
Auxiliary class which groups multiple pulls in one plot.
pdfs_list is a list of dictionaries, each containing the two PDFs to be used for the pull. pull_grids_list is the list of the pull computed for the PDF pairs described by pdfs_list.
- property Q
- property xscale
- class validphys.pdfplots.ReplicaPDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
PDFPlotter
- class validphys.pdfplots.UncertaintyPDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
PDFPlotter
- class validphys.pdfplots.VarDistancePDFPlotter(pdfs, xplotting_grids, xscale, normalize_to, ymin, ymax)[source]
Bases:
DistancePDFPlotter
Auxiliary class which draws the variance distance plots
- validphys.pdfplots.plot_flavours(pdf, xplotting_grid, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]
Plot the absolute central value and the uncertainty of all the flavours of a pdf as a function of x for a given value of Q.
xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.
- validphys.pdfplots.plot_lumi1d(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, show_mc_errors: bool = True, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, pdfs_noband=None, scale='log', legend_stat_labels: bool = True)[source]
Plot PDF luminosities at a given center of mass energy. sqrts is the center of mass energy (GeV).
This action plots the luminosity (as computed by lumigrid1d) as a function of invariant mass for all PDFs for a single lumi channel.
normalize_to works as for plot_pdfs and allows plotting a ratio to the central value of some of the PDFs. ymin and ymax can be used to set exact bounds for the scale. y_cut can be used to specify a rapidity cut over the integration range. show_mc_errors controls whether the 1σ error bands are shown in addition to the 68% confidence intervals for Monte Carlo PDFs. A list pdfs_noband can be passed to suppress the error bands for certain PDFs and plot the central values only. legend_stat_labels controls whether to show detailed information on what kind of confidence interval is being plotted in the legend labels.
- validphys.pdfplots.plot_lumi1d_replicas(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, scale='log')[source]
This function is similar to plot_lumi1d, but instead of plotting the standard deviation and 68% c.i. it plots the luminosities for individual replicas.
Plot PDF replica luminosities at a given center of mass energy. sqrts is the center of mass energy (GeV).
This action plots the luminosity (as computed by lumigrid1d) as a function of invariant mass for all PDFs for a single lumi channel.
normalize_to works as for plot_pdfs and allows plotting a ratio to the central value of some of the PDFs. ymin and ymax can be used to set exact bounds for the scale. y_cut can be used to specify a rapidity cut over the integration range. show_mc_errors controls whether the 1σ error bands are shown in addition to the 68% confidence intervals for Monte Carlo PDFs.
- validphys.pdfplots.plot_lumi1d_uncertainties(pdfs, pdfs_lumis, lumi_channel, sqrts: ~numbers.Real, y_cut: (<class 'numbers.Real'>, <class 'NoneType'>) = None, normalize_to=None, ymin: (<class 'numbers.Real'>, <class 'NoneType'>) = None, ymax: (<class 'numbers.Real'>, <class 'NoneType'>) = None, scale='log')[source]
Plot PDF luminosity uncertainties at a given center of mass energy. sqrts is the center of mass energy (GeV).
If normalize_to is set, the values are normalized to the central value of the corresponding PDFs. y_cut can be used to specify a rapidity cut over the integration range.
- validphys.pdfplots.plot_lumi2d(pdf, lumi_channel, lumigrid2d, sqrts, display_negative: bool = True)[source]
Plot the absolute luminosity on a grid of invariant mass and rapidity for a given center of mass energy sqrts. The color scale is logarithmic. If display_negative is True, mark the negative values.
The luminosity is calculated for positive rapidity, and reflected for negative rapidity for display purposes.
- validphys.pdfplots.plot_lumi2d_uncertainty(pdf, lumi_channel, lumigrid2d, sqrts: Real)[source]
Plot the 2D luminosity uncertainty at a given center of mass energy. Porting code from https://github.com/scarrazza/lumi2d.
The luminosity is calculated for positive rapidity, and reflected for negative rapidity for display purposes.
- validphys.pdfplots.plot_pdf_pulls(pdfs_list, pull_grids_list, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]
Plot the PDF standard deviations as a function of x. If normalize_to is set, the ratio to that PDF’s central value is plotted. Otherwise it is the absolute values.
- validphys.pdfplots.plot_pdf_uncertainties(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]
Plot the PDF standard deviations as a function of x. If normalize_to is set, the ratio to that PDF’s central value is plotted. Otherwise it is the absolute values.
- validphys.pdfplots.plot_pdfdistances(pdfs, distance_grids, *, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>), ymin=None, ymax=None)[source]
Plots the distances between different PDF sets and a reference PDF set for all flavours. Distances are normalized such that a value of order 10 is unlikely to be explained by purely statistical fluctuations
- validphys.pdfplots.plot_pdfreplicas(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]
Plot the replicas of the specified PDFs. Otherwise it works the same as plot_pdfs.
xscale sets the scale of the plot. E.g. ‘linear’ or ‘log’. Default is deduced from the xplotting_grid, which in turn is ‘log’ by default.
normalize_to should be a pdf id or an index of the pdf (starting from one).
- validphys.pdfplots.plot_pdfreplicas_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None)[source]
Plot the kinetic energy of the replicas of the specified PDFs. Otherwise it works the same as plot_pdfs_kinetic_energy.
- validphys.pdfplots.plot_pdfs(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]
Plot the central value and the uncertainty of a list of pdfs as a function of x for a given value of Q. If normalize_to is given, plot the ratios to the corresponding PDF. Otherwise, plot absolute values. See the help for xplotting_grid for information on how to set basis, flavours and x ranges. Yields one figure per PDF flavour.
normalize_to: Either the name of one of the PDFs or its corresponding index in the list, starting from one, or None to plot absolute values.
xscale: One of the matplotlib allowed scales. If undefined, it will be set based on the scale in xgrid, which should be used instead.
pdfs_noband: A list of PDFs to plot without error bands, i.e. only the central values of these PDFs will be plotted. The list can be formed of strings, corresponding to PDF IDs, integers (starting from one), corresponding to the index of the PDF in the list of PDFs, or a mixture of both.
show_mc_errors (bool): Plot 1σ bands in addition to 68% errors for Monte Carlo PDF.
legend_stat_labels (bool): Show detailed information on what kind of confidence interval is being plotted in the legend labels.
- validphys.pdfplots.plot_pdfs_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]
Band plotting of the “kinetic energy” of the PDF as a function of x for a given value of Q. The inputs of this function are similar to those of plot_pdfs.
- validphys.pdfplots.plot_pdfs_mixed(pdfs, xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True, mixband_as_replicas: (<class 'list'>, <class 'NoneType'>) = None)[source]
This function is similar to plot_pdfs, except instead of only plotting the central value and the uncertainty of the PDFs, those PDFs indicated by mixband_as_replicas will be plotted as replicas without the central value.
Inputs are the same as plot_pdfs, with the exception of mixband_as_replicas, which only exists here.
mixband_as_replicas: A list of PDFs to plot as replicas, i.e. the central values and replicas of these PDFs will be plotted. The list can be formed of strings, corresponding to PDF IDs, integers (starting from one), corresponding to the index of the PDF in the list of PDFs, or a mixture of both.
- validphys.pdfplots.plot_pdfs_mixed_kinetic_energy(pdfs, kinetic_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True, mixband_as_replicas: (<class 'list'>, <class 'NoneType'>) = None)[source]
Mixed band and replica plotting of the “kinetic energy” of the PDF as a function of x for a given value of Q.
- validphys.pdfplots.plot_pdfvardistances(pdfs, variance_distance_grids, *, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>), ymin=None, ymax=None)[source]
Plots the distances between different PDF sets and a reference PDF set for all flavours. Distances are normalized such that a value of order 10 is unlikely to be explained by purely statistical fluctuations
- validphys.pdfplots.plot_polarized_boundaries(pdfs, xplotting_grids, unpolarized_bcs, boundary_xplotting_grids, xscale: (<class 'str'>, <class 'NoneType'>) = None, normalize_to: (<class 'int'>, <class 'str'>, <class 'NoneType'>) = None, ymin=None, ymax=None, pdfs_noband: (<class 'list'>, <class 'NoneType'>) = None, show_mc_errors: bool = True, legend_stat_labels: bool = True)[source]
Possesses the same functionality as plot_pdfs but for a list of polarized PDF sets. In addition, it plots the unpolarized PDF set used as a boundary condition.
validphys.pineparser module
Loader for the pineappl-based FKTables
The FKTables for pineappl have the pineappl.lz4 extension and can be utilized directly with the pineappl CLI as well as read with pineappl.fk_table.
- exception validphys.pineparser.GridFileNotFound[source]
Bases:
FileNotFoundError
PineAPPL file for FK table not found.
- validphys.pineparser.get_yaml_information(yaml_file, theorypath)[source]
Reads the yaml information from a yaml compound file
Transitional function: the call to “pineko” might be to some other commondata reader that will know how to extract the information from the commondata
- validphys.pineparser.pineappl_reader(fkspec)[source]
Receives a fkspec, which contains the path to the fktables that are to be read by pineappl as well as metadata that fixes things like conversion factors or apfelcomb flag. The fkspec contains also the cfactors which are applied _directly_ to each of the fktables.
The output of this function is an instance of FKTableData which can be generated from reading several FKTable files which get concatenated on the ndata (bin) axis.
- For more information on the reading of pineappl tables:
https://pineappl.readthedocs.io/en/latest/modules/pineappl/pineappl.html#pineappl.pineappl.PyFkTable
- About the reader:
- Each pineappl table is a 4-dimensional grid with:
(ndata, active channels, x1, x2)
for DIS grids x2 will contain one single number. The luminosity channels are given in a (flav1, flav2) format and thus need to be converted to the 1-D index of a (14x14) luminosity tensor in order to be put in the form of a dataframe (a minimal index sketch is given after these notes).
All grids in pineappl are constructed with the exact same xgrid, the active channels can vary and so when grids are concatenated for an observable the gaps are filled with 0s.
The pineappl grids are such that obs = sum_{bins} fk * f (*f) * bin_w so in order to use them together with old-style grids (obs = sum_{bins} fk * xf (*xf)) it is necessary to remove the factor of x and the normalization of the bins.
- About apfelcomb flags in yamldb files:
Old commondata files and old grids have, over time, been through various iterations while remaining compatible with each other, and fixes and hacks have been incorporated in one or the other. For the new theory to be compatible with old commondata it is necessary to keep track of said hacks (and to apply conversion factors when required).
NOTE: both conversion factors and apfelcomb flags will be eventually removed.
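For illustration only, a minimal sketch of flattening a (flav1, flav2) channel into the 1-D index of a 14x14 luminosity tensor (the actual mapping of PDG ids to basis positions is handled elsewhere in the code):
>>> n_flavours = 14
>>> def lumi_index(i1, i2):
...     # flatten a (flav1, flav2) pair of basis positions into a single index
...     return n_flavours * i1 + i2
>>> lumi_index(0, 0), lumi_index(13, 13)
(0, 195)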
- Returns:
an FKTableData object containing all necessary information to compute predictions
- Return type:
- validphys.pineparser.pineko_yaml(yaml_file, grids_folder)[source]
Given a yaml_file, returns the corresponding dictionary and grids.
The dictionary contains all information and we return an extra field with all the grids to be loaded for the given dataset.
- Parameters:
yaml_file (pathlib.Path) – path of the yaml file for the given dataset
grids_folder (pathlib.Path) – path of the grids folder
check_grid_existence (bool) – if True (default) checks whether the grid exists
- Returns:
yaml_content (dict) – Metadata prepared for the FKTables
paths (list(list(path))) – List (of lists) with all the grids that will need to be loaded
validphys.plotutils module
Basic utilities for plotting functions.
- validphys.plotutils.HandlerSpec
alias of
HandelrSpec
- validphys.plotutils.add_subplot(figsize=None, projection=None, **kwargs)[source]
matplotlib.figure wrapper used to generate a figure and add a subplot.
Use matplotlib.figure.Figure() objects to avoid importing pyplot anywhere. The reason is that pyplot maintains a global state that makes it misbehave in multithreaded applications such as when executed under dask parallel mode.
- Parameters:
figsize (2-tuple of floats) – default is None
projection – The projection type of the subplot (Axes); default is None
- Returns:
fig, ax = (matplotlib.figure.Figure, fig.add_subplot)
- Return type:
- validphys.plotutils.ax_or_gca(f)[source]
A decorator. When applied to a function, the keyword argument ax will automatically be filled with the current axis, if it was None.
- validphys.plotutils.ax_or_newfig(f)[source]
A decorator. When applied to a function, the keyword argument ax will automatically be filled with a new axis on an empty figure, if it was None.
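A schematic sketch of how such a decorator is typically applied (the plotting function below is illustrative only):
>>> from validphys.plotutils import ax_or_gca
>>> @ax_or_gca
... def draw_line(values, *, ax=None):
...     # when the caller passes ax=None, the decorator fills it with the current axis
...     ax.plot(values)
...     return ax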
- validphys.plotutils.barplot(values, collabels, datalabels, orientation='auto')[source]
The barplot as matplotlib should have it. It resizes on overflow.
values should be one or two dimensional and should contain the values for the barplot. collabels must have as many elements as values has columns (or total elements if it is one dimensional), and contains the labels for each column in the bar plot. datalabels should have as many elements as values has rows, and contains the labels for the individual items to be compared. If orientation is "auto", the barplot will be horizontal or vertical depending on the number of items. Otherwise, the orientation can be fixed as "horizontal" or "vertical".
- Parameters:
values (array of dimensions M×N or N.) – The input data.
collabels (Iterable[str] of dimensions N) – The labels for each of the bars.
datalabels (Iterable[str] of dimensions M or 1) – The label for each of the datasets to be compared.
orientation ({'auto', 'horizontal', 'vertical'}, optional) – The orientation of the bars.
- Returns:
(fig, ax) – a tuple of a matplotlib figure and an axis, like matplotlib.pyplot.subplots. The axis will have a _bar_orientation attribute that will be either ‘horizontal’ or ‘vertical’ and will correspond to the actual orientation of the plot.
- Return type:
Examples
>>> import numpy as np
>>> from validphys.plotutils import barplot
>>> vals = np.random.rand(2,5)
>>> collabels = ["A", "B", "C", "D", "e"]
>>> fig, ax = barplot(vals, collabels, ['First try', 'Second try'])
>>> ax.legend()
- validphys.plotutils.centered_range(n, value=0, distance=1)[source]
Generate a range of n points centered around value, uniformly sampled at intervals of distance.
- validphys.plotutils.color_iter()[source]
Yield the colors in the cycle defined in the matplotlib style. When the colors are exhausted a warning will be logged and the cycle will be repeated infinitely. Therefore this avoids the overflow error at runtime when using matplotlib’s f'C{i}' color specification (equivalent to colors[i]) when i > len(colors).
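A usage sketch drawing two curves with consecutive colors from the cycle:
>>> from matplotlib.figure import Figure
>>> from validphys.plotutils import color_iter
>>> fig = Figure()
>>> ax = fig.subplots()
>>> colors = color_iter()
>>> for values in ([1, 2, 3], [3, 2, 1]):
...     ax.plot(values, color=next(colors))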
- validphys.plotutils.expand_margin(a, b, proportion)[source]
Return a pair of numbers that have the same mean as (a, b) and whose distance is proportion times bigger.
- validphys.plotutils.frame_center(ax, x, values)[source]
Set the ylims of the axis ax to appropriately display values, which can be 1D or 2D and are assumed to be sampled uniformly in the coordinates of the plot (in the second dimension, for 2D arrays).
- validphys.plotutils.hatch_iter()[source]
An infinite iterator that yields increasingly denser patterns of hatches suitable for passing as the hatch argument of matplotlib functions.
- validphys.plotutils.kde_plot(a, height=0.05, ax=None, label=None, color=None, max_marks=100000)[source]
Plot a Kernel Density Estimate of a 1D array, together with the individual occurrences.
This plot provides a quick visualization of the distribution of one dimensional data in a more complete way than a histogram would. It produces both a Kernel Density Estimate (KDE) and individual occurrences of the data (rug plot). The KDE uses a Gaussian kernel with the Silverman rule to select the bandwidth (this is the optimal choice if the input data is Gaussian). The individual occurrences are displayed as marks along the bottom axis. For performance reasons, and to avoid cluttering the plot, a maximum of max_marks marks are displayed; if the length of the data is bigger, a random sample of max_marks is taken.
- Parameters:
a (vector) – 1D array of observations.
height (scalar, optional) – Height of marks in the rug plot as proportion of the axis height.
ax (matplotlib axes, optional) – Axes to draw plot into; otherwise grabs current axes.
label (string, optional) – The label for the legend (note that you have to generate the legend yourself).
color (optional) – A matplotlib color specification, used for both the KDE and the rugplot. If not given, the next in the underlying axis cycle will be consumed and used.
max_marks (integer, optional) – The maximum number of points that will be displayed individually.
- Returns:
ax – The Axes object with the plot on it, allowing further customization.
- Return type:
matplotlib axes
Example
>>> import numpy as np
>>> dist = np.random.normal(size=100)
>>> ax = kde_plot(dist)
- validphys.plotutils.marker_iter_plot()[source]
Because of matplotlib's strange interface, markers work differently in plot and scatter. This is the same as marker_iter_scatter, but returns kwargs to be passed to plt.plot().
- validphys.plotutils.marker_iter_scatter()[source]
Yield the possible matplotlib.markers.MarkerStyle instances with different fillstyles and markers. This can be passed to plt.scatter. For plt.plot, use marker_iter_plot.
- validphys.plotutils.offset_xcentered(n, ax, *, offset_prop=0.05)[source]
Yield n matplotlib transforms in such a way that the corresponding n transformed x values are centered around the middle. The offset between two consecutive points is offset_prop in units of the figure dpi scale.
- validphys.plotutils.plot_horizontal_errorbars(cvs, errors, categorylabels, datalabels=None, xlim=None)[source]
A plot with a list of horizontal errorbars oriented vertically. cvs and errors are the central values and errors, both of shape ndatasets x ncategories; categorylabels are the labels of each element for which errorbars are drawn, and datalabels are the labels of the different datasets that are compared.
- validphys.plotutils.scalar_log_formatter()[source]
Return a matplotlib formatter to display powers of 10 in a log rather than exponential notation.
- Returns:
formatter – an object that can be passed to the set_major_formatter matplotlib functions.
- Return type:
ticker.FuncFormatter
Examples
>>> from matplotlib.figure import Figure
>>> fig = Figure()
>>> ax = fig.subplots()
>>> ax.plot([0.01, 0.1, 1, 10, 100])
>>> ax.set_yscale("log")
>>> ax.yaxis.set_major_formatter(scalar_log_formatter())
- validphys.plotutils.spiderplot(xticks, vals, label, ax)[source]
Makes a spider/radar plot.
xticks: list of names of x tick labels, e.g. datasets
vals: list of values to plot corresponding to each xtick
label: label for values, e.g. fit name
ax: a PolarAxes instance
- validphys.plotutils.subplots(figsize=None, nrows=1, ncols=1, sharex=False, sharey=False, **kwargs)[source]
matplotlib.figure wrapper used to generate a figure and add subplots.
Use matplotlib.figure.Figure() objects to avoid importing pyplot anywhere. The reason is that pyplot maintains a global state that makes it misbehave in multithreaded applications such as when executed under dask parallel mode.
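A usage sketch, assuming the return value mirrors matplotlib.pyplot.subplots (a figure and the created axes):
>>> from validphys.plotutils import subplots
>>> fig, axs = subplots(nrows=1, ncols=2, figsize=(8, 4), sharey=True)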
validphys.process_options module
Module to hold process dependent options
Only variables included in the _Vars enum and processes included in the Processes
dictionary are allowed.
validphys.promptutils module
Module which extends the functionality of prompt_toolkit for user inputs/interactivity.
validphys.pseudodata module
Tools to obtain and analyse the pseudodata that was seen by the neural networks during the fitting.
- class validphys.pseudodata.DataTrValSpec(pseudodata, tr_idx, val_idx)
Bases:
tuple
- pseudodata
Alias for field number 0
- tr_idx
Alias for field number 1
- val_idx
Alias for field number 2
- validphys.pseudodata.indexed_make_replica(groups_index, make_replica)[source]
Index the make_replica pseudodata appropriately
- validphys.pseudodata.level0_commondata_wc(data, fakepdf)[source]
Given a validphys.core.DataGroupSpec object, load commondata and generate a new commondata instance with central values replaced by fakepdf prediction
- Parameters:
data (validphys.core.DataGroupSpec)
fakepdf (validphys.core.PDF)
- Returns:
list of validphys.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 0 fake data.
- Return type:
Example
>>> from validphys.api import API
>>> API.level0_commondata_wc(dataset_inputs=[{"dataset":"NMC"}], use_cuts="internal", theoryid=200, fakepdf="NNPDF40_nnlo_as_01180")
[CommonData(setname='NMC', ndata=204, commondataproc='DIS_NCE', nkin=3, nsys=16)]
- validphys.pseudodata.make_level1_data(data, level0_commondata_wc, filterseed, data_index, sep_mult)[source]
Given a list of Level 0 commondata instances, return the same list with central values replaced by Level 1 data.
Level 1 data is generated using validphys.make_replica. The covariance matrix, from which the stochastic Level 1 noise is sampled, is built from Level 0 commondata instances (level0_commondata_wc). This, in particular, means that the multiplicative systematics are generated from the Level 0 central values.
Note that the covariance matrix used to generate Level 2 pseudodata is consistent with the one used at Level 1 up to corrections of the order eta * eps, where eta and eps are defined as shown below:
Generate L1 data: L1 = L0 + eta, eta ~ N(0, CL0)
Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0, CL1)
where CL0 and CL1 means that the multiplicative entries have been constructed from Level 0 and Level 1 central values respectively.
- Parameters:
data (validphys.core.DataGroupSpec)
level0_commondata_wc (list) – list of validphys.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 0 fake data. Cuts already applied.
filterseed (int) – random seed used for the generation of Level 1 data
data_index (pandas.MultiIndex)
- Returns:
list of validphys.coredata.CommonData instances corresponding to all datasets within one experiment. The central value is replaced by Level 1 fake data.
- Return type:
Example
>>> from validphys.api import API
>>> dataset = 'NMC'
>>> l1_cd = API.make_level1_data(dataset_inputs=[{"dataset": dataset}], use_cuts="internal", theoryid=200, fakepdf="NNPDF40_nnlo_as_01180", filterseed=1)
>>> l1_cd
[CommonData(setname='NMC', ndata=204, commondataproc='DIS_NCE', nkin=3, nsys=16)]
- validphys.pseudodata.make_replica(groups_dataset_inputs_loaded_cd_with_cuts, replica_mcseed, dataset_inputs_sampling_covmat, sep_mult=False, genrep=True, max_tries=1000000, resample_negative_pseudodata=True)[source]
Function that takes in a list of validphys.coredata.CommonData objects and returns a pseudodata replica accounting for possible correlations between systematic uncertainties.
The function loops until positive definite pseudodata is generated for any non-asymmetry datasets. In the case of an asymmetry dataset negative values are permitted so the loop block executes only once.
- Parameters:
groups_dataset_inputs_loaded_cd_with_cuts (list[validphys.coredata.CommonData]) – List of CommonData objects which store information about systematic errors, their treatment and description, for each dataset.
replica_mcseed (int, None) – Seed used to initialise the numpy random number generator. If None, a random seed is allocated using the default numpy behaviour.
dataset_inputs_sampling_covmat (np.array) – Full covmat to be used. It can be either only experimental or also theoretical.
sep_mult (bool) – Specifies whether the shifts are computed with the full covmat or whether the multiplicative errors are treated separately.
genrep (bool) – Specifies whether to generate a replica or not.
max_tries (int) – The stochastic nature of replica generation means one can obtain (unphysical) negative predictions. If no physical configuration is found after max_tries (default 1e6), a ReplicaGenerationError is raised.
resample_negative_pseudodata (bool) – When True, replicas that produce negative predictions are resampled for up to max_tries until all points are positive (default: True).
- Returns:
pseudodata – Numpy array of length N_dat (where N_dat is the combined number of data points after cuts) containing Monte Carlo samples of data centred around the data central value.
- Return type:
np.array
Example
>>> from validphys.api import API
>>> pseudodata = API.make_replica(
...     dataset_inputs=[{"dataset": "NMC"}, {"dataset": "NMCPD"}],
...     use_cuts="nocuts",
...     theoryid=53,
...     replica=1,
...     mcseed=123,
...     genrep=True,
... )
array([0.25640033, 0.25986534, 0.27165461, 0.29001009, 0.30863588,
       0.30100351, 0.31781208, 0.30827054, 0.30258217, 0.32116842,
       0.34206012, 0.31866286, 0.2790856 , 0.33257621, 0.33680007,
- validphys.pseudodata.read_replica_pseudodata(fit, context_index, replica)[source]
Function to handle the reading of training and validation splits for a fit that has been produced with the savepseudodata flag set to True.
The data is read from the PDF to handle the mixing introduced by postfit.
The data files are concatenated to return all the data that went into a fit. The training and validation indices are also returned so one can access the splits using pandas indexing.
- Raises:
FileNotFoundError – If the training or validation files for the PDF set cannot be found.
CheckError – If the use_cuts flag is not set to fromfit.
- Returns:
data_indices_list – List of namedtuple where each entry corresponds to a given replica. Each element contains the attributes pseudodata, tr_idx and val_idx; the latter two are used to slice the former to return training and validation data respectively.
- Return type:
list[namedtuple]
Example
>>> from validphys.api import API
>>> data_indices_list = API.read_fit_pseudodata(fit="pseudodata_test_fit_n3fit")
>>> len(data_indices_list)  # Same as nrep
10
>>> rep_info = data_indices_list[0]
>>> rep_info.pseudodata.loc[rep_info.tr_idx].head()
                             replica 1
group dataset           id
ATLAS ATLASZPT8TEVMDIST 1    30.665835
                        3    15.795880
                        4     8.769734
                        5     3.117819
                        6     0.771079
- validphys.pseudodata.recreate_fit_pseudodata(_recreate_fit_pseudodata, fitreplicas, fit_tr_masks)[source]
Function used to reconstruct the pseudodata seen by each of the Monte Carlo fit replicas.
- Returns:
res – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.
- Return type:
list[namedtuple]
Example
>>> from validphys.api import API
>>> API.recreate_fit_pseudodata(fit="pseudodata_test_fit_n3fit")
Notes
This function does not account for the postfit reshuffling.
- validphys.pseudodata.recreate_pdf_pseudodata(_recreate_pdf_pseudodata, pdfreplicas, pdf_tr_masks)[source]
Like validphys.pseudodata.recreate_fit_pseudodata() but accounts for the postfit reshuffling of replicas.
- Returns:
res – List of namedtuples, each of which contains a dataframe containing all the data points, the training indices, and the validation indices.
- Return type:
list[namedtuple]
Example
>>> from validphys.api import API
>>> API.recreate_pdf_pseudodata(fit="pseudodata_test_fit_n3fit")
validphys.renametools module
A collection of utility functions to handle logistics of LHAPDFs and fits. For use by vp-scripts.
- class validphys.renametools.Spinner(delay=0.1)[source]
Bases:
object
Context manager to provide a spinning cursor while validphys performs some other task silently.
When executed in a TTY, it shows a spinning cursor for the duration of the context manager. In non-interactive prompts, it prints to stdout at the beginning and end.
Example
>>> from validphys.renametools import Spinner
>>> with Spinner():
...     import time
...     time.sleep(5)
- property interactive
validphys.replica_selector module
replica_selector.py
Tools for filtering replica sets based on criteria on the replicas.
- validphys.replica_selector.alpha_s_bundle_pdf(pdf, pdfs, output_path, target_name: (<class 'str'>, <class 'NoneType'>) = None)[source]
Action that bundles PDFs for distribution in the LHAPDF format. The baseline PDF is declared as the pdf key and the PDFs from which the replica 0s are to be added are declared as the pdfs list.
The bundled PDF set is stored inside the output directory.
- Parameters:
pdf (validphys.core.PDF) – The baseline PDF to which the new replicas will be added
pdfs (list of validphys.core.PDF) – The list of PDFs from which replica 0 will be appended
target_name (str or None) – Optional argument specifying the name of the output PDF. If None, the name of the original pdf is used, with _pdfas appended.
validphys.results module
results.py
Tools to obtain theory predictions and basic statistical estimators.
- class validphys.results.Chi2Data(replica_result, central_result, ndata)
Bases:
tuple
- central_result
Alias for field number 1
- ndata
Alias for field number 2
- replica_result
Alias for field number 0
- class validphys.results.DataResult(dataset, covmat, sqrtcovmat)[source]
Bases:
StatsResult
Holds the relevant information from a given dataset
- property central_value
- property covmat
- property label
- property name
- property sqrtcovmat
Lower part of the Cholesky decomposition
- property std_error
- class validphys.results.PositivityResult(stats)[source]
Bases:
StatsResult
- class validphys.results.StatsResult(stats)[source]
Bases:
Result
- property central_value
- property error_members
Returns the error members with shape (Npoints, Npdf)
- property rawdata
Returns the raw data with shape (Npoints, Npdf)
- property std_error
- class validphys.results.ThPredictionsResult(dataobj, stats_class, datasetnames=None, label=None, pdf=None, theoryid=None)[source]
Bases:
StatsResult
Class holding a theory prediction; inherits from StatsResult. When created with from_convolution, it keeps track of the PDF for which it was computed.
- property datasetnames
- class validphys.results.ThUncertaintiesResult(central, std_err, label=None)[source]
Bases:
StatsResult
Class holding central theory predictions and the error bar corresponding to the theory uncertainties considered. The error members of this class correspond to central +- error_bar
- property central_value
- property error_members
Returns the error members with shape (Npoints, Npdf)
- property rawdata
Returns the raw data with shape (Npoints, Npdf)
- property std_error
- validphys.results.abs_chi2_data(results)[source]
Return a tuple (member_chi², central_chi², numpoints) for a given dataset
- validphys.results.abs_chi2_data_thcovmat(results_with_theory_covmat)[source]
The same as abs_chi2_data but also taking into account the theory uncertainties.
- validphys.results.chi2_stats(abs_chi2_data)[source]
Compute several estimators from the chi²:
central_mean
npoints
perreplica_mean
perreplica_std
chi2_per_data
- validphys.results.count_negative_points(possets_predictions)[source]
Return the number of replicas with negative predictions for each bin in the positivity observable.
- validphys.results.data_index(data)[source]
Given a core.DataGroupSpec instance, return pd.MultiIndex with the following levels:
experiment
datasets
indices of the data points (with cuts already applied)
- Parameters:
data (core.DataGroupSpec)
- Return type:
pd.MultiIndex
- validphys.results.dataset_chi2_table(chi2_stats, dataset)[source]
Show the chi² estimators for a given dataset
- validphys.results.dataset_inputs_abs_chi2_data(dataset_inputs_results)[source]
Like abs_chi2_data but for a group of inputs
- validphys.results.dataset_inputs_bootstrap_chi2_central(dataset_inputs_results, bootstrap_samples=500, boot_seed=123)[source]
Takes the data result and theory prediction given dataset_inputs and then returns a bootstrap distribution of central chi2. By default bootstrap_samples is set to a sensible value (500). However a different value can be specified in the runcard.
- validphys.results.dataset_inputs_bootstrap_phi_data(dataset_inputs_results, bootstrap_samples=500)[source]
Takes the data result and theory prediction given dataset_inputs and then returns a bootstrap distribution of phi. By default bootstrap_samples is set to a sensible value (500). However a different value can be specified in the runcard.
For more information on how phi is calculated see phi_data
- validphys.results.dataset_inputs_chi2_per_point_data(dataset_inputs_abs_chi2_data)[source]
Return the total chi²/ndata for all data, specified by dataset_inputs. Covariance matrix is fully correlated across datasets, with all known correlations.
- validphys.results.dataset_inputs_phi_data(dataset_inputs_abs_chi2_data)[source]
Like phi_data but for group of datasets
- validphys.results.dataset_inputs_results(data, pdf: PDF, dataset_inputs_covariance_matrix, dataset_inputs_sqrt_covmat)[source]
Like results but for a group of datasets
- validphys.results.dataset_inputs_results_central(data, pdf: PDF, dataset_inputs_covariance_matrix, dataset_inputs_sqrt_covmat)[source]
Like dataset_inputs_results but for a group of datasets and replica0.
- validphys.results.dataset_inputs_results_without_covmat(data, pdf: PDF)[source]
Like dataset_inputs_results but skipping the computation of the covmat
- validphys.results.dataspecs_chi2_differences_table(dataspecs, dataspecs_chi2_table)[source]
Given two dataspecs, print the chi² (using dataspecs_chi2_table) and the difference between the first and the second.
- validphys.results.dataspecs_chi2_table(dataspecs_total_chi2_data, dataspecs_datasets_chi2_table, dataspecs_groups_chi2_table, show_total: bool = False)[source]
Same as fits_chi2_table but for an arbitrary list of dataspecs
- validphys.results.dataspecs_dataset_chi2_difference_table(dataspecs_each_dataset, dataspecs_each_dataset_chi2, dataspecs_speclabel)[source]
Returns a table with the difference between the chi2 and the expected chi2, in units of the expected chi2 standard deviation, given by
chi2_diff = (chi2 - N)/sqrt(2N)
for each dataset for each dataspec.
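As a rough numerical illustration of this statistic (hypothetical numbers; the actual table is built from the chi² data computed by the other providers on this page):
>>> import numpy as np
>>> chi2, ndata = 250.0, 204  # hypothetical total chi2 and number of points
>>> round((chi2 - ndata) / np.sqrt(2 * ndata), 3)
2.277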
- validphys.results.dataspecs_datasets_chi2_table(dataspecs_speclabel, dataspecs_groups, dataspecs_datasets_chi2_data, per_point_data: bool = True)[source]
Same as fits_datasets_chi2_table but for arbitrary dataspecs.
- validphys.results.dataspecs_datasets_nsigma_table(dataspecs_datasets_chi2_table)[source]
Same as dataspecs_datasets_chi2_table but for nsigma.
- validphys.results.dataspecs_groups_chi2_table(dataspecs_speclabel, dataspecs_groups, dataspecs_groups_chi2_data, per_point_data: bool = True)[source]
Same as fits_groups_chi2_table but for an arbitrary list of dataspecs.
- validphys.results.dataspecs_groups_nsigma_table(dataspecs_groups_chi2_table)[source]
Same as fits_groups_nsigma_table but for an arbitrary list of dataspecs.
- validphys.results.dataspecs_nsigma_table(dataspecs_total_chi2_data, dataspecs_datasets_nsigma_table, dataspecs_groups_nsigma_table, show_total: bool = False)[source]
Same as fits_nsigma_table but for an arbitrary list of dataspecs
- validphys.results.experiments_chi2_stats(total_chi2_data)[source]
Compute several estimators from the chi² for an aggregate of experiments:
central_mean
npoints
perreplica_mean
perreplica_std
chi2_per_data
- validphys.results.experiments_covmat_no_table(experiments_data, experiments_index, experiments_covmat_collection)[source]
Makes the total experiments covariance matrix, which can then be reindexed appropriately by the chosen grouping. The covariance matrix must first be grouped by experiments to ensure correlations within experiments are preserved.
- validphys.results.experiments_invcovmat(experiments_data, experiments_index, experiments_covmat_collection)[source]
Compute and export the inverse covariance matrix. Note that this inverts the matrices with the LU method which is suboptimal.
- validphys.results.experiments_sqrtcovmat(experiments_data, experiments_index, experiments_sqrt_covmat)[source]
Like experiments_covmat, but dumps the lower triangular part of the Cholesky decomposition as used in the fit. The entries above the diagonal are set to zero.
- validphys.results.fits_chi2_table(fits_total_chi2_data, fits_datasets_chi2_table, fits_groups_chi2_table, show_total: bool = False)[source]
Show the chi² and the number of points of each dataset and experiment for each fit, where an experiment is a group of datasets according to the experiment key in the PLOTTING info file, computed with the theory corresponding to the fit. Datasets that are not included in a given fit appear as NaN.
- validphys.results.fits_datasets_chi2_table(fits_name_with_covmat_label, fits_groups, fits_datasets_chi2_data, per_point_data: bool = True)[source]
A table with the chi2 for each dataset included in the fits, computed with the theory corresponding to each fit. The results are indexed in two levels by experiment and dataset, where experiment is the grouping of datasets according to the experiment key in the PLOTTING info file. If per_point_data is True, the chi² will be shown divided by ndata. Otherwise the values will be absolute.
- validphys.results.fits_datasets_nsigma_table(fits_datasets_chi2_table)[source]
A table with nsigma values for each dataset included in the fit. nsigma is defined as (chi2 - 1) / sqrt(2/ndata), when the chi2 is normalized by ndata.
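As a rough numerical illustration of this definition (hypothetical values; the actual table is derived from fits_datasets_chi2_table):
>>> import numpy as np
>>> chi2_per_point, ndata = 1.15, 300  # hypothetical chi2/ndata and number of points
>>> round((chi2_per_point - 1) / np.sqrt(2 / ndata), 3)
1.837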
- validphys.results.fits_groups_chi2_table(fits_name_with_covmat_label, fits_groups, fits_groups_chi2_data, per_point_data: bool = True)[source]
A table with the chi2 computed with the theory corresponding to each fit for all datasets in the fit, grouped according to a key in the metadata; the grouping can be controlled with the metadata_group runcard key.
If per_point_data is True, the chi² will be shown divided by ndata. Otherwise the chi² values will be absolute.
- validphys.results.fits_groups_nsigma_table(fits_groups_chi2_table)[source]
Similar to fits_groups_chi2_table but for nsigma. nsigma is defined as (chi2 - 1) / sqrt(2/ndata), when the chi2 is normalized by ndata.
- validphys.results.fits_groups_phi_table(fits_name_with_covmat_label, fits_groups, fits_groups_phi)[source]
For every fit, returns phi and number of data points for each group of datasets, which are grouped according to a key in the metadata. The behaviour of the grouping can be controlled with metadata_group runcard key.
- validphys.results.fits_nsigma_table(fits_total_chi2_data, fits_datasets_nsigma_table, fits_groups_nsigma_table, show_total: bool = False)[source]
Show the nsigma and the number of points of each dataset and experiment for each fit, computed with the theory corresponding to the fit. Datasets that are not included in one of the fits appear as NaN.
- validphys.results.group_result_central_table_no_table(groups_results_central, groups_index)[source]
Generate a table containing the data central value and the central prediction
- validphys.results.group_result_table(group_result_table_no_table)[source]
Duplicate of group_result_table_no_table but with a table decorator.
- validphys.results.group_result_table_68cl(groups_results, group_result_table_no_table: DataFrame, pdf: PDF)[source]
Generate a table containing the data central value, the data 68% confidence levels, the central prediction, and 68% confidence level bounds of the prediction.
- validphys.results.group_result_table_no_table(groups_results, groups_index)[source]
Generate a table containing the data central value, the central prediction, and the prediction for each PDF member.
- validphys.results.groups_central_values(group_result_central_table_no_table)[source]
Duplicate of groups_central_values_no_table but takes group_result_table rather than groups_central_values_no_table, and has a table decorator.
- validphys.results.groups_central_values_no_table(group_result_central_table_no_table)[source]
Returns a theoryid-dependent list of central theory predictions for a given group.
- validphys.results.groups_chi2_table(groups_data, pdf, groups_chi2, groups_each_dataset_chi2)[source]
Return a table with the chi² to the groups and each dataset in the groups, grouped by metadata.
- validphys.results.groups_corrmat(groups_covmat)[source]
Generates the grouped experimental correlation matrix with groups_covmat as input
- validphys.results.groups_covmat(groups_covmat_no_table)[source]
Duplicate of groups_covmat_no_table but with a table decorator.
- validphys.results.groups_covmat_no_table(experiments_covmat_no_table, groups_index)[source]
Export the covariance matrix for the groups. It exports the full (symmetric) matrix, with the first 3 rows and columns being:
group name
dataset name
index of the point within the dataset.
- validphys.results.groups_data_values(group_result_table)[source]
Returns list of data values for the input groups.
- validphys.results.groups_index(groups_data)[source]
Return a pandas.MultiIndex with levels for group, dataset and point respectively, the group is determined by a key in the dataset metadata, and controlled by metadata_group key in the runcard.
Example
TODO: add example
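Pending an official example, a minimal sketch of how the index could be inspected through the API (illustrative only; it assumes groups_index is exposed via validphys.api like the other providers on this page, and that the level names follow the group/dataset/id convention used elsewhere in this documentation):
>>> from validphys.api import API
>>> idx = API.groups_index(dataset_inputs=[{"dataset": "NMC"}], use_cuts="internal", theoryid=200)
>>> idx.names  # one level each for the group, the dataset and the data point
FrozenList(['group', 'dataset', 'id'])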
- validphys.results.groups_invcovmat(experiments_invcovmat, groups_index)[source]
Like experiments_invcovmat but relabelled to the chosen grouping.
- validphys.results.groups_normcovmat(groups_covmat, groups_data_values)[source]
Calculates the grouped experimental covariance matrix normalised to data.
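A toy sketch of the normalisation, assuming the usual convention of dividing each covariance entry by the product of the corresponding data central values (the provider itself takes groups_covmat and groups_data_values as inputs):
>>> import numpy as np
>>> covmat = np.array([[4.0, 1.0], [1.0, 9.0]])  # toy covariance matrix
>>> data = np.array([2.0, 3.0])                  # toy data central values
>>> covmat / np.outer(data, data)
array([[1.        , 0.16666667],
       [0.16666667, 1.        ]])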
- validphys.results.groups_sqrtcovmat(experiments_sqrtcovmat, groups_index)[source]
Like experiments_sqrtcovmat but relabelled to the chosen grouping.
- validphys.results.one_or_more_results(dataset: (<class 'validphys.core.DataSetSpec'>, <class 'validphys.core.DataGroupSpec'>), covariance_matrix, sqrt_covmat, pdfs: (<class 'NoneType'>, <class 'collections.abc.Sequence'>) = None, pdf: (<class 'NoneType'>, <class 'validphys.core.PDF'>) = None)[source]
Generate a list of results, where the first element is the data values, and the next is either the prediction for pdf or for each of the pdfs. Which of the two is selected intelligently depending on the namespace, when executing as an action.
- validphys.results.pdf_results(dataset: (<class 'validphys.core.DataSetSpec'>, <class 'validphys.core.DataGroupSpec'>), pdfs: ~collections.abc.Sequence, covariance_matrix, sqrt_covmat)[source]
Return a list of results, the first for the data and the rest for each of the PDFs.
- validphys.results.perreplica_chi2_table(groups_data, groups_chi2, total_chi2_data)[source]
Chi² per point for each replica for each group. Also outputs the total chi² per replica. The columns come in two levels: The first is the name of the group, and the second is the number of points.
- validphys.results.phi_data(abs_chi2_data)[source]
Calculate phi using values returned by abs_chi2_data.
Returns tuple of (float, int): (phi, numpoints)
For more information on how phi is calculated see Eq.(24) in 1410.8849
- validphys.results.positivity_predictions_data_result(pdf, posdataset)[source]
Return an object containing the values of the positivity observable.
- validphys.results.predictions_by_kinematics_table(results, kinematics_table_notable)[source]
Return a table combining the output of validphys.kinematics.kinematics_table() with the data and theory central values.
- validphys.results.proc_result_table_experiment(procs_results_experiment, experiments_index)[source]
- validphys.results.procs_chi2_table(procs_data, pdf, groups_chi2_by_process, groups_each_dataset_chi2_by_process)[source]
Same as groups_chi2_table but by process
- validphys.results.procs_data_values(proc_result_table)[source]
Like groups_data_values but grouped by process.
- validphys.results.procs_data_values_experiment(proc_result_table_experiment)[source]
Like groups_data_values but grouped by experiment.
- validphys.results.relabel_experiments_to_groups(input_covmat, groups_index)[source]
Takes a covmat grouped by experiments and relabels it by groups. This allows grouping over experiments to preserve experimental correlations outside the chosen grouping.
- validphys.results.results(dataset: DataSetSpec, pdf: PDF, covariance_matrix, sqrt_covmat)[source]
Tuple of data and theory results for a single pdf. The data will have an associated covariance matrix, which can include a contribution from the theory covariance matrix constructed from scale variations. By default this contribution is included where available; this behaviour can be modified with the use_theorycovmat flag.
The theory is specified as part of the dataset (a remnant of the old C++ layout). A group of datasets is also allowed.
- validphys.results.results_central(dataset: DataSetSpec, pdf: PDF, covariance_matrix, sqrt_covmat)[source]
Same as
results()
but only calculates the prediction for replica0.
- validphys.results.results_with_scale_variations(results, theory_covmat_dataset)[source]
Use the theory covariance matrix to generate a ThPredictionsResult-compatible object modified so that its uncertainties correspond to a combination of the PDF and theory (scale variations) errors added in quadrature. This allows plotting results including scale variations.
Note that by doing this we lose all information about the predictions for the individual replicas or theories.
- validphys.results.results_with_theory_covmat(dataset, results, theory_covmat_dataset)[source]
Returns results with a modified DataResult such that the covariance matrix also includes the theory covmat. This can be used to make use of results that consider scale variations without including the theory covmat as part of the covariance matrix used by other validphys functions. Most notably, this can be used to compute the chi2 including theory errors while plotting data-theory comparisons in which the experimental uncertainties are not contaminated by the thcovmat.
- validphys.results.results_without_covmat(dataset: DataSetSpec, pdf: PDF)[source]
Return a results object with a diagonal covmat so that it can be used to generate results-dependent covmats elsewhere. Uses results() under the hood.
- validphys.results.total_chi2_data_from_experiments(experiments_chi2_data, pdf)[source]
Like dataset_inputs_abs_chi2_data(), except it sums the contributions from each experiment, which is more efficient when the total covariance matrix is block diagonal in experiments.
This is valid as long as there are no cross-experiment correlations from e.g. theory covariance matrices.
- validphys.results.total_phi_data_from_experiments(experiments_phi_data)[source]
Like dataset_inputs_phi_data() except phi is calculated for each experiment and the contributions are then combined. Since the definition of phi is
phi = sqrt( (<chi2[T_k]> - chi2[<T_k>]) / n_data ),
where k is the replica index, the total phi is
phi_total = sqrt( sum(n_data * phi**2) / sum(n_data) ),
where the sums run over experiments.
This is only a valid method of calculating the total phi provided that there are no inter-experimental correlations.
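A short numerical sketch of this aggregation (hypothetical per-experiment values):
>>> import numpy as np
>>> phis = np.array([0.30, 0.25, 0.40])   # hypothetical phi for each experiment
>>> ndatas = np.array([200, 150, 100])    # hypothetical number of points per experiment
>>> round(float(np.sqrt(np.sum(ndatas * phis**2) / np.sum(ndatas))), 3)
0.31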
validphys.reweighting module
Utilities for reweighting studies.
Implements utilities for calculating the NNPDF weights and unweighted PDF sets. It also allows for some basic statistics.
- validphys.reweighting.chi2_data_for_reweighting_experiments(chi2_data_for_reweighting_experiments_inner, use_t0)[source]
- validphys.reweighting.make_pdf_from_filtered_outliers(fit, chi2filtered_index, set_name: str, output_path=None, installgrid: bool = True)[source]
Produce a new grid with the result of chi2filtered_index
- validphys.reweighting.make_unweighted_pdf(pdf, unweighted_index, set_name: str, output_path=None, installgrid: bool = True)[source]
Generate an unweighted PDF set, from the prior
pdf
and the reweighting_experiments. The PDF is written to a pdfsets directory of the output folder. Return the relative path of the newly created PDF.
- validphys.reweighting.nnpdf_weights(chi2_data_for_reweighting_experiments)[source]
Compute the replica weights according to the NNPDF formula.
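For orientation, a sketch of the standard Bayesian reweighting expression, w_k proportional to (chi2_k)^((n-1)/2) exp(-chi2_k/2) with n the number of data points, normalised so that the weights sum to the number of replicas. This is only an illustration with hypothetical numbers; the provider itself works from chi2_data_for_reweighting_experiments and handles the bookkeeping and numerical stability:
>>> import numpy as np
>>> chi2 = np.array([95.0, 100.0, 110.0])  # hypothetical total chi2 per replica
>>> n = 100                                # hypothetical number of data points
>>> logw = (n - 1) / 2 * np.log(chi2) - chi2 / 2
>>> w = np.exp(logw - logw.max())          # subtract the maximum for numerical stability
>>> np.round(w * len(w) / w.sum(), 3)      # normalise to sum to the number of replicas
array([1.062, 1.105, 0.833])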
- validphys.reweighting.nnpdf_weights_numerator(chi2_data_for_reweighting_experiments)[source]
Compute the numerator of the NNPDF weights. This is useful for P(α), which uses a different normalization.
- validphys.reweighting.p_alpha_study(chi2_data_for_reweighting_experiments)[source]
Compute P(α) in an automatic range
validphys.sumrules module
sumrules.py
Module for the computation of sum rules
Note that this contains only the code for the computation of sum rules from scratch using LHAPDF tables. The code reading the sum rule information output from the fit is present in fitinfo.py
- validphys.sumrules.bad_replica_sumrules(pdf, sum_rules, threshold: Real = 0.01)[source]
Return a table with the sum rules for the replicas where some sum rule is farther from the correct value than threshold (in absolute value).
- validphys.sumrules.central_sum_rules(pdf: PDF, Q: Real)[source]
Compute the sum rules for the central member, at the scale Q
- validphys.sumrules.central_sum_rules_table(central_sum_rules)[source]
Construct a table with the value of each sum rule for the central member
- validphys.sumrules.partial_polarized_sum_rules(pdf: PDF, Q: Real, lims: tuple = ((0.0001, 0.001), (0.001, 1)))[source]
Compute the partial low- and large-x polarized sum rules. Return a SumRulesGrid object with the list of values for each sum rule. The integration is performed with absolute and relative tolerance of 1e-4.
- validphys.sumrules.polarized_sum_rules(partial_polarized_sum_rules)[source]
Compute the full polarized sum rules. The integration is performed with absolute and relative tolerance of 1e-4.
- validphys.sumrules.polarized_sum_rules_table(polarized_sum_rules)[source]
Return a table with the descriptive statistics of the polarized sum rules, over members of the PDF.
- validphys.sumrules.sum_rules(pdf: PDF, Q: Real)[source]
Compute the momentum, uvalence, dvalence, svalence and cvalence sum rules for each member, at the energy scale
Q
. Return a SumRulesGrid object with the list of values for each sum rule. The integration is performed with absolute and relative tolerance of 1e-4.
- validphys.sumrules.sum_rules_table(sum_rules)[source]
Return a table with the descriptive statistics of the sum rules, over members of the PDF.
- validphys.sumrules.unknown_sum_rules(pdf: PDF, Q: Real)[source]
Compute the following integrals:
u momentum fraction
ubar momentum fraction
d momentum fraction
dbar momentum fraction
s momentum fraction
sbar momentum fraction
cp momentum fraction
cm momentum fraction
g momentum fraction
T3
T8
validphys.tableloader module
tableloader.py
Load from file some of the tables that validphys produces. Contrary to validphys.loader this module consists of functions that take absolute paths, and return mostly dataframes.
- exception validphys.tableloader.TableLoaderError[source]
Bases:
Exception
Errors in the tableloader module.
- validphys.tableloader.combine_pseudoreplica_tables(dfs, combined_names, *, blacklist_datasets=None, min_points_required=2)[source]
Return a table in the same format as perreplica_chi2_table with the minimum value of the chi² for each batch of fits.
- validphys.tableloader.combine_pseudorreplica_tables(dfs, combined_names, *, blacklist_datasets=None, min_points_required=2)
Return a table in the same format as perreplica_chi2_table with the minimum value of the chi² for each batch of fits.
- validphys.tableloader.fixup_header(df, head_index, dtype)[source]
Set the type of the column index in place
- validphys.tableloader.get_extrasum_slice(df, components)[source]
Extract a slice of a table that has the components in the format that extra_sums expects.
- validphys.tableloader.load_adapted_fits_chi2_table(filename)[source]
Load the fits_chi2_table and adapt it in the way that suits the
paramfits
module. That is, return a table with the total chi² and another with the number of points.
- validphys.tableloader.load_experiments_covmat(filename)
Parse a dump of a matrix like experiments_covmat.
- validphys.tableloader.load_experiments_invcovmat(filename)
Parse a dump of a matrix like experiments_covmat.
- validphys.tableloader.load_fits_chi2_table(filename)[source]
Load the result of fits_chi2_table or similar.
- validphys.tableloader.load_perreplica_chi2_table(filename)[source]
Load the output of
perreplica_chi2_table
.
- validphys.tableloader.parse_data_cv(filename)[source]
Useful for reading DataFrames with just one column.
validphys.theoryinfo module
theoryinfo.py
Actions for displaying theory info for one or more theories.
- validphys.theoryinfo.all_theory_info_table(theory_database)[source]
Produces a DataFrame with all theory info and saves it
- Returns:
all_theory_info_table – dataframe filled with all entries in theorydb file
- Return type:
pd.DataFrame
Example
>>> from validphys.api import API
>>> df = API.all_theory_info_table()
>>> df['Comments'].iloc[:5]
ID
1                 3.0 LO benchmark
2                3.0 NLO benchmark
3               3.0 NNLO benchmark
4     3.0 NLO - Q0=1.3 For IC Test
5    3.0 NNLO - Q0=1.3 For IC Test
Name: Comments, dtype: object
- validphys.theoryinfo.theory_info_table(theory_database, theory_db_id)[source]
Fetches theory info for the given theory_db_id and constructs a DataFrame from it.
- Parameters:
theory_db_id (int) – numeric identifier of theory to be queried. Can be specified at the runcard level.
- Returns:
theory_info_table – dataframe filled with theory info for specified theory_db_id
- Return type:
pd.DataFrame
Example
>>> from validphys.api import API
>>> df = API.theory_info_table(theory_db_id=53)
>>> df.loc['Comments']
Info for theory 53    NNPDF3.1 NNLO central
Name: Comments, dtype: object
validphys.uploadutils module
uploadutils.py
Tools to upload resources to remote servers.
- class validphys.uploadutils.ArchiveUploader[source]
Bases:
FileUploader
Uploader for objects comprising many files such as fits or PDFs
- root_url = None
- target_dir = None
- exception validphys.uploadutils.BadSSH[source]
Bases:
UploadError
- class validphys.uploadutils.FileUploader[source]
Bases:
Uploader
Uploader for individual files for single-file resources. It does the same but prints the URL of the file.
- class validphys.uploadutils.FitUploader[source]
Bases:
ArchiveUploader
An uploader for fits. Fits will be automatically compressed before uploading.
- check_fit_md5(output_path)[source]
When vp-setupfit is run successfully, it creates an md5 hash from the config. We check that the md5 matches the filter.yml, which verifies that vp-setupfit was run and that the filter.yml inside the fit folder wasn't modified.
- property root_url
- property target_dir
- class validphys.uploadutils.HyperscanUploader[source]
Bases:
FitUploader
Uploader for hyperopt scans, which are just special cases of fits
- property root_url
- property target_dir
- class validphys.uploadutils.PDFUploader[source]
Bases:
ArchiveUploader
An uploader for PDFs. PDFs will be automatically compressed before uploading.
- property root_url
- property target_dir
- class validphys.uploadutils.ReportFileUploader[source]
Bases:
FileUploader
,ReportUploader
- class validphys.uploadutils.ReportUploader[source]
Bases:
Uploader
An uploader for validphys reports.
- property root_url
- property target_dir
- class validphys.uploadutils.Uploader[source]
Bases:
object
Base class for implementing upload behaviour. The main abstraction is a context manager upload_context which checks that the upload seems possible, then does the work inside the context and then uploads the result. The various derived classes should be used.
- check_upload()[source]
Check that it looks possible to upload something. Raise an UploadError if not.
- upload_context(output)[source]
Before entering the context, check that uploading is feasible. On exiting the context, upload output.
- property upload_host
- validphys.uploadutils.check_for_meta(path)[source]
Function that checks if a report input has a meta.yaml file. If not, it prompts the user to either create one or follow an interactive prompt which assists the user in creating one.
- Parameters:
path (pathlib.Path) – Input path
- Return type:
None
- validphys.uploadutils.check_input(path)[source]
A function that checks the type of the input for vp-upload. The type determines where on the vp server the file will end up.
A fit is defined as any folder structure containing a filter.yml file at its root.
A pdf is defined as any folder structure that contains a .info file and a replica 0 at its root.
A report is defined as any folder structure that contains an index.html at its root.
If the input file does not fall under any such category, a ValueError exception is raised and the user is prompted to use either rsync or validphys.scripts.wiki_upload.
- Parameters:
path (pathlib.Path) – Path of the input file
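A minimal sketch of the detection logic described above (illustrative only, not the actual implementation; the replica-0 file name pattern is an assumption based on the LHAPDF layout):

import pathlib

def guess_upload_type(path: pathlib.Path) -> str:
    """Re-statement of the rules listed above, for illustration."""
    if (path / "filter.yml").exists():
        return "fit"
    if any(path.glob("*.info")) and any(path.glob("*_0000.dat")):
        return "pdf"
    if (path / "index.html").exists():
        return "report"
    raise ValueError("Unrecognised input: use rsync or validphys.scripts.wiki_upload instead")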
- validphys.uploadutils.interactive_meta(path)[source]
Function to interactively create a meta.yaml file
- Parameters:
path (pathlib.Path) – Input path
- Return type:
None
validphys.utils module
- validphys.utils.common_prefix(*s)[source]
Return the longest string that is a prefix of all the given strings.
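For instance (expected behaviour based on the description above):
>>> from validphys.utils import common_prefix
>>> common_prefix("NNPDF40_nnlo_as_01180", "NNPDF40_nnlo_as_01160")
'NNPDF40_nnlo_as_011'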
- validphys.utils.experiments_to_dataset_inputs(experiments_list)[source]
Flatten a list of old style experiment inputs to the new, flat, dataset_inputs style.
Example
>>> from validphys.api import API
>>> from validphys.utils import experiments_to_dataset_inputs
>>> fit = API.fit(fit='NNPDF31_nnlo_as_0118_1000')
>>> experiments = fit.as_input()['experiments']
>>> dataset_inputs = experiments_to_dataset_inputs(experiments)
>>> dataset_inputs[:3]
[{'dataset': 'NMCPD', 'frac': 0.5},
 {'dataset': 'NMC', 'frac': 0.5},
 {'dataset': 'SLACP', 'frac': 0.5}]
- validphys.utils.generate_path_filtered_data(fit_path, setname)[source]
Utility to ensure that both the loader and tools like setupfit utilize the same convention to generate the names of generated pseudodata
- validphys.utils.sane_groupby_iter(df, by, *args, **kwargs)[source]
Iterate groupby in such a way that first value is always the tuple of the common values.
As a convenience for plotting, if by is None, yield the empty string and the whole dataframe.
- validphys.utils.scale_from_grid(grid)[source]
Guess the appropriate matplotlib scale from a grid object. Returns 'linear' if the scale of the grid object is linear, and 'log' otherwise.
- validphys.utils.split_by(it, crit)[source]
Split it into two lists: the first contains the elements for which crit evaluates to True and the second those for which it doesn't. crit can be either a function or an iterable (in which case the original it will be sliced if the length of crit is smaller).
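A short illustration of the expected behaviour when crit is a function (illustrative values):
>>> from validphys.utils import split_by
>>> split_by([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
([2, 4], [1, 3, 5])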
- validphys.utils.split_ranges(a, cond=None, *, filter_falses=False)[source]
Split a so that each range has the same value for cond. If filter_falses is true, only the ranges for which the condition is true will be returned.
- validphys.utils.tempfile_cleaner(root, exit_func, exc, prefix=None, **kwargs)[source]
A context manager to handle temporary directory creation and clean-up upon raising an expected exception.
- Parameters:
root (str) – The root directory to create the temporary directory in.
exit_func (Callable) – The exit function to call upon exiting the context manager. Usually one of shutil.move or shutil.rmtree. Use the former if the temporary directory will be the final result directory and the latter if the temporary directory will contain the result directory, for example when downloading a resource.
exc (Exception) – The exception to catch within the with block.
prefix (optional[str]) – A prefix to prepend to the temporary directory.
**kwargs (dict) – Keyword arguments to provide to exit_func.
- Returns:
tempdir – The path to the temporary directory.
- Return type:
Example
The following example creates a temporary directory prepended with tutorial_ in the /tmp directory. The context manager will listen for a KeyboardInterrupt and will clean up if this exception is raised. Upon completion of the with block, it will rename the temporary directory to completed as the dst, using shutil.move. The final directory will contain an empty file called new_file, which we created within the with block.

import shutil

from validphys.utils import tempfile_cleaner

with tempfile_cleaner(
    root="/tmp",
    exit_func=shutil.move,
    exc=KeyboardInterrupt,
    prefix="tutorial_",
    dst="completed",
) as tempdir:
    new_file = tempdir / "new_file"
    input("Press enter to continue or Ctrl-C to interrupt:\n")
    new_file.touch()