In PDF fits, not all the data provided by the experimental collaborations are useful. For example, we may wish to discard certain datapoints for which we know small-x resummation or electroweak corrections are important. These effects are problematic since we know them to be important, but we cannot account for them.
In this light, we apply cuts to the data, keeping only the data points which we know are free of the above and other problems.
In validphys 2, the cuts are handled by the
validphys.filters module, alongside filter
definitions and defaults found within validphys.cuts.
Cuts as declarative filters
Due to the nature of data cuts, it is important to be transparent about which cuts
are being applied to which dataset and/or process. Moreover, it is useful for the
rules defining a data cut to be readable, so that a user who is not a developer can
read and understand the nature of the rule. This is achieved by making these rules
functions of the kinematic variables.
In much the same vein, it is useful for any default values used in the rules to
be readily accessible. For example, suppose there is a minimum value for the squared
momentum transfer in DIS processes, q2min, that is used by many different
rules. It is important for this variable to be defined in an obvious and easily
accessed location.
There are certain values which are commonly used by many rules, for example
q2min and w2min.
It is thus useful to define these default values somewhere. These values can be found
in validphys.cuts inside the
defaults.yaml file. One can overwrite these values,
as discussed later.
In validphys 2 the default filter rules can be found in the
validphys.cuts module within the
filter.yaml file. This file is read and interpreted by
validphys.
By default, these filters can have several entries:
dataset: The dataset this rule applies to
process_type: The process type this rule applies to
rule: Python code defining the rule for this filter
reason: (optional) The reason this rule was needed
local_variables: (optional) Any additional, non-standard local variables the user wishes to add for this rule only.
At least one of dataset or process_type is required.
The rule entry is always required.
The rule entry in the rule definition is
Python code. If the rule
does not apply to a particular datapoint (say the dataset names don't match), then
we return None, indicating that this rule had nothing to do with this datapoint.
In this case, we move on to the next rule. However, if the process type or dataset defined
in the rule matches that of the datapoint, we evaluate the rule. If the rule evaluates to
False we discard the point; if instead it returns
True we move on to the next rule.
If all the rules have been evaluated and none has returned False,
the datapoint passes and it is kept.
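The evaluation order described above can be sketched in a few lines of Python. This is a minimal illustration, not the validphys implementation: the `Rule` class, its `evaluate` method, the `passes_cuts` helper, the dictionary layout of a datapoint and the sample values are all hypothetical and chosen only for this example.

```python
from math import sqrt, fabs


class Rule:
    """Hypothetical stand-in for one entry of the filter file."""

    def __init__(self, rule, dataset=None, process_type=None):
        if dataset is None and process_type is None:
            raise ValueError("At least one of dataset or process_type is required")
        self.rule = rule
        self.dataset = dataset
        self.process_type = process_type

    def evaluate(self, point):
        """Return None if the rule does not apply, otherwise True or False."""
        if self.dataset is not None and self.dataset != point["dataset"]:
            return None
        if self.process_type is not None and self.process_type != point["process_type"]:
            return None
        # The rule string only sees a few maths helpers and the kinematics
        namespace = {"sqrt": sqrt, "fabs": fabs, **point["kinematics"]}
        return bool(eval(self.rule, {"__builtins__": {}}, namespace))


def passes_cuts(point, rules):
    """Keep the point unless some applicable rule evaluates to False."""
    for rule in rules:
        if rule.evaluate(point) is False:
            return False
    return True


rules = [
    Rule("x > 0.2", dataset="NMC"),
    Rule("Q2 > 8", process_type="DIS_NCP_CH"),
]
# Illustrative datapoint; the process type label here is a placeholder
point = {"dataset": "NMC", "process_type": "DIS", "kinematics": {"x": 0.1, "Q2": 10.0}}
print(passes_cuts(point, rules))
```

Here the first rule applies and fails, so the point is discarded, while the second rule returns `None` because the process types differ.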
In addition, the user can add any theory parameter they wish. For example, one could
add PTO: NNLO, which means the rule is only evaluated if the theory is NNLO. These are
discussed further in the section on theory parameters and perturbative orders below. One can see a full list of possible theory parameters using:
vp-checktheory <theory id>
The rule entry should be interpreted as a
str type within
Python. As such,
a rule such as
rule: True is not valid since this is read in as a boolean, whereas
rule: "True" is perfectly valid notation. Moreover, the string
itself should be valid Python code.
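The two requirements on the rule entry (it is a string, and the string is valid Python) can be checked as sketched below. The `check_rule_string` helper is hypothetical and is not part of validphys; it only illustrates the distinction between a boolean and a string containing an expression.

```python
def check_rule_string(rule):
    """Hypothetical validator: the rule must be a str holding a valid Python expression."""
    if not isinstance(rule, str):
        raise TypeError(f"rule must be a string, got {type(rule).__name__}")
    # compile() raises SyntaxError if the string is not a valid expression
    compile(rule, "<rule>", "eval")
    return rule


check_rule_string("True")           # accepted: a string containing valid Python
check_rule_string("p_T2 >= 30**2")  # accepted

for bad in (True, "x >"):
    try:
        check_rule_string(bad)
    except (TypeError, SyntaxError) as err:
        print(f"{bad!r} rejected: {type(err).__name__}")
```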
By default the user can use certain non-builtin mathematical functions in
their rule, such as sqrt and fabs (floating point absolute value). In addition,
one can use any
numpy function using
np.<function> in their rule definition. For example:
    rule: "np.exp(x) > 0.1"
The kinematic variables that can be used within the rule depend on the process type. A full list of available parameters can be found by running:
    In : from NNPDF import CommonData
    In : print(dict(CommonData.kinLabel))
The user may additionally define their own variables by adding the
local_variables field to their rule. For example, I can use
w2 in my rule, so long as I define what
I mean by w2:
    local_variables:
      w2: Q2 * (1 - x) / x
local_variables is non-commutative: the order of definition is important.
If a local variable depends on other local variables, then the user must ensure that all of
its dependencies have already been defined.
The following would raise an error:
    local_variables:
      w: sqrt(w2)
      w2: Q2 * (1 - x) / x
The following would not:
    local_variables:
      w2: Q2 * (1 - x) / x
      w: sqrt(w2)
local_variables have a local scope. They apply to only the rule within which
they are defined.
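The order dependence of local_variables can be reproduced with a small sketch that evaluates each definition in turn, exposing only the kinematics and the previously defined locals. The `resolve_local_variables` helper and the sample kinematic values are assumptions made for illustration; validphys may implement this differently.

```python
from math import sqrt


def resolve_local_variables(local_variables, kinematics):
    """Evaluate each local variable in definition order (hypothetical helper)."""
    namespace = {"sqrt": sqrt, **kinematics}
    for name, expression in local_variables.items():
        # Each expression only sees the kinematics and the locals defined before it
        namespace[name] = eval(str(expression), {"__builtins__": {}}, namespace)
    return namespace


kinematics = {"x": 0.5, "Q2": 16.0}  # illustrative values

good = {"w2": "Q2 * (1 - x) / x", "w": "sqrt(w2)"}
print(resolve_local_variables(good, kinematics)["w"])

bad = {"w": "sqrt(w2)", "w2": "Q2 * (1 - x) / x"}
try:
    resolve_local_variables(bad, kinematics)
except NameError as err:
    print("error:", err)
```

In the second case `w` is requested before `w2` exists, so the lookup fails with a `NameError`.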
Theory parameters and perturbative orders
There are particular situations in which we only want to evaluate a rule if the theory input for the PDF matches certain conditions. For example, it may be the case we only keep the datapoint provided the theory includes intrinsic charm or is evaluated at NNLO.
Suppose for example I wish the rule to only be evaluated if the theory includes intrinsic
charm. We note that in the output of
theory.get_description(), the relevant entry is
'IC': 1 (we use
here theory 53 for demonstration purposes). Thus if I want my rule to be applied only if
the theory has intrinsic charm, I simply add to my rule:
    IC: True
Similarly I can condition on the flavour number scheme. I again check
the theory description and note that the relevant
key is 'FNS'. Thus to only evaluate my rule if the FNS is
FONLL-C, I simply add:
    FNS: FONLL-C
Similarly, one can add any such theory description
key to their rule.
Sometimes, we may want to evaluate a rule provided the perturbative order is within
a certain range. For example, we may want a rule to be evaluated if the perturbative
order is strictly less than NLO. This can be done by using directives succeeding the
PTO value. In the above example, one would thus simply use:
    PTO: NLO-
The following is a list of possible directives which can succeed a PTO value:
+: Evaluate this rule if the theory PTO is greater than or equal to the preceding PTO
-: Evaluate this rule if the theory PTO is strictly less than the preceding PTO
!: Evaluate this rule if the theory PTO is not equal to the preceding PTO
For example:
    PTO: NNLO!
    PTO: N3LO-
    PTO: LO+
If the user doesn't specify a directive, the rule will only be evaluated if
its PTO matches exactly with the
PTO of the theory.
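The directive semantics can be summarised in a short sketch. The `pto_matches` helper, the `PTO_ORDER` table and the assumption that the orders rank as LO < NLO < NNLO < N3LO are illustrative only and do not reproduce the validphys parser.

```python
import operator

# Assumed ranking of perturbative orders, used only for this illustration
PTO_ORDER = {"LO": 0, "NLO": 1, "NNLO": 2, "N3LO": 3}

DIRECTIVES = {
    "+": operator.ge,  # theory PTO >= rule PTO
    "-": operator.lt,  # theory PTO < rule PTO
    "!": operator.ne,  # theory PTO != rule PTO
}


def pto_matches(rule_pto, theory_pto):
    """Return True if a rule with this PTO entry should be evaluated (hypothetical helper)."""
    compare = operator.eq  # no directive: exact match required
    if rule_pto[-1] in DIRECTIVES:
        compare = DIRECTIVES[rule_pto[-1]]
        rule_pto = rule_pto[:-1]
    return compare(PTO_ORDER[theory_pto], PTO_ORDER[rule_pto])


# For each rule entry, list whether it is evaluated for LO, NLO, NNLO, N3LO theories
for entry in ("NNLO!", "N3LO-", "LO+", "NLO"):
    print(entry, [pto_matches(entry, theory) for theory in PTO_ORDER])
```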
Overwriting filters and default values
One can overwrite the default behaviour by adding to the fit runcard.
Custom rules can be added by adding a
filter_rules: namespace in the fit runcard. This should
be a list of rules in the format outlined above. For example:
    filter_rules:
      - dataset: NMC
        rule: x > 0.2
Adding a filter_rules section to the runcard overwrites the default behaviour and does
not append to it. This is done intentionally, since a rule cannot be
overwritten by another rule. With the above code snippet, this would be the only rule used by
vp-setupfit. As such, a bit of copying and pasting may be necessary if one wishes to append a rule.
Similarly the defaults can be overwritten by adding a
filter_defaults namespace to the runcard. For example:
    filter_defaults:
      q2min: 5
      w2min: 10
As in the case of the rules, this overwrites the original defaults and does not append to them.
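The replace-rather-than-append behaviour can be pictured as follows. This is a sketch under stated assumptions: the `select_filters` helper and the dictionary representation of the runcard are hypothetical, and the default values shown are placeholders rather than the contents of the shipped files.

```python
def select_filters(runcard, default_rules, default_values):
    """Pick the rules and defaults to use; runcard entries replace, never extend."""
    rules = runcard.get("filter_rules", default_rules)
    values = runcard.get("filter_defaults", default_values)
    return rules, values


# Placeholder defaults for illustration; not the contents of the shipped files
default_rules = [{"dataset": "ATLASZPT7TEV", "rule": "p_T2 >= 30**2"}]
default_values = {"q2min": 1.0, "w2min": 2.0}

runcard = {"filter_rules": [{"dataset": "NMC", "rule": "x > 0.2"}]}
rules, values = select_filters(runcard, default_rules, default_values)
print(len(rules))  # only the NMC rule is used
print(values)      # no filter_defaults in the runcard, so the defaults are used

# To append instead, the default rules have to be copied into the runcard explicitly
runcard["filter_rules"] = default_rules + [{"dataset": "NMC", "rule": "x > 0.2"}]
rules, _ = select_filters(runcard, default_rules, default_values)
print(len(rules))
```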
To ensure backwards compatibility with old style runcards, if
q2min and w2min are defined
in the datacuts namespace within the runcard, these values are read in and override the default
values. However, if this overriding occurs, a warning is displayed in standard output.
Consider the following filter from the default filter rules:
    - dataset: ATLASZPT7TEV
      reason: Avoid the region where resummation effects become important.
      rule: "p_T2 >= 30**2"
We see that this rule applies only to the
ATLASZPT7TEV dataset and keeps all datapoints with a
transverse momentum greater than or equal to 30 GeV. The reason for
this rule is also provided: datapoints
with smaller transverse momentum will be affected by resummation effects.
Now consider the slightly more complicated example:
    - dataset: CMSDY2D12
      reason: Remove data points for which electroweak corrections are large.
      PTO: NNLO-
      local_variables:
        M: sqrt(M2)
        min_M: 30.0
        max_rapidity: 2.2
      rule: M >= min_M and etay <= max_rapidity
This rule only applies to
CMSDY2D12. I wish for the
rule to only be evaluated
if the theory perturbative order is strictly less than NNLO (i.e. LO or NLO).
I check what the process type of CMSDY2D12 is:
    In : from validphys.loader import Loader
    In : l = Loader()
    In : cd = l.check_commondata("CMSDY2D12")
    In : cd.process_type
    Out: 'EWK_RAP'
Then cross check this against
NNPDF.CommonData.kinLabels to see that the relevant
kinematic variables are:
'EWK_RAP': ('etay', 'M2', 'sqrts'),
I choose to define custom
local_variables in the form of
M which is the square
root of the invariant mass squared, i.e. just the invariant mass. Moreover, I define a
value for minimum
M and maximum rapidity which I use in my
rule as cutoff values.
The rule itself is then self-explanatory; notice, however, that it is written in valid
Python syntax. Finally, the reason for the rule is given, which is to cut datapoints
which are affected by electroweak corrections.
As a final example consider the following rule:
    - process_type: DIS_NCP_CH
      reason: |
        Missing higher order corrections to Delta F_IC, the piece
        that needs to be added to the FONLL-C calculation in the case
        of fitted charm.
      FNS: FONLL-C
      IC: True
      rule: "Q2 > 8"
Instead of this rule applying to one particular dataset, we see it is applicable to all
datasets that have process type
DIS_NCP_CH. The reason for the rule is rather involved,
so yaml's multiline string syntax is used.
Finally, the user wishes for the
rule to be evaluated only if the theory input has
the FONLL-C flavour number scheme and if the theory uses intrinsic charm. The rule itself