# Specifying data cuts

The experimental `CommonData` files contain more data points than we
actually fit. Some data points are excluded for reasons such as the
instability of the perturbative expansion in their corresponding
kinematic regions.

There are five possibilities for handling the experimental cuts
within validphys, which are controlled with the `use_cuts`
configuration setting:

`use_cuts: 'nocuts'`

This causes the content of the data files to be taken unmodified. Note that some theory predictions may be ill defined in this situation.

`use_cuts: 'fromfit'`

The cuts are read from the masks given as input to `n3fit` and generated by `vp-setupfit`. An existing fit is required to load the cuts, and it must contain the masks for all the datasets analyzed in the active namespace.

`use_cuts: 'internal'`

Compute the cut masks as `vp-setupfit` would do. Currently the parameters `q2min` and `w2min` must be given. These can in turn be set to the same values as in the fit by loading the `datacuts` namespace from the fit. In this case, the cuts will normally coincide with the ones loaded with the `fromfit` setting.
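As a rough illustration of what a kinematic cut controlled by `q2min` and `w2min` does, the sketch below filters a list of DIS-like points. It is not the validphys implementation: the function name `internal_dis_mask`, the strict inequalities, the approximation W² ≈ Q²(1 − x)/x, and the numerical values are all assumptions made for this example, and the actual internal cuts apply further dataset-specific rules.

```python
# Illustrative sketch only: not the validphys implementation.
def internal_dis_mask(points, q2min, w2min):
    """Return the indices of the points passing Q2 > q2min and W2 > w2min.

    `points` is a list of (x, Q2) pairs. W2 is approximated here as
    Q2 * (1 - x) / x, which is an assumption of this sketch.
    """
    kept = []
    for index, (x, q2) in enumerate(points):
        w2 = q2 * (1.0 - x) / x
        if q2 > q2min and w2 > w2min:
            kept.append(index)
    return kept


# Hypothetical kinematics (x, Q2 in GeV^2) and hypothetical cut values.
points = [(0.01, 2.0), (0.1, 10.0), (0.8, 20.0), (0.3, 50.0)]
mask = internal_dis_mask(points, q2min=3.49, w2min=12.5)
print(mask)  # [1, 3]
```

The first point fails the Q² requirement and the third fails the W² requirement, so only the points at positions 1 and 3 survive.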

`use_cuts: 'fromintersection'`

Compute the internal cuts as per `use_cuts: 'internal'` within each namespace in a namespace list called `cuts_intersection_spec` and take the intersection of the results as the cuts for the given dataset. This is useful, for example, for requiring the common subset of points that pass the cuts at NLO and NNLO.
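The intersection step can be pictured with a minimal sketch in which each cut mask is a list of the indices of the surviving data points. The helper `intersect_masks` and the example masks are hypothetical and serve only to illustrate the idea; they are not part of validphys.

```python
# Illustrative sketch only: masks are modelled as lists of kept indices.
def intersect_masks(masks):
    """Return the sorted indices that are present in every mask."""
    if not masks:
        raise ValueError("at least one mask is required")
    common = set(masks[0])
    for mask in masks[1:]:
        common &= set(mask)
    return sorted(common)


# Hypothetical masks, e.g. one per entry of `cuts_intersection_spec`.
nlo_mask = [0, 1, 2, 4, 5]
nnlo_mask = [1, 2, 3, 5, 6]
common_mask = intersect_masks([nlo_mask, nnlo_mask])
print(common_mask)  # [1, 2, 5]
```

Only the points that pass the cuts in both namespaces are kept.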

`use_cuts: 'fromsimilarpredictions'`

Compute the intersection between two namespaces (as for `fromintersection`) but additionally require that the predictions computed for each dataset across the namespaces are *similar*, specifically that the ratio between the absolute difference in the predictions and the total experimental uncertainty is smaller than a given value, `cut_similarity_threshold`, which must be provided. Note that for this to work with different C-factors across the namespaces, one must provide a different `dataset_inputs` list for each.

This mechanism can be ignored selectively for specific datasets. To do that, add their names to a list called `do_not_require_similarity_for`. The datasets in the list do not need to appear in the `cuts_intersection_spec` namespace and will be filtered according to the internal cuts unconditionally.
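The similarity criterion described above can be sketched as follows. This is a simplified illustration, not the validphys code: the function name `similar_predictions_mask` and the numbers are hypothetical, and the sketch covers only the similarity test itself, not the intersection with the internal cuts.

```python
# Illustrative sketch only: not the validphys implementation.
def similar_predictions_mask(pred_a, pred_b, exp_uncertainty, threshold):
    """Return the indices where |pred_a - pred_b| / uncertainty < threshold."""
    if not (len(pred_a) == len(pred_b) == len(exp_uncertainty)):
        raise ValueError("all inputs must have the same length")
    kept = []
    for index, (a, b, sigma) in enumerate(zip(pred_a, pred_b, exp_uncertainty)):
        if abs(a - b) / sigma < threshold:
            kept.append(index)
    return kept


# Hypothetical predictions from two namespaces and total experimental
# uncertainties for four data points.
nlo = [10.0, 20.0, 30.0, 40.0]
nnlo = [10.5, 23.0, 30.2, 44.0]
sigma = [1.0, 1.0, 0.5, 2.5]

loose = similar_predictions_mask(nlo, nnlo, sigma, threshold=2)
tight = similar_predictions_mask(nlo, nnlo, sigma, threshold=1)
print(loose)  # [0, 2, 3]
print(tight)  # [0, 2]
```

Lowering the threshold can only remove points, so the mask for threshold 1 is a subset of the mask for threshold 2.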

The following example demonstrates the first four options:

```
meta:
    title: Test the various options for CutsPolicy
    author: Zahari Kassabov
    keywords: [test, debug]

fit: NNPDF40_nlo_as_01180

theory:
    from_: fit

theoryid:
    from_: theory

# Load q2min and w2min from the fit
datacuts:
    from_: fit

# Used for intersection cuts
cuts_intersection_spec:
    - theoryid: 208
    - theoryid: 162

dataset_input: {dataset: ATLASDY2D8TEV}

dataspecs:
    - speclabel: "No cuts"
      use_cuts: "nocuts"

    - speclabel: "Fit cuts"
      use_cuts: "fromfit"

    - speclabel: "Internal cuts"
      use_cuts: "internal"

    - speclabel: "Intersected cuts"
      use_cuts: "fromintersection"

template_text: |
    {@with fitpdf::datacuts@}
    # Plot

    {@fitpdf::datacuts plot_fancy_dataspecs@}

    # χ² plots

    {@with dataspecs@}
    ## {@speclabel@}

    {@plot_chi2dist@}

    {@endwith@}
    {@endwith@}

actions_:
    - report(main=True)
```

Here we put together the results with the different filtering policies in a data-theory comparison plot and then plot the χ² distribution for each one individually. With these settings the latter three dataspecs give the same result.

The following example demonstrates the use of `fromsimilarpredictions`:

```
meta:
    title: "Test similarity cuts: Threshold 1,2"
    author: Zahari Kassabov
    keywords: [test]

show_total: True

NNLODatasts: &NNLODatasts
    - {dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, frac: 0.75, variant: legacy} # N

NLODatasts: &NLODatasts
    - {dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, frac: 1.0, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, frac: 0.75, variant: legacy} # N
    - {dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, frac: 0.75, variant: legacy} # N

do_not_require_similarity_for: [ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM]

dataset_inputs: *NLODatasts

cuts_intersection_spec:
    - theoryid: 208
      pdf: NNPDF40_nlo_as_01180
      dataset_inputs: *NLODatasts

    - theoryid: 200
      pdf: NNPDF40_nnlo_as_01180
      dataset_inputs: *NNLODatasts

theoryid: 208
pdf: NNPDF40_nlo_as_01180

dataspecs:
    - use_cuts: internal
      speclabel: "No cuts"

    - cut_similarity_threshold: 2
      speclabel: "Threshold 2"
      use_cuts: fromsimilarpredictions

    - cut_similarity_threshold: 1
      speclabel: "Threshold 1"
      use_cuts: fromsimilarpredictions

template_text: |
    {@dataspecs_chi2_table@}

actions_:
    - report(main=True)
```