Specifying data cuts
--------------------

The experimental ``CommonData`` files contain more data points than we
actually fit. Some data points are excluded for reasons such as the
instability of the perturbative expansion in their corresponding
kinematic regions.
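
As a rough illustration of what a cut mask is, the sketch below filters a few toy deep-inelastic points by requiring Q² and W² to exceed minimum values, which is the role played by the ``q2min`` and ``w2min`` parameters described further down. The helper name, the toy kinematics, the numerical thresholds and the approximation W² ≈ Q²(1 − x)/x (proton mass neglected) are all assumptions made for this example. The rules actually applied by ``vp-setupfit`` are more extensive and depend on the dataset.

```python
import numpy as np


def toy_kinematic_mask(x, q2, q2min, w2min):
    """Return the indices of the points passing the toy cuts.

    Hypothetical helper for illustration only; it is not part of validphys.
    """
    x = np.asarray(x, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    # Approximate invariant mass squared of the hadronic final state
    w2 = q2 * (1 - x) / x
    passed = (q2 > q2min) & (w2 > w2min)
    return np.flatnonzero(passed)


# Toy kinematics: (x, Q2) for four data points
x = [0.01, 0.5, 0.8, 0.1]
q2 = [2.0, 20.0, 4.0, 20.0]

# Illustrative threshold values, in GeV^2
mask = toy_kinematic_mask(x, q2, q2min=3.49, w2min=12.5)
print(mask)
```

Point 0 fails the Q² requirement and point 2 fails the W² requirement, so only the points with indices 1 and 3 are kept.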

There are five possibilities for handling the experimental cuts
within validphys, which are controlled with the ``use_cuts``
configuration setting:

``use_cuts: 'nocuts'``
  * This causes the content of the data files to be taken unmodified.
  Note that some theory predictions may be ill defined in this
  situation.

``use_cuts: 'fromfit'``
  * The cuts are read from the masks generated by [``vp-setupfit``](scripts.html) and
  given as input to [``n3fit``](../n3fit/index.html). An existing fit is required to
  load the cuts from, and it must contain the masks for all the datasets analyzed in
  the active namespace.

``use_cuts: 'internal'``
  * Compute the cut masks as ``vp-setupfit`` would do. Currently the
  parameters ``q2min`` and ``w2min`` must be given. These can in turn be
  set to the values used in a fit by loading the ``datacuts``
  namespace from that fit. In this case, the cuts will normally
  coincide with the ones loaded with the ``fromfit`` setting.

``use_cuts: 'fromintersection'``
  * Compute the internal cuts as per ``use_cuts: 'internal'``
  within each namespace in a [namespace list](#multiple-inputs-and-namespaces) called
  ``cuts_intersection_spec`` and take the intersection of the results as
  the cuts for the given dataset. This is useful for example for
  requiring the common subset of points that pass the cuts at NLO and
  NNLO.

``use_cuts: 'fromsimilarpredictions'``
  * Compute the intersection between two namespaces (as for
  ``fromintersection``) but additionally require that the predictions computed for
  each dataset across the namespaces are *similar*: specifically, the ratio
  between the absolute difference in the predictions and the total experimental
  uncertainty must be smaller than a given value, ``cut_similarity_threshold``, which
  must be provided. Note that for this to work with different C-factors across
  the namespaces, one must provide a different ``dataset_inputs`` list for each.
  * This mechanism can be ignored selectively for specific datasets. To do
  that, add their names to a list called ``do_not_require_similarity_for``. The
  datasets in the list do not need to appear in the ``cuts_intersection_spec``
  namespace and will be filtered according to the internal cuts unconditionally.
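
The logic of the last two options can be sketched with plain NumPy arrays. This is a minimal illustration and not the validphys implementation. It assumes that a mask is an array of indices of the points that are kept, and that the predictions and the total experimental uncertainty are available as one number per data point. The function names and all numerical values are hypothetical.

```python
import numpy as np


def intersect_masks(masks):
    """Indices that pass the cuts in every namespace (toy helper)."""
    result = np.asarray(masks[0])
    for mask in masks[1:]:
        result = np.intersect1d(result, mask)
    return result


def similar_predictions_mask(pred_a, pred_b, exp_unc, threshold):
    """Indices where |pred_a - pred_b| / exp_unc is below the threshold."""
    diff = np.abs(np.asarray(pred_a) - np.asarray(pred_b))
    ratio = diff / np.asarray(exp_unc)
    return np.flatnonzero(ratio < threshold)


# Toy internal cuts computed in two namespaces (e.g. NLO and NNLO)
nlo_mask = np.array([0, 1, 2, 4])
nnlo_mask = np.array([1, 2, 3, 4])
common = intersect_masks([nlo_mask, nnlo_mask])

# Toy predictions and total experimental uncertainties for five points
nlo_pred = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
nnlo_pred = np.array([1.1, 2.5, 3.1, 4.0, 8.0])
exp_unc = np.array([1.0, 1.0, 0.04, 1.0, 2.0])

# Intersection cuts, further restricted by the similarity requirement
cuts_threshold_2 = np.intersect1d(
    common, similar_predictions_mask(nlo_pred, nnlo_pred, exp_unc, 2)
)
cuts_threshold_1 = np.intersect1d(
    common, similar_predictions_mask(nlo_pred, nnlo_pred, exp_unc, 1)
)
print(common, cuts_threshold_2, cuts_threshold_1)
```

In this toy case the intersection keeps points 1, 2 and 4. Point 2 has a ratio of about 2.5 and is removed at either threshold, while point 4 has a ratio of 1.5 and survives only with the looser threshold of 2.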


The following example demonstrates the first four options:

```yaml
meta:
    title: Test the various options for CutsPolicy
    author: Zahari Kassabov
    keywords: [test, debug]

fit: NNPDF40_nlo_as_01180

theory:
    from_: fit

theoryid:
    from_: theory

# Load q2min and w2min from the fit
datacuts:
    from_: fit


# Used for intersection cuts
cuts_intersection_spec:
    - theoryid: 208
    - theoryid: 162

dataset_input: {dataset: ATLASDY2D8TEV}

dataspecs:
  - speclabel: "No cuts"
    use_cuts: "nocuts"

  - speclabel: "Fit cuts"
    use_cuts: "fromfit"

  - speclabel: "Internal cuts"
    use_cuts: "internal"

  - speclabel: "Intersected cuts"
    use_cuts: "fromintersection"

template_text: |
    {@with fitpdf::datacuts@}
    # Plot

    {@fitpdf::datacuts plot_fancy_dataspecs@}

    # χ² plots

    {@with dataspecs@}
    ## {@speclabel@}

    {@plot_chi2dist@}

    {@endwith@}
    {@endwith@}


actions_:
    - report(main=True)
```

Here we combine the results obtained with the different filtering policies
in a single [data-theory comparison](data-theory-comp) plot and then plot the χ² distribution
for each policy individually. With these settings the latter three
[dataspecs](#general-data-specification-the-dataspec-api) give the
same result.

The following example demonstrates the use of `fromsimilarpredictions`:

```yaml
meta:
    title: "Test similarity cuts: Threshold 1,2"
    author: Zahari Kassabov
    keywords: [test]

show_total: True

NNLODatasts: &NNLODatasts
- {dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}       # N
- {dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}      # N
- {dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, frac: 1.0, variant: legacy}            # N
- {dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, frac: 1.0, variant: legacy}         # N
- {dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, frac: 0.75, variant: legacy}         # N

NLODatasts: &NLODatasts
- {dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}       # N
- {dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}      # N
- {dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, frac: 1.0, variant: legacy}            # N
- {dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, frac: 1.0, variant: legacy}         # N
- {dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, frac: 0.75, variant: legacy}         # N
- {dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, frac: 0.75, variant: legacy}      # N

do_not_require_similarity_for: [ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM]


dataset_inputs: *NLODatasts

cuts_intersection_spec:
    - theoryid: 208
      pdf: NNPDF40_nlo_as_01180
      dataset_inputs: *NLODatasts

    - theoryid: 200
      pdf: NNPDF40_nnlo_as_01180
      dataset_inputs: *NNLODatasts


theoryid: 208
pdf: NNPDF40_nlo_as_01180

dataspecs:

    - use_cuts: internal
      speclabel: "Internal cuts only"


    - cut_similarity_threshold: 2
      speclabel: "Threshold 2"
      use_cuts: fromsimilarpredictions


    - cut_similarity_threshold: 1
      speclabel: "Threshold 1"
      use_cuts: fromsimilarpredictions

template_text: |
    {@dataspecs_chi2_table@}

actions_:
    - report(main=True)
```