Specifying data cuts

The experimental CommonData files contain more data points than we actually fit. Some data points are excluded for reasons such as the instability of the perturbative expansion in their corresponding kinematic regions.

There are five possibilities for handling the experimental cuts within validphys, which are controlled with the use_cuts configuration setting:

use_cuts: 'nocuts'

The content of the data files is taken unmodified. Note that some theory predictions may be ill-defined in this situation.

use_cuts: 'fromfit'

The cuts are read from the masks that vp-setupfit generates and that are given as input to n3fit. An existing fit is required to load the cuts, and it must contain the masks for all the datasets analyzed in the active namespace.

use_cuts: 'internal'

Compute the cut masks as vp-setupfit would do. Currently the parameters q2min and w2min must be given. These can be set to the fit values by loading the datacuts namespace from the fit. In this case, the cuts will normally coincide with the ones loaded with the fromfit setting.
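
As a rough illustration of what a mask built from q2min and w2min looks like, the following Python sketch keeps the indices of the points that pass both thresholds. It is not the validphys implementation: the `Point` class, the `internal_mask` helper, the DIS relation W² = Q²(1 − x)/x and the strict inequalities are all assumptions made for the example, the threshold values are illustrative, and the actual rules applied by vp-setupfit are more extensive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for the kinematics of one data point."""
    x: float   # Bjorken x
    q2: float  # Q^2 in GeV^2


def w2(point):
    # Assumed DIS relation, neglecting the target mass.
    return point.q2 * (1.0 - point.x) / point.x


def internal_mask(points, q2min, w2min):
    """Return the indices of the points that pass both thresholds."""
    return [
        i for i, p in enumerate(points)
        if p.q2 > q2min and w2(p) > w2min
    ]


points = [
    Point(x=0.10, q2=2.0),   # Q^2 below q2min
    Point(x=0.90, q2=50.0),  # W^2 is about 5.6, below w2min
    Point(x=0.01, q2=10.0),  # W^2 = 990, passes both
]
# Illustrative threshold values, not read from any fit
mask = internal_mask(points, q2min=3.49, w2min=12.5)
print(mask)
```

Only the last point survives here, so the mask contains the single index 2.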

use_cuts: 'fromintersection'

Compute the internal cuts as per use_cuts: 'internal' within each namespace in a [namespace list](#multiple-inputs-and-namespaces) called cuts_intersection_spec and take the intersection of the results as the cuts for the given dataset. This is useful, for example, to keep only the points that pass the cuts at both NLO and NNLO.
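
The intersection step can be pictured with a short Python sketch. It assumes that each mask is represented as a list of surviving point indices for one dataset; the `intersect_masks` helper and the index values are hypothetical and only illustrate the idea, not the validphys code.

```python
from functools import reduce


def intersect_masks(masks):
    """Return the sorted indices that are present in every mask."""
    if not masks:
        raise ValueError("need at least one mask")
    common = reduce(set.intersection, (set(m) for m in masks))
    return sorted(common)


# Hypothetical surviving indices for one dataset, one mask per namespace
nlo_mask = [0, 1, 2, 4, 5]
nnlo_mask = [1, 2, 3, 5]

common = intersect_masks([nlo_mask, nnlo_mask])
print(common)
```

A point is kept only if it survives in every namespace of the list, so adding more namespaces can only shrink the result.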

use_cuts: 'fromsimilarpredictions'

Compute the intersection between two namespaces (as for fromintersection), but additionally require that the predictions computed for each dataset across the namespaces are similar. Specifically, the ratio between the absolute difference in the predictions and the total experimental uncertainty must be smaller than a given value, cut_similarity_threshold, which must be provided. Note that for this to work with different C-factors across the namespaces, one must provide a different dataset_inputs list for each.
This mechanism can be ignored selectively for specific datasets. To do that, add their names to a list called do_not_require_similarity_for. The datasets in the list do not need to appear in the cuts_intersection_spec namespace and will be filtered according to the internal cuts unconditionally.
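
The criterion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the validphys implementation: the function names, the representation of masks as index lists and all numerical values are hypothetical.

```python
def similar_predictions_mask(common_mask, preds_a, preds_b, total_unc, threshold):
    """Keep the indices where |a - b| / uncertainty is below the threshold."""
    return [
        i for i in common_mask
        if abs(preds_a[i] - preds_b[i]) / total_unc[i] < threshold
    ]


def cuts_for_dataset(name, internal_mask, common_mask, preds_a, preds_b,
                     total_unc, threshold, exempt=()):
    """Apply the similarity cut unless the dataset is exempt from it."""
    if name in exempt:
        # Exempt datasets are filtered by the internal cuts only.
        return list(internal_mask)
    return similar_predictions_mask(
        common_mask, preds_a, preds_b, total_unc, threshold
    )


# Hypothetical predictions in two namespaces and total uncertainties
preds_a = [10.0, 20.0, 30.0, 40.0]
preds_b = [10.5, 23.0, 30.0, 48.0]
total_unc = [1.0, 2.0, 3.0, 1.0]  # ratios for indices 0, 1, 2: 0.5, 1.5, 0.0
common_mask = [0, 1, 2]
internal_mask = [0, 1, 2, 3]

loose = cuts_for_dataset("A", internal_mask, common_mask,
                         preds_a, preds_b, total_unc, threshold=2)
tight = cuts_for_dataset("A", internal_mask, common_mask,
                         preds_a, preds_b, total_unc, threshold=1)
exempt = cuts_for_dataset("B", internal_mask, common_mask,
                          preds_a, preds_b, total_unc, threshold=1,
                          exempt=("B",))
print(loose, tight, exempt)
```

Lowering the threshold removes the point whose ratio is 1.5, while the exempt dataset keeps every point that passes the internal cuts regardless of the threshold.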

The following example demonstrates the first four options:

meta:
    title: Test the various options for CutsPolicy
    author: Zahari Kassabov
    keywords: [test, debug]

fit: NNPDF40_nlo_as_01180

theory:
    from_: fit

theoryid:
    from_: theory

#Load q2min and w2min from the fit
datacuts:
    from_: fit


# Used for intersection cuts
cuts_intersection_spec:
    - theoryid: 40_000_001
    - theoryid: 40_000_000

dataset_input: {dataset: ATLASDY2D8TEV}

dataspecs:
  - speclabel: "No cuts"
    use_cuts: "nocuts"

  - speclabel: "Fit cuts"
    use_cuts: "fromfit"

  - speclabel: "Internal cuts"
    use_cuts: "internal"

  - speclabel: "Intersected cuts"
    use_cuts: "fromintersection"

template_text: |
    {@with fitpdf::datacuts@}
    # Plot

    {@fitpdf::datacuts plot_fancy_dataspecs@}

    # χ² plots

    {@with dataspecs@}
    ## {@speclabel@}

    {@plot_chi2dist@}

    {@endwith@}
    {@endwith@}


actions_:
    - report(main=True)

Here we combine the results obtained with the different filtering policies in a single [data-theory comparison](data-theory-comp) plot and then plot the χ² distribution for each one individually. With these settings the last three [dataspecs](#general-data-specification-the-dataspec-api) give the same result.

The following example demonstrates the use of fromsimilarpredictions:

meta:
    title: "Test similarity cuts: Threshold 1,2"
    author: Zahari Kassabov
    keywords: [test]

show_total: True

NNLODatasts: &NNLODatasts
- {dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}       # N
- {dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}      # N
- {dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, frac: 1.0, variant: legacy}            # N
- {dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, frac: 1.0, variant: legacy}         # N
- {dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, frac: 0.75, variant: legacy}         # N

NLODatasts: &NLODatasts
- {dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}       # N
- {dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, frac: 1.0, variant: legacy}      # N
- {dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, frac: 1.0, variant: legacy}            # N
- {dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, frac: 1.0, variant: legacy}         # N
- {dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, frac: 0.75, variant: legacy}         # N
- {dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, frac: 0.75, variant: legacy}      # N

do_not_require_similarity_for: [ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM]


dataset_inputs: *NLODatasts

cuts_intersection_spec:
    - theoryid: 208
      pdf: NNPDF40_nlo_as_01180
      dataset_inputs: *NLODatasts

    - theoryid: 200
      pdf: NNPDF40_nnlo_as_01180
      dataset_inputs: *NNLODatasts


theoryid: 208
pdf: NNPDF40_nlo_as_01180

dataspecs:

    - use_cuts: internal
      speclabel: "Internal cuts"


    - cut_similarity_threshold: 2
      speclabel: "Threshold 2"
      use_cuts: fromsimilarpredictions


    - cut_similarity_threshold: 1
      speclabel: "Threshold 1"
      use_cuts: fromsimilarpredictions

template_text: |
    {@dataspecs_chi2_table@}

actions_:
    - report(main=True)