How to add a new metadata group
In Data specification it is described how a user can define a custom
grouping at the level of the runcard, by specifying custom_group in each
dataset_input and then specifying metadata_group=custom_group. This is
great for testing, but what if you define a grouping that you want to reuse and
you want others to be able to use it easily as well?
The answer is to add that grouping to the metadata of the datasets. Once the code is reinstalled, the custom grouping will always be available.
Step 1: Choose a name for your grouping
The hardest part of any scientific research is choosing a name which is both
descriptive and reflects your wit and intelligence. Failing the latter,
try to choose a name which is suitably unique and possibly allows for newer
iterations in the future. An example of this would be the name given to the
grouping which groups data by the process type, according to the
theory uncertainties paper: nnpdf31_process.
The suffix of ‘process’ indicates what the grouping is related to and the prefix
indicates that our hubris hasn’t prevented us from permitting the possibility
that we can come up with a more
refined version of this grouping in the future, which can have its own prefix.
Remember that each group within the grouping will also need its own name, so try not to exhaust yourself too much before coming up with the group names as well.
The metadata file is your canvas but with great power comes great responsibility.
Step 2: Add the grouping and group name to the metadata (PLOTTING) file
Note
Throughout this section the PLOTTING file may be referred to as the metadata.
Once you have a name for your grouping and a name for each of your groups you
can add these as key-value pairs to the PLOTTING files of each dataset for
which you want to be able to apply this grouping. The PLOTTING files are
found in the Git repository in nnpdfcpp/data/commondata and follow the naming
convention PLOTTING_<DATASET NAME>.yaml.
It’s a good idea to add this to the PLOTTING file of all datasets if possible, or else anybody who tries to use your grouping in the future is sure to get very frustrated.
Say I called my grouping nnpdf40_process and the group which a particular
dataset belongs to is DIS NC. I would then add the following to the PLOTTING file:

    nnpdf40_process: DIS NC
At this stage it’s worth pointing out that it’s best that both grouping and group names are naturally interpretable as a string.
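Adding the key by hand to many PLOTTING files is tedious, so a small script can help. The following is a minimal sketch, not part of the code base: the helper name add_grouping_key and the file PLOTTING_EXAMPLE.yaml are hypothetical, and it assumes the new key is a plain top-level key that can safely be appended as text.

```python
import tempfile
from pathlib import Path


def add_grouping_key(plotting_file, grouping, group):
    """Append `grouping: group` to a PLOTTING file unless the key is already set.

    Hypothetical helper. Returns True if the file was modified.
    """
    path = Path(plotting_file)
    text = path.read_text()
    # Assumption: a top-level key starts at column zero and is followed by a colon.
    if any(line.startswith(f"{grouping}:") for line in text.splitlines()):
        return False
    # Make sure the new key starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + f"{grouping}: {group}\n")
    return True


# Demonstration on a throwaway file rather than on the real commondata folder.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "PLOTTING_EXAMPLE.yaml"
    example.write_text("dataset_label: Example dataset\n")
    changed = add_grouping_key(example, "nnpdf40_process", "DIS NC")
    result = example.read_text()
```

Each dataset will generally need its own group name, so in practice the group would be looked up per dataset rather than hard-coded as it is here.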
Step 3: Make sure the grouping is parsed from the metadata
A perhaps surprising step in this tutorial is that you must tell the part of the code which parses the metadata about your new key or else it will get very upset.
This involves adding an attribute related to the new key to
validphys.plotoptions.core.PlottingOptions.
There are pre-existing examples that you can use as templates.
For the example used above I would have to add:

    nnpdf40_process: str

which tells the code to look for an nnpdf40_process key within the metadata file and
to attempt to parse it as a string. We do not assign a default value to this new key,
which implies that it must be provided within the metadata file.
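To see why an annotation without a default makes the key mandatory, here is a minimal sketch using a standard-library dataclass. ToyPlottingOptions and parse_metadata are hypothetical stand-ins written for illustration; the real PlottingOptions class and its parser may work differently.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class ToyPlottingOptions:
    # Hypothetical stand-in, not the real validphys class.
    nnpdf40_process: str                 # no default: the key is required
    dataset_label: Optional[str] = None  # has a default: the key is optional


def parse_metadata(metadata):
    """Build ToyPlottingOptions from a dict, reporting missing required keys."""
    required = [
        f.name
        for f in fields(ToyPlottingOptions)
        if f.default is MISSING and f.default_factory is MISSING
    ]
    missing = [name for name in required if name not in metadata]
    if missing:
        raise KeyError(f"missing required metadata keys: {missing}")
    return ToyPlottingOptions(**metadata)


options = parse_metadata({"nnpdf40_process": "DIS NC"})
```

Calling parse_metadata on a dictionary without nnpdf40_process raises a KeyError in this sketch, which mirrors the requirement that the key be present in every metadata file.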
In addition to this, you must add the new grouping to
validphys.plotoptions.core.PlotInfo as a keyword argument of
the __init__ function and subsequently as an attribute of the class
as follows:
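
    class PlotInfo:
        def __init__(
            self,
            kinlabels,
            dataset_label,
            *,
            # ... other keyword-only arguments ...
            nnpdf40_process,
            # ... other keyword-only arguments ...
        ):
            # ... other attribute assignments ...
            self.nnpdf40_process = nnpdf40_process
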
The keyword argument must be placed after the asterisk, which in standard Python
syntax marks every following argument as keyword-only.
Note
It is possible to give a default value by setting a default in the
signature of the function. If you do not set a default then every single dataset
must have that key in its metadata. You may observe that experiment
and nnpdf31_process are required keys. Any dataset which does not
feature these keys in its metadata can be considered broken or not fully
implemented.
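The difference between a required and an optional keyword-only argument can be sketched with a toy class. ToyPlotInfo is a hypothetical, stripped-down stand-in for PlotInfo, and giving nnpdf40_process a default of None is an illustrative choice rather than what the code base does.

```python
class ToyPlotInfo:
    """Hypothetical stand-in showing keyword-only arguments."""

    def __init__(
        self,
        kinlabels,
        dataset_label,
        *,
        nnpdf31_process,       # required: no default in the signature
        nnpdf40_process=None,  # optional: datasets may omit this key
    ):
        self.kinlabels = kinlabels
        self.dataset_label = dataset_label
        self.nnpdf31_process = nnpdf31_process
        self.nnpdf40_process = nnpdf40_process


# The new key can be left out because the signature supplies a default.
info = ToyPlotInfo(["k1", "k2"], "Example dataset", nnpdf31_process="DIS NC")
```

Passing the process as a third positional argument, or omitting nnpdf31_process altogether, raises a TypeError, which is how a dataset with incomplete metadata would surface in this sketch.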
Step 4: Recompile, reinstall and profit
Now that everything is in place, you just need to recompile and reinstall the code,
which will put the updated metadata files in your environment. Following the
example used throughout, I can now specify metadata_group: nnpdf40_process
and any action which leverages the
metadata grouping mechanism will now group datasets by the new key.
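Conceptually, grouping by a metadata key amounts to collecting datasets that share the same value for that key. The sketch below illustrates the idea only: group_by_metadata is a hypothetical function, the dataset names are placeholders, and DIS CC is an illustrative second group name. It is not how the grouping mechanism is actually implemented.

```python
from collections import defaultdict


def group_by_metadata(datasets, metadata_group):
    """Map each group name to the list of datasets whose metadata carries it."""
    groups = defaultdict(list)
    for name, metadata in datasets.items():
        groups[metadata[metadata_group]].append(name)
    return dict(groups)


# Placeholder datasets, each with the new key in its metadata.
datasets = {
    "DATASET_A": {"nnpdf40_process": "DIS NC"},
    "DATASET_B": {"nnpdf40_process": "DIS NC"},
    "DATASET_C": {"nnpdf40_process": "DIS CC"},
}
grouped = group_by_metadata(datasets, "nnpdf40_process")
# grouped == {"DIS NC": ["DATASET_A", "DATASET_B"], "DIS CC": ["DATASET_C"]}
```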