Add basic support for variable mappings #1124

Merged
merged 50 commits into from
Jun 9, 2021
Changes from 48 commits
154ab4a
Add basic support for variable mappings
May 12, 2021
363e75c
Move get_variable_mappings to _config
May 14, 2021
889173e
Add handling of mip and short_name to get_variable_mappings
May 14, 2021
204bd13
Move to new directory layout with importlib_resources
May 16, 2021
1b5fbd1
Introduce deep_update functionality
May 16, 2021
4cbb17b
Fix dataset handling
May 16, 2021
90ba75f
Use lowercase for project in filename
May 16, 2021
0cf7ad7
Allow for empty var_mapping to support existing fixes
May 16, 2021
3d5cbd8
Return empty dict instead of None to signal "no mappings"
May 17, 2021
97b0243
Change conditional import to work around mypy bug python/mypy#1153
May 17, 2021
3df52f5
Add importlib_resources to doc requirements
May 17, 2021
160e359
Improve code quality
May 17, 2021
69d7301
Add user config directory
May 17, 2021
c68aaf2
Move project variable mappings handling out of Recipe class
May 17, 2021
037422e
Add rudimentary docstring
May 17, 2021
b842a0b
Use variable details instead of variable mappings for better terminology
May 17, 2021
66a4480
Address renaming and logging suggestions
Jun 1, 2021
449daa6
Pass extra_facets through recipe to allow for easy customization
Jun 1, 2021
801c97b
Pre-commit changes
Jun 1, 2021
0624c7d
Pass extra_facets also to fx variables
Jun 1, 2021
3ff7376
Pre-commit changes
Jun 1, 2021
19bd7fb
Rename for consistency
Jun 3, 2021
857b726
Pre-commit changes
Jun 3, 2021
8037ec6
Add extra_facets_dir option to config_user.yml
Jun 3, 2021
b26362c
Add validator for new config option to experimental interface
Jun 3, 2021
2016c8b
Add mapping_key to get_cube_from_list for fixes
Jun 3, 2021
f8db25e
Simplify generation of tuple validator
Jun 4, 2021
5cf38c3
Pass entire variable dict to fix and add_fx_variables instead of only…
Jun 4, 2021
1bb84e6
Don't check for exact argument match if the preprocessor takes *args …
Jun 4, 2021
7f60151
Fix recipe tests to check agains new, more comprehensive dicts
Jun 4, 2021
384d249
Remove extra_facets_dir from example config-user.yml file
Jun 4, 2021
6dae6ad
Add basic documentation
Jun 4, 2021
6a66072
Complete documentation
Jun 4, 2021
783298f
Fix test failing because of coverage upload
bouweandela Jun 7, 2021
6dd52f8
Remove dubious caching
Jun 8, 2021
0a73a18
Add docstrings
Jun 8, 2021
ccf622e
Minor improvements
Jun 8, 2021
686c723
Add basic test for _deep_update
Jun 8, 2021
51d7f76
Add basic tests for _load_extra_facets
Jun 8, 2021
d8a42e3
Simplify handling of fx vars
Jun 8, 2021
75adca1
Fix mypy issues
Jun 9, 2021
004cafd
Remove mapping_key
Jun 9, 2021
f888f8d
Fix fx preprocessor test
Jun 9, 2021
79915cd
Improve formatting
Jun 9, 2021
731794e
Moving extra facet documentation to better places
Jun 9, 2021
3d362d0
Handle extra_facets as dictionary instead of kwargs where possible
Jun 9, 2021
99d5cfa
Add empty defaults to extra_facets to keep tests working
Jun 9, 2021
78e6e9e
Use better default for extra_facets in method signatures
Jun 9, 2021
9e24515
Update documentation with backlinks to main description
Jun 9, 2021
85918f2
Merge branch 'main' into variable-mappings
Jun 9, 2021
1 change: 1 addition & 0 deletions .circleci/config.yml
@@ -47,6 +47,7 @@ jobs:
      - coverage-reporter/send_report:
          coverage-reports: 'test-reports/coverage.xml'
          project-token: $CODACY_PROJECT_TOKEN
          skip: true # skip if project-token is not defined (i.e. on a fork)

  install:
    # Test installation
16 changes: 16 additions & 0 deletions doc/develop/fixing_data.rst
@@ -353,3 +353,19 @@ For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversio
For monthly data (V220), the data must be postfixed with the date, i.e. rename ``global_monthly_050deg.nc`` to ``global_monthly_050deg_197901-201710.nc``

For more info: http://www.gloh2o.org/

.. _extra-facets-fixes:

Use of extra facets in fixes
============================
In fixes, extra facets can be used to mold data into the form required by the
applicable standard. For example, suppose the input data is part of an
observational product that delivers surface temperature under the variable name
``t2m`` in a file named ``2m_temperature_1950_monthly.nc``, while the applicable
standard calls the same variable ``tas``. A fix can then read the original
variable from the correct file and provide it under the standard name to the
rest of the processing chain.

Normally, the applicable standard for variables is CMIP6.

For more details, refer to existing uses of this feature as examples.
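
The renaming idea described above can be sketched without any ESMValCore
machinery. The snippet below is only an illustration under stated assumptions:
`rename_from_extra_facets` is a hypothetical helper, not part of the ESMValCore
fix API, and the file content is represented by a plain dictionary instead of
cubes read from a netCDF file.

```python
def rename_from_extra_facets(file_variables, short_name, extra_facets):
    """Return the data for ``short_name`` using the name stored in the file.

    Hypothetical helper for illustration only; real fixes operate on cubes.
    """
    # Fall back to the standard name if no source name is configured.
    source_name = extra_facets.get("source_var_name", short_name)
    if source_name not in file_variables:
        raise KeyError(f"variable {source_name!r} not found in file")
    return {short_name: file_variables[source_name]}


# Stand-in for the content of 2m_temperature_1950_monthly.nc (made-up values).
file_variables = {"t2m": [271.3, 272.8, 275.1]}
extra_facets = {"source_var_name": "t2m", "cds_var_name": "2m_temperature"}

renamed = rename_from_extra_facets(file_variables, "tas", extra_facets)
print(renamed)  # {'tas': [271.3, 272.8, 275.1]}
```

The point of the sketch is that the fix itself contains no hard-coded variable
name: the mapping from ``t2m`` to ``tas`` comes entirely from the extra facets.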
70 changes: 70 additions & 0 deletions doc/quickstart/configure.rst
@@ -320,3 +320,73 @@ following documentation section:

These four items here are named people, references and projects listed in the
``config-references.yml`` file.

.. _extra_facets:

Extra Facets
============

Sometimes it is useful to provide extra information for the loading of data.
This applies in particular to native model data and to observational or other
data that generally follow the established standards but are not part of the
big supported projects like CMIP, CORDEX, or obs4MIPs.

To support this, we provide the extra facets facilities. Facets are the
key-value pairs described in :ref:`Datasets`. Extra facets allow for the
addition of more details per project, dataset, mip table, and variable name.

More precisely, one can provide this information in an extra yaml file named
``{project}-something.yml``, where ``{project}`` corresponds to the project as
used by ESMValTool in :ref:`Datasets`, written in lower case, and "something"
is arbitrary.

Format of the extra facets files
--------------------------------
The extra facets are given in a yaml file whose file name identifies the
project. Inside the file there is a hierarchy of nested dictionaries with the
following levels: at the top is the ``dataset`` facet, followed by the ``mip``
table, and finally the ``short_name``. The leaf dictionary at this last level
gives the extra facets that will be made available to the data finder and the
fix infrastructure. The following example illustrates the concept.

.. _extra-facets-example-1:

.. code-block:: yaml
   :caption: Extra facet example file `native6-era5.yml`

   ERA5:
     Amon:
       tas: {source_var_name: "t2m", cds_var_name: "2m_temperature"}
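
To make the nesting concrete, here is a minimal Python sketch of how the
content of such a file can be queried once it has been parsed. The dictionary
mirrors the YAML example, `lookup_extra_facets` is an illustrative name, and
the chained `.get` calls follow the pattern used by `get_extra_facets` in this
pull request.

```python
# Parsed content of the example file, written out as a Python dict.
ERA5_FACETS = {
    "ERA5": {
        "Amon": {
            "tas": {"source_var_name": "t2m", "cds_var_name": "2m_temperature"},
        },
    },
}


def lookup_extra_facets(facets, dataset, mip, short_name):
    """Walk dataset -> mip -> short_name; return {} if any level is missing."""
    return facets.get(dataset, {}).get(mip, {}).get(short_name, {})


found = lookup_extra_facets(ERA5_FACETS, "ERA5", "Amon", "tas")
missing = lookup_extra_facets(ERA5_FACETS, "ERA5", "Amon", "pr")
print(found["source_var_name"])  # t2m
print(missing)                   # {}
```

A missing dataset, mip table, or variable yields an empty dictionary, so
callers do not need to distinguish "no file" from "no entry".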


Location of the extra facets files
----------------------------------
Extra facets files can be placed in several different places. When we use them
to support a particular use case within the ESMValTool project, they will be
provided in the sub-folder ``extra_facets`` inside the package
``esmvalcore._config``. Users can place their own files either in
``~/.esmvaltool/extra_facets`` or in any other directory of their choosing. In
the latter case, the directory must be added to the ``config-user.yml`` file
under the ``extra_facets_dir`` setting, which can take a single directory or a
list of directories.

The directories are searched in the following order:

1. The internal directory ``esmvalcore._config/extra_facets``
2. The default user directory ``~/.esmvaltool/extra_facets``
3. The custom user directories in the order in which they are given in
   ``config-user.yml``.

The extra facets files within each of these directories are processed in
lexicographical order according to their file name.

In all cases, information from earlier files can be superseded by later files.
This makes it possible for the user to override even internal default facets,
for example to deal with local particularities in the data handling.
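
The override behaviour can be illustrated with a small, self-contained sketch.
The `deep_update` function below uses the same logic as `_deep_update` in this
pull request; the file names and the `t2m_local` value are made up for the
example, and the parsed YAML content is written directly as dictionaries.

```python
import collections.abc


def deep_update(dictionary, update):
    """Recursively merge ``update`` into ``dictionary`` and return it."""
    for key, value in update.items():
        if isinstance(value, collections.abc.Mapping):
            dictionary[key] = deep_update(dictionary.get(key, {}), value)
        else:
            dictionary[key] = value
    return dictionary


# Hypothetical parsed files, keyed by file name, one dict per directory.
internal = {
    "native6-era5.yml": {
        "ERA5": {"Amon": {"tas": {"source_var_name": "t2m",
                                  "cds_var_name": "2m_temperature"}}},
    },
}
user = {
    "native6-local.yml": {
        "ERA5": {"Amon": {"tas": {"cds_var_name": "t2m_local"}}},
    },
}

merged = {}
for directory in (internal, user):       # directories in search order
    for file_name in sorted(directory):  # lexicographical order within each
        deep_update(merged, directory[file_name])

print(merged["ERA5"]["Amon"]["tas"])
# {'source_var_name': 't2m', 'cds_var_name': 't2m_local'}
```

Only the key that the later file redefines is replaced; `source_var_name` from
the internal file survives because the merge descends into nested dictionaries
instead of replacing them wholesale.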

Use of extra facets
-------------------
For extra facets to be useful, the information that they provide must be
applied. There are fundamentally two places where this comes into play. One is
:ref:`the datafinder<extra-facets-data-finder>`, the other is
:ref:`fixes<extra-facets-fixes>`.
30 changes: 30 additions & 0 deletions doc/quickstart/find_data.rst
@@ -303,3 +303,33 @@ flexible concatenation between two cubes, depending on the particular setup:
Note that two cube concatenation is the base operation of an iterative process of reducing multiple cubes
from multiple data segments via cube concatenation ie if there is no time-overlapping data, the
cubes concatenation is performed in one step.

.. _extra-facets-data-finder:

Use of extra facets in the datafinder
=====================================
Extra facets can be used to locate data files within the datafinder
framework. This is useful to build paths for directory structures and file names
that follow a different system than the established DRS for, e.g., CMIP.
A common application is locating variables in multi-variable files, as often
found in climate models' native output formats.

Another use case is files whose file names use a different name for a variable
than the netCDF4 variable name inside the file.

To apply the extra facets for this purpose, simply use the corresponding tag in
the applicable DRS inside the ``config-developer.yml`` file. For example, given
the extra facets in :ref:`extra-facets-example-1`, one might write the
following.

.. _extra-facets-example-2:

.. code-block:: yaml
   :caption: Example drs use in `config-developer.yml`

   native6:
     input_file:
       default: '{cds_var_name}*.nc'

The same replacement mechanism can be employed wherever tags can be used,
particularly in ``input_dir`` and ``input_file``.
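
Conceptually, tag replacement amounts to filling placeholders in a template
with facet values. The following is a simplified sketch and not the actual
datafinder implementation: `fill_template` is a hypothetical helper, and the
template `'{cds_var_name}*.nc'` is chosen here to match the `cds_var_name` key
from the ERA5 extra facets example.

```python
import fnmatch


def fill_template(template, facets):
    """Replace ``{tag}`` placeholders in a DRS template with facet values."""
    return template.format(**facets)


# Regular facets from the recipe merged with the extra facets for ERA5/Amon/tas.
facets = {
    "project": "native6",
    "dataset": "ERA5",
    "mip": "Amon",
    "short_name": "tas",
    "source_var_name": "t2m",
    "cds_var_name": "2m_temperature",
}

pattern = fill_template("{cds_var_name}*.nc", facets)
print(pattern)  # 2m_temperature*.nc

# The resulting glob pattern matches a file named like the ERA5 download.
print(fnmatch.fnmatch("2m_temperature_1950_monthly.nc", pattern))  # True
```

Because extra facets are merged into the same dictionary as the regular
facets, a template can mix both kinds of tags freely.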
1 change: 1 addition & 0 deletions doc/requirements.txt
@@ -1,6 +1,7 @@
autodocsumm
dask[array]
fiona
importlib_resources
jinja2
netCDF4
numpy
2 changes: 2 additions & 0 deletions esmvalcore/_config/__init__.py
@@ -3,6 +3,7 @@
    get_activity,
    get_institutes,
    get_project_config,
    get_extra_facets,
    load_config_developer,
    read_config_developer_file,
    read_config_user_file,
@@ -14,6 +15,7 @@
    'read_config_user_file',
    'read_config_developer_file',
    'load_config_developer',
    'get_extra_facets',
    'get_project_config',
    'get_institutes',
    'get_activity',
50 changes: 50 additions & 0 deletions esmvalcore/_config/_config.py
@@ -1,8 +1,11 @@
"""Functions dealing with config-user.yml / config-developer.yml."""
import collections.abc
import datetime
import logging
import os
import sys
import warnings
from functools import lru_cache
from pathlib import Path

import yaml
@@ -13,6 +16,46 @@

CFG = {}

if sys.version_info[:2] >= (3, 9):
    # pylint: disable=no-name-in-module
    from importlib.resources import files as importlib_files
else:
    from importlib_resources import files as importlib_files


def _deep_update(dictionary, update):
    """Recursively merge ``update`` into ``dictionary`` in place."""
    for key, value in update.items():
        if isinstance(value, collections.abc.Mapping):
            dictionary[key] = _deep_update(dictionary.get(key, {}), value)
        else:
            dictionary[key] = value
    return dictionary


@lru_cache
def _load_extra_facets(project, extra_facets_dir):
    """Load and merge all extra facets files for ``project``.

    ``extra_facets_dir`` must be hashable (a tuple) because of the cache.
    """
    config = {}
    # Search order: package defaults, default user directory, custom dirs.
    config_paths = [
        importlib_files("esmvalcore._config") / "extra_facets",
        Path.home() / ".esmvaltool" / "extra_facets",
    ]
    config_paths.extend([Path(p) for p in extra_facets_dir])
    for config_path in config_paths:
        config_file_paths = config_path.glob(f"{project.lower()}-*.yml")
        # Later files override earlier ones, so process in sorted order.
        for config_file_path in sorted(config_file_paths):
            logger.debug("Loading extra facets from %s", config_file_path)
            with config_file_path.open() as config_file:
                config_piece = yaml.safe_load(config_file)
            if config_piece:
                _deep_update(config, config_piece)
    return config


def get_extra_facets(project, dataset, mip, short_name, extra_facets_dir):
    """Read configuration files with additional variable information."""
    project_details = _load_extra_facets(project, extra_facets_dir)
    return project_details.get(dataset, {}).get(mip, {}).get(short_name, {})


def read_config_user_file(config_file, folder_name, options=None):
    """Read config user file and store settings in a dictionary."""
@@ -61,6 +104,7 @@ def read_config_user_file(config_file, folder_name, options=None):
        'output_file_type': 'png',
        'output_dir': 'esmvaltool_output',
        'auxiliary_data_dir': 'auxiliary_data',
        'extra_facets_dir': tuple(),
        'save_intermediary_cubes': False,
        'remove_preproc_dir': True,
        'max_parallel_tasks': None,
@@ -83,6 +127,12 @@ def read_config_user_file(config_file, folder_name, options=None):
    cfg['output_dir'] = _normalize_path(cfg['output_dir'])
    cfg['auxiliary_data_dir'] = _normalize_path(cfg['auxiliary_data_dir'])

    if isinstance(cfg['extra_facets_dir'], str):
        cfg['extra_facets_dir'] = (_normalize_path(cfg['extra_facets_dir']), )
    else:
        cfg['extra_facets_dir'] = tuple(
            _normalize_path(p) for p in cfg['extra_facets_dir'])

    cfg['config_developer_file'] = _normalize_path(
        cfg['config_developer_file'])
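
The hunk above turns `extra_facets_dir` into a tuple whether the user gives a
single directory or a list. One plausible reason, given the `@lru_cache`
decorator on `_load_extra_facets`, is that cached functions need hashable
arguments. The sketch below illustrates this under stated assumptions:
`normalize_path` is a stand-in whose behaviour is assumed (the real
`_normalize_path` is not shown in this diff), and `cached_loader` is a dummy
function, not the real loader.

```python
import os
from functools import lru_cache


def normalize_path(path):
    """Stand-in for ``_normalize_path`` (assumed behaviour, not shown in the diff)."""
    return os.path.abspath(os.path.expanduser(path))


def normalize_extra_facets_dir(value):
    """Accept a single directory or a list of directories; return a tuple."""
    if isinstance(value, str):
        return (normalize_path(value), )
    return tuple(normalize_path(p) for p in value)


@lru_cache(maxsize=None)
def cached_loader(project, extra_facets_dir):
    """Dummy cached function; only demonstrates the hashability requirement."""
    return len(extra_facets_dir)


single = normalize_extra_facets_dir("~/my_facets")
several = normalize_extra_facets_dir(["/data/facets_a", "/data/facets_b"])
print(cached_loader("native6", single))   # 1
print(cached_loader("native6", several))  # 2

try:
    cached_loader("native6", ["/data/facets_a"])  # a list is not hashable
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Without the explicit `str` check, a single directory string would be iterated
character by character, which is why the two cases are handled separately.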
