Handle IPSL-CM6 (#1153)
* Add basic support for variable mappings

* Add first era5 mapping

* Find files for CMIP6 DCPP startdates (#771)

* First attempt

* Do not require start and end years, add them later

* Correct condition

* Avoid key error in fx variables

* Consider two possible paths

* Fix function name

* Fix variable name

* Avoid duplicates in filename

* Add test for startdate expansion

* Add test for the replace tags method

* Rename tag

* Add documentation

* Allow to load subexps per timerange or as a whole

* Fix condition

* Remove 'all_years' functionality

* Fix conditions

* Fix flake

* Remove whitespace

Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>

* Skip regridding if the target grid is almost identical to the source grid (#507)

Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl>

* Fixes for sos and siconc of BCC models (#1090)

* sos and siconc fixed

* tests added

* test fixed

* fix flake8

* fix flake8

* fix codacy issue

* Pin cf-units and fix tests (cf-units>=2.1.5) (#1140)

* pin cf-units

* pin cf-units

* fix test

* fix test

* Handle IPSL-CM6 (the feature won't actually work without #1124)

* class Huss inherits from class Tas. Also: fix codacy diags.

* Replace os.system() by subprocess.run()

* Fix flake8 diags

* var_mapping -> extra_facets

* Rename _config/variable_details to _config/extra_facets

* Fix doc re. lack of 'output_file as a dict', and choice of native6

* Fix codacy diags in ipsl_cm6.py

* Use project IPSLCM to handle IPSL-CM6

* Implement changes according to Bouwe's review, 2021/06/07 (except unit tests)

* Add unit tests for _fixes/ipslcm/ipsl_cm6.py

* delete esmvalcore/cmor/_fixes/native6/ipsl_cm6.py

* Delete old file esmvalcore/_config/extra_facets/native6-ipsl-cm6-mappings.yml

* Restore main versions for _recipe.py and cmor_fixes/fix.py

* Restore main version for _recipe.py

* Delete extraneous era5-mappings.yml

* Avoid using mapping_key when calling fix.get_cube_from_list()

* Empty change in fix.py for forcing codacy to re-scan it

* Polish doc

* Polish doc again

* Again...

* and again ...

* Fix typo in comment

* Fixes according to @zklaus review

* Reduce formatting changes

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/quickstart/find_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/quickstart/find_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Minor formatting improvements

* Organize mapping file in each realm in two sections (CMIP6 and IPSL)

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>
Co-authored-by: sloosvel <45196700+sloosvel@users.noreply.github.com>
Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>
Co-authored-by: Benjamin Müller <b.mueller@iggf.geo.uni-muenchen.de>
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl>
Co-authored-by: Rémi Kazeroni <70641264+remi-kazeroni@users.noreply.github.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
9 people authored Jun 11, 2021
1 parent 6c63c1c commit 68d9ef8
Showing 13 changed files with 598 additions and 55 deletions.
80 changes: 62 additions & 18 deletions doc/develop/fixing_data.rst
@@ -1,32 +1,38 @@
.. _fixing_data:

***********
Dataset fix
Fixing data
***********

Some (model) datasets contain (known) errors that would normally prevent them
from being processed correctly by the ESMValCore. The errors can be in
The baseline case for ESMValCore input data is fully CMOR-compliant
data that is read using Iris' :func:`iris:iris.load_raw`.
ESMValCore also allows for some departures from compliance (see
:ref:`cmor_check_strictness`). Beyond that situation, some datasets
(either model or observations) contain (known) errors that would
normally prevent them from being processed. The issues can be in
the metadata describing the dataset and/or in the actual data.
Typical examples of such errors are missing or wrong attributes (e.g.
attribute ''units'' says 1e-9 but data are actually in 1e-6), missing or
mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
coordinate bounds like ''lat_bnds'') or problems with the actual data
(e.g. cloud liquid water only instead of sum of liquid + ice as specified by the CMIP data request).

The ESMValCore can apply on the fly fixes to datasets that have
known errors that can be fixed automatically.
As an extreme case, some data sources simply are not NetCDF
files and must go through some other data load function.

The ESMValCore can apply on-the-fly fixes to such datasets when
issues can be fixed automatically. This is implemented for a set
of `Natively supported non-CMIP datasets`_. The following provides
details on how to design such fixes.

.. note::
**CMORization as a fix**.
Support for many observational and reanalysis datasets is implemented through
:ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`.
However, it is also possible to add support for a dataset that is not part of
a CMIP data request by implementing fixes for it.
This is particularly useful for large datasets, where keeping a copy of both
the original and CMORized dataset is not feasible.
See `Natively supported non-CMIP datasets`_ for a list of currently supported
datasets.

**CMORizer scripts**. Support for many observational and reanalysis
datasets is also possible through a priori reformatting by
:ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`,
which are most relevant for datasets of small volume.

.. _fix_structure:

Fix structure
=============
@@ -326,7 +332,11 @@ strictness to the highest:
Natively supported non-CMIP datasets
====================================
Fixed datasets are supported through the ``native6`` project.
Some fixed datasets and native model formats are supported through
the ``native6`` project or through a dedicated project.
Observational Datasets
----------------------
Put the files containing the data in the directory that you have configured
for the ``native6`` project in your :ref:`user configuration file`, in a
subdirectory called ``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.
@@ -335,13 +345,13 @@ definition in the :ref:`recipe <recipe_overview>`.
Below is a list of datasets currently supported.
ERA5
----
~~~~
- Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
- Tier: 3
MSWEP
-----
~~~~~
- Supported variables: ``pr``
- Supported frequencies: ``mon``, ``day``, ``3hr``.
@@ -354,6 +364,39 @@ For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversio
For more info: http://www.gloh2o.org/
.. _fixing_native_models:
Native models
-------------
The following models are natively supported through the procedure described
above (:ref:`fix_structure`) and in :ref:`configure_native_models`:
IPSL-CM6
~~~~~~~~
Both output formats (i.e. the ``Output`` and the ``Analyse / Time series``
formats) are supported, and should be configured in recipes as e.g.:

.. code-block:: yaml

  datasets:
    - {simulation: CM61-LR-hist-03.1950, exp: piControl, freq: Analyse/TS_MO,
       account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM,
       root: /thredds/tgcc/store}
    - {simulation: CM61-LR-hist-03.1950, exp: historical, freq: Output/MO,
       account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM,
       root: /thredds/tgcc/store}

.. _ipslcm_extra_facets_example:
The ``Output`` format is an example of a case where variables are grouped in
multi-variable files whose names cannot be computed directly from dataset
attributes alone but require an extra facets file, whose principles are
explained in :ref:`extra_facets` and whose content is :download:`available here
</../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`. These multi-variable
files must also undergo some data selection.
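
As a rough illustration of the idea only (this is not ESMValCore's actual
implementation), the Python sketch below looks up a variable in a mapping shaped
like an extra facets file, derives a file name pattern from it, and then selects
that variable from the content of a multi-variable file. The mapping contents,
the ``group`` and ``native_name`` keys, the file naming scheme and all function
names are assumptions made for this example.

```python
# Hypothetical excerpt of an extra facets mapping (dataset -> mip -> short_name).
# The keys ``group`` and ``native_name`` and all values are illustrative only.
EXTRA_FACETS = {
    "IPSL-CM6": {
        "Amon": {
            "tas": {"group": "histmth", "native_name": "t2m"},
            "pr": {"group": "histmth", "native_name": "precip"},
        },
    },
}


def lookup_extra_facets(dataset, mip, short_name):
    """Return the extra facets recorded for one variable."""
    return EXTRA_FACETS[dataset][mip][short_name]


def output_file_pattern(simulation, extra_facets):
    """Build a glob pattern for the multi-variable file (assumed naming scheme)."""
    return f"{simulation}_*_{extra_facets['group']}.nc"


def select_variable(file_content, extra_facets):
    """Keep only the requested variable from a multi-variable file.

    ``file_content`` stands in for the loaded file: a dict keyed by native name.
    """
    return file_content[extra_facets["native_name"]]


facets = lookup_extra_facets("IPSL-CM6", "Amon", "tas")
pattern = output_file_pattern("CM61-LR-hist-03.1950", facets)
data = select_variable({"t2m": [287.1, 287.4], "precip": [2.1e-5, 1.9e-5]}, facets)
```

The point of the sketch is the two-step use of the mapping: once to find the
file, and once more to pick the right variable out of it.
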
.. _extra-facets-fixes:
Use of extra facets in fixes
@@ -370,4 +413,5 @@ variable to the rest of the processing chain.
Normally, the applicable standard for variables is CMIP6.
For more details, refer to existing uses of this feature as examples.
For more details, refer to existing uses of this feature as examples,
e.g. :ref:`for IPSL-CM6 <ipslcm_extra_facets_example>`.
2 changes: 1 addition & 1 deletion doc/develop/index.rst
@@ -10,5 +10,5 @@ features.
:maxdepth: 1

Preprocessor function <preprocessor_function>
Dataset fix <fixing_data>
Fixing data <fixing_data>
Deriving a variable <derivation>
46 changes: 45 additions & 1 deletion doc/quickstart/configure.rst
@@ -177,7 +177,9 @@ It will be installed along with ESMValCore and can also be viewed on GitHub:
<https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/config-developer.yml>`_.
This configuration file describes the file system structure and CMOR tables for several
key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ,
ETHZ, SMHI, BSC). CMIP data is stored as part of the Earth System Grid
ETHZ, SMHI, BSC), as well as for native output data of some
models (IPSL, ...; see :ref:`configure_native_models`).
CMIP data is stored as part of the Earth System Grid
Federation (ESGF) and the standards for file naming and paths to files are set
out by CMOR and DRS. For a detailed description of these standards and their
adoption in ESMValCore, we refer the user to :ref:`CMOR-DRS` section where we
@@ -289,6 +291,48 @@ related to CMOR table settings available:
to get the name of the file containing the ``mip`` table.
Defaults to the value provided in ``cmor_type``.

.. _configure_native_models:

Configuring native models and observation data sets
----------------------------------------------------

ESMValCore can be configured for handling native model output formats
and specific
observation data sets without preliminary reformatting. You can choose
to host this new data source either under a dedicated project or under
project ``native6``; when choosing the latter, such a configuration
involves the following steps:

- allowing ESMValTool to locate the data files:

  - entry ``native6`` of ``config-developer.yml`` should be
    complemented with sub-entries for ``input_dir`` and ``input_file``
    that go under a new key representing the
    data organization (such as ``MY_DATA_ORG``); these sub-entries can
    use an arbitrary list of ``{placeholders}``. Example:

    .. code-block:: yaml

      native6:
        ...
        input_dir:
          default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
          MY_DATA_ORG: '{model}/{exp}/{simulation}/{version}/{type}'
        input_file:
          default: '*.nc'
          MY_DATA_ORG: '{simulation}_*.nc'
        ...

  - if necessary, provide a so-called ``extra facets file``, which
    helps to cope with, e.g., variable naming issues when finding
    files. See :ref:`extra_facets` and :download:`this example of
    such a file for IPSL-CM6
    <../../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`.

- ensuring that ESMValCore gets the right metadata and data out of
  your data files: this is described in :ref:`fixing_data`.
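
To make the role of the ``{placeholders}`` more concrete, here is a minimal
Python sketch of how such templates could be expanded from the facets of a
dataset entry. It only illustrates the principle and is not how ESMValCore is
implemented; the ``expand`` and ``file_glob`` helpers, the root path and the
facet values are hypothetical.

```python
import posixpath

# Templates copied from the ``MY_DATA_ORG`` example above.
INPUT_DIR = "{model}/{exp}/{simulation}/{version}/{type}"
INPUT_FILE = "{simulation}_*.nc"


def expand(template, facets):
    """Replace each ``{placeholder}`` with the matching facet value.

    Raises ``KeyError`` if the dataset entry lacks a required facet.
    """
    return template.format(**facets)


def file_glob(rootpath, facets):
    """Join root path, expanded directory and expanded file pattern."""
    return posixpath.join(
        rootpath, expand(INPUT_DIR, facets), expand(INPUT_FILE, facets)
    )


# Hypothetical facets, as they might appear in a recipe's dataset entry.
facets = {
    "model": "MYMODEL",
    "exp": "historical",
    "simulation": "run01",
    "version": "v1",
    "type": "atm",
}
glob_pattern = file_glob("/data/native", facets)
```

With these values, every placeholder used in the templates must be available as
a facet of the dataset, which is why a missing facet surfaces as an error rather
than as a silently wrong path.
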


.. _config-ref:

References configuration file
75 changes: 43 additions & 32 deletions doc/quickstart/find_data.rst
@@ -1,7 +1,7 @@
.. _findingdata:

************
Finding data
Input data
************

Overview
@@ -15,10 +15,13 @@ the right data. We will detail below the data finding and retrieval process and
the input the user needs to specify, giving examples on how to use the data
finding routine under different scenarios.

Data types
==========

.. _CMOR-DRS:

CMIP data - CMOR Data Reference Syntax (DRS) and the ESGF
=========================================================
CMIP data
---------
CMIP data is widely available via the Earth System Grid Federation
(`ESGF <https://esgf.llnl.gov/>`_) and is accessible to users either
via download from the ESGF portal or through the ESGF data nodes hosted
@@ -45,6 +48,40 @@ From the ESMValTool user perspective the number of data input parameters is
optimized to allow for ease of use. We detail this procedure in the next
section.

Native model data
-----------------
Support for native model data that is not formatted according to a CMIP
data request is quite easy using the basic
:ref:`ESMValCore fix procedure <fixing_data>` and has been implemented
for some models :ref:`as described here <fixing_native_models>`.

Observational data
------------------
Some observational data is retrieved in the same manner as CMIP data, for example
using the ``OBS`` root path set to:

.. code-block:: yaml

  OBS: /gws/nopw/j04/esmeval/obsdata-v2

and the dataset:

.. code-block:: yaml

  - {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}

in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
CMOR-DRS_ are used again and the file will be automatically found:

.. code-block::

  /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc

Since observational data are organized in Tiers depending on their level of
public availability, the ``default`` directory must be structured accordingly
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
``drs: default``.
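
The example path above can be reproduced with a short Python sketch that
combines the root path, the ``TierX`` sub-directory and the dataset facets. This
is an illustration under assumptions, not ESMValCore's data finder: the
``obs_file_path`` helper is hypothetical, the file name layout is inferred from
the single example path shown above, and ``mip`` and ``short_name`` are assumed
to come from the variable definition in the recipe.

```python
import posixpath


def obs_file_path(rootpath, facets, timerange):
    """Assemble the expected OBS file path from root path and facets."""
    # Observational data live under a ``TierX`` sub-directory of the root path.
    directory = posixpath.join(rootpath, f"Tier{facets['tier']}", facets["dataset"])
    # File name layout inferred from the example path (an assumption).
    parts = [
        facets["project"],
        facets["dataset"],
        facets["type"],
        str(facets["version"]),
        facets["mip"],
        facets["short_name"],
        timerange,
    ]
    return posixpath.join(directory, "_".join(parts) + ".nc")


facets = {
    "dataset": "ERA-Interim",
    "project": "OBS",
    "type": "reanaly",
    "version": 1,
    "tier": 3,
    # Assumed to be taken from the variable entry of the recipe.
    "mip": "Amon",
    "short_name": "ta",
}
path = obs_file_path("/gws/nopw/j04/esmeval/obsdata-v2", facets, "201401-201412")
```
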

.. _data-retrieval:

Data retrieval
@@ -186,8 +223,8 @@ datasets are listed in any recipe, under either the ``datasets`` and/or
.. code-block:: yaml
datasets:
- {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
- {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014}
- {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
- {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014}
``_data_finder`` will use this information to find data for **all** the variables specified in ``diagnostics/variables``.

@@ -208,7 +245,7 @@ and the dataset you need is specified in your ``recipe.yml`` as:

.. code-block:: yaml
- {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014}
- {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014}
for a variable, e.g.:

@@ -244,32 +281,6 @@ and finally, using the file naming definition from CMOR-DRS_ find the file:
.. _observations:

Observational data
==================
Observational data is retrieved in the same manner as CMIP data, for example
using the ``OBS`` root path set to:

.. code-block:: yaml
OBS: /gws/nopw/j04/esmeval/obsdata-v2
and the dataset:

.. code-block:: yaml
- {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}
in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
CMOR-DRS_ are used again and the file will be automatically found:

.. code-block::
/gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc
Since observational data are organized in Tiers depending on their level of
public availability, the ``default`` directory must be structured accordingly
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
``drs: default``.

Data loading
============
2 changes: 1 addition & 1 deletion doc/quickstart/index.rst
@@ -6,7 +6,7 @@ Getting started

Installation <install>
Configuration <configure>
Finding data <find_data>
Input data <find_data>
Installed recipes <recipes>
Running <run>
Output <output>