Handle IPSL-CM6 #1153

Merged
merged 45 commits into from
Jun 11, 2021
Changes from 33 commits
8b8dd93
Add basic support for variable mappings
May 12, 2021
9f2794f
Add first era5 mapping
May 12, 2021
08e9e2d
Find files for CMIP6 DCPP startdates (#771)
sloosvel May 13, 2021
d846877
Skip regridding if the target grid is almost identical to the source …
May 14, 2021
c20e5c1
Fixes for sos and siconc of BCC models (#1090)
remi-kazeroni May 17, 2021
1f788c2
Pin cf-units and fix tests (cf-units>=2.1.5) (#1140)
valeriupredoi May 21, 2021
8fc038f
Handle IPSL-CM6 (the feature won't actually work without #1124)
senesis May 31, 2021
3b2a05f
class Huss inherits from class Tas. Also: Fix codacy diags.
senesis May 31, 2021
9303c90
Replace os.system() by subprocess.run()
senesis May 31, 2021
8d55ee8
Fix flake8 diags
senesis Jun 1, 2021
eacd764
var_mapping -> extra_facets
senesis Jun 1, 2021
684953e
Rename _config/variable_details to _config/extra_facets
senesis Jun 5, 2021
896764b
Fix doc re. lack of 'output_file as a dict', and choice of native6
senesis Jun 5, 2021
6590586
Fix codacy diags in ipsl_cm6.py
senesis Jun 5, 2021
02c8bff
Use project IPSLCM to handle IPSL-CM6
senesis Jun 7, 2021
421e0c2
Implement changes according to Bouwe's review, 2021/06/07 (except uni…
senesis Jun 7, 2021
e527ffc
Add unit tests for _fixes/ipslcm/ipsl_cm6.py
senesis Jun 7, 2021
07573a5
Merge branch 'Handle_IPSL_CM6' of ssh://github.com/ESMValGroup/ESMVal…
senesis Jun 7, 2021
4ab2edc
delete esmvalcore/cmor/_fixes/native6/ipsl_cm6.py
senesis Jun 7, 2021
436b1ff
Delete old file esmvalcore/_config/extra_facets/native6-ipsl-cm6-mapp…
senesis Jun 7, 2021
97b1acb
Restore main versions for _recipe.py and cmor_fixes/fix.py
senesis Jun 7, 2021
da8eb60
Restore main version for _recipe.py
senesis Jun 7, 2021
0a6320a
Delete extraneous era5-mappings.yml
senesis Jun 8, 2021
ae72d9a
Avoid using mapping_key when calling fix.get_cube_from_list()
senesis Jun 9, 2021
6bfe36a
Merge remote-tracking branch 'origin/main' into Handle_IPSL_CM6
senesis Jun 9, 2021
b46a591
Empty change in fix.py for forcing codacy to re-scan it
senesis Jun 9, 2021
2779739
Polish doc
senesis Jun 9, 2021
006f8a6
Polish doc again
senesis Jun 9, 2021
45bee88
Again...
senesis Jun 9, 2021
0af764c
and again ...
senesis Jun 9, 2021
3234ee7
Fix typo in comment
senesis Jun 10, 2021
c8040ef
Fixes according to @zklaus review
senesis Jun 10, 2021
89f45b9
Reduce formatting changes
Jun 10, 2021
6b085e5
Update doc/develop/fixing_data.rst
senesis Jun 10, 2021
36dc526
Update doc/develop/fixing_data.rst
senesis Jun 10, 2021
36a47fe
Update doc/develop/fixing_data.rst
senesis Jun 10, 2021
0625016
Update doc/develop/fixing_data.rst
senesis Jun 10, 2021
ad3a0e4
Update doc/develop/fixing_data.rst
senesis Jun 10, 2021
f6c81b5
Update doc/develop/fixing_data.rst
senesis Jun 10, 2021
c785b97
Update doc/develop/fixing_data.rst
senesis Jun 10, 2021
e98cac6
Update doc/quickstart/find_data.rst
senesis Jun 10, 2021
862b362
Update doc/quickstart/find_data.rst
senesis Jun 10, 2021
4112fa9
Minor formatting improvements
Jun 10, 2021
d7a4471
Organize mapping file in each realm in two sections (CMIP6 and IPSL)
senesis Jun 10, 2021
2c2e2f5
Merge branch 'Handle_IPSL_CM6' of ssh://github.com/ESMValGroup/ESMVal…
senesis Jun 10, 2021
82 changes: 64 additions & 18 deletions doc/develop/fixing_data.rst
@@ -1,32 +1,39 @@
.. _fixing_data:

***********
Dataset fix
Fixing data
***********

Some (model) datasets contain (known) errors that would normally prevent them
from being processed correctly by the ESMValCore. The errors can be in
The baseline case for ESMValCore input data is fully CMOR-compliant
data that is read using the
`Iris load_raw function <https://scitools-iris.readthedocs.io/en/latest/generated/api/iris.html#iris.load_raw>`_.
ESMValCore also allows for some departures from compliance (see
:ref:`cmor_check_strictness`). Beyond that, some datasets
(either model or observations) contain (known) errors that would
normally prevent them from being processed. The issues can be in
the metadata describing the dataset and/or in the actual data.
Typical examples of such errors are missing or wrong attributes (e.g.
attribute ''units'' says 1e-9 but data are actually in 1e-6), missing or
mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
coordinate bounds like ''lat_bnds'') or problems with the actual data
(e.g. cloud liquid water only instead of sum of liquid + ice as specified by the CMIP data request).

The ESMValCore can apply on the fly fixes to datasets that have
known errors that can be fixed automatically.
As an extreme case, some other data sources are simply not NetCDF
files and must go through another data loading function.

The ESMValCore can apply on-the-fly fixes to such datasets when
issues can be fixed automatically. This is implemented for a set
of `Natively supported non-CMIP datasets`_. The following sections
provide details on how to design such fixes.

.. note::
**CMORization as a fix**.
Support for many observational and reanalysis datasets is implemented through
:ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`.
However, it is also possible to add support for a dataset that is not part of
a CMIP data request by implementing fixes for it.
This is particularly useful for large datasets, where keeping a copy of both
the original and CMORized dataset is not feasible.
See `Natively supported non-CMIP datasets`_ for a list of currently supported
datasets.

**CMORizer scripts**. Support for many observational and reanalysis
datasets is also possible through a priori reformatting by
:ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`,
which are most relevant for small datasets.

.. _fix_structure:

Fix structure
=============
@@ -326,7 +333,11 @@ strictness to the highest:
Natively supported non-CMIP datasets
====================================

Fixed datasets are supported through the ``native6`` project.
Some fixed datasets and native model formats are supported through
the ``native6`` project or through a dedicated project.

ERA5 and MSWEP datasets
-----------------------
Put the files containing the data in the directory that you have configured
for the ``native6`` project in your :ref:`user configuration file`, in a
subdirectory called ``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.
@@ -335,13 +346,13 @@ definition in the :ref:`recipe <recipe_overview>`.
Below is a list of datasets currently supported.

ERA5
----
~~~~

- Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
- Tier: 3

MSWEP
-----
~~~~~

- Supported variables: ``pr``
- Supported frequencies: ``mon``, ``day``, ``3hr``.
@@ -354,6 +365,40 @@ For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversio

For more info: http://www.gloh2o.org/

.. _fixing_native_models:

Native models: IPSL-CM6, ...
-----------------------------

The following models are natively supported through the procedure
described above (:ref:`fix_structure`) and at
:ref:`configure_native_models`:

- **IPSL-CM6**: both output formats (i.e. the ``Output`` and the
``Analyse / Time series`` formats) are supported and should be
configured in recipes as, for example:

.. code-block:: yaml

   datasets:
     - {simulation: CM61-LR-hist-03.1950, exp: piControl, freq: Analyse/TS_MO,
        account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM,
        root: /thredds/tgcc/store}
     - {simulation: CM61-LR-hist-03.1950, exp: historical, freq: Output/MO,
        account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM,
        root: /thredds/tgcc/store}

.. _ipslcm_extra_facets_example:

The ``Output`` format is an example of a case where variables are
grouped in multi-variable files, whose names cannot be computed
directly from dataset attributes alone but require the use of an
extra_facets file, whose principles are explained in
:ref:`extra_facets`, and whose content is :download:`available here
</../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`. These
multi-variable files must also undergo some data selection, which
may involve an external process for performance reasons.
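
To illustrate why an extra facets file is needed in this case, the
following minimal sketch looks up the file group that holds a variable
and builds a file-name pattern from it. The mapping contents, the
``group`` key, the name ``HYPOTHETICAL_EXTRA_FACETS`` and the pattern
layout are all assumptions made for illustration; they are not taken
from ``ipslcm-mappings.yml`` and this is not the ESMValCore
implementation.

```python
# Hypothetical extra facets, keyed by dataset, then mip, then short_name.
# The real values live in ipslcm-mappings.yml; these are placeholders.
HYPOTHETICAL_EXTRA_FACETS = {
    "IPSL-CM6": {
        "Amon": {
            "tas": {"group": "histmth"},
            "pr": {"group": "histmth"},
        },
    },
}


def file_pattern(dataset, mip, short_name, simulation):
    """Return a glob pattern for the multi-variable file holding a variable."""
    try:
        facets = HYPOTHETICAL_EXTRA_FACETS[dataset][mip][short_name]
    except KeyError as exc:
        raise KeyError(
            f"no extra facets for {dataset}/{mip}/{short_name}"
        ) from exc
    # Several variables share one file, so the pattern is built from the
    # group name rather than from the variable name.
    return f"{simulation}_*_{facets['group']}.nc"


print(file_pattern("IPSL-CM6", "Amon", "tas", "CM61-LR-hist-03.1950"))
```

In this sketch both ``tas`` and ``pr`` resolve to the same pattern, which
is why a later data selection step is still needed to extract a single
variable from the file.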

.. _extra-facets-fixes:

Use of extra facets in fixes
@@ -370,4 +415,5 @@ variable to the rest of the processing chain.

Normally, the applicable standard for variables is CMIP6.

For more details, refer to existing uses of this feature as examples.
For more details, refer to existing uses of this feature as examples,
e.g. :ref:`for IPSL-CM6<ipslcm_extra_facets_example>`.
2 changes: 1 addition & 1 deletion doc/develop/index.rst
@@ -10,5 +10,5 @@ features.
:maxdepth: 1

Preprocessor function <preprocessor_function>
Dataset fix <fixing_data>
Fixing data <fixing_data>
Deriving a variable <derivation>
46 changes: 45 additions & 1 deletion doc/quickstart/configure.rst
@@ -177,7 +177,9 @@ It will be installed along with ESMValCore and can also be viewed on GitHub:
<https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/config-developer.yml>`_.
This configuration file describes the file system structure and CMOR tables for several
key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ,
ETHZ, SMHI, BSC). CMIP data is stored as part of the Earth System Grid
ETHZ, SMHI, BSC), and for native output data for some
models (IPSL, ... see :ref:`configure_native_models`).
CMIP data is stored as part of the Earth System Grid
Federation (ESGF) and the standards for file naming and paths to files are set
out by CMOR and DRS. For a detailed description of these standards and their
adoption in ESMValCore, we refer the user to :ref:`CMOR-DRS` section where we
@@ -289,6 +291,48 @@ related to CMOR table settings available:
to get the name of the file containing the ``mip`` table.
Defaults to the value provided in ``cmor_type``.

.. _configure_native_models:

Configuring native models and observation data sets
----------------------------------------------------

ESMValCore can be configured for handling native model output formats
and specific
observation data sets without preliminary reformatting. You can choose
to host this new data source either under a dedicated project or under
project ``native6``; when choosing the latter, such a configuration
involves the following steps:

- allowing ESMValTool to locate the data files:

- entry ``native6`` of ``config-developer.yml`` should be
complemented with sub-entries for ``input_dir`` and ``input_file``
that go under a new key representing the
data organization (such as ``MY_DATA_ORG``); these sub-entries can
use an arbitrary list of ``{placeholders}``. Example:

.. code-block:: yaml

   native6:
     ...
     input_dir:
       default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
       MY_DATA_ORG: '{model}/{exp}/{simulation}/{version}/{type}'
     input_file:
       default: '*.nc'
       MY_DATA_ORG: '{simulation}_*.nc'
     ...

- if necessary, provide a so-called ``extra facets file``, which
helps to cope with e.g. variable naming issues when finding
files. See :ref:`extra_facets` and :download:`this example of
such a file for IPSL-CM6
<../../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`.

- ensuring that ESMValCore gets the right metadata and data out of
your data files: this is described in :ref:`fixing_data`.

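As a rough illustration of how the ``{placeholders}`` in ``input_dir``
and ``input_file`` turn into a search pattern, the sketch below fills
the ``MY_DATA_ORG`` templates from the example above with dataset
facets. The function name ``expand``, the root directory and the facet
values are hypothetical, and the substitution shown here is plain
Python string formatting, not the code ESMValCore actually uses.

```python
import posixpath

# Templates copied from the MY_DATA_ORG example configuration above.
INPUT_DIR = "{model}/{exp}/{simulation}/{version}/{type}"
INPUT_FILE = "{simulation}_*.nc"


def expand(root, facets):
    """Fill the directory and file templates with a dataset's facets."""
    # str.format raises KeyError if a placeholder has no matching facet.
    directory = INPUT_DIR.format(**facets)
    filename = INPUT_FILE.format(**facets)
    return posixpath.join(root, directory, filename)


# Hypothetical facet values, for illustration only.
facets = {
    "model": "MYMODEL",
    "exp": "historical",
    "simulation": "run01",
    "version": "v1",
    "type": "atm",
}
print(expand("/data", facets))
```

A placeholder that has no matching facet raises a ``KeyError`` in this
sketch, which mirrors the practical requirement that every placeholder
used in the templates must be resolvable for each dataset.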

.. _config-ref:

References configuration file
75 changes: 43 additions & 32 deletions doc/quickstart/find_data.rst
@@ -1,7 +1,7 @@
.. _findingdata:

************
Finding data
Input data
************

Overview
@@ -15,10 +15,13 @@ the right data. We will detail below the data finding and retrieval process and
the input the user needs to specify, giving examples on how to use the data
finding routine under different scenarios.

Data types
==========

.. _CMOR-DRS:

CMIP data - CMOR Data Reference Syntax (DRS) and the ESGF
=========================================================
CMIP data
---------
CMIP data is widely available via the Earth System Grid Federation
(`ESGF <https://esgf.llnl.gov/>`_) and is accessible to users either
via download from the ESGF portal or through the ESGF data nodes hosted
@@ -45,6 +48,40 @@ From the ESMValTool user perspective the number of data input parameters is
optimized to allow for ease of use. We detail this procedure in the next
section.

Native model data
---------------------------------------------------------
Support for native model data that is not formatted according to a CMIP
data request is quite easy using the basic
:ref:`ESMValCore fix procedure <fixing_data>` and has been implemented
for some models :ref:`as described here <fixing_native_models>`.

Observational data
---------------------------------------------------------
Some observational data is retrieved in the same manner as CMIP data, for example
using the ``OBS`` root path set to:

.. code-block:: yaml

OBS: /gws/nopw/j04/esmeval/obsdata-v2

and the dataset:

.. code-block:: yaml

- {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}

in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
CMOR-DRS_ are used again and the file will be automatically found:

.. code-block::

/gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc

Since observational data are organized in Tiers depending on their level of
public availability, the ``default`` directory must be structured accordingly
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
``drs: default``.
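
The way the example path above is assembled can be sketched as follows.
The directory and file-name templates used here are inferred from that
single example path and are an assumption, as are the function name
``obs_file_prefix`` and the choice of ``mip`` and ``short_name`` (which
in practice come from the variable definition rather than from the
dataset entry).

```python
import posixpath


def obs_file_prefix(root, facets):
    """Build the path prefix shared by all files of one OBS variable."""
    # Observational data is organised in TierX sub-directories.
    directory = "Tier{tier}/{dataset}".format(**facets)
    # Assumed file-name layout, inferred from the example path above.
    prefix = "{project}_{dataset}_{type}_{version}_{mip}_{short_name}".format(
        **facets
    )
    return posixpath.join(root, directory, prefix)


facets = {
    "dataset": "ERA-Interim",
    "project": "OBS",
    "type": "reanaly",
    "version": 1,
    "tier": 3,
    # Taken from the variable definition in the recipe (assumed here).
    "mip": "Amon",
    "short_name": "ta",
}
print(obs_file_prefix("/gws/nopw/j04/esmeval/obsdata-v2", facets))
```

The time range part of the file name (``201401-201412`` in the example)
is left out of the prefix, since it differs from file to file and is
matched against ``start_year`` and ``end_year`` separately.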

.. _data-retrieval:

Data retrieval
@@ -186,8 +223,8 @@ datasets are listed in any recipe, under either the ``datasets`` and/or
.. code-block:: yaml

datasets:
- {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
- {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014}
- {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
- {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014}

``_data_finder`` will use this information to find data for **all** the variables specified in ``diagnostics/variables``.

@@ -208,7 +245,7 @@ and the dataset you need is specified in your ``recipe.yml`` as:

.. code-block:: yaml

- {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014}
- {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014}

for a variable, e.g.:

@@ -244,32 +281,6 @@ and finally, using the file naming definition from CMOR-DRS_ find the file:

.. _observations:

Observational data
==================
Observational data is retrieved in the same manner as CMIP data, for example
using the ``OBS`` root path set to:

.. code-block:: yaml

OBS: /gws/nopw/j04/esmeval/obsdata-v2

and the dataset:

.. code-block:: yaml

- {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}

in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
CMOR-DRS_ are used again and the file will be automatically found:

.. code-block::

/gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc

Since observational data are organized in Tiers depending on their level of
public availability, the ``default`` directory must be structured accordingly
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
``drs: default``.

Data loading
============
2 changes: 1 addition & 1 deletion doc/quickstart/index.rst
@@ -6,7 +6,7 @@ Getting started

Installation <install>
Configuration <configure>
Finding data <find_data>
Input data <find_data>
Installed recipes <recipes>
Running <run>
Output <output>