Merge branch 'master' into contribution_overhaul
* master:
  Rework whatsnew into new scheme. (SciTools#3834)
  Lazy regridding with Linear, Nearest, and AreaWeighted (SciTools#3701)
  Iris readme minimal (SciTools#3833)
tkknight committed Sep 11, 2020
2 parents a606c6a + b04a3ca commit e959fdf
Showing 15 changed files with 318 additions and 111 deletions.
90 changes: 5 additions & 85 deletions README.md
@@ -5,7 +5,7 @@


<h4 align="center">
Iris is a powerful, format-agnostic, community-driven Python library for
Iris is a powerful, format-agnostic, community-driven Python package for
analysing and visualising Earth science data
</h4>

@@ -15,7 +15,7 @@
<img src="https://api.travis-ci.org/repositories/SciTools/iris.svg?branch=master"
alt="Travis-CI" /></a>
<a href='https://scitools-iris.readthedocs.io/en/latest/?badge=latest'>
<img src='https://readthedocs.org/projects/scitools-iris/badge/?version=latest'
alt='Documentation Status' /></a>
<a href="https://anaconda.org/conda-forge/iris">
<img src="https://img.shields.io/conda/dn/conda-forge/iris.svg"
@@ -39,87 +39,7 @@
<img src="https://img.shields.io/badge/code%20style-black-000000.svg"
alt="black" /></a>
</p>
<br>

<!-- NOTE: toc auto-generated with https://github.com/frnmst/md-toc:
$ md_toc github README.md -i
-->

<h1>Table of contents</h1>

[](TOC)

+ [Overview](#overview)
+ [Documentation](#documentation)
+ [Installation](#installation)
+ [Copyright and licence](#copyright-and-licence)
+ [Get in touch](#get-in-touch)
+ [Contributing](#contributing)

[](TOC)

# Overview

Iris implements a data model based on the [CF conventions](http://cfconventions.org/)
giving you a powerful, format-agnostic interface for working with your data.
It excels when working with multi-dimensional Earth Science data, where tabular
representations become unwieldy and inefficient.

[CF Standard names](http://cfconventions.org/standard-names.html),
[units](https://github.com/SciTools/cf_units), and coordinate metadata
are built into Iris, giving you a rich and expressive interface for maintaining
an accurate representation of your data. Its treatment of data and
associated metadata as first-class objects includes:

* a visualisation interface based on [matplotlib](https://matplotlib.org/) and
[cartopy](https://scitools.org.uk/cartopy/docs/latest/),
* unit conversion,
* subsetting and extraction,
* merge and concatenate,
* aggregations and reductions (including min, max, mean and weighted averages),
* interpolation and regridding (including nearest-neighbour, linear and area-weighted), and
* operator overloads (``+``, ``-``, ``*``, ``/``, etc.)

A number of file formats are recognised by Iris, including CF-compliant NetCDF, GRIB,
and PP, and it has a plugin architecture to allow other formats to be added seamlessly.

Building upon [NumPy](http://www.numpy.org/) and [dask](https://dask.pydata.org/en/latest/),
Iris scales from efficient single-machine workflows right through to multi-core clusters and HPC.
Interoperability with packages from the wider scientific Python ecosystem comes from Iris'
use of standard NumPy/dask arrays as its underlying data storage.
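
As a quick, illustrative sketch of such a workflow (not taken from the Iris
docs; it assumes the optional `iris-sample-data` package is installed so that
`iris.sample_data_path` can locate the example file):

    import iris
    import iris.analysis

    # Load a single cube, complete with CF standard name, units and coordinates.
    cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))

    # Collapse the time dimension with an unweighted mean, one of the
    # aggregations listed above.
    mean_cube = cube.collapsed('time', iris.analysis.MEAN)
    print(mean_cube)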


# Documentation

<a href="https://scitools.org.uk/iris/docs/latest/index.html"> <img src="https://img.shields.io/badge/docs-stable-green.svg" alt="Stable docs" /></a> The documentation for *stable released versions* of Iris, including a user guide, example code, and gallery.

<a href="https://scitools-docs.github.io/iris/master/index.html"> <img src="https://img.shields.io/badge/docs-latest-blue.svg" alt="Latest docs" /></a> The documentation for the *latest development version* of Iris.


# Installation

The easiest way to install Iris is with [conda](https://conda.io/miniconda.html):

conda install -c conda-forge iris

Detailed instructions, including information on installing from source,
are available in the
[documentation](https://scitools-iris.readthedocs.io/en/latest/installing.html).

# Get in touch

* Report bugs or suggest new features using an Issue or Pull Request on [GitHub](https://github.com/SciTools/iris). You can also comment on existing Issues and Pull Requests.
* For discussions from a user perspective you could join our [SciTools Users Google Group](https://groups.google.com/forum/#!forum/scitools-iris).
* For those involved in developing Iris we also have an [Iris Developers Google Group](https://groups.google.com/forum/#!forum/scitools-iris-dev).
* For "How do I?" questions, try [StackOverflow](https://stackoverflow.com/questions/tagged/python-iris).

# Copyright and licence

Iris may be freely distributed, modified and used commercially under the terms
of its [GNU LGPLv3 license](COPYING.LESSER).

# Contributing
Information on how to contribute can be found in the
[Iris developer guide](https://scitools.org.uk/iris/docs/latest/index.html#development-index).

(C) British Crown Copyright 2010 - 2020, Met Office
<p align="center">
See the <a href="https://scitools-iris.readthedocs.io/en/latest/">documentation</a> for the <b>latest development version</b> of Iris.
</p>
2 changes: 1 addition & 1 deletion docs/iris/src/index.rst
@@ -9,7 +9,7 @@ Iris Documentation

.. todolist::

**A powerful, format-agnostic, community-driven Python library for analysing
**A powerful, format-agnostic, community-driven Python package for analysing
and visualising Earth science data.**

Iris implements a data model based on the `CF conventions <http://cfconventions.org>`_
11 changes: 6 additions & 5 deletions docs/iris/src/userguide/citation.rst
@@ -4,22 +4,23 @@
Citing Iris
===========

If Iris played an important part in your research then please add us to your
reference list by using one of the recommendations below.

************
BibTeX entry
************

For example::

@manual{Iris,
author = {{Met Office}},
title = {Iris: A Python library for analysing and visualising meteorological and oceanographic data sets},
title = {Iris: A Python package for analysing and visualising meteorological and oceanographic data sets},
edition = {v1.2},
year = {2010 - 2013},
address = {Exeter, Devon},
url = {http://scitools.org.uk/}
}


*******************
@@ -45,7 +46,7 @@ Suggested format::

For example::

Iris. Met Office. git@github.com:SciTools/iris.git 06-03-2013

.. _How to cite and describe software: http://software.ac.uk/so-exactly-what-software-did-you-use

47 changes: 47 additions & 0 deletions docs/iris/src/userguide/interpolation_and_regridding.rst
@@ -28,6 +28,11 @@ The following are the regridding schemes that are currently available in Iris:
* nearest-neighbour regridding (:class:`iris.analysis.Nearest`), and
* area-weighted regridding (:class:`iris.analysis.AreaWeighted`, first-order conservative).

The linear, nearest-neighbour, and area-weighted regridding schemes support
lazy regridding, i.e. if the source cube has lazy data, the resulting cube
will also have lazy data.
See :doc:`real_and_lazy_data` for an introduction to lazy data.
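
For instance, as an illustrative sketch (here ``target_grid_cube`` stands for
any cube defining the grid you want to regrid onto; it is not part of the
sample data):

>>> air_temp = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
>>> air_temp.has_lazy_data()
True
>>> result = air_temp.regrid(target_grid_cube, iris.analysis.Linear())
>>> result.has_lazy_data()
True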


.. _interpolation:

@@ -409,3 +414,45 @@ regridded to the target grid. For example::
In each case ``result`` will be the input cube regridded to the grid defined by
the target grid cube (in this case ``rotated_psl``) that we used to define the
cached regridder.
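
A condensed, hypothetical sketch of that pattern (``global_cube`` and
``other_cube`` are stand-in names for cubes defined on the source grid):

>>> regridder = iris.analysis.Nearest().regridder(global_cube, rotated_psl)
>>> result = regridder(other_cube)  # other_cube must lie on the same source grid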

Regridding lazy data
^^^^^^^^^^^^^^^^^^^^

If you are working with large cubes, especially when you are regridding to a
high resolution target grid, you may run out of memory when trying to
regrid a cube. When this happens, make sure the input cube has lazy data

>>> air_temp = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
>>> air_temp
<iris 'Cube' of air_temperature / (K) (time: 240; latitude: 37; longitude: 49)>
>>> air_temp.has_lazy_data()
True

and the regridding scheme supports lazy data. All regridding schemes described
here support lazy data. If you still run out of memory even while using lazy
data, inspect the
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__:

>>> air_temp.lazy_data().chunks
((240,), (37,), (49,))

The cube above consists of a single chunk, because it is fairly small. For
larger cubes, Iris will automatically create chunks of an optimal size when
loading the data. However, because regridding to a high resolution grid
may dramatically increase the size of the data, the automatically chosen
chunks might be too large.

As an example of how to solve this, we could manually re-chunk the time
dimension so that the cube is regridded in 8 chunks of 30 timesteps each:

>>> air_temp.data = air_temp.lazy_data().rechunk([30, None, None])
>>> air_temp.lazy_data().chunks
((30, 30, 30, 30, 30, 30, 30, 30), (37,), (49,))

Assuming that Dask is configured such that it processes only a few chunks of
the data array at a time, this will further reduce memory use.
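
With suitable chunks in place, the regridding itself can then be performed
lazily; for example, with a hypothetical ``target_grid_cube`` defining the
high resolution target grid, the result is only computed chunk by chunk when
it is realised or saved:

>>> result = air_temp.regrid(target_grid_cube, iris.analysis.AreaWeighted())
>>> result.has_lazy_data()
True
>>> iris.save(result, 'air_temp_regridded.nc')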

Note that chunking in the horizontal dimensions is not supported by the
regridding schemes. Chunks in these dimensions will automatically be combined
before regridding.
4 changes: 3 additions & 1 deletion docs/iris/src/userguide/real_and_lazy_data.rst
@@ -37,7 +37,9 @@ In Iris, lazy data is provided as a
`dask array <https://docs.dask.org/en/latest/array.html>`_.
A dask array also has a shape and data type
but the dask array's data points remain on disk and are only loaded into memory in
small
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
when absolutely necessary. This has key performance benefits for
handling large amounts of data, where both calculation time and storage
requirements can be significantly reduced.
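
For example, a freshly loaded cube typically has lazy data, and its dask
chunks can be inspected directly (a short sketch reusing the sample file from
the regridding user guide):

>>> cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
>>> cube.has_lazy_data()
True
>>> cube.lazy_data().chunks
((240,), (37,), (49,))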

5 changes: 5 additions & 0 deletions docs/iris/src/whatsnew/latest.rst
@@ -31,6 +31,11 @@ Features
saved to NetCDF-CF files. Support for `Quality Flags`_ is also provided to
ensure they load and save with appropriate units. See :pull:`3800`.

* Lazy regridding with the :class:`~iris.analysis.Linear`,
:class:`~iris.analysis.Nearest`, and
:class:`~iris.analysis.AreaWeighted` regridding schemes.
See :pull:`3701`.


Dependency Updates
==================
36 changes: 36 additions & 0 deletions lib/iris/_lazy_data.py
Expand Up @@ -349,3 +349,39 @@ def lazy_elementwise(lazy_array, elementwise_op):
    dtype = elementwise_op(np.zeros(1, lazy_array.dtype)).dtype

    return da.map_blocks(elementwise_op, lazy_array, dtype=dtype)


def map_complete_blocks(src, func, dims, out_sizes):
    """Apply a function to complete blocks.

    Complete means that the data is not chunked along the chosen dimensions.

    Args:

    * src (:class:`~iris.cube.Cube`):
        Source cube that function is applied to.
    * func:
        Function to apply.
    * dims (tuple of int):
        Dimensions that cannot be chunked.
    * out_sizes (tuple of int):
        Output size of dimensions that cannot be chunked.

    """
    if not src.has_lazy_data():
        return func(src.data)

    data = src.lazy_data()

    # Ensure dims are not chunked
    in_chunks = list(data.chunks)
    for dim in dims:
        in_chunks[dim] = src.shape[dim]
    data = data.rechunk(in_chunks)

    # Determine output chunks
    out_chunks = list(data.chunks)
    for dim, size in zip(dims, out_sizes):
        out_chunks[dim] = size

    return data.map_blocks(func, chunks=out_chunks, dtype=src.dtype)
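
A small usage sketch of this (internal) helper, under the simplifying
assumption that the per-block function just doubles the data; in real use
``func`` would be something like a per-block regrid that changes the
horizontal sizes:

import iris
from iris._lazy_data import map_complete_blocks

# Load a lazy cube of shape (time: 240, latitude: 37, longitude: 49).
cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))

def double(block):
    # Placeholder per-block operation; it receives and returns NumPy arrays.
    return block * 2.0

# Dimensions 1 and 2 (latitude, longitude) must stay unchunked; since the
# doubled blocks keep their shape, out_sizes matches the input sizes.
result = map_complete_blocks(cube, double, dims=(1, 2), out_sizes=(37, 49))
print(result)  # a lazy dask array when the cube's data is lazy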
16 changes: 16 additions & 0 deletions lib/iris/analysis/__init__.py
@@ -2440,6 +2440,10 @@ def regridder(self, src_grid, target_grid):
        constructing your own regridder is preferable. These are detailed in
        the :ref:`user guide <caching_a_regridder>`.

        Supports lazy regridding. Any
        `chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
        in horizontal dimensions will be combined before regridding.

        Args:

        * src_grid:
@@ -2514,6 +2518,10 @@ def regridder(self, src_grid_cube, target_grid_cube):
        constructing your own regridder is preferable. These are detailed in
        the :ref:`user guide <caching_a_regridder>`.

        Supports lazy regridding. Any
        `chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
        in horizontal dimensions will be combined before regridding.

        Args:

        * src_grid_cube:
@@ -2630,6 +2638,10 @@ def regridder(self, src_grid, target_grid):
        constructing your own regridder is preferable. These are detailed in
        the :ref:`user guide <caching_a_regridder>`.

        Supports lazy regridding. Any
        `chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
        in horizontal dimensions will be combined before regridding.

        Args:

        * src_grid:
@@ -2716,6 +2728,8 @@ def regridder(self, src_cube, target_grid):
        constructing your own regridder is preferable. These are detailed in
        the :ref:`user guide <caching_a_regridder>`.

        Does not support lazy regridding.

        Args:

        * src_cube:
@@ -2791,6 +2805,8 @@ def regridder(self, src_grid, target_grid):
        constructing your own regridder is preferable. These are detailed in
        the :ref:`user guide <caching_a_regridder>`.

        Does not support lazy regridding.

        Args:

        * src_grid:
9 changes: 9 additions & 0 deletions lib/iris/analysis/_area_weighted.py
@@ -78,6 +78,9 @@ def __call__(self, cube):
        The given cube must be defined with the same grid as the source
        grid used to create this :class:`AreaWeightedRegridder`.

        If the source cube has lazy data, the returned cube will also
        have lazy data.

        Args:

        * cube:
@@ -89,6 +92,12 @@ this cube will be converted to values on the new grid using
            this cube will be converted to values on the new grid using
            area-weighted regridding.

        .. note::

            If the source cube has lazy data,
            `chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
            in the horizontal dimensions will be combined before regridding.

        """
        src_x, src_y = get_xy_dim_coords(cube)
        if (src_x, src_y) != self._src_grid:
