Iris not working with Dask 0.18.2 #3125

Closed
lbdreyer opened this issue Aug 2, 2018 · 6 comments

@lbdreyer
Member

lbdreyer commented Aug 2, 2018

Iris master does not work with dask 0.18.2 (but does seem to work with 0.18.1).

I suspect this is why some of the Travis tests are failing.

When loading a NetCDF file, the masked points come back filled with the fill value instead of masked.
With dask 0.18.2:

>>> cube = iris.load_cube('my_cube.nc')
>>> cube[0,0].data
array([[9.96921e+36, 9.96921e+36, 9.96921e+36, ..., 9.96921e+36,
        9.96921e+36, 9.96921e+36],
       [9.96921e+36, 9.96921e+36, 9.96921e+36, ..., 9.96921e+36,
        9.96921e+36, 9.96921e+36],
       [9.96921e+36, 9.96921e+36, 9.96921e+36, ..., 9.96921e+36,
        9.96921e+36, 9.96921e+36],
       ...,
       [9.96921e+36, 9.96921e+36, 9.96921e+36, ..., 9.96921e+36,
        9.96921e+36, 9.96921e+36],
       [9.96921e+36, 9.96921e+36, 9.96921e+36, ..., 9.96921e+36,
        9.96921e+36, 9.96921e+36],
       [9.96921e+36, 9.96921e+36, 9.96921e+36, ..., 9.96921e+36,
        9.96921e+36, 9.96921e+36]], dtype=float32)

But with dask 0.18.1 this is:

>>> cube = iris.load_cube('my_cube.nc')
>>> cube[0,0].data
masked_array(
  data=[[--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        ...,
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --]],
  mask=[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],
  fill_value=9.96921e+36,
  dtype=float32)
@pp-mo
Member

pp-mo commented Aug 3, 2018

Trying to address this -- for the 2d_coords branch -- in #3127

Note that this also seems to be uncovering a few other glitches that have crept in, presumably just since the tests started failing. Some of that may be relevant to master?

@pp-mo
Member

pp-mo commented Aug 3, 2018

I think I found it: see dask/dask#3848

@bascrezee
Contributor

I found strange behavior in Iris that is very probably related to this bug. What I find particularly strange is that simply 'touching' the cube's data right after loading it preserves the mask; maybe this can serve as a temporary workaround for Iris. In my example, I use the trivial statement assert type(cube.data)==type(cube.data) right after loading the cube, but anything that accesses the data will work, e.g. print(cube.data). It seems like quantum mechanics, just observing the object, forces it into a certain state! The plot shows whether the mask was preserved (a line with gaps) or not (a continuous line at the missing value). I hope my example can be of use in solving this issue.
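The workaround described above could be wrapped up roughly as follows. This is only a sketch: `load_and_realise` is a hypothetical helper name, it assumes Iris is installed, and it has only been reported to help with the versions listed in this comment. Note that it reads the whole array into memory.

```python
def load_and_realise(path):
    """Load a cube and immediately touch its data (workaround sketch)."""
    # Imported inside the function so the helper can be defined
    # even on a machine where Iris is not installed.
    import iris

    cube = iris.load_cube(path)
    # Accessing .data makes Iris fetch the values and keep the real array.
    _ = cube.data
    return cube
```

Calling `load_and_realise('my_cube.nc')` and then slicing the returned cube is the pattern that kept the mask in my example.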

This is my installation:
Dask version: 0.18.2
Iris version: 2.1.0
Numpy version: 1.15.0

@pp-mo
Member

pp-mo commented Sep 3, 2018

"It seems like quantum mechanics, just observing the object, forces it into a certain state!"

This is not dask behaviour; it is intentional, Iris-specific behaviour.
In Iris, touching a cube's data replaces the lazy array with a real one, i.e. it caches the actual data the first time it is fetched, which dask never does.
We say that it 'realises' the cube:

"In Iris, when actual data values are needed from a lazy data array, it is ‘realised’ : this means that all the actual values are read in from the file, and a ‘real’ (i.e. numpy) array replaces the lazy array within the Iris object.",

from the user guide section "Real and Lazy Data"
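To make the "fetch once, then keep the real array" idea concrete, here is a toy model in plain Python. It is not Iris's implementation: `ToyCube` and `fake_read` are made-up names for illustration, and only the method name `has_lazy_data` mirrors the real cube API.

```python
class ToyCube:
    """Toy stand-in for a cube; illustrates realisation only."""

    def __init__(self, loader):
        self._loader = loader  # callable that "reads from file"
        self._real = None      # set once the data has been realised

    def has_lazy_data(self):
        return self._real is None

    @property
    def data(self):
        # First access runs the loader and caches the result;
        # later accesses return the cached object.
        if self._real is None:
            self._real = self._loader()
        return self._real


calls = []

def fake_read():
    calls.append("read")  # record each simulated file read
    return [1.0, 2.0, 3.0]

cube = ToyCube(fake_read)
was_lazy = cube.has_lazy_data()
first = cube.data
second = cube.data
```

After the two accesses the loader has run once and the cube is no longer lazy, which is why "touching" the data changes what later operations work on.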

The question in your case is more "why does this preserve the mask?"
I guess it is because the cube's original lazy content (i.e. the loader's dask array) loads correctly with the mask, but some derived (still lazy) version hits the bug specific to dask 0.18.2, which loses the mask.
And by "derived", I mean it could be as simple as a straight copy of the cube, as it is the dask copy operation that has the bug.
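A quick way to tell whether a realised array has lost its mask is sketched below with NumPy only. The helper name `mask_was_lost` is hypothetical, and the fill value is simply the one visible in the output at the top of this issue; other files may use a different one.

```python
import numpy as np

# Fill value seen in the filled output reported in this issue (assumption
# for other files, which may define their own fill value).
NETCDF_FILL = np.float32(9.96921e+36)

def mask_was_lost(data, fill_value=NETCDF_FILL):
    """Return True if `data` is a plain array that contains the fill value."""
    if np.ma.isMaskedArray(data):
        return False
    return bool(np.any(np.isclose(data, fill_value)))

# A fully masked array, and the "filled" plain array it turns into
# when the mask is dropped.
good = np.ma.array(np.zeros((2, 3), dtype=np.float32),
                   mask=True, fill_value=NETCDF_FILL)
bad = good.filled()
```

Here `mask_was_lost(good)` is false and `mask_was_lost(bad)` is true; applying the same check to `cube[0, 0].data` distinguishes the two outputs shown in the original report.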

The fix will be either a different Iris or a different Dask (!):

  • currently fixed in Iris master by pinning dask to 0.18.1, but that fix is not yet in any release (and in fact will probably be gone again by the time of the next release)
  • also fixed in dask itself: that fix is now available in dask 0.19
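As a rough guard, a script could check the installed dask version before trusting masked data. The sketch below uses only the standard library; `parse_version` and `is_affected_dask` are hypothetical names, and the rule "only 0.18.2 is affected" is taken from this thread alone.

```python
import re

def parse_version(text):
    """Return the leading dotted-integer part of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def is_affected_dask(version_text):
    # Assumption from this thread: 0.18.1 works, 0.18.2 loses the mask,
    # and 0.19 contains the fix.
    return parse_version(version_text)[:3] == (0, 18, 2)
```

With dask installed, `is_affected_dask(dask.__version__)` would flag the problematic release.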

@bascrezee
Contributor

Many thanks for the clear explanation. Upgrading dask indeed fixed the issue for my example.

@DPeterK
Member

DPeterK commented Oct 9, 2018

Given that Iris now works with dask 0.19.2, I think we can safely close this issue!

@DPeterK DPeterK closed this as completed Oct 9, 2018