Dask.array deepcopy does not preserve masking since 0.18.2 #3848

Closed
pp-mo opened this issue Aug 3, 2018 · 4 comments · Fixed by #3852
Labels
array good first issue Clearly described and easy to accomplish. Good for beginners to the project.

Comments

@pp-mo

pp-mo commented Aug 3, 2018

If 'a' is a dask array wrapped around a numpy masked array (created with "from_array(..., asarray=False)"),
then the result of "copy.deepcopy(a).compute()" is now (wrongly) not a masked array,
though the original "a.compute()" and even "copy.copy(a).compute()" are still correct.

Example code to reproduce:

import dask.array as da
import numpy.ma as ma
import copy
t = ma.masked_array([1, 2], mask=[0, 1])
a = da.from_array(t, chunks=t.shape, asarray=False)
print(copy.deepcopy(a).compute())

For example, in dask 0.18.1

>>> import dask
>>> import dask.array as da
>>> import numpy.ma as ma
>>> import copy
>>> t = ma.masked_array([1, 2], mask=[0, 1])
>>> a = da.from_array(t, chunks=t.shape, asarray=False)
>>> print(dask.__version__)
0.18.1
>>> print(copy.copy(a).compute())
[1 --]
>>> print(copy.deepcopy(a).compute())
[1 --]
>>> 

But in 0.18.2 ...

>>> import dask
>>> import dask.array as da
>>> import numpy.ma as ma
>>> import copy
>>> 
>>> t = ma.masked_array([1, 2], mask=[0, 1])
>>> a = da.from_array(t, chunks=t.shape, asarray=False)
>>> 
>>> print(dask.__version__)
0.18.2
>>> print(copy.copy(a).compute())
[1 --]
>>> print(copy.deepcopy(a).compute())
[1 2]
>>> 
@pp-mo
Author

pp-mo commented Aug 3, 2018

This is causing us to pin dask in iris 😢
as it breaks various unit tests for us.

I suspect the change here, but I don't really know enough to suggest a fix ...
(but surely it does not respect the comment on the previous line anyway, as noted?)

@mrocklin
Member

mrocklin commented Aug 3, 2018 via email

@jcrist
Member

jcrist commented Aug 3, 2018

Looks like the np.copy in the map_blocks should be replaced with dask.utils.M.copy (which would call the .copy method on the array).
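
To illustrate the difference outside of dask, here is a minimal sketch using only numpy. It assumes that np.copy is called with its default arguments (which return a base ndarray rather than the subclass), and it uses operator.methodcaller("copy") as a stand-in for dask.utils.M.copy; the variable names are just for illustration.

```python
import operator

import numpy as np
import numpy.ma as ma

t = ma.masked_array([1, 2], mask=[0, 1])

# np.copy with default arguments converts to a base ndarray,
# so the mask is dropped and the underlying data is exposed.
plain = np.copy(t)

# Calling the object's own .copy method keeps the MaskedArray subclass.
# operator.methodcaller("copy") stands in for dask.utils.M.copy here
# (an assumption made so that this sketch does not need dask).
copy_method = operator.methodcaller("copy")
masked = copy_method(t)

print(type(plain).__name__, plain)
print(type(masked).__name__, masked)
```

The first copy should come back as a plain ndarray and the second as a MaskedArray with the mask intact, which matches the "[1 2]" versus "[1 --]" outputs reported above.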

jcrist added the "good first issue" and "array" labels Aug 3, 2018
TAdeJong added a commit to TAdeJong/dask that referenced this issue Aug 5, 2018
This to preserve masking upon deepcopy

refs dask#3848
@mrocklin
Member

@pp-mo it looks like there is a proposed fix to your problem in #3852 . Do you have time to take a look?

mrocklin pushed a commit that referenced this issue Aug 21, 2018
* Use the copy method of the object instead of np.copy

This to preserve masking upon deepcopy

refs #3848

* Add test for the copy machinery for da.ma

* Fix flake8 compliance