Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coordinates not deep copy #7463

Open
4 tasks done
mgaspary opened this issue Jan 20, 2023 · 11 comments
Open
4 tasks done

Coordinates not deep copy #7463

mgaspary opened this issue Jan 20, 2023 · 11 comments
Labels

Comments

@mgaspary
Copy link

What happened?

The coordinates are not copied if you perform deepcopy for xarray.
This issue was fixed before. I don't see it, for example, in xarray 2022.12.0, but in the latest version the issue is there again.

import xarray as xr

xarr1 = xr.DataArray(
    np.zeros([2]), 
    coords=dict(x=[0.0, 1.0]), # important to use 'float' here! with 'int' it is working fine
    dims=("x")
)
print(xarr1.x.data[0]) # 0.0

xarr2 = xarr1.copy(deep=True)
xarr2.x.data[0] = 45
print(xarr1.x.data[0]) # gives 45

Interesting, that if your coordinates are int, then the issue is gone

What did you expect to happen?

import xarray as xr

xarr1 = xr.DataArray(
    np.zeros([2]), 
    coords=dict(x=[0.0, 1.0]), # important to use 'float' here! with 'int' it is working fine
    dims=("x")
)
print(xarr1.x.data[0]) # 0.0

xarr2 = xarr1.copy(deep=True)
xarr2.x.data[0] = 45
print(xarr1.x.data[0]) # I expect it to be 0.0

Minimal Complete Verifiable Example

import xarray as xr

xarr1 = xr.DataArray(
    np.zeros([2]), 
    coords=dict(x=[0.0, 1.0]), # important to use 'float' here! with 'int' it is working fine
    dims=("x")
)
print(xarr1.x.data[0]) # 0.0

xarr2 = xarr1.copy(deep=True)
xarr2.x.data[0] = 45
print(xarr1.x.data[0]) # gives 45

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

xarray 2023.1.0

python 3.8.10

@mgaspary mgaspary added bug needs triage Issue that has not been reviewed by xarray team member labels Jan 20, 2023
@headtr1ck
Copy link
Collaborator

It seems that in copy IndexVariables are treated special and are explicitly excluded from copying.
@benbovy what was the reasoning behind this choice?

@benbovy
Copy link
Member

benbovy commented Feb 10, 2023

I think that the reverting change in IndexVariable came after refactoring copy in Xarray introduced some performance regression (#7209 (comment)).

I didn't see #1463 (#1463 (comment)), though. It feels weird to me that we can mutate an IndexVariable via its data property, considering that the underlying index is immutable. IIUC xarr2.x.data[0] = 45 replaces the full index with a new one? I'm not sure if it is a good idea to allow this. For a pandas index that's probably OK (it is reasonably cheap to rebuild a new index) but for a custom index that is expensive to build (e.g., kd-tree) I don't think this behavior is desirable.

@headtr1ck
Copy link
Collaborator

I didn't see #1463 (#1463 (comment)), though. It feels weird to me that we can mutate an IndexVariable via its data property, considering that the underlying index is immutable. IIUC xarr2.x.data[0] = 45 replaces the full index with a new one? I'm not sure if it is a good idea to allow this. For a pandas index that's probably OK (it is reasonably cheap to rebuild a new index) but for a custom index that is expensive to build (e.g., kd-tree) I don't think this behavior is desirable.

should we then raise an error if someone tries to replace values in an index?

@benbovy
Copy link
Member

benbovy commented Feb 10, 2023

Yes I think we should, but I might have missed the rationale behind allowing it if this is intentional.

EDIT: perhaps better to issue a warning first to avoid some breaking change. We could also try to fix it (make a deep copy) at the same time as deprecating it, but that might be tricky without again introducing performance regressions.

@mgaspary
Copy link
Author

Yes I think we should, but I might have missed the rationale behind allowing it if this is intentional.

EDIT: perhaps better to issue a warning first to avoid some breaking change. We could also try to fix it (make a deep copy) at the same time as deprecating it, but that might be tricky without again introducing performance regressions.

I would assume that deepcopy are completely going to copy the object, so if I change internal parts (like coordinates here), 1st object shall not change. It definitely affects the performance, but otherwise deepcopy is not deep anymore

@benbovy
Copy link
Member

benbovy commented Feb 13, 2023

There are two issues:

  • whether we should continue allowing IndexVariable data be updated in place via .data property. IMO we should really deprecate it, especially that now it is possible to have custom, possibly expensive index structures built from one or more coordinates.

  • whether deep=True should deep copy the Xarray index objects. I don't have strong opinion on this. There is a similar discussion on the pandas side: DataFrame.copy(deep=True) is not a deep copy of the index pandas-dev/pandas#19862. I wonder if we reverted the change here because some high-level operations in Xarray were by default deep copying the indexes? I don't think we would want such behavior unless the user explicitly sets deep=True somewhere?

@dcherian
Copy link
Contributor

whether we should continue allowing IndexVariable data be updated in place via .data property. IMO we should really deprecate it

I agree. We don't allow it on .values anyway for this reason.

Now I see we already did this in https://github.com/pydata/xarray/pull/3862/files So I guess that got lost in the indexes refactor.

whether deep=True should deep copy the Xarray index objects

I would expect deep=True to deep-copy absolutely everything. That said if the Index objects are immutable then it doesn't matter?

On that pandas thread I see:

discovered that even if the id of the index was different on the copy, modifying the cp.index.to_numpy() values was corrupting the original.

Seems like we should all be setting the numpy data to be read-only with array.flags.WRITEABLE = False

@lee1043
Copy link

lee1043 commented Sep 23, 2024

Hi, I am hitting the similar issue, and curious if there any plan for this issue. I think when deep=True, reasonable expectation is everything is going to be deep copied, regardless of coordinate or data variable. This seems like a long standing issue (e.g., https://groups.google.com/g/xarray/c/ksMI8TmRPaA?pli=1).

@max-sixty
Copy link
Collaborator

If there are difficulties or ambiguities here, would it be reasonable to switch to deep-copying everything for the moment?

If we can figure out optimizations on things that are definitely immutable, then great, but in the meantime any objections to defaulting to the slower-but-always-correct thing?

@juntyr
Copy link
Contributor

juntyr commented Oct 10, 2024

I also just stumbled across this bug. A rogue plotting script is changing the longitude of a map internally, and even when passing the script a da.copy(), the da's own longitude is modified after plotting. Deep copies should be truly deep. If the performance loss is too large to pass on to everyone, the copy function should get a new optional parameter to force the really-deep copy mode.

@max-sixty
Copy link
Collaborator

I don't hear anyone clamoring to retain skipping indexes from deep copy. Would anyone like to do a PR to include them in deep copy, and we'll plan to merge that unless anyone comes up with a thought-through objection?

@kmuehlbauer kmuehlbauer removed the needs triage Issue that has not been reviewed by xarray team member label Nov 4, 2024
@kmuehlbauer kmuehlbauer mentioned this issue Nov 15, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants