CoW: Set copy=False in internal usages of Series/DataFrame constructors #51834
Looks good! I commented on a few cases where I was not sure that we already have a copy.
@@ -3604,7 +3609,11 @@ def transpose(self, *args, copy: bool = False) -> DataFrame:
 if copy:
     new_arr = new_arr.copy()
If we can pass copy=False below (which I think is correct, since in this code path we know that we have multiple blocks, and so self.values will always be a copy), then the extra copy made when copy=True should also not be necessary?
I would also add a short comment mentioning why we can specify copy=False
Yeah I thought so as well, couldn't figure out why we would do an additional copy here, but wasn't sure if I am missing something.
Added a comment
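The claim in this thread can be checked from the public API. The sketch below is only an illustration, not the code under review. It assumes that a frame built from one 2D NumPy array is backed by a single block and that a frame with an integer and a float column is backed by multiple blocks; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Assumption: a frame built from one 2D array is backed by a single block.
single = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])
# Assumption: columns of different dtypes are stored in separate blocks.
multi = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})

# Single block: .values exposes the block's buffer, so two calls share memory.
single_shares = np.shares_memory(single.values, single.values)

# Multiple blocks: .values has to assemble a new 2D array on every call,
# so the results never share memory with each other (or with the frame).
multi_shares = np.shares_memory(multi.values, multi.values)

print(single_shares, multi_shares)
```

If the two assumptions hold, only the multi-block result is a fresh array, which is the situation where passing copy=False to the constructor does not skip a needed copy.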
@@ -3830,7 +3839,7 @@ def _getitem_multilevel(self, key):
 else:
     new_values = self.values[:, loc]
     result = self._constructor(
-        new_values, index=self.index, columns=result_columns
+        new_values, index=self.index, columns=result_columns, copy=False
Are we sure here? loc can be a slice, in which case self.values[:, loc] can be a view?
Yeah, you are correct; this was wrong before as well. Will open a separate PR.
Don't we generally want a view?
Yes, but we should track references when we get a view, which we did not do before.
This was handled by #51944
Yep
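The concern raised in this thread comes down to NumPy indexing rules. As a minimal sketch using plain NumPy only (the array and variable names are made up for illustration): indexing the columns with a slice returns a view, while indexing with a list of positions returns a copy.

```python
import numpy as np

values = np.arange(12).reshape(3, 4)

# A slice along the columns is basic indexing and returns a view.
by_slice = values[:, slice(1, 3)]
# A list of positions is advanced indexing and returns a copy.
by_list = values[:, [1, 2]]

slice_is_view = np.shares_memory(values, by_slice)
list_is_view = np.shares_memory(values, by_list)

print(slice_is_view, list_is_view)
```

Both results hold the same numbers, but only the slice result still points at the parent buffer, which is why that case needs reference tracking before copy=False is safe.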
@@ -777,6 +777,7 @@ def swapaxes(
 return self._constructor(
     new_values,
     *new_axes,
+    copy=False,
This can be passed here because in case of CoW, the single block case is already handled above? So if we end up here and use CoW, we always have multiple blocks and thus already a copy?
I would maybe add a brief comment mentioning that, maybe like
# ...
copy = False
to have the comment together with the value, and then passing copy to the constructor.
Yep exactly. Added a comment
pandas/core/series.py
Outdated
)
return self._constructor(
    self._values[indexer], index=new_index, copy=False
).__finalize__(self)
I don't know if that is the case here, but in general get_loc_level can return a slice, and so self._values[indexer] can be a view?
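For reference, here is a small sketch of the general behavior mentioned above, using the public MultiIndex.get_loc_level method with a scalar key. It does not reproduce the exact code path under review; the index and the Series are made up for illustration.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_arrays([list("abb"), list("def")])
ser = pd.Series([10, 20, 30], index=mi)

# For a key on the first level of this sorted index, the indexer is a slice.
indexer, new_index = mi.get_loc_level("b")

# Indexing the underlying array with that slice gives a view, not a copy.
base = ser.to_numpy()
subset = base[indexer]
subset_is_view = np.shares_memory(base, subset)

print(indexer, list(new_index), subset_is_view)
```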
I don't think we can get here with a slice, but will check
Did you check this one?
I couldn't construct a case where we would end up with a slice here. Will re-check, but I would open a separate PR anyway.
Indeed, looking at the code, it seems that when passing a tuple we typically convert the slice to a boolean ndarray.
(We could still add an if isinstance(indexer, slice): add_reference to be sure, but that would probably not be covered by our tests.)
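Why the conversion to a boolean ndarray matters can be sketched with plain NumPy. The mask construction below is a hypothetical stand-in for that conversion, not the pandas internals: selecting the same positions with a boolean mask produces a copy, while the slice produces a view.

```python
import numpy as np

values = np.array([10, 20, 30, 40])

as_slice = slice(1, 3)
# Hypothetical stand-in for the conversion: the same positions as a boolean mask.
as_mask = np.zeros(len(values), dtype=bool)
as_mask[as_slice] = True

from_slice = values[as_slice]  # basic indexing: view of `values`
from_mask = values[as_mask]    # boolean indexing: new array

slice_shares = np.shares_memory(values, from_slice)
mask_shares = np.shares_memory(values, from_mask)

print(slice_shares, mask_shares)
```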
Ah, I took another look: if you have a MultiIndex that has tuples in the first level, then you can get a slice indexer here... Weird corner case 😃
I'll open another PR for this case.
-new_ser = self._constructor(new_values, index=new_index, name=self.name)
+new_ser = self._constructor(
+    new_values, index=new_index, name=self.name, copy=False
+)
Same here: are we sure this is a copy?
No, this can be a view. Will open a separate PR, since this does not work right now either.
Did you already have another PR for this?
…onstructor # Conflicts: # pandas/core/ops/__init__.py # pandas/core/series.py
…onstructor # Conflicts: # pandas/core/series.py
The fixes for all non-working cases have been merged, so I'd merge this tomorrow if green. Afterwards we can merge the copy=True PR; it would cause failures if merged before this one.
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict, please backport manually.
…rs (pandas-dev#51834) (cherry picked from commit c98b7c8)
Manual backport -> #52012
cc @jbrockmendel
This keeps copy=False internally where necessary, to avoid unnecessary copies as a side effect of #51731 (which copies NumPy arrays by default in the DataFrame constructor).
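As a rough illustration of what the copy flag controls in the public constructor (a sketch with a made-up array, not the internal call sites changed in this PR): passing copy=False lets the frame reuse the NumPy buffer, while copy=True gives the frame its own data. The default is left out on purpose, since it depends on the pandas version and on whether Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

arr = np.arange(6, dtype="float64").reshape(3, 2)

no_copy = pd.DataFrame(arr, copy=False)   # frame wraps the existing buffer
with_copy = pd.DataFrame(arr, copy=True)  # frame owns a separate buffer

shares_without_copy = np.shares_memory(arr, no_copy.to_numpy())
shares_with_copy = np.shares_memory(arr, with_copy.to_numpy())

print(shares_without_copy, shares_with_copy)
```

Internal code that already holds a freshly created array is in the copy=False situation: copying again in the constructor would only cost time and memory.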