CoW: Set copy=False in internal usages of Series/DataFrame constructors #51834

phofl · 2023-03-08T00:31:21Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @jbrockmendel This keeps copy=False internally where necessary, to avoid unnecessary copies as a side-effect of #51731 (by default copying numpy arrays in the DataFrame constructor)
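
For readers following along, here is a minimal sketch of what the copy keyword controls in the public constructor. It only uses the documented copy=False and copy=True arguments; the array and column names are made up for illustration, and the default behaviour (which depends on the pandas version and on whether CoW is enabled) is deliberately not shown.

```python
import numpy as np
import pandas as pd

# Illustrative input; the names "a" and "b" are arbitrary.
arr = np.arange(6, dtype="float64").reshape(3, 2)

# copy=False asks the constructor to wrap the array without copying it.
shared = pd.DataFrame(arr, columns=["a", "b"], copy=False)

# copy=True always takes a defensive copy of the input.
copied = pd.DataFrame(arr, columns=["a", "b"], copy=True)

shares_with_input = np.shares_memory(shared.to_numpy(), arr)
copy_shares_with_input = np.shares_memory(copied.to_numpy(), arr)

print(shares_with_input, copy_shares_with_input)
```

Internal call sites that already own a fresh array are the ones where passing copy=False avoids a second, redundant copy.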

jorisvandenbossche

Looks good! I commented on a few cases where I was not sure that we already have a copy.

jorisvandenbossche · 2023-03-13T15:26:22Z

pandas/core/frame.py

@@ -3604,7 +3609,11 @@ def transpose(self, *args, copy: bool = False) -> DataFrame:
            if copy:
                new_arr = new_arr.copy()


If we can pass copy=False below (which I think is correct, since in this code path we know that we have multiple blocks, and so self.values will always be a copy), then this additional copy when copy=True should not be necessary either?

I would also add a short comment mentioning why we can specify copy=False

Yeah, I thought so as well. I couldn't figure out why we would make an additional copy here, but I wasn't sure whether I was missing something.

Added a comment
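
A small sketch of the reasoning in this thread, using only public API rather than the internals touched by the diff: with more than one dtype the frame holds several blocks, so producing a single 2D array has to build a new buffer each time. The frame and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Two dtypes, so two internal blocks (int64 and float64).
mixed = pd.DataFrame({"i": [1, 2, 3], "f": [0.5, 1.5, 2.5]})

first = mixed.to_numpy()
second = mixed.to_numpy()

# Each call combines the blocks into a brand-new float64 array.
multi_block_values_are_fresh = not np.shares_memory(first, second)

# Transposing a NumPy array only creates a view of that fresh buffer,
# so wrapping it with copy=False does not alias the original frame.
new_arr = first.T
transposed = pd.DataFrame(
    new_arr, index=mixed.columns, columns=mixed.index, copy=False
)

print(multi_block_values_are_fresh, transposed.shape)
```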

jorisvandenbossche · 2023-03-13T15:27:41Z

pandas/core/frame.py

@@ -3830,7 +3839,7 @@ def _getitem_multilevel(self, key):
            else:
                new_values = self.values[:, loc]
                result = self._constructor(
-                    new_values, index=self.index, columns=result_columns
+                    new_values, index=self.index, columns=result_columns, copy=False


Are we sure here? loc can be a slice, in which case self.values[:, loc] can be a view?

Yeah, you are correct; this was wrong before as well. Will open a separate PR.

Don't we generally want a view?

Yes, but we should track references when we get a view, which we did not do before.

This was handled by #51944
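
The view-versus-copy distinction raised above can be reproduced with plain NumPy. This is only an illustration of the indexing semantics, not of the pandas code path; loc_slice and loc_mask are hypothetical stand-ins for the two kinds of loc the thread mentions.

```python
import numpy as np

values = np.arange(12).reshape(3, 4)

# Hypothetical indexers selecting the same two columns.
loc_slice = slice(1, 3)
loc_mask = np.array([False, True, True, False])

by_slice = values[:, loc_slice]  # basic indexing
by_mask = values[:, loc_mask]    # boolean (advanced) indexing

# Basic indexing returns a view; advanced indexing returns a copy.
slice_is_view = np.shares_memory(by_slice, values)
mask_is_view = np.shares_memory(by_mask, values)

print(slice_is_view, mask_is_view)
```

When the result is a view, constructing a new object from it with copy=False means parent and child share a buffer, which is why references need to be tracked in that case.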

jorisvandenbossche · 2023-03-13T15:33:34Z

pandas/core/generic.py

@@ -777,6 +777,7 @@ def swapaxes(
        return self._constructor(
            new_values,
            *new_axes,
+            copy=False,


This can be passed here because in case of CoW, the single block case is already handled above? So if we end up here and use CoW, we always have multiple blocks and thus already a copy?

I would maybe add a brief comment mentioning that, maybe like

# ... copy = False

to have the comment together with the value, and then passing copy to the constructor.

Yep exactly. Added a comment
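
The suggested pattern (keep the explanatory comment next to the value, then pass the variable to the constructor) might look roughly like the sketch below. transposed_copy is a hypothetical helper written against the public API; it is not the swapaxes implementation.

```python
import numpy as np
import pandas as pd


def transposed_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper for illustration; not the pandas implementation."""
    # to_numpy(copy=True) guarantees a buffer that nothing else references.
    new_values = df.to_numpy(copy=True).T
    # We already own new_values, so the constructor must not copy it again.
    copy = False
    return pd.DataFrame(
        new_values, index=df.columns, columns=df.index, copy=copy
    )


frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = transposed_copy(frame)
print(result)
```

Keeping the justification beside copy = False makes it harder for a later refactor to change how new_values is produced without revisiting the assumption.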

jorisvandenbossche · 2023-03-13T15:38:53Z

pandas/core/series.py

-        )
+        return self._constructor(
+            self._values[indexer], index=new_index, copy=False
+        ).__finalize__(self)


I don't know if that is the case here, but in general get_loc_level can return a slice, and so self._values[indexer] can be a view?

I don't think we can get here with a slice, but will check

Did you check this one?

I couldn't construct a case where we would end up with a slice here. Will re-check, but would open a separate PR anyway.

Indeed, looking at the code, it seems that when passing a tuple we typically convert the slice to a boolean ndarray.
(We could still add an if isinstance(indexer, slice): add_reference to be sure, but that would probably not be covered by our tests.)

Ah, I took another look. If you have a MultiIndex that has tuples in the first level, then you can get a slice indexer here... Weird corner case 😃

I'll open another pr for this case
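
To illustrate the general point that get_loc_level can return a slice, here is a sketch based on the example from the MultiIndex.get_loc_level documentation. It does not reproduce the tuples-in-first-level corner case found above; the Series values are made up.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_arrays([list("abb"), list("def")], names=["A", "B"])
ser = pd.Series([10, 20, 30], index=mi)

# For a contiguous run of matching labels the indexer is a slice.
indexer, new_index = mi.get_loc_level("b")

values = ser.to_numpy()
selected = values[indexer]

indexer_is_slice = isinstance(indexer, slice)
# Indexing with a slice yields a view of the same buffer.
selected_is_view = np.shares_memory(selected, values)

print(indexer, indexer_is_slice, selected_is_view)
```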

jorisvandenbossche · 2023-03-13T15:39:52Z

pandas/core/series.py

-            new_ser = self._constructor(new_values, index=new_index, name=self.name)
+            new_ser = self._constructor(
+                new_values, index=new_index, name=self.name, copy=False
+            )


Same here: are we sure this is a copy?

No, this can be a view. Will open a separate PR, since this does not work right now either.

Did you already have another PR for this?

phofl · 2023-03-16T00:09:13Z

So the fixes for all non-working cases have been merged. I'd merge this tomorrow if green. Afterwards we can merge the copy=True PR, which would cause failures if it were merged before this one.

lumberbot-app · 2023-03-16T07:05:28Z

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

Checkout backport branch and update it.

git checkout 2.0.x
git pull

Cherry-pick the first parent branch of this PR on top of the older branch:

git cherry-pick -x -m1 c98b7c84bdb3ba94d7cf482802f15fe313c0f5c7

You will likely have some merge/cherry-pick conflict here, fix them and commit:

git commit -am 'Backport PR #51834: CoW: Set copy=False in internal usages of Series/DataFrame constructors'

Push to a named branch:

git push YOURFORK 2.0.x:auto-backport-of-pr-51834-on-2.0.x

Create a PR against branch 2.0.x, I would have named this PR:

"Backport PR #51834 on branch 2.0.x (CoW: Set copy=False in internal usages of Series/DataFrame constructors)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

jorisvandenbossche · 2023-03-16T07:31:53Z

Manual backport -> #52012

CoW: Set copy=False in internal usages of Series/DataFrame constructors

910c914

phofl added the Copy / view semantics label Mar 8, 2023

phofl added this to the 2.0 milestone Mar 8, 2023

jorisvandenbossche reviewed Mar 13, 2023

phofl added 5 commits March 13, 2023 20:18

Restrict usage of copy

72e1d2d

Add comments

efe7584

Merge remote-tracking branch 'upstream/main' into cow_copy_false_in_c…

353f5e8

…onstructor # Conflicts: # pandas/core/ops/__init__.py # pandas/core/series.py

Merge remote-tracking branch 'upstream/main' into cow_copy_false_in_c…

6845b9f

…onstructor

Merge remote-tracking branch 'upstream/main' into cow_copy_false_in_c…

8297b5e

…onstructor # Conflicts: # pandas/core/series.py

jorisvandenbossche approved these changes Mar 16, 2023

jorisvandenbossche merged commit c98b7c8 into pandas-dev:main Mar 16, 2023

lumberbot-app bot added the Still Needs Manual Backport label Mar 16, 2023

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Mar 16, 2023

CoW: Set copy=False in internal usages of Series/DataFrame constructo…

7c45b9c

…rs (pandas-dev#51834) (cherry picked from commit c98b7c8)

jorisvandenbossche mentioned this pull request Mar 16, 2023

Backport PR #51834: CoW: Set copy=False in internal usages of Series/DataFrame constructors #52012

Merged

phofl deleted the cow_copy_false_in_constructor branch March 16, 2023 09:16

phofl removed the Still Needs Manual Backport label Mar 17, 2023

jorisvandenbossche mentioned this pull request Mar 17, 2023

CoW: Set copy=False explicitly internally for Series and DataFrame in io/pytables #52032

Merged
