-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CoW: Add lazy copy mechanism to DataFrame constructor for dict of Index #52947
CoW: Add lazy copy mechanism to DataFrame constructor for dict of Index #52947
Conversation
arrays = [arr if not isinstance(arr, Index) else arr._data for arr in arrays] | ||
if not using_copy_on_write(): | ||
arrays = [ | ||
arr if not isinstance(arr, Index) else arr._data for arr in arrays | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this in the default case of not CoW? This basically just unpacks Index objects, but also eg Series doesn't get unpacked, and this is handled in arrays_to_mgr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets see if ci passes, wasn't sure so wanted to avoid changing it
else x.copy(deep=True) | ||
if isinstance(x, Index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle this copy is only needed when the dtype is an ExtensionDtype?
And we might actually have the same issue right now with passing a Series here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is currently a "bug" for when passing a dict of Series with EA dtype. Because we are passing a dict, it should result in a copy, but it doesn't when it's an EA dtype and does when default numpy dtype:
In [17]: ser = pd.Series([1, 2, 3], dtype="Int64")
In [18]: df = pd.DataFrame({"col": ser})
In [19]: df.iloc[0, 0] = 10
In [20]: ser
Out[20]:
0 10
1 2
2 3
dtype: Int64
In [21]: ser = pd.Series([1, 2, 3], dtype="int64")
In [22]: df = pd.DataFrame({"col": ser})
In [23]: df.iloc[0, 0] = 10
In [24]: ser
Out[24]:
0 1
1 2
2 3
dtype: int64
And also when creating it from an array instead of Series, it is copied correctly:
In [25]: arr = pd.array([1, 2, 3], dtype="Int64")
In [26]: df = pd.DataFrame({"col": arr})
In [27]: df.iloc[0, 0] = 10
In [28]: arr
Out[28]:
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should change the check for ExtensionArray to check for the dtype being ExtensionDtype? (or move the copy deeper down into arrays_to_mgr
after we know the data have been sanitized / unboxed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to do this independently of the CoW change, can do as a follow up or pre-cursor. What do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is fine for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wild o as follow up then
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
This needs a merge of main now I merged the other PR |
…x_to_frame # Conflicts: # doc/source/whatsnew/v2.1.0.rst
updated |
Thanks! |
will put up a follow up for the copy thing |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.xref #48998