Copy-on-Write (PDEP-7) follow-up overview issue #48998
Comments
This one just caught me out in statsmodels. It seems hard to get high-performance in-place filling on a column-by-column basis with CoW. Is this correct? The code that caused the issue was
which is pretty standard IME. I replaced it with
Is there a better way when using CoW?
Ok, so a better way is
Yes, that's a good question. In general, the answer is always to do it through a method on the DataFrame directly, and indeed in this case
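The snippets from this exchange did not survive, so what follows is only a sketch of the general advice: go through a DataFrame method, or assign the result back, rather than mutating a column in place. The frame, column names and fill values are made up for illustration. Under CoW, `df[col]` behaves as a copy, so `df[col].fillna(..., inplace=True)` does not update `df`; both patterns below avoid that.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not taken from the statsmodels code under discussion
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# Option 1: one DataFrame-level call; the dict maps column name -> fill value
filled = df.fillna({"a": 0.0, "b": df["b"].mean()})

# Option 2: column by column, assigning the result back instead of inplace=True
filled_by_mean = df.copy()
for col in filled_by_mean.columns:
    filled_by_mean[col] = filled_by_mean[col].fillna(filled_by_mean[col].mean())

print(filled)
print(filled_by_mean)
```

Neither option touches the original `df`, and neither depends on whether CoW is enabled.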
PDEP-7: https://pandas.pydata.org/pdeps/0007-copy-on-write.html
An initial implementation was merged in #46958 (with the proposal described in more detail in https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit / discussed in #36195).
In #36195 (comment) I mentioned some next steps that are still needed; moving this to a new issue.
Implementation
Complete the API surface:
copy keyword?
copy keyword in DataFrame/Series methods #50535 -> CoW: Ignore copy=True when copy_on_write is enabled #51464
copy keyword (except in constructors) #56022
(.values, to_numpy()). Potential idea is to make the returned array read-only by default.
*args or **kwargs #56456
df['a'].fillna(.., inplace=True)
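For the array-conversion item, a minimal sketch of code that works whether or not the returned array is read-only. The data is hypothetical, and whether `to_numpy()` hands back a writeable array depends on the pandas version and on whether CoW is enabled, so the example only prints that flag and makes no assumption about it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})  # hypothetical data

values = df["a"].to_numpy()
# Version- and CoW-dependent: may be True or False
print("writeable:", values.flags.writeable)

# An explicit copy is always writeable and never aliases the DataFrame
editable = values.copy()
editable[0] = 100.0

print(df["a"].tolist())
```

Taking the copy up front keeps the intent explicit: the array is scratch space, and the DataFrame is only changed through its own methods.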
Improve the performance
Provide upgrade path:
Documentation / feedback
Aside from finalizing the implementation, we also need to start documenting this, and it will be super useful to have people give this a try, run their code or test suites with it, etc, so we can iron out bugs / missing warnings / or discover unexpected consequences that need to be addressed/discussed.
Some remaining aspects of the API to figure out:
Series.view() method -> is deprecated
head() / tail() return eager copies? (to avoid triggering CoW when those methods are used for exploration) -> API/CoW: Return copies for head and tail #54011
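As a rough illustration of the `head()` / `tail()` question, the sketch below modifies a preview and checks that the parent is untouched. The data is hypothetical, and the `try` block assumes the `mode.copy_on_write` option is the opt-in switch (as in pandas 2.x). Whether the copy is made eagerly or lazily is not visible from this code; with CoW active the observable result is the same.

```python
import pandas as pd

try:
    # Opt in to CoW where it is not yet the default (pandas 2.x)
    pd.set_option("mode.copy_on_write", True)
except Exception:
    # Assumption: the option may be unavailable in other versions
    pass

df = pd.DataFrame({"a": [1, 2, 3, 4]})  # hypothetical data

preview = df.head(2)
preview.iloc[0, 0] = 99  # with CoW, this modifies only the preview

print(df["a"].tolist())
print(preview["a"].tolist())
```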