Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API / CoW: constructing DataFrame from DataFrame creates lazy copy #50499

Closed
wants to merge 10 commits into from

Conversation

phofl
Copy link
Member

@phofl phofl commented Dec 30, 2022

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @jorisvandenbossche

@phofl phofl changed the title BUG: DataFrame constructor not tracking reference if called with df or mgr BUG: DataFrame constructor not tracking reference if called with df Dec 30, 2022
@phofl phofl marked this pull request as draft December 30, 2022 13:03
@phofl phofl marked this pull request as ready for review January 7, 2023 21:57
@jorisvandenbossche jorisvandenbossche changed the title BUG: DataFrame constructor not tracking reference if called with df API / CoW: constructing DataFrame from DataFrame creates lazy copy Jan 16, 2023
@@ -951,6 +951,10 @@ Indexing
- Bug in :class:`BusinessHour` would cause creation of :class:`DatetimeIndex` to fail when no opening hour was included in the index (:issue:`49835`)
-

Copy on write
^^^^^^^^^^^^^
- Bug in :class:`DataFrame` constructor not tracking reference if called with another :class:`DataFrame` (:issue:`50499`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the equivalent Series change I listed this in the "Copy-on-Write improvements" section instead of seeing it as a bug fix (it was a known gap left out of the initial implementation, and it also changes behaviour, so at this stage I wouldn't yet label it as a bug fix)

@@ -205,6 +205,7 @@
to_arrays,
treat_as_nested,
)
from pandas.core.internals.managers import using_copy_on_write
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import can now be removed (after a merge with main)

and new_data is self._mgr
or not copy
and using_copy_on_write()
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a bit how this change is related? We are passing here a Manager object to the constructor, I suppose, and you only changed the constructor when it receives a DataFrame?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, this is unrelated to the constructor change, I'll make a separate pr.

Currently, when reindexing with copy=False we don't track references, causing problems.

Copy link
Member

@jorisvandenbossche jorisvandenbossche Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have some skips in #50536 related to reindex, there are several things not yet fully working it seems

@phofl
Copy link
Member Author

phofl commented Feb 8, 2023

#51239

@phofl phofl closed this Feb 8, 2023
@phofl phofl deleted the cow_mgr_constructor branch February 11, 2023 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants