Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix IntegerArray astype with copy=True/False #34931

Merged
merged 6 commits into from
Jul 10, 2020

Conversation

jorisvandenbossche
Copy link
Member

xref #34307 (comment)

Right now, we were not copying the mask correctly with copy=True, and also not when the data was actually copied (then the mask also needs to be copied, even with copy=False)

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The matplotlib test failure is unrelated and I think reported elsewhere.

Does this need a whatsnew, or is it only on master?

@jorisvandenbossche
Copy link
Member Author

One additional API question that pops up: in numpy, astype(.., copy=False) seems to return itself if it doesn't need to copy (I am actually also relying on this in the if data is self._data check).
But in the EA implementation, we always return a new array object, potentially with the same backing data. But we could in principle also do something like if dtype == self.dtype and not copy: return self (and that should already catch the one case where a copy can be avoided I think, and we could simply always copy in the other cases)

@jorisvandenbossche
Copy link
Member Author

Does this need a whatsnew, or is it only on master?

Not only master it seems, will add a whatsnew.

@TomAugspurger
Copy link
Contributor

I don't have a strong opinion on returning a new object backed by the data vs. just returning self.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm very minor comment

-------
type
"""
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use AbstractMethodError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually use NotImplementedError in the base dtype class for this, so this is consistent with that (not fully sure why we do that though, there might be a reason)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk yeah we should actually change that as this is a better error message (but ok)

@jorisvandenbossche
Copy link
Member Author

Any preferences on what I do with copy=False? (returning a new object vs returning self)

@TomAugspurger
Copy link
Contributor

Matching NumPy (returning the same object) seems OK.

@jorisvandenbossche
Copy link
Member Author

Updated to return same object

@jreback jreback added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Jul 8, 2020
@jreback jreback added this to the 1.1 milestone Jul 8, 2020
@jreback
Copy link
Contributor

jreback commented Jul 8, 2020

lgtm, does this need a whatsnew? e.g. changed in 1.1?

@jorisvandenbossche
Copy link
Member Author

Added a bug fix note

@jreback
Copy link
Contributor

jreback commented Jul 10, 2020

thanks @jorisvandenbossche

@jreback jreback merged commit df24c10 into pandas-dev:master Jul 10, 2020
@jorisvandenbossche jorisvandenbossche deleted the ea-astype-copy branch July 11, 2020 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants