CLN: Remove unused core.internals methods #19250
Codecov Report
@@ Coverage Diff @@
## master #19250 +/- ##
=========================================
Coverage ? 91.55%
=========================================
Files ? 150
Lines ? 48713
Branches ? 0
=========================================
Hits ? 44599
Misses ? 4114
Partials ? 0
Continue to review full report at Codecov.
@@ -937,10 +918,10 @@ def putmask(self, mask, new, align=True, inplace=False, axis=0,

        new_values = self.values if inplace else self.values.copy()

        if hasattr(new, 'reindex_axis'):
some of these may not be hit, I think I prefer the more idiomatic
new = getattr(new, 'values', new)
Sure. I'll check to see whether any value-less cases get through while I'm at it.
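The idiom suggested above can be tried in isolation. This is a minimal sketch, not code from the PR: `Wrapper` and `unwrap` are hypothetical names, with `Wrapper` standing in for any object that exposes its data as a `.values` attribute.

```python
class Wrapper:
    """Hypothetical stand-in for a Series-like object exposing `.values`."""

    def __init__(self, values):
        self.values = values


def unwrap(new):
    # Use the underlying values if `new` has them; otherwise keep `new` as-is.
    return getattr(new, 'values', new)


wrapped = unwrap(Wrapper([1, 2, 3]))  # the wrapper's .values
plain = unwrap([4, 5, 6])             # a plain list passes through unchanged
scalar = unwrap(7)                    # so does a scalar
```

One caveat with this idiom: any object with an attribute named `values` is unwrapped, including a `dict`, whose `values` is a method rather than data.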
pandas/core/internals.py
Outdated
@@ -1929,6 +1910,7 @@ def should_store(self, value):


class DatetimeLikeBlockMixin(object):
    freq = None  # compat with Datetimelike Index subclasses
Put comments above the line rather than after it. But I think this comment is out of place: unless you have internal knowledge of the codebase you won't have any idea what this is. Rather, put a 1-liner that explains it.
pandas/core/internals.py
Outdated
@@ -2466,6 +2448,7 @@ class DatetimeBlock(DatetimeLikeBlockMixin, Block):
    __slots__ = ()
    is_datetime = True
    _can_hold_na = True
    tz = None
same here
pandas/core/internals.py
Outdated
@@ -2603,6 +2586,12 @@ class DatetimeTZBlock(NonConsolidatableMixIn, DatetimeBlock):
    _concatenator = staticmethod(_concat._concat_datetime)
    is_datetimetz = True

    get_values = DatetimeBlock.get_values  # override NonConsolidatableMixin
I'd rather you write an actual function here (that can call DatetimeBlock.get_values), but that has a doc-string and such
pandas/core/internals.py
Outdated
@@ -1929,6 +1903,9 @@ def should_store(self, value):


class DatetimeLikeBlockMixin(object):
    # We add a dummy `freq` attribute here to make these block subclasses
    # behave more like the DatetimelikeIndexOpsMixin implementations
    freq = None
I don't understand the purpose for this. Why should blocks behave more like DatetimelikeIndexOpsMixin?
I see the point for adding tz (for a little bit of deduplication), but freq is never used? Why adding it?
I don't understand the purpose for this. Why should blocks behave more like DatetimelikeIndexOpsMixin?
@jorisvandenbossche Two parallel strands here.
a) #19147 and a bunch of others are moving towards having Series ops delegate to the Index subclass methods, which have careful implementations of timezone and type checking. But longer-term, it would be better for these methods to just delegate to e.g. self._data.blocks[0].__add__ (probably accessed more prettily) and have the implementations in the blocks. This will also make it easier for DataFrame ops to delegate correctly, see #18824. So ideally e.g. TimedeltaBlock and TimedeltaIndex would both inherit their arithmetic/comparison methods from TimedeltaArray or TimedeltaVector or something.
b) #19174 has PeriodArray subclass DatetimelikeOpsMixin, and notes that it would be preferable to have the mixin class refactor out index-specific ops and separate the more array-like ops. I've been working along those lines, and have found that small namespace differences are the main thing standing in the way of this refactoring. So partly this is trying to smooth out those differences in a piecemeal way.
That definitely answers some question. Does it answer the right question?
I see the point for adding tz (for a little bit of deduplication), but freq is never used? Why adding it?
Purely for compat. This way DatetimeBlock._box_func == DatetimeTZBlock._box_func == DatetimeIndex._box_func.
Yes, I understand that we are thinking about that direction, but those things would be added in an array class, not the block. Anyhow, let's change this if we actually get there, not prematurely, as it is now only adding unused code (I mean the freq here).
Sounds good.
pandas/core/internals.py
Outdated
@@ -2559,7 +2540,7 @@ def _try_coerce_result(self, result):

     @property
     def _box_func(self):
-        return tslib.Timestamp
+        return lambda x: tslib.Timestamp(x, freq=self.freq, tz=self.tz)
you can just return self.values._box_func(), no need to add any attributes.
That would work for DatetimeTZBlock, but not for DatetimeBlock. This edit is to a) let them share the implementation and b) make the implementation match that of DatetimeIndex to move in the direction of sharing implementations.
pandas/core/internals.py
Outdated
@@ -2467,6 +2440,10 @@ class DatetimeBlock(DatetimeLikeBlockMixin, Block):
    is_datetime = True
    _can_hold_na = True

    # We add a dummy `tz` attribute here to make these block subclasses
let's remove this too (and the tz property and changes to box_func).
OK. Not a fan of the idea of de-duplicating, or just don't want to do it in this PR?
pandas/core/internals.py
Outdated
        return object dtype as boxed values, such as Timestamps/Timedelta
        """
        # Note: this overrides the NonConsolidatableMixIn implementation
        return DatetimeBlock.get_values(self, dtype)
so rather than do this, move the impl from DatetimeBlock here, you are calling a sub-class, not a good idea
You want to copy/paste the method? The original version in the PR was just get_values = DatetimeBlock.get_values to override the version from NonConsolidatableMixin (which is higher in the MRO).
IIUC, the idea was to
- Move DatetimeBlock.get_values here to the mixin
- Define DatetimeTZBlock.get_values as
    def get_values(self, dtype=None):
        return DatetimeLikeBlockMixin.get_values(self, dtype)
Or, does changing DatetimeTZBlock's MRO to (DatetimeLikeBlockMixin, NonConsolidatableMixin, DatetimeBlock) break anything?
DatetimeLikeBlockMixin already defines get_values. The original version from this PR, get_values = DatetimeBlock.get_values, could equivalently be get_values = DatetimeLikeBlockMixin.get_values, which I guess is slightly better.
Or, does changing DatetimeTZBlock's MRO to (DatetimeLikeBlockMixin, NonConsolidatableMixin, DatetimeBlock) break anything?
Haven't tried that, but I expect it would break a lot b/c I think nothing from NonConsolidatableMixin would get used.
Ah, that wouldn't work since you'd be inheriting from DatetimeLikeBlockMixin twice.
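Why the reordered bases cannot work can be reproduced with empty stand-in classes. This is a sketch under the assumption that only the inheritance relationships shown in the diff hunks matter. The class names mirror the ones in the PR, but the bodies are deliberately empty and nothing here is pandas code.

```python
class Block:
    pass


class DatetimeLikeBlockMixin:
    pass


class NonConsolidatableMixIn:
    pass


# Mirrors the base order shown in the diff: the mixin already sits
# inside DatetimeBlock's own bases.
class DatetimeBlock(DatetimeLikeBlockMixin, Block):
    pass


try:
    # Listing the mixin again, ahead of a class that already inherits
    # from it, asks for an ordering that Python cannot linearize.
    class DatetimeTZBlock(DatetimeLikeBlockMixin, NonConsolidatableMixIn,
                          DatetimeBlock):
        pass
except TypeError as err:
    mro_error = str(err)
else:
    mro_error = None
```

Python refuses to create the class at definition time, so `mro_error` holds the interpreter's message about an inconsistent method resolution order.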
If this is just calling the super method what happens if you just remove it entirely?
If this is just calling the super method what happens if you just remove it entirely?
Argh. If this is removed entirely then the NonConsolidatableMixin.get_values implementation is used because it is at the top of the MRO. (I haven't checked how it breaks, but am assuming that whoever wrote the existing method overrode it for a reason). That's why all that is needed is a one-liner to point it towards the DatetimeBlock.get_values method.
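The behaviour described here can be sketched with toy classes. The class names follow the PR, but the method bodies are placeholders that only report which implementation ran. `WithoutOverride` and `WithOverride` are hypothetical names, and the sketch assumes Python 3 semantics.

```python
class Block:
    def get_values(self, dtype=None):
        return 'Block'


class NonConsolidatableMixIn:
    def get_values(self, dtype=None):
        return 'NonConsolidatableMixIn'


class DatetimeBlock(Block):
    def get_values(self, dtype=None):
        return 'DatetimeBlock'


class WithoutOverride(NonConsolidatableMixIn, DatetimeBlock):
    # No get_values here: lookup walks the MRO and finds the mixin first.
    pass


class WithOverride(NonConsolidatableMixIn, DatetimeBlock):
    # One-line override: point the name at the DatetimeBlock implementation.
    get_values = DatetimeBlock.get_values


without_override = WithoutOverride().get_values()
with_override = WithOverride().get_values()
mro_names = [cls.__name__ for cls in WithoutOverride.__mro__]
```

Here `without_override` comes from the mixin and `with_override` from `DatetimeBlock`. `mro_names` shows the mixin ahead of `DatetimeBlock` in the lookup order.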
Travis error looks like OSX; I'll sit tight until advised otherwise.
pandas/core/internals.py
Outdated
@@ -2614,6 +2587,9 @@ def __init__(self, values, placement, ndim=2, dtype=None):
        super(DatetimeTZBlock, self).__init__(values, placement=placement,
                                              ndim=ndim)

    # override NonConsolidatableMixIn implementation of get_values
remove this for now and revert get_values from the Block.
So just keep the unnecessary implementation currently in DatetimeTZBlock?
If you want to get this merged, yes. I really don't like this change the way it is. If you want to refactor into something more palatable, be my guest, but this PR would be on hold until then. So I would simply separate this (and to be honest, it would have to be a convincing change).
Will do, am just curious because usually you like removing duplicate code.
Yes, but in this case you are making a completely orthogonal change, in code which is already pretty spaghetti-like, so a refactor is in order. Separate that from these changes.
My basic point is that the inheritance order needs to change for this, IOW NonConsolidated needs to be moved to the end of the MRO (and not the beginning). Then this will simply work (of course that might break other things).
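The reordering proposed here, and the risk that comes with it, can be sketched with placeholder classes. The class names follow the PR. `MixinLast` and the `shape` method are hypothetical, added only to illustrate a second attribute that the mixin overrides.

```python
class Block:
    def get_values(self, dtype=None):
        return 'Block.get_values'

    def shape(self):
        return 'Block.shape'


class NonConsolidatableMixIn:
    def get_values(self, dtype=None):
        return 'NonConsolidatableMixIn.get_values'

    def shape(self):
        return 'NonConsolidatableMixIn.shape'


class DatetimeBlock(Block):
    def get_values(self, dtype=None):
        return 'DatetimeBlock.get_values'


# The mixin moves to the end of the bases, so DatetimeBlock (and Block)
# are consulted before it.
class MixinLast(DatetimeBlock, NonConsolidatableMixIn):
    pass


block = MixinLast()
values_source = block.get_values()
shape_source = block.shape()
```

With the mixin last, `get_values` resolves to `DatetimeBlock` with no explicit override. But `shape` now resolves to `Block`, so every other mixin override is bypassed as well, which is the "might break other things" caveat.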
ping on green.
That makes sense; thanks for explaining.
thanks!
De-duplicate some bits of DatetimeBlock and DatetimeTZBlock.
hasattr(item, 'reindex_axis')
#19243
git diff upstream/master -u -- "*.py" | flake8 --diff