Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Remove unused core.internals methods #19250

Merged
merged 14 commits into from
Jan 21, 2018

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Jan 15, 2018

De-duplicate some bits of DatetimeBlock and DatetimeTZBlock.

@codecov
Copy link

codecov bot commented Jan 15, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@ae96187). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #19250   +/-   ##
=========================================
  Coverage          ?   91.55%           
=========================================
  Files             ?      150           
  Lines             ?    48713           
  Branches          ?        0           
=========================================
  Hits              ?    44599           
  Misses            ?     4114           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.92% <100%> (?)
#single 41.74% <66.66%> (?)
Impacted Files Coverage Δ
pandas/core/internals.py 94.95% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae96187...59b3e60. Read the comment docs.

@@ -937,10 +918,10 @@ def putmask(self, mask, new, align=True, inplace=False, axis=0,

new_values = self.values if inplace else self.values.copy()

if hasattr(new, 'reindex_axis'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of these may not be hit, I think I prefer the more idiomatic

new = getattr(new, 'values', new)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll check to see whether any value-less cases get through while I'm at it.

@@ -1929,6 +1910,7 @@ def should_store(self, value):


class DatetimeLikeBlockMixin(object):
freq = None # compat with Datetimelike Index subclasses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put comments above this, but think this comment is out of place, meaning that unless you have internal knowledge of the codebase you won't have any idea what this is, rather put a 1-liner that explains

@@ -2466,6 +2448,7 @@ class DatetimeBlock(DatetimeLikeBlockMixin, Block):
__slots__ = ()
is_datetime = True
_can_hold_na = True
tz = None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -2603,6 +2586,12 @@ class DatetimeTZBlock(NonConsolidatableMixIn, DatetimeBlock):
_concatenator = staticmethod(_concat._concat_datetime)
is_datetimetz = True

get_values = DatetimeBlock.get_values # override NonConsolidatableMixin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather you write an actual function here (that can call DatetimeBlock.get_values), but that has a doc-string and such

@jreback jreback added Internals Related to non-user accessible pandas implementation Clean labels Jan 16, 2018
@@ -1929,6 +1903,9 @@ def should_store(self, value):


class DatetimeLikeBlockMixin(object):
# We add a dummy `freq` attribute here to make these block subclasses
# behave more like the DatetimelikeIndexOpsMixin implementations
freq = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the purpose for this. Why should blocks behave more like DatetimelineIndexOpsMixin ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point for adding tz (for a little bit of deduplication), but freq is never used? Why adding it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the purpose for this. Why should blocks behave more like DatetimelineIndexOpsMixin ?

@jorisvandenbossche Two parallel strands here.

a) #19147 and a bunch of others are moving towards having Series ops delegate to the Index subclass methods, which have careful implementations of timezone and type checking. But longer-term, it would be better for these methods to just delegate to e.g. self._data.blocks[0].__add__ (probably accessed more prettily) and have the implementations in the blocks. This will also make it easier for DataFrame ops to delegate correctly, see #18824. So ideally e.g. TimedeltaBlock and TimedeltaIndex would both inherit their arithmetic/comparison methods from TimedeltaArray or TimedeltaVector or something.

b) #19174 has PeriodArray subclass DatetimelikeOpsMixin, notes that it would be preferable to have the mixin class refactor out index-specific ops and separate the more array-like ops. I've been working along those lines, have found that small namespaces differences are the main thing standing in the way of this refactoring. So partly this is trying to smooth out those differences in a piece-meal way.

That definitely answers some question. Does it answer the right question?

I see the point for adding tz (for a little bit of deduplication), but freq is never used? Why adding it?

Purely for compat. This way DatetimeBlock._box_func == DatetimeTZBlock._box_func == DatetimeIndex._box_func.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand that we are thinking about that direction, but those things would be added in an array class, not the block. Anyhow, let's change this if we actually get there, not prematurely, as it is now only adding unused code (I mean the freq here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@jorisvandenbossche jorisvandenbossche changed the title Remove unused core.internals methods CLN: Remove unused core.internals methods Jan 16, 2018
@@ -2559,7 +2540,7 @@ def _try_coerce_result(self, result):

@property
def _box_func(self):
return tslib.Timestamp
return lambda x: tslib.Timestamp(x, freq=self.freq, tz=self.tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just return
```self.values._box_func()``, no need to add any attributes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work for DatetimeTZBlock, but not for DatetimeBlock. This edit is to a) let them share the implementation and b) make the implementation match that of DatetimeIndex to move in the direction of sharing implementations.

@@ -2467,6 +2440,10 @@ class DatetimeBlock(DatetimeLikeBlockMixin, Block):
is_datetime = True
_can_hold_na = True

# We add a dummy `tz` attribute here to make these block subclasses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this too (and the tz propert and changes to box_func).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Not a fan of the idea of de-duplicating or just don't want to it in this PR?

return object dtype as boxed values, such as Timestamps/Timedelta
"""
# Note: this overrides the NonConsolidatableMixIn implementation
return DatetimeBlock.get_values(self, dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so rather than do this, move the impl from DatetimeBlock here, you are calling a sub-class, not a good idea

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to copy/paste the method? The original version in the PR was just get_values = DatetimeBlock.get_values to override the version from NonConsolidatableMixin (which is higher in the MRO).

Copy link
Contributor

@TomAugspurger TomAugspurger Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the idea was to

  1. Move DatetimeBlock.get_values here to the mixin
  2. Define DatetimeTZBlock.get_values as
def get_values(self, dtype=None):
    return DatetimeLikeBlockMixin.get_values(self, dtype)

Or, does changing DatetimeTZBlock's MRO to (DatetimeLikeBlockMixin, NonConsolidatableMixin, DatetimeBlock) break anything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatetimeLikeBlockMixin already defines get_values. The original version from this PR get_values = DatetimeBlock.get_values could equivalently be get_values = DatetimeLikeBlockMixin.get_values, which I guess is slightly better.

Or, does changing DatetimeTZBlock's MRO to (DatetimeLikeBlockMixin, NonConsolidatableMixin, DatetimeBlock) break anything?

Haven't tried that, but I expect it would break a lot b/c I think nothing from NonConsolidatableMixin would get used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that wouldn't work since you'd be inheriting from DatetimeLikeBlockMixin twice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just calling the super method what happens if you just remove it entirely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just calling the super method what happens if you just remove it entirely?

Argh. If this is removed entirely then the NonConsolidatableMixin.get_values implementation is used because it is at the top of the MRO. (I haven't checked how it breaks, but am assuming that whoever wrote the existing method overrode it for a reason). That's why all that is needed is a one-liner to point it towards the DatetimeBlock.get_values method.

@jbrockmendel
Copy link
Member Author

Travis error looks like OSX; I'll sit tight until advised otherwise.

@@ -2614,6 +2587,9 @@ def __init__(self, values, placement, ndim=2, dtype=None):
super(DatetimeTZBlock, self).__init__(values, placement=placement,
ndim=ndim)

# override NonConsolidatableMixIn implementation of get_values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this for now and revert get_values from the Block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just keep the unnecessary implementation currently in DatetimeTZBlock?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to get this merged. yes. I really don't like this change the way it is. if you want to refactor into something more palable be my guest. but this PR would be on hold until then. so I would simply separate this (and to be honest it would have to be a convincing change).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, am just curious because usually you like removing duplicate code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but in this case you are doing a completely orthognal change, which is already pretty spaghetti like, so a re-factor is in order. and separate that these changes.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

my basic point is that the inheritance order needs to change for this, IOW NonConsolidated needs to be moved to the end of the MRO (and not the beginning). Then this will simply work (of course that might break other things)

@jreback jreback added this to the 0.23.0 milestone Jan 21, 2018
@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

ping on green.

@jbrockmendel
Copy link
Member Author

That makes sense; thanks for explaining.

@jreback jreback merged commit 6ba65ca into pandas-dev:master Jan 21, 2018
@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

thanks!

@jbrockmendel jbrockmendel deleted the unused_internals branch January 23, 2018 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core.internals checking hasattr(item, 'reindex_axis')
4 participants