Implement roll_monthday, simplify SemiMonthOffset #18762

Merged
merged 19 commits into from
Dec 30, 2017

jbrockmendel
Member

@jbrockmendel jbrockmendel commented Dec 13, 2017

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@@ -844,6 +841,42 @@ cpdef int get_day_of_month(datetime other, day_opt) except? -1:
raise ValueError(day_opt)


def _roll_monthday(n, other, compare):
Contributor

this can be de-privatized

Contributor

can you cpdef this? and type things, add a doc-string

Contributor

if you have 2 possibilities for types (they must both be either integers or datetimes), then have 2 functions and name them that way; much more readable

Member Author

Sure. One consideration that might push in the other direction if we want to get rid of duplicate code: roll_yearday and roll_monthday are basically special cases of roll_qtrday with modby of 12 and 1, respectively. Merging them would mean doing some unnecessary mod calls, no idea what the perf hit would be.
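A pure-Python sketch of the modby idea, assuming the roll_qtrday signature from the diff (other, n, month, day_opt, modby) with day_opt made required. get_day_of_month here is a simplified stand-in that only handles 'start' and 'end', not the business-day options. Taking month % modby groups the calendar into blocks of modby consecutive months: modby=3 gives quarters, and modby=1 makes the month test vacuous so only the day comparison remains.

```python
import calendar
from datetime import datetime


def get_day_of_month(other: datetime, day_opt: str) -> int:
    """Simplified stand-in: the day of other's month that anchors a period."""
    if day_opt == "start":
        return 1
    if day_opt == "end":
        return calendar.monthrange(other.year, other.month)[1]
    raise ValueError(day_opt)


def roll_qtrday(other: datetime, n: int, month: int, day_opt: str,
                modby: int = 3) -> int:
    """Adjust n for roll conventions when anchors recur every `modby` months."""
    # Position of other's month relative to the anchor month within a block
    # of `modby` consecutive months (always 0 when modby == 1).
    months_since = other.month % modby - month % modby
    anchor_day = get_day_of_month(other, day_opt)
    if n > 0:
        # Not yet at this block's anchor: the first step only reaches it.
        if months_since < 0 or (months_since == 0 and other.day < anchor_day):
            n -= 1
    else:
        # Already past this block's anchor: behave as if rolled forward.
        if months_since > 0 or (months_since == 0 and other.day > anchor_day):
            n += 1
    return n


# Quarters anchored on Mar/Jun/Sep/Dec month-ends.
print(roll_qtrday(datetime(2017, 12, 15), 1, 3, "end"))   # 0
print(roll_qtrday(datetime(2017, 11, 15), 0, 3, "end"))   # 1
# modby=1: only the day comparison matters.
print(roll_qtrday(datetime(2017, 12, 15), 0, 1, "start", modby=1))  # 1
```

One caveat on the modby=12 case: month % 12 maps December to 0, so this sketch with modby=12 does not order months like a plain other.month < month comparison. For other=January 15 with month=12 and n=1 it returns 1, whereas that plain comparison would decrement to 0, so merging roll_yearday in would also require the caller's month arithmetic to match.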

return n


cpdef inline int roll_qtrday(other, n, month, day_opt='start',
Contributor

same

@jreback
Contributor

jreback commented Dec 13, 2017

once you have cleaned up the code, show perf benchmarks for this (and add some if we don't have appropriate ones).

@jreback jreback added Frequency DateOffsets Performance Memory or execution speed performance labels Dec 13, 2017
@jbrockmendel
Member Author

Looks like SemiMonthOffset gets the slight bump we'd expect and everything else is noise.

asv continuous -f 1.1 -E virtualenv master HEAD -b offset
[...]
   before     after       ratio
  [96439fb1] [eb5e72f5]
+   16.23μs    23.12μs      1.42  offset.CBDay.time_custom_bday_apply
+    6.87ms     8.53ms      1.24  offset.ApplyIndex.time_apply_series(<QuarterBegin: startingMonth=3>)
+   18.05μs    21.92μs      1.21  offset.CBDay.time_custom_bday_incr
+   16.97μs    19.10μs      1.13  offset.CBDay.time_custom_bday_apply_dt64
+    1.16μs     1.28μs      1.11  timestamp.TimestampProperties.time_offset(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
-   12.03μs    10.91μs      0.91  offset.SemiMonthOffset.time_begin_apply
-    6.05ms     5.46ms      0.90  offset.ApplyIndex.time_apply_index(<BusinessYearBegin: month=1>)
-   23.00μs    20.70μs      0.90  offset.SemiMonthOffset.time_end_decr_n
-   11.75μs    10.54μs      0.90  offset.SemiMonthOffset.time_end_apply
-   15.54μs    13.77μs      0.89  offset.SemiMonthOffset.time_end_incr
-    4.34ms     3.83ms      0.88  offset.SeriesArithmetic.time_add_offset_delta
-   29.34μs    25.84μs      0.88  offset.CBDayHolidays.time_custom_bday_cal_incr
-    8.40ms     7.29ms      0.87  offset.ApplyIndex.time_apply_index(<BusinessYearEnd: month=12>)
-   20.91μs    17.11μs      0.82  offset.SemiMonthOffset.time_begin_decr
-   22.30μs    17.18μs      0.77  offset.SemiMonthOffset.time_end_decr
-   14.15ms     7.96ms      0.56  offset.ApplyIndex.time_apply_series(<BusinessMonthBegin>)
-   16.55ms     9.24ms      0.56  offset.ApplyIndex.time_apply_series(<BusinessMonthEnd>)
taskset 5 asv continuous -f 1.1 -E virtualenv master HEAD -b offset
[...]
    before     after       ratio
  [96439fb1] [eb5e72f5]
+    8.11ms    15.89ms      1.96  offset.ApplyIndex.time_apply_series(<YearEnd: month=12>)
+   20.26μs    29.91μs      1.48  offset.SemiMonthOffset.time_begin_incr_n
+   16.89μs    20.50μs      1.21  offset.Day.time_timeseries_day_apply
+   28.96μs    32.43μs      1.12  offset.CBDayHolidays.time_custom_bday_cal_incr_neg_n
-   15.23μs    13.66μs      0.90  offset.SemiMonthOffset.time_begin_incr
-    7.66ms     6.86ms      0.90  offset.ApplyIndex.time_apply_index(<BusinessQuarterEnd: startingMonth=3>)
-   21.74μs    19.35μs      0.89  offset.SemiMonthOffset.time_end_decr_n
-    8.38ms     7.38ms      0.88  offset.ApplyIndex.time_apply_series(<BusinessYearBegin: month=1>)
-   12.71μs    11.08μs      0.87  offset.SemiMonthOffset.time_end_apply
-   19.09μs    16.60μs      0.87  offset.SemiMonthOffset.time_end_incr_n
-   10.41ms     8.84ms      0.85  offset.ApplyIndex.time_apply_series(<BusinessQuarterEnd: startingMonth=3>)
-   22.04μs    18.60μs      0.84  offset.SemiMonthOffset.time_begin_decr_n
taskset 5 asv continuous -f 1.1 -E virtualenv master HEAD -b offset
[...]
    before     after       ratio
  [96439fb1] [eb5e72f5]
+   13.10μs    25.49μs      1.95  offset.YearBegin.time_timeseries_year_incr
+    3.54ms     6.22ms      1.76  offset.SeriesArithmetic.time_add_offset_delta
+   23.15ms    26.46ms      1.14  frame_ctor.FromDictwithTimestamp.time_dict_with_timestamp_offsets(<Hour>)
+   14.95μs    16.76μs      1.12  offset.CBDay.time_custom_bday_apply
+   17.63μs    19.41μs      1.10  offset.CBDay.time_custom_bday_incr
-   31.88μs    28.77μs      0.90  offset.CBDayHolidays.time_custom_bday_cal_incr_neg_n
-   20.69μs    18.62μs      0.90  offset.SemiMonthOffset.time_end_decr_n
-   11.50μs    10.33μs      0.90  offset.SemiMonthOffset.time_end_apply
-   20.48μs    18.36μs      0.90  offset.SemiMonthOffset.time_begin_decr
-    8.16ms     7.15ms      0.88  offset.ApplyIndex.time_apply_series(<QuarterEnd: startingMonth=3>)
-   15.90μs    13.51μs      0.85  offset.SemiMonthOffset.time_begin_incr
-   14.35μs    10.19μs      0.71  offset.SemiMonthOffset.time_begin_apply
-   15.78ms     9.18ms      0.58  offset.ApplyIndex.time_apply_series(<BusinessQuarterEnd: startingMonth=3>)
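The tables above are easier to read with the comparison rule in mind. A minimal sketch, assuming the ratio column is after / before and that -f 1.1 lists a benchmark once it is more than 10% slower or faster; it mirrors how the output reads, not asv's own source:

```python
def ratio(before: float, after: float) -> float:
    """after / before, as in the ratio column (> 1 means slower)."""
    return after / before


def mark(before: float, after: float, factor: float = 1.1) -> str:
    """'+' for a slowdown beyond `factor`, '-' for a speedup beyond it, else ''."""
    r = ratio(before, after)
    if r > factor:
        return "+"
    if r < 1 / factor:
        return "-"
    return ""


# Rows copied from the first run above (units cancel within a row).
rows = [
    ("offset.CBDay.time_custom_bday_apply", 16.23, 23.12),
    ("offset.SemiMonthOffset.time_begin_apply", 12.03, 10.91),
    ("offset.ApplyIndex.time_apply_series(<BusinessMonthEnd>)", 16.55, 9.24),
]
for name, before, after in rows:
    print(mark(before, after), f"{ratio(before, after):.2f}", name)
```

The same benchmark can land on either side of the threshold from one run to the next (offset.SeriesArithmetic.time_add_offset_delta shows 0.88 in the first run and 1.76 in the third), so only changes that repeat across runs are worth reading into.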

@codecov

codecov bot commented Dec 13, 2017

Codecov Report

Merging #18762 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18762      +/-   ##
==========================================
- Coverage   91.58%   91.57%   -0.01%     
==========================================
  Files         150      150              
  Lines       48972    48933      -39     
==========================================
- Hits        44851    44812      -39     
  Misses       4121     4121
Flag Coverage Δ
#multiple 89.94% <100%> (-0.01%) ⬇️
#single 41.75% <39.39%> (+0.02%) ⬆️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 96.97% <100%> (-0.1%) ⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8433562...a5d9dee.

----------
other : datetime or Timestamp
n : number of periods to increment, before adjusting for rolling
day_opt : 'start', 'end', 'business_start', 'business_end'
Contributor

this does not match the signature


Parameters
----------
other : datetime or Timestamp
Contributor

needs typing

Parameters
----------
other : datetime or Timestamp
n : number of periods to increment, before adjusting for rolling
Contributor

does not match the signature
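Mismatches like this can be caught mechanically. A rough sketch for plain Python functions, assuming numpydoc-style "name : description" entries under a Parameters header; documented_params, signature_mismatch, and roll_example are hypothetical names, and compiled Cython functions may not expose a signature to inspect.signature at all.

```python
import inspect
import re

# A parameter entry looks like "name : description" with no indentation.
_ENTRY = re.compile(r"^(\w+)\s*:")


def documented_params(func) -> list:
    """Names listed in the numpydoc 'Parameters' section of func's docstring."""
    lines = (inspect.getdoc(func) or "").splitlines()
    i = 0
    while i < len(lines) and lines[i].strip() != "Parameters":
        i += 1
    i += 2  # skip the "Parameters" title and its dashed underline
    names = []
    while i < len(lines):
        # Stop at the next section: a title followed by a dashed underline.
        if i + 1 < len(lines) and lines[i + 1].strip().startswith("---"):
            break
        match = _ENTRY.match(lines[i])
        if match:
            names.append(match.group(1))
        i += 1
    return names


def signature_mismatch(func) -> tuple:
    """(in signature but undocumented, documented but not in signature)."""
    in_sig = list(inspect.signature(func).parameters)
    in_doc = documented_params(func)
    return ([p for p in in_sig if p not in in_doc],
            [p for p in in_doc if p not in in_sig])


def roll_example(other, n, month, day_opt):
    """
    Hypothetical function whose docstring forgets `month`.

    Parameters
    ----------
    other : datetime or Timestamp
    n : number of periods to increment, before adjusting for rolling
    day_opt : 'start', 'end', 'business_start', 'business_end'

    Returns
    -------
    n : int
    """
    return n


print(signature_mismatch(roll_example))  # (['month'], [])
```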

@jbrockmendel
Member Author

Travis error is in test_geopandas; will wait to re-push until given the go-ahead.

@jreback
Contributor

jreback commented Dec 15, 2017

I pushed a fix for geopandas; you can rebase.

@@ -777,6 +773,7 @@ cpdef datetime shift_month(datetime stamp, int months, object day_opt=None):
return stamp.replace(year=year, month=month, day=day)


# TODO: Can we declare this so it will take datetime _or_ pandas_datetimestruct
Contributor

no, please don't; functions should not take different types like this. have 2 functions. we specifically generate templates when we have to cover lots of dtypes (not suggesting that here at all, though).

Member Author

OK, will remove this comment and address similar things you've mentioned.

----------
other : datetime or int
n : number of periods to increment, before adjusting for rolling
compare : datetime or int (must match `other`)
Contributor

I would really prefer not to do this (re: compare); have separate functions.

Member Author

OK, am separating them now. With some luck (and possibly help to confirm that the complexity in the getOffsetOfMonth methods is unneeded) we'll be able to get rid of the need to handle both cases in liboffsets.
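The split being discussed can be sketched in pure Python, with type hints standing in for the Cython declarations: one function per input type, each with the same roll rule. Duplicating the rule is the price of having each function accept only one type. This is an illustration under those assumptions, not the merged code.

```python
from datetime import datetime


def roll_convention(other: int, n: int, compare: int) -> int:
    """
    Possibly increment or decrement n based on roll conventions (int inputs).

    Parameters
    ----------
    other : int, e.g. the day component of a datetime
    n : int, number of periods to increment, before adjusting for rolling
    compare : int, e.g. the anchor day in the same month as other

    Returns
    -------
    n : int
    """
    if n > 0 and other < compare:
        # Still before the anchor: the first step only reaches it.
        n -= 1
    elif n <= 0 and other > compare:
        # Already past the anchor: treat as if rolled forward first.
        n += 1
    return n


def roll_monthday(other: datetime, n: int, compare: datetime) -> int:
    """Same rule as roll_convention, but comparing two datetimes."""
    if n > 0 and other < compare:
        n -= 1
    elif n <= 0 and other > compare:
        n += 1
    return n


print(roll_convention(5, 1, 15))  # 0
print(roll_monthday(datetime(2017, 12, 20), -1, datetime(2017, 12, 15)))  # 0
```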

@@ -2127,6 +2081,7 @@ def apply(self, other):
n -= 1
elif n < 0 and other > current_easter:
n += 1
# TODO: Why does this handle the 0 case the opposite of others?
Member Author

Any idea if this is intentional?
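One way to see the difference the TODO points at, using integers as stand-ins for the dates. The hunk elides the if line, so this assumes it reads "if n >= 0 and other < current_easter", which the visible "elif n < 0" branch suggests; roll_easter_style is a hypothetical name.

```python
def roll_convention(other: int, n: int, compare: int) -> int:
    # Convention used elsewhere: n == 0 is grouped with the negative case.
    if n > 0 and other < compare:
        n -= 1
    elif n <= 0 and other > compare:
        n += 1
    return n


def roll_easter_style(other: int, n: int, compare: int) -> int:
    # Assumed Easter branch: n == 0 is grouped with the positive case.
    if n >= 0 and other < compare:
        n -= 1
    elif n < 0 and other > compare:
        n += 1
    return n


for other in (5, 10, 15):  # before, on, and after an anchor at 10
    print(other, roll_convention(other, 0, 10), roll_easter_style(other, 0, 10))
# 5 0 -1
# 10 0 0
# 15 1 0
```

The two rules agree for every n != 0. With n == 0, and reading the adjusted n as a count of periods from the one containing other, the usual rule lands on the current or next anchor (a roll forward), while the Easter-style rule lands on the previous or current one (a roll back).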

@jbrockmendel
Member Author

Timeout

@jbrockmendel
Member Author

Will this be easier if split into smaller pieces?

@@ -827,7 +823,55 @@ cpdef int get_day_of_month(datetime other, day_opt) except? -1:
raise ValueError(day_opt)


cpdef int roll_yearday(other, n, month, day_opt='start') except? -1:
cpdef int _roll_convention(int other, int n, int compare):
Contributor

this is not very intuitive, nor should you generally be calling this as a public function of a cython routine. name this something else or create separate functions so compare is more readable here. I don't even mind repeating some code, as trying to shove multiple functions into the same one is simply a bad idea here.

Member Author

I don't understand this comment. This is in reference to _roll_convention? I can rename it I guess, but I don't see how it could be separated into more specific functions.

Contributor

see below

@@ -1122,21 +1097,21 @@ def rule_code(self):

@apply_wraps
def apply(self, other):
n = self.n
Member Author

After #18877 and #18875, the randomized testing I'm running locally is turning up bugs (pytz.NonExistentTimeError notwithstanding) exclusively in SemiMonthOffset and FY5253Quarter.
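A minimal sketch of that kind of randomized check, using a hypothetical offset anchored on a fixed day (1-28) of every month together with the roll_convention rule. apply_monthly_anchor and the invariants checked are illustrative assumptions, not pandas' test suite. Naive dates sidestep timezone failures such as pytz.NonExistentTimeError.

```python
import random
from datetime import date


def roll_convention(other: int, n: int, compare: int) -> int:
    if n > 0 and other < compare:
        n -= 1
    elif n <= 0 and other > compare:
        n += 1
    return n


def shift_months(d: date, months: int, day: int) -> date:
    # Move `months` calendar months from d's month and land on `day`.
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    return date(year, month0 + 1, day)


def apply_monthly_anchor(d: date, n: int, day: int) -> date:
    """Hypothetical offset anchored on `day` (1..28) of every month."""
    return shift_months(d, roll_convention(d.day, n, day), day)


def check_invariants(trials: int = 10_000, seed: int = 0) -> list:
    """Return counterexamples (d, n, day) that break an expected invariant."""
    rng = random.Random(seed)
    lo, hi = date(2000, 1, 1).toordinal(), date(2030, 12, 31).toordinal()
    failures = []
    for _ in range(trials):
        d = date.fromordinal(rng.randint(lo, hi))
        n = rng.randint(-5, 5)
        day = rng.randint(1, 28)
        result = apply_monthly_anchor(d, n, day)
        ok = result.day == day  # always lands on the anchor
        if n > 0:
            ok = ok and result > d   # moves strictly forward
        elif n < 0:
            ok = ok and result < d   # moves strictly backward
        else:
            ok = ok and result >= d  # n == 0 rolls forward
        if n >= 1:
            # One step followed by n - 1 more steps must agree with n steps.
            step = apply_monthly_anchor(d, 1, day)
            ok = ok and result == apply_monthly_anchor(step, n - 1, day)
        if not ok:
            failures.append((d, n, day))
    return failures


print(check_invariants())  # []
```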

other : int, generally the day component of a datetime
n : number of periods to increment, before adjusting for rolling
compare : int, generally the day component of a datetime, in the same
month as the datetime from which `other` was taken.
Contributor

this is very very confusing. you are using this in 2 separate ways. I would prefer 2 functions, one for day and one for month.

Member Author

So is the issue with the docstring in roll_convention, or the name? I don't think the function itself can be broken down any further.

Member Author

Would this concern be ameliorated if roll_monthday were removed? I expect all of the places where it is used are eventually going to be simplified/fixed to use roll_convention (or something equivalent)
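If roll_monthday were folded into roll_convention, one way to keep full datetime comparisons would be to pass an order-preserving integer key, since the rule only compares its arguments. A sketch under that assumption; to_int_key and roll_monthday_via_ints are hypothetical names, and the key assumes naive datetimes.

```python
from datetime import datetime, timedelta


def roll_convention(other: int, n: int, compare: int) -> int:
    if n > 0 and other < compare:
        n -= 1
    elif n <= 0 and other > compare:
        n += 1
    return n


def to_int_key(dt: datetime) -> int:
    """Microseconds since 1970-01-01: an order-preserving key for naive datetimes."""
    return (dt - datetime(1970, 1, 1)) // timedelta(microseconds=1)


def roll_monthday_via_ints(other: datetime, n: int, compare: datetime) -> int:
    # Any strictly increasing int key gives the same answer as comparing
    # the datetimes themselves.
    return roll_convention(to_int_key(other), n, to_int_key(compare))


print(roll_monthday_via_ints(datetime(2017, 12, 15, 0, 0, 1), 0,
                             datetime(2017, 12, 15)))  # 1
```

Passing only other.day and compare.day instead would ignore the time of day: roll_convention(15, 0, 15) returns 0, while the comparison above returns 1 for one second past midnight against midnight on the same day.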

Contributor

I think this is fine, but you are using this with two different types, so make roll_monthday_integer, roll_monthday_datetime or whatever. have 1 function do 1 thing.

Member Author

Is it possible the diff is obfuscating the fact that roll_monthday was split several commits back into roll_monthday (which takes datetimes) and roll_convention (which takes ints)?

@@ -827,7 +823,55 @@ cpdef int get_day_of_month(datetime other, day_opt) except? -1:
raise ValueError(day_opt)


cpdef int roll_yearday(other, n, month, day_opt='start') except? -1:
cpdef int _roll_convention(int other, int n, int compare):
Contributor

see below

@jbrockmendel
Member Author

This is becoming a blocker for fixing bugs in SemiMonthOffsets.

Contributor

@jreback jreback left a comment

I'd like to see some tests for roll_monthday, roll_qtrday, and roll_convention like you have for roll_yearday in test_liboffsets.py.
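A table-driven sketch of such a test for roll_convention, covering other before, on, and after compare for positive, zero, and negative n. It carries an inline copy of the function so it runs standalone; in pandas' suite the cases would more naturally go through pytest.mark.parametrize against the real function.

```python
def roll_convention(other: int, n: int, compare: int) -> int:
    # Inline copy of the function under test.
    if n > 0 and other < compare:
        n -= 1
    elif n <= 0 and other > compare:
        n += 1
    return n


# (other, n, compare, expected): before / on / after the anchor at 10.
CASES = [
    (5, 1, 10, 0), (10, 1, 10, 1), (15, 1, 10, 1),
    (5, 0, 10, 0), (10, 0, 10, 0), (15, 0, 10, 1),
    (5, -1, 10, -1), (10, -1, 10, -1), (15, -1, 10, 0),
]


def test_roll_convention():
    for other, n, compare, expected in CASES:
        assert roll_convention(other, n, compare) == expected, (other, n, compare)


test_roll_convention()
```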



cpdef int roll_yearday(datetime other, int n, int month,
object day_opt='start') except? -1:
Contributor

make not-optional
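With day_opt required and the arguments typed, a docstring that matches the signature might look like the following pure-Python sketch. Type hints stand in for the Cython declarations, and get_day_of_month is a simplified stand-in that only handles 'start' and 'end'.

```python
import calendar
from datetime import datetime


def get_day_of_month(other: datetime, day_opt: str) -> int:
    # Simplified stand-in: business-day options are not handled here.
    if day_opt == "start":
        return 1
    if day_opt == "end":
        return calendar.monthrange(other.year, other.month)[1]
    raise ValueError(day_opt)


def roll_yearday(other: datetime, n: int, month: int, day_opt: str) -> int:
    """
    Possibly increment or decrement n based on roll conventions.

    Parameters
    ----------
    other : datetime
    n : int, number of periods to increment, before adjusting for rolling
    month : int, the month (1-12) in which the yearly anchor falls
    day_opt : {'start', 'end'}, which day of that month is the anchor

    Returns
    -------
    n : int
    """
    anchor_day = get_day_of_month(other, day_opt)
    if n > 0:
        if other.month < month or (other.month == month and other.day < anchor_day):
            n -= 1
    else:
        if other.month > month or (other.month == month and other.day > anchor_day):
            n += 1
    return n


print(roll_yearday(datetime(2017, 1, 15), 1, 12, "end"))  # 0
```

Because day_opt has no default, a call such as roll_yearday(other, n, month) fails with a TypeError instead of silently assuming 'start'.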



cpdef int roll_qtrday(datetime other, int n, int month, day_opt='start',
int modby=3) except? -1:
Contributor

make day_opt non-optional

@@ -827,7 +823,55 @@ cpdef int get_day_of_month(datetime other, day_opt) except? -1:
raise ValueError(day_opt)


cpdef int roll_yearday(other, n, month, day_opt='start') except? -1:
cpdef int _roll_convention(int other, int n, int compare):
Contributor

why is this private?

Contributor

@jreback jreback left a comment

looks good. nice tests!

only minor nitpick and can be fixed later (or here if you want): the signatures for shift_months and the roll_* functions use slightly different terms, e.g.

shift_months(datetime stamp, int months)

while

roll_monthday(datetime other, int n)

so maybe make this consistent by making the args datetime value, int n (roll_convention has an int first arg, so we can't really call it stamp)

@jreback jreback added this to the 0.23.0 milestone Dec 29, 2017
@jbrockmendel
Member Author

only minor nitpick and can be fixed later (or here if you want)

Definitely later. Some of this has crept into the scope of #18959, and roll_monthday I'm hoping will be made unnecessary after some of the upcoming bugfixes.

@jreback
Contributor

jreback commented Dec 29, 2017

ok, make a note of that, and rebase. ping on green.

@jbrockmendel
Member Author

ping

@jreback jreback merged commit c24a0d2 into pandas-dev:master Dec 30, 2017
@jreback
Contributor

jreback commented Dec 30, 2017

thanks!

hexgnu pushed a commit to hexgnu/pandas that referenced this pull request Jan 1, 2018
@jbrockmendel jbrockmendel deleted the qtr_ngt1 branch January 23, 2018 04:40
2 participants