Resampling uses inconsistent labeling for sub-daily and super-daily frequencies #9586
Comments
OK, after digging more deeply into it.... I discover that This is... deeply non-intuitive. I wish there was a way to change this without breaking a bunch of existing code. I suppose we could add |
IIRC also xref to #9528 as the fill args to |
Given how confusing |
Right, time for another @pandas-dev/pandas-core @pandas-dev/pandas-triage tag Would anyone object to a deprecation cycle to remove It's a really common source of confusion, nobody expects Concretely, I'd suggest:
|
FWIW in finance the APIs typically use ME for "month end" and M for "monthly" when dealing with scheduling. They might have different contexts but M is never used for month end. I would support the change even if it's a long way to go. |
Not sure if there is any convention followed by other apps, or anything, but seems like an improvement. The current API seems quite confusing indeed. And changing the behavior seems easy if Just to be clear, we're talking about the |
I'd be OK with this deprecation. I get bit by this all of the time |
Yes, that's right, thanks - in |
You only mention monthly and yearly, but the other frequencies that default to right might have the same issue? (although less used): quarterly (Q), weekly (W), and then the business version of monthly/quarterly/yearly. |
Thanks - yes, those too |
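One way to check which frequencies default to the right label is to compare the default resample index against explicit label='left' and label='right' results. The following is a minimal sketch, assuming made-up daily data for 2015; the helper name default_label is hypothetical, and offset objects are used instead of string aliases so the sketch does not depend on which alias spelling a given pandas version accepts.

```python
import pandas as pd

# Made-up data: one value per day for all of 2015.
daily = pd.Series(
    1, index=pd.date_range("2015-01-01", "2015-12-31", freq=pd.offsets.Day())
)

# Offsets mentioned in the thread, plus two shorter ones for comparison.
offsets = {
    "12 hours": pd.offsets.Hour(12),
    "day": pd.offsets.Day(),
    "week": pd.offsets.Week(weekday=6),
    "month end": pd.offsets.MonthEnd(),
    "business month end": pd.offsets.BMonthEnd(),
    "quarter end": pd.offsets.QuarterEnd(),
    "year end": pd.offsets.YearEnd(),
}


def default_label(series, offset):
    """Hypothetical helper: report which label side resample uses by default."""
    default_index = series.resample(offset).sum().index
    for side in ("left", "right"):
        explicit_index = series.resample(offset, label=side).sum().index
        if default_index.equals(explicit_index):
            return side
    return None


results = {name: default_label(daily, offset) for name, offset in offsets.items()}
for name, side in results.items():
    print(f"{name}: default label is {side!r}")
```

The sub-daily and daily offsets should report 'left', while the weekly, monthly, quarterly, and yearly ones should report 'right'.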
"M" and "Y" would presumably still be used for Period/PeriodDtype. There are occasional issues with people being surprised that |
I would like to work on this issue.
I'll start by adding warnings when freq='M' is passed. |
When I worked in finance, we would do present value valuations to calculate interest rate risk. These kinds of inconsistencies were present in some of the software, and they were a nightmare to detect and circumvent. Securities are quite standardized, so if bonds in a particular country mostly pay the coupon at the start of a period, they do so no matter what the period is. Fixing this would save a lot of people somewhere a lot of hours. |
Are we sure we want this? There's an example in the docs which shows:
In [366]: p = pd.Period("2014-07", freq="M")
In [367]: p + pd.offsets.MonthEnd(3)
Out[367]: Period('2014-10', 'M')
In [368]: p + pd.offsets.MonthBegin(3)
Traceback
...
ValueError: Input has different freq from Period(freq=M)
@jbrockmendel If the prefix were to stay as I think it might be simpler (and easier to teach) to just use |
From today's call: seeing as long-term the idea is to decouple Period from Offsets, both |
To be clear, this is something I'd like to see, but have no concrete plans to actually implement. There hasn't been a targeted discussion of the idea, except for the mention of it yesterday and lack of objection. |
This has been addressed, so I think the issue can be closed. For anyone stumbling across this issue because they're using Example:
|
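A minimal sketch (not the commenter's original example) of coping with the renamed month-end alias: pandas 2.2 deprecated the offset alias "M" in favour of "ME", so code that has to run on several pandas versions may want to pick the spelling at runtime. The helper name month_end_alias and the data are hypothetical.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def month_end_alias():
    """Hypothetical helper: return the month-end alias this pandas accepts."""
    try:
        to_offset("ME")  # accepted by pandas 2.2 and later
        return "ME"
    except ValueError:
        return "M"  # spelling used by older versions


# Made-up data: one value per day for January and February 2015.
daily = pd.Series(
    1, index=pd.date_range("2015-01-01", "2015-02-28", freq=pd.offsets.Day())
)

alias = month_end_alias()
monthly = daily.resample(alias).sum()
print(alias)
print(monthly)
```

Passing the offset object pd.offsets.MonthEnd() instead of a string sidesteps the alias question entirely, at the cost of being a little more verbose.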
xref #2665
xref #5440
Resample appears to use an inconsistent label convention depending on whether the target frequency is sub-daily/daily or super-daily:
label='left' makes labels at the timestamp corresponding to the start of each frequency bin, and label='right' makes labels at that timestamp plus the frequency (at the timestamp exactly dividing bins).
The default also changes from 'left' to 'right'! My guess is that the default was changed here because users were confused by label='left' no longer falling inside the expected interval. (I guess I could check git blame for the details.)
I found this behavior quite surprising and confusing. Is it intentional? I would like to rationalize this if possible, because this strikes me as very poor design. The behavior also couples in a weird way with the closed argument (see the linked issues).
From my perspective (as someone who uses monthly and yearly data), the sub-daily/daily behavior makes sense and the super-daily behavior is a bug: there's no particular reason why it makes sense to use 1 day as an offset for frequencies with super-daily resolution.
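The difference described above can be sketched as follows. This is a minimal illustration rather than the original test script: the series hourly and daily and their values are made up, and offset objects are used instead of string aliases so the sketch does not depend on which alias spelling a given pandas version accepts.

```python
import pandas as pd

# Made-up data: one value per hour for a single day.
hourly = pd.Series(
    1, index=pd.date_range("2015-01-01", periods=24, freq=pd.offsets.Hour())
)
# Made-up data: one value per day for January through March 2015.
daily = pd.Series(
    1, index=pd.date_range("2015-01-01", "2015-03-31", freq=pd.offsets.Day())
)

six_hours = pd.offsets.Hour(6)
month_end = pd.offsets.MonthEnd()

# Sub-daily target: labels sit at the bin start (left) or bin start plus 6 hours (right).
sub_left = hourly.resample(six_hours, label="left").sum()
sub_right = hourly.resample(six_hours, label="right").sum()

# Super-daily target: the default label is the month end (right edge),
# and label="left" gives the last day of the previous month.
super_default = daily.resample(month_end).sum()
super_left = daily.resample(month_end, label="left").sum()

print(list(sub_left.index))       # 00:00, 06:00, 12:00, 18:00 on 2015-01-01
print(list(sub_right.index))      # 06:00, 12:00, 18:00, then 00:00 on 2015-01-02
print(list(super_default.index))  # 2015-01-31, 2015-02-28, 2015-03-31
print(list(super_left.index))     # 2014-12-31, 2015-01-31, 2015-02-28
```

With label='left', the January bin is labelled 2014-12-31, a timestamp that lies outside the data it summarizes, which is the one-day offset the description objects to.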
CC @Cd48 @kdebrab
Here's my test script: