pd.Grouper support? #364
Comments
I wrote a resample function last week based on TimeGrouper. See the dev docs for more details: http://xray.readthedocs.org/en/latest/whats-new.html This should go out in the 0.4.1 release, which I'd like to get out later this week (everyone likes faster release cycles if they are backwards compatible). It would be pretty straightforward to create some sort of API that gives direct access to the resulting GroupBy object. I was considering something like |
Looks good to me. I don't know enough to be able to comment on the API question. |
Well, I guess the first question is -- are there uses for TimeGrouper that you can't easily do with resample? I suppose the simplest (no new method) would be to allow passing a dict where the key is the time dimension and the value is the grouper. Something like |
Unfortunately I'm not familiar enough with pd.resample and pd.TimeGrouper to know the difference in what they can do. One thing that I would like to be able to do that is not covered by resample, and might be covered by TimeGrouper, is to group over month only (not month and year), in order to create a plot of the mean seasonal cycle (at monthly resolution), or similarly, a daily cycle at hourly resolution. I haven't figured out if I can do that with TimeGrouper yet though. |
Indeed, I need to complete the For your other use case, you just want to group by |
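For reference, a minimal sketch of the month-only grouping asked about above. It assumes current xarray (the successor to xray) and its documented "time.month" virtual coordinate. The two-year daily series is synthetic, made up only so the result is easy to check.

```python
import pandas as pd
import xarray as xr

# Synthetic daily series covering two years; each value is simply its month
# number, so the mean seasonal cycle is trivially verifiable.
times = pd.date_range("2000-01-01", "2001-12-31", freq="D")
da = xr.DataArray(
    times.month.values.astype(float), dims=["time"], coords={"time": times}
)

# Group by calendar month only (January 2000 and January 2001 share a group)
# and average within each group.
monthly_cycle = da.groupby("time.month").mean()
print(monthly_cycle)
```

The result has a new "month" dimension of length 12 in place of "time". Using "time.hour" in the same way should give a daily cycle at hourly resolution.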
Heh, I meant the pandas docs - they don't specify the
|
For pandas resample, see here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#up-and-downsampling The doc string could definitely use an update there, too -- see pandas-dev/pandas#5023 (I think I'll try to update this, too) For I'm going to consolidate all the time/date functionality into a new documentation page for the next release of xray, since this is kind of all over the place now. Also, I should probably break up that monolithic page on "Data structures", perhaps into "Basics" and "Advanced" pages. |
Ah, cool, thanks for that link, I missed that in the docs. One thing that would be nice (in both pandas and xray) is a |
Hmm. However, it should work in pandas -- you can do
The simplest way to do timeofday, though, is probably just to calculate |
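A minimal pandas sketch of a time-of-day grouping, offered as one possible reading of the suggestion above rather than the exact expression from the original comment. The series is synthetic and the variable names are illustrative.

```python
import pandas as pd

# Three days of hourly data; each value equals its hour of day.
idx = pd.date_range("2015-01-01", periods=72, freq="h")
s = pd.Series(idx.hour.to_numpy(dtype=float), index=idx)

# Option 1: group on the integer hour of day.
daily_cycle = s.groupby(s.index.hour).mean()

# Option 2: compute the time of day as a timedelta since midnight,
# then group on that.
time_of_day = idx - idx.normalize()
by_timedelta = s.groupby(time_of_day).mean()

print(daily_cycle.head())
print(by_timedelta.head())
```

Both options give 24 groups here. The timedelta form also handles sub-hourly data, at the cost of one group per distinct time of day.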
Nice. Ok, I have hit a stumbling block, and this is much more of a support request, so feel free to direct me elsewhere, but since we're on the topic, I want to do something like:
where The assignment of |
Dunno if this is related to the |
same problem with |
I don't think the timeofday issue is related to using Timedeltas in the index (and it's certainly not related to the Here's an example that seems to be working properly (except for uselessly displaying timedeltas in nanoseconds):
|
Ok, weird. That example works for me, but even if I take a really short slice of my data set, the same thing won't work:
That last command will not complete - it will run for minutes. Not really sure how to debug that behaviour. Perhaps it's to do with the long/lat/height variables that really should be coordinates (I'm just using the data as it came, but I can clean that, if necessary) |
The problem is that you've created a new
Also, unlike pandas, xray currently does the core loop for all groupby operations in pure Python, which means that yes, it will be slow when you have a very large number of groups (and it loops again to handle your 15 different variables). Using something like Cython or Numba to speedup groupby operations is on my to-do list, but I've found this to be less of a barrier than you might expect for multi-dimensional datasets -- individual group members tend to include more elements than in DataFrames. |
Ah, yep, making the dimension using |
This is very useful functionality. I am wondering if I can specify the time window, for example, like |
@saulomeirelles Nope, this hasn't been added yet, beyond what you can do with the current |
Thanks, @shoyer ! Here is an example of how I circumvented the problem:
In my case, the |
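Hi, being able to pass a pd.TimeGrouper to .groupby() would be really handy. Here is my use case and workaround at the moment (.resample() doesn't serve my needs because I need to iterate over the groups):
import pandas as pd
import xarray as xr
dates = pd.DatetimeIndex(['2017-01-01 15:00', '2017-01-02 14:00', '2017-01-02 23:00'])
da = xr.DataArray([1, 2, 3], dims=['time'], coords={'time': dates})
time_grouper = pd.TimeGrouper(freq='24h', base=15)
# digging around the source code for xr.DataArray.resample i found this
grouped = xr.core.groupby.DataArrayGroupBy(da, 'time', grouper=time_grouper)
for _, sub_da in grouped:
    print(sub_da)
which prints:
<xarray.DataArray (time: 2)>
array([1, 2])
Coordinates:
* time (time) datetime64[ns] 2017-01-01T15:00:00 2017-01-02T14:00:00
<xarray.DataArray (time: 1)>
array([3])
Coordinates:
* time (time) datetime64[ns] 2017-01-02T23:00:00
Would it be possible to add a grouper kwarg to .groupby(), e.g.
da.groupby('time', grouper=time_grouper) |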
Have you tried iterating over a resample object in the v0.10 release
candidate? I believe the new resample API supports iteration.
On Thu, Nov 2, 2017 at 5:40 PM hazbottles wrote:
Hi, being able to pass a pd.TimeGrouper to .groupby() would be really
handy. Here is my use-case and work around at the moment (.resample()
doesn't serve my needs because I need to iterate over the groups):
import pandas as pd
import xarray as xr
dates = pd.DatetimeIndex(['2017-01-01 15:00', '2017-01-02 14:00', '2017-01-02 23:00'])
da = xr.DataArray([1, 2, 3], dims=['time'], coords={'time': dates})
time_grouper = pd.TimeGrouper(freq='24h', base=15)
# digging around the source code for xr.DataArray.resample i found this
grouped = xr.core.groupby.DataArrayGroupBy(da, 'time', grouper=time_grouper)
for _, sub_da in grouped:
    print(sub_da)
which prints:
<xarray.DataArray (time: 2)>
array([1, 2])
Coordinates:
* time (time) datetime64[ns] 2017-01-01T15:00:00 2017-01-02T14:00:00
<xarray.DataArray (time: 1)>
array([3])
Coordinates:
* time (time) datetime64[ns] 2017-01-02T23:00:00
Would it be possible to add a grouper kwarg to .groupby(), e.g.
da.groupby('time', grouper=time_grouper)
|
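A sketch of the iteration suggested in the reply above, assuming a reasonably recent xarray in which .resample() accepts an offset argument. The offset of 15 hours is my stand-in for base=15 with a 24h frequency, not something stated in the thread.

```python
import pandas as pd
import xarray as xr

dates = pd.DatetimeIndex(
    ["2017-01-01 15:00", "2017-01-02 14:00", "2017-01-02 23:00"]
)
da = xr.DataArray([1, 2, 3], dims=["time"], coords={"time": dates})

# Resample into 24-hour bins whose edges fall at 15:00 (assumed equivalent
# to the deprecated base=15 for this frequency).
resampled = da.resample(time="24h", offset=pd.Timedelta(hours=15))

# The resample object can be iterated like a groupby: (bin label, sub-array).
pieces = [(label, sub_da.values.tolist()) for label, sub_da in resampled]
for label, values in pieces:
    print(label, values)
```

This avoids reaching into xr.core.groupby, which is not public API and may change between releases.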
pd.TimeGrouper is deprecated in the latest pandas release, so I imagine this bug should be closed. |
Well, the functionality is still there, it's just recommended that you use
pd.Grouper.
|
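To illustrate the pd.Grouper replacement mentioned above, here is a minimal pandas-only sketch. It assumes pandas 1.1 or later, where Grouper takes an offset argument; offset="15h" is used as the assumed counterpart of base=15 for a 24h frequency.

```python
import pandas as pd

dates = pd.DatetimeIndex(
    ["2017-01-01 15:00", "2017-01-02 14:00", "2017-01-02 23:00"]
)
s = pd.Series([1, 2, 3], index=dates)

# pd.Grouper with a frequency replaces pd.TimeGrouper; offset shifts the
# bin edges so that each 24-hour bin starts at 15:00.
grouper = pd.Grouper(freq="24h", offset="15h")

# Iterate over the groups, collecting the values that fall in each bin.
groups = {label: sub.tolist() for label, sub in s.groupby(grouper)}
for label, values in groups.items():
    print(label, values)
```

The first two timestamps share the bin starting 2017-01-01 15:00, and the third falls in the next bin, matching the grouping shown earlier in the thread.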
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
We now support Xarray Grouper objects which are equivalent. |
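A hedged sketch of what the Grouper objects mentioned above might look like for the use case in this thread. It assumes a recent xarray (2024.07 or later, to my knowledge) that provides xarray.groupers.TimeResampler. The 15-hour offset is again my stand-in for base=15.

```python
import pandas as pd
import xarray as xr
from xarray.groupers import TimeResampler  # assumes xarray >= 2024.07

dates = pd.DatetimeIndex(
    ["2017-01-01 15:00", "2017-01-02 14:00", "2017-01-02 23:00"]
)
da = xr.DataArray([1, 2, 3], dims=["time"], coords={"time": dates})

# Pass the grouper as a keyword named after the dimension to group over.
grouped = da.groupby(
    time=TimeResampler(freq="24h", offset=pd.Timedelta(hours=15))
)

# Iterate over the groups, or reduce them as with any other groupby.
group_values = [sub_da.values.tolist() for _, sub_da in grouped]
sums = grouped.sum()
print(group_values)
print(sums)
```

This is close in spirit to the da.groupby('time', grouper=time_grouper) call requested earlier, with the grouper keyed by dimension name instead of passed as a separate argument.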
In pandas, you can pass a pandas.TimeGrouper object to a .groupby() call, and it allows you to group by month, year, day, or other times, without manually creating a new index with those values first. It would be great if you could do this with xray, but at the moment, I get:
Not sure how this will work though, because pandas.TimeGrouper doesn't appear to work with multi-index dataframes yet anyway, so maybe there needs to be a feature request over there too, or maybe it's better to implement something from scratch...