ENH: add to_xarray conversion method #11972
Since I renamed the branch, I'm not really sure whether there was a way to avoid creating a new PR. Oh well.
if self.ndim == 1:
    return xray.DataArray.from_series(self)
elif self.ndim == 2:
    return xray.Dataset.from_dataframe(self)
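For reference, a minimal sketch of what this dispatch looks like from the user's side. It assumes a current pandas with xarray installed (the package was still called xray when this diff was written); the variable names and sample data are illustrative only.

```python
import pandas as pd

# 1-D: a Series converts to an xarray.DataArray; the index name becomes the dimension name.
s = pd.Series([1, 2, 3], index=pd.Index(list("abc"), name="x"))
da = s.to_xarray()

# 2-D: a DataFrame converts to an xarray.Dataset with one data variable per column.
df = pd.DataFrame(
    {"b": [1, 2, 3], "d": [4.0, 5.0, 6.0]},
    index=pd.Index(list("abc"), name="x"),
)
ds = df.to_xarray()

print(type(da).__name__, da.dims)
print(type(ds).__name__, sorted(ds.data_vars))
```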
@shoyer should this be different from just xray.DataArray(series) or xray.Dataset(df)?
This might be an xray change; it seems a bit off to have special handling in pandas, though.
Best would be xarray.from_pandas(...) (and you guys handle the construction).
Though it's OK here as well, since for > 2 dims we want an easy transition for current Panel users.
@shoyer ping when releasing new
@MaximilianR it would be quite helpful if you could write a mini-doc (which we can incorporate into a docstring / document) on how to migrate. You can obviously also add it here: http://xray.readthedocs.org/en/stable/pandas.html
@jreback Yes, I can have a go at that. I could see that being fairly short given the existing docs: just a couple of examples of Panel migration. Is that what you envision?
@MaximilianR yep, the migration part as well as 'working' with them (which might require another section). E.g. imagine you have
# > 2 dims
coords = [(a, self._get_axis(a)) for a in self._AXIS_ORDERS]
return xray.DataArray(self,
@shoyer you didn't like this?
As an aside, do you want to put this routine in xarray, and I'll just call xray.DataArray.from_pandas(self)?
updated to use works with all index types including MultiIndex! nice!
I think you need to have RTD use
@jreback I'm suggesting http://xarray.pydata.org/ instead... it turns out you can't change RTD stubs, so we're stuck with http://xray.readthedocs.org for now.
# > 2 dims
coords = [(a, self._get_axis(a)) for a in self._AXIS_ORDERS]
return xarray.DataArray(self,
I wonder if we should add Panel4D support to the DataArray constructor so this could just be xarray.DataArray(self) ... OTOH I'm pretty sure Panel4D is barely used.
Yeah, it's no big deal. You could always add it later if it's really useful.
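As a rough illustration of the `coords` list being built in the diff above, here is a sketch that passes (dimension name, labels) pairs to the DataArray constructor. Panel and Panel4D no longer exist in current pandas, so a plain NumPy array stands in for the data; the axis names and labels are assumptions chosen to resemble a Panel's axes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for 3-D Panel data: 2 items x 3 dates x 4 columns.
data = np.arange(24).reshape(2, 3, 4)

# One (dimension name, labels) pair per axis, in axis order.
coords = [
    ("items", pd.Index(["i1", "i2"])),
    ("major_axis", pd.date_range("2013-01-01", periods=3)),
    ("minor_axis", pd.Index(list("wxyz"))),
]

da = xr.DataArray(data, coords=coords)
print(da.dims, da.shape)
```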
expected,
check_index_type=False)

# not implemented
what am I doing wrong here?
In [1]: df = DataFrame({'a': list('abc'),
...: 'b': list(range(1, 4)),
...: 'c': np.arange(3, 6).astype('u1'),
...: 'd': np.arange(4.0, 7.0, dtype='float64'),
...: 'e': [True, False, True],
...: 'f': pd.Categorical(list('abc')),
...: 'g': pd.date_range('20130101', periods=3),
...: 'h': pd.date_range('20130101',
...: periods=3,
...: tz='US/Eastern')}
...: )
In [2]: df.to_xarray()
Out[2]:
<xarray.Dataset>
Dimensions: (index: 3)
Coordinates:
* index (index) int64 0 1 2
Data variables:
a (index) object 'a' 'b' 'c'
b (index) int64 1 2 3
c (index) uint8 3 4 5
d (index) float64 4.0 5.0 6.0
e (index) bool True False True
f (index) category 'a' 'b' 'c'
g (index) datetime64[ns] 2013-01-01 2013-01-02 2013-01-03
h (index) datetime64[ns] 2013-01-01T05:00:00 2013-01-02T05:00:00 ...
In [3]: df = DataFrame({'a': list('abc'),
...: 'b': list(range(1, 4)),
...: 'c': np.arange(3, 6).astype('u1'),
...: 'd': np.arange(4.0, 7.0, dtype='float64'),
...: 'e': [True, False, True],
...: 'f': pd.Categorical(list('abc')),
...: 'g': pd.date_range('20130101', periods=3),
...: 'h': pd.date_range('20130101',
...: periods=3,
...: tz='US/Eastern')}
...: )
In [4]: df.index = pd.MultiIndex.from_product([['a'], range(3)],
...: names=['one', 'two'])
In [5]: df
Out[5]:
a b c d e f g h
one two
a 0 a 1 3 4 True a 2013-01-01 2013-01-01 00:00:00-05:00
1 b 2 4 5 False b 2013-01-02 2013-01-02 00:00:00-05:00
2 c 3 5 6 True c 2013-01-03 2013-01-03 00:00:00-05:00
In [6]: df.to_xarray()
ValueError: dimensions ('one', 'two') must have the same length as the number of data dimensions, ndim=1
@shoyer ?
I'm going to blame this one (in part) on pandas's Categorical:
ipdb> series.values
[a, b, c]
Categories (3, object): [a, b, c]
ipdb> series.values.reshape(shape)
[a, b, c]
Categories (3, object): [a, b, c]
ipdb> shape
[1, 3]
Instead of erroring, it ignores the reshape argument (to 2D).
This certainly needs a fix in xarray, too, though -- we should use np.asarray when converting DataFrame columns rather than .values.
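A small sketch of the np.asarray approach suggested above, assuming current pandas and NumPy; the variable names mirror the ipdb session but are otherwise illustrative. Converting the categorical column to a plain ndarray first means the reshape is handled by NumPy rather than by Categorical.

```python
import numpy as np
import pandas as pd

series = pd.Series(pd.Categorical(list("abc")))
shape = (1, 3)

# np.asarray gives a plain NumPy array (object dtype here), not a Categorical.
values = np.asarray(series)

# Reshaping the ndarray behaves as expected and yields a 2-D array.
reshaped = values.reshape(shape)
print(type(values).__name__, reshaped.shape)
```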
ok, going to merge as is then. do you want me to create an issue on xarray for this?
yes, let's merge -- I'll fix that in the next xarray bug fix release
yep thxs
@shoyer note that conda has updated to include
@@ -245,6 +245,7 @@ Optional Dependencies
* `Cython <http://www.cython.org>`__: Only necessary to build development
  version. Version 0.19.1 or higher.
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
* `xarray <http://xarray.readthedocs.org>`__: pandas like handling for > 2 dims. Version 0.7.0 or higher is recommeded.
recommeded -> recommended
Maybe also mention which functionality needs this optional dependency.
done
@jreback @shoyer I've done a (currently very short & basic) draft of how to move from pandas to xarray, so that you guys can offer some feedback and we can iterate. Do you want me to do a separate PR into pandas? Or should this go into the 'from Pandas' section of the xarray docs, with a link from pandas' What's New?
@MaximilianR I think best is to add to the
Personally, I think this belongs more in the pandas docs. The xarray docs could certainly have a section about how xarray interplays with pandas (as they already do, http://xarray.pydata.org/en/stable/pandas.html), but as this will also be about how to move from the deprecated Panel to xarray, it feels more at home in the pandas docs (I don't think the xarray docs should cover deprecated pandas features). Anyhow, it's not that important, as we can just link to wherever it is located.
@jorisvandenbossche right, I have a note about that above, e.g. about how to transition from using a So @MaximilianR can certainly add to the pandas docs (but should obviously add to xarray as well)
@@ -271,6 +271,7 @@ In addition, ``.round()``, ``.floor()`` and ``.ceil()`` will be available thru t
    s
    s.dt.round('D')

<<<<<<< 6693a723aa2a8a53a071860a43804c173a7f92c6
merge conflict
yep fixing as I am merging
thxs
@shoyer was just about to merge. Anything else besides those 2 comments? (and the linked issue for more docs)
@jorisvandenbossche I think I broke the doc build, but not sure how: https://travis-ci.org/pydata/pandas/jobs/108337474
See Also
--------
`xarray docs <http://xarray.pydata.org/en/stable/>`__
Possibly this is the line it is choking on, since a See Also section should list Python objects.
I would just make it a 'Note' section instead of See Also, and then say "See also the xarray docs ..".
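A sketch of what that suggestion could look like, assuming numpydoc conventions (where the free-text section is titled "Notes"). The function name `to_xarray_sketch` is hypothetical and only the docstring layout matters here; this is not the docstring that was actually merged.

```python
def to_xarray_sketch(obj):
    """
    Return an xarray object from a pandas object (hypothetical stub).

    Notes
    -----
    See also the `xarray docs <http://xarray.pydata.org/en/stable/>`__
    """
    # Illustrative stub: the body is intentionally not implemented.
    raise NotImplementedError("only the docstring layout is being illustrated")


print(to_xarray_sketch.__doc__)
```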
supersedes pandas-dev#11950
xref pandas-dev#10000
Author: Jeff Reback <jeff@reback.net>
Closes pandas-dev#11972 from jreback/xarray and squashes the following commits:
85de0b7 [Jeff Reback] ENH: add to_xarray conversion method
supersedes #11950
xref #10000
using xarray >= 0.7.0
TODO: