BUG: Fix concat series loss of timezone #24027
Conversation
Hello @evangelineliu! Thanks for updating the PR.
Comment last updated on December 05, 2018 at 22:33 UTC
Codecov Report
@@ Coverage Diff @@
## master #24027 +/- ##
==========================================
+ Coverage 42.46% 42.46% +<.01%
==========================================
Files 161 161
Lines 51557 51556 -1
==========================================
Hits 21892 21892
+ Misses 29665 29664 -1
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #24027 +/- ##
===========================================
+ Coverage 42.52% 92.2% +49.68%
===========================================
Files 161 162 +1
Lines 51697 51727 +30
===========================================
+ Hits 21982 47695 +25713
+ Misses 29715 4032 -25683
Continue to review full report at Codecov.
Can you add "closes #23816" to the PR description? We also need a test (the one from the issue is fine) and a release note.
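For reference, a minimal reproduction along the lines of the linked issue (the sample values, the US/Eastern zone, and the exact assertions are illustrative, not necessarily the test that was merged):

import pandas as pd
import pandas.util.testing as tm

dates = pd.Series(pd.date_range('2018-01-01', periods=2, tz='US/Eastern'))
cats = pd.Series(['a', 'b'], dtype='category')

result = pd.concat([dates, cats], ignore_index=True)
# On master this dropped the timezone; the expected result is an
# object-dtype Series that keeps the tz-aware Timestamps.
expected = pd.Series(list(dates) + list(cats), dtype=object)
tm.assert_series_equal(result, expected)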
pandas/core/dtypes/concat.py
Outdated
@@ -193,7 +193,8 @@ def _concat_categorical(to_concat, axis=0):

 def _concat_asobject(to_concat):
     to_concat = [x.get_values() if is_categorical_dtype(x.dtype)
-                 else np.asarray(x).ravel() for x in to_concat]
+                 else np.asarray(x).ravel() if not is_datetime64tz_dtype(x)
this is not the way
_concat_compat already handles this
I didn’t realize we do call _concat_compat from here... I think we’ve already lost the tz by the time we call it.
So do these if checks even need to be there? Not sure what this means for the change we had here.
@jreback after closer examination, I'm not sure if _concat_compat actually already handles this. If the type is also a category, it simply calls _concat_categorical which then summons _concat_asobject. I think that therefore the change to the code should happen inside _concat_categorical or _concat_asobject.
@jreback calling _concat_compat before the list is modified like this causes _concat_compat to just call _concat_categorical again and back and forth until they reach the maximum recursion depth. We have to modify the array somehow before that happens.
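To illustrate the cycle being described, here is a toy model of the dispatch (these stubs stand in for the real functions in pandas/core/dtypes/concat.py; they are not the actual source):

import numpy as np
import pandas as pd
from pandas.api.types import is_categorical_dtype

def _concat_compat(to_concat, axis=0):
    # stand-in for the dispatcher: categorical inputs get routed to the
    # categorical handler
    if any(is_categorical_dtype(x) for x in to_concat):
        return _concat_categorical(to_concat, axis=axis)
    return np.concatenate([np.asarray(x) for x in to_concat], axis=axis)

def _concat_categorical(to_concat, axis=0):
    # if the categorical inputs are handed back unmodified, the dispatcher
    # re-enters this function, and the two recurse until RecursionError
    return _concat_compat(to_concat, axis=axis)

s = pd.Series(['a', 'b'], dtype='category')
# _concat_compat([s, s])  # RecursionError: maximum recursion depth exceeded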
I guess I'm not sure why we aren't doing
to_concat = [np.asarray(x.astype(object)) for x in to_concat]
The function name is _concat_asobject, so let's convert to object and concat them. Can you see if that works?
Please write the test first though, and ensure that the test fails with master. Then implement the fix and make sure it works.
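Concretely, the suggestion amounts to something like this (a sketch assuming that coercing tz-aware inputs to object is sufficient; not the exact patch that was merged):

import numpy as np
from pandas.core.dtypes.common import (
    is_categorical_dtype, is_datetime64tz_dtype)

def _concat_asobject(to_concat):
    # coerce every input to a 1-D object ndarray so tz-aware Timestamps
    # survive the trip through np.concatenate
    to_concat = [x.get_values() if is_categorical_dtype(x.dtype)
                 else np.asarray(x.astype(object)) if is_datetime64tz_dtype(x)
                 else np.asarray(x).ravel()
                 for x in to_concat]
    return np.concatenate(to_concat)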
I believe I tried this previously but the build failed (I think that was the dimensions bug we had before) @TomAugspurger
Seems like we have some inconsistencies in concat that are breaking this approach. IIRC there's an outstanding PR fixing some of these up.
I was going by the loose rule of "different dtypes means the result type is object". But things like
In [1]: import pandas as pd; import numpy as np; import pickle
In [2]: a = pd.Series([1, 2], dtype='category')
In [3]: b = pd.Series([1, None], dtype='category')
In [4]: pd.concat([a, b], ignore_index=True)
Out[4]:
0 1.0
1 2.0
2 1.0
3 NaN
dtype: float64
mess that up. Not sure what's best here...
_concat_compat isn’t called from categorical. Maybe it could be.
Hi, so should we be calling this from _concat_compat? Why is calling from _concat_compat better? Does the current way of fixing the issue conflict with something else in the codebase? @jreback
Just wanted to test something here, I'm decently sure that these tests will fail.
this needs a test first thing
Sorry, still trying to figure out why the dimensions would be different for the arrays passed to np.concatenate. In your experience, why does this usually happen?
Datetimes with timezones are backed by a DatetimeIndex, which is a 1-D object. Datetimes that are naive are 2-D blocks internally.
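A quick way to see that difference, poking at the (private, version-dependent) block manager of that era; _data was later renamed and the internal layout has since changed:

import pandas as pd

df = pd.DataFrame({
    'naive': pd.date_range('2018-01-01', periods=3),
    'aware': pd.date_range('2018-01-01', periods=3, tz='UTC'),
})

# block order may vary
for blk in df._data.blocks:
    print(type(blk).__name__, blk.values.ndim)
# DatetimeBlock 2     (naive datetimes live in a 2-D block)
# DatetimeTZBlock 1   (tz-aware values are backed by a 1-D DatetimeIndex)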
Taking a look at concat_compat, I agree with @jakezimmer that I'm not super sure it actually takes care of the if checks that we tried to delete.
Is this a test for the new output? Because I tried what you said earlier already but it didn't build.
That's the expected output from whatever ends up fixing the original issue.
Should we add in the test case? I'm a little bit confused about what's happening exactly with the tests.
Yes. That test is asserting the correct behavior when concatenating a Categorical and a timezone-aware series. The expected output is an object-dtype series.
@evangelineliu pushed a small cleanup. pls add a whatsnew note (reshaping bug fixes). ping on green.
Hi @jreback, I noticed that the fixup commit broke some linting, as you can see here. Do you know what could be causing this? Also, I added the whatsnew change. This is my first time doing this, so would you be able to take a look at the proposed change and see if it is acceptable?
I'm working on that in #24092.
attempting to rerun the tests
@jreback all tests passed!
ping on green.
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -1545,6 +1545,7 @@ Reshaping
 - Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
 - Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)
 - Bug in :func:`cut` with ``bins`` as an overlapping ``IntervalIndex`` where multiple bins were returned per item instead of raising a ``ValueError`` (:issue:`23980`)
+- Bug in :func:`pandas.concat` when joining series datetimetz with series category would lose timezone (:issue:`23816`)
Bug in :func:`pandas.concat` when joining a datetime w/tz aware Series with a categorical dtyped Series, lose timezone (:issue:`23816`)
use double backticks on Series
also pls merge master
added double backticks on Series
thanks, ping on green.
@jreback do you know what would cause the pandas Azure Pipelines job to fail like this on the "before install" task? It looks like it couldn't download Anaconda. The only change was removing that whitespace, and it has never happened in previous commits. Is there a way to retry that test? You can see the error here under Linux py36_locale_slow:
thanks @jakezimmer, that pipeline failed on some network timeout.
28c61d7 Tom Augspurger (Thu Dec 6 12:18:19 2018) uncomment
bae2e32 Tom Augspurger (Thu Dec 6 12:17:09 2018) maybe fixes
6cb4db0 Tom Augspurger (Thu Dec 6 09:57:37 2018) we back
d97ab57 Tom Augspurger (Thu Dec 6 09:51:51 2018) Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2
b78aa8d gfyoung (Thu Dec 6 07:18:44 2018) REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107)
2993b8e gfyoung (Thu Dec 6 07:17:55 2018) REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120)
b841374 evangelineliu (Wed Dec 5 18:21:46 2018) BUG: Fix concat series loss of timezone (pandas-dev#24027)
4ae63aa jbrockmendel (Wed Dec 5 14:44:50 2018) Implement DatetimeArray._from_sequence (pandas-dev#24074)
2643721 jbrockmendel (Wed Dec 5 14:43:45 2018) CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116)
8ea7744 chris-b1 (Wed Dec 5 14:21:23 2018) PERF: ascii c string functions (pandas-dev#23981)
cb862e4 jbrockmendel (Wed Dec 5 12:19:46 2018) BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096)
aead29b topper-123 (Wed Dec 5 19:06:00 2018) API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
Closes #23816