
BUG: Fix concat series loss of timezone #24027

Merged (25 commits) on Dec 5, 2018
Conversation

evangelinehl
Contributor

@evangelinehl commented Nov 30, 2018

Closes #23816
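For context, the linked issue is about `pd.concat` dropping the timezone when a tz-aware datetime Series is concatenated with a categorical Series. A minimal reproducer (a sketch with assumed values, not the issue's exact example):

```python
import pandas as pd

# tz-aware datetimes concatenated with a categorical Series
a = pd.Series(pd.date_range("2017-01-01", periods=2, tz="US/Eastern"))
b = pd.Series(["a", "b"], dtype="category")

result = pd.concat([a, b], ignore_index=True)
# on affected versions the timezone was dropped; with the fix the
# result is object dtype and each datetime keeps its tz
```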

@pep8speaks

pep8speaks commented Nov 30, 2018

Hello @evangelineliu! Thanks for updating the PR.

Comment last updated on December 05, 2018 at 22:33 UTC

@codecov

codecov bot commented Dec 1, 2018

Codecov Report

Merging #24027 into master will increase coverage by <.01%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master   #24027      +/-   ##
==========================================
+ Coverage   42.46%   42.46%   +<.01%     
==========================================
  Files         161      161              
  Lines       51557    51556       -1     
==========================================
  Hits        21892    21892              
+ Misses      29665    29664       -1
Flag Coverage Δ
#single 42.46% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/concat.py 56.25% <0%> (ø) ⬆️
pandas/core/reshape/tile.py 11.69% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b6b346...1804d7d. Read the comment docs.

@codecov

codecov bot commented Dec 1, 2018

Codecov Report

Merging #24027 into master will increase coverage by 49.68%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           master   #24027       +/-   ##
===========================================
+ Coverage   42.52%    92.2%   +49.68%     
===========================================
  Files         161      162        +1     
  Lines       51697    51727       +30     
===========================================
+ Hits        21982    47695    +25713     
+ Misses      29715     4032    -25683
Flag Coverage Δ
#multiple 90.6% <100%> (?)
#single 43.02% <0%> (+0.5%) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/concat.py 96.6% <100%> (+40.35%) ⬆️
pandas/core/internals/construction.py 96.64% <0%> (ø)
pandas/core/computation/pytables.py 92.37% <0%> (+0.3%) ⬆️
pandas/io/pytables.py 92.3% <0%> (+0.92%) ⬆️
pandas/util/_test_decorators.py 93.24% <0%> (+4.05%) ⬆️
pandas/compat/__init__.py 58.3% <0%> (+8.1%) ⬆️
pandas/core/config_init.py 99.24% <0%> (+9.84%) ⬆️
pandas/core/reshape/util.py 100% <0%> (+11.53%) ⬆️
pandas/compat/numpy/__init__.py 92.85% <0%> (+14.28%) ⬆️
pandas/core/computation/common.py 85.71% <0%> (+14.28%) ⬆️
... and 120 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d3ed91...1867b3a. Read the comment docs.

@TomAugspurger
Contributor

Can you add “closes https://github.com/pandas-dev/pandas/issues/23816” to the original post?

We also need a test (the one from the issue is fine) and a release note.

@@ -193,7 +193,8 @@ def _concat_categorical(to_concat, axis=0):

 def _concat_asobject(to_concat):
     to_concat = [x.get_values() if is_categorical_dtype(x.dtype)
-                 else np.asarray(x).ravel() for x in to_concat]
+                 else np.asarray(x).ravel() if not is_datetime64tz_dtype(x)
Contributor

this is not the way
_concat_compat already handles this

Contributor

I didn’t realize we do call _concat_compat from here... I think we’ve already lost the tz by the time we call it.

Contributor Author

So do these if checks not even need to be there? Not sure what this means for the change we had here


@jreback after closer examination, I'm not sure if _concat_compat actually already handles this. If the type is also a category, it simply calls _concat_categorical which then summons _concat_asobject. I think that therefore the change to the code should happen inside _concat_categorical or _concat_asobject.


@jreback calling _concat_compat before the list is modified like this causes _concat_compat to just call _concat_categorical again and back and forth until they reach the maximum recursion depth. We have to modify the array somehow before that happens.

Contributor

I guess I'm not sure why we aren't doing

to_concat = [np.asarray(x.astype(object)) for x in to_concat]

The function name is _concat_asobject, so let's convert to object and concat them. Can you see if that works?

Please write the test first though, and ensure that the test fails with master. Then implement the fix and make sure it works.
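The suggestion can be checked in isolation: converting a tz-aware Series to object dtype first boxes each value as a tz-aware `Timestamp`, so the timezone survives the trip through NumPy (a sketch, not the PR's final code):

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2018-01-01", periods=2, tz="US/Eastern"))

# astype(object) boxes each value as a tz-aware Timestamp, so
# np.asarray no longer coerces the values to naive datetime64
arr = np.asarray(s.astype(object))
```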

Contributor Author

@evangelinehl Dec 3, 2018

I believe I tried this previously but the build fails (I think that was the dimensions bug we had before) @TomAugspurger

Contributor

Seems like we have some inconsistencies in concat that's breaking this approach. IIRC there's an outstanding PR fixing some of these up.

I was going by the loose rule of "different dtypes means the result type is object". But things like

In [1]: import pandas as pd; import numpy as np; import pickle

In [2]: a = pd.Series([1, 2], dtype='category')

In [3]: b = pd.Series([1, None], dtype='category')

In [4]: pd.concat([a, b], ignore_index=True)
Out[4]:
0    1.0
1    2.0
2    1.0
3    NaN
dtype: float64

mess that up. Not sure what's best here...

@TomAugspurger
Contributor

TomAugspurger commented Dec 1, 2018 via email

@evangelinehl
Contributor Author

evangelinehl commented Dec 1, 2018

Hi, so should we be calling from concat_compat? Why is calling from concat_compat better? Does the current way of fixing the issue conflict with something else in the codebase? @jreback

@jakezimmer

Just wanted to test something here, I'm decently sure that these tests will fail.

@gfyoung added the Bug, Datetime, and Reshaping labels Dec 2, 2018
Contributor

@jreback left a comment

this needs a test first thing

@evangelinehl
Contributor Author

Sorry, still trying to figure out why the dimensions would be different for the arrays passed to np.concatenate. In your experience why does this usually happen?

@jreback
Contributor

jreback commented Dec 2, 2018

Sorry, still trying to figure out why the dimensions would be different for the arrays passed to np.concatenate. In your experience why does this usually happen?

datetimes with timezones are backed by a DatetimeIndex, which is a 1-D object. Naive datetimes are stored as 2-D blocks internally.
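In other words (a sketch of the 0.23-era internals described above; the exact block layout is an implementation detail that has since been reworked):

```python
import pandas as pd

naive = pd.Series(pd.date_range("2018-01-01", periods=3))
aware = pd.Series(pd.date_range("2018-01-01", periods=3, tz="UTC"))

# naive datetimes use the plain datetime64[ns] dtype and were stored
# as 2-D blocks; tz-aware ones use an extension dtype backed by a
# 1-D DatetimeIndex-like structure, hence the dimension mismatch
# when both are handed to np.concatenate
print(naive.dtype)   # datetime64[ns]
print(aware.dtype)   # datetime64[ns, UTC]
```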

@evangelinehl
Contributor Author

Taking a look at concat_compat, I agree with @jakezimmer that I'm not super sure if it actually takes care of the if checks that we tried to delete.

@evangelinehl
Contributor Author

Is this a test for the new output? Because I tried what you said earlier already but it didn't build

@TomAugspurger
Contributor

That's the expected output from whatever ends up fixing the original issue.

@evangelinehl
Contributor Author

Should we add in the test case? I'm a little bit confused about what's happening exactly with the tests

@TomAugspurger
Contributor

Yes. That test is asserting the correct behavior when concatenating a Categorical and timezone-aware series. The expected output is an object-dtype series.
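A test along those lines might look like this (a sketch; names and values are assumed, not the PR's actual test):

```python
import pandas as pd
import pandas.testing as tm

a = pd.Series(pd.date_range("2017-01-01", periods=2, tz="US/Eastern"))
b = pd.Series(["a", "b"], dtype="category")

result = pd.concat([a, b], ignore_index=True)

# falling back to object dtype keeps each Timestamp's timezone
expected = pd.Series(list(a) + ["a", "b"], dtype=object)
tm.assert_series_equal(result, expected)
```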

@jreback
Contributor

jreback commented Dec 4, 2018

@evangelineliu pushed a small cleanup. pls add a whatsnew note (reshaping bug fixes). ping on green.

@jreback added this to the 0.24.0 milestone Dec 4, 2018
@jakezimmer

jakezimmer commented Dec 4, 2018

Hi @jreback, I noticed that the fixup commit broke some linting, as you can see here. Do you know what could be causing this?

Also, I added the whatsnew change. This is my first time doing this, so would you be able to take a look at the proposed change and see if it is acceptable?

@TomAugspurger
Contributor

TomAugspurger commented Dec 4, 2018 via email

attempting to rerun the tests
@jakezimmer

@jreback all tests passed!

Contributor

@jreback left a comment

ping on green.

@@ -1545,6 +1545,7 @@ Reshaping
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
- Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)
- Bug in :func:`cut` with ``bins`` as an overlapping ``IntervalIndex`` where multiple bins were returned per item instead of raising a ``ValueError`` (:issue:`23980`)
- Bug in :func:`pandas.concat` when joining series datetimetz with series category would lose timezone (:issue:`23816`)
Contributor

Bug in :func:`pandas.concat` when joining a datetime w/tz aware Series with a categorical dtyped Series, lose timezone (:issue:`23816`)

use double backticks on Series

@jreback
Contributor

jreback commented Dec 5, 2018

also pls merge master

@jreback
Contributor

jreback commented Dec 5, 2018

thanks ping on green.

@jakezimmer

@jreback do you know what would cause the pandas Azure Pipelines job to fail like this on the "before install" task? It looks like it couldn't download Anaconda. The only change was removing that whitespace, and this has never happened in previous commits. Is there a way to retry that job?

You can see the error here under Linux py36_locale_slow:
https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4780

@jreback merged commit b841374 into pandas-dev:master Dec 5, 2018
@jreback
Contributor

jreback commented Dec 5, 2018

thanks @jakezimmer

that pipeline failed on some network timeout.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Dec 6, 2018
commit 28c61d770f6dfca6857fd0fa6979d4119a31129e
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:18:19 2018 -0600

    uncomment

commit bae2e322523efc73a1344464f51611e2dc555ccb
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:17:09 2018 -0600

    maybe fixes

commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:57:37 2018 -0600

    we back

commit d97ab57fb32cb23371169d9ed659ccfac34cfe45
Merge: a117de4 b78aa8d
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:51:51 2018 -0600

    Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2

commit b78aa8d
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:18:44 2018 -0500

    REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107)

commit 2993b8e
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:17:55 2018 -0500

    REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120)

commit b841374
Author: evangelineliu <hsiyinliu@gmail.com>
Date:   Wed Dec 5 18:21:46 2018 -0500

    BUG: Fix concat series loss of timezone (pandas-dev#24027)

commit 4ae63aa
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:44:50 2018 -0800

    Implement DatetimeArray._from_sequence (pandas-dev#24074)

commit 2643721
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:43:45 2018 -0800

    CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116)

commit 8ea7744
Author: chris-b1 <cbartak@gmail.com>
Date:   Wed Dec 5 14:21:23 2018 -0600

    PERF: ascii c string functions (pandas-dev#23981)

commit cb862e4
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 12:19:46 2018 -0800

    BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096)

commit aead29b
Author: topper-123 <contribute@tensortable.com>
Date:   Wed Dec 5 19:06:00 2018 +0000

    API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019