Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency in tz_convert_single? #17734

Closed
jbrockmendel opened this issue Oct 2, 2017 · 5 comments · Fixed by #18228
Closed

Inconsistency in tz_convert_single? #17734

jbrockmendel opened this issue Oct 2, 2017 · 5 comments · Fixed by #18228
Labels
Bug Timezones Timezone data dtype
Milestone

Comments

@jbrockmendel
Copy link
Member

Two things in tslib look slightly fishy, merit double-checking.

https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslib.pyx#L3617

cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2):
    [...]
    if get_timezone(tz2) == 'UTC':
        return utc_date
    if is_tzlocal(tz2):
        pandas_datetime_to_datetimestruct(val, PANDAS_FR_ns, &dts)      # <--- Is this right?
        dt = datetime(dts.year, dts.month, dts.day, dts.hour,
                      dts.min, dts.sec, dts.us, tz2)
        delta = int(get_utcoffset(tz2, dt).total_seconds()) * 1000000000
        return utc_date + delta
[...]
    return utc_date + offset

The piece that looks off is the use of val instead of utc_date. The analogous portion of tz_convert reads:

    if is_tzlocal(tz2):
        for i in range(n):
            v = utc_dates[i]
            if v == NPY_NAT:
                result[i] = NPY_NAT
            else:
                pandas_datetime_to_datetimestruct(v, PANDAS_FR_ns, &dts)   # < ---  Is this right?
                dt = datetime(dts.year, dts.month, dts.day, dts.hour,
                              dts.min, dts.sec, dts.us, tz2)
                delta = (int(get_utcoffset(tz2, dt).total_seconds())
                             * 1000000000)
                result[i] = v + delta

i.e. these appear inconsistent. Adding an assertion assert utc_date == val to the relevant case in tz_convert_single does not cause any test failures. Can someone confirm that this is correct as-is?


_localize_tso has a catch-all branch that looks like it should never be hit. If that is correct, then this can be simplified a decent amount.

@jreback
Copy link
Contributor

jreback commented Oct 2, 2017

yeah there is an embedded bug somewhere here, IOW we have both the multi convert and the single one. merits investigating.

@jreback jreback added the Timezones Timezone data dtype label Oct 2, 2017
@jbrockmendel
Copy link
Member Author

OK. One approach is to brute force over args to tz_convert and tz_convert_single to find where they disagree. It looks like this is dependent on mocking out dateutil.tz.tzlocal, which in turn requires mocking time.timezone and a couple of related attrs. Need to figure out what constitute valid values for those.

@jbrockmendel
Copy link
Member Author

Trying to track this down, a couple other questions come up:

The is_tzlocal(tz) branch in tslib.tz_localize_to_utc is very similar to the is_tzlocal(tz1) branch in tslib.tz_convert. The only difference is that there are checks for NPY_NAT in tz_convert. Are those checks unnecessary for tz_localize_to_utc?

The last loop in tz_localize_to_utc looks like a more careful version of the last loop in tz_convert. Is that correct?

@jbrockmendel
Copy link
Member Author

Got a case where tz_convert disagrees with tz_convert_single. Unfortunately this is specific to tzlocal being US/Pacific (haven't found a great way to mock that yet).

>>> tz2 = tzlocal()
>>> tz1 = pytz.timezone('US/Eastern')
>>> val = 1489316399999999000
>>> vals = np.arange(val-3600*24*1000000000, val+3600*24*1000000000, 1800*1000000000)[16:24]
>>> v1 = tslib.tz_convert(vals, tz1, tz2)
>>> v2 = np.array([tslib.tz_convert_single(x, tz1, tz2) for x in vals])
>>> v1 == v2
array([ True,  True,  True,  True,  True, False, False, False], dtype=bool)

Next up: figuring out what the answer is supposed to be in these cases.

@jbrockmendel
Copy link
Member Author

cc: @sinhrks looks like you worked on a related issue #7798

All existing uses of tslib.tz_convert or tslib.tz_convert_single have either tz1='UTC' or tz2='UTC'. Restricted to this subset of inputs, this issue is benign. Is it the case that all uses must have one of the inputs be UTC? If so, this issue has an easy fix, half of which is noting that restriction in the docstring.

If not, then I need a hand finding a canonical answer for the disagreement case.

This was referenced Oct 22, 2017
jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Nov 11, 2017
@gfyoung gfyoung added the Bug label Nov 11, 2017
@jreback jreback added this to the 0.22.0 milestone Nov 12, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants