API/BUG: DatetimeIndex.argsort does not match DatetimeArray.argsort #37863

Closed
jbrockmendel opened this issue Nov 15, 2020 · 2 comments · Fixed by #37965
Labels: Bug, Needs Triage

@jbrockmendel

DatetimeIndex.argsort returns self.asi8.argsort(*args, **kwargs), while DatetimeArray.argsort sorts the M8[ns] values. This puts NaTs at the front for DTI and at the end for DTA.
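
The difference can be reproduced with plain NumPy, without touching pandas internals. The following is a minimal sketch, assuming NumPy 1.18 or later (where NaT sorts to the end of datetime64 arrays); the variable names are illustrative and not taken from the pandas source.

```python
import numpy as np

# M8[ns] values with a NaT in the middle.
values = np.array(["2020-01-01", "NaT", "2020-01-03"], dtype="M8[ns]")

# Viewed as int64, NaT is the minimum int64 value, so it sorts first.
i8_order = values.view("i8").argsort(kind="mergesort")

# Sorting the datetime64 values directly places NaT last
# (assumes NumPy >= 1.18).
m8_order = values.argsort(kind="mergesort")

print(i8_order)  # NaT position (1) comes first
print(m8_order)  # NaT position (1) comes last
```

The int64 view corresponds to the .asi8 path described for DatetimeIndex, and the direct sort corresponds to the M8[ns] path described for DatetimeArray.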

Changing the DTI method (actually the DTI/TDI/PI methods) to match their array counterparts breaks 12 tests: 11 of them in resample and 1 in test_grouping.

@jbrockmendel

@jreback @mroeschke Tracking this down, I'm finding that changing it causes test failures, largely in the resample tests.

For example, if we disable the needs_i8_conversion check in Index.argsort, then the following raises:

import pandas as pd

# TimedeltaIndex with a NaT in the middle
index = pd.to_timedelta(["0s", pd.NaT, "2s"])
df = pd.DataFrame({"value": [2, 3, 5]}, index)
rs = df.resample("1s")

>>> result = rs.mean()
pandas/core/resample.py:968: in g
    return self._downsample(_method)
pandas/core/resample.py:1080: in _downsample
    result = obj.groupby(self.grouper, axis=self.axis).aggregate(how, **kwargs)
pandas/core/groupby/generic.py:948: in aggregate
    result, how = aggregate(self, func, *args, **kwargs)
pandas/core/aggregation.py:563: in aggregate
    return obj._try_aggregate_string_function(arg, *args, **kwargs), None
pandas/core/base.py:313: in _try_aggregate_string_function
    return f(*args, **kwargs)
pandas/core/groupby/groupby.py:1491: in mean
    return self._cython_agg_general(
pandas/core/groupby/generic.py:1018: in _cython_agg_general
    agg_mgr = self._cython_agg_blocks(
pandas/core/groupby/generic.py:1116: in _cython_agg_blocks
    new_mgr = data.apply(blk_func, ignore_failures=True)
pandas/core/internals/managers.py:425: in apply
    applied = b.apply(f, **kwargs)
pandas/core/internals/blocks.py:369: in apply
    result = func(self.values, **kwargs)
pandas/core/groupby/generic.py:1067: in blk_func
    result, _ = self.grouper.aggregate(
pandas/core/groupby/ops.py:595: in aggregate
    return self._cython_operation(
pandas/core/groupby/ops.py:548: in _cython_operation
    result = self._aggregate(result, counts, values, codes, func, min_count)
pandas/core/groupby/ops.py:609: in _aggregate
    agg_func(result, counts, values, comp_ids, min_count)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise ValueError("len(index) != len(labels)")
E   ValueError: len(index) != len(labels)

If we set a breakpoint just before the call to result = self._aggregate(result, counts, values, codes, func, min_count), we can see that on master (with the .asi8 argsort) codes is array([0, 0, 2]), but with the TDA implementation codes is array([0, 2]).

The only relevant-looking argsort call is in Grouper._set_grouper, which looks like:

indexer = self.indexer = ax.argsort(kind="mergesort")

>>> indexer
array([0, 2, 1])   # <-- using the TDA.argsort implementation
>>> indexer
array([1, 0, 2])   # <-- using the .asi8 implementation in master
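
To make the two indexers concrete, here is a small NumPy-only sketch of what each ordering does to the axis and to the data from the example above. It assumes NumPy 1.18 or later for the NaT-last behaviour, and it stands in for, rather than reproduces, the pandas code path; all names are illustrative.

```python
import numpy as np

# Same shape as the example: 0s, NaT, 2s with values 2, 3, 5.
tds = np.array(
    [np.timedelta64(0, "s"), np.timedelta64("NaT"), np.timedelta64(2, "s")],
    dtype="m8[ns]",
)
values = np.array([2, 3, 5])

# .asi8-style ordering: NaT is the smallest int64, so it sorts first.
i8_indexer = tds.view("i8").argsort(kind="mergesort")
# Array-style ordering: NaT sorts last (assumes NumPy >= 1.18).
m8_indexer = tds.argsort(kind="mergesort")

# Where the NaT row ends up after reordering the axis.
nat_first = np.isnat(tds[i8_indexer])
nat_last = np.isnat(tds[m8_indexer])

# The data rows move with the axis.
values_i8 = values[i8_indexer]
values_m8 = values[m8_indexer]
```

With the int64 ordering the NaT row leads the sorted axis, while with the array ordering it trails, so any downstream code that assumes one position for missing entries sees a different layout under the other.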

Any thoughts?

@jreback

jreback commented Nov 19, 2020

See #37905, which I think will fix this. #36198 (which is going to be closed) has a list of the related issues:

#27343
#33548
#35275
