-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: PeriodIndex speed up #14931
PERF: PeriodIndex speed up #14931
Conversation
great! pls add a whatsnew note. and if you can post the asv results would be great. |
This is before the Shape change. Current branch is v fast: In [5]: index = pd.PeriodIndex(start=2000, periods=20000, freq='B')
In [7]: %timeit -n 1 -r 1 index.shape
1 loop, best of 1: 12.2 µs per loop Re-running ASV now (doesn't seem easy to do a quick dev run...) |
7d27d8b
to
7b2e975
Compare
@MaximilianR that's fine. maybe needs np.random.seed to be set (to make it consitent). but no big deal. |
@@ -49,3 +49,28 @@ def time_value_counts_pindex(self): | |||
self.i.value_counts() | |||
|
|||
|
|||
class period_standard_indexing(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would be parameterized with Index type. For another day though.
lgtm. ping on green. |
|
I'll look |
Any idea why this is here? Creating another axis on an index?
|
Weird but different things happen with other indexes: In [7]: pd.DatetimeIndex(start='2000', periods=10, freq='B')
Out[7]:
DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
'2000-01-07', '2000-01-10', '2000-01-11', '2000-01-12',
'2000-01-13', '2000-01-14'],
dtype='datetime64[ns]', freq='B')
In [8]: pd.DatetimeIndex(start='2000', periods=10, freq='B')[:, None].shape
Out[8]: (10, 1)
In [9]: pd.DatetimeIndex(start='2000', periods=10, freq='B')[:, None]
Out[9]:
array([['2000-01-03T00:00:00.000000000'],
['2000-01-04T00:00:00.000000000'],
['2000-01-05T00:00:00.000000000'],
['2000-01-06T00:00:00.000000000'],
['2000-01-07T00:00:00.000000000'],
['2000-01-10T00:00:00.000000000'],
['2000-01-11T00:00:00.000000000'],
['2000-01-12T00:00:00.000000000'],
['2000-01-13T00:00:00.000000000'],
['2000-01-14T00:00:00.000000000']], dtype='datetime64[ns]')
In [10]: pd.DatetimeIndex(start='2000', periods=10, freq='B').shape
Out[10]: (10,) |
@@ -305,7 +306,7 @@ def _simple_new(cls, values, name=None, freq=None, **kwargs): | |||
if (len(values) > 0 and is_float_dtype(values)): | |||
raise TypeError("PeriodIndex can't take floats") | |||
else: | |||
return PeriodIndex(values, name=name, freq=freq, **kwargs) | |||
return cls(values, name=name, freq=freq, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave this here, maybe causing the ndim test sto fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not what's causing the test failure (it could only be different if cls != PeriodIndex
this is a bit bogus, but I suspect this is in there for some matplotib compat (we pass periods to it I think for it to display some things). |
It's only changing the output of In [3]: pd.PeriodIndex(start='2000', periods=10, freq='B')[:, None].shape
Out[3]: (10,)
In [4]: pd.PeriodIndex(start='2000', periods=10, freq='B')[:, None].values.shape
Out[4]: (10,)
In [5]: pd.PeriodIndex(start='2000', periods=10, freq='B')[:, None]._values.shape
Out[5]: (10, 1) I can hack change |
ok taking the test out, if all plotting things pass (iow everything else is ok). |
7b2e975
to
72b14aa
Compare
Current coverage is 84.64% (diff: 100%)@@ master #14931 diff @@
==========================================
Files 144 144
Lines 51023 51025 +2
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43190 43191 +1
- Misses 7833 7834 +1
Partials 0 0
|
thanks! |
(cherry picked from commit 4c3d4d4)
Version 0.19.2 * tag 'v0.19.2': (78 commits) RLS: v0.19.2 DOC: update release notes for 0.19.2 TST: skip gbq upload test as flakey DOC: clean-up v0.19.2 whatsnew DOC: update Pandas Cheat Sheet (GH13202) DOC: Pandas Cheat Sheet TST: matplotlib 2.0 fix in log limits for barplot (GH14808) (pandas-dev#14957) flake8 fix import Remove test - from 0.20.0 PR slipped in PERF: fix getitem unique_check / initialization issue cache and remove boxing (pandas-dev#14931) CLN: Resubmit of GH14700. Fixes GH14554. Errors other than Indexing… Clean up construction of Series with dictionary and datetime index BUG: .fillna() for datetime64 with tz is passing thru floats BUG: Patch read_csv NA values behaviour ENH: merge_asof() has type specializations and can take multiple 'by' parameters (pandas-dev#13936) [Backport pandas-dev#14886] BUG: regression in DataFrame.combine_first with integer columns (GH14687) (pandas-dev#14886) Fixed KDE Plot to drop the missing values (pandas-dev#14820) ENH: merge_asof() has left_index/right_index and left_by/right_by (pandas-dev#14253) (pandas-dev#14531) TST: correct url for test file on s3 (xref pandas-dev#14587) ...
* releases: (78 commits) RLS: v0.19.2 DOC: update release notes for 0.19.2 TST: skip gbq upload test as flakey DOC: clean-up v0.19.2 whatsnew DOC: update Pandas Cheat Sheet (GH13202) DOC: Pandas Cheat Sheet TST: matplotlib 2.0 fix in log limits for barplot (GH14808) (pandas-dev#14957) flake8 fix import Remove test - from 0.20.0 PR slipped in PERF: fix getitem unique_check / initialization issue cache and remove boxing (pandas-dev#14931) CLN: Resubmit of GH14700. Fixes GH14554. Errors other than Indexing… Clean up construction of Series with dictionary and datetime index BUG: .fillna() for datetime64 with tz is passing thru floats BUG: Patch read_csv NA values behaviour ENH: merge_asof() has type specializations and can take multiple 'by' parameters (pandas-dev#13936) [Backport pandas-dev#14886] BUG: regression in DataFrame.combine_first with integer columns (GH14687) (pandas-dev#14886) Fixed KDE Plot to drop the missing values (pandas-dev#14820) ENH: merge_asof() has left_index/right_index and left_by/right_by (pandas-dev#14253) (pandas-dev#14531) TST: correct url for test file on s3 (xref pandas-dev#14587) ...
git diff upstream/master | flake8 --diff