PERF: PeriodIndex speed up #14931

max-sixty · 2016-12-20T18:46:30Z

closes #PERF: PeriodIndex.size #14822
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

jreback · 2016-12-20T19:06:26Z

great! pls add a whatsnew note. and if you can post the asv results would be great.

max-sixty · 2016-12-20T19:51:58Z

    before     after       ratio
  [3ccb5013] [13aa1309]
+    4.48ms     6.10ms      1.36  period.period_standard_indexing.time_shape
-  659.85μs   572.45μs      0.87  period.period_standard_indexing.time_intersection
-  258.36μs   166.30μs      0.64  period.period_standard_indexing.time_series_loc
-  135.68μs    48.90μs      0.36  period.period_standard_indexing.time_get_loc
-    3.95ms    31.21μs      0.01  period.period_standard_indexing.time_shallow_copy

This is before the Shape change.

Current branch is v fast:

In [5]: index = pd.PeriodIndex(start=2000, periods=20000, freq='B')

In [7]: %timeit -n 1 -r 1 index.shape
1 loop, best of 1: 12.2 µs per loop

Re-running ASV now (doesn't seem easy to do a quick dev run...)

jreback · 2016-12-20T19:58:28Z

@MaximilianR that's fine. maybe needs np.random.seed to be set (to make it consitent). but no big deal.

max-sixty · 2016-12-20T19:58:31Z

asv_bench/benchmarks/period.py

@@ -49,3 +49,28 @@ def time_value_counts_pindex(self):
        self.i.value_counts()


+class period_standard_indexing(object):


Ideally this would be parameterized with Index type. For another day though.

jreback · 2016-12-20T19:58:50Z

lgtm. ping on green.

jreback · 2016-12-20T20:43:52Z

======================================================================
FAIL: test_getitem_ndim2 (pandas.tseries.tests.test_period.TestPeriodIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python27_64\envs\pandas\lib\site-packages\pandas\tseries\tests\test_period.py", line 2110, in test_getitem_ndim2
    self.assertEqual(result.shape, (len(idx), ))
AssertionError: Tuples differ: (3L, 1L) != (3,)
First tuple contains 1 additional elements.
First extra element 1:
1L
- (3L, 1L)
+ (3,)
------------------

max-sixty · 2016-12-20T20:47:24Z

I'll look

max-sixty · 2016-12-20T20:53:17Z

Any idea why this is here? Creating another axis on an index?

   2104     def test_getitem_ndim2(self):
   2105         idx = period_range('2007-01', periods=3, freq='M')
   2106
   2107         result = idx[:, None]
   2108         # MPL kludge, internally has incorrect shape
   2109         tm.assertIsInstance(result, PeriodIndex)
-> 2110         self.assertEqual(result.shape, (len(idx), ))

max-sixty · 2016-12-20T21:00:09Z

Weird but different things happen with other indexes:

In [7]: pd.DatetimeIndex(start='2000', periods=10, freq='B')
Out[7]:
DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
               '2000-01-07', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14'],
              dtype='datetime64[ns]', freq='B')

In [8]: pd.DatetimeIndex(start='2000', periods=10, freq='B')[:, None].shape
Out[8]: (10, 1)

In [9]: pd.DatetimeIndex(start='2000', periods=10, freq='B')[:, None]
Out[9]:
array([['2000-01-03T00:00:00.000000000'],
       ['2000-01-04T00:00:00.000000000'],
       ['2000-01-05T00:00:00.000000000'],
       ['2000-01-06T00:00:00.000000000'],
       ['2000-01-07T00:00:00.000000000'],
       ['2000-01-10T00:00:00.000000000'],
       ['2000-01-11T00:00:00.000000000'],
       ['2000-01-12T00:00:00.000000000'],
       ['2000-01-13T00:00:00.000000000'],
       ['2000-01-14T00:00:00.000000000']], dtype='datetime64[ns]')

In [10]: pd.DatetimeIndex(start='2000', periods=10, freq='B').shape
Out[10]: (10,)

jreback · 2016-12-20T21:17:21Z

pandas/tseries/period.py

@@ -305,7 +306,7 @@ def _simple_new(cls, values, name=None, freq=None, **kwargs):
            if (len(values) > 0 and is_float_dtype(values)):
                raise TypeError("PeriodIndex can't take floats")
            else:
-                return PeriodIndex(values, name=name, freq=freq, **kwargs)
+                return cls(values, name=name, freq=freq, **kwargs)


I would leave this here, maybe causing the ndim test sto fail

It's not what's causing the test failure (it could only be different if cls != PeriodIndex

jreback · 2016-12-20T21:17:29Z

this is a bit bogus, but I suspect this is in there for some matplotib compat (we pass periods to it I think for it to display some things).

max-sixty · 2016-12-20T21:33:19Z

It's only changing the output of .shape - nothing else. This is from master:

In [3]: pd.PeriodIndex(start='2000', periods=10, freq='B')[:, None].shape
Out[3]: (10,)

In [4]: pd.PeriodIndex(start='2000', periods=10, freq='B')[:, None].values.shape
Out[4]: (10,)

In [5]: pd.PeriodIndex(start='2000', periods=10, freq='B')[:, None]._values.shape
Out[5]: (10, 1)

I can hack change .shape to only look at the first dimension, or if nothing else fails are you OK taking that test out?

jreback · 2016-12-20T21:53:16Z

ok taking the test out, if all plotting things pass (iow everything else is ok).

codecov-io · 2016-12-21T01:55:02Z

Current coverage is 84.64% (diff: 100%)

Merging #14931 into master will decrease coverage by <.01%

@@             master     #14931   diff @@
==========================================
  Files           144        144          
  Lines         51023      51025     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43190      43191     +1   
- Misses         7833       7834     +1   
  Partials          0          0

Powered by Codecov. Last update 3ab369c...72b14aa

jreback · 2016-12-21T01:58:23Z

thanks!

(cherry picked from commit 4c3d4d4)

Version 0.19.2 * tag 'v0.19.2': (78 commits) RLS: v0.19.2 DOC: update release notes for 0.19.2 TST: skip gbq upload test as flakey DOC: clean-up v0.19.2 whatsnew DOC: update Pandas Cheat Sheet (GH13202) DOC: Pandas Cheat Sheet TST: matplotlib 2.0 fix in log limits for barplot (GH14808) (pandas-dev#14957) flake8 fix import Remove test - from 0.20.0 PR slipped in PERF: fix getitem unique_check / initialization issue cache and remove boxing (pandas-dev#14931) CLN: Resubmit of GH14700. Fixes GH14554. Errors other than Indexing… Clean up construction of Series with dictionary and datetime index BUG: .fillna() for datetime64 with tz is passing thru floats BUG: Patch read_csv NA values behaviour ENH: merge_asof() has type specializations and can take multiple 'by' parameters (pandas-dev#13936) [Backport pandas-dev#14886] BUG: regression in DataFrame.combine_first with integer columns (GH14687) (pandas-dev#14886) Fixed KDE Plot to drop the missing values (pandas-dev#14820) ENH: merge_asof() has left_index/right_index and left_by/right_by (pandas-dev#14253) (pandas-dev#14531) TST: correct url for test file on s3 (xref pandas-dev#14587) ...

* releases: (78 commits) RLS: v0.19.2 DOC: update release notes for 0.19.2 TST: skip gbq upload test as flakey DOC: clean-up v0.19.2 whatsnew DOC: update Pandas Cheat Sheet (GH13202) DOC: Pandas Cheat Sheet TST: matplotlib 2.0 fix in log limits for barplot (GH14808) (pandas-dev#14957) flake8 fix import Remove test - from 0.20.0 PR slipped in PERF: fix getitem unique_check / initialization issue cache and remove boxing (pandas-dev#14931) CLN: Resubmit of GH14700. Fixes GH14554. Errors other than Indexing… Clean up construction of Series with dictionary and datetime index BUG: .fillna() for datetime64 with tz is passing thru floats BUG: Patch read_csv NA values behaviour ENH: merge_asof() has type specializations and can take multiple 'by' parameters (pandas-dev#13936) [Backport pandas-dev#14886] BUG: regression in DataFrame.combine_first with integer columns (GH14687) (pandas-dev#14886) Fixed KDE Plot to drop the missing values (pandas-dev#14820) ENH: merge_asof() has left_index/right_index and left_by/right_by (pandas-dev#14253) (pandas-dev#14531) TST: correct url for test file on s3 (xref pandas-dev#14587) ...

jreback added Performance Memory or execution speed performance Period Period data type labels Dec 20, 2016

jreback added this to the 0.19.2 milestone Dec 20, 2016

max-sixty force-pushed the period-perf branch from 7d27d8b to 7b2e975 Compare December 20, 2016 19:56

max-sixty commented Dec 20, 2016

View reviewed changes

jreback reviewed Dec 20, 2016

View reviewed changes

cache and remove boxing

72b14aa

max-sixty force-pushed the period-perf branch from 7b2e975 to 72b14aa Compare December 20, 2016 22:30

jreback merged commit 4c3d4d4 into pandas-dev:master Dec 21, 2016

max-sixty deleted the period-perf branch December 21, 2016 03:28

jreback mentioned this pull request Dec 21, 2016

PERF: PeriodIndex.size #14822

Closed

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Dec 24, 2016

cache and remove boxing (pandas-dev#14931)

a8d8fae

(cherry picked from commit 4c3d4d4)

ShaharBental pushed a commit to ShaharBental/pandas that referenced this pull request Dec 26, 2016

cache and remove boxing (pandas-dev#14931)

5ba8ce5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: PeriodIndex speed up #14931

PERF: PeriodIndex speed up #14931

max-sixty commented Dec 20, 2016 •

edited

Loading

jreback commented Dec 20, 2016

max-sixty commented Dec 20, 2016 •

edited

Loading

jreback commented Dec 20, 2016

max-sixty Dec 20, 2016

jreback commented Dec 20, 2016

jreback commented Dec 20, 2016

max-sixty commented Dec 20, 2016

max-sixty commented Dec 20, 2016

max-sixty commented Dec 20, 2016

jreback Dec 20, 2016

max-sixty Dec 20, 2016

jreback commented Dec 20, 2016

max-sixty commented Dec 20, 2016 •

edited

Loading

jreback commented Dec 20, 2016

codecov-io commented Dec 21, 2016

jreback commented Dec 21, 2016

		@@ -49,3 +49,28 @@ def time_value_counts_pindex(self):
		self.i.value_counts()


		class period_standard_indexing(object):

PERF: PeriodIndex speed up #14931

PERF: PeriodIndex speed up #14931

Conversation

max-sixty commented Dec 20, 2016 • edited Loading

jreback commented Dec 20, 2016

max-sixty commented Dec 20, 2016 • edited Loading

jreback commented Dec 20, 2016

max-sixty Dec 20, 2016

Choose a reason for hiding this comment

jreback commented Dec 20, 2016

jreback commented Dec 20, 2016

max-sixty commented Dec 20, 2016

max-sixty commented Dec 20, 2016

max-sixty commented Dec 20, 2016

jreback Dec 20, 2016

Choose a reason for hiding this comment

max-sixty Dec 20, 2016

Choose a reason for hiding this comment

jreback commented Dec 20, 2016

max-sixty commented Dec 20, 2016 • edited Loading

jreback commented Dec 20, 2016

codecov-io commented Dec 21, 2016

Current coverage is 84.64% (diff: 100%)

jreback commented Dec 21, 2016

max-sixty commented Dec 20, 2016 •

edited

Loading

max-sixty commented Dec 20, 2016 •

edited

Loading

max-sixty commented Dec 20, 2016 •

edited

Loading