CLN: ASV frame_methods benchmark #18536

mroeschke · 2017-11-28T01:09:23Z

Added np.random.seed(1234) to the setup methods of classes that create random data (xref #8144, BENCH: put in np.random.seed on vbenches)
Ran flake8 and replaced star imports
Moved GetItemSingleColumn, AssignTimeseriesIndex, and InsertColumns to indexing.py, and StringSlice to strings.py
Refactored to use params where relevant
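
As a rough illustration of the seed and params changes described above, a parametrized asv benchmark class might look like the sketch below. It is loosely modeled on the Shift benchmark that appears in the output; the frame size is an assumption chosen for illustration, not the value used in this PR.

```python
import numpy as np
from pandas import DataFrame


class Shift(object):
    # asv runs every time_* method once per entry in ``params`` and passes
    # the value to both setup and the timed method.
    goal_time = 0.2
    params = [0, 1]
    param_names = ['axis']

    def setup(self, axis):
        # Reseed so that every run times the same random data.
        np.random.seed(1234)
        # Frame size is an assumption chosen for illustration.
        self.df = DataFrame(np.random.rand(1000, 50))

    def time_shift(self, axis):
        self.df.shift(1, axis=axis)
```

With params, one method replaces separate axis-0 and axis-1 methods, and asv reports the timings as a table keyed by the parameter name.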

$ asv dev -b ^frame_methods

[  1.79%] ··· Running frame_methods.Apply.time_apply_axis_1                                                        299ms
[  3.57%] ··· Running frame_methods.Apply.time_apply_lambda_mean                                                  12.0ms
[  5.36%] ··· Running frame_methods.Apply.time_apply_np_mean                                                      13.4ms
[  7.14%] ··· Running frame_methods.Apply.time_apply_pass_thru                                                    13.5ms
[  8.93%] ··· Running frame_methods.Apply.time_apply_ref_by_name                                                  69.1ms
[ 10.71%] ··· Running frame_methods.Apply.time_apply_user_func                                                     256ms
[ 12.50%] ··· Running frame_methods.Count.time_count_level_mixed_dtypes_multi                                         ok
[ 12.50%] ···· 
               ====== =======
                axis         
               ------ -------
                 0     157ms 
                 1     128ms 
               ====== =======

[ 14.29%] ··· Running frame_methods.Count.time_count_level_multi                                                      ok
[ 14.29%] ···· 
               ====== =======
                axis         
               ------ -------
                 0     113ms 
                 1     147ms 
               ====== =======

[ 16.07%] ··· Running frame_methods.Dropna.time_dropna                                                                ok
[ 16.07%] ···· 
               ===== ======== ========
               --           axis      
               ----- -----------------
                how     0        1    
               ===== ======== ========
                all   135ms    150ms  
                any   60.8ms   63.0ms 
               ===== ======== ========

[ 17.86%] ··· Running frame_methods.Dropna.time_dropna_axis_mixed_dtypes                                              ok
[ 17.86%] ···· 
               ===== ======= =======
               --          axis     
               ----- ---------------
                how     0       1   
               ===== ======= =======
                all   437ms   436ms 
                any   346ms   327ms 
               ===== ======= =======

[ 19.64%] ··· Running frame_methods.Dtypes.time_frame_dtypes                                                       336μs
[ 21.43%] ··· Running frame_methods.Duplicated.time_frame_duplicated                                               340ms
[ 23.21%] ··· Running frame_methods.Duplicated.time_frame_duplicated_wide                                          353ms
[ 25.00%] ··· Running frame_methods.Equals.time_frame_float_equal                                                 8.72ms
[ 26.79%] ··· Running frame_methods.Equals.time_frame_float_unequal                                               24.4ms
[ 28.57%] ··· Running frame_methods.Equals.time_frame_nonunique_equal                                             11.8ms
[ 30.36%] ··· Running frame_methods.Equals.time_frame_nonunique_unequal                                           12.1ms
[ 32.14%] ··· Running frame_methods.Equals.time_frame_object_equal                                                41.4ms
[ 33.93%] ··· Running frame_methods.Equals.time_frame_object_unequal                                              27.3ms
[ 35.71%] ··· Running frame_methods.Fillna.time_frame_fillna                                                          ok
[ 35.71%] ···· 
               ========= ======== ========
               --              method     
               --------- -----------------
                inplace    pad     bfill  
               ========= ======== ========
                  True    14.7ms   18.8ms 
                 False    12.7ms   13.3ms 
               ========= ======== ========

[ 37.50%] ··· Running frame_methods.FromRecords.time_frame_from_records_generator                                     ok
[ 37.50%] ···· 
               ======= ========
                nrows          
               ------- --------
                 None   136ms  
                 1000   2.23ms 
               ======= ========

[ 39.29%] ··· Running frame_methods.GetDtypeCounts.time_frame_get_dtype_counts                                     493μs
[ 41.07%] ··· Running frame_methods.GetDtypeCounts.time_info                                                       999ms
[ 41.07%] ····· <class 'pandas.core.frame.DataFrame'>
                RangeIndex: 10 entries, 0 to 9
                Columns: 10000 entries, 0 to 9999
                dtypes: float64(10000)
                memory usage: 781.3 KB

[ 42.86%] ··· Running frame_methods.GetNumericData.time_frame_get_numeric_data                                     411μs
[ 42.86%] ····· /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/frame_methods.py:17: FutureWarning: consolidate is deprecated and will be removed in a future release.
                self.df = self.df.consolidate()

[ 44.64%] ··· Running frame_methods.Interpolate.time_interpolate                                                      ok
[ 44.64%] ···· 
               ========== ========
                downcast          
               ---------- --------
                  None     98.8ms 
                 infer     159ms  
               ========== ========

[ 46.43%] ··· Running frame_methods.Interpolate.time_interpolate_some_good                                            ok
[ 46.43%] ···· 
               ========== ========
                downcast          
               ---------- --------
                  None     3.02ms 
                 infer     6.04ms 
               ========== ========

[ 48.21%] ··· Running frame_methods.Isnull.time_isnull                                                            2.42ms
[ 50.00%] ··· Running frame_methods.Isnull.time_isnull_floats_no_null                                             2.45ms
[ 51.79%] ··· Running frame_methods.Isnull.time_isnull_obj                                                        89.7ms
[ 53.57%] ··· Running frame_methods.Isnull.time_isnull_strngs                                                     79.3ms
[ 55.36%] ··· Running frame_methods.Iteration.time_iteritems                                                      87.8ms
[ 57.14%] ··· Running frame_methods.Iteration.time_iteritems_cached                                               87.6ms
[ 58.93%] ··· Running frame_methods.Iteration.time_iteritems_indexing                                              450ms
[ 60.71%] ··· Running frame_methods.Iteration.time_itertuples                                                     81.1ms
[ 62.50%] ··· Running frame_methods.Lookup.time_frame_fancy_lookup                                                8.36ms
[ 64.29%] ··· Running frame_methods.Lookup.time_frame_fancy_lookup_all                                            55.2ms
[ 66.07%] ··· Running frame_methods.MaskBool.time_frame_mask_bools                                                26.0ms
[ 67.86%] ··· Running frame_methods.MaskBool.time_frame_mask_floats                                               19.2ms
[ 69.64%] ··· Running frame_methods.Nlargest.time_frame_nlargest                                                  3.73ms
[ 71.43%] ··· Running frame_methods.Nunique.time_frame_nunique                                                     679ms
[ 73.21%] ··· Running frame_methods.Quantile.time_frame_quantile                                                      ok
[ 73.21%] ···· 
               ====== ========
                axis          
               ------ --------
                 0     995μs  
                 1     1.59ms 
               ====== ========

[ 75.00%] ··· Running frame_methods.Reindex.time_reindex_axis0                                                    17.7ms
[ 76.79%] ··· Running frame_methods.Reindex.time_reindex_axis1                                                     140ms
[ 78.57%] ··· Running frame_methods.Reindex.time_reindex_both_axes                                                53.3ms
[ 80.36%] ··· Running frame_methods.Reindex.time_reindex_both_axes_ix                                             53.9ms
[ 82.14%] ··· Running frame_methods.Reindex.time_reindex_upcast                                                   15.4ms
[ 83.93%] ··· Running frame_methods.Repr.time_frame_repr_wide                                                     32.0ms
[ 85.71%] ··· Running frame_methods.Repr.time_html_repr_trunc_mi                                                   429ms
[ 87.50%] ··· Running frame_methods.Repr.time_html_repr_trunc_si                                                   405ms
[ 89.29%] ··· Running frame_methods.Repr.time_repr_tall                                                           52.2ms
[ 91.07%] ··· Running frame_methods.Shift.time_shift                                                                  ok
[ 91.07%] ···· 
               ====== ========
                axis          
               ------ --------
                 0     41.7ms 
                 1     45.1ms 
               ====== ========

[ 92.86%] ··· Running frame_methods.SortIndex.time_frame_sort_index                                                   ok
[ 92.86%] ···· 
               =========== ========
                ascending          
               ----------- --------
                   True     21.1ms 
                  False     139ms  
               =========== ========

[ 94.64%] ··· Running frame_methods.SortIndexByColumns.time_frame_sort_index_by_columns                           66.1ms
[ 94.64%] ····· /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/frame_methods.py:498: FutureWarning: by argument to sort_index is deprecated, pls use .sort_values(by=...)
                self.df.sort_index(by=['key1', 'key2'])

[ 96.43%] ··· Running frame_methods.ToHTML.time_to_html_mixed                                                      527ms
[ 98.21%] ··· Running frame_methods.ToString.time_to_string_floats                                                72.4ms
[100.00%] ··· Running frame_methods.XS.time_frame_xs                                                                  ok
[100.00%] ···· 
               ====== =======
                axis         
               ------ -------
                 0     791μs 
                 1     674μs 
               ====== =======

codecov · 2017-11-28T02:23:59Z

Codecov Report

Merging #18536 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18536      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         164      164              
  Lines       49801    49801              
==========================================
- Hits        45494    45485       -9     
- Misses       4307     4316       +9

Flag	Coverage Δ
#multiple	`89.13% <ø> (ø)`	⬆️
#single	`40.81% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88ab693...b0956ae. Read the comment docs.

codecov · 2017-11-28T02:24:04Z

Codecov Report

Merging #18536 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18536      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         164      164              
  Lines       49802    49802              
==========================================
- Hits        45496    45487       -9     
- Misses       4306     4315       +9

Flag	Coverage Δ
#multiple	`89.13% <ø> (ø)`	⬆️
#single	`40.81% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32f562d...12e4686. Read the comment docs.

jorisvandenbossche

Thanks a lot for your work on this!

jorisvandenbossche · 2017-11-28T09:02:51Z

asv_bench/benchmarks/frame_methods.py

    goal_time = 0.2

    def setup(self):
+        np.random.seed(1234)


You mentioned a module-level setup function. Can we use that for the random seed, to avoid repeating it in each benchmark?

mroeschke

In my comment I was actually referring to the setup method of each benchmark class rather than a module-level setup function, since the asv docs imply that a class's setup method is run before each time_* method in that class.

Eventually I think we should transition to using 'standardized' data that could be initialized once at the module level.

jorisvandenbossche

But in your comment you included this quote from the asv docs:

"You can also include a module-level setup function, which will be run for every benchmark within the module, prior to any setup assigned specifically to each function."

which I understood as a module-level def setup() that is called before each benchmark. So I think that should work, but I am not sure.

"Eventually I think we should transition to using 'standardized' data that could be initialized once at the module level."

Yes, I agree we should try to do that more (at least for those cases where the data is not modified).
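
A minimal sketch of the module-level setup idea discussed here, assuming asv behaves as the quoted docs describe. The RandomValues class and the run_like_asv helper are hypothetical; the helper only imitates the call order and is not part of asv.

```python
import numpy as np


def setup(*args, **kwargs):
    # Module-level setup: per the asv docs quoted above, this runs for
    # every benchmark in the module, before any benchmark-specific setup.
    np.random.seed(1234)


class RandomValues(object):
    # Hypothetical benchmark class, used only for illustration.

    def setup(self):
        # No explicit seed here; it relies on the module-level setup.
        self.values = np.random.randn(1000)

    def time_isnan(self):
        np.isnan(self.values)


def run_like_asv(bench_cls):
    # Hypothetical stand-in for the asv runner, only to show call order.
    setup()
    bench = bench_cls()
    bench.setup()
    bench.time_isnan()
    return bench.values
```

Because the seed is reset before each class-level setup, two runs of the same benchmark see identical data without repeating np.random.seed in every class.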

jorisvandenbossche · 2017-11-28T09:05:57Z

asv_bench/benchmarks/frame_methods.py

-        for col in df:
-            df[col]
+        for col in self.df3:
+            self.df3[col]

    def time_itertuples(self):
        for row in self.df2.itertuples():
            pass


While you are at it, can you add an iterrows benchmark as well?
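
The requested benchmark could be sketched roughly as follows, next to the existing itertuples one. The frame size and the attribute name df are assumptions, not the values used in the benchmark file.

```python
import numpy as np
from pandas import DataFrame


class Iteration(object):
    goal_time = 0.2

    def setup(self):
        np.random.seed(1234)
        # Frame size is an assumption, kept small because iterrows builds
        # a Series for every row.
        self.df = DataFrame(np.random.randn(1000, 10))

    def time_iterrows(self):
        for index, row in self.df.iterrows():
            pass

    def time_itertuples(self):
        for row in self.df.itertuples():
            pass
```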

jorisvandenbossche · 2017-11-28T09:09:29Z

asv_bench/benchmarks/frame_methods.py



-#-----------------------------------------------------------------------------
-# from_records issue-6700
+class FromRecords(object):


Maybe move this to the frame constructor benchmarks?

jorisvandenbossche · 2017-11-28T09:10:57Z

asv_bench/benchmarks/frame_methods.py

+        K = 10
+        self.df = DataFrame({'key1': tm.makeStringIndex(N).values.repeat(K),
+                             'key2': tm.makeStringIndex(N).values.repeat(K),
+                             'value': np.random.randn(N * K)})

    def time_frame_sort_index_by_columns(self):
        self.df.sort_index(by=['key1', 'key2'])


This is actually deprecated. I think we should use sort_values if it exists, and otherwise fall back to sort_index.
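
One way to express that fallback is sketched below, assuming a hasattr check is acceptable inside the timed method. The key generation replaces tm.makeStringIndex with plain formatted strings, and the sizes and method name are assumptions for illustration.

```python
import numpy as np
from pandas import DataFrame


class SortIndexByColumns(object):
    goal_time = 0.2

    def setup(self):
        np.random.seed(1234)
        N, K = 1000, 10  # sizes are assumptions
        keys = np.array(['key%04d' % i for i in range(N)], dtype=object)
        self.df = DataFrame({'key1': np.random.permutation(keys).repeat(K),
                             'key2': np.random.permutation(keys).repeat(K),
                             'value': np.random.randn(N * K)})

    def time_frame_sort_by_columns(self):
        if hasattr(self.df, 'sort_values'):
            self.df.sort_values(by=['key1', 'key2'])
        else:
            # Fallback for pandas versions that predate sort_values.
            self.df.sort_index(by=['key1', 'key2'])
```

The attribute check is cheap compared with the sort itself, but it could also be done once in setup if the extra overhead matters.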

jorisvandenbossche · 2017-11-28T09:11:49Z

asv_bench/benchmarks/frame_methods.py

+        self.df.info()
+
+
+class Nlargest(object):


Maybe combine this one with Quantile? (same setup, and related methods)

mroeschke · 2017-11-29T06:29:01Z

@jorisvandenbossche Nice catch; you were correct about the setup. I was able to define a setup function in pandas_vb_common.py and import it to set the random seed (confirmed with a small test using asv dev and a print statement). However, we have to live with the lint error F811, redefinition of unused 'setup' from line x.

Otherwise, I moved FromRecords to frame_ctor.py, added an iterrows benchmark, changed sort_index to sort_values, and kept the NSort class, which now benchmarks nsmallest and nlargest with params.
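
For reference, an NSort class that benchmarks nsmallest and nlargest with params could look roughly like this. Parametrizing over keep, and the frame shape and column names, are assumptions; the comment above does not say which parameter is used.

```python
import numpy as np
from pandas import DataFrame


class NSort(object):
    goal_time = 0.2
    # Assumption: parametrize over the ``keep`` argument.
    params = ['first', 'last']
    param_names = ['keep']

    def setup(self, keep):
        np.random.seed(1234)
        self.df = DataFrame(np.random.randn(1000, 3), columns=list('ABC'))

    def time_nlargest(self, keep):
        self.df.nlargest(100, 'A', keep=keep)

    def time_nsmallest(self, keep):
        self.df.nsmallest(100, 'A', keep=keep)
```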

jorisvandenbossche · 2017-11-29T10:49:58Z

asv_bench/benchmarks/frame_methods.py

+import pandas.util.testing as tm
+from pandas import (DataFrame, Series, MultiIndex, date_range, period_range,
+                    isnull, NaT)
+from .pandas_vb_common import setup


You can put a # noqa on this line to avoid linter warnings.

jorisvandenbossche · 2017-11-29T10:50:37Z

Apart from the small comment, looks good to me!

jorisvandenbossche · 2017-11-29T10:51:40Z

Since this is not linted anyhow, it is not that urgent (although it is nice for local use that it doesn't give a lint error). So you can always fix it if you do another PR on this.

Thanks!

jorisvandenbossche added the Benchmark Performance (ASV) benchmarks label Nov 28, 2017

jorisvandenbossche added this to the 0.22.0 milestone Nov 28, 2017

jorisvandenbossche reviewed Nov 28, 2017

mroeschke added 3 commits November 28, 2017 19:09

CLN: ASV frame_methods benchmark

645b77f

Move relevant benchmark to other files Add more cleaning

Add blank line at the end of string.py

6b581f3

Address comments

12e4686

mroeschke force-pushed the asv_clean_frame_method branch from b0956ae to 12e4686 Compare November 29, 2017 06:22

jorisvandenbossche reviewed Nov 29, 2017

jorisvandenbossche merged commit 48c5bfc into pandas-dev:master Nov 29, 2017

mroeschke deleted the asv_clean_frame_method branch November 30, 2017 00:21
