
CLN: ASV frame_methods benchmark #18536

Merged

mroeschke
Member

  • Added np.random.seed(1234) in the setup of classes where random data is created (xref #8144, "BENCH: put in np.random.seed on vbenches")

  • Ran flake8 and replaced star imports

  • Moved GetItemSingleColumn, AssignTimeseriesIndex, and InsertColumns to indexing.py, and StringSlice to strings.py

  • Refactored to use params where relevant
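The seeded-setup and params pattern from the bullets above can be sketched as a small asv benchmark class. This is a minimal sketch, not the PR's actual code: the class name, the time_dropna method and the how/axis parameters follow the asv output in this PR, while the frame size and the NaN layout are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


class Dropna(object):
    # asv times each time_* method once per (how, axis) combination
    params = (['all', 'any'], [0, 1])
    param_names = ['how', 'axis']

    def setup(self, how, axis):
        # Fixed seed: every run benchmarks identical random data (xref #8144)
        np.random.seed(1234)
        self.df = pd.DataFrame(np.random.randn(1000, 100))
        # Hypothetical NaN layout so dropna has rows/columns to remove
        self.df.iloc[50:100, 20:50] = np.nan  # scattered block
        self.df.iloc[200:300] = np.nan        # rows that are entirely NaN
        self.df.iloc[:, 60:70] = np.nan       # columns that are entirely NaN

    def time_dropna(self, how, axis):
        self.df.dropna(how=how, axis=axis)
```

Because setup re-seeds, two instances set up independently hold equal frames, which is what makes timings comparable across commits.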

$ asv dev -b ^frame_methods

[  1.79%] ··· Running frame_methods.Apply.time_apply_axis_1                                                        299ms
[  3.57%] ··· Running frame_methods.Apply.time_apply_lambda_mean                                                  12.0ms
[  5.36%] ··· Running frame_methods.Apply.time_apply_np_mean                                                      13.4ms
[  7.14%] ··· Running frame_methods.Apply.time_apply_pass_thru                                                    13.5ms
[  8.93%] ··· Running frame_methods.Apply.time_apply_ref_by_name                                                  69.1ms
[ 10.71%] ··· Running frame_methods.Apply.time_apply_user_func                                                     256ms
[ 12.50%] ··· Running frame_methods.Count.time_count_level_mixed_dtypes_multi                                         ok
[ 12.50%] ···· 
               ====== =======
                axis         
               ------ -------
                 0     157ms 
                 1     128ms 
               ====== =======

[ 14.29%] ··· Running frame_methods.Count.time_count_level_multi                                                      ok
[ 14.29%] ···· 
               ====== =======
                axis         
               ------ -------
                 0     113ms 
                 1     147ms 
               ====== =======

[ 16.07%] ··· Running frame_methods.Dropna.time_dropna                                                                ok
[ 16.07%] ···· 
               ===== ======== ========
               --           axis      
               ----- -----------------
                how     0        1    
               ===== ======== ========
                all   135ms    150ms  
                any   60.8ms   63.0ms 
               ===== ======== ========

[ 17.86%] ··· Running frame_methods.Dropna.time_dropna_axis_mixed_dtypes                                              ok
[ 17.86%] ···· 
               ===== ======= =======
               --          axis     
               ----- ---------------
                how     0       1   
               ===== ======= =======
                all   437ms   436ms 
                any   346ms   327ms 
               ===== ======= =======

[ 19.64%] ··· Running frame_methods.Dtypes.time_frame_dtypes                                                       336μs
[ 21.43%] ··· Running frame_methods.Duplicated.time_frame_duplicated                                               340ms
[ 23.21%] ··· Running frame_methods.Duplicated.time_frame_duplicated_wide                                          353ms
[ 25.00%] ··· Running frame_methods.Equals.time_frame_float_equal                                                 8.72ms
[ 26.79%] ··· Running frame_methods.Equals.time_frame_float_unequal                                               24.4ms
[ 28.57%] ··· Running frame_methods.Equals.time_frame_nonunique_equal                                             11.8ms
[ 30.36%] ··· Running frame_methods.Equals.time_frame_nonunique_unequal                                           12.1ms
[ 32.14%] ··· Running frame_methods.Equals.time_frame_object_equal                                                41.4ms
[ 33.93%] ··· Running frame_methods.Equals.time_frame_object_unequal                                              27.3ms
[ 35.71%] ··· Running frame_methods.Fillna.time_frame_fillna                                                          ok
[ 35.71%] ···· 
               ========= ======== ========
               --              method     
               --------- -----------------
                inplace    pad     bfill  
               ========= ======== ========
                  True    14.7ms   18.8ms 
                 False    12.7ms   13.3ms 
               ========= ======== ========

[ 37.50%] ··· Running frame_methods.FromRecords.time_frame_from_records_generator                                     ok
[ 37.50%] ···· 
               ======= ========
                nrows          
               ------- --------
                 None   136ms  
                 1000   2.23ms 
               ======= ========

[ 39.29%] ··· Running frame_methods.GetDtypeCounts.time_frame_get_dtype_counts                                     493μs
[ 41.07%] ··· Running frame_methods.GetDtypeCounts.time_info                                                       999ms
[ 41.07%] ····· <class 'pandas.core.frame.DataFrame'>
                RangeIndex: 10 entries, 0 to 9
                Columns: 10000 entries, 0 to 9999
                dtypes: float64(10000)
                memory usage: 781.3 KB

[ 42.86%] ··· Running frame_methods.GetNumericData.time_frame_get_numeric_data                                     411μs
[ 42.86%] ····· /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/frame_methods.py:17: FutureWarning: consolidate is deprecated and will be removed in a future release.
                self.df = self.df.consolidate()

[ 44.64%] ··· Running frame_methods.Interpolate.time_interpolate                                                      ok
[ 44.64%] ···· 
               ========== ========
                downcast          
               ---------- --------
                  None     98.8ms 
                 infer     159ms  
               ========== ========

[ 46.43%] ··· Running frame_methods.Interpolate.time_interpolate_some_good                                            ok
[ 46.43%] ···· 
               ========== ========
                downcast          
               ---------- --------
                  None     3.02ms 
                 infer     6.04ms 
               ========== ========

[ 48.21%] ··· Running frame_methods.Isnull.time_isnull                                                            2.42ms
[ 50.00%] ··· Running frame_methods.Isnull.time_isnull_floats_no_null                                             2.45ms
[ 51.79%] ··· Running frame_methods.Isnull.time_isnull_obj                                                        89.7ms
[ 53.57%] ··· Running frame_methods.Isnull.time_isnull_strngs                                                     79.3ms
[ 55.36%] ··· Running frame_methods.Iteration.time_iteritems                                                      87.8ms
[ 57.14%] ··· Running frame_methods.Iteration.time_iteritems_cached                                               87.6ms
[ 58.93%] ··· Running frame_methods.Iteration.time_iteritems_indexing                                              450ms
[ 60.71%] ··· Running frame_methods.Iteration.time_itertuples                                                     81.1ms
[ 62.50%] ··· Running frame_methods.Lookup.time_frame_fancy_lookup                                                8.36ms
[ 64.29%] ··· Running frame_methods.Lookup.time_frame_fancy_lookup_all                                            55.2ms
[ 66.07%] ··· Running frame_methods.MaskBool.time_frame_mask_bools                                                26.0ms
[ 67.86%] ··· Running frame_methods.MaskBool.time_frame_mask_floats                                               19.2ms
[ 69.64%] ··· Running frame_methods.Nlargest.time_frame_nlargest                                                  3.73ms
[ 71.43%] ··· Running frame_methods.Nunique.time_frame_nunique                                                     679ms
[ 73.21%] ··· Running frame_methods.Quantile.time_frame_quantile                                                      ok
[ 73.21%] ···· 
               ====== ========
                axis          
               ------ --------
                 0     995μs  
                 1     1.59ms 
               ====== ========

[ 75.00%] ··· Running frame_methods.Reindex.time_reindex_axis0                                                    17.7ms
[ 76.79%] ··· Running frame_methods.Reindex.time_reindex_axis1                                                     140ms
[ 78.57%] ··· Running frame_methods.Reindex.time_reindex_both_axes                                                53.3ms
[ 80.36%] ··· Running frame_methods.Reindex.time_reindex_both_axes_ix                                             53.9ms
[ 82.14%] ··· Running frame_methods.Reindex.time_reindex_upcast                                                   15.4ms
[ 83.93%] ··· Running frame_methods.Repr.time_frame_repr_wide                                                     32.0ms
[ 85.71%] ··· Running frame_methods.Repr.time_html_repr_trunc_mi                                                   429ms
[ 87.50%] ··· Running frame_methods.Repr.time_html_repr_trunc_si                                                   405ms
[ 89.29%] ··· Running frame_methods.Repr.time_repr_tall                                                           52.2ms
[ 91.07%] ··· Running frame_methods.Shift.time_shift                                                                  ok
[ 91.07%] ···· 
               ====== ========
                axis          
               ------ --------
                 0     41.7ms 
                 1     45.1ms 
               ====== ========

[ 92.86%] ··· Running frame_methods.SortIndex.time_frame_sort_index                                                   ok
[ 92.86%] ···· 
               =========== ========
                ascending          
               ----------- --------
                   True     21.1ms 
                  False     139ms  
               =========== ========

[ 94.64%] ··· Running frame_methods.SortIndexByColumns.time_frame_sort_index_by_columns                           66.1ms
[ 94.64%] ····· /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/frame_methods.py:498: FutureWarning: by argument to sort_index is deprecated, pls use .sort_values(by=...)
                self.df.sort_index(by=['key1', 'key2'])

[ 96.43%] ··· Running frame_methods.ToHTML.time_to_html_mixed                                                      527ms
[ 98.21%] ··· Running frame_methods.ToString.time_to_string_floats                                                72.4ms
[100.00%] ··· Running frame_methods.XS.time_frame_xs                                                                  ok
[100.00%] ···· 
               ====== =======
                axis         
               ------ -------
                 0     791μs 
                 1     674μs 
               ====== =======

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18536 into master will decrease coverage by 0.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #18536      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         164      164              
  Lines       49801    49801              
==========================================
- Hits        45494    45485       -9     
- Misses       4307     4316       +9
Flag Coverage Δ
#multiple 89.13% <ø> (ø) ⬆️
#single 40.81% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88ab693...b0956ae.

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18536 into master will decrease coverage by 0.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #18536      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         164      164              
  Lines       49802    49802              
==========================================
- Hits        45496    45487       -9     
- Misses       4306     4315       +9
Flag Coverage Δ
#multiple 89.13% <ø> (ø) ⬆️
#single 40.81% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32f562d...12e4686.

@jorisvandenbossche jorisvandenbossche added the Benchmark Performance (ASV) benchmarks label Nov 28, 2017
@jorisvandenbossche jorisvandenbossche added this to the 0.22.0 milestone Nov 28, 2017
Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks a lot for your work on this!

    goal_time = 0.2

    def setup(self):
        np.random.seed(1234)
Member

You mentioned a module level setup function. Can we use that for the random seed to avoid repeating it for each benchmark?

Member Author

In my comment I was actually referring to the setup method of each benchmark class, not a module-level setup function, since the asv docs imply that a class's setup is run before each time_* method in that class.

Eventually I think we should transition to using 'standardized' data that could be initialized once at the module level.

Member

But in your comment you included this quote from the asv docs:

You can also include a module-level setup function, which will be run for every benchmark within the module, prior to any setup assigned specifically to each function.

I understood that as a module-level def setup() that is called before each benchmark, so I think it should work, but I'm not sure.
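A minimal sketch of the module-level setup under discussion, assuming the helper lives in a shared module such as pandas_vb_common.py and each benchmark file imports it with `from .pandas_vb_common import setup`. Accepting `*args, **kwargs` is an assumption made so the helper works whether or not asv passes benchmark parameters to it; the Shift class and its data shape are illustrative only.

```python
import numpy as np
import pandas as pd


# Shared helper: per the asv docs quoted above, a module-level setup runs
# before every benchmark in the module, ahead of any class-level setup
def setup(*args, **kwargs):
    # Accept and ignore params so parameterized benchmarks can use it too
    np.random.seed(1234)


class Shift(object):
    params = [0, 1]
    param_names = ['axis']

    def setup(self, axis):
        # No np.random.seed() here: the module-level setup has already
        # reset the seed before this method runs
        self.df = pd.DataFrame(np.random.rand(1000, 100))

    def time_shift(self, axis):
        self.df.shift(1, axis=axis)
```

Emulating asv's order by hand (module setup, then class setup) yields the same frame on every run.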

Eventually I think we should transition to using 'standardized' data that could be initialized once at the module level.

Yes, I agree we should try to do that more (at least for those cases where the data is not modified)
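One way to read "standardized data initialized once at the module level" is a read-only fixture built at import time, with any benchmark that mutates its data taking a private copy. FLOAT_FRAME, FillnaInplace and the sizes used here are hypothetical names and values for illustration. Using a private np.random.RandomState is a design choice that keeps the fixture reproducible without depending on the global seed.

```python
import numpy as np
import pandas as pd

# Hypothetical shared fixture, built once when the module is imported
_rng = np.random.RandomState(1234)
FLOAT_FRAME = pd.DataFrame(_rng.randn(1000, 10))


class Quantile(object):
    def setup(self):
        # Read-only benchmark: sharing the module-level frame is safe
        self.df = FLOAT_FRAME

    def time_frame_quantile(self):
        self.df.quantile([0.1, 0.5])


class FillnaInplace(object):
    def setup(self):
        # This benchmark mutates its input, so it works on a private copy
        self.df = FLOAT_FRAME.copy()
        self.df.iloc[::2, 0] = np.nan

    def time_fillna_inplace(self):
        self.df.fillna(0, inplace=True)
```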

        for col in df:
            df[col]
        for col in self.df3:
            self.df3[col]

    def time_itertuples(self):
        for row in self.df2.itertuples():
            pass
Member

While you are at it, can you add an iterrows one as well?
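An iterrows benchmark next to the existing itertuples one could look like this sketch. The frame shape is an assumption, since the PR's df2 is not shown in this excerpt.

```python
import numpy as np
import pandas as pd


class Iteration(object):
    def setup(self):
        np.random.seed(1234)
        # Hypothetical size standing in for the benchmark's df2
        self.df = pd.DataFrame(np.random.randn(1000, 10))

    def time_itertuples(self):
        for row in self.df.itertuples():
            pass

    def time_iterrows(self):
        # iterrows yields (index, Series) pairs and builds a Series per row,
        # so it is typically much slower than itertuples
        for row in self.df.iterrows():
            pass
```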



#-----------------------------------------------------------------------------
# from_records issue-6700
class FromRecords(object):
Member

maybe move to the frame constructor benchmarks?
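If FromRecords moves into the frame constructor benchmarks, a parameterized version might look like this sketch. The record count and tuple layout are assumptions; the nrows values match the ones in the asv output for this benchmark. The generator is created inside the timed method on purpose: one built once in setup could be exhausted after the first call if asv times the method more than once per setup.

```python
import pandas as pd


class FromRecords(object):
    # Same nrows values as the asv output for this benchmark
    params = [None, 1000]
    param_names = ['nrows']
    N = 10000  # hypothetical record count

    def _records(self):
        # A fresh generator per call, so no timed call sees an exhausted one
        return ((x, x * 20, x * 100) for x in range(self.N))

    def time_frame_from_records_generator(self, nrows):
        # nrows limits how many records are read from an iterator
        self.df = pd.DataFrame.from_records(self._records(), nrows=nrows)
```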

        K = 10
        self.df = DataFrame({'key1': tm.makeStringIndex(N).values.repeat(K),
                             'key2': tm.makeStringIndex(N).values.repeat(K),
                             'value': np.random.randn(N * K)})

    def time_frame_sort_index_by_columns(self):
        self.df.sort_index(by=['key1', 'key2'])
Member

This is actually deprecated. I think we should use sort_values if it exists, and fall back to sort_index otherwise.
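A sketch of the suggested fallback: call sort_values where it exists and use the deprecated sort_index(by=...) only on older pandas. The helper name sort_by_columns and the class name are hypothetical, and plain string keys stand in for tm.makeStringIndex so the sketch does not depend on pandas.util.testing.

```python
import numpy as np
import pandas as pd


def sort_by_columns(df, by):
    # sort_values(by=...) replaced sort_index(by=...) in pandas 0.17.0;
    # the fallback only matters when benchmarking older versions
    if hasattr(df, 'sort_values'):
        return df.sort_values(by=by)
    return df.sort_index(by=by)


class SortValuesByColumns(object):
    def setup(self):
        np.random.seed(1234)
        N, K = 1000, 10
        # Plain string keys stand in for tm.makeStringIndex(N)
        keys = np.array(['key_%d' % i for i in range(N)], dtype=object)
        self.df = pd.DataFrame({'key1': keys.repeat(K),
                                'key2': np.random.permutation(keys).repeat(K),
                                'value': np.random.randn(N * K)})

    def time_frame_sort_values_by_columns(self):
        sort_by_columns(self.df, ['key1', 'key2'])
```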

        self.df.info()


class Nlargest(object):
Member

Maybe combine this one with Quantile? (same setup, and related methods)

@mroeschke mroeschke force-pushed the asv_clean_frame_method branch from b0956ae to 12e4686 November 29, 2017 06:22
@mroeschke
Member Author

@jorisvandenbossche Nice catch; you were correct about the setup. I was able to define a setup function in pandas_vb_common.py and import it (from .pandas_vb_common import setup) to set the random seed, which I confirmed with a small test using asv dev and a print statement. However, we have to live with a lint error: F811 redefinition of unused 'setup' from line x.

Otherwise, moved FromRecords to frame_ctor.py, added an iterrows benchmark, changed sort_index to sort_values, and kept the NSort class which now benches nsmallest and nlargest with params.
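A parameterized NSort class along the lines described might look like this. It is a sketch under assumed names and sizes (a method parameter, a 1000x3 frame, n=100 on column 'A'), not the code that was merged.

```python
import numpy as np
import pandas as pd


class NSort(object):
    # Hypothetical layout: one timed body parameterized over the method name
    params = ['nlargest', 'nsmallest']
    param_names = ['method']

    def setup(self, method):
        np.random.seed(1234)
        self.df = pd.DataFrame(np.random.randn(1000, 3), columns=list('ABC'))

    def time_nsort(self, method):
        # DataFrame.nlargest and DataFrame.nsmallest both take (n, columns)
        getattr(self.df, method)(100, 'A')
```

Dispatching with getattr keeps a single setup and timing body, so both methods are measured on identical data.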

import pandas.util.testing as tm
from pandas import (DataFrame, Series, MultiIndex, date_range, period_range,
                    isnull, NaT)
from .pandas_vb_common import setup
Member

you can put a # noqa on this line to avoid linter warnings

@jorisvandenbossche
Member

Apart from the small comment, looks good to me!

@jorisvandenbossche jorisvandenbossche merged commit 48c5bfc into pandas-dev:master Nov 29, 2017
@jorisvandenbossche
Member

Since this is not linted anyhow, it's not that urgent (although it would be nice for local use not to get a lint error), so you can always fix it if you do another PR on this.

Thanks!

@mroeschke mroeschke deleted the asv_clean_frame_method branch November 30, 2017 00:21
Labels
Benchmark Performance (ASV) benchmarks
Projects
None yet

2 participants