-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: ASV frame_methods benchmark #18536
CLN: ASV frame_methods benchmark #18536
Conversation
Codecov Report
@@ Coverage Diff @@
## master #18536 +/- ##
==========================================
- Coverage 91.35% 91.33% -0.02%
==========================================
Files 164 164
Lines 49801 49801
==========================================
- Hits 45494 45485 -9
- Misses 4307 4316 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18536 +/- ##
==========================================
- Coverage 91.35% 91.33% -0.02%
==========================================
Files 164 164
Lines 49802 49802
==========================================
- Hits 45496 45487 -9
- Misses 4306 4315 +9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your work on this!
goal_time = 0.2 | ||
|
||
def setup(self): | ||
np.random.seed(1234) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned a module level setup function. Can we use that for the random seed to avoid repeating it for each benchmark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my comment I was actually referring to the setup
function of each benchmark class instead of a module level setup
function since the asv docs imply that the setup
function of each class is run before each time_*
function in that class.
Eventually I think we should transition to using 'standardized' data that could be initialized once at the module level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in your comment you included this quote from the asv docs:
You can also include a module-level setup function, which will be run for every benchmark within the module, prior to any setup assigned specifically to each function.
which I understood as a module level def setup()
that is called before each benchmark. So I think that should work, but not sure.
Eventually I think we should transition to using 'standardized' data that could be initialized once at the module level.
Yes, I agree we should try to do that more (at least for those cases where the data is not modified)
for col in df: | ||
df[col] | ||
for col in self.df3: | ||
self.df3[col] | ||
|
||
def time_itertuples(self): | ||
for row in self.df2.itertuples(): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you are at it, can you add a iterrows
one as well?
|
||
|
||
#----------------------------------------------------------------------------- | ||
# from_records issue-6700 | ||
class FromRecords(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe move to the frame constructor benchmarks?
K = 10 | ||
self.df = DataFrame({'key1': tm.makeStringIndex(N).values.repeat(K), | ||
'key2': tm.makeStringIndex(N).values.repeat(K), | ||
'value': np.random.randn(N * K)}) | ||
|
||
def time_frame_sort_index_by_columns(self): | ||
self.df.sort_index(by=['key1', 'key2']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually deprecated, I think we should do sort_values
if that exists, otherwise, sort_index
self.df.info() | ||
|
||
|
||
class Nlargest(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe combine this one with Quantile? (same setup, and related methods)
Move relevant benchmark to other files Add more cleaning
b0956ae
to
12e4686
Compare
@jorisvandenbossche Nice catch; you were correct about the setup. I was able to define a Otherwise, moved |
import pandas.util.testing as tm | ||
from pandas import (DataFrame, Series, MultiIndex, date_range, period_range, | ||
isnull, NaT) | ||
from .pandas_vb_common import setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can put a # noqa
on this line to avoid linter warnings
Apart from the small comment, looks good to me! |
Since this is not linted anyhow, not that urgent (although nice for local use that it doesn't give a lint error). So you can always fix it if you do another PR on this. Thanks! |
Added
np.random.seed(1234)
in setup classes where random data is created xref BENCH: put in np.random.seed on vbenches #8144Ran flake8 and replaced star imports
Moved
GetItemSingleColumn
,AssignTimeseriesIndex
, andInsertColumns
toindexing.py
, andStringSlice
tostrings.py
Refactored to use
params
where relevant