PERF: optimize is_numeric_v_string_like #40501

jorisvandenbossche · 2021-03-18T16:17:55Z

In some of the arithmetic benchmarks (xref #39146 (comment)), just this is_numeric_v_string_like check takes up 15-35% of the overall time.

This improves the performance of this check by using some ndarray-specialized dtype checks (checking the kind instead of the generic functions):

In [1]: from pandas.core.dtypes.common import is_numeric_v_string_like

In [2]: arr = np.array([1, 2, 3])

In [3]: %timeit is_numeric_v_string_like(arr, 2.0)
2.3 µs ± 46.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  # <-- master
482 ns ± 41.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <-- PR
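
For readers unfamiliar with the "check the kind" idea, here is a minimal sketch of what a kind-based version of such a check could look like. The function name `is_numeric_v_string_like_sketch` and the exact kind sets are assumptions made for illustration; this is not necessarily the code merged in this PR. It relies only on `numpy.dtype.kind`, a one-character code describing the general type of the data.

```python
import numpy as np

# Assumption: kind codes treated as "numeric" here (unsigned, signed, float,
# complex, bool) and as "string-like" (bytes, unicode).
NUMERIC_KINDS = "uifcb"
STRING_KINDS = "SU"


def is_numeric_v_string_like_sketch(a, b):
    """Hypothetical kind-based check; not the actual pandas implementation."""
    is_a_array = isinstance(a, np.ndarray)
    is_b_array = isinstance(b, np.ndarray)

    # Looking at dtype.kind directly avoids calling generic dtype helpers.
    is_a_numeric_array = is_a_array and a.dtype.kind in NUMERIC_KINDS
    is_b_numeric_array = is_b_array and b.dtype.kind in NUMERIC_KINDS
    is_a_string_array = is_a_array and a.dtype.kind in STRING_KINDS
    is_b_string_array = is_b_array and b.dtype.kind in STRING_KINDS
    is_b_scalar_string = not is_b_array and isinstance(b, str)

    return bool(
        (is_a_numeric_array and is_b_scalar_string)
        or (is_a_numeric_array and is_b_string_array)
        or (is_a_string_array and is_b_numeric_array)
    )
```

With this sketch, a numeric array compared against a float is reported as not numeric-vs-string, while the same array against `"a"` is.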

jbrockmendel · 2021-03-18T17:15:44Z

nice! two questions, neither blockers for this PR

- IIUC this check exists to prevent a numpy-issued deprecation/future warning. Do we know when that ceases to be an issue?
- Looks like is_string_like_dtype is only used in one other place (in dtypes.missing). Let's just rip it out.

jorisvandenbossche · 2021-03-18T17:33:53Z

> Looks like is_string_like_dtype is only used in one other place (in dtypes.missing). lets just rip it out

Or I could update the helper function to what I used here, and then keep using it in those 3 places?

jbrockmendel · 2021-03-18T17:50:03Z

> Or I could update the helper function to what I used here, and then keep using it in those 3 places?

either way i guess. i prefer to inline the check since i frequently find myself having to go look up what the is_foo_dtype's exact behavior is

jorisvandenbossche · 2021-03-18T18:48:11Z

OK, I am certainly fine with inlining here

jorisvandenbossche · 2021-03-18T18:49:50Z

> IIUC this check exists to prevent a numpy-issued deprecation/future warning. Do we know when that ceases to be an issue?

I don't know offhand, but at least with the latest numpy release you still get this wrong behaviour:

In [1]: np.array([1, 2]) == "a"
<ipython-input-1-c406769a5f40>:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  np.array([1, 2]) == "a"
Out[1]: False

In [2]: np.__version__
Out[2]: '1.20.1'

So even if that changes in an upcoming release, we will still need to work around it for quite a while.
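
To illustrate why such a guard is useful, here is a minimal sketch of a comparison wrapper that checks the operands first and only falls through to numpy when the comparison is safe. The helper name `safe_eq` and its all-`False` result for the mismatched case are assumptions for illustration; how pandas actually handles this case may differ.

```python
import numpy as np


def safe_eq(arr, other):
    """Hypothetical helper: elementwise == that sidesteps numeric-vs-string."""
    numeric_array = isinstance(arr, np.ndarray) and arr.dtype.kind in "uifcb"
    if numeric_array and isinstance(other, str):
        # A number never equals a string, so answer elementwise without
        # asking numpy to compare the two (which is what triggers the warning).
        return np.zeros(arr.shape, dtype=bool)
    return arr == other
```

The point of the design is that the cheap dtype check runs on every comparison, which is why its cost shows up in arithmetic benchmarks.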

jbrockmendel

LGTM

PERF: optimize is_numeric_v_string_like

d97ef43

jorisvandenbossche added the Performance (memory or execution speed performance) and Numeric Operations (arithmetic, comparison, and logical operations) labels Mar 18, 2021

update tests

355cc3d

remove is_string_like_dtype alltogether

c205043

update docstring

4d0de7c

jbrockmendel approved these changes Mar 19, 2021


jorisvandenbossche merged commit bfe734f into pandas-dev:master Mar 19, 2021

jorisvandenbossche deleted the ops-perf-numeric-v-string branch March 19, 2021 16:51

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

PERF: optimize is_numeric_v_string_like (pandas-dev#40501)

d115452
