PERF: avoid validating indices repeatedly in ArrayManager.reindex_indexer #40248

Merged


jorisvandenbossche
Member

xref #39146

In ArrayManager.reindex_indexer, we currently use take, which validates the indices on every call, i.e. once per column. Since the indices are the same for every column, we can validate them once and then use the lower-level take_nd.
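
To illustrate the idea only, here is a minimal pure-Python sketch of "validate once, then take without checks". It is not the pandas implementation: `validate_indexer`, `take_unchecked` and `reindex_columns` are hypothetical helper names, columns are modelled as plain lists, and -1 is assumed to mark a missing row that gets the fill value.

```python
from typing import Any, List, Sequence


def validate_indexer(indexer: Sequence[int], length: int) -> None:
    """Check that every position is -1 (missing) or within [0, length)."""
    for pos in indexer:
        if pos < -1 or pos >= length:
            raise IndexError(
                f"position {pos} is out of bounds for length {length}"
            )


def take_unchecked(
    column: Sequence[Any], indexer: Sequence[int], fill_value: Any = None
) -> List[Any]:
    """Gather values with no bounds checks; the caller must have validated."""
    return [fill_value if pos == -1 else column[pos] for pos in indexer]


def reindex_columns(
    columns: Sequence[Sequence[Any]],
    indexer: Sequence[int],
    fill_value: Any = None,
) -> List[List[Any]]:
    """Apply the same indexer to every column, validating it only once."""
    if not columns:
        return []
    # All columns share one length, so a single validation covers them all.
    validate_indexer(indexer, len(columns[0]))
    return [take_unchecked(col, indexer, fill_value) for col in columns]


columns = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
result = reindex_columns(columns, [2, 0, -1])
print(result)
```

The validation cost is paid once per reindex rather than once per column, which matters most for wide frames such as the 1000-column benchmark in this PR.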

From the frame_methods.py::Reindex ASV benchmark:

import numpy as np
from pandas import DataFrame

N = 10 ** 3
df = DataFrame(np.random.randn(N * 10, N))
idx = np.arange(4 * N, 7 * N)
df_am = df._as_manager("array")
In [4]: %timeit df_am.reindex(idx)
21 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <-- master
12.6 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  <-- PR

@jorisvandenbossche jorisvandenbossche added the Performance Memory or execution speed performance label Mar 5, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Mar 5, 2021
@jreback jreback merged commit bf270f6 into pandas-dev:master Mar 5, 2021
@jorisvandenbossche jorisvandenbossche deleted the am-perf-reindex-axis0 branch March 5, 2021 13:21