Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ArrayManager] Array version of interpolate logic #44736

Closed

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Dec 3, 2021

Similar to what I did in #41104 for fillna (xref #39146).

This also improves the performance for some of the fillna/interpolate benchmarks (mostly due to avoiding the overhead of creating a block for each column). For example (from frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'datetime64[ns]'):

N, M = 10000, 100
dtype = "datetime64[ns]"
data = pd.date_range("2011-01-01", freq="H", periods=N)
df = pd.DataFrame({f"col_{i}": data for i in range(M)})
df[::2] = None
df_am = df._as_manager("array").copy()

In [2]: %timeit df_am.fillna(inplace=True, method="bfill")
7.31 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master
3.57 ms ± 370 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- PR

@jorisvandenbossche jorisvandenbossche added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate ArrayManager labels Dec 3, 2021
pandas/core/missing.py Outdated Show resolved Hide resolved
**kwargs,
)

return _maybe_downcast(interp_values, downcast)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done in the Manager method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is logic that belongs in the interpolate function? (for putting this in the manager, I would need to create a small wrapper function that combines the above method with _maybe_downcast to use in ArrayManager.apply. I don't really see an advantage of that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge deal. My thought is that post-op downcasting is likely to be the part where AM/BM are different, so putting the logic in the AM/BM method is more likely to be conducive to sharing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that downcasting logic is different at the moment?

@jbrockmendel
Copy link
Member

Can any of this be shared with the Block method?

@jorisvandenbossche
Copy link
Member Author

Can any of this be shared with the Block method?

I think the actual interpolate function is already shared (i.e. interpolate_array_2d). The rest is mostly some boilerplate that is slightly different (eg the Block method has an additional check to split the block). I think the try/except for clean_fill_method could be shared, but I am not sure it is worth creating another function for those few lines.

@jbrockmendel
Copy link
Member

I increasingly think that this type of change should be left until right before BlockManager is ripped out. Until then, this just duplicates code and makes it easy for behavior to accidentally drift apart.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 13, 2022
@simonjayhawkins
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants