-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: move Block construction in groupby aggregation to internals #39997
REF: move Block construction in groupby aggregation to internals #39997
Conversation
For the current BlockManager code, this doesn't actually simplify that much in the groupby code (it does remove accessing the blocks and an explicit BlockManager(..) call, so removing BlockManager internals). In principle I could re-use But it will probably make it a bit cleaner to add ArrayManager support. |
pandas/core/internals/managers.py
Outdated
if len(result_blocks) == 0: | ||
raise DataError("No numeric types to aggregate") | ||
|
||
index = Index(range(result_blocks[0].values.shape[-1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we figure out this length before calling grouped_reduce? if so, i think that we can turn it into an argument, then refactor reduce, apply and possibly quantile to dispatch to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length can certainly be known before hand for the groupby case (self.grouper.ngroups
), I am only not fully sure this is a nice API for the Manager
@@ -1748,17 +1735,17 @@ def _wrap_transformed_output( | |||
|
|||
return result | |||
|
|||
def _wrap_agged_blocks(self, blocks: Sequence[Block], items: Index) -> DataFrame: | |||
def _wrap_agged_manager(self, mgr: BlockManager) -> DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed this is an improvement
lgtm. some failing tests :-> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. comment for the future
index = np.arange(blocks[0].values.shape[-1]) | ||
mgr = BlockManager(blocks, axes=[items, index]) | ||
index = np.arange(mgr.shape[1]) | ||
mgr.axes[1] = ibase.Index(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is super janky, maybe can push this down somehow
Possible precursor for #39885