[ArrayManager] GroupBy cython aggregations (no fallback) #39885
Merged: jorisvandenbossche merged 13 commits into pandas-dev:master from jorisvandenbossche:am-groupby-basic-agg on Feb 25, 2021
Commits
df70d2d [ArrayManager] GroupBy cython aggregations (no fallback)
9cbbf97 Merge remote-tracking branch 'upstream/master' into am-groupby-basic-agg
692175e style
e8e108b Merge remote-tracking branch 'upstream/master' into am-groupby-basic-agg
a5fb361 Merge remote-tracking branch 'upstream/master' into am-groupby-basic-agg
a7bf71e common _cython_agg_manager
8c1b8a2 clean-up test
06b6f3f clean-up setting of index axis
244152b fix BM.arrays for use in tests
32bf7d1 typing
b44804e use add_marker
50fb97f remove xfail marker - count is actually implemented now
1d63f72 Merge remote-tracking branch 'upstream/master' into am-groupby-basic-agg
@@ -234,16 +234,19 @@ def shape(self) -> Shape:
     def ndim(self) -> int:
         return len(self.axes)
 
-    def set_axis(self, axis: int, new_labels: Index) -> None:
+    def set_axis(
+        self, axis: int, new_labels: Index, verify_integrity: bool = True
+    ) -> None:
         # Caller is responsible for ensuring we have an Index object.
-        old_len = len(self.axes[axis])
-        new_len = len(new_labels)
+        if verify_integrity:
+            old_len = len(self.axes[axis])
+            new_len = len(new_labels)
 
-        if new_len != old_len:
-            raise ValueError(
-                f"Length mismatch: Expected axis has {old_len} elements, new "
-                f"values have {new_len} elements"
-            )
+            if new_len != old_len:
+                raise ValueError(
+                    f"Length mismatch: Expected axis has {old_len} elements, new "
+                    f"values have {new_len} elements"
+                )
 
         self.axes[axis] = new_labels
 
Review comment (on the new verify_integrity parameter): I added a [...]
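The effect of the new verify_integrity flag in the hunk above can be sketched in isolation. This is a minimal illustration, not the pandas implementation: AxesHolder is a hypothetical stand-in class that stores plain lists instead of Index objects, and only the length check is copied from the diff.

```python
from typing import List, Sequence


class AxesHolder:
    """Hypothetical stand-in for a manager that keeps one label sequence per axis."""

    def __init__(self, axes: List[Sequence]) -> None:
        self.axes = list(axes)

    def set_axis(
        self, axis: int, new_labels: Sequence, verify_integrity: bool = True
    ) -> None:
        # Same length check as in the diff, skipped when verify_integrity is False.
        if verify_integrity:
            old_len = len(self.axes[axis])
            new_len = len(new_labels)

            if new_len != old_len:
                raise ValueError(
                    f"Length mismatch: Expected axis has {old_len} elements, new "
                    f"values have {new_len} elements"
                )

        self.axes[axis] = new_labels


holder = AxesHolder([["a", "b", "c"]])

# Default behaviour: a shorter label list is rejected.
try:
    holder.set_axis(0, ["x", "y"])
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)

# Opting out of the check lets the caller replace the axis with a different length.
holder.set_axis(0, ["x", "y"], verify_integrity=False)
```

With the check disabled, the caller takes over responsibility for keeping the axis consistent with the stored data, which is presumably acceptable when the data has already been replaced by a shorter aggregated result.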
@@ -282,16 +285,15 @@ def get_dtypes(self):
         return algos.take_nd(dtypes, self.blknos, allow_fill=False)
 
     @property
-    def arrays(self):
+    def arrays(self) -> List[ArrayLike]:
         """
         Quick access to the backing arrays of the Blocks.
 
         Only for compatibility with ArrayManager for testing convenience.
         Not to be used in actual code, and return value is not the same as the
         ArrayManager method (list of 1D arrays vs iterator of 2D ndarrays / 1D EAs).
         """
-        for blk in self.blocks:
-            yield blk.values
+        return [blk.values for blk in self.blocks]
 
     def __getstate__(self):
         block_values = [b.values for b in self.blocks]
 
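The practical difference between the old generator-based arrays property and the new list-returning one can be shown with a small sketch. The Block, GeneratorManager and ListManager classes below are hypothetical stand-ins holding plain lists; they only mirror the two property bodies from the hunk above and are not pandas classes.

```python
from typing import Iterator, List


class Block:
    """Hypothetical stand-in for a block that wraps some values."""

    def __init__(self, values: list) -> None:
        self.values = values


class GeneratorManager:
    """Mirrors the old property body: yields the values lazily."""

    def __init__(self, blocks: List[Block]) -> None:
        self.blocks = blocks

    @property
    def arrays(self) -> Iterator[list]:
        for blk in self.blocks:
            yield blk.values


class ListManager:
    """Mirrors the new property body: returns a concrete list."""

    def __init__(self, blocks: List[Block]) -> None:
        self.blocks = blocks

    @property
    def arrays(self) -> List[list]:
        return [blk.values for blk in self.blocks]


blocks = [Block([1, 2]), Block([3, 4])]
gen_arrays = GeneratorManager(blocks).arrays
list_arrays = ListManager(blocks).arrays

# A generator has no length and cannot be indexed.
try:
    len(gen_arrays)
    gen_supports_len = True
except TypeError:
    gen_supports_len = False

# A generator can also be consumed only once.
first_pass = list(gen_arrays)
second_pass = list(gen_arrays)

# The list supports len(), indexing and repeated iteration.
n_arrays = len(list_arrays)
second_array = list_arrays[1]
```

Returning a list makes test code such as indexing into arrays or checking its length work the same way regardless of which manager produced it, which fits the commit message "fix BM.arrays for use in tests".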
could combine this with previous check as

That would be nice, but the problem is that we still need to keep DatetimeArray intact for DatetimeTZBlock. So we would still need the if hasattr(arr, "tz") and arr.tz is None check as well, in which case it doesn't necessarily become more readable to combine both checks.

Edit: the diff would be:

instead of and getattr(arr, "tz", None) is None how about isinstance(arr.dtype, np.dtype). either way works i guess

That still gives the same length of the if check as in my diff example above, which I don't find an improvement in readability

yah the only possible difference is for mypy