Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrayManager] Test DataFrame reductions + implement ignore_failures #39719
[ArrayManager] Test DataFrame reductions + implement ignore_failures #39719
Changes from 3 commits
0644d90
77395d8
ac6117a
748d767
8dc339c
0ecd08d
c5106f0
9239046
0dcef95
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an ugly special case ... But as far as I can see it is the consequence of storing Datetime/TimedeltaArray as the 1D array for those dtypes in ArrayManager instead of the numpy ndarray version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
res = arr.reshape(-1, 1)._reduce(op, axis=0, ...)
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yes. That would need the Manager to get the actual op name, and not the function object.
Now, just passing the op name in addition is not difficult of course.
But I could also do a precursor to actually move creating the function from the op name (and defining
blk_func
) into the Manager as well. It's moving more things into the internals, but I think that's actually a cleaner separation of concern (you could say that the DataFrame shouldn't need to decide which function object the Manager needs to use, it just needs to specify the op name. This actually already happens for EAs, those just ignore the function object)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
res is NaT and is_timedelta64_ns_dtype(arr.dtype)
I think this will be OK, but what about if
arr.dtype
is dt64 and the reduction isstd
, then we still want td64nat, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know, that case is not covered (see my comment "what if datetime results in timedelta? (eg std)"). I can make the todo more explicit.
(note that this is a bug that already exists on master for ArrayManager as well, it's not caused by the changes in this PR; it's just not yet addressed by this PR)